Sound Modular Extraction of Control Flow Graphs from Java Bytecode
2012 (English)Licentiate thesis, monograph (Other academic)
Control flow graphs (CFGs) are abstract program models that preserve the control flow information. They have been widely utilized for many static analyses in the past decades. Unfortunately, previous studies about the CFG construction from modern languages, such as Java, have either neglected advanced features that influence the control flow, or do not provide a correctness argument. This is a bearable issue for some program analyses, but not for formal methods, where the soundness of CFGs is a mandatory condition for the verification of safety-critical properties. Moreover, when developing open systems, i.e., systems in which at least one component is missing, one may want to extract CFGs to verify the available components. Soundness is even harder to achieve in this scenario, because of the unknown inter-dependencies involving missing components. In this work we present two variants of a CFG extraction algorithm from Java bytecode considering precise exceptional flow, which are sound w.r.t to the JVM behavior. The first algorithm extracts CFGs from fully-provided (closed) programs only. It proceeds in two phases. Initially the Java bytecode is translated into a stack-less intermediate representation named BIR, which provides explicit representation of exceptions, and is more compact than the original bytecode. Next, we define the transformation from BIR to CFGs, which, among other features, considers the propagation of uncaught exceptions within method calls. We then establish its correctness: the behavior of the extracted CFGs is shown to be a sound over-approximation of the behavior of the original programs. Thus, temporal safety properties that hold for the CFGs also hold for the program. We prove this by suitably combining the properties of the two transformations with those of a previous idealized CFG extraction algorithm, whose correctness has been proven directly. The second variant of the algorithm is defined for open systems. We generalize the extraction algorithm for closed systems for a modular set-up, and resolve inter-dependencies involving missing components by using user-provided interfaces. We establish its correctness by defining a refinement relation between open systems, which constrains the instantiation of missing components. We prove that if the relation holds, then the CFGs extracted from the components of the original open system are sound over-approximations of the CFGs for the same components in the refined system. Thus, temporal safety properties that hold for an open system also hold for closed systems that refine it. We have implemented both algorithms as the ConFlEx tool. It uses Sawja, an external library for the static analysis of Java bytecode, to transform bytecode into BIR, and to resolve virtual method calls. We have extended Sawja to support open systems, and improved its exception type analysis. Experimental results have shown that the algorithm for closed systems generates more precise CFGs than the modular algorithm. This was expected, due to the heavy over-approximations the latter has to perform to be sound. Also, both algorithms are linear in the number of bytecode instructions. Therefore, ConFlEx is efficient for the extraction of CFGs from either open, or closed Java bytecode programs.
Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. , vii, 74 p.
Trita-CSC-A, ISSN 1653-5723 ; 2012:16
Software Verification, Static Analysis, Program Models, Compositional Verification
IdentifiersURN: urn:nbn:se:kth:diva-105275ISBN: 978-91-7501-570-5OAI: oai:DiVA.org:kth-105275DiVA: diva2:570503
2012-12-12, D3, Lindstedtsvägen 5, KTH, Stockholm, 13:00 (English)
Ahrendt, Wolfgang, Senior Lecturer
Gurov, Dilian, Docent
QC 201211222012-11-222012-11-192012-11-22Bibliographically approved