We present an algorithm to extract flow graphs from Java bytecode, including exceptional control flows. We prove its correctness, meaning that the behavior of the extracted control-flow graph is a sound over-approximation of the behavior of the original program. Thus any safety property that holds for the extracted control-flow graph also holds for the original program. This makes control-flow graphs suitable for performing various static analyses, such as model checking.The extraction is performed in two phases. In the first phase the program is transformed into a BIR program, a stack-less intermediate representation of Java bytecode, from which the control-flow graph is extracted in the second phase. We use this intermediate format because it results in compact flow graphs, with provably correct exceptional control flow. To prove the correctness of the two-phase extraction, we also define an idealized extraction algorithm, whose correctness can be proven directly. Then we show that the behavior of the control-flow graph extracted via the intermediate representation is an over-approximation of the behavior of the directly extracted graphs, and thus of the original program. We implemented the indirect extraction as the CFGEx tool and performed several test-cases to show the efficiency of the algorithm.
QC 20120116