Change search
Link to record
Permanent link

Direct link
BETA
Publications (10 of 16) Show all publications
Ye, H., Martinez, M., Durieux, T. & Monperrus, M. (2019). A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark. In: Cheung, SC Sun, X Zhang, T (Ed.), 2019 IEEE 1ST INTERNATIONAL WORKSHOP ON INTELLIGENT BUG FIXING (IBF '19): . Paper presented at 1st IEEE International Workshop on Intelligent Bug Fixing (IBF), Hangzhou, PEOPLES R CHINA, FEB 24, 2019 (pp. 1-10). IEEE
Open this publication in new window or tab >>A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark
2019 (English)In: 2019 IEEE 1ST INTERNATIONAL WORKSHOP ON INTELLIGENT BUG FIXING (IBF '19) / [ed] Cheung, SC Sun, X Zhang, T, IEEE , 2019, p. 1-10Conference paper, Published paper (Refereed)
Abstract [en]

Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an automatic repair experiment on a benchmark called QuixBugs that has never been studied in the context of program repair. In this study, we report on the characteristics of QuixBugs, and study five repair systems, Arja, Astor, Nopol, NPEfix and RSRepair, which are representatives of generate-and-validate repair techniques and synthesis repair techniques. We propose three patch correctness assessment techniques to comprehensively study overfitting and incorrect patches. Our key results are: 1) 15 / 40 buggy programs in the QuixBugs can be repaired with a test-suite adequate patch; 2) a total of 64 plausible patches for those 15 buggy programs in the QuixBugs are present in the search space of the considered tools; 3) the three patch assessment techniques discard in total 33 / 64 patches that are overfitting. This sets a baseline for future research of automatic repair on QuixBugs. Our experiment also highlights the major properties and challenges of how to perform automated correctness assessment of program repair patches. All experimental results are publicly available on Github in order to facilitate future research on automatic program repair.

Place, publisher, year, edition, pages
IEEE, 2019
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-252426 (URN)10.1109/IBF.2019.8665475 (DOI)000467268600001 ()2-s2.0-85063931398 (Scopus ID)
Conference
1st IEEE International Workshop on Intelligent Bug Fixing (IBF), Hangzhou, PEOPLES R CHINA, FEB 24, 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20190614

Available from: 2019-06-14 Created: 2019-06-14 Last updated: 2019-06-26Bibliographically approved
Yu, Z., Martinez, M., Danglot, B., Durieux, T. & Monperrus, M. (2019). Alleviating patch overfitting with automatic test generation: a study of feasibility and effectiveness for the Nopol repair system. Journal of Empirical Software Engineering, 24(1), 33-67
Open this publication in new window or tab >>Alleviating patch overfitting with automatic test generation: a study of feasibility and effectiveness for the Nopol repair system
Show others...
2019 (English)In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 24, no 1, p. 33-67Article in journal (Refereed) Published
Abstract [en]

Among the many different kinds of program repair techniques, one widely studied family of techniques is called test suite based repair. However, test suites are in essence input-output specifications and are thus typically inadequate for completely specifying the expected behavior of the program under repair. Consequently, the patches generated by test suite based repair techniques can just overfit to the used test suite, and fail to generalize to other tests. We deeply analyze the overfitting problem in program repair and give a classification of this problem. This classification will help the community to better understand and design techniques to defeat the overfitting problem. We further propose and evaluate an approach called UnsatGuided, which aims to alleviate the overfitting problem for synthesis-based repair techniques with automatic test case generation. The approach uses additional automatically generated tests to strengthen the repair constraint used by synthesis-based repair techniques. We analyze the effectiveness of UnsatGuided: 1) analytically with respect to alleviating two different kinds of overfitting issues; 2) empirically based on an experiment over the 224 bugs of the Defects4J repository. The main result is that automatic test generation is effective in alleviating one kind of overfitting, issue-regression introduction, but due to oracle problem, has minimal positive impact on alleviating the other kind of overfitting issue-incomplete fixing.

Place, publisher, year, edition, pages
SPRINGER, 2019
Keywords
Program repair, Synthesis-based repair, Patch overfitting, Automatic test case generation
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-245146 (URN)10.1007/s10664-018-9619-4 (DOI)000458634400002 ()2-s2.0-85046832140 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20190611

Available from: 2019-03-13 Created: 2019-03-13 Last updated: 2019-06-11Bibliographically approved
Martinez, M. & Monperrus, M. (2019). Astor: Exploring the design space of generate-and-validate program repair beyond GenProg. Journal of Systems and Software, 151, 65-80
Open this publication in new window or tab >>Astor: Exploring the design space of generate-and-validate program repair beyond GenProg
2019 (English)In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 151, p. 65-80Article in journal (Refereed) Published
Abstract [en]

This article contributes to defining the design space of program repair. Repair approaches can be loosely characterized according to the main design philosophy, in particular “generate- and-validate” and synthesis-based approaches. Each of those repair approaches is a point in the design space of program repair. Our goal is to facilitate the design, development and evaluation of repair approaches by providing a framework that: a) contains components commonly present in most approaches, b) provides built-in implementations of existing repair approaches. This paper presents a Java framework named Astor that focuses on the design space of generate-and-validate repair approaches. The key novelty of Astor is to provides explicit extension points to explore the design space of program repair. Thanks to those extension points, researchers can both reuse existing program repair components and implement new ones. Astor includes 6 unique implementations of repair approaches in Java, including GenProg for Java called jGenProg. Researchers have already defined new approaches over Astor. The implementations of program repair approaches built already available in Astor are capable of repairing, in total, 98 real bugs from 5 large Java programs. Astor code is publicly available on Github: https://github.com/SpoonLabs/astor.

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
Automated Program Repair, Defects, Evaluation Frameworks, Software Bugs, Software Maintenance, Software Testing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-246440 (URN)10.1016/j.jss.2019.01.069 (DOI)000462105200005 ()2-s2.0-85061161484 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20190329

Available from: 2019-03-29 Created: 2019-03-29 Last updated: 2019-06-24Bibliographically approved
Madeiral, F., Urli, S., Maia, M. & Monperrus, M. (2019). BEARS: An Extensible Java Bug Benchmark for Automatic Program Repair Studies. In: SANER 2019 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering: . Paper presented at 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2019, 24 February 2019 through 27 February 2019 (pp. 468-478). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8667991.
Open this publication in new window or tab >>BEARS: An Extensible Java Bug Benchmark for Automatic Program Repair Studies
2019 (English)In: SANER 2019 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 468-478, article id 8667991Conference paper, Published paper (Refereed)
Abstract [en]

Benchmarks of bugs are essential to empirically evaluate automatic program repair tools. In this paper, we present BEARS, a project for collecting and storing bugs into an extensible bug benchmark for automatic repair studies in Java. The collection of bugs relies on commit building state from Continuous Integration (CI) to find potential pairs of buggy and patched program versions from open-source projects hosted on GitHub. Each pair of program versions passes through a pipeline where an attempt of reproducing a bug and its patch is performed. The core step of the reproduction pipeline is the execution of the test suite of the program on both program versions. If a test failure is found in the buggy program version candidate and no test failure is found in its patched program version candidate, a bug and its patch were successfully reproduced. The uniqueness of Bears is the usage of CI (builds) to identify buggy and patched program version candidates, which has been widely adopted in the last years in open-source projects. This approach allows us to collect bugs from a diversity of projects beyond mature projects that use bug tracking systems. Moreover, BEARS was designed to be publicly available and to be easily extensible by the research community through automatic creation of branches with bugs in a given GitHub repository, which can be used for pull requests in the BEARS repository. We present in this paper the approach employed by BEARS, and we deliver the version 1.0 of BEARS, which contains 251 reproducible bugs collected from 72 projects that use the Travis CI and Maven build environment.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Java programming language, Open source software, Pipelines, Reengineering, Repair, Software testing, Automatic creations, Automatic programs, Bug tracking system, Continuous integrations, Open source projects, Repair tools, Research communities, Test failure, Program debugging
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-252215 (URN)10.1109/SANER.2019.8667991 (DOI)000469754100045 ()2-s2.0-85064151415 (Scopus ID)9781728105918 (ISBN)
Conference
26th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2019, 24 February 2019 through 27 February 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20190611

Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-06-26Bibliographically approved
White, M., Tufano, M., Martinez, M., Monperrus, M. & Poshyvanyk, D. (2019). Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities. In: SANER 2019 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering: . Paper presented at 26th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2019; Hangzhou; China; 24 February 2019 through 27 February 2019 (pp. 479-490). Institute of Electrical and Electronics Engineers (IEEE), Article ID 8668043.
Open this publication in new window or tab >>Sorting and Transforming Program Repair Ingredients via Deep Learning Code Similarities
Show others...
2019 (English)In: SANER 2019 - Proceedings of the 2019 IEEE 26th International Conference on Software Analysis, Evolution, and Reengineering, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 479-490, article id 8668043Conference paper, Published paper (Refereed)
Abstract [en]

In the field of automated program repair, the redundancy assumption claims large programs contain the seeds of their own repair. However, most redundancy-based program repair techniques do not reason about the repair ingredients-The code that is reused to craft a patch. We aim to reason about the repair ingredients by using code similarities to prioritize and transform statements in a codebase for patch generation. Our approach, DeepRepair, relies on deep learning to reason about code similarities. Code fragments at well-defined levels of granularity in a codebase can be sorted according to their similarity to suspicious elements (i.e., code elements that contain suspicious statements) and statements can be transformed by mapping out-of-scope identifiers to similar identifiers in scope. We examined these new search strategies for patch generation with respect to effectiveness from the viewpoint of a software maintainer. Our comparative experiments were executed on six open-source Java projects including 374 buggy program revisions and consisted of 19,949 trials spanning 2,616 days of computation time. Deep-Repair's search strategy using code similarities generally found compilable ingredients faster than the baseline, jGenProg, but this improvement neither yielded test-Adequate patches in fewer attempts (on average) nor found significantly more patches (on average) than the baseline. Although the patch counts were not statistically different, there were notable differences between the nature of DeepRepair patches and jGenProg patches. The results show that our learning-based approach finds patches that cannot be found by existing redundancy-based repair techniques.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
code clones, deep learning, language models, neural networks, program repair, software testing and debugging, Codes (symbols), Open source software, Program debugging, Redundancy, Reengineering, Repair, Software testing, Code clone, Code similarities, Comparative experiments, Language model, Learning-based approach, Repair techniques, Search strategies
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-252216 (URN)10.1109/SANER.2019.8668043 (DOI)000469754100046 ()2-s2.0-85064149756 (Scopus ID)9781728105918 (ISBN)
Conference
26th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2019; Hangzhou; China; 24 February 2019 through 27 February 2019
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20190611

Available from: 2019-06-11 Created: 2019-06-11 Last updated: 2019-06-26Bibliographically approved
Papoudakis, G., Preux, P. & Monperrus, M. (2018). A generative model for sparse, evolving digraphs. In: 6th International Conference on Complex Networks and Their Applications, Complex Networks 2017: . Paper presented at 6th International Conference on Complex Networks and Their Applications, Complex Networks 2017, Lyon, France, 29 November 2017 through 1 December 2017 (pp. 531-542). Springer, 689
Open this publication in new window or tab >>A generative model for sparse, evolving digraphs
2018 (English)In: 6th International Conference on Complex Networks and Their Applications, Complex Networks 2017, Springer, 2018, Vol. 689, p. 531-542Conference paper (Refereed)
Abstract [en]

Generating graphs that are similar to real ones is an open problem, while the similarity notion is quite elusive and hard to formalize. In this paper, we focus on sparse digraphs and propose SDG, an algorithm that aims at generating graphs similar to real ones. Since real graphs are evolving and this evolution is important to study in order to understand the underlying dynamical system, we tackle the problem of generating series of graphs. We propose SEDGE, an algorithm meant to generate series of graphs similar to a real series. SEDGE is an extension of SDG. We consider graphs that are representations of software programs and show experimentally that our approach outperforms other existing approaches. Experiments show the performance of both algorithms.

Place, publisher, year, edition, pages
Springer, 2018
Series
Studies in Computational Intelligence, ISSN 1860-949X ; 689
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-220199 (URN)10.1007/978-3-319-72150-7_43 (DOI)2-s2.0-85036642634 (Scopus ID)9783319721491 (ISBN)
Conference
6th International Conference on Complex Networks and Their Applications, Complex Networks 2017, Lyon, France, 29 November 2017 through 1 December 2017
Note

QC 20171218

Available from: 2017-12-18 Created: 2017-12-18 Last updated: 2018-01-13Bibliographically approved
Danglot, B., Preux, P., Baudry, B. & Monperrus, M. (2018). Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation. In: PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE): . Paper presented at 40th ACM/IEEE International Conference on Software Engineering (ICSE), Gothenburg, MAY 27-JUN 03, 2018 (pp. 481-481). IEEE
Open this publication in new window or tab >>Correctness Attraction: A Study of Stability of Software Behavior Under Runtime Perturbation
2018 (English)In: PROCEEDINGS 2018 IEEE/ACM 40TH INTERNATIONAL CONFERENCE ON SOFTWARE ENGINEERING (ICSE), IEEE , 2018, p. 481-481Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
IEEE, 2018
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-241350 (URN)10.1145/3180155.3182548 (DOI)000454843300055 ()
Conference
40th ACM/IEEE International Conference on Software Engineering (ICSE), Gothenburg, MAY 27-JUN 03, 2018
Note

QC 20190118

Available from: 2019-01-18 Created: 2019-01-18 Last updated: 2019-01-18Bibliographically approved
Danglot, B., Preux, P., Baudry, B. & Monperrus, M. (2018). Correctness attraction: a study of stability of software behavior under runtime perturbation. Journal of Empirical Software Engineering, 23(4), 2086-2119
Open this publication in new window or tab >>Correctness attraction: a study of stability of software behavior under runtime perturbation
2018 (English)In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 23, no 4, p. 2086-2119Article in journal (Refereed) Published
Abstract [en]

Can the execution of software be perturbed without breaking the correctness of the output? In this paper, we devise a protocol to answer this question from a novel perspective. In an experimental study, we observe that many perturbations do not break the correctness in ten subject programs. We call this phenomenon “correctness attraction”. The uniqueness of this protocol is that it considers a systematic exploration of the perturbation space as well as perfect oracles to determine the correctness of the output. To this extent, our findings on the stability of software under execution perturbations have a level of validity that has never been reported before in the scarce related work. A qualitative manual analysis enables us to set up the first taxonomy ever of the reasons behind correctness attraction.

Place, publisher, year, edition, pages
Springer, 2018
Keywords
Perturbation analysis; Software correctness; Empirical study
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-240079 (URN)10.1007/s10664-017-9571-8 (DOI)000435804100008 ()2-s2.0-85038623794 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Swedish Foundation for Strategic Research , Trustfull
Note

QC 20181211

Available from: 2018-12-11 Created: 2018-12-11 Last updated: 2019-06-11Bibliographically approved
Sobreira, V., Durieux, T., Madeiral, F., Monperrus, M. & De Almeida Maia, M. (2018). Dissection of a bug dataset: Anatomy of 395 patches from Defects4J. In: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018 - Proceedings: . Paper presented at 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, University of Molise Campobasso, Italy, 20 March 2018 through 23 March 2018 (pp. 130-140). Institute of Electrical and Electronics Engineers (IEEE)
Open this publication in new window or tab >>Dissection of a bug dataset: Anatomy of 395 patches from Defects4J
Show others...
2018 (English)In: 25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 130-140Conference paper, Published paper (Refereed)
Abstract [en]

Well-designed and publicly available datasets of bugs are an invaluable asset to advance research fields such as fault localization and program repair as they allow directly and fairly comparison between competing techniques and also the replication of experiments. These datasets need to be deeply understood by researchers: The answer for questions like 'which bugs can my technique handle?' and 'for which bugs is my technique effective?' depends on the comprehension of properties related to bugs and their patches. However, such properties are usually not included in the datasets, and there is still no widely adopted methodology for characterizing bugs and patches. In this work, we deeply study 395 patches of the Defects4J dataset. Quantitative properties (patch size and spreading) were automatically extracted, whereas qualitative ones (repair actions and patterns) were manually extracted using a thematic analysis-based approach. We found that 1) the median size of Defects4J patches is four lines, and almost 30% of the patches contain only addition of lines; 2) 92% of the patches change only one file, and 38% has no spreading at all; 3) the top-3 most applied repair actions are addition of method calls, conditionals, and assignments, occurring in 77% of the patches; and 4) nine repair patterns were found for 95% of the patches, where the most prevalent, appearing in 43% of the patches, is on conditional blocks. These results are useful for researchers to perform advanced analysis on their techniques' results based on Defects4J. Moreover, our set of properties can be used to characterize and compare different bug datasets.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-233724 (URN)10.1109/SANER.2018.8330203 (DOI)2-s2.0-85051026937 (Scopus ID)9781538649695 (ISBN)
Conference
25th IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2018, University of Molise Campobasso, Italy, 20 March 2018 through 23 March 2018
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20180830

Available from: 2018-08-30 Created: 2018-08-30 Last updated: 2019-03-18Bibliographically approved
Monperrus, M. & Weimer, W. (2018). Editor's Note: Special Issue on Automatic Software Repair. Journal of Empirical Software Engineering, 23(5), pp. 2865-2865
Open this publication in new window or tab >>Editor's Note: Special Issue on Automatic Software Repair
2018 (English)In: Journal of Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 23, no 5, p. 2865-2865Article in journal, News item (Other (popular science, discussion, etc.)) Published
Place, publisher, year, edition, pages
Springer, 2018
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-233263 (URN)10.1007/s10664-018-9632-7 (DOI)000440031100011 ()
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20180821

Available from: 2018-08-21 Created: 2018-08-21 Last updated: 2019-03-12Bibliographically approved
Organisations
Identifiers
ORCID iD: ORCID iD iconorcid.org/0000-0003-3505-3383

Search in DiVA

Show all publications