kth.sePublications
Planned maintenance
A system upgrade is planned for 10/12-2024, at 12:00-13:00. During this time DiVA will be unavailable.
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Improving the Precision of Automatic Program Repair with Machine Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.ORCID iD: 0000-0003-4807-2110
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Automatic program repair as a research field aims to eliminate software bugs and vulnerabilities in an automatic manner. Automatic program repair holds great promise to reduce the debugging cost and increase the productivity of software development. Test suite is one of the most widely used specifications for automatic program repair to specify correct program behavior and guide patch generation. However, test suite is an incomplete specification with limited input-output data. This results in automatic program repair generating patches that merely satisfy test suite specifications, yet fail to repair buggy programs in general. The generation of a great number of incorrect patches leads to a low precision of automatic program repair.

In this thesis, we focus on improving the precision of automatic program repair from three perspectives: patch generation, patch assessment in practice, and patch assessment for scientific usage. This thesis makes contributions to the following in automatic program repair. First of all, to increase the precision of patch generation, we propose two learning-based automatic program repair approaches to encourage the generation of more correct patches with fewer candidate patches. Second, to increase the precision of patch assessment in practice, we propose to build a probabilistic model based on static code features to discard incorrect patches and thus increase the ratio of correct patches to all generated patches. Third, to increase the patch assessment precision for scientific usage, we propose to use automatically generated test cases to discard incorrect patches.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. , p. 97
Series
TRITA-EECS-AVL ; 2023:10
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-323295ISBN: 978-91-8040-469-3 (print)OAI: oai:DiVA.org:kth-323295DiVA, id: diva2:1735113
Public defence
2023-02-24, https://kth-se.zoom.us/j/63393781380, Kollegiesalen, Brinellvägen 6, Stockholm, 13:30 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230208

Available from: 2023-02-08 Created: 2023-02-07 Last updated: 2023-02-27Bibliographically approved
List of papers
1. SelfAPR: Self-Supervised Program Repair with Test Execution Diagnostics
Open this publication in new window or tab >>SelfAPR: Self-Supervised Program Repair with Test Execution Diagnostics
Show others...
2023 (English)In: Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering, Association for Computing Machinery (ACM) , 2023Conference paper, Published paper (Refereed)
Abstract [en]

Learning-based program repair has achieved good results in a recent series of papers. Yet, we observe that the related work fails to repair some bugs because of a lack of knowledge about 1) the application domain of the program being repaired, and 2) the fault type being repaired. In this paper, we solve both problems by changing the learning paradigm from supervised training to self-supervised training in an approach called SelfAPR. First, SelfAPR generates training samples on disk by perturbing a previous version of the program being repaired, enforcing the neural model to capture project-specific knowledge. This is different from the previous work based on mined past commits. Second, SelfAPR executes all training samples and extracts and encodes test execution diagnostics into the input representation, steering the neural model to fix the kind of fault. This is different from the existing studies that only consider static source code as input. We implement SelfAPR and evaluate it in a systematic manner. We generate 1 039 873 training samples obtained by perturbing 17 open-source projects. We evaluate SelfAPR on 818 bugs from Defects4J, SelfAPR correctly repairs 110 of them, outperforming all the supervised learning repair approaches.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Series
ASE ’22
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-323660 (URN)10.1145/3551349.3556926 (DOI)2-s2.0-85146336362 (Scopus ID)
Conference
the 37th IEEE/ACM International Conference on Automated Software Engineering
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230214

Available from: 2023-02-08 Created: 2023-02-08 Last updated: 2023-08-04Bibliographically approved
2. Neural Program Repair with Execution-based Backpropagation
Open this publication in new window or tab >>Neural Program Repair with Execution-based Backpropagation
2022 (English)In: ICSE '22: Proceedings of the 44th International Conference on Software Engineering, Association for Computing Machinery (ACM) , 2022, p. 1506-1518Conference paper, Published paper (Refereed)
Abstract [en]

Neural machine translation (NMT) architectures have achieved promising results for automatic program repair. Yet, they have the limitation of generating low-quality patches (e.g., not compilable patches). This is because the existing works only optimize a purely syntactic loss function based on characters and tokens without incorporating program-specific information during neural network weight optimization. In this paper, we propose a novel program repair model called RewardRepair. The core novelty of RewardRepair is to improve NMT-based program repair with a loss function based on program compilation and test execution information, rewarding the network to produce patches that compile and that do not overfit. We conduct several experiments to evaluate RewardRepair showing that it is feasible and effective to use compilation and test execution results to optimize the underlying neural repair model. RewardRepair correctly repairs 207 bugs over four benchmarks. we report on repair success for 121 bugs that are fixed for the first time in the literature. Also, RewardRepair produces up to 45.3% of compilable patches, an improvement over the 39% by the state-of-the-art.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Series
International Conference on Software Engineering, ISSN 0270-5257
National Category
Specific Languages Computer Sciences
Identifiers
urn:nbn:se:kth:diva-316694 (URN)10.1145/3510003.3510222 (DOI)000832185400122 ()2-s2.0-85130298109 (Scopus ID)
Conference
44th ACM/IEEE International Conference on Software Engineering, ICSE 2022, Pittsburgh, 22 May 2022, through 27 May 2022
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20221109

Part of proceedings: ISBN 978-145039221-1

Available from: 2022-09-05 Created: 2022-09-05 Last updated: 2023-02-08Bibliographically approved
3. Automated Classification of Overfitting Patches with Statically Extracted Code Features
Open this publication in new window or tab >>Automated Classification of Overfitting Patches with Statically Extracted Code Features
Show others...
2022 (English)In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 48, no 8, p. 2920-2938Article in journal (Refereed) Published
Abstract [en]

Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, those patches that fail to correctly repair the defect beyond making the tests pass. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level. Then, ODS uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned ODS model can then finally be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from Defects4J, Bugs.jar and Bears benchmarks. The empirical evaluation shows that ODS is able to correctly classify 71.9% of program repair patches from 26 projects, which improves the state-of-the-art. ODS is applicable in practice and can be employed as a post-processing procedure to classify the patches generated by different APR systems. 

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022
Keywords
Automatic program repair, Code features, Feature extraction, Maintenance engineering, Overfitting patch, Patch assessment, Predictive models, Software, Syntactics, Tools, Training
National Category
Computer Sciences Software Engineering Computer Systems
Identifiers
urn:nbn:se:kth:diva-308879 (URN)10.1109/TSE.2021.3071750 (DOI)000846878500013 ()2-s2.0-85104193931 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, TrustfullWallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20220216

Available from: 2022-02-16 Created: 2022-02-16 Last updated: 2023-02-08Bibliographically approved
4. Automated patch assessment for program repair at scale
Open this publication in new window or tab >>Automated patch assessment for program repair at scale
2021 (English)In: Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 26, no 2, article id 20Article in journal (Refereed) Published
Abstract [en]

In this paper, we do automatic correctness assessment for patches generated by program repair systems. We consider the human-written patch as ground truth oracle and randomly generate tests based on it, a technique proposed by Shamshiri et al., called Random testing with Ground Truth (RGT) in this paper. We build a curated dataset of 638 patches for Defects4J generated by 14 state-of-the-art repair systems, we evaluate automated patch assessment on this dataset. The results of this study are novel and significant: First, we improve the state of the art performance of automatic patch assessment with RGT by 190% by improving the oracle; Second, we show that RGT is reliable enough to help scientists to do overfitting analysis when they evaluate program repair systems; Third, we improve the external validity of the program repair knowledge with the largest study ever.

Place, publisher, year, edition, pages
Springer Nature, 2021
Keywords
Automatic program repair, Automatic patch assessment
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-292295 (URN)10.1007/s10664-020-09920-w (DOI)000620938900001 ()2-s2.0-85101589275 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Swedish Foundation for Strategic Research , trustfull
Note

QC 20210406

Available from: 2021-04-06 Created: 2021-04-06 Last updated: 2023-02-08Bibliographically approved
5. A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark
Open this publication in new window or tab >>A Comprehensive Study of Automatic Program Repair on the QuixBugs Benchmark
(English)Manuscript (preprint) (Other academic)
Abstract [en]

Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an automatic repair experiment on a benchmark called QuixBugs that has been recently published. This benchmark has never been studied in the context of program repair. In this study, we report on the characteristics of QuixBugs, and we design and perform an experiment about the effectiveness of test-suite based program repair on QuixBugs. We study two repair systems, Astor and Nopol, which are representatives of generate-and-validate repair technique and synthesis repair technique respectively. We propose three patch correctness assessment techniques to comprehensively study overfitting and incorrect patches. Our key results are: 1) 13/40 buggy programs in the QuixBugs can be repaired with a test-suite adequate patch; 2) a total of 22 different plausible patches for those 13 buggy programs in the QuixBugs are present in the search space of the considered tools; 3) the three patch assessment techniques discard in total 12/22 patches that are overfitting. This sets a baseline for future research of automatic repair on QuixBugs. Our experiment also highlights the major properties and challenges of how to perform automated correctness assessment of program repair patches. All experimental results are publicly available on Github in order to facilitate future research on automatic program repair.

National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-239890 (URN)
Note

QC 20181206

Available from: 2018-12-04 Created: 2018-12-04 Last updated: 2023-02-08Bibliographically approved

Open Access in DiVA

fulltext(1445 kB)482 downloads
File information
File name FULLTEXT04.pdfFile size 1445 kBChecksum SHA-512
1383487e1b608ec6211102d3f7ad25c72ddd319e2d41c3b53e42ea9b2aafcf6dfd88acc1c9c79732068890d6c7c5aecc1513d8b99490294fa11910ec653d59c0
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Ye, He
By organisation
Theoretical Computer Science, TCS
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar
Total: 488 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 1929 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf