kth.sePublications
Planned maintenance
A system upgrade is planned for 10/12-2024, at 12:00-13:00. During this time DiVA will be unavailable.
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Efficient Exploration and Analysis of Program Repair Search Spaces
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.ORCID iD: 0000-0003-2183-9633
2024 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The ubiquitous presence of software has made its failures a huge source of cost. To avoid this cost, fixing buggy programs is essential. As manual bug fixing, also known as debugging, is notoriously resource demanding, researchers have proposed to replace it with automated program repair (APR) approaches. As input, an APR approach takes a specification and a buggy program that violates the specification; it then tries to generate a patched program that complies with the specification and correctly implements the expected behavior. To ensure APR is practical and can replace manual debugging, we have to make it efficient.

APR works in two major phases: search space exploration, and search space analysis. In the search space exploration phase, the APR approach predicts the potentially faulty parts of the code (fault localization) and replaces them with various alternatives to generate patches (patch generation). In this phase, finding the best alternatives requires extensive and costly analysis, which makes patch generation a key to APR efficiency. In the search space analysis phase of APR, the patches generated in the exploration phase are analyzed to identify a correct patch with an automated patch assessment and a manual patch assessment component. The automated patch assessment aims at minimizing the number of patches that require manual assessment for identifying a correct patch. As the human expertise needed for manual patch assessment is the most valuable development resource, strong automated and manual patch assessment techniques are crucial for efficient APR.

In this thesis, we aim at improving the efficiency of three main components involved in the exploration and analysis of APR search spaces: patch generation, automated patch assessment, and manual patch assessment. For this purpose, the thesis makes five contributions as follows.

We make two contributions to make patch generation efficient. First, we introduce Sorald, a template-based APR approach for fixing SonarJava static warnings. Sorald employs accurately designed templates to generate exactly one patch that is highly likely to fix the bug. The lightweight patch generation technique and the small search space that needs little analysis makes Sorald an efficient APR approach. Our second contribution is CigaR, an iterative LLM-based APR tool that fixes bugs that cause a test failure. CigaR uses three carefully designed prompts and two prompting strategies to generate likely correct patches with minimal LLM token cost.

We also make two other contributions to make automated patch assessment efficient. First, we propose LighteR, a lightweight tool for estimating the potential of fix templates used by template-based APR approaches. LighteR compares fix templates against developer-made bug-fixes to assess if the templates follow the modification patterns used by experts. The result of this assessment is used to rank the patches based on the potential of templates used for their generation. This ranking is used to prioritize patches for manual assessment and thus, finding the correct patch with minimal manual analysis. Our second contribution to APR automated patch assessment is Mokav. Mokav is an execution-driven iterative LLM-based tool that generates tests to differentiate between patches. The generated tests help to group APR patches into separate clusters. Only one patch from each of the clusters requires manually analysis. This makes APR efficient by significantly reducing the manual patch assessment effort.

Our last contribution is Collector-Sahab, which aims at helping code reviewers better understand behavioral changes caused by patches. Given two versions of a program P \& Q, Collector-Sahab collects the execution trace of both P \& Q. It next compares the traces and identifies runtime differences at variable and return value level. Finally, it augments the code diff between P \& Q with a concise selection of extracted runtime differences. This code augmentation helps code reviewers to better understand the behavior of APR patches and thus, reduces human effort needed for manual patch assessment.

To sum up, in this thesis we aim at making APR useful in practice by improving its efficiency. For this purpose, we propose novel methods to make patch generation, automated patch assessment, and manual patch assessment efficient.

Abstract [sv]

Programvara används i många olika sammanhang vilket gör dess misslyckanden till en enorm kostnadskälla. För att undvika denna kostnad är det viktigt att fixa buggiga program. Eftersom manuell buggfixning, även känd som debugging, är notoriskt resurskrävande, har forskare föreslagit att den ska ersättas med tillvägagångssätt för automatiserad programreparation (APR). Som input tar en APR-metod en specifikation och ett buggigt program som bryter mot specifikationen; den försöker sedan generera ett korrigerat program som överensstämmer med specifikationen och korrekt implementerar det förväntade beteendet. För att säkerställa att APR är praktiskt användbart och kan ersätta manuell felsökning, måste vi se till att det görs effektivt.

APR fungerar i två huvudfaser: sökrumsutforskning och sökrumsanalys. I sökrumsutforskningsfasen förutsäger APR-metoden de potentiellt felaktiga delarna av koden (fellokalisering) och ersätter dem med olika alternativ för att generera patchar (patchgenerering). Att hitta de bästa alternativen  kräver i denna fas omfattande och kostsamma analyser, vilket gör patchgenerering till en nyckel till effektiv APR. I sökrumsanalysfasen av APR analyseras patcharna som genereras i utforskningsfasen för att identifiera en korrekt patch med en automatisk patchbedömning och en manuell komponent för bedömning av patchar. Den automatiska patchbedömningen syftar till att minimera antalet patchar som kräver manuell bedömning för att identifiera en korrekt patch. Eftersom den mänskliga expertis som behövs för manuell patchbedömning är den mest värdefulla utvecklarresursen, är starka automatiserade och manuella patchbedömningstekniker avgörande för effektiv APR.

I den här avhandlingen syftar vi till att förbättra effektiviteten hos tre huvudkomponenter som är involverade i utforskning och analys av APR-sökrum: patchgenerering, automatiserad patchbedömning och manuell patchbedömning. För detta ändamål ger avhandlingen fem bidrag enligt följande.

Vi ger två bidrag för att göra patchgenerering effektiv. Först introducerar vi Sorald, en mallbaserad APR-metod för att fixa statiska varningar som ges av SonarJava. Sorald använder noggrant utformade mallar för att generera exakt en patch som med stor sannolikhet kommer att fixa buggen. Den enkla patchgenereringstekniken och det lilla sökrummet som inte kräver så mycket analys gör Sorald till en effektiv APR-metod. Vårt andra bidrag är CigaR, ett iterativt LLM-baserat APR-verktyg som fixar buggar som orsakar ett testfel. CigaR använder tre noggrant utformade promptar och två promptstrategier för att generera sannolikt korrekta patchar med minimal LLM-tokenkostnad.

Vi ger också två andra bidrag för att göra automatiserad patchbedömning effektiv. Först föreslår vi LighteR, ett enkelt verktyg för att uppskatta potentialen för fixa mallar som används av mallbaserade APR-metoder. LighteR jämför fixa mallar med buggfixar gjorda av utvecklare för att bedöma om mallarna följer modifieringsmönster som används av experter. Resultatet av denna bedömning används för att rangordna patcharna baserat på potentialen hos mallar som används för deras generering. Denna rankning används för att prioritera patchar för manuell bedömning och därmed hitta rätt patch med minimal manuell analys. Vårt andra bidrag till APR automatisk patchbedömning är Mokav. Mokav är ett exekveringsdrivet iterativt LLM-baserat verktyg som genererar tester för att skilja mellan patchar. De genererade testerna hjälper till att gruppera APR-lappar i separata kluster. Endast en patch från varje kluster behöver bedömas manuellt. Detta gör APR effektiv genom att avsevärt minska den manuella patchbedömningen.

Vårt sista bidrag är Collector-Sahab, som syftar till att hjälpa kodgranskare att bättre förstå beteendeförändringar orsakade av patchar. Givet två versioner av ett program P \& Q, samlar Collector-Sahab in exekveringsspår för både P \& Q. Därefter jämförs spåren och körtidsskillnader på variabel- och returvärdesnivå identifieras. Slutligen utökar den koddifferensen mellan P \& Q med ett kortfattat urval av extraherade skillnader vid körning. Denna förstärkning av koden hjälper kodgranskare att bättre förstå beteendet hos APR-patchar och minskar på så sätt den mänskliga ansträngningen som krävs för manuell patchbedömning.

Sammanfattningsvis, i denna avhandling syftar vi till att göra APR användbar i praktiken genom att förbättra dess effektivitet. För detta ändamål föreslår vi nya metoder för att göra patchgenerering, automatiserad patchbedömning och manuell patchbedömning effektiv.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. , p. xi, 107
Series
TRITA-EECS-AVL ; 2024:76
Keywords [en]
Automated Program Repair, Program Repair Efficiency, Large Language Models
National Category
Software Engineering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-355105ISBN: 978-91-8106-071-3 (print)OAI: oai:DiVA.org:kth-355105DiVA, id: diva2:1907504
Public defence
2024-11-22, https://kth-se.zoom.us/j/64148823838, F3 (Flodis), Lindstedtsvägen 26, Stockholm, Sweden, 10:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241023

Available from: 2024-10-23 Created: 2024-10-22 Last updated: 2024-10-28Bibliographically approved
List of papers
1. Estimating the potential of program repair search spaces with commit analysis
Open this publication in new window or tab >>Estimating the potential of program repair search spaces with commit analysis
Show others...
2022 (English)In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 188, article id 111263Article in journal (Refereed) Published
Abstract [en]

The most natural method for evaluating program repair systems is to run them on bug datasets, such as Defects4J. Yet, using this evaluation technique on arbitrary real-world programs requires heavy configuration. In this paper, we propose a purely static method to evaluate the potential of the search space of repair approaches. This new method enables researchers and practitioners to encode the search spaces of repair approaches and select potentially useful ones without struggling with tool configuration and execution. We encode the search spaces by specifying the repair strategies they employ. Next, we use the specifications to check whether past commits lie in repair search spaces. For a repair approach, including many human-written past commits in its search space indicates its potential to generate useful patches. We implement our evaluation method in LIGHTER. LIGHTER gets a Git repository and outputs a list of commits whose source code changes lie in repair search spaces. We run LIGHTER on 55,309 commits from the history of 72 Github repositories with and show that LIGHTER's precision and recall are 77% and 92%, respectively. Overall, our experiments show that our novel method is both lightweight and effective to study the search space of program repair approaches.

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Program repair, Search-space, Static code analysis, Commit analysis
National Category
Computer Sciences Software Engineering Computer Systems
Identifiers
urn:nbn:se:kth:diva-312192 (URN)10.1016/j.jss.2022.111263 (DOI)000783133900016 ()2-s2.0-85124690116 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Swedish Foundation for Strategic Research, trustfull
Note

QC 20220817

Available from: 2022-05-18 Created: 2022-05-18 Last updated: 2024-10-22Bibliographically approved
2. Sorald: Automatic Patch Suggestions for SonarQube Static Analysis Violations
Open this publication in new window or tab >>Sorald: Automatic Patch Suggestions for SonarQube Static Analysis Violations
Show others...
2022 (English)In: IEEE Transactions on Dependable and Secure Computing, ISSN 1545-5971, E-ISSN 1941-0018, p. 1-1Article in journal (Refereed) Published
Abstract [en]

Previous work has shown that early resolution of issues detected by static code analyzers can prevent major costs later on. However, developers often ignore such issues for two main reasons. First, many issues should be interpreted to determine if they correspond to actual flaws in the program. Second, static analyzers often do not present the issues in a way that is actionable. To address these problems, we present Sorald: a novel system that uses metaprogramming templates to transform the abstract syntax trees of programs and suggests fixes for static analysis warnings. Thus, the burden on the developer is reduced from interpreting and fixing static issues, to inspecting and approving full fledged solutions. Sorald fixes violations of 10 rules from SonarJava, one of the most widely used static analyzers for Java. We evaluate Sorald on a dataset of 161 popular repositories on Github. Our analysis shows the effectiveness of Sorald as it fixes 65% (852/1,307) of the violations that meets the repair preconditions. Overall, our experiments show it is possible to automatically fix notable violations of the static analysis rules produced by the state-of-the-art static analyzer SonarJava.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
automatic program repair, Codes, Computer bugs, Java, Maintenance engineering, metaprogramming, Software development management, Static analysis, Static code analysis, Syntactics, Codes (symbols), Computer software, Java programming language, Program debugging, Repair, Software design, Trees (mathematics), Automatic programs, Code, Meta Programming, Static analyzers, Static codes
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-323274 (URN)10.1109/TDSC.2022.3167316 (DOI)001029054600009 ()2-s2.0-85128651786 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)Swedish Foundation for Strategic Research, trustfull
Note

QC 20230124

Available from: 2023-01-24 Created: 2023-01-24 Last updated: 2024-10-22Bibliographically approved
3. Augmenting Diffs With Runtime Information
Open this publication in new window or tab >>Augmenting Diffs With Runtime Information
2023 (English)In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 49, no 11, p. 4988-5007Article in journal (Refereed) Published
Abstract [en]

Source code diffs are used on a daily basis as part of code review, inspection, and auditing. To facilitate understanding, they are typically accompanied by explanations that describe the essence of what is changed in the program. As manually crafting high-quality explanations is a cumbersome task, researchers have proposed automatic techniques to generate code diff explanations. Existing explanation generation methods solely focus on static analysis, i.e., they do not take advantage of runtime information to explain code changes. In this article, we propose Collector-Sahab, a novel tool that augments code diffs with runtime difference information. Collector-Sahab compares the program states of the original (old) and patched (new) versions of a program to find unique variable values. Then, Collector-Sahab adds this novel runtime information to the source code diff as shown, for instance, in code reviewing systems. As an evaluation, we run Collector-Sahab on 584 code diffs for Defects4J bugs and find it successfully augments the code diff for 95% (555/584) of them. We also perform a user study and ask eight participants to score the augmented code diffs generated by Collector-Sahab. Per this user study, we conclude that developers find the idea of adding runtime data to code diffs promising and useful. Overall, our experiments show the effectiveness and usefulness of Collector-Sahab in augmenting code diffs with runtime difference information. Publicly-available repository: https://github.com/ASSERT-KTH/collector-sahab.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Code diff, code review, dynamic program analysis, runtime differencing
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-348563 (URN)10.1109/TSE.2023.3324258 (DOI)2-s2.0-85174816179 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20240701

Available from: 2024-06-26 Created: 2024-06-26 Last updated: 2024-10-22Bibliographically approved
4. CigaR: Cost-efficient Program Repair with LLMs
Open this publication in new window or tab >>CigaR: Cost-efficient Program Repair with LLMs
(English)Manuscript (preprint) (Other academic)
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-355079 (URN)
Note

QC 20241023

Available from: 2024-10-21 Created: 2024-10-21 Last updated: 2024-10-23Bibliographically approved
5. Mokav: Execution-driven Differential Testing with LLMs
Open this publication in new window or tab >>Mokav: Execution-driven Differential Testing with LLMs
(English)Manuscript (preprint) (Other academic)
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-355077 (URN)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241023

Available from: 2024-10-21 Created: 2024-10-21 Last updated: 2024-10-23Bibliographically approved

Open Access in DiVA

summary(2700 kB)144 downloads
File information
File name SUMMARY01.pdfFile size 2700 kBChecksum SHA-512
7c5bcfb8cce8cb39a28d3fedc2162f1a87a8868b57af6ff0b4e24d21b5850846c18c98a0dea5ef73acc55299e1a28c0923c3de29d1210bf05a12bb9cfe04850c
Type summaryMimetype application/pdf

Authority records

Etemadi, Khashayar

Search in DiVA

By author/editor
Etemadi, Khashayar
By organisation
Theoretical Computer Science, TCS
Software Engineering

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 369 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf