kth.se Publications
Learning the Relation Between Code Features and Code Transforms With Structured Prediction
Shandong Univ, Jinan, Peoples R China.
Univ Politecn Cataluna, Barcelona, Spain.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. ORCID iD: 0000-0002-6673-6438
Univ Luxembourg, L-4365 Esch Sur Alzette, Luxembourg.
Show others and affiliations
2023 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 49, no. 7, p. 3872-3900. Article in journal (Refereed). Published.
Abstract [en]

To effectively guide the exploration of the code transform space for automated code evolution techniques, we present in this article the first approach for structurally predicting code transforms at the level of AST nodes using conditional random fields (CRFs). Our approach first learns offline a probabilistic model that captures how certain code transforms are applied to certain AST nodes, and then uses the learned model to predict transforms for arbitrary new, unseen code snippets. Our approach involves a novel representation of both programs and code transforms. Specifically, we introduce the formal framework for defining the so-called AST-level code transforms and we demonstrate how the CRF model can be accordingly designed, learned, and used for prediction. We instantiate our approach in the context of repair transform prediction for Java programs. Our instantiation contains a set of carefully designed code features, deals with the training data imbalance issue, and comprises transform constraints that are specific to code. We conduct a large-scale experimental evaluation based on a dataset of bug fixing commits from real-world Java projects. The results show that when the popular evaluation metric top-3 is used, our approach predicts the code transforms with an accuracy varying from 41% to 53% depending on the transforms. Our model outperforms two baselines based on history probability and neural machine translation (NMT), suggesting the importance of considering code structure in achieving good prediction accuracy. In addition, a proof-of-concept synthesizer is implemented to concretize some repair transforms to get the final patches. The evaluation of the synthesizer on the Defects4j benchmark confirms the usefulness of the predicted AST-level repair transforms in producing high-quality patches.
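The abstract above describes structured prediction of AST-level transforms with conditional random fields. As a hedged illustration only (not the paper's implementation), maximum a posteriori inference in a linear-chain CRF can be sketched with Viterbi decoding; every node type, transform label, and potential value below is invented for the example:

```python
# Hedged sketch: Viterbi decoding for a linear-chain CRF-style model over
# AST nodes. All names and weights here are hypothetical illustrations,
# not the paper's actual features, labels, or learned parameters.

AST_NODES = ["MethodInvocation", "IfStatement", "ReturnStatement"]
TRANSFORMS = ["NoChange", "WrapWithIf", "ChangeCondition"]

# Unary potentials: score of assigning a transform to a node type (made up).
unary = {
    "MethodInvocation": {"NoChange": 1.0, "WrapWithIf": 2.0, "ChangeCondition": 0.1},
    "IfStatement":      {"NoChange": 0.5, "WrapWithIf": 0.2, "ChangeCondition": 2.5},
    "ReturnStatement":  {"NoChange": 1.5, "WrapWithIf": 0.3, "ChangeCondition": 0.2},
}

# Pairwise potentials: compatibility of adjacent transform labels (made up:
# a small bonus for keeping the same label on neighbouring nodes).
pairwise = {(a, b): (0.5 if a == b else 0.0) for a in TRANSFORMS for b in TRANSFORMS}

def viterbi(nodes):
    """Return the highest-scoring transform sequence for a chain of nodes."""
    # Forward pass: best score ending in each label, plus backpointers.
    scores = [{t: unary[nodes[0]][t] for t in TRANSFORMS}]
    back = []
    for node in nodes[1:]:
        prev = scores[-1]
        cur, bp = {}, {}
        for t in TRANSFORMS:
            best_prev = max(TRANSFORMS, key=lambda p: prev[p] + pairwise[(p, t)])
            cur[t] = prev[best_prev] + pairwise[(best_prev, t)] + unary[node][t]
            bp[t] = best_prev
        scores.append(cur)
        back.append(bp)
    # Backward pass: recover the argmax label sequence.
    last = max(TRANSFORMS, key=lambda t: scores[-1][t])
    seq = [last]
    for bp in reversed(back):
        seq.append(bp[seq[-1]])
    return list(reversed(seq))

print(viterbi(AST_NODES))  # → ['WrapWithIf', 'ChangeCondition', 'NoChange']
```

In the approach the abstract describes, the potentials would come from weights learned offline over designed code features, not from hand-set numbers as in this toy chain.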

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023. Vol. 49, no. 7, p. 3872-3900
Keywords [en]
Code transform, big code, machine learning, program repair
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-334718
DOI: 10.1109/TSE.2023.3275380
ISI: 001033501500012
Scopus ID: 2-s2.0-85161054017
OAI: oai:DiVA.org:kth-334718
DiVA id: diva2:1791105
Funder
Swedish Foundation for Strategic Research (trustull)
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20231127

Available from: 2023-08-24. Created: 2023-08-24. Last updated: 2023-11-27. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Chen, Zimin; Monperrus, Martin

Search in DiVA

By author/editor
Chen, Zimin; Monperrus, Martin
By organisation
Theoretical Computer Science, TCS
In the same journal
IEEE Transactions on Software Engineering
Computer Sciences
