Publications (10 of 15)
Soto Valero, C., Durieux, T., Harrand, N. & Baudry, B. (2023). Coverage-Based Debloating for Java Bytecode. ACM Transactions on Software Engineering and Methodology, 32(2), 1-34
2023 (English). In: ACM Transactions on Software Engineering and Methodology, ISSN 1049-331X, E-ISSN 1557-7392, Vol. 32, no 2, p. 1-34. Article in journal (Refereed), Published
Abstract [en]

Software bloat is code that is packaged in an application but is not actually necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We succeed in debloating 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating.

For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
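
The core idea in the abstract above, keeping only what a workload covers, can be illustrated with a minimal sketch. This is not the authors' tool: the names `debloat`, `class_sizes`, and `covered` are hypothetical, class sizes stand in for bytecode size, and coverage is assumed to have been collected already at class granularity.

```python
def debloat(class_sizes: dict[str, int], covered: set[str]) -> tuple[dict[str, int], float]:
    """Keep only covered classes; return them and the fraction of bytes removed."""
    kept = {name: size for name, size in class_sizes.items() if name in covered}
    total = sum(class_sizes.values())
    removed = total - sum(kept.values())
    return kept, (removed / total if total else 0.0)


# Hypothetical library: three classes with their sizes in bytes.
library = {"lib.Parser": 400, "lib.Writer": 100, "lib.LegacyCodec": 500}
# Classes observed as executed under the workload (assumed input).
coverage = {"lib.Parser", "lib.Writer"}

kept, removed_ratio = debloat(library, coverage)
```

In this toy input, `lib.LegacyCodec` is never executed by the workload, so it is dropped. A client whose tests reach that class would then fail against the debloated version, which is the kind of effect the client study measures.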

Place, publisher, year, edition, pages
ACM Digital Library, 2023
Keywords
software bloat, code coverage, program specialization, bytecode, software maintenance
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-316426 (URN); 10.1145/3546948 (DOI); 000970588900011 (); 2-s2.0-85147732395 (Scopus ID)
Projects
WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20251222

Available from: 2022-08-17. Created: 2022-08-17. Last updated: 2025-12-22. Bibliographically approved
Ye, H., Gu, J., Martinez, M., Durieux, T. & Monperrus, M. (2022). Automated Classification of Overfitting Patches with Statically Extracted Code Features. IEEE Transactions on Software Engineering, 48(8), 2920-2938
2022 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 48, no 8, p. 2920-2938. Article in journal (Refereed), Published
Abstract [en]

Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, that is, patches that make the tests pass but fail to correctly repair the defect. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level. Then, ODS uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned ODS model can then be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from the Defects4J, Bugs.jar and Bears benchmarks. The empirical evaluation shows that ODS is able to correctly classify 71.9% of program repair patches from 26 projects, which improves on the state of the art. ODS is applicable in practice and can be employed as a post-processing procedure to classify the patches generated by different APR systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022
Keywords
Automatic program repair, Code features, Feature extraction, Maintenance engineering, Overfitting patch, Patch assessment, Predictive models, Software, Syntactics, Tools, Training
National Category
Computer Sciences; Software Engineering; Computer Systems
Identifiers
urn:nbn:se:kth:diva-308879 (URN); 10.1109/TSE.2021.3071750 (DOI); 000846878500013 (); 2-s2.0-85104193931 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, Trustfull; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20220216

Available from: 2022-02-16. Created: 2022-02-16. Last updated: 2023-02-08. Bibliographically approved
Ye, H., Martinez, M., Durieux, T. & Monperrus, M. (2021). A comprehensive study of automatic program repair on the QuixBugs benchmark. Journal of Systems and Software, 171, Article ID 110825.
2021 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 171, article id 110825. Article in journal (Refereed), Published
Abstract [en]

Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic repair on a benchmark of bugs called QuixBugs, which has been little studied. We (1) report on the characteristics of QuixBugs; (2) study the effectiveness of 10 program repair tools on it; and (3) apply three patch correctness assessment techniques to comprehensively study the presence of overfitting patches in QuixBugs. Our key results are: (1) 16/40 buggy programs in QuixBugs can be repaired with at least one test-suite-adequate patch; (2) a total of 338 plausible patches are generated on QuixBugs by the considered tools, and 53.3% of them are overfitting patches according to our manual assessment; (3) the three automated patch correctness assessment techniques, RGTEvosuite, RGTInputSampling and GTInvariants, achieve an accuracy of 98.2%, 80.8% and 58.3% in overfitting detection, respectively. To our knowledge, this is the largest empirical study of automatic repair on QuixBugs, combining both quantitative and qualitative insights. All our empirical results are publicly available on GitHub in order to facilitate future research on automatic program repair.

Place, publisher, year, edition, pages
Elsevier Inc., 2021
Keywords
Automatic program repair, Bug benchmark, Patch correctness assessment, Software engineering, Assessment technique, Automatic programs, Empirical studies, External validities, Overfitting, Repair tools, Research communities, Automatic test pattern generation
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-285291 (URN); 10.1016/j.jss.2020.110825 (DOI); 000592499600001 (); 2-s2.0-85091196139 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20201202

Available from: 2020-12-02. Created: 2020-12-02. Last updated: 2024-03-18. Bibliographically approved
Madeiral Delfim, F. & Durieux, T. (2021). A large-scale study on human-cloned changes for automated program repair. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021). Paper presented at 29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), MAY 22-30, 2021, ELECTR NETWORK (pp. 510-514). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 510-514. Conference paper, Published paper (Refereed)
Abstract [en]

Research in automatic program repair has shown that real bugs can be automatically fixed. However, there are several challenges involved in such a task that are not yet fully addressed. As an example, consider that a test-suite-based repair tool performs a change in a program to fix a bug spotted by a failing test case, but then the same or another test case fails. This could mean that the change is a partial fix for the bug or that another bug was manifested. However, the repair tool discards the change and possibly performs other repair attempts. One might wonder if the applied change should also be applied in other locations in the program so that the bug is fully fixed. In this paper, we investigate the extent to which bug fix changes are cloned by developers within patches. Our goal is to investigate the need for multi-location repair by using identical or similar changes in identical or similar contexts. To do so, we analyzed 3,049 multi-hunk patches from the ManySStuBs4J dataset, which is a large dataset of single-statement bug fix changes. We found that 68% of the multi-hunk patches contain at least one change clone group. Moreover, most of these patches (70%) are strictly-cloned ones, which are patches fully composed of changes belonging to one single change clone group. Finally, most of the strictly-cloned patches (89%) contain change clones with identical changes, independently of their contexts. We conclude that automated solutions for creating patches composed of identical or similar changes can be useful for fixing bugs.
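
The notions of a change clone group and a strictly-cloned patch can be made concrete with a small sketch. It is an illustration under simplifying assumptions, not the study's analysis: each change is represented as a plain string, only textually identical changes are grouped (the paper also considers similar changes and contexts), and the function names are hypothetical.

```python
from collections import defaultdict


def clone_groups(changes: list[str]) -> list[list[int]]:
    """Group the indices of identical changes; a group needs at least two members."""
    by_change = defaultdict(list)
    for index, change in enumerate(changes):
        by_change[change].append(index)
    return [indices for indices in by_change.values() if len(indices) >= 2]


def is_strictly_cloned(changes: list[str]) -> bool:
    """True when every change in the patch belongs to one single clone group."""
    groups = clone_groups(changes)
    return len(groups) == 1 and len(groups[0]) == len(changes)


# Two hypothetical multi-hunk patches, one string per hunk.
patch_a = ["x > 0 -> x >= 0", "x > 0 -> x >= 0"]
patch_b = ["x > 0 -> x >= 0", "x > 0 -> x >= 0", "foo() -> bar()"]
```

Here `patch_a` is strictly cloned, while `patch_b` contains a clone group plus one unrelated change, so it counts toward the patches with at least one clone group but not toward the strictly-cloned ones.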

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Working Conference on Mining Software Repositories, ISSN 2160-1852
Keywords
automatic program repair, patch, change clone
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-303514 (URN); 10.1109/MSR52588.2021.00064 (DOI); 000693399500052 (); 2-s2.0-85111446885 (Scopus ID)
Conference
29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), MAY 22-30, 2021, ELECTR NETWORK
Note

Part of proceedings: ISBN 978-1-7281-8710-5, QC 20230117

Available from: 2021-10-15. Created: 2021-10-15. Last updated: 2023-01-17. Bibliographically approved
Soto Valero, C., Durieux, T. & Baudry, B. (2021). A Longitudinal Analysis of Bloated Java Dependencies. In: Spinellis, D., Gousios, G., Chechik, M. & DiPenta, M. (Ed.), Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21). Paper presented at 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), AUG 23-28, 2021, ELECTR NETWORK (pp. 1021-1031). Association for Computing Machinery (ACM)
2021 (English). In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21) / [ed] Spinellis, D., Gousios, G., Chechik, M. & DiPenta, M., Association for Computing Machinery (ACM), 2021, p. 1021-1031. Conference paper, Published paper (Refereed)
Abstract [en]

We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,515 versions of Maven dependency trees. Bloated dependencies steadily increase over time, and 89.2 % of the direct dependencies that are bloated remain bloated in all subsequent versions of the studied projects. This empirical evidence suggests that developers can safely remove a bloated dependency. We further report novel insights regarding the unnecessary maintenance efforts induced by bloat. We find that 22 % of dependency updates performed by developers are made on bloated dependencies, and that Dependabot suggests a similar ratio of updates on bloated dependencies.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
software bloat, dependencies, java
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-309540 (URN); 10.1145/3468264.3468589 (DOI); 000744425500088 (); 2-s2.0-85116273059 (Scopus ID)
Conference
29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), AUG 23-28, 2021, ELECTR NETWORK
Note

Part of proceedings ISBN: 978-1-4503-8562-6

QC 20220316

Available from: 2022-03-16. Created: 2022-03-16. Last updated: 2023-05-10. Bibliographically approved
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). Automatic Diversity in the Software Supply Chain.
2021 (English). Report (Other academic)
Abstract [en]

Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code in one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain attacks, we propose the concept of a Library Substitution Framework. This novel concept leverages one key observation: when an application depends on a library, it is very likely that there exist other libraries that provide similar features. The key objective of a Library Substitution Framework is to enable the developers of an application to harness this diversity of libraries in their supply chain. The framework lets them generate a population of application variants, each depending on a different alternative library that provides similar functionalities. To investigate the relevance of this concept, we develop ARGO, a proof-of-concept implementation of this framework that harnesses the diversity of JSON suppliers. We study the feasibility of library substitution and its impact on a set of 368 clients. Our empirical results show that for 195 of the 368 Java applications tested, we can substitute the original JSON library used by the client with at least 15 other JSON libraries without modifying the client's code. These results show the capacity of a Library Substitution Framework to diversify the supply chain of the client applications of the libraries it targets.

Publisher
p. 18
Keywords
Software supply chain, Library Substitution, Software repository, Software reuse, Java, Maven Central Repository
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310821 (URN)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20220419

Available from: 2022-04-07. Created: 2022-04-07. Last updated: 2022-06-25. Bibliographically approved
Durieux, T., Soto Valero, C. & Baudry, B. (2021). DUETS: A Dataset of Reproducible Pairs of Java Library-Clients. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021). Paper presented at 29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), MAY 22-30, 2021, ELECTR NETWORK (pp. 545-549). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 545-549. Conference paper, Published paper (Refereed)
Abstract [en]

Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suites of clients and libraries. DUETS is meant to support both static and dynamic analysis. This means that the libraries and the clients compile correctly, they are executable and their test suites pass. The dataset is composed of open-source projects that have more than five stars on GitHub. The final dataset contains 395 libraries and 2,874 clients. Additionally, we provide the raw data that we use to create this dataset, such as 34,560 pom.xml files or the complete file list from 34,560 projects. This dataset can be used to study how libraries are used by their clients or as a list of software projects that successfully build. The client's test suite can be used as an additional verification step for code transformation techniques that modify the libraries.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Working Conference on Mining Software Repositories, ISSN 2160-1852
Keywords
Mining software repositories, software reuse, Java, Maven
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-303381 (URN); 10.1109/MSR52588.2021.00071 (DOI); 000693399500059 (); 2-s2.0-85108531256 (Scopus ID)
Conference
29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), MAY 22-30, 2021, ELECTR NETWORK
Note

Part of proceedings: ISBN 978-1-7281-8710-5, QC 20230117

Available from: 2021-10-13. Created: 2021-10-13. Last updated: 2023-01-17. Bibliographically approved
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). The Behavioral Diversity of Java JSON Libraries. In: IEEE (Ed.), 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). Paper presented at 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021 (pp. 412-422). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) / [ed] IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 412-422. Conference paper, Published paper (Refereed)
Abstract [en]

JSON is an essential file and data format in domains that span scientific computing, web APIs or configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparison among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input / output behavior of 20 JSON libraries, in a single software ecosystem: Java/Maven. We assess behavior diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences, which influence the behavior of the libraries, relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity with ill-formed files, or corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for a robust processing of ill-formed files, through a multi-version architecture.
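
The corner cases named in the abstract above, duplicate data and large numbers, can be reproduced in miniature. The sketch below uses Python's standard `json` module rather than any of the Java libraries studied, so it only illustrates the kind of behavioral divergence at stake; the helper `reject_duplicates` is a hypothetical name.

```python
import json
from decimal import Decimal


def reject_duplicates(pairs):
    """object_pairs_hook that raises on duplicate keys instead of overwriting."""
    result = {}
    for key, value in pairs:
        if key in result:
            raise ValueError(f"duplicate key: {key}")
        result[key] = value
    return result


duplicated = '{"a": 1, "a": 2}'
lenient = json.loads(duplicated)  # default dict representation: last value wins

try:
    json.loads(duplicated, object_pairs_hook=reject_duplicates)
    strict_rejected = False
except ValueError:
    strict_rejected = True

huge = "1e400"
as_float = json.loads(huge)                         # overflows a binary double
as_decimal = json.loads(huge, parse_float=Decimal)  # keeps the decimal value
```

The same input thus yields a silently overwritten value or an error, and infinity or an exact decimal, depending only on how objects and numbers are represented. These are the two design choices the abstract identifies as the main sources of divergence.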

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
JSON, Java, Behavioral Diversity
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310817 (URN); 10.1109/ISSRE52982.2021.00050 (DOI); 000783962100037 (); 2-s2.0-85126395366 (Scopus ID)
Conference
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021
Funder
Knut and Alice Wallenberg Foundation
Note

Part of ISBN 9781665425872

QC 20251002

Available from: 2022-04-07. Created: 2022-04-07. Last updated: 2025-10-02. Bibliographically approved
Durieux, T., Hamadi, Y. & Monperrus, M. (2020). Fully Automated HTML and JavaScript Rewriting for Constructing a Self-healing Web Proxy. Software testing, verification & reliability, 30(2), Article ID e1731.
2020 (English). In: Software testing, verification & reliability, ISSN 0960-0833, E-ISSN 1099-1689, Vol. 30, no 2, article id e1731. Article in journal (Refereed), Published
Abstract [en]

Over the last few years, the complexity of web applications has increased to provide more dynamic web applications to users. The drawback of this complexity is the growing number of errors in the front-end applications. In this paper, we present an approach to provide self-healing for the web. We implemented this approach in two different tools: (i) BikiniProxy, an HTTP repair proxy, and (ii) BugBlock, a browser extension. They use five self-healing strategies to rewrite the buggy HTML and JavaScript code to handle errors in web pages. We evaluate BikiniProxy and BugBlock with a new benchmark of 555 reproducible JavaScript errors of which 31.76% can be automatically self-healed by BikiniProxy and 15.67% by BugBlock.

Place, publisher, year, edition, pages
WILEY, 2020
Keywords
self-healing, bugs, JavaScript, proxy, chrome extension
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-270897 (URN); 10.1002/stvr.1731 (DOI); 000515567500006 (); 2-s2.0-85079571132 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20200324

Available from: 2020-03-24. Created: 2020-03-24. Last updated: 2024-03-18. Bibliographically approved
Ferreira, J. F., Cruz, P., Durieux, T. & Abreu, R. (2020). SmartBugs: A Framework to Analyze Solidity Smart Contracts. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, 22 September 2020 - 25 September 2020. Paper presented at 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), SEP 21-25, 2020, ELECTR NETWORK (pp. 1349-1352). Association for Computing Machinery (ACM)
2020 (English). In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, 22 September 2020 - 25 September 2020, Association for Computing Machinery (ACM), 2020, p. 1349-1352. Conference paper, Published paper (Refereed)
Abstract [en]

Over the last few years, there has been substantial research on automated analysis, testing, and debugging of Ethereum smart contracts. However, it is not trivial to compare and reproduce that research. To address this, we present SmartBugs, an extensible and easy-to-use execution framework that simplifies the execution of analysis tools on smart contracts written in Solidity, the primary language used in Ethereum. SmartBugs is currently distributed with support for 10 tools and two datasets of Solidity contracts. The first dataset can be used to evaluate the precision of analysis tools, as it contains 143 annotated vulnerable contracts with 208 tagged vulnerabilities. The second dataset contains 47,518 unique contracts collected through Etherscan. We discuss how SmartBugs supported the largest experimental setup to date, both in the number of tools and in execution time. Moreover, we show how it enables easy integration and comparison of analysis tools by presenting a new extension to the tool SmartCheck that substantially improves the detection of vulnerabilities related to the DASP10 categories Bad Randomness, Time Manipulation, and Access Control (identified vulnerabilities increased from 11% to 24%).

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Series
IEEE ACM International Conference on Automated Software Engineering, ISSN 1527-1366
Keywords
Smart contracts, Solidity, Ethereum, Blockchain, Tools, Debugging, Testing, Reproducible Bugs
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-296846 (URN); 10.1145/3324884.3415298 (DOI); 000651313500142 (); 2-s2.0-85099245555 (Scopus ID)
Conference
35th IEEE/ACM International Conference on Automated Software Engineering (ASE), SEP 21-25, 2020, ELECTR NETWORK
Note

QC 20210614

Available from: 2021-06-14. Created: 2021-06-14. Last updated: 2022-06-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-1996-6134