Publications (10 of 15)
Soto Valero, C., Durieux, T., Harrand, N. & Baudry, B. (2023). Coverage-Based Debloating for Java Bytecode. ACM Transactions on Software Engineering and Methodology, 32(2), 1-34
2023 (English). In: ACM Transactions on Software Engineering and Methodology, ISSN 1049-331X, E-ISSN 1557-7392, Vol. 32, no. 2, pp. 1-34. Journal article (Refereed). Published.
Abstract [en]

Software bloat is code that is packaged in an application but is not actually necessary to run the application. The presence of software bloat is an issue for security, performance, and maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for a single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture which parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We successfully debloat 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating.

For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
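
As a rough illustration of the idea summarized in the abstract above, the core decision can be sketched as set arithmetic: keep what the workload covered, remove the rest. This is only a class-level sketch under stated assumptions, not the tool described in the paper, which operates on Java bytecode with dedicated coverage tools. The function `debloat`, the `DebloatReport` type, and all class names below are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DebloatReport:
    """Outcome of a debloating decision at class granularity (hypothetical type)."""
    kept: frozenset[str]
    removed: frozenset[str]

    @property
    def removed_ratio(self) -> float:
        # Share of packaged classes that were removed; 0.0 for an empty package.
        total = len(self.kept) + len(self.removed)
        return len(self.removed) / total if total else 0.0


def debloat(packaged: set[str], covered: set[str]) -> DebloatReport:
    """Keep only the packaged classes that the workload covered."""
    kept = frozenset(packaged & covered)
    removed = frozenset(packaged - covered)
    return DebloatReport(kept=kept, removed=removed)


# Hypothetical class names, for illustration only.
packaged = {"app.Main", "app.Parser", "lib.Json", "lib.Xml"}
# Coverage may mention classes that are not packaged; they are simply ignored.
covered = {"app.Main", "app.Parser", "lib.Json", "jdk.Internal"}

report = debloat(packaged, covered)
print(sorted(report.removed))  # ['lib.Xml']
print(report.removed_ratio)    # 0.25
```

The sketch also shows why the result is only valid "according to the workload": a class that the workload never exercises is removed even if some client needs it, which is the risk the client study in the abstract measures.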

Place, publisher, year, edition, pages
ACM Digital Library, 2023
Keywords
software bloat, code coverage, program specialization, bytecode, software maintenance
National Category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-316426 (URN), 10.1145/3546948 (DOI), 000970588900011 (ISI), 2-s2.0-85147732395 (Scopus ID)
Projects
WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20251222

Available from: 2022-08-17. Created: 2022-08-17. Last updated: 2025-12-22. Bibliographically approved.
Ye, H., Gu, J., Martinez, M., Durieux, T. & Monperrus, M. (2022). Automated Classification of Overfitting Patches with Statically Extracted Code Features. IEEE Transactions on Software Engineering, 48(8), 2920-2938
2022 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 48, no. 8, pp. 2920-2938. Journal article (Refereed). Published.
Abstract [en]

Automatic program repair (APR) aims to reduce the cost of manually fixing software defects. However, APR suffers from generating a multitude of overfitting patches, that is, patches that make the tests pass but fail to correctly repair the defect. This paper presents a novel overfitting patch detection system called ODS to assess the correctness of APR patches. ODS first statically compares a patched program and a buggy program in order to extract code features at the abstract syntax tree (AST) level. Then, ODS uses supervised learning with the captured code features and patch correctness labels to automatically learn a probabilistic model. The learned ODS model can then be applied to classify new and unseen program repair patches. We conduct a large-scale experiment to evaluate the effectiveness of ODS on patch correctness classification based on 10,302 patches from the Defects4J, Bugs.jar and Bears benchmarks. The empirical evaluation shows that ODS is able to correctly classify 71.9% of program repair patches from 26 projects, which improves on the state of the art. ODS is applicable in practice and can be employed as a post-processing procedure to classify the patches generated by different APR systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2022
Keywords
Automatic program repair, Code features, Feature extraction, Maintenance engineering, Overfitting patch, Patch assessment, Predictive models, Software, Syntactics, Tools, Training
National Category
Identifiers
urn:nbn:se:kth:diva-308879 (URN), 10.1109/TSE.2021.3071750 (DOI), 000846878500013 (ISI), 2-s2.0-85104193931 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, Trustfull; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20220216

Available from: 2022-02-16. Created: 2022-02-16. Last updated: 2023-02-08. Bibliographically approved.
Ye, H., Martinez, M., Durieux, T. & Monperrus, M. (2021). A comprehensive study of automatic program repair on the QuixBugs benchmark. Journal of Systems and Software, 171, Article ID 110825.
2021 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 171, article id 110825. Journal article (Refereed). Published.
Abstract [en]

Automatic program repair papers tend to repeatedly use the same benchmarks. This poses a threat to the external validity of the findings of the program repair research community. In this paper, we perform an empirical study of automatic repair on a benchmark of bugs called QuixBugs, which has been little studied. Specifically, (1) we report on the characteristics of QuixBugs; (2) we study the effectiveness of 10 program repair tools on it; (3) we apply three patch correctness assessment techniques to comprehensively study the presence of overfitting patches in QuixBugs. Our key results are: (1) 16/40 buggy programs in QuixBugs can be repaired with at least one test-suite-adequate patch; (2) a total of 338 plausible patches are generated on QuixBugs by the considered tools, and 53.3% of them are overfitting patches according to our manual assessment; (3) the three automated patch correctness assessment techniques, RGTEvosuite, RGTInputSampling and GTInvariants, achieve an accuracy of 98.2%, 80.8% and 58.3% in overfitting detection, respectively. To our knowledge, this is the largest empirical study of automatic repair on QuixBugs, combining both quantitative and qualitative insights. All our empirical results are publicly available on GitHub in order to facilitate future research on automatic program repair.

Place, publisher, year, edition, pages
Elsevier Inc., 2021
Keywords
Automatic program repair, Bug benchmark, Patch correctness assessment, Software engineering, Assessment technique, Automatic programs, Empirical studies, External validities, Overfitting, Repair tools, Research communities, Automatic test pattern generation
National Category
Identifiers
urn:nbn:se:kth:diva-285291 (URN), 10.1016/j.jss.2020.110825 (DOI), 000592499600001 (ISI), 2-s2.0-85091196139 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20201202

Available from: 2020-12-02. Created: 2020-12-02. Last updated: 2024-03-18. Bibliographically approved.
Madeiral Delfim, F. & Durieux, T. (2021). A large-scale study on human-cloned changes for automated program repair. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021). Paper presented at 29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), May 22-30, 2021, held online (pp. 510-514). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 510-514. Conference paper, Published paper (Refereed).
Abstract [en]

Research in automatic program repair has shown that real bugs can be automatically fixed. However, there are several challenges involved in such a task that are not yet fully addressed. As an example, consider that a test-suite-based repair tool performs a change in a program to fix a bug spotted by a failing test case, but then the same or another test case fails. This could mean that the change is a partial fix for the bug or that another bug was manifested. However, the repair tool discards the change and possibly performs other repair attempts. One might wonder whether the applied change should also be applied in other locations in the program so that the bug is fully fixed. In this paper, we investigate the extent to which bug fix changes are cloned by developers within patches. Our goal is to investigate the need for multi-location repair by using identical or similar changes in identical or similar contexts. To do so, we analyzed 3,049 multi-hunk patches from the ManySStuBs4J dataset, which is a large dataset of single statement bug fix changes. We found that 68% of the multi-hunk patches contain at least one change clone group. Moreover, most of these patches (70%) are strictly-cloned ones, which are patches fully composed of changes belonging to one single change clone group. Finally, most of the strictly-cloned patches (89%) contain change clones with identical changes, independently of their contexts. We conclude that automated solutions for creating patches composed of identical or similar changes can be useful for fixing bugs.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Working Conference on Mining Software Repositories, ISSN 2160-1852
Keywords
automatic program repair, patch, change clone
National Category
Identifiers
urn:nbn:se:kth:diva-303514 (URN), 10.1109/MSR52588.2021.00064 (DOI), 000693399500052 (ISI), 2-s2.0-85111446885 (Scopus ID)
Conference
29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), May 22-30, 2021, held online
Note

Part of proceedings: ISBN 978-1-7281-8710-5, QC 20230117

Available from: 2021-10-15. Created: 2021-10-15. Last updated: 2023-01-17. Bibliographically approved.
Soto Valero, C., Durieux, T. & Baudry, B. (2021). A Longitudinal Analysis of Bloated Java Dependencies. In: Spinellis, D., Gousios, G., Chechik, M. & DiPenta, M. (Eds.), Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21). Paper presented at 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), August 23-28, 2021, held online (pp. 1021-1031). Association for Computing Machinery (ACM)
2021 (English). In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21) / [ed] Spinellis, D., Gousios, G., Chechik, M. & DiPenta, M., Association for Computing Machinery (ACM), 2021, pp. 1021-1031. Conference paper, Published paper (Refereed).
Abstract [en]

We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,515 versions of Maven dependency trees. Bloated dependencies steadily increase over time, and 89.2% of the direct dependencies that are bloated remain bloated in all subsequent versions of the studied projects. This empirical evidence suggests that developers can safely remove a bloated dependency. We further report novel insights regarding the unnecessary maintenance efforts induced by bloat. We find that 22% of dependency updates performed by developers are made on bloated dependencies, and that Dependabot suggests a similar ratio of updates on bloated dependencies.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
software bloat, dependencies, java
National Category
Identifiers
urn:nbn:se:kth:diva-309540 (URN), 10.1145/3468264.3468589 (DOI), 000744425500088 (ISI), 2-s2.0-85116273059 (Scopus ID)
Conference
29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), August 23-28, 2021, held online
Note

Part of proceedings ISBN: 978-1-4503-8562-6

QC 20220316

Available from: 2022-03-16. Created: 2022-03-16. Last updated: 2023-05-10. Bibliographically approved.
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). Automatic Diversity in the Software Supply Chain.
2021 (English). Report (Other academic).
Abstract [en]

Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code in one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain attacks, we propose the concept of a Library Substitution Framework. This novel concept leverages one key observation: when an application depends on a library, it is very likely that there exist other libraries that provide similar features. The key objective of a Library Substitution Framework is to enable the developers of an application to harness this diversity of libraries in their supply chain. The framework lets them generate a population of application variants, each depending on a different alternative library that provides similar functionalities. To investigate the relevance of this concept, we develop ARGO, a proof-of-concept implementation of this framework that harnesses the diversity of JSON suppliers. We study the feasibility of library substitution and its impact on a set of 368 clients. Our empirical results show that for 195 of the 368 Java applications tested, we can substitute the original JSON library used by the client with at least 15 other JSON libraries without modifying the client's code. These results show the capacity of a Library Substitution Framework to diversify the supply chain of the client applications of the libraries it targets.

Publisher
p. 18
Keywords
Software supply chain, Library Substitution, Software repository, Software reuse, Java, Maven Central Repository
National Category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310821 (URN)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20220419

Available from: 2022-04-07. Created: 2022-04-07. Last updated: 2022-06-25. Bibliographically approved.
Durieux, T., Soto Valero, C. & Baudry, B. (2021). DUETS: A Dataset of Reproducible Pairs of Java Library-Clients. In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021). Paper presented at 29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), May 22-30, 2021, held online (pp. 545-549). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR 2021), Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 545-549. Conference paper, Published paper (Refereed).
Abstract [en]

Software engineering researchers look for software artifacts to study their characteristics or to evaluate new techniques. In this paper, we introduce DUETS, a new dataset of software libraries and their clients. This dataset can be exploited to gain many different insights, such as API usage, usage inputs, or novel observations about the test suites of clients and libraries. DUETS is meant to support both static and dynamic analysis. This means that the libraries and the clients compile correctly, they are executable and their test suites pass. The dataset is composed of open-source projects that have more than five stars on GitHub. The final dataset contains 395 libraries and 2,874 clients. Additionally, we provide the raw data that we use to create this dataset, such as 34,560 pom.xml files or the complete file list from 34,560 projects. This dataset can be used to study how libraries are used by their clients or as a list of software projects that successfully build. A client's test suite can be used as an additional verification step for code transformation techniques that modify the libraries.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Working Conference on Mining Software Repositories, ISSN 2160-1852
Keywords
Mining software repositories, software reuse, Java, Maven
National Category
Identifiers
urn:nbn:se:kth:diva-303381 (URN), 10.1109/MSR52588.2021.00071 (DOI), 000693399500059 (ISI), 2-s2.0-85108531256 (Scopus ID)
Conference
29th IEEE/ACM International Conference on Program Comprehension (ICPC) / 18th IEEE/ACM International Conference on Mining Software Repositories (MSR), May 22-30, 2021, held online
Note

Part of proceedings: ISBN 978-1-7281-8710-5, QC 20230117

Available from: 2021-10-13. Created: 2021-10-13. Last updated: 2023-01-17. Bibliographically approved.
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). The Behavioral Diversity of Java JSON Libraries. In: IEEE (Ed.), 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). Paper presented at 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021 (pp. 412-422). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) / [ed] IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2021, pp. 412-422. Conference paper, Published paper (Refereed).
Abstract [en]

JSON is an essential file and data format in domains that span scientific computing, web APIs and configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparison among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input/output behavior of 20 JSON libraries, in a single software ecosystem: Java/Maven. We assess behavior diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences, which influence the behavior of the libraries, relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity with ill-formed files, or corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for robust processing of ill-formed files through a multi-version architecture.
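
To make the two design choices named in the abstract above concrete (the data structure used for JSON objects and the encoding of numbers), the following minimal sketch parses the same corner-case documents under two configurations of a single parser. It uses Python's standard `json` module purely as a stand-in; it does not involve any of the Java libraries studied in the paper, and the two sample documents are made up for illustration.

```python
import json
import math
from decimal import Decimal

# Made-up corner cases: a duplicate key and a number beyond double range.
doc_dup = '{"id": 1, "id": 2}'
doc_big = '{"n": 1e400}'

# Object as a dict: only one value per key survives (the last one).
last_wins = json.loads(doc_dup)

# Object as a list of pairs: every key/value pair is kept in document order.
all_pairs = json.loads(doc_dup, object_pairs_hook=list)

# Numbers as binary floats: the value overflows to infinity.
as_float = json.loads(doc_big)

# Numbers as arbitrary-precision decimals: the value is preserved.
as_decimal = json.loads(doc_big, parse_float=Decimal)

print(last_wins)                   # {'id': 2}
print(all_pairs)                   # [('id', 1), ('id', 2)]
print(math.isinf(as_float["n"]))   # True
print(as_decimal["n"] == Decimal("1e400"))  # True
```

Both configurations accept the same input yet return observably different results, which is the kind of divergence a multi-version architecture would have to reconcile.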

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
JSON, Java, Behavioral Diversity
National Category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310817 (URN), 10.1109/ISSRE52982.2021.00050 (DOI), 000783962100037 (ISI), 2-s2.0-85126395366 (Scopus ID)
Conference
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021
Funder
Knut and Alice Wallenberg Foundation
Note

Part of ISBN 9781665425872

QC 20251002

Available from: 2022-04-07. Created: 2022-04-07. Last updated: 2025-10-02. Bibliographically approved.
Durieux, T., Hamadi, Y. & Monperrus, M. (2020). Fully Automated HTML and JavaScript Rewriting for Constructing a Self-healing Web Proxy. Software testing, verification & reliability, 30(2), Article ID e1731.
2020 (English). In: Software testing, verification & reliability, ISSN 0960-0833, E-ISSN 1099-1689, Vol. 30, no. 2, article id e1731. Journal article (Refereed). Published.
Abstract [en]

Over the last few years, the complexity of web applications has increased to provide more dynamic web applications to users. The drawback of this complexity is the growing number of errors in the front-end applications. In this paper, we present an approach to provide self-healing for the web. We implemented this approach in two different tools: (i) BikiniProxy, an HTTP repair proxy, and (ii) BugBlock, a browser extension. They use five self-healing strategies to rewrite the buggy HTML and JavaScript code to handle errors in web pages. We evaluate BikiniProxy and BugBlock with a new benchmark of 555 reproducible JavaScript errors of which 31.76% can be automatically self-healed by BikiniProxy and 15.67% by BugBlock.

Place, publisher, year, edition, pages
WILEY, 2020
Keywords
self-healing, bugs, JavaScript, proxy, chrome extension
National Category
Identifiers
urn:nbn:se:kth:diva-270897 (URN), 10.1002/stvr.1731 (DOI), 000515567500006 (ISI), 2-s2.0-85079571132 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20200324

Available from: 2020-03-24. Created: 2020-03-24. Last updated: 2024-03-18. Bibliographically approved.
Ferreira, J. F., Cruz, P., Durieux, T. & Abreu, R. (2020). SmartBugs: A Framework to Analyze Solidity Smart Contracts. In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, 22 September 2020 - 25 September 2020. Paper presented at 35th IEEE/ACM International Conference on Automated Software Engineering (ASE), September 21-25, 2020, held online (pp. 1349-1352). Association for Computing Machinery (ACM)
2020 (English). In: 35th IEEE/ACM International Conference on Automated Software Engineering, ASE 2020, 22 September 2020 - 25 September 2020, Association for Computing Machinery (ACM), 2020, pp. 1349-1352. Conference paper, Published paper (Refereed).
Abstract [en]

Over the last few years, there has been substantial research on automated analysis, testing, and debugging of Ethereum smart contracts. However, it is not trivial to compare and reproduce that research. To address this, we present SmartBugs, an extensible and easy-to-use execution framework that simplifies the execution of analysis tools on smart contracts written in Solidity, the primary language used in Ethereum. SmartBugs is currently distributed with support for 10 tools and two datasets of Solidity contracts. The first dataset can be used to evaluate the precision of analysis tools, as it contains 143 annotated vulnerable contracts with 208 tagged vulnerabilities. The second dataset contains 47,518 unique contracts collected through Etherscan. We discuss how SmartBugs supported the largest experimental setup to date, both in the number of tools and in execution time. Moreover, we show how it enables easy integration and comparison of analysis tools by presenting a new extension to the tool SmartCheck that substantially improves the detection of vulnerabilities related to the DASP10 categories Bad Randomness, Time Manipulation, and Access Control (identified vulnerabilities increased from 11% to 24%).

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Series
IEEE ACM International Conference on Automated Software Engineering, ISSN 1527-1366
Keywords
Smart contracts, Solidity, Ethereum, Blockchain, Tools, Debugging, Testing, Reproducible Bugs
National Category
Identifiers
urn:nbn:se:kth:diva-296846 (URN), 10.1145/3324884.3415298 (DOI), 000651313500142 (ISI), 2-s2.0-85099245555 (Scopus ID)
Conference
35th IEEE/ACM International Conference on Automated Software Engineering (ASE), September 21-25, 2020, held online
Note

QC 20210614

Available from: 2021-06-14. Created: 2021-06-14. Last updated: 2022-06-25. Bibliographically approved.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-1996-6134