kth.se Publications
Publications (10 of 14)
Etemadi, K., Harrand, N., Larsén, S., Adzemovic, H., Luong Phu, H., Verma, A., . . . Monperrus, M. (2023). Sorald: Automatic Patch Suggestions for SonarQube Static Analysis Violations. IEEE Transactions on Dependable and Secure Computing, 20(4), 2794-2810
2023 (English) In: IEEE Transactions on Dependable and Secure Computing, ISSN 1545-5971, E-ISSN 1941-0018, Vol. 20, no 4, p. 2794-2810. Article in journal (Refereed), Published
Abstract [en]

Previous work has shown that early resolution of issues detected by static code analyzers can prevent major costs later on. However, developers often ignore such issues for two main reasons. First, many issues must be interpreted to determine whether they correspond to actual flaws in the program. Second, static analyzers often do not present the issues in an actionable way. To address these problems, we present Sorald: a novel system that uses metaprogramming templates to transform the abstract syntax trees of programs and suggest fixes for static analysis warnings. Thus, the burden on the developer is reduced from interpreting and fixing static analysis issues to inspecting and approving full-fledged solutions. Sorald fixes violations of 10 rules from SonarJava, one of the most widely used static analyzers for Java. We evaluate Sorald on a dataset of 161 popular repositories on GitHub. Our analysis shows the effectiveness of Sorald: it fixes 65% (852/1,307) of the violations that meet the repair preconditions. Overall, our experiments show that it is possible to automatically fix notable violations of static analysis rules reported by SonarJava, a state-of-the-art static analyzer.
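Sorald's templates operate on Java abstract syntax trees to fix SonarJava rule violations. The sketch below illustrates the same idea with a swapped-in language: match a flagged pattern in the AST, rewrite it, and return the result as a suggestion for review. It uses Python's standard `ast` module on a well-known Python lint pattern, comparing to `None` with `==`. The class and function names are hypothetical. This is neither Sorald's code nor one of its rules.

```python
import ast


class NoneComparisonFix(ast.NodeTransformer):
    """Template-style repair: rewrite `x == None` / `x != None` as `x is None` / `x is not None`."""

    def visit_Compare(self, node: ast.Compare) -> ast.AST:
        self.generic_visit(node)  # repair nested comparisons first
        fixed_ops = []
        for op, right in zip(node.ops, node.comparators):
            compares_to_none = isinstance(right, ast.Constant) and right.value is None
            if compares_to_none and isinstance(op, ast.Eq):
                fixed_ops.append(ast.Is())
            elif compares_to_none and isinstance(op, ast.NotEq):
                fixed_ops.append(ast.IsNot())
            else:
                fixed_ops.append(op)  # any other comparison is left untouched
        node.ops = fixed_ops
        return node


def suggest_patch(source: str) -> str:
    """Return rewritten source as a suggestion; applying it is left to the developer."""
    tree = NoneComparisonFix().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(suggest_patch("if result == None:\n    retry()"))
```

As in the abstract, the developer's role shifts to reviewing a complete patch. A real tool also needs repair preconditions. For instance, this toy template ignores `None == x`, where `None` is on the left.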

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
automatic program repair, Codes, Computer bugs, Java, Maintenance engineering, metaprogramming, Software development management, Static analysis, Static code analysis, Syntactics, Codes (symbols), Computer software, Java programming language, Program debugging, Repair, Software design, Trees (mathematics), Automatic programs, Code, Meta Programming, Static analyzers, Static codes
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-323274 (URN); 10.1109/TDSC.2022.3167316 (DOI); 001029054600009 (ISI); 2-s2.0-85128651786 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20250513

Available from: 2023-01-24; Created: 2023-01-24; Last updated: 2025-05-13; Bibliographically approved
Harrand, N., Benelallam, A., Soto Valero, C., Bettega, F., Barais, O. & Baudry, B. (2022). API beauty is in the eye of the clients: 2.2 million Maven dependencies reveal the spectrum of client-API usages. Journal of Systems and Software, 184, 111134, Article ID 111134.
2022 (English) In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 184, p. 111134-, article id 111134. Article in journal (Refereed), Published
Abstract [en]

Hyrum's law states a common observation in the software industry: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody". Meanwhile, recent research results seem to contradict this observation, stating that "for most APIs, there is a small number of features that are actually used". In this work, we perform a large-scale empirical study of client-API relationships in the Maven ecosystem to investigate this apparent paradox between observations in industry and in the research literature. We study the 94 most popular libraries in Maven Central, as well as the 829,410 client artifacts that declare a dependency on these libraries and are available in Maven Central, amounting to 2.2M dependencies. Our analysis indicates the existence of a wide spectrum of API usages: with enough clients, most API types end up being used at least once. Our second key observation is that, for all libraries, there is a small set of API types that are used by the vast majority of their clients. The practical consequences of this study are twofold: (i) API maintainers can identify an essential part of their API on which to focus their efforts; (ii) API developers should limit the public API elements to the set of features for which they are ready to have users. (C) 2021 The Author(s). Published by Elsevier Inc.
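The two key observations in this abstract are, in effect, two statistics over a client-to-API-type usage relation. The following minimal sketch assumes each client's usage has already been extracted as a set of API type names. The study derives usage from bytecode. The type names and the 50% threshold standing in for "vast majority" are illustrative assumptions.

```python
from collections import Counter


def api_usage_profile(api_types: set[str], client_usages: list[set[str]], majority: float = 0.5):
    """Return (share of API types used by at least one client,
    set of API types used by more than `majority` of the clients)."""
    counts = Counter(t for usage in client_usages for t in usage & api_types)
    used_share = len(counts) / len(api_types)
    core = {t for t, n in counts.items() if n / len(client_usages) > majority}
    return used_share, core


# Hypothetical library with four public API types and four clients.
api = {"Json", "JsonReader", "JsonWriter", "TypeAdapter"}
clients = [
    {"Json"},
    {"Json", "JsonReader"},
    {"Json"},
    {"Json", "TypeAdapter"},
]
share, core = api_usage_profile(api, clients)
print(share, core)
```

With more clients, `used_share` tends to grow toward 1 (the "wide spectrum"), while `core` stays small. That is how both observations can hold at once.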

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Mining software repositories, Bytecode analysis, Software reuse, Java, Maven Central Repository
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-305550 (URN); 10.1016/j.jss.2021.111134 (DOI); 000722219800008 (ISI); 2-s2.0-85119384404 (Scopus ID)
Note

QC 20211206

Available from: 2021-12-06; Created: 2021-12-06; Last updated: 2022-12-06; Bibliographically approved
Soto Valero, C., Durieux, T., Harrand, N. & Baudry, B. (2022). Coverage-Based Debloating for Java Bytecode. ACM Transactions on Software Engineering and Methodology
2022 (English) In: ACM Transactions on Software Engineering and Methodology, ISSN 1049-331X, E-ISSN 1557-7392. Article in journal (Refereed), Published
Abstract [en]

Software bloat is code that is packaged in an application but is not necessary to run the application. The presence of software bloat is an issue for security, performance, and maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for a single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture which parts of a project and its dependencies are used when running a specific workload. Then, we automatically remove the parts that are not covered to generate a debloated version of the project. We successfully debloat 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating.

For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to the client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
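The core step of coverage-based debloating can be sketched at class granularity: remove whatever the workload never reaches. This is a simplified model under stated assumptions, not the paper's implementation. Classes are plain name-to-bytes entries. "Combining" coverage tools is modeled as the union of their reports, which is an assumption. All names are hypothetical.

```python
def debloat(classes: dict[str, bytes], covered: set[str]) -> dict[str, bytes]:
    """Keep only the classes that the workload's coverage marks as used."""
    return {name: code for name, code in classes.items() if name in covered}


def removed_bytecode_share(original: dict[str, bytes], debloated: dict[str, bytes]) -> float:
    """Fraction of the original bytecode size removed by debloating."""
    total = sum(len(code) for code in original.values())
    kept = sum(len(code) for code in debloated.values())
    return (total - kept) / total if total else 0.0


# Hypothetical library: class name -> compiled class bytes (sizes are made up).
library = {
    "com.example.Parser": bytes(600),
    "com.example.Printer": bytes(300),
    "com.example.Legacy": bytes(100),
}
# Reports from two coverage tools for the same workload; their union is treated as "used".
coverage_tool_a = {"com.example.Parser"}
coverage_tool_b = {"com.example.Printer"}
covered = coverage_tool_a | coverage_tool_b

slim = debloat(library, covered)
print(sorted(slim), removed_bytecode_share(library, slim))
```

The client-side assessment described above would then re-run each client's test suite against the debloated artifact in place of the original.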

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
software bloat, code coverage, program specialization, bytecode, software maintenance
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-316426 (URN); 10.1145/3546948 (DOI); 000970588900011 (ISI); 2-s2.0-85147732395 (Scopus ID)
Projects
WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230529

Available from: 2022-08-17; Created: 2022-08-17; Last updated: 2023-05-29; Bibliographically approved
Harrand, N. (2022). Software Diversity for Third-Party Dependencies. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Thanks to the emergence of package managers and online software repositories, modern software development heavily relies on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. Yet, the reuse of software libraries leads large groups of applications to share a significant amount of code, including potential defects such as bugs or vulnerabilities. The lack of diversity in these groups of applications makes them more prone to large-scale failures and more predictable for attackers attempting to exploit their shared vulnerabilities. To mitigate these risks introduced by library reuse, this dissertation proposes to introduce diversity in software applications. We create variants of software applications through transformations targeting the libraries they depend on. These variants provide functionalities equivalent to their originals, while not sharing the exact same behavior.

In this dissertation, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries implementing similar functionalities. We perform two case studies on two families of reusable software artifacts: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of artifacts exhibit significant natural input/output behavioral diversity.

Second, we study software transformations targeting libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture to implement library substitution. We implement this architecture in a JSON library substitution framework, leveraging the diversity of behavior we observe in JSON libraries. We assess the impact of these transformations on open-source libraries and software applications through two experiments.

Finally, we study the properties of software applications and libraries that make them amenable to transformations that do not change their functionalities. We analyze the variants produced during our software diversification experiments and discuss our findings. In particular, we observe that the existence of alternative implementations at different granularities (instructions, methods, classes, and libraries) provides an important source of potential diversity that can be leveraged.

Abstract [sv]

Thanks to the emergence of package managers and online software repositories, modern software development relies heavily on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. However, reusing software libraries across a large number of programs means that these programs share a significant amount of code, including potential defects such as bugs and vulnerabilities. When software defects are widely shared, a risk of large-scale failures arises. In addition, the risk increases that the same vulnerability can be exploited against several programs that use the same third-party library. To reduce the risks of library reuse, this dissertation proposes creating variants of software through transformations that target the libraries the software depends on.

In this dissertation, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries that implement equivalent functionality. We conduct two case studies of two families of reusable software: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of software exhibit significant behavioral diversity with respect to input/output.

The second aspect we study is software transformations targeting the libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture for carrying out substitutions of entire libraries. We apply this architecture in a framework for substituting JSON libraries, leveraging the diversity of behaviors that we observe in them. We assess the effects of these transformations on open-source libraries and programs through two experiments.

Finally, we study the properties of software and libraries that make them suitable for transformation without changing their functionality. We analyze the variants produced during our software diversification experiments and discuss our results. In particular, we note that the existence of alternative implementations at different scales (instructions, methods, classes, and libraries) constitutes an important source of potential diversity that can be exploited.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022. p. 100
Series
TRITA-EECS-AVL ; 2022:22
Keywords
Automated Software Engineering, Software Diversity, Software libraries, Software Monoculture
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310824 (URN); 978-91-8040-184-5 (ISBN)
Public defence
2022-05-05, D2, Lindstedtsvägen 9, Stockholm, 13:30 (English)
Funder
Knut and Alice Wallenberg Foundation
Note

QCR 20220413

Available from: 2022-04-13; Created: 2022-04-07; Last updated: 2022-06-25; Bibliographically approved
Soto Valero, C., Harrand, N., Monperrus, M. & Baudry, B. (2021). A comprehensive study of bloated dependencies in the Maven ecosystem. Empirical Software Engineering, 26(3), Article ID 45.
2021 (English) In: Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 26, no 3, article id 45. Article in journal (Refereed), Published
Abstract [en]

Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application's code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challenges related to dependency management. In this paper, we propose an original study of one such challenge: the emergence of bloated dependencies. Bloated dependencies are libraries that are packaged with the application's compiled code but are not necessary to build and run the application. They artificially increase the size of the built binary and the maintenance effort. We propose DepClean, a tool to determine the presence of bloated dependencies in Maven artifacts. We analyze 9,639 Java artifacts hosted on Maven Central, which include a total of 723,444 dependency relationships. Our key result is as follows: 2.7% of the directly declared dependencies are bloated, 15.4% of the inherited dependencies are bloated, and 57% of the transitive dependencies of the studied artifacts are bloated. In other words, it is feasible to reduce the number of dependencies of Maven artifacts to 1/4 of their current count. Our qualitative assessment with 30 notable open-source projects indicates that developers pay attention to their dependencies when they are notified of the problem. They are willing to remove bloated dependencies: 21 of the 26 answered pull requests were accepted and merged by developers, removing 140 dependencies in total (75 direct and 65 transitive).
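Once we know which classes the built artifact actually references, the definition of a bloated dependency above reduces to a set check. The following minimal sketch assumes that reference set has already been computed. DepClean derives usage from bytecode analysis, which is far more involved than this. The coordinates, class names, and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dependency:
    coordinates: str  # hypothetical "group:artifact:version"
    kind: str         # "direct", "inherited", or "transitive"
    classes: set[str] = field(default_factory=set)


def is_bloated(dep: Dependency, referenced: set[str]) -> bool:
    """Bloated: none of the dependency's classes is referenced by the artifact."""
    return not (dep.classes & referenced)


def bloated_share_by_kind(deps: list[Dependency], referenced: set[str]) -> dict[str, float]:
    """Share of bloated dependencies for each dependency kind."""
    shares = {}
    for kind in sorted({d.kind for d in deps}):
        group = [d for d in deps if d.kind == kind]
        shares[kind] = sum(is_bloated(d, referenced) for d in group) / len(group)
    return shares


deps = [
    Dependency("org.example:json:1.0", "direct", {"org.example.json.Parser"}),
    Dependency("org.example:log:2.0", "direct", {"org.example.log.Logger"}),
    Dependency("org.example:xml:1.1", "transitive", {"org.example.xml.Reader"}),
    Dependency("org.example:yaml:0.9", "transitive", {"org.example.yaml.Loader"}),
    Dependency("org.example:csv:3.0", "transitive", {"org.example.csv.Writer"}),
]
# Classes that the artifact's bytecode was found to reference (assumed given here).
referenced = {"org.example.json.Parser", "org.example.xml.Reader"}

print([d.coordinates for d in deps if is_bloated(d, referenced)])
print(bloated_share_by_kind(deps, referenced))
```

The per-kind breakdown mirrors the direct/inherited/transitive split reported in the abstract. Here "inherited" would simply be another `kind` value, for dependencies declared in a parent POM.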

Place, publisher, year, edition, pages
Springer Nature, 2021
Keywords
Dependency management, Software reuse, Debloating, Program analysis
National Category
Software Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-293389 (URN); 10.1007/s10664-020-09914-8 (DOI); 000634831400002 (ISI); 2-s2.0-85103393782 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20210519

Available from: 2021-04-26; Created: 2021-04-26; Last updated: 2024-03-18; Bibliographically approved
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). Automatic Diversity in the Software Supply Chain.
2021 (English) Report (Other academic)
Abstract [en]

Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code into one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain attacks, we propose the concept of a Library Substitution Framework. This novel concept leverages one key observation: when an application depends on a library, it is very likely that there exist other libraries that provide similar features. The key objective of a Library Substitution Framework is to enable the developers of an application to harness this diversity of libraries in their supply chain. The framework lets them generate a population of application variants, each depending on a different alternative library that provides similar functionalities. To investigate the relevance of this concept, we develop ARGO, a proof-of-concept implementation of this framework that harnesses the diversity of JSON suppliers. We study the feasibility of library substitution and its impact on a set of 368 clients. Our empirical results show that for 195 of the 368 Java applications tested, we can substitute the original JSON library used by the client with at least 15 other JSON libraries without modifying the client's code. These results show the capacity of a Library Substitution Framework to diversify the supply chain of the client applications of the libraries it targets.
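ARGO works on Java JSON libraries in Maven. The Python sketch below models only the core loop of a Library Substitution Framework, under two simplifying assumptions. First, every candidate library is reached through one common adapter signature. Second, a variant is kept when the client's tests still pass against it. The candidate names, the adapter type, and the stand-in test suite are hypothetical.

```python
import ast
import json
from decimal import Decimal
from typing import Callable

# Hypothetical common adapter: each candidate "library" maps JSON text to Python values.
Parser = Callable[[str], object]

CANDIDATES: dict[str, Parser] = {
    "stdlib-float": json.loads,
    "stdlib-decimal": lambda text: json.loads(text, parse_float=Decimal),
    "literal-eval": ast.literal_eval,  # deliberately weak stand-in: rejects true/false/null
}


def client_tests(parse: Parser) -> bool:
    """Stand-in for a client's test suite, run against one substituted library."""
    try:
        doc = parse('{"ok": true, "ratio": 1.5, "tags": ["a", "b"]}')
        return doc["ok"] is True and doc["ratio"] == 1.5 and doc["tags"] == ["a", "b"]
    except (ValueError, SyntaxError, KeyError, TypeError):
        return False


def viable_variants(candidates: dict[str, Parser]) -> list[str]:
    """Names of the substitutions under which the client's tests still pass."""
    return [name for name, parse in candidates.items() if client_tests(parse)]


print(viable_variants(CANDIDATES))
```

The third candidate does not accept JSON's `true`, so it is filtered out. Catching this kind of failure is what the client test suite is for in this loop.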

Publisher
p. 18
Keywords
Software supply chain, Library Substitution, Software repository, Software reuse, Java, Maven Central Repository
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310821 (URN)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20220419

Available from: 2022-04-07; Created: 2022-04-07; Last updated: 2022-06-25; Bibliographically approved
Harrand, N., Durieux, T., Broman, D. & Baudry, B. (2021). The Behavioral Diversity of Java JSON Libraries. In: IEEE (Ed.), 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE). Paper presented at International Symposium on Software Reliability Engineering (ISSRE). Institute of Electrical and Electronics Engineers (IEEE)
2021 (English) In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) / [ed] IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

JSON is an essential file and data format in domains that span scientific computing, web APIs, and configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparison among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input/output behavior of 20 JSON libraries in a single software ecosystem: Java/Maven. We assess behavioral diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences that influence the behavior of the libraries relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity on ill-formed files and on corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for robust processing of ill-formed files through a multi-version architecture.
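The two design dimensions named above are the data structure used for JSON objects and the encoding of numbers. Both are easy to observe even within one library. In this sketch, two configurations of Python's standard `json` module stand in for two different libraries. This is an assumption made for illustration, since the study compares 20 Java libraries. A differential check runs both configurations on corner-case inputs. The function names are hypothetical.

```python
import json
from decimal import Decimal


def parse_as_library_a(text: str):
    """Numbers as binary doubles; objects as dict, so a duplicate key keeps the last value."""
    return json.loads(text)


def parse_as_library_b(text: str):
    """Numbers as exact decimals; objects as lists of (key, value) pairs, keeping duplicates."""
    return json.loads(text, parse_float=Decimal, object_pairs_hook=list)


def compare(text: str) -> dict:
    """Run both parsers on one input; record the result or the rejecting exception's name."""
    results = {}
    for name, parse in (("a", parse_as_library_a), ("b", parse_as_library_b)):
        try:
            results[name] = parse(text)
        except ValueError as err:  # json.JSONDecodeError subclasses ValueError
            results[name] = type(err).__name__
    return results


for case in ['{"x": 1e400}', '{"k": 1, "k": 2}', '{"x": 1,}']:
    print(case, "->", compare(case))
```

The large number becomes infinity in one configuration and an exact decimal in the other. The duplicate key is silently collapsed in one and preserved in the other. The trailing-comma input is rejected by both here. The study reports its most remarkable diversity precisely on such ill-formed inputs, which two configurations of one parser cannot reproduce.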

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
JSON, Java, Behavioral Diversity
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310817 (URN); 10.1109/ISSRE52982.2021.00050 (DOI); 000783962100037 (ISI); 2-s2.0-85126395366 (Scopus ID)
Conference
International Symposium on Software Reliability Engineering (ISSRE)
Funder
Knut and Alice Wallenberg Foundation
Note

Part of proceedings: ISBN 978-1-6654-2587-2

QC 20220524

Available from: 2022-04-07; Created: 2022-04-07; Last updated: 2023-01-18; Bibliographically approved
Harrand, N., Soto Valero, C., Monperrus, M. & Baudry, B. (2020). Java decompiler diversity and its application to meta-decompilation. Journal of Systems and Software, 168, Article ID 110645.
2020 (English) In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 168, article id 110645. Article in journal (Refereed), Published
Abstract [en]

During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. We assess the strategies of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion, and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. The highest-ranking decompiler in this study produces syntactically correct code for 84% and semantically equivalent code for 78% of the classes in our dataset. Our results demonstrate that each decompiler correctly handles a different set of bytecode classes. We propose a new decompiler called Arlecchino that leverages the diversity of existing decompilers. To do so, it merges partial decompilations into a new one, based on compilation errors. Arlecchino handles 37.6% of the bytecode classes that no decompiler previously handled. We publish the sources of this new bytecode decompiler. (C) 2020 Published by Elsevier Inc.
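The following is a simplified sketch of meta-decompilation in the spirit described above. Each decompiler's output for a class is split per method, and a method is marked as failing when it does not compile. Failing methods in the best output are then patched with compilable versions from other decompilers. The per-method model, the names, and the absence of a real Java compiler are all assumptions. This is not Arlecchino's actual merge algorithm.

```python
from typing import Optional

# One decompiler's output for a single class: method name -> source text,
# or None when that method's decompiled source does not compile.
Decompilation = dict[str, Optional[str]]


def errors(d: Decompilation) -> int:
    """Number of methods that fail to compile."""
    return sum(1 for src in d.values() if src is None)


def meta_decompile(outputs: dict[str, Decompilation]) -> Optional[Decompilation]:
    """Start from the output with the fewest errors, then replace each failing
    method with a compilable version from another decompiler, if one exists."""
    if not outputs:
        return None
    best = min(outputs, key=lambda name: errors(outputs[name]))
    merged = dict(outputs[best])
    for method in list(merged):
        if merged[method] is None:
            for other in outputs.values():
                candidate = other.get(method)
                if candidate is not None:
                    merged[method] = candidate
                    break
    return merged


outputs = {
    "decompiler_a": {"parse": "int parse(String s) { ... }", "render": None},
    "decompiler_b": {"parse": None, "render": "String render() { ... }"},
}
merged = meta_decompile(outputs)
print(errors(merged), merged)
```

Neither hypothetical decompiler alone yields a compilable class, but the merge does. This mirrors the claim that combining decompilers recovers classes no single one handles.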

Place, publisher, year, edition, pages
Elsevier BV, 2020
Keywords
Java bytecode, Decompilation, Reverse engineering, Source code analysis
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-279880 (URN); 10.1016/j.jss.2020.110645 (DOI); 000557871300009 (ISI); 2-s2.0-85085736599 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20210602

Available from: 2020-09-15; Created: 2020-09-15; Last updated: 2022-06-25; Bibliographically approved
Harrand, N., Allier, S., Rodriguez-Cancio, M., Monperrus, M. & Baudry, B. (2019). A journey among Java neutral program variants. Genetic Programming and Evolvable Machines, 20(4), 531-580
2019 (English) In: Genetic Programming and Evolvable Machines, ISSN 1389-2576, E-ISSN 1573-7632, Vol. 20, no 4, p. 531-580. Article in journal (Refereed), Published
Abstract [en]

Neutral program variants are alternative implementations of a program that are nevertheless equivalent with respect to the test suite. Techniques such as approximate computing or genetic improvement share the intuition that potential for enhancements (e.g., enhanced performance or reliability) lies in these acceptable behavioral differences. Yet, the automatic synthesis of neutral program variants through program transformations remains a key challenge. This work aims at characterizing plastic code regions in Java programs, i.e., the code regions that can be modified while maintaining functional correctness according to a test suite. Our empirical study relies on automatic variations of 6 real-world Java programs. First, we transform these programs with three state-of-the-art program transformations: add, replace, and delete statements. We obtain a pool of 23,445 neutral variants, from which we gather the following novel insights: developers naturally write code that supports fine-grained behavioral changes; statement deletion is a surprisingly effective program transformation; and high-level design decisions, such as the choice of a data structure, are natural points that can evolve while keeping functionality. Second, we design 3 novel program transformations targeted at specific plastic regions. New experiments reveal that 60%, 58%, and 73%, respectively, of the synthesized variants (175,688 in total) are neutral and exhibit execution traces that are different from the original.

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Neutral program variant, Program transformation, Java, Code plasticity
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-264174 (URN); 10.1007/s10710-019-09355-3 (DOI); 000492843200004 (ISI); 2-s2.0-85068185414 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20191122

Available from: 2019-11-22; Created: 2019-11-22; Last updated: 2022-06-26; Bibliographically approved
Soto Valero, C., Benelallam, A., Harrand, N., Barais, O. & Baudry, B. (2019). The Emergence of Software Diversity in Maven Central. In: 16th International Conference on Mining Software Repositories. Paper presented at MSR (pp. 333-343). Montréal, QC, Canada: IEEE conference proceedings
2019 (English) In: 16th International Conference on Mining Software Repositories, Montréal, QC, Canada: IEEE conference proceedings, 2019, p. 333-343. Conference paper, Published paper (Refereed)
Abstract [en]

Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency on a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by the latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.
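Once dependency declarations have been extracted, the headline measure (the share of libraries with several versions in active use) is a simple aggregation. The following minimal sketch assumes the input is a list of (client artifact, library, version) triples taken from the latest release of each client. All coordinates and function names are hypothetical. Here the denominator counts only libraries with at least one client, which may differ from the study's exact definition.

```python
from collections import defaultdict


def used_versions(dependencies: list[tuple[str, str, str]]) -> dict[str, set[str]]:
    """Map each library to the set of its versions that clients depend on."""
    used = defaultdict(set)
    for _client, library, version in dependencies:
        used[library].add(version)
    return dict(used)


def share_with_multiple_used_versions(dependencies: list[tuple[str, str, str]]) -> float:
    """Fraction of depended-upon libraries with more than one version in use."""
    used = used_versions(dependencies)
    return sum(1 for versions in used.values() if len(versions) > 1) / len(used)


latest_dependencies = [
    ("org.app:a:3.0", "org.lib:x", "1.2"),
    ("org.app:b:1.4", "org.lib:x", "2.0"),
    ("org.app:c:0.9", "org.lib:y", "5.1"),
    ("org.app:d:2.2", "org.lib:y", "5.1"),
]
print(share_with_multiple_used_versions(latest_dependencies))
```

Because published artifacts never change, clients can keep pointing at old versions such as `1.2` alongside newer ones. This is the mechanism the abstract credits for the sustained diversity.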

Place, publisher, year, edition, pages
Montréal, QC, Canada: IEEE conference proceedings, 2019
Keywords
Maven Central, Software Diversity, Library Versions, Evolution, Open-Source Software
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-254553 (URN); 10.1109/MSR.2019.00059 (DOI); 2-s2.0-85072344031 (Scopus ID)
Conference
MSR
Note

QC 20190802

Available from: 2019-07-01; Created: 2019-07-01; Last updated: 2023-05-10; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-2491-2771
