Software Diversity for Third-Party Dependencies
Harrand, Nicolas
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-2491-2771
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Thanks to the emergence of package managers and online software repositories, modern software development relies heavily on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. Yet the reuse of software libraries leads large groups of applications to share a significant amount of code, including potential defects such as bugs or vulnerabilities. The lack of diversity in these groups of applications makes them more prone to large-scale failures, and more predictable for attackers attempting to exploit their shared vulnerabilities. To mitigate the risks introduced by library reuse, this dissertation proposes to introduce diversity in software applications. We create variants of software applications through transformations targeting the libraries they depend on. These variants provide functionalities equivalent to those of the original, while not sharing exactly the same behavior.

In this dissertation, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries implementing similar functionalities. We perform two case studies on two families of reusable software artifacts: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of artifacts exhibit significant natural input/output behavioral diversity.

Second, we study software transformations targeting libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture to implement library substitution. We implement this architecture in a JSON library substitution framework, leveraging the diversity of behavior we observe in JSON libraries. We assess the impact of these transformations on open-source libraries and software applications through two experiments.

Finally, we study the properties of software applications and libraries that make them amenable to transformation without changing their functionalities. We analyze the variants produced during our software diversification experiments and discuss our findings. In particular, we observe that the existence of alternative implementations at different granularities (instructions, methods, classes, and libraries) provides an important source of potential diversity that can be leveraged.

Abstract [sv] (translated into English)

Thanks to the emergence of package managers and online software repositories, modern software development relies heavily on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. However, the reuse of software libraries across a large number of programs means that these programs share a significant amount of code, including potential defects such as bugs and vulnerabilities. When software defects are widely shared, a risk of large-scale failures arises. In addition, the risk increases that the same vulnerability can be exploited against several programs that use the same third-party library. To reduce the risks of library reuse, this thesis proposes creating variants of software through transformations targeting the libraries that the software depends on.

In this thesis, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries that implement equivalent functionality. We carry out two case studies of two families of reusable software: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of software exhibit significant behavioral diversity with respect to input/output.

The second aspect we study is software transformations targeting the libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture for substituting entire libraries. We apply this architecture in a framework for replacing JSON libraries, leveraging the diversity of behavior that we observe in them. We assess the effects of these transformations on open-source libraries and programs through two experiments.

Finally, we study the properties of software and libraries that make them amenable to transformation without changing their functionality. We analyze the variants produced during our software diversification experiments and discuss our results. In particular, we find that the existence of alternative implementations at different scales (instructions, methods, classes, and libraries) is an important source of potential diversity that can be exploited.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022, p. 100
Series
TRITA-EECS-AVL ; 2022:22
Keywords [en]
Automated Software Engineering, Software Diversity, Software libraries, Software Monoculture
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-310824
ISBN: 978-91-8040-184-5 (print)
OAI: oai:DiVA.org:kth-310824
DiVA, id: diva2:1650630
Public defence
2022-05-05, D2, Lindstedtsvägen 9, Stockholm, 13:30 (English)
Opponent
Supervisors
Funder
Knut and Alice Wallenberg Foundation
Note

QCR 20220413

Available from: 2022-04-13 Created: 2022-04-07 Last updated: 2022-06-25. Bibliographically approved
List of papers
1. A journey among Java neutral program variants
2019 (English). In: Genetic Programming and Evolvable Machines, ISSN 1389-2576, E-ISSN 1573-7632, Vol. 20, no 4, p. 531-580. Article in journal (Refereed), Published
Abstract [en]

Neutral program variants are alternative implementations of a program that are equivalent with respect to the test suite. Techniques such as approximate computing or genetic improvement share the intuition that potential for enhancements lies in these acceptable behavioral differences (e.g., enhanced performance or reliability). Yet the automatic synthesis of neutral program variants through program transformations remains a key challenge. This work aims at characterizing plastic code regions in Java programs, i.e., the code regions that are modifiable while maintaining functional correctness, according to a test suite. Our empirical study relies on automatic variations of 6 real-world Java programs. First, we transform these programs with three state-of-the-art program transformations: add, replace and delete statements. We get a pool of 23,445 neutral variants, from which we gather the following novel insights: developers naturally write code that supports fine-grained behavioral changes; statement deletion is a surprisingly effective program transformation; high-level design decisions, such as the choice of a data structure, are natural points that can evolve while keeping functionality. Second, we design 3 novel program transformations, targeted at specific plastic regions. New experiments reveal that respectively 60%, 58% and 73% of the synthesized variants (175,688 in total) are neutral and exhibit execution traces that are different from the original.
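The notion of neutrality with respect to a test suite can be illustrated with a minimal sketch. The program, the two hand-written variants and the tiny test suite below are hypothetical and are not taken from the study; they only show how a "delete statement" variant may or may not remain neutral.

```java
import java.util.function.ToIntFunction;

public class NeutralVariantSketch {

    // Hypothetical original program: sums the strictly positive entries.
    public static int original(int[] xs) {
        int sum = 0;
        for (int x : xs) {
            if (x == 0) continue; // skipping zeros never changes the sum
            if (x > 0) sum += x;
        }
        return sum;
    }

    // Variant 1: the `continue` statement is deleted.
    public static int deleteContinue(int[] xs) {
        int sum = 0;
        for (int x : xs) {
            if (x > 0) sum += x;
        }
        return sum;
    }

    // Variant 2: the guard `x > 0` is deleted.
    public static int deleteGuard(int[] xs) {
        int sum = 0;
        for (int x : xs) {
            if (x == 0) continue;
            sum += x;
        }
        return sum;
    }

    // Hypothetical test suite: inputs paired with expected results.
    static final int[][] INPUTS = {{}, {1, 2, 3}, {-1, 0, 4}, {0, 0}};
    static final int[] EXPECTED = {0, 6, 4, 0};

    // A variant is neutral if it passes every test of the suite.
    public static boolean isNeutral(ToIntFunction<int[]> variant) {
        for (int i = 0; i < INPUTS.length; i++) {
            if (variant.applyAsInt(INPUTS[i]) != EXPECTED[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("delete continue: neutral = "
                + isNeutral(NeutralVariantSketch::deleteContinue));
        System.out.println("delete guard:    neutral = "
                + isNeutral(NeutralVariantSketch::deleteGuard));
    }
}
```

In this sketch, deleting the `continue` yields a variant that passes the suite while following a different execution path on inputs containing zeros, whereas deleting the guard changes the result for negative entries and is rejected by the suite.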

Place, publisher, year, edition, pages
Springer, 2019
Keywords
Neutral program variant, Program transformation, Java, Code plasticity
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-264174 (URN); 10.1007/s10710-019-09355-3 (DOI); 000492843200004 (); 2-s2.0-85068185414 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20191122

Available from: 2019-11-22 Created: 2019-11-22 Last updated: 2022-06-26. Bibliographically approved
2. Java decompiler diversity and its application to meta-decompilation
2020 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 168, article id 110645. Article in journal (Refereed), Published
Abstract [en]

During compilation from Java source code to bytecode, some information is irreversibly lost. In other words, compilation and decompilation of Java code are not symmetric. Consequently, decompilation, which aims at producing source code from bytecode, relies on strategies to reconstruct the information that has been lost. Different Java decompilers use distinct strategies to achieve proper decompilation. In this work, we hypothesize that the diverse ways in which bytecode can be decompiled have a direct impact on the quality of the source code produced by decompilers. We assess the strategies of eight Java decompilers with respect to three quality indicators: syntactic correctness, syntactic distortion and semantic equivalence modulo inputs. Our results show that no single modern decompiler is able to correctly handle the variety of bytecode structures coming from real-world programs. The highest ranking decompiler in this study produces syntactically correct output for 84% of the classes in our dataset, and semantically equivalent output for 78% of them. Our results demonstrate that each decompiler correctly handles a different set of bytecode classes. We propose a new decompiler called Arlecchino that leverages the diversity of existing decompilers. To do so, we merge partial decompilations into a new one, guided by compilation errors. Arlecchino handles 37.6% of bytecode classes that were previously handled by no decompiler. We publish the sources of this new bytecode decompiler. © 2020 Published by Elsevier Inc.
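The idea of merging partial decompilations can be sketched as follows. This is a simplified illustration and not Arlecchino's actual algorithm: each decompiler output is modeled as a hypothetical map from member signature to source text, and the `compiles` predicate is a stand-in for a real compilation check.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MetaDecompilationSketch {

    // For each class member, keep the first candidate source that compiles.
    public static Map<String, String> merge(List<Map<String, String>> outputs,
                                            Predicate<String> compiles) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> output : outputs) {
            for (Map.Entry<String, String> member : output.entrySet()) {
                boolean missing = !merged.containsKey(member.getKey());
                if (missing && compiles.test(member.getValue())) {
                    merged.put(member.getKey(), member.getValue());
                }
            }
        }
        return merged;
    }

    // Two hypothetical decompilers, each failing on a different member.
    public static Map<String, String> demo() {
        Map<String, String> decompilerA = new LinkedHashMap<>();
        decompilerA.put("foo()", "int foo() { return 1; }");
        decompilerA.put("bar()", "<error>");

        Map<String, String> decompilerB = new LinkedHashMap<>();
        decompilerB.put("foo()", "<error>");
        decompilerB.put("bar()", "int bar() { return 2; }");

        // Stand-in for a compiler: a member "compiles" unless marked as failed.
        Predicate<String> compiles = source -> !source.contains("<error>");
        return merge(Arrays.asList(decompilerA, decompilerB), compiles);
    }

    public static void main(String[] args) {
        demo().forEach((member, source) ->
                System.out.println(member + " -> " + source));
    }
}
```

Neither hypothetical decompiler handles the whole class on its own, but the merged result contains a compilable version of both members, which is the intuition behind exploiting decompiler diversity.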

Place, publisher, year, edition, pages
Elsevier BV, 2020
Keywords
Java bytecode, Decompilation, Reverse engineering, Source code analysis
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-279880 (URN); 10.1016/j.jss.2020.110645 (DOI); 000557871300009 (); 2-s2.0-85085736599 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, trustfull
Note

QC 20210602

Available from: 2020-09-15 Created: 2020-09-15 Last updated: 2022-06-25. Bibliographically approved
3. The Behavioral Diversity of Java JSON Libraries
2021 (English). In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) / [ed] IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 412-422. Conference paper, Published paper (Refereed)
Abstract [en]

JSON is an essential file and data format in domains that span scientific computing, web APIs and configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparison among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input/output behavior of 20 JSON libraries in a single software ecosystem: Java/Maven. We assess behavioral diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences, which influence the behavior of the libraries, relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity with ill-formed files, or corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for a robust processing of ill-formed files, through a multi-version architecture.
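To make the effect of number encoding concrete, here is a minimal sketch that compares three decoding strategies on the same numeric literals. The three strategies are assumptions chosen for illustration, built only on the Java standard library; they do not reproduce any of the 20 libraries in the study.

```java
import java.math.BigDecimal;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

public class NumberEncodingDiversity {

    public enum Outcome { EXACT, LOSSY, REJECTED }

    // Hypothetical strategies a JSON library could use to decode a number.
    public static final Map<String, Function<String, BigDecimal>> STRATEGIES =
            new LinkedHashMap<>();
    static {
        STRATEGIES.put("long", s -> BigDecimal.valueOf(Long.parseLong(s)));
        STRATEGIES.put("double", s -> {
            double d = Double.parseDouble(s);
            // Infinity and NaN have no decimal value: signal that with null.
            return (Double.isInfinite(d) || Double.isNaN(d)) ? null : new BigDecimal(d);
        });
        STRATEGIES.put("bigdecimal", BigDecimal::new);
    }

    // Compare the decoded value with the exact decimal value of the literal.
    public static Outcome classify(Function<String, BigDecimal> strategy, String literal) {
        try {
            BigDecimal expected = new BigDecimal(literal);
            BigDecimal decoded = strategy.apply(literal);
            boolean same = decoded != null && decoded.compareTo(expected) == 0;
            return same ? Outcome.EXACT : Outcome.LOSSY;
        } catch (NumberFormatException e) {
            return Outcome.REJECTED;
        }
    }

    public static void main(String[] args) {
        String[] literals = {"42", "12345678901234567890", "1e400"};
        for (String literal : literals) {
            for (Map.Entry<String, Function<String, BigDecimal>> s : STRATEGIES.entrySet()) {
                System.out.println(literal + " via " + s.getKey() + ": "
                        + classify(s.getValue(), literal));
            }
        }
    }
}
```

All three strategies agree on a small integer, but they diverge on a 20-digit integer and on a very large exponent: one rejects the input, one silently loses precision, and one preserves the value. This is the kind of corner-case divergence that a differential comparison of libraries can surface.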

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
JSON, Java, Behavioral Diversity
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310817 (URN); 10.1109/ISSRE52982.2021.00050 (DOI); 000783962100037 (); 2-s2.0-85126395366 (Scopus ID)
Conference
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021
Funder
Knut and Alice Wallenberg Foundation
Note

Part of ISBN 9781665425872

QC 20251002

Available from: 2022-04-07 Created: 2022-04-07 Last updated: 2025-10-02. Bibliographically approved
4. API beauty is in the eye of the clients: 2.2 million Maven dependencies reveal the spectrum of client-API usages
2022 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 184, article id 111134. Article in journal (Refereed), Published
Abstract [en]

Hyrum's law states a common observation in the software industry: "With a sufficient number of users of an API, it does not matter what you promise in the contract: all observable behaviors of your system will be depended on by somebody". Meanwhile, recent research results seem to contradict this observation when they state that "for most APIs, there is a small number of features that are actually used". In this work, we perform a large-scale empirical study of client-API relationships in the Maven ecosystem, in order to investigate this seeming paradox between the observations in industry and the research literature. We study the 94 most popular libraries in Maven Central, as well as the 829,410 client artifacts that declare a dependency on these libraries and that are available in Maven Central, summing up to 2.2M dependencies. Our analysis indicates the existence of a wide spectrum of API usages: with enough clients, most API types end up being used at least once. Our second key observation is that, for all libraries, there is a small set of API types that are used by the vast majority of their clients. The practical consequences of this study are two-fold: (i) it is possible for API maintainers to find an essential part of their API on which they can focus their efforts; (ii) API developers should limit the public API elements to the set of features for which they are ready to have users. © 2021 The Author(s). Published by Elsevier Inc.

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Mining software repositories, Bytecode analysis, Software reuse, Java, Maven Central Repository
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-305550 (URN); 10.1016/j.jss.2021.111134 (DOI); 000722219800008 (); 2-s2.0-85119384404 (Scopus ID)
Note

QC 20211206

Available from: 2021-12-06 Created: 2021-12-06 Last updated: 2022-12-06. Bibliographically approved
5. Automatic Diversity in the Software Supply Chain
2021 (English). Report (Other academic)
Abstract [en]

Despite its obvious benefits, the increased adoption of package managers to automate the reuse of libraries has opened the door to a new class of hazards: supply chain attacks. By injecting malicious code in one library, an attacker may compromise all instances of all applications that depend on the library. To mitigate the impact of supply chain attacks, we propose the concept of Library Substitution Framework. This novel concept leverages one key observation: when an application depends on a library, it is very likely that there exist other libraries that provide similar features. The key objective of a Library Substitution Framework is to enable the developers of an application to harness this diversity of libraries in their supply chain. The framework lets them generate a population of application variants, each depending on a different alternative library that provides similar functionalities. To investigate the relevance of this concept, we develop ARGO, a proof-of-concept implementation of this framework that harnesses the diversity of JSON suppliers. We study the feasibility of library substitution and its impact on a set of 368 clients. Our empirical results show that for 195 of the 368 Java applications tested, we can substitute the original JSON library used by the client with at least 15 other JSON libraries without modifying the client's code. These results show the capacity of a Library Substitution Framework to diversify the supply chain of the client applications of the libraries it targets.
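A minimal sketch of the substitution idea follows. It is not ARGO's implementation: the `JsonWriter` facade and the two providers are hypothetical stand-ins for an original library API and its alternative suppliers, and key escaping is deliberately omitted to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class LibrarySubstitutionSketch {

    // Facade the client compiles against (stands in for the original API).
    public interface JsonWriter {
        String write(Map<String, Integer> object);
    }

    // Hypothetical supplier 1: imperative serialization.
    public static class BuilderWriter implements JsonWriter {
        @Override
        public String write(Map<String, Integer> object) {
            StringBuilder sb = new StringBuilder("{");
            for (Map.Entry<String, Integer> e : object.entrySet()) {
                if (sb.length() > 1) sb.append(',');
                sb.append('"').append(e.getKey()).append("\":").append(e.getValue());
            }
            return sb.append('}').toString();
        }
    }

    // Hypothetical supplier 2: stream-based serialization.
    public static class StreamWriter implements JsonWriter {
        @Override
        public String write(Map<String, Integer> object) {
            return object.entrySet().stream()
                    .map(e -> "\"" + e.getKey() + "\":" + e.getValue())
                    .collect(Collectors.joining(",", "{", "}"));
        }
    }

    // Client code: identical in every application variant.
    public static String client(JsonWriter writer) {
        Map<String, Integer> payload = new LinkedHashMap<>();
        payload.put("a", 1);
        payload.put("b", 2);
        return writer.write(payload);
    }

    public static void main(String[] args) {
        // Each variant of the application is the same client with another supplier.
        System.out.println(client(new BuilderWriter()));
        System.out.println(client(new StreamWriter()));
    }
}
```

Because the client only depends on the facade, swapping the supplier produces a new application variant without touching the client's code, while the executed library code differs between variants.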

Publisher
p. 18
Keywords
Software supply chain, Library Substitution, Software repository, Software reuse, Java, Maven Central Repository
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-310821 (URN)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20220419

Available from: 2022-04-07 Created: 2022-04-07 Last updated: 2022-06-25. Bibliographically approved

Open Access in DiVA

fulltext (1620 kB), 666 downloads
File information
File name: FULLTEXT01.pdf; File size: 1620 kB; Checksum: SHA-512
985dba90959c39a98e5ebb6139ad91b8f871cf43106dd262a220f90728ff35d6770767d1274873bc2cfb160144aeefe0db3b1af08e20e874b742c188e2351210
Type: fulltext; Mimetype: application/pdf

Authority records

Harrand, Nicolas
