The Behavioral Diversity of Java JSON Libraries
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Digital futures. ORCID iD: 0000-0002-2491-2771
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Digital futures. ORCID iD: 0000-0002-1996-6134
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Digital futures. ORCID iD: 0000-0001-8457-4105
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Digital futures. ORCID iD: 0000-0002-4015-4640
2021 (English)
In: 2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE) / [ed] IEEE, Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 412-422
Conference paper, Published paper (Refereed)
Abstract [en]

JSON is an essential file and data format in domains that span scientific computing, web APIs or configuration management. Its popularity has motivated significant software development effort to build multiple libraries to process JSON data. Previous studies focus on performance comparison among these libraries and lack a software engineering perspective. We present the first systematic analysis and comparison of the input / output behavior of 20 JSON libraries, in a single software ecosystem: Java/Maven. We assess behavior diversity by running each library against a curated set of 473 JSON files, including both well-formed and ill-formed files. The main design differences, which influence the behavior of the libraries, relate to the choice of data structure to represent JSON objects and to the encoding of numbers. We observe a remarkable behavioral diversity with ill-formed files, or corner cases such as large numbers or duplicate data. Our unique behavioral assessment of JSON libraries paves the way for a robust processing of ill-formed files, through a multi-version architecture.
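
The two design choices named in the abstract (how numbers are encoded and which data structure represents a JSON object) can be illustrated with a minimal sketch. It is not taken from the paper or from any of the 20 studied libraries; it uses only the Java standard library, and the class `JsonCornerCases` and its methods are hypothetical names introduced for illustration. It assumes the parser has already extracted a numeric literal as a string and an object's members as key/value pairs.

```java
import java.math.BigDecimal;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: not the API of any real JSON library.
public class JsonCornerCases {

    // Encoding choice 1: store every JSON number as a double.
    // Large literals lose precision or overflow to Infinity.
    static double asDouble(String literal) {
        return Double.parseDouble(literal);
    }

    // Encoding choice 2: store JSON numbers as arbitrary-precision decimals.
    static BigDecimal asBigDecimal(String literal) {
        return new BigDecimal(literal);
    }

    // Object representation 1: a map where a repeated key overwrites the earlier value.
    static Map<String, String> lastWins(List<String[]> members) {
        Map<String, String> object = new LinkedHashMap<>();
        for (String[] member : members) {
            object.put(member[0], member[1]);
        }
        return object;
    }

    // Object representation 2: a map where the first value for a key is kept.
    static Map<String, String> firstWins(List<String[]> members) {
        Map<String, String> object = new LinkedHashMap<>();
        for (String[] member : members) {
            object.putIfAbsent(member[0], member[1]);
        }
        return object;
    }

    public static void main(String[] args) {
        String big = "12345678901234567890123";
        System.out.println("double:     " + asDouble(big));
        System.out.println("BigDecimal: " + asBigDecimal(big));
        System.out.println("1e400 as double: " + asDouble("1e400")); // overflows to Infinity

        // Members of the JSON object {"a": 1, "a": 2}, already tokenized.
        List<String[]> members = Arrays.asList(
                new String[] {"a", "1"},
                new String[] {"a", "2"});
        System.out.println("last wins:  " + lastWins(members));
        System.out.println("first wins: " + firstWins(members));
    }
}
```

Under these assumptions, the same input yields different observable results depending on the chosen encoding or container, which is the kind of input/output divergence the abstract refers to for large numbers and duplicate data.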

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. p. 412-422
Keywords [en]
JSON, Java, Behavioral Diversity
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-310817
DOI: 10.1109/ISSRE52982.2021.00050
ISI: 000783962100037
Scopus ID: 2-s2.0-85126395366
OAI: oai:DiVA.org:kth-310817
DiVA, id: diva2:1650589
Conference
2021 IEEE 32nd International Symposium on Software Reliability Engineering (ISSRE), Wuhan, China, October 25-28, 2021
Funder
Knut and Alice Wallenberg Foundation
Note

Part of ISBN 9781665425872

QC 20251002

Available from: 2022-04-07 Created: 2022-04-07 Last updated: 2025-10-02 Bibliographically approved
In thesis
1. Software Diversity for Third-Party Dependencies
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Thanks to the emergence of package managers and online software repositories, modern software development heavily relies on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. Yet, the reuse of software libraries leads large groups of applications to share a significant amount of code, including potential defects such as bugs or vulnerabilities. The lack of diversity in these groups of applications makes them more prone to large-scale failures, and more predictable for attackers attempting to exploit their shared vulnerabilities. To mitigate the risks introduced by library reuse, this dissertation proposes to introduce diversity in software applications. We create variants of software applications through transformations targeting the libraries they depend on. These variants provide functionalities equivalent to their original, while not sharing the exact same behavior.

In this dissertation, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries implementing similar functionalities. We perform two case studies on two families of reusable software artifacts: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of artifacts exhibit significant natural input/output behavioral diversity.

Second, we study software transformations targeting libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture to implement library substitution. We implement this architecture in a JSON library substitution framework, leveraging the diversity of behavior we observe in JSON libraries. We assess the impact of these transformations on open-source libraries and software applications through two experiments.

Finally, we study the properties of software applications and libraries that make them prone to transformation without changing their functionalities. We analyze the variants produced during our software diversification experiments and discuss our findings. In particular, we observe that the existence of alternative implementations at different granularities (instructions, methods, classes, and libraries) provides an important source of potential diversity that can be leveraged.

Abstract [sv, in English translation]

Thanks to the emergence of package managers and online software repositories, modern software development depends heavily on the reuse of third-party libraries. This practice has significant benefits in terms of productivity and reliability. However, reusing software libraries across a large number of programs means that these programs share a significant amount of code, including potential defects such as bugs and vulnerabilities. When software defects are widely shared, a risk of large-scale failures arises. In addition, the risk increases that the same vulnerability can be used against several programs that rely on the same third-party library. To reduce the risks of library reuse, this dissertation proposes creating variants of software through transformations that target the libraries the software depends on.

In this dissertation, we cover three aspects of software diversity. First, we study the existing behavioral diversity of alternative libraries that implement equivalent functionality. We conduct two case studies of two families of reusable software: JSON libraries and bytecode decompilers. We provide empirical evidence that both groups of software exhibit significant behavioral diversity with respect to input/output.

The second aspect we study is software transformations targeting the libraries themselves. We propose six source-to-source transformations targeting software libraries, as well as a general architecture for carrying out substitutions of entire libraries. We apply this architecture in a framework for substituting JSON libraries and exploit the diversity of behaviors that we observe in them. We assess the effects of these transformations on open-source libraries and programs through two experiments.

Finally, we study the properties of software and libraries that make them suitable for transformation without changing their functionality. We analyze the variants produced during our software diversification experiments and discuss our results. In particular, we note that the existence of alternative implementations at different scales (instructions, methods, classes, and libraries) constitutes an important source of potential diversity that can be exploited.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2022. p. 100
Series
TRITA-EECS-AVL ; 2022:22
Keywords
Automated Software Engineering, Software Diversity, Software libraries, Software Monoculture
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-310824
ISBN: 978-91-8040-184-5
Public defence
2022-05-05, D2, Lindstedtsvägen 9, Stockholm, 13:30 (English)
Opponent
Supervisors
Funder
Knut and Alice Wallenberg Foundation
Note

QCR 20220413

Available from: 2022-04-13 Created: 2022-04-07 Last updated: 2022-06-25 Bibliographically approved

Open Access in DiVA

fulltext (296 kB), 460 downloads
File information
File name: FULLTEXT01.pdf
File size: 296 kB
Checksum (SHA-512): f7254488b0184a031323ceba756bb11a73884d10b2ab65881ca8cb26e39dc56e06ae70776e098dd7bd94195c39ea4ab6373ddcc6b15fa5ea93d4707773a652e5
Type: fulltext
Mimetype: application/pdf


Authority records

Harrand, Nicolas; Durieux, Thomas; Broman, David; Baudry, Benoit

Total: 463 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 263 hits