Coverage-Based Debloating for Java Bytecode
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0003-0541-6411
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-1996-6134
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-2491-2771
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0002-4015-4640
2023 (English). In: ACM Transactions on Software Engineering and Methodology, ISSN 1049-331X, E-ISSN 1557-7392, Vol. 32, no 2, p. 1-34. Article in journal (Refereed), Published
Abstract [en]

Software bloat is code that is packaged in an application but is not necessary to run it. The presence of software bloat is an issue for security, performance, and maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for a single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture which parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We successfully debloat 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries' bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating.

For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients with at least one test that uses the library successfully compile and pass their test suite when the original library is replaced by its debloated version.
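The core idea of the abstract, keeping only what a workload covers, can be illustrated with a minimal sketch. This is not the authors' implementation: the class `CoverageDebloatSketch`, the `Class#member` key format, and the toy library below are all hypothetical, and a real tool would operate on bytecode and on reports produced by coverage tools rather than on in-memory strings.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical toy model: a library is a map from class name to member names,
// and a coverage report is a set of "Class#member" keys seen during the workload.
class CoverageDebloatSketch {

    /** Keeps only the members that the coverage set marks as executed. */
    static Map<String, Set<String>> debloat(Map<String, List<String>> packaged,
                                            Set<String> covered) {
        Map<String, Set<String>> kept = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : packaged.entrySet()) {
            Set<String> members = new LinkedHashSet<>();
            for (String member : entry.getValue()) {
                if (covered.contains(entry.getKey() + "#" + member)) {
                    members.add(member);
                }
            }
            // A class with no covered member is dropped entirely.
            if (!members.isEmpty()) {
                kept.put(entry.getKey(), members);
            }
        }
        return kept;
    }

    /** Counts members across all classes, used to compare before and after. */
    static int countMembers(Map<String, ? extends java.util.Collection<String>> library) {
        int total = 0;
        for (java.util.Collection<String> members : library.values()) {
            total += members.size();
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, List<String>> packaged = new LinkedHashMap<>();
        packaged.put("Parser", List.of("parse", "reset"));
        packaged.put("Legacy", List.of("convert"));

        // Assumed coverage: the workload only ever calls Parser#parse.
        Set<String> covered = Set.of("Parser#parse");

        Map<String, Set<String>> kept = debloat(packaged, covered);
        System.out.println("kept: " + kept);
        System.out.println("members before: " + countMembers(packaged)
                + ", after: " + countMembers(kept));
    }
}
```

The sketch also shows why the result is only valid "according to the workload": a member that a client needs but the workload never exercises is removed, which is the risk the client study in the abstract measures.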

Place, publisher, year, edition, pages
ACM Digital Library, 2023. Vol. 32, no 2, p. 1-34
Keywords [en]
software bloat, code coverage, program specialization, bytecode, software maintenance
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-316426
DOI: 10.1145/3546948
ISI: 000970588900011
Scopus ID: 2-s2.0-85147732395
OAI: oai:DiVA.org:kth-316426
DiVA, id: diva2:1687986
Projects
WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20251222

Available from: 2022-08-17. Created: 2022-08-17. Last updated: 2025-12-22. Bibliographically approved
In thesis
1. Debloating Java Dependencies
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Software systems have a natural tendency to grow in size and complexity. Part of this growth comes from the addition of new features or bug fixes, while another part is due to useless code that accumulates over time. This phenomenon, known as "software bloat," increases with the practice of reusing software dependencies, which has exceeded the capacity of human developers to manage them efficiently. Software bloat in third-party dependencies presents a multifaceted challenge for application development, encompassing issues of security, performance, and maintenance. To address these issues, researchers have developed software debloating techniques that automatically remove unnecessary code. Although significant progress has been made in software debloating, the pervasive issue of dependency bloat warrants special attention. In this thesis, we contribute to the field of software debloating by proposing novel techniques specifically targeting dependencies in the Java ecosystem.

First, we investigate the growth of completely unused software dependencies, which we call "bloated dependencies." We propose a technique to automatically detect and remove bloated dependencies in Java projects built with Maven. We empirically study the usage status of dependencies in the Maven Central repository and remove bloated dependencies in mature Java projects. We demonstrate that once a bloated dependency is detected, it can be safely removed as its future usage is unlikely.
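As a rough illustration of what detecting a bloated dependency involves, the sketch below treats it as a set difference between the dependencies a build declares and the dependencies the project actually references. This is an assumption-laden toy, not the technique from the thesis: the class `BloatedDependencySketch` and the `com.example:*` coordinates are hypothetical, and a real analysis would derive the referenced set from the project's bytecode and the Maven dependency tree.

```java
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy model: dependencies are identified by "groupId:artifactId" strings.
class BloatedDependencySketch {

    /** Returns the declared dependencies that the project never references. */
    static Set<String> findBloated(Set<String> declared, Set<String> referenced) {
        Set<String> bloated = new TreeSet<>(declared); // sorted copy, inputs stay untouched
        bloated.removeAll(referenced);
        return bloated;
    }

    public static void main(String[] args) {
        // Assumed inputs: what the build file declares and what the code references.
        Set<String> declared = Set.of(
                "com.example:json", "com.example:logging", "com.example:xml");
        Set<String> referenced = Set.of("com.example:json");

        System.out.println("bloated: " + findBloated(declared, referenced));
    }
}
```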

Second, we focus on dependencies that are only partially used. We introduce a technique to specialize these dependencies in Java projects based on their actual usage. Our approach systematically identifies the subset of functionalities within each dependency that is sufficient to build the project and removes the rest. We demonstrate that our dependency specialization approach can halve the project classes to dependency classes ratio.

Last, we assess the impact of debloating projects with respect to client applications that reuse them. We present a novel coverage-based debloating technique that determines which class members in Java libraries and their dependencies are necessary for their clients. Our debloating technique effectively decreases the size of debloated libraries while preserving the essential functionalities required to successfully build their clients. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. x, 103
Series
TRITA-EECS-AVL ; 2023:36
Keywords
Software debloating, software dependencies, Java bytecode, package manager, static program analysis, dynamic program analysis
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-326755 (URN)
978-91-8040-557-7 (ISBN)
Public defence
2023-06-01, D2, Lindstedtsvägen 9, KTH, Stockholm, 13:15 (English)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20230510

Available from: 2023-05-10. Created: 2023-05-10. Last updated: 2023-05-25. Bibliographically approved

Open Access in DiVA

fulltext (1169 kB), 369 downloads
File information
File name: FULLTEXT01.pdf
File size: 1169 kB
Checksum (SHA-512): d4121273dd95fe35164937207b2be809141d610d92deb73fb83f46e55e6d4202af7a3204c3992f292cc1bd8d91bd5dacbdb0fcee3b063e3938394a02dbdee66e
Type: fulltext
Mimetype: application/pdf

Authority records

Soto Valero, César; Durieux, Thomas; Harrand, Nicolas; Baudry, Benoit
