Debloating Java Dependencies
Soto Valero, César
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. (ASSERT)
ORCID iD: 0000-0003-0541-6411
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Software systems have a natural tendency to grow in size and complexity. A part of this growth comes with the addition of new features or bug fixes, while another part is due to useless code that accumulates over time. This phenomenon, known as "software bloat," increases with the practice of reusing software dependencies, which has exceeded the capacity of human developers to efficiently manage them. Software bloat in third-party dependencies presents a multifaceted challenge for application development, encompassing issues of security, performance, and maintenance. To address these issues, researchers have developed software debloating techniques that automatically remove unnecessary code. Although significant progress has been made in the realm of software debloating, the pervasive issue of dependency bloat warrants special attention. In this thesis, we contribute to the field of software debloating by proposing novel techniques specifically targeting dependencies in the Java ecosystem.

First, we investigate the growth of completely unused software dependencies, which we call "bloated dependencies." We propose a technique to automatically detect and remove bloated dependencies in Java projects built with Maven. We empirically study the usage status of dependencies in the Maven Central repository and remove bloated dependencies in mature Java projects. We demonstrate that once a bloated dependency is detected, it can be safely removed as its future usage is unlikely.

Second, we focus on dependencies that are only partially used. We introduce a technique to specialize these dependencies in Java projects based on their actual usage. Our approach systematically identifies the subset of functionalities within each dependency that is sufficient to build the project and removes the rest. We demonstrate that our dependency specialization approach can halve the ratio of dependency classes to project classes.

Last, we assess the impact of debloating projects with respect to client applications that reuse them. We present a novel coverage-based debloating technique that determines which class members in Java libraries and their dependencies are necessary for their clients. Our debloating technique effectively decreases the size of debloated libraries while preserving the essential functionalities required to successfully build their clients. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023, p. x, 103
Series
TRITA-EECS-AVL ; 2023:36
Keywords [en]
Software debloating, software dependencies, Java bytecode, package manager, static program analysis, dynamic program analysis
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-326755
ISBN: 978-91-8040-557-7 (print)
OAI: oai:DiVA.org:kth-326755
DiVA, id: diva2:1755964
Public defence
2023-06-01, D2, Lindstedtsvägen 9, KTH, Stockholm, 13:15 (English)
Opponent
Supervisors
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20230510

Available from: 2023-05-10 Created: 2023-05-10 Last updated: 2023-05-25 Bibliographically approved
List of papers
1. The Emergence of Software Diversity in Maven Central
2019 (English) In: 16th International Conference on Mining Software Repositories, Montréal, QC, Canada: IEEE conference proceedings, 2019, p. 333-343
Conference paper, Published paper (Refereed)
Abstract [en]

Maven artifacts are immutable: an artifact that is uploaded to Maven Central can be neither removed nor modified. The only way for developers to upgrade their library is to release a new version. Consequently, Maven Central accumulates all the versions of all the libraries that are published there, and applications that declare a dependency towards a library can pick any version. In this work, we hypothesize that the immutability of Maven artifacts and the ability to choose any version naturally support the emergence of software diversity within Maven Central. We analyze 1,487,956 artifacts that represent all the versions of 73,653 libraries. We observe that more than 30% of libraries have multiple versions that are actively used by latest artifacts. In the case of popular libraries, more than 50% of their versions are used. We also observe that more than 17% of libraries have several versions that are significantly more used than the other versions. Our results indicate that the immutability of artifacts in Maven Central does support a sustained level of diversity among versions of libraries in the repository.

Place, publisher, year, edition, pages
Montréal, QC, Canada: IEEE conference proceedings, 2019
Keywords
Maven Central, Software Diversity, Library Versions, Evolution, Open-Source Software
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-254553 (URN)
10.1109/MSR.2019.00059 (DOI)
2-s2.0-85072344031 (Scopus ID)
Conference
MSR
Note

QC 20190802

Available from: 2019-07-01 Created: 2019-07-01 Last updated: 2023-05-10 Bibliographically approved
2. A comprehensive study of bloated dependencies in the Maven ecosystem
2021 (English) In: Empirical Software Engineering, ISSN 1382-3256, E-ISSN 1573-7616, Vol. 26, no 3, article id 45
Article in journal (Refereed) Published
Abstract [en]

Build automation tools and package managers have a profound influence on software development. They facilitate the reuse of third-party libraries, support a clear separation between the application's code and its external dependencies, and automate several software development tasks. However, the wide adoption of these tools introduces new challenges related to dependency management. In this paper, we propose an original study of one such challenge: the emergence of bloated dependencies. Bloated dependencies are libraries that are packaged with the application's compiled code but that are actually not necessary to build and run the application. They artificially grow the size of the built binary and increase maintenance effort. We propose DepClean, a tool to determine the presence of bloated dependencies in Maven artifacts. We analyze 9,639 Java artifacts hosted on Maven Central, which include a total of 723,444 dependency relationships. Our key result is as follows: 2.7% of the dependencies directly declared are bloated, 15.4% of the inherited dependencies are bloated, and 57% of the transitive dependencies of the studied artifacts are bloated. In other words, it is feasible to reduce the number of dependencies of Maven artifacts to 1/4 of its current count. Our qualitative assessment with 30 notable open-source projects indicates that developers pay attention to their dependencies when they are notified of the problem. They are willing to remove bloated dependencies: 21/26 answered pull requests were accepted and merged by developers, removing 140 dependencies in total: 75 direct and 65 transitive.
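The definition of a bloated dependency given above can be illustrated with a minimal sketch. This is not DepClean's implementation, which analyzes Maven dependency trees and Java bytecode; the `Dependency` type, the `find_bloated` helper, and the `org.example` coordinates are hypothetical, and the set of referenced classes is assumed to be already known, including classes reached indirectly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    coordinates: str    # hypothetical "groupId:artifactId" label
    kind: str           # "direct", "inherited", or "transitive"
    classes: frozenset  # names of the classes packaged in the dependency


def find_bloated(dependencies, referenced_classes):
    """Return the dependencies that have no class referenced by the project."""
    referenced = set(referenced_classes)
    return [d for d in dependencies if referenced.isdisjoint(d.classes)]


# Hypothetical project: three dependencies, only one of which is actually used.
deps = [
    Dependency("org.example:json", "direct", frozenset({"org.example.json.Parser"})),
    Dependency("org.example:log", "direct", frozenset({"org.example.log.Logger"})),
    Dependency("org.example:io", "transitive", frozenset({"org.example.io.Streams"})),
]
used = {"org.example.json.Parser", "java.util.List"}

bloated = find_bloated(deps, used)
print([(d.coordinates, d.kind) for d in bloated])
```

Keeping the dependency kind next to each result mirrors the way the study reports bloat separately for direct, inherited, and transitive dependencies.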

Place, publisher, year, edition, pages
Springer Nature, 2021
Keywords
Dependency management, Software reuse, Debloating, Program analysis
National Category
Software Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-293389 (URN)
10.1007/s10664-020-09914-8 (DOI)
000634831400002 ()
2-s2.0-85103393782 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Swedish Foundation for Strategic Research, trustfull
Note

QC 20210519

Available from: 2021-04-26 Created: 2021-04-26 Last updated: 2024-03-18 Bibliographically approved
3. A Longitudinal Analysis of Bloated Java Dependencies
2021 (English) In: Proceedings of the 29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE '21) / [ed] Spinellis, D; Gousios, G; Chechik, M; DiPenta, M, Association for Computing Machinery (ACM), 2021, p. 1021-1031
Conference paper, Published paper (Refereed)
Abstract [en]

We study the evolution and impact of bloated dependencies in a single software ecosystem: Java/Maven. Bloated dependencies are third-party libraries that are packaged in the application binary but are not needed to run the application. We analyze the history of 435 Java projects. This historical data includes 48,469 distinct dependencies, which we study across a total of 31,515 versions of Maven dependency trees. Bloated dependencies steadily increase over time, and 89.2 % of the direct dependencies that are bloated remain bloated in all subsequent versions of the studied projects. This empirical evidence suggests that developers can safely remove a bloated dependency. We further report novel insights regarding the unnecessary maintenance efforts induced by bloat. We find that 22 % of dependency updates performed by developers are made on bloated dependencies, and that Dependabot suggests a similar ratio of updates on bloated dependencies.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
software bloat, dependencies, java
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-309540 (URN)
10.1145/3468264.3468589 (DOI)
000744425500088 ()
2-s2.0-85116273059 (Scopus ID)
Conference
29th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE), AUG 23-28, 2021, ELECTR NETWORK
Note

Part of proceedings ISBN: 978-1-4503-8562-6

QC 20220316

Available from: 2022-03-16 Created: 2022-03-16 Last updated: 2023-05-10 Bibliographically approved
4. Coverage-Based Debloating for Java Bytecode
2022 (English) In: ACM Transactions on Software Engineering and Methodology, ISSN 1049-331X, E-ISSN 1557-7392
Article in journal (Refereed) Published
Abstract [en]

Software bloat is code that is packaged in an application but is actually not necessary to run the application. The presence of software bloat is an issue for security, for performance, and for maintenance. In this paper, we introduce a novel technique for debloating, which we call coverage-based debloating. We implement the technique for one single language: Java bytecode. We leverage a combination of state-of-the-art Java bytecode coverage tools to precisely capture what parts of a project and its dependencies are used when running with a specific workload. Then, we automatically remove the parts that are not covered, in order to generate a debloated version of the project. We successfully debloat 211 library versions from a dataset of 94 unique open-source Java libraries. The debloated versions are syntactically correct and preserve their original behavior according to the workload. Our results indicate that 68.3% of the libraries’ bytecode and 20.3% of their total dependencies can be removed through coverage-based debloating.

For the first time in the literature on software debloating, we assess the utility of debloated libraries with respect to client applications that reuse them. We select 988 client projects that either have a direct reference to the debloated library in their source code or whose test suite covers at least one class of the libraries that we debloat. Our results show that 81.5% of the clients, with at least one test that uses the library, successfully compile and pass their test suite when the original library is replaced by its debloated version.
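The core idea of coverage-based debloating, keeping only what a workload covers, can be sketched as follows. This is a simplified illustration rather than the authors' tool: the library is modeled as a plain mapping from class names to method names, the coverage report as a set of `(class, method)` pairs, and the names `debloat`, `removed_share`, `Parser`, `Printer`, and `Legacy` are all hypothetical. A real implementation would operate on bytecode and on reports produced by coverage tools.

```python
def debloat(library, covered):
    """Keep only the covered methods and drop classes left with none."""
    debloated = {}
    for cls, methods in library.items():
        kept = {m for m in methods if (cls, m) in covered}
        if kept:
            debloated[cls] = kept
    return debloated


def removed_share(library, debloated):
    """Fraction of methods removed by debloating."""
    total = sum(len(methods) for methods in library.values())
    kept = sum(len(methods) for methods in debloated.values())
    return (total - kept) / total


# Hypothetical library with three classes and six methods.
library = {
    "Parser": {"parse", "reset", "dump"},
    "Printer": {"print"},
    "Legacy": {"convert", "upgrade"},
}
# Hypothetical coverage collected while running a workload such as a test suite.
covered = {("Parser", "parse"), ("Parser", "reset"), ("Printer", "print")}

slim = debloat(library, covered)
print(sorted(slim))                   # classes that survive debloating
print(removed_share(library, slim))   # share of methods removed
```

Because the result depends entirely on the workload, anything the workload never exercises is removed, which is why the study checks the debloated libraries against the builds and test suites of their clients.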

Place, publisher, year, edition, pages
ACM Digital Library, 2022
Keywords
software bloat, code coverage, program specialization, bytecode, software maintenance
National Category
Computer Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-316426 (URN)
10.1145/3546948 (DOI)
000970588900011 ()
2-s2.0-85147732395 (Scopus ID)
Projects
WASP
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230529

Available from: 2022-08-17 Created: 2022-08-17 Last updated: 2023-05-29 Bibliographically approved
5. The Multibillion Dollar Software Supply Chain of Ethereum
2022 (English) In: Computer, ISSN 0018-9162, E-ISSN 1558-0814, Vol. 55, no 10, p. 26-34
Article in journal (Refereed) Published
Abstract [en]

Ethereum is the single largest programmable blockchain platform today. Ethereum nodes operate the blockchain, relying on a vast supply chain of third-party software dependencies. In this article, we perform an analysis of the software supply chain of Java Ethereum nodes and distill the challenges of maintaining and securing this blockchain technology.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-320681 (URN)
10.1109/MC.2022.3175542 (DOI)
000861426400004 ()
2-s2.0-85139498604 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, chains
Note

QC 20221031

Available from: 2022-10-31 Created: 2022-10-31 Last updated: 2023-08-04 Bibliographically approved
6. Automatic Specialization of Third-Party Java Dependencies
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Modern software systems rely on a multitude of third-party dependencies. This large-scale code reuse reduces development costs and time, but it poses new challenges with respect to maintenance and security. Techniques such as tree shaking or shading can remove dependencies that are completely unused by a project, which partly addresses these challenges. Yet, the remaining dependencies are likely to be used only partially, leaving room for further reduction of third-party code. In this paper, we propose a novel technique to specialize dependencies of Java projects, based on their actual usage. For each dependency, we systematically identify the subset of its functionalities that is necessary to build the project, and remove the rest. Each specialized dependency is repackaged. Then, we generate specialized dependency trees where the original dependencies are replaced by the specialized versions and we rebuild the project. We implement our technique in a tool called DepTrim, which we evaluate with 30 notable open-source Java projects. DepTrim specializes a total of 343 (86.6%) dependencies across these projects, and successfully rebuilds each project with a specialized dependency tree. Moreover, through this specialization, DepTrim removes a total of 60,962 (47.0%) classes from the dependencies, reducing the ratio of dependency classes to project classes from 8.7× in the original projects to 4.4× after specialization. These results indicate the relevance of dependency specialization to significantly reduce the share of third-party code in Java projects.
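The specialization step and the class ratio reported above can be illustrated with a small sketch. It is a toy model and not DepTrim itself: dependencies are represented as sets of class names, the set of used classes is assumed to be given (a real tool would derive it by analyzing the project), and `specialize`, `class_ratio`, and the `lib-a`, `lib-b`, `lib-c` names are hypothetical.

```python
def specialize(dependencies, used_classes):
    """Map each dependency to the subset of its classes that the project uses.

    Dependencies with no used class are dropped entirely.
    """
    specialized = {}
    for name, classes in dependencies.items():
        kept = classes & used_classes
        if kept:
            specialized[name] = kept
    return specialized


def class_ratio(dependencies, project_class_count):
    """Ratio of dependency classes to project classes."""
    dependency_classes = sum(len(classes) for classes in dependencies.values())
    return dependency_classes / project_class_count


# Hypothetical project with 2 classes of its own and 8 dependency classes.
dependencies = {
    "lib-a": {"a.A1", "a.A2", "a.A3", "a.A4"},
    "lib-b": {"b.B1", "b.B2"},
    "lib-c": {"c.C1", "c.C2"},
}
used_classes = {"a.A1", "a.A2", "b.B1"}
project_class_count = 2

before = class_ratio(dependencies, project_class_count)
trimmed = specialize(dependencies, used_classes)
after = class_ratio(trimmed, project_class_count)
print(before, after)
```

In this toy project the ratio drops from 4.0 to 1.5. The figures are illustrative only and are unrelated to the 8.7× and 4.4× values measured in the paper.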

Keywords
Software specialization, software debloating, maven, software supply chain, software ecosystem
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-326769 (URN)
10.48550/arXiv.2302.08370 (DOI)
Funder
Knut and Alice Wallenberg Foundation, JKB 64274
Note

QC 20230511

Available from: 2023-05-10 Created: 2023-05-10 Last updated: 2023-05-11 Bibliographically approved

Open Access in DiVA

thesis (1167 kB), 653 downloads
File information
File name: FULLTEXT03.pdf
File size: 1167 kB
Checksum (SHA-512): c12db8c5d33328404bca948580fa261493d16e16eaa276915d567eaa4fd2457ad59a9f55041a1f1da1f2bc2db7467b8f86cf83e0ffeccd6eec7715e29e20d584
Type: fulltext
Mimetype: application/pdf

Authority records

Soto Valero, César
