Causes and Canonicalization of Unreproducible Builds in Java
Sharma, Aman. KTH, School of Electrical Engineering and Computer Science (EECS), Theoretical Computer Science. ORCID iD: 0000-0003-2263-7902
Baudry, Benoit. Université de Montréal, Montréal, Canada, H3T 1J4. ORCID iD: 0000-0002-4015-4640
Monperrus, Martin. KTH, School of Electrical Engineering and Computer Science (EECS), Theoretical Computer Science. ORCID iD: 0000-0003-3505-3383
2026 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 52, no 1, p. 54-69. Article in journal (Refereed). Published.
Abstract [en]

The increasing complexity of software supply chains and the rise of supply chain attacks have elevated concerns around software integrity. Users and stakeholders face significant challenges in validating that a given software artifact corresponds to its declared source. Reproducible Builds address this challenge by ensuring that independently performed builds from identical source code produce identical binaries. However, achieving reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, which is archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, analyze a large dataset from Reproducible Central, and develop a novel taxonomy of six root causes of unreproducibility. We study actionable mitigations: artifact and bytecode canonicalization using OSS-Rebuild and jNorm, respectively. Finally, we present Chains-Rebuild (improvements to OSS-Rebuild), a tool that raises reproducibility success from 9.48% to 26.60% on 12,803 unreproducible artifacts. To sum up, our contributions are the first large-scale taxonomy of build unreproducibility causes in Java, a publicly available dataset of unreproducible builds, and Chains-Rebuild, a canonicalization tool for mitigating unreproducible builds in Java.
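As a rough illustration of what artifact-level canonicalization means (a minimal sketch under stated assumptions, not the OSS-Rebuild, Chains-Rebuild, or jNorm implementation), the Java program below rewrites a JAR so that two common sources of archive nondeterminism, entry order and entry timestamps, no longer affect the output, and then compares SHA-256 digests as a rebuilder might. The class JarCanonicalizer, its methods, and the fixed timestamp are hypothetical. It needs Java 17 (for java.util.HexFormat) and leaves class-file bytes untouched, so it does not address bytecode-level differences.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

/** Hypothetical toy archive canonicalizer; not the tooling described in the article. */
public final class JarCanonicalizer {
    // Assumed fixed timestamp (2000-01-01T00:00:00Z) stamped on every entry;
    // it lies inside the MS-DOS date range used by ZIP headers.
    private static final long FIXED_TIME = 946_684_800_000L;

    /** Reads every entry of a ZIP/JAR archive into a map sorted by entry name. */
    public static Map<String, byte[]> readEntries(byte[] archive) throws IOException {
        Map<String, byte[]> entries = new TreeMap<>();
        try (ZipInputStream in = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry e;
            while ((e = in.getNextEntry()) != null) {
                entries.put(e.getName(), in.readAllBytes()); // bytes of the current entry only
            }
        }
        return entries;
    }

    /** Rewrites an archive with name-sorted entries that all carry one fixed timestamp. */
    public static byte[] canonicalize(byte[] archive) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (Map.Entry<String, byte[]> e : readEntries(archive).entrySet()) {
                ZipEntry entry = new ZipEntry(e.getKey()); // fresh entry: no copied extra fields
                entry.setTime(FIXED_TIME);
                out.putNextEntry(entry);
                out.write(e.getValue());
                out.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    /** Hex-encoded SHA-256 digest of an artifact. */
    public static String sha256(byte[] data) throws NoSuchAlgorithmException {
        return HexFormat.of().formatHex(MessageDigest.getInstance("SHA-256").digest(data));
    }

    /** Builds a toy archive with entries in the given order, all stamped with {@code time}. */
    public static byte[] buildArchive(String[][] files, long time) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ZipOutputStream out = new ZipOutputStream(buf)) {
            for (String[] f : files) {
                ZipEntry entry = new ZipEntry(f[0]);
                entry.setTime(time);
                out.putNextEntry(entry);
                out.write(f[1].getBytes(StandardCharsets.UTF_8));
                out.closeEntry();
            }
        }
        return buf.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // Same files, different entry order and build time, as two independent builds might emit.
        byte[] buildA = buildArchive(new String[][] {{"a/A.class", "bytes-A"}, {"b/B.class", "bytes-B"}}, 1_600_000_000_000L);
        byte[] buildB = buildArchive(new String[][] {{"b/B.class", "bytes-B"}, {"a/A.class", "bytes-A"}}, 1_700_000_000_000L);
        System.out.println("raw digests equal:       " + sha256(buildA).equals(sha256(buildB)));
        System.out.println("canonical digests equal: " + sha256(canonicalize(buildA)).equals(sha256(canonicalize(buildB))));
    }
}
```

Running main prints false for the raw digests and true for the canonical ones. Sorting entries by name and pinning a single timestamp is a choice made for this illustration: it deliberately discards information (original timestamps and entry order) that a real canonicalization tool might need to preserve or record.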

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2026. Vol. 52, no 1, p. 54-69
Keywords [en]
Java, Reproducible builds, canonicalization, software supply chain
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-377736
DOI: 10.1109/TSE.2025.3627891
ISI: 001662933000012
Scopus ID: 2-s2.0-105020911420
OAI: oai:DiVA.org:kth-377736
DiVA, id: diva2:2044960
Note

QC 20260311

Available from: 2026-03-11. Created: 2026-03-11. Last updated: 2026-03-11. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Sharma, Aman; Baudry, Benoit; Monperrus, Martin