Publications (10 of 80)
Sharma, A., Baudry, B. & Monperrus, M. (2026). Causes and Canonicalization of Unreproducible Builds in Java. IEEE Transactions on Software Engineering, 52(1), 54-69
2026 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 52, no 1, p. 54-69. Article in journal (Refereed), Published
Abstract [en]

The increasing complexity of software supply chains and the rise of supply chain attacks have elevated concerns around software integrity. Users and stakeholders face significant challenges in validating that a given software artifact corresponds to its declared source. Reproducible Builds address this challenge by ensuring that independently performed builds from identical source code produce identical binaries. However, achieving reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, we analyze a large dataset from Reproducible Central, and we develop a novel taxonomy of six root causes of unreproducibility. We study actionable mitigations: artifact and bytecode canonicalization using OSS-Rebuild and jNorm respectively. Finally, we present Chains-Rebuild (improvements to OSS-Rebuild), a tool that raises reproducibility success from 9.48% to 26.60% on 12,803 unreproducible artifacts. To sum up, our contributions are the first large-scale taxonomy of build unreproducibility causes in Java, a publicly available dataset of unreproducible builds, and Chains-Rebuild, a canonicalization tool for mitigating unreproducible builds in Java.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2026
Keywords
Java, Reproducible builds, canonicalization, software supply chain
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-377736 (URN), 10.1109/TSE.2025.3627891 (DOI), 001662933000012 (), 2-s2.0-105020911420 (Scopus ID)
Note

QC 20260311

Available from: 2026-03-11 Created: 2026-03-11 Last updated: 2026-03-11 Bibliographically approved
Wachter, J., Tiwari, D., Monperrus, M. & Baudry, B. (2026). Serializing Java objects in plain code. Journal of Systems and Software, 234, Article ID 112721.
2026 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 234, article id 112721. Article in journal (Refereed), Published
Abstract [en]

In managed languages, serialization of objects is typically done in bespoke binary formats such as Protobuf, or markup languages such as XML or JSON. The major limitation of these formats is readability. Human developers cannot read binary code, and in most cases, suffer from the syntax of XML or JSON. This is a major issue when objects are meant to be embedded and read in source code, such as in test cases. To address this problem, we propose plain-code serialization. Our core idea is to serialize objects observed at runtime in the native syntax of a programming language. We realize this vision in the context of Java, and demonstrate a prototype which serializes Java objects to Java source code. The resulting source faithfully reconstructs the objects seen at runtime. Our prototype is called PRODJ and is publicly available. We experiment with PRODJ to successfully plain-code serialize 174,699 objects observed during the execution of 4 open-source Java applications. Our performance measurement shows that the performance impact is not noticeable. Through a user study, we demonstrate that developers prefer plain-code serialized objects within automatically generated tests over their representations as XML or JSON.

Place, publisher, year, edition, pages
Elsevier BV, 2026
Keywords
Code, Serialization, Objects on disk, Runtime, Java
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-377410 (URN), 10.1016/j.jss.2025.112721 (DOI), 001645727500001 (), 2-s2.0-105025100490 (Scopus ID)
Note

QC 20260226

Available from: 2026-02-26 Created: 2026-02-26 Last updated: 2026-02-26 Bibliographically approved
Liu, Y., Tiwari, D., Bogdan, C. M. & Baudry, B. (2025). Detecting and removing bloated dependencies in CommonJS packages. Journal of Systems and Software, 230, Article ID 112509.
2025 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 230, article id 112509. Article in journal (Refereed), Published
Abstract [en]

JavaScript packages are notoriously prone to bloat, a factor that significantly impacts the performance and maintainability of web applications. While web bundlers and tree-shaking can mitigate this issue in client-side applications, state-of-the-art techniques have limitations on the detection and removal of bloat in server-side applications. In this paper, we present the first study to investigate bloated dependencies within server-side JavaScript applications, focusing on those built with the widely used and highly dynamic CommonJS module system. We propose a trace-based dynamic analysis that monitors the OS file system, to determine which dependencies are not accessed during runtime. To evaluate our approach, we curate an original dataset of 91 CommonJS packages with a total of 50,488 dependencies. Compared to the state-of-the-art dynamic and static approaches, our trace-based analysis demonstrates higher accuracy in detecting bloated dependencies. Our analysis identifies 50.6% of the 50,488 dependencies as bloated: 13.8% of direct dependencies and 51.3% of indirect dependencies. Furthermore, removing only the direct bloated dependencies by cleaning the dependency configuration file can remove a significant share of unnecessary bloated indirect dependencies while preserving function correctness.

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
CommonJS, Dependency bloat, Dependency management, Node.js, npm
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-366559 (URN), 10.1016/j.jss.2025.112509 (DOI), 001513620700002 (), 2-s2.0-105008213531 (Scopus ID)
Note

QC 20250710

Available from: 2025-07-10 Created: 2025-07-10 Last updated: 2025-09-24 Bibliographically approved
Liu, R., Bobadilla, S., Baudry, B. & Monperrus, M. (2025). Dirty-Waters: Detecting Software Supply Chain Smells. In: FSE Companion 2025 - Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. Paper presented at 33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion 2025, Trondheim, Norway, Jun 23 2025 - Jun 27 2025 (pp. 1045-1049). Association for Computing Machinery (ACM)
2025 (English). In: FSE Companion 2025 - Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, Association for Computing Machinery (ACM), 2025, p. 1045-1049. Conference paper, Published paper (Refereed)
Abstract [en]

Using open-source dependencies is essential in modern software development. However, this practice implies significant trust in third-party code, while there is little support for developers to assess this trust. As a consequence, attacks, called software supply chain attacks, have been increasingly occurring through third-party dependencies. In this paper, we target the problem of projects that use dependencies, where developers are unaware of the potential risks posed by their software supply chain. We define the novel concept of software supply chain smell and present Dirty-Waters, a novel tool for detecting software supply chain smells. We evaluate Dirty-Waters on three JavaScript projects and demonstrate the prevalence of all proposed software supply chain smells. Dirty-Waters reveals potential risks for previously invisible problems and provides clear indicators for developers to act on the security of their supply chain. A video demonstrating Dirty-Waters is available at: http://l.4open.science/dirty-waters-demo.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Open Source, Software Security, Software Supply Chain
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-370310 (URN), 10.1145/3696630.3728578 (DOI), 2-s2.0-105013963801 (Scopus ID)
Conference
33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion 2025, Trondheim, Norway, Jun 23 2025 - Jun 27 2025
Note

Part of ISBN 9798400712760

QC 20250925

Available from: 2025-09-25 Created: 2025-09-25 Last updated: 2025-09-25 Bibliographically approved
Baudry, B. & Monperrus, M. (2025). Humor for graduate training. ACM Inroads
2025 (English). In: ACM Inroads, ISSN 2153-2184, E-ISSN 2153-2192. Article in journal (Refereed), Accepted
Abstract [en]

Humor genuinely engages graduate students with their scientific training.

Keywords
humor; higher education
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-362677 (URN), 10.1145/3730408 (DOI)
Note

QC 20250424

Available from: 2025-04-23 Created: 2025-04-23 Last updated: 2025-06-13 Bibliographically approved
Andersson, V., Baudry, B., Bobadilla, S., Christensen, L., Cofano, S., Etemadi, K., . . . Toady, T. (2025). UPPERCASE IS ALL YOU NEED. In: SIGBOVIK: A Record of the Proceedings of SIGBOVIK 2025. Paper presented at SIGBOVIK 2025, Carnegie Mellon University, Pittsburgh, PA, USA, April 4, 2025 (pp. 24-35). SIGBOVIK
2025 (English). In: SIGBOVIK: A Record of the Proceedings of SIGBOVIK 2025, SIGBOVIK, 2025, p. 24-35. Conference paper, Published paper (Other (popular science, discussion, etc.))
Abstract [en]

WE PRESENT THE FIRST COMPREHENSIVE STUDY ON THE CRITICAL YET OVERLOOKED ROLE OF UPPERCASE TEXT IN ARTIFICIAL INTELLIGENCE. DESPITE CONSTITUTING A MERE SINGLE-DIGIT PERCENTAGE OF STANDARD ENGLISH PROSE, UPPERCASE LETTERS HAVE DISPROPORTIONATE POWER IN HUMAN-AI INTERACTIONS. THROUGH RIGOROUS EXPERIMENTATION INVOLVING SHOUTING AT VARIOUS LANGUAGE MODELS, WE DEMONSTRATE THAT UPPERCASE IS NOT MERELY A STYLISTIC CHOICE BUT A FUNDAMENTAL TOOL FOR AI COMMUNICATION. OUR RESULTS REVEAL THAT UPPERCASE TEXT SIGNIFICANTLY ENHANCES COMMAND AUTHORITY, CODE GENERATION QUALITY, AND – MOST CRUCIALLY – THE AI’S ABILITY TO CREATE APPROPRIATE CAT PICTURES. THIS PAPER DEFINITIVELY PROVES THAT IN THE REALM OF HUMAN-AI INTERACTION, BIGGER LETTERS == BETTER RESULTS. OUR FINDINGS SUGGEST THAT THE CAPS-LOCK KEY MAY BE THE MOST UNDERUTILIZED RESOURCE IN MODERN AI.

Place, publisher, year, edition, pages
SIGBOVIK, 2025
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-287271 (URN)
Conference
SIGBOVIK 2025, Carnegie Mellon University, Pittsburgh, PA, USA, April 4, 2025
Note

QC 20250905

Available from: 2025-04-23 Created: 2025-04-23 Last updated: 2025-09-08 Bibliographically approved
Reyes García, F., Gamage, Y., Skoglund, G., Baudry, B. & Monperrus, M. (2024). BUMP: A Benchmark of Reproducible Breaking Dependency Updates. In: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024. Paper presented at 31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024, Rovaniemi, Finland, Mar 12 2024 - Mar 15 2024 (pp. 159-170). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 159-170. Conference paper, Published paper (Refereed)
Abstract [en]

Third-party dependency updates can cause a build to fail if the new dependency version introduces a change that is incompatible with the usage: this is called a breaking dependency update. Research on breaking dependency updates is active, with works on characterization, understanding, automatic repair of breaking updates, and other software engineering aspects. All such research projects require a benchmark of breaking updates that has the following properties: 1) it contains real-world breaking updates; 2) the breaking updates can be executed; 3) the benchmark provides stable scientific artifacts of breaking updates over time, a property we call 'reproducibility'. To the best of our knowledge, such a benchmark is missing. To address this problem, we present BUMP, a new benchmark that contains reproducible breaking dependency updates in the context of Java projects built with the Maven build system. BUMP contains 571 breaking dependency updates collected from 153 Java projects. BUMP ensures long-term reproducibility of dependency updates on different platforms, guaranteeing consistent build failures. We categorize the different causes of build breakage in BUMP, providing novel insights for future work on breaking update engineering. To our knowledge, BUMP is the first of its kind, providing hundreds of real-world breaking updates that have all been made reproducible.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Benchmark, Breaking dependency updates, Dependency engineering, Java, Maven, Reproducibility
National Category
Computer Sciences Software Engineering
Identifiers
urn:nbn:se:kth:diva-351755 (URN), 10.1109/SANER60148.2024.00024 (DOI), 001505450800018 (), 2-s2.0-85199750992 (Scopus ID)
Conference
31st IEEE International Conference on Software Analysis, Evolution and Reengineering, SANER 2024, Rovaniemi, Finland, Mar 12 2024 - Mar 15 2024
Funder
Swedish Foundation for Strategic Research, Chains
Note

Part of ISBN 9798350330663

QC 20240823

Available from: 2024-08-13 Created: 2024-08-13 Last updated: 2025-12-05 Bibliographically approved
Ron Arteaga, J., Soto Valero, C., Zhang, L., Baudry, B. & Monperrus, M. (2024). Highly Available Blockchain Nodes With N-Version Design. IEEE Transactions on Dependable and Secure Computing, 21(4), 4084-4097
2024 (English). In: IEEE Transactions on Dependable and Secure Computing, ISSN 1545-5971, E-ISSN 1941-0018, Vol. 21, no 4, p. 4084-4097. Article in journal (Refereed), Published
Abstract [en]

As all software, blockchain nodes are exposed to faults in their underlying execution stack. Unstable execution environments can disrupt the availability of blockchain nodes' interfaces, resulting in downtime for users. This paper introduces the concept of N-Version Blockchain nodes. This new type of node relies on simultaneous execution of different implementations of the same blockchain protocol, in the line of Avizienis' N-Version programming vision. We design and implement an N-Version blockchain node prototype in the context of Ethereum, called N-ETH. We show that N-ETH is able to mitigate the effects of unstable execution environments and significantly enhance availability under environment faults. To simulate unstable execution environments, we perform fault injection at the system-call level. Our results show that existing Ethereum node implementations behave asymmetrically under identical instability scenarios. N-ETH leverages this asymmetric behavior available in the diverse implementations of Ethereum nodes to provide increased availability, even under our most aggressive fault-injection strategies. We are the first to validate the relevance of N-Version design in the domain of blockchain infrastructure. From an industrial perspective, our results are of utmost importance for businesses operating blockchain nodes, including Google, ConsenSys, and many other major blockchain companies.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
availability, blockchain, Blockchains, Computer architecture, N-Version design, Peer-to-peer computing, Programming, Prototypes, Software, Time factors
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-349884 (URN), 10.1109/TDSC.2023.3346195 (DOI), 001270317500010 (), 2-s2.0-85181578677 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, Chains
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20240704

Available from: 2024-07-04 Created: 2024-07-04 Last updated: 2025-03-27 Bibliographically approved
Tiwari, D., Monperrus, M. & Baudry, B. (2024). Mimicking Production Behavior with Generated Mocks. IEEE Transactions on Software Engineering, 50(11), 2921-2946
2024 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 50, no 11, p. 2921-2946. Article in journal (Refereed), Published
Abstract [en]

Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: design realistic interactions between a unit and its environment; and understand the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, open-source Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Based on this captured data, RICK generates test cases that include realistic initial states and test inputs, as well as mocks and stubs. All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. The mock-based oracles are also effective at detecting regressions within the target methods, complementing each other in their fault-finding ability. We interview 5 developers from the industry who confirm the relevance of using production observations to design mocks and stubs. Our experimental findings clearly demonstrate the feasibility and added value of generating mocks from production interactions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-356173 (URN), 10.1109/tse.2024.3458448 (DOI), 001369099900010 (), 2-s2.0-85204006940 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250120

Available from: 2024-11-09 Created: 2024-11-09 Last updated: 2025-01-20 Bibliographically approved
Baudry, B. & Monperrus, M. (2024). Programming Art With Drawing Machines. Computer, 57(7), 104-108
2024 (English). In: Computer, ISSN 0018-9162, E-ISSN 1558-0814, Vol. 57, no 7, p. 104-108. Article in journal, Editorial material (Other academic), Published
Abstract [en]

Algorithmic artists master programming to create art. Specialized libraries and hardware devices such as pen plotters support their practice.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-350806 (URN), 10.1109/MC.2024.3385049 (DOI), 001260510200011 (), 2-s2.0-85197601330 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20240719

Available from: 2024-07-19 Created: 2024-07-19 Last updated: 2024-10-03 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-4015-4640