KTH Publications (kth.se)
Publications (10 of 104)
Liu, R., Bobadilla, S., Baudry, B. & Monperrus, M. (2025). Dirty-Waters: Detecting Software Supply Chain Smells. In: FSE Companion 2025 - Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. Paper presented at 33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion 2025, Trondheim, Norway, Jun 23 2025 - Jun 27 2025 (pp. 1045-1049). Association for Computing Machinery (ACM)
2025 (English). In: FSE Companion 2025 - Companion Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering, Association for Computing Machinery (ACM), 2025, pp. 1045-1049. Conference paper, Published paper (Refereed)
Abstract [en]

Using open-source dependencies is essential in modern software development. However, this practice implies significant trust in third-party code, while there is little support for developers to assess this trust. As a consequence, attacks, called software supply chain attacks, have been increasingly occurring through third-party dependencies. In this paper, we target the problem of projects that use dependencies, where developers are unaware of the potential risks posed by their software supply chain. We define the novel concept of software supply chain smell and present Dirty-Waters, a novel tool for detecting software supply chain smells. We evaluate Dirty-Waters on three JavaScript projects and demonstrate the prevalence of all proposed software supply chain smells. Dirty-Waters reveals potential risks for previously invisible problems and provides clear indicators for developers to act on the security of their supply chain. A video demonstrating Dirty-Waters is available at: http://l.4open.science/dirty-waters-demo.
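To illustrate the kind of automated check such a tool performs, here is a minimal, hypothetical sketch in Python. The "unpinned dependency" smell and all names below are assumptions for illustration only; the actual smell taxonomy detected by Dirty-Waters is defined in the paper and demo.

```python
# Hypothetical sketch of a single supply-chain check over an npm manifest.
# The "unpinned dependency" smell shown here is an illustrative assumption,
# not the tool's actual implementation.
import json

def unpinned_dependencies(package_json_text: str) -> list[str]:
    """Flag dependencies declared with floating version ranges."""
    manifest = json.loads(package_json_text)
    deps = manifest.get("dependencies", {})
    return sorted(
        name
        for name, spec in deps.items()
        if spec.startswith(("^", "~", ">", "*")) or spec == "latest"
    )

manifest = '{"dependencies": {"left-pad": "^1.3.0", "lodash": "4.17.21"}}'
print(unpinned_dependencies(manifest))  # → ['left-pad']
```

Per-manifest checks of this shape yield the kind of clear, actionable indicators the abstract describes, without requiring developers to audit each dependency by hand.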

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Open Source, Software Security, Software Supply Chain
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-370310 (URN), 10.1145/3696630.3728578 (DOI), 2-s2.0-105013963801 (Scopus ID)
Conference
33rd ACM International Conference on the Foundations of Software Engineering, FSE Companion 2025, Trondheim, Norway, Jun 23 2025 - Jun 27 2025
Note

Part of ISBN 9798400712760

QC 20250925

Available from: 2025-09-25. Created: 2025-09-25. Last updated: 2025-09-25. Bibliographically approved
Baudry, B. & Monperrus, M. (2025). Humor for graduate training. ACM Inroads
2025 (English). In: ACM Inroads, ISSN 2153-2184, E-ISSN 2153-2192. Article in journal (Refereed), Accepted
Abstract [en]

Humor genuinely engages graduate students with their scientific training.

Keywords
humor; higher education
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-362677 (URN), 10.1145/3730408 (DOI)
Note

QC 20250424

Available from: 2025-04-23. Created: 2025-04-23. Last updated: 2025-06-13. Bibliographically approved
Etemadi, K., Mohammadi, B., Su, Z. & Monperrus, M. (2025). Mokav: Execution-driven differential testing with LLMs. Journal of Systems and Software, 230, Article ID 112571.
2025 (English). In: Journal of Systems and Software, ISSN 0164-1212, E-ISSN 1873-1228, Vol. 230, article id 112571. Article in journal (Refereed), Published
Abstract [en]

Detecting functional differences between programs is essential in various software engineering tasks, such as automated program repair, mutation testing, and code refactoring. The problem of detecting functional differences between two programs can be reduced to searching for a difference exposing test (DET): a test input that results in different outputs on the subject programs. In this paper, we propose MOKAV, a novel execution-driven tool that leverages LLMs to generate DETs. MOKAV takes two versions of a program (P and Q) and an example test input. When successful, MOKAV generates a valid DET, a test input that leads to provably different outputs on P and Q. MOKAV iteratively prompts an LLM with a specialized prompt to generate new test inputs. At each iteration, MOKAV provides execution-based feedback from previously generated tests until the LLM produces a DET. We evaluate MOKAV on 1,535 pairs of Python programs collected from the Codeforces competition platform and 32 pairs of programs from the QuixBugs dataset. Our experiments show that MOKAV outperforms the state-of-the-art, Pynguin and Differential Prompting, by a large margin. MOKAV can generate DETs for 81.7% (1,255/1,535) of the program pairs in our benchmark (versus 4.9% for Pynguin and 37.3% for Differential Prompting). We demonstrate that the iterative and execution-driven feedback components of the system contribute to its high effectiveness.
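The DET criterion defined in the abstract can be made concrete with a small sketch. `p` and `q` below are hypothetical stand-ins for the two program versions; Mokav itself drives an LLM with execution feedback to search for such inputs, which this sketch does not attempt.

```python
# Minimal sketch of the difference-exposing test (DET) check: an input is a
# valid DET if it provably yields different outputs (or different failure
# behavior) on the two program versions P and Q.

def is_det(p, q, test_input):
    """Return True if `test_input` exposes a behavioral difference."""
    def observe(f):
        try:
            return ("ok", f(test_input))
        except Exception as e:  # crashes also count as observable behavior
            return ("error", type(e).__name__)
    return observe(p) != observe(q)

# Two versions of the same function that diverge only on negative inputs.
def p(x):
    return abs(x)

def q(x):
    return x  # regression: forgets the absolute value

print(is_det(p, q, 3))   # → False (same output: not a DET)
print(is_det(p, q, -3))  # → True  (outputs differ: a valid DET)
```

A search loop in the spirit of the abstract would repeatedly propose new `test_input` candidates (via a prompted LLM) and feed the observed outputs back until `is_det` returns True.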

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Behavioral difference, Large language models, Test generation
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-369034 (URN), 10.1016/j.jss.2025.112571 (DOI), 001538605500001 (), 2-s2.0-105011045033 (Scopus ID)
Note

QC 20250912

Available from: 2025-09-12. Created: 2025-09-12. Last updated: 2025-11-13. Bibliographically approved
Monperrus, M. (2025). Most Cited Papers in Software Engineering 2013-2023.
2025 (English)Report (Other academic)
Abstract [en]

This compilation presents a list of the most cited research papers in software engineering from 2013 to 2023, published in leading academic venues. By leveraging APIs from CrossRef and Semantic Scholar, we systematically gather and rank influential works based on citation metrics, providing a valuable resource for researchers, educators, and industry professionals to understand the field. This document can also serve individuals who wish to support their academic records with citation-impact facts. Full bibliometric data is accessible in the accompanying repository.

Publisher
p. 61
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-362600 (URN), 10.5281/zenodo.14885765 (DOI)
Note

QC 20250424

Available from: 2025-04-22. Created: 2025-04-22. Last updated: 2025-04-30. Bibliographically approved
Silva, A. & Monperrus, M. (2025). RepairBench: Leaderboard of Frontier Models for Program Repair. In: 2025 IEEE/ACM International Workshop on Large Language Models for Code, LLM4Code. Paper presented at 2025 International Workshop on Large Language Models for Code (LLM4Code), May 3, 2025, Ottawa, Canada (pp. 9-16). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: 2025 IEEE/ACM International Workshop on Large Language Models for Code, LLM4Code, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 9-16. Conference paper, Published paper (Refereed)
Abstract [en]

AI-driven program repair uses AI models to repair buggy software by producing patches. Rapid advancements in frontier models surely impact performance on the program repair task. Yet, there is a lack of frequent and standardized evaluations to actually understand the strengths and weaknesses of models. To that end, we propose RepairBench, a novel leaderboard for AI-driven program repair. The key characteristics of RepairBench are: 1) it is execution-based: all patches are compiled and executed against a test suite; 2) it assesses frontier models in a frequent and standardized way. RepairBench leverages two high-quality benchmarks, Defects4J and GitBug-Java, to evaluate frontier models only against real-world program repair tasks. At the time of writing, RepairBench shows that claude-3-5-sonnet-20241022 is the best model for program repair, and deepseek-v3 is one of the cheapest while ranking third. We publicly release the evaluation framework of RepairBench as well as all patches generated in the course of the evaluation.
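The "execution-based" characteristic can be sketched as follows. `run_tests` and the toy patches are hypothetical stand-ins: RepairBench actually compiles patched Java programs and runs real Defects4J and GitBug-Java test suites.

```python
# Hedged sketch of execution-based patch assessment: a model-generated patch
# only counts if the patched program actually passes the test suite.

def evaluate_patches(patches, run_tests):
    """Map each patch id to 'plausible' or 'rejected' by executing tests."""
    return {
        pid: "plausible" if run_tests(code) else "rejected"
        for pid, code in patches.items()
    }

# Toy stand-in for a test suite over a function that should square its input.
def run_tests(candidate):
    try:
        return candidate(2) == 4 and candidate(-3) == 9
    except Exception:
        return False

patches = {
    "model-A": lambda x: x * x,  # passes both tests
    "model-B": lambda x: x + x,  # fails on -3
}
print(evaluate_patches(patches, run_tests))
# → {'model-A': 'plausible', 'model-B': 'rejected'}
```

Running every patch, rather than judging it statically or by model confidence, is what makes leaderboard entries comparable across models and over time.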

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
leaderboard, benchmark, program repair, large language models
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-375474 (URN), 10.1109/LLM4Code66737.2025.00006 (DOI), 001554529600002 (), 2-s2.0-105009128352 (Scopus ID), 979-8-3315-2616-0 (ISBN), 979-8-3315-2615-3 (ISBN)
Conference
2025 International Workshop on Large Language Models for Code (LLM4Code), May 3, 2025, Ottawa, Canada
Note

QC 20260126

Available from: 2026-01-26. Created: 2026-01-26. Last updated: 2026-01-26. Bibliographically approved
Silva, A., Fang, S. & Monperrus, M. (2025). RepairLLaMA: Efficient Representations and Fine-Tuned Adapters for Program Repair. IEEE Transactions on Software Engineering, 51(8), 2366-2380
2025 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 51, no 8, pp. 2366-2380. Article in journal (Refereed), Published
Abstract [en]

Automated Program Repair (APR) has evolved significantly with the advent of Large Language Models (LLMs). Fine-tuning LLMs for program repair is a recent avenue of research, with many dimensions that have not been explored. Existing work mostly fine-tunes LLMs with naive code representations and does not scale to frontier models. To address this problem, we propose RepairLLaMA, a novel program repair approach that 1) identifies optimal code representations for APR with fine-tuned models, and 2) pioneers state-of-the-art parameter-efficient fine-tuning (PEFT) techniques for program repair. As a result, RepairLLaMA produces a highly effective 'program repair adapter' for fixing bugs with AI. Our experiments demonstrate the validity of both concepts. First, fine-tuning adapters with program-repair-specific code representations enables the model to use meaningful repair signals and produce better patches. Second, parameter-efficient fine-tuning helps fine-tuning converge and clearly contributes to the effectiveness of RepairLLaMA in fixing bugs outside the fine-tuning data distribution. Overall, RepairLLaMA correctly fixes 144 Defects4J v2, 109 HumanEval-Java, and 20 GitBug-Java bugs, outperforming all baselines.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Automated Program Repair, Code Representations, Large Language Models, Parameter-Efficient Fine-Tuning
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-368761 (URN), 10.1109/TSE.2025.3581062 (DOI), 001551587900008 (), 2-s2.0-105008914744 (Scopus ID)
Note

QC 20250821

Available from: 2025-08-21. Created: 2025-08-21. Last updated: 2025-12-08. Bibliographically approved
Bobadilla, S., Glassey, R., Bergel, A. & Monperrus, M. (2025). SOBO: A Feedback Bot to Nudge Code Quality in Programming Courses. In: Proceedings - 2025 IEEE/ACM 37th International Conference on Software Engineering Education and Training, CSEE&T 2025. Paper presented at 37th IEEE/ACM International Conference on Software Engineering Education and Training, CSEE&T 2025, Ottawa, Canada, Apr 28 2025 - Apr 29 2025 (p. 229). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: Proceedings - 2025 IEEE/ACM 37th International Conference on Software Engineering Education and Training, CSEE&T 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 229. Conference paper, Published paper (Refereed)
Abstract [en]

Recent research has shown the great potential of automatic feedback in education. This paper presents SOBO, a bot we designed to automatically provide feedback on code quality to undergraduate students. SOBO has been deployed in a course at the KTH Royal Institute of Technology in Sweden with 130+ students. Overall, SOBO has analyzed 1687 GitHub repositories and produced 8443 tailored code quality feedback messages to students. Unlike traditional tools embedded in CI pipelines, SOBO is designed to interact with students in a way that promotes personalized learning without imposing additional teaching burdens. The quantitative and qualitative results indicate that SOBO effectively nudges students into adopting code quality best practices, without interfering with pedagogical objectives. From this experience, we provide guidelines on how to design and deploy teaching bots in programming courses.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
bots, computer science, education, software engineering
National Category
Computer Sciences; Software Engineering
Identifiers
urn:nbn:se:kth:diva-368630 (URN), 10.1109/CSEET66350.2025.00029 (DOI), 001556376200021 (), 2-s2.0-105008498459 (Scopus ID)
Conference
37th IEEE/ACM International Conference on Software Engineering Education and Training, CSEE&T 2025, Ottawa, Canada, Apr 28 2025 - Apr 29 2025
Note

Part of ISBN 9798331537098

QC 20250819

Available from: 2025-08-19. Created: 2025-08-19. Last updated: 2025-12-08. Bibliographically approved
Oliveira, D., Santos, R., de Oliveira, B., Monperrus, M., Castor, F. & Madeiral, F. (2025). Understanding Code Understandability Improvements in Code Reviews. IEEE Transactions on Software Engineering, 51(1), 14-37
2025 (English). In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 51, no 1, pp. 14-37. Article in journal (Refereed), Published
Abstract [en]

Context: Code understandability plays a crucial role in software development, as developers spend between 58% and 70% of their time reading source code. Improving code understandability can lead to enhanced productivity and save maintenance costs. Problem: Experimental studies aim to establish what makes code more or less understandable in a controlled setting, but ignore that what makes code easier to understand in the real world also depends on extraneous elements such as developers' background and project culture and guidelines. Not accounting for the influence of these factors may lead to results that are sound but have little external validity. Goal: We aim to investigate how developers improve code understandability during software development through code review comments. Our assumption is that code reviewers are specialists in code quality within a project. Method and Results: We manually analyzed 2,401 code review comments from Java open-source projects on GitHub and found that over 42% of all comments focus on improving code understandability, demonstrating the significance of this quality attribute in code reviews. We further explored a subset of 385 comments related to code understandability and identified eight categories of code understandability concerns, such as incomplete or inadequate code documentation, bad identifiers, and unnecessary code. Among the suggestions to improve code understandability, 83.9% were accepted and integrated into the codebase. Among these, only two (less than 1%) ended up being reverted later. We also identified types of patches that improve code understandability, ranging from simple changes (e.g., removing unused code) to more context-dependent improvements (e.g., replacing method call chains with an existing API). Finally, we investigated the potential coverage of four well-known linters to flag the identified code understandability issues. These linters cover less than 30% of these issues, although some of them could be easily added as new rules. Implications: Our findings motivate and provide practical insight for the construction of tools to make code more understandable, e.g., understandability improvements are rarely reverted and thus can be used as reliable training data for specialized ML-based tools. This is also supported by our dataset, which can be used to train such models. Finally, our findings can also serve as a basis to develop evidence-based code style guides.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Codes, Reviews, Source coding, Software development management, Documentation, Security, Natural languages, Code understandability, code understandability smells, code review
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-359532 (URN), 10.1109/TSE.2024.3453783 (DOI), 001395714800006 (), 2-s2.0-85204075762 (Scopus ID)
Note

QC 20250206

Available from: 2025-02-06. Created: 2025-02-06. Last updated: 2025-02-06. Bibliographically approved
Andersson, V., Baudry, B., Bobadilla, S., Christensen, L., Cofano, S., Etemadi, K., . . . Toady, T. (2025). UPPERCASE IS ALL YOU NEED. In: SIGBOVIK: A Record of the Proceedings of SIGBOVIK 2025. Paper presented at SIGBOVIK 2025, Carnegie Mellon University, Pittsburgh, PA, USA, April 4, 2025 (pp. 24-35). SIGBOVIK
2025 (English). In: SIGBOVIK: A Record of the Proceedings of SIGBOVIK 2025, SIGBOVIK, 2025, pp. 24-35. Conference paper, Published paper (Other (popular science, discussion, etc.))
Abstract [en]

WE PRESENT THE FIRST COMPREHENSIVE STUDY ON THE CRITICAL YET OVERLOOKED ROLE OF UPPERCASE TEXT IN ARTIFICIAL INTELLIGENCE. DESPITE CONSTITUTING A MERE SINGLE-DIGIT PERCENTAGE OF STANDARD ENGLISH PROSE, UPPERCASE LETTERS HAVE DISPROPORTIONATE POWER IN HUMAN-AI INTERACTIONS. THROUGH RIGOROUS EXPERIMENTATION INVOLVING SHOUTING AT VARIOUS LANGUAGE MODELS, WE DEMONSTRATE THAT UPPERCASE IS NOT MERELY A STYLISTIC CHOICE BUT A FUNDAMENTAL TOOL FOR AI COMMUNICATION. OUR RESULTS REVEAL THAT UPPERCASE TEXT SIGNIFICANTLY ENHANCES COMMAND AUTHORITY, CODE GENERATION QUALITY, AND – MOST CRUCIALLY – THE AI’S ABILITY TO CREATE APPROPRIATE CAT PICTURES. THIS PAPER DEFINITIVELY PROVES THAT IN THE REALM OF HUMAN-AI INTERACTION, BIGGER LETTERS == BETTER RESULTS. OUR FINDINGS SUGGEST THAT THE CAPS-LOCK KEY MAY BE THE MOST UNDERUTILIZED RESOURCE IN MODERN AI.

Place, publisher, year, edition, pages
SIGBOVIK, 2025
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-287271 (URN)
Conference
SIGBOVIK 2025, Carnegie Mellon University, Pittsburgh, PA, USA, April 4, 2025
Note

QC 20250905

Available from: 2025-04-23. Created: 2025-04-23. Last updated: 2025-09-08. Bibliographically approved
Reyes García, F., Baudry, B. & Monperrus, M. (2024). Breaking-Good: Explaining Breaking Dependency Updates with Build Analysis. In: Proceedings - 2024 IEEE International Conference on Source Code Analysis and Manipulation, SCAM 2024. Paper presented at 24th IEEE International Conference on Source Code Analysis and Manipulation, SCAM 2024, Flagstaff, United States of America, October 7-8, 2024 (pp. 36-46). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE International Conference on Source Code Analysis and Manipulation, SCAM 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 36-46. Conference paper, Published paper (Refereed)
Abstract [en]

Dependency updates often cause compilation errors when new dependency versions introduce changes that are incompatible with existing client code. Fixing breaking dependency updates is notoriously hard, as their root cause can be hidden deep in the dependency tree. We present Breaking-Good, a tool that automatically generates explanations for breaking updates. Breaking-Good provides a detailed categorization of compilation errors, identifying several factors related to changes in direct and indirect dependencies, incompatibilities between Java versions, and client-specific configuration. With a blended analysis of build logs and dependency trees, Breaking-Good generates detailed explanations for each breaking update. These explanations help developers understand the causes of the breaking update, and suggest possible actions to fix the breakage. We evaluate Breaking-Good on 243 real-world breaking dependency updates. Our results indicate that Breaking-Good accurately identifies root causes and generates automatic explanations for 70% of these breaking updates. Our user study demonstrates that the generated explanations help developers. Breaking-Good is the first technique that automatically identifies the causes of a breaking dependency update and explains the breakage accordingly.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Breaking dependency updates, Explanations, Java, Maven, Software Dependency
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-359246 (URN), 10.1109/SCAM63643.2024.00014 (DOI), 2-s2.0-85215290586 (Scopus ID)
Conference
24th IEEE International Conference on Source Code Analysis and Manipulation, SCAM 2024, Flagstaff, United States of America, October 7-8, 2024
Funder
Swedish Foundation for Strategic Research, chains
Note

Part of ISBN 9798331528508

QC 20250203

Available from: 2025-01-29. Created: 2025-01-29. Last updated: 2025-02-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3505-3383