kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Mimicking Production Behavior with Generated Mocks
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.ORCID iD: 0000-0003-0293-2592
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS.ORCID iD: 0000-0003-3505-3383
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.ORCID iD: 0000-0002-4015-4640
2024 (English)In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, Vol. 50, no 11, p. 2921-2946Article in journal (Refereed) Published
Abstract [en]

Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: design realistic interactions between a unit and its environment; and understand the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, opensource Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Based on this captured data, RICK generates test cases that include realistic initial states and test inputs, as well as mocks and stubs. All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. The mock-based oracles are also effective at detecting regressions within the target methods, complementing each other in their fault-finding ability. We interview 5 developers from the industry who confirm the relevance of using production observations to design mocks and stubs. Our experimental findings clearly demonstrate the feasibility and added value of generating mocks from production interactions.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE) , 2024. Vol. 50, no 11, p. 2921-2946
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:kth:diva-356173DOI: 10.1109/tse.2024.3458448ISI: 001369099900010Scopus ID: 2-s2.0-85204006940OAI: oai:DiVA.org:kth-356173DiVA, id: diva2:1911846
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250120

Available from: 2024-11-09 Created: 2024-11-09 Last updated: 2025-01-20Bibliographically approved
In thesis
1. Augmenting Test Oracles with Production Observations
Open this publication in new window or tab >>Augmenting Test Oracles with Production Observations
2024 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Software testing is the process of verifying that a software system behaves as it is intended to behave. Significant resources are invested in creating and maintaining strong test suites to ensure software quality. However, in-house tests seldom reflect all the scenarios that may occur as a software system executes in production environments. The literature on the automated generation of tests proposes valuable techniques that assist developers with their testing activities. Yet the gap between tested behaviors and field behaviors remains largely overlooked. Consequently, the behaviors relevant for end users are not reflected in the test suite, and the faults that may surface for end-users in the field may remain undetected by developer-written or automatically generated tests.

This thesis proposes a novel framework for using production observations, made as a system executes in the field, in order to generate tests. The generated tests include test inputs that are sourced from the field, and oracles that verify behaviors exhibited by the system in response to these inputs. We instantiate our framework in three distinct ways.

First, for a target project, we focus on methods that are inadequately tested by the developer-written test suite. At runtime, we capture objects that are associated with the invocations of these methods. The captured objects are used to generate tests that recreate the observed production state and contain oracles that specify the expected behavior. Our evaluation demonstrates that this strategy results in improved test quality for the target project.

With the second instantiation of our framework, we observe the invocations of target methods at runtime, as well as the invocations of methods called within the target methods. Using the objects associated with these invocations, we generate tests that use mocks, stubs, and mock-based oracles. We find that the generated oracles verify distinct aspects of the behaviors observed in the field, and also detect regressions within the system.

Third, we adapt our framework to capture the arguments with which target methods are invoked, during the execution of the test suite and in the field. We generate a data provider using the union of captured arguments, which supplies values to a parameterized unit test that is derived from a developer-written unit test. Using this strategy, we discover developer-written oracles that are actually generalizable to a larger input space.

We evaluate the three instances of our proposed framework against real-world software projects exercised with production workloads. Our findings demonstrate that runtime observations can be harnessed to generate complete tests, with inputs and oracles. The generated tests are representative of real-world usage, and can augment developer-written test suites.

Abstract [sv]

Programvarutestning är processen för att verifiera att ett mjukvarusystem fungerar som det är tänkt att fungera. Betydande resurser investeras i att skapa och underhålla starka testsviter för att säkerställa mjukvarukvalitet. Interna tester återspeglar dock sällan alla scenarier som kan uppstå när ett mjukvarusystem körs i produktionsmiljöer. Litteraturen om automatiserad testgenerering föreslår värdefulla tekniker för att hjälpa utvecklare i deras testaktiviteter. Ändå förbises gapet mellan testade beteenden och beteenden i produktionsmiljöer till stor del. Följaktligen återspeglas inte beteenden som är relevanta för slutanvändare i testsviten, och de fel som kan visas för slutanvändare i reella situationer kan förbli oupptäckta av utvecklarskrivna eller automatiskt genererade tester.

Denna avhandling föreslår ett nytt ramverk för att använda produktionsobservationer, gjorda när ett system exekverar i produktionsmiljö, för att generera tester. De genererade testen inkluderar testindata som kommer från reella användare och orakel som verifierar beteenden som uppvisas av systemet som svar på dessa indata. Vi instansierar vårt ramverk på tre olika sätt.

Först, för ett målprojekt, fokuserar vi på metoder som är otillräckligt testade av den utvecklarskrivna testsviten. Vid körning registrerar vi objekt som är associerade med anropen till dessa metoder. De registrerade objekten används för att generera tester som återskapar det observerade produktionstillståndet och innehåller orakel som anger det förväntade beteendet. Vår utvärdering visar att denna strategi resulterar i förbättrad testkvalitet för målprojektet.

Med den andra instansieringen av vårt ramverk observerar vi anrop till målmetoder vid körning, såväl som anrop till metoder som anropas inom målmetoderna. Med hjälp av objekten som är associerade med dessa anrop genererar vi tester som använder mocks, stubs och mock-baserade orakel. Vi finner att de genererade oraklen verifierar distinkta aspekter av beteenden som observerats i produktionsmiljöer, och även upptäcker regressioner inom systemet.

För det tredje anpassar vi vårt ramverk för att registrera de argument med vilka målmetoder anropas, under körning av testsviter och i produktion. Vi genererar en dataleverantör med hjälp av sammansättningen av registrerade argument, som tillhandahåller värden till ett parameteriserat enhetstest härlett från ett utvecklarskrivet enhetstest. Med den här strategin upptäcker vi utvecklarskrivna orakel som faktiskt är generaliserbara till ett större inmatningsutrymme.

Vi utvärderar de tre fallen av vårt föreslagna ramverk mot verkliga programvaruprojekt som körs med produktionsbelastning. Våra resultat visar att körtidsobservationer kan utnyttjas för att generera kompletta tester, med indata och orakel. De genererade testerna är representativa för användning i verkligheten och kan utöka utvecklarskrivna testsviter.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024. p. ix, 71
Series
TRITA-EECS-AVL ; 2024:87
Keywords
Test generation, Test oracles, Production observations, Testgenerering, Testorakel, Produktionsobservationer
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-356183 (URN)978-91-8106-109-3 (ISBN)
Public defence
2024-12-13, https://kth-se.zoom.us/j/64605922145, Kollegiesalen, Brinellvägen 6, Stockholm, 14:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241112

Available from: 2024-11-12 Created: 2024-11-12 Last updated: 2024-11-18Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Tiwari, DeepikaMonperrus, MartinBaudry, Benoit

Search in DiVA

By author/editor
Tiwari, DeepikaMonperrus, MartinBaudry, Benoit
By organisation
Software and Computer systems, SCSTheoretical Computer Science, TCS
In the same journal
IEEE Transactions on Software Engineering
Software Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 55 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf