Augmenting Test Oracles with Production Observations
Tiwari, Deepika
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0003-0293-2592
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Software testing is the process of verifying that a software system behaves as it is intended to behave. Significant resources are invested in creating and maintaining strong test suites to ensure software quality. However, in-house tests seldom reflect all the scenarios that may occur as a software system executes in production environments. The literature on the automated generation of tests proposes valuable techniques that assist developers with their testing activities. Yet the gap between tested behaviors and field behaviors remains largely overlooked. Consequently, the behaviors relevant for end users are not reflected in the test suite, and the faults that may surface for end users in the field may remain undetected by developer-written or automatically generated tests.

This thesis proposes a novel framework for using production observations, made as a system executes in the field, in order to generate tests. The generated tests include test inputs that are sourced from the field, and oracles that verify behaviors exhibited by the system in response to these inputs. We instantiate our framework in three distinct ways.

First, for a target project, we focus on methods that are inadequately tested by the developer-written test suite. At runtime, we capture objects that are associated with the invocations of these methods. The captured objects are used to generate tests that recreate the observed production state and contain oracles that specify the expected behavior. Our evaluation demonstrates that this strategy results in improved test quality for the target project.
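To make this first instantiation concrete, here is a minimal sketch of what such a generated test could look like, assuming JUnit 5; the PriceCalculator class, its state, and the captured values are hypothetical stand-ins for objects serialized at a production invocation, not examples from the thesis.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import org.junit.jupiter.api.Test;

    class PriceCalculatorGeneratedTest {

        // Hypothetical production class whose method is inadequately tested.
        static class PriceCalculator {
            private final double discountRate;
            PriceCalculator(double discountRate) { this.discountRate = discountRate; }
            double applyDiscount(double price) { return price * (1.0 - discountRate); }
        }

        @Test
        void applyDiscount_reproducesProductionObservation() {
            // Recreate the receiver state captured at the production invocation.
            PriceCalculator calculator = new PriceCalculator(0.25);
            // Invoke the target method with the captured argument.
            double actual = calculator.applyDiscount(400.0);
            // Oracle: the return value that was observed in production.
            assertEquals(300.0, actual, 1e-9);
        }
    }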

With the second instantiation of our framework, we observe the invocations of target methods at runtime, as well as the invocations of methods called within the target methods. Using the objects associated with these invocations, we generate tests that use mocks, stubs, and mock-based oracles. We find that the generated oracles verify distinct aspects of the behaviors observed in the field, and also detect regressions within the system.
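As an illustration of this second instantiation, here is a minimal sketch of a generated mock-based test, assuming JUnit 5 and Mockito; the OrderService and CurrencyClient types and the stubbed values are invented for illustration, whereas the thesis derives them from objects captured at runtime.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.verify;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Test;

    class OrderServiceGeneratedMockTest {

        // Hypothetical method called within the target method, replaced by a mock.
        interface CurrencyClient {
            double rate(String from, String to);
        }

        // Hypothetical unit under test.
        static class OrderService {
            private final CurrencyClient client;
            OrderService(CurrencyClient client) { this.client = client; }
            double totalInEur(double totalSek) { return totalSek * client.rate("SEK", "EUR"); }
        }

        @Test
        void totalInEur_mimicsProductionInteraction() {
            CurrencyClient client = mock(CurrencyClient.class);
            // Stub: return the value observed for this nested call in the field.
            when(client.rate("SEK", "EUR")).thenReturn(0.088);

            double actual = new OrderService(client).totalInEur(1000.0);

            // Output oracle: the return value captured for the target method.
            assertEquals(88.0, actual, 1e-9);
            // Mock-based oracle: the nested call occurred with the observed arguments.
            verify(client).rate("SEK", "EUR");
        }
    }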

Third, we adapt our framework to capture the arguments with which target methods are invoked, during the execution of the test suite and in the field. We generate a data provider using the union of captured arguments, which supplies values to a parameterized unit test that is derived from a developer-written unit test. Using this strategy, we discover developer-written oracles that are actually generalizable to a larger input space.
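A minimal sketch of this third instantiation is shown below, assuming JUnit 5 parameterized tests; the slugify method, the assertion, and the captured inputs are hypothetical, chosen only to illustrate a data provider built from the union of test-suite and field arguments.

    import static org.junit.jupiter.api.Assertions.assertFalse;

    import java.util.stream.Stream;
    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.MethodSource;

    class SlugifyParameterizedTest {

        // Hypothetical target method, originally exercised by one developer-written test.
        static String slugify(String title) {
            return title.trim().toLowerCase().replaceAll("[^a-z0-9]+", "-");
        }

        // Data provider: union of the argument used in the developer-written test
        // and arguments captured while the system executed in the field.
        static Stream<String> capturedTitles() {
            return Stream.of(
                "Hello World",            // input from the original unit test
                "  Test Oracles  ",       // inputs observed in the field
                "GraphQL & REST APIs",
                "2024-12-13 defence");
        }

        @ParameterizedTest
        @MethodSource("capturedTitles")
        void slug_containsNoSpaces(String title) {
            // Developer-written oracle that generalizes to the larger input space.
            assertFalse(slugify(title).contains(" "));
        }
    }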

We evaluate the three instances of our proposed framework against real-world software projects exercised with production workloads. Our findings demonstrate that runtime observations can be harnessed to generate complete tests, with inputs and oracles. The generated tests are representative of real-world usage, and can augment developer-written test suites.

Abstract [sv]

Programvarutestning är processen för att verifiera att ett mjukvarusystem fungerar som det är tänkt att fungera. Betydande resurser investeras i att skapa och underhålla starka testsviter för att säkerställa mjukvarukvalitet. Interna tester återspeglar dock sällan alla scenarier som kan uppstå när ett mjukvarusystem körs i produktionsmiljöer. Litteraturen om automatiserad testgenerering föreslår värdefulla tekniker för att hjälpa utvecklare i deras testaktiviteter. Ändå förbises gapet mellan testade beteenden och beteenden i produktionsmiljöer till stor del. Följaktligen återspeglas inte beteenden som är relevanta för slutanvändare i testsviten, och de fel som kan visas för slutanvändare i reella situationer kan förbli oupptäckta av utvecklarskrivna eller automatiskt genererade tester.

Denna avhandling föreslår ett nytt ramverk för att använda produktionsobservationer, gjorda när ett system exekverar i produktionsmiljö, för att generera tester. De genererade testen inkluderar testindata som kommer från reella användare och orakel som verifierar beteenden som uppvisas av systemet som svar på dessa indata. Vi instansierar vårt ramverk på tre olika sätt.

Först, för ett målprojekt, fokuserar vi på metoder som är otillräckligt testade av den utvecklarskrivna testsviten. Vid körning registrerar vi objekt som är associerade med anropen till dessa metoder. De registrerade objekten används för att generera tester som återskapar det observerade produktionstillståndet och innehåller orakel som anger det förväntade beteendet. Vår utvärdering visar att denna strategi resulterar i förbättrad testkvalitet för målprojektet.

Med den andra instansieringen av vårt ramverk observerar vi anrop till målmetoder vid körning, såväl som anrop till metoder som anropas inom målmetoderna. Med hjälp av objekten som är associerade med dessa anrop genererar vi tester som använder mocks, stubs och mock-baserade orakel. Vi finner att de genererade oraklen verifierar distinkta aspekter av beteenden som observerats i produktionsmiljöer, och även upptäcker regressioner inom systemet.

För det tredje anpassar vi vårt ramverk för att registrera de argument med vilka målmetoder anropas, under körning av testsviter och i produktion. Vi genererar en dataleverantör med hjälp av sammansättningen av registrerade argument, som tillhandahåller värden till ett parameteriserat enhetstest härlett från ett utvecklarskrivet enhetstest. Med den här strategin upptäcker vi utvecklarskrivna orakel som faktiskt är generaliserbara till ett större inmatningsutrymme.

Vi utvärderar de tre fallen av vårt föreslagna ramverk mot verkliga programvaruprojekt som körs med produktionsbelastning. Våra resultat visar att körtidsobservationer kan utnyttjas för att generera kompletta tester, med indata och orakel. De genererade testerna är representativa för användning i verkligheten och kan utöka utvecklarskrivna testsviter.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2024, p. ix, 71
Series
TRITA-EECS-AVL ; 2024:87
Keywords [en]
Test generation, Test oracles, Production observations
Keywords [sv]
Testgenerering, Testorakel, Produktionsobservationer
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:kth:diva-356183; ISBN: 978-91-8106-109-3 (print); OAI: oai:DiVA.org:kth-356183; DiVA id: diva2:1912540
Public defence
2024-12-13, https://kth-se.zoom.us/j/64605922145, Kollegiesalen, Brinellvägen 6, Stockholm, 14:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20241112

Available from: 2024-11-12 Created: 2024-11-12 Last updated: 2024-11-18. Bibliographically approved
List of papers
1. Production Monitoring to Improve Test Suites
2021 (English) In: IEEE Transactions on Reliability, ISSN 0018-9529, E-ISSN 1558-1721, p. 1-17. Article in journal (Refereed) Published
Abstract [en]

In this article, we propose to use production executions to improve the quality of testing for certain methods of interest for developers. These methods can be methods that are not covered by the existing test suite or methods that are poorly tested. We devise an approach called pankti, which monitors applications as they execute in production and then automatically generates differential unit tests, as well as derived oracles, from the collected data. pankti's monitoring and generation focus on a single programming language, Java. We evaluate it on three real-world, open-source projects: a videoconferencing system, a PDF manipulation library, and an e-commerce application. We show that pankti is able to generate differential unit tests by monitoring target methods in production and that the generated tests improve the quality of the test suite of the application under consideration.
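The following is a minimal, hypothetical sketch of the differential-oracle idea described above, assuming JUnit 5 and the XStream library: the output produced by re-invoking the target method is serialized and compared against the serialized output captured in production. The InvoiceFormatter class and the values are invented for illustration, and XStream is used here only as one possible object serializer, not as a statement about pankti's implementation.

    import static org.junit.jupiter.api.Assertions.assertEquals;

    import com.thoughtworks.xstream.XStream;
    import org.junit.jupiter.api.Test;

    class InvoiceFormatterDifferentialTest {

        // Hypothetical production class whose target method is poorly tested.
        static class InvoiceFormatter {
            String format(int amountCents) {
                return String.format("%d.%02d SEK", amountCents / 100, amountCents % 100);
            }
        }

        @Test
        void format_matchesSerializedProductionOutput() {
            XStream xstream = new XStream();

            // Serialized return value captured at the production invocation
            // (a monitoring tool would read this from a file it wrote earlier).
            String capturedReturnXml = xstream.toXML("123.45 SEK");

            // Recreate the receiver and re-invoke the target method with the captured argument.
            Object actual = new InvoiceFormatter().format(12345);

            // Differential oracle: the serialized output equals the captured serialization.
            assertEquals(capturedReturnXml, xstream.toXML(actual));
        }
    }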

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Production monitoring, test generation, test improvement, test oracle, test quality
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-309112 (URN); 10.1109/tr.2021.3101318 (DOI); 000732651500001 (); 2-s2.0-85114718407 (Scopus ID)
Funder
Knut and Alice Wallenberg Foundation; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20220223

Available from: 2022-02-21 Created: 2022-02-21 Last updated: 2024-11-12. Bibliographically approved
2. Harvesting Production GraphQL Queries to Detect Schema Faults
2022 (English) In: 2022 IEEE Conference on Software Testing, Verification and Validation (ICST 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 365-376. Conference paper, Published paper (Refereed)
Abstract [en]

GraphQL is a new paradigm to design web APIs. Despite its growing popularity, there are few techniques to verify the implementation of a GraphQL API. We present a new testing approach based on GraphQL queries that are logged while users interact with an application in production. Our core motivation is that production queries capture real usages of the application, and are known to trigger behavior that may not be tested by developers. For each logged query, a test is generated to assert the validity of the GraphQL response with respect to the schema. We implement our approach in a tool called AutoGraphQL, and evaluate it on two real-world case studies that are diverse in their domain and technology stack: an open-source e-commerce application implemented in Python called Saleor, and an industrial case study, a PHP-based finance website called Frontapp. AutoGraphQL successfully generates test cases for the two applications. The generated tests cover 26.9% of the Saleor schema, including parts of the API not exercised by the original test suite, as well as 48.7% of the Frontapp schema, detecting 8 schema faults, thanks to production queries.
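A minimal sketch of such a generated test is given below, assuming JUnit 5 and Java's built-in HTTP client; the endpoint URL, the query, and the crude error check are hypothetical and only hint at the approach, whereas the actual tool asserts the validity of the response with respect to the GraphQL schema.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertFalse;

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import org.junit.jupiter.api.Test;

    class LoggedQueryReplayTest {

        // Hypothetical GraphQL endpoint of the application under test.
        private static final String ENDPOINT = "http://localhost:8000/graphql/";

        @Test
        void loggedProductsQuery_respondsWithoutErrors() throws Exception {
            // A query harvested from the production log.
            String body = "{\"query\": \"{ products(first: 5) { edges { node { name } } } }\"}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(ENDPOINT))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());

            assertEquals(200, response.statusCode());
            // Simplified oracle: no top-level GraphQL errors in the response.
            assertFalse(response.body().contains("\"errors\""));
        }
    }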

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Software Testing Verification and Validation, ISSN 2381-2834
Keywords
GraphQL, production monitoring, automated test generation, test oracle, API testing, schema
National Category
Information Studies; Other Computer and Information Science; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-318242 (URN); 10.1109/ICST53961.2022.00014 (DOI); 000850246600033 (); 2-s2.0-85129497249 (Scopus ID)
Conference
15th IEEE International Conference on Software Testing, Verification and Validation (ICST), April 4-13, 2022, online
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

Part of proceedings: ISBN 978-1-6654-6679-0, QC 20220920

Available from: 2022-09-20 Created: 2022-09-20 Last updated: 2024-11-12. Bibliographically approved
3. PROZE: Generating Parameterized Unit Tests Informed by Runtime Data
2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Typically, a conventional unit test (CUT) verifies the expected behavior of the unit under test through one specific input/output pair. In contrast, a parameterized unit test (PUT) receives a set of inputs as arguments, and contains assertions that are expected to hold true for all these inputs. PUTs increase test quality, as they assess correctness on a broad scope of inputs and behaviors. However, defining assertions over a set of inputs is a hard task for developers, which limits the adoption of PUTs in practice. In this paper, we address the problem of finding oracles for PUTs that hold over multiple inputs. We design a system called PROZE that generates PUTs by identifying developer-written assertions that are valid for more than one test input. We implement our approach as a two-step methodology: first, at runtime, we collect inputs for a target method that is invoked within a CUT; next, we isolate the valid assertions of the CUT to be used within a PUT. We evaluate our approach against 5 real-world Java modules, and collect valid inputs for 128 target methods from test and field executions. We generate 2,287 PUTs, which invoke the target methods with a significantly larger number of test inputs than the original CUTs. We execute the PUTs and find 217 that provably demonstrate that their oracles hold for a larger range of inputs than envisioned by the developers. From a testing theory perspective, our results show that developers express assertions within CUTs that are general enough to hold beyond one particular input.
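The sketch below, with a hypothetical truncate method and invented inputs, illustrates the CUT-to-PUT derivation under JUnit 5: the developer-written assertion that does not hard-code a single expected value is kept, and the test input becomes a parameter supplied by a provider built from inputs collected during test and field executions.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import static org.junit.jupiter.api.Assertions.assertTrue;

    import java.util.stream.Stream;
    import org.junit.jupiter.api.Test;
    import org.junit.jupiter.params.ParameterizedTest;
    import org.junit.jupiter.params.provider.MethodSource;

    class TruncateTest {

        // Hypothetical target method invoked within the conventional unit test.
        static String truncate(String text, int max) {
            return text.length() <= max ? text : text.substring(0, max);
        }

        // Original CUT: one specific input/output pair.
        @Test
        void truncate_shortensLongText() {
            String result = truncate("production observations", 10);
            assertEquals("production", result);   // holds only for this input
            assertTrue(result.length() <= 10);    // generalizable: reused in the PUT
        }

        // Derived PUT: the generalizable assertion, checked over all collected inputs.
        @ParameterizedTest
        @MethodSource("collectedInputs")
        void truncate_neverExceedsLimit(String text) {
            assertTrue(truncate(text, 10).length() <= 10);
        }

        // Union of inputs seen in the test suite and in field executions.
        static Stream<String> collectedInputs() {
            return Stream.of("production observations", "oracle", "", "mock-based oracles");
        }
    }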

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-356174 (URN); 10.48550/arXiv.2407.00768 (DOI)
Conference
IEEE International Conference on Source Code Analysis and Manipulation (SCAM), October 7-8, 2024, Flagstaff, Arizona, USA
Note

QC 20241111

Available from: 2024-11-09 Created: 2024-11-09 Last updated: 2024-11-12. Bibliographically approved
4. Mimicking Production Behavior with Generated Mocks
2024 (English) In: IEEE Transactions on Software Engineering, ISSN 0098-5589, E-ISSN 1939-3520, p. 1-26. Article in journal (Refereed) Epub ahead of print
Abstract [en]

Mocking allows testing program units in isolation. A developer who writes tests with mocks faces two challenges: designing realistic interactions between a unit and its environment, and understanding the expected impact of these interactions on the behavior of the unit. In this paper, we propose to monitor an application in production to generate tests that mimic realistic execution scenarios through mocks. Our approach operates in three phases. First, we instrument a set of target methods for which we want to generate tests, as well as the methods that they invoke, which we refer to as mockable method calls. Second, in production, we collect data about the context in which target methods are invoked, as well as the parameters and the returned value for each mockable method call. Third, offline, we analyze the production data to generate test cases with realistic inputs and mock interactions. The approach is automated and implemented in an open-source tool called RICK. We evaluate our approach with three real-world, open-source Java applications. RICK monitors the invocation of 128 methods in production across the three applications and captures their behavior. Based on this captured data, RICK generates test cases that include realistic initial states and test inputs, as well as mocks and stubs. All the generated test cases are executable, and 52.4% of them successfully mimic the complete execution context of the target methods observed in production. The mock-based oracles are also effective at detecting regressions within the target methods, complementing each other in their fault-finding ability. We interview 5 developers from industry who confirm the relevance of using production observations to design mocks and stubs. Our experimental findings clearly demonstrate the feasibility and added value of generating mocks from production interactions.
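As a complement to output oracles, a minimal sketch of a mock-based oracle on the interaction pattern is shown below, assuming JUnit 5 and Mockito: it checks how often, in what order, and with which arguments the target method called its mockable dependency, as observed in production. The Storage and ReportGenerator types and the captured values are invented for illustration, not taken from the article.

    import static org.mockito.ArgumentMatchers.anyString;
    import static org.mockito.Mockito.inOrder;
    import static org.mockito.Mockito.mock;
    import static org.mockito.Mockito.when;

    import org.junit.jupiter.api.Test;
    import org.mockito.InOrder;

    class ReportGeneratorInteractionTest {

        // Hypothetical mockable dependency of the target method.
        interface Storage {
            String read(String key);
            void write(String key, String value);
        }

        // Hypothetical unit under test.
        static class ReportGenerator {
            private final Storage storage;
            ReportGenerator(Storage storage) { this.storage = storage; }
            void refresh(String key) {
                String current = storage.read(key);
                storage.write(key, current.toUpperCase());
            }
        }

        @Test
        void refresh_replaysProductionInteractionPattern() {
            Storage storage = mock(Storage.class);
            // Stub the nested call with the value observed in production.
            when(storage.read(anyString())).thenReturn("draft");

            new ReportGenerator(storage).refresh("report-42");

            // Interaction oracle derived from the captured invocation:
            // exactly one read followed by one write, with the observed arguments.
            InOrder order = inOrder(storage);
            order.verify(storage).read("report-42");
            order.verify(storage).write("report-42", "DRAFT");
            order.verifyNoMoreInteractions();
        }
    }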

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Software Engineering
Identifiers
urn:nbn:se:kth:diva-356173 (URN); 10.1109/tse.2024.3458448 (DOI); 2-s2.0-85204006940 (Scopus ID)
Note

QC 20241111

Available from: 2024-11-09 Created: 2024-11-09 Last updated: 2024-11-12. Bibliographically approved

Open Access in DiVA

DeepikaTiwari_PhDThesis (4574 kB)
File information
File name: FULLTEXT01.pdf
File size: 4574 kB
Checksum (SHA-512): 52b5c1b5e3fd08ba5b398336b62990c8892e14d3e1c96e2947b771a370b6cadb5af15230de280b3d642b3c2b1b4f1ab2cdaac3ef6466f7d8a598914105d93107
Type: fulltext
Mimetype: application/pdf

Authority records

Tiwari, Deepika
