Automatic Observability for Dockerized Java Applications
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. (ASSERT) ORCID iD: 0000-0002-7211-3894
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. (ASSERT) ORCID iD: 0000-0003-0293-2592
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Docker is a virtualization technique heavily used in industry to build cloud-based systems. In this context, observability is a challenge: due to scale and virtualization, it is hard for engineers to get timely and accurate information about the running state of a system in production. In this paper, we present a novel approach, called POBS, to automatically improve the observability of Dockerized Java applications. POBS is based on automated transformations of Docker configuration files. Our approach injects additional modules into the production application to provide better observability and to support fault injection. We evaluate POBS with open-source Java applications. Our key result is that 564/880 (64%) of Docker configuration files can be automatically augmented with better observability. This calls for more research on automated transformation techniques in the Docker ecosystem.
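
To make the idea of an automated transformation of a Docker configuration file concrete, here is a minimal sketch. It is not the POBS implementation, which this record does not describe in detail. It assumes the transformation amounts to inserting extra instructions after the last FROM instruction of a Dockerfile. The agent file name, its path, and the function name `augment_dockerfile` are hypothetical. `COPY`, `ENV`, the `-javaagent` flag and the `JAVA_TOOL_OPTIONS` variable are standard Docker and JVM features.

```python
import re

# Hypothetical instructions to inject; the agent name and path are illustrative only.
AGENT_LINES = [
    "COPY observability-agent.jar /opt/agent/observability-agent.jar",
    'ENV JAVA_TOOL_OPTIONS="-javaagent:/opt/agent/observability-agent.jar"',
]

# Matches a line that starts a FROM instruction (case-insensitive).
FROM_RE = re.compile(r"^\s*FROM\s+", re.IGNORECASE)


def augment_dockerfile(text: str) -> str:
    """Insert AGENT_LINES right after the last FROM instruction (the final build stage).

    Simplification: a FROM instruction split over several lines with a
    backslash continuation is not handled.
    """
    lines = text.splitlines()
    from_indexes = [i for i, line in enumerate(lines) if FROM_RE.match(line)]
    if not from_indexes:
        raise ValueError("no FROM instruction found")
    insert_at = from_indexes[-1] + 1
    return "\n".join(lines[:insert_at] + AGENT_LINES + lines[insert_at:]) + "\n"


if __name__ == "__main__":
    original = 'FROM openjdk:11\nCOPY app.jar /app.jar\nCMD ["java", "-jar", "/app.jar"]\n'
    print(augment_dockerfile(original))
```

Under these assumptions, a Dockerfile without a FROM instruction is rejected rather than silently modified. That would be one plausible reason why only part of a corpus of configuration files can be augmented automatically.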

Keywords [en]
observability, fault injection, dynamic analysis, software resilience, Docker
National Category
Software Engineering
Identifiers
URN: urn:nbn:se:kth:diva-275717
OAI: oai:DiVA.org:kth-275717
DiVA, id: diva2:1437439
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210113

Available from: 2020-06-09 Created: 2020-06-09 Last updated: 2022-11-02 Bibliographically approved
In thesis
1. Application-level Chaos Engineering
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

With the development of software techniques, software systems nowadays are becoming highly complex. In order to keep such systems as reliable as possible, developers need to design various error-handling mechanisms. Considering that the error-handling code needs to work properly in production, it should not only be tested offline but also verified in production after deploying the system. Chaos engineering is a technique that assesses a software system's error-handling mechanisms in production directly. In order to apply chaos engineering, developers first monitor the target system and identify its steady state. Then specific failures are injected in a controlled manner so that the system's error-handling code is triggered and analyzed. By comparing the observed behavior during a chaos engineering experiment with the steady state, developers confirm whether the designed error-handling mechanisms work as expected.
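
The comparison between the steady state and the behavior observed during an experiment can be sketched as follows. This is an illustrative simplification and not a method taken from the thesis. It assumes the steady state is summarized by the mean of a single numeric metric, and that a relative tolerance decides whether the system stayed within it. The function name, the metric, and the 10% default tolerance are all hypothetical.

```python
from statistics import mean


def within_steady_state(baseline, experiment, tolerance=0.10):
    """Return True if the experiment mean deviates from the baseline mean
    by at most `tolerance`, expressed as a fraction of the baseline mean."""
    base = mean(baseline)
    if base == 0:
        raise ValueError("baseline mean must be non-zero for a relative comparison")
    return abs(mean(experiment) - base) / abs(base) <= tolerance


if __name__ == "__main__":
    # Hypothetical response times in milliseconds.
    steady = [100, 102, 98]
    during_fault = [105, 103, 104]
    print(within_steady_state(steady, during_fault))
```

A real experiment would likely track several metrics and use a more robust statistic than the mean. The sketch only shows the shape of the check.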

In the field of chaos engineering, there still exist technical challenges that affect the effectiveness of the approach. This thesis makes contributions to the following three open challenges in chaos engineering.

First of all, as chaos engineering experiments are done in production, it is important to improve the efficiency of these experiments. In order to reduce unrealistic experiments, we propose a new approach that synthesizes chaos engineering fault models using the naturally happening errors in production.

Second, in order to analyze a system's steady state and detect its abnormal behavior during chaos engineering experiments, sufficient observability is the key. We propose a multi-layer observability improvement solution for Dockerized Java applications. With the help of our solution, developers are able to improve an application's observability at the operating system level, the runtime environment level, and the application level, with limited effort.

Last, chaos engineering should be helpful to locate actual places for resilience improvements. We propose three fault injection approaches that apply chaos engineering at the application level to take domain-specific knowledge into consideration.

Abstract [sv] (translated to English)

Modern software systems are becoming increasingly complex as society's need for digitalization grows. To maintain reliability despite this growing complexity, developers introduce various error-handling mechanisms, which are evaluated through rigorous testing and verification in both test and production environments to ensure a working system. To evaluate a system in production, chaos engineering can be applied by first monitoring the system to identify its steady state. Specific, controllable failures can then be injected to activate these error-handling mechanisms, which produce output. This output is captured and compared with the system's behavior in its steady state in order to verify whether the mechanisms work as expected.

In the field of chaos engineering, there are still technical challenges that affect its effectiveness. This thesis contributes to the following three open challenges in chaos engineering.

First of all, since chaos engineering experiments are done in production, it is important to improve the efficiency of these experiments. To reduce unrealistic experiments, we propose a new approach that synthesizes chaos engineering fault models using the errors that occur naturally in production.

Second, to analyze a system's steady state and detect its abnormal behavior during chaos engineering experiments, sufficient observability is the key. We propose a multi-layer observability improvement solution for Dockerized Java applications. With the help of our solution, developers can improve an application's observability at the operating system level, the runtime environment level, and the application level, with limited effort.

Finally, chaos engineering should help locate the actual places where resilience can be improved. We propose three fault injection approaches that apply chaos engineering at the application level to take domain-specific knowledge into consideration.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. vi, 71
Series
TRITA-EECS-AVL ; 2022:57
Keywords
fault injection, dynamic analysis, software resilience, chaos engineering
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-320638 (URN)
978-91-8040-347-4 (ISBN)
Public defence
2022-11-29, Zoom: https://kth-se.zoom.us/j/61717169026, F3, Lindstedtsvägen 26 & 28, Stockholm, 09:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20221102

Available from: 2022-11-02 Created: 2022-11-02 Last updated: 2025-10-30 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

arXiv

Authority records

Zhang, Long; Tiwari, Deepika
