Abstracting Failures Away From Stateful Dataflow Systems
KTH, School of Electrical Engineering and Computer Science (EECS).
2025 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title: Abstrahera Fel Från Tillståndfull Dataflöde (Swedish)
Abstract [en]

Systems distributed across several computers are essential to modern infrastructure, and their reliability depends on the correctness of the failure-handling protocols run by the constituent computers. Correctness in such systems is often understood as failure transparency, a property that allows a system to be used as if no failures occur in it; in other words, it states that there is a high-level model of the system from which the failures are abstracted away. This work proves that failure transparency is provided by the Asynchronous Barrier Snapshotting protocol used in Apache Flink, a prominent distributed stateful dataflow system. The protocol is formalized in operational semantics for the first time in this thesis. As no prior definition of failure transparency is suitable for this formalization, a novel definition is proposed, applicable to systems expressed in small-step operational semantics with explicit failure-related rules. The work demonstrates how failure transparency can be proven by reasoning about each execution as a whole, presenting a proof technique convenient for proofs about checkpoint-recovery protocols. The results are a first step towards a verified stateful dataflow programming stack.

Abstract [sv]

System fördelade över flera datorer är väsentliga för den moderna infrastrukturen, och deras tillförlitlighet är baserad på korrektheten i protokollen som hanterar fel i de ingående datorerna. Riktigheten förstås ofta som failure transparency, en egenskap som gör det möjligt att använda ett system som om inga fel uppstår i det; med andra ord står det att det finns en högnivåmodell av systemet, från vilken misslyckandena abstraheras bort. Detta arbete bevisar att feltransparens tillhandahålls av protokollet Asynchronous Barrier Snapshotting som används i Apache Flink, en framträdande representant för distribuerade system med stateful dataflöde. Den första operativa semantiken för protokollet presenteras; dessutom, eftersom det inte fanns någon definition av feltransparens för modeller i småstegsoperativ semantik, föreslås en ny definition, tillämplig på system uttryckta i småstegsoperativa semantik med explicita felrelaterade regler. Beviset visar hur misslyckandetransparens kan bevisas genom att resonera om varje exekvering som helhet, vilket gör det praktiskt i bevis om protokoll för återställning av checkpoints. Resultaten är ett första steg mot en verifierad stack för stateful dataflödesprogrammering.

Place, publisher, year, edition, pages
2025. p. 74
Series
TRITA-EECS-EX ; 2025:82
Keywords [en]
Failure Transparency, Stateful Dataflow, Operational Semantics, Checkpoint Recovery
Keywords [sv]
Feltransparens, Tillståndfull Dataflöde, Operationell Semantik, Kontrollpunktsåterställning
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-363595
OAI: oai:DiVA.org:kth-363595
DiVA, id: diva2:1959284
Available from: 2025-06-02. Created: 2025-05-20. Last updated: 2025-06-02. Bibliographically approved.

Open Access in DiVA

fulltext (853 kB), 19 downloads
File information
File name: FULLTEXT01.pdf
File size: 853 kB
Checksum (SHA-512): 8bb7520292d2a9e4ab2f0a061237f1c4fb97c95d71f1d203d44089eed6a3b4512444a3c4abacf431e5b84d194f9d3608b0e38f9d930fef70fd3e58f651f0b89a
Type: fulltext
Mimetype: application/pdf
