kth.se: Publications KTH
Publications (10 of 34)
Spenger, J., Carbone, P. & Haller, P. (2026). Failure-Transparent Actors. In: Concurrent Programming, Open Systems and Formal Methods (pp. 81-113). Springer Nature, LNCS 16120
2026 (English). In: Concurrent Programming, Open Systems and Formal Methods, Springer Nature, 2026, Vol. LNCS 16120, p. 81-113. Chapter in book (Refereed)
Abstract [en]

Failures in a distributed system are not only possible but expected and notoriously difficult to handle. For this reason, it is imperative to provide system-level means for building failure-transparent services, i.e., services which transparently recover from failures, effectively masking them. Towards this, this paper presents a syntax and semantics for compositionally failure-transparent actors. It is structured around three kinds of failure-transparent compositions: composition within a system; between systems; and application-level composition. For the first two, we prove that the semantics is failure transparent by simulation using prophecy variables. For the latter, we discuss its implementation; additionally, we discuss the necessity for leaking system-level failures to the application-level. The presented material provides low-level building blocks for failure-transparent services, thus greatly simplifying their construction.
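As a generic illustration of what "masking" a failure means (not the paper's syntax or semantics, and without its prophecy-variable proofs), the Python sketch below assumes deterministic actor behaviour, a reliable message log, and checkpoints that survive crashes. All names (`ReplayableActor`, `counter`) are hypothetical. After a simulated crash, the actor restores its last checkpoint and replays the logged messages, so an observer sees the same state as in a failure-free run.

```python
import copy
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class ReplayableActor:
    """Hypothetical actor whose state is rebuilt from a checkpoint plus a message log."""

    behavior: Callable[[Any, Any], Any]  # deterministic (state, message) -> new state
    state: Any
    checkpoint: Any = None
    log: List[Any] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The initial state serves as the first checkpoint.
        self.checkpoint = copy.deepcopy(self.state)

    def send(self, message: Any) -> None:
        # Log first, so the message can be replayed if the actor crashes.
        self.log.append(message)
        self.state = self.behavior(self.state, message)

    def take_checkpoint(self) -> None:
        # Persist the current state; the log it already covers can be dropped.
        self.checkpoint = copy.deepcopy(self.state)
        self.log.clear()

    def crash_and_recover(self) -> None:
        self.state = None  # simulated failure: volatile state is lost
        state = copy.deepcopy(self.checkpoint)
        for message in self.log:  # deterministic replay masks the failure
            state = self.behavior(state, message)
        self.state = state


def counter(state: int, message: int) -> int:
    return state + message


actor = ReplayableActor(behavior=counter, state=0)
actor.send(5)
actor.take_checkpoint()
actor.send(3)
actor.crash_and_recover()
print(actor.state)  # 8, the same as in a failure-free run
```

The sketch only works because replay is deterministic; a nondeterministic behaviour would need its choices logged as well.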

Place, publisher, year, edition, pages
Springer Nature, 2026
Series
Lecture Notes in Computer Science ; 16120
Keywords
Actor model, Failure transparency, Operational semantics, Service composition
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-372178 (URN); 10.1007/978-3-032-05291-9_4 (DOI); 2-s2.0-105017373606 (Scopus ID)
Note

Part of ISBN 9783032052902, 9783032052919

QC 20251028

Available from: 2025-10-28; Created: 2025-10-28; Last updated: 2025-11-19; Bibliographically approved
Mei, Y., Xia, R., Lan, Z., Hu, K., Huang, L., Carbone, P., . . . Wang, F. (2025). Disaggregated State Management in Apache Flink® 2.0. In: Proceedings of the VLDB Endowment. Paper presented at 51st International Conference on Very Large Data Bases, VLDB 2025, London, United Kingdom of Great Britain, Sep 1 2025 - Sep 5 2025 (pp. 4846-4859). Association for Computing Machinery (ACM), 18(12)
2025 (English). In: Proceedings of the VLDB Endowment, Association for Computing Machinery (ACM), 2025, Vol. 18, no 12, p. 4846-4859. Conference paper, Published paper (Other academic)
Abstract [en]

We present Apache Flink 2.0, an evolution of the popular stream processing system's architecture that decouples computation from state management. Flink 2.0 relies on a remote distributed file system (DFS) for primary state storage and uses local disks as a secondary cache, with state updates streamed continuously and directly to the DFS. To address the latency implications of remote storage, Flink 2.0 incorporates an asynchronous runtime execution model. Furthermore, Flink 2.0 introduces ForSt, a novel state store featuring a unified file system that enables faster and lightweight checkpointing, recovery, and reconfiguration with minimal intrusion to the existing Flink runtime architecture. Using a comprehensive set of Nexmark benchmarks and a large-scale stateful production workload, we evaluate Flink 2.0's large-state processing, checkpointing, and recovery mechanisms. Our results show significant performance improvements and reduced resource utilization compared to the baseline Flink 1.20 implementation. Specifically, we observe up to 94% reduction in checkpoint duration, up to 49× faster recovery after failures or a rescaling operation, and up to 50% cost savings.
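The storage split described above (remote DFS as primary storage, local disk as a secondary cache, state updates written directly to the DFS) can be pictured with a toy keyed-state wrapper. This is a hypothetical Python sketch, not ForSt or any Flink API: a plain dict stands in for the DFS, an in-memory LRU map stands in for the local disk cache, and the asynchronous execution model is left out entirely. The class name `DisaggregatedState` is an assumption.

```python
from collections import OrderedDict
from typing import Dict, Optional


class DisaggregatedState:
    """Hypothetical keyed state: remote store is primary, local LRU cache is secondary."""

    def __init__(self, remote: Dict[str, int], cache_capacity: int) -> None:
        self.remote = remote  # stands in for the remote DFS
        self.cache: "OrderedDict[str, int]" = OrderedDict()  # stands in for local disk
        self.capacity = cache_capacity
        self.remote_reads = 0

    def _cache_put(self, key: str, value: int) -> None:
        self.cache[key] = value
        self.cache.move_to_end(key)
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry

    def put(self, key: str, value: int) -> None:
        # Updates go straight to primary storage, then refresh the cache.
        self.remote[key] = value
        self._cache_put(key, value)

    def get(self, key: str) -> Optional[int]:
        if key in self.cache:
            self.cache.move_to_end(key)
            return self.cache[key]
        self.remote_reads += 1  # cache miss: fall back to the remote store
        value = self.remote.get(key)
        if value is not None:
            self._cache_put(key, value)
        return value


dfs: Dict[str, int] = {}
state = DisaggregatedState(dfs, cache_capacity=2)
state.put("a", 1)
state.put("b", 2)
state.put("c", 3)  # "a" is evicted from the cache but still lives in the DFS
print(state.get("a"), state.remote_reads)  # 1 1

# A fresh instance (e.g. after recovery) starts with a cold cache and reads from the DFS.
restored = DisaggregatedState(dfs, cache_capacity=2)
print(restored.get("c"))  # 3
```

In this toy model a restored instance needs no bulk state download before serving reads; it simply starts with a cold cache.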

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
National Category
Computer Sciences Computer Systems
Identifiers
urn:nbn:se:kth:diva-371714 (URN); 10.14778/3750601.3750609 (DOI); 2-s2.0-105016612021 (Scopus ID)
Conference
51st International Conference on Very Large Data Bases, VLDB 2025, London, United Kingdom of Great Britain, Sep 1 2025 - Sep 5 2025
Note

QC 20251023

Available from: 2025-10-23; Created: 2025-10-23; Last updated: 2025-10-23; Bibliographically approved
Gulisano, V., Papatriantafilou, M., Carbone, P., Cardellini, V. & Mencagli, G. (2025). Foreword. In: Debs 2025 Proceedings of the 19th ACM International Conference on Distributed and Event Based Systems. Paper presented at DEBS '25: The 19th ACM International Conference on Distributed and Event-based Systems, Gothenburg, Sweden, June 10 - 13, 2025 (pp. vi-vii). Association for Computing Machinery, Inc
2025 (English). In: Debs 2025 Proceedings of the 19th ACM International Conference on Distributed and Event Based Systems, Association for Computing Machinery, Inc, 2025, p. vi-vii. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2025
National Category
Information Systems
Identifiers
urn:nbn:se:kth:diva-369278 (URN); 2-s2.0-105013053467 (Scopus ID)
Conference
DEBS '25: The 19th ACM International Conference on Distributed and Event-based Systems, Gothenburg Sweden, June 10 - 13, 2025
Note

Part of ISBN 9798400713323

QC 20250902

Available from: 2025-09-02; Created: 2025-09-02; Last updated: 2025-09-02; Bibliographically approved
Spenger, J., Krafeld, K., van Gemeren, R., Haller, P. & Carbone, P. (2025). Holon Streaming: Global Aggregations with Windowed CRDTs.
2025 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Scaling global aggregations is a challenge for exactly-once stream processing systems. Current systems implement these either by computing the aggregation in a single task instance, or by static aggregation trees, both of which limit scalability and may become a bottleneck. Moreover, the end-to-end latency is determined by the slowest path in the tree, and failures and reconfiguration cause large latency spikes due to the centralized coordination. To address these issues, we present Holon Streaming, an exactly-once stream processing system for global aggregations. Its deterministic programming model uses windowed conflict-free replicated data types (Windowed CRDTs), a novel abstraction for shared replicated state. Windowed CRDTs make computing global aggregations scalable. Furthermore, their guarantees such as determinism and convergence enable the design of efficient failure recovery algorithms by decentralized coordination. Our evaluation shows 5x lower latency and 2x higher throughput than an existing stream processing system on global aggregation workloads, with an 11x latency reduction under failure scenarios. The paper demonstrates the effectiveness of decentralized coordination with determinism, and the utility of Windowed CRDTs for global aggregations.
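Windowed CRDTs are the paper's own abstraction and are not reproduced here. As background on why CRDT convergence removes the need for a single aggregation task, the Python sketch below keeps a standard state-based grow-only counter (G-Counter) per window, assuming integer window ids and one replica per task instance; the class name `WindowedGCounter` is hypothetical. Merging takes the element-wise maximum, which is commutative, associative, and idempotent, so replicas agree on each window's total regardless of merge order or repetition.

```python
from collections import defaultdict
from typing import Dict


class WindowedGCounter:
    """Hypothetical replica holding one state-based G-Counter per window id."""

    def __init__(self, replica_id: str) -> None:
        self.replica_id = replica_id
        # window id -> replica id -> amount contributed by that replica
        self.windows: Dict[int, Dict[str, int]] = defaultdict(dict)

    def add(self, window: int, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        slots = self.windows[window]
        slots[self.replica_id] = slots.get(self.replica_id, 0) + amount

    def merge(self, other: "WindowedGCounter") -> None:
        # Element-wise max is commutative, associative and idempotent.
        for window, slots in other.windows.items():
            mine = self.windows[window]
            for replica, count in slots.items():
                mine[replica] = max(mine.get(replica, 0), count)

    def value(self, window: int) -> int:
        return sum(self.windows.get(window, {}).values())


a, b = WindowedGCounter("a"), WindowedGCounter("b")
a.add(window=0, amount=3)
b.add(window=0, amount=4)
b.add(window=1, amount=2)
a.merge(b)
b.merge(a)
b.merge(a)  # merging again changes nothing
print(a.value(0), b.value(0), a.value(1))  # 7 7 2
```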

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-372428 (URN); 10.48550/arXiv.2510.25757 (DOI)
Note

QC 20251112

Available from: 2025-11-11; Created: 2025-11-11; Last updated: 2025-11-19; Bibliographically approved
Spenger, J., Carbone, P. & Haller, P. (2024). A Survey of Actor-Like Programming Models for Serverless Computing. In: Frank de Boer, Ferruccio Damiani, Reiner Hähnle, Einar Broch Johnsen, Eduard Kamburjan (Eds.), Active Object Languages: Current Research Trends (pp. 123-146). Springer Nature
2024 (English). In: Active Object Languages: Current Research Trends / [ed] Frank de Boer, Ferruccio Damiani, Reiner Hähnle, Einar Broch Johnsen, Eduard Kamburjan, Springer Nature, 2024, p. 123-146. Chapter in book (Refereed)
Abstract [en]

Serverless computing promises to significantly simplify cloud computing by providing Functions-as-a-Service (FaaS), where invocations of functions, triggered by events, are automatically scheduled for execution on compute nodes. Notably, the serverless computing model does not require the manual provisioning of virtual machines; instead, FaaS enables load-based billing and auto-scaling according to the workload, reducing costs and making scheduling more efficient. While early serverless programming models only supported stateless functions and severely restricted program composition, recently proposed systems offer greater flexibility by adopting ideas from actor and dataflow programming. This paper presents a survey of actor-like programming abstractions for stateful serverless computing, and provides a characterization of their properties and highlights their origin.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14360
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-366418 (URN); 10.1007/978-3-031-51060-1_5 (DOI); 001215137600006 (ISI); 2-s2.0-85184287969 (Scopus ID)
Funder
Swedish Foundation for Strategic Research, BD15-0006; EU, Horizon Europe, 101092711
Note

Part of ISBN 9783031510595, 9783031510601

QC 20250922

Available from: 2025-07-07; Created: 2025-07-07; Last updated: 2025-11-11; Bibliographically approved
Fragkoulis, M., Carbone, P., Kalavri, V. & Katsifodimos, A. (2024). A survey on the evolution of stream processing systems. The VLDB journal, 33(2), 507-541
2024 (English). In: The VLDB journal, ISSN 1066-8888, E-ISSN 0949-877X, Vol. 33, no 2, p. 507-541. Article in journal (Refereed), Published
Abstract [en]

Stream processing has been an active research field for more than 20 years, but it is now witnessing its prime time due to recent successful efforts by the research community and numerous worldwide open-source communities. This survey provides a comprehensive overview of fundamental aspects of stream processing systems and their evolution in the functional areas of out-of-order data management, state management, fault tolerance, high availability, load management, elasticity, and reconfiguration. We review noteworthy past research findings, outline the similarities and differences between the first (’00–’10) and second (’11–’23) generation of stream processing systems, and discuss future trends and open problems.
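Out-of-order data management is commonly handled with event-time watermarks. The Python sketch below is one simple instance, not taken from the survey: it counts events in tumbling windows, derives the watermark with a bounded-out-of-orderness heuristic (largest timestamp seen minus a fixed delay), fires a window once the watermark passes its end, and sets aside events whose window has already fired. Integer timestamps and the class name `TumblingWindowCounter` are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class TumblingWindowCounter:
    """Counts events per event-time tumbling window and fires windows on watermarks."""

    def __init__(self, size: int, max_out_of_orderness: int) -> None:
        self.size = size
        self.delay = max_out_of_orderness
        self.watermark: float = float("-inf")  # events below this are assumed complete
        self.pending: Dict[int, int] = defaultdict(int)  # window start -> count
        self.late: List[int] = []

    def on_event(self, timestamp: int) -> List[Tuple[int, int]]:
        start = timestamp - timestamp % self.size
        if start + self.size <= self.watermark:
            self.late.append(timestamp)  # its window has already fired
            return []
        self.pending[start] += 1
        # Bounded out-of-orderness: trail the largest timestamp by a fixed delay.
        self.watermark = max(self.watermark, timestamp - self.delay)
        fired = [(s, n) for s, n in sorted(self.pending.items()) if s + self.size <= self.watermark]
        for s, _ in fired:
            del self.pending[s]
        return fired


op = TumblingWindowCounter(size=10, max_out_of_orderness=5)
results: List[Tuple[int, int]] = []
for ts in [1, 7, 12, 4, 16, 3, 25]:
    results.extend(op.on_event(ts))
print(results, op.late)  # [(0, 3), (10, 2)] [3]
```

The delay trades latency for completeness: a larger value waits longer before firing but drops fewer late events.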

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Cloud applications, Fault-tolerance, Stream processing, Streaming analytics
National Category
Computer Sciences Computer Systems
Identifiers
urn:nbn:se:kth:diva-367109 (URN); 10.1007/s00778-023-00819-8 (DOI); 001433971000001 (ISI); 2-s2.0-85177547566 (Scopus ID)
Note

QC 20250715

Available from: 2025-07-15; Created: 2025-07-15; Last updated: 2025-07-15; Bibliographically approved
Segeljakt, K., Haridi, S. & Carbone, P. (2024). AquaLang: A Dataflow Programming Language. In: DEBS 2024 - Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems. Paper presented at 18th ACM International Conference on Distributed and Event-Based Systems, DEBS 2024, Villeurbanne, France, Jun 25 2024 - Jun 28 2024 (pp. 42-53). Association for Computing Machinery (ACM)
2024 (English). In: DEBS 2024 - Proceedings of the 18th ACM International Conference on Distributed and Event-Based Systems, Association for Computing Machinery (ACM), 2024, p. 42-53. Conference paper, Published paper (Refereed)
Abstract [en]

Dataflow systems are widely used today for building and running continuous data-intensive applications. However, the unavoidable semantic gap between the host languages of dataflow system libraries and the dataflow model creates programmability limitations that hinder performance, safety, and ease of use. We propose AquaLang, a new language designed for dataflow systems. Programs in AquaLang blend strongly typed relational and functional syntax and are verified using an effect system that prevents undefined behaviour that can occur when introducing user-defined logic that violates dataflow semantics. Unverified external code is also feasible in AquaLang through the novel use of sandboxing. Furthermore, on top of standard dataflow optimisations employed by current systems, AquaLang's ability to analyze algebraic properties of user-defined functions further unlocks the potential of deeper dataflow program re-writing. In our evaluation, we measure up to one order of magnitude speedup for Nexmark queries against hand-written Flink programs attributed to pushdown and window incrementalisation techniques.
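To illustrate why algebraic properties of a user-defined aggregate matter for window incrementalisation, the Python sketch below uses the well-known two-stacks sliding-window aggregation, a generic technique and not necessarily the one AquaLang applies. Assuming only that the operator is associative with an identity element, each window result costs amortised O(1) operator applications instead of recomputing the whole window, and it works for non-invertible operators such as `max` and non-commutative ones such as string concatenation. `TwoStackWindow` and `sliding` are hypothetical names.

```python
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")


class TwoStackWindow(Generic[T]):
    """FIFO window aggregated with an associative operator that has an identity."""

    def __init__(self, op: Callable[[T, T], T], identity: T) -> None:
        self.op = op
        self.identity = identity
        self.front: List[Tuple[T, T]] = []  # (value, aggregate of it and everything below)
        self.back: List[Tuple[T, T]] = []  # (value, aggregate of the back stack up to it)

    def push(self, value: T) -> None:
        agg = self.op(self.back[-1][1], value) if self.back else value
        self.back.append((value, agg))

    def pop(self) -> None:
        if not self.front:
            # Flip the back stack so the oldest element ends up on top of front.
            while self.back:
                value, _ = self.back.pop()
                agg = self.op(value, self.front[-1][1]) if self.front else value
                self.front.append((value, agg))
        self.front.pop()  # evict the oldest element

    def query(self) -> T:
        front_agg = self.front[-1][1] if self.front else self.identity
        back_agg = self.back[-1][1] if self.back else self.identity
        return self.op(front_agg, back_agg)  # older elements come first


def sliding(values: List[T], size: int, op: Callable[[T, T], T], identity: T) -> List[T]:
    window = TwoStackWindow(op, identity)
    out: List[T] = []
    for i, value in enumerate(values):
        window.push(value)
        if i >= size:
            window.pop()
        if i >= size - 1:
            out.append(window.query())
    return out


values = [3, 1, 4, 1, 5, 9, 2, 6]
print(sliding(values, 3, max, float("-inf")))  # [4, 4, 5, 9, 9, 9]
print(sliding(list("abcde"), 2, lambda x, y: x + y, ""))  # ['ab', 'bc', 'cd', 'de']
```

Each element is pushed once and flipped at most once, which is where the amortised constant cost comes from; an invertible operator such as a sum would also allow a simpler subtract-on-evict scheme.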

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Data Streams, Dataflow Systems, Programming Languages
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-351925 (URN); 10.1145/3629104.3666030 (DOI); 001283849100006 (ISI); 2-s2.0-85200659561 (Scopus ID)
Conference
18th ACM International Conference on Distributed and Event-Based Systems, DEBS 2024, Villeurbanne, France, Jun 25 2024 - Jun 28 2024
Note

Part of ISBN 9798400704437

QC 20240827

Available from: 2024-08-19; Created: 2024-08-19; Last updated: 2024-09-10; Bibliographically approved
Lindén, J., Ermedahl, A., Salomonsson, H., Daneshtalab, M., Forsberg, B. & Carbone, P. (2024). Autonomous Realization of Safety- and Time-Critical Embedded Artificial Intelligence. In: 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Proceedings. Paper presented at 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024, Valencia, Spain, Mar 25 2024 - Mar 27 2024. Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

There is an evident need to complement embedded critical control logic with AI inference, but today's AI-capable hardware, software, and processes are primarily targeted towards the needs of cloud-centric actors. Telecom and defense airspace industries, which make heavy use of specialized hardware, face the challenge of manually hand-tuning AI workloads and hardware, presenting an unprecedented cost and complexity due to the diversity and sheer number of deployed instances. Furthermore, embedded AI functionality must not adversely affect real-time and safety requirements of the critical business logic. To address this, end-to-end AI pipelines for critical platforms are needed to automate the adaption of networks to fit into resource-constrained devices under critical and real-time constraints, while remaining interoperable with de-facto standard AI tools and frameworks used in the cloud. We present two industrial applications where such solutions are needed to bring AI to critical and resource-constrained hardware, and a generalized end-to-end AI pipeline that addresses these needs. Crucial steps to realize it are taken in the industry-academia collaborative FASTER-AI project.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
embedded systems, machine learning
National Category
Computer Systems Software Engineering
Identifiers
urn:nbn:se:kth:diva-350536 (URN); 10.23919/DATE58400.2024.10546824 (DOI); 001253778900307 (ISI); 2-s2.0-85196520555 (Scopus ID)
Conference
2024 Design, Automation and Test in Europe Conference and Exhibition, DATE 2024, Valencia, Spain, Mar 25 2024 - Mar 27 2024
Note

Part of ISBN 978-3-9819263-8-5

QC 20241119

Available from: 2024-07-16; Created: 2024-07-16; Last updated: 2024-11-19; Bibliographically approved
Siachamis, G., Psarakis, K., Fragkoulis, M., Van Deursen, A., Carbone, P. & Katsifodimos, A. (2024). CheckMate: Evaluating Checkpointing Protocols for Streaming Dataflows. In: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024. Paper presented at 40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, Netherlands, Kingdom of the, May 13 2024 - May 17 2024 (pp. 4030-4043). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 IEEE 40th International Conference on Data Engineering, ICDE 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 4030-4043. Conference paper, Published paper (Refereed)
Abstract [en]

Stream processing in the last decade has seen broad adoption in both commercial and research settings. One key element of this success is the ability of modern stream processors to handle failures while ensuring exactly-once processing guarantees. At the moment of writing, virtually all stream processors that guarantee exactly-once processing implement a variant of Apache Flink's coordinated checkpoints, an extension of the original Chandy-Lamport checkpoints from 1985. However, the reasons behind this prevalence of the coordinated approach remain anecdotal, as reported by practitioners of the stream processing community. At the same time, common checkpointing approaches, such as the uncoordinated and the communication-induced ones, remain largely unexplored. This paper is the first to address this gap by i) shedding light on why practitioners have favored the coordinated approach and ii) investigating whether there are viable alternatives. To this end, we implement three checkpointing approaches that we surveyed and adapted for the distinct needs of streaming dataflows. Our analysis shows that the coordinated approach outperforms the uncoordinated and communication-induced protocols under uniformly distributed workloads. To our surprise, however, the uncoordinated approach is not only competitive with the coordinated one in uniformly distributed workloads, but it also outperforms the coordinated approach in skewed workloads. We conclude that rather than blindly employing coordinated checkpointing, research should focus on optimizing the very promising uncoordinated approach, as it can address issues with skew and support prevalent cyclic queries. We believe that our findings can trigger further research into checkpointing mechanisms.
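To make the coordinated approach concrete, the Python sketch below shows barrier alignment at a single multi-input task, in the spirit of Flink's aligned checkpoints. It is a simplification, not the paper's implementation: it assumes FIFO channels and checkpoints that complete in order, keeps snapshots in memory, and omits forwarding the barrier downstream. All names (`Barrier`, `AligningSumTask`) are hypothetical. A channel that has delivered its barrier is blocked and its later input buffered; once every input has delivered the barrier, the task snapshots its state and then drains the buffer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple, Union


@dataclass(frozen=True)
class Barrier:
    checkpoint_id: int


Event = Union[int, Barrier]


@dataclass
class AligningSumTask:
    """Hypothetical operator that sums integer records from several input channels."""

    channels: Set[str]
    state: int = 0
    blocked: Set[str] = field(default_factory=set)
    buffered: List[Tuple[str, Event]] = field(default_factory=list)
    snapshots: Dict[int, int] = field(default_factory=dict)

    def receive(self, channel: str, event: Event) -> None:
        if channel in self.blocked:
            # Everything after a channel's barrier waits until alignment completes.
            self.buffered.append((channel, event))
            return
        if isinstance(event, Barrier):
            self.blocked.add(channel)
            if self.blocked == self.channels:
                # All inputs aligned: state covers exactly the pre-barrier records.
                self.snapshots[event.checkpoint_id] = self.state
                self.blocked.clear()
                pending, self.buffered = self.buffered, []
                for buffered_channel, buffered_event in pending:
                    self.receive(buffered_channel, buffered_event)
        else:
            self.state += event


task = AligningSumTask(channels={"in1", "in2"})
task.receive("in1", 1)
task.receive("in1", Barrier(1))
task.receive("in1", 10)  # buffered: arrives after in1's barrier
task.receive("in2", 2)
task.receive("in2", Barrier(1))  # alignment completes, snapshot taken
print(task.snapshots, task.state)  # {1: 3} 13
```

In this sketch, a slow input keeps the other channels blocked for longer, which hints at why alignment can become costly when input rates are skewed.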

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Benchmarking, Checkpointing, Experimental evaluation, Fault tolerance, Stream processing
National Category
Computer Sciences Software Engineering
Identifiers
urn:nbn:se:kth:diva-351951 (URN); 10.1109/ICDE60146.2024.00309 (DOI); 2-s2.0-85200460157 (Scopus ID)
Conference
40th IEEE International Conference on Data Engineering, ICDE 2024, Utrecht, Netherlands, Kingdom of the, May 13 2024 - May 17 2024
Note

Part of ISBN 9798350317152

QC 20240827

Available from: 2024-08-19; Created: 2024-08-19; Last updated: 2024-08-27; Bibliographically approved
Hasselberg, A., Timoudas, T. O., Carbone, P. & Dán, G. (2024). Cliffhanger: An Experimental Evaluation of Stateful Serverless at the Edge. In: 2024 19th Wireless On-Demand Network Systems and Services Conference. Paper presented at 19th Wireless On-Demand Network Systems and Services Conference (WONS), JAN 29-31, 2024, Chamonix, FRANCE (pp. 41-48). IEEE
2024 (English). In: 2024 19th Wireless On-Demand Network Systems and Services Conference, IEEE, 2024, p. 41-48. Conference paper, Published paper (Refereed)
Abstract [en]

The serverless computing paradigm has transformed cloud service deployment by enabling automatic scaling of resources in response to varying demand. Building on this, stateful serverless computing introduces critical capabilities for data management, fault tolerance, and consistency, which are particularly relevant in the context of distributed deployments, notably in edge computing environments. In this work, we explore the feasibility of stateful serverless computing in resource-limited edge environments through an empirical study utilizing a multi-view object tracking application. Our results show that while these systems perform well in cloud environments, their effectiveness is severely affected at the edge due to state, application, and resource management solutions optimized for cloud environments. Existing solutions are most detrimental to applications with intermittent workloads, as typical combinations of concurrency handling and resource reservation can lead to minutes of unstable system behavior due to cold starts. Our results highlight the need for a tailored approach in stateful serverless systems for edge computing scenarios.

Place, publisher, year, edition, pages
IEEE, 2024
Series
Annual Conference on Wireless On Demand Network Systems and Services, ISSN 2688-4917
Keywords
Distributed computing, Edge Computing, Fog computing
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-345136 (URN); 001181198900007 (ISI)
Conference
19th Wireless On-Demand Network Systems and Services Conference (WONS), JAN 29-31, 2024, Chamonix, FRANCE
Note

QC 20240408

Part of ISBN 978-3-903176-61-4

Available from: 2024-04-08; Created: 2024-04-08; Last updated: 2024-04-08; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9351-8508