1 - 11 of 11
  • 1.
    Carbone, Paris
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. Ericsson AB.
    Fault Tolerant Distributed Complex Event Processing on Stream Computing Platforms. 2013. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    Recent advances in reliable distributed computing have made it possible to provide high availability and scalability to traditional systems and thus serve them as reliable services. For some systems, such as distributed storage, online data analysis, batch processing and distributed stream processing, their parallel nature in addition to weak consistency requirements allowed a more trivial transition. On the other hand, systems such as Complex Event Processing (CEP) still maintain a monolithic architecture, being able to offer high expressiveness at the expense of low distribution. In this work, we address the main challenges of providing a highly-available Distributed CEP service with a focus on reliability, since it is the most crucial and untouched aspect of that transition. The experimental solution presented targets low average detection latency and leverages event delegation mechanisms present on existing stream execution platforms and in-memory logging to provide availability of any complex event processing abstraction on top via redundancy and partial recovery.

  • 2.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer systems, SCS.
    Scalable and Reliable Data Stream Processing. 2018. Doctoral thesis, monograph (Other academic)
    Abstract [en]

    Data-stream management systems have long been considered a promising architecture for fast data management. The stream processing paradigm poses an attractive means of declaring persistent application logic coupled with state over evolving data. However, despite contributions in programming semantics addressing certain aspects of data streaming, existing approaches have been lacking a clear, universal specification for the underlying system execution. We investigate the case of data stream processing as a general-purpose scalable computing architecture that can support continuous and iterative state-driven workloads. Furthermore, we examine how this architecture can enable the composition of reliable, reconfigurable services and complex applications that go even beyond the needs of scalable data analytics, a major trend in the past decade.

    In this dissertation, we specify a set of core components and mechanisms to compose reliable data stream processing systems while adopting three crucial design principles: blocking-coordination avoidance, programming-model transparency, and compositionality. Furthermore, we identify the core open challenges among the academic and industrial state of the art and provide a complete solution using these design principles as a guide. Our contributions address the following problems: I) Reliable Execution and Stream State Management, II) Computation Sharing and Semantics for Stream Windows, and III) Iterative Data Streaming. Several parts of this work have been integrated into Apache Flink, a widely-used, open-source scalable computing framework, and supported the deployment of hundreds of long-running large-scale production pipelines worldwide.

  • 3.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Ewen, Stephan
    Fóra, Gyula
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Richter, Stefan
    Tzoumas, Kostas
    State Management in Apache Flink®: Consistent Stateful Distributed Stream Processing. 2017. In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 10, no 12, p. 1718-1729. Article in journal (Refereed)
    Abstract [en]

    Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies in how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open source, scalable stream processor. We present Flink's core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink's backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large deployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.

  • 4.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Ewen, Stephan
    data Artisans.
    Fóra, Gyula
    King Digital Entertainment Limited.
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Richter, Stefan
    data Artisans.
    Tzoumas, Kostas
    data Artisans.
    State Management in Apache Flink: Consistent Stateful Distributed Stream Processing. 2017. In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 10, p. 1718-1729, article id 12. Article in journal (Refereed)
    Abstract [en]

    Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies in how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open source, scalable stream processor.

    We present Flink’s core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink’s backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large deployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.

  • 5.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Fóra, Gyula
    CSL Computer Systems Laboratory, SICS Swedish Institute of Computer Science.
    Ewen, Stephan
    Data Artisans GmbH.
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Tzoumas, Kostas
    Data Artisans GmbH.
    Lightweight Asynchronous Snapshots for Distributed Dataflows. 2015. Report (Other academic)
    Abstract [en]

    Distributed stateful stream processing enables the deployment and execution of large scale continuous computations in the cloud, targeting both low latency and high throughput. One of the most fundamental challenges of this paradigm is providing processing guarantees under potential failures. Existing approaches rely on periodic global state snapshots that can be used for failure recovery. Those approaches suffer from two main drawbacks. First, they often stall the overall computation which impacts ingestion. Second, they eagerly persist all records in transit along with the operation states which results in larger snapshots than required. In this work we propose Asynchronous Barrier Snapshotting (ABS), a lightweight algorithm suited for modern dataflow execution engines that minimises space requirements. ABS persists only operator states on acyclic execution topologies while keeping a minimal record log on cyclic dataflows. We implemented ABS on Apache Flink, a distributed analytics engine that supports stateful stream processing. Our evaluation shows that our algorithm does not have a heavy impact on the execution, maintaining linear scalability and performing well with frequent snapshots. 
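
    The barrier idea summarised in this abstract can be illustrated with a minimal sketch for the acyclic case. This is not the authors' implementation or Flink's API: the class `CountingOperator`, the `BARRIER` marker and the in-memory `snapshots` list are hypothetical names chosen for illustration, and the sketch assumes a single toy operator that counts records arriving on several input channels. An input that has delivered a barrier is held back until every input has delivered one; only then is the operator state recorded and the barrier forwarded, so no in-transit records need to be persisted.

```python
from collections import deque

BARRIER = object()  # hypothetical marker separating one snapshot epoch from the next


class CountingOperator:
    """Toy operator (an assumption for illustration) that counts records from several inputs."""

    def __init__(self, num_inputs):
        self.num_inputs = num_inputs
        self.count = 0            # the operator state to be snapshotted
        self.blocked = set()      # inputs that already delivered the current barrier
        self.buffered = {i: deque() for i in range(num_inputs)}
        self.snapshots = []       # stands in for persistent storage
        self.output = []          # stands in for the downstream channel

    def receive(self, channel, item):
        if channel in self.blocked:
            # This input is ahead of the others: hold its items until alignment completes.
            self.buffered[channel].append(item)
        elif item is BARRIER:
            self.blocked.add(channel)
            if len(self.blocked) == self.num_inputs:
                self._snapshot_and_release()
        else:
            self.count += 1
            self.output.append(item)

    def _snapshot_and_release(self):
        # All inputs are aligned: record only the operator state, then forward the barrier.
        self.snapshots.append(self.count)
        self.output.append(BARRIER)
        self.blocked.clear()
        pending, self.buffered = self.buffered, {i: deque() for i in range(self.num_inputs)}
        # Replay the held-back items in their original per-channel order.
        for channel, items in pending.items():
            for item in items:
                self.receive(channel, item)


op = CountingOperator(num_inputs=2)
op.receive(0, "a")
op.receive(0, BARRIER)   # input 0 is now blocked
op.receive(0, "b")       # held back, belongs to the next epoch
op.receive(1, "c")       # still part of the current epoch
op.receive(1, BARRIER)   # alignment complete: snapshot is taken
```

    In this run the snapshot contains the count of records that arrived before the barrier on each input ("a" and "c"), while "b" is processed only after the barrier has been forwarded.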

  • 6.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Gévay, G. E.
    Hermann, G.
    Katsifodimos, A.
    Soto, J.
    Markl, V.
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Large-scale data stream processing systems. 2017. In: Handbook of Big Data Technologies, Springer International Publishing, 2017, p. 219-260. Chapter in book (Other academic)
    Abstract [en]

    In our data-centric society, online services, decision making, and other aspects are increasingly becoming heavily dependent on trends and patterns extracted from data. A broad class of societal-scale data management problems requires system support for processing unbounded data with low latency and high throughput. Large-scale data stream processing systems perceive data as infinite streams and are designed to satisfy such requirements. They have further evolved substantially both in terms of expressive programming model support and also efficient and durable runtime execution on commodity clusters. Expressive programming models offer convenient ways to declare continuous data properties and applied computations, while hiding details on how these data streams are physically processed and orchestrated in a distributed environment. Execution engines provide a runtime for such models further allowing for scalable yet durable execution of any declared computation. In this chapter we introduce the major design aspects of large scale data stream processing systems, covering programming model abstraction levels and runtime concerns. We then present a detailed case study on stateful stream processing with Apache Flink, an open-source stream processor that is used for a wide variety of processing tasks. Finally, we address the main challenges of disruptive applications that large-scale data streaming enables from a systemic point of view.

  • 7.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. SICS Sweden.
    Katsifodimos, Asterios
    Ewen, Stephan
    Markl, Volker
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. SICS Sweden.
    Tzoumas, Kostas
    Apache Flink: Stream and batch processing in a single engine. 2015. In: Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, Vol. 36, no 4. Article in journal (Refereed)
  • 8.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Traub, Jonas
    Katsifodimos, Asterios
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Markl, Volker
    Cutty: Aggregate Sharing for User-Defined Windows. 2016. In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2016, Vol. 24-28-, p. 1201-1210. Conference paper (Refereed)
    Abstract [en]

    Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows was extensively studied in the past through the use of aggregate sharing techniques such as Panes and Pairs, little to no work has been put into optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions, which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods, or is not optimized at all.

    In this paper we present a technique to perform efficient aggregate sharing for data stream windows, which are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open source stream processing system, and performed experiments demonstrating orders of magnitude of reduction in aggregation costs compared to the state of the art.
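
    As background for the periodic case this abstract contrasts itself with, the following is a minimal sketch of pane-based aggregate sharing. It is not Cutty and not taken from the paper: the function name `sliding_sums_with_panes` and the use of count-based windows with a sum aggregate are assumptions made for illustration. Each pane's partial sum is computed once and reused by every overlapping window that covers it.

```python
from math import gcd


def sliding_sums_with_panes(values, window_range, slide):
    """Sum over count-based sliding windows, sharing one partial sum per pane."""
    # A pane is the largest chunk that evenly divides both the range and the slide.
    pane = gcd(window_range, slide)
    # Partial aggregate per complete pane; a trailing incomplete pane is ignored.
    pane_sums = [
        sum(values[i:i + pane])
        for i in range(0, len(values) - pane + 1, pane)
    ]
    panes_per_window = window_range // pane
    panes_per_slide = slide // pane
    results = []
    # Each window result combines already-computed pane sums instead of raw records.
    for start in range(0, len(pane_sums) - panes_per_window + 1, panes_per_slide):
        results.append(sum(pane_sums[start:start + panes_per_window]))
    return results


window_sums = sliding_sums_with_panes([1, 2, 3, 4, 5, 6, 7, 8], window_range=4, slide=2)
```

    With a range of 4 and a slide of 2, the pane size is 2, so each window combines two pane sums rather than four raw records. Non-periodic windows such as sessions have no fixed pane size, which is the gap the abstract describes.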

  • 9.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Vandikas, K.
    Zaloshnja, F.
    Towards highly available complex event processing deployments in the cloud. 2013. In: International Conference on Next Generation Mobile Applications, Services, and Technologies, IEEE, 2013, p. 153-158. Conference paper (Refereed)
    Abstract [en]

    Recent advances in distributed computing have made it possible to achieve high availability on traditional systems and thus serve them as reliable services. For several offline computational applications, such as fine grained batch processing, their parallel nature in addition to weak consistency requirements allowed a more trivial transition. On the other hand, on-line processing systems such as Complex Event Processing (CEP) still maintain a monolithic architecture, being able to offer high expressiveness and vertical scalability at the expense of low distribution. Despite attempts to design dedicated distributed CEP systems there is potential for existing systems to benefit from a sustainable cloud deployment. In this work we address the main challenges of providing such a CEP service with a focus on reliability, since it is the most crucial aspect of that transition. Our approach targets low average detection latency and sustainability by leveraging event delegation mechanisms present on existing stream execution platforms. It also introduces redundancy and transactional logging to provide improved fault tolerance and partial recovery. Our performance analysis illustrates the benefits of our approach and shows acceptable performance costs for on-line CEP exhibited by the fault tolerance mechanisms we introduced.

  • 10.
    Carbone, Paris
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Auto-Scoring of Personalised News in the Real-Time Web: Challenges, Overview and Evaluation of the State-of-the-Art Solutions. 2015. Conference paper (Refereed)
    Abstract [en]

    The problem of automated personalised news recommendation, often referred to as auto-scoring, has attracted substantial research throughout the last decade in multiple domains such as data mining and machine learning, computer systems, e-commerce and sociology. A typical "recommender systems" approach to solving this problem usually adopts content-based scoring, collaborative filtering or more often a hybrid approach. Due to their special nature, news articles introduce further challenges and constraints to conventional item recommendation problems, characterised by short lifetime and rapid popularity trends. In this survey, we provide an overview of the challenges and current solutions in news personalisation and ranking from both an algorithmic and system design perspective, and present our evaluation of the most representative scoring algorithms while also exploring the benefits of using a hybrid approach. Our evaluation is based on a real-life case study in news recommendations.

  • 11.
    Kroll, Lars
    et al.
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Carbone, Paris
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Haridi, Seif
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    Kompics Scala: Narrowing the gap between algorithmic specification and executable code (short paper). 2017. In: Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, ACM Digital Library, 2017, p. 73-77. Conference paper (Refereed)
    Abstract [en]

    Message-based programming frameworks facilitate the development and execution of core distributed computing algorithms today. Their twofold aim is to expose a programming model that minimises logical errors incurred during translation from an algorithmic specification to executable program, and also to provide an efficient runtime for event pattern-matching and scheduling of distributed components. Kompics Scala is a framework that allows for a direct, streamlined translation from a formal algorithm specification to practical code by reducing the cognitive gap between the two representations. Furthermore, its runtime decouples event pattern-matching and component execution logic yielding clean, thoroughly expected behaviours. Our evaluation shows low and constant performance overhead of Kompics Scala compared to similar frameworks that otherwise fail to offer the same level of model clarity.
