Cutty: Aggregate Sharing for User-Defined Windows
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9351-8508
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-6718-0144
2016 (English). In: Proceedings of the 25th ACM International on Conference on Information and Knowledge Management, Association for Computing Machinery (ACM), 2016, Vol. 24-28-, 1201-1210 p. Conference paper, Published paper (Refereed)
Abstract [en]

Aggregation queries on data streams are evaluated over evolving and often overlapping logical views called windows. While the aggregation of periodic windows has been studied extensively through aggregate sharing techniques such as Panes and Pairs, little to no work has gone into optimizing the aggregation of very common, non-periodic windows. Typical examples of non-periodic windows are punctuations and sessions, which can implement complex business logic and are often expressed as user-defined operators on platforms such as Google Dataflow or Apache Storm. The aggregation of such non-periodic or user-defined windows either falls back to expensive, best-effort aggregate sharing methods or is not optimized at all.
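To make "aggregate sharing for periodic windows" concrete, here is a minimal Python sketch of the Panes idea mentioned above: the stream is split into non-overlapping panes of size gcd(range, slide), each element is aggregated once into its pane, and every overlapping window is answered by combining pane partials. This assumes count-based windows over a finite list and a sum aggregate; the function name `panes_sliding_sum` is hypothetical and not taken from the paper or from any system it names.

```python
from math import gcd
from typing import List


def panes_sliding_sum(stream: List[int], range_: int, slide: int) -> List[int]:
    """Sliding-window sums computed with Panes-style aggregate sharing.

    Window k covers stream[k*slide : k*slide + range_] (count-based).
    Each element is summed exactly once into a non-overlapping pane of
    size gcd(range_, slide); each window result then combines
    range_ // pane_size pane partials instead of range_ raw elements.
    """
    pane_size = gcd(range_, slide)
    n_panes = len(stream) // pane_size  # a trailing partial pane is never needed
    # One partial aggregate per pane, shared by every window that covers it.
    panes = [sum(stream[i * pane_size:(i + 1) * pane_size]) for i in range(n_panes)]

    panes_per_window = range_ // pane_size
    panes_per_slide = slide // pane_size
    results = []
    start = 0
    while start + panes_per_window <= n_panes:  # only complete windows
        results.append(sum(panes[start:start + panes_per_window]))
        start += panes_per_slide
    return results


print(panes_sliding_sum(list(range(1, 11)), range_=4, slide=2))  # prints [10, 18, 26, 34]
```

The pane size depends only on the window's fixed range and slide. That is why this style of sharing needs periodic windows: a punctuation- or session-based window has no fixed range or slide to derive panes from.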

In this paper, we present a technique to perform efficient aggregate sharing for data stream windows that are declared as user-defined functions (UDFs) and can contain arbitrary business logic. To this end, we first introduce the concept of User-Defined Windows (UDWs), a simple, UDF-based programming abstraction that allows users to programmatically define custom windows. We then define semantics for UDWs, based on which we design Cutty, a low-cost aggregate sharing technique. Cutty improves on and outperforms the state of the art for aggregate sharing on single and multiple queries. Moreover, it enables aggregate sharing for a broad class of non-periodic UDWs. We implemented our techniques on Apache Flink, an open-source stream processing system, and performed experiments demonstrating orders-of-magnitude reductions in aggregation costs compared to the state of the art.
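The abstract does not spell out Cutty's algorithm or the UDW interface. The following Python is only a minimal sketch, under stated assumptions, of how aggregate sharing can extend to windows defined by a user function. A user-supplied function reports, per record, whether a window begins there and which windows end there. The stream is cut into a new slice only where some window begins, and every ending window combines the shared slice partials. The UDW signature `(position, value) -> (begins, ending_ids)`, the names `run_udw` and `sliding_udw`, and the sum aggregate are all hypothetical. They are not the paper's API or its Flink implementation.

```python
from typing import Callable, List, Tuple

# Hypothetical UDW signature: (position, value) -> (begins, ending_ids).
# `begins` says a new window starts at this record; `ending_ids` lists the
# windows (identified by their begin position) that end at this record,
# inclusive of it.
UDW = Callable[[int, int], Tuple[bool, List[int]]]


def run_udw(stream: List[int], udw: UDW) -> Tuple[List[Tuple[int, int]], int]:
    """Sum each user-defined window by sharing partials over begin-cut slices."""
    slices: List[int] = []  # one partial sum per slice; slices[-1] is open
    first_slice = {}        # window id -> index of the slice it starts in
    results = []
    for pos, value in enumerate(stream):
        begins, ending = udw(pos, value)
        if begins or not slices:
            slices.append(0)  # cut a new slice only where a window begins
        if begins:
            first_slice[pos] = len(slices) - 1
        slices[-1] += value   # each record is aggregated exactly once
        for wid in ending:
            # Combine the shared slice partials covering positions [wid, pos].
            results.append((wid, sum(slices[first_slice.pop(wid):])))
    return results, len(slices)


def sliding_udw(range_: int, slide: int) -> UDW:
    """A count-based sliding window written as a UDW, for comparison."""
    def udw(pos: int, value: int) -> Tuple[bool, List[int]]:
        b = pos - (range_ - 1)  # begin position of a window ending here
        ending = [b] if b >= 0 and b % slide == 0 else []
        return pos % slide == 0, ending
    return udw


results, n_slices = run_udw(list(range(1, 11)), sliding_udw(4, 2))
print(results, n_slices)  # prints [(0, 10), (2, 18), (4, 26), (6, 34)] 5
```

Nothing in `run_udw` assumes periodicity. Any deterministic function of the records can decide where windows begin and end, for example one that opens a window at every punctuation record. In the run above, ten records produce five slice partials, and each window result is combined from two of them.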

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2016. Vol. 24-28-, 1201-1210 p.
Keyword [en]
Computer circuits, Computer programming, Data communication systems, Knowledge management, Open source software, Open systems, Semantics
National Category
Computer Science
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-198942
DOI: 10.1145/2983323.2983807
ISI: 000390890800124
Scopus ID: 2-s2.0-84996567073
ISBN: 978-1-4503-4073-1 (print)
OAI: oai:DiVA.org:kth-198942
DiVA: diva2:1059571
Conference
25th ACM International Conference on Information and Knowledge Management, CIKM 2016, Indianapolis, United States, 24 October 2016 through 28 October 2016
Note

QC 20170130

Available from: 2016-12-22 Created: 2016-12-22 Last updated: 2017-01-30 Bibliographically approved


By author/editor
Carbone, Paris; Haridi, Seif
By organisation
Software and Computer systems, SCS
Computer Science
