SpanEdge: Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
(SICS Swedish ICT)
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In stream processing, data is streamed as a continuous flow of data items, which are generated from multiple sources and geographical locations. The common approach for stream processing is to transfer raw data streams to a central data center, which entails communication over the wide-area network (WAN). However, this approach is inefficient and falls short for two main reasons: (i) the burst in the amount of data generated at the network edge by an increasing number of connected devices, and (ii) the emergence of applications with predictable and low-latency requirements. In this paper, we propose SpanEdge, a novel approach that unifies stream processing across a geo-distributed infrastructure, including the central and near-the-edge data centers. SpanEdge reduces or eliminates the latency incurred by WAN links by distributing stream processing applications across the central and the near-the-edge data centers. Furthermore, SpanEdge provides a programming environment that allows programmers to specify the parts of their applications that need to be close to the data source. Programmers can develop a stream processing application regardless of the number of data sources and their geographical distribution. As a proof of concept, we implemented and evaluated a prototype of SpanEdge. Our results show that SpanEdge can optimally deploy stream processing applications in a geo-distributed infrastructure, which significantly reduces bandwidth consumption and response latency.

Place, publisher, year, edition, pages
2016.
Keyword [en]
geo-distributed stream processing, geo-distributed infrastructure, edge computing, edge-based analytics
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-193581
DOI: 10.1109/SEC.2016.17
ISI: 000391420900036
OAI: oai:DiVA.org:kth-193581
DiVA: diva2:1024984
Conference
The First IEEE/ACM Symposium on Edge Computing (SEC)
Note

QC 20161005

Available from: 2016-10-04 Created: 2016-10-04 Last updated: 2017-02-03 Bibliographically approved
In thesis
1. Towards Unifying Stream Processing over Central and Near-the-Edge Data Centers
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, our goal is to enable and achieve effective and efficient real-time stream processing in a geo-distributed infrastructure, by combining the power of central data centers and micro data centers. Our research focus is to address the challenges of distributing the stream processing applications and placing them closer to data sources and sinks. We enable applications to run in a geo-distributed setting and provide solutions for the network-aware placement of distributed stream processing applications across geo-distributed infrastructures.

First, we evaluate Apache Storm, a widely used open-source distributed stream processing system, in the community network Cloud, as an example of a geo-distributed infrastructure. Our evaluation exposes new requirements for stream processing systems to function in a geo-distributed infrastructure. Second, we propose a solution to facilitate the optimal placement of stream processing components on geo-distributed infrastructures. We present a novel method for partitioning a geo-distributed infrastructure into a set of computing clusters, each called a micro data center. According to our results, we can increase the minimum available bandwidth in the network and, likewise, reduce the average latency to less than 50%. Next, we propose a parallel and distributed graph partitioner, called HoVerCut, for fast partitioning of streaming graphs. Since a lot of data can be represented in the form of a graph, graph partitioning can be used to assign the graph elements to different data centers to provide data locality for efficient processing. Last, we provide an approach, called SpanEdge, that enables stream processing systems to work on a geo-distributed infrastructure. SpanEdge unifies stream processing over the central and near-the-edge data centers (micro data centers). As a proof of concept, we implement SpanEdge by extending Apache Storm to enable it to run across multiple data centers.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. 33 p.
Series
TRITA-ICT, 2016:27
Keyword
geo-distributed stream processing, geo-distributed infrastructure, edge computing, edge-based analytics
National Category
Computer and Information Science
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-193582 (URN)
978-91-7729-118-3 (ISBN)
Presentation
2016-11-14, Sal 208, Electrum, Kungl Tekniska högskolan, Kistagången 16, Kista, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20161005

Available from: 2016-10-05 Created: 2016-10-04 Last updated: 2016-10-12 Bibliographically approved

Open Access in DiVA

fulltext (642 kB), 180 downloads
File information
File name: FULLTEXT01.pdf
File size: 642 kB
Checksum (SHA-512): 3795d3047e43ec843489a737d850d1f2cce116634dc0bf49a459ab2798941c344cf8a91f9b8d1f2a2ba915d7d9c4c39e8d59317526972be55da69455ef5946df
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Peiro Sajjad, Hooman; Danniswara, Ken; Vlassov, Vladimir
By organisation
Software and Computer systems, SCS
Computer and Information Science

Total: 180 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 706 hits