Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-7510-6286
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9637-2065
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
Barcelona Super Computing Center and Technical University of Catalunya.
2016 (English) Conference paper (Refereed)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to stay at the forefront of big data analytics by being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual-socket server. In our evaluation experiments, we have found that batch processing and stream processing have the same micro-architectural behavior in Spark if the two implementations differ only in micro-batching. If the input data rates are small, stream processing workloads are front-end bound. However, at larger input data rates the front-end bound stalls are reduced and instruction retirement improves. Moreover, Spark workloads using DataFrames have better instruction retirement than workloads using RDDs.

Place, publisher, year, edition, pages
IEEE, 2016. 59-66 p.
Keyword [en]
Microarchitectural Performance, Spark Streaming, Workload Characterization
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-196123
DOI: 10.1109/BDCloud-SocialCom-SustainCom.2016.20
ISI: 000392516300009
ScopusID: 2-s2.0-85000885440
OAI: oai:DiVA.org:kth-196123
DiVA: diva2:1046082
Conference
The 6th IEEE International Conference on Big Data and Cloud Computing
Note

QC 20161130

Available from: 2016-11-11 Created: 2016-11-11 Last updated: 2017-02-24 Bibliographically approved

Authors
Awan, Ahsan Javed; Brorsson, Mats; Vlassov, Vladimir