Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-7510-6286
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS. ORCID iD: 0000-0002-9637-2065
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
Technical University of Catalonia (UPC), Computer Architecture Department.
2015 (English). In: Proceedings - 2015 IEEE 5th International Conference on Big Data and Cloud Computing, BDCloud 2015, IEEE Computer Society, 2015, pp. 1-8, article no. 7310708. Conference paper (Refereed)
Abstract [en]

In the last decade, data analytics has rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single-node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread-level load imbalance. Further, at the micro-architecture level, we observe memory-bound latency to be the major cause of work time inflation.
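The sub-linear scaling reported in the abstract can be illustrated with a simple model: if each thread's work inflates as the thread count grows, observed speedup falls below the ideal linear curve. This is only a sketch; the inflation factor below is a hypothetical value chosen for illustration, not a number from the paper.

```python
# Illustrative model of sub-linear scaling caused by work time inflation:
# per-thread work grows with the thread count (e.g. from memory contention),
# so observed speedup diverges from the ideal linear curve.
# The inflation factor is a hypothetical value, not a measurement from the paper.

def observed_speedup(threads: int, inflation_per_thread: float = 0.03) -> float:
    """Speedup when total work inflates linearly with the number of threads."""
    inflated_work = 1.0 + inflation_per_thread * (threads - 1)
    return threads / inflated_work

if __name__ == "__main__":
    for n in (1, 4, 12, 24, 48):
        print(f"{n:2d} threads: ideal {n:5.1f}x, observed {observed_speedup(n):5.2f}x")
```

Under this toy model, each doubling of threads yields progressively less additional speedup, mirroring the "no linear scaling beyond twelve threads" observation.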

Place, publisher, year, edition, pages
IEEE Computer Society, 2015, pp. 1-8, article no. 7310708.
Keyword [en]
cloud chambers, cloud computing, data analysis, resource allocation, storage management, Apache Spark framework, Spark workload, data analysis workload, disk-based processing, in-memory data analytics, in-memory processing, memory bound latency, microarchitecture level performance, modern cloud server, performance characterization, single node NUMA machine, thread level load imbalance, work time inflation, workload scalability, Benchmark testing, Big data, Data analysis, Instruction sets, Scalability, Servers, Sparks, Data Analytics, NUMA, Spark Performance, Workload Characterization
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-179403
DOI: 10.1109/BDCloud.2015.37
ISI: 000380444200001
ScopusID: 2-s2.0-84962757128
ISBN: 978-1-4673-7182-7
OAI: oai:DiVA.org:kth-179403
DiVA: diva2:882889
Conference
2015 IEEE 5th International Conference on Big Data and Cloud Computing (BDCloud), Dalian, China, 26-28 Aug. 2015
Note

QC 20160118 QC 20160922

Available from: 2015-12-16. Created: 2015-12-16. Last updated: 2016-09-22. Bibliographically approved.
In thesis
1. Performance Characterization of In-Memory Data Analytics on a Scale-up Server
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark defines the state of the art in big data analytics platforms, both for exploiting data-flow and in-memory computing and for exhibiting superior scale-out performance on commodity machines, little effort has been devoted to understanding the performance of in-memory data analytics with Spark on modern scale-up servers. This thesis characterizes the performance of in-memory data analytics with Spark on scale-up servers.

Through empirical evaluation of representative benchmark workloads on a dual-socket server, we have found that in-memory data analytics with Spark exhibits poor multi-core scalability beyond 12 cores due to thread-level load imbalance and work-time inflation. We have also found that the workloads are bound by the latency of frequent data accesses to DRAM. When the input data size is enlarged, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (due to lower L1 cache misses and higher core utilization).

For data accesses, we have found that simultaneous multi-threading is effective in hiding the data access latencies. We have also observed that (i) data locality on NUMA nodes can improve performance by 10% on average, and (ii) disabling next-line L1-D prefetchers can reduce execution time by up to 14%. Regarding GC impact, matching the memory behaviour of the application with the garbage collector improves performance by between 1.6x and 3x, and we recommend using multiple small executors, which can provide up to 36% speedup over a single large executor.
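The "multiple small executors vs. one large executor" recommendation can be sketched in terms of standard Spark configuration properties (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`). The helper and the concrete machine figures below (24 cores, 96 GB) are hypothetical examples, not the thesis' exact configuration.

```python
# Sketch of the executor-sizing trade-off discussed in the thesis.
# The Spark property names are standard; the machine figures (24 cores,
# 96 GB) and the split into 4 executors are hypothetical examples.

def executor_conf(instances: int, total_cores: int = 24, total_mem_gb: int = 96) -> dict:
    """Split a machine's cores and memory evenly across `instances` executors."""
    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(total_cores // instances),
        "spark.executor.memory": f"{total_mem_gb // instances}g",
    }

single_large = executor_conf(1)    # one JVM owning the whole machine
multiple_small = executor_conf(4)  # smaller heaps -> cheaper GC pauses

print(single_large)
print(multiple_small)
```

The intuition is that several smaller JVM heaps keep garbage-collection pauses shorter than one machine-sized heap, which is consistent with the GC sensitivity reported above.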

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. 111 p.
Series
TRITA-ICT, 2016:07
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-185581 (URN)
ISBN 978-91-7595-926-9 (ISBN)
Presentation
2016-05-23, Ka-210, Electrum 229, Kista, Stockholm, 09:15 (English)
Note

QC 20160425

Available from: 2016-04-25. Created: 2016-04-22. Last updated: 2016-04-25. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text
Scopus

Search in DiVA

By author/editor
Javed Awan, Ahsan; Brorsson, Mats; Vlassov, Vladimir
By organisation
Software and Computer systems, SCS
Computer Systems

