Publications (9 of 9)
Javed Awan, A., Ohara, M., Ayguade, E., Ishizaki, K., Brorsson, M. & Vlassov, V. (2017). Identifying the potential of Near Data Processing for Apache Spark. In: Proceedings of the International Symposium on Memory Systems, MEMSYS 2017. Paper presented at Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, Alexandria, VA, USA, October 02 - 05, 2017 (pp. 60-67). Association for Computing Machinery (ACM), Article ID F131197.
2017 (English). In: Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, Association for Computing Machinery (ACM), 2017, p. 60-67, article id F131197. Conference paper, Published paper (Refereed)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. There is also a renewed interest in Near Data Processing (NDP) due to technological advancement in the last decade. However, it is not known if NDP architectures can improve the performance of big data processing frameworks such as Apache Spark. In this paper, we build the case for an NDP architecture comprising programmable-logic-based hybrid 2D integrated processing-in-memory and in-storage processing for Apache Spark, through extensive profiling of Apache Spark based workloads on an Ivy Bridge server.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017
Keywords
Processing-in-memory, In-storage Processing, Apache Spark
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-211727 (URN); 10.1145/3132402.3132427 (DOI); 000557248700006 (); 2-s2.0-85033586379 (Scopus ID)
Conference
Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, Alexandria, VA, USA, October 02 - 05, 2017
Note

ISBN for proceedings: 9781450353359

QC 20171124

QC 20210518

Available from: 2017-08-11. Created: 2017-08-11. Last updated: 2023-03-06. Bibliographically approved
Awan, A. J. (2017). Project Night-King: Improving the performance of big data analytics using Near Data Computing Architectures.
2017 (English). Other (Other (popular science, discussion, etc.)) [Artistic work]
Abstract [en]

The goal of Project Night-King is to improve the single-node performance of scale-out big data processing frameworks like Apache Spark using programmable accelerators near DRAM and NVRAM. Using modeling techniques, we estimate a lower bound of 5x performance improvement for Spark MLlib workloads.

Publisher
p. 1
Keywords
Near Data Processing Architecture, Apache Spark, In-Storage Processing, Processing-in-Memory
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
SRA - ICT
Identifiers
urn:nbn:se:kth:diva-216962 (URN)
Note

QC 20171031

Available from: 2017-10-25. Created: 2017-10-25. Last updated: 2024-03-18. Bibliographically approved
Awan, A. J. (2016). Accelerating Apache Spark with Fixed Function Hardware Accelerators Near DRAM and NVRAM.
2016 (English). Other (Refereed)
Publisher
p. 1
Keywords
Apache Spark, Hardware Acceleration
National Category
Embedded Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-213728 (URN); 10.13140/RG.2.2.17593.26724 (DOI)
Note

QC 20170906

Available from: 2017-09-05. Created: 2017-09-05. Last updated: 2024-03-18. Bibliographically approved
Awan, A. J., Brorsson, M., Vlassov, V. & Ayguade, E. (2016). Micro-architectural Characterization of Apache Spark on Batch and Stream Processing Workloads. Paper presented at The 6th IEEE International Conference on Big Data and Cloud Computing (pp. 59-66). IEEE
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual socket server. In our evaluation experiments, we have found that batch processing and stream processing have the same micro-architectural behavior in Spark if the only difference between the two implementations is micro-batching. If the input data rates are small, stream processing workloads are front-end bound. However, the front-end bound stalls are reduced at larger input data rates and instruction retirement is improved. Moreover, Spark workloads using DataFrames have improved instruction retirement over workloads using RDDs.

Place, publisher, year, edition, pages
IEEE, 2016
Keywords
Microarchitectural Performance, Spark Streaming, Workload Characterization
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-196123 (URN); 10.1109/BDCloud-SocialCom-SustainCom.2016.20 (DOI); 000392516300009 (); 2-s2.0-85000885440 (Scopus ID)
Conference
The 6th IEEE International Conference on Big Data and Cloud Computing
Note

QC 20161130

Available from: 2016-11-11. Created: 2016-11-11. Last updated: 2024-03-15. Bibliographically approved
Awan, A. J., Brorsson, M., Vlassov, V. & Ayguade, E. (2016). Node architecture implications for in-memory data analytics on scale-in clusters. Paper presented at 3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies (pp. 237-246). IEEE Press
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics. Recent studies propose scale-in clusters with in-storage processing devices to process big data analytics with Spark. However, the proposal is based solely on the memory bandwidth characterization of in-memory data analytics and does not shed light on the specification of host CPU and memory. Through empirical evaluation of in-memory data analytics with Apache Spark on an Ivy Bridge dual socket server, we have found that (i) simultaneous multi-threading is effective up to 6 cores, (ii) data locality on NUMA nodes can improve the performance by 10% on average, (iii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, (iv) DDR3 operating at 1333 MT/s is sufficient, and (v) multiple small executors can provide up to 36% speedup over a single large executor.

Place, publisher, year, edition, pages
IEEE Press, 2016
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-198161 (URN); 10.1145/3006299.3006319 (DOI); 000408919800026 (); 2-s2.0-85013223047 (Scopus ID)
Conference
3rd IEEE/ACM International Conference on Big Data Computing, Applications and Technologies
Note

QC 20161219

Available from: 2016-12-13. Created: 2016-12-13. Last updated: 2024-03-15. Bibliographically approved
Awan, A. J. (2016). Performance Characterization of In-Memory Data Analytics on a Scale-up Server. (Licentiate dissertation). KTH Royal Institute of Technology
2016 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark defines the state of the art in big data analytics platforms for (i) exploiting data-flow and in-memory computing and (ii) exhibiting superior scale-out performance on commodity machines, little effort has been devoted to understanding the performance of in-memory data analytics with Spark on modern scale-up servers. This thesis characterizes the performance of in-memory data analytics with Spark on scale-up servers.

Through empirical evaluation of representative benchmark workloads on a dual socket server, we have found that in-memory data analytics with Spark exhibit poor multi-core scalability beyond 12 cores due to thread-level load imbalance and work-time inflation. We have also found that workloads are bound by the latency of frequent data accesses to DRAM. As the input data size grows, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (due to lower L1 cache misses and higher core utilization).

For data accesses, we have found that simultaneous multi-threading is effective in hiding the data latencies. We have also observed that (i) data locality on NUMA nodes can improve the performance by 10% on average and (ii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%. For GC impact, we match memory behaviour with the garbage collector to improve the performance of applications by 1.6x to 3x, and recommend using multiple small executors, which can provide up to 36% speedup over a single large executor.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. p. 111
Series
TRITA-ICT ; 2016:07
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-185581 (URN); 978-91-7595-926-9 (ISBN)
Presentation
2016-05-23, Ka-210, Electrum 229, Kista, Stockholm, 09:15 (English)
Note

QC 20160425

Available from: 2016-04-25. Created: 2016-04-22. Last updated: 2022-06-22. Bibliographically approved
Awan, A. J., Brorsson, M., Vlassov, V. & Ayguade, E. (2015). How Data Volume Affects Spark Based Data Analytics on a Scale-up Server. In: Big Data Benchmarks, Performance Optimization, and Emerging Hardware: 6th Workshop, BPOE 2015, Kohala, HI, USA, August 31 - September 4, 2015. Revised Selected Papers. Paper presented at 6th International Workshop on Bigdata Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with 41st International Conference on Very Large Data Bases (VLDB), Kohala, HI, USA, August 31 - September 4, 2015 (pp. 81-92). Springer, 9495
2015 (English). In: Big Data Benchmarks, Performance Optimization, and Emerging Hardware: 6th Workshop, BPOE 2015, Kohala, HI, USA, August 31 - September 4, 2015. Revised Selected Papers, Springer, 2015, Vol. 9495, p. 81-92. Conference paper, Published paper (Refereed)
Abstract [en]

The sheer increase in the volume of data over the last decade has triggered research in cluster computing frameworks that enable web enterprises to extract big insights from big data. While Apache Spark is gaining popularity for exhibiting superior scale-out performance on commodity machines, the impact of data volume on the performance of Spark based data analytics in a scale-up configuration is not well understood. We present a deep-dive analysis of Spark based applications on a large scale-up server machine. Our analysis reveals that Spark based data analytics are DRAM bound and do not benefit from using more than 12 cores for an executor. As the input data size grows, application performance degrades significantly due to a substantial increase in wait time during I/O operations and garbage collection, despite a 10% better instruction retirement rate (due to lower L1 cache misses and higher core utilization). We match memory behaviour with the garbage collector to improve the performance of applications by 1.6x to 3x.

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-181325 (URN); 10.1007/978-3-319-29006-5_7 (DOI); 2-s2.0-84958073801 (Scopus ID); 978-3-319-29005-8 (ISBN)
Conference
6th International Workshop on Bigdata Benchmarks, Performance Optimization and Emerging Hardware (BpoE), held in conjunction with 41st International Conference on Very Large Data Bases (VLDB), Kohala, HI, USA, August 31 - September 4, 2015
Note

QC 20160224

Available from: 2016-02-01. Created: 2016-02-01. Last updated: 2024-03-15. Bibliographically approved
Javed Awan, A., Brorsson, M., Vlassov, V. & Ayguade, E. (2015). Performance Characterization of In-Memory Data Analytics on a Modern Cloud Server. In: Proceedings - 2015 IEEE 5th International Conference on Big Data and Cloud Computing, BDCloud 2015. Paper presented at Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on, Dalian, China, 26-28 Aug. 2015 (pp. 1-8). IEEE Computer Society, Article ID 7310708.
2015 (English). In: Proceedings - 2015 IEEE 5th International Conference on Big Data and Cloud Computing, BDCloud 2015, IEEE Computer Society, 2015, p. 1-8, article id 7310708. Conference paper, Published paper (Refereed)
Abstract [en]

In the last decade, data analytics have rapidly progressed from traditional disk-based processing to modern in-memory processing. However, little effort has been devoted to enhancing performance at the micro-architecture level. This paper characterizes the performance of in-memory data analytics using the Apache Spark framework. We use a single node NUMA machine and identify the bottlenecks hampering the scalability of workloads. We also quantify the inefficiencies at the micro-architecture level for various data analysis workloads. Through empirical evaluation, we show that Spark workloads do not scale linearly beyond twelve threads, due to work time inflation and thread level load imbalance. Further, at the micro-architecture level, we observe memory bound latency to be the major cause of work time inflation.

Place, publisher, year, edition, pages
IEEE Computer Society, 2015
Keywords
cloud chambers, cloud computing, data analysis, resource allocation, storage management, Apache Spark framework, Spark workload, data analysis workload, disk-based processing, in-memory data analytics, in-memory processing, memory bound latency, microarchitecture level performance, modern cloud server, performance characterization, single node NUMA machine, thread level load imbalance, work time inflation, workload scalability, Benchmark testing, Big data, Data analysis, Instruction sets, Scalability, Servers, Sparks, Data Analytics, NUMA, Spark Performance, Workload Characterization
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-179403 (URN); 10.1109/BDCloud.2015.37 (DOI); 000380444200001 (); 2-s2.0-84962757128 (Scopus ID); 978-1-4673-7182-7 (ISBN)
Conference
Big Data and Cloud Computing (BDCloud), 2015 IEEE Fifth International Conference on, Dalian, China, 26-28 Aug. 2015
Note

QC 20160118 QC 20160922

Available from: 2015-12-16. Created: 2015-12-16. Last updated: 2024-03-15. Bibliographically approved
Awan, A. J., Brorsson, M., Vlassov, V. & Ayguade, E. Architectural Impact on Performance of In-memory Data Analytics: Apache Spark Case Study.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both batch and stream data processing. However, recent studies on micro-architectural characterization of in-memory data analytics are limited to batch processing workloads. We compare the micro-architectural performance of batch processing and stream processing workloads in Apache Spark using hardware performance counters on a dual socket server. In our evaluation experiments, we have found that batch processing and stream processing workloads have similar micro-architectural characteristics and are bounded by the latency of frequent data access to DRAM. For data accesses, we have found that simultaneous multi-threading is effective in hiding the data latencies. We have also observed that (i) data locality on NUMA nodes can improve the performance by 10% on average, (ii) disabling next-line L1-D prefetchers can reduce the execution time by up to 14%, and (iii) multiple small executors can provide up to 36% speedup over a single large executor.

Keywords
Performance Characterization, Apache Spark, Micro-architecture
National Category
Computer Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-185580 (URN)
Note

QC 20160425

Available from: 2016-04-22. Created: 2016-04-22. Last updated: 2023-03-06. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-7510-6286