Identifying the potential of Near Data Processing for Apache Spark
2017 (English). In: Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, Association for Computing Machinery (ACM), 2017, p. 60-67, article id F131197. Conference paper, Published paper (Refereed)
Abstract [en]
While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has remained at the forefront of big data analytics as a unified framework for both batch and stream data processing. There is also renewed interest in Near Data Processing (NDP), driven by technological advances over the last decade. However, it is not known whether NDP architectures can improve the performance of big data processing frameworks such as Apache Spark. In this paper, we build the case for an NDP architecture for Apache Spark that combines programmable-logic-based hybrid 2D-integrated processing-in-memory with in-storage processing, based on extensive profiling of Apache Spark workloads on an Ivy Bridge server.
Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2017. p. 60-67, article id F131197
Keywords [en]
Processing-in-memory, In-storage Processing, Apache Spark
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-211727
DOI: 10.1145/3132402.3132427
ISI: 000557248700006
Scopus ID: 2-s2.0-85033586379
OAI: oai:DiVA.org:kth-211727
DiVA, id: diva2:1130759
Conference
Proceedings of the International Symposium on Memory Systems, MEMSYS 2017, Alexandria, VA, USA, October 02 - 05, 2017
Note
ISBN for proceedings: 9781450353359
QC 20171124
QC 20210518
2017-08-11, 2017-08-11, 2023-03-06. Bibliographically approved