PonIC: Using Stratosphere to Speed Up Pig Analytics
2013 (English). In: Euro-Par 2013 Parallel Processing: 19th International Conference, Aachen, Germany, August 26-30, 2013. Proceedings, Springer Berlin/Heidelberg, 2013, pp. 279-290.
Conference paper (Refereed)
Pig, a high-level dataflow system built on top of Hadoop MapReduce, has greatly facilitated the implementation of data-intensive applications. By translating scripts into MapReduce jobs, Pig successfully conceals two limitations of Hadoop: each job reads a single input, and it runs an inflexible two-stage (map, then reduce) pipeline. However, these limitations are still present in the backend, often resulting in inefficient execution.

Stratosphere, a data-parallel computing framework consisting of PACT, an extension of the MapReduce programming model, and the Nephele execution engine, overcomes several limitations of Hadoop MapReduce. In this paper, we argue that Pig can benefit greatly from using Stratosphere as its backend, gaining performance without any loss of expressiveness.

We have ported Pig on top of Stratosphere, and we present a process for translating Pig Latin scripts into PACT programs. Our evaluation shows that, for a certain class of applications, Pig Latin scripts can execute up to 8 times faster on our prototype.
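To make the "one input" limitation concrete, the following is a toy, in-memory sketch in Python contrasting two ways of expressing an equi-join. All names (`mapreduce_join`, `match`, the sample relations) are hypothetical; this is not the Hadoop, Pig, or Stratosphere API, and it is not the paper's translation process. The first function models a join written as a single-input MapReduce job, where both relations must be unioned and tagged by origin so the reduce stage can separate them again. The second models a two-input operator in the style of PACT's Match contract, which calls a user function for every pair of records with equal keys.

```python
from collections import defaultdict
from itertools import product


def mapreduce_join(left, right, key):
    """Equi-join written as one single-input MapReduce job (toy model)."""
    # A job reads one dataset, so both relations are unioned and every
    # record is tagged with the relation it came from.
    tagged = [("L", rec) for rec in left] + [("R", rec) for rec in right]

    # Map + shuffle: group the tagged records by join key.
    groups = defaultdict(list)
    for tag, rec in tagged:
        groups[rec[key]].append((tag, rec))

    # Reduce: undo the tagging inside each group and pair the two sides.
    out = []
    for k in sorted(groups):
        lefts = [rec for tag, rec in groups[k] if tag == "L"]
        rights = [rec for tag, rec in groups[k] if tag == "R"]
        out.extend(product(lefts, rights))
    return out


def match(left, right, key, udf):
    """Two-input equi-join in the style of PACT's Match contract (toy model).

    `udf` is called once for every pair of records, one from each input,
    whose keys are equal; no union or tagging step is needed.
    """
    index = defaultdict(list)
    for rrec in right:
        index[rrec[key]].append(rrec)
    return [udf(lrec, rrec) for lrec in left for rrec in index.get(lrec[key], [])]


if __name__ == "__main__":
    users = [(1, "ann"), (2, "bob")]
    visits = [(1, "a.com"), (1, "b.com"), (3, "c.com")]
    # Both calls print [((1, 'ann'), (1, 'a.com')), ((1, 'ann'), (1, 'b.com'))]
    print(mapreduce_join(users, visits, 0))
    print(match(users, visits, 0, lambda u, v: (u, v)))
```

In the MapReduce version the join logic is split across the map and reduce stages, and the tagging exists only to work around the single input; chaining a second join would require another complete job. The Match-style operator expresses the same join as one operator with two inputs, which illustrates the kind of flexibility the abstract attributes to PACT.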
Place, publisher, year, edition, pages: Springer Berlin/Heidelberg, 2013, pp. 279-290
Series: Lecture Notes in Computer Science, ISSN 0302-9743; 8097
Keywords: big data, data analytics, programming systems
Identifiers
URN: urn:nbn:se:kth:diva-129072
DOI: 10.1007/978-3-642-40047-6_30
ISI: 000341243100030
Scopus ID: 2-s2.0-84883160941
ISBN: 978-3-642-40046-9
OAI: oai:DiVA.org:kth-129072
DiVA: diva2:649882
Conference: 19th International Conference on Parallel Processing, Euro-Par 2013; Aachen, Germany, 26-30 August 2013
Projects: SSF project End-to-End Clouds (E2E Clouds); Erasmus Mundus Joint Doctorate in Distributed Computing (EMJD-DC)
Funder: Swedish Foundation for Strategic Research, RIT10-0043