Comparative Evaluation of Spark and Stratosphere
KTH, School of Information and Communication Technology (ICT).
2013 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Although MapReduce is widely applied to parallel processing of big data, it has some limitations: for instance, the lack of generic yet efficient and richly functional primitive parallel methods, the inability to pass multiple input parameters to parallel methods, and inefficient handling of iterative algorithms. Spark and Stratosphere were developed to address (in part) these shortcomings of MapReduce. The goal of this thesis is to evaluate Spark and Stratosphere both in terms of their theoretical programming models and in terms of their practical execution on specified application algorithms. In the introductory section on comparative programming models, we mainly explore and compare the features of Spark and Stratosphere that overcome the limitations of MapReduce. After comparing the theoretical programming models, we further evaluate their practical performance by running three different classes of applications and assessing computing resource usage and execution time. We conclude that Spark has promising features for iterative algorithms in theory, but that it may not achieve the expected performance improvement for iterative applications if the amount of memory used for cached operations is close to the memory actually available in the cluster environment. In that case, performance suffers because a large share of the memory is taken up by caching, so that only a small amount of memory remains available for the computing operations of the actual algorithms. Stratosphere shows favorable characteristics as a general parallel computing framework, but it has no support for iterative algorithms and consumes more computing resources than Spark for the same amount of work. On the other hand, applications based on Stratosphere can benefit from compiler hints set manually during development, whereas Spark has no corresponding functionality.

Place, publisher, year, edition, pages
2013. 72 p.
Trita-ICT-EX, 2013:21
Keyword [en]
Parallel Computing Framework, Distributed Computing, Cluster, RDDs, PACTs.
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-118226
OAI: diva2:605106
Educational program
Master of Science - Software Engineering of Distributed Systems
Available from: 2013-03-21 Created: 2013-02-13 Last updated: 2013-03-21 Bibliographically approved
