Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Comparative Evaluation of Spark andStratosphere
KTH, School of Information and Communication Technology (ICT).
2013 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

Nowadays, although MapReduce is applied to the parallel processing on big data, it has some limitations: for instance, lack of generic but efficient and richly functional primitive parallel methods, incapability of entering multiple input parameters on the entry of parallel methods, and inefficiency in the way of handling iterative algorithms. Spark and Stratosphere are developed to deal with (partly) the shortcoming of MapReduce. The goal of this thesis is to evaluate Spark and Stratosphere both from the point of view of theoretical programming model and practical execution on specified application algorithms. In the introductory section of comparative programming models, we mainly explore and compare the features of Spark and Stratosphere that overcome the limitation of MapReduce. After the comparison in theoretical programming model, we further evaluate their practical performance by running three different classes of applications and assessing usage of computing resources and execution time. It is concluded that Spark has promising features for iterative algorithms in theory but it may not achieve the expected performance improvement to run iterative applications if the amount of memory used for cached operations is close to the actual available memory in the cluster environment. In that case, the reason for the poor results in performance is because larger amount of memory participates in the caching operation and in turn, only a small amount memory is available for computing operations of actual algorithms. Stratosphere shows favorable characteristics as a general parallel computing framework, but it has no support for iterative algorithms and spends more computing resources than Spark for the same amount of work. In another aspect, applications based on Stratosphere can achieve benefits by manually setting compiler hints when developing the code, whereas Spark has no corresponding functionality.

Place, publisher, year, edition, pages
2013. , 72 p.
Series
Trita-ICT-EX, 2013:21
Keyword [en]
Parallel Computing Framework, Distributed Computing, Cluster, RDDs, PACTs.
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-118226OAI: oai:DiVA.org:kth-118226DiVA: diva2:605106
Educational program
Master of Science - Software Engineering of Distributed Systems
Uppsok
Technology
Examiners
Available from: 2013-03-21 Created: 2013-02-13 Last updated: 2013-03-21Bibliographically approved

Open Access in DiVA

fulltext(1262 kB)3025 downloads
File information
File name FULLTEXT01.pdfFile size 1262 kBChecksum SHA-512
2e20cbe4b0f7c6c342abd13c395d0ce39282810145351ab17b3b81fb807995a721aa15b01b745edd55d328ec8d57a70db0c62087ba4b25d42ee880225ae5fb95
Type fulltextMimetype application/pdf

By organisation
School of Information and Communication Technology (ICT)
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar
Total: 3025 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 608 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf