Evaluation and benchmarking of Tachyon as a memory-centric distributed storage system for Apache Hadoop
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Hadoop was developed as an open-source software framework that initially leveraged the MapReduce programming model to analyse and process large datasets efficiently. At the core of Hadoop is the Hadoop Distributed File System (HDFS), which serves as the default storage layer across the cluster. Hadoop can also be used with other types of storage, with or without HDFS, such as Amazon S3, Windows Azure Storage Blobs, GlusterFS, and Tachyon. This thesis focuses on Tachyon, a distributed file system that claims to enable reliable data sharing at memory speed across cluster computing frameworks. We benchmark and evaluate the performance of HDFS with and without Tachyon. To do so, we use TestDFSIO as a benchmark to simulate different MapReduce workloads, as well as an in-production Spark job from Spotify. Tachyon's different write types are also tested and evaluated. To see how cloud solutions compare, we perform the same evaluations of Tachyon over Google Cloud Storage.
Place, publisher, year, edition, pages
2016. 38 p.
Information Systems, Social aspects
Identifiers
URN: urn:nbn:se:kth:diva-189571
OAI: oai:DiVA.org:kth-189571
DiVA: diva2:947054
Subject / course
Information and Communication Technology
Master of Science - Software Engineering of Distributed Systems
Dowling, Jim, Associate Professor