Change search
ReferencesLink to record
Permanent link

Direct link
Making Big Data Smaller: Reducing the storage requirements for big data with erasure coding for Hadoop
KTH, School of Information and Communication Technology (ICT).
2014 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

The amount of data stored in modern data centres is growing rapidly nowadays. Large-scale distributed file systems, that maintain the massive data sets in data centres, are designed to work with commodity hardware. Due to the quality and quantity of the hardware components in such systems, failures are considered normal events and, as such, distributed file systems are designed to be highly fault-tolerant. A common approach to achieve fault tolerance is using redundancy by storing three copies of a file across different storage nodes, thereby increasing the storage requirements by a factor of three and further aggravating the storage problem.

A concrete implementation of such a file system is the Hadoop Distributed File System (HDFS). This thesis explores the use of RAID-like mechanisms in order to decrease the storage requirements for big data. We designed and implemented a prototype that extends HDFS with a simple but powerful erasure coding API. Compared to existing approaches, we decided to locate the erasure-coding management logic in the HDFS NameNode, as this allows us to use internal HDFS APIs and state. Because of that, we can repair failures associated with erasurecoded files more quickly and with lower cost. We evaluate our prototype, and we also show that the use of erasure coding instead of replication can greatly decrease the storage requirements of big data without scarifying reliability and availability. Finally, we argue that our API can support a large range of custom encoding strategies, while adding the erasure coding logic to the NameNode can significantly improve the management of the encoded files.

Place, publisher, year, edition, pages
2014. , 76 p.
TRITA-ICT-EX, 2014:98
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-177201OAI: diva2:871985
Available from: 2015-12-08 Created: 2015-11-17 Last updated: 2015-12-08Bibliographically approved

Open Access in DiVA

No full text

By organisation
School of Information and Communication Technology (ICT)
Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 30 hits
ReferencesLink to record
Permanent link

Direct link