KTHFS – A HIGHLY AVAILABLE AND SCALABLE FILE SYSTEM
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
KTHFS is a highly available and scalable file system built from version 0.24 of the Hadoop Distributed File System (HDFS). It provides a platform to overcome the limitations of existing distributed file systems, namely the limited scalability of the metadata server in terms of memory usage and throughput, and its limited availability.
This document describes the KTHFS architecture and how it addresses these problems by providing a well-coordinated, distributed, stateless metadata server (in our case, Namenode) architecture, backed by a persistence layer such as NDB cluster. Its primary focus is the high availability of the Namenode.
It achieves scalability and recovery by persisting the metadata to an NDB cluster. All namenodes are connected to this NDB cluster and hence are aware of the state of the file system at any point in time.
In terms of high availability, KTHFS provides a multi-Namenode architecture. Since these namenodes are stateless and have a consistent view of the metadata, clients can issue requests to any of the namenodes. Hence, if one of these servers goes down, clients can retry their operation on the next available namenode.
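The failover behaviour described above can be illustrated with a minimal sketch. It is not taken from the KTHFS code base: every class and method name below is hypothetical, and an in-memory dictionary stands in for the NDB cluster. It only shows the idea that stateless namenodes sharing one metadata store let a client retry on the next namenode when one is down.

```python
class NamenodeUnavailable(Exception):
    """Hypothetical error raised when a namenode cannot be reached."""


class SharedMetadataStore:
    """Stand-in for the persistence layer (an NDB cluster in KTHFS)."""

    def __init__(self):
        self._files = {}

    def put(self, path, meta):
        self._files[path] = meta

    def get(self, path):
        return self._files.get(path)


class Namenode:
    """Stateless namenode: holds no metadata itself, only a store handle."""

    def __init__(self, name, store):
        self.name = name
        self.store = store
        self.alive = True  # toggled by hand here to simulate a crash

    def _check(self):
        if not self.alive:
            raise NamenodeUnavailable(self.name)

    def create(self, path, meta):
        self._check()
        self.store.put(path, meta)
        return self.name  # report which namenode served the request

    def stat(self, path):
        self._check()
        return self.store.get(path)


class Client:
    """Tries each known namenode in order until one answers."""

    def __init__(self, namenodes):
        self.namenodes = list(namenodes)

    def call(self, op, *args):
        last_error = None
        for namenode in self.namenodes:
            try:
                return getattr(namenode, op)(*args)
            except NamenodeUnavailable as err:
                last_error = err  # remember the failure, try the next one
        raise RuntimeError("no namenode available") from last_error
```

With two namenodes `nn1` and `nn2` built on the same store, a file created through `nn1` remains visible through `nn2` after `nn1.alive` is set to `False`, because the metadata lives only in the shared store.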
We next discuss the evaluation of KTHFS in terms of its metadata capacity for medium and large clusters, the throughput and high availability of the Namenode, and an analysis of the underlying NDB cluster.
Finally, we conclude this document with a few words on the ongoing and future work in KTHFS.
Place, publisher, year, edition, pages
2013. 73 p.
Namenode, NDB cluster, MySQL cluster, KTHFS, HDFS, metadata, High Availability, Scalability, throughput
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-117918
OAI: oai:DiVA.org:kth-117918
DiVA: diva2:603878
Master of Science - Software Engineering of Distributed Systems
Dowling, Jim, Associate professor