Scaling YARN: A Distributed Resource Manager for Hadoop
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
In recent years, there has been a growing need for computer systems capable of handling unprecedented amounts of data. To this end, Hadoop HDFS and Hadoop YARN have become the de facto standard for meeting demanding storage requirements and for managing applications that process this data. Although YARN is a major advance over its predecessor, MapReduce, in terms of scalability and fault tolerance, its Resource Manager component, which performs resource allocation, is a potential single point of failure and a performance bottleneck because of its centralized architecture. This thesis presents a novel architecture in which the Resource Manager runs on a distributed network of stateless commodity machines, with its state migrated to MySQL Cluster, a write-scalable and highly available in-memory relational database. As a result, the Resource Manager becomes more scalable, since it can run on multiple nodes, and more fault-tolerant, since arbitrary node failures do not result in state loss. In this work, we implemented the proposed architecture for the Resource Tracker service, which performs cluster node management for the Resource Manager. Experimental results validate the correctness of our proposal, demonstrate that it scales well by utilizing stateless Resource Manager machines, and evaluate its performance in terms of request throughput, system resource utilization, and database utilization.
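The core idea of the abstract, request handlers that hold no state of their own because all state lives in a shared external store, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the class names `SharedStateStore` and `StatelessResourceTracker`, the heartbeat "tick" counter, and the in-process dictionary that stands in for the database. It is not the thesis implementation and does not use the YARN or MySQL Cluster APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class SharedStateStore:
    """Stand-in for the external database (an in-process dict, NOT MySQL Cluster)."""
    # Maps a node id to the tick of its most recent heartbeat.
    nodes: Dict[str, int] = field(default_factory=dict)

    def put_heartbeat(self, node_id: str, tick: int) -> None:
        self.nodes[node_id] = tick

    def get_heartbeat(self, node_id: str) -> Optional[int]:
        return self.nodes.get(node_id)


class StatelessResourceTracker:
    """Hypothetical handler that keeps no cluster state of its own."""

    def __init__(self, store: SharedStateStore) -> None:
        self._store = store

    def register_node(self, node_id: str, tick: int) -> None:
        # Registration is recorded only in the shared store.
        self._store.put_heartbeat(node_id, tick)

    def heartbeat(self, node_id: str, tick: int) -> bool:
        # Reject heartbeats from nodes that never registered.
        if self._store.get_heartbeat(node_id) is None:
            return False
        self._store.put_heartbeat(node_id, tick)
        return True


store = SharedStateStore()
tracker_a = StatelessResourceTracker(store)
tracker_b = StatelessResourceTracker(store)

# A node registers through one handler...
tracker_a.register_node("node-1", tick=0)
# ...that handler is then lost (simulated by deleting it)...
del tracker_a
# ...and a different handler can still serve the node's next heartbeat.
accepted = tracker_b.heartbeat("node-1", tick=1)
unknown = tracker_b.heartbeat("node-9", tick=1)
```

Because neither handler caches anything, losing `tracker_a` loses no registrations, and further handlers could be added without coordinating among themselves. In this sketch the shared store is itself a single in-process object; in the architecture the abstract describes, that role is played by a highly available database.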
Place, publisher, year, edition, pages
2014. 82 p.
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-177200
OAI: oai:DiVA.org:kth-177200
DiVA: diva2:871976