Distributed Resource Management for YARN
KTH, School of Information and Communication Technology (ICT).
2015 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In the last year, Hadoop YARN has become the de facto standard resource management platform for data-intensive applications, with support for a wide range of data analytics platforms such as Apache Spark, MapReduce V2, MPI, Apache Flink, and Apache Giraph. The ResourceManager fulfills three main functions: it manages the set of active applications (Applications service), it schedules resources (CPU, memory) to applications (the FIFO/Capacity/Fair Scheduler), and it monitors the state of resources in the cluster (ResourceTracker service). Though YARN is more scalable and fault-tolerant than its predecessor, the JobTracker in MapReduce, its ResourceManager is still a single point of failure and a performance bottleneck due to its centralized architecture. The single-point-of-failure problem has been addressed in Hops-YARN, which provides multiple ResourceManagers (one active, the others on standby): the ResourceManager's state is persisted to MySQL Cluster and can quickly be recovered by a standby ResourceManager if the active ResourceManager fails.

In large YARN clusters, with up to 4000 nodes, the ResourceTracker service handles over one thousand heartbeats per second from the nodes in the cluster (NodeManagers), making it a scalability bottleneck. Large clusters cope by reducing the frequency of heartbeats from NodeManagers, but this comes at the cost of reduced interactivity for YARN (slower application startup times), as all communication from the ResourceManager to NodeManagers is sent in response to heartbeat messages. Since Hops-YARN still uses a centralized scheduler for all applications, distributing the ResourceTracker service across multiple nodes reduces the number of heartbeat messages that each ResourceTracker must process, enabling both larger cluster sizes and lower latency for scheduling containers to applications. In this thesis, we scale out the ResourceTracker service by distributing it over the standby ResourceManagers, using MySQL NDB Cluster event streaming. The distributed resource management for YARN designed and developed in this project is thus a first step towards making the monolithic YARN ResourceManager scalable and more interactive.
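The scaling argument in the abstract is quantitative: heartbeat load grows linearly with cluster size, and distributing the ResourceTracker divides that load across trackers. The sketch below models this arithmetic. The cluster size (4000 nodes) comes from the abstract; the 3-second heartbeat interval and the count of four distributed ResourceTrackers are illustrative assumptions, not values taken from the thesis.

```java
// Back-of-envelope model of ResourceTracker heartbeat load,
// following the scaling argument in the abstract.
public class HeartbeatLoad {

    // Aggregate heartbeat rate generated by the NodeManagers:
    // each node sends one heartbeat per interval.
    static double heartbeatsPerSecond(int nodes, double intervalSeconds) {
        return nodes / intervalSeconds;
    }

    public static void main(String[] args) {
        int nodes = 4000;        // cluster size from the abstract
        double interval = 3.0;   // ASSUMED NodeManager heartbeat interval (s)
        int trackers = 4;        // HYPOTHETICAL number of distributed ResourceTrackers

        // Centralized case: one ResourceTracker absorbs the full rate,
        // over one thousand heartbeats per second, as the abstract states.
        double total = heartbeatsPerSecond(nodes, interval);

        // Distributed case: the same rate split across the standby
        // ResourceManagers running ResourceTracker instances.
        double perTracker = total / trackers;

        System.out.printf("centralized: %.0f hb/s; per distributed tracker: %.0f hb/s%n",
                total, perTracker);
    }
}
```

Under these assumptions the single tracker sees roughly 1333 heartbeats per second, while each of four distributed trackers sees about a third of a thousand; alternatively, the same per-tracker budget allows either a larger cluster or a shorter heartbeat interval (faster application startup).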

Place, publisher, year, edition, pages
2015. 80 p.
Series: TRITA-ICT-EX, 2015:231
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-187044
OAI: diva2:928708
Educational program
Master of Science - Distributed Computing
Available from: 2016-05-16. Created: 2016-05-16. Last updated: 2016-05-16. Bibliographically approved.

Open Access in DiVA

Fulltext (2754 kB)
File name: FULLTEXT01.pdf. File size: 2754 kB. Checksum: SHA-512.
Type: fulltext. MIME type: application/pdf.
