Distributed Resource Management for YARN
KTH, School of Information and Communication Technology (ICT).
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In the last year, Hadoop YARN has become the de facto standard resource management platform for data-intensive applications, with support for a wide range of data analytics platforms such as Apache Spark, MapReduce V2, MPI, Apache Flink, and Apache Giraph. The YARN ResourceManager fulfills three main functions: it manages the set of active applications (Applications service), it allocates resources (CPU, memory) to applications (the FIFO/Capacity/Fair Scheduler), and it monitors the state of resources in the cluster (ResourceTracker service). Although YARN is more scalable and fault-tolerant than its predecessor, the JobTracker in MapReduce, its ResourceManager is still a single point of failure and a performance bottleneck because of its centralized architecture. The single-point-of-failure problem has been addressed in Hops-YARN, which provides multiple ResourceManagers (one active and the others on standby) and persists the ResourceManager's state to MySQL Cluster, so that a standby ResourceManager can quickly recover that state if the active ResourceManager fails.
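The active/standby arrangement described above can be illustrated with a toy model. This is a minimal sketch, not Hops-YARN code: the class and method names (`SharedStore`, `ResourceManager`, `submit`, `take_over`) are hypothetical, and a plain Python dictionary stands in for MySQL Cluster. It shows only the core idea that the active ResourceManager writes its state to a shared store, so a standby can reload that state after a failure instead of rebuilding it.

```python
from dataclasses import dataclass, field


@dataclass
class SharedStore:
    """Hypothetical stand-in for the persistent store (MySQL Cluster in Hops-YARN)."""
    state: dict = field(default_factory=dict)


@dataclass
class ResourceManager:
    """Toy ResourceManager that tracks application memory requests only."""
    name: str
    store: SharedStore
    active: bool = False
    apps: dict = field(default_factory=dict)

    def submit(self, app_id: str, memory_mb: int) -> None:
        # Only the active RM accepts work; each accepted change is also written to the shared store.
        if not self.active:
            raise RuntimeError(f"{self.name} is on standby")
        self.apps[app_id] = memory_mb
        self.store.state[app_id] = memory_mb

    def take_over(self) -> None:
        # A standby recovers by reloading the persisted state, then becomes active.
        self.apps = dict(self.store.state)
        self.active = True


store = SharedStore()
rm1 = ResourceManager("rm1", store, active=True)
rm2 = ResourceManager("rm2", store)

rm1.submit("app-1", 2048)
rm1.submit("app-2", 4096)

rm1.active = False  # simulate failure of the active ResourceManager
rm2.take_over()
print(rm2.apps)  # {'app-1': 2048, 'app-2': 4096}
```

A real deployment also needs leader election and fencing so that two ResourceManagers never act as active at once; the sketch leaves that out to keep the state-recovery step visible.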

In large YARN clusters, with up to 4000 nodes, the ResourceTracker service handles over one thousand heartbeats per second from the nodes in the cluster (NodeManagers) and, as such, becomes a scalability bottleneck. Large clusters handle this by reducing the frequency of heartbeats from NodeManagers, but this comes at the cost of reduced interactivity for YARN (slower application startup times), as all communication from the ResourceManager to NodeManagers is sent in response to heartbeat messages. While Hops-YARN still uses a centralized scheduler for all applications, distributing the ResourceTracker service across multiple nodes reduces the number of heartbeat messages that each ResourceTracker must process, enabling both larger cluster sizes and lower latency when allocating containers to applications. In this thesis, we scale out the ResourceTracker service by distributing it over the standby ResourceManagers using MySQL NDB Cluster event streaming. The distributed resource management for YARN designed and developed in this project is thus a first step towards making the monolithic YARN ResourceManager scalable and more interactive.
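The scaling argument above is simple arithmetic: the heartbeat load on one ResourceTracker is roughly the number of NodeManagers divided by the heartbeat interval and by the number of ResourceTrackers sharing the work. The sketch below assumes an even spread of NodeManagers and uses a hypothetical round-robin assignment (`assign_trackers`); the abstract does not say how Hops-YARN assigns NodeManagers to ResourceTrackers. The 3-second interval is an illustrative assumption, chosen only because it matches the "over one thousand heartbeats per second" figure for 4000 nodes. In Apache Hadoop the interval is configured with `yarn.resourcemanager.nodemanagers.heartbeat-interval-ms`.

```python
def heartbeats_per_second(num_nodes: int, interval_s: float, num_trackers: int = 1) -> float:
    """Average heartbeat load on one ResourceTracker, assuming NodeManagers are spread evenly."""
    return num_nodes / interval_s / num_trackers


def assign_trackers(node_ids: list[str], trackers: list[str]) -> dict[str, list[str]]:
    """Hypothetical round-robin assignment of NodeManagers to ResourceTrackers."""
    assignment: dict[str, list[str]] = {t: [] for t in trackers}
    for i, node in enumerate(node_ids):
        assignment[trackers[i % len(trackers)]].append(node)
    return assignment


nodes = [f"nm-{i}" for i in range(4000)]
trackers = ["rm-active", "rm-standby-1", "rm-standby-2", "rm-standby-3"]

# 4000 NodeManagers heartbeating every 3 s to a single ResourceTracker
print(round(heartbeats_per_second(4000, 3.0), 1))      # 1333.3
# Same cluster, tracker role spread over four ResourceManagers
print(round(heartbeats_per_second(4000, 3.0, 4), 1))   # 333.3
# Four trackers, interval cut to 0.75 s: per-tracker load back at the single-tracker level
print(round(heartbeats_per_second(4000, 0.75, 4), 1))  # 1333.3

sizes = {t: len(ns) for t, ns in assign_trackers(nodes, trackers).items()}
print(sizes)  # {'rm-active': 1000, 'rm-standby-1': 1000, 'rm-standby-2': 1000, 'rm-standby-3': 1000}
```

The last calculation shows the trade-off the abstract points to: spreading the ResourceTracker over k ResourceManagers lets the cluster either grow k-fold or shorten the heartbeat interval k-fold (lowering scheduling latency) at the same per-tracker load.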

Place, publisher, year, edition, pages
2015. 80 p.
Series
TRITA-ICT-EX, 2015:231
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-187044
OAI: oai:DiVA.org:kth-187044
DiVA: diva2:928708
Educational program
Master of Science - Distributed Computing
Available from: 2016-05-16. Created: 2016-05-16. Last updated: 2017-06-15. Bibliographically approved.

Open Access in DiVA
File name: FULLTEXT01.pdf
File size: 2754 kB
Mimetype: application/pdf
Checksum (SHA-512): 7d3e67ba6b00af7f76d604e3611026404be49b334d083867fdb2ff1e7c989144e954d4b60cc393e616834b73a42bd41e9bd0d799dd62540d2ba0d16ba8dc7b6f
