Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Maintaining Strong Consistency Semantics in a Horizontally Scalable and Highly Available Implementation of HDFS
KTH, School of Information and Communication Technology (ICT).
KTH, School of Information and Communication Technology (ICT).
2013 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

The Hadoop Distributed Filesystem (HDFS) is the storage layer of Hadoop, scaling to support tens of petabytes of data at companies such as Facebook and Yahoo. One wellknown limitation of HDFS is that its metadata has been stored inmemory on a single node, called the NameNode. To overcome NameNode’s limitations, a distributed file system concept based on HDFS, called KTHFS, was proposed in which NameNode’s metadata are stored on an inmemory replicated distributed database, MySQL Cluster.

In this thesis, we show how to store the metadata of HDFS NameNode in an external distributed database while maintaining strong consistency semantics of HDFS for both filesystem operations and primitive HDFS operations. Our implementation supports MySQL Cluster, to store the metadata, although it only supports a readcommitted transaction isolation model. As a readcommitted isolation model cannot guarantee strong consistency, we needed to carefully design how metadata is read and written in MySQL Cluster to ensure our system preserves HDFS’s consistency model and is both deadlock free and highly performant. We developed a transaction model based on taking metadata snapshotting and the careful ordering of database operations. Our model is general enough to support any database providing at least readcommitted isolation level. We evaluate our model and show how HDFS can scale, while maintaining strong consistency, to terabytes of metadata.

Place, publisher, year, edition, pages
2013. , 64 p.
Series
Trita-ICT-EX, 2013:79
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-127464OAI: oai:DiVA.org:kth-127464DiVA: diva2:644417
Educational program
Master of Science - Distributed Computing
Examiners
Available from: 2013-08-30 Created: 2013-08-30 Last updated: 2013-08-30Bibliographically approved

Open Access in DiVA

fulltext(1720 kB)308 downloads
File information
File name FULLTEXT01.pdfFile size 1720 kBChecksum SHA-512
f252ff7866f17bce2b6606d19151af09bf2a65f099d8ac2f6616827b7e694d7dd54d2f187907423b6d13b09f89d8aae5ea993d3a36a0d0b819e5f5f4018ef5ee
Type fulltextMimetype application/pdf

By organisation
School of Information and Communication Technology (ICT)
Engineering and Technology

Search outside of DiVA

GoogleGoogle Scholar
Total: 308 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 235 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf