Load Balancing in a Distributed Storage System for Big and Small Data
KTH, School of Information and Communication Technology (ICT).
2013 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Distributed storage services form the backbone of modern large-scale applications and data processing solutions. In this integral role they have to provide a scalable, reliable and performant service. One of the major challenges any distributed storage system has to address is skew in the data load, which can occur either in the distribution of data items or in the distribution of data accesses over the nodes in the system. One widespread approach to dealing with skewed load is data assignment based on uniform consistent hashing. However, there is an opposing desire to optimise and exploit data-locality. That is to say, it is advantageous to collocate items that are typically accessed together. Often this locality property can be achieved by storing keys in an ordered fashion and using application-level knowledge to construct keys in such a way that items accessed together end up very close together in the key space. It can easily be seen, however, that this behaviour exacerbates the load skew issue. A different approach to load balancing is partitioning the data into small subsets which can be relocated independently. These subsets may be known as partitions, tablets or virtual nodes, for example. In this thesis we present the design of CaracalDB, a distributed key-value store which provides automatic load-balancing and data-locality, as well as fast re-replication after node failures, while remaining flexible enough to support a choice of consistency levels. We also evaluate an early prototype of the system, and show that the approach is viable.
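
The tension the abstract describes between hash-based placement and ordered, locality-preserving placement can be illustrated with a minimal sketch. It is not taken from the thesis and does not reflect CaracalDB's design; the class names, node names, split points and key format are all hypothetical, and MD5 is used only as a convenient uniform hash.

```python
import bisect
import hashlib
from collections import Counter


def _hash(value: str) -> int:
    """Map a string to a point on the ring (MD5 chosen only for uniformity)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hashing ring; each node owns several points (virtual nodes)."""

    def __init__(self, nodes, vnodes_per_node=64):
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(key)) % len(self._points)
        return self._points[idx][1]


class RangePartitioner:
    """Ordered placement: node i holds the keys between split i-1 and split i."""

    def __init__(self, nodes, split_points):
        assert len(split_points) == len(nodes) - 1
        self._nodes = list(nodes)
        self._splits = sorted(split_points)

    def node_for(self, key: str) -> str:
        return self._nodes[bisect.bisect_right(self._splits, key)]


def load(partitioner, keys) -> Counter:
    """Count how many keys each node would store."""
    return Counter(partitioner.node_for(k) for k in keys)


# Hypothetical setup: four nodes and keys built so that one user's
# messages sort next to each other in the key space.
nodes = ["n0", "n1", "n2", "n3"]
keys = [f"user42/msg/{i:04d}" for i in range(1000)]

hash_load = load(HashRing(nodes), keys)
range_load = load(RangePartitioner(nodes, ["g", "n", "t"]), keys)

print("hash placement: ", dict(hash_load))
print("range placement:", dict(range_load))
```

Under these assumptions the ring scatters the 1000 related keys across all four nodes, which balances load but loses locality, while the ordered placement keeps them together on a single node, which preserves locality but concentrates the load. Relocating small key ranges independently, as the abstract suggests, is one way to reconcile the two.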

Place, publisher, year, edition, pages
2013. 57 p.
Trita-ICT-EX, 2013:57
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-129304
OAI: diva2:651375
Educational program
Master of Science - Software Engineering of Distributed Systems
Available from: 2013-09-25
Created: 2013-09-25
Last updated: 2013-09-25
Bibliographically approved

Open Access in DiVA

fulltext (638 kB), 553 downloads
File information
File name: FULLTEXT01.pdf
File size: 638 kB
Checksum SHA-512
Type: fulltext
Mimetype: application/pdf
