Gossip-based Partitioning and Replication Middleware for Online Social Networks
Independent thesis Advanced level (degree of Master (One Year)), 40 credits / 60 HE creditsStudent thesis
The nature of Online Social Networks (OSNs) has created many scalability challenges for OSN providers in the last decade. The main challenge comes with a huge amount of data that is generated by OSNs and requires large data centers to handle the workload. The existence of strong community structure in social networks creates complex dependability patterns within the OSN data and makes it difficult to deploy over multiple servers without breaking the relation structure. Such systems may require communication among mul- tiple servers and can generate high inter-server traffic. With no intelligent data allocation strategy, vital operations of OSNs like query processing and data management inevitably result in high and costly inter-server traffic. Ex- isting solutions, i.e., distributed databases, key-value stores and auto scaling, may be inefficient and unable to scale for OSNs. In our work, we present a distributed partitioning and replication middleware that efficiently partitions an OSN graph and distributes it across the cluster in order to achieve high availability and scalability for OSNs. Our algorithm splits the social graph into equal sized partitions and periodically updates the partitions to achieve optimal placement of users. Additionally, our algorithm achieves fault toler- ance and data locality at a low cost of redundancy through replication. We compared our algorithm with a de-facto random partitioning, state-of-the-art solution SPAR and a distributed graph partitioning algorithm called JA-BE- JA. In the experiments, we show that our approach generates four times lower replication overhead compared to random partitioning, and generates lower replication overhead by a factor of two compared to SPAR. We also demon- strate that our algorithm is able to tolerate high churn rates and scale for large OSNs.
Place, publisher, year, edition, pages
2013. , 75 p.
social networks, distributed systems, storage
IdentifiersURN: urn:nbn:se:kth:diva-137298OAI: oai:DiVA.org:kth-137298DiVA: diva2:678549
Subject / course
Master of Science - Distributed Computing
2013-06-06, Elev C, 8th Floor, Forum, Kista, 13:12 (English)
Girdzijauskas, Šarūnas, assistant professor
Montelius, Johan, Associate Professor