Using Map-Reduce for Large Scale Analysis of Graph-Based Data
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
As social networks have gained in popularity, maintaining and processing the social network graph information using graph algorithms has become an essential source for discovering potential features of the graph. The escalating size of the social networks has made it impossible to process the huge graphs on a single ma chine in a “real-time” level of execution. This thesis is looking into representing and distributing graph-based algorithms using Map-Reduce model.
Graph-based algorithms are discussed in the beginning. Then, several distributed graph computing infrastructures are reviewed, followed by Map-Reduce introduction and some graph computation toolkits based on Map-Reduce model. By reviewing the background and related work, graph-based algorithms are categorized, and adaptation of graph-based algorithms to Map-Reduce model is discussed. Two particular algorithms, MCL and DBSCAN are chosen to be designed using Map- Reduce model, and implemented using Hadoop. New matrix multiplication method is proposed while designing MCL. The DBSCAN is reformulated into connectivity problem using filter method, and Kingdom Expansion Game is proposed to do fast expansion. Scalability and performance of these new designs are evaluated. Conclusion is made according to the literature study, practical design experience and evaluation data. Some suggestions of graph-based algorithms design using Map-Reduce model are also given in the end.
Place, publisher, year, edition, pages
2011. , 74 p.
Engineering and Technology
IdentifiersURN: urn:nbn:se:kth:diva-102822OAI: oai:DiVA.org:kth-102822DiVA: diva2:556816
Master of Science - Software Engineering of Distributed Systems
Vlassov, Vladimir, Universitetslektor