The shortest path is not always a straight line: Leveraging semimetricity in graph analysis
2016 (English)In: Proceedings of the VLDB Endowment, ISSN 2150-8097, E-ISSN 2150-8097, Vol. 9, no 9, 672-683 p.Article in journal (Refereed) Published
In this paper, we leverage the concept of the metric backbone to improve the effciency of large-scale graph analytics. The metric backbone is the minimum subgraph that preserves the shortest paths of a weighted graph. We use the metric backbone in place of the original graph to compute vari- ous graph metrics exactly or with good approximation. By computing on a smaller graph, we improve the performance of graph analytics applications on two different systems, a batch graph processing system and a graph database. Further, we provide an algorithm for the computation of the metric backbone on large graphs. While one can com- pute the metric backbone by solving the all-pairs-shortest- paths problem, this approach incurs prohibitive time and space complexity for big graphs. Instead, we propose a heuristic that makes computing the metric backbone prac- tical even for large graphs. Additionally, we analyze several real datasets of different sizes and domains and we show that we can approximate the metric backbone by removing only first-order semi-metric edges; edges for which a shorter two-hop path exists. We provide a distributed implementation of our algorithm and apply it in large scale scenarios. We evaluate our algo- rithm using a variety of real graphs, including a Facebook social network subgraph of ~50 billion edges. We measure the impact of using the metric backbone on runtime per- formance in two graph management systems. We achieve query speedups of up to 6.7x in the Neo4j commercial graph database and job speedups of up to 6x in the Giraph graph processing system.
Place, publisher, year, edition, pages
ACM Press, 2016. Vol. 9, no 9, 672-683 p.
Complex networks, Graphic methods, Query processing, All pairs shortest paths, Different sizes, Distributed implementation, Graph analytics, Graph processing, Management systems, Real data sets, Time and space complexity, Graph theory
IdentifiersURN: urn:nbn:se:kth:diva-195512ISI: 000386428500005ScopusID: 2-s2.0-84975818524OAI: oai:DiVA.org:kth-195512DiVA: diva2:1049730
5 September 2016 through 9 September 2016
QC 201611252016-11-252016-11-032016-11-29Bibliographically approved