GCNSplit: Bounding the State of Streaming Graph Partitioning
2022 (English). In: Proceedings of the 5th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference, Association for Computing Machinery, Inc, 2022, article id 3. Conference paper, Published paper (Refereed)
Abstract [en]
This paper introduces GCNSplit, a streaming graph partitioning framework capable of handling unbounded streams with bounded state requirements. We frame partitioning as a classification problem and employ an unsupervised model whose loss function minimizes edge-cuts. GCNSplit leverages an inductive graph convolutional network (GCN) to embed graph characteristics into a low-dimensional space and assign edges to partitions in an online manner. We evaluate GCNSplit with real-world graph datasets of various sizes and domains. Our results demonstrate that GCNSplit provides high-throughput, top-quality partitioning and successfully leverages data parallelism. It achieves a throughput of 430K edges/s on a real-world graph of 1.6B edges using a bounded 147 KB model, in contrast to the state-of-the-art HDRF algorithm, which requires more than 116 GB of in-memory state. With a well-balanced normalized load of 1.01, GCNSplit achieves a replication factor on par with HDRF, showcasing high partitioning quality while storing a partitioning state that is three orders of magnitude smaller. Owing to the power of GCNs, we show that GCNSplit can generalize to entirely unseen graphs while outperforming the state-of-the-art stream partitioners in some cases.
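The abstract reports two quality metrics, replication factor and normalized load, without defining them. The sketch below computes both for a toy edge partitioning, using the definitions commonly found in the edge-partitioning literature: the average number of partitions holding a copy of each vertex, and the largest partition's edge count divided by the ideal even share. The paper's exact definitions may differ, and the function name `partition_quality` and the toy input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def partition_quality(assignments, num_partitions):
    """Compute (replication_factor, normalized_load) for an edge partitioning.

    assignments: iterable of ((u, v), partition_id) pairs.
    Assumed definitions (not taken from the paper):
      replication factor = mean number of partitions holding each vertex
      normalized load    = max edges in a partition / (total edges / k)
    """
    replicas = defaultdict(set)    # vertex -> partitions that hold a copy of it
    loads = [0] * num_partitions   # number of edges assigned to each partition
    for (u, v), p in assignments:
        replicas[u].add(p)
        replicas[v].add(p)
        loads[p] += 1

    total_edges = sum(loads)
    if total_edges == 0:
        raise ValueError("no edges assigned")

    replication_factor = sum(len(ps) for ps in replicas.values()) / len(replicas)
    normalized_load = max(loads) / (total_edges / num_partitions)
    return replication_factor, normalized_load


# Toy example: a 4-cycle split evenly across two partitions.
toy = [((0, 1), 0), ((1, 2), 0), ((2, 3), 1), ((3, 0), 1)]
rf, load = partition_quality(toy, num_partitions=2)
print(rf, load)
```

In this toy case vertices 0 and 2 are replicated on both partitions, so the replication factor is 1.5, and the two partitions hold two edges each, so the normalized load is 1.0. A normalized load of 1.01, as reported above, would under this definition mean the fullest partition holds about 1% more edges than an even split.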
Place, publisher, year, edition, pages
Association for Computing Machinery, Inc, 2022. article id 3
Keywords [en]
data streams, graph neural networks, graph partitioning
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-317564
DOI: 10.1145/3533702.3534920
Scopus ID: 2-s2.0-85137089721
OAI: oai:DiVA.org:kth-317564
DiVA, id: diva2:1695576
Conference
5th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management, aiDM 2022 - In conjunction with the 2022 ACM SIGMOD/PODS Conference, 17 June 2022, Philadelphia, USA
Note
QC 20220914
Part of proceedings: ISBN 978-145039377-5
2022-09-14. Bibliographically approved