kth.se Publications
1 - 11 of 11
  • 1.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Scalable Streaming Graph and Time Series Analysis Using Partitioning and Machine Learning, 2021. Doctoral thesis, monograph (Other academic).
    Abstract [en]

    Recent years have witnessed a massive increase in the amount of data generated by the Internet of Things (IoT) and social media. Processing huge amounts of this data poses non-trivial challenges in terms of the hardware and performance requirements of modern-day applications. The data we are dealing with today is of massive scale, high intensity and comes in various forms. MapReduce was a popular and clever choice for handling big data with a distributed programming model, which made the processing of huge volumes of data possible using clusters of commodity machines. However, MapReduce was not a good fit for complex tasks, such as graph processing, iterative programs and machine learning. Modern data processing frameworks, which are widely used to process complex data and perform complex analysis tasks, overcome the shortcomings of MapReduce. Popular examples include Apache Spark for batch and stream processing, Apache Flink for stream processing and TensorFlow for machine learning.

    In this thesis, we deal with complex analytics on data modeled as time series, graphs and streams. Time series are commonly used to represent temporal data generated by IoT sensors. Analysing and forecasting time series, i.e. extracting useful characteristics and statistics of the data and predicting future values, is useful in many fields, including neurophysiology, economics, environmental studies and transportation. Another useful data representation we work with is graphs. Graphs are complex data structures used to represent relational data in the form of vertices and edges. Graphs are present in various application domains, such as recommendation systems, road traffic analytics, web analysis and social media analysis. Due to the increasing size of graph data, a single machine is often not sufficient to process the complete graph. Therefore, the computation, as well as the data, must be distributed. Graph partitioning, the process of dividing a graph into subgraphs, is an essential step in distributed processing of large-scale graphs because it enables parallel and distributed processing.

    The majority of data generated from IoT and social media originates as a continuous stream, such as series of events from a social media network, time series generated by sensors and financial transactions. The stream processing paradigm refers to the processing of data streams that are continuous and possibly unbounded. Combining both graphs and streams leads to an interesting and rather challenging domain of streaming graph analytics. Graph streams refer to data modelled as a stream of edges, or of vertices with adjacency lists, representing relations between entities in continuously evolving data generated by one or multiple data sources. Streaming graph analytics is an emerging research field with great potential due to its capability to process large graph streams with limited memory and low latency.

    In this dissertation, we present graph partitioning techniques for scalable streaming graph and time series analysis. First, we present and evaluate the use of data partitioning to enable data parallelism in order to address the challenge of scale in large spatial time series forecasting. We propose a graph partitioning technique for large-scale spatial time series forecasting, with road traffic as a use case. Our experimental results on traffic density prediction for a real-world sensor dataset using Long Short-Term Memory Neural Networks show that the partitioning-based models require 12x less training time when run in parallel compared to the unpartitioned model of the entire road infrastructure. Furthermore, the partitioning-based models have 2x lower prediction error (RMSE) compared to the entire road model. Second, we showcase the practical usefulness of streaming graph analytics for large spatial time series analysis with the real-world task of traffic jam detection and reduction. We propose to apply streaming graph analytics by performing useful analytics on traffic data streams at scale with high throughput and low latency. Third, we study, evaluate, and compare the existing state-of-the-art streaming graph partitioning algorithms. We propose a uniform analysis framework built using Apache Flink to evaluate and compare partitioning features and characteristics of streaming graph partitioning methods. Finally, we present GCNSplit, a novel ML-driven streaming graph partitioning solution that uses a small and constant in-memory state (bounded state) to partition (possibly unbounded) graph streams. Our results demonstrate that GCNSplit provides high-throughput partitioning and can leverage data parallelism to sustain input rates of 100K edges/s. GCNSplit exhibits a partitioning quality, in terms of graph cuts and load balance, that matches that of the state-of-the-art HDRF (High Degree Replicated First) algorithm while storing three orders of magnitude smaller partitioning state.

  • 2.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Al-Shishtawy, Ahmad
    RISE SICS, Stockholm, Sweden.
    Girdzijauskas, Sarunas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. RISE SICS, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Short-Term Traffic Prediction Using Long Short-Term Memory Neural Networks, 2018. Conference paper (Refereed).
    Abstract [en]

    Short-term traffic prediction allows Intelligent Transport Systems to proactively respond to events before they happen. With the rapid increase in the amount, quality, and detail of traffic data, new techniques are required that can exploit the information in the data in order to provide better results while being able to scale and cope with increasing amounts of data and growing cities. We propose and compare three models for short-term road traffic density prediction based on Long Short-Term Memory (LSTM) neural networks. We have trained the models using real traffic data collected by the Motorway Control System in Stockholm, which monitors highways and collects flow and speed data per lane every minute from radar sensors. In order to deal with the challenge of scale and to improve prediction accuracy, we propose to partition the road network into road stretches and junctions, and to model each of the partitions with one or more LSTM neural networks. Our evaluation results show that partitioning of roads improves the prediction accuracy by reducing the root mean square error by a factor of 5. We show that we can reduce the complexity of the LSTM network by limiting the number of input sensors, on average to 35% of the original number, without compromising the prediction accuracy.
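    As a concrete illustration of the kind of model compared here, below is a minimal Keras sketch (not the authors' code) of an LSTM that maps a short window of per-sensor density readings for one road partition to the next-minute values; the look-back length, sensor count and random stand-in data are illustrative assumptions.

```python
# Minimal sketch (assumptions, not the paper's code): an LSTM that forecasts
# per-sensor traffic density for one road partition from the last 10 minutes.
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

LOOKBACK = 10    # minutes of history fed to the model (assumed)
N_SENSORS = 35   # sensors in this road partition (hypothetical)

model = keras.Sequential([
    layers.Input(shape=(LOOKBACK, N_SENSORS)),
    layers.LSTM(64),
    layers.Dense(N_SENSORS),   # predicted density per sensor at t+1
])
model.compile(optimizer="adam", loss="mse")

# Toy data standing in for per-minute density readings.
X = np.random.rand(1000, LOOKBACK, N_SENSORS)
y = np.random.rand(1000, N_SENSORS)
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
print(model.predict(X[:1]).shape)  # (1, N_SENSORS)
```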

  • 3.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Ivarsson, Jón Reginbald
    KTH.
    Al-Shishtawy, A.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Scaling Deep Learning Models for Large Spatial Time-Series Forecasting, 2019. In: Proceedings - 2019 IEEE International Conference on Big Data (Big Data 2019), Institute of Electrical and Electronics Engineers Inc., 2019, p. 1587-1594. Conference paper (Refereed).
    Abstract [en]

    Neural networks are used for different machine learning tasks, such as spatial time-series forecasting. Accurate modelling of a large and complex system requires large datasets to train a deep neural network, which causes a challenge of scale, as training the network and serving the model are computationally and memory intensive. One example of a complex system that produces a large number of spatial time-series is a large road sensor infrastructure deployed for traffic monitoring. The goal of this work is twofold: 1) to model a large amount of spatial time-series from road sensors; 2) to address the scalability problem in a real-life task of large-scale road traffic prediction, which is an important part of an Intelligent Transportation System. We propose a partitioning technique to tackle the scalability problem that enables parallelism in both training and prediction: 1) We represent the sensor system as a directed weighted graph based on the road structure, which reflects dependencies between sensor readings, and is weighted by sensor readings and inter-sensor distances; 2) We propose an algorithm to automatically partition the graph, taking into account dependencies between spatial time-series from sensors; 3) We use the generated sensor graph partitions to train a prediction model per partition. Our experimental results on traffic density prediction using Long Short-Term Memory (LSTM) Neural Networks show that the partitioning-based models require 2x less training time when run sequentially and 12x less when run in parallel, as well as 20x less prediction time, compared to the unpartitioned model of the entire road infrastructure. The partitioning-based models take 100x less total sequential training time compared to single-sensor models, i.e., one model per sensor. Furthermore, the partitioning-based models have 2x less prediction error (RMSE) compared to both the single-sensor models and the entire road model.
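    The partitioning idea can be illustrated with a small sketch: build a directed weighted sensor graph and cut it at junction nodes so that each partition corresponds to a road stretch with its own model. The sensor ids, distances and the junction rule below are illustrative assumptions, not the paper's algorithm.

```python
# Minimal sketch (assumptions): represent road sensors as a directed weighted
# graph, cut it at junction nodes, and keep one forecasting model per partition.
import networkx as nx

G = nx.DiGraph()
# edges: (upstream sensor, downstream sensor, distance in metres) -- made up
edges = [("s1", "s2", 500), ("s2", "s3", 450), ("s3", "j1", 300),
         ("j1", "s4", 400), ("j1", "s5", 350), ("s5", "s6", 500)]
G.add_weighted_edges_from(edges, weight="distance")

# Treat nodes with more than one successor or predecessor as junctions and
# cut the graph there, so each remaining partition is a linear road stretch.
junctions = {n for n in G if G.out_degree(n) > 1 or G.in_degree(n) > 1}
H = G.copy()
H.remove_nodes_from(junctions)
partitions = [sorted(c) for c in nx.weakly_connected_components(H)]
print(partitions)   # e.g. [['s1', 's2', 's3'], ['s4'], ['s5', 's6']]
# Downstream, one prediction model would be trained per partition.
```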

  • 4.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Kalavri, Vasiliki
    Systems Group, ETH, Zurich, Switzerland.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Streaming Graph Partitioning: An Experimental Study, 2018. In: Proceedings of the VLDB Endowment, E-ISSN 2150-8097, Vol. 11, no. 11, p. 1590-1603. Article in journal (Refereed).
    Abstract [en]

    Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet, little effort has been put into investigating the characteristics of online graph partitioning strategies.

    In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives and costs. Furthermore, we carry out an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results show that model-dependent online partitioning techniques, such as low-cut algorithms, offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit at higher partitioning costs. Otherwise, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads, which is beneficial when executing data-parallel single-pass graph algorithms.
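    For intuition, the sketch below contrasts a model-agnostic hash partitioner with a simple greedy low-cut heuristic of the general kind studied in the paper; it is a plain-Python illustration, not the evaluated Flink implementation.

```python
# Minimal sketch (assumptions): two streaming edge-partitioning strategies.
from collections import defaultdict

K = 4  # number of partitions (assumed)

def hash_partition(edge):
    # Model-agnostic: location depends only on the edge itself.
    u, v = edge
    return hash((u, v)) % K

def greedy_partition(edge, vertex_parts, loads):
    # Model-dependent: prefer partitions that already hold an endpoint,
    # breaking ties by current load, to reduce cuts while balancing work.
    u, v = edge
    candidates = (vertex_parts[u] | vertex_parts[v]) or set(range(K))
    p = min(candidates, key=lambda i: loads[i])
    vertex_parts[u].add(p)
    vertex_parts[v].add(p)
    loads[p] += 1
    return p

stream = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
vertex_parts, loads = defaultdict(set), [0] * K
for e in stream:
    print(e, "hash ->", hash_partition(e),
          "greedy ->", greedy_partition(e, vertex_parts, loads))
```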

  • 5.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sigurdsson, Thorsteinn Thorri
    KTH.
    Al-Shishtawy, Ahmad
    RISE Research Institutes of Sweden, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Evaluation of the Use of Streaming Graph Processing Algorithms for Road Congestion Detection, 2018. In: 2018 IEEE International Conference on Parallel & Distributed Processing with Applications, Ubiquitous Computing & Communications, Big Data & Cloud Computing, Social Computing & Networking, Sustainable Computing & Communications / [ed] Chen, J. J.; Yang, L. T., IEEE Computer Society, 2018, p. 1017-1025. Conference paper (Refereed).
    Abstract [en]

    Real-time road congestion detection allows improving traffic safety and route planning. In this work, we propose to use streaming graph processing algorithms for road congestion detection and evaluate their accuracy and performance. We represent road infrastructure sensors in the form of a directed weighted graph and adapt the Connected Components algorithm and some existing graph processing algorithms, originally used for community detection in social network graphs, to the task of road congestion detection. In our approach, we detect Connected Components, or communities of sensors with similarly weighted edges, that reflect different traffic states, e.g., free flow or a congested state, in the regions covered by the detected sensor groups. We have adapted and implemented the Connected Components and community detection algorithms for detecting groups in the weighted sensor graphs in both batch and streaming manner. We evaluate our approach by building and processing the road infrastructure sensor graph for Stockholm's highways using real-world data from the Motorway Control System operated by the Swedish traffic authority. Our results indicate that the Connected Components and DenGraph community detection algorithms can detect congestion with an accuracy of up to approximately 94% for Connected Components and up to approximately 88% for DenGraph. The Louvain Modularity algorithm for community detection fails to detect congestion regions in the sparsely connected graphs representing the roads considered in this study. The Hierarchical Clustering algorithm using speed and density readings is able to detect congestion but not finer details, such as shockwaves.
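    The core Connected Components idea can be sketched as follows: keep only the edges whose readings indicate slow traffic and report the connected components of what remains as congested regions. The thresholds and readings below are made up for illustration and do not reproduce the paper's streaming implementation.

```python
# Minimal sketch (assumptions): congested regions as connected components of the
# subgraph induced by slow road segments.
import networkx as nx

SPEED_THRESHOLD = 30  # km/h below which a segment counts as congested (assumed)

G = nx.Graph()
# (sensor_a, sensor_b, average speed observed on that segment) -- made up
segments = [("s1", "s2", 25), ("s2", "s3", 22), ("s3", "s4", 80),
            ("s4", "s5", 85), ("s5", "s6", 18), ("s6", "s7", 20)]
for a, b, speed in segments:
    G.add_edge(a, b, speed=speed)

slow_edges = [(a, b) for a, b, d in G.edges(data=True)
              if d["speed"] < SPEED_THRESHOLD]
congested = nx.Graph(slow_edges)
regions = [sorted(c) for c in nx.connected_components(congested)]
print(regions)  # e.g. [['s1', 's2', 's3'], ['s5', 's6', 's7']]
```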

  • 6.
    Abbas, Zainab
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Sottovia, Paolo
    Huawei Munich Research Centre, Munich, Germany.
    Hassan, Mohamad Al Hajj
    Huawei Munich Research Centre, Munich, Germany.
    Foroni, Daniele
    Huawei Munich Research Centre, Munich, Germany.
    Bortoli, Stefano
    Huawei Munich Research Centre, Munich, Germany.
    Real-time Traffic Jam Detection and Congestion Reduction Using Streaming Graph Analytics, 2020. In: 2020 IEEE International Conference on Big Data (Big Data), Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 3109-3118. Conference paper (Refereed).
    Abstract [en]

    Traffic congestion is a problem of day-to-day life, especially in big cities. Various traffic control infrastructure systems have been deployed to monitor and improve the flow of traffic across cities. Real-time congestion detection can serve many useful purposes, including sending warnings to drivers approaching the congested area and daily route planning. Most existing congestion detection solutions combine historical data with continuous sensor readings and rely on data collected from multiple sensors deployed on the road, measuring the speed of vehicles. In contrast, our work presents a framework that operates in a pure streaming setting, where historic data is not available before processing. The traffic data streams, possibly unbounded, arrive in real time. Moreover, the data used in our case is collected only from sensors placed at the intersections of the road network. Therefore, we investigate creating a real-time congestion detection and reduction solution that works on traffic streams without any prior knowledge. The goal of our work is 1) to detect traffic jams in real time, and 2) to reduce the congestion in the traffic jam areas. In this work, we present a real-time traffic jam detection and congestion reduction framework: 1) We propose a directed weighted graph representation of the traffic infrastructure network for capturing dependencies between sensor data to measure traffic congestion; 2) We present online traffic jam detection and congestion reduction techniques built on a modern stream processing system, i.e., Apache Flink; 3) We develop dynamic traffic light policies for controlling traffic in congested areas to reduce the travel time of vehicles. Our experimental results indicate that we are able to detect traffic jams in real time and deploy new traffic light policies, which result in 27% less travel time at best and 8% less travel time on average compared to the travel time with the default traffic light policies. Our scalability results show that our system is able to handle high-intensity streaming data with high throughput and low latency.
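    A plain-Python stand-in for the detection step (the actual system is built on Apache Flink) is sketched below: keep a sliding window of recent vehicle counts per intersection and flag a jam when the windowed average crosses a threshold. Window size, threshold and readings are illustrative assumptions.

```python
# Minimal sketch (assumptions): windowed jam detection over a stream of
# per-intersection vehicle counts.
from collections import defaultdict, deque

WINDOW = 5          # recent readings kept per intersection (assumed)
JAM_THRESHOLD = 40  # average vehicles per reading treated as a jam (assumed)

windows = defaultdict(lambda: deque(maxlen=WINDOW))

def on_reading(intersection, vehicle_count):
    w = windows[intersection]
    w.append(vehicle_count)
    if len(w) == WINDOW and sum(w) / WINDOW > JAM_THRESHOLD:
        # The real system would trigger a dynamic traffic-light policy here.
        print(f"jam detected at {intersection}, avg={sum(w) / WINDOW:.1f}")

stream = [("x1", 35), ("x1", 42), ("x1", 45), ("x1", 50), ("x1", 48),
          ("x2", 10), ("x2", 12), ("x2", 9), ("x2", 11), ("x2", 8)]
for intersection, count in stream:
    on_reading(intersection, count)
```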

  • 7.
    Arsalan, Muhammad
    et al.
    Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany.
    Di Matteo, Davide
    KTH.
    Imtiaz, Sana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KRY Int AB, Stockholm, Sweden.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. KRY Int AB, Stockholm, Sweden.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Issakov, Vadim
    Tech Univ Carolo Wilhelmina Braunschweig, Braunschweig, Germany.
    Energy-Efficient Privacy-Preserving Time-Series Forecasting on User Health Data Streams, 2022. In: Proceedings - 2022 IEEE 21st International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 541-546. Conference paper (Refereed).
    Abstract [en]

    Health monitoring devices are gaining popularity both as wellness tools and as a source of information for healthcare decisions. In this work, we use Spiking Neural Networks (SNNs) for time-series forecasting due to their proven energy-saving capabilities. Thanks to their design, which closely mimics the natural nervous system, SNNs are energy-efficient in contrast to classic Artificial Neural Networks (ANNs). We design and implement an energy-efficient privacy-preserving forecasting system on real-world health data streams using SNNs and compare it to a state-of-the-art system with a Long Short-Term Memory (LSTM) based prediction model. Our evaluation shows that SNNs trade off accuracy (2.2x greater error) for a smaller model (19% fewer parameters and 77% less memory consumption) and 43% less training time. Our model is estimated to consume 3.36 µJ of energy, which is significantly less than traditional ANNs. Finally, we apply ε-differential privacy for enhanced privacy guarantees on our federated learning-based models. With differential privacy of ε = 0.1, our experiments report an increase in the measured average error (RMSE) of only 25%.
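    As a rough illustration of the ε-differential-privacy step, the sketch below clips a client's model update and adds Laplace noise calibrated to ε before it would be shared in federated training. The clipping bound and the specific mechanism are assumptions for illustration; the paper's pipeline is not reproduced here.

```python
# Minimal sketch (assumptions): epsilon-DP release of a model update via the
# Laplace mechanism with L1 clipping.
import numpy as np

def privatize_update(update, clip_norm=1.0, epsilon=0.1, rng=None):
    # Clip the update's L1 norm so one user's contribution is bounded
    # (sensitivity <= clip_norm under the assumed neighbouring definition),
    # then add Laplace noise with scale sensitivity / epsilon.
    rng = rng or np.random.default_rng(0)
    norm = np.linalg.norm(update, ord=1)
    clipped = update * min(1.0, clip_norm / (norm + 1e-12))
    noise = rng.laplace(loc=0.0, scale=clip_norm / epsilon, size=update.shape)
    return clipped + noise

update = np.array([0.3, -0.7, 0.2])   # toy client update
print(privatize_update(update, epsilon=0.1))
```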

  • 8.
    Fedeli, Stefano
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Schain, Frida
    Schain Res, Stockholm, Sweden.
    Imtiaz, Sana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Privacy Preserving Survival Prediction, 2021. In: 2021 IEEE International Conference on Big Data (Big Data) / [ed] Chen, Y.; Ludwig, H.; Tu, Y.; Fayyad, U.; Zhu, X.; Hu, X.; Byna, S.; Liu, X.; Zhang, J.; Pan, S.; Papalexakis, V.; Wang, J.; Cuzzocrea, A.; Ordonez, C., Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 4600-4608. Conference paper (Refereed).
    Abstract [en]

    Predictive modeling has the potential to improve risk stratification of cancer patients and thereby contribute to optimized treatment strategies and better outcomes for patients in clinical practice. To develop robust predictive models for decision-making in healthcare, sensitive patient-level data is often required for training. Consequently, data privacy is an important aspect to consider when building these predictive models and in subsequent communication of the results. In this study, we have used Graph Neural Networks for survival prediction and compared their accuracy to state-of-the-art prediction models after applying two privacy-preservation solutions, Differential Privacy and k-Anonymity. Using two different data sources, we demonstrate that Graph Neural Networks and Survival Forests are the two best-performing survival prediction methods when used in combination with privacy-preservation solutions. Furthermore, when the predictive model is built using clinical expertise in the specific area of interest, the prediction accuracy of the proposed knowledge-based graph model drops by at most 10% when used with privacy-preservation solutions. Our proposed knowledge-based graph is therefore more suitable for use in combination with privacy-preservation solutions than other graph models.
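    To make the k-Anonymity side of the comparison concrete, the toy sketch below generalizes an age quasi-identifier into bins and suppresses equivalence classes smaller than k; the records and the single quasi-identifier are invented for illustration and do not reflect the study's datasets.

```python
# Minimal sketch (assumptions): k-anonymity via generalization and suppression
# of a single quasi-identifier.
from collections import Counter

K = 3
records = [{"age": 34, "outcome": 1}, {"age": 36, "outcome": 0},
           {"age": 38, "outcome": 1}, {"age": 51, "outcome": 0},
           {"age": 55, "outcome": 1}, {"age": 57, "outcome": 0},
           {"age": 72, "outcome": 1}]

def generalize(age, width=10):
    # Replace an exact age with its decade bin, e.g. 34 -> "30-39".
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

generalized = [dict(r, age=generalize(r["age"])) for r in records]
counts = Counter(r["age"] for r in generalized)
anonymized = [r for r in generalized if counts[r["age"]] >= K]
print(anonymized)   # the lone 70-79 record is suppressed
```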

  • 9.
    Imtiaz, Sana
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Horchidan, Sonia-Florina
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Arsalan, Muhammad
    Otto-von-Guericke Universitat Magdeburg, Magdeburg, Germany.
    Chaudhry, Hassan Nazeer
    DEIB, Politecnico di Milano, Milan, Italy.
    Vlassov, Vladimir
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Privacy Preserving Time-Series Forecasting of User Health Data Streams, 2020. In: 2020 IEEE International Conference on Big Data (Big Data), Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 3428-3437. Conference paper (Refereed).
    Abstract [en]

    Privacy preservation plays a vital role in healthcare applications, as the requirements for privacy preservation are very strict in this domain. With the rapid increase in the amount, quality and detail of health data being gathered with smart devices, new mechanisms are required that can cope with the challenges of large-scale and real-time processing requirements. Federated learning (FL) is one of the conventional approaches that facilitate the training of AI models without access to the raw data. However, recent studies have shown that FL alone does not guarantee sufficient privacy. Differential privacy (DP) is a well-known approach for privacy guarantees; however, because of the noise addition, DP needs to make a trade-off between privacy and accuracy. In this work, we design and implement an end-to-end pipeline using DP and FL for the first time in the context of health data streams. We propose a clustering mechanism that leverages the similarities between users to improve the prediction accuracy as well as significantly reduce the model training time. Depending on the dataset and features, our predictions deviate from the ground-truth value by no more than 0.025% of the value range. Moreover, our clustering mechanism brings a significant reduction in the training time, with up to 49% reduction in prediction error in the best case, as compared to training a single model on the entire dataset. Our proposed privacy-preserving mechanism at best introduces a decrease of ≈ 2% in the prediction accuracy of the trained models. Furthermore, our proposed clustering mechanism reduces the prediction error even in highly noisy settings, by as much as 38% as compared to using a single federated private model.
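    The clustered federated-learning idea can be sketched as follows: group users into clusters and run federated averaging separately within each cluster so that similar users share a model. The pre-formed clusters, linear local models and synthetic users below are illustrative stand-ins for the paper's mechanism, and the DP noise step is omitted for brevity.

```python
# Minimal sketch (assumptions): per-cluster federated averaging with linear
# local models on synthetic user data.
import numpy as np

rng = np.random.default_rng(0)

def local_step(weights, X, y, lr=0.1):
    # One gradient step of linear regression on a user's local data.
    grad = 2 * X.T @ (X @ weights - y) / len(y)
    return weights - lr * grad

def fedavg(users, rounds=20, dim=3):
    # Average the locally updated weights each round (FedAvg).
    w = np.zeros(dim)
    for _ in range(rounds):
        w = np.mean([local_step(w, X, y) for X, y in users], axis=0)
    return w

def make_user(true_w):
    X = rng.normal(size=(50, 3))
    return X, X @ true_w + 0.01 * rng.normal(size=50)

# Fabricated clusters: users in each cluster follow the same trend.
clusters = {0: [make_user(np.array([1.0, 0.0, -1.0])) for _ in range(5)],
            1: [make_user(np.array([-2.0, 1.0, 0.5])) for _ in range(5)]}

models = {c: fedavg(users) for c, users in clusters.items()}
for c, w in models.items():
    print(f"cluster {c} model: {np.round(w, 2)}")
```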

  • 10.
    Mitropolitsky, Milko
    et al.
    KTH.
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Payberah, Amir H.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Graph Representation Matters in Device Placement, 2020. In: DIDL 2020 - Proceedings of the 2020 Workshop on Distributed Infrastructures for Deep Learning, Part of Middleware 2020, Association for Computing Machinery, Inc., 2020, p. 1-6. Conference paper (Refereed).
    Abstract [en]

    Modern Neural Network (NN) models require more data and parameters to perform ever more complicated tasks. One approach to training a massive NN is to distribute it across multiple devices. This approach raises a problem known as the device placement problem. Most of the state-of-the-art solutions that tackle this problem leverage graph embedding techniques. In this work, we assess the impact of different graph embedding techniques on the quality of device placement, measured by (i) the execution time of the partitioned NN models, and (ii) the computation time of the graph embedding technique. In particular, we extend Placeto, a state-of-the-art device placement solution, and evaluate the impact of two graph embedding techniques, GraphSAGE and P-GNN, compared to the original Placeto graph embedding model, Placeto-GNN. In terms of execution time improvement, we achieve an increase of 23.967% when using P-GNN compared to Placeto-GNN, while GraphSAGE produces 1.165% better results than Placeto-GNN. Regarding computation time, GraphSAGE has a gain of 11.569% compared to Placeto-GNN, whereas P-GNN is 6.95% slower than Placeto-GNN.
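    For intuition about what a graph embedding contributes here, the sketch below runs two rounds of GraphSAGE-style mean aggregation over a toy computation graph, producing the per-operator embeddings a placement policy would consume; the features and weights are random placeholders rather than the Placeto, GraphSAGE or P-GNN models themselves.

```python
# Minimal sketch (assumptions): GraphSAGE-style mean aggregation over a toy
# NN computation graph, yielding node embeddings for a placement policy.
import numpy as np

rng = np.random.default_rng(1)
# Toy computation graph: op -> list of upstream ops feeding it (made up).
preds = {"conv1": [], "conv2": ["conv1"], "pool": ["conv2"], "fc": ["pool", "conv1"]}
feat = {op: rng.normal(size=8) for op in preds}            # per-op features
W_self, W_neigh = rng.normal(size=(8, 8)), rng.normal(size=(8, 8))

def sage_layer(feat):
    out = {}
    for op, ps in preds.items():
        # Mean-aggregate the predecessors' features, then combine with self.
        neigh = np.mean([feat[p] for p in ps], axis=0) if ps else np.zeros(8)
        out[op] = np.tanh(feat[op] @ W_self + neigh @ W_neigh)
    return out

embeddings = sage_layer(sage_layer(feat))   # two aggregation rounds
for op, z in embeddings.items():
    print(op, np.round(z[:4], 2))
```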

  • 11.
    Zwolak, Michal
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Abbas, Zainab
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Horchidan, Sonia-Florina
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Carbone, Paris
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Kalavri, V.
    GCNSplit: Bounding the State of Streaming Graph Partitioning, 2022. In: Proceedings of the 5th International Workshop on Exploiting Artificial Intelligence Techniques for Data Management (aiDM 2022), in conjunction with the 2022 ACM SIGMOD/PODS Conference, Association for Computing Machinery, Inc., 2022, article id 3. Conference paper (Refereed).
    Abstract [en]

    This paper introduces GCNSplit, a streaming graph partitioning framework capable of handling unbounded streams with bounded state requirements. We frame partitioning as a classification problem and employ an unsupervised model whose loss function minimizes edge cuts. GCNSplit leverages an inductive graph convolutional network (GCN) to embed graph characteristics into a low-dimensional space and assign edges to partitions in an online manner. We evaluate GCNSplit on real-world graph datasets of various sizes and domains. Our results demonstrate that GCNSplit provides high-throughput, top-quality partitioning and successfully leverages data parallelism. It achieves a throughput of 430K edges/s on a real-world graph of 1.6B edges using a bounded 147 KB model, in contrast to the state-of-the-art HDRF algorithm, which requires > 116 GB of in-memory state. With a well-balanced normalized load of 1.01, GCNSplit achieves a replication factor on par with HDRF, showcasing high partitioning quality while storing a three-orders-of-magnitude smaller partitioning state. Owing to the power of GCNs, we show that GCNSplit can generalize to entirely unseen graphs while outperforming the state-of-the-art stream partitioners in some cases.
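    The online inference path of a GCNSplit-style partitioner can be sketched as follows: derive fixed-size features for an edge's endpoints, embed them with a (pre-trained) model and pick the partition from the resulting scores, so the partitioner's state never grows with the stream. The hashed features and random weights below are placeholders for the trained GCN and classifier described in the paper.

```python
# Minimal sketch (assumptions): bounded-state, online edge-to-partition
# assignment from endpoint embeddings.
import zlib
import numpy as np

K, DIM = 4, 16
rng = np.random.default_rng(42)
W_emb = rng.normal(size=(DIM, DIM)) * 0.1    # stand-in for trained GCN weights
W_out = rng.normal(size=(2 * DIM, K)) * 0.1  # stand-in for trained classifier

def node_features(v):
    # Fixed-size, state-free features derived from a hash of the vertex id,
    # so memory does not grow with the number of vertices seen.
    r = np.random.default_rng(zlib.crc32(str(v).encode()))
    return r.normal(size=DIM)

def assign(edge):
    u, v = edge
    z = np.concatenate([np.tanh(node_features(u) @ W_emb),
                        np.tanh(node_features(v) @ W_emb)])
    return int(np.argmax(z @ W_out))  # partition id in [0, K)

for e in [("a", "b"), ("b", "c"), ("c", "d"), ("a", "d")]:
    print(e, "->", assign(e))
```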
