Graph partitioning is an essential yet challenging task for massive graph analysis in distributed computing. Common graph partitioning methods scan the complete graph to obtain structural characteristics offline, before partitioning. However, the emerging need for low-latency, continuous graph analysis led to the development of online partitioning methods. Online methods ingest edges or vertices as a stream, making partitioning decisions on the fly based on partial knowledge of the graph. Prior studies have compared offline graph partitioning techniques across different systems. Yet, little effort has been put into investigating the characteristics of online graph partitioning strategies.
In this work, we describe and categorize online graph partitioning techniques based on their assumptions, objectives and costs. Furthermore, we employ an experimental comparison across different applications and datasets, using a unified distributed runtime based on Apache Flink. Our experimental results showcase that model-dependent online partitioning techniques such as low-cut algorithms offer better performance for communication-intensive applications such as bulk synchronous iterative algorithms, albeit higher partitioning costs. Otherwise, model-agnostic techniques trade off data locality for lower partitioning costs and balanced workloads which is beneficial when executing data-parallel single-pass graph algorithms.
Across many fields of science, primary data sets like sensor read-outs, time series, and genomic sequences are analyzed by complex chains of specialized tools and scripts exchanging intermediate results in domain-specific file formats. Scientific work ow management systems (SWfMSs) support the development and execution of these tool chains by providing work ow specification languages, graphical editors, fault-tolerant execution engines, etc. However, many SWfMSs are not prepared to handle large data sets because of inadequate support for distributed computing. On the other hand, most SWfMSs that do support distributed computing only allow static task execution orders. We present SAASFEE, a SWfMS which runs arbitrarily complex work ows on Hadoop YARN. Work ows are specified in Cuneiform, a functional work ow language focusing on parallelization and easy integration of existing software. Cuneiform work ows are executed on Hi-WAY, a higher-level scheduler for running work ows on YARN. Distinct features of SAASFEE are the ability to execute iterative work ows, an adaptive task scheduler, re-executable provenance traces, and compatibility to selected other work ow systems. In the demonstration, we present all components of SAASFEE using real-life work ows from the field of genomics.
Stream processors are emerging in industry as an apparatus that drives analytical but also mission critical services handling the core of persistent application logic. Thus, apart from scalability and low-latency, a rising system need is first-class support for application state together with strong consistency guarantees, and adaptivity to cluster reconfigurations, software patches and partial failures. Although prior systems research has addressed some of these specific problems, the practical challenge lies on how such guarantees can be materialized in a transparent, non-intrusive manner that relieves the user from unnecessary constraints. Such needs served as the main design principles of state management in Apache Flink, an open source, scalable stream processor.
We present Flink’s core pipelined, in-flight mechanism which guarantees the creation of lightweight, consistent, distributed snapshots of application state, progressively, without impacting continuous execution. Consistent snapshots cover all needs for system reconfiguration, fault tolerance and version management through coarse grained rollback recovery. Application state is declared explicitly to the system, allowing efficient partitioning and transparent commits to persistent storage. We further present Flink’s backend implementations and mechanisms for high availability, external state queries and output commit. Finally, we demonstrate how these mechanisms behave in practice with metrics and large deployment insights exhibiting the low performance trade-offs of our approach and the general benefits of exploiting asynchrony in continuous, yet sustainable system deployments.
In this paper, we leverage the concept of the metric backbone to improve the effciency of large-scale graph analytics. The metric backbone is the minimum subgraph that preserves the shortest paths of a weighted graph. We use the metric backbone in place of the original graph to compute vari- ous graph metrics exactly or with good approximation. By computing on a smaller graph, we improve the performance of graph analytics applications on two different systems, a batch graph processing system and a graph database. Further, we provide an algorithm for the computation of the metric backbone on large graphs. While one can com- pute the metric backbone by solving the all-pairs-shortest- paths problem, this approach incurs prohibitive time and space complexity for big graphs. Instead, we propose a heuristic that makes computing the metric backbone prac- tical even for large graphs. Additionally, we analyze several real datasets of different sizes and domains and we show that we can approximate the metric backbone by removing only first-order semi-metric edges; edges for which a shorter two-hop path exists. We provide a distributed implementation of our algorithm and apply it in large scale scenarios. We evaluate our algo- rithm using a variety of real graphs, including a Facebook social network subgraph of ~50 billion edges. We measure the impact of using the metric backbone on runtime per- formance in two graph management systems. We achieve query speedups of up to 6.7x in the Neo4j commercial graph database and job speedups of up to 6x in the Giraph graph processing system.