Distributed Vertex-Cut Partitioning
KTH, School of Information and Communication Technology (ICT), Software and Computer Systems, SCS.
KTH, School of Information and Communication Technology (ICT), Software and Computer Systems, SCS.
KTH, School of Electrical Engineering (EES), Communication Networks. (LCN) ORCID iD: 0000-0003-4516-7317
KTH, School of Information and Communication Technology (ICT), Software and Computer Systems, SCS. ORCID iD: 0000-0002-6718-0144
2014 (English). In: 14th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS’14), 2014, pp. 186-200. Conference paper, Published paper (Refereed)
Abstract [en]

Graph processing has become an integral part of big data analytics. With the ever-increasing size of graphs, one needs to partition them into smaller clusters, which can be managed and processed more easily on multiple machines in a distributed fashion. While there exist numerous solutions for edge-cut partitioning of graphs, very little effort has been made for vertex-cut partitioning. This is in spite of the fact that vertex-cuts have been shown to be significantly more effective than edge-cuts for processing most real-world graphs. In this paper we present Ja-be-Ja-vc, a parallel and distributed algorithm for vertex-cut partitioning of large graphs. In a nutshell, Ja-be-Ja-vc is a local search algorithm that iteratively improves upon an initial random assignment of edges to partitions. We propose several heuristics for this optimization and study their impact on the final partitioning. Moreover, we employ a simulated annealing technique to escape local optima. We evaluate our solution on various graphs and with a variety of settings, and compare it against two state-of-the-art solutions. We show that Ja-be-Ja-vc outperforms the existing solutions in that it not only creates partitions of any requested size, but also requires a vertex-cut that is better than its counterparts and more than 70% better than random partitioning.
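
The local search described in the abstract can be illustrated with a small, centralized sketch. It is not the Ja-be-Ja-vc algorithm or any of its heuristics: the cost function (for each vertex, the number of partitions its edges touch minus one), the swap move, the linear cooling schedule, and the names `vertex_cut_cost` and `anneal_partition` are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def vertex_cut_cost(edges, assignment):
    """Sum over vertices of (number of partitions touching the vertex - 1)."""
    parts = defaultdict(set)
    for (u, v), p in zip(edges, assignment):
        parts[u].add(p)
        parts[v].add(p)
    return sum(len(s) - 1 for s in parts.values())


def anneal_partition(edges, k, steps=5000, t0=2.0, seed=0):
    """Swap-based local search with simulated annealing (assumed linear cooling)."""
    rng = random.Random(seed)
    # Balanced random initial assignment of edges to k partitions.
    assignment = [i % k for i in range(len(edges))]
    rng.shuffle(assignment)
    cost = vertex_cut_cost(edges, assignment)
    best, best_cost = assignment[:], cost
    for step in range(steps):
        temp = t0 * (1 - step / steps) + 1e-9
        i, j = rng.sample(range(len(edges)), 2)
        if assignment[i] == assignment[j]:
            continue
        # Swapping two edges' partitions keeps every partition size unchanged.
        assignment[i], assignment[j] = assignment[j], assignment[i]
        new_cost = vertex_cut_cost(edges, assignment)
        delta = new_cost - cost
        # Accept improvements always, and worse moves with a temperature-dependent probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = assignment[:], cost
        else:
            # Rejected: undo the swap.
            assignment[i], assignment[j] = assignment[j], assignment[i]
    return best, best_cost


if __name__ == "__main__":
    # Two disjoint triangles, split into two partitions of three edges each.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    print(anneal_partition(edges, k=2))
```

Recomputing the full cost after every swap keeps the sketch short; a distributed implementation would instead evaluate each swap from local information only.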

Place, publisher, year, edition, pages
2014. pp. 186-200
Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743
Keywords [en]
Big data, Graphic methods, Interoperability, Iterative methods, Simulated annealing, Data analytics, Graph processing, Local search algorithm, Multiple machine, Parallel and distributed algorithms, Random assignment, Real-world graphs, Simulated annealing techniques, Graph theory
National subject category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-145359
DOI: 10.1007/978-3-662-43352-2_15
Scopus ID: 2-s2.0-84902593727
ISBN: 9783662433515 (print)
OAI: oai:DiVA.org:kth-145359
DiVA, id: diva2:717980
Conference
14th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS’14).
Note

QC 20140519

Available from: 2014-05-19. Created: 2014-05-19. Last updated: 2018-01-11. Bibliographically approved
Part of thesis
1. Gossip-based Algorithms for Information Dissemination and Graph Clustering
2014 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Decentralized algorithms are becoming ever more prevalent in almost all real-world applications that are either data intensive, computation intensive or both. This thesis presents a few decentralized solutions for large-scale (i) data dissemination, (ii) graph partitioning, and (iii) data disambiguation. All these solutions are based on gossip, a lightweight peer-to-peer data exchange protocol, and are thus appropriate for execution in a distributed environment.

For efficient data dissemination, we make use of the publish/subscribe communication model and provide two distributed solutions, one for topic-based and one for content-based subscriptions, named Vitis and Vinifera respectively. These systems propagate large quantities of data to interested users with a relatively low overhead. Without any central coordinator and only with the use of gossip, we build a novel topology that enables efficient routing in an unstructured overlay. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structure resembles a navigable small-world network that spans along clusters of nodes that have similar subscriptions. The properties of such an overlay make it an ideal platform for efficient data dissemination in large-scale systems. Our solutions significantly outperform their counterparts on various subscription and churn scenarios, from both synthetic models and real-world traces.

We then investigate how gossiping protocols can be used, not for overlay construction, but for operating on fixed overlay topologies, which resemble graphs. In particular, we study the NP-complete problem of graph partitioning and present a distributed partitioning solution for very large graphs. This solution, called Ja-be-Ja, is based on local search and does not require access to the entire graph simultaneously. It is, therefore, appropriate for graphs that cannot even fit into the memory of a single computer. Once again, gossip-based algorithms prove efficient, as they enable implementing lightweight peer sampling services, which supply graph nodes with partial knowledge about other nodes in the graph. The performance of our partitioning algorithm is comparable to centralized graph partitioning algorithms, and yet it is scalable and can be executed on several machines in parallel or even in a completely distributed peer-to-peer overlay. It can be used for both edge-cut and vertex-cut partitioning of graphs and can produce partition sizes of any given distribution.
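
The idea of nodes improving a partitioning using only local knowledge can be sketched as follows. This is a simplified, greedy illustration rather than Ja-be-Ja itself: the colour-swap move, the local score, the uniform random sample standing in for a peer sampling service, and the names `edge_cut`, `local_score` and `swap_round` are assumptions, and no simulated annealing is applied.

```python
import random
from collections import Counter


def edge_cut(adj, color):
    """Number of edges whose endpoints have different colours (partitions)."""
    return sum(color[u] != color[v] for u in adj for v in adj[u] if u < v)


def local_score(adj, color, p, q):
    """Same-coloured neighbours of p plus same-coloured neighbours of q."""
    return (sum(color[n] == color[p] for n in adj[p])
            + sum(color[n] == color[q] for n in adj[q]))


def swap_round(adj, color, rng, sample_size=3):
    """Each node tries one colour swap with a neighbour or a sampled peer."""
    nodes = list(adj)
    for p in nodes:
        # Partial view: direct neighbours plus a few random peers (assumed sampling).
        candidates = list(adj[p]) + rng.sample(nodes, min(sample_size, len(nodes)))
        for q in candidates:
            if q == p or color[q] == color[p]:
                continue
            before = local_score(adj, color, p, q)
            color[p], color[q] = color[q], color[p]
            if local_score(adj, color, p, q) > before:
                break  # keep the swap; it strictly reduces the edge cut
            color[p], color[q] = color[q], color[p]  # undo a non-improving swap


if __name__ == "__main__":
    # Two triangles joined by the edge (2, 3), with an interleaved initial colouring.
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    color = {n: n % 2 for n in adj}
    rng = random.Random(0)
    print("cut before:", edge_cut(adj, color))
    for _ in range(10):
        swap_round(adj, color, rng)
    print("cut after:", edge_cut(adj, color), Counter(color.values()))
```

Because nodes only exchange colours, the number of nodes per partition never changes, and each decision reads only the neighbourhoods of the two nodes involved.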

We further extend the use of gossiping protocols to find natural clusters in a graph instead of producing a given number of partitions. This problem, known as graph community detection, has extensive application in various fields and communities. We take the use of our community detection algorithm to the realm of linguistics and address a well-known problem of data disambiguation. In particular, we provide a parallel community detection algorithm for the cross-document coreference problem. We operate on graphs that we construct by representing documents’ keywords as nodes and the co-location of those keywords in a document as edges. We then exploit the particular nature of such graphs, namely that coreferent words are topologically clustered and can thus be efficiently discovered by our community detection algorithm.
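
The graph construction mentioned above, with keywords as nodes and co-location in a document as edges, can be sketched as follows. The function name `keyword_graph`, the input format, and the use of co-occurrence counts as edge weights are assumptions for illustration; the abstract does not specify them, and the community detection step itself is not shown.

```python
from collections import defaultdict
from itertools import combinations


def keyword_graph(documents):
    """Map each keyword pair (a, b), a < b, to the number of documents containing both."""
    weights = defaultdict(int)
    for keywords in documents:
        # Deduplicate and sort so every pair has one canonical orientation.
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)


if __name__ == "__main__":
    docs = [["kth", "graph", "gossip"], ["graph", "gossip"]]
    for pair, count in sorted(keyword_graph(docs).items()):
        print(pair, count)
```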

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. pp. x, 22
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 14:09
National subject category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-145361
ISBN: 978-91-7595-108-9
Public defence
2014-05-22, Sal/Hall D, KTH - ICT, Isafjordsgatan 39, Kista, 13:00 (English)
Opponent
Supervisors
Note

QC 20140519

Available from: 2014-05-19. Created: 2014-05-19. Last updated: 2014-05-19. Bibliographically approved

By author/editor
Rahimian, Fatemeh; Payberah, Amir H.; Girdzijauskas, Sarunas; Haridi, Seif