Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Gossip-based Algorithms for Information Dissemination and Graph Clustering
KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
2014 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Decentralized algorithms are becoming ever more prevalent in almost all real-world applications that are either data intensive, computation intensive or both. This thesis presents a few decentralized solutions for large-scale (i) data dissemination, (ii) graph partitioning, and (iii) data disambiguation. All these solutions are based on gossip, a light weight peer-to-peer data exchange protocol, and thus, appropriate for execution in a distributed environment.

For efficient data dissemination, we make use of the publish/subscribe communication model and provide two distributed solutions, one for topicbased and one for content-based subscriptions, named Vitis and Vinifera respectively. These systems propagate large quantities of data to interested users with a relatively low overhead. Without any central coordinator and only with the use of gossip, we build a novel topology that enables efficient routing in an unstructured overlay. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structure resembles a navigable small-world network that spans along clusters of nodes that have similar subscriptions. The properties of such an overlay make it an ideal platform for efficient data dissemination in large-scale systems. Our solutions significantly outperforms their counterparts on various subscription and churn scenarios, from both synthetic models and real-world traces.

We then investigate how gossiping protocols can be used, not for overlay construction, but for operating on fixed overlay topologies, which resemble graphs. In particular we study the NP-Complete problem of graph partitioning and present a distributed partitioning solution for very large graphs. This solution, called Ja-be-Ja, is based on local search and does not require access to the entire graph simultaneously. It is, therefore, appropriate for graphs that can not even fit into the memory of a single computer. Once again gossip-based algorithms prove efficient as they enable implementing light-weight peer sampling services, which supply graph nodes with partial knowledge about other nodes in the graph. The performance of our partitioning algorithm is comparable to centralized graph partitioning algorithms, and yet it is scalable and can be executed on several machines in parallel or even in a completely distributed peer-to-peer overlay. It can be used for both edge-cut and vertex-cut partitioning of graphs and can produce partition sizes of any given distribution.

We further extend the use of gossiping protocols to find natural clusters in a graph instead of producing a given number of partitions. This problem, known as graph community detection, has extensive application in various fields and communities. We take the use of our community detection algorithm to the realm of linguistics and address a well-known problem of data disambiguation. In particular, we provide a parallel community detection algorithm for cross-document coreference problem. We operate on graphs that we construct by representing documents’ keywords as nodes and the co-location of those keywords in a document as edges. We then exploit the particular nature of such graphs, which is coreferent words are topologically clustered, and thus, can be efficiently discovered by our community detection algorithm.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2014. , x, 22 p.
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 14:09
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-145361ISBN: 978-91-7595-108-9 (print)OAI: oai:DiVA.org:kth-145361DiVA: diva2:718004
Public defence
2014-05-22, Sal/Hall D, KTH - ICT, Isafjordsgatan 39, Kista, 13:00 (English)
Opponent
Supervisors
Note

QC 20140519

Available from: 2014-05-19 Created: 2014-05-19 Last updated: 2014-05-19Bibliographically approved
List of papers
1. Vitis: A Gossip-based Hybrid Overlay for Internet-scale Publish/Subscribe Enabling Rendezvous Routing in Unstructured Overlay Networks
Open this publication in new window or tab >>Vitis: A Gossip-based Hybrid Overlay for Internet-scale Publish/Subscribe Enabling Rendezvous Routing in Unstructured Overlay Networks
2011 (English)In: Proceedings - 25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011, 2011, 746-757 p.Conference paper, Published paper (Refereed)
Abstract [en]

Peer-to-peer overlay networks are attractive solutions for building Internet-scale publish/subscribe systems. However, scalability comes with a cost: a message published on a certain topic often needs to traverse a large number of uninterested (unsubscribed) nodes before reaching all its subscribers. This might sharply increase resource consumption for such relay nodes (in terms of bandwidth transmission cost, CPU, etc) and could ultimately lead to rapid deterioration of the system's performance once the relay nodes start dropping the messages or choose to permanently abandon the system. In this paper, we introduce {\em Vitis}, a gossip-based publish/subscribe system that significantly decreases the number of relay messages, and scales to an unbounded number of nodes and topics. This is achieved by the novel approach of enabling rendezvous routing on unstructured overlays. We construct a hybrid system by injecting structure into an otherwise unstructured network. The resulting structure resembles a navigable small-world network, which spans along clusters of nodes that have similar subscriptions. The properties of such an overlay make it an ideal platform for efficient data dissemination in large-scale systems. We perform extensive simulations and evaluate Vitis by comparing its performance against two base-line publish/subscribe systems: one that is oblivious to node subscriptions, and another that exploits the subscription similarities. Our measurements show that Vitis significantly outperforms the base-line solutions on various subscription and churn scenarios, from both synthetic models and real-world traces

National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-92088 (URN)10.1109/IPDPS.2011.75 (DOI)2-s2.0-80053271646 (Scopus ID)978-1-61284-372-8 (ISBN)
Conference
25th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2011. Anchorage, AK. 16 May 2011 - 20 May 2011
Funder
ICT - The Next Generation
Note

QC 20120327

Available from: 2012-03-26 Created: 2012-03-26 Last updated: 2014-05-19Bibliographically approved
2. Subscription Awareness Meets Rendezvous Routing
Open this publication in new window or tab >>Subscription Awareness Meets Rendezvous Routing
2012 (English)Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Publish/subscribe communication model has become an indispensable part of the Web 2.0 applications, such as social networks and news syndication. Although there exist a few systems that provide a genuinely scalable service for topic-based publish/subscribe model, the content-based solutions are still suffering from restricted subscription schemes, heavy and unbalanced load on the participating nodes, or excessively high matching complexity. We address these problems by constructing a distributed content-based publish/subscribe system by using only those components that are proven to be scalable and can withstand the workloads of massive sizes. Our publish/subscribe solution, Vinifera, requires only a bounded node degree and as we show, through simulations, it scales well to large network sizes and remains efficient under various subscription patterns and loads.

National Category
Computer and Information Science
Identifiers
urn:nbn:se:kth:diva-107413 (URN)
Conference
AP2PS 2012,The Fourth International Conference on Advances in P2P Systems
Note

QC 20130115

Available from: 2012-12-11 Created: 2012-12-11 Last updated: 2014-05-19Bibliographically approved
3. Locality Awareness in a Peer-to-Peer Publish/Subscribe System
Open this publication in new window or tab >>Locality Awareness in a Peer-to-Peer Publish/Subscribe System
2012 (English)In: Distributed applications and interoperable systems: 12th IFIP WG 6.1 International Conference, DAIS 2012, Stockholm, Sweden, June 13-16, 2012. Proceedings, Springer, 2012, Vol. 7272, 45-58 p.Conference paper, Published paper (Refereed)
Abstract [en]

Peer-to-peer publish/subscribe systems are promising solutions to provide distributed content distribution services at Internet-scale with low cost. One of the potential problems with peer-to-peer overlays, however, is the inefficient traffic and large delays, due to the mismatch between the physical network and the overlay topology. This paper introduces a locality-aware extension to a peer-to-peer publish/subscribe system, named Vitis. The ultimate purpose is to avoid communications over long-distance links, instead, nodes send data over short-distance and low-cost links, when possible, while maintaining an acceptable quality of service. We show, through simulations, that the average data delivery time is up to 40% improved. The cost to pay is at most 10% more relaying in the peer-to-peer overlay.

Place, publisher, year, edition, pages
Springer, 2012
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7272
Keyword
Locality-awareness, Peer-to-Peer, Publish/Subscribe
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-109813 (URN)10.1007/978-3-642-30823-9_4 (DOI)2-s2.0-84862733929 (Scopus ID)978-364230822-2 (ISBN)
Conference
12th IFIP International Conference on Distributed Applications and Interoperable Systems, DAIS 2012, Stockholm, Sweden, 13 June 2012 through 16 June 2012
Funder
ICT - The Next Generation
Note

QC 20130115

Available from: 2013-01-09 Created: 2013-01-09 Last updated: 2017-01-17Bibliographically approved
4. JA-BE-JA: A Distributed Algorithm for Balanced Graph Partitioning
Open this publication in new window or tab >>JA-BE-JA: A Distributed Algorithm for Balanced Graph Partitioning
Show others...
2013 (English)In: 7th International Conference on Self-Adaptive and Self-Organizing Systems (SASO), 2013 IEEE, IEEE conference proceedings, 2013, 51-60 p.Conference paper, Published paper (Refereed)
Abstract [en]

Balanced graph partitioning is a well known NP-complete problem with a wide range of applications. These applications include many large-scale distributed problems including the optimal storage of large sets of graph-structured data over several hosts-A key problem in today's Cloud infrastructure. However, in very large-scale distributed scenarios, state-of-the-Art algorithms are not directly applicable, because they typically involve frequent global operations over the entire graph. In this paper, we propose a fully distributed algorithm, called JA-BE-JA, that uses local search and simulated annealing techniques for graph partitioning. The algorithm is massively parallel: there is no central coordination, each node is processed independently, and only the direct neighbors of the node, and a small subset of random nodes in the graph need to be known locally. Strict synchronization is not required. These features allow JA-BE-JA to be easily adapted to any distributed graph-processing system from data centers to fully distributed networks. We perform a thorough experimental analysis, which shows that the minimal edge-cut value achieved by JA-BE-JA is comparable to state-of-the-Art centralized algorithms such as METIS. In particular, on large social networks JA-BEJA outperforms METIS, which makes JA-BE-JA-A bottom-up, self-organizing algorithm-A highly competitive practical solution for graph partitioning.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2013
Series
International Conference on Self-Adaptive and Self-Organizing Systems, ISSN 1949-3673
Keyword
distributed algorithm, graph partitioning, load balancing, simulated annealing
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-134801 (URN)10.1109/SASO.2013.13 (DOI)000335222500006 ()2-s2.0-84893207551 (Scopus ID)978-076955129-6 (ISBN)
Conference
SASO 2013, Seventh IEEE International Conference on Self-Adaptive and Self-Organizing Systems, Philadelphia, USA; September 9-13, 2013
Note

QC 20131218

Available from: 2013-11-28 Created: 2013-11-28 Last updated: 2014-06-03Bibliographically approved
5. Distributed Vertex-Cut Partitioning
Open this publication in new window or tab >>Distributed Vertex-Cut Partitioning
2014 (English)In: In the 14th IFIP international conference on Distributed Applications and Interoperable Systems (DAIS’14)., 2014, 186-200 p.Conference paper, Published paper (Refereed)
Abstract [en]

Graph processing has become an integral part of big data analytics. With the ever increasing size of the graphs, one needs to partition them into smaller clusters, which can be managed and processed more easily on multiple machines in a distributed fashion. While there exist numerous solutions for edge-cut partitioning of graphs, very little effort has been made for vertex-cut partitioning. This is in spite of the fact that vertex-cuts are proved significantly more effective than edge-cuts for processing most real world graphs. In this paper we present Ja-be-Ja-vc, a parallel and distributed algorithm for vertex-cut partitioning of large graphs. In a nutshell, Ja-be-Ja-vc is a local search algorithm that iteratively improves upon an initial random assignment of edges to partitions. We propose several heuristics for this optimization and study their impact on the final partitioning. Moreover, we employ simulated annealing technique to escape local optima. We evaluate our solution on various graphs and with variety of settings, and compare it against two state-of-the-art solutions. We show that Ja-be-Ja-vc outperforms the existing solutions in that it not only creates partitions of any requested size, but also requires a vertex-cut that is better than its counterparts and more than 70% better than random partitioning.

Series
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), ISSN 0302-9743
Keyword
Big data, Graphic methods, Interoperability, Iterative methods, Simulated annealing, Data analytics, Graph processing, Local search algorithm, Multiple machine, Parallel and distributed algorithms, Random assignment, Real-world graphs, Simulated annealing techniques, Graph theory
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-145359 (URN)10.1007/978-3-662-43352-2_15 (DOI)2-s2.0-84902593727 (Scopus ID)9783662433515 (ISBN)
Conference
14th IFIP International Conference on Distributed Applications and Interoperable Systems (DAIS’14).
Note

QC 20140519

Available from: 2014-05-19 Created: 2014-05-19 Last updated: 2015-12-02Bibliographically approved
6. Parallel Community Detection For Cross-Document Coreference
Open this publication in new window or tab >>Parallel Community Detection For Cross-Document Coreference
2014 (English)Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a highly parallel solution for cross-document coreference resolution, which can deal with billions of documents that exist in the current web. At the core of our solution lies a novel algorithm for community detection in large scale graphs. We operate on graphs which we construct by representing documents' keywords as nodes and the co-location of those keywords in a document as edges. We then exploit the particular nature of such graphs where coreferent words are topologically clustered and can be efficiently discovered by our community detection algorithm. The accuracy of our technique is considerably higher than that of the state of the art, while the convergence time is by far shorter. In particular, we increase the accuracy for a baseline dataset by more than 15% compared to the best reported result so far. Moreover, we outperform the best reported result for a dataset provided for the Word Sense Induction task in SemEval 2010.

Place, publisher, year, edition, pages
IEEE, 2014
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-145360 (URN)10.1109/WI-IAT.2014.79 (DOI)000365543800007 ()2-s2.0-84912558916 (Scopus ID)978-147994143-8 (ISBN)
Conference
2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014; University of WarsawWarsaw; Poland; 11 August 2014 - 14 August 2014
Note

Updated from manuscript to conference paper.

QC 20150108

Available from: 2014-05-19 Created: 2014-05-19 Last updated: 2016-01-08Bibliographically approved

Open Access in DiVA

Thesis(2057 kB)595 downloads
File information
File name FULLTEXT01.pdfFile size 2057 kBChecksum SHA-512
a02a8828d644309c303072df350fe0e5d57656dac924c1e1ca6078ce40d07575e3f2059c8cd2779289c452b8d8c157b1385256c333c89a7c2e806ead54063063
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Rahimian, Fatemeh
By organisation
Software and Computer systems, SCS
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 595 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 500 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf