Peer to Peer Search Protocol
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Decentralized full-text search is still an unsolved problem in peer-to-peer research. As part of this thesis, we introduce Sweep, a fully decentralized full-text search engine built on Apache Lucene, that takes significant steps towards providing reliable, low-latency, and accurate search results. Our main contributions include a novel gossip-based protocol for fast and efficient replication of the search index and a protocol for the automated sharding of the search index.
Therefore, each peer maintains a replica of a bounded-size subset of the whole search index also known as shard. Our approach is based on a low overhead gossip-based leader selection algorithm within each shard, whose cost is independent of the number of peers. For each shard, peers add new index entries to the leader group, and peers securely pull updates within their shard using a Gradient topology that ensures efficient dissemination of updates in log(N) hops within the shard. The full-text search involves a fanout search to each shard, with latency variance reduction techniques to help achieve low response times. We show in simulation the viability of our approach and its robustness to failures and flash crowds.
Place, publisher, year, edition, pages
2015. , 69 p.
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-186713OAI: oai:DiVA.org:kth-186713DiVA: diva2:927845
Master of Science - Distributed Computing