Parallel Community Detection For Cross-Document Coreference
2014 (English). Conference paper (Refereed)
This paper presents a highly parallel solution for cross-document coreference resolution that can scale to the billions of documents on the current web. At the core of our solution lies a novel algorithm for community detection in large-scale graphs. We construct these graphs by representing documents' keywords as nodes and the co-location of those keywords in a document as edges. We then exploit a particular property of such graphs: coreferent words are topologically clustered, so our community detection algorithm can discover them efficiently. The accuracy of our technique is considerably higher than that of the state of the art, while the convergence time is far shorter. In particular, we increase the accuracy on a baseline dataset by more than 15% compared to the best result reported so far. Moreover, we outperform the best reported result on a dataset provided for the Word Sense Induction task in SemEval 2010.
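The graph construction described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the record does not specify the community detection algorithm, so a plain sequential label propagation is swapped in as a stand-in, and the function names, the edge weighting (number of shared documents), and the toy keyword lists are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(documents):
    """Nodes are keywords; an edge links two keywords that share a document.

    Assumption: the edge weight counts the documents in which a pair co-occurs.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for keywords in documents:
        unique = sorted(set(keywords))
        for u in unique:
            graph[u]  # touching the key registers isolated keywords as nodes
        for u, v in combinations(unique, 2):
            graph[u][v] += 1
            graph[v][u] += 1
    return graph


def label_propagation(graph, max_rounds=20):
    """Stand-in community detection (NOT the algorithm from the paper).

    Every node starts in its own community and repeatedly adopts the label
    carrying the largest total edge weight among its neighbours.
    """
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed order keeps the run deterministic
            if not graph[node]:
                continue  # isolated node keeps its own label
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Highest weight wins; ties break on the alphabetically smaller label.
            best = min(scores, key=lambda lab: (-scores[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


# Hypothetical keyword lists, one per document.
docs = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "dealer"],
    ["cat", "jungle", "prey"],
    ["cat", "jungle", "habitat"],
]
graph = build_cooccurrence_graph(docs)
labels = label_propagation(graph)

communities = defaultdict(list)
for keyword, label in labels.items():
    communities[label].append(keyword)
for members in communities.values():
    print(sorted(members))
```

In this toy input the keywords of the first two documents end up sharing one label and those of the last two share another, which mirrors the abstract's observation that related words are topologically clustered. The sketch is sequential; the parallel execution that the paper emphasises is not attempted here.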
Place, publisher, year, edition, pages
IEEE, 2014. 46-53 p.
Identifiers
URN: urn:nbn:se:kth:diva-145360
DOI: 10.1109/WI-IAT.2014.79
ISI: 000365543800007
ScopusID: 2-s2.0-84912558916
ISBN: 978-147994143-8
OAI: oai:DiVA.org:kth-145360
DiVA: diva2:717994
2014 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology - Workshops, WI-IAT 2014; University of Warsaw, Warsaw, Poland; 11 August 2014 - 14 August 2014
Updated from manuscript to conference paper.
QC 20150108
2014-05-19; 2014-05-19; 2016-01-08; Bibliographically approved