Semi-Supervised Multiple Disambiguation
2015 (English). In: IEEE Computer Society Conference Publishing Services / [ed] IEEE, IEEE, 2015. Conference paper (Refereed)
Determining the true entity behind an ambiguous word is an NP-hard problem known as Disambiguation. Previous solutions often disambiguate a single ambiguous mention across multiple documents. They assume each document contains only a single ambiguous word and a rich set of unambiguous context words. However, nowadays we require fast disambiguation of short texts (like news feeds, reviews or Tweets) with few context words and multiple ambiguous words. In this research we focus on Multiple Disambiguation (MD) in contrast to Single Disambiguation (SD). Our solution is inspired by a recent algorithm developed for SD. The algorithm categorizes documents by first transforming them into a graph and then clustering the graph based on its topological structure. We changed the graph-based document modeling of the algorithm to account for MD. We also added a new parameter that controls the resolution of the clustering. Then, we used a supervised sampling approach for merging the clusters when appropriate. Compared with the original model, our algorithm achieved 10% higher quality in terms of F1-score using only 4% sampling from the dataset.
Place, publisher, year, edition, pages
IEEE, 2015.
Identifiers
URN: urn:nbn:se:kth:diva-184878
DOI: 10.1109/Trustcom.2015.566
OAI: oai:DiVA.org:kth-184878
DiVA: diva2:917395
The 9th IEEE International Conference on Big Data Science and Engineering (IEEE BigDataSE-15)
QC 20160407
2016-04-06 2016-04-06 2016-05-30
Bibliographically approved