Global Evaluation of Random Indexing through Swedish Word Clustering Compared to the People’s Dictionary of Synonyms
2009 (English). In: Proceedings of the International Conference RANLP-2009, 2009, pp. 376-380. Conference paper (Refereed)
Evaluation of word space models is usually local in the sense that it only considers words that are deemed very similar by the model. We propose a global evaluation scheme based on clustering of the words. A clustering of high quality in an external evaluation against a semantic resource, such as a dictionary of synonyms, indicates a word space model of high quality. We use Random Indexing to create several different models and compare them by clustering evaluation against the People's Dictionary of Synonyms, a list of Swedish synonyms that are graded by the public. Most notably, we get better results for models based on syntagmatic information (words that appear together) than for models based on paradigmatic information (words that appear in similar contexts). This is contrary to previous results presented for local evaluation. Clustering into ten clusters yields a recall of 83% for a syntagmatic model, compared to 34% for a comparable paradigmatic model and 10% for a random partition.
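The abstract does not define how recall is computed. One plausible reading, offered here only as an assumption, is pair recall: the fraction of synonym pairs from the dictionary whose two words end up in the same cluster. Under that reading, a uniformly random partition into ten clusters would be expected to score about 10%, which is consistent with the baseline quoted above. The sketch below illustrates this on toy data; the names `pair_recall` and `random_partition`, the toy vocabulary, and the synonym pairs are all hypothetical and are not taken from the paper.

```python
import random


def pair_recall(cluster_of, synonym_pairs):
    """Fraction of synonym pairs whose two words share a cluster.

    cluster_of: dict mapping word -> cluster id.
    synonym_pairs: iterable of (word, word) tuples from the reference list.
    Pairs with a word missing from the clustering are skipped (an assumption).
    """
    scored = [(a, b) for a, b in synonym_pairs
              if a in cluster_of and b in cluster_of]
    if not scored:
        return 0.0
    hits = sum(1 for a, b in scored if cluster_of[a] == cluster_of[b])
    return hits / len(scored)


def random_partition(words, k, seed=0):
    """Assign each word to one of k clusters uniformly at random."""
    rng = random.Random(seed)
    return {w: rng.randrange(k) for w in words}


# Toy stand-in for a synonym dictionary: 1000 disjoint word pairs.
words = [f"w{i}" for i in range(2000)]
synonym_pairs = [(words[i], words[i + 1]) for i in range(0, 2000, 2)]

# Random baseline with ten clusters; expected to land near 0.10.
baseline = pair_recall(random_partition(words, 10), synonym_pairs)
print(f"random baseline recall: {baseline:.3f}")
```

A real evaluation would replace the random assignment with cluster labels obtained from the word space model and the toy pairs with the graded synonym list.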
Place, publisher, year, edition, pages
2009. 376-380 p.
International Conference Recent Advances in Natural Language Processing, RANLP, ISSN 1313-8502
Random Indexing, Word Space Model, Word Clustering, Evaluation, Dictionary of Synonyms
Computer and Information Science; Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:kth:diva-10125; ScopusID: 2-s2.0-84866846352; OAI: oai:DiVA.org:kth-10125; DiVA: diva2:209222
International Conference on Recent Advances in Natural Language Processing, RANLP-2009; Borovets; Bulgaria; 14 September 2009 through 16 September 2009
QC 20100806; 2009-03-24; 2009-03-24; 2014-09-24; Bibliographically approved