Inferring the location of authors from words in their texts
2015 (English). In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Linköping University Electronic Press, 2015. Conference paper (Refereed)
For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.
We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
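The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it fits only one Gaussian per word (the abstract reports that several per word work better), and the placeness score, the threshold, the function names, and the toy posts with approximate coordinates are all assumptions made for illustration. The abstract does not give the paper's actual definition of placeness.

```python
from collections import defaultdict
from statistics import mean, pvariance


def fit_word_gaussians(posts, min_count=2):
    """Fit one axis-aligned Gaussian per word from (text, (lat, lon)) pairs.

    Returns {word: (mean_lat, mean_lon, var_lat, var_lon)}.
    Simplification: a single Gaussian per word, not a mixture.
    """
    coords = defaultdict(list)
    for text, (lat, lon) in posts:
        for word in set(text.lower().split()):
            coords[word].append((lat, lon))
    models = {}
    for word, points in coords.items():
        if len(points) < min_count:
            continue  # too few observations to estimate a spread
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        models[word] = (mean(lats), mean(lons), pvariance(lats), pvariance(lons))
    return models


def placeness(model):
    """Hypothetical score: near 1 for tightly clustered words, near 0 for dispersed ones."""
    _, _, var_lat, var_lon = model
    return 1.0 / (1.0 + var_lat + var_lon)


def locate_text(text, models, threshold=0.5):
    """Placeness-weighted centroid of the words that pass the filter.

    Averages latitude and longitude directly (a flat approximation).
    Returns None when no word in the text qualifies.
    """
    hits = []
    for word in text.lower().split():
        if word in models:
            score = placeness(models[word])
            if score >= threshold:
                hits.append((score, models[word]))
    if not hits:
        return None
    total = sum(score for score, _ in hits)
    lat = sum(score * m[0] for score, m in hits) / total
    lon = sum(score * m[1] for score, m in hits) / total
    return lat, lon


# Toy training data with approximate coordinates (illustrative only).
posts = [
    ("fika i göteborg", (57.7, 12.0)),
    ("regn i göteborg", (57.7, 12.0)),
    ("fika i kiruna", (67.9, 20.2)),
    ("snö i kiruna", (67.9, 20.2)),
]
models = fit_word_gaussians(posts)
location = locate_text("fika i göteborg", models)
```

In this toy data, "göteborg" always occurs at one position and so scores high, while "fika" and "i" occur in both places and are filtered out, so the text is located by its place name alone.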
Place, publisher, year, edition, pages
Linköping University Electronic Press, 2015.
Linköping Electronic Conference Proceedings, ISSN 1650-3740 ; 109
General Language Studies and Linguistics
Research subject: Information and Communication Technology
Identifiers: URN: urn:nbn:se:kth:diva-169619; ISBN: 978-91-7519-098-3; OAI: oai:DiVA.org:kth-169619; DiVA: diva2:823404
NoDaLiDa, May 11–13, 2015 in Vilnius, Lithuania
Projects: SINUS (Spridning av innovationer i nutida svenska)
Funder: Swedish Research Council
QC 20150618. 2015-06-18. Bibliographically approved.