Inferring the location of authors from words in their texts
Gavagai.
KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS. ORCID iD: 0000-0003-4042-4919
Stockholms universitet.
Stockholms universitet.
2015 (English). In: Proceedings of the 20th Nordic Conference of Computational Linguistics, Linköping University Electronic Press, 2015. Conference paper, Published paper (Refereed)
Abstract [en]

For the purposes of computational dialectology or other geographically bound text analysis tasks, texts must be annotated with their or their authors’ location. Many texts are locatable but most have no explicit annotation of place. This paper describes a series of experiments to determine how positionally annotated microblog posts can be used to learn location-indicating words which then can be used to locate blog texts and their authors. A Gaussian distribution is used to model the locational qualities of words. We introduce the notion of placeness to describe how locational words are.

We find that modelling word distributions to account for several locations and thus several Gaussian distributions per word, defining a filter which picks out words with high placeness based on their local distributional context, and aggregating locational information in a centroid for each text gives the most useful results. The results are applied to data in the Swedish language.
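
As a rough illustration of the pipeline the abstract outlines, the sketch below fits one axis-aligned Gaussian per word from geotagged posts, filters out words with a wide spread, and averages the remaining word means into a centroid for a text. It is a simplified sketch under stated assumptions, not the authors' implementation: the function names, the `max_spread` threshold, and the use of summed variance as a stand-in for placeness are all hypothetical, and it uses a single Gaussian per word even though the abstract reports that several Gaussians per word work better. Averaging coordinates in degrees is also only a local approximation.

```python
from collections import defaultdict
from statistics import fmean, pvariance


def fit_word_gaussians(posts):
    """Fit one axis-aligned Gaussian per word.

    posts: iterable of (tokens, (lat, lon)) pairs.
    Returns word -> (mean_lat, mean_lon, var_lat, var_lon, count).
    """
    coords = defaultdict(list)
    for tokens, (lat, lon) in posts:
        for word in set(tokens):  # count each word once per post
            coords[word].append((lat, lon))

    model = {}
    for word, points in coords.items():
        if len(points) < 2:
            continue  # too few observations to estimate a spread
        lats = [p[0] for p in points]
        lons = [p[1] for p in points]
        model[word] = (fmean(lats), fmean(lons),
                       pvariance(lats), pvariance(lons), len(points))
    return model


def spread(entry):
    """Summed variance: a hypothetical proxy where low spread means high placeness."""
    _, _, var_lat, var_lon, _ = entry
    return var_lat + var_lon


def locate_text(tokens, model, max_spread=1.0):
    """Centroid of the means of all sufficiently local words in a text."""
    hits = [model[w] for w in tokens
            if w in model and spread(model[w]) <= max_spread]
    if not hits:
        return None  # no location-indicating words found
    return (fmean([h[0] for h in hits]), fmean([h[1] for h in hits]))


# Toy data with invented coordinates, for illustration only.
posts = [
    (["fika", "stockholm"], (59.3, 18.1)),
    (["stockholm"], (59.4, 18.0)),
    (["fika"], (57.7, 12.0)),
    (["göteborg", "fika"], (57.7, 11.9)),
    (["göteborg"], (57.7, 12.0)),
]
model = fit_word_gaussians(posts)
estimate = locate_text(["fika", "stockholm"], model)
```

In this toy data the word "fika" occurs in two distant places, so its spread exceeds the threshold and it is ignored; the estimate then rests on "stockholm" alone.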

Place, publisher, year, edition, pages
Linköping University Electronic Press, 2015.
Series
Linköping Electronic Conference Proceedings, ISSN 1650-3740 ; 109
National Category
General Language Studies and Linguistics
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-169619
ISBN: 978-91-7519-098-3 (print)
OAI: oai:DiVA.org:kth-169619
DiVA: diva2:823404
Conference
NoDaLiDa, May 11–13, 2015 in Vilnius, Lithuania
Projects
SINUS (Spridning av innovationer i nutida svenska)
Funder
Swedish Research Council
Note

Qc 20150618

Available from: 2015-06-18. Created: 2015-06-18. Last updated: 2015-06-18. Bibliographically approved

Open Access in DiVA

fulltext (26529 kB), 101 downloads
File information
File name: FULLTEXT01.pdf
File size: 26529 kB
Checksum (SHA-512):
7d751056fd05248cc1de77ed1a51d9e35a4e74e67fb7d9533bcc26904869da99fe1e15f580d131b132d526944b2f1ee26e31dec9cdb0cf439bf0c47649bac6b3
Type: fulltext
Mimetype: application/pdf

Other links

http://aclweb.org/anthology/W/W15/W15-1826.pdf

Authority records BETA

Karlgren, Jussi