A Distributional Semantic Online Lexicon for Linguistic Explorations of Societies
2022 (English). In: Social science computer review, ISSN 0894-4393, E-ISSN 1552-8286. Article in journal (Refereed), Published
Abstract [en]

Linguistic Explorations of Societies (LES) is an interdisciplinary research project with scholars from the fields of political science, computer science, and computational linguistics. The overarching ambition of LES has been to contribute to the survey-based comparative scholarship by compiling and analyzing online text data within and between languages and countries. To this end, the project has developed an online semantic lexicon, which allows researchers to explore meanings and usages of words in online media across a substantial number of geo-coded languages. The lexicon covers data from approximately 140 language-country combinations and is, to our knowledge, the most extensive free research resource of its kind. Such a resource makes it possible to critically examine survey translations and identify discrepancies in order to modify and improve existing survey methodology, and its unique features further enable Internet researchers to study public debate online from a comparative perspective. In this article, we discuss the social scientific rationale for using online text data as a complement to survey data, and present the natural language processing-based methodology behind the lexicon including its underpinning theory and practical modeling. Finally, we engage in a critical reflection about the challenges of using online text data to gauge public opinion and political behavior across the world.

Place, publisher, year, edition, pages
SAGE Publications , 2022.
Keywords [en]
distributional semantics, natural language processing, word2vec, comparative surveys, language use, semantic similarities, Language Technology (Computational Linguistics)
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-322157
DOI: 10.1177/08944393211049774
ISI: 000787865700001
Scopus ID: 2-s2.0-85130070813
OAI: oai:DiVA.org:kth-322157
DiVA, id: diva2:1715731
Note

QC 20221202

Available from: 2022-12-02. Created: 2022-12-02. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Quantifying Meaning
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Distributional semantic models are a class of machine learning models that aim to construct, in a data-driven fashion, representations that capture the semantics, i.e. the meaning, of meaning-bearing objects. This thesis is particularly concerned with the construction of semantic representations of words, an endeavour that has a long history in computational linguistics and that has seen dramatic developments in recent years.

The primary research objective of this thesis is to explore the limits and applications of distributional semantic models of words, i.e. word embeddings. In particular, it explores the relation between model and embedding semantics, i.e. how model design influences what our embeddings encode, how to reason about embeddings, and how properties of the model can be exploited to extract novel information from embeddings. Concretely, we introduce topologically aware neighborhood queries that enrich the information gained from neighborhood queries on distributional semantic models, conditioned similarity queries (and models enabling them), concept extraction from distributional semantic models, applications of embedding models in the realm of political science, as well as a thorough evaluation of a broad range of distributional semantic models. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. 45
Series
TRITA-EECS-AVL ; 2023:2
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-322262
ISBN: 978-91-8040-444-0
Public defence
2023-01-17, Zoom: https://kth-se.zoom.us/j/66943302856, F3, Lindstedtsvägen 26, Stockholm, 09:00 (English)
Note

QC 20221207

Available from: 2022-12-08. Created: 2022-12-07. Last updated: 2025-02-07. Bibliographically approved.
Authority records

Gyllensten, Amaru Cuba