Exploring Internal Numeracy in Language Models: A Case Study on ALBERT
Wennberg, Ulme. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-7919-0166
Henter, Gustav Eje. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-1643-1054
2024 (English). In: MathNLP 2024: 2nd Workshop on Mathematical Natural Language Processing at LREC-COLING 2024 - Workshop Proceedings, European Language Resources Association (ELRA), 2024, p. 35-40. Conference paper, Published paper (Refereed).
Abstract [en]

It has been found that Transformer-based language models have the ability to perform basic quantitative reasoning. In this paper, we propose a method for studying how these models internally represent numerical data, and use our proposal to analyze the ALBERT family of language models. Specifically, we extract the learned embeddings these models use to represent tokens that correspond to numbers and ordinals, and subject these embeddings to Principal Component Analysis (PCA). PCA results reveal that ALBERT models of different sizes, trained and initialized separately, consistently learn to use the axes of greatest variation to represent the approximate ordering of various numerical concepts. Numerals and their textual counterparts are represented in separate clusters, but increase along the same direction in 2D space. Our findings illustrate that language models, trained purely to model text, can intuit basic mathematical concepts, opening avenues for NLP applications that intersect with quantitative reasoning.
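The analysis described in the abstract can be reproduced in outline with standard tooling. Below is a minimal, illustrative sketch (not the authors' exact code): it loads a pretrained ALBERT checkpoint via Hugging Face Transformers, extracts the learned input-embedding vectors for a hand-picked set of numerals, number words, and ordinals, and projects them to two dimensions with PCA. The checkpoint name, the token list, and the single-token filtering step are assumptions made for this example.

```python
# Illustrative sketch (assumed setup, not the paper's exact pipeline):
# extract ALBERT input embeddings for number-related tokens and run PCA.
import torch
from transformers import AlbertTokenizer, AlbertModel
from sklearn.decomposition import PCA

model_name = "albert-base-v2"  # assumed checkpoint; the paper studies several ALBERT sizes
tokenizer = AlbertTokenizer.from_pretrained(model_name)
model = AlbertModel.from_pretrained(model_name)

# Example numerals, number words, and ordinals (illustrative token list)
words = ["1", "2", "3", "4", "5",
         "one", "two", "three", "four", "five",
         "first", "second", "third", "fourth", "fifth"]

# Learned token-embedding matrix (ALBERT uses a factorized, low-dimensional embedding)
embedding_matrix = model.get_input_embeddings().weight.detach()

vectors, kept = [], []
for w in words:
    ids = tokenizer(w, add_special_tokens=False)["input_ids"]
    if len(ids) == 1:  # keep only words that map to a single vocabulary token
        vectors.append(embedding_matrix[ids[0]])
        kept.append(w)

X = torch.stack(vectors).numpy()
coords = PCA(n_components=2).fit_transform(X)  # axes of greatest variation

for w, (x, y) in zip(kept, coords):
    print(f"{w:>8s}  PC1={x:+.3f}  PC2={y:+.3f}")
```

Plotting the two principal components for these tokens is one way to inspect whether numerals and their textual counterparts form separate clusters that nonetheless increase along a shared direction, as the abstract reports.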

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024, p. 35-40
Keywords [en]
Language models, Numerals in NLP, Numerical data representation, PCA, Transformer-based models, Word embeddings
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-347702
Scopus ID: 2-s2.0-85195172546
OAI: oai:DiVA.org:kth-347702
DiVA, id: diva2:1869419
Conference
2nd Workshop on Mathematical Natural Language Processing, MathNLP 2024, Torino, Italy, May 21 2024
Note

QC 20240613

Part of ISBN 978-249381422-7

Available from: 2024-06-13. Created: 2024-06-13. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Wennberg, Ulme; Henter, Gustav Eje
