The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Wennberg, Ulme (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH)
Henter, Gustav Eje (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH). ORCID iD: 0000-0002-1643-1054
2021 (English). In: ACL-IJCNLP 2021: The 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol. 2, Association for Computational Linguistics (ACL), 2021, p. 130-140. Conference paper, published paper (refereed).
Abstract [en]

Mechanisms for encoding positional information are central for transformer-based language models. In this paper, we analyze the position embeddings of existing language models, finding strong evidence of translation invariance, both for the embeddings themselves and for their effect on self-attention. The degree of translation invariance increases during training and correlates positively with model performance. Our findings lead us to propose translation-invariant self-attention (TISA), which accounts for the relative position between tokens in an interpretable fashion without needing conventional position embeddings. Our proposal has several theoretical advantages over existing position-representation approaches. Experiments show that it improves on regular ALBERT on GLUE tasks, while only adding orders of magnitude fewer positional parameters.
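To make the central idea of the abstract concrete, the sketch below shows one way self-attention can be made translation invariant: the positional contribution to the attention logits depends only on the relative offset j - i between tokens, so shifting the whole input sequence leaves the attention pattern unchanged. This is an illustrative Python/PyTorch sketch under assumptions, not the authors' TISA formulation; the class name TranslationInvariantSelfAttention, the clipping window max_rel_dist, and the per-offset bias table are all hypothetical choices made here for brevity, whereas the paper describes its own interpretable positional kernels.

import torch
import torch.nn as nn
import torch.nn.functional as F

class TranslationInvariantSelfAttention(nn.Module):
    """Self-attention whose positional term depends only on relative offsets.

    Illustrative sketch only: the per-offset bias table is an assumed
    parameterization, not the positional kernels proposed in the paper.
    """

    def __init__(self, d_model: int, n_heads: int, max_rel_dist: int = 16):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.max_rel_dist = max_rel_dist
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.out = nn.Linear(d_model, d_model)
        # One learnable bias per head and per clipped relative offset j - i.
        self.rel_bias = nn.Parameter(torch.zeros(n_heads, 2 * max_rel_dist + 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model); no absolute position embeddings
        # are added to x anywhere.
        b, n, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        k = k.view(b, n, self.n_heads, self.d_head).transpose(1, 2)
        v = v.view(b, n, self.n_heads, self.d_head).transpose(1, 2)

        # Content term of the attention logits.
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5  # (b, h, n, n)

        # Positional term: a bias indexed by the clipped offset j - i.
        # Because it depends only on j - i, translating the whole sequence
        # leaves the attention pattern unchanged.
        pos = torch.arange(n, device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_rel_dist,
                                                  self.max_rel_dist)
        scores = scores + self.rel_bias[:, rel + self.max_rel_dist]

        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, n, -1)
        return self.out(out)

# Quick shape check.
layer = TranslationInvariantSelfAttention(d_model=64, n_heads=4)
print(layer(torch.randn(2, 10, 64)).shape)  # torch.Size([2, 10, 64])

Note that, in this sketch, the entire positional machinery is the (n_heads x (2 * max_rel_dist + 1)) bias table, which is far smaller than a full table of learned absolute position embeddings; this is in the spirit of the abstract's claim about adding orders of magnitude fewer positional parameters, though the exact parameter counts in the paper refer to its own parameterization.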

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2021, p. 130-140.
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-303516
ISI: 000694699200018
Scopus ID: 2-s2.0-85119103317
OAI: oai:DiVA.org:kth-303516
DiVA id: diva2:1603506
Conference
Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics (ACL) and the 11th International Joint Conference on Natural Language Processing (IJCNLP), together with the 6th Workshop on Representation Learning for NLP (RepL4NLP), August 1-6, 2021, held online.
Note

Part of proceedings: ISBN 978-1-954085-53-4, QC 20230117

Available from: 2021-10-15. Created: 2021-10-15. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Wennberg, Ulme; Henter, Gustav Eje
