Integrating Syntactic Structure in Transformer Language Models: A Study of Part-of-Speech Augmentation
KTH, School of Engineering Sciences (SCI).
2025 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

In modern natural language processing tasks, the transformer architecture has set significant benchmarks across common language modeling metrics. Transformer-based language models typically process linguistic information implicitly through self-attention mechanisms, offering little insight into linguistic priors such as syntactic structure. This thesis investigates enhancing transformer performance via explicit incorporation of Part-of-Speech (POS) tags, a form of syntactic knowledge. Specifically, we examine different methods of fusing tokens with POS tags in the embedding layer, providing a computationally inexpensive injection method. We augment a transformer decoder with dense POS embeddings and compare it against a baseline model on the WikiText-2 dataset in terms of perplexity and next-token prediction. Our experiments indicate that integrating POS information through concatenation and projection leads to a statistically significant improvement, decreasing perplexity by approximately 15.1% (from 42.70 to 36.26) and increasing next-token prediction accuracy by 0.69 percentage points (from 55.18% to 55.86%) on the WikiText-2 test set. The concatenation-and-projection method is particularly effective, outperforming simple element-wise addition. We discuss the findings, linking the performance gains to the role of POS tags as an inductive bias within an information-theoretic framework. Furthermore, we provide methodological details for aligning tags with subword tokens and acknowledge limitations concerning model scale and tagger accuracy. This research provides evidence and implementation insights for incorporating explicit syntactic knowledge into decoder-only transformer architectures through lightweight embedding fusion.
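
The embedding-level fusion described in the abstract can be illustrated with a minimal PyTorch sketch. The class and parameter names (POSFusedEmbedding, n_pos_tags, d_model, d_pos) and the chosen dimensions are illustrative assumptions, not the thesis's actual configuration; the sketch only shows the general concatenation-and-projection idea and the element-wise-addition alternative it is compared against.

import torch
import torch.nn as nn

class POSFusedEmbedding(nn.Module):
    """Embeds token ids and POS-tag ids, concatenates them, and projects
    the result back to the model dimension so the rest of the decoder
    stack can remain unchanged."""

    def __init__(self, vocab_size: int, n_pos_tags: int,
                 d_model: int = 512, d_pos: int = 64):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, d_model)
        self.pos_emb = nn.Embedding(n_pos_tags, d_pos)   # dense POS embeddings
        self.proj = nn.Linear(d_model + d_pos, d_model)  # fuse back to d_model

    def forward(self, token_ids: torch.Tensor, pos_ids: torch.Tensor) -> torch.Tensor:
        # token_ids, pos_ids: (batch, seq_len). pos_ids must be aligned to the
        # subword tokens, e.g. by letting each subword inherit its word's POS tag.
        fused = torch.cat([self.tok_emb(token_ids), self.pos_emb(pos_ids)], dim=-1)
        return self.proj(fused)  # (batch, seq_len, d_model)

# The element-wise-addition variant simply requires the POS embedding to share
# the model dimension, e.g.:
#   added = tok_emb(token_ids) + nn.Embedding(n_pos_tags, d_model)(pos_ids)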

Place, publisher, year, edition, pages
2025.
Series
TRITA-SCI-GRU ; 2025:178
Keywords [en]
transformer, language modeling, part-of-speech tagging, syntactic structure, linguistic inductive bias, embedding fusion
National Category
Mathematical sciences
Identifiers
URN: urn:nbn:se:kth:diva-365864
OAI: oai:DiVA.org:kth-365864
DiVA, id: diva2:1979811
External cooperation
Stanford University
Subject / course
Mathematics
Educational program
Master of Science in Engineering - Engineering Physics
Supervisors
Available from: 2025-07-01. Created: 2025-07-01. Last updated: 2025-07-01. Bibliographically approved.

Open Access in DiVA

fulltext (2744 kB), 245 downloads
File information
File name: FULLTEXT01.pdf
File size: 2744 kB
Checksum (SHA-512): f6567370fddd31542a3bfec5be1e74979510c093a63a82e72988edaed11439c0b5b4e3249608add99a01536bdd29d04076a85be12f3924c715b0c5851dc742bd
Type: fulltext
Mimetype: application/pdf
