Word Classes in Language Modelling
2024 (English) Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
This thesis concerns word classes and their application to language modelling. Using a purely statistical Markov model trained on sequences of word classes in Swedish, several problems in language engineering are examined: part-of-speech tagging; evaluating text modifiers, such as translators, with the help of probability measurements and matrix norms; and detecting different types of text using the Fourier transform of cross-entropy sequences of word classes.

The results show that the word class language model is quite weak by itself, but that it is able to improve part-of-speech tagging for 1- and 2-letter models. There are indications that a stronger word class model could aid 3-letter and potentially even stronger models. For evaluating modifiers, the model is often able to distinguish shuffled, and sometimes translated, text from unmodified text, and to assign a score for how much a text has been modified. Future work on this should, however, take better care to ensure sufficiently large test data. The results from the Fourier approach indicate that a Fourier analysis of the cross-entropy sequence of word classes may allow the model to distinguish both A.I.-generated text and translated text from human-written text. Future work on machine-learning word class models could give further insight into the role of word class models in modern applications. The results could also offer interesting insights for linguistic research on word classes.
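The three techniques named in the abstract can be illustrated with a minimal sketch. It assumes a first-order (bigram) Markov model over word classes with add-one smoothing, a small hypothetical tag set, the Frobenius norm as the matrix norm, and a naive stdlib DFT. All function names and the toy data are illustrative assumptions, not the thesis's own implementation or data.

```python
import cmath
import math
from collections import Counter


def transition_matrix(seqs, classes, alpha=1.0):
    """Estimate first-order transition probabilities P(next | prev) from
    word class sequences, with additive smoothing alpha (assumed choice)."""
    counts = Counter()
    for seq in seqs:
        for prev, nxt in zip(seq, seq[1:]):
            counts[(prev, nxt)] += 1
    matrix = {}
    for prev in classes:
        row_total = sum(counts[(prev, c)] for c in classes) + alpha * len(classes)
        matrix[prev] = {c: (counts[(prev, c)] + alpha) / row_total for c in classes}
    return matrix


def cross_entropy_sequence(seq, matrix):
    """Per-transition surprisal -log2 P(c_i | c_{i-1}) in bits; its mean
    over a text is the cross entropy of that text under the model."""
    return [-math.log2(matrix[prev][nxt]) for prev, nxt in zip(seq, seq[1:])]


def frobenius_distance(a, b, classes):
    """Frobenius norm of the difference of two transition matrices
    (one possible matrix norm; the thesis may use others)."""
    return math.sqrt(sum((a[p][c] - b[p][c]) ** 2 for p in classes for c in classes))


def dft_magnitudes(xs):
    """|X_k| for the naive O(N^2) DFT X_k = sum_n x_n exp(-2*pi*i*k*n/N)."""
    n = len(xs)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(xs)))
            for k in range(n)]


# Hypothetical toy tag set and training data (not Swedish corpus data).
classes = ["DT", "JJ", "NN", "VB", "PP"]
train = [["DT", "JJ", "NN", "VB", "PP", "DT", "NN"],
         ["DT", "NN", "VB", "DT", "JJ", "NN"]]
model = transition_matrix(train, classes)

text = ["DT", "NN", "VB", "PP", "DT", "JJ", "NN"]
shuffled = ["NN", "DT", "PP", "VB", "JJ", "NN", "DT"]  # a "modified" text

h_text = cross_entropy_sequence(text, model)
h_shuf = cross_entropy_sequence(shuffled, model)
print("mean cross entropy (text):", sum(h_text) / len(h_text))
print("mean cross entropy (shuffled):", sum(h_shuf) / len(h_shuf))
print("matrix distance:", frobenius_distance(model, transition_matrix([shuffled], classes), classes))
print("DFT magnitudes:", dft_magnitudes(h_text))
```

In this toy case the shuffled sequence receives a higher mean cross entropy than the ordered one, which is the kind of signal a modifier score can build on. For longer sequences, `numpy.fft.fft` would replace the naive DFT.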
Place, publisher, year, edition, pages
2024.
Series
TRITA-SCI-GRU ; 2024:152
Keywords [en]
Word class, Language Model, POS-tagging, n-gram, Markov Model, Transition Matrix, Matrix norm, Cross Entropy, Discrete Fourier Transform
National Category
Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-349074
OAI: oai:DiVA.org:kth-349074
DiVA, id: diva2:1879630
Educational program
Master of Science in Engineering - Engineering Physics
2024-06-28; 2024-06-28; 2024-06-28. Bibliographically approved