Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike
KTH, School of Information and Communication Technology (ICT), Computer and Systems Sciences, DSV. Stockholm University, Sweden.
2009 (English). In: ACL-IJCNLP 2009 - Joint Conf. of the 47th Annual Meeting of the Association for Computational Linguistics and 4th Int. Joint Conf. on Natural Language Processing of the AFNLP, Proceedings of the Conf., 2009, 145-153 p. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a method for automatically training lemmatization rules that handle prefix, infix and suffix changes in order to generate the lemma from the full form of a word. We explain how the lemmatization rules are created and how the lemmatizer works. We trained the lemmatizer on full form-lemma pairs for Danish, Dutch, English, German, Greek, Icelandic, Norwegian, Polish, Slovene and Swedish. Compared with a suffix-only lemmatizer, we obtained significant improvements of 24 percent for Polish, 2.3 percent for Dutch, 1.5 percent for English, 1.2 percent for German and 1.0 percent for Swedish, while results for Icelandic deteriorated by 1.9 percent. We also report an observation on the number of lemmatization rules produced as a function of the number of training pairs.
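
The abstract does not spell out the rule format, so the following is only a minimal sketch of what applying affix-changing rules could look like, not the method described in the paper. It assumes a hypothetical wildcard notation in which "*" stands for a stretch of the word that is copied unchanged into the lemma; the rules in EXAMPLE_RULES are hand-written for illustration rather than trained, and the names compile_rule, apply_rule and lemmatize are made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical rule format (an assumption, not taken from the paper):
# (pattern, replacement), where each "*" is a non-empty run of characters
# that is carried over unchanged from the full form to the lemma.
# Pattern and replacement are assumed to contain the same number of "*".
Rule = Tuple[str, str]

# Hand-written illustrative rules, ordered from most to least specific.
EXAMPLE_RULES: List[Rule] = [
    ("ge*t", "*en"),   # prefix and suffix change: gekauft -> kaufen
    ("*ä*e", "*a*"),   # infix and suffix change:  bälle   -> ball
    ("*ies", "*y"),    # suffix change:            ponies  -> pony
    ("*s", "*"),       # suffix change:            cats    -> cat
]


def compile_rule(pattern: str) -> "re.Pattern[str]":
    """Turn a wildcard pattern into an anchored regular expression."""
    literal_parts = [re.escape(part) for part in pattern.split("*")]
    return re.compile("^" + "(.+)".join(literal_parts) + "$")


def apply_rule(rule: Rule, word: str) -> Optional[str]:
    """Return the lemma produced by the rule, or None if it does not match."""
    pattern, replacement = rule
    match = compile_rule(pattern).match(word)
    if match is None:
        return None
    captured = iter(match.groups())
    # Fill each "*" in the replacement with the next captured stretch.
    return "".join(next(captured) if ch == "*" else ch for ch in replacement)


def lemmatize(word: str, rules: List[Rule]) -> str:
    """Apply the first matching rule; fall back to the word itself."""
    for rule in rules:
        lemma = apply_rule(rule, word)
        if lemma is not None:
            return lemma
    return word


if __name__ == "__main__":
    for form in ["gekauft", "bälle", "ponies", "cats", "sheep"]:
        print(form, "->", lemmatize(form, EXAMPLE_RULES))
```

In this sketch a suffix-only lemmatizer corresponds to restricting the rule set to patterns of the form "*suffix", which makes it easy to see why forms such as "gekauft" need rules that also touch the prefix.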

Place, publisher, year, edition, pages
2009. 145-153 p.
Keyword [en]
Automatic training, Icelandic, Lemmatization, Morphological changes, Natural language processing systems, Computational linguistics
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-152006
Scopus ID: 2-s2.0-84859911010
ISBN: 978-161738258-1 (print)
OAI: oai:DiVA.org:kth-152006
DiVA: diva2:749147
Conference
Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and 4th International Joint Conference on Natural Language Processing of the AFNLP, ACL-IJCNLP 2009, 2 August 2009 through 7 August 2009, Suntec, Singapore
Note

QC 20140923

Available from: 2014-09-23 Created: 2014-09-23 Last updated: 2014-09-23 Bibliographically approved

By author/editor
Dalianis, Hercules