Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian
Dept of Computer and System Sciences, Stockholm Univ, Sweden.
Dept of Computer and System Sciences, Stockholm Univ, Sweden.
KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA. ORCID iD: 0000-0003-3199-8953
2009 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents how we adapted a website search engine for cross-language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains five different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary, built from the news corpus on the website using Uplug. The precision and recall of the automatically constructed Swedish-English dictionary were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries on the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary, consisting of about 2 300 words, covers 36 of the top search queries.
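The abstract reports precision and recall for the Uplug-derived Swedish-English dictionary and counts how many of the most frequent search queries each dictionary covers. The following is a minimal sketch of how such figures could be computed, assuming standard set-based precision and recall against a hypothetical gold-standard list of (source, target) translation pairs, and assuming a query counts as covered when the whole lower-cased query string is a dictionary key. The function names `precision_recall` and `query_coverage` and all toy data are illustrative assumptions; they are not the authors' evaluation procedure and not part of Uplug or SiteSeeker.

```python
from collections import Counter


def precision_recall(proposed, gold):
    """Set-based precision and recall of proposed (source, target) pairs
    against a gold-standard set of pairs (an assumed evaluation setup)."""
    proposed, gold = set(proposed), set(gold)
    correct = proposed & gold  # pairs that are both proposed and in the gold standard
    precision = len(correct) / len(proposed) if proposed else 0.0
    recall = len(correct) / len(gold) if gold else 0.0
    return precision, recall


def query_coverage(query_log, dictionary, top_n=100):
    """Return (covered, considered): how many of the top_n most frequent
    queries appear as keys in `dictionary` after stripping and lower-casing."""
    counts = Counter(q.strip().lower() for q in query_log)
    top = [q for q, _ in counts.most_common(top_n)]
    covered = sum(1 for q in top if q in dictionary)
    return covered, len(top)


# Hypothetical toy data, not taken from the paper.
proposed = [("hus", "house"), ("bil", "car"), ("katt", "dog")]
gold = [("hus", "house"), ("bil", "car"), ("katt", "cat"), ("hund", "dog")]
p, r = precision_recall(proposed, gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50

query_log = ["Norden", "norden", "miljø", "kultur", "Norden "]
dictionary = {"norden": ["Nordic region"], "kultur": ["culture"]}
print(query_coverage(query_log, dictionary))  # (2, 3)
```

Matching whole query strings is a deliberate simplification in this sketch; counting a multi-word query as covered only when every token has an entry would be a stricter alternative.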

Place, publisher, year, edition, pages
2009.
Keyword [en]
Cross language information retrieval, parallel corpora, word alignment, Swedish, Danish, Norwegian
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-62902
OAI: oai:DiVA.org:kth-62902
DiVA: diva2:481322
Conference
Workshop: The Automatic Treatment of Multilinguality in Retrieval, Search and Lexicography, Copenhagen, Denmark
Note
QC 20120125
Available from: 2012-01-20
Created: 2012-01-20
Last updated: 2012-01-25
Bibliographically approved

By author/editor
Dalianis, Hercules; Kann, Viggo