Using Uplug and SiteSeeker to construct a cross language search engine for Scandinavian
Dept of Computer and System Sciences, Stockholm Univ, Sweden.
Dept of Computer and System Sciences, Stockholm Univ, Sweden.
KTH, School of Computer Science and Communication (CSC), Numerical Analysis and Computer Science, NADA. ORCID iD: 0000-0003-3199-8953
2009 (English). Conference paper (Refereed)
Abstract [en]

This paper presents how we adapted a website search engine for cross language information retrieval, using the Uplug word alignment tool for parallel corpora. We first studied the monolingual search queries posed by visitors to the website of the Nordic Council, which contains five different languages. To compare how well different types of bilingual dictionaries covered the most common queries and terms on the website, we tried a collection of ordinary bilingual dictionaries, a small manually constructed trilingual dictionary, and an automatically constructed trilingual dictionary, built with Uplug from the news corpus on the website. The precision and recall of the automatically constructed Swedish-English dictionary were 71 and 93 percent, respectively. We found that precision and recall increase significantly in samples with high word frequency, but we could not confirm that POS tags improve precision. The collection of ordinary dictionaries, consisting of about 200 000 words, covers only 41 of the top 100 search queries on the website. The automatically built trilingual dictionary combined with the small manually built trilingual dictionary, together about 2 300 words, covers 36 of the top 100 search queries.
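The abstract reports precision and recall for the Uplug-built dictionary and coverage counts over the top 100 queries. The following is a minimal Python sketch of how such figures can be computed, assuming that both the extracted dictionary and a gold standard are sets of (source, target) word pairs, and that a query counts as covered when its lowercased string appears as a source entry. The function names `precision_recall` and `query_coverage` and the toy Swedish-English data are hypothetical; this is not the authors' evaluation procedure.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation of a dictionary entry: (source word, target word).
Pair = Tuple[str, str]


def precision_recall(extracted: Set[Pair], gold: Set[Pair]) -> Tuple[float, float]:
    """Compare extracted translation pairs with a gold-standard set.

    Precision: share of extracted pairs that are also in the gold set.
    Recall: share of gold pairs that were extracted.
    """
    correct = len(extracted & gold)
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def query_coverage(top_queries: Iterable[str], dictionary: Set[Pair]) -> int:
    """Count queries whose lowercased form is a source word in the dictionary."""
    sources = {src for src, _ in dictionary}
    return sum(1 for q in top_queries if q.lower() in sources)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    extracted = {("hus", "house"), ("bil", "car"), ("bil", "bill")}
    gold = {("hus", "house"), ("bil", "car"), ("båt", "boat"), ("skog", "forest")}
    p, r = precision_recall(extracted, gold)
    covered = query_coverage(["Hus", "bil", "Nordiska rådet"], extracted)
    # prints: precision=0.67 recall=0.50 coverage=2
    print(f"precision={p:.2f} recall={r:.2f} coverage={covered}")
```

Exact string lookup is a simplification: a multiword query such as "Nordiska rådet" would need phrase entries or per-token lookup to count as covered.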

Keyword [en]
Cross language information retrieval, parallel corpora, word alignment, Swedish, Danish, Norwegian
National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:kth:diva-62902
OAI: diva2:481322
Workshop: The Automatic Treatment of Multilinguality in Retrieval, Search and Lexicography, Copenhagen, Denmark
QC 20120125
Available from: 2012-01-20. Created: 2012-01-20. Last updated: 2012-01-25. Bibliographically approved.

By author/editor
Dalianis, Hercules; Kann, Viggo
By organisation
Numerical Analysis and Computer Science, NADA
Language Technology (Computational Linguistics)
