Annotated Search – Indexing, searching and ranking within annotated Wikipedia information boxes.
KTH, School of Computer Science and Communication (CSC).
2011 (English)
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]

Natural Language Processing has mainly focused on understanding texts better, but little work has addressed finding good search results for a query over annotated data. This is the problem I have focused on.

This thesis discusses how to index annotated data, in which cases a search engine over annotated data offers better search results than a regular full-text search engine, how the ranking function differs between annotated-data search and unstructured-data search, and how to evaluate an annotated search engine.

I created a search engine over the semantically annotated Wikipedia information boxes and a baseline full-text search system over the same data. The thesis shows that, with some simple work, an annotated search engine can improve performance by between 17 and 27 percent compared to the baseline, even on a data collection as diverse as the Wikipedia information boxes.

Abstract [sv]

Language technology has long focused on text understanding in order to build intelligent language technology systems. Unfortunately, far less time has been devoted to finding good search results over annotated data.

This report addresses search within annotated data and discusses how annotated data should be indexed, in which cases a search engine over annotated data gives better results than a regular full-text search engine, how the ranking function differs between annotated search and full-text search, and how a search engine over annotated data can be evaluated.

I have created a search engine over Wikipedia's annotated information boxes and a baseline system that imitates a full-text search engine over the same data. The report shows that the search results can be improved by between 17 and 27 percent compared to the baseline system. This improvement could be demonstrated despite the free-form wording used in Wikipedia's information boxes.

Place, publisher, year, edition, pages
Trita-CSC-E, ISSN 1653-5715 ; 2011:036
National Category
Computer Science
URN: urn:nbn:se:kth:diva-130772
OAI: diva2:654219
Educational program
Master of Science in Engineering - Computer Science and Technology
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text
