Named Entity Recognition with Support Vector Machines
KTH, School of Computer Science and Communication (CSC).
2013 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

This report describes a degree project in Computer Science, the aim of which was to construct a system for Named Entity Recognition in Swedish texts, covering names of people, locations and organizations as well as expressions of time. The system was built from the part-of-speech tagger Granska and the Support Vector Machine system SVMlin. The completed system was trained to recognize Named Entities by analyzing patterns in training corpora consisting of lists of example words belonging to each category. It was initially trained to recognize patterns based on individual characters in words, but was later rewritten to recognize other characteristics of individual words, such as the types of characters the words contained. In evaluation, no version of the system performed satisfactorily at recognizing Named Entities of the aforementioned categories. A possible reason is that three of the categories, namely names of people, names of locations and names of organizations, have few or no distinguishing features between them, which might warrant more research. The system performed well on the related problem of distinguishing email addresses from other named entities, indicating that it might be of use in some cases of Named Entity Recognition.
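As an illustration of what "the types of characters the words contained" could mean as classifier input, here is a minimal sketch. The feature set and the names `char_type_features` and `to_vector` are assumptions made for this example. They are not taken from the thesis, and the sketch does not call Granska or SVMlin.

```python
import string


def char_type_features(word: str) -> dict:
    """Map a word to simple character-type features.

    Hypothetical feature set for illustration; the thesis may use different features.
    """
    return {
        "length": len(word),
        "starts_upper": word[:1].isupper(),
        "n_upper": sum(c.isupper() for c in word),
        "n_lower": sum(c.islower() for c in word),
        "n_digit": sum(c.isdigit() for c in word),
        "n_punct": sum(c in string.punctuation for c in word),
        "has_at": "@" in word,
    }


def to_vector(features: dict) -> list:
    """Flatten the feature dict into a numeric vector with a fixed (sorted) key order."""
    return [float(features[key]) for key in sorted(features)]


if __name__ == "__main__":
    for word in ["Stockholm", "info@kth.se"]:
        print(word, to_vector(char_type_features(word)))
```

Under these assumed features, an email address differs from a proper name in several dimensions (the `@` sign, punctuation count, no initial capital), while a person name and a place name can map to nearly identical vectors. That is consistent with the abstract's observation, though it does not reproduce the thesis's experiments.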

Abstract [sv]

This report describes a degree project in computer science, the aim of which was to construct a system for recognizing, in Swedish text, Named Entities for names of people, names of places and names of organizations, as well as expressions of time. The system was constructed from the part-of-speech tagger Granska and the support vector machine system SVMlin. The finished system was trained to recognize Named Entities by analyzing patterns in training corpora consisting of lists of example words belonging to each category. The system was first trained to recognize patterns based on individual characters in words, but was then rewritten to recognize other characteristics of individual words, such as which kinds of characters they contain.

When the system was evaluated, it emerged that no version of it worked satisfactorily when tested on recognizing Named Entities of the above categories. A possible cause may be that three of the categories (names of people, names of places and names of organizations) have few or no inherent differences between them, which may be grounds for further research. The system proved capable when tested on the related problem of distinguishing email addresses from other Named Entities, which may indicate that the system can be used for certain types of Named Entity recognition.

Place, publisher, year, edition, pages
2013.
Series
TRITA-CSC-E, ISSN 1653-5715 ; 13:112
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-138012
OAI: oai:DiVA.org:kth-138012
DiVA: diva2:680232
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisors
Examiners
Available from: 2013-12-17 Created: 2013-12-17 Last updated: 2015-08-05 Bibliographically approved

Open Access in DiVA

Named Entity Recognition with Support Vector Machines (619 kB), 67 downloads
File information
File name: FULLTEXT01.pdf
File size: 619 kB
Checksum (SHA-512): 397ad78b7355188e428b4d7fe37e299852b59f4b13b6e871cf7dfb8e642e0598438ffb369e372555c72eed2074b87d6afc61156b585924d24678a39d0c5da842
Type: fulltext
Mimetype: application/pdf

Other links

http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2013/rapporter13/mickelin_joel_13011.pdf
