Evaluation of keyword-based text categorization algorithms (original title: Utvärdering av nyckelordsbaserad textkategoriseringsalgoritmer)
KTH, School of Electrical Engineering and Computer Science (EECS), Software and Computer Systems, SCS.
2016 (Swedish). Independent thesis, advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Supervised learning algorithms have been used for automatic text categorization with very good results. However, supervised learning requires a large amount of manually labeled training data, which is a serious limitation for many practical applications. Keyword-based text categorization does not require manually labeled training data and has therefore been presented as an attractive alternative to supervised learning. The aim of this study is to explore whether there are other limitations to using keyword-based text categorization in industrial applications. This study also tests whether a new lexical resource, based on the paradigmatic relations between words, could be used to improve existing keyword-based text categorization algorithms. An industry-motivated use case was created to measure practical applicability. The results showed that none of the five examined algorithms was able to meet the requirements of the industry-motivated use case. However, it was possible to modify one algorithm, proposed by Liebeskind et al. (2015), to meet the requirements. The new lexical resource produced relevant keywords for text categorization, but there was still a large variance in the algorithm’s capacity to correctly categorize different text categories. The categorization capacity was also generally too low to meet the requirements of many practical applications. Further studies are needed to explore how the algorithm’s categorization capacity could be improved.

Place, publisher, year, edition, pages
2016.
National Category
Identifiers
URN: urn:nbn:se:kth:diva-222164
OAI: oai:DiVA.org:kth-222164
DiVA, id: diva2:1179709
External cooperation
Gavagai
Subject / course
Computer Science
Educational program
Master of Science in Engineering - Computer Science and Technology
Supervisor
Examiner
Available from: 2018-02-06. Created: 2018-02-02. Last updated: 2018-02-06. Bibliographically approved.

Open Access in DiVA

fulltext (420 kB), 39 downloads
File information
File name: FULLTEXT01.pdf. File size: 420 kB. Checksum (SHA-512):
a73c06a131cb123142465b6463fdbbb9428707d369ac0b21afdc535bd789c06dbf1b0ca6e05d26e42e4acba93a4271a3413cad5fb028fb2017759f7e84349d59
Type: fulltext. Mimetype: application/pdf

Total: 150 hits