Distributionell karaktär hos vissa kategorier av ord (Distributional character of certain categories of words)
KTH, School of Engineering Sciences (SCI).
KTH, School of Engineering Sciences (SCI).
2014 (Swedish). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

The amount of information stored on the internet grows daily, and the requirements on the systems used to search for and analyse that information increase accordingly. As part of meeting these requirements, this study investigates whether an automated text analysis system can distinguish certain groups and categories of words in a text, and more specifically whether it can distinguish words with high information value from words with low information value. This matters for optimizing systems for global surveillance and information retrieval. The study is carried out using word spaces, which are often used in text analysis to model language. The distributional character of certain categories of words is examined by studying the intrinsic dimensionality of the space locally around different words. The results of that study suggest that the distributional character differs between categories of words; based on them, an algorithm is implemented that classifies words from the dimensionality data. The classification algorithm is tested on different categories. The results strengthen the hypothesis that there may be useful differences between the distributional character of different categories of words.

Abstract [sv] (translated to English)

As more and more information becomes available on the internet, the demands placed on systems used to search for and analyse information grow. This report examines whether a system for automated text analysis can determine which words are relevant and information-bearing in a given context. This is important for enabling optimization and streamlining of, for example, information retrieval and external monitoring (omvärldsbevakning) systems. The investigation uses word space models to model language. The distributional character of the terms is examined by studying the intrinsic dimensionality locally in the space around different terms. Based on the results of this investigation, which appear to show differences in the distributional character of different categories of words, an algorithm is implemented to classify words by dimensionality. The classification algorithm is tested on different categories. The result strengthens the thesis that there may be certain useful differences between the distributional character of different categories of words.
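
The abstract does not say which estimator of intrinsic dimensionality the thesis uses, so the following is only an illustrative sketch of what "intrinsic dimensionality locally around a word" can mean in practice. It uses the Levina-Bickel maximum-likelihood estimator, which is one common choice and not necessarily the authors' method. The function names, the neighbourhood size `k=20`, and the synthetic points standing in for word vectors are all assumptions made for this example.

```python
import numpy as np


def local_intrinsic_dimension(points, query_index, k=20):
    """Levina-Bickel MLE of the intrinsic dimension around one point.

    points:      (n, d) array; each row stands in for one word vector.
    query_index: row whose neighbourhood is examined.
    k:           number of nearest neighbours used (an assumed default).

    Assumes the k nearest neighbours lie at distinct, non-zero distances.
    """
    diffs = points - points[query_index]
    dists = np.sqrt(np.einsum("ij,ij->i", diffs, diffs))
    dists = np.delete(dists, query_index)      # drop the query point itself
    nearest = np.sort(dists)[:k]               # T_1 <= ... <= T_k
    # Estimate: inverse of the mean of log(T_k / T_j) for j = 1 .. k-1.
    log_ratios = np.log(nearest[-1] / nearest[:-1])
    return (k - 1) / np.sum(log_ratios)


def mean_local_dimension(points, n_queries=200, k=20):
    """Average the local estimate over the first n_queries points."""
    estimates = [local_intrinsic_dimension(points, i, k) for i in range(n_queries)]
    return float(np.mean(estimates))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: a 2-D sheet and a 5-D block, both embedded in 10 dimensions.
    sheet = np.zeros((2000, 10))
    sheet[:, :2] = rng.uniform(size=(2000, 2))
    block = np.zeros((2000, 10))
    block[:, :5] = rng.uniform(size=(2000, 5))
    print("2-D sheet:", mean_local_dimension(sheet))
    print("5-D block:", mean_local_dimension(block))
```

Under these assumptions, per-word estimates of this kind could serve as the "dimensionality data" that a classifier takes as input, with the two toy sets playing the role of word categories whose neighbourhoods have different local structure.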

Place, publisher, year, edition, pages
2014. 66 p.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-153782
OAI: oai:DiVA.org:kth-153782
DiVA: diva2:753691
Educational program
Master of Science in Engineering - Engineering Physics
Supervisors
Available from: 2014-10-08
Created: 2014-10-08
Last updated: 2014-10-09
Bibliographically approved

Open Access in DiVA

Martin Bohman & Emelie Kullmann kandidatexam (917 kB), 82 downloads
File information
File name: FULLTEXT01.pdf
File size: 917 kB
Checksum (SHA-512): 4f191d5da172e3c4e3d451c189c7b73a749a427482a6e553505d6587df27fff2412cf0d6331aa1fb8fbbc178a376fd4f2419dbf2f0e81e37ac216fd15788d6ed
Type: fulltext
Mimetype: application/pdf

Total: 82 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 341 hits