Enhancing Relevant Region Classifying
KTH, School of Information and Communication Technology (ICT).
2011 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

In this thesis we present a new way of extracting relevant data from texts. We build on the method presented by Patwardhan and Riloff (2007), with improvements of our own.

Our approach modifies the input to the support vector machine to construct a self-trained relevant sentence classifier. This classifier is used to identify relevant sentences in the MUC-4 terrorism corpus.

We modify the input by removing stopwords, reducing each word to its stem, and keeping only words that occur at least three times in the corpus. We also change how each word is weighted, using TF × IDF as the weighting function.
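
The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis implementation: the stopword list, the suffix-stripping `crude_stem` helper, and the function names are hypothetical, "at least three times" is read as total occurrences across the corpus, and IDF is taken as log(N / document frequency) with each sentence treated as a document.

```python
import math
from collections import Counter

# Hypothetical, tiny stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "and", "was", "were", "by", "to"}


def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(sentence):
    """Lowercase, strip punctuation, drop stopwords, and stem each token."""
    tokens = [t.strip(".,;:!?\"'()").lower() for t in sentence.split()]
    return [crude_stem(t) for t in tokens if t and t not in STOPWORDS]


def tfidf_vectors(sentences, min_count=3):
    """Return one sparse {term: weight} dict per sentence.

    Terms occurring fewer than min_count times in the whole corpus are dropped.
    Weight = term frequency in the sentence * log(N / document frequency).
    """
    docs = [preprocess(s) for s in sentences]
    corpus_counts = Counter(tok for doc in docs for tok in doc)
    vocab = {t for t, c in corpus_counts.items() if c >= min_count}
    doc_freq = Counter(t for doc in docs for t in set(doc) if t in vocab)
    n_docs = len(docs)
    vectors = []
    for doc in docs:
        tf = Counter(t for t in doc if t in vocab)
        vectors.append({t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf})
    return vectors


sentences = [
    "The bombing killed three people.",
    "Bombs exploded in the city.",
    "The bomb was planted by rebels.",
    "Rain fell on the city.",
]
vectors = tfidf_vectors(sentences)
```

In this toy corpus only the stem "bomb" survives the frequency cutoff, so the last sentence maps to an empty vector. Sparse vectors of this kind would then be passed to the support vector machine as input features.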

By using the relevant sentence classifier together with domain-relevant extraction patterns, we achieved higher performance on the MUC-4 terrorism corpus than the original model.

Place, publisher, year, edition, pages
2011. 54 p.
Trita-ICT-EX, 52
Keyword [en]
Natural Language Processing, Information Extraction, Support Vector Machine, Pattern Extraction
National Category
Software Engineering
URN: urn:nbn:se:kth:diva-32661
OAI: diva2:411324
Available from: 2011-04-18
Created: 2011-04-18
Last updated: 2011-05-11
Bibliographically approved

