Classification of Speech Acts in Discussion Threads.
KTH, School of Computer Science and Communication (CSC).
2011 (English)
Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
Abstract [en]

This thesis examines the prospect of classifying parts of internet discussion threads as different speech acts, such as questions and answers. The approach uses machine-learning algorithms such as decision trees and support vector machines (SVM), coupled with different kinds of feature selection.

Most of the work focused on finding a set of features at the right level of complexity for determining the speech act. The methods examined are part-of-speech N-grams, word patterns, ratios of common words, and various statistical features.
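
As an illustration of the feature types listed above, the following is a minimal sketch, not the thesis's implementation. It assumes the post is already tokenized and tagged with Penn Treebank style part-of-speech tags; the function names, the `COMMON_WORDS` list, and the particular statistical features are hypothetical choices made for this example.

```python
from collections import Counter

# Hypothetical list of common words; the thesis does not specify its own list here.
COMMON_WORDS = {"what", "how", "why", "thanks", "try"}


def pos_ngrams(tags, n=2):
    """Count n-grams over a sequence of part-of-speech tags."""
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))


def common_word_ratios(tokens, vocabulary=COMMON_WORDS):
    """Share of the tokens accounted for by each word in the vocabulary."""
    total = len(tokens) or 1  # avoid division by zero on empty posts
    counts = Counter(t.lower() for t in tokens)
    return {word: counts[word] / total for word in sorted(vocabulary)}


def statistical_features(text, tokens):
    """A few simple surface statistics (assumed examples, not from the thesis)."""
    return {
        "num_tokens": len(tokens),
        "ends_with_question_mark": text.rstrip().endswith("?"),
        "avg_token_length": sum(len(t) for t in tokens) / (len(tokens) or 1),
    }


# Example post, tagged by hand for illustration.
text = "How do I install this?"
tokens = ["How", "do", "I", "install", "this", "?"]
tags = ["WRB", "VBP", "PRP", "VB", "DT", "."]

features = {
    "pos_bigrams": pos_ngrams(tags, n=2),
    "word_ratios": common_word_ratios(tokens),
    "stats": statistical_features(text, tokens),
}
```

A feature dictionary of this kind would then be flattened into a numeric vector before being passed to a classifier such as a decision tree or an SVM.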

The results showed that, with a relatively small training set, it was possible to get fairly good results (about 60% correct classifications), depending on the conditions. Performance also differed considerably between individual speech acts and between classifiers.

Abstract [sv]

The task in this degree project is to investigate how parts of discussion threads can be classified into different speech acts, such as questions and answers. The approach is to use machine-learning techniques such as decision trees and SVM together with different choices of features.

Most of the work went into finding a good set of features to use for the classification. The methods examined include part-of-speech n-grams, word patterns, ratios of frequently occurring words, and various statistical measures.

The results showed that, with a relatively limited set of test data, it was possible to achieve fairly good results (60% correct classifications), depending on the conditions. Large differences between algorithms and between features could also be observed.

Place, publisher, year, edition, pages
2011.
Series
Trita-CSC-E, ISSN 1653-5715 ; 2011:128
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-130703
OAI: oai:DiVA.org:kth-130703
DiVA: diva2:654150
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text

Other links

http://www.nada.kth.se/utbildning/grukth/exjobb/rapportlistor/2011/rapporter11/gustafson_erik_11128.pdf