Change search
ReferencesLink to record
Permanent link

Direct link
Combining Lexicon- and Learning-based Approaches for Improved Performance and Convenience in Sentiment Classification
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2015 (English)Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE creditsStudent thesis
Abstract [en]

Sentiment classification is the process of categorizing data into categories based on its polarity with a wide array of applications across several industries. This report examines a combination of two prominent approaches to sentiment classification using a lexicon of weighted words and machine learning respectively. These approaches are compared with the combined hybrid approach in order to give an account of their relative strengths and weaknesses. When run on a set of IMDb movie reviews the results indicate that the hybrid model performs better than the lexicon-based approach, in turn being outperformed by the learning-based approach. However, the gain in convenience brought on by eliminating the need for training data makes the hybrid model an appealing alternative to the other approaches with a slight trade-off in performance.

Abstract [sv]

Att klassificera text i kategorier baserat på känslan de uttrycker är ett aktuellt område idag och kan tillämpas inom många industrier. Rapporten undersöker en kombination av de två framstående tillvägagångssätten till denna typ av klassificering baserade på ett lexikon med definerade ordvikter respektive maskininlärning. Denna hybridlösning jämförs mot de två andra tillvägagångssätten för att framlägga deras relativa styrkor och svagheter. På ett dataset med filmrecensioner från IMDb får maskininlärningsklassificeraren bäst resultat, följt av hybridlösningen och sist den lexikonbaserade lösningen. Trots det kan hybridlösningen vara att föredra i situationer där det är ogenomförbart eller oskäligt att förbereda träningsdata för maskininlärningsklassificeraren, dock med ett visst avkall på prestanda.

Place, publisher, year, edition, pages
Keyword [en]
sentiment classification, sentiment analysis, opinion mining, lexicon, machine learning, hybrid, movie reviews, social media
National Category
Computer Science
URN: urn:nbn:se:kth:diva-166430OAI: diva2:811021
Available from: 2015-05-28 Created: 2015-05-09 Last updated: 2015-05-28Bibliographically approved

Open Access in DiVA

fulltext(1656 kB)290 downloads
File information
File name FULLTEXT01.pdfFile size 1656 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Sommar, FredrikWielondek, Milosz
By organisation
School of Computer Science and Communication (CSC)
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 290 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 583 hits
ReferencesLink to record
Permanent link

Direct link