Document and Image Classification with Topic Ngram Model
KTH, School of Computer Science and Communication (CSC).
2014 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Abstract [en]

Latent Dirichlet Allocation (LDA) is a popular probabilistic model for information retrieval. Many extended models based on LDA have been introduced during the past 10 years. In LDA, a data point is represented as a bag (multiset) of words. In the text case, a word is a regular text word, but other types of data can also be represented as words (e.g. visual words). Due to the bag-of-words assumption, the original LDA neglects the structure of the data, i.e., all the relationships between words, which leads to information loss. In practice, the spatial relationship between words is important and useful. To explore the importance of this relationship, we focus on an extension of LDA called the Topic Ngram Model, which models the relationship among adjacent words. In this thesis, we first implement the model and use it for text classification. Furthermore, we propose a 2D extension, which enables us to model spatial relationships of features in images.
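The information loss described in the abstract can be illustrated with a minimal sketch. It is not the thesis's model or implementation: it only contrasts a bag-of-words count with a count of adjacent word pairs on two toy documents. The function names `bag_of_words` and `adjacent_pairs` and the example sentences are assumptions made for this illustration.

```python
from collections import Counter


def bag_of_words(tokens):
    # Multiset of words: word order is discarded.
    return Counter(tokens)


def adjacent_pairs(tokens):
    # Multiset of (previous word, next word) pairs: keeps local order.
    return Counter(zip(tokens, tokens[1:]))


# Two toy documents (hypothetical) with the same words in a different order.
doc_a = "dog bites man".split()
doc_b = "man bites dog".split()

same_bag = bag_of_words(doc_a) == bag_of_words(doc_b)
same_pairs = adjacent_pairs(doc_a) == adjacent_pairs(doc_b)

print(same_bag)    # the two bags are identical
print(same_pairs)  # the adjacent-pair counts differ
```

Under the bag-of-words assumption the two documents are indistinguishable, while counts over adjacent words separate them. This is the kind of structure that a model of adjacent-word relationships can exploit.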

National Category
Computer Science
URN: urn:nbn:se:kth:diva-155771
OAI: diva2:762932
Available from: 2014-11-20
Created: 2014-11-13
Last updated: 2014-11-20
Bibliographically approved

Open Access in DiVA

fulltext (1523 kB), 11784 downloads
File information
File name: FULLTEXT01.pdf
File size: 1523 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf


Total: 11784 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 102 hits