A Document Recommender Based on Word Embedding
KTH, School of Electrical Engineering (EES).
2015 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 80 credits / 120 HE credits. Student thesis.
Abstract [en]

With the rapid development of information technology, text no longer exists only on paper but also in digital form, distributed all over the internet. The massive amount of information online gives us many options, but it also makes it hard to find exactly the information we need. Media monitoring aims to address this problem. Meltwater Group, a media monitoring company, provides enterprises with a service for tracking and sorting information to help them achieve their business goals. These goals may include finding the best time or place to run a business campaign and keeping up to date with information about competitors.

Meltwater has a recommender system. When a query is searched, the corresponding documents retrieved from the database are presented. The problem with the system is that some of the documents turn out to be misclassified, so the correctness rate of the recommendations is not very high. To help solve this problem and improve the search, this paper introduces a new algorithm based on a word embedding approach and user supervision. The paper begins with background on Meltwater Group and the existing framework of its recommender system. It then explores the background methods, which include LSA (Latent Semantic Analysis), Random Indexing and Word2vec. The necessary tools, such as t-SNE, K-means clustering and hierarchical clustering, are also covered in this part.

The data sets used in this paper are described after the background methods, including an introduction to the data and how it is processed. The algorithm is then described step by step, followed by the evaluation. The algorithm is evaluated on several different data sets, with the confusion matrix used as the means of measurement. Finally, the paper closes with a summary of the method and suggestions for future work.
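
As an illustration of the evaluation measure named in the abstract, the sketch below builds a binary confusion matrix and derives precision and recall from it. It is a minimal example under stated assumptions, not the thesis's evaluation code: the labels are made up, 1 is assumed to mean "document is relevant to the query", and the function names confusion_matrix and precision_recall are hypothetical.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary labelling."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return {key: counts[key] for key in ("tp", "fp", "fn", "tn")}


def precision_recall(cm):
    """Derive precision and recall from the confusion-matrix counts."""
    tp, fp, fn = cm["tp"], cm["fp"], cm["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: 1 = relevant to the query, 0 = not relevant.
y_true = [1, 1, 0, 1, 0, 0, 1, 0]  # assumed ground truth (e.g. user feedback)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]  # assumed recommender output

cm = confusion_matrix(y_true, y_pred)
precision, recall = precision_recall(cm)
print(cm, precision, recall)
```

Here a false positive corresponds to a misclassified document being recommended, which is the failure mode the abstract describes.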

Place, publisher, year, edition, pages
2015. 60 p.
TRITA-EE, ISSN 1653-5146 ; 2015:66
Keyword [en]
natural language processing, recommender system, clustering
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
URN: urn:nbn:se:kth:diva-183502
OAI: diva2:911873
Educational program
Master of Science in Engineering - Information and Communication Technology
2015-10-02, A:367, Osquldas väg 10, Stockholm, 12:00 (English)
Available from: 2016-05-17. Created: 2016-03-14. Last updated: 2016-05-17. Bibliographically approved.

By author/editor
He, Binlai