Evaluation of a Prototype for Relevance Profiling
KTH, School of Information and Communication Technology (ICT).
2013 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Only a small portion of the information generated online is relevant to a given person.

In this thesis, a prototype for determining a relevance value based on sets of data for some topic is evaluated to determine its viability in a future product called Votia.

To achieve this, an evaluation model was defined based on "accuracy" and "efficiency" for various machine learning algorithms applied to the types of data found in a tweet (a short user message on the Twitter platform), such as the message, relations between users and the tweeter, users' general behavior characteristics, and geographic data. A system was set up to fetch Twitter data and convert it into data fitting the prototype, with the hypotheses that (1) the Twitter data model could be mapped into the Votia data model, from which user behavior could be predicted with adequate accuracy, and that (2) user behavior could be predicted to some degree from isolated sets of data.

Data from Twitter was obtained by taking a random sample of users (the main actors) and then loading their timelines and those of their friends. The data was processed to identify interactivity between the set of users and their friends, in particular who retweeted what. A number of machine learning algorithms, such as the naive Bayes classifier, were tested on this data and evaluated according to the model.
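As an illustration of the kind of classifier mentioned above, the following is a minimal sketch of a multinomial naive Bayes classifier with add-one smoothing, applied to message text. It is not the thesis's implementation: the class name `NaiveBayes`, the labels "retweeted" and "ignored", and the toy training messages are all assumptions invented for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.token_counts = defaultdict(Counter)  # token frequencies per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.token_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Start from the log prior of the class.
            score = math.log(doc_count / total_docs)
            total_tokens = sum(self.token_counts[label].values())
            # Add the smoothed log likelihood of each token.
            for token in tokens:
                count = self.token_counts[label][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy data: tokenized messages and whether a user retweeted them.
train_docs = [
    "new machine learning paper".split(),
    "great machine learning talk".split(),
    "lunch was great today".split(),
    "stuck in traffic today".split(),
]
train_labels = ["retweeted", "retweeted", "ignored", "ignored"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("machine learning tutorial".split())
```

In this sketch the prediction is driven only by word frequencies in the message, which corresponds to analyzing one isolated set of data.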

In the case of user relations, data was instead obtained by identifying a number of the top Twitter users, and the evaluation revolved around grouping their followers based on how similarly they behaved.
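One way to picture grouping followers by behavioral similarity is sketched below. The abstract does not say which similarity measure or grouping method was used, so the Jaccard similarity, the greedy grouping strategy, the 0.5 threshold, and the follower and account names are all assumptions chosen for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_followers(behavior, threshold=0.5):
    """Greedy grouping: a follower joins the first group whose first member
    is at least `threshold` similar; otherwise it starts a new group."""
    groups = []
    for follower in sorted(behavior):
        for group in groups:
            representative = group[0]
            if jaccard(behavior[follower], behavior[representative]) >= threshold:
                group.append(follower)
                break
        else:
            # No existing group was similar enough.
            groups.append([follower])
    return groups


# Hypothetical behavior data: the set of accounts each follower retweeted.
behavior = {
    "alice": {"acct_1", "acct_2", "acct_3"},
    "bob": {"acct_1", "acct_2"},
    "carol": {"acct_7", "acct_8"},
    "dave": {"acct_7", "acct_8", "acct_9"},
}
groups = group_followers(behavior)
```

With this toy data, followers who retweeted mostly the same accounts end up in the same group.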

The evaluation shows that predicting user behavior from isolated sets of data is not applicable in the given environment, and that the data set must be analyzed in a more integrated manner, e.g. by grouping similar users together. As the input data sets are arbitrary, each being analyzed in specific ways, a pipeline is suggested whose processing modules not only analyze the data sets in terms of relevance but also perform preprocessing. Examples of preprocessing might be filtering, adjusting data for use by subsequent modules, or flat-out rejecting the data early.
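The suggested pipeline of processing modules can be sketched as a chain of functions, each of which either passes an item on (possibly adjusted) or rejects it by returning `None`. This is only one possible reading of the suggestion: the module names, the dictionary-based item format, and the keyword set are hypothetical and not taken from the thesis.

```python
from typing import Callable, Optional

Item = dict
Module = Callable[[Item], Optional[Item]]


def drop_empty(item: Item) -> Optional[Item]:
    """Filtering: reject items that carry no text."""
    return item if item.get("text", "").strip() else None


def normalize_text(item: Item) -> Optional[Item]:
    """Adjusting: lower-case the text for use by subsequent modules."""
    return {**item, "text": item["text"].lower()}


def keyword_relevance(item: Item) -> Optional[Item]:
    """Relevance analysis: fraction of (hypothetical) topic keywords present."""
    keywords = {"machine", "learning"}
    words = set(item["text"].split())
    return {**item, "relevance": len(keywords & words) / len(keywords)}


def run_pipeline(item: Item, modules: list) -> Optional[Item]:
    """Run the modules in order; stop as soon as one rejects the item."""
    for module in modules:
        item = module(item)
        if item is None:
            return None
    return item


modules = [drop_empty, normalize_text, keyword_relevance]
scored = run_pipeline({"text": "Machine Learning rocks"}, modules)
rejected = run_pipeline({"text": "   "}, modules)
```

Here `scored` carries a `relevance` value added by the last module, while `rejected` is `None` because the first module discarded the empty message before any analysis ran.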

Place, publisher, year, edition, pages
2013. 44 p.
Trita-ICT-EX, 2013:173
Keyword [en]
machine learning, twitter, information, grouping, social networks, relations
National Category
Engineering and Technology
URN: urn:nbn:se:kth:diva-128930
OAI: diva2:648805
Educational program
Master of Science in Engineering - Information and Communication Technology
Available from: 2013-09-17. Created: 2013-09-17. Last updated: 2013-09-17. Bibliographically approved.

Open Access in DiVA

fulltext (358 kB), 400 downloads
File information
File name: FULLTEXT01.pdf
File size: 358 kB
Checksum: SHA-512
Type: fulltext
Mimetype: application/pdf
