Evaluation of a Prototype for Relevance Profiling
Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Only a small portion of the information generated online is relevant to any given person.
In this thesis, a prototype for determining a relevance value from sets of data on a given topic is evaluated to determine its viability for a future product called Votia.
To achieve this, an evaluation model was defined based on "accuracy" and "efficiency" for various machine learning algorithms applied to different types of data found in a tweet (a short user message on the Twitter platform), such as the message text, relations between users and the tweeter, users' general behavioral characteristics, and geographic data. A system was set up to fetch Twitter data and convert it into data fitting the prototype, under the hypotheses that (1) the Twitter data model could be mapped onto the Votia data model, from which user behavior could be predicted with adequate accuracy, and (2) user behavior could be predicted to some degree from isolated sets of data.
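The abstract does not state how "accuracy" and "efficiency" were measured. As a minimal sketch, assuming accuracy is the fraction of correctly predicted labels and efficiency is approximated by mean wall-clock time per prediction, such an evaluation could look like the following. The helper `evaluate` and the toy data are hypothetical and not taken from the thesis.

```python
import time
from typing import Callable, Sequence, Tuple, TypeVar

X = TypeVar("X")


def evaluate(predict: Callable[[X], bool],
             samples: Sequence[Tuple[X, bool]]) -> Tuple[float, float]:
    """Hypothetical evaluation helper returning (accuracy, seconds per prediction).

    Accuracy: fraction of samples whose predicted label equals the observed one.
    Efficiency (assumed proxy): mean wall-clock time per prediction.
    """
    if not samples:
        raise ValueError("need at least one labelled sample")
    correct = 0
    start = time.perf_counter()
    for features, label in samples:
        if predict(features) == label:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(samples), elapsed / len(samples)


# Toy usage: a trivial rule that predicts a retweet whenever "votia" appears.
data = [("try votia today", True), ("lunch", False), ("votia news", False)]
acc, secs = evaluate(lambda text: "votia" in text, data)
print(f"accuracy={acc:.2f}, {secs * 1e6:.2f} microseconds/prediction")
```

Here the rule gets two of three samples right, so the accuracy is 2/3; the timing figure depends on the machine. Keeping both numbers side by side lets algorithms be compared on the two axes of the model rather than on accuracy alone.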
Data from Twitter was obtained by taking a random sample of users (the main actors) and then loading their timelines and those of their friends. The data was processed to identify interactions between the sampled users and their friends, in particular who retweeted what. A number of machine learning algorithms, such as a Naïve Bayes classifier, were tested on this data and evaluated according to the model.
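To illustrate the kind of classifier named above, the following is a minimal multinomial Naive Bayes sketch. It assumes the prediction target is whether a user retweets a tweet and that the features are the words of the message. The class `NaiveBayes`, the training data, and the add-one smoothing are illustrative assumptions, not the thesis's implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes over bag-of-words, Laplace-smoothed.

    Label True means "retweeted", False means "not retweeted" (assumed target).
    """

    def fit(self, docs: List[Tuple[str, bool]]) -> "NaiveBayes":
        self.word_counts: Dict[bool, Counter] = {True: Counter(), False: Counter()}
        self.doc_counts: Counter = Counter()
        for text, label in docs:
            self.doc_counts[label] += 1
            self.word_counts[label].update(text.lower().split())
        self.vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        return self

    def log_score(self, text: str, label: bool) -> float:
        total_docs = sum(self.doc_counts.values())
        # Log prior, add-one smoothed over the two classes.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        counts = self.word_counts[label]
        denom = sum(counts.values()) + len(self.vocab)
        # Log likelihood of each word, add-one smoothed over the vocabulary.
        for word in text.lower().split():
            score += math.log((counts[word] + 1) / denom)
        return score

    def predict(self, text: str) -> bool:
        return self.log_score(text, True) > self.log_score(text, False)


train = [
    ("new votia release out now", True),
    ("votia update released", True),
    ("had coffee this morning", False),
    ("morning run done", False),
]
model = NaiveBayes().fit(train)
print(model.predict("votia release"))   # True
print(model.predict("coffee morning"))  # False
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.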
For user relations, data was instead obtained by identifying a number of top Twitter users, and the evaluation centered on grouping their followers according to how similarly they behaved.
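One way to group followers by how similarly they behave is sketched below. It assumes each follower is represented by the set of top users they interacted with (for example, retweeted) and that similarity is measured with the Jaccard index. The function names, the 0.5 threshold, and the greedy single-pass strategy are hypothetical choices for illustration; the thesis may have used a different grouping method.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |a & b| / |a | b|, defined as 0.0 for two empty sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_followers(behavior: Dict[str, Set[str]],
                    threshold: float = 0.5) -> List[List[str]]:
    """Hypothetical greedy grouping: put each follower into the first group whose
    first member is at least `threshold`-similar, else start a new group."""
    groups: List[List[str]] = []
    for follower in sorted(behavior):  # sorted for deterministic output
        for group in groups:
            if jaccard(behavior[follower], behavior[group[0]]) >= threshold:
                group.append(follower)
                break
        else:
            groups.append([follower])
    return groups


# Hypothetical followers mapped to the top users they retweeted.
behavior = {
    "alice": {"top_a", "top_b"},
    "bob": {"top_a", "top_b", "top_c"},
    "carol": {"top_c", "top_d"},
    "dave": {"top_d", "top_c"},
}
groups = group_followers(behavior)
print(groups)  # [['alice', 'bob'], ['carol', 'dave']]
```

Comparing only against each group's first member keeps the pass linear in the number of groups per follower, at the cost of being order-dependent; a proper clustering algorithm would avoid that sensitivity.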
The evaluation shows that predicting user behavior from isolated sets of data is not applicable in the given environment, and that the data set must be analyzed in a more integrated manner, e.g. by grouping similar users together. As the input data sets are arbitrary and each must be analyzed in its own specific way, a pipeline of processing modules is suggested, in which the modules not only analyze the data sets in terms of relevance but also perform preprocessing. Examples of preprocessing include filtering, adjusting data for use by subsequent modules, or rejecting the data outright at an early stage.
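The suggested pipeline can be sketched as a chain of modules, each of which receives an item and either returns it (possibly adjusted or scored) or returns `None` to reject it early. This is a minimal sketch under that assumption. `Item`, `run_pipeline`, and the example modules are hypothetical names, and the keyword check merely stands in for a real relevance analysis.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Item:
    """Hypothetical unit of input data (e.g. one tweet) flowing through the pipeline."""
    text: str
    relevance: float = 0.0


# A module returns the (possibly modified) item, or None to reject it.
Module = Callable[[Item], Optional[Item]]


def run_pipeline(items: List[Item], modules: List[Module]) -> List[Item]:
    kept: List[Item] = []
    for item in items:
        current: Optional[Item] = item
        for module in modules:
            current = module(current)
            if current is None:  # rejected: later modules never see it
                break
        else:
            kept.append(current)
    return kept


def drop_empty(item: Item) -> Optional[Item]:
    """Preprocessing: reject items with no text outright."""
    return item if item.text.strip() else None


def normalize(item: Item) -> Optional[Item]:
    """Preprocessing: adjust data for later modules (lowercase, trim)."""
    item.text = item.text.strip().lower()
    return item


def keyword_relevance(keyword: str) -> Module:
    """Analysis placeholder: relevance 1.0 if the keyword occurs, else 0.0."""
    def module(item: Item) -> Optional[Item]:
        item.relevance = 1.0 if keyword in item.text.split() else 0.0
        return item
    return module


items = [Item("  Votia launches today "), Item("   "), Item("Nice weather")]
result = run_pipeline(items, [drop_empty, normalize, keyword_relevance("votia")])
for it in result:
    print(it.text, it.relevance)
```

Because rejection short-circuits the chain, expensive analysis modules placed later never see data that a cheap filter has already discarded.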
2013, 44 p.
machine learning, twitter, information, grouping, social networks, relations
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-128930
OAI: oai:DiVA.org:kth-128930
DiVA: diva2:648805
Master of Science in Engineering - Information and Communication Technology
Kilander, Fredrik, Associate professor