Style Mining of Electronic Messages for Multiple Authorship Discrimination: First Results
2003 (English). In: KDD '03 Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining / [ed] Getoor et al., New York, NY, USA: ACM, 2003, 475-480 p. Conference paper (Refereed)
This paper considers the use of computational stylistics for performing authorship attribution of electronic messages, addressing categorization problems with as many as 20 different classes (authors). Effective stylistic characterization of text is potentially useful for a variety of tasks, as language style contains cues regarding the authorship, purpose, and mood of the text, all of which would be useful adjuncts to information retrieval or knowledge-management tasks. We focus here on the problem of determining the author of an anonymous message, based only on the message text. Several multiclass variants of the Winnow algorithm were applied to a vector representation of the message texts to learn models for discriminating different authors. We present results comparing the classification accuracy of the different approaches. The results show that stylistic models can be accurately learned to determine an author's identity.
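For readers unfamiliar with Winnow, the following is a minimal sketch of one possible multiclass setup: a one-vs-rest arrangement of the classic mistake-driven Winnow update (promote or demote the weights of active features by a factor alpha). It is an illustration only and not the variants evaluated in the paper, which this record does not describe. The feature list FUNCTION_WORDS, the helper featurize, the class OneVsRestWinnow, the parameter values, and the toy messages are all hypothetical.

```python
# Illustrative sketch only: a one-vs-rest Winnow classifier over binary
# word-presence features. Names, features, and data are hypothetical.

# Hypothetical stylistic feature set: presence of common function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "i", "you", "we",
                  "please", "thanks", "regards"]


def featurize(text):
    """Map a message to a binary vector: 1 if the word occurs, else 0."""
    tokens = set(text.lower().split())
    return [1 if w in tokens else 0 for w in FUNCTION_WORDS]


class OneVsRestWinnow:
    def __init__(self, n_features, classes, alpha=2.0):
        self.alpha = alpha                # multiplicative update factor
        self.theta = n_features / 2       # firing threshold (an assumption)
        # One positive weight vector per author, initialised to 1.
        self.weights = {c: [1.0] * n_features for c in classes}

    def score(self, c, x):
        return sum(w * xi for w, xi in zip(self.weights[c], x))

    def predict(self, x):
        # Assign the message to the author whose model scores highest.
        return max(self.weights, key=lambda c: self.score(c, x))

    def fit(self, X, y, epochs=10):
        for _ in range(epochs):
            for x, label in zip(X, y):
                for c, w in self.weights.items():
                    fired = self.score(c, x) >= self.theta
                    target = (c == label)
                    if fired == target:
                        continue          # no mistake, no update
                    # Promote on a missed positive, demote on a false alarm.
                    factor = self.alpha if target else 1.0 / self.alpha
                    for i, xi in enumerate(x):
                        if xi:            # only active features change
                            w[i] *= factor


# Toy training data (invented for illustration).
messages = [
    ("thanks for the report regards", "alice"),
    ("please see the notes regards", "alice"),
    ("you and i need to talk", "bob"),
    ("we need to meet you", "bob"),
]
X = [featurize(text) for text, _ in messages]
y = [author for _, author in messages]

model = OneVsRestWinnow(len(FUNCTION_WORDS), classes=["alice", "bob"])
model.fit(X, y)
print(model.predict(featurize("thanks regards")))
```

Because updates are multiplicative and touch only the features present in a misclassified message, Winnow tends to cope well with many irrelevant features, which is one reason it is a natural candidate for sparse text representations.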
Place, publisher, year, edition, pages
New York, NY, USA: ACM, 2003. 475-480 p.
authorship attribution, computational stylistics, electronic communication, text categorization, text mining
Computer and Information Science
Identifiers: URN: urn:nbn:se:kth:diva-38215; DOI: 10.1145/956750.956805; ISBN: 1-58113-737-0; OAI: oai:DiVA.org:kth-38215; DiVA: diva2:436263
The Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining