Comparing Comparisons: Document Clustering Evaluation Using Two Manual Classifications
2004 (English). Conference paper (Refereed)
“Describe your occupation in a few words” is a question answered by 44 000 Swedish twins. Each respondent was then manually categorized according to two established occupation classification systems. Would a clustering algorithm have produced satisfactory results? Usually, this question cannot be answered: the existing quality measures tell us how much an algorithmic clustering deviates from a manual classification, not whether this deviation is acceptable. In our situation, however, with two different manual classifications (in the classification systems AMSYK and YK80), we can construct such quality measures. If the algorithmic result differs no more from the manual classifications than they differ from each other (comparing the comparisons), we have an indication that it is useful. Further, we use the kappa coefficient as a clustering quality measure: using one manual classification as a coding scheme, we assess the agreement between a clustering and the other classification. Having applied both of these novel evaluation methods, we conclude that our clusterings are useful.
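The two evaluation ideas in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The Rand index stands in for whatever partition-comparison measure is actually used; averaging the clustering's two scores is only one possible reading of "differs no more from the manual classifications"; and recoding each cluster, and each category of the other classification, to the majority code of the coding scheme is a hypothetical way of putting both labelings on a common scale before computing Cohen's kappa. The helper names and the eight-respondent toy labels are invented and are not real AMSYK or YK80 codes.

```python
from collections import Counter
from itertools import combinations


def rand_index(a, b):
    """Share of item pairs on which labelings a and b agree:
    both put the pair in the same group, or both keep it apart."""
    assert len(a) == len(b) and len(a) >= 2
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def comparing_comparisons(clusters, manual_a, manual_b, measure=rand_index):
    """Assumed criterion: the clustering counts as acceptable if its mean
    agreement with the two manual classifications is at least the
    agreement of the manual classifications with each other."""
    baseline = measure(manual_a, manual_b)
    scores = (measure(clusters, manual_a), measure(clusters, manual_b))
    return baseline, scores, sum(scores) / 2 >= baseline


def majority_map(groups, codes):
    """Map each group label to the most frequent code among its members
    (ties go to the code seen first)."""
    counts = {}
    for g, c in zip(groups, codes):
        counts.setdefault(g, Counter())[c] += 1
    return {g: cnt.most_common(1)[0][0] for g, cnt in counts.items()}


def cohen_kappa(x, y):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(x)
    p_obs = sum(a == b for a, b in zip(x, y)) / n
    fx, fy = Counter(x), Counter(y)
    p_exp = sum(fx[k] * fy[k] for k in fx) / (n * n)
    if p_exp == 1.0:  # both labelings constant and identical
        return 1.0
    return (p_obs - p_exp) / (1 - p_exp)


def kappa_via_coding_scheme(clusters, other, scheme):
    """Hypothetical recoding: express both the clustering and the other
    manual classification in the scheme's categories, then compute kappa."""
    cmap = majority_map(clusters, scheme)
    omap = majority_map(other, scheme)
    return cohen_kappa([cmap[c] for c in clusters], [omap[o] for o in other])


# Invented toy labels for eight respondents (not real AMSYK/YK80 codes).
amsyk = ["a", "a", "a", "b", "b", "c", "c", "c"]
yk80 = ["x", "x", "y", "y", "y", "z", "z", "z"]
clusters = [0, 0, 0, 1, 1, 1, 2, 2]

baseline, scores, acceptable = comparing_comparisons(clusters, amsyk, yk80)
print("manual vs manual:", round(baseline, 3))
print("clustering vs AMSYK, YK80:", [round(s, 3) for s in scores])
print("acceptable:", acceptable)
print("kappa (AMSYK as coding scheme):",
      round(kappa_via_coding_scheme(clusters, yk80, amsyk), 3))
```

On this toy data the clustering agrees with the AMSYK-style labels exactly as well as the two manual labelings agree with each other, but less well with the YK80-style labels, so the averaged criterion is not met. The kappa value (about 0.63) measures agreement once both the clustering and the YK80-style labels are expressed in the coding scheme's categories.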
Identifiers: URN: urn:nbn:se:kth:diva-7121; OAI: oai:DiVA.org:kth-7121; DiVA: diva2:12037
ICON 2004, India.
QC 20100806. Available from: 2005-09-29. Created: 2005-09-29. Last updated: 2012-01-20. Bibliographically approved.