Text Cluster Trimming for Better Descriptions and Improved Quality
KTH, School of Computer Science and Communication (CSC).
2010 (English). In: LREC 2010 - SEVENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION / [ed] Calzolari, N; Choukri, K; Maegaard, B; Mariani, J; Odijk, J; Piperidis, S; Rosner, M; Tapias, D, EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, 2010, p. 3076-3083. Conference paper, Published paper (Refereed)
Abstract [en]

Text clustering is potentially very useful for exploring text sets that are too large to study manually. The success of such a tool depends on whether the results can be explained to the user. An automatically extracted cluster description usually consists of a few words that are deemed representative of the cluster. It is preferably short so that it can be easily grasped. However, text cluster content is often diverse. We introduce a trimming method that removes texts that contain none, or only a few, of the words in the cluster description. The result is clusters that match their descriptions better. In experiments on two quite different text sets we obtain significant improvements in both internal and external clustering quality for the trimmed clustering compared to the original. The trimming thus has two positive effects: it forces the clusters to agree with their descriptions (resulting in better descriptions) and improves the quality of the trimmed clusters.
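The trimming idea in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the whitespace tokenisation, and the `min_matches` threshold are all assumptions made for illustration, since the abstract does not say how "a few" is defined or how texts are matched against description words.

```python
from typing import Dict, List, Set


def trim_cluster(texts: List[str], description: Set[str], min_matches: int = 1) -> List[str]:
    """Keep only texts containing at least `min_matches` distinct description words.

    `min_matches` is a hypothetical threshold; the abstract does not specify one.
    """
    wanted = {word.lower() for word in description}
    kept = []
    for text in texts:
        # Naive tokenisation (lowercase, split on whitespace); an assumption.
        tokens = set(text.lower().split())
        if len(tokens & wanted) >= min_matches:
            kept.append(text)
    return kept


def trim_clustering(
    clusters: Dict[str, List[str]],
    descriptions: Dict[str, Set[str]],
    min_matches: int = 1,
) -> Dict[str, List[str]]:
    """Apply trim_cluster to every cluster, using that cluster's own description."""
    return {
        cluster_id: trim_cluster(texts, descriptions[cluster_id], min_matches)
        for cluster_id, texts in clusters.items()
    }


# Toy data, invented for illustration only.
clusters = {
    "sports": [
        "The football match ended late",
        "Stock markets fell sharply",
        "A goal decided the match",
    ]
}
descriptions = {"sports": {"football", "match", "goal"}}
trimmed = trim_clustering(clusters, descriptions, min_matches=1)
```

In this toy case the second text shares no words with the description and is dropped, so the remaining cluster agrees with its description. Raising `min_matches` trims more aggressively.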

Place, publisher, year, edition, pages
EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, 2010. p. 3076-3083
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-242814
ISI: 000356879507142
Scopus ID: 2-s2.0-84955667285
OAI: oai:DiVA.org:kth-242814
DiVA, id: diva2:1290635
Conference
7th International Conference on Language Resources and Evaluation, LREC 2010; Mediterranean Conference Centre, Valletta, Malta; 17 May 2010 through 23 May 2010
Note

QC 20190221

Available from: 2019-02-21 Created: 2019-02-21 Last updated: 2019-02-21. Bibliographically approved

By author/editor
Rosell, Magnus