Ground truth clustering is not the optimum clustering
University of Sevilla, C. San Fernando 4, 41004, Seville, Spain.
Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, Universitätstraße 65-67, 9020, Klagenfurt, Austria.
Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva 6, 1000, Ljubljana, Slovenia; Rudolfovo – Science and technology center Novo Mesto, Podbreznik 15, 8000, Novo Mesto, Slovenia.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Numerical Analysis, Optimization and Systems Theory. ORCID iD: 0000-0001-6352-0968
2025 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 15, no 1, article id 9223. Article in journal (Refereed), Published
Abstract [en]

Data clustering is a fundamental yet challenging task in data science. The minimum sum-of-squares clustering (MSSC) problem aims to partition data points into k clusters so as to minimize the sum of squared distances between the points and their cluster centers (centroids). Although MSSC is NP-hard, solvers exist that can compute optimal solutions for small to medium-sized datasets. One such solver is SOS-SDP, a branch-and-bound algorithm based on semidefinite programming. We used it to obtain optimal MSSC solutions (optimum clusterings) for various k across multiple datasets with known ground truth clusterings. We evaluated the alignment between the optimum and ground truth clusterings using six extrinsic measures and assessed the quality of both using three intrinsic measures. The results reveal that the optimum clusterings often differ significantly from the ground truth clusterings. Moreover, according to the intrinsic measures that we used, the optimum clusterings frequently outperform the ground truth clusterings. However, when the ground truth clusters are well-separated convex shapes, such as ellipsoids, the optimum and ground truth clusterings closely align.
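
As a rough illustration of the quantities the abstract refers to, the sketch below computes the MSSC objective for a given labeling and compares two labelings with the Rand index. It is a minimal sketch under stated assumptions, not the authors' method: it does not use SOS-SDP, the toy points and both labelings are hypothetical, and the Rand index is chosen here only as one common extrinsic measure (the abstract does not name the six measures used).

```python
from itertools import combinations


def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid = coordinate-wise mean of the cluster members.
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def rand_index(labels_a, labels_b):
    """Fraction of point pairs that both labelings treat the same way
    (together in both, or separated in both)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical toy data: the "ground truth" groups are two long horizontal strips.
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
ground_truth = [0, 0, 1, 1]  # grouped by y-coordinate (elongated clusters)
alternative = [0, 1, 0, 1]   # grouped by x-coordinate (compact clusters)

gt_cost = mssc_objective(points, ground_truth)
alt_cost = mssc_objective(points, alternative)
agreement = rand_index(ground_truth, alternative)

print(f"ground truth objective: {gt_cost}")
print(f"alternative objective:  {alt_cost}")
print(f"Rand index between the two labelings: {agreement:.3f}")
```

In this toy case the labeling that groups nearby points has a much lower sum-of-squares objective than the assumed ground truth, and the two labelings agree on only a minority of point pairs, which mirrors in miniature the kind of mismatch the abstract describes.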

Place, publisher, year, edition, pages
Springer Nature, 2025. Vol. 15, no 1, article id 9223
Keywords [en]
Extrinsic measures, Ground truth clustering, Intrinsic measures, Minimum sum-of-squares clustering
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-362013
DOI: 10.1038/s41598-025-90865-9
ISI: 001446949700011
PubMedID: 40097499
Scopus ID: 2-s2.0-105000375355
OAI: oai:DiVA.org:kth-362013
DiVA, id: diva2:1949686
Note

QC 20250425

Available from: 2025-04-03. Created: 2025-04-03. Last updated: 2025-04-25. Bibliographically approved


Authority records

Zhao, Shudian