Ground truth clustering is not the optimum clustering
University of Sevilla, C. San Fernando 4, 41004, Seville, Spain.
Institut für Mathematik, Alpen-Adria-Universität Klagenfurt, Universitätstraße 65-67, 9020, Klagenfurt, Austria.
Faculty of Mechanical Engineering, University of Ljubljana, Aškerčeva 6, 1000, Ljubljana, Slovenia; Rudolfovo – Science and technology center Novo Mesto, Podbreznik 15, 8000, Novo Mesto, Slovenia.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Numerical Analysis, Optimization and Systems Theory. ORCID iD: 0000-0001-6352-0968
2025 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 15, no. 1, article id 9223. Journal article (peer-reviewed). Published
Abstract [en]

Data clustering is a fundamental yet challenging task in data science. The minimum sum-of-squares clustering (MSSC) problem aims to partition data points into k clusters to minimize the sum of squared distances between the points and their cluster centers (centroids). Despite being NP-hard, solvers exist that can compute optimal solutions for small to medium-sized datasets. One such solver is SOS-SDP, a branch-and-bound algorithm based on semidefinite programming. We used it to obtain optimal MSSC solutions (optimum clusterings) for various k across multiple datasets with known ground truth clusterings. We evaluated the alignment between the optimum and ground truth clusterings using six extrinsic measures and assessed their quality using three intrinsic measures. The results reveal that the optimum clusterings often differ significantly from the ground truth clusterings. Additionally, the optimum clusterings frequently outperform the ground truth clusterings, according to the intrinsic measures that we used. However, when ground truth clusters are well-separated convex shapes, such as ellipsoids, the optimum and ground truth clusterings closely align.
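
The MSSC objective described in the abstract can be illustrated with a minimal sketch. The four-point dataset, its "ground truth" labels, and the helper names `mssc_objective` and `brute_force_optimum` are hypothetical and chosen for illustration only. The exhaustive search below stands in for an exact solver on a tiny input; it is not SOS-SDP and does not scale beyond a handful of points.

```python
from itertools import product


def mssc_objective(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    return total


def brute_force_optimum(points, k):
    """Try every labeling with exactly k non-empty clusters (tiny inputs only)."""
    best_labels, best_value = None, float("inf")
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        value = mssc_objective(points, labels)
        if value < best_value:
            best_labels, best_value = labels, value
    return best_labels, best_value


# Hypothetical toy data: two elongated horizontal strips as "ground truth".
points = [(0.0, 0.0), (4.0, 0.0), (0.0, 1.0), (4.0, 1.0)]
ground_truth = (0, 0, 1, 1)

gt_value = mssc_objective(points, ground_truth)
opt_labels, opt_value = brute_force_optimum(points, k=2)

print("ground truth objective:", gt_value)
print("optimum labels:", opt_labels, "objective:", opt_value)
```

In this toy case the MSSC optimum groups the points by their x-coordinate rather than by the elongated ground truth strips, so the optimum has a much lower objective than the ground truth. This mirrors, on a very small scale, the abstract's observation that the two clusterings can differ when ground truth clusters are not compact.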

Place, publisher, year, edition, pages
Springer Nature, 2025. Vol. 15, no. 1, article id 9223
Keywords [en]
Extrinsic measures, Ground truth clustering, Intrinsic measures, Minimum sum-of-squares clustering
Identifiers
URN: urn:nbn:se:kth:diva-362013
DOI: 10.1038/s41598-025-90865-9
ISI: 001446949700011
PubMedID: 40097499
Scopus ID: 2-s2.0-105000375355
OAI: oai:DiVA.org:kth-362013
DiVA, id: diva2:1949686
Note

QC 20250425

Available from: 2025-04-03. Created: 2025-04-03. Last updated: 2025-04-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Person

Zhao, Shudian
