Diversity-Aware k-median: Clustering with Fair Center Representation
Aalto Univ, Espoo, Finland.
Aalto Univ, Espoo, Finland.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS; Aalto Univ, Espoo, Finland. ORCID iD: 0000-0002-5211-112X
2021 (English). In: MACHINE LEARNING AND KNOWLEDGE DISCOVERY IN DATABASES, ECML PKDD 2021: RESEARCH TRACK, PT II / [ed] Oliver, N; PerezCruz, F; Kramer, S; Read, J; Lozano, JA. Springer Nature, 2021, Vol. 12976, p. 765-780. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity or gender. We then ask to find a minimum-cost clustering of the data into $k$ clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements. More precisely, we are given a set of clients $C$, a set of facilities $F$, a collection $\mathcal{F} = \{F_1, \ldots, F_t\}$ of facility groups $F_i \subseteq F$, a budget $k$, and a set of lower-bound thresholds $R = \{r_1, \ldots, r_t\}$, one for each group in $\mathcal{F}$. The diversity-aware k-median problem asks to find a set $S$ of $k$ facilities in $F$ such that $|S \cap F_i| \geq r_i$, that is, at least $r_i$ centers in $S$ are from group $F_i$, and the k-median cost $\sum_{c \in C} \min_{s \in S} d(c, s)$ is minimized. We show that in the general case where the facility groups may overlap, the diversity-aware k-median problem is NP-hard, fixed-parameter intractable with respect to parameter $k$, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.
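
To make the problem statement concrete, the following is a minimal Python sketch of the objective and the representation constraint, solved by exhaustive search on a toy instance. It is not one of the paper's algorithms: the abstract shows the general problem is NP-hard, so enumeration is only usable for very small inputs. All function names, the one-dimensional toy data, and the absolute-difference distance are assumptions made for illustration.

```python
from itertools import combinations


def kmedian_cost(clients, centers, dist):
    """k-median cost: each client pays the distance to its nearest center."""
    return sum(min(dist(c, s) for s in centers) for c in clients)


def is_feasible(centers, groups, thresholds):
    """Check |S ∩ F_i| >= r_i for every facility group F_i."""
    chosen = set(centers)
    return all(len(chosen & g) >= r for g, r in zip(groups, thresholds))


def brute_force_diversity_kmedian(clients, facilities, groups, thresholds, k, dist):
    """Enumerate all k-subsets of facilities and keep the cheapest feasible one.

    Hypothetical helper for illustration only; running time is exponential in k.
    Returns (None, inf) when no k-subset satisfies the thresholds.
    """
    best_cost, best_set = float("inf"), None
    for candidate in combinations(facilities, k):
        if not is_feasible(candidate, groups, thresholds):
            continue
        cost = kmedian_cost(clients, candidate, dist)
        if cost < best_cost:
            best_cost, best_set = cost, candidate
    return best_set, best_cost


# Toy instance (assumed data): points on a line, distance = absolute difference.
clients = [0, 1, 2, 3, 10, 11]
facilities = [0, 2, 10, 12]
groups = [{0, 2, 10}, {12}]   # facility groups F_1, F_2
thresholds = [1, 1]           # lower bounds r_1, r_2


def dist(a, b):
    return abs(a - b)


constrained = brute_force_diversity_kmedian(clients, facilities, groups, thresholds, 2, dist)
unconstrained = brute_force_diversity_kmedian(clients, facilities, groups, [0, 0], 2, dist)
```

On this toy instance the representation requirement forces facility 12 into the solution, so the constrained optimum costs more than the unconstrained one. That gap is the price of enforcing the group thresholds.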

Place, publisher, year, edition, pages
Springer Nature, 2021. Vol. 12976, p. 765-780
Series
Lecture Notes in Artificial Intelligence, ISSN 0302-9743
Keywords [en]
Algorithmic bias, Algorithmic fairness, Diversity-aware clustering, Fair clustering
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-305422
DOI: 10.1007/978-3-030-86520-7_47
ISI: 000713032300047
Scopus ID: 2-s2.0-85115715754
OAI: oai:DiVA.org:kth-305422
DiVA, id: diva2:1615668
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), SEP 13-17, 2021, ELECTR NETWORK
Note

Part of proceedings: ISBN 978-3-030-86520-7, QC 20230117

Available from: 2021-11-30 Created: 2021-11-30 Last updated: 2025-01-27 Bibliographically approved

Authority records

Gionis, Aristides