Optimal clustering from noisy binary feedback
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). CyberAgent, Tokyo, Japan. ORCID iD: 0000-0001-6286-9906
POSTECH, Pohang-si, South Korea.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
Korea Advanced Institute of Science and Technology (KAIST), Daejeon, South Korea.
2024 (English). In: Machine Learning, ISSN 0885-6125, E-ISSN 1573-0565, Vol. 113, no. 5, p. 2733-2764. Article in journal (Refereed). Published.
Abstract [en]

We study the problem of clustering a set of items from binary user feedback. Such a problem arises in crowdsourcing platforms solving large-scale labeling tasks with minimal effort required from the users. For example, in some of the recent reCAPTCHA systems, users' clicks (binary answers) can be used to label images efficiently. In our inference problem, items are grouped into initially unknown non-overlapping clusters. To recover these clusters, the learner sequentially presents to users a finite list of items together with a question with a binary answer, selected from a fixed finite set of questions. For each of these items, the user provides a noisy answer whose expectation is determined by the item's cluster, the question, and an item-specific parameter characterizing the hardness of classifying the item. The objective is to devise an algorithm with a minimal cluster recovery error rate. We derive problem-specific information-theoretic lower bounds on the error rate satisfied by any algorithm, for both uniform and adaptive (list, question) selection strategies. For uniform selection, we present a simple algorithm built upon the K-means algorithm whose performance almost matches the fundamental limits. For adaptive selection, we develop an adaptive algorithm that is inspired by the derivation of the information-theoretic error lower bounds and, in turn, allocates the budget efficiently: it learns to select, more often, items that are hard to cluster and the most relevant questions. We numerically compare the performance of our algorithms with and without the adaptive selection strategy and illustrate the gain achieved by being adaptive.
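To make the setting concrete, below is a minimal illustrative sketch (not the paper's algorithm; all sizes, noise parameters, and function names are assumptions) of the uniform-selection pipeline outlined in the abstract: collect noisy binary answers for every item, form empirical answer frequencies, cluster them with plain K-means, and measure the recovery error up to a relabeling of the clusters.

import numpy as np
from itertools import permutations

rng = np.random.default_rng(0)

n_items, n_questions, n_clusters, n_rounds = 300, 5, 3, 40   # assumed toy sizes

# Hidden ground truth: each item belongs to one cluster; each (cluster, question)
# pair has its own probability that the binary answer is 1 ("yes").
true_labels = rng.integers(n_clusters, size=n_items)
answer_prob = rng.uniform(0.2, 0.8, size=(n_clusters, n_questions))

# Uniform (non-adaptive) feedback: every item receives n_rounds noisy answers
# per question; keep the empirical "yes" frequency as the item's feature vector.
counts = np.zeros((n_items, n_questions))
for _ in range(n_rounds):
    p = answer_prob[true_labels]                      # shape (n_items, n_questions)
    counts += rng.random((n_items, n_questions)) < p
freq = counts / n_rounds

def kmeans(X, k, iters=50, restarts=5):
    """Plain Lloyd's K-means with random restarts (NumPy only)."""
    best_labels, best_cost = None, np.inf
    for _ in range(restarts):
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
            labels = dists.argmin(1)
            for j in range(k):
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(0)
        cost = ((X - centers[labels]) ** 2).sum()
        if cost < best_cost:
            best_labels, best_cost = labels, cost
    return best_labels

pred = kmeans(freq, n_clusters)

# Misclassification rate up to a permutation of cluster indices.
error = min((pred != np.asarray(perm)[true_labels]).mean()
            for perm in permutations(range(n_clusters)))
print(f"cluster recovery error rate: {error:.3f}")

The adaptive variant studied in the paper would instead reallocate the remaining answer budget toward items and questions that the lower-bound analysis identifies as informative, rather than spreading it uniformly as above.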

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 113, no. 5, p. 2733-2764
Keywords [en]
Online algorithm, Clustering, Community detection, Stochastic block model, Crowdsourcing
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-350751
DOI: 10.1007/s10994-024-06532-z
ISI: 001240239000023
Scopus ID: 2-s2.0-85188279327
OAI: oai:DiVA.org:kth-350751
DiVA, id: diva2:1884846
Note

QC 20240718

Available from: 2024-07-18. Created: 2024-07-18. Last updated: 2024-07-18. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Ariu, Kaito; Proutiere, Alexandre
