Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Provable randomized rounding for minimum-similarity diversification
Department of Computer Science, Aalto University, Espoo, Finland.
Department of Computer Science, University of Helsinki, Helsinki, Finland.
Department of Computer Science, Aalto University, Espoo, Finland.
KTH, Skolan för elektroteknik och datavetenskap (EECS), Datavetenskap, Teoretisk datalogi, TCS.ORCID-id: 0000-0002-5211-112X
2022 (engelsk)Inngår i: Data mining and knowledge discovery, ISSN 1384-5810, E-ISSN 1573-756X, Vol. 36, nr 2, s. 709-738Artikkel i tidsskrift (Fagfellevurdert) Published
Abstract [en]

When searching for information in a data collection, we are often interested not only in finding relevant items, but also in assembling a diverse set, so as to explore different concepts that are present in the data. This problem has been researched extensively. However, finding a set of items with minimal pairwise similarities can be computationally challenging, and most existing works striving for quality guarantees assume that item relatedness is measured by a distance function. Given the widespread use of similarity functions in many domains, we believe this to be an important gap in the literature. In this paper we study the problem of finding a diverse set of items, when item relatedness is measured by a similarity function. We formulate the diversification task using a flexible, broadly applicable minimization objective, consisting of the sum of pairwise similarities of the selected items and a relevance penalty term. To find good solutions we adopt a randomized rounding strategy, which is challenging to analyze because of the cardinality constraint present in our formulation. Even though this obstacle can be overcome using dependent rounding, we show that it is possible to obtain provably good solutions using an independent approach, which is faster, simpler to implement and completely parallelizable. Our analysis relies on a novel bound for the ratio of Poisson-Binomial densities, which is of independent interest and has potential implications for other combinatorial-optimization problems. We leverage this result to design an efficient randomized algorithm that provides a lower-order additive approximation guarantee. We validate our method using several benchmark datasets, and show that it consistently outperforms the greedy approaches that are commonly used in the literature.

sted, utgiver, år, opplag, sider
Springer Nature , 2022. Vol. 36, nr 2, s. 709-738
Emneord [en]
Diversification, Quadratic programming, Randomized rounding, Recommender systems, Approximation algorithms, Combinatorial optimization, Cardinality constraints, Data collection, Distance functions, Item relatedness, Minimisation, Penalty term, Searching for informations, Similarity functions
HSV kategori
Identifikatorer
URN: urn:nbn:se:kth:diva-319160DOI: 10.1007/s10618-021-00811-2ISI: 000738439600002PubMedID: 35401029Scopus ID: 2-s2.0-85122726042OAI: oai:DiVA.org:kth-319160DiVA, id: diva2:1700363
Merknad

QC 20220930

Tilgjengelig fra: 2022-09-30 Laget: 2022-09-30 Sist oppdatert: 2025-01-27bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekstPubMedScopus

Person

Gionis, Aristides

Søk i DiVA

Av forfatter/redaktør
Gionis, Aristides
Av organisasjonen
I samme tidsskrift
Data mining and knowledge discovery

Søk utenfor DiVA

GoogleGoogle Scholar

doi
pubmed
urn-nbn

Altmetric

doi
pubmed
urn-nbn
Totalt: 29 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf