Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
KTH, School of Electrical Engineering (EES), Automatic Control.ORCID iD: 0000-0003-2988-8464
Supelec, France.
KTH, School of Electrical Engineering (EES), Automatic Control. INRIA, France.
2014 (English)Conference paper, Published paper (Refereed)
Abstract [en]

We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitzfunction of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitzbandits, we derive asymptotic problem specific lower bounds for the regret satisfied by anyalgorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitzstructure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptoticregret matches the lower bound. The regret analysis of our algorithms relies on a new concentrationinequality for weighted sums of KL divergences between the empirical distributions of rewards andtheir true distributions. For continuous Lipschitz bandits, we propose to first discretize the actionspace, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently.This approach is shown, through numerical experiments, to significantly outperform existing algorithmsthat directly deal with the continuous set of arms. Finally the results and algorithms areextended to contextual bandits with similarities.

Place, publisher, year, edition, pages
2014. Vol. 35, 975-999 p.
Series
Journal of Machine Learning Research, ISSN 1532-4435
Keyword [en]
Multi-armed Bandits
National Category
Probability Theory and Statistics Computer Science
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-146876Scopus ID: 2-s2.0-84939604123OAI: oai:DiVA.org:kth-146876DiVA: diva2:737623
Conference
COLT,Barcelona, Spain, June 13-15, 2014
Note

QC 20140826

Available from: 2014-08-13 Created: 2014-06-17 Last updated: 2015-10-19Bibliographically approved

Open Access in DiVA

fulltext(557 kB)32 downloads
File information
File name FULLTEXT01.pdfFile size 557 kBChecksum SHA-512
3f5b80f58e1c34a5bceedcb3be699e48102e6ee6ef7730082ae3edbc5966dfb6b906d073c057d41787d95cb3d82fe9fa0a8d4add0919ea19bc9f8bb1ade5e072
Type fulltextMimetype application/pdf

Other links

ScopusConference websiteFulltext at JMLR

Authority records BETA

Magureanu, Stefan

Search in DiVA

By author/editor
Magureanu, StefanProutiere, Alexandre
By organisation
Automatic Control
Probability Theory and StatisticsComputer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 32 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 110 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf