Lipschitz Bandits: Regret Lower Bounds and Optimal Algorithms
2014 (English) Conference paper (Refereed)
We consider stochastic multi-armed bandit problems where the expected reward is a Lipschitz function of the arm, and where the set of arms is either discrete or continuous. For discrete Lipschitz bandits, we derive asymptotic problem-specific lower bounds for the regret satisfied by any algorithm, and propose OSLB and CKL-UCB, two algorithms that efficiently exploit the Lipschitz structure of the problem. In fact, we prove that OSLB is asymptotically optimal, as its asymptotic regret matches the lower bound. The regret analysis of our algorithms relies on a new concentration inequality for weighted sums of KL divergences between the empirical distributions of rewards and their true distributions. For continuous Lipschitz bandits, we propose to first discretize the action space, and then apply OSLB or CKL-UCB, algorithms that provably exploit the structure efficiently. This approach is shown, through numerical experiments, to significantly outperform existing algorithms that directly deal with the continuous set of arms. Finally, the results and algorithms are extended to contextual bandits with similarities.
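To illustrate the discretize-then-index approach described in the abstract, here is a minimal sketch (not the authors' OSLB or CKL-UCB, whose details are in the paper): a continuous action space [0, 1] is discretized into a uniform grid, and a standard KL-UCB index policy is run on the resulting finite arm set. The reward function `mu`, the grid size, and the horizon below are illustrative assumptions.

```python
import math
import random

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped at the boundary.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def kl_ucb_index(p_hat, n, t):
    # Largest q >= p_hat such that n * KL(p_hat, q) <= log(t),
    # found by binary search (standard KL-UCB index for Bernoulli rewards).
    level = math.log(max(t, 2)) / n
    lo, hi = p_hat, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(p_hat, mid) <= level:
            lo = mid
        else:
            hi = mid
    return lo

def run_kl_ucb_discretized(mu, num_arms, horizon, seed=0):
    # Discretize [0, 1] into num_arms grid points, then run KL-UCB on them.
    rng = random.Random(seed)
    arms = [k / (num_arms - 1) for k in range(num_arms)]
    counts = [0] * num_arms
    sums = [0.0] * num_arms
    total_reward = 0.0
    for t in range(1, horizon + 1):
        if t <= num_arms:
            a = t - 1  # initialization: pull each arm once
        else:
            a = max(range(num_arms),
                    key=lambda k: kl_ucb_index(sums[k] / counts[k], counts[k], t))
        r = 1.0 if rng.random() < mu(arms[a]) else 0.0  # Bernoulli reward
        counts[a] += 1
        sums[a] += r
        total_reward += r
    return total_reward

# An illustrative 0.8-Lipschitz mean-reward function on [0, 1], peaking at x = 0.6.
mu = lambda x: 0.9 - 0.8 * abs(x - 0.6)
```

This vanilla KL-UCB does not exploit the Lipschitz structure across grid points; the paper's point is that OSLB and CKL-UCB do, which is what closes the gap to the problem-specific lower bound.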
Place, publisher, year, edition, pages
2014. Vol. 35, pp. 975-999
Journal of Machine Learning Research, ISSN 1532-4435
Probability Theory and Statistics; Computer Science
Research subject: Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-146876
ScopusID: 2-s2.0-84939604123
OAI: oai:DiVA.org:kth-146876
DiVA: diva2:737623
COLT, Barcelona, Spain, June 13-15, 2014
QC 20140826
Available from: 2014-08-13. Created: 2014-06-17. Last updated: 2015-10-19. Bibliographically approved.