Unimodal Bandits with Continuous Arms: Order-optimal Regret without Smoothness
2020 (English). In: SIGMETRICS Performance 2020 - Abstracts of the 2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, Association for Computing Machinery, Inc., 2020, p. 107-108. Conference paper, Published paper (Refereed)
Abstract [en]
We consider stochastic bandit problems with a continuous set of arms, where the expected reward is a continuous and unimodal function of the arm. For these problems, we propose the Stochastic Polychotomy (SP) algorithms and derive finite-time upper bounds on their regret and optimization error. We show that, for a class of reward functions, the SP algorithm achieves a regret and an optimization error with optimal scalings, i.e., O(√T) and O(1/√T) (up to a logarithmic factor), respectively.
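The abstract does not spell out how the SP algorithms operate. As an illustrative sketch only (not the paper's actual method), the following shows the general flavor of interval-elimination search for a unimodal mean-reward function under noisy observations: sample two interior points repeatedly, compare empirical means, and discard the third of the interval on the side of the worse point. All function names and parameters here are hypothetical.

```python
import random


def stochastic_trisection(sample_reward, a, b, rounds=40, samples_per_point=200, rng=None):
    """Illustrative sketch, not the paper's SP algorithm: shrink [a, b] by
    trisection, using averaged noisy samples to decide which third to drop.
    For a unimodal mean reward, the maximizer stays inside the interval
    with high probability when enough samples are taken per point."""
    rng = rng or random.Random(0)
    for _ in range(rounds):
        x1 = a + (b - a) / 3
        x2 = b - (b - a) / 3
        # Average repeated noisy observations at each probe point.
        m1 = sum(sample_reward(x1, rng) for _ in range(samples_per_point)) / samples_per_point
        m2 = sum(sample_reward(x2, rng) for _ in range(samples_per_point)) / samples_per_point
        if m1 < m2:
            a = x1  # the maximizer is unlikely to lie in [a, x1]
        else:
            b = x2  # the maximizer is unlikely to lie in [x2, b]
    return (a + b) / 2


# Hypothetical noisy unimodal reward with its maximum at x = 0.3.
def reward(x, rng):
    return -(x - 0.3) ** 2 + rng.gauss(0, 0.02)


est = stochastic_trisection(reward, 0.0, 1.0)
```

Allocating a fixed sample budget per probe point is a simplification; achieving the O(√T) regret claimed above requires the more careful, adaptive sampling schedule analyzed in the paper.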
Place, publisher, year, edition, pages
Association for Computing Machinery, Inc., 2020. p. 107-108
Keywords [en]
continuous bandits, multi-armed bandits, online learning, stochastic optimization, unimodal bandits, Computer science, Computers, Measurement, Bandit problems, Finite time, Optimal regret, Optimization errors, Reward function, SP algorithms, Unimodal functions, Upper Bound, Stochastic systems
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-303168
DOI: 10.1145/3393691.3394225
Scopus ID: 2-s2.0-85086996497
OAI: oai:DiVA.org:kth-303168
DiVA, id: diva2:1601629
Conference
2020 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS 2020, 8 June 2020 through 12 June 2020
Note
QC 20211008
Available from: 2021-10-08. Created: 2021-10-08. Last updated: 2023-04-05. Bibliographically approved.