Structured Stochastic Bandits
2016 (English)Licentiate thesis, monograph (Other academic)
In this thesis we address the multi-armed bandit (MAB) problem with stochastic rewards and correlated arms. Particularly, we investigate the case when the expected rewards are a Lipschitz function of the arm, and the learning to rank problem, as viewed from a MAB perspective. For the former, we derive a problem specific lower bound and propose both an asymptotically optimal algorithm (OSLB) and a (pareto)optimal, algorithm (POSLB). For the latter, we construct the regret lower bound and determine its closed form for some particular settings, as well as propose two asymptotically optimal algorithms PIE and PIE-C. For all algorithms mentioned above, we present performance analysis in the form of theoretical regret guarantees as well as numerical evaluation on artificial datasets as well as real-world datasets, in the case of PIE and PIE-C.
Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2016. , vii, 94 p.
TRITA-EE, ISSN 1653-5146 ; 2016:021
Multi-armed bandits, Learning to rank, reinforcement learning, Lipschitz Bandits
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject Electrical Engineering
IdentifiersURN: urn:nbn:se:kth:diva-182816ISBN: 978-91-7595-880-4OAI: oai:DiVA.org:kth-182816DiVA: diva2:905873
2016-03-15, Q2, Osquldasväg 10, KTH, Stockholm, 10:00 (English)
Kaufmann, Emilie, Dr.
Proutiere, Alexandre, Associate Professor
FunderEU, European Research Council
QC 201602232016-02-232016-02-232016-02-23Bibliographically approved