Adaptive Sampling for Best Policy Identification in Markov Decision Processes
ENS Lyon, UMPA, Lyon, France.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
2021 (English). In: Proceedings of the 38th International Conference on Machine Learning, PMLR 139, 2021 / [ed] Meila, M.; Zhang, T., The Journal of Machine Learning Research (JMLR), 2021, Vol. 139. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound on the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program, and hence is hard to exploit in the design of efficient algorithms. We then provide a simple and tight upper bound on this sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP, such as the sub-optimality gaps and the variance of the next-state value function, and thus truly captures the hardness of the MDP. Finally, we devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). The advantages of KLB-TS over state-of-the-art algorithms are discussed and illustrated numerically.
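The abstract describes a track-and-stop scheme: repeatedly estimate the MDP from generative-model samples, recompute a target sampling allocation, sample the state-action pair whose empirical frequency lags its target the most, and stop once the empirical best policy can be certified. The sketch below illustrates that loop only in broad strokes and is not KLB-TS: the function names, the inverse-squared-gap proxy allocation, and the crude confidence-based stopping rule are illustrative assumptions, whereas the paper tracks the explicit nearly-optimal allocation obtained from its upper bound and uses KL-ball based guarantees.

```python
# Minimal sketch of a generic track-and-stop loop for best-policy
# identification with a generative model. NOT the paper's KLB-TS: the target
# allocation (an inverse-squared-gap proxy) and the stopping rule (a crude
# Hoeffding-style confidence width) are illustrative assumptions only.
import numpy as np

def value_iteration(P, R, gamma, tol=1e-8):
    """Q-values for transition tensor P[s, a, s'] and reward matrix R[s, a]."""
    Q = np.zeros_like(R)
    while True:
        Q_new = R + gamma * (P @ Q.max(axis=1))   # Bellman optimality update
        if np.abs(Q_new - Q).max() < tol:
            return Q_new
        Q = Q_new

def track_and_stop(sample, S, A, gamma, delta=0.05, max_samples=200_000):
    """sample(s, a) -> (next_state, reward) plays the role of the generative model."""
    counts = np.zeros((S, A))
    reward_sum = np.zeros((S, A))
    trans_counts = np.zeros((S, A, S))

    def observe(s, a):
        s_next, r = sample(s, a)
        counts[s, a] += 1
        reward_sum[s, a] += r
        trans_counts[s, a, s_next] += 1

    # Forced exploration: one sample per (s, a) so all estimates are defined.
    for s in range(S):
        for a in range(A):
            observe(s, a)

    for t in range(S * A, max_samples):
        # Plug-in estimates of the MDP and the resulting sub-optimality gaps.
        P_hat = trans_counts / counts[:, :, None]
        R_hat = reward_sum / counts
        Q_hat = value_iteration(P_hat, R_hat, gamma)
        gaps = Q_hat.max(axis=1, keepdims=True) - Q_hat

        # Proxy target allocation: spend more samples where gaps are small
        # (hard-to-distinguish actions). KLB-TS instead tracks the explicit
        # nearly-optimal allocation derived in the paper.
        weights = 1.0 / np.maximum(gaps, 0.05) ** 2
        target = weights / weights.sum()

        # Tracking rule: sample the pair whose empirical frequency lags its target.
        s, a = np.unravel_index(np.argmax(target - counts / t), (S, A))
        observe(s, a)

        # Crude stopping rule (assumption): stop when every estimated gap
        # clearly exceeds its confidence width.
        conf = np.sqrt(2.0 * np.log(1.0 / delta) / counts) / (1.0 - gamma)
        subopt = gaps > 0
        if t > 10 * S * A and np.all(gaps[subopt] > 2.0 * conf[subopt]):
            break

    return Q_hat.argmax(axis=1), int(counts.sum())
```

Here sample(s, a) stands in for the generative model: it returns a next state drawn from the (unknown) transition kernel together with the observed reward, and the loop returns the empirical best policy and the number of samples used.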

Place, publisher, year, edition, pages
The Journal of Machine Learning Research (JMLR), 2021. Vol. 139
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-311540
ISI: 000768182703055
OAI: oai:DiVA.org:kth-311540
DiVA, id: diva2:1655712
Conference
International Conference on Machine Learning (ICML), July 18-24, 2021, Electronic Network, Vienna, Austria
Note

QC 20220503

Available from: 2022-05-03. Created: 2022-05-03. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Proutiere, Alexandre
