Adaptive Sampling for Best Policy Identification in Markov Decision Processes
UMPA, ENS Lyon.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
2021 (English). In: Proceedings of the 38th International Conference on Machine Learning, ICML 2021, ML Research Press, 2021, p. 7459-7468. Conference paper, Published paper (Refereed)
Abstract [en]

We investigate the problem of best-policy identification in discounted Markov Decision Processes (MDPs) when the learner has access to a generative model. The objective is to devise a learning algorithm returning the best policy as early as possible. We first derive a problem-specific lower bound of the sample complexity satisfied by any learning algorithm. This lower bound corresponds to an optimal sample allocation that solves a non-convex program, and hence, is hard to exploit in the design of efficient algorithms. We then provide a simple and tight upper bound of the sample complexity lower bound, whose corresponding nearly-optimal sample allocation becomes explicit. The upper bound depends on specific functionals of the MDP such as the sub-optimality gaps and the variance of the next-state value function, and thus really captures the hardness of the MDP. Finally, we devise KLB-TS (KL Ball Track-and-Stop), an algorithm tracking this nearly-optimal allocation, and provide asymptotic guarantees for its sample complexity (both almost surely and in expectation). The advantages of KLB-TS against state-of-the-art algorithms are discussed and illustrated numerically.
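
The two functionals named in the abstract can be illustrated on a toy problem. The sketch below is not KLB-TS and is not taken from the paper: it assumes the standard textbook definitions (gap = V*(s) - Q*(s, a); variance of V*(s') under the transition kernel p(. | s, a)), and the function names, the toy MDP, and its numbers are all hypothetical.

```python
import numpy as np


def optimal_q(P, R, gamma, tol=1e-10, max_iter=100_000):
    """Plain value iteration (an assumed solver, not the paper's method).

    P has shape (S, A, S) with P[s, a, s2] = p(s2 | s, a); R has shape (S, A).
    Returns the optimal values V* (shape S) and Q* (shape S x A).
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        V_new = (R + gamma * (P @ V)).max(axis=1)
        done = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if done:
            break
    Q = R + gamma * (P @ V)
    return Q.max(axis=1), Q


def hardness_functionals(P, R, gamma):
    """Sub-optimality gaps and next-state value variances, per (s, a)."""
    V, Q = optimal_q(P, R, gamma)
    gaps = V[:, None] - Q                     # zero for optimal actions
    mean_next = P @ V                         # E[V*(s2) | s, a]
    var_next = P @ (V ** 2) - mean_next ** 2  # Var[V*(s2) | s, a]
    return gaps, np.maximum(var_next, 0.0)    # clip rounding noise


# Hypothetical 2-state, 2-action MDP used only for illustration.
P = np.array([
    [[1.0, 0.0], [0.5, 0.5]],   # state 0: action 0 stays, action 1 is random
    [[0.0, 1.0], [0.5, 0.5]],   # state 1: action 0 stays, action 1 is random
])
R = np.array([[0.0, 0.2],
              [1.0, 0.5]])
gamma = 0.9

gaps, var_next = hardness_functionals(P, R, gamma)
print("gaps:\n", gaps)
print("next-state value variance:\n", var_next)
```

Small gaps and large variances both make a state-action pair harder to resolve from samples, which is the sense in which such quantities are said to capture the hardness of the MDP.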

Place, publisher, year, edition, pages
ML Research Press, 2021. p. 7459-7468
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-347965
Scopus ID: 2-s2.0-85119138246
OAI: oai:DiVA.org:kth-347965
DiVA, id: diva2:1882558
Conference
38th International Conference on Machine Learning, ICML 2021, Virtual, Online, NA, Jul 18 2021 - Jul 24 2021
Note

Part of ISBN [9781713845065]

QC 20240705

Available from: 2024-07-05 Created: 2024-07-05 Last updated: 2024-07-05 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Proutiere, Alexandre