Policy Evaluation in Distributional LQR
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
Department of Computer Science, University of Oxford, UK.
Information-oriented Control, Technical University of Munich, Germany.
Department of Mechanical Engineering and Materials Science, Duke University, USA.
2023 (English). In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference, L4DC 2023, ML Research Press, 2023, Vol. 211, p. 1245-1256. Conference paper, Published paper (Refereed)
Abstract [en]

Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge in DRL is that policy evaluation typically relies on a representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on the discounted linear quadratic regulator (LQR) for control, advocating a new distributional approach to LQR, which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and that the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the Conditional Value at Risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.

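The sketch below is not the authors' implementation; it only illustrates the ingredients the abstract describes: Monte Carlo samples of the random discounted LQR return under a fixed linear feedback gain with i.i.d. disturbances, an empirical CVaR estimate over those samples, and a two-point zeroth-order gradient estimate of the CVaR objective with respect to the gain. All system matrices, the gain, the horizon, and the hyperparameters are illustrative assumptions, not values from the paper.

```python
# Minimal sketch (assumed toy system, not the paper's code): Monte Carlo
# approximation of the random discounted LQR return under u_t = -K x_t with
# i.i.d. Gaussian disturbances, empirical CVaR, and a two-point zeroth-order
# gradient estimate of the CVaR objective.
import numpy as np

rng = np.random.default_rng(0)

# Illustrative 2-state, 1-input system (assumed values).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.eye(1)
gamma = 0.95      # discount factor
T = 200           # truncation horizon approximating the infinite-horizon sum
sigma_w = 0.1     # std of the i.i.d. disturbance

def sample_return(K, x0, horizon=T):
    """One Monte Carlo sample of the truncated discounted quadratic return."""
    x, G = x0.copy(), 0.0
    for t in range(horizon):
        u = -K @ x
        G += gamma**t * (x @ Q @ x + u @ R @ u)
        w = sigma_w * rng.standard_normal(2)
        x = A @ x + (B @ u).ravel() + w
    return G

def cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of the costs."""
    s = np.sort(samples)
    k = max(1, int(np.ceil(alpha * len(s))))
    return s[-k:].mean()   # returns are costs, so "worst" means largest

K0 = np.array([[0.5, 1.0]])        # an assumed stabilizing gain
x0 = np.array([1.0, 0.0])
returns = np.array([sample_return(K0, x0) for _ in range(500)])
print("mean return:", returns.mean(), "CVaR_0.1:", cvar(returns, 0.1))

def zo_cvar_grad(K, mu=0.05, n_samples=500, alpha=0.1):
    """Two-point zeroth-order (gradient-free) estimate of d CVaR / d K."""
    U = rng.standard_normal(K.shape)
    def obj(Kp):
        return cvar(np.array([sample_return(Kp, x0) for _ in range(n_samples)]), alpha)
    return (obj(K + mu * U) - obj(K - mu * U)) / (2 * mu) * U

print("zeroth-order gradient estimate:\n", zo_cvar_grad(K0))
```
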
Place, publisher, year, edition, pages
ML Research Press, 2023. Vol. 211, p. 1245-1256
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
Keywords [en]
Distributional LQR, distributional RL, policy evaluation, risk-averse control
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-338027
ISI: 001221742900095
Scopus ID: 2-s2.0-85172892657
OAI: oai:DiVA.org:kth-338027
DiVA, id: diva2:1804640
Conference
5th Annual Conference on Learning for Dynamics and Control, L4DC 2023, Philadelphia, PA, United States of America, Jun 15 2023 - Jun 16 2023
Note

QC 20231013

Available from: 2023-10-13. Created: 2023-10-13. Last updated: 2024-09-05. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Wang, Zifan; Johansson, Karl H.

Search in DiVA

By author/editor
Wang, Zifan; Johansson, Karl H.
By organisation
Decision and Control Systems (Automatic Control)
Computer Sciences

Search outside of DiVA

Google
Google Scholar
