Risk-Sensitive RL Using Sampling-Based Expectation-Maximization
Institute for Systems Research (ISR), Department of Electrical and Computer Engineering, University of Maryland, College Park, MD, USA.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0001-9940-5929
2023 (English). In: Proceedings of the 2023 62nd IEEE Conference on Decision and Control (CDC 2023), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 7015-7020. Conference paper, published paper (refereed).
Abstract [en]

There is a need for robust Reinforcement Learning (RL) algorithms that can cope with model misspecification, parameter uncertainty, disturbances, and similar issues. Risk-sensitive methods offer an approach to developing robust RL algorithms by hedging against undesirable outcomes in a probabilistic manner. The Probabilistic Graphical Model (PGM) framework offers a systematic way to explore risk-sensitive RL. In this paper, we bridge the Markov Decision Process (MDP) and PGM frameworks. We exploit the equivalence between optimizing a certain risk-sensitive criterion in the MDP formalism and optimizing a log-likelihood objective in the PGM formalism. By utilizing this equivalence, we offer an approach for developing risk-sensitive algorithms that leverages the PGM framework. We explore the Expectation-Maximization (EM) algorithm under the PGM formalism and show that risk-sensitive policy gradient methods can be obtained by applying sampling-based approaches, e.g., Monte-Carlo EM, to the log-likelihood objective. In particular, Monte-Carlo EM leads to a risk-sensitive Monte-Carlo policy gradient algorithm. Our simulations illustrate the risk-sensitive nature of the resulting algorithm.
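The equivalence referred to above is commonly stated through an exponential-utility criterion; the following is a sketch under that assumption (for a risk parameter β > 0) and may differ in detail from the paper's exact formulation:

\[
J_\beta(\theta) \;=\; \tfrac{1}{\beta}\,\log \mathbb{E}_{\tau \sim p_\theta}\!\left[\exp\Big(\beta \sum_t r_t\Big)\right],
\qquad
\arg\max_\theta J_\beta(\theta) \;=\; \arg\max_\theta \log p_\theta(\mathcal{O}=1),
\quad
p(\mathcal{O}=1 \mid \tau) \;\propto\; \exp\Big(\beta \sum_t r_t\Big).
\]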

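For illustration only (not the paper's algorithm), below is a minimal sketch of a risk-sensitive Monte-Carlo policy gradient in the spirit described above: sampled trajectories are reweighted by exp(β · return) before a weighted likelihood update. The environment, network architecture, and hyperparameters are assumptions made for the example.

import torch
import gymnasium as gym

# Illustrative sketch, not the paper's implementation.
env = gym.make("CartPole-v1")
policy = torch.nn.Sequential(
    torch.nn.Linear(4, 64), torch.nn.Tanh(), torch.nn.Linear(64, 2)
)
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)
beta = 0.05  # risk parameter (assumed value); larger beta > 0 concentrates weight on high-return trajectories

for iteration in range(200):
    log_probs, returns = [], []
    for _ in range(16):  # sample a batch of trajectories with the current policy
        obs, _ = env.reset()
        done, ep_logp, ep_ret = False, 0.0, 0.0
        while not done:
            logits = policy(torch.as_tensor(obs, dtype=torch.float32))
            dist = torch.distributions.Categorical(logits=logits)
            action = dist.sample()
            obs, reward, terminated, truncated, _ = env.step(action.item())
            done = terminated or truncated
            ep_logp = ep_logp + dist.log_prob(action)
            ep_ret += reward
        log_probs.append(ep_logp)
        returns.append(ep_ret)

    R = torch.tensor(returns)
    # Monte-Carlo E-step: self-normalized trajectory weights w_i proportional to exp(beta * R_i)
    w = torch.softmax(beta * R, dim=0).detach()
    # M-step: weighted maximum likelihood over the sampled trajectories,
    # i.e., a risk-sensitive (return-reweighted) policy-gradient update
    loss = -(w * torch.stack(log_probs)).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

In this sketch, a small beta makes the weights nearly uniform, while a larger beta focuses the update on the highest-return trajectories; that reweighting is where the risk-sensitive behavior comes from.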
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 7015-7020
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-343743
DOI: 10.1109/CDC49753.2023.10383692
ISI: 001166433805118
Scopus ID: 2-s2.0-85184821192
OAI: oai:DiVA.org:kth-343743
DiVA, id: diva2:1839938
Conference
62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, December 13-15, 2023
Note

Part of ISBN 9798350301243

QC 20240304

Available from: 2024-02-22. Created: 2024-02-22. Last updated: 2024-03-26. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text | Scopus

Authority records

Johansson, Karl H.
