kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.ORCID iD: 0000-0003-3906-8806
KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.ORCID iD: 0000-0003-2598-4459
KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.ORCID iD: 0000-0002-0862-1333
KTH, School of Electrical Engineering and Computer Science (EECS), Information Science and Engineering.ORCID iD: 0000-0002-0036-9049
Show others and affiliations
2025 (English)In: 2025 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), Institute of Electrical and Electronics Engineers (IEEE) , 2025, article id 3980Conference paper, Published paper (Refereed)
Abstract [en]

We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the context of MDPs, finding information-theoretic bounds for it, and applying these bounds in various scenarios.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE) , 2025. article id 3980
Series
International Conference on Acoustics Speech and Signal Processing ICASSP, ISSN 1520-6149
Keywords [en]
information theory, reinforcement learning, minimax theorems
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-377543DOI: 10.1109/ICASSP49660.2025.10889842ISI: 001611517600617Scopus ID: 2-s2.0-105009590606ISBN: 979-8-3503-6874-1 (print)ISBN: 979-8-3503-6874-1 (print)OAI: oai:DiVA.org:kth-377543DiVA, id: diva2:2046303
Conference
2025 International Conference on Acoustics Speech and Signal Processing-ICASSP-Annual, APR 06-11, 2025, Hyderabad, INDIA
Note

QC 20260316

Available from: 2026-03-16 Created: 2026-03-16 Last updated: 2026-03-16Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Bongole, RaghavGouverneur, AmauryRodríguez Gálvez, BorjaOechtering, Tobias J.Skoglund, Mikael

Search in DiVA

By author/editor
Bongole, RaghavGouverneur, AmauryRodríguez Gálvez, BorjaOechtering, Tobias J.Skoglund, Mikael
By organisation
Information Science and Engineering
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric score

doi
isbn
urn-nbn
Total: 18 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf