Conformal Off-Policy Evaluation in Markov Decision Processes
Foffano, Daniele: KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Decision and Control Systems (Automatic Control).
Russo, Alessio: KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0001-9083-5260
Proutiere, Alexandre: KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
2023 (English). In: 2023 62nd IEEE Conference on Decision and Control (CDC), IEEE, 2023, p. 3087-3094. Conference paper, Published paper (Refereed).
Abstract [en]

Reinforcement Learning aims at identifying and evaluating efficient control policies from data. In many real-world applications, the learner is not allowed to experiment and cannot gather data in an online manner (this is the case when experimenting is expensive, risky or unethical). For such applications, the reward of a given policy (the target policy) must be estimated using historical data gathered under a different policy (the behavior policy). Most methods for this learning task, referred to as Off-Policy Evaluation (OPE), do not come with accuracy and certainty guarantees. We present a novel OPE method based on Conformal Prediction that outputs an interval containing the true reward of the target policy with a prescribed level of certainty. The main challenge in OPE stems from the distribution shift due to the discrepancies between the target and the behavior policies. We propose and empirically evaluate different ways to deal with this shift. Some of these methods yield conformalized intervals with reduced length compared to existing approaches, while maintaining the same certainty level.
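The abstract builds on weighted conformal prediction, which corrects conformal intervals for the distribution shift between behavior and target policies by reweighting calibration scores with importance ratios. The following is a minimal sketch of that underlying tool, not the authors' exact algorithm; the residual scores, the importance weights, and the unit weight assigned to the test point are all illustrative assumptions.

```python
# Hedged sketch of weighted split conformal prediction, the core tool this line
# of work builds on (not the paper's exact method). Calibration data come from
# the behavior policy; hypothetical importance ratios
# w_i = pi_target(a_i | s_i) / pi_behavior(a_i | s_i) reweight the
# nonconformity scores to account for the policy-induced distribution shift.
import numpy as np

def weighted_conformal_interval(point_est, cal_scores, weights, alpha=0.1):
    """Return [point_est - q, point_est + q], where q is the weighted
    (1 - alpha) quantile of the calibration nonconformity scores."""
    order = np.argsort(cal_scores)
    scores, w = cal_scores[order], weights[order]
    # Normalize weights, appending a unit weight for the test point
    # (a simplifying assumption in this sketch).
    p = np.append(w, 1.0)
    p = p / p.sum()
    cum = np.cumsum(p[:-1])
    # Smallest score whose cumulative weight reaches the 1 - alpha level.
    idx = np.searchsorted(cum, 1.0 - alpha)
    q = scores[min(idx, len(scores) - 1)]
    return point_est - q, point_est + q

rng = np.random.default_rng(0)
cal_scores = np.abs(rng.normal(size=500))          # |reward - prediction| residuals
weights = np.exp(rng.normal(scale=0.3, size=500))  # hypothetical importance ratios
lo, hi = weighted_conformal_interval(5.0, cal_scores, weights, alpha=0.1)
```

With uniform weights this reduces to ordinary split conformal prediction; the reweighting is what preserves the prescribed coverage level under the shift from behavior to target policy.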

Place, publisher, year, edition, pages
IEEE, 2023. p. 3087-3094
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-344697
ISI: 001166433802090
OAI: oai:DiVA.org:kth-344697
DiVA id: diva2:1847177
Conference
62nd IEEE Conference on Decision and Control (CDC), December 13-15, 2023, IEEE Control Systems Society, Singapore
Note

QC 20240326

Part of ISBN 979-8-3503-0124-3

Available from: 2024-03-26. Created: 2024-03-26. Last updated: 2024-10-11. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Foffano, Daniele; Russo, Alessio; Proutiere, Alexandre
