We study different classes of reinforcement learning problems using the minimax regret framework. We formalize a finite-horizon reinforcement learning problem setting that is suitable for the information-theoretic analysis of minimax regret which encompasses linear bandits, Markov decision processes, linear Markov decision processes, and other reinforcement learning problems. We derive a minimax theorem applicable to this setting that does not require any finiteness or deterministic policy constraints. Using this theorem, we show that any Bayesian regret bound can be used to bound the minimax regret within our framework. We then apply the minimax theorem to obtain an information-theoretic upper bound for the minimax regret, leveraging a general Bayesian regret bound. The derived minimax regret bound inherits key properties of the Bayesian regret bound, including its ability to isolate factors such as the information ratio, the mutual information between the learning target and the environment, and the Bayesian regret of the target policy. Finally, we demonstrate the applicability of our bounds in various settings, including linear bandits, episodic reinforcement learning, and linear Markov decision processes, recovering known results for the minimax regret.
Part of ISBN 9798331531423
QC 20260226