Minimal Expected Regret in Linear Quadratic Control
Jedra, Yassir. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4403-1066
Proutiere, Alexandre. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
2022 (English). In: International Conference on Artificial Intelligence and Statistics (AISTATS), Vol. 151 / [ed] Camps-Valls, G.; Ruiz, F. J. R.; Valera, I. ML Research Press, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of online learning in Linear Quadratic Control systems whose state transition and state-action transition matrices A and B may be initially unknown. We devise an online learning algorithm and provide guarantees on its expected regret. This regret at time T is upper bounded (i) by Õ((d_u + d_x)√(d_x T)) when A and B are unknown, (ii) by Õ(d_x² log(T)) if only A is unknown, and (iii) by Õ(d_x(d_u + d_x) log(T)) if only B is unknown and under some mild non-degeneracy condition (d_x and d_u denote the dimensions of the state and of the control input, respectively). These regret scalings are minimal in T, d_x and d_u as they match existing lower bounds in scenario (i) when d_x ≤ d_u (Simchowitz and Foster, 2020), and in scenario (ii) (Lai, 1986). We conjecture that our upper bounds are also optimal in scenario (iii) (there is no known lower bound in this setting). Existing online algorithms proceed in epochs of (typically exponentially) growing durations. The control policy is fixed within each epoch, which considerably simplifies the analysis of the estimation error on A and B and hence of the regret. Our algorithm departs from this design choice: it is a simple variant of certainty-equivalence regulators, where the estimates of A and B and the resulting control policy can be updated as frequently as we wish, possibly at every step. Quantifying the impact of such a constantly-varying control policy on the performance of these estimates and on the regret constitutes one of the technical challenges tackled in this paper.
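
For intuition, the snippet below is a minimal, hypothetical rendition of a certainty-equivalence regulator that re-estimates (A, B) by regularized least squares and re-solves the Riccati equation at every step, which is the design feature emphasized above. The problem instance, noise levels, exploration scale, and regularization are illustrative assumptions; this is not the paper's exact algorithm or tuning.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Minimal certainty-equivalence LQR sketch with per-step estimate updates.
# Illustrative only: the true (A, B), noise levels, and regularization below
# are assumptions, and the exploration/update schedule is not the paper's.
rng = np.random.default_rng(0)
d_x, d_u, T = 3, 2, 2000
A = np.array([[0.9, 0.1, 0.0],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 0.7]])            # unknown to the learner
B = 0.5 * rng.standard_normal((d_x, d_u))  # unknown to the learner
Q, R = np.eye(d_x), np.eye(d_u)

def ce_gain(A_hat, B_hat):
    """Certainty-equivalence gain from the discrete algebraic Riccati equation."""
    P = solve_discrete_are(A_hat, B_hat, Q, R)
    return np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)

G = 1e-3 * np.eye(d_x + d_u)    # regularized Gram matrix  sum_t z_t z_t'
C = np.zeros((d_x + d_u, d_x))  # cross term               sum_t z_t x_{t+1}'
x, K = np.zeros(d_x), np.zeros((d_u, d_x))
for t in range(T):
    u = -K @ x + 0.1 * rng.standard_normal(d_u)           # small exploration noise
    x_next = A @ x + B @ u + 0.1 * rng.standard_normal(d_x)
    z = np.concatenate([x, u])
    G += np.outer(z, z)
    C += np.outer(z, x_next)
    theta = np.linalg.solve(G, C).T                        # LSE of [A B]
    A_hat, B_hat = theta[:, :d_x], theta[:, d_x:]
    try:
        K = ce_gain(A_hat, B_hat)   # update the policy as often as we wish
    except (np.linalg.LinAlgError, ValueError):
        pass                        # keep the previous gain if the DARE fails
    x = x_next
```

Updating the gain at every step is precisely what makes the estimation-error analysis delicate, since the covariates are generated under a constantly-varying policy.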

Place, publisher, year, edition, pages
ML Research Press, 2022. Vol. 151
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-320996
ISI: 000841852304034
Scopus ID: 2-s2.0-85132044960
OAI: oai:DiVA.org:kth-320996
DiVA, id: diva2:1708518
Conference
25th International Conference on Artificial Intelligence and Statistics, AISTATS 2022, Virtual, Online, 28-30 March 2022
Note

QC 20221104

Available from: 2022-11-04. Created: 2022-11-04. Last updated: 2023-09-07. Bibliographically approved
In thesis
1. Statistical Learning in Linearly Structured Systems: Identification, Control, and Reinforcement Learning
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we investigate the design and statistical efficiency of learning algorithms in systems with a linear structure. This study is carried out along three main domains, namely identification, control, and reinforcement learning, and is presented as a collection of five papers.

In the first paper, we consider the problem of best-arm identification in linear bandits. We devise a simple learning algorithm based on the track-and-stop design whose sample complexity asymptotically matches known instance-optimal lower bounds. These lower bounds are obtained by solving an optimization problem, which in turn inspires the design of our algorithm. We also show that our algorithm is computationally competitive and performs remarkably well experimentally.
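
A rough, hypothetical sketch of such a best-arm identification loop is given below (Python/NumPy). It keeps a least-squares estimate of the parameter and stops using a generalized-likelihood-ratio statistic; the arm set, noise level, threshold, and in particular the count-balancing sampling rule are illustrative assumptions and do not reproduce the track-and-stop sampling rule of the paper.

```python
import numpy as np

# Hypothetical, simplified best-arm identification loop for a linear bandit.
# The sampling rule below merely balances arm counts; the paper's tracking of
# an optimal allocation is deliberately not reproduced here.
rng = np.random.default_rng(1)
arms = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])  # arm features (assumed)
theta = np.array([1.0, 0.8])                            # unknown parameter (assumed)
sigma, delta = 1.0, 0.05                                # noise level, risk level

V = 1e-6 * np.eye(2)                    # regularized design matrix
b = np.zeros(2)
counts = np.zeros(len(arms))
for t in range(1, 200_000):
    a = int(np.argmin(counts))          # crude sampling rule: keep counts balanced
    x = arms[a]
    reward = x @ theta + sigma * rng.standard_normal()
    V += np.outer(x, x); b += reward * x; counts[a] += 1
    theta_hat = np.linalg.solve(V, b)   # least-squares estimate
    best = int(np.argmax(arms @ theta_hat))
    # Generalized likelihood ratio statistic against each suboptimal arm
    stats = []
    for a_ in range(len(arms)):
        if a_ == best:
            continue
        diff = arms[best] - arms[a_]
        gap = diff @ theta_hat
        stats.append(gap**2 / (2 * sigma**2 * (diff @ np.linalg.solve(V, diff))))
    # Simplistic stopping threshold (the paper uses a calibrated one)
    if min(stats) > np.log((1 + np.log(t)) / delta):
        print(f"stop at t={t}, recommend arm {best}")
        break
```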

In the second paper, we investigate the linear system identification problem at the finite-sample level. More precisely, we study the problem in the so-called Probably Approximately Correct (PAC) framework. We establish tight instance-specific sample complexity lower bounds via change-of-measure arguments for generic systems. In the case of stable linear dynamical systems, we derive the first non-asymptotic instance-dependent lower bounds and prove that the Least-Squares Estimator (LSE) achieves these limits in the high-accuracy regime. Our analysis of the LSE is sharper, simpler, and easier to interpret than existing analyses, and relies on novel concentration results for covariate matrices with dependent data.
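
For illustration, the snippet below computes the Least-Squares Estimator of the transition matrix from a single trajectory of a stable linear system; the system, noise level, and horizon are assumptions made only for this example.

```python
import numpy as np

# Least-squares estimation of A from one trajectory of x_{t+1} = A x_t + w_t.
# The system, noise, and horizon are illustrative assumptions.
rng = np.random.default_rng(2)
d, T = 4, 5000
A = 0.9 * np.linalg.qr(rng.standard_normal((d, d)))[0]  # spectral radius 0.9: stable
X = np.zeros((T + 1, d))
for t in range(T):
    X[t + 1] = A @ X[t] + rng.standard_normal(d)

# LSE: argmin_M sum_t ||x_{t+1} - M x_t||^2, solved via lstsq on the stacked data
M = np.linalg.lstsq(X[:-1], X[1:], rcond=None)[0]        # solves X[:-1] @ M ≈ X[1:]
A_hat = M.T
print("operator-norm error:", np.linalg.norm(A_hat - A, ord=2))
```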

In the third paper, we consider the problem of learning linear quadratic regulators. We devise online learning algorithms and provide guarantees on their expected regret. We consider two distinct setups, namely when the dynamics of the system are partially known or not known at all, and achieve minimal regret scalings in both. The algorithm we propose is a simple variant of certainty-equivalence controllers, where the estimates of the dynamics are continuously updated, and relies on a clever hysteresis switching mechanism. Empirical results suggest that such a switching mechanism leads to a short burn-in time and does not harm the theoretical guarantees.
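
One plausible way to read the hysteresis switching mechanism is sketched below: the estimates are refreshed at every step, but the controller gain actually in use is switched only once the estimate has moved by more than a threshold since the last switch. This is a hypothetical rendition, not the thesis' exact switching condition or threshold.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

# Hypothetical hysteresis-style switching rule (illustrative, not the thesis'
# exact condition): theta_hat is the current estimate of [A B], and the gain in
# use is replaced only when theta_hat has moved enough since the last switch.
def maybe_switch(theta_hat, theta_at_switch, K, Q, R, d_x, eps=0.05):
    """Return the (possibly updated) gain and the estimate recorded at the last switch."""
    if np.linalg.norm(theta_hat - theta_at_switch, ord=2) <= eps:
        return K, theta_at_switch                        # keep the current controller
    A_hat, B_hat = theta_hat[:, :d_x], theta_hat[:, d_x:]
    P = solve_discrete_are(A_hat, B_hat, Q, R)           # certainty-equivalence design
    K_new = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
    return K_new, theta_hat.copy()                       # switch and record the estimate
```

A rule of this kind keeps the benefit of frequent estimation while avoiding chattering between nearly identical policies.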

In the fourth paper, we consider the problem of best policy identification in discounted linear Markov Decision Processes (MDPs), under both the generative and the forward models. We derive instance-specific sample complexity lower bounds for identifying a near-optimal policy, and devise algorithms matching these limits. Under the generative model, our algorithm exhibits a sample complexity guarantee matching existing minimax and gap-dependent lower bounds. Under the forward model, we identify sufficient conditions, weaker than ergodicity or communication, under which learnability is achievable. When these conditions are met, our algorithm asymptotically matches an instance-specific complexity measure, corresponding to the value of an optimal experiment-design problem.

In the fifth paper, we study model estimation in episodic block MDPs. In these MDPs, the decision maker observes rich contexts generated from a small number of latent states. We are interested in recovering the latent state decoding function (the mapping from the observations to latent states) from the data generated under a fixed behavior policy. We derive an information-theoretical lower bound on the error rate for estimating this function and propose an algorithm approaching this limit. We apply our results to the problem of learning near-optimal policies in the reward-free setting. Based on our efficient model estimation algorithm, we show that we can learn a policy converging (as the number of collected samples grows large) to the optimal policy at the best possible asymptotic minimax rate. Our analysis provides necessary and sufficient conditions under which exploiting the block structure results in improved sample complexity guarantees for identifying near-optimal policies.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. 67
Series
TRITA-EECS-AVL ; 2023:53
Keywords
Machine Learning; Statistical Learning; Control Theory; Reinforcement Learning; System Identification
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-327357 (URN)
978-91-8040-628-4 (ISBN)
Public defence
2023-06-14, D2, Lindstedtsvägen 9, Stockholm, 16:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20230525

Available from: 2023-05-25. Created: 2023-05-24. Last updated: 2023-05-25. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Jedra, Yassir; Proutiere, Alexandre
