Sub-linear Regret in Adaptive Model Predictive Control
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control), Statistical Learning & Control Group. ORCID iD: 0000-0003-4606-0060
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
2023 (English). In: New Frontiers in Learning, Control, and Dynamical Systems Workshop / [ed] Valentin De Bortoli, Charlotte Bunne, Guan-Horng Liu, Tianrong Chen, Maxim Raginsky, Pratik Chaudhari, Melanie Zeilinger, Animashree Anandkumar, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of adaptive Model Predictive Control (MPC) for uncertain linear systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes. Specifically, at any given step, STT-MPC infers the system dynamics using the Least Squares Estimator (LSE) and applies a controller obtained by solving an MPC problem using these estimates. The polytopic tubes ensure that, despite the uncertainties, state and input constraints are satisfied, and that recursive feasibility and asymptotic stability hold. In this work, we analyze the regret of the algorithm when compared to an oracle algorithm initially aware of the system dynamics. We establish that the expected regret of STT-MPC does not exceed O(T^{1/2+ε}), where ε ∈ (0, 1) is a design parameter tuning the persistent excitation component of the algorithm. Our result relies on a recently proposed exponential decay of sensitivity property and, to the best of our knowledge, is the first of its kind in this setting. We illustrate the performance of our algorithm using a simple numerical example.
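
The following is a minimal Python sketch of the certainty-equivalence loop described above, assuming a scalar system and substituting an unconstrained finite-horizon controller for the tube-based MPC step; the polytopic-tube constraint handling, the excitation schedule, and all numerical values are illustrative assumptions, not taken from the paper.

```python
# Sketch of a certainty-equivalence control loop with least-squares
# estimation and decaying exploration noise. The tube-based, constrained
# MPC step of STT-MPC is replaced here by a scalar finite-horizon LQR.
import numpy as np

rng = np.random.default_rng(0)
A_true, B_true = 0.9, 0.5          # unknown true dynamics (illustrative)
T, horizon, eps = 500, 10, 0.1     # eps tunes the persistent-excitation decay

x = 0.0
Z = np.zeros((0, 2))               # regressors [x_t, u_t]
Y = np.zeros(0)                    # targets x_{t+1}
A_hat, B_hat = 0.0, 1.0            # initial parameter estimates

for t in range(T):
    # Certainty-equivalence control: finite-horizon LQR on the current
    # estimates (stand-in for solving the constrained tube MPC problem).
    P, K = 1.0, 0.0
    for _ in range(horizon):
        K = (B_hat * P * A_hat) / (1.0 + B_hat * P * B_hat)
        P = 1.0 + A_hat * P * A_hat - K * B_hat * P * A_hat
    u = -K * x
    # Persistent excitation: exploration noise whose magnitude decays at a
    # rate set by eps (an illustrative schedule, not the paper's).
    u += rng.normal(scale=(t + 1) ** (-eps / 2))

    x_next = A_true * x + B_true * u + 0.05 * rng.normal()

    # Least-squares estimate of (A, B) from all data collected so far.
    Z = np.vstack([Z, [x, u]])
    Y = np.append(Y, x_next)
    if t >= 2:
        theta, *_ = np.linalg.lstsq(Z, Y, rcond=None)
        A_hat, B_hat = theta

    x = x_next

print(f"final estimates: A_hat={A_hat:.3f}, B_hat={B_hat:.3f}")
```

In the paper, ε trades the amount of persistent excitation against the regret rate O(T^{1/2+ε}); the decay schedule above is only one plausible instantiation of that idea.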

Place, publisher, year, edition, pages
2023.
Keywords [en]
Adaptive Control, Model Predictive Control
National Category
Computer Sciences
Research subject
Computer Science; Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-337466
OAI: oai:DiVA.org:kth-337466
DiVA, id: diva2:1802149
Conference
New Frontiers in Learning, Control, and Dynamical Systems, Workshop at the International Conference on Machine Learning (ICML) 2023, Hawaii Convention Center, July 28th 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), WASP 66453
Note

QC 20231123

Available from: 2023-10-03. Created: 2023-10-03. Last updated: 2023-11-23. Bibliographically approved.
In thesis
1. Reinforcement Learning and Optimal Adaptive Control for Structured Dynamical Systems
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we study the related problems of reinforcement learning and optimal adaptive control, specialized to specific classes of stochastic and structured dynamical systems. By stochastic, we mean systems that are unknown to the decision maker and evolve according to some probabilistic law. By structured, we mean that they are restricted in some known way, e.g., they belong to a specific model class or must obey a set of known constraints. The objective in both problems is the design of an optimal algorithm, i.e., one that maximizes a certain performance metric. Because of the stochasticity, the algorithm faces an exploration-exploitation dilemma, where it must balance collecting information from the system and leveraging existing information to choose the best action or input. This trade-off is best captured by the notion of regret, defined as the difference between the performance of the algorithm and that of an oracle which has full knowledge of the system.

In the first part of the thesis, we investigate systems that can be modeled as Markov Decision Processes (MDPs) and derive general asymptotic and problem-specific regret lower bounds for ergodic and deterministic MDPs. We make these bounds explicit for MDPs that: i) are ergodic and unstructured, ii) have Lipschitz transitions and rewards, and iii) are deterministic and satisfy a decoupling property. Furthermore, we propose Directed Exploration Learning (DEL), an algorithm that is valid for any ergodic MDP with any structure and whose regret upper bound matches the associated regret lower bound, making it truly optimal. For this algorithm, we present theoretical regret guarantees as well as a numerical demonstration that verifies its ability to exploit the underlying structure.

In the second part, we study systems with uncertain linear dynamics that are subject to additive disturbances as well as state and input constraints. We develop Self-Tuning Tube-based Model Predictive Control (STT-MPC), an adaptive and robust model predictive control algorithm which leverages the least-squares estimator and polytopic tubes to guarantee robust constraint satisfaction, recursive feasibility, and input-to-state stability. The algorithm also ensures persistence of excitation without compromising the system's asymptotic performance and with no increase in computational complexity. We also provide guarantees on the expected regret of STT-MPC, in the form of an upper bound whose rate explicitly depends on the chosen rate of excitation. The performance of the algorithm is demonstrated via a numerical example.
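
As a reference point for the regret notion used in both parts, a generic formalization is sketched below; the stage cost c, the oracle trajectory, and the notation are placeholder assumptions, not the thesis' exact definitions.

```latex
% Regret of an algorithm \pi over a horizon of T steps, measured against an
% oracle that knows the true system; for STT-MPC the expected value of this
% quantity is bounded by O(T^{1/2+\epsilon}). Notation is generic.
R_T(\pi) \;=\; \mathbb{E}\Bigl[\textstyle\sum_{t=0}^{T-1} c\bigl(x_t^{\pi}, u_t^{\pi}\bigr)\Bigr]
          \;-\; \mathbb{E}\Bigl[\textstyle\sum_{t=0}^{T-1} c\bigl(x_t^{\mathrm{oracle}}, u_t^{\mathrm{oracle}}\bigr)\Bigr]
```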

Place, publisher, year, edition, pages
Stockholm: Kungliga Tekniska högskolan, 2023. p. 152
Series
TRITA-EECS-AVL ; 2023:67
Keywords
Reinforcement Learning, Adaptive Control, Dynamical Systems, Control Theory, Control Engineering
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-337406
ISBN: 978-91-8040-712-0
Public defence
2023-10-23, Kollegiesalen, Brinellvägen 6, Stockholm, 14:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), WASP 66453
Note

QC 20231003

Available from: 2023-10-04. Created: 2023-10-02. Last updated: 2023-10-04. Bibliographically approved.

Open Access in DiVA

Fulltext (391 kB), 138 downloads
File information
File name: FULLTEXT01.pdf
File size: 391 kB
Checksum (SHA-512): 2081e3f145acc0faabf5ef35bcbbbed731371f492b7caa81a955d94ae060d8119e6a5c7c9b8d04b69dd85629ef6fa60abf49abb1e3268aebaf4e059d54758345
Type: fulltext
Mimetype: application/pdf

Other links

Workshop website

Authority records

Tranos, Damianos; Proutiere, Alexandre

