kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Reinforcement Learning and Optimal Adaptive Control for Structured Dynamical Systems
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). (Statistical Learning & Control Group)ORCID iD: 0000-0003-4606-0060
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, we study the related problems of reinforcement learning and optimal adaptive control, specialized to specific classes of stochastic and structured dynamical systems. By stochastic, we mean systems that are unknown to the decision maker and evolve according to some probabilistic law. By structured, we mean that they are restricted in some known way, e.g., they belong to a specific model class or must obey a set of known constraints. The objective in both problems is the design of an optimal algorithm, i.e., one that maximizes a certain performance metric. Because of the stochasticity, the algorithm faces an exploration-exploitation dilemma, where it must balance collecting information from the system and leveraging existing information to choose the best action or input. This trade-off  is best captured by the notion of regret, defined as the difference between the performance of the algorithm and an oracle which has full knowledge of the system. In the first part of the thesis, we investigate systems that can be modeled as Markov Decision Processes (MDPs) and derive general asymptotic and problem-specific regret lower bounds for ergodic and deterministic MDPs. We make these bounds explicit for MDPs that: i) are ergodic and unstructured, ii) have Lipschitz transitions and rewards, and iii) are deterministic and satisfy a decoupling property. Furthermore, we propose Directed Exploration Leaning (DEL), an algorithm that is valid for any ergodic MDP with any structure and whose regret upper bound matches the associated regret lower bounds, thus being truly optimal. For this algorithm, we present theoretical regret guarantees as well as a numerical demonstration that verifies its ability to exploit the underlying structure. In the second part, we study systems with uncertain linear dynamics and which are subject to additive disturbances as well as state and input constraints. We develop Self-Tuning Tube-based Model Predictive Control (STTMPC), an adaptive and robust model predictive control algorithm which leverages the least-squares estimator as well as polytopic tubes to guarantee robust constraint satisfaction, along with recursive feasibility, and input-to-state stability. The algorithm also ensures the persistence of excitation without compromising the system's asymptotic performance and with no increase in computational complexity. We also provide guarantees on the expected regret of STT-MPC, in the form of an upper bound whose rate explicitly depends on the chosen rate of excitation. The performance of the algorithm is also demonstrated via a numerical example.

Place, publisher, year, edition, pages
Stockholm: Kungliga Tekniska högskolan, 2023. , p. 152
Series
TRITA-EECS-AVL ; 2023:67
Keywords [en]
Reinforcement Learning, Adaptive Control, Dynamical Systems, Control Theory, Control Engineering
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-337406ISBN: 978-91-8040-712-0 (print)OAI: oai:DiVA.org:kth-337406DiVA, id: diva2:1801807
Public defence
2023-10-23, Kollegiesalen, Brinellvägen 6, Stockholm, 14:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), WASP 66453
Note

QC 20231003

Available from: 2023-10-04 Created: 2023-10-02 Last updated: 2023-10-04Bibliographically approved
List of papers
1. Exploration in Structured Reinforcement Learning
Open this publication in new window or tab >>Exploration in Structured Reinforcement Learning
2018 (English)In: Advances In Neural Information Processing Systems 31 (NIPS 2018) / [ed] Bengio, S Wallach, H Larochelle, H Grauman, K CesaBianchi, N Garnett, R, Neural Information Processing Systems (NIPS) , 2018, Vol. 31Conference paper, Published paper (Refereed)
Abstract [en]

We address reinforcement learning problems with finite state and action spaces where the underlying MDP has some known structure that could be potentially exploited to minimize the exploration rates of suboptimal (state, action) pairs. For any arbitrary structure, we derive problem-specific regret lower bounds satisfied by any learning algorithm. These lower bounds are made explicit for unstructured MDPs and for those whose transition probabilities and average reward functions are Lipschitz continuous w.r.t. the state and action. For Lipschitz MDPs, the bounds are shown not to scale with the sizes S and A of the state and action spaces, i.e., they are smaller than c log T where T is the time horizon and the constant c only depends on the Lipschitz structure, the span of the bias function, and the minimal action sub-optimality gap. This contrasts with unstructured MDPs where the regret lower bound typically scales as SA log T. We devise DEL (Directed Exploration Learning), an algorithm that matches our regret lower bounds. We further simplify the algorithm for Lipschitz MDPs, and show that the simplified version is still able to efficiently exploit the structure.

Place, publisher, year, edition, pages
Neural Information Processing Systems (NIPS), 2018
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 31
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-249916 (URN)000461852003043 ()2-s2.0-85064810437 (Scopus ID)
Conference
32nd Conference on Neural Information Processing Systems (NIPS), DEC 02-08, 2018, Montreal, CANADA
Note

QC 20190425. QC 20191023

Available from: 2019-04-25 Created: 2019-04-25 Last updated: 2023-10-03Bibliographically approved
2. Regret Analysis in Deterministic Reinforcement Learning
Open this publication in new window or tab >>Regret Analysis in Deterministic Reinforcement Learning
2021 (English)In: 2021 60TH IEEE CONFERENCE ON DECISION AND CONTROL (CDC), Institute of Electrical and Electronics Engineers (IEEE) , 2021, p. 2246-2251Conference paper, Published paper (Refereed)
Abstract [en]

We consider Markov Decision Processes (MDPs) with deterministic transitions and study the problem of regret minimization, which is central to the analysis and design of optimal learning algorithms. We present logarithmic problem-specific regret lower bounds that explicitly depend on the system parameter (in contrast to previous minimax approaches) and thus, truly quantify the fundamental limit of the performance achievable by any learning algorithm. Deterministic MDPs can be interpreted as graphs and analyzed in terms of their cycles, a fact which we leverage in order to identify a class of deterministic MDPs whose regret lower bound can be determined numerically. We further exemplify this result on a deterministic line search problem, and a deterministic MDP with state-dependent rewards, whose regret lower bounds we can state explicitly. These bounds share similarities with the known problem-specific bound of the multi-armed bandit problem and suggest that navigation on a deterministic MDP need not have an effect on the performance of a learning algorithm.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Computer Sciences Other Physics Topics Communication Systems
Identifiers
urn:nbn:se:kth:diva-313022 (URN)10.1109/CDC45484.2021.9682972 (DOI)000781990302009 ()2-s2.0-85126035898 (Scopus ID)
Conference
60th IEEE Conference on Decision and Control (CDC), DEC 13-17, 2021, ELECTR NETWORK
Note

Part of proceedings: ISBN 978-1-6654-3659-5

QC 20220601

Available from: 2022-06-01 Created: 2022-06-01 Last updated: 2023-10-03Bibliographically approved
3. Self-Tuning Tube-based Model Predictive Control
Open this publication in new window or tab >>Self-Tuning Tube-based Model Predictive Control
2023 (English)In: 2023 American Control Conference, ACC 2023, Institute of Electrical and Electronics Engineers Inc. , 2023, p. 3626-3632Conference paper, Published paper (Refereed)
Abstract [en]

We present Self-Tuning Tube-based Model Predictive Control (STT-MPC), an adaptive robust control algorithm for uncertain linear systems with additive disturbances based on the least-squares estimator and polytopic tubes. Our algorithm leverages concentration results to bound the system uncertainty set with prescribed confidence, and guarantees robust constraint satisfaction for this set, along with recursive feasibility and input-to-state stability. Persistence of excitation is ensured without compromising the algorithm's asymptotic performance or increasing its computational complexity. We demonstrate the performance of our algorithm using numerical experiments.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-335048 (URN)10.23919/ACC55779.2023.10155796 (DOI)001027160303038 ()2-s2.0-85167784896 (Scopus ID)
Conference
2023 American Control Conference, ACC 2023, San Diego, United States of America, May 31 2023 - Jun 2 2023
Note

Part of ISBN 9798350328066

QC 20230831

Available from: 2023-08-31 Created: 2023-08-31 Last updated: 2023-10-03Bibliographically approved
4. Sub-linear Regret in Adaptive Model Predictive Control
Open this publication in new window or tab >>Sub-linear Regret in Adaptive Model Predictive Control
2023 (English)In: New Frontiers in Learning, Control, and Dynamical Systems Workshop / [ed] Valentin De Bortoli, Charlotte Bunne, Guan-Horng Liu, Tianrong Chen, Maxim Raginsky, Pratik Chaudhari, Melanie Zeilinger, Animashree Anandkumar, 2023Conference paper, Published paper (Refereed)
Abstract [en]

We consider the problem of adaptive ModelPredictive Control (MPC) for uncertain linear-systems with additive disturbances and with state and input constraints. We present STT-MPC (Self-Tuning Tube-based Model Predictive Control), an online algorithm that combines the certainty-equivalence principle and polytopic tubes. Specifically, at any given step, STT-MPC infers the system dynamics using the Least Squares Estimator(LSE), and applies a controller obtained by solving an MPC problem using these estimates. The use of polytopic tubes is so that, despite the uncertainties, state and input constraints are satisfied, and recursive-feasibility and asymptotic stability hold. In this work, we analyze the regret of the algorithm, when compared to an oracle algorithm initially aware of the system dynamics. We establish that the expected regret of STT-MPC does not exceed O(T 1/2+ε), where ε ∈ (0, 1) is a design parameter tuning the persistent excitation component of the algorithm. Our result relies on a recently proposed exponential decay of sensitivity property and, to the best of our knowledge, is the first of its kind in this setting. We illustrate the performance of our algorithm using a simple numerical example.

Keywords
Adaptive Control, Model Predictive Control
National Category
Computer Sciences
Research subject
Computer Science; Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-337466 (URN)
Conference
New Frontiers in Learning, Control, and Dynamical Systems, Workshop at the International Conference on Machine Learning (ICML) 2023, Hawaii Convention Center, July 28th 2023
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), WASP 66453
Note

QC 20231123

Available from: 2023-10-03 Created: 2023-10-03 Last updated: 2023-11-23Bibliographically approved

Open Access in DiVA

Fulltext(31079 kB)193 downloads
File information
File name FULLTEXT01.pdfFile size 31079 kBChecksum SHA-512
1f8e8c1b82a8ddce2179e6a34182acb8987531dd879705eaa1c4b16adc78b1ab352b7a6e05bc63dc3b8d47c44f47a06dfb09a07e027cd1db2ab5a2d6e58d787a
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Tranos, Damianos
By organisation
Decision and Control Systems (Automatic Control)
Control Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 193 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 944 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf