Dual control concepts for linear dynamical systems
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0003-1014-502x
2022 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

We study simultaneous learning and control of linear dynamical systems. In such a setting, control policies are derived with respect to two objectives: i) to control the system as well as possible, given the current knowledge of the system dynamics (exploitation), and ii) to gather as much information as possible about the unknown system that can improve control (exploration). These two objectives are often in conflict, and this phenomenon is known as the exploration-exploitation trade-off. More specifically, in the context of simultaneous learning and control, we consider the linear quadratic regulation (LQR) problem, model reference control, and data-driven control based on Willems et al.'s fundamental lemma.

First, we consider the LQR problem with unknown dynamics. We present robust and certainty equivalence (CE) model-based control methods that balance exploration and exploitation. We focus on control policies that can be iteratively updated after sequentially collecting data.
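
For orientation, the LQR setting with unknown dynamics is commonly written as below. This is a sketch in standard textbook notation: the symbols $A$, $B$, $Q$, $R$, $w_t$ and the infinite-horizon average-cost form are assumptions made here for illustration and are not taken from the abstract.

\[
x_{t+1} = A x_t + B u_t + w_t, \qquad
J = \lim_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[ \sum_{t=1}^{T} \left( x_t^\top Q x_t + u_t^\top R u_t \right) \right],
\]

where the pair $(A, B)$ is unknown to the controller, $Q \succeq 0$, $R \succ 0$, and $w_t$ is process noise. Exploitation then corresponds to choosing $u_t$ to keep $J$ small under the current estimate of $(A, B)$, while exploration corresponds to choosing $u_t$ so that later estimates of $(A, B)$ improve.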

We propose robust (with respect to parameter uncertainty) LQR design methods. To quantify uncertainty, we derive a method based on Bayesian inference, which is directly applicable to robust control synthesis. To begin, we derive a robust controller to minimize the worst-case cost, with high probability, given the empirical observation of the system. This robust controller synthesis is then used to derive a robust dual controller, which updates its control policy after collecting data. An episode in which data is collected is called exploration, and the episode using an updated control policy is called exploitation. The objective is to minimize the worst-case cost of the updated control policy, subject to a given exploration budget that bounds the worst-case cost during exploration. Additionally, we derive methods that balance exploration and exploitation to minimize the cumulative worst-case cost for a fixed number of episodes. In this thesis, we refer to such a problem as robust reinforcement learning. Essentially, it is a robust dual controller that aims to minimize the cumulative worst-case cost and that updates its control policy in each episode. Numerical experiments show that the proposed methods perform better than existing state-of-the-art algorithms. Moreover, experiments also indicate that the exploration prioritizes the uncertainty reduction in the parameters that matter most for control.

A control policy using the CE principle for LQR consists of a sum of an optimal controller calculated using estimated dynamics at time $t$, and an additive external excitation. It has been shown over the years that the optimal asymptotic rate of regret is in many instances $\mathcal{O}(\sqrt{T})$. In particular, this rate can be obtained by adding a white noise external excitation, with a variance decaying as $\gamma/\sqrt{T}$, where $\gamma$ is a predefined constant. As the amount of excitation is pre-determined, such approaches can be viewed as open-loop control of the external excitation. In this thesis, we approach the problem of designing the external excitation from a feedback perspective, leveraging the well-known benefits of feedback control for decreasing sensitivity to external disturbances and system-model mismatch, as compared to open-loop strategies. The benefits of this approach over the open-loop approach can be seen in the case of unmodeled dynamics and disturbances. However, even when using the benefits of feedback control, we do not calculate the optimal amount of external excitation. To find the optimal amount of external excitation, we suggest exploration strategies that are based on a time-dependent scaling $\gamma_t$ and that, according to numerical examples, can attain cumulative regret similar to or lower than the cumulative regret obtained for the optimal scaling $\gamma^*$.
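
The open-loop baseline described above (a CE controller plus pre-determined decaying white-noise excitation) can be illustrated with a minimal sketch for a scalar system. Everything concrete here is an assumption made for illustration and not the thesis's method: the scalar plant `a_true`, `b_true`, unit cost weights, the least-squares estimator, the warm-up length, and reading the decay as an excitation variance of `gamma / sqrt(t)` at step `t`.

```python
import math
import random


def lqr_gain(a, b, q=1.0, r=1.0, iters=200):
    """Scalar LQR gain k (for u = -k * x), found by iterating the Riccati equation."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return a * b * p / (r + b * b * p)


def run_ce(a_true=0.9, b_true=0.5, gamma=1.0, horizon=2000, warmup=10, seed=0):
    """Simulate certainty-equivalence control with decaying white-noise excitation.

    All numerical values are hypothetical choices for this sketch.
    """
    rng = random.Random(seed)
    # Accumulators for the normal equations of x_next = a * x + b * u.
    sxx = sxu = suu = sxy = suy = 0.0
    a_hat, b_hat = 0.5, 1.0  # hypothetical initial model guess
    x = 0.0
    cost = 0.0
    for t in range(1, horizon + 1):
        # Certainty equivalence: treat the current estimate as the true model.
        k = lqr_gain(a_hat, b_hat)
        # Pre-determined (open-loop) excitation with variance gamma / sqrt(t).
        excitation = rng.gauss(0.0, math.sqrt(gamma / math.sqrt(t)))
        u = -k * x + excitation
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 0.1)
        cost += x * x + u * u
        sxx += x * x
        sxu += x * u
        suu += u * u
        sxy += x * x_next
        suy += u * x_next
        det = sxx * suu - sxu * sxu
        # Update the least-squares estimate once enough data has been collected.
        if t >= warmup and det > 1e-9:
            a_hat = (suu * sxy - sxu * suy) / det
            b_hat = (sxx * suy - sxu * sxy) / det
        x = x_next
    return a_hat, b_hat, cost


if __name__ == "__main__":
    a_hat, b_hat, cost = run_ce()
    print(f"a_hat={a_hat:.3f}, b_hat={b_hat:.3f}, cumulative cost={cost:.1f}")
```

In this sketch the excitation schedule is fixed in advance and never reacts to how well the model is actually known, which is the sense in which the abstract calls such schemes open-loop. A feedback design would instead adjust the scaling (the abstract's $\gamma_t$) based on collected data.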

Second, we consider the model reference control problem with the goal of proposing a data-driven robust control design method based on an average risk criterion, which we call Bayes control. We show that this approach has very close ties to the Bayesian kernel-based method, but the conceptual difference lies in the use of a deterministic versus a stochastic setting for the system parameters.

Finally, we consider data-driven control using Willems et al.'s fundamental lemma. First, we propose variations of the fundamental lemma that, instead of a data trajectory, utilize correlation functions in the time domain, as well as power spectra of the input and the output in the frequency domain. Data-driven control using the fundamental lemma can become computationally very expensive for large datasets, whereas the proposed variations are easy to compute even for large datasets and can be efficient as a data compression technique. Second, we study connections of data informativity conditions between the results based on the fundamental lemma (finite time) and classical system identification. We show that finite-time informativity conditions for state-space systems are closely linked to the identifiability conditions derived from the fundamental lemma. We prove that the obtained persistency of excitation conditions for infinite time are sufficient conditions for finite-time informativity. Moreover, we reveal that the obtained finite-time conditions in closed loop are stricter than in classical system identification. This is a consequence of the noiseless data setting in the fundamental lemma, which precludes the possibility of noise exciting the system in a feedback setting.
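
The persistency of excitation condition behind the fundamental lemma is usually stated as a rank condition on a Hankel matrix of the input: a scalar input is persistently exciting of order `L` if the Hankel matrix with `L` rows built from it has full row rank. The sketch below checks this for scalar signals using only the standard library. The function names, the tolerance, and the test signals are hypothetical choices for illustration and are not taken from the thesis.

```python
import math


def hankel(signal, depth):
    """Hankel matrix with `depth` rows built from a scalar sequence."""
    cols = len(signal) - depth + 1
    if cols < 1:
        raise ValueError("signal is too short for the requested depth")
    return [[signal[i + j] for j in range(cols)] for i in range(depth)]


def rank(matrix, tol=1e-9):
    """Numerical rank via Gaussian elimination with partial pivoting."""
    rows = [[float(v) for v in row] for row in matrix]
    n_rows = len(rows)
    n_cols = len(rows[0]) if rows else 0
    rk = 0
    for c in range(n_cols):
        if rk == n_rows:
            break
        # Choose the largest remaining entry in this column as the pivot.
        pivot = max(range(rk, n_rows), key=lambda i: abs(rows[i][c]))
        if abs(rows[pivot][c]) <= tol:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, n_rows):
            factor = rows[i][c] / rows[rk][c]
            for j in range(c, n_cols):
                rows[i][j] -= factor * rows[rk][j]
        rk += 1
    return rk


def is_persistently_exciting(signal, order):
    """True if the depth-`order` Hankel matrix of `signal` has full row rank."""
    return rank(hankel(signal, order)) == order


if __name__ == "__main__":
    constant = [1.0] * 10
    sinusoid = [math.sin(0.5 * t) for t in range(20)]
    print(is_persistently_exciting(constant, 1))  # constant input excites one mode
    print(is_persistently_exciting(constant, 2))
    print(is_persistently_exciting(sinusoid, 2))
    print(is_persistently_exciting(sinusoid, 3))
```

A constant input is persistently exciting of order 1 only, and a single sinusoid of order 2 only, which is why richer inputs are needed to represent trajectories of higher-order systems. The Hankel matrix also grows with the data length, which is the computational burden that motivates the correlation-based and spectral variations mentioned above.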

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2022. p. 203
Series
TRITA-EECS-AVL ; 2022-47
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-316648
ISBN: 978-91-8040-299-6 (print)
OAI: oai:DiVA.org:kth-316648
DiVA, id: diva2:1690950
Public defence
2022-09-23, https://kth-se.zoom.us/j/62470041935, F3, Lindstedtsvägen 26, Stockholm, 10:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20220830

Available from: 2022-08-30 Created: 2022-08-29 Last updated: 2022-09-07
Bibliographically approved

Open Access in DiVA

fulltext (49661 kB), 1393 downloads
File information
File name: FULLTEXT01.pdf
File size: 49661 kB
Checksum (SHA-512): 19dfe9b110fd749470d6c5779bb709ba575511c9e5c4cfdb3fd712979968c83559d42ba9e7fd16087988f39ae50ed79995114afc958d3d237c2646add9ebe3e6
Type: fulltext
Mimetype: application/pdf

Authority records

Ferizbegovic, Mina

Total: 1398 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1450 hits