Publications (10 of 12)
Ziemann, I., Tsiamis, A., Sandberg, H. & Matni, N. (2022). How are policy gradient methods affected by the limits of control?. In: 2022 IEEE 61ST CONFERENCE ON DECISION AND CONTROL (CDC): . Paper presented at IEEE 61st Conference on Decision and Control (CDC), DEC 06-09, 2022, Cancun, MEXICO (pp. 5992-5999). Institute of Electrical and Electronics Engineers (IEEE)
How are policy gradient methods affected by the limits of control?
2022 (English). In: 2022 IEEE 61st Conference on Decision and Control (CDC), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 5992-5999. Conference paper, Published paper (Refereed)
Abstract [en]

We study stochastic policy gradient methods from the perspective of control-theoretic limitations. Our main result is that ill-conditioned linear systems in the sense of Doyle inevitably lead to noisy gradient estimates. We also give an example of a class of stable systems in which policy gradient methods suffer from the curse of dimensionality. Finally, we show how our results extend to partially observed systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-326404 (URN), 10.1109/CDC51059.2022.9992612 (DOI), 000948128105010 (ISI), 2-s2.0-85147024672 (Scopus ID)
Conference
IEEE 61st Conference on Decision and Control (CDC), DEC 06-09, 2022, Cancun, MEXICO
Note

QC 20230502

Available from: 2023-05-02. Created: 2023-05-02. Last updated: 2023-05-02. Bibliographically approved
Tsiamis, A., Ziemann, I., Morari, M., Matni, N. & Pappas, G. J. (2022). Learning to Control Linear Systems can be Hard. In: Proceedings of 35th Conference on Learning Theory, COLT 2022: . Paper presented at 35th Conference on Learning Theory, COLT 2022, London, United Kingdom of Great Britain and Northern Ireland, Jul 2 2022 - Jul 5 2022 (pp. 3820-3857). ML Research Press
Learning to Control Linear Systems can be Hard
2022 (English). In: Proceedings of 35th Conference on Learning Theory, COLT 2022, ML Research Press, 2022, p. 3820-3857. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we study the statistical difficulty of learning to control linear systems. We focus on two standard benchmarks: the sample complexity of stabilization, and the regret of the online learning of the Linear Quadratic Regulator (LQR). Prior results state that the statistical difficulty for both benchmarks scales polynomially with the system state dimension up to system-theoretic quantities. However, this does not reveal the whole picture. By utilizing minimax lower bounds for both benchmarks, we prove that there exist nontrivial classes of systems for which learning complexity scales dramatically, i.e. exponentially, with the system dimension. This situation arises in the case of underactuated systems, i.e. systems with fewer inputs than states. Such systems are structurally difficult to control, and their system-theoretic quantities can scale exponentially with the system dimension, dominating learning complexity. Under some additional structural assumptions (bounding systems away from uncontrollability), we provide qualitatively matching upper bounds. We prove that learning complexity can be at most exponential in the controllability index of the system, that is, the degree of underactuation.
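The role of underactuation can be illustrated with a small numerical sketch (an illustration of ours, not code from the paper; the integrator-like chain and single-input placement are hypothetical choices): for a chain driven through one input, the controllability matrix [B, AB, ..., A^{n-1}B] becomes severely ill-conditioned as the state dimension n grows, one concrete way system-theoretic quantities can blow up with dimension.

```python
import numpy as np

def controllability_condition(n: int) -> float:
    """Condition number of the controllability matrix of an
    integrator-like chain A = I + N with a single input at the end.
    Illustrative only; not the construction used in the paper."""
    A = np.eye(n) + np.diag(np.ones(n - 1), 1)  # upper-shift chain of integrators
    B = np.zeros((n, 1))
    B[-1, 0] = 1.0                              # single (underactuated) input
    cols = [B]
    for _ in range(n - 1):
        cols.append(A @ cols[-1])               # columns B, AB, ..., A^{n-1}B
    C = np.hstack(cols)
    return float(np.linalg.cond(C))
```

Evaluating `controllability_condition` for growing n shows rapid, roughly exponential growth in the conditioning, consistent with the abstract's point that underactuated systems can be structurally hard both to control and to learn.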

Place, publisher, year, edition, pages
ML Research Press, 2022
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-333435 (URN), 2-s2.0-85146925813 (Scopus ID)
Conference
35th Conference on Learning Theory, COLT 2022, London, United Kingdom of Great Britain and Northern Ireland, Jul 2 2022 - Jul 5 2022
Note

QC 20230802

Available from: 2023-08-02. Created: 2023-08-02. Last updated: 2023-08-02. Bibliographically approved
Ziemann, I. & Tu, S. (2022). Learning with little mixing. In: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022: . Paper presented at 36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States of America, Nov 28 2022 - Dec 9 2022. Neural information processing systems foundation
Learning with little mixing
2022 (English). In: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022, Neural information processing systems foundation, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

We study square loss in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the iid rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We call this phenomenon learning with little mixing, and present several examples for when it occurs: bounded function classes for which the L2 and L2+ε norms are equivalent, ergodic finite state Markov chains, various parametric models, and a broad family of infinite dimensional ℓ2(N) ellipsoids. By instantiating our main result to system identification of nonlinear dynamics with generalized linear model transitions, we obtain a nearly minimax optimal excess risk bound after only a polynomial burn-in time.
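As a toy illustration of least squares on a single dependent trajectory (a sketch of ours, not code accompanying the paper; the scalar system x_{t+1} = a·x_t + w_t and the value a = 0.8 are hypothetical choices), the estimator is built from every consecutive pair along one trajectory rather than from independent samples:

```python
import random

# Simulate one trajectory of a stable scalar linear system
# x_{t+1} = a * x_t + w_t, then recover `a` by least squares.
random.seed(0)
a_true = 0.8
T = 5000
x = 0.0
num, den = 0.0, 0.0
for _ in range(T):
    w = random.gauss(0.0, 1.0)      # martingale difference noise
    x_next = a_true * x + w
    num += x * x_next               # accumulate the normal equations
    den += x * x                    # of the least-squares estimator
    x = x_next
a_hat = num / den                   # least-squares estimate of a
```

Even though consecutive covariates are dependent, the estimate concentrates around the true parameter after an initial burn-in; quantifying when this matches the i.i.d. rate is the kind of question the paper addresses.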

Place, publisher, year, edition, pages
Neural information processing systems foundation, 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333387 (URN), 2-s2.0-85149642683 (Scopus ID)
Conference
36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States of America, Nov 28 2022 - Dec 9 2022
Note

Part of ISBN 9781713871088

QC 20230801

Available from: 2023-08-01. Created: 2023-08-01. Last updated: 2023-08-01. Bibliographically approved
Ziemann, I., Sandberg, H. & Matni, N. (2022). Single Trajectory Nonparametric Learning of Nonlinear Dynamics. In: Proceedings of 35th Conference on Learning Theory, COLT 2022: . Paper presented at 35th Conference on Learning Theory, COLT 2022, London, United Kingdom of Great Britain and Northern Ireland, Jul 2 2022 - Jul 5 2022 (pp. 3333-3364). ML Research Press
Single Trajectory Nonparametric Learning of Nonlinear Dynamics
2022 (English). In: Proceedings of 35th Conference on Learning Theory, COLT 2022, ML Research Press, 2022, p. 3333-3364. Conference paper, Published paper (Refereed)
Abstract [en]

Given a single trajectory of a dynamical system, we analyze the performance of the nonparametric least squares estimator (LSE). More precisely, we give nonasymptotic expected l2-distance bounds between the LSE and the true regression function, where expectation is evaluated on a fresh, counterfactual, trajectory. We leverage recently developed information-theoretic methods to establish the optimality of the LSE for nonparametric hypotheses classes in terms of supremum norm metric entropy and a subgaussian parameter. Next, we relate this subgaussian parameter to the stability of the underlying process using notions from dynamical systems theory. When combined, these developments lead to rate-optimal error bounds that scale as T−1/(2+q) for suitably stable processes and hypothesis classes with metric entropy growth of order δ−q. Here, T is the length of the observed trajectory, δ ∈ R+ is the packing granularity and q ∈ (0, 2) is a complexity term. Finally, we specialize our results to a number of scenarios of practical interest, such as Lipschitz dynamics, generalized linear models, and dynamics described by functions in certain classes of Reproducing Kernel Hilbert Spaces (RKHS).

Place, publisher, year, edition, pages
ML Research Press, 2022
National Category
Control Engineering; Signal Processing
Identifiers
urn:nbn:se:kth:diva-334444 (URN), 2-s2.0-85164702345 (Scopus ID)
Conference
35th Conference on Learning Theory, COLT 2022, London, United Kingdom of Great Britain and Northern Ireland, Jul 2 2022 - Jul 5 2022
Note

QC 20230821

Available from: 2023-08-21. Created: 2023-08-21. Last updated: 2023-08-21. Bibliographically approved
Ziemann, I. (2022). Statistical Learning, Dynamics and Control: Fast Rates and Fundamental Limits for Square Loss. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Statistical Learning, Dynamics and Control: Fast Rates and Fundamental Limits for Square Loss
2022 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

Learning algorithms play an ever-increasing role in modern engineering solutions. However, despite many recent successes, their performance in the context of dynamical and control systems is not yet well understood. This becomes especially pertinent as we move toward deploying these algorithms in safety-critical systems such as power grids or smart mobility systems. These shortcomings motivate the present thesis. Broadly, by blending tools from control theory, information theory, and statistical learning theory, this thesis seeks to further our understanding of when learning in the context of (controlled) dynamical systems is easy or hard. That is, we seek to quantify which properties are key, and which are not, for learning to succeed in this context.

In the first part of the thesis, we study learning with square loss (regression) in a realizable time-series framework with martingale difference noise. Our main result is a fast rate excess risk bound which shows that whenever a trajectory hypercontractivity condition holds, the risk of the least-squares estimator on dependent data matches the i.i.d. rate order-wise after a burn-in time. In comparison, many existing results in learning from dependent data have rates where the effective sample size is deflated by a factor of the mixing-time of the underlying process, even after the burn-in time. However, recent work on linear dynamical systems and generalized linear models has shown that this deflation is not always necessary. The main contribution of the first part of the thesis is to extend this observation to a much wider class of examples. Furthermore, our results allow the covariate process to exhibit long range correlations which are substantially weaker than geometric ergodicity. We present several examples for when this occurs: bounded function classes for which the L2 and L2+ε norms are equivalent, ergodic finite state Markov chains, various parametric models including linear dynamical systems and generalized linear models, and a broad family of infinite dimensional ℓ2(N) ellipsoids.

In the second part of the thesis, we study fundamental limits to learning-based solutions of the linear quadratic Gaussian (LQG) problem. The majority of this part is devoted to the online (or adaptive) control problem, in which a learner (or control engineer) seeks to control an unknown linear system subject to a quadratic (square) loss function with minimal regret (cumulative suboptimality). The learner faces the dual objectives of controlling the system to the best of their present knowledge while simultaneously ensuring that enough information about the system is gathered. We prove fundamental limits, in the form of regret lower bounds, for this problem by relating the regret of any policy to an associated Fisher information and applying the Van Trees inequality (Bayesian Cramér-Rao). We instantiate our bounds for state-feedback and partially observed systems. We show that ill-conditioned systems, in terms of their controllability and observability structure, lead to larger regret. In other words, classical control-theoretic limitations carry over to the learning-based setting. We further show that systems as simple as integrators lead to regret lower bounds that are exponential in the problem dimension. Finally, we also study the variance of policy gradient methods. We derive results similar in spirit to those for the online LQG problem: ill-conditioned systems, as per poor controllability or observability, yield noisier gradients.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. vii, 163
Series
TRITA-EECS-AVL ; 2022:65
Keywords
Machine Learning, Control Theory, Statistical Learning, Fundamental Limits
National Category
Electrical Engineering, Electronic Engineering, Information Engineering; Control Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-320345 (URN), 978-91-8040-385-6 (ISBN)
Public defence
2022-11-11, Zoom: https://kth-se.zoom.us/j/63439289698, Kollegiesalen, Brinellvägen 6, Stockholm, 15:00 (English)
Opponent
Supervisors
Note

QC 20221019

Available from: 2022-10-19. Created: 2022-10-19. Last updated: 2022-11-09. Bibliographically approved
Ziemann, I. (2021). Applications of Information Inequalities to Linear Systems: Adaptive Control and Security. (Licentiate dissertation). Stockholm: KTH Royal Institute of Technology
Applications of Information Inequalities to Linear Systems: Adaptive Control and Security
2021 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

This thesis considers the application of information inequalities, Cramér-Rao type bounds, based on Fisher information, to linear systems. These tools are used to study the trade-offs between learning and performance in two application areas: adaptive control and control systems security.

In the first part of the thesis, we study stochastic adaptive control of linear quadratic regulators (LQR). Here, information inequalities are used to derive instance-dependent regret lower bounds. First, we consider a simplified version of LQR, a memoryless reference tracking model, and show how regret can be linked to a cumulative estimation error. This is then exploited to derive a regret lower bound in terms of the Fisher information generated by the experiment of the optimal policy. It is shown that if the optimal policy has ill-conditioned Fisher information, then so does any low-regret policy. This is combined with a Cramér-Rao bound to give a regret lower bound of order square-root of the time horizon T for a class of instances we call uninformative. The lower bound holds for all policies which depend smoothly on the underlying parametrization.

Second, we extend these results to the general LQR model and to arbitrary affine parametrizations of the instance parameters. The notion of uninformativeness is generalized to this situation to give a structure-dependent rank condition for when logarithmic regret is impossible. This is done by reduction of regret to a cumulative Bellman error. Due to the quadratic nature of LQR, this Bellman error turns out to be a quadratic form, which again can be interpreted as an estimation error. Using this, we prove a local minimax regret lower bound, whose proof relies on relating the minimax regret to a Bayesian estimation problem and then applying the Van Trees inequality. Again, it is shown that an appropriate information quantity of any low-regret policy is similar to that of the optimal policy and that any uninformative instance suffers local minimax regret at least of order square-root T. Moreover, it is shown that the notion of uninformativeness, when specialized to certain well-understood scenarios, yields a tight characterization of square-root-T regret.

In the second part of this thesis, we study control systems security problems from a Fisher information point of view. First, we consider a secure state estimation problem and characterize the maximal impact an adversary can cause by means of least informative distributions, that is, those which maximize the Cramér-Rao bound. For a linear measurement equation, it is shown that the least informative distribution, subject to variance and sparsity constraints, can be found by solving a semi-definite program, which becomes mixed-integer in the presence of sparsity constraints. Furthermore, by relying on well-known results on minimax and robust estimation, a game-theoretic interpretation of this characterization of the maximum impact is offered.
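A classical fact gives intuition for the least informative distribution viewpoint (stated here for the unconstrained scalar location family, not the sparsity-constrained problem solved in the thesis): among densities with variance σ², Fisher information is minimized by the Gaussian, which is exactly where the Cramér-Rao bound is largest and tight.

```latex
% Fisher information of a location family with density p and variance \sigma^2:
I(p) \;=\; \int \frac{\bigl(p'(x)\bigr)^2}{p(x)}\,dx \;\ge\; \frac{1}{\sigma^2},
% with equality if and only if p is Gaussian. Consequently the Cramér-Rao
% bound \mathrm{Var}(\hat{\theta}) \ge 1/I(p) is at most \sigma^2, and this
% worst case is attained by the Gaussian: it is the least informative
% distribution under a variance constraint.
```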

Last, we consider a Fisher information regularized minimum variance control objective, to study the trade-offs between parameter privacy and control performance. It is noted that this can be motivated for instance by learning-based attacks, in which case one seeks to leak as little information as possible to a system-identification adversary. Supposing that the feedback law is linear, the noise distribution minimizing the trace of Fisher information subject to a state variance penalty is found to be conditionally Gaussian. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2021. p. 133
Series
TRITA-EECS-AVL ; 2021:18
Keywords
Stochastic Adaptive Control, Machine Learning, Fisher Information, Secure Control, Fundamental Limitations, Reinforcement Learning
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-291203 (URN), 978-91-7873-804-5 (ISBN)
Presentation
2021-03-31, F3, Lindstedtsvägen 26, Stockholm, 15:00 (English)
Opponent
Supervisors
Funder
Swedish Foundation for Strategic Research, CLAS
Note

QC 20210310

Available from: 2021-03-10. Created: 2021-03-04. Last updated: 2022-06-25. Bibliographically approved
Ziemann, I. & Ziemann, V. (2021). Noninvasively improving the orbit-response matrix while continuously correcting the orbit. Physical Review Accelerators and Beams, 24(7), Article ID 072804.
Noninvasively improving the orbit-response matrix while continuously correcting the orbit
2021 (English). In: Physical Review Accelerators and Beams, E-ISSN 2469-9888, Vol. 24, no. 7, article id 072804. Article in journal (Refereed). Published
Abstract [en]

Based on continuously recorded beam positions and corrector excitations from, for example, a closed-orbit feedback system, we describe an algorithm that continuously updates an estimate of the orbit response matrix. The speed of convergence can be increased by adding very small perturbations, so-called dither, to the corrector excitations. Estimates for the rate of convergence and the asymptotically achievable accuracies are provided.
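The flavor of such an estimator can be sketched in a few lines (an illustrative toy, not the authors' implementation; the 3×2 response matrix, dither amplitude, and noise levels are invented for the example): beam-position readings y = R·u + noise are accumulated alongside the dithered corrector excitations u, and a least-squares estimate of R is formed from the running correlations.

```python
import numpy as np

# Toy sketch of continuously re-estimating an orbit-response matrix R
# (beam positions y = R @ u + noise) from dithered corrector settings.
rng = np.random.default_rng(0)
R_true = np.array([[1.0, 0.5], [0.2, -1.0], [0.7, 0.3]])  # hypothetical 3 BPMs x 2 correctors
n_bpm, n_corr = R_true.shape

A = np.zeros((n_bpm, n_corr))   # running sum of y u^T
B = np.zeros((n_corr, n_corr))  # running sum of u u^T
for _ in range(2000):
    u_feedback = rng.normal(0.0, 1e-3, n_corr)   # slow orbit-correction moves
    dither = rng.normal(0.0, 0.05, n_corr)       # small added excitation
    u = u_feedback + dither
    y = R_true @ u + rng.normal(0.0, 0.01, n_bpm)  # noisy BPM reading
    A += np.outer(y, u)
    B += np.outer(u, u)
R_hat = A @ np.linalg.inv(B)    # least-squares estimate of R
```

The dither dominates the excitation covariance, keeping the normal-equation matrix B well conditioned and speeding up convergence, mirroring the abstract's point about adding small perturbations.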

Place, publisher, year, edition, pages
American Physical Society (APS), 2021
National Category
Accelerator Physics and Instrumentation; Control Engineering
Identifiers
urn:nbn:se:kth:diva-300243 (URN), 10.1103/PhysRevAccelBeams.24.072804 (DOI), 000685008400001 (ISI), 2-s2.0-85113149445 (Scopus ID)
Note

QC 20210831

Available from: 2021-08-31. Created: 2021-08-31. Last updated: 2023-03-31. Bibliographically approved
Ziemann, I. & Sandberg, H. (2021). On a Phase Transition of Regret in Linear Quadratic Control: The Memoryless Case. IEEE Control Systems Letters, 5(2), 695-700, Article ID 9126856.
On a Phase Transition of Regret in Linear Quadratic Control: The Memoryless Case
2021 (English). In: IEEE Control Systems Letters, E-ISSN 2475-1456, Vol. 5, no. 2, p. 695-700, article id 9126856. Article in journal (Refereed). Published
Abstract [en]

We consider an idealized version of adaptive control of a multiple input multiple output (MIMO) system without state. We demonstrate how rank deficient Fisher information in this simple memoryless problem leads to the impossibility of logarithmic rates of regret. Our analysis rests on a version of the Cramér-Rao inequality that takes into account possible ill-conditioning of Fisher information and a perturbation result on the corresponding singular subspaces. This is used to define a sufficient condition, which we term uninformativeness, for regret to grow at least as the square root of the number of samples.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2021
Keywords
Adaptive control, estimation error, machine learning, stochastic systems, Adaptive control systems, Fisher information matrix, Linear control systems, Fisher information, Ill-conditioning, Linear quadratic control, Logarithmic rates, Memoryless, Singular subspaces, Square roots, MIMO systems
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-285250 (URN), 10.1109/LCSYS.2020.3005224 (DOI), 000548754100023 (ISI), 2-s2.0-85089175821 (Scopus ID)
Note

QC 20201112

Available from: 2020-11-12. Created: 2020-11-12. Last updated: 2024-01-09. Bibliographically approved
Ziemann, I. & Sandberg, H. (2021). Resource Constrained Sensor Attacks by Minimizing Fisher Information. In: 2021 American control conference (ACC): . Paper presented at 2021 American Control Conference, ACC 2021, Virtual, New Orleans, 25 May 2021 through 28 May 2021 (pp. 4580-4585). Institute of Electrical and Electronics Engineers Inc., Article ID 9482865.
Resource Constrained Sensor Attacks by Minimizing Fisher Information
2021 (English). In: 2021 American Control Conference (ACC), Institute of Electrical and Electronics Engineers Inc., 2021, p. 4580-4585, article id 9482865. Conference paper, Published paper (Refereed)
Abstract [en]

We analyze the impact of sensor attacks on a linear state estimation problem subject to variance and sparsity constraints. We show that the maximum impact in a leader-follower game where the attacker first chooses the distribution of an adversarial perturbation and the defender follows by choosing an estimator is characterized by a minimum Fisher information principle. In general, this is a nonlinear variational problem, but we show that it can be reduced to a finite-dimensional mixed integer SDP. Alternatively, the proposed solution can be seen as a lower bound on the maximum impact for a game in which the defender plays first.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2021
Series
Proceedings of the American Control Conference, ISSN 0743-1619
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-304546 (URN), 10.23919/ACC50511.2021.9482865 (DOI), 000702263304106 (ISI), 2-s2.0-85111940527 (Scopus ID)
Conference
2021 American Control Conference, ACC 2021, Virtual, New Orleans, 25 May 2021 through 28 May 2021
Note

QC 20220420

Part of proceedings: ISBN 978-1-6654-4197-1

Available from: 2021-11-09. Created: 2021-11-09. Last updated: 2022-06-25. Bibliographically approved
Ziemann, I. & Sandberg, H. (2020). Parameter Privacy versus Control Performance: Fisher Information Regularized Control. In: 2020 AMERICAN CONTROL CONFERENCE (ACC): . Paper presented at American Control Conference (ACC), JUL 01-03, 2020, Denver, CO (pp. 1259-1265). IEEE
Parameter Privacy versus Control Performance: Fisher Information Regularized Control
2020 (English). In: 2020 American Control Conference (ACC), IEEE, 2020, p. 1259-1265. Conference paper, Published paper (Refereed)
Abstract [en]

This article introduces and solves a new privacy-related optimization problem for cyber-physical systems where an adversary tries to learn the system dynamics. In the context of linear quadratic systems, we consider the problem of achieving a small cost while balancing the need for keeping knowledge about the model's parameters private. To this end, we formulate a Fisher information regularized version of the linear quadratic regulator with cheap cost. Here the control operator is allowed to not only control the plant but also mask its state by injecting further noise. Within the class of linear policies with additive noise, we solve this problem and show that the optimal noise distribution is Gaussian with state dependent covariance. Next, we prove that the optimal linear feedback law is the same as without regularization. Finally, to motivate our proposed scheme, we formulate for scalar systems an equivalent maximin problem for the worst-case scenario in which the adversary has full knowledge of all other inputs and outputs. Here, our policies are maximin optimal with respect to maximizing the variance over all asymptotically unbiased estimators.

Place, publisher, year, edition, pages
IEEE, 2020
Series
Proceedings of the American Control Conference, ISSN 0743-1619
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-292721 (URN), 10.23919/ACC45564.2020.9147690 (DOI), 000618079801041 (ISI), 2-s2.0-85084280958 (Scopus ID)
Conference
American Control Conference (ACC), JUL 01-03, 2020, Denver, CO
Note

QC 20210412

Available from: 2021-04-12. Created: 2021-04-12. Last updated: 2023-04-05. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-4140-1279
