Discrete-Time Inverse Linear Quadratic Optimal Control over Finite Time-Horizon under Noisy Output Measurements
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Optimization and Systems Theory. ORCID iD: 0000-0002-3905-0633
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Optimization and Systems Theory. ORCID iD: 0000-0001-7287-1495
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Optimization and Systems Theory. ORCID iD: 0000-0003-0177-1993
2021 (English). In: Control Theory and Technology, ISSN 2095-6983, Vol. 19, no 4, p. 563-572. Article in journal (Refereed). Published
Abstract [en]

In this paper, the problem of inverse quadratic optimal control over a finite time-horizon for discrete-time linear systems is considered. Our goal is to recover the corresponding quadratic objective function from noisy observations. First, the identifiability of the model structure for the inverse optimal control problem is analyzed under a relative degree assumption, and we show that the model structure is strictly globally identifiable. Next, we study the inverse optimal control problem in which the initial state distribution and the observation noise distribution are unknown, yet exact observations of the initial states are available. We formulate the problem as a risk minimization problem and approximate it using the empirical average. It is further shown that the solution to the approximated problem is statistically consistent under the relative degree assumption. We then study the case where exact observations of the initial states are not available, yet the observation noise is known to be white and Gaussian and the distribution of the initial state is also Gaussian (with unknown mean and covariance). The EM-algorithm is then used to estimate the parameters in the objective function. The effectiveness of our results is demonstrated by numerical examples.
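To make the setting in the abstract concrete, the following is a minimal sketch of the forward problem and of a squared-error empirical risk for a candidate cost matrix. It is not the paper's algorithm. The cost parameterisation (stage cost x'Qx + u'Ru, terminal weight Q, known R = I), the double-integrator matrices, the noise level and the function names `lqr_gains`, `rollout` and `empirical_risk` are all assumptions made for illustration.

```python
import numpy as np

def lqr_gains(A, B, Q, R, N):
    """Backward Riccati recursion for the finite-horizon discrete-time LQR.

    Assumes stage cost x'Qx + u'Ru and terminal weight P_N = Q.
    Returns a list whose t-th entry is the gain K_t in u_t = -K_t x_t.
    """
    P = Q.copy()
    gains = []
    for _ in range(N):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    return gains[::-1]

def rollout(A, B, gains, x0):
    """Closed-loop state trajectory x_0, ..., x_N, returned with shape (N+1, n)."""
    xs = [x0]
    for K in gains:
        xs.append((A - B @ K) @ xs[-1])
    return np.array(xs)

def empirical_risk(Q, A, B, C, R, N, x0s, ys):
    """Average squared output error of the trajectories predicted by candidate Q."""
    gains = lqr_gains(A, B, Q, R, N)
    total = 0.0
    for x0, y in zip(x0s, ys):
        pred = rollout(A, B, gains, x0) @ C.T
        total += np.sum((y - pred) ** 2)
    return total / len(x0s)

# Hypothetical example system (not taken from the paper).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])          # only the first state is measured
R = np.eye(1)
Q_true = np.diag([5.0, 1.0])
N, M, sigma = 20, 200, 0.05          # horizon, number of trajectories, noise std

x0s = rng.standard_normal((M, 2))    # exactly known initial states
true_gains = lqr_gains(A, B, Q_true, R, N)
ys = [rollout(A, B, true_gains, x0) @ C.T
      + sigma * rng.standard_normal((N + 1, 1)) for x0 in x0s]

risk_true = empirical_risk(Q_true, A, B, C, R, N, x0s, ys)
risk_wrong = empirical_risk(np.eye(2), A, B, C, R, N, x0s, ys)
print(risk_true, risk_wrong)
```

Under these assumptions, an estimate of Q would be obtained by minimising `empirical_risk` over positive semidefinite candidates. The sketch only evaluates the risk at two candidates to show that the true cost matrix explains the noisy outputs better than a mismatched one.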

Place, publisher, year, edition, pages
Springer Nature, 2021. Vol. 19, no 4, p. 563-572
Keywords [en]
Inverse optimal control, Linear quadratic regulator, Statistical consistency, EM-algorithm
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-310129
DOI: 10.1007/s11768-021-00066-8
ISI: 000718777100001
Scopus ID: 2-s2.0-85119060446
OAI: oai:DiVA.org:kth-310129
DiVA, id: diva2:1646325
Note

QC 20220323

Available from: 2022-03-22 Created: 2022-03-22 Last updated: 2022-06-25. Bibliographically approved
In thesis
1. Inverse and Forward Approaches for Optimal Control and Estimation in Agent-Based Systems
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This dissertation is concerned with three topics within the field of optimal control and estimation in dynamical agent-based systems, with potential applications that meet both engineering and societal needs. Firstly, the inverse optimal control problem is studied. Given a dynamical system, the goal is to recover the underlying cost function from observations of the optimal state trajectories. Such recovery of cost functions not only helps us develop a better understanding of natural and societal phenomena, but also provides a criterion for designing optimal controllers in similar contexts. Secondly, we study the synthesis of collective emergence in multi-agent systems. The problem is cast in a game-theoretic framework based on modeling the strategic interactions among self-oriented agents. In this thesis the specific topic of intrinsic formation control is addressed, in which designing individual cost functions to realize the desired optimal emergence is a critical issue. Finally, topics of distributed coordination are also considered for societal systems, more specifically in mathematical finance. The credit scoring problem is studied by incorporating dynamical networked information.

Specifically, in Paper A and Paper C, the finite-horizon inverse optimal control problem is studied for continuous-time systems, with full or partial state observations. Although the infinite-horizon inverse linear quadratic problem is well studied with numerous results, the finite-horizon case is still an open problem. To the best of our knowledge, our result is the first complete result on necessary and sufficient conditions for the solvability of such an inverse problem. The uniqueness of solutions is studied and the equivalence class of cost functions is derived. In addition, based on system invertibility, a well-posed inverse problem is formulated even for the case in which the optimal synthesis can only be partially observed. For suboptimal observations, residual optimization problems are solved to obtain a best-fit approximate cost function.

Paper B further studies the inverse optimal control problem in a stochastic set-up, where partial state observations of a discrete-time system are available under measurement noise. Firstly, by formulating the problem as a system identification task with the exact initial states as model excitations, its identifiability is justified under the relative degree assumption and statistical consistency is shown for the empirical estimation. Furthermore, for more practical scenarios in which the initial states are also imperfectly known, the problem is cast as a maximum likelihood estimation problem and is solved by the Expectation-Maximization algorithm under Gaussian assumptions.

In Paper D, the intrinsic formation control problem of a multi-agent system is formulated as both finite- and infinite-horizon noncooperative differential games. The manifold of all equivalent configurations of the desired formation is studied by considering all orientations and agent permutations, whose convergence and stability are analyzed in both cases. The main novelty of our work lies in that the desired relative pattern is not predefined in the game, and is achieved intrinsically only via different choices of the communication topology of the multi-agent system without using formation errors in the controller, which can be hard to obtain in practice. Patterns of regular polyhedra and antipodal formations are achieved by Nash equilibria while inter-agent collisions are naturally avoided.

Paper E concerns the network-based credit scoring problem, and the advantages of incorporating network information are studied in two scenarios. Firstly, when the score publishing is merely individual-dependent, an optimal Bayesian filter is designed for risk prediction, which serves as a reference for the lender on future financial decisions. Secondly, a recursive Bayes estimator is proposed to further improve the accuracy of score publishing by also incorporating the dynamical network topology. It is shown that under the proposed evolution framework, the designed biased estimator has a higher precision than any efficient estimator, and the mean square errors are strictly smaller than the Cramér-Rao lower bound for clients within a certain range of scores.
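The last claim is consistent with the Cramér-Rao bound because that bound constrains unbiased estimators only. A standard scalar illustration follows. It is not the estimator of Paper E, and the symbols θ, σ and a are introduced here purely for illustration.

```latex
\begin{align*}
X &\sim \mathcal{N}(\theta, \sigma^2), \qquad \hat\theta_a = aX, \quad 0 < a < 1, \\
\mathrm{MSE}(\hat\theta_a) &= \operatorname{Var}(aX) + \bigl(\mathbb{E}[aX] - \theta\bigr)^2
  = a^2\sigma^2 + (1-a)^2\theta^2, \\
\text{CRLB (unbiased estimators)} &= \sigma^2, \\
\mathrm{MSE}(\hat\theta_a) < \sigma^2
  &\iff (1-a)^2\theta^2 < (1-a^2)\sigma^2
  \iff \theta^2 < \frac{1+a}{1-a}\,\sigma^2 .
\end{align*}
```

In this toy case the shrunk estimator beats the bound only when θ lies in a bounded interval, which parallels the qualifier "within a certain range of scores" in the abstract.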

Abstract [sv, English translation]

This thesis addresses three topics in optimal control theory and estimation for agent-based dynamical systems, with both engineering and societal applications. First, the inverse problem of optimal control is studied. There, the goal is, given a dynamical system, to recover the underlying cost function from observations of optimal trajectories. This provides not only a greater understanding of natural and societal phenomena, but also criteria for determining optimal controllers for other, similar problems. Furthermore, we study the appearance of emergence in multi-agent systems. The problem is treated within a game-theoretic framework that models strategic interactions between self-oriented agents. This thesis addresses formation in particular (intrinsic formation control), where a particular problem is to determine individual cost functions so as to achieve optimal emergence. Finally, distributed coordination of societal, specifically financial, systems is also addressed. The credit scoring problem is studied by incorporating information from a dynamical network system.

The inverse optimal control problem for continuous-time systems with a finite time horizon is studied in particular in Papers A and C, with either full or partial state observations. Although the linear quadratic problem with an infinite time horizon has been studied extensively, the corresponding finite-horizon problem is largely unsolved. To our knowledge, our result is the first complete one concerning necessary and sufficient conditions for such an inverse problem to be solvable. We treat the uniqueness of solutions and derive the equivalence class of the cost functions. In addition, based on the invertibility of the system, we formulate a well-posed problem even for the case where the optimal synthesis can only be partially observed. For non-optimal observations, we solve residual minimization problems to obtain the best estimate of the cost function.

Furthermore, Paper B treats the inverse optimal control problem based on a stochastic model, where partial state observations are made with a random measurement error. First, the problem is formulated as an identification problem in which the exact initial states are used to excite the model, whose identifiability is justified under an assumption on the relative degree of the system and whose statistical consistency is shown for empirical estimation. The problem is then adapted to maximum likelihood estimation in order to handle more practical scenarios with inexact initial states. In that case the problem is solved with an expectation-maximization algorithm under assumptions of Gaussian probability distributions.

In Paper D, the formation problem for multi-agent systems is formulated as a noncooperative differential game for both finite and infinite time horizons. The manifold of all equivalent configurations of desired formations is studied by taking into account all permutations of orientations and agents, whose convergence and stability are analyzed in both cases. The novelty of our work lies above all in the fact that the desired formation is not defined in advance in the game, but is achieved solely through the choice of the system's communication topology, without using the formation error in the controller, which can otherwise be difficult to obtain. Regular polyhedra and antipodal formations are achieved through Nash equilibria, while collisions between agents are naturally avoided.

Paper E concerns network-based credit scoring, and the advantage of network-based information is studied in two scenarios. First, when the credit scoring is only individual-dependent, an optimal Bayesian filter is determined for risk prediction, which is used as a reference for the lender in future financial decisions. Then a recursive Bayesian estimator is proposed to further improve the credit scoring by also taking the dynamical network topology into account. We show that the proposed biased estimator has higher precision than any efficient estimator in the proposed evaluation framework (evolution framework), and that the mean square errors are strictly smaller than the Cramér-Rao lower bound for clients within a certain range of credit scores.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. 196
Series
TRITA-SCI-FOU ; 2022;19
Keywords
Autonomous systems; inverse optimal control; system identification; nonlinear systems; formation control; differential game; credit scoring
National Category
Computational Mathematics
Research subject
Applied and Computational Mathematics, Optimization and Systems Theory
Identifiers
urn:nbn:se:kth:diva-311742 (URN)
978-91-8040-233-0 (ISBN)
Public defence
2022-06-02, F3, Lindstedtsvägen 26, KTH, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 220504

Available from: 2022-05-04 Created: 2022-05-03 Last updated: 2023-01-02. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Zhang, Han; Li, Yibei; Hu, Xiaoming
