Publications (9 of 9)
Wang, Z. (2025). Distributionally Robust Optimization, Control and Games. (Licentiate dissertation). Stockholm: KTH Royal Institute of Technology
Distributionally Robust Optimization, Control and Games
2025 (English) Licentiate thesis, monograph (Other academic)
Abstract [en]

In the era of data-driven decision-making, real-world applications often face uncertainties arising from noise, environmental shifts, and adversarial perturbations. These challenges can degrade model performance, lead to poor decisions, and introduce unforeseen risks. This thesis tackles these issues by developing robust decision-making frameworks for optimization, control, and games, with a particular focus on distributional robustness and risk-averse learning under uncertain data distributions. It consists of four parts.

In the first part, we consider outlier-robust distributionally robust optimization (DRO) problems, where the data distributions are subject to Wasserstein perturbations and outlier contamination. We propose a novel DRO framework leveraging a distance inspired by Unbalanced Optimal Transport (UOT). This UOT-based distance incorporates a soft penalization term in place of traditional hard constraints, enabling the construction of ambiguity sets that are more robust to outliers. Under appropriate smoothness conditions, we establish the strong duality of the proposed DRO formulation. Additionally, we present a computationally efficient Lagrangian penalty formulation and demonstrate that strong duality also holds. We provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding.
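
To make the contrast with standard Wasserstein DRO concrete, a schematic formulation in our own notation (a sketch, not necessarily the exact objective used in the thesis):

    \min_{\theta} \; \sup_{Q \,:\, W_c(Q, \widehat{P}_n) \le \rho} \; \mathbb{E}_{\xi \sim Q}\big[\ell(\theta; \xi)\big]  \qquad \text{(hard Wasserstein-ball constraint)}

    \min_{\theta} \; \sup_{Q} \; \Big\{ \mathbb{E}_{\xi \sim Q}\big[\ell(\theta; \xi)\big] - \lambda \, D_{\mathrm{UOT}}(Q, \widehat{P}_n) \Big\}  \qquad \text{(soft UOT-style penalty)}

Here \widehat{P}_n is the empirical distribution, \ell the loss, and D_{\mathrm{UOT}} an unbalanced-optimal-transport discrepancy that penalizes, rather than forbids, transport plans whose marginals deviate from the data; a few contaminating outliers therefore need not be matched exactly, which is the source of the added robustness.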

In the second part, we focus on decision-dependent optimization problems, where the data distributions shift in response to the decisions, affecting both the objective function and the linear constraints. We establish a sufficient condition for the existence of a constrained equilibrium point, at which the distributions remain invariant under retraining. We propose dual ascent and projected gradient descent algorithms, each with theoretical convergence guarantees, operating in the dual and primal spaces, respectively. Furthermore, we explore the relationship between the equilibrium point and the optimal point of the constrained decision-dependent optimization problem.
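
As a toy illustration of the equilibrium notion (our own example, not the algorithms analyzed in the thesis), the sketch below retrains a scalar least-squares decision on data whose mean shifts with the decision; the iteration settles at a point where the induced distribution no longer changes under retraining:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_data(x, n=20000, eps=0.3):
        # Hypothetical decision-dependent distribution: D(x) = N(1 + eps * x, 1).
        return 1.0 + eps * x + rng.standard_normal(n)

    x = 0.0
    for k in range(30):
        data = sample_data(x)
        x = data.mean()  # retrain: the minimizer of E_{z ~ D(x)}[(x' - z)^2] is the mean

    print(f"approximate equilibrium decision: {x:.3f}")  # close to 1 / (1 - eps), about 1.43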

In the third part, we study risk-averse learning in online convex games using conditional value at risk (CVaR) as the risk measure. For the zeroth-order feedback setting, where agents access only the cost values of their selected actions, we propose risk-averse learning algorithms with sample reuse and variance reduction. For the first-order feedback setting, where agents obtain gradient information, we develop a first-order risk-averse learning algorithm based on value at risk (VaR) estimates. Despite the bias in the CVaR gradient estimates, we establish high-probability convergence guarantees for all proposed algorithms.
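
The basic zeroth-order building block is the empirical estimation of VaR and CVaR from observed cost values; a generic sketch (the algorithms in the thesis additionally reuse samples across iterations and reduce the variance of the resulting gradient estimates):

    import numpy as np

    def empirical_var_cvar(costs, alpha=0.95):
        """Empirical value at risk and conditional value at risk of a cost sample."""
        costs = np.asarray(costs, dtype=float)
        var = np.quantile(costs, alpha)           # VaR_alpha: the alpha-quantile of the cost
        tail = costs[costs >= var]
        cvar = tail.mean() if tail.size else var  # CVaR_alpha: mean cost in the alpha-tail
        return var, cvar

    # Example: bandit feedback collected by repeatedly playing the same action.
    rng = np.random.default_rng(1)
    print(empirical_var_cvar(rng.lognormal(sigma=0.5, size=1000), alpha=0.95))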

In the final part, we explore distributional reinforcement learning (DRL) in linear quadratic regulator (LQR) problems. A key challenge in DRL is the design of the return-distribution representation used for policy evaluation. For discounted LQR control, we derive a closed-form expression for the random return and analyze its properties, including variance bounds, sensitivity, and finite approximation error. For unknown models, we introduce a model-free method to estimate the return distribution with sample complexity guarantees. We also extend these results to partially observable linear systems. Using the learned return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR with CVaR as the risk measure.
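
In standard notation (ours), the random return whose distribution is characterized in this part is

    G^{\pi}(x_0) \;=\; \sum_{t=0}^{\infty} \gamma^{t} \big( x_t^{\top} Q x_t + u_t^{\top} R u_t \big),
    \qquad x_{t+1} = A x_t + B u_t + w_t, \quad u_t = -K x_t,

where the disturbances w_t are i.i.d.; G^{\pi}(x_0) is a random variable whose full distribution, rather than only its mean (the classical LQR value function), is the object of the analysis.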

Abstract [sv] (translated)

In an era of data-driven decision-making, real-world applications often face uncertainties arising from noise, environmental shifts, and adversarial perturbations. These challenges can degrade model performance, lead to poor decisions, and introduce unforeseen risks. This thesis addresses these issues by developing robust decision-making frameworks for optimization, control, and games, with a particular focus on distributional robustness and risk-averse learning under uncertain data distributions. It consists of four parts.

In the first part, we study outlier-robust distributionally robust optimization (DRO) problems, where the data distributions are subject to Wasserstein perturbations and outlier contamination. We propose a new DRO framework that exploits a distance inspired by unbalanced optimal transport (UOT). This UOT-based distance introduces a soft penalization term in place of traditional hard constraints, which makes it possible to construct ambiguity sets that are more robust to outliers. Under suitable smoothness conditions, we establish strong duality for the proposed DRO formulation. In addition, we present a computationally efficient Lagrangian penalty formulation and show that strong duality holds for it as well. We present empirical results showing that our method offers improved robustness to outliers and is computationally less demanding.

In the second part, we focus on decision-dependent optimization problems, where the data distributions change in response to the decisions and affect both the objective function and the linear constraints. We establish a sufficient condition for the existence of a constrained equilibrium point, at which the distributions remain unchanged under retraining. We propose dual ascent and projected gradient descent algorithms, both with theoretical convergence guarantees, operating in the dual and primal spaces, respectively. Furthermore, we investigate the relationship between the equilibrium point and the optimal point of the constrained decision-dependent optimization problem.

In the third part, we study risk-averse learning in online convex games using conditional value at risk (CVaR) as the risk measure. For the zeroth-order feedback setting, where agents only have access to the cost values of their chosen actions, we propose risk-averse learning algorithms with sample reuse and variance reduction. For the first-order feedback setting, where agents obtain gradient information, we develop a first-order risk-averse learning algorithm based on value at risk estimates. Despite the bias in the CVaR gradient estimates, we establish high-probability convergence guarantees for all proposed algorithms.

In the final part, we explore distributional reinforcement learning (DRL) in linear quadratic regulator (LQR) problems. A key challenge in DRL is the design of the return-distribution representation used for policy evaluation. For discounted LQR control, we derive a closed-form expression for the random return and analyze its properties, including variance bounds, sensitivity, and finite approximation error. For unknown models, we introduce a model-free method for estimating the return distribution with sample complexity guarantees. We also extend these results to partially observable linear systems. Using the learned return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR with CVaR as the risk measure.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. xiii, 172
Series
TRITA-EECS-AVL ; 2025:7
Keywords
Distributionally robust optimization, decision-dependent optimization, risk-averse games, distributional LQR, Fördelningsrobust optimering, beslutsberoende optimering, riskaverta spel, fördelningsbaserad LQR
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-358102 (URN) 978-91-8106-158-1 (ISBN)
Presentation
2025-01-31, https://kth-se.zoom.us/j/69761970586, Harry Nyquist, Malvinas väg 10, KTH Campus, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20250108

Available from: 2025-01-08 Created: 2025-01-07 Last updated: 2025-01-20. Bibliographically approved
Wang, Z. (2025). Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport. Paper presented at NeurIPS 2024 - The Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver Convention Center, 10-15 Dec, 2024.
Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport
2025 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Distributionally Robust Optimization (DRO) accounts for uncertainty in data distributions by optimizing the model performance against the worst possible distribution within an ambiguity set. In this paper, we propose a DRO framework that relies on a new distance inspired by Unbalanced Optimal Transport (UOT). The proposed UOT distance employs a soft penalization term instead of hard constraints, enabling the construction of an ambiguity set that is more resilient to outliers. Under smoothness conditions, we establish strong duality of the proposed DRO problem. Moreover, we introduce a computationally efficient Lagrangian penalty formulation for which we show that strong duality also holds. Finally, we provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding for regression and classification tasks.

National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-358100 (URN)
Conference
NeurIPS 2024 - The Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver Convention Center, 10-15 Dec, 2024
Note

QC 20250114

Available from: 2025-01-07 Created: 2025-01-07 Last updated: 2025-01-14. Bibliographically approved
Wang, Z., Shen, Y., Zavlanos, M. M. & Johansson, K. H. (2024). Learning of Nash Equilibria in Risk-Averse Games. In: 2024 American Control Conference, ACC 2024. Paper presented at 2024 American Control Conference, ACC 2024, Toronto, Canada, Jul 10 2024 - Jul 12 2024 (pp. 3270-3275). Institute of Electrical and Electronics Engineers (IEEE)
Learning of Nash Equilibria in Risk-Averse Games
2024 (English) In: 2024 American Control Conference, ACC 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 3270-3275. Conference paper, Published paper (Refereed)
Abstract [en]

This paper considers risk-averse learning in convex games involving multiple agents that aim to minimize their individual risk of incurring significantly high costs. Specifically, the agents adopt the conditional value at risk (CVaR) as a risk measure with possibly different risk levels. To solve this problem, we propose a first-order risk-averse learning algorithm, in which the CVaR gradient estimate depends on an estimate of the value at risk (VaR) combined with the gradient of the stochastic cost function. Although estimation of the CVaR gradients using finitely many samples is generally biased, we show that the accumulated error of the CVaR gradient estimates is bounded with high probability. Moreover, assuming that the risk-averse game is strongly monotone, we show that the proposed algorithm converges to the risk-averse Nash equilibrium. We present numerical experiments on a Cournot game example to illustrate the performance of the proposed method.
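
A generic sample-based version of such a first-order CVaR gradient estimator (a sketch in our own notation, not the paper's exact update): estimate VaR by an empirical quantile and average the cost gradients over the samples in the alpha-tail.

    import numpy as np

    def cvar_gradient_estimate(costs, grads, alpha=0.9):
        """costs: (n,) samples of J(x, xi); grads: (n, d) samples of dJ/dx(x, xi)."""
        costs = np.asarray(costs, dtype=float)
        grads = np.asarray(grads, dtype=float)
        var_hat = np.quantile(costs, alpha)   # plug-in VaR estimate
        tail = costs >= var_hat               # samples falling in the alpha-tail
        # Biased for finite n because var_hat is itself noisy; consistent as n grows.
        return grads[tail].mean(axis=0)

    # Example with the quadratic cost J(x, xi) = 0.5 * (x - xi)^2 at x = 0.
    rng = np.random.default_rng(2)
    xi = 1.0 + rng.standard_normal(5000)
    costs = 0.5 * xi**2                       # J(0, xi)
    grads = (-xi).reshape(-1, 1)              # dJ/dx(0, xi) = x - xi = -xi
    print(cvar_gradient_estimate(costs, grads, alpha=0.9))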

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer Sciences; Probability Theory and Statistics
Identifiers
urn:nbn:se:kth:diva-354317 (URN) 10.23919/ACC60939.2024.10644891 (DOI) 2-s2.0-85204439865 (Scopus ID)
Conference
2024 American Control Conference, ACC 2024, Toronto, Canada, Jul 10 2024 - Jul 12 2024
Note

QC 20241003

Part of ISBN 979-8-3503-8265-5

Available from: 2024-10-02 Created: 2024-10-02 Last updated: 2024-10-03. Bibliographically approved
Wang, Z., Shen, Y., Zavlanos, M. M. & Johansson, K. H. (2024). Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport. In: Advances in Neural Information Processing Systems 37 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024. Paper presented at 38th Conference on Neural Information Processing Systems, NeurIPS 2024, Vancouver, Canada, Dec 9 2024 - Dec 15 2024. Neural information processing systems foundation
Outlier-Robust Distributionally Robust Optimization via Unbalanced Optimal Transport
2024 (English) In: Advances in Neural Information Processing Systems 37 - 38th Conference on Neural Information Processing Systems, NeurIPS 2024, Neural information processing systems foundation, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Distributionally Robust Optimization (DRO) accounts for uncertainty in data distributions by optimizing the model performance against the worst possible distribution within an ambiguity set. In this paper, we propose a DRO framework that relies on a new distance inspired by Unbalanced Optimal Transport (UOT). The proposed UOT distance employs a soft penalization term instead of hard constraints, enabling the construction of an ambiguity set that is more resilient to outliers. Under smoothness conditions, we establish strong duality of the proposed DRO problem. Moreover, we introduce a computationally efficient Lagrangian penalty formulation for which we show that strong duality also holds. Finally, we provide empirical results that demonstrate that our method offers improved robustness to outliers and is computationally less demanding for regression and classification tasks.

Place, publisher, year, edition, pages
Neural information processing systems foundation, 2024
National Category
Computer graphics and computer vision; Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-361955 (URN) 2-s2.0-105000478053 (Scopus ID)
Conference
38th Conference on Neural Information Processing Systems, NeurIPS 2024, Vancouver, Canada, Dec 9 2024 - Dec 15 2024
Note

QC 20250409

Available from: 2025-04-03 Created: 2025-04-03 Last updated: 2025-04-09. Bibliographically approved
Wang, Z., Shen, Y., Zavlanos, M. M. & Johansson, K. H. (2023). Convergence Analysis of the Best Response Algorithm for Time-Varying Games. In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023. Paper presented at 62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, Dec 13 2023 - Dec 15 2023 (pp. 1144-1149). Institute of Electrical and Electronics Engineers (IEEE)
Convergence Analysis of the Best Response Algorithm for Time-Varying Games
2023 (English) In: 2023 62nd IEEE Conference on Decision and Control, CDC 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 1144-1149. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies a class of strongly monotone games involving non-cooperative agents that optimize their own time-varying cost functions. We assume that the agents can observe other agents' historical actions and choose actions that best respond to other agents' previous actions; we call this a best response scheme. We start by analyzing the convergence rate of this best response scheme for standard time-invariant games. Specifically, we provide a sufficient condition on the strong monotonicity parameter of the time-invariant games under which the proposed best response algorithm achieves exponential convergence to the static Nash equilibrium. We further illustrate that this best response algorithm may oscillate when the proposed sufficient condition fails to hold, which indicates that this condition is tight. Next, we analyze this best response algorithm for time-varying games where the cost functions of each agent change over time. Under similar conditions as for time-invariant games, we show that the proposed best response algorithm stays asymptotically close to the evolving equilibrium. We do so by analyzing both the equilibrium tracking error and the dynamic regret. Numerical experiments on economic market problems are presented to validate our analysis.
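
A minimal illustration of the best response scheme for a time-invariant game (a two-player quadratic game of our own choosing, not the paper's setting): each player plays the exact minimizer of its cost given the opponent's previous action. With weak coupling the iteration converges geometrically; with strong coupling it can oscillate, which mirrors the tightness discussion above.

    # Player i minimizes f_i(x_i, x_j) = 0.5 * a[i] * x_i**2 + b * x_i * x_j + c[i] * x_i,
    # so its best response to the opponent's previous action is x_i = -(b * x_j + c[i]) / a[i].
    a, b, c = [2.0, 2.0], 0.8, [-1.0, -2.0]   # try b = 2.5 to see the iteration fail to converge

    x = [0.0, 0.0]
    for t in range(50):
        x_prev = list(x)
        for i in range(2):
            x[i] = -(b * x_prev[1 - i] + c[i]) / a[i]

    print("approximate Nash equilibrium:", [round(v, 3) for v in x])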

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
Proceedings of the IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-343745 (URN) 10.1109/CDC49753.2023.10383751 (DOI) 001166433800138 () 2-s2.0-85184817602 (Scopus ID)
Conference
62nd IEEE Conference on Decision and Control, CDC 2023, Singapore, Singapore, Dec 13 2023 - Dec 15 2023
Note

Part of ISBN 9798350301243

QC 20240222

Available from: 2024-02-22 Created: 2024-02-22 Last updated: 2024-03-26. Bibliographically approved
Wang, Z., Gao, Y., Wang, S., Zavlanos, M. M., Abate, A. & Johansson, K. H. (2023). Policy Evaluation in Distributional LQR. In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference, L4DC 2023. Paper presented at 5th Annual Conference on Learning for Dynamics and Control, L4DC 2023, Philadelphia, PA, United States of America, Jun 15 2023 - Jun 16 2023 (pp. 1245-1256). ML Research Press, 211
Policy Evaluation in Distributional LQR
2023 (English) In: Proceedings of the 5th Annual Learning for Dynamics and Control Conference, L4DC 2023, ML Research Press, 2023, Vol. 211, p. 1245-1256. Conference paper, Published paper (Refereed)
Abstract [en]

Distributional reinforcement learning (DRL) enhances the understanding of the effects of randomness in the environment by letting agents learn the distribution of a random return, rather than its expected value as in standard RL. At the same time, a main challenge is that policy evaluation in DRL typically relies on the representation of the return distribution, which needs to be carefully designed. In this paper, we address this challenge for a special class of DRL problems that rely on the discounted linear quadratic regulator (LQR) for control, advocating a new distributional approach to LQR, which we call distributional LQR. Specifically, we provide a closed-form expression for the distribution of the random return which, remarkably, is applicable to all exogenous disturbances on the dynamics, as long as they are independent and identically distributed (i.i.d.). While the proposed exact return distribution consists of infinitely many random variables, we show that this distribution can be approximated by a finite number of random variables, and the associated approximation error can be analytically bounded under mild assumptions. Using the approximate return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR using the conditional value at risk (CVaR) as a measure of risk. Numerical experiments are provided to illustrate our theoretical results.
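
For intuition, a brute-force Monte Carlo view of the object the paper characterizes analytically: the distribution of the truncated discounted return under a fixed gain for a scalar system of our own choosing, together with a CVaR evaluation of that distribution (the paper instead derives the return distribution in closed form and bounds the finite-approximation error).

    import numpy as np

    A, B, Q, R = 0.9, 1.0, 1.0, 0.1
    K, gamma, sigma_w = 0.5, 0.95, 0.2
    rng = np.random.default_rng(3)

    def sampled_return(x0=1.0, horizon=300):
        x, g = x0, 0.0
        for t in range(horizon):              # truncated discounted random return
            u = -K * x
            g += gamma**t * (Q * x * x + R * u * u)
            x = A * x + B * u + sigma_w * rng.standard_normal()
        return g

    returns = np.array([sampled_return() for _ in range(2000)])
    alpha = 0.95
    var = np.quantile(returns, alpha)
    cvar = returns[returns >= var].mean()
    print(f"mean return: {returns.mean():.2f}, CVaR at level {alpha}: {cvar:.2f}")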

Place, publisher, year, edition, pages
ML Research Press, 2023
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
Keywords
Distributional LQR, distributional RL, policy evaluation, risk-averse control
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-338027 (URN) 001221742900095 () 2-s2.0-85172892657 (Scopus ID)
Conference
5th Annual Conference on Learning for Dynamics and Control, L4DC 2023, Philadelphia, PA, United States of America, Jun 15 2023 - Jun 16 2023
Note

QC 20231013

Available from: 2023-10-13 Created: 2023-10-13 Last updated: 2024-09-05. Bibliographically approved
Wang, Z., Shen, Y., Bell, Z. I., Nivison, S., Zavlanos, M. M. & Johansson, K. H. (2022). A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games. In: 2022 IEEE 61st Conference on Decision and Control (CDC). Paper presented at IEEE 61st Conference on Decision and Control (CDC), Dec 06-09, 2022, Cancun, Mexico (pp. 5179-5184). Institute of Electrical and Electronics Engineers (IEEE)
A Zeroth-Order Momentum Method for Risk-Averse Online Convex Games
2022 (English) In: 2022 IEEE 61st Conference on Decision and Control (CDC), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 5179-5184. Conference paper, Published paper (Refereed)
Abstract [en]

We consider risk-averse learning in repeated unknown games where the goal of the agents is to minimize their individual risk of incurring significantly high cost. Specifically, the agents use the conditional value at risk (CVaR) as a risk measure and rely on bandit feedback in the form of the cost values of the selected actions at every episode to estimate their CVaR values and update their actions. A major challenge in using bandit feedback to estimate CVaR is that the agents can only access their own cost values, which, however, depend on the actions of all agents. To address this challenge, we propose a new risk-averse learning algorithm with momentum that utilizes the full historical information on the cost values. We show that this algorithm achieves sub-linear regret and matches the best known algorithms in the literature. We provide numerical experiments for a Cournot game that show that our method outperforms existing methods.
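
The momentum idea, in a generic form (our sketch; the paper's algorithm additionally builds its CVaR estimates from the full history of observed cost values rather than from the most recent batch alone): noisy zeroth-order gradient estimates are averaged with exponential weights before each projected update.

    import numpy as np

    def momentum_update(x, grad_estimate, m, beta=0.9, eta=0.05, radius=1.0):
        """One step: fold the new estimate into the momentum m, then step and project."""
        m = beta * m + (1.0 - beta) * grad_estimate   # exponentially weighted history
        x = x - eta * m
        norm = np.linalg.norm(x)
        if norm > radius:                             # projection onto a feasible ball
            x = x * (radius / norm)
        return x, m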

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE Conference on Decision and Control, ISSN 0743-1546
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326430 (URN) 10.1109/CDC51059.2022.9992630 (DOI) 000948128104054 () 2-s2.0-85147035163 (Scopus ID)
Conference
IEEE 61st Conference on Decision and Control (CDC), Dec 06-09, 2022, Cancun, Mexico
Note

QC 20230503

Available from: 2023-05-03 Created: 2023-05-03 Last updated: 2023-05-03. Bibliographically approved
Wang, Z., Gao, Y., Liu, Y., Wang, S. & Wu, L. (2022). Distributed dynamic event-triggered communication and control for multi-agent consensus: A hybrid system approach. Information Sciences, 618, 191-208
Distributed dynamic event-triggered communication and control for multi-agent consensus: A hybrid system approach
2022 (English) In: Information Sciences, ISSN 0020-0255, E-ISSN 1872-6291, Vol. 618, p. 191-208. Article in journal (Refereed), Published
Abstract [en]

This paper investigates output-based event-triggered communication and control for linear multi-agent consensus under a directed graph using a co-design method. Both the communication among the agents and the controller updates are governed by new event-triggering mechanisms, in order to reduce the use of network resources. To simultaneously guarantee asymptotic consensus and strong Zeno-freeness (strictly positive inter-event times), a novel distributed dynamic event-triggered protocol is proposed. Unlike most existing emulation-based approaches, in which the control gain is fixed in advance, a systematic co-design procedure is proposed to design the controller gain, the observer gain, and the event-triggering mechanisms together by solving a linear matrix inequality optimization problem. Within the resulting hybrid system framework, a hybrid model is established for the distributed closed-loop system and asymptotic consensus is shown to be achieved. Finally, a numerical example is presented to verify the proposed systematic design methodology.
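
For context, a generic dynamic event-triggering rule of the kind referred to here (the paper's exact mechanism, and the gains co-designed through the LMI, differ): agent i transmits at the times

    t^{i}_{k+1} = \inf\big\{ t > t^{i}_{k} \,:\, \|e_i(t)\|^2 \ge \sigma_i \|z_i(t)\|^2 + \theta_i \eta_i(t) \big\},
    \qquad \dot{\eta}_i(t) = -\lambda_i \eta_i(t) + \sigma_i \|z_i(t)\|^2 - \|e_i(t)\|^2,

where e_i is the error between the last broadcast value and the current one, z_i is a local consensus signal, and the internal variable \eta_i \ge 0 relaxes the static condition \|e_i\|^2 \le \sigma_i \|z_i\|^2, which is what helps exclude Zeno behavior.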

Place, publisher, year, edition, pages
Elsevier BV, 2022
Keywords
Distributed event-triggered control, Multi-agent systems, Directed communication graphs, Co-design approach
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-323348 (URN) 10.1016/j.ins.2022.11.005 (DOI) 000900845400012 () 2-s2.0-85141923184 (Scopus ID)
Note

QC 20230127

Available from: 2023-01-27 Created: 2023-01-27 Last updated: 2023-01-27. Bibliographically approved
Wang, Z., Shen, Y. & Zavlanos, M. M. (2022). Risk-Averse No-Regret Learning in Online Convex Games. In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022. Paper presented at 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022 (pp. 22999-23017). ML Research Press
Risk-Averse No-Regret Learning in Online Convex Games
2022 (English) In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022, ML Research Press, 2022, p. 22999-23017. Conference paper, Published paper (Refereed)
Abstract [en]

We consider an online stochastic game with risk-averse agents whose goal is to learn optimal decisions that minimize the risk of incurring significantly high costs. Specifically, we use the Conditional Value at Risk (CVaR) as a risk measure that the agents can estimate using bandit feedback in the form of the cost values of only their selected actions. Since the distributions of the cost functions depend on the actions of all agents, which are generally unobservable, these distributions are themselves unknown and, therefore, the CVaR values of the costs are difficult to compute. To address this challenge, we propose a new online risk-averse learning algorithm that relies on one-point zeroth-order estimation of the CVaR gradients computed using CVaR values that are estimated by appropriately sampling the cost functions. We show that this algorithm achieves sub-linear regret with high probability. We also propose two variants of this algorithm that improve performance. The first variant relies on a new sampling strategy that uses samples from the previous iteration to improve the estimation accuracy of the CVaR values. The second variant employs residual feedback that uses CVaR values from the previous iteration to reduce the variance of the CVaR gradient estimates. We theoretically analyze the convergence properties of these variants and illustrate their performance on an online market problem that we model as a Cournot game.
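
A sketch of the one-point zeroth-order construction in our own notation (not the paper's exact estimator): play a randomly perturbed action, estimate the CVaR of the resulting bandit costs, and scale the perturbation direction by that estimate.

    import numpy as np

    rng = np.random.default_rng(4)

    def empirical_cvar(costs, alpha=0.95):
        var = np.quantile(costs, alpha)
        tail = costs[costs >= var]
        return tail.mean() if tail.size else var

    def one_point_cvar_gradient(x, sample_costs, alpha=0.95, delta=0.1, n=200):
        """x: action (d,); sample_costs(y, n): n bandit cost samples from playing y."""
        u = rng.standard_normal(x.size)
        u /= np.linalg.norm(u)                # random direction on the unit sphere
        cvar_hat = empirical_cvar(sample_costs(x + delta * u, n), alpha)
        # One-point estimate of the gradient of the delta-smoothed CVaR objective;
        # typically high variance, which is what the sample-reuse and
        # residual-feedback variants described above are designed to mitigate.
        return (x.size / delta) * cvar_hat * u

    # Example: noisy quadratic cost J(x, xi) = ||x - 1||^2 + xi with xi ~ N(0, 1).
    sample_costs = lambda y, n: np.sum((y - 1.0) ** 2) + rng.standard_normal(n)
    print(one_point_cvar_gradient(np.zeros(2), sample_costs))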

Place, publisher, year, edition, pages
ML Research Press, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-335778 (URN) 000900130204008 () 2-s2.0-85138129689 (Scopus ID)
Conference
39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022
Note

QC 20230908

Available from: 2023-09-08 Created: 2023-09-08 Last updated: 2023-10-09. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-6464-492X
