In the era of data-driven decision-making, real-world applications often face uncertainties arising from noise, environmental shifts, and adversarial perturbations. These challenges can degrade model performance, lead to poor decisions, and introduce unforeseen risks. This thesis tackles these issues by developing robust decision-making frameworks for optimization, control, and games, with a particular focus on distributional robustness and risk-averse learning under uncertain data distributions. It consists of four parts.
In the first part, we consider outlier-robust distributionally robust optimization (DRO) problems, where the data distributions are subject to Wasserstein perturbations and outlier contamination. We propose a novel DRO framework leveraging a distance inspired by Unbalanced Optimal Transport (UOT). This UOT-based distance incorporates a soft penalization term in place of traditional hard constraints, enabling the construction of ambiguity sets that are more robust to outliers. Under appropriate smoothness conditions, we establish strong duality for the proposed DRO formulation. Additionally, we present a computationally efficient Lagrangian penalty formulation and show that strong duality also holds for it. Empirical results demonstrate that our method offers improved robustness to outliers while being computationally less demanding.
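As a schematic point of reference, in generic notation that need not match the thesis chapters, classical Wasserstein DRO optimizes the worst case over an ambiguity ball, whereas a penalized formulation of the kind described above moves the discrepancy into the objective:
\[
\min_{\theta}\; \sup_{Q:\, D(Q,\widehat{P}) \le \rho}\; \mathbb{E}_{Q}\big[\ell(\theta,\xi)\big]
\qquad \text{versus} \qquad
\min_{\theta}\; \sup_{Q}\; \Big\{ \mathbb{E}_{Q}\big[\ell(\theta,\xi)\big] - \lambda\, D(Q,\widehat{P}) \Big\},
\]
where $\widehat{P}$ is the empirical distribution, $\ell$ the loss, $\rho$ an ambiguity radius, $\lambda > 0$ a penalty weight, and $D$ the UOT-based distance; the soft marginal penalties inside $D$ are what limit the influence of outliers in $\widehat{P}$.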
In the second part, we focus on decision-dependent optimization problems, where the data distributions shift in response to decisions, affecting both the objective function and the linear constraints. We establish a sufficient condition for the existence of a constrained equilibrium point, at which the distributions remain invariant under retraining. We propose dual ascent and projected gradient descent algorithms, operating in the dual and primal spaces, respectively, each with theoretical convergence guarantees. Furthermore, we explore the relationship between the equilibrium point and the optimal point of the constrained decision-dependent optimization problem.
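In generic notation (the precise constraint model is specified in the corresponding chapter), such a constrained equilibrium point $\bar{x}$ can be sketched as a fixed point of retraining,
\[
\bar{x} \;\in\; \arg\min_{x}\; \Big\{ \mathbb{E}_{\xi \sim \mathcal{D}(\bar{x})}\big[ f(x,\xi) \big] \;:\; A x \le \mathbb{E}_{\xi \sim \mathcal{D}(\bar{x})}\big[ b(\xi) \big] \Big\},
\]
where $\mathcal{D}(x)$ denotes the decision-dependent distribution and both the objective and the linear constraint are evaluated under $\mathcal{D}(\bar{x})$; solving the problem induced by $\bar{x}$ returns $\bar{x}$ itself, so the distribution is invariant under retraining.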
In the third part, we study risk-averse learning in online convex games using conditional value-at-risk (CVaR) as the risk measure. For the zeroth-order feedback setting, where agents observe only the cost values of their selected actions, we propose risk-averse learning algorithms with sample reuse and variance reduction. For the first-order feedback setting, where agents obtain gradient information, we develop a first-order risk-averse learning algorithm based on value-at-risk estimates. Despite the bias in CVaR gradient estimates, we establish high-probability convergence guarantees for all proposed algorithms.
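For context, the risk measure used in this part admits the standard Rockafellar--Uryasev representation (written here with $\alpha \in (0,1)$ denoting the tail probability of the cost $X$),
\[
\mathrm{CVaR}_{\alpha}(X) \;=\; \min_{t \in \mathbb{R}} \; \Big\{ \, t + \tfrac{1}{\alpha}\, \mathbb{E}\big[ (X - t)_{+} \big] \Big\},
\]
whose minimizer is attained at the value-at-risk $\mathrm{VaR}_{\alpha}(X)$, which is one way to see why value-at-risk estimates are useful in CVaR-based learning, while the zeroth-order setting must work from sampled cost values alone.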
In the final part, we explore distributional reinforcement learning (DRL) in linear quadratic regulator (LQR) problems. A key challenge in DRL is the design of the distribution representation for policy evaluation. For discounted LQR control, we derive a closed-form expression for the random return and analyze its properties, including variance bounds, sensitivity, and finite approximation error. For unknown models, we introduce a model-free method to estimate the return distribution with sample complexity guarantees. We also extend these results to partially observable linear systems. Building on the learned return distribution, we propose a zeroth-order policy gradient algorithm for risk-averse LQR with CVaR as the risk measure.
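Schematically, and in generic notation, the random return studied here is the discounted accumulation of quadratic stage costs along a noisy trajectory,
\[
G^{K}(x_0) \;=\; \sum_{t=0}^{\infty} \gamma^{t} \big( x_t^{\top} Q\, x_t + u_t^{\top} R\, u_t \big),
\qquad x_{t+1} = A x_t + B u_t + w_t, \quad u_t = -K x_t,
\]
so that the randomness of $G^{K}(x_0)$ is inherited from the process noise $w_t$; the closed-form expression and the properties analyzed in this part concern the distribution of this quantity under a fixed linear policy $K$.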