The growing complexity of mobile networks has driven the need for automated optimization approaches, with Reinforcement Learning (RL) emerging as a promising data-driven technique for controlling network parameters. However, RL systems often operate as black boxes, lacking the interpretability and transparency that Mobile Network Operators (MNOs) and Artificial Intelligence (AI) engineers require to trust, monitor, and refine their behavior. This opacity poses significant challenges, particularly in the telecommunications domain, where alignment with operational objectives and reliable network performance are critical. Consequently, there is a pressing need for explainability methods that make RL decisions interpretable and understandable to stakeholders.
This thesis investigates the emerging field of Explainable Reinforcement Learning (XRL), focusing on its application to mobile network optimization. In the single-agent setting, we evaluate two state-of-the-art XRL techniques on the Remote Electrical Tilt (RET) optimization problem, in which the tilt of each antenna must be controlled to optimize network coverage and capacity. These methods address two distinct interpretability challenges in RL: (i) understanding the state-action mapping determined by an RL policy and (ii) explaining the long-term goal of an RL agent. The evaluations highlight both the potential and the limitations of existing XRL methods when applied to a simulated mobile network.
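For concreteness, the minimal Python sketch below caricatures the RET setting and the first interpretability challenge: a hand-written `policy` stands in for a learned state-action mapping, and `toy_ret_reward` for a coverage/capacity KPI. All names and numbers are illustrative assumptions, not the simulator or policies used in the thesis.

```python
import random

# Hypothetical discrete action space for RET: adjust the antenna tilt.
ACTIONS = {"down-tilt": -1.0, "no-change": 0.0, "up-tilt": +1.0}  # degrees

def toy_ret_reward(tilt_deg: float) -> float:
    """Toy stand-in for a coverage/capacity KPI: peaks at an optimal
    tilt (unknown to the agent) and degrades quadratically around it."""
    optimal_tilt = 6.0
    return -((tilt_deg - optimal_tilt) ** 2)

def policy(tilt_deg: float) -> str:
    """Hand-written stand-in for a learned RL policy: the state-action
    mapping that the first XRL technique aims to make interpretable."""
    if tilt_deg < 5.5:
        return "up-tilt"
    if tilt_deg > 6.5:
        return "down-tilt"
    return "no-change"

tilt = random.uniform(0.0, 12.0)
for step in range(10):
    action = policy(tilt)
    tilt += ACTIONS[action]
    print(f"step={step} action={action:>9} tilt={tilt:5.2f} "
          f"reward={toy_ret_reward(tilt):7.2f}")
```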
To address a significant gap in the literature on single-agent XRL, we devise a novel algorithm, Temporal Policy Decomposition (TPD), which explains RL actions by predicting their outcomes at upcoming time steps. By generating insights for individual time steps, the method gives a clear view of an agent's anticipated behavior from a given environment state. These time-aware explanations offer a comprehensive understanding of the decision-making process that accounts for the sequential nature of RL.
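The following minimal sketch suggests how such per-time-step explanations could be obtained with Monte Carlo rollouts, assuming access to a simulator. Here `env_step`, `toy_step`, and `toy_policy` are hypothetical stand-ins; the sketch illustrates the idea of decomposing an action's expected outcomes over time, not the TPD algorithm itself.

```python
import random
from typing import Callable, List, Tuple

State = float
Action = float

def tpd_rollout_estimates(
    env_step: Callable[[State, Action], Tuple[State, float]],
    policy: Callable[[State], Action],
    state: State,
    first_action: Action,
    horizon: int = 5,
    n_rollouts: int = 1000,
) -> List[float]:
    """Estimate the expected outcome at each of the next `horizon` steps,
    given that `first_action` is taken in `state` and the policy is
    followed afterwards: one explanation value per future time step."""
    totals = [0.0] * horizon
    for _ in range(n_rollouts):
        s, a = state, first_action
        for k in range(horizon):
            s, outcome = env_step(s, a)
            totals[k] += outcome
            a = policy(s)  # follow the policy after the explained action
    return [t / n_rollouts for t in totals]

# Toy usage: a drift-back-to-zero policy in a 1-D random-walk environment,
# where the "outcome" of interest is simply the next state.
def toy_step(s: State, a: Action) -> Tuple[State, float]:
    s_next = s + a + random.gauss(0.0, 0.1)
    return s_next, s_next

def toy_policy(s: State) -> Action:
    return -0.5 if s > 0 else 0.5

print([round(v, 2) for v in
       tpd_rollout_estimates(toy_step, toy_policy, state=0.0, first_action=1.0)])
```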
We then focus on multi-agent systems and develop a rollout-based algorithm to estimate Local Shapley Values (LSVs), quantifying individual agent contributions in specific states. This method reliably identifies agent contributions even in scenarios involving undertrained or suboptimal agents, making it a valuable tool for monitoring and diagnosing cooperative multi-agent systems.
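A minimal sketch of the underlying idea follows: Shapley values at a single state are estimated by sampling agent orderings and averaging each agent's marginal contribution to a coalition value. The `value_fn` and `toy_value` names are hypothetical stand-ins; in a rollout-based variant, the coalition value would itself be estimated by simulating the system with non-coalition agents following a baseline behavior.

```python
import random
from typing import Callable, Dict, FrozenSet, Sequence

def local_shapley_values(
    value_fn: Callable[[FrozenSet[str]], float],
    agents: Sequence[str],
    n_permutations: int = 1000,
) -> Dict[str, float]:
    """Estimate each agent's Shapley value at a single state by sampling
    random agent orderings and averaging marginal contributions to the
    coalition value."""
    shapley = {a: 0.0 for a in agents}
    for _ in range(n_permutations):
        order = list(agents)
        random.shuffle(order)
        coalition: FrozenSet[str] = frozenset()
        prev_value = value_fn(coalition)
        for agent in order:
            coalition = coalition | {agent}
            value = value_fn(coalition)
            shapley[agent] += value - prev_value
            prev_value = value
    return {a: v / n_permutations for a, v in shapley.items()}

# Toy usage: "a" contributes 2.0, "b" contributes 1.0, and cooperating
# adds a synergy of 0.5, which the Shapley attribution splits equally.
def toy_value(coalition: FrozenSet[str]) -> float:
    value = 0.0
    if "a" in coalition:
        value += 2.0
    if "b" in coalition:
        value += 1.0
    if {"a", "b"} <= coalition:
        value += 0.5
    return value

print(local_shapley_values(toy_value, ["a", "b"]))  # ~{'a': 2.25, 'b': 1.25}
```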
These contributions represent a step toward a holistic explainability framework for RL in mobile networks, combining single-agent and multi-agent perspectives. By addressing core interpretability challenges, this research equips MNOs and AI engineers with practical techniques to trust, monitor, debug, and refine RL models. Furthermore, it helps ensure readiness for potential regulatory requirements, contributing to the broader goal of advancing trustworthy AI in telecommunications.