Explainable Reinforcement Learning for Mobile Network Optimization
Ruggeri, Franco
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-3230-6237
2025 (English). Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

The growing complexity of mobile networks has driven the need for automated optimization approaches, with Reinforcement Learning (RL) emerging as a promising data-driven technique for controlling network parameters. However, RL systems often operate as black boxes, lacking the interpretability and transparency required by Mobile Network Operators (MNOs) and Artificial Intelligence (AI) engineers to trust, monitor, and refine their behavior. This lack of transparency poses significant challenges, particularly in the telecommunications domain, where ensuring alignment with operational objectives and maintaining reliable network performance is critical. Consequently, there is a pressing need for explainability methods that make RL decisions more interpretable and understandable for stakeholders.

This thesis investigates the emerging field of Explainable Reinforcement Learning (XRL), specifically focusing on its application to mobile network optimization. In the context of single-agent RL, we evaluate two state-of-the-art XRL techniques in the Remote Electrical Tilt (RET) optimization problem, where the tilt of each antenna needs to be controlled to optimize network coverage and capacity. These methods address two distinct interpretability challenges in RL: (i) understanding the state-action mapping determined by an RL policy and (ii) explaining the long-term goal of an RL agent. These evaluations highlight the potential and limitations of existing XRL methods when applied to a simulated mobile network.

To address a significant gap in the literature on single-agent XRL, we devise a novel algorithm, Temporal Policy Decomposition (TPD), which explains RL actions by predicting their outcomes in upcoming time steps. This method provides a clear view of an agent's anticipated behavior starting from a given environment state by generating insights for individual time steps. These time-aware explanations offer a comprehensive understanding of the decision-making process that accounts for the sequential nature of RL.

We then focus on multi-agent systems and develop a rollout-based algorithm to estimate Local Shapley Values (LSVs), quantifying individual agent contributions in specific states. This method reliably identifies agent contributions even in scenarios involving undertrained or suboptimal agents, making it a valuable tool for monitoring and diagnosing cooperative multi-agent systems.

These contributions represent a step toward a holistic explainability framework for RL in mobile networks, combining single-agent and multi-agent perspectives. By addressing core interpretability challenges, this research equips MNOs and AI engineers with practical techniques to trust, monitor, debug, and refine RL models. Furthermore, it helps ensure readiness for potential regulatory requirements, contributing to the broader goal of advancing trustworthy AI in telecommunications.

Abstract [sv]

Den ökande komplexiteten hos mobila nätverk har drivit på behovet av automatiserade optimeringsmetoder, där Reinforcement Learning (RL) framstår som en lovande datadriven teknik för att kontrollera nätverksparametrar. RL-system fungerar dock ofta som svarta lådor som saknar den tolkningsbarhet och transparens som krävs av mobilnätsoperatörer och AI-utvecklare för att kunna lita på, övervaka och förbättra deras beteende. Denna brist utgör betydande utmaningar, särskilt inom telekommunikationsområdet, där det är kritiskt att säkerställa överensstämmelse med operativa mål och upprätthålla pålitlig nätverksprestanda.

Den här avhandlingen undersöker det framväxande området Explainable Reinforcement Learning (XRL), med särskilt fokus på dess tillämpning för mobilnätsoptimering. I samband med single-agent RL utvärderar vi två toppmoderna XRL-tekniker för optimeringsproblemet Remote Electrical Tilt (RET), där lutningen av varje antenn måste kontrolleras för att optimera täckning och kapacitet. Dessa metoder tar itu med två distinkta tolkbarhetsutmaningar i RL: (i) att förstå tillstånds-handlingsmappningen som bestäms av en RL-policy och (ii) att förklara det långsiktiga målet för en RL-agent. Dessa utvärderingar belyser potentialen och begränsningarna hos befintliga XRL-metoder när de tillämpas på ett simulerat mobilnät.

För att ta itu med en betydande lucka i litteraturen om single-agent XRL, utvecklar vi en ny algoritm, Temporal Policy Decomposition (TPD), som förklarar RL-handlingar genom att förutsäga deras resultat i kommande tidssteg. Denna metod ger en tydlig bild av en agents förväntade beteende från ett givet tillstånd genom att generera insikter för individuella tidssteg. Dessa tidsmedvetna förklaringar ger en omfattande förståelse för beslutsprocessen som tar hänsyn till RL:s sekventiella karaktär.

Vi fokuserar sedan på system med flera agenter och utvecklar en utrullningsbaserad algoritm för att uppskatta lokala Shapley-värden (LSV), som kvantifierar individuella agentbidrag i specifika tillstånd. Denna metod identifierar på ett tillförlitligt sätt agentbidrag även i scenarier som involverar undertränade eller suboptimala agenter, vilket gör den till ett värdefullt verktyg för att övervaka och diagnostisera kooperativa multiagentsystem.

Dessa bidrag representerar ett steg mot en holistisk förklaringsram för RL i mobilnät, som kombinerar enagent- och multiagentperspektiv. Genom att ta itu med centrala tolkningsutmaningar utrustar denna forskning MNO:er och AI-utvecklare med praktiska tekniker för att lita på, övervaka, felsöka och förbättra RL-modeller. Dessutom hjälper den till att säkerställa beredskap för potentiella framtida regulatoriska krav, vilket bidrar till det bredare målet att främja pålitlig AI inom telekommunikation.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2025, p. x, 47
Series
TRITA-EECS-AVL ; 2025:17
Keywords [en]
Artificial Intelligence, Machine Learning, Reinforcement Learning, Explainable Artificial Intelligence, Explainable Reinforcement Learning, Mobile Network Optimization, Telecommunications
National Category
Computer Sciences; Electrical Engineering, Electronic Engineering, Information Engineering; Telecommunications; Artificial Intelligence
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-358957; ISBN: 978-91-8106-180-2 (print); OAI: oai:DiVA.org:kth-358957; DiVA, id: diva2:1932434
Presentation
2025-02-21, https://kth-se.zoom.us/j/66674834407, Harry Nyquist, Malvinas väg 10, Stockholm, 15:00 (English)
Opponent
Supervisors
Funder
Swedish Foundation for Strategic Research; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250129

Available from: 2025-01-29. Created: 2025-01-29. Last updated: 2025-01-30. Bibliographically approved.
List of papers
1. Evaluation of Intrinsic Explainable Reinforcement Learning in Remote Electrical Tilt Optimization
2024 (English). In: Proceedings of 8th International Congress on Information and Communication Technology - ICICT 2023, Springer Nature, 2024, p. 835-854. Conference paper, Published paper (Refereed)
Abstract [en]

This paper empirically evaluates two intrinsic Explainable Reinforcement Learning (XRL) algorithms on the Remote Electrical Tilt (RET) optimization problem. In RET optimization, where the electrical downtilt of the antennas in a cellular network is controlled to optimize coverage and capacity, explanations are necessary to understand the reasons behind a specific adjustment. First, we formulate the RET problem in the reinforcement learning (RL) framework and describe how we apply Decomposed Reward Deep Q Network (drDQN) and Linear Model U-Tree (LMUT), which are two state-of-the-art XRL algorithms. Then, we train and test such agents in a realistic simulated network. Our results highlight both advantages and disadvantages of the algorithms. DrDQN provides intuitive contrastive local explanations for the agent’s decisions to adjust the downtilt of an antenna, while achieving the same performance as the original DQN algorithm. LMUT reaches high performance while employing a fully transparent linear model capable of generating both local and global explanations. On the other hand, drDQN adds a constraint on the reward design that might be problematic for the specification of the objective, whereas LMUT could generate misleading global feature importance and needs additional developments to provide more user-interpretable local explanations.
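To make the reward-decomposition idea behind drDQN concrete, the following is a minimal sketch of how per-component Q-values can drive both action selection and contrastive local explanations. The component names, array shapes, and toy numbers are illustrative assumptions, not the implementation evaluated in the paper.

```python
import numpy as np

# Hypothetical reward components for a RET-like task (illustrative only).
COMPONENTS = ["coverage", "capacity", "quality"]

def select_action(q_components: np.ndarray) -> int:
    """q_components has shape (n_components, n_actions); the agent acts
    greedily on the summed (total) Q-values, as in a standard DQN."""
    return int(np.argmax(q_components.sum(axis=0)))

def contrastive_explanation(q_components: np.ndarray, chosen: int, other: int) -> dict:
    """Per-component Q-value differences between the chosen action and an
    alternative: positive entries are the reasons favoring the chosen action."""
    delta = q_components[:, chosen] - q_components[:, other]
    return dict(zip(COMPONENTS, np.round(delta, 3)))

# Toy example: 3 reward components, 3 actions (e.g. downtilt, keep, uptilt).
q = np.array([[0.2, 0.5, 0.6],   # coverage head
              [0.1, 0.4, 0.9],   # capacity head
              [0.3, 0.3, 0.2]])  # quality head
a = select_action(q)                        # greedy action on the summed Q
print(a, contrastive_explanation(q, a, 0))  # why action a rather than action 0
```

In this toy case the chosen action wins mainly on the capacity component, which is the kind of contrastive, per-component statement the abstract refers to.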

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Artificial intelligence, Cellular networks, Explainable reinforcement learning, Reinforcement learning, Remote electrical tilt optimization
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-339274 (URN); 10.1007/978-981-99-3236-8_67 (DOI); 2-s2.0-85174736754 (Scopus ID)
Conference
8th International Congress on Information and Communication Technology, ICICT 2023, London, United Kingdom of Great Britain and Northern Ireland, Feb 20 2023 - Feb 23 2023
Note

Part of ISBN 9789819932351

QC 20231106

Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2025-01-29. Bibliographically approved.
2. Rollout-based Shapley Values for Explainable Cooperative Multi-Agent Reinforcement Learning
2024 (English). In: 2024 IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 227-233. Conference paper, Published paper (Refereed)
Abstract [en]

Credit assignment in cooperative Multi-Agent Reinforcement Learning (MARL) focuses on quantifying individual agent contributions toward achieving a shared objective. One widely adopted approach to compute these contributions is through the application of Shapley Values, a concept derived from game theory. Previous research in Explainable Reinforcement Learning (XRL) successfully computed Global Shapley Values (GSVs), albeit neglecting local explanations in specific situations. In contrast, another approach concentrated on learning Local Shapley Values (LSVs) during training, prioritizing sample efficiency over explainability. In this paper, we extend an existing method to generate local and global explanations in a model-agnostic manner, bridging the gap between these two approaches. We apply our proposed algorithm to two cooperative tasks: a predator-prey environment and an antenna tilt optimization problem in cellular networks. Our findings reveal that the LSVs offer valuable insights into the agents' behavior with a finer time-frame granularity, while their aggregation in GSVs enhances trust by potentially identifying suboptimality. Importantly, our approach surpasses the existing state-of-the-art methods in estimating LSVs, enhancing the accuracy of assessing individual agent contributions. This work represents a significant advancement in the field of XRL and provides a powerful tool for gaining deeper insights into agents' behavior in cooperative MARL systems.
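As a rough illustration of how rollout-based Local Shapley Values can be estimated, the sketch below samples random agent orderings and accumulates marginal contributions. The `value_rollout` interface, the baseline behaviour of agents outside the coalition, and the toy additive check are assumptions made for illustration, not the paper's exact algorithm.

```python
import random

def shapley_local(state, agents, value_rollout, n_samples=100):
    """Monte Carlo (permutation-sampling) estimate of Local Shapley Values
    for a single state. value_rollout(state, coalition) should return an
    estimate of the expected return when the agents in `coalition` follow
    their trained policies and the rest follow a baseline (e.g. no-op)."""
    phi = {i: 0.0 for i in agents}
    for _ in range(n_samples):
        order = random.sample(agents, len(agents))   # random agent ordering
        coalition, v_prev = [], value_rollout(state, tuple())
        for i in order:
            coalition.append(i)
            v_new = value_rollout(state, tuple(coalition))
            phi[i] += (v_new - v_prev) / n_samples   # marginal contribution of i
            v_prev = v_new
    return phi

# Sanity check on an additive toy "game": the estimate recovers each weight.
weights = {0: 1.0, 1: 2.0, 2: 0.5}
print(shapley_local(None, [0, 1, 2], lambda s, c: sum(weights[i] for i in c)))
```

In a multi-agent RL setting, `value_rollout` would be backed by environment rollouts from the given state, which is where the rollout-based nature of the method comes in.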

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Explainable Reinforcement Learning, Multi-Agent Reinforcement Learning, Remote Electrical Tilt Optimization, Shapley Values
National Category
Computer Sciences Control Engineering
Identifiers
urn:nbn:se:kth:diva-353555 (URN); 10.1109/ICMLCN59089.2024.10624777 (DOI); 001307813600039; 2-s2.0-85202446671 (Scopus ID)
Conference
1st IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2024, Stockholm, Sweden, May 5 2024 - May 8 2024
Note

Part of ISBN 9798350343199

QC 20241111

Available from: 2024-09-19. Created: 2024-09-19. Last updated: 2025-01-29. Bibliographically approved.
3. Explainable Reinforcement Learning via Temporal Policy Decomposition
(English). Manuscript (preprint) (Other academic)
Abstract [en]

We investigate the explainability of Reinforcement Learning (RL) policies from a temporal perspective, focusing on the sequence of future outcomes associated with individual actions. In RL, value functions compress information about rewards collected across multiple trajectories and over an infinite horizon, allowing a compact form of knowledge representation. However, this compression obscures the temporal details inherent in sequential decision-making, presenting a key challenge for interpretability. We present Temporal Policy Decomposition (TPD), a novel explainability approach that explains individual RL actions in terms of their Expected Future Outcome (EFO). These explanations decompose generalized value functions into a sequence of EFOs, one for each time step up to a prediction horizon of interest, revealing insights into when specific outcomes are expected to occur. We leverage fixed-horizon temporal difference learning to devise an off-policy method for learning EFOs for both optimal and suboptimal actions, enabling contrastive explanations consisting of EFOs for different state-action pairs. Our experiments demonstrate that TPD generates accurate explanations that (i) clarify the policy's future strategy and anticipated trajectory for a given action and (ii) improve understanding of the reward composition, facilitating fine-tuning of the reward function to align with human expectations. 
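A tabular sketch of the fixed-horizon temporal-difference idea described above may help: `efo[h, s, a]` approximates the expected outcome observed exactly h+1 steps after taking action a in state s and then following the policy. The tabular setting, the use of the scalar reward as the outcome, and the function names are assumptions for illustration; the paper treats general outcomes, off-policy learning, and function approximation.

```python
import numpy as np

def learn_efos(transitions, policy, n_states, n_actions, horizon,
               alpha=0.1, epochs=200):
    """Fixed-horizon TD learning of Expected Future Outcomes (EFOs).

    transitions: list of (s, a, r, s_next) tuples gathered from the environment.
    policy:      array mapping each state to the action the policy would take.
    Returns efo with shape (horizon, n_states, n_actions).
    """
    efo = np.zeros((horizon, n_states, n_actions))
    for _ in range(epochs):
        for (s, a, r, s_next) in transitions:
            a_next = policy[s_next]
            # Horizon 1: the outcome one step ahead is the observed reward.
            efo[0, s, a] += alpha * (r - efo[0, s, a])
            # Horizon h+1: bootstrap from the h-step estimate at the next state,
            # so outcomes are never aggregated across time steps.
            for h in range(1, horizon):
                target = efo[h - 1, s_next, a_next]
                efo[h, s, a] += alpha * (target - efo[h, s, a])
    return efo
```

Stacking `efo[:, s, a]` over the horizon then gives the time-indexed explanation for action a in state s, and comparing it with `efo[:, s, a']` for another action yields the contrastive view mentioned in the abstract.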

Keywords
Explainable Reinforcement Learning, Explainable Artificial Intelligence, Reinforcement Learning, Artificial Intelligence, Explainability
National Category
Computer Sciences Control Engineering
Identifiers
urn:nbn:se:kth:diva-358955 (URN); 10.48550/arXiv.2501.03902 (DOI)
Funder
Swedish Foundation for Strategic Research
Note

QC 20250129

Available from: 2025-01-24. Created: 2025-01-24. Last updated: 2025-01-29. Bibliographically approved.

Open Access in DiVA

fulltext: FULLTEXT01.pdf (application/pdf, 2048 kB)
