Rollout-based Shapley Values for Explainable Cooperative Multi-Agent Reinforcement Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). Ericsson Research, Stockholm, Sweden. ORCID iD: 0000-0002-3230-6237
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). Ericsson Research, Stockholm, Sweden.
KTH, School of Industrial Engineering and Management (ITM), Engineering Design, Mechatronics and Embedded Control Systems. Ericsson Research, Stockholm, Sweden. ORCID iD: 0000-0002-6650-2789
KTH, School of Industrial Engineering and Management (ITM), Engineering Design, Mechatronics and Embedded Control Systems. Ericsson Research, Stockholm, Sweden. ORCID iD: 0000-0001-7448-3381
2024 (English). In: 2024 IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 227-233. Conference paper, Published paper (Refereed).
Abstract [en]

Credit assignment in cooperative Multi-Agent Reinforcement Learning (MARL) focuses on quantifying individual agent contributions toward achieving a shared objective. One widely adopted approach to compute these contributions is through the application of Shapley Values, a concept derived from game theory. Previous research in Explainable Reinforcement Learning (XRL) successfully computed Global Shapley Values (GSVs), albeit neglecting local explanations in specific situations. In contrast, another approach concentrated on learning Local Shapley Values (LSVs) during training, prioritizing sample efficiency over explainability. In this paper, we extend an existing method to generate local and global explanations in a model-agnostic manner, bridging the gap between these two approaches. We apply our proposed algorithm to two cooperative tasks: a predator-prey environment and an antenna tilt optimization problem in cellular networks. Our findings reveal that the LSVs offer valuable insights into the agents' behavior with a finer time-frame granularity, while their aggregation in GSVs enhances trust by potentially identifying suboptimality. Importantly, our approach surpasses the existing state-of-the-art methods in estimating LSVs, enhancing the accuracy of assessing individual agent contributions. This work represents a significant advancement in the field of XRL and provides a powerful tool for gaining deeper insights into agents' behavior in cooperative MARL systems.
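
To make the Shapley-value machinery described above concrete, here is a minimal sketch of permutation-based Monte Carlo estimation of a Local Shapley Value at a single state. This is not the paper's implementation: the `value_rollout` interface, which is assumed to average the returns of rollouts in which coalition members follow their learned policies while the remaining agents follow a baseline policy, is hypothetical.

```python
import random

def local_shapley_value(state, agent, agents, value_rollout, num_samples=200):
    """Monte Carlo estimate of one agent's Local Shapley Value (LSV) at `state`.

    Hypothetical interface: value_rollout(state, coalition) returns the mean
    return of simulator rollouts from `state` in which agents in `coalition`
    act with their learned policies while the remaining agents follow a
    baseline (e.g., random) policy.
    """
    total = 0.0
    for _ in range(num_samples):
        # Sample a random ordering of agents; the coalition is everyone that
        # precedes `agent` in it. Permutation sampling gives an unbiased
        # estimate of the Shapley value's weighted coalition average.
        order = random.sample(list(agents), len(agents))
        coalition = frozenset(order[: order.index(agent)])
        # Marginal contribution of `agent` when joining this coalition.
        total += value_rollout(state, coalition | {agent}) - value_rollout(state, coalition)
    return total / num_samples
```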

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 227-233
Keywords [en]
Explainable Reinforcement Learning, Multi-Agent Reinforcement Learning, Remote Electrical Tilt Optimization, Shapley Values
National Category
Computer Sciences; Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-353555
DOI: 10.1109/ICMLCN59089.2024.10624777
ISI: 001307813600039
Scopus ID: 2-s2.0-85202446671
OAI: oai:DiVA.org:kth-353555
DiVA, id: diva2:1899230
Conference
1st IEEE International Conference on Machine Learning for Communication and Networking, ICMLCN 2024, Stockholm, Sweden, May 5-8, 2024
Note

Part of ISBN 9798350343199

QC 20241111

Available from: 2024-09-19. Created: 2024-09-19. Last updated: 2025-01-29. Bibliographically approved.
In thesis
1. Explainable Reinforcement Learning for Mobile Network Optimization
2025 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

The growing complexity of mobile networks has driven the need for automated optimization approaches, with Reinforcement Learning (RL) emerging as a promising data-driven technique for controlling network parameters. However, RL systems often operate as black boxes, lacking the interpretability and transparency required by Mobile Network Operators (MNOs) and Artificial Intelligence (AI) engineers to trust, monitor, and refine their behavior. This lack of transparency poses significant challenges, particularly in the telecommunications domain, where ensuring alignment with operational objectives and maintaining reliable network performance is critical. Consequently, there is a pressing need for explainability methods that make RL decisions more interpretable and understandable for stakeholders.

This thesis investigates the emerging field of Explainable Reinforcement Learning (XRL), specifically focusing on its application to mobile network optimization. In the context of single-agent RL, we evaluate two state-of-the-art XRL techniques in the Remote Electrical Tilt (RET) optimization problem, where the tilt of each antenna needs to be controlled to optimize network coverage and capacity. These methods address two distinct interpretability challenges in RL: (i) understanding the state-action mapping determined by an RL policy and (ii) explaining the long-term goal of an RL agent. These evaluations highlight the potential and limitations of existing XRL methods when applied to a simulated mobile network.

To address a significant gap in the literature on single-agent XRL, we devise a novel algorithm, Temporal Policy Decomposition (TPD), which explains RL actions by predicting their outcomes in upcoming time steps. This method provides a clear view of an agent's anticipated behavior starting from a given environment state by generating insights for individual time steps. These time-aware explanations offer a comprehensive understanding of the decision-making process that accounts for the sequential nature of RL.
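
As an illustration of the idea, not the thesis's exact algorithm, the sketch below estimates the expected reward at each upcoming time step by averaging Monte Carlo rollouts of the policy from a given state; `make_env_at`, a helper that instantiates a simulator at an arbitrary state, is an assumption.

```python
import numpy as np

def per_step_outcomes(make_env_at, policy, state, horizon, num_rollouts=50):
    """Expected reward at each of the next `horizon` steps under `policy`.

    Hypothetical interface: make_env_at(state) returns a gym-style simulator
    initialized at `state`, whose step() returns (obs, reward, done, info).
    The thesis's TPD may rely on learned predictors rather than raw rollouts.
    """
    totals = np.zeros(horizon)
    for _ in range(num_rollouts):
        env = make_env_at(state)
        obs = state
        for k in range(horizon):
            obs, reward, done, _ = env.step(policy(obs))
            totals[k] += reward  # reward observed k+1 steps into the future
            if done:
                break
    return totals / num_rollouts  # one expected outcome per upcoming time step
```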

We then focus on multi-agent systems and develop a rollout-based algorithm to estimate Local Shapley Values (LSVs), quantifying individual agent contributions in specific states. This method reliably identifies agent contributions even in scenarios involving undertrained or suboptimal agents, making it a valuable tool for monitoring and diagnosing cooperative multi-agent systems.
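
The aggregation of local values into a global summary can be sketched in a few lines: averaging LSV estimates over a sample of visited states yields a per-agent global contribution. The `lsv` callable stands in for the rollout-based estimator sketched earlier; the interface is again an assumption.

```python
def global_shapley_values(lsv, states, agents):
    """Average Local Shapley Values over sampled states into global ones.

    Hypothetical interface: lsv(state, agent) returns an LSV estimate, e.g. a
    closure over the rollout-based estimator sketched earlier. A persistently
    near-zero or negative global value for an agent can flag undertraining or
    suboptimal behavior worth inspecting at the local level.
    """
    return {a: sum(lsv(s, a) for s in states) / len(states) for a in agents}
```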

These contributions represent a step toward a holistic explainability framework for RL in mobile networks, combining single-agent and multi-agent perspectives. By addressing core interpretability challenges, this research equips MNOs and AI engineers with practical techniques to trust, monitor, debug, and refine RL models. Furthermore, it helps ensure readiness for potential regulatory requirements, contributing to the broader goal of advancing trustworthy AI in telecommunications.

Abstract [sv]

The growing complexity of mobile networks has driven the need for automated optimization approaches, with Reinforcement Learning (RL) emerging as a promising data-driven technique for controlling network parameters. However, RL systems often operate as black boxes, lacking the interpretability and transparency that mobile network operators and AI engineers require in order to trust, monitor, and improve their behavior. This shortcoming poses significant challenges, particularly in the telecommunications domain, where ensuring alignment with operational objectives and maintaining reliable network performance is critical.

This thesis investigates the emerging field of Explainable Reinforcement Learning (XRL), with particular focus on its application to mobile network optimization. In the context of single-agent RL, we evaluate two state-of-the-art XRL techniques on the Remote Electrical Tilt (RET) optimization problem, where the tilt of each antenna must be controlled to optimize coverage and capacity. These methods address two distinct interpretability challenges in RL: (i) understanding the state-action mapping determined by an RL policy and (ii) explaining the long-term goal of an RL agent. These evaluations highlight the potential and limitations of existing XRL methods when applied to a simulated mobile network.

To address a significant gap in the literature on single-agent XRL, we develop a novel algorithm, Temporal Policy Decomposition (TPD), which explains RL actions by predicting their outcomes in upcoming time steps. This method provides a clear view of an agent's expected behavior from a given state by generating insights for individual time steps. These time-aware explanations offer a comprehensive understanding of the decision-making process that accounts for the sequential nature of RL.

We then focus on multi-agent systems and develop a rollout-based algorithm for estimating Local Shapley Values (LSVs), which quantify individual agent contributions in specific states. This method reliably identifies agent contributions even in scenarios involving undertrained or suboptimal agents, making it a valuable tool for monitoring and diagnosing cooperative multi-agent systems.

These contributions represent a step toward a holistic explainability framework for RL in mobile networks, combining single-agent and multi-agent perspectives. By addressing core interpretability challenges, this research equips MNOs and AI engineers with practical techniques to trust, monitor, debug, and refine RL models. It also helps ensure readiness for potential upcoming regulatory requirements, contributing to the broader goal of advancing trustworthy AI in telecommunications.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2025. p. x, 47
Series
TRITA-EECS-AVL ; 2025:17
Keywords
Artificial Intelligence, Machine Learning, Reinforcement Learning, Explainable Artificial Intelligence, Explainable Reinforcement Learning, Mobile Network Optimization, Telecommunications
National Category
Computer Sciences; Electrical Engineering, Electronic Engineering, Information Engineering; Telecommunications; Artificial Intelligence
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-358957 (URN)
978-91-8106-180-2 (ISBN)
Presentation
2025-02-21, https://kth-se.zoom.us/j/66674834407, Harry Nyquist, Malvinas väg 10, Stockholm, 15:00 (English)
Funder
Swedish Foundation for Strategic Research; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20250129

Available from: 2025-01-29. Created: 2025-01-29. Last updated: 2025-02-12. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Ruggeri, Franco; Emanuelsson, William; Terra, Ahmad; Inam, Rafia; Johansson, Karl H.
