Larsson Forsberg, Albin (ORCID iD: orcid.org/0009-0007-9146-0027)
Publications (7 of 7)
Larsson Forsberg, A., Lau, K., Nikou, A., Feljan, A. V. & Tumova, J. (2026). Diffusion Models for Constrained Planning with Probabilistic Risk-awareness Guarantees. In: Proceedings of the 18th International Conference on Agents and Artificial Intelligence. Paper presented at The 18th International Conference on Agents and Artificial Intelligence (ICAART), Marbella, Spain, March 5-7, 2026 (pp. 2350-2358). INSTICC
2026 (English). In: Proceedings of the 18th International Conference on Agents and Artificial Intelligence, INSTICC, 2026, p. 2350-2358. Conference paper, Published paper (Refereed)
Abstract [en]

Diffusion models have shown great potential in generating trajectory plans for agents in environments with unknown dynamics. However, such models provide no safety guarantees. In this work, we focus on risk-aware planning with respect to safety constraints and introduce a probabilistically risk-aware variant of Diffuser (PRA-Diffuser). The diffusion model initially learns a distribution over trajectories that may or may not be unsafe. We then fine-tune this model to reduce the probability of sampling such unsafe trajectories. We analyze the proposed solution and introduce a provable lower bound on the risk of safety violation, leveraging concentration inequalities for Conditional Value-at-Risk. Our approach can be applied to pre-trained models, potentially trained on datasets containing unsafe trajectories. Our empirical results demonstrate that our approach significantly reduces the number of unsafe trajectories generated by the diffusion model across multiple environments.
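The guarantee above leverages concentration inequalities for Conditional Value-at-Risk. As a rough illustration of the ingredients only (not the paper's actual theorem or constants), a sample-based CVaR estimate with a Hoeffding-style slack term might look like:

```python
import numpy as np

def empirical_cvar(costs, alpha=0.95):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of costs."""
    costs = np.sort(np.asarray(costs, dtype=float))
    k = int(np.ceil((1 - alpha) * len(costs)))  # size of the tail
    return costs[-k:].mean()

def cvar_upper_bound(costs, alpha=0.95, delta=0.05, cost_range=1.0):
    """Hoeffding-style one-sided correction: with probability at least
    1 - delta, the true CVaR lies below the empirical estimate plus
    this slack (assuming bounded i.i.d. costs; illustrative only)."""
    n = len(costs)
    slack = cost_range * np.sqrt(np.log(1 / delta) / (2 * n * (1 - alpha)))
    return empirical_cvar(costs, alpha) + slack
```

In such a scheme one would sample trajectory costs from the fine-tuned model and continue fine-tuning until the high-confidence bound falls below the risk budget.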

Place, publisher, year, edition, pages
INSTICC, 2026
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-379070 (URN) 10.5220/0014238100004052 (DOI)
Conference
The 18th International Conference on Agents and Artificial Intelligence (ICAART), Marbella, Spain, March 5-7, 2026
Note

Part of ISBN 978-989-758-796-2

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-09. Bibliographically approved
Larsson Forsberg, A., Nikou, A., Feljan, A. & Tumova, J. (2026). Learning Long-Horizon Multi-Agent Coordination from Temporal Logic Specifications. In: Proceedings of the 18th International Conference on Agents and Artificial Intelligence. Paper presented at The 18th International Conference on Agents and Artificial Intelligence (ICAART), Marbella, Spain, March 5-7, 2026 (pp. 70-79). INSTICC, 1
2026 (English). In: Proceedings of the 18th International Conference on Agents and Artificial Intelligence, INSTICC, 2026, Vol. 1, p. 70-79. Conference paper, Published paper (Refereed)
Abstract [en]

We study multi-agent reinforcement learning (MARL) under temporally extended Signal Temporal Logic (STL) objectives, which require reasoning over both long-horizon dynamics and inter-agent relations. We propose TD-MAT, a transformer-based architecture with multivariate positional encodings, causal temporal masking, and a decomposed reward based on arithmetic–geometric mean robustness with variance regularization. Experiments on coordination tasks ranging from unstructured multi-objective problems to strict temporal sequencing show that TD-MAT learns effective long-term behaviors and generalizes to heterogeneous agent settings. Ablation studies highlight the necessity of temporal masking, positional encodings, and reward decomposition, while comparisons to MAPPO, RMAPPO, and MAT reveal that transformers provide the greatest benefit on unstructured, long-horizon tasks.
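The decomposed reward builds on arithmetic–geometric mean (AGM) robustness. As a sketch of the general idea only (the paper's exact decomposition and variance regularization are not reproduced here), one common AGM semantics for a conjunction of sub-formulas is:

```python
import numpy as np

def agm_and(rhos):
    """AGM robustness for a conjunction of STL sub-formulas with
    robustness values rhos (one common formulation): if every
    sub-formula is satisfied (rho_i > 0), take the geometric mean of
    (1 + rho_i) minus one; otherwise average only the violating,
    non-positive terms. Unlike the plain min, this rewards improving
    *all* conjuncts, which smooths the learning signal."""
    rhos = np.asarray(rhos, dtype=float)
    if np.all(rhos > 0):
        return float(np.prod(1.0 + rhos) ** (1.0 / len(rhos)) - 1.0)
    return float(np.mean(np.minimum(rhos, 0.0)))
```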

Place, publisher, year, edition, pages
INSTICC, 2026
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-379069 (URN) 10.5220/0014329000004052 (DOI)
Conference
The 18th International Conference on Agents and Artificial Intelligence (ICAART), Marbella, Spain, March 5-7, 2026
Note

Part of ISBN 978-989-758-796-2

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-09. Bibliographically approved
Larsson Forsberg, A. (2026). Multi-Agent Learning Under Spatio-Temporal Constraints in Coordinated Communication Networks. (Doctoral dissertation). Stockholm: Kungliga Tekniska högskolan
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Modern cellular networks have grown more complex over the years, transitioning from sparse macro-cell deployments to ultra-dense, heterogeneous systems. In this thesis we consider, in particular, a radio resource management (RRM) problem called remote electrical tilt (RET). The objective of RET optimization is to tune antenna tilt parameters in the network so that radio resources are allocated where they are needed most. As cellular networks evolve toward 6G, we expect an unprecedented increase in the need for autonomous decision making in the networks, introducing new coordination challenges that are exacerbated by denser deployments. Traditional network management has relied on manual engineering and rule-based heuristics; it scales poorly and is therefore insufficient for the needs of the next generation. While Multi-Agent Reinforcement Learning (MARL) appears to be a promising tool for autonomously adapting the network, currently deployed solutions often struggle with the large scale of the problem. Additionally, they fail to provide formal guarantees and remain limited by myopic, step-wise reward structures that cannot capture the complex constraints communication service providers (CSPs) may impose on the network. The lack of these attributes holds back deployment in live networks beyond small-scale pilot studies.

This thesis proposes a series of approaches that aim to provide high-assurance autonomous control of network parameters. The contributions build progressively on each other, from spatial interference coordination to long-horizon, risk-aware planning that satisfies CSP network intents. First, we address myopic constraints by leveraging graph-based decomposition and coordination graphs to factorize the joint action space, enabling scalable constrained learning in dense urban environments. Recognizing that critical infrastructure demands reliability beyond mean performance, we also introduce a risk-aware constrained learning framework that uses Conditional Value-at-Risk to provide probabilistic reasoning over constraints in the network.

To bridge the gap between low-level control and high-level CSP intents, we transition from scalar rewards to formal specifications. We utilize Signal Temporal Logic (STL) and transformer-based architectures to satisfy complex intents, enabling agents to reason over long-horizon requirements. Finally, we move beyond traditional control policies toward generative planning of trajectory rollouts. By using probabilistic diffusion models, we aim to enable the generation of safe, high-quality plans that respect hard constraints with probabilistic guarantees.

The proposed methods are evaluated on high-fidelity simulators modeled after real-world urban topologies. The results demonstrate that by integrating structural coordination, formal logic, and generative modeling, it is possible to address many of the issues that plague contemporary autonomous network management. The policies that are obtained by these approaches are not only high-performing but also interpretable, safe, and aligned with the rigorous demands of next-generation telecommunications infrastructure.

Place, publisher, year, edition, pages
Stockholm: Kungliga Tekniska högskolan, 2026. p. 67
Series
TRITA-EECS-AVL ; 2026:31
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-379072 (URN) 978-91-8106-576-3 (ISBN)
Public defence
2026-05-07, https://kth-se.zoom.us/s/67709142389, F3 (Flodis), Lindstedtsvägen 26, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20260410

Available from: 2026-04-10 Created: 2026-04-09 Last updated: 2026-04-10. Bibliographically approved
Larsson Forsberg, A., Nikou, A., Feljan, A. V. & Tumova, J. (2026). Temporal Intent-Aware Multi-agent Learning for Network Optimization. In: Computer Safety, Reliability, and Security. SAFECOMP 2025 Workshops - CoC3CPS, DECSoS, SASSUR, SENSEI, SRToITS, and WAISE, 2025, Proceedings. Paper presented at Co-Design of Communication, Computing and Control in Cyber-Physical Systems, CoC3CPS 2025, held in conjunction with the 44th International Conference on Computer Safety, Reliability, and Security, SAFECOMP 2025, Stockholm, Sweden, September 9, 2025 (pp. 29-40). Springer Nature
2026 (English). In: Computer Safety, Reliability, and Security. SAFECOMP 2025 Workshops - CoC3CPS, DECSoS, SASSUR, SENSEI, SRToITS, and WAISE, 2025, Proceedings, Springer Nature, 2026, p. 29-40. Conference paper, Published paper (Refereed)
Abstract [en]

Cellular networks have grown in size and complexity in recent years. To meet increasing traffic demands, new approaches are needed to replace legacy rule-based controllers and network management systems. Among these, learning-based methods are appealing because they can discover control policies without relying on expert knowledge. Intent-based networking, which describes desired network behavior rather than specific configurations, introduces a new level of abstraction. However, satisfying network intents under temporal constraints remains an open challenge. In this paper, we present a reinforcement learning approach that leverages Signal Temporal Logic (STL) to quantitatively translate network intents into a reward signal. We combine this with a transformer-based neural network architecture to handle temporal dependencies and multi-agent coordination. We evaluate our method in a high-fidelity telecommunications simulator, demonstrating that it outperforms state-of-the-art baselines. Our experiments show an improvement in satisfying temporally dependent intents compared to prior methods.
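The quantitative semantics of STL that turn an intent into a reward signal can be sketched as follows. The signals, thresholds, and intent here are invented for illustration and are not taken from the paper:

```python
import numpy as np

# Quantitative STL semantics over a sampled signal: robustness is
# positive iff the intent holds, with magnitude giving the margin.
def always(margins):      # G phi: worst-case margin over the horizon
    return float(np.min(margins))

def eventually(margins):  # F phi: best-case margin over the horizon
    return float(np.max(margins))

# Hypothetical intent: "throughput stays above 5 Mbps, and latency
# eventually drops below 20 ms"; a conjunction takes the min of the
# sub-formula robustness values.
throughput = np.array([6.0, 7.5, 5.5, 6.2])   # Mbps, illustrative
latency = np.array([30.0, 25.0, 18.0, 22.0])  # ms, illustrative
reward = min(always(throughput - 5.0), eventually(20.0 - latency))  # 0.5
```

The scalar `reward` can then drive standard reinforcement learning while remaining interpretable in terms of the original intent.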

Place, publisher, year, edition, pages
Springer Nature, 2026
Keywords
Intent-driven control, Network optimization, Reinforcement learning, Temporal logic
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-370457 (URN) 10.1007/978-3-032-02018-5_3 (DOI) 2-s2.0-105014755551 (Scopus ID)
Conference
Co-Design of Communication, Computing and Control in Cyber-Physical Systems, CoC3CPS 2025 held in conjunction with the 44th International Conference on Computer Safety, Reliability, and Security, SAFECOMP 2025, Stockholm, Sweden, September 9, 2025
Note

Part of ISBN 9783032020178

QC 20250929

Available from: 2025-09-29 Created: 2025-09-29 Last updated: 2026-04-09. Bibliographically approved
Larsson Forsberg, A., Nikou, A., Feljan, A. V. & Tumova, J. (2023). Network Parameter Control in Cellular Networks through Graph-Based Multi-Agent Constrained Reinforcement Learning. In: 2023 IEEE 19th International Conference on Automation Science and Engineering, CASE 2023. Paper presented at 19th IEEE International Conference on Automation Science and Engineering, CASE 2023, Auckland, New Zealand, Aug 26 2023 - Aug 30 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE 19th International Conference on Automation Science and Engineering, CASE 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Cellular networks are growing in complexity at an increasing pace, and the geographical areas in which they are deployed are becoming denser. Traditional control methods fall short in providing a scalable and dynamic way of adapting the network to new conditions. Distributed multi-agent reinforcement learning successfully addresses the scalability problems seen in centralized approaches, but the question of how to achieve learning with constraint satisfaction in distributed systems remains open in the state of the art. In this work, we perform distributed multi-agent constrained reinforcement learning in order to learn a policy online while satisfying imposed constraints. We use a coordination graph to model the interactions between agents and decompose the global value function. A conservative safety critic is trained in parallel to evaluate the safety of proposed actions. Our method allows the critic and the value network to be trained separately and independently of each other, and hence offers flexibility in how and when to train the different models. The results are compared to a baseline using no safety critic. Simulations show that the agents succeed in learning a policy that satisfies the constraints while still maximizing the objective.
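The interplay of a coordination graph and a safety critic can be illustrated with a toy example. The edges, payoffs, and safety rule below are invented for illustration and are not the paper's learned models:

```python
import itertools

# Toy coordination graph for three agents: the global value decomposes
# into pairwise payoff terms on the edges, and a safety critic vetoes
# joint actions it predicts to be unsafe.
edges = {
    (0, 1): lambda a, b: float(a != b),            # neighbours avoid equal tilts
    (1, 2): lambda a, b: float(a == 0 and b == 0), # joint down-tilt bonus
}
actions = (0, 1)

def q_global(joint):
    """Decomposed global value: sum of pairwise edge payoffs."""
    return sum(q(joint[i], joint[j]) for (i, j), q in edges.items())

def is_safe(joint):
    """Stand-in for a learned conservative safety critic."""
    return sum(joint) <= 2  # e.g. cap the number of simultaneous up-tilts

# Select the best joint action among those the critic accepts.
best = max((j for j in itertools.product(actions, repeat=3) if is_safe(j)),
           key=q_global)
```

In practice the maximization over the factored value function would use message passing on the graph rather than brute-force enumeration, which is what makes the approach scale.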

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-350256 (URN) 10.1109/CASE56687.2023.10260368 (DOI) 2-s2.0-85174409786 (Scopus ID)
Conference
19th IEEE International Conference on Automation Science and Engineering, CASE 2023, Auckland, New Zealand, Aug 26 2023 - Aug 30 2023
Note

Part of ISBN 979-8-3503-2069-5

QC 20240710

Available from: 2024-07-10 Created: 2024-07-10 Last updated: 2026-04-09. Bibliographically approved
Larsson Forsberg, A., Tumova, J., Nikou, A. & Vulgarakis Feljan, A. (2023). Network Parameter Control in Cellular Networks through Graph-based Multi-agent Constrained Reinforcement Learning. Paper presented at 19th International Conference on Automation Science and Engineering, Auckland, New Zealand, 26-29 August 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Cellular networks are growing in complexity at an increasing pace, and the geographical areas in which they are deployed are becoming denser. Traditional control methods fall short in providing a scalable and dynamic way of adapting the network to new conditions. Distributed multi-agent reinforcement learning successfully addresses the scalability problems seen in centralized approaches, but the question of how to achieve learning with constraint satisfaction in distributed systems remains open in the state of the art. In this work, we perform distributed multi-agent constrained reinforcement learning in order to learn a policy online while satisfying imposed constraints. We use a coordination graph to model the interactions between agents and decompose the global value function. A conservative safety critic is trained in parallel to evaluate the safety of proposed actions. Our method allows the critic and the value network to be trained separately and independently of each other, and hence offers flexibility in how and when to train the different models. The results are compared to a baseline using no safety critic. Simulations show that the agents succeed in learning a policy that satisfies the constraints while still maximizing the objective.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-330277 (URN)
Conference
19th International Conference on Automation Science and Engineering, Auckland, New Zealand, 26 – 29 August 2023
Funder
Swedish Foundation for Strategic Research, ID20-0027
Note

QC 20230630

Available from: 2023-06-28 Created: 2023-06-28 Last updated: 2025-09-22. Bibliographically approved
Larsson Forsberg, A., Nikou, A., Feljan, A. V. & Tumova, J. Multi-agent risk-aware constrained learning for network parameter optimization.
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Mobile networks face increasing demands for throughput in highly dynamic and stochastic environments, posing significant challenges for effective parameter optimization. Traditional methods relying on precise dynamics models struggle to adapt to uncertainties, while existing reinforcement learning approaches often fail to balance safety and performance under constraints. In this work, we introduce a novel risk-aware constrained multi-agent RL framework, specifically applied to the Remote Electrical Tilt problem in mobile networks. Our method leverages a risk-aware critic based on risk metrics, enabling a dynamic trade-off between worst-case and expectation-based constraint satisfaction. By allowing fine-grained control over risk tolerance, the framework addresses the practical need to adapt policies to varying operational conditions. Experimental results demonstrate that the proposed approach achieves robust network optimization, ensuring safety through action filtering while maintaining high performance. This work provides a scalable foundation for constrained RL in dynamic, multi-agent environments, with potential applications beyond mobile networks, such as autonomous systems and energy grids. Additionally, we present an ablation study illustrating how action filtering enhances safety in uncertain scenarios.
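The action-filtering mechanism can be sketched as follows; the function names and the Monte Carlo cost model are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def cvar(samples, alpha):
    """Mean of the worst (1 - alpha) fraction of sampled costs."""
    s = np.sort(np.asarray(samples, dtype=float))
    k = max(1, int(np.ceil((1 - alpha) * len(s))))
    return s[-k:].mean()

def filter_actions(candidates, cost_model, budget, alpha=0.9, n_samples=200):
    """Keep only candidate actions whose estimated tail risk (CVaR of
    sampled constraint costs) stays within the safety budget. Lowering
    alpha moves the criterion from worst-case toward expectation-based
    constraint satisfaction."""
    rng = np.random.default_rng(0)
    return [a for a in candidates
            if cvar(cost_model(a, rng, n_samples), alpha) <= budget]
```

Here `cost_model(a, rng, n)` would return `n` Monte Carlo constraint-cost rollouts for action `a` from a learned risk-aware critic; the policy then only ever executes actions that pass the filter.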

National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-379071 (URN)
Note

Submitted to IEEE Transactions on Vehicular Technology, ISSN 0018-9545, EISSN 1939-9359

QC 20260408

Available from: 2026-04-08 Created: 2026-04-08 Last updated: 2026-04-09. Bibliographically approved