KTH Publications (DiVA)

Publications (10 of 14)
Vannella, F. (2023). Bandit Methods for Network Optimization: Safety, Exploration, and Coordination. (Doctoral dissertation). Stockholm, Sweden: Kungliga Tekniska högskolan
Bandit Methods for Network Optimization: Safety, Exploration, and Coordination
2023 (English)Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Banditmetoder för Nätverksoptimering: Säkerhet, Utforskning och Koordinering
Abstract [en]

The increasing complexity of modern mobile networks poses unprecedented challenges to their optimization. Mobile Network Operators (MNOs) need to control a large number of network parameters to satisfy the users' demands. The antenna tilt is an important example of a controllable network parameter with significant effects on network coverage and capacity. Recently, sequential learning algorithms have emerged as promising tools for controlling network parameters efficiently and reducing operational expenses. Bandits are a class of sequential learning methods in which an agent interacts with an environment to learn an optimal policy by selecting actions and receiving rewards. However, these methods come with their own challenges with respect to safety, exploration, and coordination. In this thesis, we investigate the antenna tilt optimization problem using bandit methods with a focus on these challenges.

We investigate the safety aspect using offline learning methods in Contextual Bandits (CBs), where the goal is to learn an improved target policy solely from offline data collected by a logging policy. Based on these data, the target policy is derived by minimizing its counterfactual estimated risk. We learn and evaluate several antenna tilt policies using real-world network data, showing that offline learning approaches can safely learn improved tilt control policies.
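
To make the counterfactual risk minimization step concrete, the following is a minimal sketch in Python. The logged-data layout, the linear softmax policy class, and the random-search optimizer are illustrative assumptions, not the estimators or training procedure used in the thesis.

```python
# Minimal sketch of counterfactual risk minimization from logged bandit data.
# Assumed (not from the thesis): logs are tuples (x, a, p, r) with context x,
# logged action a, logging propensity p = pi_0(a | x), and observed reward r.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data: 1000 rounds, 5-dim contexts, 3 tilt actions.
n, d, K = 1000, 5, 3
X = rng.normal(size=(n, d))
A = rng.integers(K, size=n)          # actions chosen by the logging policy
P = np.full(n, 1.0 / K)              # uniform logging propensities
R = rng.normal(size=n)               # observed rewards

def softmax_policy(theta, x):
    """pi_theta(. | x) for a linear softmax policy; theta has shape (K, d)."""
    logits = theta @ x
    z = np.exp(logits - logits.max())
    return z / z.sum()

def ips_risk(theta):
    """Inverse-propensity-scoring estimate of the negative value of pi_theta."""
    w = np.array([softmax_policy(theta, X[i])[A[i]] / P[i] for i in range(n)])
    return -np.mean(w * R)           # minimizing this maximizes estimated reward

# Crude random search stands in for the gradient-based learning in the papers.
best = min((rng.normal(size=(K, d)) for _ in range(200)), key=ips_risk)
print("estimated value of learned policy:", -ips_risk(best))
```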

We then focus on the exploration aspect by investigating the Best Policy Identification (BPI) problem in Contextual Linear Bandits (CLBs), in which the reward has a convenient linear structure. The goal is to identify an optimal policy using the least possible amount of data. We derive lower bounds on the number of samples required by any algorithm returning an approximately optimal policy. We devise algorithms learning optimal tilt control policies from existing data (passive learning) or from data actively generated by the algorithm (active learning). We demonstrate experimentally that our algorithms produce optimal tilt control policies using fewer samples than rule-based algorithms.
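
For orientation, sample-complexity lower bounds of this kind typically follow the change-of-measure template from the pure-exploration literature. In its classical, non-contextual form (shown here as a stylized reference point, not the thesis' exact statement), any δ-correct algorithm satisfies

$$\mathbb{E}_{\nu}[\tau_\delta] \;\ge\; T^*(\nu)\,\mathrm{kl}(\delta,\,1-\delta), \qquad T^*(\nu)^{-1} \;=\; \sup_{w \in \Sigma_K} \; \inf_{\lambda \in \mathrm{Alt}(\nu)} \; \sum_{a=1}^{K} w_a\, \mathrm{KL}(\nu_a, \lambda_a),$$

where τ_δ is the algorithm's stopping time, Σ_K the simplex over arms, Alt(ν) the set of instances whose optimal arm differs from that of ν, and kl the binary relative entropy. The maximizing w gives the optimal sampling proportions that matching algorithms track.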

We explore the coordination aspect in a multi-agent bandit setting where the reward of each agent depends on the actions of other agents. We investigate both Best Arm Identification (BAI) and Regret Minimization (RM): in BAI, the objective is to find an optimal global action with minimal sample complexity, while in RM the objective is to maximize the cumulative reward over rounds. We derive lower bounds on the sample complexity and the regret, which characterize the corresponding optimal sampling strategy. Unfortunately, these bounds are obtained by solving combinatorial optimization problems that are computationally hard and cannot be leveraged in the design of efficient algorithms. Inspired by Mean Field (MF) methods, we devise a family of approximations to these problems. By leveraging these approximations, we devise algorithms for BAI and RM that trade off the achievable statistical and computational complexity. We apply these algorithms to solve the antenna tilt optimization problem, showcasing the scalability and statistical efficiency of the proposed methods.
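
The coupled-reward structure can be pictured as a factor graph over agents. The sketch below (with illustrative factor scopes and table sizes) shows why exact instance-optimal computations must enumerate all K^N global actions, while the factored quantities that mean-field style approximations operate on scale only with the per-factor tables.

```python
# Sketch of a factored multi-agent bandit reward, illustrating why the exact
# lower-bound programs blow up and what a factored computation avoids.
# Factor scopes and sizes are illustrative assumptions.
import itertools
import numpy as np

rng = np.random.default_rng(1)

N, K = 6, 4                      # 6 agents, 4 actions each: 4**6 = 4096 global actions
factors = [(0, 1), (1, 2, 3), (3, 4), (4, 5)]   # rho = 4 factors, max degree d = 3
tables = [rng.normal(size=(K,) * len(s)) for s in factors]

def global_reward(a):
    """Reward of a global action = sum of local factor rewards."""
    return sum(t[tuple(a[i] for i in s)] for t, s in zip(tables, factors))

# Exact approach: enumerate all K**N global actions (exponential in N).
exact_best = max(itertools.product(range(K), repeat=N), key=global_reward)

# Factored quantities only require rho * K**d table entries, which is the
# scaling the mean-field approximations in the thesis exploit.
print("global actions:", K**N, "vs factor entries:", sum(t.size for t in tables))
print("best global action:", exact_best)
```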

Abstract [sv]

The increasing complexity of modern mobile networks creates unprecedented new challenges. Mobile network operators need to control a large number of network parameters to satisfy the users' demands. The antenna tilt is an important example of a controllable network parameter with significant effects on network coverage and capacity. Sequential learning algorithms have emerged as promising tools for controlling network parameters and reducing operational expenses. Bandit methods are sequential learning methods in which an agent interacts with an environment to learn an optimal control policy with respect to a given objective. However, these methods bring their own challenges regarding safety, exploration, and coordination. In this thesis, we study the antenna tilt optimization problem using bandit methods with a focus on these challenges.

We first investigate the safety aspect using offline off-policy learning methods in Contextual Bandits (CBs). The goal is to learn an improved target policy solely from offline data collected by a logging policy. Based on these data, a target policy is derived by minimizing its estimated risk. We learn and evaluate several antenna tilt policies using real-world network data and show that these methods can safely learn improved tilt control policies.

We then focus on the exploration aspect in a Best Policy Identification (BPI) setting in Contextual Linear Bandits (CLBs), where the reward has a convenient linear structure. The goal is to identify an optimal policy using as little data as possible. We derive lower bounds on the number of samples required by any algorithm that returns an approximately optimal policy. We develop algorithms that learn optimal tilt control policies from existing data (passive learning) or from data generated by the algorithm (active learning). We show experimentally that our algorithms produce optimal antenna tilt control policies using fewer samples than rule-based algorithms.

We explore the coordination aspect by investigating the antenna tilt optimization problem in a multi-agent bandit setting with a coupled reward structure. We investigate both Best Arm Identification (BAI) and Regret Minimization (RM). In BAI, the objective is to find an optimal global action with minimal sample complexity, whereas in RM the objective is to maximize the cumulative reward over rounds. We derive instance-specific lower bounds on the sample complexity and the regret, which characterize the corresponding optimal sampling strategy. Unfortunately, these bounds are obtained by solving combinatorial optimization problems that are computationally hard and cannot be exploited in the design of efficient algorithms. Inspired by Mean Field (MF) approximation methods, we therefore develop a family of approximations to these problems and explore the trade-off between the achievable statistical and computational complexity. We develop algorithms for BAI and RM whose sample complexity and regret provably match our approximate lower bounds, and apply these algorithms to solve the antenna tilt optimization problem.

Place, publisher, year, edition, pages
Stockholm, Sweden: Kungliga Tekniska högskolan, 2023. p. xiv, 47
Series
TRITA-EECS-AVL ; 2023:66
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-337535 (URN)978-91-8040-716-8 (ISBN)
Public defence
2023-10-30, Kollegiesalen, Brinellvägen 8, Stockholm, 09:00 (English)
Opponent
Supervisors
Note

QC 20231009

Available from: 2023-10-09 Created: 2023-10-05 Last updated: 2023-10-13. Bibliographically approved.
Vannella, F., Proutiere, A. & Jeong, J. (2023). Best Arm Identification in Multi-Agent Multi-Armed Bandits. In: Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett (Ed.), Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA. Paper presented at International Conference on Machine Learning, 23-29 July 2023, Honolulu, Hawaii, USA (pp. 34875-34907). ML Research Press, Vol. 202
Best Arm Identification in Multi-Agent Multi-Armed Bandits
2023 (English)In: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA / [ed] Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, ML Research Press, 2023, Vol. 202, p. 34875-34907. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. The objective is to find an optimal global action with a prescribed level of confidence and minimal sample complexity. We derive a tight instance-specific lower bound of the sample complexity and characterize the corresponding optimal sampling strategy. Unfortunately, this bound is obtained by solving a combinatorial optimization problem with a number of variables and constraints exponentially growing with the number of agents. We leverage Mean Field (MF) techniques to obtain, in a computationally efficient manner, an approximation of the lower bound. The approximation scales at most as ρK^d (where ρ, K, and d denote the number of factors in the graph, the number of possible actions per agent, and the maximal degree of the factor graph). We devise MF-TaS (Mean-Field-Track-and-Stop), an algorithm whose sample complexity provably matches our approximated lower bound. We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).
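
For readers unfamiliar with the Track-and-Stop template that MF-TaS builds on, here is a simplified skeleton for Gaussian rewards. The allocation rule and the stopping threshold are stylized placeholders; the paper's mean-field allocation oracle and calibrated threshold replace both.

```python
# Skeleton of a generic Track-and-Stop loop for unit-variance Gaussian bandits.
# MF-TaS is a factored, mean-field variant; the allocation and threshold below
# are simplified placeholders, not the paper's construction.
import numpy as np

def track_and_stop(sample, K, delta, horizon=100_000):
    counts = np.zeros(K)
    sums = np.zeros(K)
    for a in range(K):                       # initialize: pull each arm once
        sums[a] += sample(a); counts[a] += 1
    for t in range(K, horizon):
        mu = sums / counts
        # Placeholder for the optimal-allocation oracle w*(mu): pull the
        # least-sampled arm while counts lag sqrt(t) (forced exploration),
        # otherwise pull the empirical best.
        a = int(np.argmin(counts)) if counts.min() < np.sqrt(t) else int(np.argmax(mu))
        sums[a] += sample(a); counts[a] += 1
        # Simplified GLR stopping rule with a stylized threshold beta(t, delta).
        best = int(np.argmax(mu))
        gaps = mu[best] - mu
        glr = np.min([counts[best] * counts[b] / (counts[best] + counts[b])
                      * gaps[b] ** 2 / 2 for b in range(K) if b != best])
        if glr > np.log((1 + np.log(t)) / delta):
            return best, int(counts.sum())   # recommend arm, report samples used
    return int(np.argmax(sums / counts)), horizon

means = [0.2, 0.5, 0.4]
arm, used = track_and_stop(lambda a: np.random.normal(means[a]), 3, delta=0.05)
print("recommended arm:", arm, "samples used:", used)
```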

Place, publisher, year, edition, pages
ML Research Press, 2023
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-336615 (URN)2-s2.0-85174397882 (Scopus ID)
Conference
International Conference on Machine Learning, 23-29 July 2023, Honolulu, Hawaii, USA
Note

QC 20230915

Available from: 2023-09-15 Created: 2023-09-15 Last updated: 2024-07-08. Bibliographically approved.
Isaksson, M., Vannella, F., Sandberg, D. & Coster, R. (2023). mmWave Beam Selection in Analog Beamforming Using Personalized Federated Learning. In: Proceedings - 2023 IEEE Future Networks World Forum: Future Networks: Imagining the Network of the Future, FNWF 2023. Paper presented at 6th IEEE Future Networks World Forum, FNWF 2023, Baltimore, United States of America, Nov 13 2023 - Nov 15 2023. Institute of Electrical and Electronics Engineers Inc.
mmWave Beam Selection in Analog Beamforming Using Personalized Federated Learning
2023 (English)In: Proceedings - 2023 IEEE Future Networks World Forum: Future Networks: Imagining the Network of the Future, FNWF 2023, Institute of Electrical and Electronics Engineers Inc., 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Using analog beamforming in mmWave frequency bands, we can focus the energy towards a receiver to achieve high throughput. However, this requires the network to quickly find the best downlink beam configuration in the face of non-IID data. We propose a personalized Federated Learning (FL) method to address this challenge, in which we learn a mapping between uplink Sub-6GHz channel estimates and the best downlink beam in heterogeneous scenarios with non-IID characteristics. We also devise FedLion, an FL implementation of the Lion optimization algorithm. Our approach reduces the signalling overhead and provides superior performance: up to 33.6% higher accuracy than a single FL model and 6% higher than a local model.
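
The Lion update itself is public (Chen et al., 2023); how FedLion couples it with personalized federated averaging is specific to the paper. The sketch below therefore shows a plain Lion step inside a hypothetical FedAvg round, as an assumption-laden illustration only.

```python
# Sketch of a Lion update inside a federated-averaging round. The Lion step
# follows the published optimizer; the surrounding federated loop is a
# hypothetical illustration and omits the paper's personalization.
import numpy as np

def lion_step(theta, m, grad, lr=1e-3, beta1=0.9, beta2=0.99, wd=0.0):
    """One Lion update: sign of an interpolated momentum, decoupled weight decay."""
    update = np.sign(beta1 * m + (1 - beta1) * grad)
    theta = theta - lr * (update + wd * theta)
    m = beta2 * m + (1 - beta2) * grad
    return theta, m

def federated_round(global_theta, client_grads, local_steps=5):
    """Hypothetical FedAvg round: each client runs Lion locally, server averages."""
    client_models = []
    for grads in client_grads:               # grads: one gradient per local step
        theta, m = global_theta.copy(), np.zeros_like(global_theta)
        for g in grads[:local_steps]:
            theta, m = lion_step(theta, m, g)
        client_models.append(theta)
    return np.mean(client_models, axis=0)    # simple average; no personalization

rng = np.random.default_rng(2)
theta0 = rng.normal(size=8)
grads = [[rng.normal(size=8) for _ in range(5)] for _ in range(3)]  # 3 clients
print("updated global model:", federated_round(theta0, grads))
```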

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2023
Keywords
beam selection, beamforming, distributed learning, federated learning
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-347321 (URN)10.1109/FNWF58287.2023.10520606 (DOI)001229556600119 ()2-s2.0-85194158189 (Scopus ID)
Conference
6th IEEE Future Networks World Forum, FNWF 2023, Baltimore, United States of America, Nov 13 2023 - Nov 15 2023
Note

QC 20240612

Part of ISBN 979-835032458-7

Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2024-10-11. Bibliographically approved.
Vannella, F., Jeong, J. & Proutiere, A. (2023). Off-Policy Learning in Contextual Bandits for Remote Electrical Tilt Optimization. IEEE Transactions on Vehicular Technology, 72(1), 546-556
Off-Policy Learning in Contextual Bandits for Remote Electrical Tilt Optimization
2023 (English)In: IEEE Transactions on Vehicular Technology, ISSN 0018-9545, E-ISSN 1939-9359, Vol. 72, no 1, p. 546-556. Article in journal (Refereed) Published
Abstract [en]

We investigate the problem of Remote Electrical Tilt (RET) optimization using off-policy learning techniques devised for Contextual Bandits (CBs). The goal in RET optimization is to control the vertical tilt angle of antennas at base stations to optimize key performance indicators representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy contextual bandit framework. We propose CB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.
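
As a pocket illustration of the off-policy machinery involved, the sketch below contrasts the plain inverse-propensity-scoring (IPS) estimator with its self-normalized variant on synthetic logged tilt data; the data fields and the target policy are invented for the example.

```python
# Sketch of off-policy *evaluation* on logged tilt data: plain IPS versus the
# self-normalized variant, which is less sensitive to propensity variance.
import numpy as np

rng = np.random.default_rng(3)
n, K = 5000, 3                               # 3 tilt updates: down / keep / up
A = rng.integers(K, size=n)                  # logged actions
P = np.full(n, 1.0 / K)                      # logging propensities pi_0(a|x)
R = rng.normal(0.1 * (A == 2), 1.0)          # logged rewards (action 2 is best)

def pi_target(a):
    """Hypothetical target policy: mostly 'tilt up', with some smoothing."""
    return 0.8 if a == 2 else 0.1

w = np.array([pi_target(a) for a in A]) / P  # importance weights
ips = np.mean(w * R)                         # unbiased, high variance
snips = np.sum(w * R) / np.sum(w)            # self-normalized, lower variance
print(f"IPS: {ips:.3f}  SNIPS: {snips:.3f}")
```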

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Optimization, Antennas, Cellular networks, Fuzzy logic, Safety, Quality of service, Analytical models, Contextual bandits, off-policy evaluation, off-policy learning, remote electrical tilt optimization
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-326569 (URN)10.1109/TVT.2022.3202041 (DOI)000966751100001 ()2-s2.0-85137611705 (Scopus ID)
Note

QC 20230508

Available from: 2023-05-08 Created: 2023-05-08 Last updated: 2023-10-05. Bibliographically approved.
Vannella, F. (2023). Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits. Paper presented at Neural Information Processing Systems (NeurIPS).
Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
2023 (English)Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-337463 (URN)
Conference
Neural Information Processing Systems (NeurIPS)
Note

QC 20231004

Available from: 2023-10-03 Created: 2023-10-03 Last updated: 2023-10-06. Bibliographically approved.
Vannella, F., Proutiere, A. & Jeong, J. (2023). Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits. In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023. Paper presented at 37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023. Neural Information Processing Systems Foundation, 36
Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
2023 (English)In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023, Neural Information Processing Systems Foundation, 2023, Vol. 36. Conference paper, Published paper (Refereed)
Abstract [en]

We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored. This bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem whose numbers of variables and constraints grow exponentially with the number of agents, and cannot be exploited in the design of efficient algorithms. Inspired by Mean Field approximation techniques used in graphical models, we provide simple upper bounds of the regret lower bound. The corresponding optimization problems have a reduced number of variables and constraints. By tuning the latter, we may explore the trade-off between the achievable regret and the complexity of computing the corresponding exploration process. We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the approximated lower bounds. The regret and computational complexity of ESM are assessed numerically, using both synthetic and real-world experiments in radio communications networks.
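
A stylized rendering of such an instance-specific regret lower bound, following the classical Graves-Lai characterization of which the MAMAB bound is a factored instance (the paper's exact constraint set differs), is

$$\liminf_{T\to\infty}\frac{R(T)}{\log T} \;\ge\; C(\mu), \qquad C(\mu) = \min_{\eta \ge 0} \; \sum_{a \ne a^\star} \eta(a)\,\Delta(a) \quad \text{s.t.} \quad \inf_{\lambda \in \mathrm{Alt}(\mu)} \; \sum_{a} \eta(a)\,\mathrm{KL}(\mu_a, \lambda_a) \;\ge\; 1,$$

where Δ(a) is the optimality gap of global action a and Alt(μ) the set of confusing instances. In a MAMAB the allocation variables η(a) range over all K^N global actions, which is precisely the exponential blow-up that the mean-field upper bounds curtail.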

Place, publisher, year, edition, pages
Neural Information Processing Systems Foundation, 2023
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 36
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-346142 (URN)2-s2.0-85191164473 (Scopus ID)
Conference
37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023
Note

QC 20240507

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2024-07-04. Bibliographically approved.
Jin, Y., Vannella, F., Bouton, M., Jeong, J. & Al Hakim, E. (2022). A Graph Attention Learning Approach to Antenna Tilt Optimization. In: 2022 1st International Conference on 6G Networking (6GNet). Paper presented at 1st International Conference on 6G Networking (6GNet), Jul 06-08, 2022, Orange, Paris, France. Institute of Electrical and Electronics Engineers (IEEE)
A Graph Attention Learning Approach to Antenna Tilt Optimization
2022 (English)In: 2022 1st International Conference on 6G Networking (6GNet), Institute of Electrical and Electronics Engineers (IEEE), 2022. Conference paper, Published paper (Refereed)
Abstract [en]

6G will move mobile networks towards increasing levels of complexity. To deal with this complexity, optimization of network parameters is key to ensure high performance and timely adaptivity to dynamic network environments. The optimization of the antenna tilt provides a practical and cost-efficient method to improve coverage and capacity in the network. Previous methods based on Reinforcement Learning (RL) have shown effectiveness for tilt optimization by learning adaptive policies that outperform traditional tilt optimization methods. However, most existing RL methods are based on single-cell feature representations, which fail to fully characterize the agent state, resulting in suboptimal performance. Most such methods also lack scalability and generalization ability due to state-action explosion. In this paper, we propose a Graph Attention Q-learning (GAQ) algorithm for tilt optimization. GAQ relies on a graph attention mechanism to select relevant information from neighbors, improving the agent's state representation, and updates the tilt control policy based on a history of observations using a Deep Q-Network (DQN). We show that GAQ efficiently captures important network information and outperforms baselines with local information by a large margin. In addition, we demonstrate its ability to generalize to network deployments of different sizes and densities.
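
To illustrate the attention-based state aggregation GAQ relies on, here is a single-head graph-attention pass in plain numpy. The feature sizes, the neighborhood graph, and the single-head design are assumptions made for the example; in the actual agent the aggregated representation feeds a DQN head.

```python
# Minimal numpy sketch of graph-attention aggregation: each cell attends over
# its neighbors and mixes their transformed features.
import numpy as np

rng = np.random.default_rng(4)
n_cells, f_in, f_out = 5, 8, 16
H = rng.normal(size=(n_cells, f_in))         # per-cell features (e.g. KPIs, tilt)
adj = np.ones((n_cells, n_cells))            # fully connected neighborhood here

W = rng.normal(size=(f_in, f_out)) * 0.1     # shared linear transform
a_vec = rng.normal(size=2 * f_out) * 0.1     # attention vector

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

Z = H @ W                                    # transformed features, (n, f_out)
# Attention logits e_ij = LeakyReLU(a^T [z_i || z_j]) on existing edges.
logits = leaky_relu(Z @ a_vec[:f_out, None] + (Z @ a_vec[f_out:, None]).T)
logits = np.where(adj > 0, logits, -1e9)     # mask non-neighbors
alpha = np.exp(logits - logits.max(axis=1, keepdims=True))
alpha /= alpha.sum(axis=1, keepdims=True)    # softmax over each row's neighbors
H_out = alpha @ Z                            # attention-weighted aggregation
print("aggregated state representation:", H_out.shape)  # would feed the DQN head
```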

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-320297 (URN)10.1109/6GNet54646.2022.9830258 (DOI)000860313400009 ()2-s2.0-85136105973 (Scopus ID)
Conference
1st International Conference on 6G Networking (6GNet), Jul 06-08, 2022, Orange, Paris, France
Note

Part of proceedings: ISBN 978-1-6654-6763-6

QC 20221024

Available from: 2022-10-24 Created: 2022-10-24 Last updated: 2025-11-10. Bibliographically approved.
Vannella, F., Proutiere, A., Jedra, Y. & Jeong, J. (2022). Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach. In: Proceedings - IEEE INFOCOM. Paper presented at 41st IEEE Conference on Computer Communications, INFOCOM 2022, 2 May 2022 through 5 May 2022 (pp. 740-749). Institute of Electrical and Electronics Engineers (IEEE)
Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach
2022 (English)In: Proceedings - IEEE INFOCOM, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 740-749. Conference paper, Published paper (Refereed)
Abstract [en]

Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms learning optimal tilt control policies from existing data (in the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Arm Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures current network conditions; the reward corresponds to an improvement of performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping the context to an arm with maximal reward). For CL-MAB in both active and passive learning settings, we derive information-theoretical lower bounds on the number of samples required by any algorithm returning an approximately optimal policy with a given level of certainty, and devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks, and show that they can produce optimal tilt update policies using far fewer data samples than naive or existing rule-based learning algorithms.
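
The passive-learning route can be pictured with a short sketch: fit a linear reward model on logged (context, arm, reward) triples and return the greedy policy. The feature map and the ridge estimator below are simplified stand-ins for the paper's information-theoretic machinery, with no stopping rule shown.

```python
# Sketch of passive best-policy identification with a linear reward model.
# The feature map, noise model, and regularization are illustrative choices.
import numpy as np

rng = np.random.default_rng(5)
n, d, K = 2000, 4, 3
contexts = rng.normal(size=(n, d))
actions = rng.integers(K, size=n)

def phi(x, a, K=K):
    """Feature map: the context placed in the chosen arm's block."""
    out = np.zeros(K * len(x))
    out[a * len(x):(a + 1) * len(x)] = x
    return out

theta_true = rng.normal(size=K * d)          # unknown reward parameter
Phi = np.stack([phi(x, a) for x, a in zip(contexts, actions)])
rewards = Phi @ theta_true + rng.normal(scale=0.5, size=n)

lam = 1.0                                    # ridge regularization
theta_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(K * d), Phi.T @ rewards)

def policy(x):
    """Greedy policy: arm with the highest estimated reward in context x."""
    return int(np.argmax([phi(x, a) @ theta_hat for a in range(K)]))

x_new = rng.normal(size=d)
print("tilt update chosen for new context:", policy(x_new))
```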

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Antennas, Economic and social effects, Information theory, Mobile telecommunication systems, Wireless networks, Active Learning, Antenna tilt, Cellular network, Control policy, Learning settings, Network Capacity, Optimal policies, Passive learning, Tilt control, Trade off, Learning algorithms
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-325322 (URN)10.1109/INFOCOM48880.2022.9796783 (DOI)000936344400075 ()2-s2.0-85133288997 (Scopus ID)
Conference
41st IEEE Conference on Computer Communications, INFOCOM 2022, 2 May 2022 through 5 May 2022
Note

QC 20230426

Available from: 2023-04-04 Created: 2023-04-04 Last updated: 2023-10-05. Bibliographically approved.
Aumayr, E., Feghhi, S., Vannella, F., Al Hakim, E. & Iakovidis, G. (2021). A Safe Reinforcement Learning Architecture for Antenna Tilt Optimisation. In: 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC). Paper presented at 32nd IEEE Annual International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE PIMRC), Sep 13-16, 2021, online. Institute of Electrical and Electronics Engineers (IEEE)
A Safe Reinforcement Learning Architecture for Antenna Tilt Optimisation
2021 (English)In: 2021 IEEE 32nd Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Safe interaction with the environment is one of the most challenging aspects of Reinforcement Learning (RL) when applied to real-world problems. This is particularly important when unsafe actions have a high or irreversible negative impact on the environment. In the context of network management operations, Remote Electrical Tilt (RET) optimisation is a safety-critical application in which exploratory modifications of antenna tilt angles of base stations can cause significant performance degradation in the network. In this paper, we propose a modular Safe Reinforcement Learning (SRL) architecture, which we then use to address RET optimisation in cellular networks. In this approach, a safety shield continuously benchmarks the performance of RL agents against safe baselines and determines safe antenna tilt updates to be performed on the network. Our results demonstrate improved performance of the SRL agent over the baseline while ensuring the safety of the performed actions.
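
The shield pattern described here admits a compact sketch. All interfaces below (the KPI predictor, the tolerance, the baseline policy) are hypothetical names invented for the illustration; the paper's shield benchmarks live agents against safe baselines in the network.

```python
# Hypothetical sketch of a safety shield: the RL agent proposes a tilt update,
# a safe rule-based baseline proposes another, and the shield keeps the
# agent's action only when its predicted performance stays within a tolerance
# of the baseline's.
import numpy as np

def shielded_action(agent_action, baseline_action, predict_kpi, tolerance=0.02):
    """Return the agent's action unless it is predicted to underperform."""
    if predict_kpi(agent_action) >= predict_kpi(baseline_action) - tolerance:
        return agent_action                  # agent passes the safety benchmark
    return baseline_action                   # fall back to the safe baseline

# Toy KPI predictor over tilt deltas in degrees: prefers small downtilts.
predict = lambda delta: float(np.exp(-((delta - 1.0) ** 2)))
print(shielded_action(agent_action=3.0, baseline_action=0.5, predict_kpi=predict))
```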

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Safe Reinforcement Learning, Mobile Networks, RET Optimisation
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-312673 (URN)10.1109/PIMRC50174.2021.9569387 (DOI)000782471000189 ()2-s2.0-85118469190 (Scopus ID)
Conference
32nd IEEE Annual International Symposium on Personal, Indoor and Mobile Radio Communications (IEEE PIMRC), Sep 13-16, 2021, online
Note

QC 20220520

Part of proceedings ISBN 978-1-7281-7586-7

Available from: 2022-05-20 Created: 2022-05-20 Last updated: 2022-06-25. Bibliographically approved.
Vannella, F. (2021). Learning Methods for Antenna Tilt Optimization. (Licentiate dissertation). Stockholm: KTH Royal Institute of Technology
Learning Methods for Antenna Tilt Optimization
2021 (English)Licentiate thesis, monograph (Other academic)
Abstract [en]

The increasing complexity of modern mobile networks poses unprecedented challenges to Mobile Network Operators (MNOs). MNOs need to utilize network resources optimally to satisfy the growing demand of network users in a reliable manner. To this end, algorithms for self-optimization of network parameters are an essential tool to increase network efficiency and reduce capital and operational expense. In particular, the control of the antenna tilt angle in mobile networks provides an effective method for improving network coverage and capacity. 

In this thesis, we study Remote Electrical Tilt (RET) optimization using learning-based methods. In these methods, the objective is to learn an optimal control policy, adjusting the vertical tilt of base station antennas to jointly maximize network coverage and capacity. Existing learning-based RET optimization methods mainly rely on trial-and-error learning paradigms that inevitably degrade network performance during exploration phases, or may require an excessively large number of samples to converge. We address RET optimization in the Contextual Bandit (CB) setting, a powerful sequential decision-making framework that allows us to efficiently model and solve the RET optimization problem. Specifically, we focus on two distinct CB settings tackling the above-mentioned problems: (i) the offline off-policy learning setting, and (ii) the Best Policy Identification (BPI) setting.

In offline off-policy learning, the goal is to learn an improved policy solely from offline data previously collected by a logging policy. Based on these data, a target policy is derived by minimizing the off-policy estimated risk of the learned policy. In RET optimization, the agent can leverage the vast amount of real-world network data collected by MNOs during network operations. This entails a significant advantage over online learning methods in terms of operational safety and performance reliability of the learned policy. We train and evaluate several target policies on real-world network data, showing that the off-policy approach can safely learn improved tilt update policies while providing a higher degree of reliability.

In BPI, the goal is to identify an optimal policy with the least possible amount of data samples. We study BPI in Linear Contextual Bandits (LCBs), in which the reward has a convenient linear structure. We devise algorithms learning optimal tilt update policies from existing data (passive learning) or from data actively generated by the algorithms (active learning). For both active and passive learning settings, we derive information-theoretical lower bounds on the number of data samples required by any algorithm returning an approximately optimal policy with a given level of certainty and devise algorithms achieving these fundamental limits. We then show how to effectively model RET optimization in LCBs and demonstrate that our algorithms can produce optimal tilt update policies using much fewer data samples than naive or existing rule-based learning algorithms.

With the results obtained in this thesis, we argue that significant improvements in sample complexity and operational safety can be achieved when learning RET optimization policies in CBs, providing potential for real-world network deployment of learning-based RET policies.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2021. p. 101
Series
TRITA-EECS-AVL ; 2021:57
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-302353 (URN)978-91-7873-985-1 (ISBN)
Presentation
2021-10-14, https://kth-se.zoom.us/j/64071065445, U1, Brinellvägen 26, Stockholm, 10:00 (English)
Opponent
Supervisors
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210921

Available from: 2021-09-21 Created: 2021-09-20 Last updated: 2022-06-25. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-7668-0650