Bandit Methods for Network Optimization: Safety, Exploration, and Coordination
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).
ORCID iD: 0000-0002-7668-0650
2023 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Banditmetoder för Nätverksoptimering : Säkerhet, Utforskning och Koordinering (Swedish)
Abstract [en]

The increasing complexity of modern mobile networks poses unprecedented challenges to their optimization. Mobile Network Operators (MNOs) need to control a large number of network parameters to satisfy the users' demands. The antenna tilt is an important example of a controllable network parameter with significant effects on network coverage and capacity. Recently, sequential learning algorithms have emerged as promising tools for controlling network parameters efficiently and reducing operational expenses. Bandits are a class of sequential learning methods in which an agent interacts with an environment to learn an optimal policy by selecting actions and receiving rewards from that environment. However, these methods come with their own challenges with respect to safety, exploration, and coordination. In this thesis, we investigate the antenna tilt optimization problem using bandit methods, with a focus on these challenges.
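To make the bandit interaction loop concrete, the sketch below runs a simple epsilon-greedy agent on a toy contextual bandit. Everything in it (the environment, the dimensions, the 0.1 exploration rate) is a hypothetical illustration of the general framework, not code from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n_contexts, n_arms, horizon = 5, 3, 2000

# Hypothetical environment: one mean reward per (context, arm) pair.
true_means = rng.uniform(0, 1, size=(n_contexts, n_arms))

counts = np.zeros((n_contexts, n_arms))
sums = np.zeros((n_contexts, n_arms))

for t in range(horizon):
    ctx = int(rng.integers(n_contexts))          # observe a context
    # Epsilon-greedy action selection: explore with small probability,
    # otherwise pick the arm with the best empirical mean so far.
    if rng.random() < 0.1 or counts[ctx].min() == 0:
        arm = int(rng.integers(n_arms))
    else:
        arm = int(np.argmax(sums[ctx] / counts[ctx]))
    reward = true_means[ctx, arm] + 0.1 * rng.normal()   # noisy reward
    counts[ctx, arm] += 1
    sums[ctx, arm] += reward

# Greedy policy learned from the interaction data: one arm per context.
policy = np.argmax(sums / np.maximum(counts, 1), axis=1)
print(policy)
```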

We investigate the safety aspect using offline learning methods in Contextual Bandits (CBs), where the goal is to learn an improved target policy solely from offline data collected by a logging policy. Based on this data, a target policy is derived by minimizing its counterfactual estimated risk. We learn and evaluate several antenna tilt policies using real-world network data, showing that offline learning approaches can safely learn improved tilt control policies.
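The counterfactual risk estimation step can be sketched with the standard inverse-propensity-scoring (IPS) estimator: reweight each logged cost by how much more likely the target policy is to take the logged action than the logging policy was. This is a minimal generic sketch; the function names and the clipping constant are illustrative assumptions, and the thesis papers give the precise estimators used.

```python
import numpy as np

def ips_risk(logged, target_policy, clip=10.0):
    """Clipped IPS estimate of a target policy's risk from logged data.

    logged: list of (context, action, cost, logging_propensity) tuples.
    target_policy(context, action) -> probability the target picks `action`.
    Clipping the importance weights trades a small bias for lower variance.
    """
    values = []
    for ctx, act, cost, prop in logged:
        w = min(target_policy(ctx, act) / prop, clip)  # clipped importance weight
        values.append(w * cost)
    return float(np.mean(values))
```

A target policy is then obtained by minimizing this estimate over a policy class, which is the counterfactual risk minimization principle the paragraph above describes.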

We then focus on the exploration aspect by investigating the Best Policy Identification (BPI) problem in Contextual Linear Bandits (CLBs), in which the reward has a convenient linear structure. The goal is to identify an optimal policy using as little data as possible. We derive lower bounds on the number of samples required by any algorithm returning an approximately optimal policy. We devise algorithms that learn optimal tilt control policies from existing data (passive learning) or from data actively generated by the algorithm (active learning). We demonstrate experimentally that our algorithms produce optimal tilt control policies using fewer samples than rule-based algorithms.
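A minimal sketch of the linear structure assumed in CLBs: rewards take the form phi(context, arm)' theta for an unknown parameter theta, so theta can be estimated by regularized least squares from logged samples and a greedy policy read off from the estimate (the passive learning setting). The feature map, dimensions, and noise level below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_contexts, n_arms = 4, 10, 5
theta = rng.normal(size=d)          # unknown reward parameter (toy ground truth)

def features(context, arm):
    # Hypothetical deterministic feature map phi(context, arm) in R^d.
    local = np.random.default_rng(1000 * context + arm)
    return local.normal(size=d)

# Passive learning: fit theta from samples produced by some existing policy.
X, y = [], []
for _ in range(200):
    ctx, arm = int(rng.integers(n_contexts)), int(rng.integers(n_arms))
    phi = features(ctx, arm)
    X.append(phi)
    y.append(phi @ theta + 0.1 * rng.normal())
X, y = np.array(X), np.array(y)
theta_hat = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)  # ridge estimate

# The identified policy maps each context to the arm with highest estimated reward.
def policy(context):
    return max(range(n_arms), key=lambda a: float(features(context, a) @ theta_hat))
```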

We explore the coordination aspect in a multi-agent bandit setting where the reward of each agent depends on the actions of other agents. We investigate both Best Arm Identification (BAI) and Regret Minimization (RM): in BAI, the objective is to find an optimal global action with minimal sample complexity, while in RM the objective is to maximize the cumulative reward over rounds. We derive lower bounds on the sample complexity and the regret, which characterize the corresponding optimal sampling strategies. Unfortunately, these bounds are obtained by solving combinatorial optimization problems that are computationally hard and cannot be leveraged in the design of efficient algorithms. Inspired by Mean Field (MF) methods, we devise a family of approximations to these problems. By leveraging these approximations, we devise algorithms for BAI and RM that trade off the achievable statistical and computational complexity. We apply these algorithms to the antenna tilt optimization problem, showcasing the scalability and statistical efficiency of the proposed methods.
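As a toy illustration of the coupled-reward setting (here with a reward that decomposes over a factor graph, as in the appended ICML paper), the sketch below maximizes a global reward by brute force over all joint actions; the exponential blow-up of this enumeration is exactly what the mean-field approximations are designed to avoid. All sizes and reward tables are invented.

```python
from itertools import product
import numpy as np

n_agents, n_actions = 6, 3   # hypothetical sizes
# Factor graph: each factor couples a small subset of agents (degree <= 2 here).
factors = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
rng = np.random.default_rng(2)
tables = [rng.uniform(0, 1, size=(n_actions,) * len(scope)) for scope in factors]

def global_reward(joint_action):
    # The global reward decomposes as a sum of local factor rewards.
    return sum(t[tuple(joint_action[i] for i in scope)]
               for t, scope in zip(tables, factors))

# Brute force over the joint action space: K^N combinations (3^6 = 729 here),
# which becomes intractable as the number of agents grows.
best = max(product(range(n_actions), repeat=n_agents), key=global_reward)
print(best, global_reward(best))
```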

Abstract [sv]

The increasing complexity of modern mobile networks creates unprecedented new challenges. Mobile network operators need to control a large number of network parameters to satisfy users' demands. The antenna tilt is an important example of a controllable network parameter with significant effects on network coverage and capacity. Sequential learning algorithms have emerged as promising tools for controlling network parameters and reducing operational costs. Bandit methods are sequential learning methods in which an agent interacts with an environment to learn a control policy that is optimal with respect to a given objective. However, these methods come with their own challenges regarding safety, exploration, and coordination. In this thesis, we study the antenna tilt optimization problem using bandit methods, with a focus on these challenges.

We first investigate the safety aspect using offline off-policy learning methods in Contextual Bandits (CBs). The goal is to learn an improved target policy solely from offline data collected by a logging policy. Based on this data, a target policy is derived by minimizing the estimated risk of the target policy. We learn and evaluate several antenna tilt policies using real-world network data and show that these methods can safely learn improved tilt control policies.

We then focus on the exploration aspect in a Best Policy Identification (BPI) setting in Contextual Linear Bandits (CLBs), where the reward has a convenient linear structure. The goal is to identify an optimal policy using as little data as possible. We derive lower bounds on the number of samples required by any algorithm that returns an approximately optimal policy. We develop algorithms that learn optimal tilt control policies from existing data (passive learning) or from data generated by the algorithm (active learning). We show experimentally that our algorithms produce optimal antenna tilt control policies using fewer samples than rule-based algorithms.

We explore the coordination aspect by studying the antenna tilt optimization problem in a multi-agent bandit setting with a coupled reward structure. We investigate both Best Arm Identification (BAI) and Regret Minimization (RM). In BAI, the goal is to find an optimal global action with minimal sample complexity, whereas in RM the goal is to maximize the cumulative reward over rounds. We derive instance-specific lower bounds on the sample complexity and the regret, which characterize the corresponding optimal sampling strategy. Unfortunately, these bounds are obtained by solving combinatorial optimization problems that are computationally hard and cannot be exploited in the design of efficient algorithms. Inspired by Mean Field (MF) approximation methods, we therefore develop a family of approximations to these problems and explore the trade-off between the achievable statistical and computational complexity. We develop algorithms for BAI and RM whose sample complexity and regret provably match our approximated lower bounds, and apply these algorithms to solve the antenna tilt optimization problem.

Place, publisher, year, edition, pages
Stockholm, Sweden: Kungliga Tekniska högskolan, 2023, p. xiv, 47
Series
TRITA-EECS-AVL ; 2023:66
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-337535
ISBN: 978-91-8040-716-8 (print)
OAI: oai:DiVA.org:kth-337535
DiVA, id: diva2:1802841
Public defence
2023-10-30, Kollegiesalen, Brinellvägen 8, Stockholm, 09:00 (English)
Opponent
Supervisors
Note

QC 20231009

Available from: 2023-10-09. Created: 2023-10-05. Last updated: 2023-10-13. Bibliographically approved.
List of papers
1. Off-policy Learning for Remote Electrical Tilt Optimization
2020 (English) In: IEEE Vehicular Technology Conference, Institute of Electrical and Electronics Engineers Inc., 2020. Conference paper, Published paper (Refereed)
Abstract [en]

We address the problem of Remote Electrical Tilt (RET) optimization using off-policy Contextual Multi-Armed-Bandit (CMAB) techniques. The goal in RET optimization is to control the orientation of the vertical tilt angle of the antenna to optimize Key Performance Indicators (KPIs) representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a new policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy CMAB framework. We propose CMAB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world 4G Long Term Evolution (LTE) cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data. 
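A minimal sketch of the off-policy CMAB formulation under simplifying assumptions (a uniform logging policy, four contexts, three tilt updates, and a brute-force-searchable policy class, all invented for illustration): each candidate deterministic policy is scored by an importance-weighted estimate of its value on the logged data, and the best-scoring one is returned.

```python
from itertools import product
import numpy as np

rng = np.random.default_rng(3)
n_contexts, n_actions = 4, 3   # e.g. three tilt updates: down, keep, up

# Hypothetical logged tuples (context, action, reward, logging propensity)
# collected by a uniform-random logging policy.
logs = [(int(rng.integers(n_contexts)), int(rng.integers(n_actions)),
         float(rng.uniform()), 1.0 / n_actions) for _ in range(500)]

def ips_value(policy, logs):
    # Importance-weighted estimate of a deterministic policy's mean reward:
    # only log entries where the policy agrees with the logged action count.
    return np.mean([(policy[ctx] == act) / prop * rew
                    for ctx, act, rew, prop in logs])

# Enumerate every deterministic mapping from contexts to actions (3^4 = 81 here)
# and keep the one with the highest estimated value.
best_policy = max(product(range(n_actions), repeat=n_contexts),
                  key=lambda p: ips_value(p, logs))
print(best_policy)
```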

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers Inc., 2020
Keywords
4G mobile communication systems, Antennas, Benchmarking, Long Term Evolution (LTE), Mobile telecommunication systems, Quality control, Quality of service, Wireless networks, Cellular network, Key performance indicators, Multi armed bandit, Policy learning, Real networks, Reality gaps, Rule based, Tilt angle, Learning algorithms
National Category
Communication Systems
Identifiers
urn:nbn:se:kth:diva-292855 (URN)
10.1109/VTC2020-Fall49728.2020.9348456 (DOI)
000662218600026 ()
2-s2.0-85098819646 (Scopus ID)
Conference
92nd IEEE Vehicular Technology Conference, VTC 2020-Fall, 18 November 2020 through 16 December 2020
Note

QC 20210505

Available from: 2021-05-05. Created: 2021-05-05. Last updated: 2023-10-05. Bibliographically approved.
2. Off-Policy Learning in Contextual Bandits for Remote Electrical Tilt Optimization
2023 (English) In: IEEE Transactions on Vehicular Technology, ISSN 0018-9545, E-ISSN 1939-9359, Vol. 72, no. 1, p. 546-556. Article in journal (Refereed), Published
Abstract [en]

We investigate the problem of Remote Electrical Tilt (RET) optimization using off-policy learning techniques devised for Contextual Bandits (CBs). The goal in RET optimization is to control the vertical tilt angle of antennas at base stations to optimize key performance indicators representing the Quality of Service (QoS) perceived by the users in cellular networks. Learning an improved tilt update policy is hard. On the one hand, coming up with a policy in an online manner in a real network requires exploring tilt updates that have never been used before, and is operationally too risky. On the other hand, devising this policy via simulations suffers from the simulation-to-reality gap. In this paper, we circumvent these issues by learning an improved policy in an offline manner using existing data collected on real networks. We formulate the problem of devising such a policy using the off-policy contextual bandit framework. We propose CB learning algorithms to extract optimal tilt update policies from the data. We train and evaluate these policies on real-world cellular network data. Our policies show consistent improvements over the rule-based logging policy used to collect the data.
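Among the estimators used in the off-policy evaluation literature this paper's keywords point to, the self-normalized IPS estimator is a standard variance-reduction variant; the sketch below shows it in generic form. This is the textbook estimator, not necessarily the exact one used in the paper.

```python
import numpy as np

def snips_value(target_prob, logs):
    """Self-normalized importance sampling estimate of a target policy's value.

    target_prob(context, action) -> probability under the target policy.
    logs: list of (context, action, reward, logging_propensity) tuples.
    Normalizing by the weight sum reduces variance at the cost of a small bias.
    """
    w = np.array([target_prob(c, a) / p for c, a, r, p in logs])
    r = np.array([r for c, a, r, p in logs])
    return float((w * r).sum() / w.sum())
```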

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Optimization, Antennas, Cellular networks, Fuzzy logic, Safety, Quality of service, Analytical models, Contextual bandits, off-policy evaluation, off-policy learning, remote electrical tilt optimization
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-326569 (URN)
10.1109/TVT.2022.3202041 (DOI)
000966751100001 ()
2-s2.0-85137611705 (Scopus ID)
Note

QC 20230508

Available from: 2023-05-08. Created: 2023-05-08. Last updated: 2023-10-05. Bibliographically approved.
3. Learning Optimal Antenna Tilt Control Policies: A Contextual Linear Bandit Approach
2022 (English) In: Proceedings - IEEE INFOCOM, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 740-749. Conference paper, Published paper (Refereed)
Abstract [en]

Controlling antenna tilts in cellular networks is imperative to reach an efficient trade-off between network coverage and capacity. In this paper, we devise algorithms learning optimal tilt control policies from existing data (in the so-called passive learning setting) or from data actively generated by the algorithms (the active learning setting). We formalize the design of such algorithms as a Best Policy Identification (BPI) problem in Contextual Linear Multi-Arm Bandits (CL-MAB). An arm represents an antenna tilt update; the context captures current network conditions; the reward corresponds to an improvement of performance, mixing coverage and capacity; and the objective is to identify, with a given level of confidence, an approximately optimal policy (a function mapping the context to an arm with maximal reward). For CL-MAB in both active and passive learning settings, we derive information-theoretical lower bounds on the number of samples required by any algorithm returning an approximately optimal policy with a given level of certainty, and devise algorithms achieving these fundamental limits. We apply our algorithms to the Remote Electrical Tilt (RET) optimization problem in cellular networks, and show that they can produce optimal tilt update policies using far fewer data samples than naive or existing rule-based learning algorithms.
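The active learning setting can be illustrated with a simple uncertainty-driven allocation: repeatedly sample the arm whose feature direction is least covered by the current design matrix. This G-optimal-design-flavored heuristic is a standard illustration of exploration in linear bandits, not the paper's exact sampling rule; all quantities below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_arms = 4, 5
arm_features = rng.normal(size=(n_arms, d))  # hypothetical phi(x, a), fixed context

A = 1e-3 * np.eye(d)            # regularized design matrix
counts = np.zeros(n_arms, dtype=int)
for _ in range(100):
    A_inv = np.linalg.inv(A)
    # Pick the arm with maximal predictive uncertainty phi' A^{-1} phi,
    # i.e. the direction the data collected so far explains worst.
    a = int(np.argmax([phi @ A_inv @ phi for phi in arm_features]))
    A += np.outer(arm_features[a], arm_features[a])
    counts[a] += 1
print(counts)   # resulting sampling allocation across arms
```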

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Antennas, Economic and social effects, Information theory, Mobile telecommunication systems, Wireless networks, Active Learning, Antenna tilt, Cellular network, Control policy, Learning settings, Network Capacity, Optimal policies, Passive learning, Tilt control, Trade off, Learning algorithms
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-325322 (URN)
10.1109/INFOCOM48880.2022.9796783 (DOI)
000936344400075 ()
2-s2.0-85133288997 (Scopus ID)
Conference
41st IEEE Conference on Computer Communications, INFOCOM 2022, 2 May 2022 through 5 May 2022
Note

QC 20230426

Available from: 2023-04-04. Created: 2023-04-04. Last updated: 2023-10-05. Bibliographically approved.
4. Identifying Optimal Remote Electrical Tilt Policies in Contextual Linear Bandits
(English) Manuscript (preprint) (Other academic)
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-302360 (URN)
Note

QC 20211103

Available from: 2021-09-21. Created: 2021-09-21. Last updated: 2023-10-06. Bibliographically approved.
5. Best Arm Identification in Multi-Agent Multi-Armed Bandits
2023 (English) In: Proceedings of the 40th International Conference on Machine Learning, Honolulu, Hawaii, USA / [ed] Andreas Krause, Emma Brunskill, Kyunghyun Cho, Barbara Engelhardt, Sivan Sabato, Jonathan Scarlett, MLResearchPress, 2023, Vol. 202, p. 34875-34907. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

We investigate the problem of best arm identification in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph. The objective is to find an optimal global action with a prescribed level of confidence and minimal sample complexity. We derive a tight instance-specific lower bound of the sample complexity and characterize the corresponding optimal sampling strategy. Unfortunately, this bound is obtained by solving a combinatorial optimization problem with a number of variables and constraints exponentially growing with the number of agents. We leverage Mean Field (MF) techniques to obtain, in a computationally efficient manner, an approximation of the lower bound. The approximation scales at most as ρK^d (where ρ, K, and d denote the number of factors in the graph, the number of possible actions per agent, and the maximal degree of the factor graph). We devise MF-TaS (Mean-Field-Track-and-Stop), an algorithm whose sample complexity provably matches our approximated lower bound. We illustrate the performance of MF-TaS numerically using both synthetic and real-world experiments (e.g., to solve the antenna tilt optimization problem in radio communication networks).
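To get a sense of the savings the abstract quantifies, one can compare the K^N size of the exact lower-bound program with the ρK^d scaling of the mean-field approximation; the sketch below does this with purely hypothetical problem sizes.

```python
def exact_size(n_agents, K):
    # Variables in the exact program grow exponentially with the number of agents.
    return K ** n_agents

def mf_size(rho, K, d):
    # Mean-field approximation: rho factors, K actions per agent, max degree d.
    return rho * K ** d

print(exact_size(n_agents=10, K=5))   # 9765625
print(mf_size(rho=9, K=5, d=2))       # 225
```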

Place, publisher, year, edition, pages
MLResearchPress, 2023
Series
Proceedings of Machine Learning Research, ISSN 2640-3498
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-336615 (URN)
2-s2.0-85174397882 (Scopus ID)
Conference
International Conference on Machine Learning, 23-29 July 2023, Honolulu, Hawaii, USA
Note

QC 20230915

Available from: 2023-09-15. Created: 2023-09-15. Last updated: 2024-07-08. Bibliographically approved.
6. Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
2023 (English) Manuscript (preprint) (Other academic)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-337463 (URN)
Conference
Neural Information Processing Systems (NeurIPS)
Note

QC 20231004

Available from: 2023-10-03. Created: 2023-10-03. Last updated: 2023-10-06. Bibliographically approved.

Open Access in DiVA

fulltext (2987 kB), 362 downloads
File information
File name: FULLTEXT01.pdf. File size: 2987 kB. Checksum (SHA-512): a9b2be7d67f387d17449dfb98786894ff51841ea5a2aa7d54fa97e8148daf9b8e2b4e62f1c998367356b1e53bc420e15ef66586dcd0a0dbfb42a505c53869891
Type: fulltext. Mimetype: application/pdf

Authority records

Vannella, Filippo
