A study of wireless communications with reinforcement learning
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Information Science and Engineering.
ORCID iD: 0000-0002-9878-3722
2022 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The explosive proliferation of mobile users and wireless data traffic in recent years poses imminent challenges for wireless system design. The trend toward wireless communications becoming more complicated, decentralized, and intelligent is inevitable. Many key issues in this field are decision-making problems, such as resource allocation, transmission control, and intelligent beam tracking in millimeter wave (mmWave) systems. Reinforcement learning (RL) was once a languishing field of AI for solving sequential decision-making problems, but it was revived in the late 1980s and early 1990s when it was connected to dynamic programming (DP). More recently, RL has progressed in many applications, especially where the underlying models have no explicit mathematical solutions and simulations must be used. For instance, the success of RL in AlphaGo and AlphaZero motivated much recent research activity in RL in both academia and industry. Moreover, since computational power has increased dramatically within the last decade, simulation and online learning (planning) methods have become feasible for implementing and deploying RL. Despite this potential, the applications of RL to wireless communications are still far from mature. It is therefore of great interest to investigate RL-based methods and algorithms adapted to different wireless communication scenarios. More specifically, this thesis on RL in wireless communications can be roughly divided into the following parts:

In the first part of the thesis, we develop a framework based on deep RL (DRL) to solve the spectrum allocation problem in the emerging integrated access and backhaul (IAB) architecture with large-scale deployment and a dynamic environment. We propose to use a recent DRL method that integrates an actor-critic spectrum allocation (ACSA) scheme and a deep neural network (DNN) to achieve real-time spectrum allocation in different scenarios. The proposed methods are evaluated through numerical simulations and show promising results compared with several baseline allocation policies.

In the second part of the thesis, we investigate decentralized RL algorithms using the alternating direction method of multipliers (ADMM) in edge IoT applications. For RL in a decentralized setup, edge nodes (agents) connected through a communication network aim to work collaboratively to find a policy that optimizes the global reward, defined as the sum of local rewards. However, communication costs, scalability, and adaptation in complex environments with heterogeneous agents may significantly limit the performance of decentralized RL. ADMM has a structure that allows for decentralized implementation and has shown faster convergence than gradient-descent-based methods. We therefore propose an adaptive stochastic incremental ADMM (asI-ADMM) algorithm and apply it to decentralized RL in edge-computing-empowered IoT networks. We establish convergence properties for the proposed algorithms by designing a Lyapunov function and prove that asI-ADMM has an O(1/k) + O(1/M) convergence rate, where k and M are the number of iterations and batch samples, respectively.

The third part of the thesis considers the problem of joint beam training and data transmission control for delay-sensitive communications over mmWave channels. We formulate the problem as a constrained Markov decision process (MDP) that aims to minimize the cumulative energy consumption over the whole considered period of time under delay constraints. By introducing a Lagrange multiplier, we reformulate the constrained MDP as an unconstrained one, and then solve it using a parallel-rollout-based RL method in a data-driven manner. Our numerical results demonstrate that the optimized policy obtained from parallel rollout significantly outperforms the baseline policies in both energy consumption and delay performance.

The final part of the thesis is a further study of the beam tracking problem using a supervised learning approach. Due to computation and delay limitations in real deployments, a lightweight algorithm is desired for beam tracking in mmWave networks. We formulate the beam tracking (beam sweeping) problem as a binary classification problem and investigate supervised learning methods for its solution. The methods are tested both in simulation, i.e., with a ray-tracing model, and on real test data from an Ericsson over-the-air (OTA) dataset. The results show that the proposed methods can significantly improve cell capacity and reduce overhead as the number of UEs in the network increases.
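The decentralized structure that makes ADMM attractive in the second part can be illustrated on a toy global-variable consensus problem. The sketch below shows only that structure, under invented data; the thesis's asI-ADMM additionally uses stochastic incremental updates and adaptive steps, which are not reproduced here.

```python
import numpy as np

# Global-variable consensus ADMM on a toy problem: N agents each hold a
# private quadratic cost f_i(x) = 0.5 * (x - a_i)^2 and must agree on a
# shared scalar x. The minimizer of sum_i f_i is the mean of the a_i.
def consensus_admm(a, rho=1.0, iters=200):
    a = np.asarray(a, dtype=float)
    n = len(a)
    x = np.zeros(n)   # local primal variables (one per agent)
    u = np.zeros(n)   # scaled dual variables
    z = 0.0           # global consensus variable
    for _ in range(iters):
        # Each agent solves its local proximal subproblem in closed form.
        x = (a + rho * (z - u)) / (1.0 + rho)
        # Consensus step: average of local estimates plus duals.
        z = np.mean(x + u)
        # Dual ascent on the consensus constraint x_i = z.
        u = u + x - z
    return z

print(consensus_admm([1.0, 2.0, 6.0]))  # converges to the mean, 3.0
```

Each agent touches only its own data; the coordination happens entirely through the averaging and dual steps, which is what permits a decentralized implementation.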

Abstract [sv]

The explosive proliferation of mobile users and wireless data traffic in recent years poses imminent challenges for the design of wireless systems. The trend toward wireless communications becoming more complicated, decentralized, and intelligent is inevitable. Many key issues in this field are decision-making problems such as resource allocation, transmission control, and intelligent beam tracking in millimeter wave (mmWave) systems. Reinforcement learning (RL) was for a time a languishing area of AI, but it was revived in the late 1980s and early 1990s when it was connected to dynamic programming (DP). RL has recently advanced in many applications, especially where the underlying models have no explicit mathematical solutions and simulations must be used. For example, the successes of RL in AlphaGo and AlphaGo Zero motivated much new research activity in RL in both academia and industry. Moreover, since computational power has increased dramatically over the last decade, simulation and online learning (planning) methods have become feasible for implementing and deploying RL. Despite this potential, the applications of RL to wireless communications are still far from mature. Based on these observations, we develop RL methods and algorithms for different wireless communication scenarios. More specifically, this thesis on RL in wireless communications can be roughly divided into the following articles:

In the first part of the thesis, we develop a framework based on deep reinforcement learning (DRL) to solve the spectrum allocation problem in the emerging integrated access and backhaul (IAB) architecture with large-scale deployment and a dynamic environment. We propose to use a recent DRL method that integrates an actor-critic spectrum allocation (ACSA) scheme and a deep neural network (DNN) to achieve real-time spectrum allocation in different scenarios. The proposed methods are evaluated through numerical simulations and show promising results compared with several baseline allocation policies.

In the second part of the thesis, we investigate decentralized reinforcement learning using the alternating direction method of multipliers (ADMM) in edge IoT applications. For RL in a decentralized setup, edge nodes (agents) connected via a communication network aim to collaborate to find a policy that optimizes the global reward as the sum of local rewards. However, communication costs, scalability, and adaptation in complex environments with heterogeneous agents may significantly limit the performance of decentralized RL. ADMM has a structure that allows decentralized implementation and has shown faster convergence than gradient-descent-based methods. We therefore propose an adaptive stochastic incremental ADMM (asI-ADMM) algorithm and apply asI-ADMM to decentralized RL in edge-computing-empowered IoT networks. We provide convergence properties for the proposed algorithms by designing a Lyapunov function and prove that asI-ADMM has an O(1/k) + O(1/M) convergence rate, where k and M are the number of iterations and batch samples, respectively.

The third part of the thesis addresses the problem of joint beam training and data transmission control for delay-sensitive communication over millimeter wave (mmWave) channels. We formulate the problem as a constrained Markov decision process (MDP) that aims to minimize the cumulative energy consumption over the whole considered period of time under delay constraints. By introducing a Lagrange multiplier, we reformulate the constrained MDP as an unconstrained one, and then solve it using a parallel-rollout-based reinforcement learning method in a data-driven manner. Our numerical results show that the optimized policy obtained from parallel rollout significantly outperforms the baseline policies in both energy consumption and delay performance.

The final part of the thesis is a further study of the beam tracking problem using supervised learning. Due to computation and delay limitations in real deployments, a lightweight algorithm is desirable for beam tracking in mmWave networks. We formulate the beam tracking (beam sweeping) problem as a binary classification problem and investigate supervised learning methods for its solution. The methods are tested both in a simulation scenario, i.e., a ray-tracing model, and on real test data from an Ericsson over-the-air (OTA) dataset. The results show that the proposed methods can significantly improve cell capacity and reduce overhead consumption as the number of UEs in the network increases.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022, p. 148
Series
TRITA-EECS-AVL ; 2022:26
Keywords [en]
Reinforcement learning, wireless communications, decentralized learning, beam tracking, machine learning
Keywords [sv]
Reinforcement learning, wireless communication, decentralized learning, beam tracking in mmWave, machine learning
National Category
Applied Mechanics
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-312916
ISBN: 978-91-8040-205-7 (print)
OAI: oai:DiVA.org:kth-312916
DiVA, id: diva2:1660717
Public defence
2022-06-14, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Note

QC 20220524

Available from: 2022-05-24. Created: 2022-05-24. Last updated: 2022-09-20. Bibliographically approved
List of papers
1. Deep reinforcement learning-based spectrum allocation in integrated access and backhaul networks
2020 (English) In: IEEE Transactions on Cognitive Communications and Networking, E-ISSN 2332-7731, Vol. 6, no. 3, p. 970-979. Article in journal (Refereed) Published
Abstract [en]

We develop a framework based on deep reinforcement learning (DRL) to solve the spectrum allocation problem in the emerging integrated access and backhaul (IAB) architecture with large-scale deployment and a dynamic environment. The available spectrum is divided into several orthogonal sub-channels, and the donor base station (DBS) and all IAB nodes share the same spectrum resource for allocation: the DBS utilizes the sub-channels both for access links of associated user equipment (UE) and for backhaul links of associated IAB nodes, while an IAB node can utilize all of them for its associated UEs. This is one of the key features in which 5G differs from traditional settings, where backhaul networks are designed independently from access networks. With the goal of maximizing the sum log-rate of all UE groups, we formulate the spectrum allocation problem as a mixed-integer non-linear program. However, finding an optimal solution is intractable, especially when the IAB network is large and time-varying. To tackle this problem, we propose to use a recent DRL method that integrates an actor-critic spectrum allocation (ACSA) scheme and a deep neural network (DNN) to achieve real-time spectrum allocation in different scenarios. The proposed methods are evaluated through numerical simulations and show promising results compared with several baseline allocation policies.
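The sum-log-rate objective behind this formulation can be made concrete with a toy instance. The rate table below is invented for illustration; at realistic IAB scale the mixed-integer search space explodes, which is why the paper resorts to a DRL scheme instead of exhaustive search.

```python
import itertools
import math

# Toy version of the sum-log-rate spectrum allocation objective: each
# orthogonal sub-channel is assigned to exactly one UE group, and we
# maximize sum over groups of log(group rate).
RATES = [  # RATES[g][c]: rate of group g if it gets sub-channel c (made up)
    [4.0, 1.0, 2.0],
    [1.0, 3.0, 1.0],
    [2.0, 2.0, 5.0],
]

def sum_log_rate(assignment):
    """assignment[c] = index of the group that receives sub-channel c."""
    group_rate = [1e-9] * len(RATES)  # tiny floor avoids log(0) for empty groups
    for c, g in enumerate(assignment):
        group_rate[g] += RATES[g][c]
    return sum(math.log(r) for r in group_rate)

# Exhaustive search is only feasible at toy scale: 3^3 = 27 allocations.
best = max(itertools.product(range(3), repeat=3), key=sum_log_rate)
print(best)  # -> (0, 1, 2): each group gets its strongest sub-channel
```

The log in the objective rewards leaving no group starved: any allocation that gives some group nothing is heavily penalized, so the maximizer balances rates across groups rather than maximizing a single link.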

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-312652 (URN)
10.1109/TCCN.2020.2992628 (DOI)
000568659500009 ()
2-s2.0-85091582265 (Scopus ID)
Note

QC 20220530

Available from: 2022-05-19. Created: 2022-05-19. Last updated: 2023-01-25. Bibliographically approved
2. Adaptive Stochastic ADMM for Decentralized Reinforcement Learning in Edge Industrial IoT
2021 (English) Manuscript (preprint) (Other academic)
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-312729 (URN)
Note

QC 20220530

Available from: 2022-05-21. Created: 2022-05-21. Last updated: 2022-07-12. Bibliographically approved
3. Joint Beam Training and Data Transmission Control for mmWave Delay-Sensitive Communications: A Parallel Reinforcement Learning Approach
2022 (English) In: IEEE Journal of Selected Topics in Signal Processing, ISSN 1932-4553, Vol. 16, no. 3, p. 447-459. Article in journal (Refereed) Published
Abstract [en]

Future communication networks call for new solutions to support their capacity and delay demands by leveraging the potential of the millimeter wave (mmWave) frequency band. However, the beam training procedure in mmWave systems incurs significant overhead as well as considerable energy consumption. As such, deriving an adaptive control policy is beneficial to both delay-sensitive and energy-efficient data transmission over mmWave networks. To this end, we investigate the problem of joint beam training and data transmission control for mmWave delay-sensitive communications in this paper. Specifically, the considered problem is first formulated as a constrained Markov decision process (MDP), which aims to minimize the cumulative energy consumption over the whole considered period of time under a delay constraint. By introducing a Lagrange multiplier, we transform the constrained MDP into an unconstrained one, which is then solved via a parallel-rollout-based reinforcement learning method in a data-driven manner. Our numerical results demonstrate that the policy optimized via parallel rollout significantly outperforms other baseline policies in both energy consumption and delay performance.
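The Lagrangian step can be caricatured in one line: the constrained problem "minimize energy subject to delay ≤ D" becomes the unconstrained objective energy + λ·delay, with λ tuned until the unconstrained minimizer is feasible. The sketch below is a one-step stand-in with invented numbers; the actual problem is a sequential constrained MDP over mmWave channel dynamics, solved with parallel rollout rather than a λ sweep.

```python
# Hypothetical action costs: (energy cost, delay incurred) per decision.
ACTIONS = {
    "full_beam_training": (5.0, 1.0),
    "partial_training":   (2.0, 2.0),
    "skip_training":      (1.0, 4.0),
}
D_MAX = 2.0  # delay budget

def best_action(lam):
    """Minimizer of the Lagrangian: energy + lam * delay."""
    return min(ACTIONS, key=lambda a: ACTIONS[a][0] + lam * ACTIONS[a][1])

def smallest_feasible_multiplier(step=0.1, lam_max=10.0):
    """Sweep lam upward until the unconstrained minimizer meets the delay budget."""
    lam = 0.0
    while lam <= lam_max:
        if ACTIONS[best_action(lam)][1] <= D_MAX:  # delay constraint satisfied?
            return lam, best_action(lam)
        lam = round(lam + step, 10)  # rounding avoids float drift in the sweep
    raise ValueError("no feasible multiplier found in range")

lam, action = smallest_feasible_multiplier()
print(lam, action)
```

At λ = 0 the cheapest action ignores delay entirely; as λ grows, delay is priced into the objective until a delay-feasible action becomes optimal, which is exactly the trade-off the multiplier encodes.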

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
Training, Data communication, Transmitters, Energy consumption, Delays, Array signal processing, Reinforcement learning, Beam training, data-driven, delay-sensitive, Markov decision process, millimeter wave, reinforcement learning
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-312654 (URN)
10.1109/JSTSP.2022.3143488 (DOI)
000797421100015 ()
2-s2.0-85123368417 (Scopus ID)
Note

QC 20220530

Available from: 2022-05-19. Created: 2022-05-19. Last updated: 2022-06-25. Bibliographically approved
4. Adaptive Beam Tracking With Supervised Learning
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Utilizing millimeter-wave (mmWave) frequencies for wireless communication in mobile systems is challenging, since continuous tracking of the beam direction is needed. For this purpose, beam sweeping is performed periodically. Such an approach can be sufficient in the initial deployment of the network, when the number of users is small, but a more efficient solution is needed once many users are connected, because of the higher overhead consumption. We explore a supervised learning approach to adaptively perform beam sweeping, which has low implementation complexity and can improve cell capacity by reducing the beam sweeping overhead. Formulating the beam tracking problem as a binary classification problem, we apply supervised learning methods to solve it. The methods were tested on two scenarios: an outdoor ray-tracing scenario and an over-the-air (OTA) test dataset from Ericsson. Both sets of experimental results show that the proposed methods significantly increase cell throughput compared with existing exhaustive and periodic sweeping strategies.
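The binary-classification framing can be sketched as follows. The two features and the labeling rule are invented stand-ins (the paper trains on ray-tracing and Ericsson OTA data, and its classifiers are not specified here); plain logistic regression fills in as the supervised learner.

```python
import numpy as np

# Beam sweeping as binary classification: from simple features, predict
# whether a sweep is needed now (label 1) or can be skipped (label 0).
rng = np.random.default_rng(0)
n = 400
# Hypothetical features, normalized to [0, 1]: e.g. recent RSRP drop and UE speed.
X = rng.uniform(0.0, 1.0, size=(n, 2))
# Toy ground-truth rule for "sweep needed" (linearly separable by design).
y = (X[:, 0] + 0.6 * X[:, 1] > 0.6).astype(float)

Xb = np.column_stack([X, np.ones(n)])    # append a bias column
w = np.zeros(3)
for _ in range(5000):                    # batch gradient descent on log-loss
    p = 1.0 / (1.0 + np.exp(-Xb @ w))    # sigmoid probabilities
    w -= 0.5 * Xb.T @ (p - y) / n        # gradient of mean logistic loss

accuracy = ((Xb @ w > 0).astype(float) == y).mean()
print(f"training accuracy: {accuracy:.2f}")
```

A linear classifier like this is cheap enough to evaluate per decision instant, which matches the paper's motivation of a lightweight alternative to exhaustive or periodic sweeping.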

Keywords
millimeter-wave; beam tracking; supervised learning
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-312922 (URN)
Note

QC 20220524

Submitted to IEEE Wireless Communications Letters

Available from: 2022-05-24. Created: 2022-05-24. Last updated: 2022-06-25. Bibliographically approved

Open Access in DiVA

Fulltext: SUMMARY01.pdf (3,750 kB, application/pdf)

Authority records

Lei, Wanlu
