Offline Reinforcement Learning for Downlink Link Adaptation: A study on dataset and algorithm requirements for offline reinforcement learning.
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English)
Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Offline Reinforcement Learning för nedlänksanpassning : En studie om krav på en datauppsättning och algoritm för offline reinforcement learning (Swedish)
Abstract [en]

This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, one of many control loops in radio access networks. The work examines the impact of the quality of pre-collected datasets, in terms of how much of the state-action space the data covers and whether it was collected by an expert policy. Data quality is evaluated by training three different algorithms: Deep Q-networks, Critic Regularized Regression, and Monotonic Advantage Re-weighted Imitation Learning. Performance is measured for each combination of algorithm and dataset, and their need for hyperparameter tuning and their sample efficiency are studied. The results showed Critic Regularized Regression to be the most robust: it learned well from every dataset used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, but paired with the expert data it reached rewards as high as the agents trained with Critic Regularized Regression. Monotonic Advantage Re-weighted Imitation Learning needed data from an expert policy to reach a high reward. In summary, offline reinforcement learning can be applied successfully to a telecommunication use case such as downlink link adaptation. Critic Regularized Regression was the preferred algorithm because it performed well with all three datasets presented in the thesis.
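As a rough illustration of the training setup the abstract describes, the sketch below shows one offline update step on a fixed, pre-collected transition dataset: a DQN-style temporal-difference update trains the critic, and a CRR/MARWIL-style advantage-weighted regression clones the logged actions into the policy. This is a minimal sketch under stated assumptions, not the implementation used in the thesis; the PyTorch dependency, the network sizes, and names such as q_net, target_net, and offline_step are illustrative.

# Minimal sketch (assumed, not the thesis implementation) of one offline RL
# update on a fixed dataset of logged transitions. A DQN-style TD update
# trains the critic; a CRR/MARWIL-style advantage-weighted regression
# imitates the logged actions. Requires PyTorch.
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, N_ACTIONS, GAMMA, BETA = 8, 4, 0.99, 1.0   # illustrative values

def mlp():
    return nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

q_net, target_net, policy = mlp(), mlp(), mlp()
target_net.load_state_dict(q_net.state_dict())         # re-synced periodically in practice
opt = torch.optim.Adam(list(q_net.parameters()) + list(policy.parameters()), lr=1e-3)

def offline_step(s, a, r, s2, done):
    """One gradient step; every tensor is sampled from the pre-collected dataset."""
    with torch.no_grad():
        # DQN-style bootstrapped target; no interaction with the live system
        td_target = r + GAMMA * (1.0 - done) * target_net(s2).max(dim=1).values
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    critic_loss = F.mse_loss(q_sa, td_target)

    with torch.no_grad():
        # Advantage of the logged action under the current critic and policy
        v = (F.softmax(policy(s), dim=1) * q_net(s)).sum(dim=1)
        weight = torch.exp((q_sa - v) / BETA).clamp(max=20.0)  # CRR/MARWIL-style weight
    # Weighted behaviour cloning: imitate logged actions in proportion to their advantage
    actor_loss = (weight * F.cross_entropy(policy(s), a, reduction="none")).mean()

    opt.zero_grad()
    (critic_loss + actor_loss).backward()
    opt.step()
    return critic_loss.item(), actor_loss.item()

# Example call with a dummy batch standing in for samples from the logged dataset
batch = (torch.randn(32, STATE_DIM),
         torch.randint(0, N_ACTIONS, (32,)),
         torch.randn(32),
         torch.randn(32, STATE_DIM),
         torch.zeros(32))
print(offline_step(*batch))

Because the targets and the advantage weights are both computed from logged data only, no new interaction with the radio access network is needed during training, which is the point of the offline setting studied here.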

Abstract [sv]

This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, one of many control loops in radio access networks. The work examines the impact of the quality of pre-collected datasets, in terms of how much of the state-action space the data covers and whether it was collected by an expert policy. Data quality is evaluated by training three different algorithms: Deep Q-networks, Critic Regularized Regression, and Monotonic Advantage Re-weighted Imitation Learning. Performance is measured for each combination of algorithm and dataset, and their need for hyperparameter tuning and their sample efficiency are studied. The results showed that Critic Regularized Regression was the most robust, since it managed to learn well from all of the datasets used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, and together with the expert data it reached the highest performance of all agents in the study. Monotonic Advantage Re-weighted Imitation Learning needed data from an expert policy to learn the problem. The most successful dataset was the expert data. In summary, offline reinforcement learning can be successful in telecommunications, specifically downlink link adaptation. Critic Regularized Regression was the preferred algorithm because it was stable and performed well with all three datasets presented in the thesis.
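For reference, the advantage-weighted policy objective that Critic Regularized Regression and Monotonic Advantage Re-weighted Imitation Learning both build on can be written, in the usual formulation from the literature (an assumption here, not quoted from the thesis), roughly as:

\max_{\pi} \; \mathbb{E}_{(s,a)\sim\mathcal{D}} \Big[ f\!\big(\hat{A}(s,a)\big)\, \log \pi(a \mid s) \Big],
\qquad
\hat{A}(s,a) = Q(s,a) - \mathbb{E}_{a' \sim \pi(\cdot \mid s)}\big[ Q(s,a') \big],

where \mathcal{D} is the fixed, pre-collected dataset and f is a non-negative weighting function, for example f(\hat{A}) = \exp(\hat{A}/\beta) or the indicator \mathbf{1}[\hat{A} > 0]. Plain behaviour cloning corresponds to f \equiv 1, which helps explain why, as the abstracts above report, the imitation-style method benefited most from expert data.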

Place, publisher, year, edition, pages
2024, p. 46.
Series
TRITA-EECS-EX ; 2024:9
Keywords [en]
Offline Reinforcement Learning, Downlink Link Adaptation, data analysis, Deep Q-networks, Critic Regularized Regression, Monotonic advantage re-weighted imitation learning
Keywords [sv]
Offline Reinforcement Learning, nedlänksanpassning, dataanalys, Deep Q-networks, Critic Regularized Regression, Monotonic advantage re-weighted imitation learning
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-344766
OAI: oai:DiVA.org:kth-344766
DiVA, id: diva2:1847465
External cooperation
Ericsson AB
Supervisors
Examiners
Available from: 2024-04-03 Created: 2024-03-27 Last updated: 2024-04-03. Bibliographically approved.

Open Access in DiVA

fulltext (1670 kB), 255 downloads
File information
File name: FULLTEXT01.pdf
File size: 1670 kB
Checksum (SHA-512):
e628f35ee8d7e7b3903d7702235a98c3895a95bc4521a5b4f9ac4c2d073267d273f3714d3512fdd525465fc271835a079a866c7a96dc85cfca6e9e53da0f7e2c
Type: fulltext
Mimetype: application/pdf


