A Sample Efficient Multi-Agent Approach to Continuous Reinforcement Learning
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS; Ericsson AB, Stockholm, Sweden. ORCID iD: 0000-0003-1558-4670
RISE AI, Research Institutes of Sweden, Kista, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. ORCID iD: 0000-0001-7949-1815
2022 (English). In: 2022 18th International Conference on Network and Service Management (CNSM 2022): Intelligent Management of Disruptive Network Technologies and Services / [ed] Charalambides, M.; Papadimitriou, P.; Cerroni, W.; Kanhere, S.; Mamatas, L., IEEE, 2022, p. 338-344. Conference paper, published paper (refereed).
Abstract [en]

As design, deployment and operation complexity increases in mobile systems, adaptive self-learning techniques have become essential enablers in mitigating and controlling this complexity. Artificial intelligence and, in particular, reinforcement learning has shown great potential in learning complex tasks through observations. The majority of ongoing reinforcement learning research focuses on single-agent problem settings that assume access to a globally observable state and action space. In many real-world settings, such as LTE or 5G, decision making is distributed and there is often only local access to the state space. In such settings, multi-agent learning may be preferable, with the added challenge of ensuring that all agents collaboratively work towards a common goal. We present a novel cooperative and distributed actor-critic multi-agent reinforcement learning algorithm. We claim the approach is sample efficient, both in terms of selecting observation samples and in terms of assigning credit between subsets of collaborating agents.
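
The paper's algorithm is not reproduced in this record; purely as orientation, the sketch below shows a generic cooperative actor-critic loop in which each agent selects actions from its local observation while a centralised linear critic produces one team-level temporal-difference error that is split equally across the agents. The toy environment, the dimensions and the equal-split credit rule are illustrative assumptions, standing in for, rather than reproducing, the sample-selection and subset-level credit-assignment mechanisms claimed in the abstract.

```python
# Illustrative sketch only -- NOT the paper's algorithm. A generic cooperative
# actor-critic loop: each agent acts on its local observation, a centralised
# linear critic (training-time only) produces one team TD error, and that
# error is split equally across agents as a naive stand-in for credit
# assignment between subsets of collaborating agents. All shapes, the toy
# environment and the learning rates are assumptions made for illustration.
import numpy as np

rng = np.random.default_rng(0)
N_AGENTS, OBS_DIM, N_ACTIONS = 4, 8, 3
GAMMA, LR_ACTOR, LR_CRITIC = 0.95, 0.01, 0.05

# One local linear softmax actor per agent; one shared critic over the joint state.
actors = [rng.normal(scale=0.1, size=(OBS_DIM, N_ACTIONS)) for _ in range(N_AGENTS)]
critic = np.zeros(N_AGENTS * OBS_DIM)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def step_env(obs, actions):
    """Toy environment stand-in: random next observations, one shared team reward."""
    next_obs = rng.normal(size=obs.shape)
    reward = float(-abs(int(actions.sum()) - N_AGENTS))  # cooperative toy objective
    return next_obs, reward

obs = rng.normal(size=(N_AGENTS, OBS_DIM))
for _ in range(200):
    # Decentralised execution: each agent sees only its own observation.
    probs = [softmax(obs[i] @ actors[i]) for i in range(N_AGENTS)]
    actions = np.array([rng.choice(N_ACTIONS, p=p) for p in probs])

    next_obs, reward = step_env(obs, actions)

    # Centralised critic gives a single temporal-difference error for the team.
    td_error = reward + GAMMA * (critic @ next_obs.ravel()) - critic @ obs.ravel()
    critic += LR_CRITIC * td_error * obs.ravel()

    # Naive credit assignment: every agent receives an equal share of the TD error.
    for i in range(N_AGENTS):
        onehot = np.zeros(N_ACTIONS)
        onehot[actions[i]] = 1.0
        actors[i] += LR_ACTOR * (td_error / N_AGENTS) * np.outer(obs[i], onehot - probs[i])

    obs = next_obs
```

Centralised training with decentralised execution is a common way to reconcile local observability at run time with a shared learning signal; the paper's own sample-selection and subset-level credit-assignment mechanisms would replace the equal split used above.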

Place, publisher, year, edition, pages
IEEE, 2022, p. 338-344.
Series
International Conference on Network and Service Management, ISSN 2165-9605
Keywords [en]
Machine learning, Radio resource scheduling
National Category
Computer Sciences; Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-323581
DOI: 10.23919/CNSM55787.2022.9965060
ISI: 000903721000044
Scopus ID: 2-s2.0-85143886726
OAI: oai:DiVA.org:kth-323581
DiVA id: diva2:1735237
Conference
18th International Conference on Network and Service Management (CNSM) - Intelligent Management of Disruptive Network Technologies and Services, October 31 - November 4, 2022, Thessaloniki, Greece
Note

Part of proceedings: ISBN 978-3-903176-51-5, QC 20230208

Available from: 2023-02-08. Created: 2023-02-08. Last updated: 2025-02-01. Bibliographically approved.
In thesis
1. Systematic Data-Driven Continual Self-Learning
2023 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

There is a lot of unexploited potential in using data-driven and self-learning methods to dramatically improve automatic decision-making and control in complex industrial systems. So far, and on a relatively small scale, these methods have demonstrated some potential to achieve performance gains for the automated tuning of complex distributed systems. However, many difficult questions and challenges remain in relation to how to design methods and organise their deployment and operation into large-scale real-world systems. For systematic and scalable integration of state-of-the-art machine learning into such systems, we propose a structured architectural approach.

To understand the essential elements of this architecture, we identify a set of foundational challenges and then derive a set of five research questions. These questions drill into the essential and complex interdependency between data streams, self-learning algorithms that never stop learning and the supporting reference and run-time architectural structures. While there is a need for traditional one-shot supervised models, pushing the technical boundaries of automating all classes of machine learning model training will require a continual approach. 

To support continual learning, real-time data streams are complemented with accurate synthetic data generated for use in model training. By developing and integrating advanced simulations, models can be trained before deployment into a live system, where system accuracy is then measured quantitatively in realistic scenarios. Reinforcement learning, which explores an action space and qualifies effective dynamic action combinations, is employed here for network policy learning. While single-agent and centralised model training may be appropriate in some cases, distributed multi-agent self-learning is essential in industrial-scale systems, and such a scalable and energy-efficient approach is therefore developed, implemented and analysed in detail; a hedged sketch of the overall loop follows below.
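
Purely as an assumed illustration of such a pipeline (not the thesis architecture itself), the sketch below pre-trains a model on synthetic samples from a toy simulator stand-in and then keeps updating it from a live stream after deployment, mixing in synthetic samples so coverage is kept; every class, function and constant here is hypothetical.

```python
# Assumed illustration only -- not the thesis pipeline. A model is first
# pre-trained on synthetic samples from a simulator stand-in, then deployed
# and updated continually from a live stream, with synthetic samples mixed
# into the buffer to maintain coverage. Every name and constant is hypothetical.
import random
from collections import deque

class SimEnv:
    """Toy synthetic-data generator standing in for an advanced network simulator."""
    def sample(self):
        return (random.random(), random.random())   # (observation, reward)

class LiveStream:
    """Toy stand-in for real-time measurements from the deployed system."""
    def poll(self):
        return (random.random(), random.random())

def update(policy, batch):
    """Placeholder incremental update: running mean of observed rewards."""
    rewards = [r for _, r in batch]
    policy["value"] += 0.1 * (sum(rewards) / len(rewards) - policy["value"])
    return policy

policy = {"value": 0.0}
buffer = deque(maxlen=10_000)
sim, live = SimEnv(), LiveStream()

# Phase 1: pre-train on purely synthetic data before touching the live system.
for _ in range(1_000):
    buffer.append(sim.sample())
for _ in range(50):
    policy = update(policy, random.sample(list(buffer), 64))

# Phase 2: continual learning after deployment, mixing live and synthetic samples.
for step in range(1_000):
    buffer.append(live.poll())
    if step % 4 == 0:                 # keep some synthetic coverage in the buffer
        buffer.append(sim.sample())
    policy = update(policy, random.sample(list(buffer), 64))
```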

Minimising energy usage in software- and hardware-intensive communication systems, such as the 5G radio access system, is an important and difficult problem in its own right. Our work has focused on energy-aware approaches to applying self-learning methods, both to energy-reduction applications and to the algorithms themselves. Using this approach, we can demonstrate clear energy savings while at the same time improving system performance.
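
The thesis' concrete objectives and weights are not given in this record; as one hedged illustration of what "energy-aware" can mean in a learning setting, the hypothetical helper below shapes a reward by trading a performance metric off against measured energy use.

```python
# Hypothetical illustration only -- the metric names and the weight are
# assumptions, not values from the thesis. The shaped reward rises with
# throughput and falls with energy spent, nudging the learner towards
# configurations that save energy without hurting service.
def energy_aware_reward(throughput_mbps: float,
                        energy_joules: float,
                        energy_weight: float = 0.2) -> float:
    """Higher throughput raises the reward; energy spent lowers it."""
    return throughput_mbps - energy_weight * energy_joules

# Example: a configuration that halves energy at a small throughput cost
# scores higher than the baseline under this shaped reward.
baseline = energy_aware_reward(throughput_mbps=100.0, energy_joules=400.0)
efficient = energy_aware_reward(throughput_mbps=95.0, energy_joules=200.0)
assert efficient > baseline
```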

Perhaps most importantly, our work attempts to form an understanding of the broader industrial system issues of applying self-learning approaches at scale. Our results take clear, formative steps towards large-scale industrialisation of self-learning approaches in communication systems such as 5G.

Abstract [sv]

Data-driven and self-learning systems hold a very large untapped potential for improving automatic control and automatic decision-making in complex industrial systems. On a smaller scale, these methods have been shown to have some potential for performance gains in the automatic tuning of complex distributed systems. Despite this, many difficult questions and challenges remain concerning how to design such methods and how to organise their deployment and operation in large-scale real-time systems.

For systematic and scalable integration of modern machine-learning techniques into such real-world, commercially operating systems, we here propose a structured approach. To understand its most important building blocks and architectural challenges, we name and explain a set of such challenges, and from these we then derive five research questions, which examine the complex interdependency between data streams, self-learning algorithms with continual learning, and supporting reference and run-time structures. There is still a need for supervised one-shot models, but pushing the technical boundaries of automated training for all kinds of self-learning systems requires a continual approach. To support continual learning, real-time data streams are complemented with adequate synthetic data, generated to enable model training. By developing and integrating advanced simulations, systems and models can be trained before they are deployed for live use, where system performance and correctness can be measured quantitatively in realistic scenarios. For effective learning of a network policy, reinforcement learning is used, exploring a space of possible actions, often in qualified combinations.

While centralised training may be appropriate in some cases, distributed self-learning agents are necessary components of industrial-scale systems; we therefore develop, implement and analyse in detail such a scalable and energy-efficient approach. Reducing energy usage in software- and hardware-intensive communication systems, such as the 5G radio system, is a difficult and important challenge in its own right. Our work has focused on an energy-aware approach to self-learning methods, both for the applications and for the underlying algorithms. Through this approach we have been able to demonstrate considerable energy savings while at the same time improving system performance. Finally, the key result of our work is the analysis of the main challenges for self-learning systems at industrial scale, and with it we have taken a major step towards large-scale industrialisation of self-learning methods in communication systems.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2023. p. xxvii, 154
Series
TRITA-EECS-AVL ; 2023:29
Keywords [en]
Data-Driven Methods, Self-Learning Systems, Reinforcement Learning Algorithms, Implementation Architectures
National Category
Communication Systems; Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-325733
ISBN: 978-91-8040-534-8
Public defence
2023-05-09, Ka-Sal C, KTH, Kistagången 16, Kista, Stockholm, 15:00 (English)
Note

QC 20230414

Available from: 2023-04-17. Created: 2023-04-14. Last updated: 2023-04-24. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Corcoran, Diarmuid; Boman, Magnus
