A scalable species-based genetic algorithm for reinforcement learning problems
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer Systems, SCS. Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
2022 (English). In: Knowledge engineering review (Print), ISSN 0269-8889, E-ISSN 1469-8005, Vol. 37, article id e9. Article, review/survey (Refereed). Published
Abstract [en]

Reinforcement Learning (RL) methods often rely on gradient estimates to learn an optimal policy for control problems. These expensive computations result in long training times, a poor rate of convergence, and sample inefficiency when applied to real-world problems with a large state and action space. Evolutionary Computation (EC)-based techniques offer a gradient-free apparatus to train a deep neural network for RL problems. In this work, we leverage the benefits of EC and propose a novel variant of genetic algorithm called SP-GA which utilizes a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem. Efficient encoding of a neural network that utilizes less memory is also proposed which provides an intuitive mechanism to apply Gaussian mutations and single-point crossover. The results on Atari 2600 games outline comparable performance with gradient-based algorithms like Deep Q-Network (DQN), Asynchronous Advantage Actor Critic (A3C), and gradient-free algorithms like Evolution Strategy (ES) and simple Genetic Algorithm (GA) while requiring far fewer hyperparameters to train. The algorithm also improved certain Key Performance Indicators (KPIs) when applied to a Remote Electrical Tilt (RET) optimization task in the telecommunication domain.
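
The two genetic operators named in the abstract, Gaussian mutation and single-point crossover, can be illustrated with a minimal sketch. It assumes each network is flattened into a plain list of floats; that representation, the function names, and the `sigma` and `rate` parameters are illustrative assumptions and not the memory-efficient encoding or settings proposed in the paper.

```python
import random


def gaussian_mutation(genome, sigma=0.1, rate=0.1, rng=random):
    """Return a mutated copy of `genome` (hypothetical flat weight list).

    Each gene receives additive N(0, sigma) noise with probability `rate`.
    The input list is left unchanged.
    """
    return [
        g + rng.gauss(0.0, sigma) if rng.random() < rate else g
        for g in genome
    ]


def single_point_crossover(parent_a, parent_b, rng=random):
    """Swap the tails of two equal-length genomes at one random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        # Nothing to cut: return plain copies.
        return list(parent_a), list(parent_b)
    # Cut point in [1, len - 1] so each child takes genes from both parents.
    point = rng.randrange(1, len(parent_a))
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    a = [0.0] * 8
    b = [1.0] * 8
    child_a, child_b = single_point_crossover(a, b, rng)
    print(child_a, child_b)
    print(gaussian_mutation(child_a, sigma=0.05, rate=0.5, rng=rng))
```

Because both operators act on a flat sequence, no gradient information is needed; a full genetic algorithm would add fitness evaluation and selection around them, which this sketch omits.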

Place, publisher, year, edition, pages
Cambridge University Press (CUP), 2022. Vol. 37, article id e9
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319758
DOI: 10.1017/S0269888922000042
ISI: 000857823100001
Scopus ID: 2-s2.0-85139662395
OAI: oai:DiVA.org:kth-319758
DiVA, id: diva2:1701723
Note

QC 20221007

Available from: 2022-10-07 Created: 2022-10-07 Last updated: 2023-05-29. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Search for more in DiVA

By author/editor
Seth, Anirudh