A scalable species-based genetic algorithm for reinforcement learning problems
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer Systems, SCS. Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
2022 (English). In: Knowledge Engineering Review (Print), ISSN 0269-8889, E-ISSN 1469-8005, Vol. 37, article id e9. Article, review/survey (Refereed). Published
Abstract [en]

Reinforcement Learning (RL) methods often rely on gradient estimates to learn an optimal policy for control problems. These expensive computations result in long training times, a poor rate of convergence, and sample inefficiency when applied to real-world problems with large state and action spaces. Evolutionary Computation (EC)-based techniques offer a gradient-free apparatus to train a deep neural network for RL problems. In this work, we leverage the benefits of EC and propose a novel variant of the genetic algorithm called SP-GA, which utilizes a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function for the RL problem. An efficient encoding of a neural network that uses less memory is also proposed, which provides an intuitive mechanism to apply Gaussian mutations and single-point crossover. The results on Atari 2600 games show performance comparable with gradient-based algorithms like Deep Q-Network (DQN) and Asynchronous Advantage Actor Critic (A3C), and with gradient-free algorithms like Evolution Strategy (ES) and simple Genetic Algorithm (GA), while requiring far fewer hyperparameters to train. The algorithm also improved certain Key Performance Indicators (KPIs) when applied to a Remote Electrical Tilt (RET) optimization task in the telecommunication domain.
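
The two genetic operators named in the abstract, Gaussian mutation and single-point crossover, can be illustrated with a minimal sketch. It assumes a network's weights are flattened into a plain list of floats; this is an illustrative stand-in, not the memory-efficient encoding the paper proposes, which the abstract does not describe. The function names, the `sigma` value, and the toy parent genomes are all hypothetical.

```python
import random


def gaussian_mutation(genome, sigma=0.02, rng=None):
    """Return a copy of `genome` with zero-mean Gaussian noise added to each gene.

    `sigma` is a hypothetical mutation strength, not a value from the paper.
    """
    if rng is None:
        rng = random.Random()
    return [w + rng.gauss(0.0, sigma) for w in genome]


def single_point_crossover(parent_a, parent_b, rng=None):
    """Swap the tails of two equal-length genomes at one random cut point."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if len(parent_a) < 2:
        raise ValueError("genomes need at least two genes")
    if rng is None:
        rng = random.Random()
    # The cut lies strictly inside the genome, so each child mixes both parents.
    cut = rng.randrange(1, len(parent_a))
    child_a = parent_a[:cut] + parent_b[cut:]
    child_b = parent_b[:cut] + parent_a[cut:]
    return child_a, child_b


# Toy usage: two flat "weight vectors" standing in for encoded networks.
rng = random.Random(0)
parent_a = [0.0] * 6
parent_b = [1.0] * 6
child_a, child_b = single_point_crossover(parent_a, parent_b, rng)
mutant = gaussian_mutation(child_a, sigma=0.02, rng=rng)
```

In a full genetic algorithm these operators would be applied to individuals selected by fitness (here, presumably the episode return of each Q-network); selection and evaluation are left out of this sketch.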

Place, publisher, year, edition, pages
Cambridge University Press (CUP), 2022. Vol. 37, article id e9
Identifiers
URN: urn:nbn:se:kth:diva-319758
DOI: 10.1017/S0269888922000042
ISI: 000857823100001
Scopus ID: 2-s2.0-85139662395
OAI: oai:DiVA.org:kth-319758
DiVA, id: diva2:1701723
Note

QC 20221007

Available from: 2022-10-07 Created: 2022-10-07 Last updated: 2023-05-29 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Author/editor
Seth, Anirudh