A scalable species-based genetic algorithm for reinforcement learning problems
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS. Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
Ericsson, Torshamnsgatan 23, S-16483 Stockholm, Sweden.
2022 (English). In: Knowledge engineering review (Print), ISSN 0269-8889, E-ISSN 1469-8005, Vol. 37, article id e9. Article, review/survey (Refereed). Published.
Abstract [en]

Reinforcement Learning (RL) methods often rely on gradient estimates to learn an optimal policy for control problems. These expensive computations lead to long training times, slow convergence, and sample inefficiency when applied to real-world problems with large state and action spaces. Evolutionary Computation (EC)-based techniques offer a gradient-free way to train deep neural networks for RL problems. In this work, we leverage the benefits of EC and propose SP-GA, a novel variant of the genetic algorithm that uses a species-inspired weight initialization strategy and trains a population of deep neural networks, each estimating the Q-function of the RL problem. We also propose a memory-efficient encoding of neural networks that provides an intuitive mechanism for applying Gaussian mutations and single-point crossover. Results on Atari 2600 games show performance comparable to gradient-based algorithms such as Deep Q-Network (DQN) and Asynchronous Advantage Actor-Critic (A3C), and to gradient-free algorithms such as Evolution Strategy (ES) and a simple Genetic Algorithm (GA), while requiring far fewer hyperparameters. When applied to a Remote Electrical Tilt (RET) optimization task in the telecommunication domain, the algorithm also improved certain Key Performance Indicators (KPIs).
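The abstract names two genetic operators, Gaussian mutation and single-point crossover, applied to an encoding of a neural network's weights. The sketch below illustrates those two operators inside a generic elitist GA loop, assuming the simplest possible encoding: a flat list of weights. It does not reproduce SP-GA's memory-efficient encoding or its species-inspired initialization, and all function names, the toy fitness function, and the hyperparameters (`sigma`, population size, elite count) are hypothetical.

```python
import random
from typing import Callable, List

# ASSUMPTION: a genome is a flat list of network weights. This is an
# illustrative encoding, not the memory-efficient encoding used by SP-GA.
Genome = List[float]


def init_genome(n_params: int, rng: random.Random, scale: float = 0.1) -> Genome:
    """Hypothetical initializer: i.i.d. Gaussian weights.
    SP-GA's species-inspired initialization is NOT reproduced here."""
    return [rng.gauss(0.0, scale) for _ in range(n_params)]


def gaussian_mutation(genome: Genome, rng: random.Random, sigma: float = 0.02) -> Genome:
    """Return a copy with N(0, sigma^2) noise added to every weight."""
    return [w + rng.gauss(0.0, sigma) for w in genome]


def single_point_crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Child takes a[:cut] + b[cut:], with cut chosen so both parents contribute."""
    assert len(a) == len(b) and len(a) >= 2
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:]


def evolve(fitness: Callable[[Genome], float], n_params: int, pop_size: int = 20,
           n_elite: int = 5, generations: int = 50, seed: int = 0) -> Genome:
    """Generic elitist GA (hypothetical helper): keep the n_elite best genomes,
    refill the population with mutated crossover children of the elite."""
    rng = random.Random(seed)
    population = [init_genome(n_params, rng) for _ in range(pop_size)]
    for _ in range(generations):
        # Rank by fitness; in RL this would be e.g. the mean episode return.
        population.sort(key=fitness, reverse=True)
        elite = population[:n_elite]
        children = []
        while len(children) < pop_size - n_elite:
            p1, p2 = rng.sample(elite, 2)
            children.append(gaussian_mutation(single_point_crossover(p1, p2, rng), rng))
        population = elite + children
    return max(population, key=fitness)


if __name__ == "__main__":
    # Toy fitness standing in for an RL return: best when every weight is 1.0.
    def toy_fitness(w: Genome) -> float:
        return -sum((x - 1.0) ** 2 for x in w)

    best = evolve(toy_fitness, n_params=8)
    print(round(toy_fitness(best), 3))
```

In an RL setting, the fitness would instead be the return obtained by acting greedily with respect to the Q-values produced by the network that the genome encodes. Because the elite are copied unchanged into each new generation, the best fitness in the population never decreases.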

Place, publisher, year, edition, pages
Cambridge University Press (CUP), 2022. Vol. 37, article id e9
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-319758
DOI: 10.1017/S0269888922000042
ISI: 000857823100001
Scopus ID: 2-s2.0-85139662395
OAI: oai:DiVA.org:kth-319758
DiVA, id: diva2:1701723
Note

QC 20221007

Available from: 2022-10-07. Created: 2022-10-07. Last updated: 2023-05-29. Bibliographically approved.

Open Access in DiVA

No full text in DiVA


Author/editor: Seth, Anirudh
Organisation: Software and Computer systems, SCS
