Evolutionary Development of Hierarchical Learning Structures
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology, Japan.
2007 (English). In: IEEE Transactions on Evolutionary Computation, ISSN 1089-778X, E-ISSN 1941-0026, Vol. 11, no. 2, pp. 249-264. Article in journal (Refereed). Published.
Abstract [en]

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores appropriate hierarchies using the results of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of Lamarckian mechanisms.
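The interplay between the outer evolutionary loop and the inner learner can be sketched in a few lines of code. The following is a minimal, self-contained Python sketch, not the paper's implementation: the MAXQ-Q learner is stubbed out with a dummy score, and all names (Individual, PRIMITIVES, maxq_learn, mutate) are hypothetical placeholders. The Lamarckian element is that offspring inherit the learned subtask policies together with the hierarchy genotype, instead of restarting learning from scratch.

    import random

    PRIMITIVES = ["search", "approach", "capture"]  # hypothetical primitive subtasks

    class Individual:
        def __init__(self, hierarchy):
            self.hierarchy = hierarchy  # genotype: a tree of subtasks (the GP individual)
            self.policies = {}          # learned subtask policies (the Lamarckian payload)
            self.fitness = 0.0

    def random_hierarchy(depth=2):
        # Grow a random task tree whose leaves are primitive subtasks.
        if depth == 0 or random.random() < 0.3:
            return random.choice(PRIMITIVES)
        return (random_hierarchy(depth - 1), random_hierarchy(depth - 1))

    def maxq_learn(ind):
        # Stub for the inner MAXQ-Q learning run. The real method learns a
        # recursively optimal policy for ind.hierarchy and returns its
        # performance; this stand-in simply favors small hierarchies.
        return -len(str(ind.hierarchy)) + random.gauss(0.0, 0.1)

    def mutate(parent):
        # Lamarckian step: the child inherits the parent's learned policies.
        child = Individual(random_hierarchy())
        child.policies = dict(parent.policies)
        return child

    def evolve(pop_size=20, generations=30):
        population = [Individual(random_hierarchy()) for _ in range(pop_size)]
        for _ in range(generations):
            for ind in population:
                ind.fitness = maxq_learn(ind)    # RL runs inside the EA loop
            population.sort(key=lambda i: i.fitness, reverse=True)
            elite = population[: pop_size // 2]  # truncation selection
            population = elite + [mutate(random.choice(elite))
                                  for _ in range(pop_size - len(elite))]
        return population[0]

    print(evolve().hierarchy)

In the real method, maxq_learn would run hierarchical reinforcement learning on the foraging task and return the agent's performance; the stub here merely favors small hierarchies, echoing the minimal-strategy result reported above.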

Place, publisher, year, edition, pages
2007. Vol. 11, no. 2, pp. 249-264.
Keyword [en]
autonomous development; genetic programming (GP); hierarchical reinforcement learning (RL); Lamarckian evolution
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-7569
DOI: 10.1109/TEVC.2006.890270
ISI: 000245518500008
Scopus ID: 2-s2.0-34047255425
OAI: oai:DiVA.org:kth-7569
DiVA: diva2:12635
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2017-12-14. Bibliographically approved.
In thesis
1. Embodied Evolution of Learning Ability
2007 (English). Doctoral thesis, comprehensive summary (Other scientific).
Abstract [en]

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out through cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited for studying the adaptive learning mechanisms of artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction.

The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows basic behaviors for survival to be learned by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents, by optimizing meta-properties of reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional, richer reward signals, in the form of shaping rewards, that guide the learning. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework, in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform.
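A toy sketch may make the time-sharing mechanism concrete. The code below is a hypothetical simplification, assuming a real-valued genome and a stubbed evaluation function; the real framework also exchanges genetic material between robots through the evolved mating behavior, which is omitted here.

    import random

    def evaluate(genome):
        # Placeholder for one virtual agent's lifetime evaluation on the robot.
        return sum(genome) + random.gauss(0.0, 0.1)

    class Robot:
        def __init__(self, subpop_size=5, genome_len=3):
            # Each physical robot hosts a subpopulation of virtual agents.
            self.subpop = [[random.random() for _ in range(genome_len)]
                           for _ in range(subpop_size)]

        def run_generation(self):
            # Time-sharing: the virtual agents take turns controlling the
            # robot's single body; the better half then reproduces locally
            # with small Gaussian mutations.
            ranked = sorted(self.subpop, key=evaluate, reverse=True)
            parents = ranked[: len(ranked) // 2]
            children = [[g + random.gauss(0.0, 0.05)
                         for g in random.choice(parents)]
                        for _ in range(len(self.subpop) - len(parents))]
            self.subpop = parents + children

    colony = [Robot() for _ in range(3)]  # no central EA; evolution runs per robot
    for _ in range(10):
        for robot in colony:
            robot.run_generation()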

The evolutionary approach to meta-learning has also been applied to the automatic design of task hierarchies in hierarchical reinforcement learning, and to co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, both with regard to finding initial solutions and with regard to convergence to robust policies.
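Potential-based shaping rewards, as mentioned above, have a standard form due to Ng, Harada, and Russell (1999): the shaped reward is r' = r + gamma * Phi(s') - Phi(s), a telescoping difference of potentials that provably leaves the optimal policy unchanged. A minimal Python sketch, with a hypothetical distance-based potential:

    def shaped_reward(r, s, s_next, phi, gamma=0.99):
        # Potential-based shaping: r' = r + gamma * phi(s') - phi(s).
        # The telescoping potential difference preserves the optimal policy.
        return r + gamma * phi(s_next) - phi(s)

    # Example with a hypothetical distance-to-goal potential on a scalar state:
    phi = lambda s: -abs(s - 10)            # goal at s = 10
    print(shaped_reward(0.0, 3, 4, phi))    # a step toward the goal is rewarded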

Place, publisher, year, edition, pages
Stockholm: KTH, 2007. vi, 61 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2007:16
Keyword
Embodied Evolution, Evolutionary Robotics, Reinforcement Learning, Shaping Rewards, Meta-parameters, Hierarchical Reinforcement Learning
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-4515 (URN)
978-91-7178-787-3 (ISBN)
Public defence
2007-11-12, Sal F3, KTH, Lindstedtsvägen 26, Stockholm, 10:00
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2010-09-21. Bibliographically approved.

Open Access in DiVA

No full text

Other links

Publisher's full text; Scopus

Search in DiVA

By author/editor
Elfwing, Stefan; Christensen, Henrik
By organisation
Computer Vision and Active Perception, CVAP
In the same journal
IEEE Transactions on Evolutionary Computation