Embodied Evolution of Learning Ability
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP.
2007 (English). Doctoral thesis, comprehensive summary (Other scientific)
Abstract [en]

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out through cooperation and competition among the robots, without any need for human intervention. An embodied evolution framework is therefore well suited to studying adaptive learning mechanisms for artificial agents that share the same fundamental constraints as biological agents: self-preservation and self-reproduction.

The main goal of the research in this thesis has been to develop a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing of subpopulations of virtual agents inside each robot. The framework integrates reproduction as a directed autonomous behavior, and allows basic behaviors for survival to be learned by reinforcement learning. The purpose of the evolution is to evolve the learning ability of the agents by optimizing meta-properties in reinforcement learning, such as the selection of basic behaviors, meta-parameters that modulate the efficiency of the learning, and additional and richer reward signals that guide the learning in the form of shaping rewards. The realization of the embodied evolution framework has been a cumulative research process in three steps: 1) investigation of the learning of a cooperative mating behavior for directed autonomous reproduction; 2) development of an embodied evolution framework, in which the selection of pre-learned basic behaviors and the optimization of battery recharging are evolved; and 3) development of an embodied evolution framework that includes meta-learning of basic reinforcement learning behaviors for survival, and in which the individuals are evaluated by an implicit and biologically inspired fitness function that promotes reproductive ability. The proposed embodied evolution methods have been validated in a simulation environment of the Cyber Rodent robot, a robotic platform developed for embodied evolution purposes. The evolutionarily obtained solutions have also been transferred to the real robotic platform.

The evolutionary approach to meta-learning has also been applied to the automatic design of task hierarchies in hierarchical reinforcement learning, and to co-evolving meta-parameters and potential-based shaping rewards to accelerate reinforcement learning, with regard to both finding initial solutions and converging to robust policies.
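As a rough illustration of the time-sharing scheme described above, the sketch below hosts a subpopulation of virtual agents inside a single simulated robot body; each agent controls the body for a time slice, during which it would learn and accumulate energy. All class names, interfaces, and dynamics here are hypothetical stand-ins, not the thesis implementation.

```python
# Minimal, hypothetical sketch of time-shared embodied evolution: one robot
# body hosts a subpopulation of virtual agents that take turns controlling it.
import random

class VirtualAgent:
    def __init__(self, genotype):
        self.genotype = genotype  # e.g., meta-parameters and shaping-reward weights
        self.energy = 0.5         # "health", gained by capturing batteries

class RobotBody:
    """Stand-in for the Cyber Rodent simulator; step() is placeholder dynamics."""
    def step(self, agent):
        # In the real framework the agent would act and learn here (an RL update);
        # we only perturb its energy to keep the sketch runnable.
        agent.energy = max(0.0, agent.energy + random.uniform(-0.01, 0.02))

def run_generation(body, subpopulation, slice_steps=1000):
    """Each virtual agent gets one time slice of control per generation."""
    for agent in subpopulation:
        for _ in range(slice_steps):
            body.step(agent)

# Usage: a subpopulation of eight virtual agents sharing one body.
subpop = [VirtualAgent({"alpha": random.random()}) for _ in range(8)]
run_generation(RobotBody(), subpop)
```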

Place, publisher, year, edition, pages
Stockholm: KTH, 2007. vi, 61 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2007:16
Keyword [en]
Embodied Evolution, Evolutionary Robotics, Reinforcement Learning, Shaping Rewards, Meta-parameters, Hierarchical Reinforcement Learning
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-4515
ISBN: 978-91-7178-787-3 (print)
OAI: oai:DiVA.org:kth-4515
DiVA: diva2:12636
Public defence
2007-11-12, Sal F3, KTH, Lindstedtsvägen 26, Stockholm, 10:00
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2010-09-21. Bibliographically approved.
List of papers
1. Multi-Agent Reinforcement Learning: Using Macro Actions to Learn a Mating Task
2004 (English). In: 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Sendai, 2004, pp. 3164-3169. Conference paper, Published paper (Refereed).
Abstract [en]

Standard reinforcement learning methods are inefficient and often inadequate for learning cooperative multi-agent tasks. In these kinds of tasks the behavior of one agent strongly depends on dynamic interaction with other agents, not only on interaction with a static environment as in standard reinforcement learning. The success of the learning is therefore coupled to the agents' ability to predict the other agents' behaviors. In this study we try to overcome this problem by adding a few simple macro actions: actions that are extended in time for more than one time step. The macro actions improve learning by making the search of the state space more effective, thereby making each agent's behavior more predictable to the other agent. In this study we consider a cooperative mating task, which is the first step towards our aim of performing embodied evolution, where the evolutionary selection process is an integrated part of the task. We show, in simulation and hardware, that without macro actions the agents fail to learn a meaningful behavior. In contrast, with macro actions the agents learn a good mating behavior in reasonable time, in both simulation and hardware.
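To make the idea of temporally extended actions concrete, here is a hedged sketch of an SMDP-style Q-learning update for a macro action executed over several primitive steps; the environment interface (env.step) and all names are illustrative assumptions, not the paper's code.

```python
# Hypothetical SMDP-style Q-learning with a macro action executed for k steps.
def run_macro(env, state, primitive, k, gamma):
    """Repeat `primitive` for up to k steps; return the discounted return,
    the resulting state, and the accumulated discount gamma**(steps taken)."""
    ret, discount, s = 0.0, 1.0, state
    for _ in range(k):
        s, r, done = env.step(s, primitive)  # assumed interface
        ret += discount * r
        discount *= gamma
        if done:
            break
    return ret, s, discount

def macro_q_update(Q, s, a, ret, s_next, discount, alpha, actions):
    """Temporally extended update: the future value is discounted by gamma**k,
    which is what keeps multi-step macro actions consistent with Q-learning."""
    target = ret + discount * max(Q[(s_next, b)] for b in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```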

Keyword
Approximation theory; Autonomous agents; Computer simulation; Mathematical models; Multi agent systems; Parameter estimation; Problem solving
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-7565 (URN)
2-s2.0-14044273146 (Scopus ID)
0780384636 (ISBN)
Conference
2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS); Sendai; 28 Sept. - 2 Oct. 2004
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2010-07-06. Bibliographically approved.
2. Biologically Inspired Embodied Evolution of Survival
2005 (English). In: 2005 IEEE Congress on Evolutionary Computation, IEEE CEC 2005, Proceedings, 2005, pp. 2210-2216. Conference paper, Published paper (Refereed).
Abstract [en]

Embodied evolution is a methodology for evolutionary robotics that mimics the distributed, asynchronous, and autonomous properties of biological evolution. Evaluation, selection, and reproduction are carried out by and between the robots, without any need for human intervention. In this paper we propose a biologically inspired embodied evolution framework, which fully integrates self-preservation (recharging from external batteries in the environment) and self-reproduction (pair-wise exchange of genetic material) into a survival system. The individuals are evaluated explicitly on their performance of the battery-capturing task, but also implicitly on the mating task, since an individual that mates frequently has a larger probability of spreading its genes in the population. We have evaluated our method in simulation, and the results show that the solutions obtained by our embodied evolution method were able to optimize the two survival tasks, battery capturing and mating, simultaneously. We have also performed preliminary experiments in hardware, with promising results.
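The implicit side of this selection scheme can be pictured as follows; a hedged sketch in which no fitness value is ever broadcast, and frequent maters spread their genes simply by participating in more mating events. The genotype representation and uniform crossover are illustrative assumptions.

```python
import random

def mate(genotype_a, genotype_b):
    """Pair-wise exchange of genetic material (uniform crossover, an assumption)."""
    return {k: random.choice([genotype_a[k], genotype_b[k]]) for k in genotype_a}

# Selection is implicit: each successful mating event yields one offspring, so
# individuals that mate frequently spread their genes without any explicit
# fitness being communicated between robots.
def offspring_from_matings(mating_events):
    return [mate(a, b) for a, b in mating_events]
```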

Keyword
Computer simulation; Evolutionary algorithms; Genes; Population statistics; Probability
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-7566 (URN)
10.1109/CEC.2005.1554969 (DOI)
000232173100294 ()
2-s2.0-27144533954 (Scopus ID)
0-7803-9363-5 (ISBN)
Conference
2005 IEEE Congress on Evolutionary Computation, IEEE CEC 2005; Edinburgh, Scotland; 2 Sept. - 5 Sept. 2005
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2011-10-14. Bibliographically approved.
3. Darwinian Embodied Evolution of the Learning Ability for Survival
2011 (English). In: Adaptive Behavior, ISSN 1059-7123, E-ISSN 1741-2633, vol. 19, no. 2, pp. 101-102. Article in journal (Refereed), Published.
Abstract [en]

In this article we propose a framework for performing embodied evolution with a limited number of robots, by utilizing time-sharing in subpopulations of virtual agents hosted in each robot. Within this framework, we explore the combination of within-generation learning of basic survival behaviors by reinforcement learning, and evolutionary adaptations over the generations of the basic behavior selection policy, the reward functions, and metaparameters for reinforcement learning. We apply a biologically inspired selection scheme, in which there is no explicit communication of the individuals' fitness information. The individuals can only reproduce offspring by mating (a pair-wise exchange of genotypes), and the probability that an individual reproduces offspring in its own subpopulation depends on the individual's "health", that is, its energy level, at the mating occasion. We validate the proposed method by comparing it with evolution using standard centralized selection, in simulation, and by transferring the obtained solutions to hardware using two real robots.
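A hedged sketch of the health-dependent reproduction rule described in the abstract: an offspring enters the parent's own subpopulation with a probability tied to the parent's energy at the mating occasion. The linear energy-to-probability mapping and the random-replacement rule are assumptions for illustration.

```python
import random

def maybe_reproduce(subpop, parent_genotype, mate_genotype, energy, e_max):
    """With probability energy/e_max (assumed mapping), insert an offspring
    produced by pair-wise genotype exchange into the parent's subpopulation,
    replacing a randomly chosen individual. No fitness value is communicated."""
    if random.random() < min(1.0, energy / e_max):
        child = {k: random.choice([parent_genotype[k], mate_genotype[k]])
                 for k in parent_genotype}
        subpop[random.randrange(len(subpop))] = child
```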

Keyword
Embodied evolution; evolutionary robotics; reinforcement learning; meta-learning; shaping rewards; metaparameters
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-7567 (URN)
10.1177/1059712310397633 (DOI)
000289714900002 ()
2-s2.0-79955445257 (Scopus ID)
Note

QC 20110509. Changed from Submitted to Published 20110509.

Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2017-12-14. Bibliographically approved.
4. Co-Evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning
2008 (English). In: Adaptive Behavior, ISSN 1059-7123, E-ISSN 1741-2633, vol. 16, no. 6, pp. 400-412. Article in journal (Refereed), Published.
Abstract [en]

In this article, we explore an evolutionary approach to the optimization of potential-based shaping rewards and meta-parameters in reinforcement learning. Shaping rewards are a frequently used approach to increase the learning performance of reinforcement learning, with regard to both initial performance and convergence speed. Shaping rewards provide additional knowledge to the agent in the form of richer reward signals, which guide learning towards high-rewarding states. Reinforcement learning depends critically on a few meta-parameters that modulate the learning updates or the exploration of the environment, such as the learning rate alpha, the discount factor of future rewards gamma, and the temperature tau that controls the trade-off between exploration and exploitation in softmax action selection. We validate the proposed approach in simulation using the mountain-car task. We also transfer shaping rewards and meta-parameters, obtained evolutionarily in simulation, to hardware, using a robotic foraging task.
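The two ingredients named in the abstract can be stated concretely. Below is a minimal sketch, assuming a tabular Q-function: potential-based shaping adds F(s, s') = gamma * Phi(s') - Phi(s) to the environment reward (a form known to preserve the optimal policy, per Ng, Harada, and Russell, 1999), and softmax action selection uses the temperature tau to trade off exploration against exploitation.

```python
import math
import random

def shaped_reward(r, s, s_next, phi, gamma):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return r + gamma * phi(s_next) - phi(s)

def softmax_action(Q, s, actions, tau):
    """Boltzmann exploration: small tau -> near-greedy, large tau -> near-uniform."""
    prefs = [math.exp(Q[(s, a)] / tau) for a in actions]
    total = sum(prefs)
    return random.choices(actions, weights=[p / total for p in prefs])[0]
```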

Keyword
reinforcement learning; shaping rewards; meta-parameters; genetic algorithms
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-7568 (URN)
10.1177/1059712308092835 (DOI)
000260840100004 ()
2-s2.0-55949119833 (Scopus ID)
Note
QC 20100706. Updated from Submitted to Published 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2017-12-14. Bibliographically approved.
5. Evolutionary Development of Hierarchical Learning Structures
2007 (English). In: IEEE Transactions on Evolutionary Computation, ISSN 1089-778X, E-ISSN 1941-0026, vol. 11, no. 2, pp. 249-264. Article in journal (Refereed), Published.
Abstract [en]

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse learned policies of the subtasks. In the proposed method, MAXQ learns the policy based on the task hierarchies obtained by GP, while GP explores the appropriate hierarchies using the results of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed in each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of the Lamarckian mechanisms.
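One way to picture the evolved learning structures: a GP individual is a task tree whose leaves are primitive subtasks, subtree crossover rearranges hierarchies, and because each node carries its learned subtask policy, policies travel with the subtrees, which is the Lamarckian mechanism. The representation below is an illustrative assumption, not the authors' implementation.

```python
import random

class Task:
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []  # empty list -> primitive subtask
        self.policy = None              # learned MAXQ policy, kept across generations

def internal_nodes(task):
    """Collect composite (non-leaf) tasks; assumes the root is composite."""
    nodes = [task] if task.children else []
    for child in task.children:
        nodes.extend(internal_nodes(child))
    return nodes

def subtree_crossover(a, b):
    """Swap one randomly chosen subtree between two hierarchies (in place).
    Attached policies move with the subtrees: Lamarckian inheritance."""
    na, nb = random.choice(internal_nodes(a)), random.choice(internal_nodes(b))
    i, j = random.randrange(len(na.children)), random.randrange(len(nb.children))
    na.children[i], nb.children[j] = nb.children[j], na.children[i]
```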

Keyword
autonomous development; genetic programming (GP); hierarchical reinforcement learning (RL); Lamarckian evolution
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-7569 (URN)
10.1109/TEVC.2006.890270 (DOI)
000245518500008 ()
2-s2.0-34047255425 (Scopus ID)
Note
QC 20100706. Available from: 2007-10-23. Created: 2007-10-23. Last updated: 2017-12-14. Bibliographically approved.

Open Access in DiVA

fulltext (FULLTEXT01.pdf, 1329 kB, application/pdf)

By author/editor
Elfwing, Stefan
By organisation
Computer Vision and Active Perception, CVAP
Computer Science
