Author:
Elfwing, Stefan (KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP)
Uchibe, E. (Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology, Japan)
Doya, K. (Neural Computation Unit, Initial Research Project, Okinawa Institute of Science and Technology, Japan)
Christensen, Henrik (KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP)
Title:
Evolutionary Development of Hierarchical Learning Structures
Department:
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP
Publication type:
Article in journal (Refereed)
Language:
English
Status:
Published
In:
IEEE Transactions on Evolutionary Computation (ISSN 1089-778X) (EISSN 1941-0026)
Volume:
11
Issue:
2
Pages:
249-264
Year of publ.:
2007
URI:
urn:nbn:se:kth:diva-7569
Permanent link:
http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-7569
ISI:
000245518500008
Subject category:
Computer Science
SVEP category:
Computer science
Keywords (en):
autonomous development; genetic programming (GP); hierarchical reinforcement learning (RL); Lamarckian evolution
Abstract (en):

Hierarchical reinforcement learning (RL) algorithms can learn a policy faster than standard RL algorithms. However, the applicability of hierarchical RL algorithms is limited by the fact that the task decomposition has to be performed in advance by the human designer. We propose a Lamarckian evolutionary approach for automatic development of the learning structure in hierarchical RL. The proposed method combines the MAXQ hierarchical RL method and genetic programming (GP). In the MAXQ framework, a subtask can optimize its policy independently of its parent task's policy, which makes it possible to reuse the learned policies of the subtasks. In the proposed method, the MAXQ method learns the policy based on the task hierarchies obtained by GP, while the GP explores the appropriate hierarchies using the results of the MAXQ method. To show the validity of the proposed method, we have performed simulation experiments for a foraging task in three different environmental settings. The results show a strong interconnection between the obtained learning structures and the given task environments. The main conclusion of the experiments is that the GP can find a minimal strategy, i.e., a hierarchy that minimizes the number of primitive subtasks that can be executed for each type of situation. The experimental results for the most challenging environment also show that the policies of the subtasks can continue to improve, even after the structure of the hierarchy has been evolutionarily stabilized, as an effect of Lamarckian mechanisms.
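
The interplay described in the abstract, where learning runs inside each evolved structure and the learned results are passed on to offspring, can be illustrated with a toy sketch. This is not the authors' implementation: the subtask names, the `USEFUL` set, the fitness function, the set-based hierarchy encoding, and the `learn` update are all hypothetical stand-ins chosen for brevity. In particular, `learn` replaces MAXQ with a trivial update, and `mutate` replaces GP tree operators with toggling one subtask.

```python
import random
from dataclasses import dataclass, field

# Hypothetical subtask pool and toy notion of which subtasks help the task.
SUBTASKS = ("search", "approach", "avoid", "capture", "wander")
USEFUL = {"search", "approach", "capture"}


@dataclass
class Individual:
    hierarchy: frozenset                        # evolved structure (stand-in for a GP tree)
    skill: dict = field(default_factory=dict)   # learned values, inherited by offspring


def learn(ind, steps=5, rate=0.3):
    """Stand-in for RL: move each included subtask's skill toward 1.0."""
    for _ in range(steps):
        for task in ind.hierarchy:
            current = ind.skill.get(task, 0.0)
            ind.skill[task] = current + rate * (1.0 - current)


def fitness(ind):
    """Toy fitness: reward skill on useful subtasks, penalize hierarchy size."""
    gain = sum(ind.skill.get(t, 0.0) for t in ind.hierarchy if t in USEFUL)
    return gain - 0.1 * len(ind.hierarchy)


def mutate(parent, rng):
    """Toggle one subtask; the child keeps the parent's learned skills
    for every subtask it retains (the Lamarckian step)."""
    tasks = set(parent.hierarchy)
    tasks.symmetric_difference_update({rng.choice(SUBTASKS)})
    skill = {t: v for t, v in parent.skill.items() if t in tasks}
    return Individual(frozenset(tasks), skill)


def evolve(generations=30, pop_size=12, seed=0):
    rng = random.Random(seed)
    pop = [Individual(frozenset(rng.sample(SUBTASKS, 2))) for _ in range(pop_size)]
    for _ in range(generations):
        for ind in pop:
            learn(ind)                          # learning within the current structure
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]          # truncation selection
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - len(parents))]
        pop = parents + children
    return max(pop, key=fitness)
```

Calling `evolve()` returns the fittest `Individual` of the final population. Because children start from their parent's learned `skill` values instead of from zero, skills can keep improving across generations even when the selected `hierarchy` no longer changes, which is the kind of effect the abstract attributes to Lamarckian mechanisms.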

Note:
QC 20100706
Available from:
2007-10-23
Created:
2007-10-23
Last updated:
2010-07-06