Co-Evolution of Shaping Rewards and Meta-Parameters in Reinforcement Learning
2008 (English). In: Adaptive Behavior, ISSN 1059-7123, E-ISSN 1741-2633, Vol. 16, no 6, 400-412 p. Article in journal (Refereed), Published.
In this article, we explore an evolutionary approach to optimizing potential-based shaping rewards and meta-parameters in reinforcement learning. Shaping rewards are a frequently used means of improving the learning performance of reinforcement learning, with regard to both initial performance and convergence speed. Shaping rewards provide additional knowledge to the agent in the form of richer reward signals, which guide learning toward high-rewarding states. Reinforcement learning depends critically on a few meta-parameters that modulate the learning updates or the exploration of the environment, such as the learning rate alpha, the discount factor gamma for future rewards, and the temperature tau that controls the trade-off between exploration and exploitation in softmax action selection. We validate the proposed approach in simulation using the mountain-car task. We also transfer shaping rewards and meta-parameters, obtained by evolution in simulation, to hardware, using a robotic foraging task.
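To make the roles of the shaping reward and of alpha, gamma and tau concrete, here is a minimal Python sketch. It is not the authors' implementation: it assumes a tabular Sarsa learner (the abstract does not name the update rule), a hand-written potential table `phi`, and hypothetical function names. The shaping term uses the standard potential-based form F(s, s') = gamma * Phi(s') - Phi(s), which is known to leave the optimal policy unchanged. In the article, the potential and the meta-parameters are what evolution optimizes; here they are simply passed in as fixed inputs.

```python
import math
import random


def softmax_probs(q_values, tau):
    """Boltzmann action probabilities: low tau -> near-greedy, high tau -> near-uniform."""
    m = max(q_values)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp((q - m) / tau) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_action(q_values, tau, rng):
    """Sample an action index from the softmax distribution with temperature tau."""
    probs = softmax_probs(q_values, tau)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]


def shaping_reward(phi_s, phi_next, gamma):
    """Potential-based shaping term F(s, s') = gamma * Phi(s') - Phi(s)."""
    return gamma * phi_next - phi_s


def sarsa_update(q, s, a, r, s_next, a_next, phi, alpha, gamma, terminal=False):
    """One tabular Sarsa step using the shaped reward r + F(s, s').

    q:   dict mapping (state, action) -> action value (assumed representation)
    phi: dict mapping state -> potential (hypothetical hand-written potentials)
    """
    # Convention assumed here: the potential of a terminal state is 0.
    phi_next = 0.0 if terminal else phi[s_next]
    shaped_r = r + shaping_reward(phi[s], phi_next, gamma)
    # No bootstrap term from a terminal state.
    target = shaped_r if terminal else shaped_r + gamma * q[(s_next, a_next)]
    q[(s, a)] += alpha * (target - q[(s, a)])
    return q[(s, a)]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy two-state example: state 1 has a higher (hypothetical) potential than state 0.
    q = {(0, 0): 0.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.0}
    phi = {0: 0.0, 1: 1.0}
    a = softmax_action([q[(0, 0)], q[(0, 1)]], tau=0.5, rng=rng)
    a_next = softmax_action([q[(1, 0)], q[(1, 1)]], tau=0.5, rng=rng)
    value = sarsa_update(q, 0, a, -1.0, 1, a_next, phi, alpha=0.5, gamma=0.9)
    print("updated Q(0, a):", value)
```

In this toy run, the step cost of -1 is partly offset by the shaping bonus for moving toward the higher-potential state. This is the kind of guidance the abstract describes richer reward signals providing. Changing tau alone shifts the agent between exploitation and exploration without touching the learned values.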
Place, publisher, year, edition, pages
2008. Vol. 16, no 6, 400-412 p.
Keywords
reinforcement learning; shaping rewards; meta-parameters; genetic algorithms
Identifiers
URN: urn:nbn:se:kth:diva-7568
DOI: 10.1177/1059712308092835
ISI: 000260840100004
ScopusID: 2-s2.0-55949119833
OAI: oai:DiVA.org:kth-7568
DiVA: diva2:12634
QC 20100706. Updated from Submitted to Published 20100706.
2007-10-23, 2007-10-23, 2010-07-06
Bibliographically approved