Multi-optima exploration with adaptive Gaussian mixture model
2012 (English). In: Development and Learning and Epigenetic Robotics (ICDL), 2012 IEEE International Conference on, IEEE, 2012, 6400808- p. Conference paper (Refereed)
In learning-by-exploration problems such as reinforcement learning (RL), direct policy search, stochastic optimization or evolutionary computation, the goal of an agent is to maximize some form of reward function (or minimize a cost function). Often, these algorithms are designed to find a single policy solution. We address the problem of representing the space of control policy solutions by considering exploration as a density estimation problem. Such a representation provides additional information, such as the shape and curvature of local peaks, that can be exploited to analyze the discovered solutions and guide the exploration. We show that the search process can easily be generalized to multi-peaked distributions by employing a Gaussian mixture model (GMM) with an adaptive number of components. The GMM has a dual role: representing the space of possible control policies, and guiding the exploration of new policies. A variation of expectation-maximization (EM) applied to reward-weighted policy parameters is presented to model the space of possible solutions, as if this space were a probability distribution. The approach is tested in a dart game experiment formulated as a black-box optimization problem, where the agent's throwing capability increases while it searches for the best strategy to play the game. This experiment is used to study how the proposed approach can exploit promising new solution alternatives in the search process when the optimality criterion slowly drifts over time. The results show that the proposed multi-optima search approach can anticipate such changes by exploiting promising candidates to smoothly adapt to the change of global optimum.
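The reward-weighted EM idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `reward_weighted_em` and `sample_policies`, the fixed number of components (given through the initial means `mu0`), the regularization term `reg`, and the two-peak toy reward are all assumptions made for illustration. The adaptive number of components and the dart game experiment are not reproduced here.

```python
import numpy as np

def reward_weighted_em(X, r, mu0, n_iter=50, reg=1e-6):
    """Fit a GMM to policy parameters X (N, D), each sample weighted by its
    non-negative reward r (N,). mu0 (K, D) holds the initial component means.
    Hypothetical helper: the number of components K is fixed, not adaptive."""
    X = np.asarray(X, dtype=float)
    N, D = X.shape
    w = np.asarray(r, dtype=float)
    w = w / w.sum()                          # normalized reward weights
    mu = np.array(mu0, dtype=float)
    K = mu.shape[0]
    # Start every component from the reward-weighted covariance of all samples.
    d0 = X - w @ X
    cov = np.stack([(w[:, None] * d0).T @ d0 + reg * np.eye(D)] * K)
    pi = np.full(K, 1.0 / K)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each sample.
        logp = np.empty((N, K))
        for k in range(K):
            L = np.linalg.cholesky(cov[k])
            z = np.linalg.solve(L, (X - mu[k]).T)
            logdet = 2.0 * np.log(np.diag(L)).sum()
            logp[:, k] = np.log(pi[k]) - 0.5 * (
                (z ** 2).sum(axis=0) + logdet + D * np.log(2 * np.pi))
        logp -= logp.max(axis=1, keepdims=True)
        resp = np.exp(logp)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: the usual GMM updates, with responsibilities scaled by reward.
        h = resp * w[:, None]
        Nk = h.sum(axis=0) + 1e-12
        pi = Nk / Nk.sum()
        mu = (h.T @ X) / Nk[:, None]
        for k in range(K):
            d = X - mu[k]
            cov[k] = (h[:, k, None] * d).T @ d / Nk[k] + reg * np.eye(D)
    return pi, mu, cov

def sample_policies(pi, mu, cov, n, rng):
    """Draw n new policy parameter vectors from the fitted mixture."""
    ks = rng.choice(len(pi), size=n, p=pi)
    return np.array([rng.multivariate_normal(mu[k], cov[k]) for k in ks])

# Toy problem (assumed): a 2-D reward landscape with two equally good peaks.
rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(4000, 2))
peaks = np.array([[-2.0, -2.0], [2.0, 2.0]])
r = sum(np.exp(-((X - p) ** 2).sum(axis=1) / (2 * 0.5 ** 2)) for p in peaks)
pi, mu, cov = reward_weighted_em(X, r, mu0=[[-1.0, -1.0], [1.0, 1.0]])
new_policies = sample_policies(pi, mu, cov, 10, rng)
```

In this sketch the fitted mixture plays both roles named in the abstract: its means and covariances summarize where the high-reward policies lie and how the peaks are shaped, and sampling from it proposes new policies to evaluate.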
Place, publisher, year, edition, pages
IEEE, 2012. 6400808- p.
Adaptive Gaussian mixture, Black-box optimization, Control policy, Density estimation, Direct policy search, Dual role, Expectation Maximization, Gaussian Mixture Model, Global optimum, Number of components, Optimality criteria, Reward function, Search process, Stochastic optimizations
Engineering and Technology
Identifiers
URN: urn:nbn:se:kth:diva-118394
DOI: 10.1109/DevLrn.2012.6400808
ScopusID: 2-s2.0-84872867195
ISBN: 978-146734963-5
OAI: oai:DiVA.org:kth-118394
DiVA: diva2:606542
2012 IEEE International Conference on Development and Learning and Epigenetic Robotics, ICDL 2012, 7 November 2012 through 9 November 2012, San Diego, CA
QC 20130219; 2013-02-19; 2013-02-18; 2013-02-19; Bibliographically approved