Computing monotone policies for Markov decision processes by exploiting sparsity
2013 (English). In: 2013 3rd Australian Control Conference, AUCC 2013. IEEE, 2013, article id 6697239. Conference paper (Refereed).
This paper considers Markov decision processes whose optimal policy is a randomized mixture of monotone increasing policies. Such monotone policies have an inherent sparsity structure. We present a two-stage convex optimization algorithm for computing the optimal policy that exploits this sparsity. It combines an alternating direction method of multipliers (ADMM) for solving a linear programming problem over the joint action–state probabilities with a sub-gradient step that promotes the monotone sparsity pattern in the conditional probabilities of the action given the state. In the second stage, sum-of-norms regularization is used to promote the monotone structure of the optimal policy.
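The linear program over joint action–state probabilities referenced in the abstract is the standard occupation-measure formulation of a discounted MDP. As a minimal illustrative sketch (not the paper's ADMM/sub-gradient algorithm, and with randomly generated costs and transitions rather than any data from the paper), the LP and the recovery of the conditional action probabilities can be written as:

```python
import numpy as np
from scipy.optimize import linprog

# Small illustrative MDP (hypothetical sizes, not from the paper).
S, A = 3, 2            # number of states and actions
gamma = 0.9            # discount factor
rng = np.random.default_rng(0)

# Transition kernel P[a, s, :] (rows sum to 1), cost c(s, a),
# and initial-state distribution mu.
P = rng.dirichlet(np.ones(S), size=(A, S))
c = rng.random((S, A))
mu = np.full(S, 1.0 / S)

# LP over occupation measures x(s, a):
#   min  sum_{s,a} c(s,a) x(s,a)
#   s.t. sum_a x(s',a) - gamma * sum_{s,a} P(s'|s,a) x(s,a) = mu(s'),
#        x >= 0.
# Variables are flattened as x[s * A + a].
cost = c.reshape(-1)
A_eq = np.zeros((S, S * A))
for sp in range(S):
    for s in range(S):
        for a in range(A):
            A_eq[sp, s * A + a] = float(sp == s) - gamma * P[a, s, sp]
res = linprog(cost, A_eq=A_eq, b_eq=mu, bounds=(0, None), method="highs")
x = res.x.reshape(S, A)

# Recover the policy as conditional probabilities of action given state;
# the monotone sparsity pattern discussed in the paper lives in this matrix.
pi = x / x.sum(axis=1, keepdims=True)
```

The paper's contribution replaces the generic LP solve with ADMM and adds a sum-of-norms regularized sub-gradient step on `pi` to promote the monotone structure; those steps are not reproduced here.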
Place, publisher, year, edition, pages
IEEE, 2013. Article id 6697239.
Keywords: Alternating direction method of multipliers, Conditional probabilities, Convex optimization algorithms, Linear programming problem, Markov Decision Processes, Sparsity patterns, Sparsity structure, State probability
Identifiers
URN: urn:nbn:se:kth:diva-137318
DOI: 10.1109/AUCC.2013.6697239
ISI: 000330581100001
ScopusID: 2-s2.0-84893279383
ISBN: 978-1-4799-2497-4
OAI: oai:DiVA.org:kth-137318
DiVA: diva2:678713
2013 3rd Australian Control Conference, AUCC 2013; Fremantle, WA; Australia; 4 November 2013 through 5 November 2013
QC 20140225. Available from: 2013-12-12. Created: 2013-12-12. Last updated: 2014-03-20. Bibliographically approved.