Change search
ReferencesLink to record
Permanent link

Direct link
Partially Observed Markov Decision Process Multiarmed Bandits-Structural Results
University of British Columbia.
KTH, School of Electrical Engineering (EES), Automatic Control. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. (System Identification Group)ORCID iD: 0000-0002-1927-1690
2009 (English)In: Mathematics of Operations Research, ISSN 0364-765X, E-ISSN 1526-5471, Vol. 34, no 2, 287-302 p.Article in journal (Refereed) Published
Abstract [en]

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.

Place, publisher, year, edition, pages
2009. Vol. 34, no 2, 287-302 p.
Keyword [en]
multiarmed bandits, partially observed Markov decision process, monotone policies, likelihood ratio ordering, opportunistic scheduling, stochastic approximation algorithm
National Category
Control Engineering
URN: urn:nbn:se:kth:diva-18430DOI: 10.1287/moor.1080.0371ISI: 000266101500004ScopusID: 2-s2.0-67649963279OAI: diva2:336477
Swedish Research Council
QC 20100525 QC 20120102Available from: 2010-08-05 Created: 2010-08-05 Last updated: 2013-09-05Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Krishnamurthy, VikramWahlberg, Bo
By organisation
Automatic ControlACCESS Linnaeus Centre
In the same journal
Mathematics of Operations Research
Control Engineering

Search outside of DiVA

GoogleGoogle Scholar
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Altmetric score

Total: 42 hits
ReferencesLink to record
Permanent link

Direct link