Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Partially Observed Markov Decision Process Multiarmed Bandits-Structural Results
University of British Columbia.
KTH, School of Electrical Engineering (EES), Automatic Control. KTH, School of Electrical Engineering (EES), Centres, ACCESS Linnaeus Centre. (System Identification Group)ORCID iD: 0000-0002-1927-1690
2009 (English)In: Mathematics of Operations Research, ISSN 0364-765X, E-ISSN 1526-5471, Vol. 34, no 2, 287-302 p.Article in journal (Refereed) Published
Abstract [en]

This paper considers multiarmed bandit problems involving partially observed Markov decision processes (POMDPs). We show how the Gittins index for the optimal scheduling policy can be computed by a value iteration algorithm on each process, thereby considerably simplifying the computational cost. A suboptimal value iteration algorithm based on Lovejoy's approximation is presented. We then show that for the case of totally positive of order 2 (TP2) transition probability matrices and monotone likelihood ratio (MLR) ordered observation probabilities, the Gittins index is MLR increasing in the information state. Algorithms that exploit this structure are then presented.

Place, publisher, year, edition, pages
2009. Vol. 34, no 2, 287-302 p.
Keyword [en]
multiarmed bandits, partially observed Markov decision process, monotone policies, likelihood ratio ordering, opportunistic scheduling, stochastic approximation algorithm
National Category
Control Engineering
Identifiers
URN: urn:nbn:se:kth:diva-18430DOI: 10.1287/moor.1080.0371ISI: 000266101500004Scopus ID: 2-s2.0-67649963279OAI: oai:DiVA.org:kth-18430DiVA: diva2:336477
Funder
Swedish Research Council
Note
QC 20100525 QC 20120102Available from: 2010-08-05 Created: 2010-08-05 Last updated: 2017-12-12Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full textScopus

Authority records BETA

Wahlberg, Bo

Search in DiVA

By author/editor
Krishnamurthy, VikramWahlberg, Bo
By organisation
Automatic ControlACCESS Linnaeus Centre
In the same journal
Mathematics of Operations Research
Control Engineering

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 56 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf