Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Two-target algorithms for infinite-armed bandits with Bernoulli rewards
Telecom Paris Tech.
KTH, School of Electrical Engineering (EES), Automatic Control. (ACL)
2013 (English)In: Advances in Neural Information Processing Systems 26 (2013), Morgan Kaufmann Publishers, 2013Conference paper, Published paper (Refereed)
Abstract [en]

We consider an infinite-armed bandit problem with Bernoulli rewards. The mean rewards are independent, uniformly distributed over [0; 1]. Rewards 0 and 1 are referred to as a success and a failure, respectively. We propose a novel algorithm where the decision to exploit any arm is based on two successive targets, namely, the total number of successes until the first failure and until the first m failures, respectively, where m is a fixed parameter. This two-target algorithm achieves a long-term average regret in 2√n for a large parameter m and a known time horizon n. This regret is optimal and strictly less than the regret achieved by the best known algorithms, which is in 2√n The results are extended to any mean-reward distribution whose support contains 1 and to unknown time horizons. Numerical experiments show the performance of the algorithm for finite time horizon.

Place, publisher, year, edition, pages
Morgan Kaufmann Publishers, 2013.
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 26
Keyword [en]
Bandit problems, Bernoulli, Best-known algorithms, Finite time horizon, M-failure, Novel algorithm, Numerical experiments, Time horizons
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-139319Scopus ID: 2-s2.0-84898996413OAI: oai:DiVA.org:kth-139319DiVA: diva2:685137
Conference
27th Annual Conference on Neural Information Processing Systems, NIPS 2013; Lake Tahoe, NV; United States; 5 December 2013 through 10 December 2013
Funder
EU, European Research CouncilSwedish Research Council
Note

QC 20140625

Available from: 2014-01-08 Created: 2014-01-08 Last updated: 2014-06-25Bibliographically approved

Open Access in DiVA

fulltext(251 kB)22 downloads
File information
File name FULLTEXT01.pdfFile size 251 kBChecksum SHA-512
bd733d5dadcdc9ef9d14a41b50e7221c9a945749e092172439220dac81079bf7a74dd46066a8bdac5899a917e66bb673024779b55728df72a54ac9c2b65957d7
Type fulltextMimetype application/pdf

Scopus

Search in DiVA

By author/editor
Proutière, Alexandre
By organisation
Automatic Control
Electrical Engineering, Electronic Engineering, Information Engineering

Search outside of DiVA

GoogleGoogle Scholar
Total: 22 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 39 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf