kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Statistical and Computational Trade-off in Multi-Agent Multi-Armed Bandits
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). Ericsson Research Stockholm, Sweden.ORCID iD: 0000-0002-7668-0650
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control).ORCID iD: 0000-0002-4679-4673
Ericsson Research Stockholm, Sweden.ORCID iD: 0000-0003-4533-3418
2023 (English)In: Advances in Neural Information Processing Systems 36 - 37th Conference on Neural Information Processing Systems, NeurIPS 2023, Neural Information Processing Systems Foundation , 2023, Vol. 36Conference paper, Published paper (Refereed)
Abstract [en]

We study the problem of regret minimization in Multi-Agent Multi-Armed Bandits (MAMABs) where the rewards are defined through a factor graph.We derive an instance-specific regret lower bound and characterize the minimal expected number of times each global action should be explored.This bound and the corresponding optimal exploration process are obtained by solving a combinatorial optimization problem whose set of variables and constraints exponentially grow with the number of agents, and cannot be exploited in the design of efficient algorithms.Inspired by Mean Field approximation techniques used in graphical models, we provide simple upper bounds of the regret lower bound.The corresponding optimization problems have a reduced number of variables and constraints.By tuning the latter, we may explore the trade-off between the achievable regret and the complexity of computing the corresponding exploration process.We devise Efficient Sampling for MAMAB (ESM), an algorithm whose regret asymptotically matches the approximated lower bounds.The regret and computational complexity of ESM are assessed numerically, using both synthetic and real-world experiments in radio communications networks.

Place, publisher, year, edition, pages
Neural Information Processing Systems Foundation , 2023. Vol. 36
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 36
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-346142Scopus ID: 2-s2.0-85191164473OAI: oai:DiVA.org:kth-346142DiVA, id: diva2:1855927
Conference
37th Conference on Neural Information Processing Systems, NeurIPS 2023, New Orleans, United States of America, Dec 10 2023 - Dec 16 2023
Note

QC 20240507

Available from: 2024-05-03 Created: 2024-05-03 Last updated: 2024-07-04Bibliographically approved

Open Access in DiVA

No full text in DiVA

Scopus

Authority records

Vannella, FilippoProutiere, Alexandre

Search in DiVA

By author/editor
Vannella, FilippoProutiere, AlexandreJeong, Jaeseong
By organisation
Decision and Control Systems (Automatic Control)
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 29 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf