2025 (English). In: 2025 IEEE 64th Conference on Decision and Control, CDC 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 6814-6819. Conference paper, Published paper (Refereed)
Abstract [en]
This paper considers online bandit games with arbitrary delays, where the cost functions of all self-interested players are time-varying. Players lack an explicit model of the game and can update their actions based only on delayed cost values, the sole feedback available to them. To address this challenging setting, a novel learning algorithm named Cumulative Bandit Online Learning with arbitrary delays (CBOL-ad) is proposed. We conduct a regret analysis for time-varying games in which each player-specific problem is convex, explicitly revealing the influence of time delays and game structure on the regret bound. In particular, under certain delay conditions, our bound achieves the same order as that of online bandit optimization problems without delays. Finally, numerical simulations are provided to illustrate the algorithm's performance.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Control Engineering; Other Mathematics
Identifiers
urn:nbn:se:kth:diva-378896 (URN)
10.1109/CDC57313.2025.11312073 (DOI)
2-s2.0-105031876553 (Scopus ID)
Conference
64th IEEE Conference on Decision and Control, CDC 2025, Rio de Janeiro, Brazil, Dec 9 2025 - Dec 12 2025
Note
Part of ISBN 9798331526276
QC 20260409
2026-04-09; 2026-04-09; 2026-04-09. Bibliographically approved