Conjectural Online Learning with First-order Beliefs in Asymmetric Information Stochastic Games
New York University, Department of Electrical and Computer Engineering, USA.
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. ORCID iD: 0000-0003-1773-8354
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Network and Systems Engineering. ORCID iD: 0000-0001-6039-8493
New York University, Department of Electrical and Computer Engineering, USA.
2024 (English). In: 2024 IEEE 63rd Conference on Decision and Control, CDC 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 6780-6785. Conference paper, Published paper (Refereed)
Abstract [en]

Asymmetric information stochastic games (AISGs) arise in many complex socio-technical systems, such as cyber-physical systems and IT infrastructures. Existing computational methods for AISGs are primarily offline and cannot adapt to equilibrium deviations. Further, current methods are limited to particular information structures to avoid belief hierarchies. Considering these limitations, we propose conjectural online learning (COL), an online learning method under generic information structures in AISGs. COL uses a forecaster-actor-critic (FAC) architecture, where subjective forecasts are used to conjecture the opponents' strategies within a lookahead horizon, and Bayesian learning is used to calibrate the conjectures. To adapt strategies to nonstationary environments based on information feedback, COL uses online rollout with cost function approximation (actor-critic). We prove that the conjectures produced by COL are asymptotically consistent with the information feedback in the sense of a relaxed Bayesian consistency. We also prove that the empirical strategy profile induced by COL converges to the Berk-Nash equilibrium, a solution concept characterizing rationality under subjectivity. Experimental results from an intrusion response use case demonstrate that COL converges faster than state-of-the-art reinforcement learning methods against nonstationary attacks.
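
The Bayesian calibration of conjectures mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation: the finite set of candidate conjectures, the single attack probability each one assigns, the observation sequence, and the helper name `bayes_update` are all assumptions made for illustration. The sketch only shows the generic step of reweighting candidate conjectures by how well they explain the observed feedback.

```python
import numpy as np

def bayes_update(prior, likelihoods):
    """One Bayesian update of the weights over candidate conjectures.

    prior:       current probability assigned to each candidate conjecture
    likelihoods: probability of the latest observation under each candidate
    """
    prior = np.asarray(prior, dtype=float)
    likelihoods = np.asarray(likelihoods, dtype=float)
    unnormalized = prior * likelihoods
    total = unnormalized.sum()
    if total == 0.0:
        # The observation is impossible under every candidate; keep the prior.
        return prior
    return unnormalized / total

# Hypothetical setup: each candidate conjecture is the probability that the
# opponent attacks in a given time step.
candidates = np.array([0.2, 0.8])
posterior = np.array([0.5, 0.5])      # uniform initial weights
observations = [1, 1, 0, 1, 1]        # 1 = attack observed, 0 = no attack

for o in observations:
    # Likelihood of this observation under each candidate conjecture.
    lik = np.where(o == 1, candidates, 1.0 - candidates)
    posterior = bayes_update(posterior, lik)

print(posterior)
```

With mostly attack observations, the weight shifts toward the candidate that assigns the higher attack probability. In this toy setup the candidate set is fixed and small; the abstract does not specify how COL represents its conjectures.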

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 6780-6785
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-361743
DOI: 10.1109/CDC56724.2024.10886479
Scopus ID: 2-s2.0-86000618322
OAI: oai:DiVA.org:kth-361743
DiVA, id: diva2:1948010
Conference
63rd IEEE Conference on Decision and Control, CDC 2024, Milan, Italy, December 16-19, 2024
Note

Part of ISBN 9798350316339

QC 20250328

Available from: 2025-03-27 Created: 2025-03-27 Last updated: 2025-03-28 Bibliographically approved

Authority records

Hammar, Kim; Stadler, Rolf
