The Cost of Uncertainty in Self-play Reinforcement Learning and Search
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. FOI Swedish Defence Research Agency, SE-164 90 Stockholm, Sweden. ORCID iD: 0000-0002-2677-9759
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. FOI Swedish Defence Research Agency, SE-164 90 Stockholm, Sweden.
FOI Swedish Defence Research Agency, SE-164 90 Stockholm, Sweden.
2025 (English). In: 2025 IEEE International Conference on Intelligence and Security Informatics (ISI), Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 113-120. Conference paper, Published paper (Refereed)
Abstract [en]

The combination of reinforcement learning and look-ahead search introduced in AlphaGo has revolutionized our understanding of tactics and strategy in classical strategy games such as Go and chess. Until recently, this pioneering approach was limited to perfect information games, where players have full information about the current state of the game. This paper investigates the recent generalization of reinforcement learning with search to imperfect information games, such as poker, where parts of the game state, e.g., the opponent’s hand, are hidden from the player. The paper explores how well this approach scales as the amount of hidden information increases. To this end, the current state of the art in reinforcement learning with search, the Student of Games general learning algorithm, is reproduced and evaluated across three variants of a custom poker game, each differing in the number of hidden cards dealt to players. It is found that games with less hidden information are learned more effectively, and that computational demands scale sublinearly with increasing hidden information.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. p. 113-120
Keywords [en]
computer poker, counterfactual regret minimization, imperfect information games, reinforcement learning, student of games algorithm, tree search
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-377986
DOI: 10.1109/ISI65680.2025.11201174
Scopus ID: 2-s2.0-105030660994
OAI: oai:DiVA.org:kth-377986
DiVA, id: diva2:2045402
Conference
21st Annual IEEE International Conference on Intelligence and Security Informatics, ISI 2025, Hong Kong, China, July 12-13, 2025
Note

Part of ISBN 9798331512767

QC 20260312

Available from: 2026-03-12 Created: 2026-03-12 Last updated: 2026-03-12 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Oscarsson, Marcus; Brynielsson, Joel; Cohen, Mika
