Reinforcement Learning and the Game of Nim
KTH, School of Engineering Sciences (SCI).
2015 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

This paper treats the concept of Reinforcement Learning (RL) applied to finding the winning strategy of the mathematical game Nim. Two algorithms, Q-learning and SARSA, were compared using several different sets of parameters in three different training regimes. An analysis of scalability was also undertaken.
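
The page does not include the thesis's implementation, but the setup it describes (tabular Q-learning and SARSA on Nim) admits a minimal sketch. Everything below is an assumption for illustration: the pile sizes, the default parameter values, epsilon-greedy exploration, and a training regime against a uniformly random opponent (the thesis compares three regimes whose details are not given on this page).

import random

def moves(state):
    # All legal Nim moves: take k >= 1 objects from one pile.
    return [(i, k) for i, n in enumerate(state) for k in range(1, n + 1)]

def apply_move(state, move):
    i, k = move
    s = list(state)
    s[i] -= k
    return tuple(s)

def choose(Q, state, eps):
    # Epsilon-greedy selection over the tabular action values.
    opts = moves(state)
    if random.random() < eps:
        return random.choice(opts)
    return max(opts, key=lambda m: Q.get((state, m), 0.0))

def train(piles=(1, 3, 5), episodes=50_000, alpha=0.5, gamma=1.0,
          eps=0.1, sarsa=False):
    # Learn Nim against a random opponent (an assumed regime).
    # sarsa=True  -> SARSA: bootstrap on the action actually chosen next.
    # sarsa=False -> Q-learning: bootstrap on the greedy next-state value.
    Q = {}
    for _ in range(episodes):
        state = piles
        a = choose(Q, state, eps)
        while True:
            after = apply_move(state, a)
            if not any(after):                   # agent took the last object: win
                r, s2, done = 1.0, after, True
            else:
                reply = random.choice(moves(after))  # opponent's random reply
                s2 = apply_move(after, reply)
                done = not any(s2)               # opponent took the last one: loss
                r = -1.0 if done else 0.0
            if done:
                target, a2 = r, None
            else:
                a2 = choose(Q, s2, eps)          # the on-policy next action
                future = (Q.get((s2, a2), 0.0) if sarsa
                          else max(Q.get((s2, m), 0.0) for m in moves(s2)))
                target = r + gamma * future
            old = Q.get((state, a), 0.0)
            Q[(state, a)] = old + alpha * (target - old)  # TD update
            if done:
                break
            state, a = s2, a2
    return Q

The only difference between the two algorithms is the bootstrap target near the end: SARSA uses the value of the action actually taken next, while Q-learning uses the maximum over next actions.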

It was found that tuning the parameters for optimality is difficult and time-consuming, yet the RL agents did learn a winning strategy, in essentially the same time for both algorithms. As for scalability, the analysis showed that increased learning time is indeed a problem in this approach.

The relevance of the different training regimes is discussed, along with other conceptual matters of the approach.

It is concluded that this use of RL is a promising method, although it was laborious to optimize in this case and quickly becomes ineffective when the problem is scaled up. Ideas for future research on resolving these limiting factors are discussed and proposed.
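
For reference, the winning strategy the agents must discover is classical (Bouton, 1901): under the normal-play convention, a Nim position is lost for the player to move exactly when the bitwise XOR of the pile sizes (the nim-sum) is zero. The sketch below assumes that convention and illustrative pile sizes; the exact variant of Nim used in the thesis is not stated on this page.

from functools import reduce
from operator import xor

def nim_sum(piles):
    # Bitwise XOR of pile sizes; zero means the player to move loses
    # under optimal play (normal-play convention, Bouton's theorem).
    return reduce(xor, piles, 0)

def optimal_move(piles):
    # Return a winning move (pile index, amount taken) that restores
    # nim-sum zero, or None if the position is already lost.
    s = nim_sum(piles)
    if s == 0:
        return None
    for i, n in enumerate(piles):
        if n ^ s < n:            # pile i can be reduced to n ^ s
            return (i, n - (n ^ s))
    return None

# Example: (1, 3, 5) has nim-sum 1 ^ 3 ^ 5 = 7, so the player to move wins;
# optimal_move((1, 3, 5)) returns (2, 3): take 3 from the pile of 5,
# leaving (1, 3, 2), whose nim-sum is 0.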

Abstract [sv]

This report treats the concept of Reinforcement Learning (RL) and the ability of RL agents to learn the winning strategy of the mathematical game Nim. Two algorithms, Q-learning and SARSA, with several different parameter settings, were compared in three different training regimes. In addition, the effects of scaling up the game were analysed.

The investigation showed that determining parameters for optimal behaviour was very difficult and time-consuming, although with the parameters found the RL agents did manage to learn the winning strategy, and both algorithms appeared to learn at roughly the same rate. The investigation also verified that growing learning time is a problem as the task grows.

The relevance of the different training regimes is treated, as are other conceptual questions.

In conclusion, this application of RL is a promising method, but one that is complicated to optimize and has the drawback of easily becoming ineffective on larger problems. The report discusses ideas for proposed research on solutions to the limiting factors.

Place, publisher, year, edition, pages
2015, 32 p.
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-168213
OAI: oai:DiVA.org:kth-168213
DiVA: diva2:814832
Available from: 2015-05-28. Created: 2015-05-28. Last updated: 2015-05-28. Bibliographically approved.

Open Access in DiVA

File name: FULLTEXT01.pdf
File size: 1615 kB
Checksum (SHA-512): 5574679e052ddda02fc46c0df80e69caca35eadbb7baf353ee4eb0a20c8c18dc196e14ddf8154126402df8221dd42fb886bf4033006d910b966f1043e1f6287d
Type: fulltext
Mimetype: application/pdf
