Q-learning för fyra i rad (Q-learning for Connect Four).
KTH, School of Computer Science and Communication (CSC).
2011 (Swedish). Independent thesis, Advanced level (professional degree), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Q-learning is a self-learning algorithm in which the learner is rewarded for desired behaviour and punished for unwanted behaviour. This report investigates how many games of Connect Four a self-taught Q-learning player needs before it wins, on average, 90 % of its games against a random player, a pattern-matching player, and a calculating player. It also investigates whether the Q-learning player can be combined with other algorithms to create an improved player. Within a realistic time on a personal computer, the Q-learner beats the pattern-matching and calculating players. The random player cannot be beaten within a reasonable time, and doing so would also require an unreasonable amount of memory. An improved player that combines Q-learning with pattern matching and calculated start values can be created, and it beats all three players after a very short learning period.
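The reward-and-punishment scheme described in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the implementation used in the thesis, which is not reproduced in this record. The class name `QTable`, the string state labels, the column indices used as actions, the rewards of +1 and -1, and the values chosen for the learning rate, discount factor, and exploration rate are all assumptions made for illustration.

```python
import random
from collections import defaultdict


class QTable:
    """Hypothetical tabular Q-learner; unseen (state, action) pairs start at 0.0."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.q = defaultdict(float)    # maps (state, action) -> estimated value
        self.alpha = alpha             # learning rate (assumed value)
        self.gamma = gamma             # discount factor (assumed value)
        self.epsilon = epsilon         # exploration probability (assumed value)
        self.rng = random.Random(seed)

    def best_value(self, state, actions):
        # Value of the best known action; 0.0 if the state is terminal (no actions).
        return max((self.q[(state, a)] for a in actions), default=0.0)

    def choose(self, state, actions):
        # Epsilon-greedy: explore with probability epsilon, otherwise exploit.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_actions):
        # Standard Q-learning rule:
        # Q(s, a) += alpha * (reward + gamma * max_a' Q(s', a') - Q(s, a))
        target = reward + self.gamma * self.best_value(next_state, next_actions)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


# A win after playing column 3 is rewarded; a loss after column 4 is punished.
table = QTable(alpha=0.5, gamma=0.9, epsilon=0.0)
table.update("s0", 3, reward=1.0, next_state="won", next_actions=[])
table.update("s0", 4, reward=-1.0, next_state="lost", next_actions=[])
preferred = table.choose("s0", [3, 4])  # greedy choice now favours column 3
```

Because the table stores one entry per visited state-action pair, memory use grows with the number of distinct board positions encountered, which is consistent with the memory concern the abstract raises for the random opponent.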

Abstract [sv] (translated from Swedish)

Q-learning is a learning algorithm in which the learner receives a reward for positive behaviour and a punishment for negative behaviour. The report aims to investigate how many games of Connect Four are required before a self-taught player using Q-learning wins on average 90 % of all games against a random player, a pattern-matching player, and a calculating player. It also examines whether Q-learning can be combined with other algorithms to create a better player. The Q-player beats the pattern-matching and calculating players within a reasonable time, but has trouble beating the random player within reasonable time and memory use. The improved Q-learning player, which combines Q-learning with pattern matching and calculated start values, does however beat all three players within a very short learning time.

Place, publisher, year, edition, pages
2011.
Series
Kandidatexjobb CSC ; K11049
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-130826
OAI: oai:DiVA.org:kth-130826
DiVA, id: diva2:654273
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Available from: 2013-10-07. Created: 2013-10-07. Last updated: 2018-01-11.

Open Access in DiVA

No full text in DiVA

Other links

http://www.csc.kth.se/utbildning/kandidatexjobb/datateknik/2011/rapport/hassel_olle_OCH_janse_petter_K11049.pdf