Q-learning för fyra i rad [Q-learning for Connect Four].
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC).
2011 (Swedish). Independent thesis, Advanced level (professional degree), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Q-learning is a self-learning algorithm in which the learner is rewarded for desired behaviour and punished for unwanted behaviour. The report investigates how many games of Connect Four a self-taught Q-learning player needs to play before it wins on average 90 % of all games against a random player, a pattern-matching player and a calculating player. It also investigates whether the Q-learning player can be combined with other algorithms to create an improved player. Within a realistic time on a personal computer, the Q-learner beats the pattern-matching and calculating players. The random player cannot be beaten in reasonable time, and doing so would also require an unreasonable amount of memory. An improved player that combines Q-learning with pattern matching and calculated start values can be created, and it beats all three players after a very short learning period.
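
The reward-and-punishment idea described in the abstract can be illustrated with a minimal tabular Q-learning sketch. This is not the thesis's implementation: the class name `QLearner`, the parameters `alpha`, `gamma`, `epsilon` and `initial_value`, and the encoding of a board as a hashable state are all assumptions made for illustration. The `initial_value` argument is only a loose stand-in for the "calculated start values" the abstract mentions. In a two-player game such as Connect Four, `next_state` would typically be the position after the opponent has replied.

```python
import random
from collections import defaultdict


class QLearner:
    """Hypothetical tabular Q-learner; all names and defaults are illustrative."""

    def __init__(self, alpha=0.1, gamma=0.9, epsilon=0.1, initial_value=0.0):
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future rewards
        self.epsilon = epsilon  # probability of exploring a random move
        # Unseen (state, action) pairs start at initial_value.
        self.q = defaultdict(lambda: initial_value)

    def choose(self, state, legal_actions, rng=random):
        """Epsilon-greedy choice among the legal actions (e.g. columns)."""
        if rng.random() < self.epsilon:
            return rng.choice(legal_actions)
        return max(legal_actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, next_legal_actions):
        """Standard Q-learning update toward reward + gamma * max Q(next)."""
        # A terminal position has no legal actions, so its future value is 0.
        best_next = max(
            (self.q[(next_state, a)] for a in next_legal_actions),
            default=0.0,
        )
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

A positive `reward` (for example after a win) raises the value of the move that led to it, and a negative `reward` (after a loss) lowers it. Repeating the update over many games propagates those values back to earlier positions.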

Abstract [sv]

Q-learning is a learning algorithm in which the learner is rewarded for positive behaviour and punished for negative behaviour. The report aims to investigate how many games of Connect Four are required before a self-taught player using Q-learning wins on average 90 % of all games against a random player, a pattern-matching player and a calculating player. It also investigates whether Q-learning can be combined with other algorithms to create a better player. Within reasonable time the Q-player beats the pattern-matching and calculating players, but it has trouble beating the random player within reasonable time and memory usage. The improved Q-learning player, which combines Q-learning with pattern matching and calculated start values, on the other hand beats all three players within a very short learning period.

Place, publisher, year, edition, pages
2011.
Series
Kandidatexjobb CSC, K11049
National Category
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-130826
OAI: oai:DiVA.org:kth-130826
DiVA: diva2:654273
Educational program
Master of Science in Engineering - Computer Science and Technology
Uppsok
Technology
Supervisors
Examiners
Available from: 2013-10-07 Created: 2013-10-07

Open Access in DiVA

No full text

Other links

http://www.csc.kth.se/utbildning/kandidatexjobb/datateknik/2011/rapport/hassel_olle_OCH_janse_petter_K11049.pdf