Deep q-learning in Continuous Time
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematics (Div.).
2024 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title
Djup q-Inlärning i Kontinuerlig Tid (Swedish)
Abstract [en]

Reinforcement Learning (RL) focuses on designing agents that solve sequential decision-making problems by exploring and learning optimal actions through trial and error. Traditionally formulated in discrete time, RL algorithms such as Deep Q-learning teach agents the Q-function by means of function approximation with Deep Neural Networks (DNNs). Recent work by X. Y. Zhou and his co-authors proposes q-learning, a continuous-time Q-learning framework. In this setting, one focuses on the "q-function," the time derivative of the Q-function, which is learned by a martingale approach. This thesis introduces the concept of Deep q-learning, which involves approximating the optimal q-function and the optimal value function with DNNs, analogous to Deep Q-learning. We adapt the q-learning algorithms of Jia and Zhou (2023), obtaining offline and online Deep q-learning algorithms. Furthermore, we prove, under certain assumptions, that the discretization errors associated with the q-learning algorithms vanish as the time discretization approaches zero. Lastly, we demonstrate convergence of the offline Deep q-learning algorithm through numerical simulations.
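
To fix ideas, the following is a rough paraphrase of the objects involved, following Jia and Zhou (2023); the notation here is ours and regularity conditions are suppressed. The q-function is the first-order term of the discrete-time Q-function around the value function,

\[
  q(t, x, a) \;=\; \lim_{\Delta t \downarrow 0} \frac{Q^{\Delta t}(t, x, a) - J(t, x)}{\Delta t},
\]

where Q^{\Delta t} denotes the Q-function of the problem discretized with time step \Delta t. The optimal pair (J^*, q^*) is then characterized, roughly, by requiring that

\[
  J^*(s, X_s) \;+\; \int_t^s \bigl( r(u, X_u, a_u) - q^*(u, X_u, a_u) \bigr)\, \mathrm{d}u
\]

be a martingale along state trajectories, with the entropy-regularized optimal policy of Gibbs form \pi^*(a \mid t, x) \propto \exp\bigl(q^*(t, x, a)/\gamma\bigr) for a temperature \gamma > 0. It is this martingale condition that the Deep q-learning algorithms enforce with neural-network approximators of J^* and q^*.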

Abstract [sv] (translated from Swedish)

Reinforcement Learning (RL) focuses on designing agents that solve sequential decision problems by exploring and learning optimal actions through trial and error. Traditionally formulated in discrete time, RL algorithms such as Deep Q-learning teach agents the Q-function through function approximation with deep neural networks (DNNs). Recently, X. Y. Zhou and his co-authors introduced q-learning, a Q-learning framework in continuous time. In this setting one focuses on the "q-function," the time derivative of the Q-function, which is characterized and learned by means of martingales. This Master's thesis introduces the concept of Deep q-learning, which means approximating the optimal q-function and the optimal value function with DNNs, analogously to Deep Q-learning. We adapt q-learning algorithms from Jia and Zhou (2023) to our setting, resulting in Deep q-learning algorithms, both offline and online. Furthermore, we prove that the discretization errors associated with the q-learning algorithms tend to zero as the time discretization approaches zero. Lastly, we demonstrate convergence of the offline Deep q-learning algorithm through numerical simulations.
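
As a concrete illustration of how an offline step can be set up, the sketch below jointly fits a value network and a q-network by minimizing a martingale (return-to-go regression) loss plus a soft Gibbs-normalization penalty, in PyTorch. It is a minimal toy under assumptions made here, not the thesis' implementation: the dynamics dX = a dt + dW, the quadratic reward, the fixed Gaussian exploration policy, the penalty weight lam, and all names (MLP, J, q, gamma) are ours; the thesis and Jia and Zhou (2023) work with the exact martingale conditions, sample from the current Gibbs policy, and also give an online variant that this sketch omits.

import torch
import torch.nn as nn

torch.manual_seed(0)
T, N, B = 1.0, 50, 128        # horizon, number of time steps, trajectories per batch
dt = T / N
gamma = 0.1                   # entropy-regularization temperature (assumed value)
lam = 1.0                     # weight of the Gibbs-normalization penalty (assumed)

class MLP(nn.Module):
    """Small feed-forward net, used for both J_theta(t, x) and q_psi(t, x, a)."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 64), nn.Tanh(),
                                 nn.Linear(64, 64), nn.Tanh(),
                                 nn.Linear(64, 1))
    def forward(self, *args):                       # each arg has shape [..., 1]
        return self.net(torch.cat(args, dim=-1)).squeeze(-1)

J = MLP(2)                                          # value network J_theta(t, x)
q = MLP(3)                                          # q-network q_psi(t, x, a)
opt = torch.optim.Adam(list(J.parameters()) + list(q.parameters()), lr=1e-3)
a_grid = torch.linspace(-3.0, 3.0, 41)              # action grid for the constraint
K, da = a_grid.numel(), a_grid[1] - a_grid[0]

def simulate():
    """Euler-Maruyama rollout of the toy dynamics dX = a dt + dW under a fixed
    N(0, 1) exploration policy. (The actual algorithms instead sample actions
    from the current Gibbs policy pi ~ exp(q/gamma).)"""
    x = torch.zeros(B)
    ts, xs, acts, rews = [], [], [], []
    for i in range(N):
        t = torch.full((B,), i * dt)
        a = torch.randn(B)
        r = -(x**2 + 0.1 * a**2)                    # toy LQ-type running reward
        ts.append(t); xs.append(x); acts.append(a); rews.append(r)
        x = x + a * dt + torch.randn(B) * dt**0.5
    return (torch.stack(ts), torch.stack(xs),
            torch.stack(acts), torch.stack(rews), x)

for step in range(200):
    ts, xs, acts, rews, xT = simulate()
    Jv = J(ts.unsqueeze(-1), xs.unsqueeze(-1))                       # [N, B]
    qv = q(ts.unsqueeze(-1), xs.unsqueeze(-1), acts.unsqueeze(-1))   # [N, B]
    # Martingale loss: regress J(t_i, X_i) onto the realized "return to go"
    # G_i = h(X_T) + sum_{j >= i} (r_j - q_j) dt, with terminal reward
    # h(x) = -x^2 (toy choice); the entropy bonus is omitted for brevity.
    inc = (rews - qv) * dt
    G = torch.flip(torch.cumsum(torch.flip(inc, [0]), 0), [0]) - xT**2
    ml_loss = ((G - Jv) ** 2).sum(0).mean() * dt
    # Consistency constraint on the q-function: exp(q/gamma) should integrate
    # to one over actions so that pi ~ exp(q/gamma) is a proper Gibbs policy;
    # here it is enforced softly via a Riemann-sum penalty.
    te = ts.unsqueeze(-1).expand(N, B, K)
    xe = xs.unsqueeze(-1).expand(N, B, K)
    ae = a_grid.view(1, 1, K).expand(N, B, K)
    integral = torch.exp(q(te.unsqueeze(-1), xe.unsqueeze(-1),
                           ae.unsqueeze(-1)) / gamma).sum(-1) * da
    penalty = ((integral - 1.0) ** 2).mean()
    loss = ml_loss + lam * penalty
    opt.zero_grad(); loss.backward(); opt.step()
    if step % 50 == 0:
        print(f"step {step:3d}  martingale loss {ml_loss.item():.4f}  "
              f"normalization penalty {penalty.item():.4f}")

Regressing J onto the realized return-to-go is one standard way to enforce the martingale condition offline; the normalization penalty keeps the q-network from absorbing arbitrary constants, since only normalized q-functions correspond to proper Gibbs policies.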

Place, publisher, year, edition, pages
2024, p. 58
Series
TRITA-SCI-GRU ; 2024:341
Keywords [en]
Deep Reinforcement Learning, Q-learning, continuous-time, entropy-regularization, stochastic control, martingale
Keywords [sv]
Djup förstärkningsinlärning, Q-inlärning, kontinuerlig tid, entropiregularisering, stokastisk styrteori, martingal
National Category
Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-358022
OAI: oai:DiVA.org:kth-358022
DiVA, id: diva2:1923803
Subject / course
Mathematics
Educational program
Master of Science - Mathematics
Available from: 2024-12-30. Created: 2024-12-30. Last updated: 2024-12-30. Bibliographically approved.

Open Access in DiVA

fulltext (694 kB), 117 downloads
File information
File name: FULLTEXT01.pdf
File size: 694 kB
Checksum (SHA-512): 2a6f697101fb4c3a037ecf8374eb554a0977ddb917f22a029146d7346cc0489e1d49b1b67ee446bf95d1b421db8426c5b697ad7f412b4fedbe33f7d04303eafc
Type: fulltext
Mimetype: application/pdf

