Reinforcement learning for optimal execution in high resolution Markovian limit order book models
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics. ORCID iD: 0000-0002-0067-4908
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematical Statistics. ORCID iD: 0000-0001-9210-121X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-4679-4673
(English) Manuscript (preprint) (Other academic)
National Category
Computational Mathematics
Identifiers
URN: urn:nbn:se:kth:diva-295423
OAI: oai:DiVA.org:kth-295423
DiVA, id: diva2:1556125
Note

QC 20210531

Available from: 2021-05-20. Created: 2021-05-20. Last updated: 2022-06-25. Bibliographically approved.
In thesis
1. Generative models of limit order books
2021 (English) Licentiate thesis, comprehensive summary (Other academic)
Abstract [en]

In this thesis, generative models in machine learning are developed with the overall aim of improving methods for algorithmic trading on high-frequency electronic exchanges based on limit order books. The thesis consists of two papers.

In the first paper, a new generative model for the dynamic evolution of a limit order book, based on recurrent neural networks, is developed. The model captures the full dynamics of the limit order book by decomposing the probability of each transition of the limit order book into a product of conditional probabilities of order type, price level, order size, and time delay. Each such conditional probability is modeled by a recurrent neural network. In addition, several evaluation metrics for generative models related to order execution are introduced. The generative model is successfully trained to fit both synthetic data generated by a Markov model and real data from the Nasdaq Stockholm exchange.
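As a compact sketch of this factorization (the notation, and the exact conditioning order, are our assumptions based on the order in which the abstract lists the quantities, not taken verbatim from the paper): write the k-th order book event as e_k = (τ_k, ℓ_k, v_k, Δt_k) for its order type, price level, order size, and time delay, and let h_k denote a recurrent hidden state summarizing the preceding events e_1, ..., e_{k-1}. Then

\[
p(e_k \mid h_k) = p(\tau_k \mid h_k)\, p(\ell_k \mid \tau_k, h_k)\, p(v_k \mid \tau_k, \ell_k, h_k)\, p(\Delta t_k \mid \tau_k, \ell_k, v_k, h_k),
\]

with each conditional factor parameterized by its own recurrent neural network, and the likelihood of a full event sequence given by the product of these terms over k.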

The second paper explores reinforcement learning methods to find optimal policies for trade execution in Markovian models. A number of different approaches are implemented and compared, including a baseline time-weighted average price (TWAP) strategy, tabular Q-learning, and deep Q-learning based either on predefined features or on the entire limit order book as input. The results indicate that deep Q-learning with the entire limit order book as input is preferable for designing efficient execution policies. To improve the understanding of the decisions taken by the agent, the learned action-value function for deep Q-learning with predefined features is visualized as a function of selected features.
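To make the tabular Q-learning baseline concrete, below is a minimal, self-contained sketch. The state (time step, inventory remaining), the action set (units to sell per step), the linear price-impact reward, and all parameter values are illustrative assumptions; the paper itself trains against Markovian limit order book models, not this toy environment.

import numpy as np

# Toy execution problem: sell n_inventory units over n_steps steps.
# State = (time step, inventory remaining); action = units to sell now.
n_steps, n_inventory, n_actions = 10, 20, 5   # assumed discretization
alpha, gamma, epsilon = 0.1, 1.0, 0.1         # learning rate, discount, exploration
impact = 0.05                                 # assumed per-unit price impact

Q = np.zeros((n_steps, n_inventory + 1, n_actions))
rng = np.random.default_rng(0)

def step(t, inv, a):
    """One environment transition under a crude linear-impact price model."""
    sell = inv if t == n_steps - 1 else min(a, inv)  # forced liquidation at the end
    price = 100.0 + rng.normal(0.0, 0.1)             # noisy mid-price
    reward = sell * (price - impact * sell)          # revenue net of impact cost
    inv_next = inv - sell
    done = t == n_steps - 1 or inv_next == 0
    return reward, inv_next, done

for episode in range(5000):
    t, inv, done = 0, n_inventory, False
    while not done:
        # epsilon-greedy action selection over the Q-table
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[t, inv]))
        reward, inv_next, done = step(t, inv, a)
        # one-step Q-learning backup
        target = reward if done else reward + gamma * np.max(Q[t + 1, inv_next])
        Q[t, inv, a] += alpha * (target - Q[t, inv, a])
        t, inv = t + 1, inv_next

# Greedy liquidation schedule implied by the learned Q-table
t, inv, done, schedule = 0, n_inventory, False, []
while not done:
    a = int(np.argmax(Q[t, inv]))
    _, inv_next, done = step(t, inv, a)
    schedule.append(inv - inv_next)
    t, inv = t + 1, inv_next
print(schedule)

In this toy setting, the TWAP baseline mentioned above corresponds to selling n_inventory / n_steps = 2 units every step; the paper's deep Q-learning variants replace the Q-table with a neural network taking either predefined features or the full limit order book state as input.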

Abstract [sv]

In this thesis, generative models in machine learning are developed with the aim of improving methods for algorithmic trading on high-frequency electronic markets based on limit order books. The thesis consists of two papers.

The first paper develops a generative model for the dynamic evolution of a limit order book based on recurrent neural networks. The model captures the full dynamics of the order book by decomposing the probability of each change of the order book into a product of conditional probabilities of order type, price level, order size, and time delay. Each of the conditional probabilities is modeled with a recurrent neural network. In addition, several evaluation methods for generative models related to order execution are introduced. The generative model is successfully trained both on synthetic data generated by a Markov model and on real data from Nasdaq Stockholm.

The second paper explores reinforcement learning to find optimal strategies for order execution in Markovian models. Several different methods are implemented and compared, including a baseline strategy with time-weighted average price, tabular Q-learning, and deep Q-learning based both on predefined features and on the entire order book as input. The results indicate that it is advantageous to use the entire order book as input for deep Q-learning. To improve the understanding of the decisions taken by the agent, the Q-function for deep Q-learning is visualized as a function of the predefined features.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2021. p. 109
Series
TRITA-SCI-FOU ; 2021;25
National Category
Probability Theory and Statistics
Research subject
Mathematics
Identifiers
urn:nbn:se:kth:diva-295424 (URN)
978-91-7873-921-9 (ISBN)
Presentation
2021-06-10, Via Zoom: https://kth-se.zoom.us/webinar/register/WN_ELZ61ZbqSNKq_c7ShhtAqA, 13:00 (English)
Available from: 2021-05-21. Created: 2021-05-20. Last updated: 2022-09-19. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Hultin, Hanna; Hult, Henrik; Proutiere, Alexandre; Tarighati, Alla
