Reinforcement Learning Endowed Robot Planning under Spatiotemporal Logic Specifications
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Decision and Control Systems (Automatic Control). ORCID iD: 0000-0002-7422-3966
2019 (English). Licentiate thesis, monograph (Other academic)
Abstract [en]

Recent advances in artificial intelligence are producing fascinating results in the field of computer science. Motivated by these successes, the desire to transfer and implement learning methods on real-life systems is growing as well. The increased autonomy and intelligence of the resulting systems in carrying out complex tasks can be expected to revolutionize both industry and our everyday lives. This thesis takes a step towards this goal by studying reinforcement learning methods for solving optimal control problems with task satisfaction constraints. More specifically, it considers spatiotemporal tasks given in the expressive language of signal temporal logic.

We begin by introducing our proposed solution to the task-constrained optimal control problem, which is based on blending traditional control methods with more recent, data-driven approaches. We adopt the paradigm that the two approaches should be considered as endpoints of a continuous spectrum, and incorporate partial knowledge of the system dynamics into the learning process in the form of guidance controllers. These guidance controllers aid in enforcing the task satisfaction constraint, allowing the system to explore towards optimal trajectories in a more sample-efficient manner. The proposed solution algorithm is termed guided policy improvement with path integrals (G-PI2). We also propose a framework for deriving effective guidance controllers, and illustrate the importance of this guidance through a simulation case study.
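
The core idea behind policy improvement with path integrals can be sketched as a cost-weighted average of exploration noise. The snippet below is a generic, simplified PI2-style parameter update and not the G-PI2 algorithm of the thesis: it contains no guidance controller, and the names `pi2_style_update`, `cost_fn`, `sigma`, and `temperature` are assumptions made for illustration.

```python
import math
import random


def pi2_style_update(theta, cost_fn, n_rollouts=20, sigma=0.5,
                     temperature=0.1, rng=None):
    """One cost-weighted parameter update in the style of PI2 (illustrative).

    theta: list of floats, the current policy parameters.
    cost_fn: hypothetical stand-in for a rollout; maps parameters to a cost.
    """
    rng = rng or random.Random(0)
    # Sample exploration noise and evaluate each perturbed parameter vector.
    noises = [[rng.gauss(0.0, sigma) for _ in theta] for _ in range(n_rollouts)]
    costs = [cost_fn([t + e for t, e in zip(theta, eps)]) for eps in noises]
    # Normalize costs to [0, 1] so the temperature is independent of cost scale.
    lowest, highest = min(costs), max(costs)
    span = (highest - lowest) or 1.0
    weights = [math.exp(-(c - lowest) / (span * temperature)) for c in costs]
    total = sum(weights)
    # Low-cost rollouts dominate the weighted average of the noise.
    return [
        t + sum(w * eps[i] for w, eps in zip(weights, noises)) / total
        for i, t in enumerate(theta)
    ]
```

In a guided variant, one would expect the rollouts behind `cost_fn` to be steered by a guidance controller so that sampled trajectories tend to satisfy the task; that part is deliberately left out of this sketch.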

The thesis also considers a diverse range of enhancements to the developed G-PI2 algorithm. First, the effectiveness of the guidance laws is increased by continuously updating their parameters throughout the learning process using so-called funnel adaptation. Second, we explore a learning framework for gathering and storing experiences gained from previously solved problems in order to efficiently tackle changes in initial conditions or task specifications in future missions. Finally, we look at how so-called robustness metrics, which quantify the extent of task satisfaction for signal temporal logic, can be explicitly defined in order to guide the learning process towards task-satisfying trajectories. The multidisciplinary nature of the examined task-constrained optimal control problem offers a broad range of additional research directions for future work, which are discussed in detail as well.
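
As a rough illustration of what a robustness metric quantifies, the sketch below computes the classical min/max (space) robustness of two simple signal temporal logic formulas over a sampled signal. This is the standard textbook definition, not necessarily the metrics defined in the thesis, and the function names and inclusive index windows are assumptions made for this example.

```python
def predicate_robustness(signal, threshold):
    """Per-sample robustness of the predicate x > threshold.

    Positive values mean the predicate holds with that margin,
    negative values mean it is violated by that margin.
    """
    return [x - threshold for x in signal]


def always(rho, start, end):
    """Robustness of 'always' on [start, end]: the worst case in the window."""
    return min(rho[start:end + 1])


def eventually(rho, start, end):
    """Robustness of 'eventually' on [start, end]: the best case in the window."""
    return max(rho[start:end + 1])


# Sampled one-dimensional signal and the predicate x > 1.0.
signal = [0.0, 0.5, 1.5, 1.0]
rho = predicate_robustness(signal, 1.0)

always_value = always(rho, 0, 3)          # negative: violated at some sample
eventually_value = eventually(rho, 0, 3)  # positive: satisfied at some sample
```

The sign of the value indicates whether the formula is satisfied, and its magnitude indicates by how much, which is what makes such metrics usable as a learning signal.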

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2019, p. 133
Series
TRITA-EECS-AVL ; 2019:83
Keywords [en]
Formal methods, temporal logic, autonomous systems, reinforcement learning
National Category
Control Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-263611
ISBN: 978-91-7873-371-2 (print)
OAI: oai:DiVA.org:kth-263611
DiVA, id: diva2:1368556
Presentation
2019-12-09, V3, Teknikringen 72, KTH Campus, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20191111

Available from: 2019-11-11. Created: 2019-11-07. Last updated: 2019-11-14. Bibliographically approved.

Open Access in DiVA

PVarnai_LicentiateThesis (1625 kB), 73 downloads
File information
File name: FULLTEXT01.pdf
File size: 1625 kB
Checksum SHA-512:
68e411c90e332372e6aed3a3435573c9c211ffd8b2b2169a9219a9583870445c70291278c8df8adb2dd254a457340a7ebb014c1cf1497e3bbacaeba9b564e1d7
Type: fulltext
Mimetype: application/pdf

Search in DiVA

By author/editor
Varnai, Peter
By organisation
Decision and Control Systems (Automatic Control)
Control Engineering

Total: 73 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 170 hits