Learning Deep Neural Policies with Stability Guarantees
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-0443-7982
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-3599-440x
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS. ORCID iD: 0000-0003-2965-2953
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Deep reinforcement learning (DRL) has been successfully used to solve various robotic manipulation tasks. However, most existing works do not address the issue of control stability. This is in sharp contrast to the control theory community, where the well-established norm is to prove stability whenever a control law is synthesized. Traditional stability analysis is difficult for DRL because neural network policies are uninterpretable and the system dynamics are unknown. In this work, unconditional stability is obtained by deriving an interpretable deep policy structure based on the energy shaping control of Lagrangian systems. Then, stability during physical interaction with an unknown environment is established based on passivity. The result is a stability-guaranteeing DRL method in a model-free framework that is general enough for contact-rich manipulation tasks. With an experiment on a peg-in-hole task, we demonstrate, to the best of our knowledge, the first DRL method with a stability guarantee on a real robotic manipulator.
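
To make the energy shaping and passivity argument in the abstract concrete, the following is a textbook-style sketch, not necessarily the exact formulation used in the manuscript. The symbols are assumptions introduced here for illustration: $q$ is the joint configuration, $M(q)$ the inertia matrix, $C(q,\dot q)$ the Coriolis matrix, $g(q)$ the gravity term, $\tau_{ext}$ the external interaction torque, $\Phi(q)$ a shaped potential with a unique minimum at the goal $q^*$, and $D(\dot q)$ a damping term. In a deep policy, $\Phi$ and $D$ could plausibly be the learned components, which is also an assumption.

```latex
% Lagrangian dynamics with control input u and external torque \tau_{ext}
M(q)\ddot{q} + C(q,\dot{q})\dot{q} + g(q) = u + \tau_{ext}

% Energy shaping control law (assumed form):
% cancel gravity, inject a shaped potential \Phi and damping D
u = g(q) - \nabla_q \Phi(q) - D(\dot{q}),
\qquad \dot{q}^{\top} D(\dot{q}) > 0 \ \text{for}\ \dot{q} \neq 0

% Candidate Lyapunov (storage) function: kinetic energy plus shaped potential
V(q,\dot{q}) = \tfrac{1}{2}\,\dot{q}^{\top} M(q)\,\dot{q} + \Phi(q)

% Using the skew-symmetry of \dot{M} - 2C:
\dot{V} = \dot{q}^{\top}\bigl(u - g(q) + \nabla_q \Phi(q) + \tau_{ext}\bigr)
        = -\dot{q}^{\top} D(\dot{q}) + \dot{q}^{\top}\tau_{ext}

% Free motion (\tau_{ext} = 0): \dot{V} \le 0, so q^* is stable.
% In contact: \dot{V} \le \dot{q}^{\top}\tau_{ext}, i.e. the closed loop is
% passive with respect to the pair (\tau_{ext}, \dot{q}).
```

Under these assumptions, stability follows from the structure of the control law alone, for any parameter values that keep $\Phi$ minimized at $q^*$ and $D$ dissipative, which is one way to read the abstract's claim that stability holds without a model of the environment.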

Keywords [en]
Robotics, Reinforcement Learning, Robot Control, Robotic Manipulation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-299798; OAI: oai:DiVA.org:kth-299798; DiVA id: diva2:1585668
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210818

Available from: 2021-08-17; Created: 2021-08-17; Last updated: 2025-02-09; Bibliographically approved
In thesis
1. Data-Driven Methods for Contact-Rich Manipulation: Control Stability and Data-Efficiency
2021 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous robots are expected to have a greater presence in the homes and workplaces of human beings. Unlike their industrial counterparts, autonomous robots have to deal with a great deal of uncertainty and lack of structure in their environment. A remarkable aspect of performing manipulation in such a scenario is the possibility of physical contact between the robot and the environment. Therefore, not unlike human manipulation, robotic manipulation has to manage contacts, both expected and unexpected, that are often characterized by complex interaction dynamics.

Skill learning has emerged as a promising approach for robots to acquire rich motion generation capabilities. In skill learning, data-driven methods are used to learn reactive control policies that map states to actions. Such an approach is appealing because a sufficiently expressive policy can almost instantaneously generate appropriate control actions without the need for computationally expensive search operations. Although reinforcement learning (RL) is a natural framework for skill learning, its practical application is limited for a number of reasons. Arguably, the two main reasons are the lack of guaranteed control stability and poor data-efficiency. While control stability is necessary for ensuring safety and predictability, data-efficiency is required for achieving realistic training times. In this thesis, solutions are sought for these two issues in the context of contact-rich manipulation.

First, this thesis addresses the problem of control stability. Despite unknown interaction dynamics during contact, skill learning with a stability guarantee is formulated as a model-free RL problem. The thesis proposes multiple solutions for parameterizing stability-aware policies. Some policy parameterizations are partly or almost wholly deep neural networks. This is followed by policy search solutions that preserve stability during random exploration, if required. In one case, a novel evolution strategies-based policy search method is introduced. It is shown, with the help of real robot experiments, that Lyapunov stability is both possible and beneficial for RL-based skill learning.

Second, this thesis addresses the issue of data-efficiency. Although data-efficiency is targeted by formulating skill learning as a model-based RL problem, only the model learning part is addressed. In addition to benefiting from the data-efficiency and uncertainty representation of the Gaussian process, this thesis further investigates the benefits of adopting the structure of hybrid automata for learning forward dynamics models. The method also includes an algorithm for predicting long-term trajectory distributions that can represent discontinuities and multiple modes. The proposed method is shown to be more data-efficient than some state-of-the-art methods. 

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2021. p. 63
Series
TRITA-EECS-AVL ; 49
Keywords
Robotic, Skill Learning, Reinforcement Learning, Contact-Rich Manipulation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-299799 (URN); 978-91-7873-937-0 (ISBN)
Public defence
2021-09-17, https://kth-se.zoom.us/j/68651867110, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210823

Available from: 2021-08-18; Created: 2021-08-17; Last updated: 2025-02-09; Bibliographically approved

Authority records

Abdul Khader, Shahbaz; Yin, Hang; Kragic, Danica
