kth.se Publications
Learning Stable Normalizing-Flow Control for Robotic Manipulation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-0443-7982
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-3599-440x
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL; KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS. ORCID iD: 0000-0003-2965-2953
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Reinforcement Learning (RL) of robotic manipulation skills, despite its impressive successes, stands to benefit from incorporating domain knowledge from control theory. One of the most important properties of interest is control stability. Ideally, one would like to achieve stability guarantees while staying within the framework of state-of-the-art deep RL algorithms. No such solution exists in general, especially not one that scales to complex manipulation tasks. We contribute towards closing this gap by introducing a normalizing-flow control structure that can be deployed in any recent deep RL algorithm. While stable exploration is not guaranteed, our method is designed to ultimately produce deterministic controllers with provable stability. In addition to demonstrating our method on challenging contact-rich manipulation tasks, we also show that it is possible to achieve considerable exploration efficiency (reduced state-space coverage and actuation effort) without losing learning efficiency.
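The general idea of obtaining provable stability from a normalizing flow can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the controller from the paper: it assumes a 2-D state, a first-order (velocity-level) system dx/dt = u rather than a torque-controlled manipulator, and a small stack of untrained, randomly initialized RealNVP-style affine coupling layers as the bijection phi. The names `AffineCoupling2D`, `Flow`, `stable_velocity` and `lyapunov` are hypothetical. Because phi is a diffeomorphism, commanding the latent dynamics dz/dt = -k z on z = phi(x) - phi(x_goal) gives a nonlinear controller for which V(x) = ||phi(x) - phi(x_goal)||^2 is a Lyapunov function (dV/dt = -2k V), whatever the flow's parameters are.

```python
import numpy as np


class AffineCoupling2D:
    """RealNVP-style affine coupling on a 2-D state (untrained, random weights).

    One coordinate passes through unchanged and conditions a strictly positive
    scaling plus a shift of the other, so the layer is a smooth bijection of R^2.
    """

    def __init__(self, flip, rng):
        self.flip = flip
        self.w_s, self.b_s, self.w_t, self.b_t = rng.normal(scale=0.5, size=4)

    def __call__(self, x):
        cond, free = (x[1], x[0]) if self.flip else (x[0], x[1])
        log_scale = 0.5 * np.tanh(self.w_s * cond + self.b_s)  # bounded, so exp() > 0
        shift = np.tanh(self.w_t * cond + self.b_t)
        out = free * np.exp(log_scale) + shift
        return np.array([out, cond]) if self.flip else np.array([cond, out])


class Flow:
    """Hypothetical diffeomorphism phi: a stack of alternating coupling layers."""

    def __init__(self, n_layers=2, seed=0):
        rng = np.random.default_rng(seed)
        self.layers = [AffineCoupling2D(i % 2 == 1, rng) for i in range(n_layers)]

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


def jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x (keeps the sketch dependency-free)."""
    J = np.zeros((x.size, x.size))
    for i in range(x.size):
        dx = np.zeros_like(x)
        dx[i] = eps
        J[:, i] = (f(x + dx) - f(x - dx)) / (2 * eps)
    return J


def stable_velocity(x, x_goal, phi, k=1.0):
    """Velocity command whose latent image obeys dz/dt = -k z, z = phi(x) - phi(x_goal)."""
    z = phi(x) - phi(x_goal)
    return np.linalg.solve(jacobian(phi, x), -k * z)


def lyapunov(x, x_goal, phi):
    """V(x) = ||phi(x) - phi(x_goal)||^2; zero only at x_goal because phi is a bijection."""
    z = phi(x) - phi(x_goal)
    return float(z @ z)


phi = Flow()
x_goal = np.array([0.2, 0.3])
x = np.array([1.5, -1.0])
dt = 0.005
V_history = [lyapunov(x, x_goal, phi)]
for _ in range(3000):
    x = x + dt * stable_velocity(x, x_goal, phi)  # explicit Euler on dx/dt = u
    V_history.append(lyapunov(x, x_goal, phi))
print("final distance to goal:", np.linalg.norm(x - x_goal))
```

In a learning setting, the coupling-layer parameters would be adjusted (for example by an RL objective) while the stability argument keeps holding for every parameter value; the finite-difference Jacobian is only a convenience for this sketch.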

Keywords [en]
Robotics, Reinforcement Learning, Robot Control, Robotic Manipulation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-299797
OAI: oai:DiVA.org:kth-299797
DiVA, id: diva2:1585667
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210818

Available from: 2021-08-17. Created: 2021-08-17. Last updated: 2025-02-09. Bibliographically approved.
In thesis
1. Data-Driven Methods for Contact-Rich Manipulation: Control Stability and Data-Efficiency
2021 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous robots are expected to have a greater presence in the homes and workplaces of human beings. Unlike their industrial counterparts, autonomous robots have to deal with a great deal of uncertainty and a lack of structure in their environment. A notable aspect of performing manipulation in such scenarios is the possibility of physical contact between the robot and the environment. Therefore, not unlike human manipulation, robotic manipulation has to manage contacts, both expected and unexpected, that are often characterized by complex interaction dynamics.

Skill learning has emerged as a promising approach for robots to acquire rich motion generation capabilities. In skill learning, data-driven methods are used to learn reactive control policies that map states to actions. Such an approach is appealing because a sufficiently expressive policy can generate appropriate control actions almost instantaneously, without the need for computationally expensive search operations. Although reinforcement learning (RL) is a natural framework for skill learning, its practical application is limited for a number of reasons. Arguably, the two main reasons are the lack of guaranteed control stability and poor data-efficiency. While control stability is necessary for ensuring safety and predictability, data-efficiency is required for achieving realistic training times. In this thesis, solutions are sought for these two issues in the context of contact-rich manipulation.

First, this thesis addresses the problem of control stability. Despite unknown interaction dynamics during contact, skill learning with stability guarantees is formulated as a model-free RL problem. The thesis proposes multiple solutions for parameterizing stability-aware policies, some of which consist partly or almost wholly of deep neural networks. These are followed by policy search solutions that preserve stability during random exploration, if required. In one case, a novel policy search method based on evolution strategies is introduced. Real-robot experiments show that Lyapunov stability is both possible and beneficial for RL-based skill learning.
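For readers unfamiliar with evolution strategies (ES), the following sketch shows how a standard antithetic ES update can search over a policy whose parameterization keeps every sampled controller stable. It is a generic textbook ES, not the novel method introduced in the thesis. Assumptions (all hypothetical): a 1-D double integrator with position p and velocity v, simulated with semi-implicit Euler at time step DT = 0.02; a PD policy u = -kp p - kd v; and gains squashed by a sigmoid into (0, 100) and (0, 20). For this discretization the closed loop is asymptotically stable whenever kp, kd > 0, DT^2 kp + 2 DT kd < 4 and DT kd < 2, which the bounds guarantee for every parameter sample drawn during exploration.

```python
import numpy as np

# Hypothetical constants: with DT = 0.02, DT**2 * KP_MAX + 2 * DT * KD_MAX = 0.84 < 4
# and DT * KD_MAX = 0.4 < 2, so every gain pair inside the bounds is stable in discrete time.
KP_MAX, KD_MAX, DT = 100.0, 20.0, 0.02


def gains(theta):
    """Squash unconstrained parameters into bounded, strictly positive PD gains."""
    s = 1.0 / (1.0 + np.exp(-theta))
    return float(KP_MAX * s[0]), float(KD_MAX * s[1])


def rollout_cost(theta, steps=300, r=0.01):
    """Quadratic cost of a 1-D double integrator starting at p = 1, v = 0."""
    kp, kd = gains(theta)
    p, v, cost = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = -kp * p - kd * v            # PD policy, stable for any gains within the bounds
        cost += DT * (p * p + r * u * u)
        v += DT * u                     # semi-implicit Euler: update velocity first
        p += DT * v
    return cost


def es_step(theta, rng, sigma=0.1, lr=0.05, n_pairs=8):
    """One antithetic evolution-strategies step: estimate the gradient from +/- perturbations."""
    eps = rng.normal(size=(n_pairs, theta.size))
    grad = np.zeros_like(theta)
    for e in eps:
        grad += (rollout_cost(theta + sigma * e) - rollout_cost(theta - sigma * e)) * e
    grad /= 2.0 * sigma * n_pairs
    return theta - lr * grad


rng = np.random.default_rng(0)
theta = np.zeros(2)                     # initial gains: kp = 50, kd = 10
initial_cost = rollout_cost(theta)
for _ in range(100):
    theta = es_step(theta, rng)
final_cost = rollout_cost(theta)
print("gains:", gains(theta), "cost:", initial_cost, "->", final_cost)
```

For this toy cost, the continuous-time optimum with control weight r = 0.01 lies at kp = 10, kd = sqrt(20) (about 4.47), so the search is expected to lower the gains from their initial values; safety during exploration comes entirely from the parameterization, not from the optimizer.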

Second, this thesis addresses the issue of data-efficiency. Although data-efficiency is targeted by formulating skill learning as a model-based RL problem, only the model learning part is addressed. In addition to benefiting from the data-efficiency and uncertainty representation of Gaussian processes, the thesis investigates the benefits of adopting the structure of hybrid automata for learning forward dynamics models. The method also includes an algorithm for predicting long-term trajectory distributions that can represent discontinuities and multiple modes. The proposed method is shown to be more data-efficient than some state-of-the-art methods.
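Why hybrid-automaton structure helps when learning dynamics with discontinuities can be seen even in one dimension. This is a simplified sketch, not the thesis's method: it assumes a hypothetical scalar system whose next state jumps at a contact guard x = 0.5, that the active mode is observed in the training data and the guard is known at prediction time, and zero-mean Gaussian processes with fixed squared-exponential hyperparameters (no hyperparameter optimization and no long-term trajectory prediction). It compares one GP fitted across the discontinuity with one GP per mode; `GP1D`, `true_step` and `hybrid_predict` are illustrative names.

```python
import numpy as np


def rbf(a, b, lengthscale=0.2, variance=1.0):
    """Squared-exponential kernel between 1-D input arrays."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)


class GP1D:
    """Zero-mean GP regression with fixed (not optimized) hyperparameters."""

    def __init__(self, x, y, noise=1e-5):
        self.x = x
        K = rbf(x, x) + noise * np.eye(x.size)
        self.L = np.linalg.cholesky(K)
        self.alpha = np.linalg.solve(self.L.T, np.linalg.solve(self.L, y))

    def predict(self, xs):
        Ks = rbf(xs, self.x)
        mean = Ks @ self.alpha
        v = np.linalg.solve(self.L, Ks.T)
        var = np.maximum(1.0 - np.sum(v ** 2, axis=0), 0.0)  # prior variance is 1.0
        return mean, var


def true_step(x):
    """Hypothetical 1-D dynamics with a discontinuity at the contact guard x = 0.5."""
    return np.where(x < 0.5, x + 0.1, 0.2)


x_train = np.linspace(0.0, 1.0, 21)
y_train = true_step(x_train)
mode = x_train >= 0.5  # contact flag, assumed observed during data collection

single = GP1D(x_train, y_train)
per_mode = {m: GP1D(x_train[mode == m], y_train[mode == m]) for m in (False, True)}


def hybrid_predict(xs, guard=0.5):
    """Route each query to the GP of the mode selected by the (assumed known) guard."""
    mean = np.empty_like(xs)
    for m, gp in per_mode.items():
        sel = (xs >= guard) == m
        if sel.any():
            mean[sel], _ = gp.predict(xs[sel])
    return mean


xq = np.array([0.3, 0.475, 0.7])
err_single = np.abs(single.predict(xq)[0] - true_step(xq))
err_hybrid = np.abs(hybrid_predict(xq) - true_step(xq))
print("single GP error:", err_single)
print("per-mode GP error:", err_hybrid)
```

At the query point just before the guard, the single GP blends the two modes and predicts a value between them, whereas the per-mode model stays close to the free-motion dynamics. The predictive variances returned by `predict` are ignored here for brevity.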

Abstract [sv] (translated into English)

Autonomous robots are expected to have an increasingly large presence in people's workplaces and homes. Unlike their industrial counterparts, these autonomous robots need to handle a large amount of uncertainty and a lack of structure in their surroundings. An essential part of performing manipulation in such scenarios is the occurrence of physical interaction involving direct contact between the robot and its surroundings. Therefore, robots, not unlike humans, must be able to handle both expected and unexpected contacts with the environment, which are often characterized by complex interaction dynamics.

Skill learning stands out as a promising alternative for letting robots acquire a rich ability to generate motions. In skill learning, data-driven methods are used to learn a reactive policy, a control function that maps states to control signals. This approach is appealing because a sufficiently expressive policy can generate appropriate control signals almost instantaneously, without having to perform computationally expensive search operations. Although Reinforcement Learning (RL) is a natural framework for skill learning, its practical applications have been limited for a number of reasons. It can fairly be argued that the two main reasons are the lack of guaranteed stability and poor data efficiency. Stability of the control loop is necessary to guarantee safety and predictability, and data efficiency is needed to achieve realistic training times. In this thesis, we seek solutions to these problems in the context of contact-rich manipulation.

This thesis first addresses the problem of stability. Although the interaction dynamics are unknown when contacts occur, skill learning with stability guarantees is formulated as a model-free RL problem. The thesis presents several solutions for parameterizing stability-aware policies. These are followed by solutions for searching for policies that remain stable during random exploration, if this is required. Some parameterizations consist wholly or partly of deep neural networks. In one case, a search method based on evolution strategies is also introduced. We show, through experiments on real robots, that Lyapunov stability is both possible and beneficial in RL-based skill learning.

Furthermore, the thesis addresses data efficiency. Although data efficiency is tackled by formulating skill learning as a model-based RL problem, we treat only the model-learning part. In addition to benefiting from the data efficiency and uncertainty representation of Gaussian processes, the thesis also investigates the benefits of using the structure of hybrid automata to learn forward dynamics models. The method also includes an algorithm for predicting trajectory distributions over a longer time horizon, in order to represent discontinuities and multiple modes. We show that the proposed method is more data-efficient than a number of existing methods.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2021. p. 63
Series
TRITA-EECS-AVL ; 49
Keywords
Robotics, Skill Learning, Reinforcement Learning, Contact-Rich Manipulation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-299799 (URN)
978-91-7873-937-0 (ISBN)
Public defence
2021-09-17, https://kth-se.zoom.us/j/68651867110, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210823

Available from: 2021-08-18. Created: 2021-08-17. Last updated: 2025-02-09. Bibliographically approved.

Open Access in DiVA

fulltext (2782 kB), 634 downloads
File name: FULLTEXT01.pdf
File size: 2782 kB
Checksum (SHA-512): d6aab9ec8acf0b2df689744ab03e59219fa6a065025dc0882236e9e84097a9cfbda6c53a41bcfc22236abb5a5e505851a2f4792e85655e05a91124d83ad8dc98
Type: fulltext
Mimetype: application/pdf

Other links

arXiv

Authority records

Abdul Khader, Shahbaz; Yin, Hang; Kragic, Danica
