kth.se Publications
Stability-Guaranteed Reinforcement Learning for Contact-Rich Manipulation
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ABB Corp Res, S-72178 Västerås, Sweden. ORCID iD: 0000-0003-0443-7982
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-3599-440X
ABB Corp Res, S-72178 Västerås, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-2965-2953
2021 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 6, no. 1, p. 1-8. Article in journal (Refereed). Published.
Abstract [en]

Reinforcement learning (RL) has had its fair share of success in contact-rich manipulation tasks, but it still lags behind in benefiting from advances in robot control theory such as impedance control and stability guarantees. Recently, the concept of variable impedance control (VIC) was adopted into RL with encouraging results. However, the more important issue of stability remains unaddressed. To clarify the challenge in stable RL, we introduce the term all-the-time-stability, which unambiguously means that every possible rollout must be stability-certified. Our contribution is a model-free RL method that not only adopts VIC but also achieves all-the-time-stability. Building on a recently proposed stable VIC controller as the policy parameterization, we introduce a novel policy search algorithm that is inspired by the Cross-Entropy Method and inherently guarantees stability. Our experimental studies confirm the feasibility and usefulness of the stability guarantee and also feature, to the best of our knowledge, the first successful application of RL with all-the-time-stability to the benchmark problem of peg-in-hole.
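To make the abstract's idea concrete, the following is a minimal, hypothetical sketch of a Cross-Entropy-Method-style search that only ever evaluates candidates passing a stability certificate, so every rollout is certified before execution. The objective, the `is_stable` certificate, and all names are illustrative stand-ins under simplifying assumptions, not the authors' actual algorithm or controller.

```python
import numpy as np

def stable_cem(objective, is_stable, dim, iters=30, pop=64, elite_frac=0.25, seed=0):
    """CEM-style minimization that rejection-samples until every candidate
    in the population passes a stability certificate (toy version of
    'all-the-time-stability': no uncertified candidate is ever rolled out)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)
    n_elite = max(1, int(pop * elite_frac))
    for _ in range(iters):
        samples = []
        while len(samples) < pop:
            x = rng.normal(mu, sigma)
            if is_stable(x):          # certify BEFORE evaluating/rolling out
                samples.append(x)
        samples = np.array(samples)
        scores = np.array([objective(x) for x in samples])
        elites = samples[np.argsort(scores)[:n_elite]]   # keep best (lowest cost)
        mu, sigma = elites.mean(axis=0), elites.std(axis=0) + 1e-6
    return mu

# Toy example: a quadratic cost with a positivity constraint standing in
# for a Lyapunov-based stability certificate on the policy parameters.
best = stable_cem(lambda x: np.sum((x - 2.0) ** 2),
                  lambda x: bool(np.all(x > 0.0)), dim=2)
```

Because infeasible samples are rejected up front rather than penalized afterwards, the search distribution never has to recover from evaluating an unstable candidate, which is the property the abstract emphasizes.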

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. Vol. 6, no. 1, p. 1-8
Keywords [en]
Reinforcement learning, compliance and impedance control, compliant assembly
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-285758
DOI: 10.1109/LRA.2020.3028529
ISI: 000577867400001
Scopus ID: 2-s2.0-85092014420
OAI: oai:DiVA.org:kth-285758
DiVA id: diva2:1500579
Note

QC 20201112

Available from: 2020-11-12. Created: 2020-11-12. Last updated: 2025-02-07. Bibliographically approved.
In thesis
1. Data-Driven Methods for Contact-Rich Manipulation: Control Stability and Data-Efficiency
2021 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Autonomous robots are expected to have a greater presence in the homes and workplaces of human beings. Unlike their industrial counterparts, autonomous robots have to deal with a great deal of uncertainty and lack of structure in their environment. A notable aspect of performing manipulation in such a scenario is the possibility of physical contact between the robot and the environment. Therefore, not unlike human manipulation, robotic manipulation has to manage contacts, both expected and unexpected, that are often characterized by complex interaction dynamics.

Skill learning has emerged as a promising approach for robots to acquire rich motion generation capabilities. In skill learning, data-driven methods are used to learn reactive control policies that map states to actions. Such an approach is appealing because a sufficiently expressive policy can almost instantaneously generate appropriate control actions without the need for computationally expensive search operations. Although reinforcement learning (RL) is a natural framework for skill learning, its practical application is limited for a number of reasons. Arguably, the two main reasons are the lack of guaranteed control stability and poor data-efficiency. While control stability is necessary for ensuring safety and predictability, data-efficiency is required for achieving realistic training times. In this thesis, solutions are sought for these two issues in the context of contact-rich manipulation.

First, this thesis addresses the problem of control stability. Despite unknown interaction dynamics during contact, skill learning with stability guarantees is formulated as a model-free RL problem. The thesis proposes multiple solutions for parameterizing stability-aware policies, some of which are partly or almost wholly deep neural networks. This is followed by policy search solutions that preserve stability during random exploration, if required. In one case, a novel evolution-strategies-based policy search method is introduced. It is shown, with the help of real-robot experiments, that Lyapunov stability is both possible and beneficial for RL-based skill learning.
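One way a neural-network policy can be made stability-aware by construction is to constrain what the network is allowed to output. The sketch below is a hypothetical illustration of that general idea, not the thesis's actual parameterization: a small network outputs only strictly positive stiffness gains, so the commanded force is always a stabilizing spring toward the goal, regardless of the (randomly explored) weights.

```python
import numpy as np

def softplus(z):
    """Smooth map to strictly positive values."""
    return np.log1p(np.exp(z))

def stable_vic_policy(x, W1, b1, W2, b2, x_goal):
    """Toy stability-aware variable-impedance policy: the network chooses a
    state-dependent, strictly positive diagonal stiffness K(x), and the
    action is the spring force -K(x) * (x - x_goal). Because K(x) > 0
    elementwise for ANY weights, every sampled policy pulls toward the goal.
    (Illustrative construction only, with hypothetical names.)"""
    h = np.tanh(W1 @ x + b1)
    K = softplus(W2 @ h + b2) + 1e-3   # stiffness gains, bounded away from 0
    return -K * (x - x_goal)
```

The point of such a parameterization is that random exploration happens in the weights, while the stability-relevant structure (a positive-definite spring) is hard-coded, so no sampled policy can destabilize the system.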

Second, this thesis addresses the issue of data-efficiency. Although data-efficiency is targeted by formulating skill learning as a model-based RL problem, only the model learning part is addressed. In addition to benefiting from the data-efficiency and uncertainty representation of Gaussian processes, this thesis further investigates the benefits of adopting the structure of hybrid automata for learning forward dynamics models. The method also includes an algorithm for predicting long-term trajectory distributions that can represent discontinuities and multiple modes. The proposed method is shown to be more data-efficient than some state-of-the-art methods.
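The core building block of such a model is Gaussian-process regression of the one-step dynamics, which yields a predictive mean plus an uncertainty estimate from few samples. The following is a minimal single-mode sketch of that building block (no hybrid-automaton structure, no long-term propagation); the class and names are hypothetical, not the thesis's implementation.

```python
import numpy as np

def rbf(A, B, ell=1.0, sf=1.0):
    """Squared-exponential kernel between row-vector inputs A (n,d) and B (m,d)."""
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

class GPDynamics:
    """One-step forward model: given training transitions (X, Y), predict the
    next state with a mean and a per-point predictive variance. The variance
    is what makes GP models attractive for data-efficient model-based RL."""
    def __init__(self, X, Y, noise=1e-3):
        self.X, self.Y = X, Y
        K = rbf(X, X) + noise * np.eye(len(X))   # noise term also regularizes
        self.Kinv = np.linalg.inv(K)
    def predict(self, Xs):
        Ks = rbf(Xs, self.X)
        mean = Ks @ self.Kinv @ self.Y
        # diag of (k_ss - Ks Kinv Ks^T); full k_ss computed for brevity only
        var = rbf(Xs, Xs).diagonal() - np.einsum('ij,jk,ik->i', Ks, self.Kinv, Ks)
        return mean, var

# Usage: fit a toy 1-D "dynamics" function and predict at the training inputs.
X = np.linspace(0.0, 3.0, 20)[:, None]
Y = np.sin(X)
gp = GPDynamics(X, Y)
m, v = gp.predict(X)
```

A hybrid-automaton extension, as described above, would maintain one such model per discrete mode plus a model of mode switches; the sketch deliberately omits that.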

Abstract [sv] (translated to English)

Autonomous robots are expected to have an ever greater presence in people's workplaces and homes. Unlike their industrial counterparts, these autonomous robots must handle a great deal of uncertainty and lack of structure in their surroundings. An essential part of performing manipulation in such scenarios is the occurrence of physical interaction, with direct contact between the robot and its environment. Robots must therefore, not unlike humans, be able to handle both expected and unexpected contacts with the environment, which are often characterized by complex interaction dynamics.

Skill learning stands out as a promising approach for letting robots acquire a rich ability to generate motions. In skill learning, data-driven methods are used to learn a reactive policy, a control function that maps states to control signals. This approach is appealing because a sufficiently expressive policy can generate suitable control signals almost instantaneously, without having to carry out computationally expensive search operations. Although reinforcement learning (RL) is a natural framework for skill learning, its practical applications have been limited for a number of reasons. It can reasonably be argued that the two main reasons are the lack of guaranteed stability and poor data-efficiency. Stability of the control loop is necessary for guaranteeing safety and predictability, and data-efficiency is needed to achieve realistic training times. In this thesis, we seek solutions to these problems in the context of contact-rich manipulation.

This thesis first treats the problem of stability. Although the interaction dynamics are unknown in the presence of contacts, skill learning with stability guarantees is formulated as a model-free RL problem. The thesis presents several solutions for parameterizing stability-aware policies. This is then followed by solutions for searching for policies that remain stable under random exploration, if required. Some parameterizations consist wholly or partly of deep neural networks. In one case, a search method based on evolution strategies is also introduced. We show, through experiments on real robots, that Lyapunov stability is both possible and beneficial in RL-based skill learning.

The thesis further addresses data-efficiency. Although data-efficiency is approached by formulating skill learning as a model-based RL problem, we only treat the model-learning part. In addition to benefiting from the data-efficiency and uncertainty representation of Gaussian processes, the thesis also investigates the advantages of using the structure of hybrid automata to learn forward dynamics models. The method also includes an algorithm for predicting distributions of trajectories over a longer time horizon, in order to represent discontinuities and multiple modes. We show that the proposed method is more data-efficient than a number of existing methods.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2021. p. 63
Series
TRITA-EECS-AVL ; 49
Keywords
Robotic, Skill Learning, Reinforcement Learning, Contact-Rich Manipulation
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-299799
ISBN: 978-91-7873-937-0
Public defence
2021-09-17, https://kth-se.zoom.us/j/68651867110, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20210823

Available from: 2021-08-18. Created: 2021-08-17. Last updated: 2025-02-09. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus

Authority records

Khader, Shahbaz Abdul; Yin, Hang; Kragic, Danica

