SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
Affiliations:
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720, USA
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720, USA; Stanford Univ, Stanford, CA 94305, USA
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720, USA
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720, USA
(additional authors and affiliations not shown)
2019 (English). In: The International Journal of Robotics Research, ISSN 0278-3649, E-ISSN 1741-3176, Vol. 38, no. 2-3, p. 126-145. Article in journal (Refereed). Published.
Abstract [en]

We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of the exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long-time-horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL on two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward with 85% fewer rollouts than Q-learning and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and on cutting deformable sheets with a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
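
The pipeline the abstract describes has three stages: unsupervised segmentation of demonstrations into subtasks, one local reward function (with a transition condition) per subtask, and Q-learning over the state augmented with the current subtask index. The following is a minimal, illustrative sketch of that structure, not the paper's implementation: the one-dimensional chain world, the fixed changepoint, and the distance-to-subgoal reward proxies are all assumptions made to keep the example short and runnable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny 1-D chain world, used only to make the sketch runnable: states 0..N-1,
# actions {0: left, 1: right}. A single "expert demo" walks right from the
# start, through a waypoint, to the goal.
N = 10
WAYPOINT, GOAL = 4, 9
demo = np.array([[float(s)] for s in range(N)])  # one demo, states as features

# Stage 1 (stand-in for SWIRL's unsupervised segmentation): split the demo at
# a known waypoint. SWIRL learns changepoints from several demos; the fixed
# split here is an assumption that keeps the sketch short.
changepoints = [WAYPOINT + 1]
segments = np.split(demo, changepoints)

# Stage 2 (stand-in for the IRL step): one local reward per segment, peaked
# at that segment's final state, plus a transition condition for advancing to
# the next subtask. SWIRL estimates rewards via IRL; negative distance to a
# subgoal is a simple proxy.
subgoals = [seg[-1, 0] for seg in segments]
rewards = [lambda s, g=g: -abs(s - g) for g in subgoals]
advance = [lambda s, g=g: abs(s - g) < 0.5 for g in subgoals]

# Stage 3: tabular Q-learning over the augmented state (state, subtask index),
# i.e. the "sequence of local reward functions" from the abstract.
Q = np.zeros((N, len(subgoals), 2))
alpha, gamma, eps = 0.1, 0.95, 0.1
for _ in range(2000):
    s, k = 0, 0
    for _ in range(50):
        a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s, k]))
        s2 = min(max(s + (1 if a == 1 else -1), 0), N - 1)       # move on chain
        k2 = min(k + 1, len(subgoals) - 1) if advance[k](s2) else k
        target = rewards[k](s2) + gamma * Q[s2, k2].max()
        Q[s, k, a] += alpha * (target - Q[s, k, a])
        s, k = s2, k2
        if s == GOAL and k == len(subgoals) - 1:
            break

greedy = [int(np.argmax(Q[s, -1])) for s in range(N)]
print("greedy actions in final subtask:", greedy)  # mostly 1 ("right")
```

The design point mirrored here is the augmented state (s, k): the value function is allowed to differ across subtasks, which is what lets a sequence of simple local rewards stand in for a single delayed global reward.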

Place, publisher, year, edition, pages
SAGE Publications Ltd, 2019. Vol. 38, no. 2-3, p. 126-145
Keywords [en]
Reinforcement learning, inverse reinforcement learning, learning from demonstrations, medical robots and systems
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-247830
DOI: 10.1177/0278364918784350
ISI: 000460099500003
Scopus ID: 2-s2.0-85052190280
OAI: oai:DiVA.org:kth-247830
DiVA id: diva2:1299180
Note

QC 20190326

Available from: 2019-03-26. Created: 2019-03-26. Last updated: 2019-03-26. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Pokorny, Florian T.
