SWIRL: A sequential windowed inverse reinforcement learning algorithm for robot tasks with delayed rewards
Univ Calif Berkeley, AUTOLAB, Berkeley, CA 94720 USA
Stanford Univ, Stanford, CA 94305 USA
2019 (English). In: The International Journal of Robotics Research, ISSN 0278-3649, E-ISSN 1741-3176, Vol. 38, no. 2-3, pp. 126-145. Article in journal (Peer reviewed). Published.
Abstract [en]

We present sequential windowed inverse reinforcement learning (SWIRL), a policy search algorithm that is a hybrid of exploration and demonstration paradigms for robot learning. We apply unsupervised learning to a small number of initial expert demonstrations to structure future autonomous exploration. SWIRL approximates a long time horizon task as a sequence of local reward functions and subtask transition conditions. Over this approximation, SWIRL applies Q-learning to compute a policy that maximizes rewards. Experiments suggest that SWIRL requires significantly fewer rollouts than pure reinforcement learning and fewer expert demonstrations than behavioral cloning to learn a policy. We evaluate SWIRL in two simulated control tasks, parallel parking and a two-link pendulum. On the parallel parking task, SWIRL achieves the maximum reward on the task with 85% fewer rollouts than Q-learning, and one-eighth of the demonstrations needed by behavioral cloning. We also consider physical experiments on surgical tensioning and cutting deformable sheets using a da Vinci surgical robot. On the deformable tensioning task, SWIRL achieves a 36% relative improvement in reward compared with a baseline of behavioral cloning with segmentation.
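The core structure the abstract describes — a long-horizon task approximated as a sequence of local reward functions with subtask transition conditions, solved by Q-learning over the segmented task — can be illustrated with a minimal toy sketch. This is not the authors' implementation: the 1-D grid task, the subgoals, and the negative-distance reward shape are all invented here for illustration.

```python
import random

# Toy sketch of SWIRL's structure (not the authors' code): the task is a
# sequence of segments, each with its own local reward, and Q-learning runs
# over the augmented state (state, segment index).

N_STATES, N_ACTIONS = 10, 2     # actions: 0 = step left, 1 = step right
SUBGOALS = [5, 9]               # one local target per segment (assumed)

def local_reward(state, segment):
    """Local reward for the current segment: negative distance to its subgoal."""
    return -abs(state - SUBGOALS[segment])

def step(state, action):
    """Deterministic 1-D dynamics, clipped to the grid."""
    return max(0, min(N_STATES - 1, state + (1 if action == 1 else -1)))

def q_learning(episodes=500, alpha=0.5, gamma=0.95, eps=0.2, seed=0):
    rng = random.Random(seed)
    # The segment index is part of the state, so one Q-table covers all segments.
    Q = {(s, g): [0.0, 0.0] for s in range(N_STATES) for g in range(len(SUBGOALS))}
    for _ in range(episodes):
        s, g = 0, 0
        for _ in range(50):
            if rng.random() < eps:
                a = rng.randrange(N_ACTIONS)
            else:
                a = max(range(N_ACTIONS), key=lambda x: Q[(s, g)][x])
            s2 = step(s, a)
            r = local_reward(s2, g)
            # Subtask transition condition: reaching the subgoal advances the segment.
            g2 = g + 1 if (s2 == SUBGOALS[g] and g < len(SUBGOALS) - 1) else g
            Q[(s, g)][a] += alpha * (r + gamma * max(Q[(s2, g2)]) - Q[(s, g)][a])
            s, g = s2, g2
            if g == len(SUBGOALS) - 1 and s == SUBGOALS[-1]:
                break
    return Q

def rollout(Q):
    """Greedy rollout from the start state; returns the visited states."""
    s, g, path = 0, 0, [0]
    for _ in range(30):
        s = step(s, max(range(N_ACTIONS), key=lambda a: Q[(s, g)][a]))
        path.append(s)
        if s == SUBGOALS[g] and g < len(SUBGOALS) - 1:
            g += 1
        elif g == len(SUBGOALS) - 1 and s == SUBGOALS[-1]:
            break
    return path

Q = q_learning()
print(rollout(Q))  # greedy path from 0 through subgoal 5 to the goal state 9
```

The key design point mirrored from the paper's description: rather than one sparse, delayed reward at the end of the task, each segment supplies a dense local reward, and the segment index carried in the state lets a single Q-function handle the whole sequence.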

Place, publisher, year, edition, pages
SAGE Publications Ltd, 2019. Vol. 38, no. 2-3, pp. 126-145
Keywords [en]
Reinforcement learning, inverse reinforcement learning, learning from demonstrations, medical robots and systems
HSV category
Identifiers
URN: urn:nbn:se:kth:diva-247830
DOI: 10.1177/0278364918784350
ISI: 000460099500003
Scopus ID: 2-s2.0-85052190280
OAI: oai:DiVA.org:kth-247830
DiVA id: diva2:1299180
Note

QC 20190326

Available from: 2019-03-26. Created: 2019-03-26. Last updated: 2019-03-26. Bibliographically approved.

Open Access in DiVA

Full text missing in DiVA

Other links

Publisher's full text
Scopus

Person records

Pokorny, Florian T.