kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
SWIRL: A SequentialWindowed Inverse Reinforcement Learning Algorithm for Robot Tasks With Delayed Rewards
Show others and affiliations
2020 (English)In: Springer Proceedings in Advanced Robotics, ISSN 2511-1256, Vol. 13, p. 672-687Article in journal (Refereed) Published
Abstract [en]

Inverse Reinforcement Learning (IRL) allows a robot to generalize from demonstrations to previously unseen scenarios by learning the demonstrator’s reward function. However, in multi-step tasks, the learned rewards might be delayed and hard to directly optimize. We present Sequential Windowed Inverse Reinforcement Learning (SWIRL), a three-phase algorithm that partitions a complex task into shorter-horizon subtasks based on linear dynamics transitions that occur consistently across demonstrations. SWIRL then learns a sequence of local reward functions that describe the motion between transitions. Once these reward functions are learned, SWIRL applies Q-learning to compute a policy that maximizes the rewards. We compare SWIRL (demonstrations to segments to rewards) with Supervised Policy Learning (SPL - demonstrations to policies) and Maximum Entropy IRL (MaxEnt-IRL demonstrations to rewards) on standard Reinforcement Learning benchmarks: Parallel Parking with noisy dynamics, Two-Link acrobot, and a 2D GridWorld. We find that SWIRL converges to a policy with similar success rates (60%) in 3x fewer time-steps than MaxEnt-IRL, and requires 5x fewer demonstrations than SPL. In physical experiments using the da Vinci surgical robot, we evaluate the extent to which SWIRL generalizes from linear cutting demonstrations to cutting sequences of curved paths.

Place, publisher, year, edition, pages
Springer Nature , 2020. Vol. 13, p. 672-687
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-301017DOI: 10.1007/978-3-030-43089-4_43Scopus ID: 2-s2.0-85107034722OAI: oai:DiVA.org:kth-301017DiVA, id: diva2:1593944
Note

QC 20210914

Available from: 2021-09-14 Created: 2021-09-14 Last updated: 2022-06-25Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Pokorny, Florian T.

Search in DiVA

By author/editor
Pokorny, Florian T.
By organisation
Robotics, Perception and Learning, RPL
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 8 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf