A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS.
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS.
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS.
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for Autonomous Systems, CAS. ORCID iD: 0000-0002-0900-1523
2024 (English)
In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 748-754
Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundational models, to obtain a robotic skill learning system. The system can acquire new skills via the behavioral cloning approach of visuomotor diffusion policies, given teleoperated demonstrations. Foundational models are used to perform skill selection given the user's prompt in natural language. Before executing a skill, the foundational model performs a precondition check given an observation of the workspace. We compare the performance of different foundational models to this end and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food serving scenario in the real world. Videos of all experimental executions, as well as the process of teaching new skills in simulation and the real world, are available on the project's website.
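
The control flow described in the abstract (skill selection from a natural-language prompt, a precondition check on a workspace observation, then execution of a learned skill) could be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: `Skill`, `KeywordModel`, `select_skill`, `check_precondition`, and `run` are hypothetical names, the "foundation model" is a toy keyword matcher, the observation is a text description instead of an image, and each policy is a plain function standing in for a trained diffusion policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Skill:
    name: str
    precondition: str             # hypothetical natural-language condition
    policy: Callable[[str], str]  # stand-in for a learned diffusion policy


class KeywordModel:
    """Toy stand-in for a multimodal foundation model (not a real API)."""

    def select_skill(self, prompt: str, skill_names: List[str]) -> Optional[str]:
        # Pick the first skill whose name appears in the user's prompt.
        for name in skill_names:
            if name in prompt.lower():
                return name
        return None

    def check_precondition(self, precondition: str, observation: str) -> bool:
        # A real system would query the model with a workspace image;
        # here the observation is text and the check is a substring test.
        return precondition in observation


def run(prompt: str, observation: str,
        skills: Dict[str, Skill], model: KeywordModel) -> str:
    # 1) Skill selection from the prompt.
    name = model.select_skill(prompt, list(skills))
    if name is None:
        return "no matching skill"
    skill = skills[name]
    # 2) Precondition check before execution.
    if not model.check_precondition(skill.precondition, observation):
        return f"precondition failed for {name}"
    # 3) Execute the (stand-in) policy.
    return skill.policy(observation)
```

In this sketch the precondition check gates execution, so a skill whose condition is not met in the observed workspace is never run.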

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 748-754
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Algebra and Logic
Identifiers
URN: urn:nbn:se:kth:diva-358777
DOI: 10.1109/RO-MAN60168.2024.10731242
ISI: 001348918600099
Scopus ID: 2-s2.0-85209783266
OAI: oai:DiVA.org:kth-358777
DiVA, id: diva2:1930172
Conference
33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA
Note

Part of ISBN 979-8-3503-7503-9; 979-8-3503-7502-2

QC 20250122

Available from: 2025-01-22 Created: 2025-01-22 Last updated: 2025-03-12 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Ingelhag, Nils; Munkeby, Jesper; van Haastregt, Jonne; Varava, Anastasiia; Welle, Michael C.; Kragic, Danica

By organisation
Centre for Autonomous Systems, CAS; Robotics, Perception and Learning, RPL; Collaborative Autonomous Systems