A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models
2024 (English)
In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 748-754
Conference paper, Published paper (Refereed)
Abstract [en]
In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundational models, to obtain a robotic skill learning system. The system can acquire new skills via the behavioral cloning approach of visuomotor diffusion policies, given teleoperated demonstrations. Foundational models are used to perform skill selection given the user's prompt in natural language. Before executing a skill, the foundational model performs a precondition check given an observation of the workspace. We compare the performance of different foundational models to this end and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food serving scenario in the real world. Videos of all experimental executions, as well as the process of teaching new skills in simulation and the real world, are available on the project's website.
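The control flow described in the abstract (select a skill from a natural-language prompt, check its precondition against a workspace observation, then execute the learned policy) can be pictured with a minimal sketch. Everything named here is hypothetical and not taken from the paper: `Skill`, `select_skill`, `precondition_holds`, `run`, and the keyword-based `fake_model` are illustrative stand-ins, the observation is simplified to a text string, and the `policy` callable merely takes the place of a trained visuomotor diffusion policy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Skill:
    name: str
    precondition: str              # natural-language condition on the workspace
    policy: Callable[[str], str]   # placeholder for a trained diffusion policy


def select_skill(prompt: str, skills: Dict[str, Skill],
                 query_model: Callable[[str], str]) -> Optional[Skill]:
    """Ask the (assumed) model to pick one skill name for the user's prompt."""
    options = ", ".join(skills)
    answer = query_model(
        f"User request: {prompt}\n"
        f"Available skills: {options}\n"
        "Answer with one skill name."
    )
    return skills.get(answer.strip())


def precondition_holds(skill: Skill, observation: str,
                       query_model: Callable[[str], str]) -> bool:
    """Ask the (assumed) model whether the skill's precondition is met."""
    answer = query_model(
        f"Observation: {observation}\n"
        f"Condition: {skill.precondition}\n"
        "Answer yes or no."
    )
    return answer.strip().lower().startswith("yes")


def run(prompt: str, observation: str, skills: Dict[str, Skill],
        query_model: Callable[[str], str]) -> str:
    """Select a skill, check its precondition, then execute its policy."""
    skill = select_skill(prompt, skills, query_model)
    if skill is None:
        return "no matching skill"
    if not precondition_holds(skill, observation, query_model):
        return f"precondition failed for {skill.name}"
    return skill.policy(observation)


def fake_model(text: str) -> str:
    """Toy keyword matcher standing in for a foundation model (assumption)."""
    if "Answer with one skill name" in text:
        request = text.split("\n")[0].lower()
        return "pour" if "pour" in request else "unknown"
    return "yes" if "cup on table" in text else "no"


skills = {
    "pour": Skill("pour", "a cup is on the table", lambda obs: "executed pour"),
}
```

Calling `run("Please pour the drink", "cup on table", skills, fake_model)` walks through all three stages with the toy model. In a real system the observation would be camera data and `query_model` a call to a multimodal model, neither of which is specified in this record.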
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 748-754
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Algebra and Logic
Identifiers
URN: urn:nbn:se:kth:diva-358777
DOI: 10.1109/RO-MAN60168.2024.10731242
ISI: 001348918600099
Scopus ID: 2-s2.0-85209783266
OAI: oai:DiVA.org:kth-358777
DiVA, id: diva2:1930172
Conference
33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA
Note
Part of ISBN 979-8-3503-7503-9; 979-8-3503-7502-2
QC 20250122
2025-01-22, 2025-01-22, 2025-03-12. Bibliographically approved