Publications (10 of 36)
Moletta, M., Welle, M. C. & Kragic Jensfelt, D. (2026). Preference Aligned Visuomotor Diffusion Policies for Deformable Object Manipulation. IEEE Robotics and Automation Letters, 11(4), 4985-4992
Preference Aligned Visuomotor Diffusion Policies for Deformable Object Manipulation
2026 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 11, no 4, p. 4985-4992. Article in journal (Refereed), Published
Abstract [en]

Humans naturally develop preferences for how manipulation tasks should be performed, which are often subtle, personal, and difficult to articulate. Although it is important for robots to account for these preferences to increase personalization and user satisfaction, they remain largely underexplored in robotic manipulation, particularly in the context of deformable objects like garments and fabrics. In this work, we study how to adapt pretrained visuomotor diffusion policies to reflect preferred behaviors using limited demonstrations. We introduce RKO, a novel preference-alignment method that combines the benefits of two recent frameworks: RPO and KTO. We evaluate RKO against common preference learning frameworks, including these two, as well as a baseline vanilla diffusion policy, on real-world cloth-folding tasks spanning multiple garments and preference settings. We show that preference-aligned policies (particularly RKO) achieve superior performance and sample efficiency compared to standard diffusion policy fine-tuning. These results highlight the importance and feasibility of structured preference learning for scaling personalized robot behavior in complex deformable object manipulation tasks.
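
The RKO objective itself is not given in the abstract; the following is a minimal, purely illustrative sketch of the general pattern of preference-aligning a diffusion policy, here using a KTO-style binary loss on denoising errors measured against a frozen reference policy. All names (`policy`, `ref_policy`, `add_noise`, `num_steps`) are hypothetical, and this is not the paper's actual RKO formulation.

```python
# Illustrative only: NOT the paper's RKO objective, just a generic
# KTO-style binary preference loss for a diffusion policy.
import torch
import torch.nn.functional as F

def denoise_error(model, obs, action, noise, t):
    """Per-sample diffusion denoising error for an (obs, action) pair."""
    noisy = model.add_noise(action, noise, t)   # hypothetical helper
    pred = model(noisy, obs, t)
    return F.mse_loss(pred, noise, reduction="none").mean(dim=-1)

def binary_preference_loss(policy, ref_policy, obs, action, desirable, beta=1.0):
    """Desirable actions should be denoised better by `policy` than by the
    frozen `ref_policy`; undesirable ones worse."""
    noise = torch.randn_like(action)
    t = torch.randint(0, policy.num_steps, (action.shape[0],), device=action.device)
    margin = denoise_error(ref_policy, obs, action, noise, t) \
           - denoise_error(policy, obs, action, noise, t)
    sign = desirable.float() * 2.0 - 1.0        # +1 desirable, -1 undesirable
    return (1.0 - torch.sigmoid(beta * sign * margin)).mean()
```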

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2026
Keywords
Deep Learning in Grasping and Manipulation, Dual Arm Manipulation, Imitation Learning
National Category
Robotics and automation Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-377868 (URN)
10.1109/LRA.2026.3665075 (DOI)
2-s2.0-105031083921 (Scopus ID)
Note

QC 20260310

Available from: 2026-03-10. Created: 2026-03-10. Last updated: 2026-03-10. Bibliographically approved.
Koczy, P., Welle, M. C. & Kragic Jensfelt, D. (2025). Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion. In: IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings: . Paper presented at 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, Oct 19 2025 - Oct 25 2025 (pp. 121-127). Institute of Electrical and Electronics Engineers (IEEE)
Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion
2025 (English) In: IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 121-127. Conference paper, Published paper (Refereed)
Abstract [en]

We present a framework for learning dexterous in-hand manipulation with multifingered hands using visuo-motor diffusion policies. Our system enables complex in-hand manipulation tasks, such as unscrewing a bottle lid with one hand, by leveraging a fast and responsive teleoperation setup for the four-fingered Allegro Hand. We collect high-quality expert demonstrations using an augmented reality (AR) interface that tracks hand movements and applies inverse kinematics and motion retargeting for precise control. The AR headset provides real-time visualization, while gesture controls streamline teleoperation. To enhance policy learning, we introduce a novel demonstration outlier removal approach based on HDBSCAN clustering and the Global-Local Outlier Score from Hierarchies (GLOSH) algorithm, effectively filtering out low-quality demonstrations that could degrade performance. We evaluate our approach extensively in real-world settings and provide all experimental videos on the project website.
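
The outlier-removal step maps directly onto the open-source `hdbscan` package, whose `outlier_scores_` attribute implements the GLOSH algorithm the abstract names. A minimal sketch, assuming each demonstration has been embedded as a fixed-length feature vector (the embedding and threshold choice are assumptions, not the paper's exact setup):

```python
# Sketch of GLOSH-based demonstration filtering; embedding and threshold
# are illustrative assumptions.
import numpy as np
import hdbscan

def filter_demonstrations(demo_features: np.ndarray, quantile: float = 0.9):
    """demo_features: (n_demos, d) array, one feature vector per demonstration."""
    clusterer = hdbscan.HDBSCAN(min_cluster_size=5).fit(demo_features)
    scores = clusterer.outlier_scores_          # GLOSH outlier score per demo
    threshold = np.quantile(scores, quantile)   # drop the most outlying demos
    return scores < threshold                   # boolean keep-mask

# Usage: keep = filter_demonstrations(features); clean_demos = demos[keep]
```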

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Robotics and automation Computer graphics and computer vision Computer Sciences
Identifiers
urn:nbn:se:kth:diva-377810 (URN)
10.1109/IROS60139.2025.11247641 (DOI)
2-s2.0-105029942640 (Scopus ID)
Conference
2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, Oct 19 2025 - Oct 25 2025
Note

Part of ISBN 9798331543938

QC 20260310

Available from: 2026-03-10. Created: 2026-03-10. Last updated: 2026-03-10. Bibliographically approved.
Zhang, Y., Orthmann, B., Welle, M. C., van Haastregt, J. & Kragic Jensfelt, D. (2025). LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation. In: Social Computing and Social Media - 17th International Conference, SCSM 2025, Held as Part of the 27th HCI International Conference, HCII 2025, Proceedings: . Paper presented at 17th International Conference on Social Computing and Social Media, SCSM 2025, held as part of the 27th HCI International Conference, HCII 2025, Gothenburg, Sweden, Jun 22 2025 - Jun 27 2025 (pp. 97-112). Springer Nature
LLM-Driven Augmented Reality Puppeteer: Controller-Free Voice-Commanded Robot Teleoperation
2025 (English) In: Social Computing and Social Media - 17th International Conference, SCSM 2025, Held as Part of the 27th HCI International Conference, HCII 2025, Proceedings, Springer Nature, 2025, p. 97-112. Conference paper, Published paper (Refereed)
Abstract [en]

The integration of robotics and augmented reality (AR) presents transformative opportunities for advancing human-robot interaction (HRI) by improving usability, intuitiveness, and accessibility. This work introduces a controller-free, LLM-driven voice-commanded AR puppeteering system, enabling users to teleoperate a robot by manipulating its virtual counterpart in real-time. By leveraging natural language processing (NLP) and AR technologies, our system—prototyped using Meta Quest 3—eliminates the need for physical controllers, enhancing ease of use while minimizing potential safety risks associated with direct robot operation. A preliminary user demonstration successfully validated the system’s functionality, demonstrating its potential for safer, more intuitive, and immersive robotic control.
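
A high-level sketch of the control path this describes: speech is transcribed, an LLM turns it into a puppeteering command, the virtual robot in AR is moved first, and the real robot mirrors it. All callables are hypothetical placeholders, not the system's actual API:

```python
# Hypothetical sketch of the voice -> LLM -> AR twin -> real robot path.
def puppeteer_step(transcribe, llm, virtual_robot, real_robot, audio):
    text = transcribe(audio)                    # speech to text
    command = llm(f"Convert to a robot command "
                  f"(e.g. 'move gripper up 5 cm'): {text}")
    virtual_robot.apply(command)                # user sees the result in AR first
    real_robot.mirror(virtual_robot.pose())     # teleoperation via the virtual twin
```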

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
AR puppeteer, Controller-free, LLM-driven
National Category
Robotics and automation Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-364402 (URN)
10.1007/978-3-031-93539-8_7 (DOI)
001551225200007 ()
2-s2.0-105007131127 (Scopus ID)
Conference
17th International Conference on Social Computing and Social Media, SCSM 2025, held as part of the 27th HCI International Conference, HCII 2025, Gothenburg, Sweden, Jun 22 2025 - Jun 27 2025
Note

 Part of ISBN 9783031935381

QC 20250613

Available from: 2025-06-12. Created: 2025-06-12. Last updated: 2025-12-08. Bibliographically approved.
Yang, Q., Welle, M. C., Kragic Jensfelt, D. & Andersson, O. (2025). S2-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation. IEEE Robotics and Automation Letters, 10(12), 12995-13002
S2-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation
2025 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 10, no 12, p. 12995-13002. Article in journal (Refereed), Published
Abstract [en]

Recent advances in skill learning have propelled robot manipulation to new heights by enabling robots to learn complex manipulation tasks from a practical number of demonstrations. However, these skills are often limited to the particular action, object, and environment instances shown in the training data, and have trouble transferring to other instances of the same category. In this work we present an open-vocabulary Spatial-Semantic Diffusion policy (S2-Diffusion) that enables generalization from instance-level training data to the category level, making skills transferable between instances of the same category. We show that functional aspects of skills can be captured via a promptable semantic module combined with a spatial representation. We further propose leveraging depth estimation networks to allow the use of only a single RGB camera. Our approach is evaluated and compared on a diverse set of robot manipulation tasks, both in simulation and in the real world. Our results show that S2-Diffusion is invariant to changes in category-irrelevant factors and achieves satisfying performance on other instances within the same category, even when not trained on those specific instances. Project website: https://s2-diffusion.github.io.
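
As a rough illustration of the observation the abstract describes, a promptable segmenter could supply the semantic channel and a monocular depth network the spatial channel, both from a single RGB frame. `segment` and `estimate_depth` are hypothetical stand-ins, not the paper's modules:

```python
# Hypothetical sketch of a spatial-semantic observation from one RGB frame.
import numpy as np

def spatial_semantic_obs(rgb: np.ndarray, prompt: str,
                         segment, estimate_depth) -> np.ndarray:
    semantic = segment(rgb, prompt)     # (H, W) mask/heatmap for the prompt
    depth = estimate_depth(rgb)         # (H, W) depth from a single camera
    # Stacking abstracts away instance appearance: the policy sees "where
    # the prompted thing is" plus scene geometry, not pixel-level identity.
    return np.stack([semantic, depth], axis=0).astype(np.float32)
```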

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Deep Learning in Grasping and Manipulation, Imitation Learning, Learning from Demonstration
National Category
Robotics and automation Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-372570 (URN)
10.1109/LRA.2025.3625497 (DOI)
001611073100009 ()
2-s2.0-105019658163 (Scopus ID)
Note

QC 20260120

Available from: 2025-11-10. Created: 2025-11-10. Last updated: 2026-01-20. Bibliographically approved.
Hadjiloizou, L., Welle, M. C., Yin, H. & Kragic Jensfelt, D. (2025). Towards Safe Reinforcement Learning with Reduced Conservativeness: A Case Study on Drone Flight Control. In: IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings: . Paper presented at 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, Oct 19 2025 - Oct 25 2025 (pp. 14870-14876). Institute of Electrical and Electronics Engineers (IEEE)
Towards Safe Reinforcement Learning with Reduced Conservativeness: A Case Study on Drone Flight Control
2025 (English) In: IROS 2025 - 2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, Conference Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 14870-14876. Conference paper, Published paper (Refereed)
Abstract [en]

Incorporating formal methods into reinforcement learning (RL) has the potential to deliver the best of both worlds, combining the robustness of formal guarantees with the adaptability and learning capabilities of RL, though careful design is needed to balance safety and exploration. In this work, we propose a framework that mitigates this loss of exploration while still ensuring the safety of the system. Specifically, we introduce a less restrictive method that reduces the conservativeness of formal methods by refining a disturbance model from data collected online, and that evaluates the safety of a learning-based controller using computationally efficient zonotopic reachability analysis to facilitate a real-time implementation. We validate the framework in a real-world drone flight through a canyon, where the drone is subjected to unknown external disturbances and the framework is tasked with learning those disturbances online and adjusting the safety guarantees accordingly. The results show that the framework enables less restrictive online training of learning-based controllers without compromising the safety of the system.
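
For readers unfamiliar with the machinery, a generic zonotopic reachability step with a data-refined disturbance zonotope looks roughly as follows; this is a sketch of the standard textbook construction, not the paper's implementation:

```python
# Textbook sketch: zonotope stored as (center c, generator matrix G), with
# dynamics x+ = A x + B u + w. Not the paper's exact method.
import numpy as np

def reach_step(c, G, A, B, u, w_center, w_gens):
    """Propagate zonotope (c, G) one step and add the disturbance zonotope."""
    c_next = A @ c + B @ u + w_center
    G_next = np.hstack([A @ G, w_gens])  # Minkowski sum concatenates generators
    return c_next, G_next

def refine_disturbance(residuals: np.ndarray):
    """Fit an axis-aligned disturbance zonotope to observed model residuals
    x_measured - (A x + B u); tighter data means a less conservative check."""
    center = residuals.mean(axis=0)
    half_width = np.abs(residuals - center).max(axis=0)
    return center, np.diag(half_width)

def interval_hull(c, G):
    """Outer box of the zonotope, usable for a fast safe-set containment test."""
    radius = np.abs(G).sum(axis=1)
    return c - radius, c + radius
```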

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Control Engineering Robotics and automation Vehicle and Aerospace Engineering
Identifiers
urn:nbn:se:kth:diva-377815 (URN)
10.1109/IROS60139.2025.11247738 (DOI)
2-s2.0-105029957051 (Scopus ID)
Conference
2025 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2025, Hangzhou, China, Oct 19 2025 - Oct 25 2025
Note

Part of ISBN 9798331543938

QC 20260310

Available from: 2026-03-10. Created: 2026-03-10. Last updated: 2026-03-10. Bibliographically approved.
Longhini, A., Wang, Y., Garcia-Camacho, I., Blanco-Mulero, D., Moletta, M., Welle, M. C., . . . Kragic Jensfelt, D. (2025). Unfolding the Literature: A Review of Robotic Cloth Manipulation. Annual Review of Control, Robotics, and Autonomous Systems, 8(1), 295-322
Unfolding the Literature: A Review of Robotic Cloth Manipulation
2025 (English) In: Annual Review of Control, Robotics, and Autonomous Systems, E-ISSN 2573-5144, Vol. 8, no 1, p. 295-322. Article, review/survey (Refereed), Published
Abstract [en]

The realm of textiles spans clothing, households, healthcare, sports, and industrial applications. The deformable nature of these objects poses unique challenges that prior work on rigid objects cannot fully address. The increasing interest within the community in textile perception and manipulation has led to new methods that aim to address challenges in modeling, perception, and control, resulting in significant progress. However, this progress is often tailored to one specific textile or a subcategory of these textiles. To understand what restricts these methods and hinders current approaches from generalizing to a broader range of real-world textiles, this review provides an overview of the field, focusing specifically on how and to what extent textile variations are addressed in modeling, perception, benchmarking, and manipulation of textiles. We conclude by identifying key open problems and outlining grand challenges that will drive future advancements in the field.

Keywords
deformable object manipulation, generalization, physical property variations, task variations, textiles
National Category
Robotics and automation Textile, Rubber and Polymeric Materials
Identifiers
urn:nbn:se:kth:diva-363740 (URN)
10.1146/annurev-control-022723-033252 (DOI)
001488650100012 ()
2-s2.0-105004918049 (Scopus ID)
Note

QC 20250528

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-07-01. Bibliographically approved.
Ingelhag, N., Munkeby, J., van Haastregt, J., Varava, A., Welle, M. C. & Kragic, D. (2024). A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models. In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024: . Paper presented at 33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA (pp. 748-754). Institute of Electrical and Electronics Engineers (IEEE)
A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models
2024 (English) In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 748-754. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundation models, to obtain a robotic skill learning system. The system can acquire new skills via the behavioral cloning approach of visuomotor diffusion policies given teleoperated demonstrations. Foundation models are used to perform skill selection given the user's prompt in natural language. Before executing a skill, the foundation model performs a precondition check given an observation of the workspace. We compare the performance of different foundation models to this end and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system in a challenging food-serving scenario in the real world. Videos of all experimental executions, as well as of the process of teaching new skills in simulation and the real world, are available on the project's website.
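
The two foundation-model roles, skill selection from a natural-language prompt and a visual precondition check, can be sketched generically as below. `llm` and `vlm` are hypothetical callables standing in for whatever pre-trained models are used; the prompts are simplified:

```python
# Hypothetical sketch of LLM skill selection plus a VLM precondition check.
def select_skill(llm, skills: dict, user_request: str) -> str:
    names = ", ".join(skills)
    answer = llm(f"Available skills: {names}. "
                 f"Which single skill best fulfils: '{user_request}'? "
                 f"Reply with the skill name only.")
    return answer.strip()

def precondition_ok(vlm, skill_precondition: str, workspace_image) -> bool:
    answer = vlm(image=workspace_image,
                 prompt=f"Is the following true in the image? "
                        f"{skill_precondition} Answer yes or no.")
    return answer.strip().lower().startswith("yes")
```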

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Algebra and Logic
Identifiers
urn:nbn:se:kth:diva-358777 (URN)
10.1109/RO-MAN60168.2024.10731242 (DOI)
001348918600099 ()
2-s2.0-85209783266 (Scopus ID)
Conference
33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA
Note

Part of ISBN 979-8-3503-7503-9; 979-8-3503-7502-2

QC 20250122

Available from: 2025-01-22. Created: 2025-01-22. Last updated: 2025-03-12. Bibliographically approved.
Longhini, A., Welle, M. C., Erickson, Z. & Kragic, D. (2024). AdaFold: Adapting Folding Trajectories of Cloths via Feedback-Loop Manipulation. IEEE Robotics and Automation Letters, 9(11), 9183-9190
AdaFold: Adapting Folding Trajectories of Cloths via Feedback-Loop Manipulation
2024 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no 11, p. 9183-9190. Article in journal (Refereed), Published
Abstract [en]

We present AdaFold, a model-based feedback-loop framework for optimizing folding trajectories. AdaFold extracts a particle-based representation of cloth from RGB-D images and feeds the representation back to a model predictive controller to re-plan the folding trajectory at every time step. A key component of AdaFold that enables feedback-loop manipulation is the use of semantic descriptors extracted from geometric features. These descriptors enhance the particle representation of the cloth to distinguish between ambiguous point clouds of differently folded cloths. Our experiments demonstrate AdaFold's ability to adapt folding trajectories to cloths with varying physical properties and to generalize from simulated training to real-world execution.
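
Structurally, the feedback loop amounts to perceive, re-plan, execute the first action, repeat. A minimal sketch with the paper's components abstracted into hypothetical callables:

```python
# Hypothetical sketch of the MPC-style feedback loop; `perceive` and
# `replan` stand in for AdaFold's perception and planning components.
def adafold_loop(camera, robot, perceive, replan, goal, steps=50):
    """perceive: RGB-D frame -> descriptor-augmented particle set.
       replan:   (particles, goal) -> planned action sequence."""
    for _ in range(steps):
        particles = perceive(camera.read())
        actions = replan(particles, goal)
        robot.execute(actions[0])   # apply the first action, then re-plan
```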

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Trajectory optimization, Shape, Manipulation planning, perception for grasping and manipulation, RGB-D perception, semantic scene understanding
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-354332 (URN)
10.1109/LRA.2024.3436329 (DOI)
001316209900014 ()
2-s2.0-85199779805 (Scopus ID)
Note

QC 20241004

Available from: 2024-10-04. Created: 2024-10-04. Last updated: 2025-02-09. Bibliographically approved.
Lippi, M., Welle, M. C., Gasparri, A. & Kragic, D. (2024). Ensemble latent space roadmap for improved robustness in visual action planning. In: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024: . Paper presented at 2024 IEEE International Conference on Robotics and Automation, ICRA 2024, May 13-17, 2024, Yokohama, Japan (pp. 2638-2644). Institute of Electrical and Electronics Engineers (IEEE)
Ensemble latent space roadmap for improved robustness in visual action planning
2024 (English) In: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 2638-2644. Conference paper, Published paper (Refereed)
Abstract [en]

Planning in learned latent spaces helps to decrease the dimensionality of raw observations. In this work, we propose to leverage the ensemble paradigm to enhance the robustness of latent planning systems. We rely on our Latent Space Roadmap (LSR) framework, which builds a graph in a learned structured latent space to perform planning. Given multiple LSR framework instances that differ either in their latent spaces or in the parameters used to construct the graph, we use the action information as well as the embedded nodes of the produced plans to define similarity measures. These are then used to select the most promising plans. We validate the performance of our Ensemble LSR (ENS-LSR) on simulated box stacking and grape harvesting tasks as well as on a real-world robotic T-shirt folding experiment.
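
A sketch of the selection step: each LSR instance proposes a plan, plans are scored by agreement with the rest of the ensemble, and the highest-scoring plan is executed. The exact per-step action match used here is a simple stand-in for the paper's similarity measures:

```python
# Illustrative consensus-based plan selection; the similarity measure is
# an assumption, not the paper's definition.
from itertools import zip_longest

def plan_similarity(plan_a, plan_b) -> float:
    """Fraction of positions where the two action sequences agree."""
    matches = sum(a == b for a, b in zip_longest(plan_a, plan_b))
    return matches / max(len(plan_a), len(plan_b))

def select_plan(plans):
    """plans: list of action sequences, one per LSR instance."""
    def consensus(p):
        others = [q for q in plans if q is not p]
        return sum(plan_similarity(p, q) for q in others) / max(len(others), 1)
    return max(plans, key=consensus)
```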

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-353547 (URN)
10.1109/ICRA57147.2024.10611385 (DOI)
001294576202029 ()
2-s2.0-85202451901 (Scopus ID)
Conference
2024 IEEE International Conference on Robotics and Automation, ICRA 2024, May 13-17, 2024, Yokohama, Japan
Note

Part of ISBN 9798350384574

QC 20240926

Available from: 2024-09-19. Created: 2024-09-19. Last updated: 2025-12-05. Bibliographically approved.
Lippi, M., Welle, M. C., Wozniak, M. K., Gasparri, A. & Kragic, D. (2024). Low-Cost Teleoperation with Haptic Feedback through Vision-based Tactile Sensors for Rigid and Soft Object Manipulation. In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024: . Paper presented at 33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA (pp. 1963-1969). Institute of Electrical and Electronics Engineers (IEEE)
Low-Cost Teleoperation with Haptic Feedback through Vision-based Tactile Sensors for Rigid and Soft Object Manipulation
2024 (English) In: 2024 33RD IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, ROMAN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 1963-1969. Conference paper, Published paper (Refereed)
Abstract [en]

Haptic feedback is essential for humans to successfully perform complex and delicate manipulation tasks. A recent rise in tactile sensors has enabled robots to leverage the sense of touch and expand their capability drastically. However, many tasks still need human intervention/guidance. For this reason, we present a teleoperation framework designed to provide haptic feedback to human operators based on the data from camera-based tactile sensors mounted on the robot gripper. Partial autonomy is introduced to prevent slippage of grasped objects during task execution. Notably, we rely exclusively on low-cost off-the-shelf hardware to realize an affordable solution. We demonstrate the versatility of the framework on nine different objects ranging from rigid to soft and fragile ones, using three different operators on real hardware.
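
A sketch of the core signal path, under stated assumptions: the tactile image's deviation from a no-contact reference drives the operator's haptic cue, and a sudden drop in contact intensity is treated as slippage that triggers an autonomous grip adjustment. The intensity proxy and thresholds are illustrative assumptions, not the paper's calibration:

```python
# Hypothetical sketch: camera-based tactile image -> haptic cue + slip flag.
import numpy as np

def contact_intensity(tactile_img: np.ndarray, rest_img: np.ndarray) -> float:
    """Mean absolute deviation from the no-contact reference image."""
    return float(np.abs(tactile_img.astype(float) - rest_img.astype(float)).mean())

def haptic_and_autonomy(intensity: float, prev_intensity: float,
                        max_intensity: float = 50.0, slip_drop: float = 0.3):
    vibration = min(intensity / max_intensity, 1.0)   # 0..1 haptic command
    slipping = prev_intensity > 0 and \
               (prev_intensity - intensity) / prev_intensity > slip_drop
    return vibration, slipping  # if slipping, the gripper tightens autonomously
```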

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-358791 (URN)
10.1109/RO-MAN60168.2024.10731383 (DOI)
001348918600260 ()
2-s2.0-85209811452 (Scopus ID)
Conference
33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA
Note

Part of ISBN 979-8-3503-7503-9; 979-8-3503-7502-2

QC 20250121

Available from: 2025-01-21. Created: 2025-01-21. Last updated: 2025-01-21. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-3827-3824
