kth.se Publications
Publications (10 of 26)
Weng, Z., Zhou, P., Yin, H., Kravchenko, A., Varava, A., Navarro-Alarcon, D. & Kragic, D. (2024). Interactive Perception for Deformable Object Manipulation. IEEE Robotics and Automation Letters, 9(9), 7763-7770
Interactive Perception for Deformable Object Manipulation
2024 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no 9, p. 7763-7770. Article in journal (Refereed), Published
Abstract [en]

Interactive perception enables robots to manipulate the environment and objects so as to bring them into states that benefit the perception process. Deformable objects pose challenges here due to the difficulty of manipulation and to occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers motion regularity and structure in coupling the camera and the manipulator. We contribute a method for constructing and computing a subspace, called the Dynamic Active Vision Space (DAVS), that effectively exploits this regularity in motion exploration. The effectiveness of the framework and approach is validated in both a simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinated motion in interactive perception for deformable objects.
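
The coupling described above can be pictured as a toy sequential loop in which an active camera repeatedly repositions itself to reduce occlusion between exploratory manipulation steps. The sketch below is an illustration under stated assumptions, not the paper's DAVS construction: `env` and its two methods are hypothetical stand-ins for a dual-arm setup, and the visibility test is a deliberately crude point-to-ray distance check.

```python
# A minimal sketch of a sequential interactive-perception loop; the DAVS
# subspace itself is not reconstructed here. `env`, `move_camera`, and
# `manipulate` are hypothetical stand-ins for a dual-arm setup.
import numpy as np

def candidate_views(center, radius, n=16):
    """Sample camera positions on a ring above the workspace center."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return [center + radius * np.array([np.cos(a), np.sin(a), 0.8]) for a in angles]

def visibility_score(view, object_points, occluder_points, tol=0.05):
    """Toy score: fraction of object points whose sight line stays clear of
    occluder points (ignores whether the occluder lies in front of the point)."""
    visible = 0
    for p in object_points:
        ray = p - view
        blocked = any(
            np.linalg.norm(np.cross(ray, o - view)) / np.linalg.norm(ray) < tol
            for o in occluder_points)
        visible += not blocked
    return visible / max(len(object_points), 1)

def perception_step(env, object_points, occluder_points, center):
    """Move the camera to the least-occluded view, then act on the object."""
    views = candidate_views(center, radius=0.6)
    best = max(views, key=lambda v: visibility_score(v, object_points, occluder_points))
    env.move_camera(best)   # hypothetical camera-arm command
    env.manipulate()        # hypothetical exploratory manipulator action
```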

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Cameras, Manifolds, IP networks, End effectors, Task analysis, Couplings, Robot kinematics, Perception for grasping and manipulation, perception-action coupling, manipulation planning
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-352106 (URN), 10.1109/LRA.2024.3431943 (DOI), 001283670800004 (ISI), 2-s2.0-85199505576 (Scopus ID)
Note

QC 20240822

Available from: 2024-08-22 Created: 2024-08-22 Last updated: 2025-05-14. Bibliographically approved
Yin, W., Tu, R., Yin, H., Kragic, D., Kjellström, H. & Björkman, M. (2023). Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN. Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 1102-1108). Institute of Electrical and Electronics Engineers (IEEE)
Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models
2023 (English) In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 1102-1108. Conference paper, Published paper (Refereed)
Abstract [en]

Data-driven, controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in generating diverse motions given past observations and in dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences conditioned on control contexts of other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which prove effective in capturing temporal correlations in the motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion relative to two baselines, and show the benefits of diffusion data dropout for robust synthesis and for reconstruction of high-fidelity motion close to recorded data.
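
The "data dropout based on the diffusion forward process" can be related to the standard DDPM forward (noising) process, sketched below under stated assumptions: the linear beta schedule, the tensor shapes, and the idea of corrupting conditioning poses to a random noise level are illustrative choices, not the paper's exact recipe.

```python
# A minimal sketch of the standard DDPM forward (noising) process; schedule
# and shapes are illustrative assumptions, not MoDiff's exact data dropout.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)
alpha_bar = torch.cumprod(1.0 - betas, dim=0)

def forward_noise(x0, t):
    """Sample q(x_t | x_0) = sqrt(abar_t) * x_0 + sqrt(1 - abar_t) * eps."""
    eps = torch.randn_like(x0)
    shape = (-1,) + (1,) * (x0.dim() - 1)
    a = alpha_bar[t].sqrt().view(*shape)
    s = (1.0 - alpha_bar[t]).sqrt().view(*shape)
    return a * x0 + s * eps

poses = torch.randn(8, 60, 69)                # (batch, frames, pose dims), dummy data
t = torch.randint(0, T, (poses.shape[0],))    # a random corruption level per sample
noisy_conditioning = forward_noise(poses, t)  # "dropped-out" control context
```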

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-341978 (URN), 10.1109/RO-MAN57019.2023.10309317 (DOI), 001108678600131 (ISI), 2-s2.0-85186990309 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings ISBN 979-8-3503-3670-2

QC 20240110

Available from: 2024-01-10 Created: 2024-01-10 Last updated: 2025-02-07. Bibliographically approved
Yin, W., Yin, H., Baraka, K., Kragic, D. & Björkman, M. (2023). Dance Style Transfer with Cross-modal Transformer. In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV). Paper presented at 23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI (pp. 5047-5056). Institute of Electrical and Electronics Engineers (IEEE)
Dance Style Transfer with Cross-modal Transformer
2023 (English) In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 5047-5056. Conference paper, Published paper (Refereed)
Abstract [en]

We present CycleDance, a dance style transfer system that transforms an existing motion clip from one dance style into another while attempting to preserve the motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal transformer encoders to account for the music context. We adopt sequence-length-based curriculum learning to stabilize training. Our approach captures rich, long-term intra-relations between motion frames, a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study with 30 participants who each have 5 or more years of dance experience. The results demonstrate that CycleDance generates realistic movements in the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
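
For readers unfamiliar with CycleGAN-style training, the sketch below shows the cycle-consistency term such a system builds on, adapted to music-conditioned motion. `G_ab` and `G_ba` are hypothetical generator stubs standing in for the paper's transformer-based networks, not the actual CycleDance models.

```python
# A minimal sketch of the CycleGAN-style cycle-consistency objective the
# system extends; generators and conditioning interface are assumptions.
import torch
import torch.nn.functional as F

def cycle_loss(G_ab, G_ba, motion_a, motion_b, music):
    """L1 reconstruction error after a full style round trip, with both
    generators conditioned on the music context."""
    rec_a = G_ba(G_ab(motion_a, music), music)   # style A -> B -> A
    rec_b = G_ab(G_ba(motion_b, music), music)   # style B -> A -> B
    return F.l1_loss(rec_a, motion_a) + F.l1_loss(rec_b, motion_b)
```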

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Winter Conference on Applications of Computer Vision, ISSN 2472-6737
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-333220 (URN), 10.1109/WACV56688.2023.00503 (DOI), 000971500205016 (ISI), 2-s2.0-85149044034 (Scopus ID)
Conference
23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI
Note

QC 20230731

Available from: 2023-07-31 Created: 2023-07-31 Last updated: 2025-02-07. Bibliographically approved
Lippi, M., Poklukar, P., Welle, M. C., Varava, A., Yin, H., Marino, A. & Kragic, D. (2023). Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap. IEEE Transactions on Robotics, 39(1), 57-75
Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap
2023 (English) In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 39, no 1, p. 57-75. Article in journal (Refereed), Published
Abstract [en]

In this article, we present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on the manipulation of deformable objects. We propose a latent space roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts. First, a mapping module (MM) maps observations, given in the form of images, into a structured latent space, extracting the respective states, and also generates observations from latent states. Second, the LSR builds and connects clusters containing similar states in order to find latent plans between the start and goal states extracted by the MM. Third, an action proposal module complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on simulated box-stacking and rope/box manipulation tasks, and on a folding task executed on a real robot.
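
A minimal sketch of the roadmap idea follows: encode observations to latents (with some encoder, not shown), cluster them, connect clusters observed in consecutive states, and search the graph for a latent plan. The modeling choices here (k-means, an unweighted graph) are simplifying assumptions, not the paper's exact LSR construction.

```python
# A minimal sketch of planning over a latent space roadmap; clustering and
# graph-building choices are illustrative assumptions.
import networkx as nx
import numpy as np
from sklearn.cluster import KMeans

def build_lsr(latents, transitions, n_clusters=20):
    """Cluster latent states and connect clusters seen in consecutive states."""
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(latents)
    g = nx.Graph()
    g.add_nodes_from(range(n_clusters))
    for i, j in transitions:                       # indices of consecutive states
        a, b = km.labels_[i], km.labels_[j]
        if a != b:
            g.add_edge(a, b)
    return km, g

def latent_plan(km, g, z_start, z_goal):
    """Shortest cluster path between start and goal, as latent waypoints."""
    s = km.predict(z_start.reshape(1, -1))[0]
    t = km.predict(z_goal.reshape(1, -1))[0]
    path = nx.shortest_path(g, s, t)
    return [km.cluster_centers_[c] for c in path]
```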

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Deep Learning in Robotics and Automation, Latent Space Planning, Manipulation Planning, Visual Learning, Deep learning, Graphic methods, Job analysis, Planning, Robot programming, Action planning, Deep learning in robotic and automation, Heuristics algorithm, Roadmap, Space planning, Stackings, Task analysis, Heuristic algorithms
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-326180 (URN), 10.1109/TRO.2022.3188163 (DOI), 000829072000001 (ISI), 2-s2.0-85135223386 (Scopus ID)
Note

QC 20230502

Available from: 2023-05-02 Created: 2023-05-02 Last updated: 2025-02-09. Bibliographically approved
Reichlin, A., Marchetti, G. L., Yin, H., Ghadirzadeh, A. & Kragic, D. (2022). Back to the Manifold: Recovering from Out-of-Distribution States. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS). Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 8660-8666). Institute of Electrical and Electronics Engineers (IEEE)
Back to the Manifold: Recovering from Out-of-Distribution States
2022 (English) In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 8660-8666. Conference paper, Published paper (Refereed)
Abstract [en]

Learning from previously collected datasets of expert data offers the promise of acquiring robotic policies without unsafe and costly online exploration. However, a major challenge is the distributional shift between the states in the training dataset and those visited by the learned policy at test time. While prior work has mainly studied the distribution shift caused by the policy during offline training, the problem of recovering from out-of-distribution states at deployment time is not yet well studied. We alleviate the distributional shift at deployment time by introducing a recovery policy that brings the agent back to the training manifold whenever it steps out of the in-distribution states, e.g., due to an external perturbation. The recovery policy relies on an approximation of the training data density and on a learned equivariant mapping that maps visual observations into a latent space in which translations correspond to robot actions. We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform. Our results show that the recovery policy enables the agent to complete tasks in situations where behavioral cloning alone fails because of distributional shift.
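
One way to picture the recovery mechanism is sketched below, under stated assumptions: the training-data density is approximated with a kernel density model over latent codes, and, because translations in the latent space are taken to correspond to robot actions, a gradient step toward higher density is read out directly as the recovery action. This is an illustrative reconstruction, not the paper's implementation.

```python
# A minimal sketch of density-based recovery in an action-equivariant latent
# space; the kernel density model and step sizes are assumptions.
import numpy as np
from sklearn.neighbors import KernelDensity

def fit_density(train_latents, bandwidth=0.2):
    """Kernel density model of the training manifold in latent space."""
    return KernelDensity(bandwidth=bandwidth).fit(train_latents)

def recovery_action(kde, z, step=0.05, eps=1e-3):
    """Finite-difference ascent on log-density; the latent displacement is
    read out directly as a robot action (the equivariance assumption)."""
    grad = np.zeros_like(z)
    for k in range(z.shape[0]):
        dz = np.zeros_like(z)
        dz[k] = eps
        hi = kde.score_samples((z + dz).reshape(1, -1))[0]
        lo = kde.score_samples((z - dz).reshape(1, -1))[0]
        grad[k] = (hi - lo) / (2 * eps)
    return step * grad / (np.linalg.norm(grad) + 1e-8)
```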

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-324860 (URN), 10.1109/IROS47612.2022.9981315 (DOI), 000909405301050 (ISI), 2-s2.0-85146319849 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230322

Available from: 2023-03-22 Created: 2023-03-22 Last updated: 2024-03-04. Bibliographically approved
Yin, H., Verginis, C. K. & Kragic, D. (2022). Consensus-based Normalizing-Flow Control: A Case Study in Learning Dual-Arm Coordination. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS). Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 10417-10424). Institute of Electrical and Electronics Engineers (IEEE)
Consensus-based Normalizing-Flow Control: A Case Study in Learning Dual-Arm Coordination
2022 (English) In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 10417-10424. Conference paper, Published paper (Refereed)
Abstract [en]

We develop two consensus-based learning algorithms for multi-robot systems applied to complex tasks involving collision constraints and force interactions, such as cooperative peg-in-hole placement. The proposed algorithms integrate multi-robot distributed consensus and normalizing-flow-based reinforcement learning. The algorithms guarantee the stability and consensus of the multi-robot system's generalized variables in a transformed space. This transformed space is obtained via a diffeomorphic transformation parameterized by normalizing-flow models, which the algorithms use while training on the underlying task, hence learning the skillful, dexterous trajectories the task requires. We validate the proposed algorithms by parameterizing reinforcement learning policies, demonstrating efficient cooperative learning and strong generalization of dual-arm assembly skills in a dynamics-engine simulator.
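
The interplay of consensus and the learned diffeomorphism can be sketched as follows: map each robot's generalized variables through a normalizing flow, run a standard linear consensus law in the transformed space, and map back. `flow` is a placeholder with assumed forward/inverse methods, not the paper's trained model.

```python
# A minimal sketch of a consensus update run in a flow-transformed space;
# the `flow` interface and gains are assumptions.
import numpy as np

def consensus_step(states, flow, adjacency, gain=0.5, dt=0.01):
    """One Euler step of z_i' = -gain * sum_j a_ij (z_i - z_j) in the
    transformed space, pulled back to the original coordinates."""
    z = np.stack([flow.forward(x) for x in states])   # map to transformed space
    dz = np.zeros_like(z)
    n = len(states)
    for i in range(n):
        for j in range(n):
            dz[i] += -gain * adjacency[i, j] * (z[i] - z[j])
    z_next = z + dt * dz
    return [flow.inverse(zi) for zi in z_next]        # back to original space
```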

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324871 (URN), 10.1109/IROS47612.2022.9981827 (DOI), 000909405302093 (ISI), 2-s2.0-85146353921 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230320

Available from: 2023-03-20 Created: 2023-03-20 Last updated: 2025-02-09. Bibliographically approved
Yin, H., Welle, M. C. & Kragic, D. (2022). Embedding Koopman Optimal Control in Robot Policy Learning. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS). Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 13392-13399). Institute of Electrical and Electronics Engineers (IEEE)
Embedding Koopman Optimal Control in Robot Policy Learning
2022 (English) In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 13392-13399. Conference paper, Published paper (Refereed)
Abstract [en]

Embedding an optimization process has been explored as a way to impose efficient and flexible policy structures. Existing work often builds on nonlinear optimization with explicit iteration steps, making policy inference prohibitively expensive for online learning and real-time control. Our approach embeds a linear-quadratic-regulator (LQR) formulation within a Koopman representation, thus combining the tractability of a closed-form solution with the richness of a non-convex neural network. We use a few auxiliary objectives and reparameterization to enforce optimality conditions of the policy that can be easily integrated into standard gradient-based learning. Our approach is shown to be effective for learning policies that render an optimality structure and for efficient reinforcement learning, including simulated pendulum control, 2D and 3D walking, and manipulation of both rigid and deformable objects. We also demonstrate a real-world application in a robot pivoting task.
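
The closed-form tractability claimed above comes from lifting the state with a feature map and solving an LQR problem on the resulting linear model. The sketch below shows this standard Koopman-LQR pattern; the hand-crafted `lift` features and the assumption that A and B have already been fit from data are illustrative, whereas the paper learns the representation end to end.

```python
# A minimal sketch of LQR on Koopman-lifted dynamics; the feature map and
# the pre-fit (A, B) model are illustrative assumptions.
import numpy as np
from scipy.linalg import solve_discrete_are

def lift(x):
    """Lift a pendulum state (theta, theta_dot) to Koopman features."""
    return np.array([x[0], x[1], np.sin(x[0]), np.cos(x[0])])

def koopman_lqr_gain(A, B, Q, R):
    """Closed-form LQR gain in the lifted space; the policy is u = -K @ lift(x)."""
    P = solve_discrete_are(A, B, Q, R)
    return np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
```

Once a lifted linear model is identified (here A would be 4x4 and B 4x1), policy inference is a single matrix multiply per step, which is the tractability the abstract contrasts with iterative nonlinear optimization.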

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324865 (URN), 10.1109/IROS47612.2022.9981540 (DOI), 000909405304070 (ISI), 2-s2.0-85146355853 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230322

Available from: 2023-03-22 Created: 2023-03-22 Last updated: 2025-02-09. Bibliographically approved
Poklukar, P., Vasco, M., Yin, H., Melo, F. S., Paiva, A. & Kragic, D. (2022). Geometric Multimodal Contrastive Representation Learning. In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022. Paper presented at 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022 (pp. 17782-17800). ML Research Press
Geometric Multimodal Contrastive Representation Learning
2022 (English) In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022, ML Research Press, 2022, p. 17782-17800. Conference paper, Published paper (Refereed)
Abstract [en]

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem, due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing an arbitrary number of modalities to be processed into an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
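
The contrastive component can be illustrated with an NT-Xent-style alignment loss between each modality-specific projection and the joint-modality projection, sketched below. The temperature, shapes, and exact form are assumptions in the spirit of the GMC objective, not its verbatim definition.

```python
# A minimal sketch of a modality-to-joint contrastive alignment loss;
# temperature and batch construction are assumptions.
import torch
import torch.nn.functional as F

def alignment_loss(z_mod, z_joint, tau=0.1):
    """Each modality embedding should match its own sample's joint embedding
    and repel the joint embeddings of other samples in the batch."""
    z_mod = F.normalize(z_mod, dim=1)
    z_joint = F.normalize(z_joint, dim=1)
    logits = z_mod @ z_joint.t() / tau                       # (batch, batch)
    targets = torch.arange(z_mod.size(0), device=z_mod.device)
    return F.cross_entropy(logits, targets)
```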

Place, publisher, year, edition, pages
ML Research Press, 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-333348 (URN), 000900064907043 (ISI), 2-s2.0-85153911322 (Scopus ID)
Conference
39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022
Note

QC 20230801

Available from: 2023-08-01 Created: 2023-08-01 Last updated: 2023-08-14. Bibliographically approved
Poklukar, P., Vasco, M., Yin, H., Melo, F. S., Paiva, A. & Kragic, D. (2022). GMC - Geometric Multimodal Contrastive Representation Learning. Paper presented at the International Conference on Machine Learning.
GMC - Geometric Multimodal Contrastive Representation Learning
2022 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem, due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method comprising two main components: i) a two-level architecture consisting of modality-specific base encoders, allowing an arbitrary number of modalities to be processed into an intermediate representation of fixed dimensionality, and a shared projection head, mapping the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.

Keywords
Representation Learning, Machine Learning, Multimodal, Contrastive Learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-312719 (URN)
Conference
International Conference on Machine Learning
Note

QC 20220614

Available from: 2022-05-20 Created: 2022-05-20 Last updated: 2022-06-25. Bibliographically approved
Vasco, M., Yin, H., Melo, F. S. & Paiva, A. (2022). How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS. Paper presented at 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, 9-13 May 2022 (pp. 1301-1309). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents
2022 (English) In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022, p. 1301-1309. Conference paper, Published paper (Refereed)
Abstract [en]

This work addresses the problem of sensing the world: how to learn a multimodal representation of a reinforcement learning agent's environment that allows the execution of tasks under incomplete perceptual conditions. To address this problem, we argue for hierarchy in the design of representation models and contribute a novel multimodal representation model, MUSE. The proposed model learns a hierarchy of representations: low-level modality-specific representations, encoded from raw observation data, and a high-level multimodal representation, encoding joint-modality information to allow robust state estimation. We employ MUSE as the perceptual model of deep reinforcement learning agents provided with multimodal observations in Atari games. We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss. Finally, we also evaluate the generative performance of MUSE in literature-standard multimodal scenarios with more numerous and more complex modalities, showing that it outperforms state-of-the-art multimodal variational autoencoders in single- and cross-modality generation.
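
The hierarchy argued for above can be pictured as modality-specific encoders feeding a higher-level joint encoder. The minimal PyTorch sketch below shows that two-level shape only; all dimensions and the missing-modality comment are illustrative assumptions, not MUSE's actual architecture.

```python
# A minimal sketch of a two-level multimodal encoder; layer sizes and the
# fusion scheme are assumptions.
import torch
import torch.nn as nn

class TwoLevelEncoder(nn.Module):
    def __init__(self, dims, low_dim=32, high_dim=16):
        super().__init__()
        # Low level: one encoder per modality (e.g., image, sound features).
        self.low = nn.ModuleList([nn.Linear(d, low_dim) for d in dims])
        # High level: joint encoder over the concatenated low-level codes.
        self.high = nn.Linear(low_dim * len(dims), high_dim)

    def forward(self, observations):
        # A missing modality could be handled by zero-filling its slot.
        lows = [torch.relu(enc(o)) for enc, o in zip(self.low, observations)]
        return self.high(torch.cat(lows, dim=-1))   # joint multimodal state
```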

Place, publisher, year, edition, pages
International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022
Keywords
Multimodal Representation Learning, Reinforcement Learning, Unsupervised Learning, Autonomous agents, Deep learning, Learning systems, Multi agent systems, Condition, Encodings, Learn+, Multi-modal, Multimodal perception, Observation data, Reinforcement learning agent, Reinforcement learnings, Representation model
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326092 (URN), 2-s2.0-85134325159 (Scopus ID)
Conference
21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, 9-13 May 2022
Note

QC 20230424

Available from: 2023-04-24 Created: 2023-04-24 Last updated: 2023-04-24. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-3599-440x
