Publications (10 of 26)
Weng, Z., Zhou, P., Yin, H., Kravchenko, A., Varava, A., Navarro-Alarcon, D. & Kragic, D. (2024). Interactive Perception for Deformable Object Manipulation. IEEE Robotics and Automation Letters, 9(9), 7763-7770
2024 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no. 9, pp. 7763-7770. Journal article (Refereed). Published.
Abstract [en]

Interactive perception enables robots to manipulate the environment and objects to bring them into states that benefit the perception process. Deformable objects pose challenges here, owing to the difficulty of manipulating them and to occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers motion regularity and structure in coupling the camera and the manipulator. We contribute a method for constructing and computing a subspace, called the Dynamic Active Vision Space (DAVS), that effectively exploits this regularity during motion exploration. The effectiveness of the framework and approach is validated in both simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and coordinated motion in interactive perception for deformable objects.
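
To make the idea concrete, here is a minimal sketch (not the authors' code) of one step of such a sequential decision loop: candidate camera poses drawn from a restricted view subspace (the role DAVS plays in the paper) are scored and the best view is chosen greedily. All names, and the visibility proxy in particular, are illustrative assumptions.

```python
import numpy as np

def visibility_score(cam_pose, ee_pose):
    # Hypothetical proxy: prefer camera views that face the end effector.
    # cam_pose = [x, y, z, dx, dy, dz] (position + viewing direction).
    to_ee = ee_pose[:3] - cam_pose[:3]
    to_ee = to_ee / np.linalg.norm(to_ee)
    view_dir = cam_pose[3:] / np.linalg.norm(cam_pose[3:])
    return float(view_dir @ to_ee)

def select_view(candidate_poses, ee_pose):
    # One greedy step of the camera's sequential decision-making:
    # evaluate every candidate in the restricted subspace, keep the best.
    scores = [visibility_score(c, ee_pose) for c in candidate_poses]
    return candidate_poses[int(np.argmax(scores))]
```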

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Cameras, Manifolds, IP networks, End effectors, Task analysis, Couplings, Robot kinematics, Perception for grasping and manipulation, perception-action coupling, manipulation planning
National subject category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-352106 (URN), 10.1109/LRA.2024.3431943 (DOI), 001283670800004 (ISI), 2-s2.0-85199505576 (Scopus ID)
Note

QC 20240822

Available from: 2024-08-22. Created: 2024-08-22. Last updated: 2025-05-14. Bibliographically reviewed.
Yin, W., Tu, R., Yin, H., Kragic, D., Kjellström, H. & Björkman, M. (2023). Controllable Motion Synthesis and Reconstruction with Autoregressive Diffusion Models. In: 2023 32ND IEEE INTERNATIONAL CONFERENCE ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION, RO-MAN: . Paper presented at 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA (pp. 1102-1108). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 1102-1108. Conference paper, Published paper (Refereed).
Abstract [en]

Data-driven and controllable human motion synthesis and prediction are active research areas with various applications in interactive media and social robotics. Challenges remain in generating diverse motions given past observations and in dealing with imperfect poses. This paper introduces MoDiff, an autoregressive probabilistic diffusion model over motion sequences that is conditioned on control contexts from other modalities. Our model integrates a cross-modal Transformer encoder and a Transformer-based decoder, which prove effective in capturing temporal correlations in the motion and control modalities. We also introduce a new data dropout method based on the diffusion forward process to provide richer data representations and robust generation. We demonstrate the superior performance of MoDiff in controllable motion synthesis for locomotion against two baselines and show the benefits of diffusion data dropout for robust synthesis and reconstruction of high-fidelity motion close to recorded data.
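
The "data dropout based on the diffusion forward process" admits a compact reading: with some probability, replace a clean training sequence by a noised version drawn from a random step of the forward process. A minimal sketch under that assumption (variable names are illustrative, not the released MoDiff code):

```python
import torch

def diffusion_data_dropout(x0, alphas_cumprod, p=0.5):
    # x0: (batch, time, pose_dim) clean motion; alphas_cumprod: (T,) schedule.
    if torch.rand(()) > p:
        return x0                                # keep the clean sequence
    t = torch.randint(len(alphas_cumprod), ())   # random forward-process step
    a = alphas_cumprod[t]
    noise = torch.randn_like(x0)
    return a.sqrt() * x0 + (1.0 - a).sqrt() * noise  # sample from q(x_t | x_0)
```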

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
National subject category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-341978 (URN), 10.1109/RO-MAN57019.2023.10309317 (DOI), 001108678600131 (ISI), 2-s2.0-85186990309 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), AUG 28-31, 2023, Busan, SOUTH KOREA
Note

Part of proceedings, ISBN 979-8-3503-3670-2

QC 20240110

Available from: 2024-01-10. Created: 2024-01-10. Last updated: 2025-02-07. Bibliographically reviewed.
Yin, W., Yin, H., Baraka, K., Kragic, D. & Björkman, M. (2023). Dance Style Transfer with Cross-modal Transformer. In: 2023 IEEE/CVF WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV): . Paper presented at 23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI (pp. 5047-5056). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 5047-5056. Conference paper, Published paper (Refereed).
Abstract [en]

We present CycleDance, a dance style transfer system that transforms an existing motion clip in one dance style into a motion clip in another dance style while attempting to preserve the motion context of the dance. Our method extends an existing CycleGAN architecture for modeling audio sequences and integrates multimodal Transformer encoders to account for the music context. We adopt sequence-length-based curriculum learning to stabilize training. Our approach captures rich and long-term intra-relations between motion frames, a common challenge in motion transfer and synthesis work. We further introduce new metrics for gauging transfer strength and content preservation in the context of dance movements. We perform an extensive ablation study as well as a human study with 30 participants who have five or more years of dance experience. The results demonstrate that CycleDance generates realistic movements in the target style, significantly outperforming the baseline CycleGAN on naturalness, transfer strength, and content preservation.
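
For orientation, the CycleGAN backbone amounts to an adversarial term plus a cycle-consistency term per transfer direction. The sketch below shows one direction with music passed as context; the generators, discriminator, and loss weight are assumed placeholders, not the CycleDance implementation:

```python
import torch

def style_transfer_loss(G_ab, G_ba, D_b, motion_a, music, lam=10.0):
    # G_ab / G_ba: generators between styles A and B; D_b: style-B critic.
    fake_b = G_ab(motion_a, music)           # transfer A -> B with music context
    recon_a = G_ba(fake_b, music)            # map back B -> A
    adv = -D_b(fake_b).mean()                # adversarial term: fool the critic
    cyc = (recon_a - motion_a).abs().mean()  # cycle consistency on poses
    return adv + lam * cyc
```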

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Winter Conference on Applications of Computer Vision, ISSN 2472-6737
National subject category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-333220 (URN), 10.1109/WACV56688.2023.00503 (DOI), 000971500205016 (ISI), 2-s2.0-85149044034 (Scopus ID)
Conference
23rd IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), JAN 03-07, 2023, Waikoloa, HI
Note

QC 20230731

Available from: 2023-07-31. Created: 2023-07-31. Last updated: 2025-02-07. Bibliographically reviewed.
Lippi, M., Poklukar, P., Welle, M. C., Varava, A., Yin, H., Marino, A. & Kragic, D. (2023). Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap. IEEE Transactions on Robotics, 39(1), 57-75
2023 (English). In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 39, no. 1, pp. 57-75. Journal article (Refereed). Published.
Abstract [en]

In this article, we present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on manipulation of deformable objects. We propose a latent space roadmap (LSR) for task planning: a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts. First, a mapping module (MM) maps observations, given in the form of images, into a structured latent space, extracting the respective states, and also generates observations from latent states. Second, the LSR builds and connects clusters containing similar states in order to find latent plans between start and goal states extracted by the MM. Third, an action proposal module complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on simulated box-stacking and rope/box manipulation tasks, and on a folding task executed on a real robot.
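
A rough sketch of the roadmap idea (illustrative only, with hypothetical clustering and thresholds, not the authors' code): cluster the latent states, connect clusters observed consecutively in the training trajectories, and plan with a shortest path.

```python
import numpy as np
import networkx as nx

def build_lsr(latents, centroids, eps):
    # latents: (N, d) states from sequential demonstrations; centroids: (K, d).
    g = nx.Graph()
    g.add_nodes_from(range(len(centroids)))
    labels = np.argmin(
        np.linalg.norm(latents[:, None] - centroids[None], axis=-1), axis=1)
    for i in range(len(latents) - 1):
        a, b = labels[i], labels[i + 1]
        # Connect clusters whose members appear consecutively and close by,
        # treating that as evidence of a feasible transition.
        if a != b and np.linalg.norm(latents[i] - latents[i + 1]) < eps:
            g.add_edge(a, b)
    return g

def latent_plan(graph, start_cluster, goal_cluster):
    return nx.shortest_path(graph, start_cluster, goal_cluster)
```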

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Deep Learning in Robotics and Automation, Latent Space Planning, Manipulation Planning, Visual Learning, Deep learning, Graphic methods, Job analysis, Planning, Robot programming, Action planning, Deep learning in robotic and automation, Heuristics algorithm, Roadmap, Space planning, Stackings, Task analysis, Heuristic algorithms
National subject category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-326180 (URN), 10.1109/TRO.2022.3188163 (DOI), 000829072000001 (ISI), 2-s2.0-85135223386 (Scopus ID)
Note

QC 20230502

Available from: 2023-05-02. Created: 2023-05-02. Last updated: 2025-02-09. Bibliographically reviewed.
Reichlin, A., Marchetti, G. L., Yin, H., Ghadirzadeh, A. & Kragic, D. (2022). Back to the Manifold: Recovering from Out-of-Distribution States. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS): . Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 8660-8666). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 8660-8666. Conference paper, Published paper (Refereed).
Abstract [en]

Learning from previously collected datasets of expert data offers the promise of acquiring robotic policies without unsafe and costly online exploration. However, a major challenge is the distributional shift between the states in the training dataset and those visited by the learned policy at test time. While prior work has mainly studied the distribution shift caused by the policy during offline training, the problem of recovering from out-of-distribution states at deployment time is not yet well studied. We alleviate the distributional shift at deployment time by introducing a recovery policy that brings the agent back to the training manifold whenever it steps outside the in-distribution states, e.g., due to an external perturbation. The recovery policy relies on an approximation of the training-data density and a learned equivariant mapping that maps visual observations into a latent space in which translations correspond to robot actions. We demonstrate the effectiveness of the proposed method through several manipulation experiments on a real robotic platform. Our results show that the recovery policy enables the agent to complete tasks where behavioral cloning alone fails because of the distributional shift problem.
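
One way to picture the recovery policy (a sketch under stated assumptions; encode and log_density are hypothetical stand-ins for the learned equivariant map and the density model): if the current state's estimated log-density is too low, climb the density gradient in latent space and read the resulting translation off as the action.

```python
import torch

def recovery_action(obs, encode, log_density, step=0.1, threshold=-5.0):
    # encode: observation -> latent where translations correspond to actions.
    z = encode(obs).detach().requires_grad_(True)
    logp = log_density(z)                 # scalar log-density of training data
    if logp.item() > threshold:
        return None                       # in distribution: defer to task policy
    grad, = torch.autograd.grad(logp, z)  # direction of increasing density
    return step * grad                    # latent translation read as an action
```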

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National subject category
Computer science
Identifiers
urn:nbn:se:kth:diva-324860 (URN), 10.1109/IROS47612.2022.9981315 (DOI), 000909405301050 (ISI), 2-s2.0-85146319849 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230322

Available from: 2023-03-22. Created: 2023-03-22. Last updated: 2024-03-04. Bibliographically reviewed.
Yin, H., Verginis, C. K. & Kragic, D. (2022). Consensus-based Normalizing-Flow Control: A Case Study in Learning Dual-Arm Coordination. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS): . Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 10417-10424). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 10417-10424. Conference paper, Published paper (Refereed).
Abstract [en]

We develop two consensus-based learning algorithms for multi-robot systems applied to complex tasks involving collision constraints and force interactions, such as cooperative peg-in-hole placement. The proposed algorithms integrate multi-robot distributed consensus and normalizing-flow-based reinforcement learning. The algorithms guarantee the stability and the consensus of the multi-robot system's generalized variables in a transformed space. This transformed space is obtained via a diffeomorphic transformation parameterized by normalizing-flow models, which the algorithms use while training on the underlying task, thereby learning the skillful, dexterous trajectories the task requires. We validate the proposed algorithms by parameterizing reinforcement learning policies, demonstrating efficient cooperative learning and strong generalization of dual-arm assembly skills in a dynamics-engine simulator.
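
As a toy illustration of consensus in a transformed space (not the paper's algorithm; f and f_inv are hypothetical callables standing in for the normalizing-flow diffeomorphism and its inverse): map each robot's state through the flow, run a standard Laplacian consensus step, and map back.

```python
import numpy as np

def consensus_step(states, f, f_inv, adjacency, gain=0.5, dt=0.01):
    # states: list of per-robot state vectors; adjacency: (n, n) graph weights.
    z = np.stack([f(x) for x in states])      # transformed coordinates
    laplacian = np.diag(adjacency.sum(axis=1)) - adjacency
    z_next = z - dt * gain * laplacian @ z    # standard consensus dynamics
    return [f_inv(zi) for zi in z_next]       # back to robot coordinates
```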

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National subject category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324871 (URN), 10.1109/IROS47612.2022.9981827 (DOI), 000909405302093 (ISI), 2-s2.0-85146353921 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230320

Available from: 2023-03-20. Created: 2023-03-20. Last updated: 2025-02-09. Bibliographically reviewed.
Yin, H., Welle, M. C. & Kragic, D. (2022). Embedding Koopman Optimal Control in Robot Policy Learning. In: 2022 IEEE/RSJ INTERNATIONAL CONFERENCE ON INTELLIGENT ROBOTS AND SYSTEMS (IROS): . Paper presented at IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN (pp. 13392-13399). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Institute of Electrical and Electronics Engineers (IEEE), 2022, pp. 13392-13399. Conference paper, Published paper (Refereed).
Abstract [en]

Embedding an optimization process has been explored for imposing efficient and flexible policy structures. Existing work often builds upon nonlinear optimization with explicit iteration steps, making policy inference prohibitively expensive for online learning and real-time control. Our approach embeds a linear-quadratic-regulator (LQR) formulation within a Koopman representation, thus exhibiting the tractability of a closed-form solution and the richness of a non-convex neural network. We use a few auxiliary objectives and reparameterization to enforce optimality conditions of the policy, which can be easily integrated into standard gradient-based learning. Our approach is shown to be effective for learning policies with an optimality structure and for efficient reinforcement learning, including simulated pendulum control, 2D and 3D walking, and manipulation of both rigid and deformable objects. We also demonstrate a real-world application in a robot pivoting task.
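
The closed-form flavor can be seen in a small sketch (assumed shapes, not the authors' implementation; phi is a learned lifting, and A, B, Q, R live in the lifted space): solving a discrete Riccati equation once yields a feedback gain, so inference is a single matrix-vector product rather than an iterative optimization.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def koopman_lqr_policy(phi, A, B, Q, R):
    # Lifted linear model: z_{t+1} = A z_t + B u_t, with z = phi(x).
    P = solve_discrete_are(A, B, Q, R)                   # Riccati solution
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)    # LQR gain
    return lambda x: -K @ phi(x)                         # closed-form control
```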

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National subject category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-324865 (URN), 10.1109/IROS47612.2022.9981540 (DOI), 000909405304070 (ISI), 2-s2.0-85146355853 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), OCT 23-27, 2022, Kyoto, JAPAN
Note

QC 20230322

Available from: 2023-03-22. Created: 2023-03-22. Last updated: 2025-02-09. Bibliographically reviewed.
Poklukar, P., Vasco, M., Yin, H., Melo, F. S., Paiva, A. & Kragic, D. (2022). Geometric Multimodal Contrastive Representation Learning. In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022: . Paper presented at 39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022 (pp. 17782-17800). ML Research Press
2022 (English). In: Proceedings of the 39th International Conference on Machine Learning, ICML 2022, ML Research Press, 2022, pp. 17782-17800. Conference paper, Published paper (Refereed).
Abstract [en]

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture of modality-specific base encoders, which process an arbitrary number of modalities into intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.
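
The contrastive component can be summarized, under mild assumptions, as an InfoNCE-style loss that pulls each modality-specific embedding toward the joint embedding of the same sample (a sketch, not the GMC release):

```python
import torch
import torch.nn.functional as F

def modality_joint_contrastive_loss(z_modality, z_joint, temperature=0.1):
    # z_modality, z_joint: (batch, dim) outputs of the shared projection head.
    zm = F.normalize(z_modality, dim=1)
    zj = F.normalize(z_joint, dim=1)
    logits = zm @ zj.t() / temperature     # similarities of all pairs
    targets = torch.arange(len(zm))        # positives sit on the diagonal
    return F.cross_entropy(logits, targets)
```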

Place, publisher, year, edition, pages
ML Research Press, 2022
National subject category
Computer science
Identifiers
urn:nbn:se:kth:diva-333348 (URN), 000900064907043 (ISI), 2-s2.0-85153911322 (Scopus ID)
Conference
39th International Conference on Machine Learning, ICML 2022, Baltimore, United States of America, Jul 17 2022 - Jul 23 2022
Note

QC 20230801

Available from: 2023-08-01. Created: 2023-08-01. Last updated: 2023-08-14. Bibliographically reviewed.
Poklukar, P., Miguel, V., Yin, H., Melo, F. S., Paiva, A. & Kragic, D. (2022). GMC - Geometric Multimodal Contrastive Representation Learning. Paper presented at the International Conference on Machine Learning.
2022 (English). Conference paper, Published paper (Refereed).
Abstract [en]

Learning representations of multimodal data that are both informative and robust to missing modalities at test time remains a challenging problem due to the inherent heterogeneity of data obtained from different channels. To address it, we present a novel Geometric Multimodal Contrastive (GMC) representation learning method consisting of two main components: i) a two-level architecture of modality-specific base encoders, which process an arbitrary number of modalities into intermediate representations of fixed dimensionality, and a shared projection head, which maps the intermediate representations to a latent representation space; ii) a multimodal contrastive loss function that encourages the geometric alignment of the learned representations. We experimentally demonstrate that GMC representations are semantically rich and achieve state-of-the-art performance with missing modality information on three different learning problems, including prediction and reinforcement learning tasks.

Keywords
Representation Learning, Machine Learning, Multimodal, Contrastive Learning
National subject category
Computer science
Identifiers
urn:nbn:se:kth:diva-312719 (URN)
Conference
International Conference on Machine Learning
Note

QC 20220614

Available from: 2022-05-20. Created: 2022-05-20. Last updated: 2022-06-25. Bibliographically reviewed.
Vasco, M., Yin, H., Melo, F. S. & Paiva, A. (2022). How to Sense the World: Leveraging Hierarchy in Multimodal Perception for Robust Reinforcement Learning Agents. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS: . Paper presented at 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, 9-13 May 2022 (pp. 1301-1309). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
2022 (English). In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022, pp. 1301-1309. Conference paper, Published paper (Refereed).
Abstract [en]

This work addresses the problem of sensing the world: how to learn a multimodal representation of a reinforcement learning agent's environment that allows the execution of tasks under incomplete perceptual conditions. To address this problem, we argue for hierarchy in the design of representation models and contribute a novel multimodal representation model, MUSE. The proposed model learns a hierarchy of representations: low-level modality-specific representations, encoded from raw observation data, and a high-level multimodal representation, encoding joint-modality information to allow robust state estimation. We employ MUSE as the perceptual model of deep reinforcement learning agents provided with multimodal observations in Atari games. We perform a comparative study over different designs of reinforcement learning agents, showing that MUSE allows agents to perform tasks under incomplete perceptual experience with minimal performance loss. Finally, we also evaluate the generative performance of MUSE in literature-standard multimodal scenarios with a larger number of more complex modalities, showing that it outperforms state-of-the-art multimodal variational autoencoders in single- and cross-modality generation.
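
The two-level design can be sketched as follows (an illustrative architecture with assumed layer choices, not the MUSE release): modality-specific encoders produce low-level latents, which a joint encoder fuses into a high-level latent; a missing modality is simply imputed at the low level.

```python
import torch
import torch.nn as nn

class HierarchicalEncoder(nn.Module):
    def __init__(self, modality_dims, low_dim=32, high_dim=64):
        super().__init__()
        self.low_encoders = nn.ModuleList(
            nn.Linear(d, low_dim) for d in modality_dims)
        self.joint = nn.Linear(low_dim * len(modality_dims), high_dim)

    def forward(self, observations):
        # observations: one 1-D tensor per modality, or None if missing;
        # missing modalities are zero-imputed at the low level.
        lows = [enc(o) if o is not None else torch.zeros(enc.out_features)
                for enc, o in zip(self.low_encoders, observations)]
        return self.joint(torch.cat(lows, dim=-1))
```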

Place, publisher, year, edition, pages
International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2022
Keywords
Multimodal Representation Learning, Reinforcement Learning, Unsupervised Learning, Autonomous agents, Deep learning, Learning systems, Multi agent systems, Condition, Encodings, Learn+, Multi-modal, Multimodal perception, Observation data, Reinforcement learning agent, Reinforcement learnings, Representation model
National subject category
Computer science
Identifiers
urn:nbn:se:kth:diva-326092 (URN), 2-s2.0-85134325159 (Scopus ID)
Conference
21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, 9-13 May 2022
Note

QC 20230424

Available from: 2023-04-24. Created: 2023-04-24. Last updated: 2023-04-24. Bibliographically reviewed.
Identifiers
ORCID iD: orcid.org/0000-0002-3599-440x
