KTH Publications (kth.se)
Publications (10 of 40)
Ingelhag, N., Munkeby, J., van Haastregt, J., Varava, A., Welle, M. C. & Kragic, D. (2024). A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models. In: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2024. Paper presented at 33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA (pp. 748-754). Institute of Electrical and Electronics Engineers (IEEE)
A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models
2024 (English). In: 2024 33rd IEEE International Conference on Robot and Human Interactive Communication, RO-MAN 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 748-754. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we build upon two major recent developments in the field, Diffusion Policies for visuomotor manipulation and large pre-trained multimodal foundation models, to obtain a robotic skill learning system. The system acquires new skills from teleoperated demonstrations via the behavioral cloning approach of visuomotor diffusion policies. A foundation model performs skill selection given the user's prompt in natural language and, before a skill is executed, performs a precondition check given an observation of the workspace. We compare the performance of different foundation models on this task and give a detailed experimental evaluation of the skills taught by the user in simulation and the real world. Finally, we showcase the combined system on a challenging food-serving scenario in the real world. Videos of all experimental executions, as well as of the process of teaching new skills in simulation and the real world, are available on the project's website.
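The pipeline the abstract describes, a library of learned skills, language-driven skill selection, and a precondition check before execution, can be sketched as follows. This is an illustrative stand-in only: the foundation-model calls are replaced by a trivial keyword matcher and stub predicates, and all names (`Skill`, `select_skill`, `try_execute`) are hypothetical, not from the paper.

```python
# Hypothetical sketch of the skill-selection + precondition-check loop.
# A real system would query a multimodal foundation model; here the model
# is stubbed by keyword overlap and simple observation predicates.

from dataclasses import dataclass
from typing import Callable

@dataclass
class Skill:
    name: str
    description: str                       # used for prompt matching
    precondition: Callable[[dict], bool]   # checked against a workspace observation

def select_skill(prompt: str, skills: list[Skill]) -> Skill:
    """Stand-in for foundation-model skill selection: pick the skill whose
    description shares the most words with the user's prompt."""
    words = set(prompt.lower().split())
    return max(skills, key=lambda s: len(words & set(s.description.lower().split())))

def try_execute(prompt: str, skills: list[Skill], observation: dict) -> str:
    """Select a skill, run its precondition check, then (pretend to) execute."""
    skill = select_skill(prompt, skills)
    if not skill.precondition(observation):
        return f"{skill.name}: precondition failed"
    return f"{skill.name}: executing"

skills = [
    Skill("pick_cup", "pick up the cup", lambda obs: obs.get("cup_visible", False)),
    Skill("pour_water", "pour water into the bowl", lambda obs: obs.get("bowl_visible", False)),
]
print(try_execute("please pour some water", skills, {"bowl_visible": True}))
```

The same structure would hold with a real foundation model in place of the matcher: selection from the prompt, then a yes/no precondition query on the current workspace image.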

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Algebra and Logic
Identifiers
urn:nbn:se:kth:diva-358777 (URN); 10.1109/RO-MAN60168.2024.10731242 (DOI); 001348918600099 (); 2-s2.0-85209783266 (Scopus ID)
Conference
33rd IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN) - Embracing Human-Centered HRI, AUG 26-30, 2024, Pasadena, CA
Note

Part of ISBN 979-8-3503-7503-9; 979-8-3503-7502-2

QC 20250122

Available from: 2025-01-22. Created: 2025-01-22. Last updated: 2025-03-12. Bibliographically approved
Medbouhi, A. A., Marchetti, G. L., Polianskii, V., Kravberg, A., Poklukar, P., Varava, A. & Kragic, D. (2024). Hyperbolic Delaunay Geometric Alignment. In: Bifet, A., Krilavicius, T., Davis, J., Kull, M., Ntoutsi, E. & Zliobaite, I. (Eds.), Machine Learning and Knowledge Discovery in Databases: Research Track, Pt III, ECML PKDD 2024. Paper presented at Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), SEP 09-13, 2024, Vilnius, Lithuania (pp. 111-126). Springer Nature
Hyperbolic Delaunay Geometric Alignment
2024 (English). In: Machine Learning and Knowledge Discovery in Databases: Research Track, Pt III, ECML PKDD 2024 / [ed] Bifet, A., Krilavicius, T., Davis, J., Kull, M., Ntoutsi, E. & Zliobaite, I., Springer Nature, 2024, p. 111-126. Conference paper, Published paper (Refereed)
Abstract [en]

Hyperbolic machine learning is an emerging field aimed at representing data with a hierarchical structure. However, there is a lack of tools for evaluating and analyzing the resulting hyperbolic data representations. To this end, we propose Hyperbolic Delaunay Geometric Alignment (HyperDGA), a similarity score for comparing datasets in a hyperbolic space. The core idea is to count the edges of the hyperbolic Delaunay graph that connect datapoints across the given sets. We provide an empirical investigation on synthetic and real-life biological data and demonstrate that HyperDGA outperforms the hyperbolic version of classical distances between sets. Furthermore, we showcase the potential of HyperDGA for evaluating latent representations inferred by a Hyperbolic Variational Auto-Encoder.
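The counting step at the heart of HyperDGA can be illustrated in plain Python. As a simplification, this sketch builds a 1-nearest-neighbor graph in the Euclidean plane instead of the hyperbolic Delaunay graph used in the paper; only the cross-set edge-counting idea carries over, and all function names here are hypothetical.

```python
# Count graph edges that join points of two different sets: many cross edges
# suggest the sets are interleaved, few suggest they are well separated.
# The hyperbolic Delaunay graph of the paper is replaced by a simple
# Euclidean nearest-neighbor graph for illustration.

import math

def knn_graph(points, k=1):
    """Undirected k-nearest-neighbor graph as a set of index pairs."""
    edges = set()
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for _, j in dists[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges

def cross_set_edge_count(set_a, set_b, k=1):
    """Number of graph edges joining a point of set_a to a point of set_b."""
    points = list(set_a) + list(set_b)
    split = len(set_a)  # indices < split belong to set_a
    edges = knn_graph(points, k)
    return sum(1 for i, j in edges if (i < split) != (j < split))

# Interleaved sets share many cross edges; separated sets share few.
mixed = cross_set_edge_count([(0, 0), (1, 0)], [(0.5, 0.1), (1.5, 0.1)])
apart = cross_set_edge_count([(0, 0), (1, 0)], [(10, 10), (11, 10)])
print(mixed, apart)
```

A higher count for the interleaved pair is exactly the behavior HyperDGA turns into a similarity score.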

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Artificial Intelligence, ISSN 2945-9133 ; 14943
Keywords
Hyperbolic Geometry, Hierarchical Data, Evaluation
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-355149 (URN); 10.1007/978-3-031-70352-2_7 (DOI); 001308375900007 ()
Conference
Joint European Conference on Machine Learning and Knowledge Discovery in Databases (ECML PKDD), SEP 09-13, 2024, Vilnius, Lithuania
Note

Part of ISBN: 978-3-031-70351-5, 978-3-031-70352-2

QC 20241025

Available from: 2024-10-25. Created: 2024-10-25. Last updated: 2024-10-25. Bibliographically approved
Weng, Z., Zhou, P., Yin, H., Kravchenko, A., Varava, A., Navarro-Alarcon, D. & Kragic, D. (2024). Interactive Perception for Deformable Object Manipulation. IEEE Robotics and Automation Letters, 9(9), 7763-7770
Interactive Perception for Deformable Object Manipulation
2024 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no. 9, p. 7763-7770. Article in journal (Refereed), Published
Abstract [en]

Interactive perception enables robots to manipulate the environment and objects so as to bring them into states that benefit the perception process. Deformable objects pose challenges to this approach due to the difficulty of manipulation and to occlusion in vision-based perception. In this work, we address this problem with a setup involving both an active camera and an object manipulator. Our approach is based on a sequential decision-making framework and explicitly considers motion regularity and structure in coupling the camera and the manipulator. We contribute a method for constructing and computing a subspace, called the Dynamic Active Vision Space (DAVS), that effectively exploits this regularity during motion exploration. The effectiveness of the framework and approach is validated in both simulation and a real dual-arm robot setup. Our results confirm the necessity of an active camera and of coordinated motion in interactive perception for deformable objects.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Cameras, Manifolds, IP networks, End effectors, Task analysis, Couplings, Robot kinematics, Perception for grasping and manipulation, perception-action coupling, manipulation planning
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-352106 (URN); 10.1109/LRA.2024.3431943 (DOI); 001283670800004 (); 2-s2.0-85199505576 (Scopus ID)
Note

QC 20240822

Available from: 2024-08-22. Created: 2024-08-22. Last updated: 2025-05-14. Bibliographically approved
Kravberg, A., Devaurs, D., Varava, A., Kavraki, L. E. & Kragic, D. (2024). MoleQCage: Geometric High-Throughput Screening for Molecular Caging Prediction. Journal of Chemical Information and Modeling, 64(24), 9034-9039
MoleQCage: Geometric High-Throughput Screening for Molecular Caging Prediction
2024 (English). In: Journal of Chemical Information and Modeling, ISSN 1549-9596, E-ISSN 1549-960X, Vol. 64, no. 24, p. 9034-9039. Article in journal (Refereed), Published
Abstract [en]

Although the ability to determine whether a host molecule can enclose a guest molecule and form a caging complex could benefit numerous chemical and medical applications, the experimental discovery of molecular caging complexes has not yet been achieved at scale. Here, we propose MoleQCage, a simple tool for high-throughput screening of host and guest candidates, based on an efficient robotics-inspired geometric algorithm for molecular caging prediction that provides theoretical guarantees and robustness assessment. MoleQCage is distributed as Linux-based software with a graphical user interface and is available online at https://hub.docker.com/r/dantrigne/moleqcage in the form of a Docker container. Documentation and examples are available as Supporting Information and online at https://hub.docker.com/r/dantrigne/moleqcage.

Place, publisher, year, edition, pages
American Chemical Society (ACS), 2024
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-367341 (URN); 10.1021/acs.jcim.4c01419 (DOI); 001376043000001 (); 39665285 (PubMedID); 2-s2.0-85211987438 (Scopus ID)
Note

QC 20250716

Available from: 2025-07-16. Created: 2025-07-16. Last updated: 2025-07-16. Bibliographically approved
Marchetti, G. L., Polianskii, V., Varava, A., Pokorny, F. T. & Kragic, D. (2023). An Efficient and Continuous Voronoi Density Estimator. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023. Paper presented at 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, Valencia, Spain, Apr 25 2023 - Apr 27 2023 (pp. 4732-4744). ML Research Press
An Efficient and Continuous Voronoi Density Estimator
2023 (English). In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, ML Research Press, 2023, p. 4732-4744. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce a non-parametric density estimator termed the Radial Voronoi Density Estimator (RVDE). RVDE is grounded in the geometry of Voronoi tessellations and as such benefits from local geometric adaptiveness and broad convergence properties. Due to its radial definition, RVDE is continuous and computable in linear time with respect to the dataset size. This remedies the main shortcomings of previously studied VDEs, which are highly discontinuous and computationally expensive. We provide a theoretical study of the modes of RVDE as well as an empirical investigation of its performance on high-dimensional data. Results show that RVDE outperforms other non-parametric density estimators, including recently introduced VDEs.
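The Voronoi idea that RVDE builds on can be shown in one dimension, where a sample's Voronoi cell is simply the interval bounded by the midpoints to its neighbors and the density estimate is inversely proportional to the cell length. This is a sketch of the plain (discontinuous) VDE that the abstract contrasts RVDE against, not RVDE itself, and the function name is hypothetical.

```python
# Classical 1D Voronoi density estimate: density at x is 1 / (n * |cell|),
# where |cell| is the length of the Voronoi cell of the sample nearest to x.
# Boundary cells are clipped to the data range, so this toy version is not
# normalized over the whole real line.

def voronoi_density_1d(samples, x):
    """Density estimate at x from the Voronoi cell of the nearest sample."""
    pts = sorted(samples)
    n = len(pts)
    # Cell boundaries: midpoints between consecutive samples, clipped at ends.
    bounds = [pts[0]] + [(a + b) / 2 for a, b in zip(pts, pts[1:])] + [pts[-1]]
    nearest = min(range(n), key=lambda i: abs(pts[i] - x))
    width = bounds[nearest + 1] - bounds[nearest]
    return 1.0 / (n * width) if width > 0 else float("inf")

samples = [0.0, 1.0, 1.2, 1.4, 5.0]
# Density is higher near the cluster around 1.2 than near the isolated 5.0.
print(voronoi_density_1d(samples, 1.2) > voronoi_density_1d(samples, 4.9))
```

The jumps of this estimate at cell boundaries are exactly the discontinuity that RVDE's radial construction removes.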

Place, publisher, year, edition, pages
ML Research Press, 2023
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 206
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-334436 (URN); 001222727704044 (); 2-s2.0-85165187458 (Scopus ID)
Conference
26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, Valencia, Spain, Apr 25 2023 - Apr 27 2023
Note

QC 20241204

Available from: 2023-08-21. Created: 2023-08-21. Last updated: 2025-02-07. Bibliographically approved
Lippi, M., Poklukar, P., Welle, M. C., Varava, A., Yin, H., Marino, A. & Kragic, D. (2023). Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap. IEEE Transactions on robotics, 39(1), 57-75
Enabling Visual Action Planning for Object Manipulation Through Latent Space Roadmap
2023 (English). In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 39, no. 1, p. 57-75. Article in journal (Refereed), Published
Abstract [en]

In this article, we present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces, focusing on the manipulation of deformable objects. We propose a latent space roadmap (LSR) for task planning: a graph-based structure that globally captures the system dynamics in a low-dimensional latent space. Our framework consists of three parts. First, a mapping module (MM) maps observations, given in the form of images, into a structured latent space, extracting the respective states, and generates observations from latent states. Second, the LSR builds and connects clusters of similar states in order to find latent plans between the start and goal states extracted by the MM. Third, an action proposal module complements the latent plan found by the LSR with the corresponding actions. We present a thorough investigation of our framework on simulated box-stacking and rope/box manipulation tasks, and on a folding task executed on a real robot.
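The roadmap step can be sketched as ordinary graph search: clusters of latent states become nodes, observed transitions become edges, and a latent plan is a shortest path from the start cluster to the goal cluster. The clustering and transition data below are toy stand-ins, not the paper's learned latent space, and the function names are hypothetical.

```python
# Toy latent-space-roadmap planning: build a graph from (cluster, cluster)
# transitions, then find the shortest cluster sequence with BFS.

from collections import deque

def build_roadmap(transitions):
    """Undirected adjacency map from observed cluster-to-cluster transitions."""
    graph = {}
    for a, b in transitions:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph

def latent_plan(graph, start, goal):
    """Breadth-first search: shortest cluster sequence from start to goal."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # goal cluster unreachable

# Toy transitions between cluster labels (e.g. box configurations).
graph = build_roadmap([("A", "B"), ("B", "C"), ("C", "D"), ("B", "D")])
print(latent_plan(graph, "A", "D"))  # → ['A', 'B', 'D']
```

In the full framework, the action proposal module would then attach a concrete action to each edge of the returned path.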

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Deep Learning in Robotics and Automation, Latent Space Planning, Manipulation Planning, Visual Learning, Deep learning, Graphic methods, Job analysis, Planning, Robot programming, Action planning, Deep learning in robotic and automation, Heuristics algorithm, Roadmap, Space planning, Stackings, Task analysis, Heuristic algorithms
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-326180 (URN); 10.1109/TRO.2022.3188163 (DOI); 000829072000001 (); 2-s2.0-85135223386 (Scopus ID)
Note

QC 20230502

Available from: 2023-05-02. Created: 2023-05-02. Last updated: 2025-02-09. Bibliographically approved
Marchetti, G. L., Tegner, G., Varava, A. & Kragic, D. (2023). Equivariant Representation Learning via Class-Pose Decomposition. In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023. Paper presented at 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, Valencia, Spain, Apr 25 2023 - Apr 27 2023 (pp. 4745-4756). ML Research Press, 206
Equivariant Representation Learning via Class-Pose Decomposition
2023 (English). In: Proceedings of the 26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, ML Research Press, 2023, Vol. 206, p. 4745-4756. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce a general method for learning representations that are equivariant to symmetries of data. Our central idea is to decompose the latent space into an invariant factor and the symmetry group itself. The components semantically correspond to intrinsic data classes and poses, respectively. The learner is trained on a loss encouraging equivariance based on supervision from relative symmetry information. The approach is motivated by theoretical results from group theory and guarantees representations that are lossless, interpretable, and disentangled. We provide an empirical investigation via experiments involving datasets with a variety of symmetries. Results show that our representations capture the geometry of data and outperform other equivariant representation learning frameworks.
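The class-pose decomposition can be made concrete with the rotation group SO(2), represented by an angle: a latent code is a pair (class, pose), and equivariance means that acting on the input shifts only the pose component while the class component stays invariant. The "encoder" below is an identity stand-in for a trained network, and all names are hypothetical.

```python
# Toy class-pose decomposition for SO(2): observations are (class, angle)
# pairs, the group acts by shifting the angle, and an ideal encoder returns
# an invariant class factor plus a pose living in the group.

import math

def act(g, x):
    """Action of a rotation g (radians) on an observation x."""
    cls, angle = x
    return (cls, (angle + g) % (2 * math.pi))

def encode(x):
    """Ideal learner: latent = (invariant class, pose in the group)."""
    cls, angle = x
    return cls, angle

def equivariance_error(x, g):
    """Angular gap between pose(encode(g . x)) and g applied to pose(encode(x))."""
    cls1, pose1 = encode(act(g, x))
    cls2, pose2 = encode(x)
    assert cls1 == cls2  # the class factor is invariant under the group action
    return abs((pose1 - (pose2 + g)) % (2 * math.pi))

x = ("mug", 0.3)
print(equivariance_error(x, 1.0))  # → 0.0 for the ideal encoder
```

A training loss in the spirit of the paper would penalize this error using pairs of observations with known relative pose.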

Place, publisher, year, edition, pages
ML Research Press, 2023
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-334435 (URN); 001222727704045 (); 2-s2.0-85165155542 (Scopus ID)
Conference
26th International Conference on Artificial Intelligence and Statistics, AISTATS 2023, Valencia, Spain, Apr 25 2023 - Apr 27 2023
Note

QC 20241204

Available from: 2023-08-21. Created: 2023-08-21. Last updated: 2025-02-09. Bibliographically approved
Medbouhi, A. A., Polianskii, V., Varava, A. & Kragic, D. (2023). InvMap and Witness Simplicial Variational Auto-Encoders. Machine Learning and Knowledge Extraction, 5(1), 199-236
InvMap and Witness Simplicial Variational Auto-Encoders
2023 (English). In: Machine Learning and Knowledge Extraction, ISSN 2504-4990, Vol. 5, no. 1, p. 199-236. Article in journal (Refereed), Published
Abstract [en]

Variational auto-encoders (VAEs) are deep generative models used for unsupervised learning. However, their standard version is not topology-aware in practice, since the topology of the data may not be taken into consideration. In this paper, we propose two different approaches that aim to preserve the topological structure between the input space and the latent representation of a VAE. First, we introduce InvMap-VAE as a way to turn any dimensionality reduction technique, given an embedding it produces, into a generative model within a VAE framework, providing an inverse mapping into the original space. Second, we propose the Witness Simplicial VAE as an extension of the simplicial auto-encoder to the variational setup, using a witness complex to compute the simplicial regularization, and we motivate this method theoretically using tools from algebraic topology. The Witness Simplicial VAE is independent of any dimensionality reduction technique and, together with its extension, the Isolandmarks Witness Simplicial VAE, preserves the persistent Betti numbers of a dataset better than a standard VAE.

Place, publisher, year, edition, pages
MDPI AG, 2023
Keywords
variational auto-encoder, topological machine learning, non-linear dimensionality reduction, topological data analysis, data visualization, representation learning, Betti number, persistence homology, simplicial complex, simplicial regularization
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326074 (URN); 10.3390/make5010014 (DOI); 000957769300001 (); 2-s2.0-85150984631 (Scopus ID)
Note

QC 20230425

Available from: 2023-04-25. Created: 2023-04-25. Last updated: 2023-04-25. Bibliographically approved
Reichlin, A., Marchetti, G. L., Yin, H., Varava, A. & Kragic, D. (2023). Learning Geometric Representations of Objects via Interaction. In: Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference, ECML PKDD 2023, Proceedings. Paper presented at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023, Turin, Italy, Sep 18 2023 - Sep 22 2023 (pp. 629-644). Springer Nature
Learning Geometric Representations of Objects via Interaction
2023 (English). In: Machine Learning and Knowledge Discovery in Databases: Research Track - European Conference, ECML PKDD 2023, Proceedings, Springer Nature, 2023, p. 629-644. Conference paper, Published paper (Refereed)
Abstract [en]

We address the problem of learning representations from observations of a scene involving an agent and an external object the agent interacts with. To this end, we propose a representation learning framework that extracts the location in physical space of both the agent and the object from unstructured observations of arbitrary nature. Our framework relies on the actions performed by the agent as the only source of supervision, while assuming that the object is displaced by the agent via unknown dynamics. We provide a theoretical foundation and formally prove that an ideal learner is guaranteed to infer an isometric representation, disentangling the agent from the object and correctly extracting their locations. We empirically evaluate our framework on a variety of scenarios, showing that it outperforms vision-based approaches such as a state-of-the-art keypoint extractor. Moreover, we demonstrate how the extracted representations enable the agent to solve downstream tasks via reinforcement learning in an efficient manner.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Equivariance, Interaction, Representation Learning
National Category
Computer graphics and computer vision; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-339271 (URN); 10.1007/978-3-031-43421-1_37 (DOI); 001156141200037 (); 2-s2.0-85174436596 (Scopus ID)
Conference
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2023, Turin, Italy, Sep 18 2023 - Sep 22 2023
Note

Part of ISBN 9783031434204

QC 20231106

Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2025-02-01. Bibliographically approved
Kravchenko, A., Marchetti, G. L., Polianskii, V., Varava, A., Pokorny, F. T. & Kragic, D. (2022). Active Nearest Neighbor Regression Through Delaunay Refinement. In: Proceedings of the 39th International Conference on Machine Learning. Paper presented at 39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 17-23 July, 2022 (pp. 11650-11664). ML Research Press, 162
Active Nearest Neighbor Regression Through Delaunay Refinement
2022 (English). In: Proceedings of the 39th International Conference on Machine Learning, ML Research Press, 2022, Vol. 162, p. 11650-11664. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce an algorithm for active function approximation based on nearest neighbor regression. Our Active Nearest Neighbor Regressor (ANNR) relies on the Voronoi-Delaunay framework from computational geometry to subdivide the space into cells with constant estimated function value and select novel query points in a way that takes the geometry of the function graph into account. We consider the recent state-of-the-art active function approximator called DEFER, which is based on incremental rectangular partitioning of the space, as the main baseline. The ANNR addresses a number of limitations that arise from the space subdivision strategy used in DEFER. We provide a computationally efficient implementation of our method, as well as theoretical halting guarantees. Empirical results show that ANNR outperforms the baseline for both closed-form functions and real-world examples, such as gravitational wave parameter inference and exploration of the latent space of a generative model.
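The active-sampling idea, choosing new query points where the geometry of the function graph changes most, has a simple one-dimensional analogue: the Delaunay "cells" reduce to intervals between sorted query points, and refinement inserts the midpoint of the most promising interval. This is an illustrative analogy only, not the paper's algorithm, and the scoring rule and names below are hypothetical.

```python
# 1D analogue of Delaunay-refinement-style active sampling: repeatedly query
# the midpoint of the interval with the largest length * (1 + |value jump|)
# score, so queries concentrate where the function changes rapidly.

def refine(f, xs, steps):
    """Greedy active sampling: insert the midpoint of the highest-scoring
    interval between consecutive query points, `steps` times."""
    pts = sorted(xs)
    vals = {x: f(x) for x in pts}
    for _ in range(steps):
        score, pick = max(
            ((b - a) * (1 + abs(vals[b] - vals[a])), (a + b) / 2)
            for a, b in zip(pts, pts[1:])
        )
        vals[pick] = f(pick)  # query the function at the chosen point
        pts = sorted(vals)
    return pts

# A step function: refinement should cluster queries around the jump at x = 0.5.
f = lambda x: 0.0 if x < 0.5 else 1.0
pts = refine(f, [0.0, 1.0], steps=8)
near_jump = sum(1 for x in pts if 0.25 <= x <= 0.75)
print(len(pts), near_jump)
```

Most of the budget lands near the discontinuity, which is the qualitative behavior that ANNR's geometry-aware query selection provides in higher dimensions.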

Place, publisher, year, edition, pages
MLResearch Press, 2022
Series
Proceedings of Machine Learning Research, ISSN 2640-3498 ; 162
National Category
Computer Sciences Control Engineering
Identifiers
urn:nbn:se:kth:diva-319194 (URN); 000900064901033 (); 2-s2.0-85163127180 (Scopus ID)
Conference
39th International Conference on Machine Learning, Baltimore, Maryland, USA, PMLR 162, 17-23 July, 2022
Note

QC 20230509

Available from: 2022-09-28. Created: 2022-09-28. Last updated: 2024-03-02. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-0900-1523
