Synergies between Policy Learning and Sampling-based Planning
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-1772-7930
2024 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Synergier mellan policyinlärning och sampling-baserad planering (Swedish)
Abstract [en]

Recent advances in artificial intelligence and machine learning have significantly impacted the field of robotics and led to the interdisciplinary study of robot learning. These developments have the potential to revolutionize the automation of tasks in various industries by reducing the reliance on human workers. However, fully autonomous, learning-based robotic systems are still mainly limited to controlled environments. Ideally, we seek methods that enable the autonomous acquisition of robotic skills in any temporally extended setting with potentially complex sensor observations. Classical sampling-based planning algorithms used in robot motion planning compute feasible paths between robot states over long time horizons, even in geometrically complex environments. This thesis investigates the possibility of combining learning-based methods with these classical approaches to solve challenging problems in robot manipulation, e.g., the manipulation of deformable objects. The core idea is to leverage the best of both worlds: achieving long-horizon control through planning while using learning to obtain useful environment models from potentially high-dimensional and complex observation data. The presented frameworks rely on recent machine learning techniques such as contrastive representation learning, generative modeling and reinforcement learning. Finally, we outline the potential, challenges and limitations of this type of approach and highlight future directions.
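To illustrate the classical side of this combination, the sampling-based planning referred to above can be sketched as a minimal RRT-style search in a 2D unit square. All names and parameters here are illustrative, not the thesis's implementation, and a real planner would operate on robot configurations rather than points:

```python
import math
import random

def rrt(start, goal, is_free, step=0.1, goal_tol=0.15, iters=2000, seed=0):
    """Grow a tree from start toward random samples; return a path to goal or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(iters):
        # Sample a random point, occasionally biased toward the goal.
        target = goal if rng.random() < 0.1 else (rng.random(), rng.random())
        # Find the nearest tree node and extend one step toward the sample.
        i = min(range(len(nodes)), key=lambda k: math.dist(nodes[k], target))
        x, y = nodes[i]
        d = math.dist((x, y), target)
        if d == 0:
            continue
        new = (x + step * (target[0] - x) / d, y + step * (target[1] - y) / d)
        if not is_free(new):
            continue  # reject states in collision
        nodes.append(new)
        parent[len(nodes) - 1] = i
        if math.dist(new, goal) < goal_tol:
            # Reconstruct the path by walking parents back to the root.
            path, k = [], len(nodes) - 1
            while k is not None:
                path.append(nodes[k])
                k = parent[k]
            return path[::-1]
    return None
```

Passing an `is_free` predicate that rejects an obstacle region (e.g. a rectangular slab) makes the tree grow around it; this reliance on an explicit collision check is exactly the privileged knowledge the learning-based parts of the thesis aim to replace with learned models.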

Abstract [sv]

De senaste framstegen inom artificiell intelligens och maskininlärning har haft en betydande inverkan på robotikområdet och lett till det tvärvetenskapliga studerandet av robotinlärning. Dessa utvecklingar har potentialen att revolutionera automatiseringen inom olika industrier genom att minska beroendet av mänskliga arbetare. Dock är helt autonoma, inlärningsbaserade robotsystem fortfarande huvudsakligen begränsade till kontrollerade miljöer. Idealt sett letar vi efter metoder som möjliggör autonom förvärvning av robotfärdigheter för situationer med långa tidshorisonter och potentiellt komplexa sensorobservationer. Klassiska sampling-baserade planeringsalgoritmer som används i robotrörelseplanering beräknar genomförbara vägar mellan robottillstånd över långa tidshorisonter och även i geometriskt komplexa miljöer. I detta arbete undersöker vi möjligheten att kombinera inlärningsbaserade tillvägagångssätt med dessa klassiska tillvägagångssätt för att lösa utmanande problem inom robotmanipulation, t.ex. hantering av formbara objekt. Kärnidén är att utnyttja det bästa av båda världarna och uppnå långsiktig kontroll genom planering, samtidigt som man använder inlärning för att erhålla användbara miljömodeller från potentiellt högdimensionella och komplexa observationsdata. De presenterade ramverken förlitar sig på senaste maskininlärningstekniker såsom kontrastiv representationsinlärning, generativ modellering och förstärkningsinlärning. Slutligen skisserar vi potentialerna, utmaningarna och begränsningarna med denna typ av tillvägagångssätt och belyser framtida riktningar.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2024, p. ix, 54
Series
TRITA-EECS-AVL ; 2024:6
Keywords [en]
Machine Learning, Robotics, Reinforcement Learning, Motion Planning, Robotic Manipulation
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-341911
ISBN: 978-91-8040-803-5 (print)
OAI: oai:DiVA.org:kth-341911
DiVA, id: diva2:1824523
Public defence
2024-01-30, https://kth-se.zoom.us/j/63888939859, F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm, 15:00 (English)
Note

QC 20240108

Available from: 2024-01-08 Created: 2024-01-05 Last updated: 2025-02-07. Bibliographically approved
List of papers
1. ReForm: A Robot Learning Sandbox for Deformable Linear Object Manipulation
2021 (English). In: 2021 IEEE International Conference on Robotics and Automation (ICRA), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 4717-4723. Conference paper, Published paper (Refereed)
Abstract [en]

Recent advances in machine learning have triggered enormous interest in using learning-based approaches for robot control and object manipulation. While the majority of existing algorithms are evaluated under the assumption that the involved bodies are rigid, a large number of practical applications involve deformable objects. In this work, we focus on Deformable Linear Objects (DLOs), which can be used to model cables, tubes or wires. They are present in many applications such as manufacturing, agriculture and medicine. New methods in robotic manipulation research are often demonstrated in custom environments, impeding reproducibility and the comparison of algorithms. We introduce ReForm, a simulation sandbox and a tool for benchmarking the manipulation of DLOs. We offer six distinct environments representing important characteristics of deformable objects such as elasticity, plasticity, self-collisions and occlusions. A modular framework enables the configuration of design parameters such as the end-effector degrees of freedom, the reward function and the type of observation. ReForm is a novel robot learning sandbox with which we intend to facilitate testing and reproducibility in manipulation research for DLOs.
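As a rough illustration of the modular design described above (not ReForm's actual API; the class and field names here are hypothetical), a gym-style environment with configurable design parameters might be structured as follows:

```python
from dataclasses import dataclass

@dataclass
class EnvConfig:
    # Hypothetical design parameters mirroring those the paper lists:
    # end-effector degrees of freedom, observation type, reward function.
    ee_dof: int = 3
    observation: str = "state"   # e.g. "state" or "image"
    reward: str = "dense"        # e.g. "dense" or "sparse"

class DLOEnv:
    """Minimal gym-style environment skeleton (illustrative only)."""
    def __init__(self, config: EnvConfig):
        self.config = config
        self.t = 0

    def reset(self):
        self.t = 0
        return self._observe()

    def step(self, action):
        self.t += 1
        # A real environment would simulate DLO dynamics here and compute
        # the reward according to self.config.reward.
        return self._observe(), 0.0, self.t >= 100, {}

    def _observe(self):
        return {"mode": self.config.observation, "t": self.t}
```

Passing different `EnvConfig` values swaps observation type or reward shape without touching the environment code, which is the kind of modularity a benchmark needs so that algorithms can be compared under identical settings.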

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-311663 (URN)
10.1109/ICRA48506.2021.9561766 (DOI)
000765738803098 ()
2-s2.0-85116802097 (Scopus ID)
Conference
IEEE International Conference on Robotics and Automation (ICRA), May 30 - June 5, 2021, Xi'an, China
Note

Part of proceedings: ISBN 978-1-7281-9077-8

QC 20220503

Available from: 2022-05-03 Created: 2022-05-03 Last updated: 2025-02-09. Bibliographically approved
2. Planning-Augmented Hierarchical Reinforcement Learning
2021 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 6, no. 3, p. 5097-5104. Article in journal (Refereed), Published
Abstract [en]

Planning algorithms are powerful at solving long-horizon decision-making problems but require that the environment dynamics are known. Model-free reinforcement learning has recently been merged with graph-based planning to increase the robustness of trained policies in state-space navigation problems. Recent ideas suggest using planning to provide intermediate waypoints that guide the policy in long-horizon tasks. Yet, it is not always practical to describe a problem as state-to-state navigation. Often, the goal is defined by one or multiple disjoint sets of valid states, or implicitly through an abstract task description. Building upon previous efforts, we introduce a novel algorithm called Planning-Augmented Hierarchical Reinforcement Learning (PAHRL), which translates the concept of hybrid planning/RL to such problems with implicitly defined goals. Using a hierarchical framework, we divide the original task, formulated as a Markov Decision Process (MDP), into a hierarchy of shorter-horizon MDPs. Actor-critic agents are trained in parallel for each level of the hierarchy. During testing, a planner then determines useful subgoals on a state graph constructed at the bottom level of the hierarchy. The effectiveness of our approach is demonstrated for a set of continuous control problems in simulation, including robot arm reaching tasks and the manipulation of a deformable object.
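The subgoal-selection step, searching a graph of visited states for a route into an implicitly defined goal set, can be sketched with a plain breadth-first search. This is a simplification: PAHRL constructs its graph from learned quantities, which this toy with an explicit adjacency dict ignores:

```python
from collections import deque

def plan_waypoints(graph, start, goal_set):
    """BFS over a graph of visited states; return the waypoint sequence to
    the nearest state inside goal_set, or None if none is reachable."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        s = queue.popleft()
        if s in goal_set:
            # Walk predecessors back to the start to recover the path.
            path = []
            while s is not None:
                path.append(s)
                s = prev[s]
            return path[::-1]
        for n in graph.get(s, []):
            if n not in prev:
                prev[n] = s
                queue.append(n)
    return None
```

Because the goal is a *set* of states, the search naturally returns a route to whichever valid goal state is closest in the graph, matching the "disjoint sets of valid states" formulation above.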

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Machine learning for robot control, Motion and path planning, Reinforcement learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-295823 (URN)
10.1109/LRA.2021.3071062 (DOI)
000642765100014 ()
2-s2.0-85103879789 (Scopus ID)
Note

QC 20210602

Available from: 2021-06-02 Created: 2021-06-02 Last updated: 2024-03-18. Bibliographically approved
3. Latent Planning via Expansive Tree Search
2022 (English). In: Advances in Neural Information Processing Systems 35 - 36th Conference on Neural Information Processing Systems, NeurIPS 2022, Neural Information Processing Systems Foundation, 2022. Conference paper, Published paper (Refereed)
Abstract [en]

Planning enables autonomous agents to solve complex decision-making problems by evaluating predictions of the future. However, classical planning algorithms often become infeasible in real-world settings where state spaces are high-dimensional and transition dynamics are unknown. The idea behind latent planning is to simplify the decision-making task by mapping it to a lower-dimensional embedding space. Common latent planning strategies are based on trajectory optimization techniques such as shooting or collocation, which are prone to failure in long-horizon and highly non-convex settings. In this work, we study long-horizon goal-reaching scenarios from visual inputs and formulate latent planning as an explorative tree search. Inspired by classical sampling-based motion planning algorithms, we design a method that iteratively grows and optimizes a tree representation of visited areas of the latent space. To encourage fast exploration, the sampling of new states is biased towards sparsely represented regions within the estimated data support. Our method, called Expansive Latent Space Trees (ELAST), relies on self-supervised training via contrastive learning to obtain (a) a latent state representation and (b) a latent transition density model. We embed ELAST into a model-predictive control scheme and demonstrate significant performance improvements over existing baselines on challenging visual control tasks in simulation, including the navigation of a deformable object.
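The exploration bias toward sparsely represented regions can be sketched as follows: nodes with few neighbors inside a radius are expanded with higher probability. This is a toy 2D stand-in for the learned latent space, with illustrative names and constants rather than ELAST's actual components:

```python
import math
import random

def expansive_tree(start, transition, n_iter=200, radius=0.2, seed=0):
    """Grow a tree by preferentially expanding nodes in sparsely
    covered regions, encouraging fast coverage of the space."""
    rng = random.Random(seed)
    nodes = [start]
    for _ in range(n_iter):
        # Weight each node inversely to its local density
        # (number of nodes within `radius`, including itself).
        weights = [1.0 / sum(math.dist(a, b) < radius for b in nodes)
                   for a in nodes]
        node = rng.choices(nodes, weights=weights)[0]
        # A learned transition model would propose the successor here.
        nodes.append(transition(node, rng))
    return nodes
```

Compared with uniform node selection, this density-weighted choice pushes growth toward the frontier of the visited region, which is the property the abstract describes; in ELAST the density estimate and the transition proposals come from learned models rather than Euclidean distances.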

Place, publisher, year, edition, pages
Neural Information Processing Systems Foundation, 2022
Series
Advances in Neural Information Processing Systems, ISSN 1049-5258 ; 35
National Category
Robotics and automation; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-331664 (URN)
2-s2.0-85163176952 (Scopus ID)
Conference
36th Conference on Neural Information Processing Systems, NeurIPS 2022, New Orleans, United States of America, Nov 28 2022 - Dec 9 2022
Note

Part of ISBN 9781713871088

QC 20230712

Available from: 2023-07-13 Created: 2023-07-13 Last updated: 2025-02-05. Bibliographically approved
4. Expansive Latent Planning for Sparse Reward Offline Reinforcement Learning
2023 (English). In: Proceedings of The 7th Conference on Robot Learning, Proceedings of Machine Learning Research, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Sampling-based motion planning algorithms excel at searching for global solution paths in geometrically complex settings. However, classical approaches, such as RRT, are difficult to scale beyond low-dimensional search spaces and rely on privileged knowledge, e.g., about collision detection and underlying state distances. In this work, we take a step towards the integration of sampling-based planning into the reinforcement learning framework to solve sparse-reward control tasks from high-dimensional inputs. Our method, called VELAP, determines sequences of waypoints through sampling-based exploration in a learned state embedding. Unlike other sampling-based techniques, we iteratively expand a tree-based memory of visited latent areas, which is leveraged to explore a larger portion of the latent space for a given number of search iterations. We demonstrate state-of-the-art results in learning control from offline data in the context of vision-based manipulation under sparse reward feedback. Our method extends the set of available planning tools in model-based reinforcement learning by adding a latent planner that searches globally for feasible paths instead of being bound to a fixed prediction horizon.
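Embedding such a planner in a receding-horizon control loop, replanning at every step and executing only the move toward the first upcoming waypoint, can be sketched generically. The planner and step functions below are placeholders, not VELAP's learned components:

```python
def mpc_follow(start, goal, plan_fn, step_fn, max_steps=50):
    """Receding-horizon loop: replan a waypoint sequence each step and
    execute only the move toward the first upcoming waypoint."""
    state = start
    trajectory = [state]
    for _ in range(max_steps):
        path = plan_fn(state, goal)       # global search from current state
        if path is None or len(path) < 2:
            break                         # no plan, or already at the goal
        state = step_fn(state, path[1])   # act toward the first waypoint
        trajectory.append(state)
        if state == goal:
            break
    return trajectory
```

Replanning from the updated state each step is what lets the controller recover from imperfect execution; the contrast drawn in the abstract is that the inner search is global rather than limited to a fixed prediction horizon.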

Place, publisher, year, edition, pages
Proceedings of Machine Learning Research, 2023
National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-341581 (URN)
001221201500001 ()
2-s2.0-85184350420 (Scopus ID)
Conference
The 7th Conference on Robot Learning, Atlanta, GA, Nov 6-9, 2023
Note

QC 20231227

Available from: 2023-12-22 Created: 2023-12-22 Last updated: 2025-02-05. Bibliographically approved

Open Access in DiVA

Kappa (4021 kB), 488 downloads
File information
File name: FULLTEXT01.pdf
File size: 4021 kB
Checksum (SHA-512): 1a6573a890731d099cc1c6b63dc688404aa2cf2fbbe8ead8519fe496d3c662cc36341d0af8b4b8344561f4c3b34226a528d6e9315933cd7a561d10102a3383f4
Type: fulltext
Mimetype: application/pdf

Authority records

Gieselmann, Robert

