Expansive Latent Planning for Sparse Reward Offline Reinforcement Learning
Gieselmann, Robert. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-1772-7930
Pokorny, Florian T. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-1114-6040
2023 (English). In: Proceedings of The 7th Conference on Robot Learning, Proceedings of Machine Learning Research, 2023. Conference paper, published paper (refereed).
Abstract [en]

Sampling-based motion planning algorithms excel at searching global solution paths in geometrically complex settings. However, classical approaches, such as RRT, are difficult to scale beyond low-dimensional search spaces and rely on privileged knowledge, e.g., about collision detection and underlying state distances. In this work, we take a step towards the integration of sampling-based planning into the reinforcement learning framework to solve sparse-reward control tasks from high-dimensional inputs. Our method, called VELAP, determines sequences of waypoints through sampling-based exploration in a learned state embedding. Unlike other sampling-based techniques, we iteratively expand a tree-based memory of visited latent areas, which is leveraged to explore a larger portion of the latent space for a given number of search iterations. We demonstrate state-of-the-art results in learning control from offline data in the context of vision-based manipulation under sparse reward feedback. Our method extends the set of available planning tools in model-based reinforcement learning by adding a latent planner that searches globally for feasible paths instead of being bound to a fixed prediction horizon.
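To make the general idea concrete, the following is a minimal RRT-style sketch of growing a tree-based memory in a learned latent space and reading a waypoint sequence back from it. This is not the authors' VELAP implementation: the function names, the `encode`/`sample_latent`/`step_fn` interfaces, and the Euclidean nearest-neighbour rule are all illustrative assumptions; the paper's learned embedding, distance notion, and expansion strategy may differ.

```python
import numpy as np

def expand_latent_tree(z_start, sample_latent, step_fn, n_iters=100):
    """Grow a tree of visited latent states, RRT-style.

    z_start: initial latent state (np.ndarray), e.g. the encoding of the
             current observation.
    sample_latent: callable returning a random latent-space sample.
    step_fn: callable (z_near, z_target) -> new latent state reached by
             stepping from z_near toward z_target.
    """
    nodes = [z_start]   # tree-based memory of visited latent areas
    parents = [-1]      # parent index per node; -1 marks the root
    for _ in range(n_iters):
        z_rand = sample_latent()                            # sample a latent target
        dists = [np.linalg.norm(z - z_rand) for z in nodes]
        i_near = int(np.argmin(dists))                      # nearest visited node
        z_new = step_fn(nodes[i_near], z_rand)              # expand toward the sample
        nodes.append(z_new)
        parents.append(i_near)
    return nodes, parents

def backtrack(nodes, parents, i_goal):
    """Recover the waypoint sequence from the root to a chosen node."""
    path, i = [], i_goal
    while i != -1:
        path.append(nodes[i])
        i = parents[i]
    return path[::-1]
```

Because the tree lives in a learned embedding rather than configuration space, there is no explicit collision check here; in a learned setting, feasibility of each expansion step would instead come from a learned dynamics or reachability model.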

Place, publisher, year, edition, pages
Proceedings of Machine Learning Research, 2023.
National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
URN: urn:nbn:se:kth:diva-341581
ISI: 001221201500001
Scopus ID: 2-s2.0-85184350420
OAI: oai:DiVA.org:kth-341581
DiVA id: diva2:1822390
Conference
The 7th Conference on Robot Learning, Atlanta, GA, Nov 6-9, 2023
Note

QC 20231227

Available from: 2023-12-22. Created: 2023-12-22. Last updated: 2025-02-05. Bibliographically approved.
In thesis
1. Synergies between Policy Learning and Sampling-based Planning
2024 (English). Doctoral thesis, comprehensive summary (Other academic).
Alternative title [sv]
Synergier mellan policyinlärning och sampling-baserad planering
Abstract [en]

Recent advances in artificial intelligence and machine learning have significantly impacted the field of robotics and led to the interdisciplinary study of robot learning. These developments have the potential to revolutionize the automation of tasks in various industries by reducing the reliance on human workers. However, fully autonomous, learning-based robotic systems are still mainly limited to controlled environments. Ideally, we are looking for methods that enable the autonomous acquisition of robotic skills in any temporally extended setting with potentially complex sensor observations. Classical sampling-based planning algorithms used in robot motion planning compute feasible paths between robot states over long time horizons, even in geometrically complex environments. This thesis investigates the possibility of combining learning-based methods with these classical approaches to solve challenging problems in robot manipulation, e.g., the manipulation of deformable objects. The core idea is to leverage the best of both worlds and achieve long-horizon control through planning, while using learning to obtain useful environment models from potentially high-dimensional and complex observation data. The presented frameworks rely on recent machine learning techniques such as contrastive representation learning, generative modeling, and reinforcement learning. Finally, we outline the potential, challenges, and limitations of this type of approach and highlight future directions.


Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2024. p. ix, 54
Series
TRITA-EECS-AVL ; 2024:6
Keywords
Machine Learning, Robotics, Reinforcement Learning, Motion Planning, Robotic Manipulation
National Category
Computer graphics and computer vision
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-341911
ISBN: 978-91-8040-803-5
Public defence
2024-01-30, https://kth-se.zoom.us/j/63888939859, F3 (Flodis), Lindstedtsvägen 26 & 28, Stockholm, 15:00 (English)
Note

QC 20240108

Available from: 2024-01-08. Created: 2024-01-05. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Scopus; Paper

Authority records

Gieselmann, Robert; Pokorny, Florian T.
