Bayesian optimization in variational latent spaces with dynamic compression
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-3018-2445
Facebook AI Research.
Facebook AI Research.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-2965-2953
2020 (English). In: Proceedings of Machine Learning Research, Volume 100: Proceedings of the 3rd Annual Conference on Robot Learning (CoRL), 2020, Vol. 100, p. 456-465. Conference paper, Published paper (Refereed)
Abstract [en]

Data-efficiency is crucial for autonomous robots to adapt to new tasks and environments. In this work, we focus on robotics problems with a budget of only 10-20 trials. This is a very challenging setting even for data-efficient approaches like Bayesian optimization (BO), especially when optimizing higher-dimensional controllers. Previous work extracted expert-designed low-dimensional features from simulation trajectories to construct informed kernels and run ultra sample-efficient BO on hardware. We remove the need for expert-designed features by proposing a model and architecture for a sequential variational autoencoder that embeds the space of simulated trajectories into a lower-dimensional space of latent paths in an unsupervised way. We further compress the search space for BO by reducing exploration in parts of the state space that are undesirable, without requiring explicit constraints on controller parameters. We validate our approach with hardware experiments on a Daisy hexapod robot and an ABB Yumi manipulator. We also present simulation experiments with further comparisons to several baselines on Daisy and two manipulators. Our experiments indicate the proposed trajectory-based kernel with dynamic compression can offer ultra data-efficient optimization.
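
The idea of running BO with a kernel defined on latent paths rather than on raw controller parameters can be illustrated with a minimal sketch. This is not the authors' implementation: the `embed` function below is a hypothetical stand-in (a fixed random projection) for "simulate the controller and encode the trajectory with the sequential VAE". The dimensions, the squared-exponential kernel, the UCB acquisition rule and the toy reward are all assumptions chosen for brevity. The dynamic compression step described in the abstract is not modelled.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, PATH_LEN, LATENT_DIM = 8, 5, 2   # illustrative sizes, not taken from the paper
W = rng.normal(size=(DIM, PATH_LEN * LATENT_DIM)) / np.sqrt(DIM)


def embed(x):
    """Hypothetical stand-in for: simulate controller x, then encode its
    trajectory into a latent path. Returns flattened latent paths, one per row."""
    return np.tanh(np.atleast_2d(x) @ W)


def latent_kernel(xa, xb, lengthscale=1.0):
    """Squared-exponential kernel on distances between latent paths."""
    za, zb = embed(xa), embed(xb)
    sq_dist = ((za[:, None, :] - zb[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dist / lengthscale**2)


def gp_posterior(X, y, X_query, noise=1e-3):
    """Zero-mean GP posterior mean and variance at X_query."""
    K_train = latent_kernel(X, X) + noise * np.eye(len(X))
    K_cross = latent_kernel(X, X_query)
    L = np.linalg.cholesky(K_train)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_cross.T @ alpha
    v = np.linalg.solve(L, K_cross)
    # Prior variance is 1 because latent_kernel(x, x) == 1.
    var = np.clip(1.0 - (v**2).sum(0), 1e-12, None)
    return mean, var


def run_bo(objective, n_trials=15, n_candidates=500, beta=2.0):
    """BO loop with a small trial budget; UCB maximised over random candidates."""
    X = rng.uniform(-1, 1, size=(2, DIM))            # two initial random trials
    y = np.array([objective(x) for x in X])
    while len(X) < n_trials:
        candidates = rng.uniform(-1, 1, size=(n_candidates, DIM))
        mean, var = gp_posterior(X, y - y.mean(), candidates)
        x_next = candidates[np.argmax(mean + beta * np.sqrt(var))]
        X = np.vstack([X, x_next])
        y = np.append(y, objective(x_next))
    return X, y


def toy_reward(x):
    """Toy objective standing in for a hardware trial (an assumption)."""
    return -float(np.sum((x - 0.3) ** 2))


X, y = run_bo(toy_reward)
print("best reward found:", y.max())
```

In this sketch the only place the latent representation enters is `latent_kernel`, so swapping the random projection for a trained encoder would leave the BO loop unchanged.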

Place, publisher, year, edition, pages
2020. Vol. 100, p. 456-465
Keywords [en]
Bayesian Optimization, Data-efficient Reinforcement Learning, Variational Inference
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-278759
OAI: oai:DiVA.org:kth-278759
DiVA, id: diva2:1455577
Conference
Conference on Robot Learning (CoRL)
Funder
Knut and Alice Wallenberg Foundation
Note

QC 20210120

Available from: 2020-07-27 Created: 2020-07-27 Last updated: 2025-02-09. Bibliographically approved
In thesis
1. Transfer-Aware Kernels, Priors and Latent Spaces from Simulation to Real Robots
2020 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Consider challenging sim-to-real cases lacking high-fidelity simulators and allowing only 10-20 hardware trials. This work shows that even imprecise simulation can be beneficial if used to build transfer-aware representations.

First, the thesis introduces an informed kernel that embeds the space of simulated trajectories into a lower-dimensional space of latent paths. It uses a sequential variational autoencoder (sVAE) to handle large-scale training from simulated data. Its modular design enables quick adaptation when used for Bayesian optimization (BO) on hardware. The thesis and the included publications demonstrate that this approach works for different areas of robotics: locomotion and manipulation. Furthermore, a variant of BO that ensures recovery from negative transfer when using corrupted kernels is introduced. An application to task-oriented grasping validates its performance on hardware.

For the case of parametric learning, simulators can serve as priors or regularizers. This work describes how to use simulation to regularize a VAE's decoder so as to bind the VAE's latent space to the posterior over simulator parameters. With that, training on a small number of real trajectories can quickly shift the posterior to reflect reality. The included publication demonstrates that this approach can also help reinforcement learning (RL) quickly overcome the sim-to-real gap on a manipulation task on hardware.

A longer-term vision is to shape latent spaces without needing to mandate a particular simulation scenario. A first step is to learn general relations that hold on sequences of states from a set of related domains. This work introduces a unifying mathematical formulation for learning independent analytic relations. Relations are learned from source domains, then used to help structure the latent space when learning on target domains. This formulation enables a more general, flexible and principled way of shaping the latent space. It formalizes the notion of learning independent relations, without imposing restrictive simplifying assumptions or requiring domain-specific information. This work presents mathematical properties, concrete algorithms and experimental validation of successful learning and transfer of latent relations.

Abstract [sv]

Betänk komplicerade fall av simulering-till-verklighet där det saknas simulatorer med hög precision och endast 10-20 hårdvaruförsök tillåts. Detta arbete visar att även oprecis simulering kan vara till nytta i dessa fall, om det används för att skapa överföringsbara representationer.

Avhandlingen introducerar först en informerad kärna som bäddar in rummet av simulerade trajektorier i ett lågdimensionellt rum med latenta banor. Denna använder en så kallad sekventiell variational autoencoder (sVAE) för att hantera storskalig träning utifrån simulerade data. Dess modulära design medför snabb anpassning till den nya domänen då den används för Bayesiansk optimering (BO) på verklig hårdvara. Avhandlingen och de inkluderade publikationerna visar att denna metod fungerar för flera olika områden inom robotik: rörelse och manipulation av objekt. Dessutom introduceras en variant av BO som garanterar återhämtning från negativ överföring om korrupta kärnor används. En tillämpning inom uppgiftsanpassade handgrepp bekräftar metodens prestanda på hårdvara.

När det gäller parametrisk inlärning, kan simulatorer tjäna som apriorifördelningar eller regulariserare. Detta arbete beskriver hur man kan använda simulering för att regularisera en VAEs avkodare för att koppla ihop det latenta VAE rummet till simuleringsparametrarnas aposteriorifördelning. I och med detta kan träning på ett litet antal verkliga banor snabbt anpassa aposteriorifördelningen till att återspegla verkligheten. Den inkluderade publikationen demonstrerar att detta tillvägagångssätt också kan hjälpa så kallad förstärkningsinlärning (RL) att snabbt överbrygga gapet mellan simulering och verklighet för en manipulationsuppgift på hårdvara.

En långsiktig vision är att skapa latenta rum utan att behöva förutsätta ett specifikt simuleringsscenario. Ett första steg är att lära in generella relationer som håller för sekvenser av tillstånd i en mängd angränsande domäner. Detta arbete introducerar en enhetlig matematisk formulering för att lära in oberoende analytiska relationer. Relationerna lärs in från källdomäner och används sedan för att strukturera det latenta rummet under inlärning i måldomänen. Denna formulering medger ett mer generellt, flexibelt och principiellt sätt att skapa det latenta rummet. Det formaliserar idén om inlärning av oberoende relationer utan att påtvinga begränsande antaganden eller krav på domänspecifik information. Detta arbete presenterar matematiska egenskaper, konkreta algoritmer och experimentell utvärdering av framgångsrik träning och överföring av latenta relationer.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2020. p. 198
Series
TRITA-EECS-AVL ; 54
National Category
Robotics and automation; Computer graphics and computer vision
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-284138 (URN)
Public defence
2020-11-20, F3, Lindstedtsvägen 26, Stockholm, 14:00 (English)
Note

QC 20201019

Available from: 2020-10-19 Created: 2020-10-15 Last updated: 2025-02-05. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

PMLR

Authority records

Antonova, Rika; Kragic, Danica
