Generative models for action generation and action understanding
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
ORCID iD: 0000-0001-5344-8042
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Alternative title
Generativa modeller för generering och förståelse av mänsklig aktivitet (Swedish)
Abstract [en]

The question of how to build intelligent machines raises the question of how to represent the world to enable intelligent behavior. In nature, this representation relies on the interplay between an organism’s sensory input and motor input. Action-perception loops allow many complex behaviors to arise naturally. In this work, we take these sensorimotor contingencies as an inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can learn motor control in an unsupervised fashion and relate their own sensorimotor experience to observed human actions. By combining action generation and action understanding we hope to facilitate smooth and intuitive interaction between robots and humans in shared work spaces.

To model robot sensorimotor contingencies and human behavior we employ generative models. Since generative models represent a joint distribution over relevant variables, they are flexible enough to cover the range of tasks that we are tackling here. Generative models can represent variables that originate from multiple modalities, model temporal dynamics, incorporate latent variables and represent uncertainty over any variable - all of which are features required to model sensorimotor contingencies. By using generative models, we can predict the temporal development of the variables in the future, which is important for intelligent action selection.

We present two lines of work. Firstly, we will focus on unsupervised learning of motor control with the help of sensorimotor contingencies. Based on Gaussian Process forward models we demonstrate how the robot can execute goal-directed actions with the help of planning techniques or reinforcement learning. Secondly, we present a number of approaches to model human activity, ranging from pure unsupervised motion prediction to including semantic action and affordance labels. Here we employ deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two lines of work are then combined to implement physical human-robot interaction tasks.

Our experiments focus on real-time applications, both when it comes to robot experiments and human activity modeling. Since many real-world scenarios do not have access to high-end sensors, we require our models to cope with uncertainty. Additional requirements are data-efficient learning, because of the wear and tear of the robot and human involvement, online employability and operation under safety and compliance constraints. We demonstrate how generative models of sensorimotor contingencies can handle these requirements satisfactorily in our experiments.

Abstract [sv] (translated to English)

The question of how to build intelligent machines raises the question of how to represent the world in order to enable intelligent behavior. In nature, such a representation builds on the interplay between an organism's sensory impressions and its actions. Couplings between sensory impressions and actions allow many complex behaviors to arise naturally. In this work, we take these sensorimotor couplings as an inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can independently learn to control their movements and relate their own sensorimotor experiences to observed human actions. By relating the robot's movements and the understanding of human actions, we hope to facilitate smooth and intuitive interaction between robots and humans.

To model the robot's sensorimotor couplings and human behavior we use generative models. Since generative models represent a multivariate distribution over relevant variables, they are flexible enough to meet the requirements we pose here. Generative models can represent variables from different modalities, model temporal dynamical systems, model latent variables and represent the variance of variables - all of these properties are necessary for modeling sensorimotor couplings. By using generative models we can predict the development of the variables in the future, which is important for making intelligent decisions.

We present work that goes in two directions. First, we focus on independent learning of motor control with the help of sensorimotor couplings. Based on Gaussian Process forward models we show how the robot can move toward a goal with the help of planning techniques or reinforcement learning. Second, we present a number of approaches to modeling human activity, ranging from predicting how the human will move to including semantic information. Here we use deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two ideas are then combined to help the robot interact with the human.

Our experiments focus on real-time scenarios, both for robot experiments and for human activity modeling. Since many real-world scenarios do not have access to advanced sensors, we require our models to handle uncertainty. Additional requirements are machine learning models that do not need much data, and systems that work in real time and under safety requirements. We show how generative models of sensorimotor couplings can handle these requirements satisfactorily in our experiments.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019, p. 41
Series
TRITA-EECS-AVL ; 2019:60
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-256002
ISBN: 978-91-7873-246-3 (print)
OAI: oai:DiVA.org:kth-256002
DiVA, id: diva2:1342963
Public defence
2019-09-12, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Opponent
Supervisors
Funder
EU, Horizon 2020, socsmcs
Note

QC 20190816

Available from: 2019-08-16 Created: 2019-08-15 Last updated: 2025-02-09
Bibliographically approved
List of papers
1. Self-learning and adaptation in a sensorimotor framework
2016 (English). In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2016, p. 551-558.
Conference paper, Published paper (Refereed)
Abstract [en]

We present a general framework to autonomously achieve the task of finding a sequence of actions that result in a desired state. Autonomy is acquired by learning sensorimotor patterns of a robot while it is interacting with its environment. Gaussian processes (GP) with automatic relevance determination are used to learn the sensorimotor mapping. In this way, relevant sensory and motor components can be systematically found in high-dimensional sensory and motor spaces. We propose an incremental GP learning strategy, which discerns between situations in which an update or an adaptation must be implemented. The Rapidly exploring Random Tree (RRT∗) algorithm is exploited to enable long-term planning and to generate a sequence of states that lead to a given goal, while a gradient-based search finds the optimum action to steer to a neighbouring state in a single time step. Our experimental results prove the suitability of the proposed framework to learn a joint space controller with high data dimensions (10×15). It demonstrates a short training phase (less than 12 seconds), real-time performance and rapid adaptation capabilities.
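
As an illustration of the idea behind a GP forward model with automatic relevance determination (ARD), the following minimal sketch fits a GP to a toy mapping from (state, action, distractor) to next state. It is not the paper's implementation: the toy data, the function names and the hand-set length scales are all assumptions. In practice the length scales would be learned, and a large learned length scale marks an input dimension as irrelevant.

```python
import numpy as np

def ard_kernel(A, B, lengthscales, signal_var=1.0):
    """Squared-exponential kernel with one length scale per input dimension (ARD)."""
    A_scaled = A / lengthscales
    B_scaled = B / lengthscales
    sq_dist = (
        np.sum(A_scaled**2, axis=1)[:, None]
        + np.sum(B_scaled**2, axis=1)[None, :]
        - 2.0 * A_scaled @ B_scaled.T
    )
    return signal_var * np.exp(-0.5 * np.maximum(sq_dist, 0.0))

def gp_fit(X, y, lengthscales, noise_var=1e-4):
    """Precompute the Cholesky factor and weights for GP regression."""
    K = ard_kernel(X, X, lengthscales) + noise_var * np.eye(len(X))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return L, alpha

def gp_predict(X_train, L, alpha, X_new, lengthscales):
    """Return the predictive mean and variance at the new inputs."""
    K_s = ard_kernel(X_train, X_new, lengthscales)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = ard_kernel(X_new, X_new, lengthscales).diagonal() - np.sum(v**2, axis=0)
    return mean, np.maximum(var, 0.0)

# Toy forward model (assumed): the next state depends on the state and the
# action, but not on the third, irrelevant input.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(40, 3))    # columns: state, action, distractor
y = X[:, 0] + 0.5 * X[:, 1]                 # next state

# Hand-set here for illustration; the large value switches off the distractor.
lengthscales = np.array([1.0, 1.0, 100.0])

L_chol, alpha = gp_fit(X, y, lengthscales)
query = np.array([[0.2, 0.4, 0.9]])
mean, var = gp_predict(X, L_chol, alpha, query, lengthscales)
print("predicted next state:", mean[0], "variance:", var[0])
```

The predictive variance is what makes such a model useful for planning: a planner can avoid or deliberately explore regions where the forward model is uncertain.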

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-197241 (URN)
10.1109/ICRA.2016.7487178 (DOI)
000389516200069 ()
2-s2.0-84977498692 (Scopus ID)
9781467380263 (ISBN)
Conference
2016 IEEE International Conference on Robotics and Automation, ICRA 2016, 16 May 2016 through 21 May 2016
Note

QC 20161207

Available from: 2016-12-07 Created: 2016-11-30 Last updated: 2025-02-09
Bibliographically approved
2. A sensorimotor reinforcement learning framework for physical human-robot interaction
2016 (English). In: IEEE International Conference on Intelligent Robots and Systems, IEEE, 2016, p. 2682-2688.
Conference paper, Published paper (Refereed)
Abstract [en]

Modeling of physical human-robot collaborations is generally a challenging problem due to the unpredictable nature of human behavior. To address this issue, we present a data-efficient reinforcement learning framework which enables a robot to learn how to collaborate with a human partner. The robot learns the task from its own sensorimotor experiences in an unsupervised manner. The uncertainty in the interaction is modeled using Gaussian processes (GP) to implement a forward model and an action-value function. Optimal action selection given the uncertain GP model is ensured by Bayesian optimization. We apply the framework to a scenario in which a human and a PR2 robot jointly control the ball position on a plank based on vision and force/torque data. Our experimental results show the suitability of the proposed method in terms of fast and data-efficient model learning, optimal action selection under uncertainty and equal role sharing between the partners.
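
To make the notion of action selection under an uncertain GP model concrete, here is a minimal sketch of one Bayesian-optimization-style step over a one-dimensional action. The abstract does not state which acquisition function the paper uses; the upper confidence bound (UCB) rule, the toy value data, and the names `rbf` and `select_action` are assumptions chosen for illustration.

```python
import numpy as np

def rbf(a, b, lengthscale=0.3):
    """Squared-exponential kernel between two 1D arrays of actions."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def select_action(actions_tried, values_seen, candidates, kappa=2.0, noise_var=1e-3):
    """Pick the candidate maximising mean + kappa * std of a GP value estimate."""
    K = rbf(actions_tried, actions_tried) + noise_var * np.eye(len(actions_tried))
    K_s = rbf(actions_tried, candidates)
    mean = K_s.T @ np.linalg.solve(K, values_seen)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    std = np.sqrt(np.maximum(var, 0.0))
    ucb = mean + kappa * std          # trades off exploitation and exploration
    return candidates[np.argmax(ucb)], mean, std

# Toy data (assumed): values observed for five previously tried actions.
actions_tried = np.array([-1.0, -0.5, 0.0, 0.5, 1.0])
values_seen = -(actions_tried - 0.3) ** 2
candidates = np.linspace(-1.0, 1.0, 201)

best_action, mean, std = select_action(actions_tried, values_seen, candidates)
print("next action to try:", best_action)
```

A larger `kappa` favours actions whose value is still uncertain, while `kappa = 0` reduces the rule to greedy selection on the GP mean.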

Place, publisher, year, edition, pages
IEEE, 2016
Keywords
Behavioral research, Intelligent robots, Reinforcement learning, Robots, Bayesian optimization, Forward modeling, Gaussian process, Human behaviors, Human-robot collaboration, Model learning, Optimal actions, Physical human-robot interactions, Human robot interaction
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-202121 (URN)
10.1109/IROS.2016.7759417 (DOI)
000391921702127 ()
2-s2.0-85006367922 (Scopus ID)
9781509037629 (ISBN)
Conference
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016, 9 October 2016 through 14 October 2016
Note

QC 20170228

Available from: 2017-02-28 Created: 2017-02-28 Last updated: 2025-02-09
Bibliographically approved
3. Deep representation learning for human motion prediction and classification
2017 (English). In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), IEEE, 2017, p. 1591-1599.
Conference paper, Published paper (Refereed)
Abstract [en]

Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-221047 (URN)
10.1109/CVPR.2017.173 (DOI)
000418371401068 ()
2-s2.0-85028058735 (Scopus ID)
Conference
30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI
Funder
Swedish Foundation for Strategic Research
Note

QC 20180111

Part of ISBN 978-1-5386-0457-1

Available from: 2018-01-11 Created: 2018-01-11 Last updated: 2025-02-07
Bibliographically approved
4. Anticipating many futures: Online human motion prediction and generation for human-robot interaction
2018 (English). In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE Computer Society, 2018, p. 4563-4570.
Conference paper, Published paper (Refereed)
Abstract [en]

Fluent and safe interactions of humans and robots require both partners to anticipate the others' actions. The bottleneck of most methods is the lack of an accurate model of natural human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motion patterns. Finally, we investigate how movements and kinematic cues are represented on the learned low dimensional manifold.
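
For orientation, the training objective usually associated with a conditional variational autoencoder can be written as an evidence lower bound. The notation below is an assumption and is not taken from the paper: $x$ stands for the window of future motion, $c$ for the window of past frames, and $z$ for the latent variable. The paper's exact loss may differ.

```latex
\log p_\theta(x \mid c) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x, c)}\!\left[ \log p_\theta(x \mid z, c) \right]
\;-\; \mathrm{KL}\!\left( q_\phi(z \mid x, c) \,\|\, p_\theta(z \mid c) \right)
```

Under this formulation, drawing several samples of $z$ from the prior $p_\theta(z \mid c)$ and decoding each one yields several plausible future motion windows for the same observed past.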

Place, publisher, year, edition, pages
IEEE Computer Society, 2018
Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-237164 (URN)
10.1109/ICRA.2018.8460651 (DOI)
000446394503071 ()
2-s2.0-85063143206 (Scopus ID)
978-1-5386-3081-5 (ISBN)
Conference
IEEE International Conference on Robotics and Automation (ICRA), May 21-25, 2018, Brisbane, Australia
Funder
Swedish Foundation for Strategic Research
Note

QC 20181024

Available from: 2018-10-24 Created: 2018-10-24 Last updated: 2025-02-07
Bibliographically approved
5. A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Human behavior is a continuous stochastic spatio-temporal process which is governed by semantic actions and affordances as well as latent factors. Therefore, video-based human activity modeling is concerned with a number of tasks such as inferring current and future semantic labels, predicting future continuous observations as well as imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent both discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction and anticipation as well as motion prediction and synthesis based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model the dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches, which are specifically designed for e.g. action classification, we find that our probabilistic formulation outperforms or is comparable to these task specific models.

National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-256000 (URN)
Funder
EU, Horizon 2020
Note

QC 20190816

Available from: 2019-08-15 Created: 2019-08-15 Last updated: 2025-02-09
Bibliographically approved

Open Access in DiVA

fulltext (617 kB), 1106 downloads
File information
File name: FULLTEXT01.pdf
File size: 617 kB
Checksum (SHA-512): efd065b006bca53b7739016890c1ebe62bec59d48feb192bc5ec0dd2234afa905503436f792dfb65c8c05458abf6e36cd772eba74359c17efa97dd75de6988c7
Type: fulltext
Mimetype: application/pdf

Authority records

Bütepage, Judith

Total: 1107 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 1378 hits