Generative models for action generation and action understanding
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5344-8042
2019 (English). Doctoral thesis with articles (Other academic)
Alternative title: Generativa modeller för generering och förståelse av mänsklig aktivitet (Swedish)
Abstract [en]

The question of how to build intelligent machines raises the question of how to represent the world to enable intelligent behavior. In nature, this representation relies on the interplay between an organism’s sensory input and motor input. Action-perception loops allow many complex behaviors to arise naturally. In this work, we take these sensorimotor contingencies as an inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can learn motor control in an unsupervised fashion and relate their own sensorimotor experience to observed human actions. By combining action generation and action understanding we hope to facilitate smooth and intuitive interaction between robots and humans in shared work spaces.

To model robot sensorimotor contingencies and human behavior we employ generative models. Since generative models represent a joint distribution over relevant variables, they are flexible enough to cover the range of tasks that we are tackling here. Generative models can represent variables that originate from multiple modalities, model temporal dynamics, incorporate latent variables and represent uncertainty over any variable - all of which are features required to model sensorimotor contingencies. By using generative models, we can predict the temporal development of the variables in the future, which is important for intelligent action selection.

We present two lines of work. Firstly, we will focus on unsupervised learning of motor control with help of sensorimotor contingencies. Based on Gaussian Process forward models we demonstrate how the robot can execute goal-directed actions with the help of planning techniques or reinforcement learning. Secondly, we present a number of approaches to model human activity, ranging from pure unsupervised motion prediction to including semantic action and affordance labels. Here we employ deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two lines of work are then combined to implement physical human-robot interaction tasks.

Our experiments focus on real-time applications, both when it comes to robot experiments and human activity modeling. Since many real-world scenarios do not have access to high-end sensors, we require our models to cope with uncertainty. Additional requirements are data-efficient learning, because of the wear and tear of the robot and human involvement, online employability and operation under safety and compliance constraints. We demonstrate how generative models of sensorimotor contingencies can handle these requirements in our experiments satisfyingly.

Abstract [sv] (translated to English)

The question of how to build intelligent machines raises the question of how to represent the world to enable intelligent behavior. In nature, such a representation builds on the interplay between an organism's sensory impressions and its actions. Couplings between sensory impressions and actions allow many complex behaviors to arise naturally. In this work, we take these sensorimotor couplings as an inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can independently learn to control their movements and relate their own sensorimotor experiences to observed human actions. By relating the robot's movements to the understanding of human actions, we hope to facilitate smooth and intuitive interaction between robots and humans.

To model the robot's sensorimotor couplings and human behavior we use generative models. Since generative models represent a multivariate distribution over relevant variables, they are flexible enough to meet the requirements we set here. Generative models can represent variables from different modalities, model temporal dynamical systems, model latent variables and represent the variance of variables - all of these properties are necessary to model sensorimotor couplings. By using generative models, we can predict the development of the variables in the future, which is important for making intelligent decisions.

We present work that goes in two directions. First, we focus on independent learning of motion control with the help of sensorimotor couplings. Based on Gaussian Process forward models, we show how the robot can move towards a goal with the help of planning techniques or reinforcement learning. Second, we present a number of approaches to modeling human activity, ranging from predicting how the human will move to including semantic information. Here we use deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two ideas are then combined to help the robot interact with the human.

Our experiments focus on real-time scenarios, both for robot experiments and for human activity modeling. Since many real scenarios do not have access to advanced sensors, we require our models to handle uncertainty. Further requirements are machine learning models that do not need much data, and systems that work in real time and under safety requirements. We show how generative models of sensorimotor couplings can handle these requirements satisfactorily in our experiments.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019, p. 41
Series
TRITA-EECS-AVL ; 2019:60
National subject category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-256002
ISBN: 978-91-7873-246-3 (print)
OAI: oai:DiVA.org:kth-256002
DiVA, id: diva2:1342963
Public defence
2019-09-12, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Opponent
Supervisor
Funder
EU, Horizon 2020, socsmcs
Note

QC 20190816

Available from: 2019-08-16 Created: 2019-08-15 Last updated: 2025-02-09. Bibliographically approved
List of papers
1. Self-learning and adaptation in a sensorimotor framework
2016 (English). In: Proceedings - IEEE International Conference on Robotics and Automation, IEEE conference proceedings, 2016, p. 551-558. Conference paper, Published paper (Refereed)
Abstract [en]

We present a general framework to autonomously achieve the task of finding a sequence of actions that result in a desired state. Autonomy is acquired by learning sensorimotor patterns of a robot, while it is interacting with its environment. Gaussian processes (GP) with automatic relevance determination are used to learn the sensorimotor mapping. In this way, relevant sensory and motor components can be systematically found in high-dimensional sensory and motor spaces. We propose an incremental GP learning strategy, which discerns between situations, when an update or an adaptation must be implemented. The Rapidly exploring Random Tree (RRT∗) algorithm is exploited to enable long-term planning and generating a sequence of states that lead to a given goal; while a gradient-based search finds the optimum action to steer to a neighbouring state in a single time step. Our experimental results prove the suitability of the proposed framework to learn a joint space controller with high data dimensions (10×15). It demonstrates short training phase (less than 12 seconds), real-time performance and rapid adaptations capabilities.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2016
National subject category
Identifiers
urn:nbn:se:kth:diva-197241 (URN)
10.1109/ICRA.2016.7487178 (DOI)
000389516200069 ()
2-s2.0-84977498692 (Scopus ID)
9781467380263 (ISBN)
Conference
2016 IEEE International Conference on Robotics and Automation, ICRA 2016, 16 May 2016 through 21 May 2016
Note

QC 20161207

Available from: 2016-12-07 Created: 2016-11-30 Last updated: 2025-02-09. Bibliographically approved
2. A sensorimotor reinforcement learning framework for physical human-robot interaction
2016 (English). In: IEEE International Conference on Intelligent Robots and Systems, IEEE, 2016, p. 2682-2688. Conference paper, Published paper (Refereed)
Abstract [en]

Modeling of physical human-robot collaborations is generally a challenging problem due to the unpredictive nature of human behavior. To address this issue, we present a data-efficient reinforcement learning framework which enables a robot to learn how to collaborate with a human partner. The robot learns the task from its own sensorimotor experiences in an unsupervised manner. The uncertainty in the interaction is modeled using Gaussian processes (GP) to implement a forward model and an action-value function. Optimal action selection given the uncertain GP model is ensured by Bayesian optimization. We apply the framework to a scenario in which a human and a PR2 robot jointly control the ball position on a plank based on vision and force/torque data. Our experimental results show the suitability of the proposed method in terms of fast and data-efficient model learning, optimal action selection under uncertainty and equal role sharing between the partners.

Place, publisher, year, edition, pages
IEEE, 2016
Keywords
Behavioral research, Intelligent robots, Reinforcement learning, Robots, Bayesian optimization, Forward modeling, Gaussian process, Human behaviors, Human-robot collaboration, Model learning, Optimal actions, Physical human-robot interactions, Human robot interaction
National subject category
Identifiers
urn:nbn:se:kth:diva-202121 (URN)
10.1109/IROS.2016.7759417 (DOI)
000391921702127 ()
2-s2.0-85006367922 (Scopus ID)
9781509037629 (ISBN)
Conference
2016 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2016, 9 October 2016 through 14 October 2016
Note

QC 20170228

Available from: 2017-02-28 Created: 2017-02-28 Last updated: 2025-02-09. Bibliographically approved
3. Deep representation learning for human motion prediction and classification
2017 (English). In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), IEEE, 2017, p. 1591-1599. Conference paper, Published paper (Refereed)
Abstract [en]

Generative models of 3D human motion are often restricted to a small number of activities and can therefore not generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen, motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these use action specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National subject category
Identifiers
urn:nbn:se:kth:diva-221047 (URN)
10.1109/CVPR.2017.173 (DOI)
000418371401068 ()
2-s2.0-85028058735 (Scopus ID)
Conference
30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), July 21-26, 2017, Honolulu, HI
Funder
Swedish Foundation for Strategic Research
Note

QC 20180111

Part of ISBN 978-1-5386-0457-1

Available from: 2018-01-11 Created: 2018-01-11 Last updated: 2025-02-07. Bibliographically approved
4. Anticipating many futures: Online human motion prediction and generation for human-robot interaction
2018 (English). In: 2018 IEEE International Conference on Robotics and Automation (ICRA), IEEE COMPUTER SOC, 2018, p. 4563-4570. Conference paper, Published paper (Refereed)
Abstract [en]

Fluent and safe interactions of humans and robots require both partners to anticipate the others' actions. The bottleneck of most methods is the lack of an accurate model of natural human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motion patterns. Finally, we investigate how movements and kinematic cues are represented on the learned low dimensional manifold.

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2018
Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
National subject category
Identifiers
urn:nbn:se:kth:diva-237164 (URN)
10.1109/ICRA.2018.8460651 (DOI)
000446394503071 ()
2-s2.0-85063143206 (Scopus ID)
978-1-5386-3081-5 (ISBN)
Conference
IEEE International Conference on Robotics and Automation (ICRA), May 21-25, 2018, Brisbane, Australia
Funder
Swedish Foundation for Strategic Research
Note

QC 20181024

Available from: 2018-10-24 Created: 2018-10-24 Last updated: 2025-02-07. Bibliographically approved
5. A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
(English). Manuscript (preprint) (Other academic)
Abstract [en]

Human behavior is a continuous stochastic spatio-temporal process which is governed by semantic actions and affordances as well as latent factors. Therefore, video-based human activity modeling is concerned with a number of tasks such as inferring current and future semantic labels, predicting future continuous observations as well as imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent both discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction and anticipation as well as motion prediction and synthesis based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model the dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches, which are specifically designed for e.g. action classification, we find that our probabilistic formulation outperforms or is comparable to these task specific models.

National subject category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-256000 (URN)
Funder
EU, Horizon 2020
Note

QC 20190816

Available from: 2019-08-15 Created: 2019-08-15 Last updated: 2025-02-09. Bibliographically approved

Open Access in DiVA

fulltext (617 kB), 1188 downloads
File information
File: FULLTEXT01.pdf
File size: 617 kB
Checksum (SHA-512):
efd065b006bca53b7739016890c1ebe62bec59d48feb192bc5ec0dd2234afa905503436f792dfb65c8c05458abf6e36cd772eba74359c17efa97dd75de6988c7
Type: fulltext
Mimetype: application/pdf

Person

Bütepage, Judith

Total: 1190 downloads
The number of downloads is the sum of all downloads of all full texts. It may include, for example, earlier versions that are no longer available.

Total: 1545 hits