Bütepage, Judith
Publications (5 of 5)
Zhang, C., Bütepage, J., Kjellström, H. & Mandt, S. (2019). Advances in Variational Inference. IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(8), 2008-2026
Advances in Variational Inference
2019 (English) In: IEEE Transactions on Pattern Analysis and Machine Intelligence, ISSN 0162-8828, E-ISSN 1939-3539, Vol. 41, no 8, p. 2008-2026. Article in journal (Refereed) Published
Abstract [en]

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully applied to various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions.
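
To make the reparameterization-gradient machinery mentioned in the keywords concrete, here is a minimal, self-contained sketch (not taken from the paper) that fits a Gaussian variational distribution to a toy conjugate model, where the exact posterior is available for comparison. The toy model, learning rate, and all variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: z ~ N(0, prior_var), x_i | z ~ N(z, lik_var).
prior_var, lik_var = 4.0, 1.0
x = rng.normal(1.5, np.sqrt(lik_var), size=20)   # synthetic observations

def grad_log_joint(z):
    """d/dz [log p(z) + sum_i log p(x_i | z)] for the toy model (vectorized over z)."""
    return -z / prior_var + (x.sum() - x.size * z) / lik_var

# Variational family q(z) = N(mu, sigma^2), parameterized by (mu, log_sigma).
mu, log_sigma = 0.0, 0.0
lr, n_samples = 0.01, 32

for step in range(2000):
    sigma = np.exp(log_sigma)
    eps = rng.standard_normal(n_samples)
    z = mu + sigma * eps                      # reparameterized samples
    g = grad_log_joint(z)
    # ELBO = E_eps[log p(x, mu + sigma*eps)] + entropy(q); entropy = log_sigma + const.
    grad_mu = np.mean(g)
    grad_log_sigma = np.mean(g * eps * sigma) + 1.0
    mu += lr * grad_mu
    log_sigma += lr * grad_log_sigma

# Exact posterior for comparison (conjugate Gaussian case).
post_var = 1.0 / (1.0 / prior_var + x.size / lik_var)
post_mean = post_var * x.sum() / lik_var
print(f"VI:    mean={mu:.3f}  std={np.exp(log_sigma):.3f}")
print(f"Exact: mean={post_mean:.3f}  std={np.sqrt(post_var):.3f}")
```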

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2019
Keywords
Variational inference, approximate Bayesian inference, reparameterization gradients, structured variational approximations, scalable inference, inference networks
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-255405 (URN), 10.1109/TPAMI.2018.2889774 (DOI), 000473598800016 (), 30596568 (PubMedID), 2-s2.0-85059288228 (Scopus ID)
Note

QC 20190814

Available from: 2019-08-14 Created: 2019-08-14 Last updated: 2019-08-14. Bibliographically approved
Bütepage, J. (2019). Generative models for action generation and action understanding. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Generative models for action generation and action understanding
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Generativa modeller för generering och förståelse av mänsklig aktivitet
Abstract [en]

The question of how to build intelligent machines raises the question of how to represent the world to enable intelligent behavior. In nature, this representation relies on the interplay between an organism's sensory input and motor input. Action-perception loops allow many complex behaviors to arise naturally. In this work, we take these sensorimotor contingencies as an inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can learn motor control in an unsupervised fashion and relate their own sensorimotor experience to observed human actions. By combining action generation and action understanding we hope to facilitate smooth and intuitive interaction between robots and humans in shared work spaces.

To model robot sensorimotor contingencies and human behavior we employ generative models. Since generative models represent a joint distribution over relevant variables, they are flexible enough to cover the range of tasks that we are tackling here. Generative models can represent variables that originate from multiple modalities, model temporal dynamics, incorporate latent variables and represent uncertainty over any variable - all of which are features required to model sensorimotor contingencies. By using generative models, we can predict the temporal development of the variables in the future, which is important for intelligent action selection.

We present two lines of work. Firstly, we will focus on unsupervised learning of motor control with help of sensorimotor contingencies. Based on Gaussian Process forward models we demonstrate how the robot can execute goal-directed actions with the help of planning techniques or reinforcement learning. Secondly, we present a number of approaches to model human activity, ranging from pure unsupervised motion prediction to including semantic action and affordance labels. Here we employ deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two lines of work are then combined to implement physical human-robot interaction tasks.

Our experiments focus on real-time applications, both when it comes to robot experiments and human activity modeling. Since many real-world scenarios do not have access to high-end sensors, we require our models to cope with uncertainty. Additional requirements are data-efficient learning, because of the wear and tear of the robot and human involvement, online employability and operation under safety and compliance constraints. We demonstrate how generative models of sensorimotor contingencies can handle these requirements in our experiments satisfyingly.

Abstract [sv]

The question of how to build intelligent machines raises the question of how the world can be represented in order to enable intelligent behavior. In nature, such a representation rests on the interplay between an organism's sensory impressions and actions. Couplings between sensory impressions and actions allow many complex behaviors to arise naturally. In this work, we take these sensorimotor couplings as an inspiration for building robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can independently learn to control their movements and relate their own sensorimotor experiences to observed human actions. By relating the robot's movements to an understanding of human actions, we hope to facilitate smooth and intuitive interaction between robots and humans.

To model the robot's sensorimotor couplings and human behavior, we use generative models. Since generative models represent a multivariate distribution over relevant variables, they are flexible enough to meet the requirements we pose here. Generative models can represent variables from different modalities, model temporal dynamical systems, model latent variables and represent the variance of variables - all of these properties are needed to model sensorimotor couplings. By using generative models, we can predict how the variables will develop in the future, which is important for making intelligent decisions.

We present work in two directions. First, we focus on autonomous learning of motor control with the help of sensorimotor couplings. Based on Gaussian Process forward models, we show how the robot can move towards a goal with the help of planning techniques or reinforcement learning. Second, we present a number of approaches for modeling human activity, ranging from predicting how a human will move to including semantic information. Here we use deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, when required, include semantic information. These two ideas are then combined to help the robot interact with the human.

Our experiments focus on real-time scenarios, both for the robot experiments and for human activity modeling. Since many real-world scenarios do not have access to advanced sensors, we require our models to handle uncertainty. Additional requirements are machine learning models that do not need large amounts of data, and that the system works in real time and under safety requirements. We show how generative models of sensorimotor couplings can handle these requirements satisfactorily in our experiments.
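
The summary above mentions Gaussian Process forward models as the basis for goal-directed robot control. Below is a minimal sketch of that idea under simplifying assumptions (1-D dynamics, a fixed RBF kernel, greedy one-step action selection); it is not the implementation used in the thesis, and all names and hyperparameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel between rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

# Collect transitions from a simple dynamics that the model does not know:
# next_state = state + 0.5 * action + noise.
states = rng.uniform(-2, 2, size=(60, 1))
actions = rng.uniform(-1, 1, size=(60, 1))
next_states = states + 0.5 * actions + 0.05 * rng.standard_normal((60, 1))

X = np.hstack([states, actions])            # GP inputs: (state, action)
y = next_states.ravel()                     # GP targets: next state

noise = 0.05 ** 2
K = rbf_kernel(X, X) + noise * np.eye(len(X))
K_inv_y = np.linalg.solve(K, y)

def predict_next_state(state, action):
    """GP posterior mean of the next state for a single (state, action) pair."""
    x_star = np.array([[state, action]])
    return rbf_kernel(x_star, X) @ K_inv_y

def greedy_action(state, goal, candidates=np.linspace(-1, 1, 21)):
    """Pick the candidate action whose predicted next state is closest to the goal."""
    preds = np.array([predict_next_state(state, a) for a in candidates]).ravel()
    return candidates[np.argmin(np.abs(preds - goal))]

state, goal = -1.5, 0.0
for t in range(6):
    a = greedy_action(state, goal)
    state = float(state + 0.5 * a + 0.05 * rng.standard_normal())
    print(f"step {t}: action={a:+.2f}  state={state:+.3f}")
```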

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019. p. 41
Series
TRITA-EECS-AVL ; 2019:60
National Category
Robotics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-256002 (URN), 978-91-7873-246-3 (ISBN)
Public defence
2019-09-12, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Funder
EU, Horizon 2020, socsmcs
Note

QC 20190816

Available from: 2019-08-16 Created: 2019-08-15 Last updated: 2019-08-16. Bibliographically approved
Bütepage, J., Kjellström, H. & Kragic, D. (2018). Anticipating many futures: Online human motion prediction and generation for human-robot interaction. In: 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA). Paper presented at IEEE International Conference on Robotics and Automation (ICRA), MAY 21-25, 2018, Brisbane, AUSTRALIA (pp. 4563-4570). IEEE COMPUTER SOC
Anticipating many futures: Online human motion prediction and generation for human-robot interaction
2018 (English) In: 2018 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), IEEE COMPUTER SOC, 2018, p. 4563-4570. Conference paper, Published paper (Refereed)
Abstract [en]

Fluent and safe interactions of humans and robots require both partners to anticipate each other's actions. The bottleneck of most methods is the lack of an accurate model of natural human motion. In this work, we present a conditional variational autoencoder that is trained to predict a window of future human motion given a window of past frames. Using skeletal data obtained from RGB depth images, we show how this unsupervised approach can be used for online motion prediction for up to 1660 ms. Additionally, we demonstrate online target prediction within the first 300-500 ms after motion onset without the use of target-specific training data. The advantage of our probabilistic approach is the possibility to draw samples of possible future motion patterns. Finally, we investigate how movements and kinematic cues are represented on the learned low-dimensional manifold.
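
A minimal sketch of the conditional variational autoencoder idea described above, written with fully connected layers over flattened pose windows. The layer widths, window dimensions, and class names are illustrative assumptions rather than the architecture reported in the paper; the point is only that, after training, several plausible futures can be drawn for the same observed past by sampling different latent codes.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConditionalMotionVAE(nn.Module):
    """Encode (past, future) pose windows into a latent z; decode the future from (z, past)."""

    def __init__(self, past_dim, future_dim, latent_dim=32, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(past_dim + future_dim, hidden), nn.ReLU())
        self.enc_mu = nn.Linear(hidden, latent_dim)
        self.enc_logvar = nn.Linear(hidden, latent_dim)
        self.dec = nn.Sequential(
            nn.Linear(latent_dim + past_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, future_dim),
        )

    def forward(self, past, future):
        h = self.enc(torch.cat([past, future], dim=-1))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)   # reparameterization trick
        recon = self.dec(torch.cat([z, past], dim=-1))
        # ELBO terms: reconstruction error plus KL(q(z | past, future) || N(0, I)).
        recon_loss = F.mse_loss(recon, future, reduction="mean")
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon_loss + kl

    @torch.no_grad()
    def sample_futures(self, past, n_samples=10):
        """Draw several possible future windows for one past window by sampling z ~ N(0, I)."""
        past = past.expand(n_samples, -1)
        z = torch.randn(n_samples, self.enc_mu.out_features)
        return self.dec(torch.cat([z, past], dim=-1))

# Illustrative usage with random tensors standing in for skeletal windows.
model = ConditionalMotionVAE(past_dim=150, future_dim=150)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
past, future = torch.randn(64, 150), torch.randn(64, 150)
loss = model(past, future)
loss.backward(); opt.step()
futures = model.sample_futures(past[:1], n_samples=5)   # 5 alternative futures for one past
```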

Place, publisher, year, edition, pages
IEEE COMPUTER SOC, 2018
Series
IEEE International Conference on Robotics and Automation ICRA, ISSN 1050-4729
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-237164 (URN), 000446394503071 (), 978-1-5386-3081-5 (ISBN)
Conference
IEEE International Conference on Robotics and Automation (ICRA), MAY 21-25, 2018, Brisbane, AUSTRALIA
Funder
Swedish Foundation for Strategic Research
Note

QC 20181024

Available from: 2018-10-24 Created: 2018-10-24 Last updated: 2019-08-20. Bibliographically approved
Bütepage, J., Black, M. J., Kragic, D. & Kjellström, H. (2017). Deep representation learning for human motion prediction and classification. In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017). Paper presented at 30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUL 21-26, 2017, Honolulu, HI (pp. 1591-1599). IEEE
Deep representation learning for human motion prediction and classification
2017 (English) In: 30th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), IEEE, 2017, p. 1591-1599. Conference paper (Refereed)
Abstract [en]

Generative models of 3D human motion are often restricted to a small number of activities and therefore cannot generalize well to novel movements or applications. In this work we propose a deep learning framework for human motion capture data that learns a generic representation from a large corpus of motion capture data and generalizes well to new, unseen motions. Using an encoding-decoding network that learns to predict future 3D poses from the most recent past, we extract a feature representation of human motion. Most work on deep learning for sequence prediction focuses on video and speech. Since skeletal data has a different structure, we present and evaluate different network architectures that make different assumptions about time dependencies and limb correlations. To quantify the learned features, we use the output of different layers for action classification and visualize the receptive fields of the network units. Our method outperforms the recent state of the art in skeletal motion prediction even though these methods use action-specific training data. Our results show that deep feedforward networks, trained from a generic mocap database, can successfully be used for feature extraction from human motion data and that this representation can be used as a foundation for classification and prediction.
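
The abstract describes training an encoding-decoding network to predict future poses and then reusing intermediate activations as features for action classification. The following sketch illustrates that two-stage pattern under simplifying assumptions (plain fully connected layers, random stand-in data); it is not one of the architectures evaluated in the paper, and all names are illustrative.

```python
import torch
import torch.nn as nn

class PosePredictionNet(nn.Module):
    """Feedforward encoder-decoder: past pose window in, predicted future window out."""

    def __init__(self, window_dim=150, feature_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(window_dim, 512), nn.ReLU(),
            nn.Linear(512, feature_dim), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(feature_dim, 512), nn.ReLU(),
            nn.Linear(512, window_dim),
        )

    def forward(self, past):
        return self.decoder(self.encoder(past))

    def features(self, past):
        """Bottleneck activations, reused as a generic motion representation."""
        return self.encoder(past)

# Stage 1 (sketch): train on pose prediction only; no action labels required.
net = PosePredictionNet()
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
past, future = torch.randn(64, 150), torch.randn(64, 150)
loss = nn.functional.mse_loss(net(past), future)
loss.backward(); opt.step()

# Stage 2 (sketch): train a small classifier on the frozen features.
classifier = nn.Linear(128, 10)                       # e.g. 10 action classes
clf_opt = torch.optim.Adam(classifier.parameters(), lr=1e-3)
labels = torch.randint(0, 10, (64,))
with torch.no_grad():
    feats = net.features(past)
clf_loss = nn.functional.cross_entropy(classifier(feats), labels)
clf_loss.backward(); clf_opt.step()
```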

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE Conference on Computer Vision and Pattern Recognition, ISSN 1063-6919
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-221047 (URN), 10.1109/CVPR.2017.173 (DOI), 000418371401068 (), 978-1-5386-0457-1 (ISBN)
Conference
30th IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), JUL 21-26, 2017, Honolulu, HI
Funder
Swedish Foundation for Strategic Research
Note

QC 20180111

Available from: 2018-01-11 Created: 2018-01-11 Last updated: 2019-08-16. Bibliographically approved
Bütepage, J., Kjellström, H. & Kragic, D. A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling.
A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Human behavior is a continuous stochastic spatio-temporal process which is governed by semantic actions and affordances as well as latent factors. Therefore, video-based human activity modeling is concerned with a number of tasks such as inferring current and future semantic labels, predicting future continuous observations as well as imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent both discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction and anticipation as well as motion prediction and synthesis based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model the dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches, which are specifically designed for e.g. action classification, we find that our probabilistic formulation outperforms or is comparable to these task specific models.
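
As a rough illustration of how discrete labels and continuous observations can share one latent-variable objective in a semi-supervised setting, the sketch below attaches a label head to a standard VAE and adds a supervised term only when labels are available. This is a simplified stand-in, not the model proposed in the manuscript; all dimensions and names are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedLatentModel(nn.Module):
    """VAE over continuous motion features with a label head on the latent code."""

    def __init__(self, x_dim=150, z_dim=32, n_labels=10, hidden=256):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU())
        self.mu, self.logvar = nn.Linear(hidden, z_dim), nn.Linear(hidden, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, hidden), nn.ReLU(), nn.Linear(hidden, x_dim))
        self.label_head = nn.Linear(z_dim, n_labels)

    def forward(self, x, y=None):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        # Negative ELBO: reconstruction error plus KL(q(z|x) || N(0, I)).
        loss = (F.mse_loss(self.dec(z), x)
                - 0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp()))
        if y is not None:                       # labeled batch: add a supervised term
            loss = loss + F.cross_entropy(self.label_head(z), y)
        return loss

model = SemiSupervisedLatentModel()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_lab, y_lab = torch.randn(32, 150), torch.randint(0, 10, (32,))
x_unlab = torch.randn(32, 150)
loss = model(x_lab, y_lab) + model(x_unlab)     # labeled and unlabeled data in one update
loss.backward(); opt.step()
```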

National Category
Robotics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-256000 (URN)
Funder
EU, Horizon 2020
Note

QC 20190816

Available from: 2019-08-15 Created: 2019-08-15 Last updated: 2019-08-16. Bibliographically approved