A Probabilistic Semi-Supervised Approach to Multi-Task Human Activity Modeling
Bütepage, Judith (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL). ORCID iD: 0000-0001-5344-8042
Kjellström, Hedvig (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL). ORCID iD: 0000-0002-5750-9655
Kragic, Danica (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL). ORCID iD: 0000-0003-2965-2953
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Human behavior is a continuous stochastic spatio-temporal process, governed by semantic actions and affordances as well as latent factors. Video-based human activity modeling therefore encompasses a number of tasks, such as inferring current and future semantic labels, predicting future continuous observations, and imagining possible future label and feature sequences. In this paper we present a semi-supervised probabilistic deep latent variable model that can represent discrete labels and continuous observations as well as latent dynamics over time. This allows the model to solve several tasks at once without explicit fine-tuning. We focus here on the tasks of action classification, detection, prediction and anticipation, as well as motion prediction and synthesis, based on 3D human activity data recorded with Kinect. We further extend the model to capture hierarchical label structure and to model dependencies between multiple entities, such as a human and objects. Our experiments demonstrate that our principled approach to human activity modeling can be used to detect current and anticipate future semantic labels and to predict and synthesize future label and feature sequences. When comparing our model to state-of-the-art approaches that are specifically designed for, e.g., action classification, we find that our probabilistic formulation outperforms or is comparable to these task-specific models.
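
The abstract does not commit to an architecture beyond "semi-supervised probabilistic deep latent variable model", so the following is a minimal single-timestep sketch in PyTorch, in the spirit of the semi-supervised M2 VAE of Kingma et al. (2014): a shared latent z couples a discrete label y with a continuous pose observation x, and unobserved labels are marginalized out. The latent temporal dynamics described in the paper are omitted, and all dimensions (e.g. x_dim=75 for 25 Kinect joints in 3D) and layer sizes are illustrative assumptions, not the authors' settings.

```python
# A minimal sketch of a semi-supervised VAE over a discrete action label y
# and a continuous pose observation x. NOT the paper's model: the latent
# temporal dynamics are omitted; all sizes are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemiSupervisedVAE(nn.Module):
    def __init__(self, x_dim=75, y_dim=10, z_dim=32, h_dim=128):
        super().__init__()
        # q(y | x): inference network for the label, used when y is unobserved
        self.classifier = nn.Sequential(
            nn.Linear(x_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, y_dim))
        # q(z | x, y): Gaussian recognition network over the latent state
        self.encoder = nn.Sequential(
            nn.Linear(x_dim + y_dim, h_dim), nn.ReLU())
        self.mu = nn.Linear(h_dim, z_dim)
        self.logvar = nn.Linear(h_dim, z_dim)
        # p(x | z, y): decoder used for reconstruction and motion synthesis
        self.decoder = nn.Sequential(
            nn.Linear(z_dim + y_dim, h_dim), nn.ReLU(), nn.Linear(h_dim, x_dim))

    def elbo(self, x, y_onehot):
        # Evidence lower bound for a labeled example (x, y).
        h = self.encoder(torch.cat([x, y_onehot], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterize
        x_hat = self.decoder(torch.cat([z, y_onehot], dim=-1))
        recon = -F.mse_loss(x_hat, x, reduction="none").sum(-1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(-1)
        return recon - kl

    def unlabeled_elbo(self, x):
        # When the label is missing, marginalize it out under q(y | x):
        # E_{q(y|x)}[ELBO(x, y)] + H[q(y|x)], as in the M2 objective.
        logits = self.classifier(x)
        q_y = F.softmax(logits, dim=-1)
        per_class = [self.elbo(x, F.one_hot(
            torch.full((x.size(0),), k, dtype=torch.long),
            num_classes=q_y.size(-1)).float()) for k in range(q_y.size(-1))]
        entropy = -(q_y * F.log_softmax(logits, dim=-1)).sum(-1)
        return (q_y * torch.stack(per_class, dim=-1)).sum(-1) + entropy
```

Because the same generative model is queried in different directions (classify via q(y | x), synthesize via p(x | z, y), anticipate by sampling y), one set of parameters can serve the multiple tasks the abstract lists without task-specific fine-tuning.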

National Category
Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-256000
OAI: oai:DiVA.org:kth-256000
DiVA, id: diva2:1342958
Funder
EU, Horizon 2020
Note

QC 20190816

Available from: 2019-08-15. Created: 2019-08-15. Last updated: 2025-02-09. Bibliographically approved.
In thesis
1. Generative models for action generation and action understanding
2019 (English) Doctoral thesis, comprehensive summary (Other academic)
Alternative title [sv]
Generativa modeller för generering och förståelse av mänsklig aktivitet
Abstract [en]

The question of how to build intelligent machines raises the question of how to represent the world in a way that enables intelligent behavior. In nature, this representation relies on the interplay between an organism's sensory input and motor input. Action-perception loops allow many complex behaviors to arise naturally. In this work, we take these sensorimotor contingencies as inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can learn motor control in an unsupervised fashion and relate their own sensorimotor experience to observed human actions. By combining action generation and action understanding, we hope to facilitate smooth and intuitive interaction between robots and humans in shared workspaces.

To model robot sensorimotor contingencies and human behavior, we employ generative models. Since generative models represent a joint distribution over the relevant variables, they are flexible enough to cover the range of tasks that we tackle here. Generative models can represent variables that originate from multiple modalities, model temporal dynamics, incorporate latent variables, and represent uncertainty over any variable: all features required to model sensorimotor contingencies. By using generative models, we can predict the temporal development of the variables into the future, which is important for intelligent action selection.

We present two lines of work. First, we focus on unsupervised learning of motor control with the help of sensorimotor contingencies. Based on Gaussian Process forward models, we demonstrate how the robot can execute goal-directed actions with the help of planning techniques or reinforcement learning. Second, we present a number of approaches to modeling human activity, ranging from purely unsupervised motion prediction to models that include semantic action and affordance labels. Here we employ deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, if required, include semantic information. These two lines of work are then combined to implement physical human-robot interaction tasks.

Our experiments focus on real-time applications, both in the robot experiments and in human activity modeling. Since many real-world scenarios do not have access to high-end sensors, we require our models to cope with uncertainty. Additional requirements are data-efficient learning, because of the wear and tear on the robot and the human involvement, the ability to run online, and operation under safety and compliance constraints. Our experiments demonstrate that generative models of sensorimotor contingencies can satisfy these requirements.
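
As a concrete illustration of the first line of work, the sketch below shows one way a Gaussian Process forward model can drive goal-directed action selection: the GP is trained on random interaction data to predict state changes, and a one-step greedy planner queries it over candidate actions. This is a toy reconstruction under stated assumptions, not the thesis implementation; the 2D linear dynamics, the kernel choice, and the action ranges are all invented for the example, and scikit-learn stands in for whatever GP library the thesis used.

```python
# Toy sketch of goal-directed action selection with a GP forward model:
# learn delta_s = f(s, a) from random interactions, then greedily pick the
# action whose predicted next state is closest to the goal. Illustrative
# dynamics and parameters; not the thesis implementation.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)

def step(state, action):
    # Unknown "true" dynamics the robot interacts with (invented for the demo).
    return state + 0.8 * action + rng.normal(scale=0.01, size=2)

# Collect random interaction data (the robot's sensorimotor experience).
states = rng.uniform(-1, 1, size=(200, 2))
actions = rng.uniform(-0.2, 0.2, size=(200, 2))
deltas = np.array([step(s, a) - s for s, a in zip(states, actions)])

# GP forward model over (state, action) -> state change; the WhiteKernel
# term models observation noise, giving calibrated predictive uncertainty.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5) + WhiteKernel(),
                              normalize_y=True)
gp.fit(np.hstack([states, actions]), deltas)

def select_action(state, goal, n_candidates=500):
    # One-step greedy planning: score sampled candidate actions by the
    # model-predicted distance of the next state to the goal.
    cand = rng.uniform(-0.2, 0.2, size=(n_candidates, 2))
    inputs = np.hstack([np.tile(state, (n_candidates, 1)), cand])
    pred_next = state + gp.predict(inputs)
    return cand[np.argmin(np.linalg.norm(pred_next - goal, axis=1))]

state, goal = np.array([0.0, 0.0]), np.array([0.5, -0.3])
for _ in range(10):
    state = step(state, select_action(state, goal))
print("final distance to goal:", np.linalg.norm(state - goal))
```

A multi-step planner or a reinforcement learning objective can replace the greedy loop; the forward model and its uncertainty estimates stay the same, which is what makes the GP formulation data-efficient for robot learning.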

Abstract [sv]

(Translated from Swedish.) The question of how to build intelligent machines raises the question of how the world can be represented to enable intelligent behavior. In nature, such a representation relies on the interplay between an organism's sensory impressions and actions. Couplings between sensory impressions and actions allow many complex behaviors to arise naturally. In this work, we take these sensorimotor couplings as inspiration to build robot systems that can autonomously interact with their environment and with humans. The goal is to pave the way for robot systems that can independently learn to control their movements and relate their own sensorimotor experiences to observed human actions. By relating the robot's movements to the understanding of human actions, we hope to facilitate smooth and intuitive interaction between robots and humans.

To model the robot's sensorimotor couplings and human behavior, we use generative models. Since generative models represent a multivariate distribution over the relevant variables, they are flexible enough to meet the requirements we pose here. Generative models can represent variables from different modalities, model temporal dynamical systems, model latent variables, and represent the variance of variables: all of these properties are necessary for modeling sensorimotor couplings. By using generative models, we can predict the development of the variables into the future, which is important for making intelligent decisions.

We present work in two directions. First, we focus on autonomous learning of motion control with the help of sensorimotor couplings. Based on Gaussian Process forward models, we show how the robot can move toward a goal with the help of planning techniques or reinforcement learning. Second, we present a number of approaches for modeling human activity, ranging from predicting how a human will move to including semantic information. Here we use deep generative models, namely Variational Autoencoders, to model the 3D skeletal pose of humans over time and, where required, include semantic information. These two ideas are then combined to help the robot interact with humans.

Our experiments focus on real-time scenarios, both for the robot experiments and for human activity modeling. Since many real-world scenarios do not have access to advanced sensors, we require our models to handle uncertainty. Additional requirements are machine learning models that do not need much data, and systems that work in real time and under safety requirements. We show in our experiments that generative models of sensorimotor couplings can handle these requirements satisfactorily.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2019. p. 41
Series
TRITA-EECS-AVL ; 2019:60
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-256002 (URN)
978-91-7873-246-3 (ISBN)
Public defence
2019-09-12, F3, Lindstedtsvägen 26, Stockholm, 13:00 (English)
Funder
EU, Horizon 2020, socsmcs
Note

QC 20190816

Available from: 2019-08-16. Created: 2019-08-15. Last updated: 2025-02-09. Bibliographically approved.

Open Access in DiVA

fulltext (733 kB), 258 downloads
File information
File name: FULLTEXT02.pdf
File size: 733 kB
Checksum: SHA-512
44edd44696812dafbf1be92d6c879bea01228eecf2364ab9dc3789da33fe56111267931d34d3433c7359d64c2990367ad1102af5a3afc51953f0e7cad160ccee
Type: fulltext
Mimetype: application/pdf

Authority records

Bütepage, Judith; Kjellström, Hedvig; Kragic, Danica
