Predicting the what and how - A probabilistic semi-supervised approach to multi-task human activity modeling
Butepage, Judith. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5344-8042
Kjellström, Hedvig. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-5750-9655
Kragic, Danica. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0003-2965-2953
2019 (English). In: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, IEEE Computer Society, 2019, p. 2923-2926. Conference paper, published paper (refereed).
Abstract [en]

Video-based prediction of human activity is usually performed on one of two levels: either a model is trained to anticipate high-level action labels or it is trained to predict future trajectories either in skeletal joint space or in image pixel space. This separation of classification and regression tasks implies that models cannot make use of the mutual information between continuous and semantic observations. However, if a model knew that an observed human wants to drink from a nearby glass, the space of possible trajectories would be highly constrained to reaching movements. Likewise, if a model had predicted a reaching trajectory, the inference of future semantic labels would rank 'lifting' more likely than 'walking'. In this work, we propose a semi-supervised generative latent variable model that addresses both of these levels by modeling continuous observations as well as semantic labels. This fusion of signals allows the model to solve several tasks, such as action detection and anticipation as well as motion prediction and synthesis, simultaneously. We demonstrate this ability on the UTKinect-Action3D dataset, which consists of noisy, partially labeled multi-action sequences. The aim of this work is to encourage research within the field of human activity modeling based on mixed categorical and continuous data.
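The abstract's central idea is that a single latent variable can jointly explain continuous observations (trajectories) and semantic labels (actions), and that the label term can simply be dropped for unlabeled data. The following is a minimal numpy sketch of that objective, not the paper's actual model: the decoder matrices `W_x`, `W_y` and the function `joint_nll` are illustrative names, the decoders are linear with random weights, and the continuous likelihood is a unit-variance Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
D, K, H = 3, 2, 4  # continuous obs dim, number of action classes, latent dim

# Hypothetical decoder parameters (random, for the sketch only).
W_x = rng.normal(size=(H, D))  # latent -> continuous pose/trajectory point
W_y = rng.normal(size=(H, K))  # latent -> action-label logits

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def joint_nll(z, x, y=None):
    """Negative log-likelihood of continuous observation x under a
    unit-variance Gaussian decoder, plus, only when the action label y
    is observed, the cross-entropy of the categorical head. Dropping
    the label term for unlabeled data is the semi-supervised part."""
    recon = 0.5 * np.sum((x - z @ W_x) ** 2)   # continuous term
    if y is None:
        return recon
    return recon - np.log(softmax(z @ W_y)[y])  # + categorical term

z = rng.normal(size=H)
x = rng.normal(size=D)
labeled, unlabeled = joint_nll(z, x, y=1), joint_nll(z, x)
```

Since -log p(y|z) is non-negative, the labeled objective is never smaller than the unlabeled one for the same (z, x); in the paper this shared-latent coupling is what lets semantic and continuous predictions constrain each other.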

Place, publisher, year, edition, pages
IEEE Computer Society, 2019, p. 2923-2926.
Keywords [en]
Classification (of information), Computer vision, Forecasting, Semantics, Trajectories, Action sequences, Continuous data, Continuous observation, Human activities, Latent variable modeling, Motion prediction, Mutual informations, Reaching movements, Motion estimation
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-274794
DOI: 10.1109/CVPRW.2019.00352
ISI: 000569983600346
Scopus ID: 2-s2.0-85083309794
OAI: oai:DiVA.org:kth-274794
DiVA id: diva2:1445952
Conference
32nd IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2019, 16-20 June 2019, Long Beach, United States
Note

QC 20200623

Part of ISBN 9781728125060

Available from: 2020-06-23. Created: 2020-06-23. Last updated: 2024-10-22. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text | Scopus
