Transflower: probabilistic autoregressive dance generation with multimodal attention
Univ Bordeaux, Ensta ParisTech, Bordeaux, France.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-1643-1054
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID. ORCID iD: 0000-0003-1679-6018
2021 (English). In: ACM Transactions on Graphics, ISSN 0730-0301, E-ISSN 1557-7368, Vol. 40, no 6, article id 195. Article in journal (Refereed). Published.
Abstract [en]

Dance requires skillful composition of complex movements that follow rhythmic, tonal and timbral features of music. Formally, generating dance conditioned on a piece of music can be expressed as a problem of modelling a high-dimensional continuous motion signal, conditioned on an audio signal. In this work we make two contributions to tackle this problem. First, we present a novel probabilistic autoregressive architecture that models the distribution over future poses with a normalizing flow conditioned on previous poses as well as music context, using a multimodal transformer encoder. Second, we introduce the largest 3D dance-motion dataset to date, obtained with a variety of motion-capture technologies and including both professional and casual dancers. Using this dataset, we compare our new model against two baselines via objective metrics and a user study, and show that both the ability to model a probability distribution and the ability to attend over a large motion and music context are necessary to produce interesting, diverse, and realistic dance that matches the music.
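
A minimal, self-contained PyTorch sketch may help make the approach the abstract describes concrete: a transformer encoder attends jointly over past-pose and music-feature tokens, and its pooled output conditions a stack of affine-coupling layers (a Glow-style normalizing flow) that defines the distribution over the next pose. This is an illustration of the general technique only, not the authors' implementation; all dimensions, layer counts, and names (TransflowerSketch, AffineCoupling) are assumptions made for the example.

    import torch
    import torch.nn as nn

    class AffineCoupling(nn.Module):
        # One conditional affine-coupling step of a Glow-style normalizing flow.
        def __init__(self, pose_dim, ctx_dim, hidden=256):
            super().__init__()
            self.half = pose_dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half + ctx_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (pose_dim - self.half)),
            )

        def forward(self, x, ctx):
            # Transform one half of the pose vector, conditioned on the other
            # half plus the multimodal context vector.
            xa, xb = x[:, :self.half], x[:, self.half:]
            log_s, t = self.net(torch.cat([xa, ctx], dim=-1)).chunk(2, dim=-1)
            log_s = torch.tanh(log_s)  # keep scales numerically well-behaved
            y = torch.cat([xa, xb * torch.exp(log_s) + t], dim=-1)
            return y, log_s.sum(-1)    # log-determinant of the Jacobian

        def inverse(self, y, ctx):
            ya, yb = y[:, :self.half], y[:, self.half:]
            log_s, t = self.net(torch.cat([ya, ctx], dim=-1)).chunk(2, dim=-1)
            log_s = torch.tanh(log_s)
            return torch.cat([ya, (yb - t) * torch.exp(-log_s)], dim=-1)

    class TransflowerSketch(nn.Module):
        def __init__(self, pose_dim=66, audio_dim=80, d_model=128, n_flows=4):
            super().__init__()
            self.pose_in = nn.Linear(pose_dim, d_model)
            self.audio_in = nn.Linear(audio_dim, d_model)
            layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
            # A single encoder attends over the concatenated motion and music
            # token sequences, so attention can mix the two modalities.
            # (Positional encodings are omitted for brevity.)
            self.encoder = nn.TransformerEncoder(layer, num_layers=2)
            self.flows = nn.ModuleList(
                AffineCoupling(pose_dim, d_model) for _ in range(n_flows))
            self.base = torch.distributions.Normal(0.0, 1.0)
            self.pose_dim = pose_dim

        def context(self, past_poses, music):
            # past_poses: (B, T_pose, pose_dim); music: (B, T_music, audio_dim)
            tokens = torch.cat(
                [self.pose_in(past_poses), self.audio_in(music)], dim=1)
            return self.encoder(tokens).mean(dim=1)  # pooled context vector

        def log_prob(self, next_pose, past_poses, music):
            # Exact likelihood of the next pose via the change of variables.
            ctx = self.context(past_poses, music)
            z, ldj = next_pose, 0.0
            for flow in self.flows:
                z, step = flow(z, ctx)
                z, ldj = z.flip(-1), ldj + step  # flip halves between couplings
            return self.base.log_prob(z).sum(-1) + ldj

        @torch.no_grad()
        def sample(self, past_poses, music):
            # Draw one next pose from base noise by inverting the flow.
            ctx = self.context(past_poses, music)
            z = torch.randn(past_poses.size(0), self.pose_dim,
                            device=past_poses.device)
            for flow in reversed(self.flows):
                z = flow.inverse(z.flip(-1), ctx)
            return z

Training would maximise log_prob over motion-capture sequences; generation then repeatedly calls sample and appends each sampled pose to past_poses, which is what makes the model autoregressive.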

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021. Vol. 40, no 6, article id 195
Keywords [en]
Generative models, machine learning, normalising flows, Glow, transformers, dance
National Category
Computer Vision and Robotics (Autonomous Systems); Computer Sciences; Signal Processing
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-307028
DOI: 10.1145/3478513.3480570
ISI: 000729846700001
Scopus ID: 2-s2.0-85125127739
OAI: oai:DiVA.org:kth-307028
DiVA, id: diva2:1626445
Funder
Swedish Research Council, 2018-05409
Swedish Research Council, 2019-03694
Knut and Alice Wallenberg Foundation, WASP
Marianne and Marcus Wallenberg Foundation, 2020.0102
Note

QC 20220520

Available from: 2022-01-11. Created: 2022-01-11. Last updated: 2023-06-08. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Henter, Gustav Eje; Beskow, Jonas; Holzapfel, Andre; Alexanderson, Simon
