Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models
Alexanderson, Simon. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH; Motorica AB, Sweden. ORCID iD: 0000-0002-7801-7617
Nagy, Rajmund. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-9653-6699
Beskow, Jonas. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
Henter, Gustav Eje. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH; Motorica AB, Sweden. ORCID iD: 0000-0002-1643-1054
2023 (English). In: ACM Transactions on Graphics, ISSN 0730-0301, E-ISSN 1557-7368, Vol. 42, no. 4, article id 44. Journal article (Peer-reviewed). Published.
Abstract [en]

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the DiffWave architecture to model 3D pose sequences, putting Conformers in place of dilated convolutions for improved modelling power. We also demonstrate control over motion style, using classifier-free guidance to adjust the strength of the stylistic expression. Experiments on gesture and dance generation confirm that the proposed method achieves top-of-the-line motion quality, with distinctive styles whose expression can be made more or less pronounced. We also synthesise path-driven locomotion using the same model architecture. Finally, we generalise the guidance procedure to obtain product-of-expert ensembles of diffusion models and demonstrate how these may be used for, e.g., style interpolation, a contribution we believe is of independent interest.
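The guidance procedures mentioned in the abstract can be sketched generically. This is not the authors' code: the function names are hypothetical, and the snippet only illustrates the standard classifier-free guidance update (interpolating between unconditional and conditional denoiser outputs, with the guidance weight controlling the strength of the stylistic expression) and its natural generalisation to a weighted combination of several conditional predictions, as used for product-of-experts style interpolation.

```python
import numpy as np

def classifier_free_guidance(eps_uncond, eps_cond, w):
    # Standard classifier-free guidance: extrapolate from the
    # unconditional prediction towards the style-conditioned one.
    # w = 1 recovers plain conditional sampling; w > 1 strengthens
    # the stylistic expression, 0 < w < 1 weakens it.
    return eps_uncond + w * (np.asarray(eps_cond) - eps_uncond)

def product_of_experts_guidance(eps_uncond, eps_conds, weights):
    # Generalisation to several conditional predictions (experts),
    # e.g. different motion styles. Each expert contributes its
    # deviation from the unconditional prediction, scaled by its
    # weight; varying the weights interpolates between styles.
    out = np.asarray(eps_uncond, dtype=float).copy()
    for eps_c, w in zip(eps_conds, weights):
        out += w * (np.asarray(eps_c, dtype=float) - eps_uncond)
    return out
```

With a single expert and weight w, the product-of-experts form reduces to ordinary classifier-free guidance, which is why the paper describes it as a generalisation of the guidance procedure.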

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023. Vol. 42, no. 4, article id 44
Keywords [en]
conformers, dance, diffusion models, ensemble models, generative models, gestures, guided interpolation, locomotion, machine learning, product of experts
HSV category
Identifiers
URN: urn:nbn:se:kth:diva-335345
DOI: 10.1145/3592458
ISI: 001044671300010
Scopus ID: 2-s2.0-85166332883
OAI: oai:DiVA.org:kth-335345
DiVA id: diva2:1795070
Note

QC 20230907

Available from: 2023-09-07. Created: 2023-09-07. Last updated: 2023-09-22. Bibliographically approved.

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text
Scopus

Person

Alexanderson, Simon
Nagy, Rajmund
Beskow, Jonas
Henter, Gustav Eje
