Emotional Speech-driven 3D Body Animation via Disentangled Latent Diffusion
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-7414-845X
Max Planck Institute for Intelligent Systems, Germany. ORCID iD: 0000-0002-1651-030X
Max Planck Institute for Intelligent Systems, Germany.
Max Planck Institute for Intelligent Systems, Germany.
2024 (English). In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Institute of Electrical and Electronics Engineers (IEEE), 2024, pp. 1942-1953. Conference paper, Published paper (Refereed)
Abstract [en]

Existing methods for synthesizing 3D human gestures from speech have shown promising results, but they do not explicitly model the impact of emotions on the generated gestures. Instead, these methods directly output animations from speech without control over the expressed emotion. To address this limitation, we present AMUSE, an emotional speech-driven body animation model based on latent diffusion. Our observation is that content (i.e., gestures related to speech rhythm and word utterances), emotion, and personal style are separable. To account for this, AMUSE maps the driving audio to three disentangled latent vectors: one for content, one for emotion, and one for personal style. A latent diffusion model, trained to generate gesture motion sequences, is then conditioned on these latent vectors. Once trained, AMUSE synthesizes 3D human gestures directly from speech with control over the expressed emotions and style by combining the content from the driving speech with the emotion and style of another speech sequence. Randomly sampling the noise of the diffusion model further generates variations of the gesture with the same emotional expressivity. Qualitative, quantitative, and perceptual evaluations demonstrate that AMUSE outputs realistic gesture sequences. Compared to the state of the art, the generated gestures are better synchronized with the speech content and better represent the emotion expressed by the input speech. Our code is available at amuse.is.tue.mpg.de.
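
The recombination idea in the abstract (content from the driving speech, emotion and style from another speech sequence, plus freshly sampled noise for variation) can be illustrated with a toy sketch. This is not the AMUSE code or API: `SpeechLatents`, `encode_speech`, `mix_condition`, and `sample_noise` are hypothetical names, the latent dimension of 4 is arbitrary, the "encoder" simply draws seeded random numbers in place of processing audio, and joining the three vectors by concatenation is an assumption made only for illustration.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class SpeechLatents:
    """Hypothetical container: three disentangled latents for one speech clip."""
    content: List[float]
    emotion: List[float]
    style: List[float]


def encode_speech(seed: int, dim: int = 4) -> SpeechLatents:
    """Stand-in for an audio encoder (assumption): seeded random latents.

    A real encoder would compute these vectors from the audio signal.
    """
    rng = random.Random(seed)

    def draw() -> List[float]:
        return [rng.gauss(0.0, 1.0) for _ in range(dim)]

    return SpeechLatents(content=draw(), emotion=draw(), style=draw())


def mix_condition(driving: SpeechLatents, reference: SpeechLatents) -> List[float]:
    """Take content from the driving clip, emotion and style from the reference.

    Concatenation is an illustrative choice, not the paper's stated mechanism.
    """
    return driving.content + reference.emotion + reference.style


def sample_noise(length: int, seed: int) -> List[float]:
    """Initial noise for a diffusion sampler; a new seed gives a new variation."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(length)]


driving = encode_speech(seed=1)      # clip whose content drives the gestures
reference = encode_speech(seed=2)    # clip that supplies emotion and style
condition = mix_condition(driving, reference)

# Same conditioning, different noise: two candidate variations of one gesture.
noise_a = sample_noise(8, seed=10)
noise_b = sample_noise(8, seed=11)
```

In an actual latent diffusion pipeline, `condition` would be passed to the denoiser at every step while `noise_a` or `noise_b` serves as the starting sample; that part is omitted here because the record does not describe it.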

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. pp. 1942-1953
National subject category
Electrical Engineering and Electronics
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-354048
DOI: 10.1109/CVPR52733.2024.00190
ISI: 001322555902029
Scopus ID: 2-s2.0-85202286367
OAI: oai:DiVA.org:kth-354048
DiVA, id: diva2:1901299
Conference
2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 16-22, 2024, Seattle, WA, USA
Note

Part of ISBN 979-8-3503-5300-6

QC 20240930

Available from: 2024-09-26. Created: 2024-09-26. Last updated: 2025-01-20. Bibliographically reviewed.

Open Access in DiVA

PDF (936 kB), 143 downloads
File information
File name: FULLTEXT01.pdf. File size: 936 kB. Checksum (SHA-512):
6be60d70db2fedd6d86f18e403bf890887774d1664ff85b3dea8553a9e234e32f91948c90c42a268a887ca7ced4140a9fc1e1a64caf031ffd07e4538a607e1d2
Type: full text. MIME type: application/pdf


People

Chhatre, Kiran; Peters, Christopher

Search further in DiVA

By author/editor
Chhatre, Kiran; Daněček, Radek; Peters, Christopher; Bolkart, Timo
By organisation
Computational Science and Technology (CST)
Electrical Engineering and Electronics

Total: 146 downloads
The number of downloads is the sum of downloads for all full texts. It may include, for example, earlier versions that are no longer available.

Total: 143 hits