Evaluating Speech and Video Models for Face-Body Congruence
Chhatre, Kiran (KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST)). ORCID iD: 0000-0002-7414-845X
Guarese, Renan (KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST)). ORCID iD: 0000-0003-1206-5701
Matviienko, Andrii (KTH, School of Electrical Engineering and Computer Science (EECS), Human Centered Technology, Media Technology and Interaction Design, MID). ORCID iD: 0000-0002-6571-0623
Peters, Christopher (KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST)). ORCID iD: 0000-0002-7257-0761
2025 (English). In: I3D Companion '25: Companion Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games. Association for Computing Machinery (ACM), 2025. Conference paper, Poster (with or without abstract) (Refereed).
Abstract [en]

Animations produced by generative models are often evaluated using objective quantitative metrics that do not fully capture perceptual effects in immersive virtual environments. To address this gap, we present a preliminary perceptual evaluation of generative models for animation synthesis, conducted via a VR-based user study (N = 48). Our investigation specifically focuses on animation congruency—ensuring that generated facial expressions and body gestures are both congruent with and synchronized to driving speech. We evaluated two state-of-the-art methods: a speech-driven full-body animation model and a video-driven full-body reconstruction model, assessing their capability to produce congruent facial expressions and body gestures. Our results demonstrate a strong user preference for combined facial and body animations, highlighting that congruent multimodal animations significantly enhance perceived realism compared to animations featuring only a single modality. By incorporating VR-based perceptual feedback into training pipelines, our approach provides a foundation for developing more engaging and responsive virtual characters.
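
The abstract reports a significant preference for combined facial-and-body animation over single-modality animation. As a purely hypothetical illustration of how a pairwise (two-alternative forced-choice) preference outcome from such a study could be tested, the Python sketch below applies a one-sided binomial test. This is not the paper's analysis, and the preference count is invented for illustration.

    # Hypothetical sketch: testing whether participants prefer combined
    # face+body animation over a single-modality condition in a pairwise
    # (two-alternative forced-choice) VR study. The preference count is
    # invented for illustration and is NOT taken from the paper.
    from scipy.stats import binomtest

    n_participants = 48     # study size reported in the abstract
    prefer_combined = 38    # assumed count preferring face+body (hypothetical)

    # Under the null hypothesis, each condition is chosen with probability 0.5.
    result = binomtest(prefer_combined, n=n_participants, p=0.5,
                       alternative="greater")
    print(f"preference rate = {prefer_combined / n_participants:.2f}, "
          f"p = {result.pvalue:.5f}")

Under these assumed numbers (38 of 48 participants preferring the combined condition), the test would reject the null of no preference well below the 0.001 level.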

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025.
Keywords [en]
Computer graphics, Animation
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-363248
DOI: 10.1145/3722564.3728374
OAI: oai:DiVA.org:kth-363248
DiVA, id: diva2:1957344
Conference
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (I3D 2025), NJIT, Jersey City, NJ, USA, 7–9 May 2025
Note

Part of ISBN 9798400718335

QC 20250509

Available from: 2025-05-09. Created: 2025-05-09. Last updated: 2025-05-09. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Chhatre, Kiran; Guarese, Renan; Matviienko, Andrii; Peters, Christopher
