Evaluating Gesture Generation in a Large-scale Open Challenge: The GENEA Challenge 2022
Electronic Arts Inc., SEED, Stockholm, Sweden. ORCID iD: 0000-0001-9838-8848
Radboud University Nijmegen, Donders Institute for Brain, Cognition and Behaviour, Nijmegen, Gelderland, Netherlands; Ghent University, IDLab, Ghent, Belgium.
ETRI, Daejeon, South Korea.
Carnegie Mellon University, Pittsburgh, PA 15213, USA; NOVA University Lisbon, Lisbon, Portugal.
2024 (English). In: ACM Transactions on Graphics, ISSN 0730-0301, E-ISSN 1557-7368, Vol. 43, no. 3, article id 32. Article in journal (Refereed), Published.
Abstract [en]

This article reports on the second GENEA Challenge to benchmark data-driven automatic co-speech gesture generation. Participating teams used the same speech and motion dataset to build gesture-generation systems. Motion generated by all these systems was rendered to video using a standardised visualisation pipeline and evaluated in several large, crowdsourced user studies. Unlike when comparing different research articles, differences in results are here only due to differences between methods, enabling direct comparison between systems. The dataset was based on 18 hours of full-body motion capture, including fingers, of different persons engaging in a dyadic conversation. Ten teams participated in the challenge across two tiers: full-body and upper-body gesticulation. For each tier, we evaluated both the human-likeness of the gesture motion and its appropriateness for the specific speech signal. Our evaluations decouple human-likeness from gesture appropriateness, which has been a difficult problem in the field. The evaluation results show some synthetic gesture conditions being rated as significantly more human-like than 3D human motion capture. To the best of our knowledge, this has not been demonstrated before. On the other hand, all synthetic motion is found to be vastly less appropriate for the speech than the original motion-capture recordings. We also find that conventional objective metrics do not correlate well with subjective human-likeness ratings in this large evaluation. The one exception is the Fréchet gesture distance (FGD), which achieves a Kendall's tau rank correlation of around -0.5. Based on the challenge results we formulate numerous recommendations for system building and evaluation.
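The abstract's point about the Fréchet gesture distance (FGD) and its rank correlation with human-likeness ratings can be illustrated with a minimal sketch. The code below is not the challenge's own evaluation pipeline: it assumes gesture feature embeddings have already been extracted (in the FGD literature this is typically done with a learned motion autoencoder), and the function name, example numbers and per-system arrays are purely hypothetical.

import numpy as np
from scipy.linalg import sqrtm
from scipy.stats import kendalltau

def frechet_gesture_distance(real_feats, gen_feats):
    # Fréchet distance between Gaussians fitted to two sets of gesture
    # feature embeddings (rows = motion clips, columns = feature dimensions).
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    cov_r = np.cov(real_feats, rowvar=False)
    cov_g = np.cov(gen_feats, rowvar=False)
    covmean = sqrtm(cov_r @ cov_g)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # drop tiny imaginary parts from numerical error
    diff = mu_r - mu_g
    return float(diff @ diff + np.trace(cov_r + cov_g - 2.0 * covmean))

# Rank correlation between the objective metric and subjective ratings,
# computed over participating systems (one value per system).
# These arrays are illustrative placeholders, not challenge results.
fgd_per_system = np.array([21.3, 35.7, 18.9, 44.2, 27.5])
human_likeness_per_system = np.array([68.0, 55.0, 71.0, 46.0, 60.0])
tau, p_value = kendalltau(fgd_per_system, human_likeness_per_system)
print(f"Kendall's tau: {tau:.2f} (p = {p_value:.3f})")

A tau of around -0.5, as reported in the abstract, would indicate a moderate tendency for systems with lower FGD to receive higher human-likeness ratings.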

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024. Vol. 43, no. 3, article id 32
Keywords [en]
Animation, gesture generation, embodied conversational agents, evaluation paradigms
National Category
Human Computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-352263
DOI: 10.1145/3656374
ISI: 001265558400008
Scopus ID: 2-s2.0-85192703805
OAI: oai:DiVA.org:kth-352263
DiVA, id: diva2:1892730
Note

QC 20240827

Available from: 2024-08-27 Created: 2024-08-27 Last updated: 2024-09-06. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Kucherenko, Taras; Henter, Gustav Eje

Search in DiVA

By author/editor
Kucherenko, Taras; Henter, Gustav Eje
By organisation
Speech, Music and Hearing, TMH
In the same journal
ACM Transactions on Graphics
Human Computer Interaction

