Multimodal analysis of the predictability of hand-gesture properties
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-9838-8848
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-9653-6699
University of California, Davis, Davis, CA, USA. ORCID iD: 0000-0003-0226-2808
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-5750-9655
2022 (English). In: AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, ACM Press, 2022, p. 770-779. Conference paper, Published paper (Refereed)
Abstract [en]

Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/or audio using contemporary deep learning. In extensive experiments, we show that gesture properties related to gesture meaning (semantics and category) are predictable from text features (time-aligned FastText embeddings) alone, but not from prosodic audio features, whereas rhythm-related gesture properties (phase) can be predicted better from audio features than from text. These results are encouraging, as they indicate that it is possible to equip an embodied agent with content-wise meaningful co-speech gestures using a machine-learning model.

Place, publisher, year, edition, pages
ACM Press, 2022. p. 770-779
Keywords [en]
embodied conversational agents, gesture generation, gesture analysis, gesture property
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science; Human-computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-312470
DOI: 10.5555/3535850.3535937
Scopus ID: 2-s2.0-85134341889
OAI: oai:DiVA.org:kth-312470
DiVA, id: diva2:1659101
Conference
21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, Auckland, New Zealand, May 9-13, 2022
Funder
Swedish Foundation for Strategic Research; Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation
Note

Part of proceedings ISBN: 9781450392136

QC 20220621

Available from: 2022-05-19. Created: 2022-05-19. Last updated: 2023-04-26. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text; Scopus; https://dl.acm.org/doi/abs/10.5555/3535850.3535937

Authority records

Kucherenko, Taras; Nagy, Rajmund; Kjellström, Hedvig; Henter, Gustav Eje

Search in DiVA

By author/editor
Kucherenko, Taras; Nagy, Rajmund; Neff, Michael; Kjellström, Hedvig; Henter, Gustav Eje
By organisation
Robotics, Perception and Learning, RPL; Speech, Music and Hearing, TMH
Computer Sciences; Human Computer Interaction
