KTH Publications
Publications (7 of 7)
Yoon, Y., Kucherenko, T., Woo, J., Wolfert, P., Nagy, R. & Henter, G. E. (2023). GENEA Workshop 2023: The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents. In: ICMI 2023: Proceedings of the 25th International Conference on Multimodal Interaction. Paper presented at 25th International Conference on Multimodal Interaction, ICMI 2023, Paris, France, Oct 9 2023 - Oct 13 2023 (pp. 822-823). Association for Computing Machinery (ACM)
GENEA Workshop 2023: The 4th Workshop on Generation and Evaluation of Non-verbal Behaviour for Embodied Agents
2023 (English) In: ICMI 2023: Proceedings of the 25th International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2023, p. 822-823. Conference paper, Published paper (Refereed)
Abstract [en]

Non-verbal behavior is advantageous for embodied agents when interacting with humans. Despite many years of research on the generation of non-verbal behavior, there is no established benchmarking practice in the field. Most researchers do not compare their results to prior work, and if they do, they often do so in a manner that is not compatible with other approaches. The GENEA Workshop 2023 seeks to bring the community together to discuss the major challenges and solutions, and to identify the best ways to progress the field.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
behavior synthesis, datasets, evaluation, gesture generation
National Category
Communication Studies
Identifiers
urn:nbn:se:kth:diva-339690 (URN), 10.1145/3577190.3616856 (DOI), 001147764700105 (ISI), 2-s2.0-85175832532 (Scopus ID)
Conference
25th International Conference on Multimodal Interaction, ICMI 2023, Paris, France, Oct 9 2023 - Oct 13 2023
Note

Part of ISBN 9798400700552

QC 20231116

Available from: 2023-11-16. Created: 2023-11-16. Last updated: 2024-02-21. Bibliographically approved.
Alexanderson, S., Nagy, R., Beskow, J. & Henter, G. E. (2023). Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models. ACM Transactions on Graphics, 42(4), Article ID 44.
Listen, Denoise, Action! Audio-Driven Motion Synthesis with Diffusion Models
2023 (English) In: ACM Transactions on Graphics, ISSN 0730-0301, E-ISSN 1557-7368, Vol. 42, no 4, article id 44. Article in journal (Refereed) Published
Abstract [en]

Diffusion models have experienced a surge of interest as highly expressive yet efficiently trainable probabilistic models. We show that these models are an excellent fit for synthesising human motion that co-occurs with audio, e.g., dancing and co-speech gesticulation, since motion is complex and highly ambiguous given audio, calling for a probabilistic description. Specifically, we adapt the DiffWave architecture to model 3D pose sequences, putting Conformers in place of dilated convolutions for improved modelling power. We also demonstrate control over motion style, using classifier-free guidance to adjust the strength of the stylistic expression. Experiments on gesture and dance generation confirm that the proposed method achieves top-of-the-line motion quality, with distinctive styles whose expression can be made more or less pronounced. We also synthesise path-driven locomotion using the same model architecture. Finally, we generalise the guidance procedure to obtain product-of-expert ensembles of diffusion models and demonstrate how these may be used for, e.g., style interpolation, a contribution we believe is of independent interest.
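
The guided sampling described in the abstract can be illustrated with a short sketch. The following Python fragment is a minimal, hypothetical illustration only: the denoiser interface (a noise prediction given the noisy pose sequence, diffusion step, audio features and an optional style label), the guidance scale and the weighting scheme are assumptions, not the paper's actual implementation.

    def guided_eps(denoiser, x_t, t, audio, styles, weights, guidance_scale=2.0):
        # Unconditional noise prediction (style dropped), as in classifier-free guidance.
        eps_uncond = denoiser(x_t, t, audio, style=None)
        eps = eps_uncond.clone()
        # Add the guided difference from each style-conditioned "expert". With a single
        # style and weight 1.0 this reduces to ordinary classifier-free guidance; two
        # styles with weights 0.5 and 0.5 interpolate between their stylistic expressions.
        for style, w in zip(styles, weights):
            eps_cond = denoiser(x_t, t, audio, style=style)
            eps = eps + guidance_scale * w * (eps_cond - eps_uncond)
        return eps

Raising guidance_scale above 1.0 makes the stylistic expression more pronounced, and lowering it weakens it, matching the control over style strength mentioned in the abstract.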

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
conformers, dance, diffusion models, ensemble models, generative models, gestures, guided interpolation, locomotion, machine learning, product of experts
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-335345 (URN), 10.1145/3592458 (DOI), 001044671300010 (ISI), 2-s2.0-85166332883 (Scopus ID)
Note

QC 20230907

Available from: 2023-09-07. Created: 2023-09-07. Last updated: 2023-09-22. Bibliographically approved.
Kucherenko, T., Nagy, R., Yoon, Y., Woo, J., Nikolov, T., Tsakov, M. & Henter, G. E. (2023). The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings. In: Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023. Paper presented at 25th International Conference on Multimodal Interaction (ICMI), Oct 9-13, 2023, Sorbonne Univ, Paris, France (pp. 792-801). Association for Computing Machinery (ACM)
The GENEA Challenge 2023: A large-scale evaluation of gesture generation models in monadic and dyadic settings
2023 (English) In: Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023, Association for Computing Machinery (ACM), 2023, p. 792-801. Conference paper, Published paper (Refereed)
Abstract [en]

This paper reports on the GENEA Challenge 2023, in which participating teams built speech-driven gesture-generation systems using the same speech and motion dataset, followed by a joint evaluation. This year's challenge provided data on both sides of a dyadic interaction, allowing teams to generate full-body motion for an agent given its speech (text and audio) and the speech and motion of the interlocutor. We evaluated 12 submissions and 2 baselines together with held-out motion-capture data in several large-scale user studies. The studies focused on three aspects: 1) the human-likeness of the motion, 2) the appropriateness of the motion for the agent's own speech whilst controlling for the human-likeness of the motion, and 3) the appropriateness of the motion for the behaviour of the interlocutor in the interaction, using a setup that controls for both the human-likeness of the motion and the agent's own speech. We found a large span in human-likeness between challenge submissions, with a few systems rated close to human mocap. Appropriateness seems far from being solved, with most submissions performing in a narrow range slightly above chance, far behind natural motion. The effect of the interlocutor is even more subtle, with submitted systems at best performing barely above chance. Interestingly, a dyadic system being highly appropriate for agent speech does not necessarily imply high appropriateness for the interlocutor. Additional material is available via the project website at svito-zar.github.io/GENEAchallenge2023/.
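
As a hypothetical illustration of what "slightly above chance" means in this kind of evaluation, the Python sketch below scores a system from pairwise trials in which raters choose between motion matched to the speech and mismatched motion; the scoring function and tie handling are assumptions for illustration, not the challenge's published analysis.

    def appropriateness_above_chance(responses):
        # responses: 'matched', 'mismatched' or 'tie' for each pairwise trial.
        # Ties count as half a win, so 0.5 is chance level and 1.0 is a perfect score.
        wins = sum(1.0 if r == 'matched' else 0.5 if r == 'tie' else 0.0
                   for r in responses)
        return wins / len(responses)

    # Example: ['matched', 'tie', 'mismatched', 'matched'] gives 0.625,
    # i.e. only slightly above the 0.5 chance level.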

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
gesture generation, embodied conversational agents, evaluation paradigms, dyadic interaction, interlocutor awareness
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-343599 (URN), 10.1145/3577190.3616120 (DOI), 001147764700098 (ISI), 2-s2.0-85170511127 (Scopus ID), 9798400700552 (ISBN)
Conference
25th International Conference on Multimodal Interaction (ICMI), OCT 09-13, 2023, Sorbonne Univ, Paris, France.
Note

Part of ISBN: 979-840070055-2

QC 20240223

Available from: 2024-02-23. Created: 2024-02-23. Last updated: 2024-02-26. Bibliographically approved.
Kucherenko, T., Nagy, R., Neff, M., Kjellström, H. & Henter, G. E. (2022). Multimodal analysis of the predictability of hand-gesture properties. In: AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems: . Paper presented at 21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, Auckland, New Zealand, May 9-13, 2022 (pp. 770-779). ACM Press
Multimodal analysis of the predictability of hand-gesture properties
2022 (English) In: AAMAS '22: Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, ACM Press, 2022, p. 770-779. Conference paper, Published paper (Refereed)
Abstract [en]

Embodied conversational agents benefit from being able to accompany their speech with gestures. Although many data-driven approaches to gesture generation have been proposed in recent years, it is still unclear whether such systems can consistently generate gestures that convey meaning. We investigate which gesture properties (phase, category, and semantics) can be predicted from speech text and/or audio using contemporary deep learning. In extensive experiments, we show that gesture properties related to gesture meaning (semantics and category) are predictable from text features (time-aligned FastText embeddings) alone, but not from prosodic audio features, while rhythm-related gesture properties (phase) on the other hand can be predicted from audio features better than from text. These results are encouraging as they indicate that it is possible to equip an embodied agent with content-wise meaningful co-speech gestures using a machine-learning model.
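
To make the prediction task concrete, the sketch below shows one way such a property predictor could be set up in PyTorch. It is a hypothetical sketch: only the inputs (time-aligned text embeddings such as 300-dimensional FastText vectors, plus prosodic audio features) and the multi-label property outputs follow the abstract; the layer sizes, the recurrent encoder and the early-fusion strategy are illustrative assumptions.

    import torch
    import torch.nn as nn

    class GesturePropertyPredictor(nn.Module):
        def __init__(self, text_dim=300, audio_dim=64, hidden=128, n_properties=8):
            super().__init__()
            self.encoder = nn.GRU(text_dim + audio_dim, hidden,
                                  batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, n_properties)

        def forward(self, text_feats, audio_feats):
            # text_feats: (batch, frames, text_dim); audio_feats: (batch, frames, audio_dim)
            x = torch.cat([text_feats, audio_feats], dim=-1)  # simple early fusion
            h, _ = self.encoder(x)
            return torch.sigmoid(self.head(h))  # per-frame gesture-property probabilities

Text-only or audio-only variants, as compared in the paper, would simply drop one of the two inputs before fusion.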

Place, publisher, year, edition, pages
ACM Press, 2022
Keywords
embodied conversational agents, gesture generation, gesture analysis, gesture property
National Category
Computer Sciences; Human Computer Interaction
Research subject
Computer Science; Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-312470 (URN), 10.5555/3535850.3535937 (DOI), 2-s2.0-85134341889 (Scopus ID)
Conference
21st International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2022, Auckland, New Zealand, May 9-13, 2022
Funder
Swedish Foundation for Strategic Research; Wallenberg AI, Autonomous Systems and Software Program (WASP); Knut and Alice Wallenberg Foundation
Note

Part of proceedings ISBN: 9781450392136

QC 20220621

Available from: 2022-05-19. Created: 2022-05-19. Last updated: 2023-04-26. Bibliographically approved.
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H. & Bernardet, U. (2021). A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents. Paper presented at the 20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
A Framework for Integrating Gesture Generation Models into Interactive Conversational Agents
2021 (English) Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Embodied conversational agents (ECAs) benefit from non-verbal behavior for natural and efficient interaction with users. Gesticulation – hand and arm movements accompanying speech – is an essential part of non-verbal behavior. Gesture generation models have been developed for several decades: starting with rule-based and ending with mainly data-driven methods. To date, recent end-to-end gesture generation methods have not been evaluated in a real-time interaction with users. We present a proof-of-concept framework, which is intended to facilitate evaluation of modern gesture generation models in interaction. We demonstrate an extensible open-source framework that contains three components: 1) a 3D interactive agent; 2) a chatbot back-end; 3) a gesticulating system. Each component can be replaced, making the proposed framework applicable for investigating the effect of different gesturing models in real-time interactions with different communication modalities, chatbot backends, or different agent appearances. The code and video are available at the project page https://nagyrajmund.github.io/project/gesturebot.
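
A minimal sketch of how the three replaceable components could be wired together is given below. The class and method names are assumptions for illustration and do not reflect the actual gesturebot code; only the three-part structure (3D interactive agent, chatbot back-end, gesticulating system) follows the abstract.

    from abc import ABC, abstractmethod

    class ChatbotBackend(ABC):
        @abstractmethod
        def reply(self, user_utterance: str) -> str: ...

    class GestureGenerator(ABC):
        @abstractmethod
        def generate(self, text: str, audio: bytes) -> list: ...  # pose sequence

    class EmbodiedAgent(ABC):
        @abstractmethod
        def speak_and_gesture(self, text: str, poses: list) -> None: ...

    def interaction_step(user_utterance, chatbot, tts, gesturer, agent):
        # One turn of the interaction loop: get a reply, synthesise speech for it,
        # generate accompanying gestures, and have the embodied agent perform both.
        text = chatbot.reply(user_utterance)
        audio = tts(text)                       # any text-to-speech callable
        poses = gesturer.generate(text, audio)
        agent.speak_and_gesture(text, poses)

Because each class is only an interface, any of the three components (or the text-to-speech callable) can be swapped without touching the rest of the loop, which is the extensibility argument made in the abstract.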

Keywords
conversational embodied agents; non-verbal behavior synthesis
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304616 (URN)
Conference
20th International Conference on Autonomous Agents and Multiagent Systems (AAMAS).
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20211130

Not duplicate with DiVA 1653872

Available from: 2021-11-08. Created: 2021-11-08. Last updated: 2022-06-25. Bibliographically approved.
Nagy, R., Kucherenko, T., Moell, B., Abelho Pereira, A. T., Kjellström, H. & Bernardet, U. (2021). A framework for integrating gesture generation models into interactive conversational agents. In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS: . Paper presented at 20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021, 3 May 2021 through 7 May 2021 (pp. 1767-1769). International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS)
A framework for integrating gesture generation models into interactive conversational agents
2021 (English) In: Proceedings of the International Joint Conference on Autonomous Agents and Multiagent Systems, AAMAS, International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2021, p. 1767-1769. Conference paper, Published paper (Refereed)
Abstract [en]

Embodied conversational agents (ECAs) benefit from non-verbal behavior for natural and efficient interaction with users. Gesticulation - hand and arm movements accompanying speech - is an essential part of non-verbal behavior. Gesture generation models have been developed for several decades: starting with rule-based and ending with mainly data-driven methods. To date, recent end-to-end gesture generation methods have not been evaluated in a real-time interaction with users. We present a proof-of-concept framework, which is intended to facilitate evaluation of modern gesture generation models in interaction. We demonstrate an extensible open-source framework that contains three components: 1) a 3D interactive agent; 2) a chatbot backend; 3) a gesticulating system. Each component can be replaced, making the proposed framework applicable for investigating the effect of different gesturing models in real-time interactions with different communication modalities, chatbot backends, or different agent appearances. The code and video are available at the project page https://nagyrajmund.github.io/project/gesturebot.

Place, publisher, year, edition, pages
International Foundation for Autonomous Agents and Multiagent Systems (IFAAMAS), 2021
Keywords
Conversational embodied agents, Non-verbal behavior synthesis, Multi agent systems, Open systems, Speech, Communication modalities, Conversational agents, Data-driven methods, Efficient interaction, Embodied conversational agent, Interactive agents, Open source frameworks, Real time interactions, Autonomous agents
National Category
Human Computer Interaction; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-311130 (URN), 2-s2.0-85112311041 (Scopus ID)
Conference
20th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2021, 3 May 2021 through 7 May 2021
Note

Part of proceedings: ISBN 978-1-7138-3262-1

QC 20220425

Available from: 2022-04-25. Created: 2022-04-25. Last updated: 2023-01-17. Bibliographically approved.
Kucherenko, T., Nagy, R., Jonell, P., Neff, M., Kjellström, H. & Henter, G. E. (2021). Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech. In: IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents. Paper presented at 21st ACM International Conference on Intelligent Virtual Agents, IVA 2021, Virtual/Online, 14 September 2021 through 17 September 2021, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan (pp. 145-147). Association for Computing Machinery (ACM)
Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech
2021 (English) In: IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2021, p. 145-147. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, followed by a prediction of the gesture properties. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output. This empowers the approach to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures
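
The two-stage pipeline described above can be summarised in a short sketch. This is a hypothetical outline: the function names, the probability threshold and the motion-model interface are assumptions, not the actual Speech2Properties2Gestures code; only the staging (gesture/no-gesture decision, then property prediction, then property-conditioned generation) follows the abstract.

    def speech2properties2gestures(text_feats, audio_feats, gesture_detector,
                                   property_predictor, motion_model, threshold=0.5):
        # Stage 1: decide whether a gesture should be produced at all.
        p_gesture = gesture_detector(text_feats, audio_feats)
        if p_gesture < threshold:
            return None  # no gesture: the agent stays in a rest pose
        # Stage 2: predict gesture properties (e.g. category, semantics).
        properties = property_predictor(text_feats, audio_feats)
        # Stage 3: sample motion from a probabilistic model conditioned on the properties.
        return motion_model.sample(audio_feats, condition=properties)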

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
gesture generation, virtual agents, representational gestures
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-302667 (URN), 10.1145/3472306.3478333 (DOI), 000728149900023 (ISI), 2-s2.0-85113524837 (Scopus ID)
Conference
21st ACM International Conference on Intelligent Virtual Agents, IVA 2021, Virtual/Online, 14 September 2021 through 17 September 2021, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20211102

Part of Proceedings: ISBN 9781450386197

Available from: 2021-09-28. Created: 2021-09-28. Last updated: 2022-06-25. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-9653-6699
