Publications (9 of 9)
Mehta, S., Deichler, A., O'Regan, J., Moëll, B., Beskow, J., Henter, G. E. & Alexanderson, S. (2024). Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. Paper presented at IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 1952-1964).
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
2024 (English) In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2024, p. 1952-1964. Conference paper, Published paper (Refereed)
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-355103 (URN)
Conference
IEEE/CVF Conference on Computer Vision and Pattern Recognition
Projects
bodytalk
Note

QC 20241022

Available from: 2024-10-22 Created: 2024-10-22 Last updated: 2024-10-22. Bibliographically approved
Werner, A. W., Beskow, J. & Deichler, A. (2024). Gesture Evaluation in Virtual Reality. In: ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction. Paper presented at 26th International Conference on Multimodal Interaction, ICMI Companion 2024, San Jose, Costa Rica, Nov 4 2024 - Nov 8 2024 (pp. 156-164). Association for Computing Machinery (ACM).
Gesture Evaluation in Virtual Reality
2024 (English) In: ICMI Companion 2024 - Companion Publication of the 26th International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2024, p. 156-164. Conference paper, Published paper (Refereed)
Abstract [en]

Gestures play a crucial role in human communication, enhancing interpersonal interactions through non-verbal expression. Emerging technology allows virtual avatars to use AI-generated gestures to enhance their life-likeness and communication quality. Traditionally, evaluations of AI-generated gestures have been confined to 2D settings. However, Virtual Reality (VR) offers an immersive alternative with the potential to affect the perception of virtual gestures. This paper introduces a novel evaluation approach for computer-generated gestures, investigating the impact of a fully immersive environment compared to a traditional 2D setting, with the goal of identifying the differences, benefits, and drawbacks of the two alternatives. The study also investigates three gesture generation algorithms submitted to the 2023 GENEA Challenge and evaluates their performance in the two settings. Experiments showed that the VR setting affects the rating of generated gestures: participants tended to rate gestures observed in VR slightly higher on average than in 2D. The results also showed that the generation models were ranked consistently across settings. However, the setting had a limited impact on the models' ratings and a bigger impact on the perception of 'true movement', which was rated higher in VR than in 2D.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
dyadic interaction, embodied conversational agents, evaluation paradigms, gesture generation, virtual reality
National Category
Human Computer Interaction; General Language Studies and Linguistics; Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-357898 (URN), 10.1145/3686215.3688821 (DOI), 001429038200033 (ISI), 2-s2.0-85211184881 (Scopus ID)
Conference
26th International Conference on Multimodal Interaction, ICMI Companion 2024, San Jose, Costa Rica, Nov 4 2024 - Nov 8 2024
Note

Part of ISBN 9798400704635

QC 20250114

Available from: 2024-12-19 Created: 2024-12-19 Last updated: 2025-03-24. Bibliographically approved
Deichler, A., Alexanderson, S. & Beskow, J. (2024). Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents. In: Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, IVA 2024. Paper presented at 24th ACM International Conference on Intelligent Virtual Agents, IVA 2024, co-located with the Affective Computing and Intelligent Interaction 2024 Conference, ACII 2024, Glasgow, United Kingdom of Great Britain and Northern Ireland, September 16-19, 2024. Association for Computing Machinery (ACM), Article ID 42.
Incorporating Spatial Awareness in Data-Driven Gesture Generation for Virtual Agents
2024 (English) In: Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, IVA 2024, Association for Computing Machinery (ACM), 2024, article id 42. Conference paper, Published paper (Refereed)
Abstract [en]

This paper focuses on enhancing human-agent communication by integrating spatial context into virtual agents’ non-verbal behaviors, specifically gestures. Recent advances in co-speech gesture generation have primarily utilized data-driven methods, which create natural motion but limit the scope of gestures to those performed in a void. Our work aims to extend these methods by enabling generative models to incorporate scene information into speech-driven gesture synthesis. We introduce a novel synthetic gesture dataset tailored for this purpose. This development represents a critical step toward creating embodied conversational agents that interact more naturally with their environment and users.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Co-speech gesture, Deictic gestures, Gesture generation, Situated virtual agents, Synthetic data
National Category
Human Computer Interaction; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-359256 (URN), 10.1145/3652988.3673936 (DOI), 001441957400042 (ISI), 2-s2.0-85215524347 (Scopus ID)
Conference
24th ACM International Conference on Intelligent Virtual Agents, IVA 2024, co-located with the Affective Computing and Intelligent Interaction 2024 Conference, ACII 2024, Glasgow, United Kingdom of Great Britain and Northern Ireland, September 16-19, 2024
Note

Part of ISBN 9798400706257

QC 20250203

Available from: 2025-01-29 Created: 2025-01-29 Last updated: 2025-04-30. Bibliographically approved
Deichler, A., Mehta, S., Alexanderson, S. & Beskow, J. (2023). Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation. In: Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023. Paper presented at 25th International Conference on Multimodal Interaction (ICMI), Oct 09-13, 2023, Sorbonne University, Paris, France (pp. 755-762). Association for Computing Machinery (ACM).
Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation
2023 (English) In: Proceedings of the 25th International Conference on Multimodal Interaction, ICMI 2023, Association for Computing Machinery (ACM), 2023, p. 755-762. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim of learning a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically aware co-speech gesture generation. Our entry achieved the highest human-likeness and highest speech-appropriateness ratings among the submitted entries. This indicates that our system is a promising approach for achieving human-like co-speech gestures that carry semantic meaning.
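
The contrastive speech and motion pretraining (CSMP) described above learns a joint embedding that ties time-aligned speech and gesture windows together. Purely as an illustration of how such a contrastive objective can be set up (not the authors' implementation), the sketch below shows a CLIP-style symmetric loss over a batch of paired speech and motion embeddings; the encoders, batch pairing, and temperature value are assumptions.

```python
# Hypothetical CLIP-style contrastive loss over paired speech/motion windows.
# Encoder outputs, batch pairing and the temperature are illustrative only.
import torch
import torch.nn.functional as F

def contrastive_speech_motion_loss(speech_emb, motion_emb, temperature=0.07):
    """speech_emb, motion_emb: (batch, dim) embeddings of time-aligned windows."""
    speech_emb = F.normalize(speech_emb, dim=-1)
    motion_emb = F.normalize(motion_emb, dim=-1)
    logits = speech_emb @ motion_emb.t() / temperature  # pairwise similarities
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: the matched speech/motion pair is the positive.
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))
```

The resulting joint embedding could then serve as the conditioning signal for the diffusion model, as the abstract describes.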

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
gesture generation, motion synthesis, diffusion models, contrastive pre-training, semantic gestures
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-343773 (URN), 10.1145/3577190.3616117 (DOI), 001147764700093 (ISI), 2-s2.0-85170496681 (Scopus ID)
Conference
25th International Conference on Multimodal Interaction (ICMI), Oct 09-13, 2023, Sorbonne University, Paris, France
Note

Part of proceedings ISBN 979-8-4007-0055-2

QC 20240222

Available from: 2024-02-22 Created: 2024-02-22 Last updated: 2025-02-07. Bibliographically approved
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2023). Learning to generate pointing gestures in situated embodied conversational agents. Frontiers in Robotics and AI, 10, Article ID 1110534.
Learning to generate pointing gestures in situated embodied conversational agents
2023 (English) In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 10, article id 1110534. Article in journal (Refereed), Published
Abstract [en]

One of the main goals of robotics and intelligent-agent research is to enable robots and agents to communicate with humans in physically situated settings. Human communication consists of both verbal and non-verbal modes. Recent studies on enabling communication for intelligent agents have focused on verbal modes, i.e., language and speech. However, in a situated setting the non-verbal mode is crucial for an agent to adopt flexible communication strategies. In this work, we focus on learning to generate non-verbal communicative expressions in situated embodied interactive agents. Specifically, we show that an agent can learn pointing gestures in a physically simulated environment through a combination of imitation and reinforcement learning, achieving high motion naturalness and high referential accuracy. We compared our proposed system against several baselines in both subjective and objective evaluations. The subjective evaluation was done in a virtual reality setting where an embodied referential game is played between the user and the agent in a shared 3D space, a setup that fully assesses the communicative capabilities of the generated gestures. The evaluations show that our model achieves a higher level of referential accuracy and motion naturalness than a state-of-the-art supervised-learning motion synthesis model, showing the promise of combining imitation and reinforcement learning for generating communicative gestures. Additionally, our system is robust in a physically simulated environment and thus has the potential to be applied to robots.
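
To make the combination of imitation and reinforcement learning concrete, the sketch below folds a pose-tracking (imitation) term and a pointing-accuracy (task) term into a single per-frame reward. The weights, exponential scales, and the use of a single pointing direction vector are illustrative assumptions, not the reward function used in the article.

```python
# Hypothetical per-frame reward combining imitation and referential accuracy.
import numpy as np

def pointing_reward(agent_pose, ref_pose, fingertip, pointing_dir, target,
                    w_imitation=0.5, w_task=0.5):
    # Imitation term: how closely the simulated pose tracks the mocap reference.
    pose_err = np.linalg.norm(agent_pose - ref_pose)
    r_imitation = np.exp(-2.0 * pose_err)

    # Task term: angular error between the pointing direction and the
    # direction from the fingertip to the referent target.
    to_target = target - fingertip
    to_target = to_target / np.linalg.norm(to_target)
    pointing_dir = pointing_dir / np.linalg.norm(pointing_dir)
    angle_err = np.arccos(np.clip(pointing_dir @ to_target, -1.0, 1.0))
    r_task = np.exp(-4.0 * angle_err)

    return w_imitation * r_imitation + w_task * r_task
```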

Place, publisher, year, edition, pages
Frontiers Media SA, 2023
Keywords
reinforcement learning, imitation learning, non-verbal communication, embodied interactive agents, gesture generation, physics-aware machine learning
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-326625 (URN), 10.3389/frobt.2023.1110534 (DOI), 000970385800001 (ISI), 37064574 (PubMedID), 2-s2.0-85153351800 (Scopus ID)
Note

QC 20230508

Available from: 2023-05-08 Created: 2023-05-08 Last updated: 2023-05-08. Bibliographically approved
Torre, I., Deichler, A., Nicholson, M., McDonnell, R. & Harte, N. (2022). To smile or not to smile: The effect of mismatched emotional expressions in a Human-Robot cooperative task. In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022). Paper presented at 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, Aug 29-Sep 02, 2022, Napoli, Italy (pp. 8-13). Institute of Electrical and Electronics Engineers (IEEE).
To smile or not to smile: The effect of mismatched emotional expressions in a Human-Robot cooperative task
2022 (English) In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 8-13. Conference paper, Published paper (Refereed)
Abstract [en]

Emotional expressivity is essential for successful Human-Robot Interaction. However, robots often have different levels of expressivity in their face and voice. Here we ask whether this modality mismatch influences human behaviour and perception of the robot. Participants played a cooperative task with a robot that displayed matched and mismatched smiling expressions in the face and voice. Emotional expressivity did not influence acceptance of the robot's recommendations or subjective evaluations of the robot. However, we found that the robot had a higher overall social influence than a virtual character and was evaluated more positively.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-322303 (URN), 10.1109/RO-MAN53752.2022.9900592 (DOI), 000885903300002 (ISI), 2-s2.0-85140789193 (Scopus ID)
Conference
31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, Aug 29-Sep 02, 2022, Napoli, Italy
Note

Part of proceedings: ISBN 978-1-7281-8859-1

QC 20221212

Available from: 2022-12-12 Created: 2022-12-12 Last updated: 2022-12-15. Bibliographically approved
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2022). Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation. Paper presented at Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at the 2022 ACM/IEEE International Conference on Human-Robot Interaction.
Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation
2022 (English) Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Pointing is an important mode of interaction with robots. While a large body of prior work focuses on the recognition of human pointing, there is a lack of investigation into generating context-aware, human-like pointing gestures, a shortcoming we hope to address. We first collect a rich dataset of human pointing gestures and corresponding pointing target locations with accurate motion capture. Analysis of the dataset shows that it contains various pointing styles, handedness, and well-distributed target positions in the surrounding 3D space, in both a single-target pointing scenario and a two-target point-and-place scenario. We then train reinforcement learning (RL) control policies in physically realistic simulation to imitate the pointing motion in the dataset while maximizing a pointing-precision reward. We show that our RL motion imitation setup allows models to learn human-like pointing dynamics while maximizing task reward (pointing precision). This is promising for incorporating additional context in the form of task rewards to enable flexible, context-aware pointing behaviours in a physically realistic environment while retaining human-likeness in pointing motion dynamics.
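
For illustration only, one captured trial in a dataset like the one described above could be stored roughly as the record below; the field names and shapes are hypothetical and do not reflect the actual dataset schema.

```python
# Hypothetical layout of one motion-capture pointing trial (illustrative only).
from dataclasses import dataclass
import numpy as np

@dataclass
class PointingTrial:
    joint_positions: np.ndarray  # (frames, joints, 3) motion-capture poses
    handedness: str              # "left" or "right"
    targets: np.ndarray          # (num_targets, 3): one target, or two for point-and-place
    scenario: str                # "single_target" or "point_and_place"
```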

Keywords
motion generation, reinforcement learning, referring actions, pointing gestures, human-robot interaction, motion capture
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-313480 (URN)
Conference
Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction
Note

QC 20220607

Available from: 2022-06-03 Created: 2022-06-03 Last updated: 2022-06-25. Bibliographically approved
Jonell, P., Deichler, A., Torre, I., Leite, I. & Beskow, J. (2021). Mechanical Chameleons: Evaluating the effects of a social robot’s non-verbal behavior on social influence. In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021. Paper presented at Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021.
Mechanical Chameleons: Evaluating the effects of a social robot’s non-verbal behavior on social influence
2021 (English) In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a pilot study that investigates how non-verbal behavior affects social influence in social robots. We also present a modular system capable of controlling the non-verbal behavior based on the interlocutor's facial gestures (head movements and facial expressions) in real time, and a study investigating whether three different strategies for facial gestures ("still", "natural movement", i.e. movements recorded from another conversation, and "copy", i.e. mimicking the user with a four-second delay) have any effect on social influence and decision making in a "survival task". Our preliminary results show no significant difference between the three conditions, but this might be due to, among other things, the low number of study participants (12).
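
The "copy" strategy above amounts to replaying the interlocutor's facial parameters after a fixed delay. A minimal sketch of such a delayed-mimicry buffer is shown below, assuming a fixed frame rate and a neutral fallback pose; the class and parameter names are illustrative and not taken from the paper's system.

```python
# Hypothetical delayed-mimicry buffer for the "copy" condition.
from collections import deque

class DelayedMimicry:
    def __init__(self, delay_seconds=4.0, fps=30):
        # +1 so the oldest stored frame is exactly delay_seconds old.
        self.buffer = deque(maxlen=int(delay_seconds * fps) + 1)

    def step(self, user_face_params, neutral_pose):
        """Store the current observation and return what the robot should
        display this frame: neutral until enough history has accumulated,
        then the observation from delay_seconds ago."""
        self.buffer.append(user_face_params)
        if len(self.buffer) < self.buffer.maxlen:
            return neutral_pose
        return self.buffer[0]
```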

National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309464 (URN)
Conference
Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Swedish Research Council, 2018-05409
Note

QC 20220308

Available from: 2022-03-03 Created: 2022-03-03 Last updated: 2022-06-25. Bibliographically approved
Chhatre, K., Deichler, A., Peters, C. & Beskow, J. (2021). Spatio-temporal priors in 3D human motion. In: IEEE ICDL Workshop on Spatio-temporal Aspects of Embodied Predictive Processing. Paper presented at IEEE ICDL Workshop on Spatio-temporal Aspects of Embodied Predictive Processing, Online, 22 Aug 2021.
Spatio-temporal priors in 3D human motion
2021 (English) In: IEEE ICDL Workshop on Spatio-temporal Aspects of Embodied Predictive Processing, 2021. Conference paper, Oral presentation only (Refereed)
Abstract [en]

When we practice a movement, the human brain creates a motor memory of it. These memories are formed and stored in the brain as representations that allow us to perform familiar tasks faster than new movements. From a developmental robotics and embodied artificial agent perspective, it could also be beneficial to exploit these motor representations in the form of spatio-temporal motion priors for complex, full-body motion synthesis. Encoding such priors in neural networks as inductive biases captures the essential spatio-temporal structure of human motion. In our current work, we examine and compare recent approaches to capturing spatial and temporal dependencies in the machine learning algorithms used to model human motion.
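
As a minimal illustration of the kind of spatio-temporal inductive bias compared in this work, the sketch below pairs a graph convolution over the skeleton (spatial prior) with a 1-D convolution over time (temporal prior). The adjacency matrix, layer sizes, and kernel length are assumptions for illustration, not any specific model from the comparison.

```python
# Illustrative spatio-temporal block: skeleton graph convolution + temporal conv.
import torch
import torch.nn as nn

class SpatioTemporalBlock(nn.Module):
    def __init__(self, adjacency, in_ch, out_ch, kernel_t=9):
        super().__init__()
        self.register_buffer("A", adjacency)     # (joints, joints), normalised adjacency
        self.spatial = nn.Linear(in_ch, out_ch)   # shared per-joint feature transform
        self.temporal = nn.Conv1d(out_ch, out_ch, kernel_t, padding=kernel_t // 2)

    def forward(self, x):
        # x: (batch, time, joints, channels)
        x = torch.einsum("vw,btwc->btvc", self.A, self.spatial(x))     # mix neighbouring joints
        b, t, v, c = x.shape
        x = self.temporal(x.permute(0, 2, 3, 1).reshape(b * v, c, t))  # convolve over time
        return x.reshape(b, v, c, t).permute(0, 3, 1, 2)               # back to (batch, time, joints, channels)
```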

National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-354050 (URN), 10.13140/RG.2.2.28042.80327 (DOI)
Conference
IEEE ICDL Workshop on Spatio-temporal Aspects of Embodied Predictive Processing, Online, 22 Aug 2021
Note

QC 20240930

Available from: 2024-09-26 Created: 2024-09-26 Last updated: 2024-09-30. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3135-5683
