Publications (5 of 5)
Deichler, A., Mehta, S., Alexanderson, S. & Beskow, J. (2023). Diffusion-Based Co-Speech Gesture Generation Using Joint Text and Audio Representation. In: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023. Paper presented at 25th International Conference on Multimodal Interaction (ICMI), OCT 09-13, 2023, Sorbonne Univ, Paris, FRANCE (pp. 755-762). Association for Computing Machinery (ACM)
2023 (English). In: PROCEEDINGS OF THE 25TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERACTION, ICMI 2023, Association for Computing Machinery (ACM), 2023, p. 755-762. Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a system developed for the GENEA (Generation and Evaluation of Non-verbal Behaviour for Embodied Agents) Challenge 2023. Our solution builds on an existing diffusion-based motion synthesis model. We propose a contrastive speech and motion pretraining (CSMP) module, which learns a joint embedding for speech and gesture with the aim of learning a semantic coupling between these modalities. The output of the CSMP module is used as a conditioning signal in the diffusion-based gesture synthesis model in order to achieve semantically-aware co-speech gesture generation. Our entry achieved the highest human-likeness and the highest speech appropriateness ratings among the submitted entries. This indicates that our system is a promising approach to achieving human-like co-speech gestures that carry semantic meaning in agents.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
gesture generation, motion synthesis, diffusion models, contrastive pre-training, semantic gestures
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-343773 (URN); 10.1145/3577190.3616117 (DOI); 001147764700093 (); 2-s2.0-85170496681 (Scopus ID)
Conference
25th International Conference on Multimodal Interaction (ICMI), OCT 09-13, 2023, Sorbonne Univ, Paris, FRANCE
Note

Part of proceedings ISBN 979-8-4007-0055-2

QC 20240222

Available from: 2024-02-22. Created: 2024-02-22. Last updated: 2024-03-05. Bibliographically approved
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2023). Learning to generate pointing gestures in situated embodied conversational agents. Frontiers in Robotics and AI, 10, Article ID 1110534.
2023 (English). In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 10, article id 1110534. Article in journal (Refereed), Published
Abstract [en]

One of the main goals of robotics and intelligent agent research is to enable robots and agents to communicate with humans in physically situated settings. Human communication consists of both verbal and non-verbal modes. Recent studies in enabling communication for intelligent agents have focused on verbal modes, i.e., language and speech. However, in a situated setting the non-verbal mode is crucial for an agent to adopt flexible communication strategies. In this work, we focus on learning to generate non-verbal communicative expressions in situated embodied interactive agents. Specifically, we show that an agent can learn pointing gestures in a physically simulated environment through a combination of imitation and reinforcement learning that achieves high motion naturalness and high referential accuracy. We compared our proposed system against several baselines in both subjective and objective evaluations. The subjective evaluation is done in a virtual reality setting where an embodied referential game is played between the user and the agent in a shared 3D space, a setup that fully assesses the communicative capabilities of the generated gestures. The evaluations show that our model achieves a higher level of referential accuracy and motion naturalness compared to a state-of-the-art supervised learning motion synthesis model, showing the promise of our proposed system that combines imitation and reinforcement learning for generating communicative gestures. Additionally, our system is robust in a physically simulated environment and thus has the potential to be applied to robots.

Place, publisher, year, edition, pages
Frontiers Media SA, 2023
Keywords
reinforcement learning, imitation learning, non-verbal communication, embodied interactive agents, gesture generation, physics-aware machine learning
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-326625 (URN); 10.3389/frobt.2023.1110534 (DOI); 000970385800001 (); 37064574 (PubMedID); 2-s2.0-85153351800 (Scopus ID)
Note

QC 20230508

Available from: 2023-05-08. Created: 2023-05-08. Last updated: 2023-05-08. Bibliographically approved
Torre, I., Deichler, A., Nicholson, M., McDonnell, R. & Harte, N. (2022). To smile or not to smile: The effect of mismatched emotional expressions in a Human-Robot cooperative task. In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022). Paper presented at 31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, AUG 29-SEP 02, 2022, Napoli, ITALY (pp. 8-13). Institute of Electrical and Electronics Engineers (IEEE)
2022 (English). In: 2022 31st IEEE International Conference on Robot and Human Interactive Communication (IEEE RO-MAN 2022), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 8-13. Conference paper, Published paper (Refereed)
Abstract [en]

Emotional expressivity is essential for successful Human-Robot Interaction. However, robots often have different levels of expressivity in their face and voice. Here we ask whether this modality mismatch influences human behaviour and perception of the robot. Participants performed a cooperative task with a robot that displayed matched and mismatched smiling expressions in the face and voice. Emotional expressivity did not influence acceptance of the robot's recommendations or subjective evaluations of the robot. However, we found that the robot had an overall higher social influence than a virtual character, and was evaluated more positively.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-322303 (URN); 10.1109/RO-MAN53752.2022.9900592 (DOI); 000885903300002 (); 2-s2.0-85140789193 (Scopus ID)
Conference
31st IEEE International Conference on Robot and Human Interactive Communication (RO-MAN) - Social, Asocial, and Antisocial Robots, AUG 29-SEP 02, 2022, Napoli, ITALY
Note

Part of proceedings: ISBN 978-1-7281-8859-1

QC 20221212

Available from: 2022-12-12. Created: 2022-12-12. Last updated: 2022-12-15. Bibliographically approved
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2022). Towards Context-Aware Human-like Pointing Gestures with RL Motion Imitation. Paper presented at Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction (pp. 2022).
2022 (English). Conference paper, Oral presentation with published abstract (Refereed)
Abstract [en]

Pointing is an important mode of interaction with robots. While many prior studies focus on recognition of human pointing, there is a lack of investigation into generating context-aware human-like pointing gestures, a shortcoming we hope to address. We first collect a rich dataset of human pointing gestures and corresponding pointing target locations with accurate motion capture. Analysis of the dataset shows that it contains various pointing styles, handedness, and well-distributed target positions in the surrounding 3D space in both the single-target pointing scenario and the two-target point-and-place scenario. We then train reinforcement learning (RL) control policies in physically realistic simulation to imitate the pointing motion in the dataset while maximizing a pointing precision reward. We show that our RL motion imitation setup allows models to learn human-like pointing dynamics while maximizing task reward (pointing precision). This is promising for incorporating additional context in the form of task reward to enable flexible context-aware pointing behaviors in a physically realistic environment while retaining human-likeness in pointing motion dynamics.

Keywords
motion generation, reinforcement learning, referring actions, pointing gestures, human-robot interaction, motion capture
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-313480 (URN)
Conference
Context-Awareness in Human-Robot Interaction: Approaches and Challenges, workshop at 2022 ACM/IEEE International Conference on Human-Robot Interaction
Note

QC 20220607

Available from: 2022-06-03. Created: 2022-06-03. Last updated: 2022-06-25. Bibliographically approved
Jonell, P., Deichler, A., Torre, I., Leite, I. & Beskow, J. (2021). Mechanical Chameleons: Evaluating the effects of a social robot’s non-verbal behavior on social influence. In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021. Paper presented at Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021.
2021 (English). In: Proceedings of SCRITA 2021, a workshop at IEEE RO-MAN 2021, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a pilot study which investigates how non-verbal behavior affects social influence in social robots. We also present a modular system which is capable of controlling the non-verbal behavior based on the interlocutor's facial gestures (head movements and facial expressions) in real time, and a study investigating whether three different strategies for facial gestures ("still", "natural movement", i.e. movements recorded from another conversation, and "copy", i.e. mimicking the user with a four-second delay) have any effect on social influence and decision making in a "survival task". Our preliminary results show there was no significant difference between the three conditions, but this might be due to, among other things, a low number of study participants (12).

National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-309464 (URN)
Conference
Trust, Acceptance and Social Cues in Human-Robot Interaction - SCRITA, 12 August, 2021
Funder
Swedish Foundation for Strategic Research, RIT15-0107; Swedish Research Council, 2018-05409
Note

QC 20220308

Available from: 2022-03-03. Created: 2022-03-03. Last updated: 2022-06-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3135-5683