Publications (10 of 127)
Alexanderson, S., Henter, G. E., Kucherenko, T. & Beskow, J. (2020). Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows. Paper presented at EUROGRAPHICS 2020.
Style-Controllable Speech-Driven Gesture Synthesis Using Normalising Flows
2020 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Automatic synthesis of realistic gestures promises to transform the fields of animation, avatars and communicative agents. In off-line applications, novel tools can alter the role of an animator to that of a director, who provides only high-level input for the desired animation; a learned network then translates these instructions into an appropriate sequence of body poses. In interactive scenarios, systems for generating natural animations on the fly are key to achieving believable and relatable characters. In this paper we address some of the core issues towards these ends. By adapting a deep learning-based motion synthesis method called MoGlow, we propose a new generative model for generating state-of-the-art realistic speech-driven gesticulation. Owing to the probabilistic nature of the approach, our model can produce a battery of different, yet plausible, gestures given the same input speech signal. Just as in human gesturing, this gives a rich natural variation of motion. We additionally demonstrate the ability to exert directorial control over the output style, such as gesture level, speed, symmetry and spatial extent. Such control can be leveraged to convey a desired character personality or mood. We achieve all this without any manual annotation of the data. User studies evaluating upper-body gesticulation confirm that the generated motions are natural and match the input speech well. Our method scores above all prior systems and baselines on these measures, and comes close to the ratings of the original recorded motions. We furthermore find that we can accurately control gesticulation styles without unnecessarily compromising perceived naturalness. Finally, we also demonstrate an application of the same method to full-body gesticulation, including the synthesis of stepping motion and stance.
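To illustrate the core mechanism behind this kind of model, the sketch below shows a single speech-conditioned affine-coupling layer of a normalising flow in PyTorch. It is a minimal illustration under assumed dimensions and layer layout, not the authors' MoGlow implementation; sampling different latent vectors for the same speech features yields different plausible poses, which is the source of the natural variation described above.

    # Minimal sketch of a speech-conditioned affine coupling layer.
    # Names, sizes and the network layout are illustrative assumptions.
    import torch
    import torch.nn as nn

    class ConditionalAffineCoupling(nn.Module):
        """Half of the pose vector is transformed with a scale/shift predicted
        from the other half plus a speech-feature context vector."""
        def __init__(self, pose_dim, cond_dim, hidden=128):
            super().__init__()
            self.half = pose_dim // 2
            self.net = nn.Sequential(
                nn.Linear(self.half + cond_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, 2 * (pose_dim - self.half)),
            )

        def forward(self, x, cond):
            # x -> z (training direction); returns z and the log-determinant.
            x_a, x_b = x[:, :self.half], x[:, self.half:]
            log_s, t = self.net(torch.cat([x_a, cond], dim=-1)).chunk(2, dim=-1)
            log_s = torch.tanh(log_s)              # keep scales well-behaved
            z_b = x_b * torch.exp(log_s) + t
            return torch.cat([x_a, z_b], dim=-1), log_s.sum(dim=-1)

        def inverse(self, z, cond):
            # z -> x (sampling direction): invert the affine transform.
            z_a, z_b = z[:, :self.half], z[:, self.half:]
            log_s, t = self.net(torch.cat([z_a, cond], dim=-1)).chunk(2, dim=-1)
            log_s = torch.tanh(log_s)
            x_b = (z_b - t) * torch.exp(-log_s)
            return torch.cat([z_a, x_b], dim=-1)

    pose_dim, cond_dim = 45, 27                    # e.g. joint angles, speech features (assumed)
    flow = ConditionalAffineCoupling(pose_dim, cond_dim)
    speech = torch.randn(8, cond_dim)              # one frame of acoustic features
    z = torch.randn(8, pose_dim)                   # latent sample
    poses = flow.inverse(z, speech)                # different z -> different plausible poses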

National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-268363 (URN)
Conference
EUROGRAPHICS 2020
Available from: 2020-02-18 Created: 2020-02-18 Last updated: 2020-02-18
Chen, C., Hensel, L. B., Duan, Y., Ince, R. A., Garrod, O. G., Beskow, J., . . . Schyns, P. G. (2019). Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods. In: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019). Paper presented at 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019) (pp. 1-8).
Equipping social robots with culturally-sensitive facial expressions of emotion using data-driven methods
2019 (English) In: 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019), 2019, p. 1-8. Conference paper, Published paper (Refereed)
National Category
Psychology
Identifiers
urn:nbn:se:kth:diva-268351 (URN)
Conference
2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019)
Available from: 2020-02-18 Created: 2020-02-18 Last updated: 2020-02-18
Székely, É., Henter, G. E., Beskow, J. & Gustafson, J. (2019). How to train your fillers: uh and um in spontaneous speech synthesis. Paper presented at The 10th ISCA Speech Synthesis Workshop.
How to train your fillers: uh and um in spontaneous speech synthesis
2019 (English) Conference paper, Published paper (Refereed)
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-261693 (URN)
Conference
The 10th ISCA Speech Synthesis Workshop
Note

QC 20191011

Available from: 2019-10-10 Created: 2019-10-10 Last updated: 2019-10-11. Bibliographically approved
Jonell, P., Kucherenko, T., Ekstedt, E. & Beskow, J. (2019). Learning Non-verbal Behavior for a Social Robot from YouTube Videos. Paper presented at ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019.
Learning Non-verbal Behavior for a Social Robot from YouTube Videos
2019 (English) Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Non-verbal behavior is crucial for positive perception of humanoid robots. If modeled well, it can improve the interaction and leave the user with a positive experience; if modeled poorly, it may impede the interaction and become a source of distraction. Most existing work on modeling non-verbal behavior shows limited variability, because the models employed are deterministic and the generated motion can be perceived as repetitive and predictable. In this paper, we present a novel method for generating a limited set of facial expressions and head movements, based on a probabilistic generative deep learning architecture called Glow. We have implemented a workflow which takes videos directly from YouTube, extracts relevant features, and trains a model that generates gestures that can be realized in a robot without any post-processing. A user study was conducted and illustrated the importance of having some kind of non-verbal behavior, while most differences between the ground truth, the proposed method, and a random control were not significant (however, the differences that were significant were in favor of the proposed method).
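As a rough illustration of the video-to-training-data workflow sketched in the abstract, the following snippet reads frames from a downloaded clip with OpenCV, resamples them to a fixed frame rate, and stacks a per-frame feature vector into a training matrix. The feature extractor here is a hypothetical placeholder (a downscaled grayscale patch) standing in for the face and head-pose features the paper extracts; the file name, feature size and frame rate are assumptions.

    # Illustrative video-to-sequence pipeline; not the paper's actual tooling.
    import cv2
    import numpy as np

    def extract_face_features(frame) -> np.ndarray:
        """Placeholder per-frame feature vector; a real pipeline would use a
        face-analysis toolkit to obtain expression and head-pose parameters."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (8, 8)).astype(np.float32) / 255.0
        return small.ravel()                     # 64-dim stand-in feature vector

    def video_to_sequence(path: str, fps_out: int = 25) -> np.ndarray:
        cap = cv2.VideoCapture(path)
        step = max(1, int(round(cap.get(cv2.CAP_PROP_FPS) / fps_out)))
        feats, i = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % step == 0:                    # resample to a fixed output rate
                feats.append(extract_face_features(frame))
            i += 1
        cap.release()
        return np.stack(feats) if feats else np.empty((0, 64))

    # seq = video_to_sequence("downloaded_clip.mp4")   # (frames, features) training matrix
    # np.save("train_sequence.npy", seq)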

Keywords
Facial expressions, non-verbal behavior, generative models, neural network, head movement, social robotics
National Category
Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-261242 (URN)
Conference
ICDL-EpiRob Workshop on Naturalistic Non-Verbal and Affective Human-Robot Interactions, Oslo, Norway, August 19, 2019
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20191007

Available from: 2019-10-03 Created: 2019-10-03 Last updated: 2019-10-07. Bibliographically approved
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2), Article ID UNSP 8.
Modeling of Human Visual Attention in Multiparty Open-World Dialogues
2019 (English) In: ACM Transactions on Human-Robot Interaction, ISSN 2573-9522, Vol. 8, no. 2, article id UNSP 8. Article in journal (Refereed) Published
Abstract [en]

This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
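A minimal sketch of the LSTM-based variant described above: a recurrent network maps a sequence of low-level interlocutor signals (speech activity, gaze direction, head orientation) to per-frame scores over candidate gaze targets. The feature layout, dimensions and target set below are illustrative assumptions, not the published model configuration.

    # Sketch of a per-frame gaze-target predictor over interlocutor signals.
    import torch
    import torch.nn as nn

    class GazeTargetLSTM(nn.Module):
        def __init__(self, feat_dim=12, hidden=64, n_targets=4):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_targets)    # scores per candidate target

        def forward(self, signals):                      # (batch, time, feat_dim)
            out, _ = self.lstm(signals)
            return self.head(out)                        # (batch, time, n_targets)

    model = GazeTargetLSTM()
    signals = torch.randn(2, 100, 12)                    # 100 frames of interlocutor features
    logits = model(signals)
    predicted_targets = logits.argmax(dim=-1)            # candidate gaze target per frame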

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2019
Keywords
Human-human interaction, open-world dialogue, eye-gaze direction, head orientation, multiparty
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-255203 (URN), 10.1145/3323231 (DOI), 000472066800003
Note

QC 20190904

Available from: 2019-09-04 Created: 2019-09-04 Last updated: 2019-10-15. Bibliographically approved
Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J. & Gustafson, J. (2019). Modern speech synthesis for phonetic sciences: A discussion and an evaluation. In: Proceedings of ICPhS. Paper presented at International Congress of Phonetic Sciences.
Modern speech synthesis for phonetic sciences: A discussion and an evaluation
2019 (English) In: Proceedings of ICPhS, 2019. Conference paper, Published paper (Refereed)
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-260956 (URN)
Conference
International Congress of Phonetic Sciences
Note

QC 20191112

Available from: 2019-09-30 Created: 2019-09-30 Last updated: 2019-11-12. Bibliographically approved
Malisz, Z., Henter, G. E., Valentini-Botinhao, C., Watts, O., Beskow, J. & Gustafson, J. (2019). Modern speech synthesis for phonetic sciences: A discussion and an evaluation. In: Proceedings of ICPhS. Paper presented at International Congress of Phonetic Sciences ICPhS 2019.
Modern speech synthesis for phonetic sciences: A discussion and an evaluation
2019 (English) In: Proceedings of ICPhS, 2019. Conference paper, Published paper (Refereed)
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-268353 (URN)
Conference
International Congress of Phonetic Sciences ICPhS 2019
Available from: 2020-02-18 Created: 2020-02-18 Last updated: 2020-02-18
Henter, G. E., Alexanderson, S. & Beskow, J. (2019). MoGlow: Probabilistic and controllable motion synthesis using normalising flows. arXiv preprint arXiv:1905.06598.
MoGlow: Probabilistic and controllable motion synthesis using normalising flows
2019 (English) In: arXiv preprint arXiv:1905.06598. Article in journal (Other academic) Published
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-268348 (URN)
Available from: 2020-02-18 Created: 2020-02-18 Last updated: 2020-02-18
Skantze, G., Gustafson, J. & Beskow, J. (2019). Multimodal Conversational Interaction with Robots. In: Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger (Eds.), The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions. ACM Press.
Multimodal Conversational Interaction with Robots
2019 (English) In: The Handbook of Multimodal-Multisensor Interfaces, Volume 3: Language Processing, Software, Commercialization, and Emerging Directions / [ed] Sharon Oviatt, Björn Schuller, Philip R. Cohen, Daniel Sonntag, Gerasimos Potamianos, Antonio Krüger, ACM Press, 2019. Chapter in book (Refereed)
Place, publisher, year, edition, pages
ACM Press, 2019
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-254650 (URN), 9781970001723 (ISBN)
Note

QC 20190821

Available from: 2019-07-02 Created: 2019-07-02 Last updated: 2019-08-21. Bibliographically approved
Székely, É., Henter, G. E., Beskow, J. & Gustafson, J. (2019). Off the cuff: Exploring extemporaneous speech delivery with TTS. Paper presented at Interspeech.
Off the cuff: Exploring extemporaneous speech delivery with TTS
2019 (English) Conference paper, Published paper (Refereed)
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-261691 (URN)
Conference
Interspeech
Note

QC 20191011

Available from: 2019-10-10 Created: 2019-10-10 Last updated: 2019-10-11. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-1399-6604
