Speech2Properties2Gestures: Gesture-Property Prediction as a Tool for Generating Representational Gestures from Speech
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-9838-8848
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-3687-6189
2021 (English). In: IVA '21: Proceedings of the 21st ACM International Conference on Intelligent Virtual Agents, Association for Computing Machinery (ACM), 2021, pp. 145-147. Conference paper, published paper (Refereed)
Abstract [en]

We propose a new framework for gesture generation, aiming to allow data-driven approaches to produce more semantically rich gestures. Our approach first predicts whether to gesture, followed by a prediction of the gesture properties. Those properties are then used as conditioning for a modern probabilistic gesture-generation model capable of high-quality output. This empowers the approach to generate gestures that are both diverse and representational. Follow-ups and more information can be found on the project page: https://svito-zar.github.io/speech2properties2gestures
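
To make the two-stage pipeline in the abstract concrete, here is a minimal sketch of the first stage: a network that predicts, per time frame, whether to gesture at all and which properties the gesture should have, whose outputs would then condition a downstream probabilistic motion model. The PyTorch framing, module names, and feature dimensions are all illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class GesturePropertyPredictor(nn.Module):
    """Stage 1 (hypothetical): from speech features, predict (a) whether
    to gesture and (b) a vector of gesture properties."""
    def __init__(self, speech_dim: int = 128, num_properties: int = 10):
        super().__init__()
        self.encoder = nn.GRU(speech_dim, 64, batch_first=True)
        self.gesture_head = nn.Linear(64, 1)                 # binary: gesture or not
        self.property_head = nn.Linear(64, num_properties)   # multi-label properties

    def forward(self, speech: torch.Tensor):
        hidden, _ = self.encoder(speech)                     # (batch, time, 64)
        p_gesture = torch.sigmoid(self.gesture_head(hidden))
        p_props = torch.sigmoid(self.property_head(hidden))
        return p_gesture, p_props

# Stage 2 (not shown): the predicted properties are concatenated with the
# speech features and passed as conditioning to a probabilistic motion
# model, which then samples a pose sequence, e.g. (hypothetical call):
#   motion = generative_model.sample(torch.cat([speech, p_props], dim=-1))
```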

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021. pp. 145-147
Keywords [en]
gesture generation, virtual agents, representational gestures
HSV category
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-302667
DOI: 10.1145/3472306.3478333
ISI: 000728149900023
Scopus ID: 2-s2.0-85113524837
OAI: oai:DiVA.org:kth-302667
DiVA, id: diva2:1598413
Conference
21st ACM International Conference on Intelligent Virtual Agents, IVA 2021, Virtual/Online, 14 September 2021 through 17 September 2021, University of Fukuchiyama, Fukuchiyama City, Kyoto, Japan
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20211102

Part of Proceedings: ISBN 9781450386197

Available from: 2021-09-28. Created: 2021-09-28. Last updated: 2022-06-25. Bibliographically approved.
Part of thesis
1. Developing and evaluating co-speech gesture-synthesis models for embodied conversational agents
2021 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

A large part of our communication is non-verbal: humans use non-verbal behaviors to express various aspects of their state or intent. Embodied artificial agents, such as virtual avatars or robots, should likewise use non-verbal behavior for efficient and pleasant interaction. A core part of non-verbal communication is gesticulation: gestures communicate a large share of non-verbal content. For example, around 90% of spoken utterances in descriptive discourse are accompanied by gestures. Since gestures are important, generating co-speech gestures has been an essential task in the Human-Agent Interaction (HAI) and Computer Graphics communities for several decades. Evaluating gesture-generation methods has been an equally important and equally challenging part of the field's development. Consequently, this thesis contributes to both the development and the evaluation of gesture-generation models.

This thesis proposes three deep-learning-based gesture-generation models. The first model is deterministic, uses only audio, and generates only beat gestures. The second model is deterministic and uses both audio and text, aiming to generate meaningful gestures. The final model also uses both audio and text, but is probabilistic, in order to capture the stochastic character of human gesticulation. The methods have applications to both virtual agents and social robots. Individual research efforts in the field of gesture generation are difficult to compare, as there are no established benchmarks. To address this situation, my colleagues and I launched the first-ever gesture-generation challenge, which we called the GENEA Challenge. We have also investigated whether online study participants are as attentive as offline participants, and found that the two groups are equally attentive, provided that they are well paid. Finally, we developed a system that integrates co-speech gesture-generation models into a real-time interactive embodied conversational agent. This system is intended to facilitate the evaluation of modern gesture-generation models in interaction.
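
As a rough illustration of the deterministic-versus-probabilistic distinction drawn above (and not the thesis's actual architectures), the sketch below contrasts a regression model, which maps the same input to the same pose sequence every time, with a model that parameterizes a distribution over poses and samples from it. The simple Gaussian output layer is a stand-in assumption for whatever generative model one would actually use.

```python
import torch
import torch.nn as nn

class DeterministicGestureModel(nn.Module):
    """Regression: the same speech/text features always yield the same poses."""
    def __init__(self, feat_dim: int = 128, pose_dim: int = 45):
        super().__init__()
        self.net = nn.GRU(feat_dim, pose_dim, batch_first=True)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        poses, _ = self.net(feats)   # (batch, time, pose_dim)
        return poses

class ProbabilisticGestureModel(nn.Module):
    """Predicts a distribution over poses and samples from it, so repeated
    calls with the same input produce different but plausible gestures."""
    def __init__(self, feat_dim: int = 128, pose_dim: int = 45):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, 64, batch_first=True)
        self.mean = nn.Linear(64, pose_dim)
        self.log_std = nn.Linear(64, pose_dim)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        h, _ = self.encoder(feats)
        dist = torch.distributions.Normal(self.mean(h), self.log_std(h).exp())
        return dist.rsample()        # stochastic sample; captures variability
```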

To further advance the development of capable gesture-generation methods, we need to advance their evaluation, and the research in this thesis supports the interpretation that evaluation is the main bottleneck limiting the field. There are currently no comprehensive co-speech gesture datasets that are simultaneously large, high-quality, and diverse. In addition, no strong objective metrics are yet available. Creating speech-gesture datasets and developing objective metrics are highlighted as essential next steps for the field's further development.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2021. p. 47
Series
TRITA-EECS-AVL; 2021:75
Keywords
Human-agent interaction, gesture generation, social robotics, conversational agents, non-verbal behavior, deep learning, machine learning
HSV category
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-304618 (URN)
978-91-8040-058-9 (ISBN)
Public defence
2021-12-07, Kollegiesalen, Stockholm, 13:00 (English)
Opponent
Supervisor
Funder
Swedish Foundation for Strategic Research, RIT15-0107
Note

QC 20211109

Available from: 2021-11-10. Created: 2021-11-08. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text: https://dl.acm.org/doi/10.1145/3472306.3478333
Scopus

Person
Kucherenko, Taras; Nagy, Rajmund; Jonell, Patrik; Kjellström, Hedvig; Henter, Gustav Eje
