Turn-taking Control Using Gaze in Multiparty Human-Computer Dialogue: Effects of 2D and 3D Displays
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication. ORCID iD: 0000-0002-8579-1790
2011 (English). In: Proceedings of the International Conference on Audio-Visual Speech Processing 2011, Stockholm: KTH Royal Institute of Technology, 2011, pp. 99-102. Conference paper, published paper (refereed).
Abstract [en]

In a previous experiment we found that the perception of gaze from an animated agent on a two-dimensional display suffers from the Mona Lisa effect, which means that exclusive mutual gaze cannot be established if there is more than one observer. By using a three-dimensional projection surface, this effect can be eliminated. In this study, we investigate whether this difference also holds for the turn-taking behaviour of subjects interacting with the animated agent in a multi-party dialogue. We present a Wizard-of-Oz experiment where five subjects talk to an animated agent in a route direction dialogue. The results show that the subjects can to some extent infer the intended target of the agent's questions, in spite of the Mona Lisa effect, but that the accuracy of gaze when it comes to selecting an addressee is still significantly lower in the 2D condition, as compared to the 3D condition. The response time is also significantly longer in the 2D condition, indicating that the inference of intended gaze may require additional cognitive effort.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2011. pp. 99-102
Series
Proceedings of the International Conference on Audio-Visual Speech Processing, ISSN 1680-8908 ; 2011
Keywords [en]
Turn-taking, Multi-party Dialogue, Gaze, Facial Interaction, Mona Lisa Effect, Facial Projection, Wizard of Oz
National subject category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52205
ISBN: 978-91-7501-080-9 (print)
ISBN: 978-91-7501-079-3 (print)
OAI: oai:DiVA.org:kth-52205
DiVA, id: diva2:465503
Conference
International Conference on Audio-Visual Speech Processing 2011, August 31 - September 3, 2011, Volterra, Italy
Note
tmh_import_11_12_14. QC 20111222. Available from: 2011-12-14 Created: 2011-12-14 Last updated: 2018-01-12. Bibliographically approved.
Part of thesis
1. Bringing the avatar to life: Studies and developments in facial communication for virtual agents and robots
2012 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The work presented in this thesis pursues the ultimate goal of building spoken and embodied human-like interfaces that are able to interact with humans on human terms. Such interfaces need to employ the subtle, rich and multidimensional signals of communicative and social value that complement the stream of words – signals humans typically use when interacting with each other.

The studies presented in the thesis concern facial signals used in spoken communication, and can be divided into two connected groups. The first is targeted towards exploring and verifying models of facial signals that come in synchrony with speech and its intonation. We refer to this as visual-prosody, and as part of visual-prosody, we take prominence as a case study. We show that the use of prosodically relevant gestures in animated faces results in a more expressive and human-like behaviour. We also show that animated faces supported with these gestures result in more intelligible speech which in turn can be used to aid communication, for example in noisy environments.

The other group of studies targets facial signals that complement speech. Spoken language is a relatively poor system for the communication of spatial information, since such information is visual in nature. Hence, the use of visual movements of spatial value, such as gaze and head movements, is important for efficient interaction. The use of such signals is especially important when the interaction between the human and the embodied agent is situated – that is, when they share the same physical space and this space is taken into account in the interaction.

We study the perception, the modelling, and the interaction effects of gaze and head pose in regulating situated and multiparty spoken dialogues under two conditions. In the first, typical case, the animated face is displayed on a flat surface; in the second, it is displayed on a physical three-dimensional model of a face. The results from the studies show that projecting the animated face onto a face-shaped mask results in accurate perception of the direction of gaze generated by the avatar, and hence allows for the use of these movements in multiparty spoken dialogue.

Driven by these findings, the Furhat back-projected robot head is developed. Furhat employs state-of-the-art facial animation that is projected on a 3D printout of that face, and a neck to allow for head movements. Although the mask in Furhat is static, the fact that the animated face matches the design of the mask results in a physical face that is perceived to “move”.

We present studies that show how this technique renders a more intelligible, human-like and expressive face. We further present experiments in which Furhat is used as a tool to investigate properties of facial signals in situated interaction.

Furhat is built to study, implement, and verify models of situated, multiparty, multimodal human-machine spoken dialogue – a line of study that requires the face to be physically situated in the interaction environment rather than on a two-dimensional screen. Furhat has also received much interest from several communities and has been showcased at several venues, including a robot exhibition at the London Science Museum. We present an evaluation study of Furhat at the exhibition, where it interacted with several thousand people in multiparty conversation. The analysis of data from this setup further shows that Furhat can accurately regulate multiparty interaction using gaze and head movements.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. pp. xxvi, 96
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:15
Keywords
Avatar, Speech Communication, Facial animation, Nonverbal, Social, Robot, Human-like, Face-to-face, Prosody, Pitch, Prominence, Furhat, Gaze, Head-pose, Dialogue, Interaction, Multimodal, Multiparty
National subject category
Human-Computer Interaction (Interaction Design)
Research subject
SRA - Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-105605
ISBN: 978-91-7501-551-4
Public defence
2012-12-07, F3, Lindstedtsvägen 26, KTH, Stockholm, 13:30 (English)
Note

QC 20121123

Available from: 2012-11-23 Created: 2012-11-22 Last updated: 2018-01-12. Bibliographically approved.

Open Access in DiVA

Full text not available in DiVA

Other links

http://www.speech.kth.se/prod/publications/files/3575.pdf

Search further in DiVA
By the author/editor
Al Moubayed, Samer; Skantze, Gabriel
By the organisation
Speech Communication
Computer Sciences; Language Technology (Computational Linguistics)
