Furhat: A Back-projected Human-like Robot Head for Multiparty Human-Machine Interaction
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-1399-6604
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-8579-1790
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
2012 (English). In: Cognitive Behavioural Systems: COST 2102 International Training School, Dresden, Germany, February 21-26, 2011, Revised Selected Papers / [ed] Anna Esposito, Antonietta M. Esposito, Alessandro Vinciarelli, Rüdiger Hoffmann, Vincent C. Müller, Springer Berlin/Heidelberg, 2012, 114-130 p. Conference paper, Published paper (Refereed)
Abstract [en]

In this chapter, we first present a summary of findings from two previous studies on the limitations of using flat displays with embodied conversational agents (ECAs) in the context of face-to-face human-agent interaction. We then motivate the need for a three-dimensional display of faces to guarantee accurate delivery of gaze and directional movements, and present Furhat, a novel, simple, highly effective, and human-like back-projected robot head that utilizes computer animation to deliver facial movements, and is equipped with a pan-tilt neck. After presenting a detailed summary of why and how Furhat was built, we discuss the advantages of using optically projected animated agents for interaction. We discuss using such agents in terms of situatedness, environment, context awareness, and social, human-like face-to-face interaction with robots where subtle nonverbal and social facial signals can be communicated. At the end of the chapter, we present a recent application of Furhat as a multimodal multiparty interaction system that was presented at the London Science Museum as part of a robot festival. We conclude the chapter by discussing future developments, applications and opportunities of this technology.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2012. 114-130 p.
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 7403
Keyword [en]
Avatar, Back Projection, Dialogue System, Facial Animation, Furhat, Gaze, Gaze Perception, Mona Lisa Effect, Multimodal Interaction, Multiparty Interaction, Robot Heads, Situated Interaction, Talking Heads
National Category
Human Computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-105606
DOI: 10.1007/978-3-642-34584-5_9
Scopus ID: 2-s2.0-84870382387
ISBN: 978-364234583-8 (print)
OAI: oai:DiVA.org:kth-105606
DiVA: diva2:571521
Conference
International Training School on Cognitive Behavioural Systems, COST 2102; Dresden; 21 February 2011 through 26 February 2011
Funder
ICT - The Next Generation
Note

QC 20121123

Available from: 2012-11-22 Created: 2012-11-22 Last updated: 2013-09-02. Bibliographically approved
In thesis
1. Bringing the avatar to life: Studies and developments in facial communication for virtual agents and robots
2012 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The work presented in this thesis comes in pursuit of the ultimate goal of building spoken and embodied human-like interfaces that are able to interact with humans on human terms. Such interfaces need to employ the subtle, rich and multidimensional signals of communicative and social value that complement the stream of words – signals humans typically use when interacting with each other.

The studies presented in the thesis concern facial signals used in spoken communication, and can be divided into two connected groups. The first is targeted towards exploring and verifying models of facial signals that come in synchrony with speech and its intonation. We refer to this as visual prosody, and as part of visual prosody, we take prominence as a case study. We show that the use of prosodically relevant gestures in animated faces results in more expressive and human-like behaviour. We also show that animated faces supported with these gestures produce more intelligible speech, which in turn can aid communication, for example in noisy environments.

The other group of studies targets facial signals that complement speech. Spoken language is a relatively poor system for the communication of spatial information, since such information is visual in nature. Hence, the use of visual movements of spatial value, such as gaze and head movements, is important for efficient interaction. The use of such signals is especially important when the interaction between the human and the embodied agent is situated – that is, when they share the same physical space and this space is taken into account in the interaction.

We study the perception, the modelling, and the interaction effects of gaze and head pose in regulating situated and multiparty spoken dialogues in two conditions. The first is the typical case, where the animated face is displayed on a flat surface; in the second, it is displayed on a physical three-dimensional model of a face. The results from the studies show that projecting the animated face onto a face-shaped mask results in an accurate perception of the direction of gaze generated by the avatar, and hence allows for the use of these movements in multiparty spoken dialogue.

Driven by these findings, the Furhat back-projected robot head is developed. Furhat employs state-of-the-art facial animation that is projected onto a 3D printout of that face, and a neck to allow for head movements. Although the mask in Furhat is static, the fact that the animated face matches the design of the mask results in a physical face that is perceived to “move”.

We present studies that show how this technique renders a more intelligible, human-like and expressive face. We further present experiments in which Furhat is used as a tool to investigate properties of facial signals in situated interaction.

Furhat is built to study, implement, and verify models of situated and multiparty, multimodal human-machine spoken dialogue – a study that requires that the face is physically situated in the interaction environment rather than on a two-dimensional screen. It has also received much interest from several communities, and has been showcased at several venues, including a robot exhibition at the London Science Museum. We present an evaluation study of Furhat at the exhibition, where it interacted with several thousand people in multiparty conversation. The analysis of the data from this setup further shows that Furhat can accurately regulate multiparty interaction using gaze and head movements.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. xxvi, 96 p.
Series
Trita-CSC-A, ISSN 1653-5723 ; 2012:15
Keyword
Avatar, Speech Communication, Facial animation, Nonverbal, Social, Robot, Human-like, Face-to-face, Prosody, Pitch, Prominence, Furhat, Gaze, Head-pose, Dialogue, Interaction, Multimodal, Multiparty
National Category
Human Computer Interaction
Research subject
SRA - ICT
Identifiers
urn:nbn:se:kth:diva-105605 (URN)
978-91-7501-551-4 (ISBN)
Public defence
2012-12-07, F3, Lindstedtsvägen 26, KTH, Stockholm, 13:30 (English)
Opponent
Supervisors
Note

QC 20121123

Available from: 2012-11-23 Created: 2012-11-22 Last updated: 2012-12-10. Bibliographically approved

Open Access in DiVA

No full text

Other links

Publisher's full text; Scopus

Search in DiVA

By author/editor
Al Moubayed, Samer; Beskow, Jonas; Skantze, Gabriel; Granström, Björn
By organisation
Speech Communication and Technology
Human Computer Interaction
