Turn-taking, feedback and joint attention in situated human-robot interaction
Skantze, Gabriel (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH; ORCID iD: 0000-0002-8579-1790)
Hjalmarsson, Anna (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH; ORCID iD: 0000-0003-3585-8077)
Oertel, Catharine (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH; ORCID iD: 0000-0002-8273-0132)
2014 (English). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 65, p. 50-66. Article in journal (Refereed). Published.
Abstract [en]

In this paper, we present a study where a robot instructs a human on how to draw a route on a map. The human and robot are seated face-to-face with the map placed on the table between them. The user's and the robot's gaze can thus serve several simultaneous functions: as cues to joint attention, turn-taking, level of understanding and task progression. We have compared this face-to-face setting with a setting where the robot employs a random gaze behaviour, as well as a voice-only setting where the robot is hidden behind a paper board. In addition to this, we have also manipulated turn-taking cues such as completeness and filled pauses in the robot's speech. By analysing the participants' subjective ratings, task completion, verbal responses, gaze behaviour, and drawing activity, we show that the users indeed benefit from the robot's gaze when talking about landmarks, and that the robot's verbal and gaze behaviour has a strong effect on the users' turn-taking behaviour. We also present an analysis of the users' gaze and lexical and prosodic realisation of feedback after the robot's instructions, and show that these cues reveal whether the user has yet executed the previous instruction, as well as the user's level of uncertainty.
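The turn-taking manipulation the abstract describes (utterance completeness, filled pauses, gaze) lends itself to a simple illustration. Below is a minimal, hypothetical Python sketch of how such cues could be combined into a turn-yielding decision; the names and the rule are invented for illustration and are not the system evaluated in the paper.

from dataclasses import dataclass

@dataclass
class TurnCues:
    utterance_complete: bool    # instruction sounds syntactically/prosodically complete
    ends_in_filled_pause: bool  # robot ends with "ehm", signalling more to come
    robot_gazes_at_user: bool   # mutual gaze is a common turn-yielding cue

def user_invited_to_speak(cues: TurnCues) -> bool:
    """Toy rule: a filled pause holds the turn; otherwise completeness
    plus gaze at the user invites a response."""
    if cues.ends_in_filled_pause:
        return False
    return cues.utterance_complete and cues.robot_gazes_at_user

# Complete instruction, no filled pause, robot looks up at the user:
print(user_invited_to_speak(TurnCues(True, False, True)))  # True
# Same instruction, but the robot trails off with a filled pause:
print(user_invited_to_speak(TurnCues(True, True, True)))   # False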

Place, publisher, year, edition, pages
2014. Vol. 65, p. 50-66
Keywords [en]
Turn-taking, Feedback, Joint attention, Prosody, Gaze, Uncertainty
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:kth:diva-154366
DOI: 10.1016/j.specom.2014.05.005
ISI: 000341901700005
Scopus ID: 2-s2.0-84903625192
OAI: oai:DiVA.org:kth-154366
DiVA, id: diva2:757131
Funder
Swedish Research Council, 2011-6237, 2011-6152
EU, FP7, Seventh Framework Programme, 288667
Note

QC 20141021

Available from: 2014-10-21. Created: 2014-10-20. Last updated: 2024-03-15. Bibliographically approved.
In thesis
1. Modelling Engagement in Multi-Party Conversations: Data-Driven Approaches to Understanding Human-Human Communication Patterns for Use in Human-Robot Interactions
2016 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

The aim of this thesis is to study human-human interaction in order to provide virtual agents and robots with the capability to engage in multi-party conversations in a human-like manner. The focus lies on the modelling of conversational dynamics and the appropriate realisation of multi-modal feedback behaviour. For such an undertaking, it is important to understand how human-human communication unfolds in varying contexts and constellations over time. To this end, multi-modal human-human corpora are designed, and annotation schemes to capture conversational dynamics are developed. Multi-modal analyses are carried out and models are built. Emphasis is put not on modelling speaker behaviour in general but on modelling listener behaviour in particular.

In this thesis, a bridge is built between multi-modal modelling of conversational dynamics on the one hand and multi-modal generation of listener behaviour in virtual agents and robots on the other. In order to build this bridge, unit-selection multi-modal synthesis of feedback is carried out, as well as statistical speech synthesis of feedback. The effect of variation in the prosody of feedback tokens on the perception of third-party observers is evaluated. Finally, the effect of a controlled variation of eye gaze is evaluated, as is the perception of user feedback in human-robot interaction.
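As a concrete illustration of the kind of listener-feedback behaviour the thesis models, here is a minimal, hypothetical sketch of a backchannel trigger driven by simple speaker features. The feature names and thresholds are invented for this example and do not come from the thesis.

def should_give_feedback(pause_ms: float, pitch_slope: float,
                         speaker_gazes_at_listener: bool) -> bool:
    """Toy rule: listener feedback tends to follow a short pause after a
    falling or flat pitch contour, especially under the speaker's gaze.
    The thresholds below are illustrative, not learned from data."""
    pause_ok = 200.0 <= pause_ms <= 800.0   # plausible but invented window
    contour_ok = pitch_slope <= 0.0         # falling or flat F0
    return pause_ok and contour_ok and speaker_gazes_at_listener

# 350 ms pause, falling pitch, speaker looks at the listener:
print(should_give_feedback(350.0, -0.4, True))   # True
# Rising pitch suggests the speaker is not done yet:
print(should_give_feedback(350.0, 0.6, True))    # False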

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016. p. 87
Series
TRITA-CSC-A, ISSN 1653-5723 ; 2017:05
National Category
Engineering and Technology
Research subject
Human-computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-198175
ISBN: 978-91-7729-237-1
Public defence
2017-01-20, F3, Lindstedtsvägen 26, Kungl Tekniska högskolan, Stockholm, 13:00 (English)
Note

QC 20161214

Available from: 2016-12-14. Created: 2016-12-13. Last updated: 2022-06-27. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Skantze, Gabriel; Hjalmarsson, Anna; Oertel, Catharine
