Turn-taking in Conversational Systems and Human-Robot Interaction: A Review
Skantze, Gabriel (KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH). ORCID iD: 0000-0002-8579-1790
2021 (English). In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 67, article id 101178. Article, review/survey (Refereed). Published.
Abstract [en]

The taking of turns is a fundamental aspect of dialogue. Since it is difficult to speak and listen at the same time, the participants need to coordinate who is currently speaking and when the next person can start to speak. Humans are very good at this coordination, and typically achieve fluent turn-taking with very small gaps and little overlap. Conversational systems (including voice assistants and social robots), on the other hand, typically have problems with frequent interruptions and long response delays, which has called for a substantial body of research on how to improve turn-taking in conversational systems. In this review article, we provide an overview of this research and give directions for future research. First, we provide a theoretical background of the linguistic research tradition on turn-taking and some of the fundamental concepts in theories of turn-taking. We also provide an extensive review of multi-modal cues (including verbal cues, prosody, breathing, gaze and gestures) that have been found to facilitate the coordination of turn-taking in human-human interaction, and which can be utilised for turn-taking in conversational systems. After this, we review work that has been done on modelling turn-taking, including end-of-turn detection, handling of user interruptions, generation of turn-taking cues, and multi-party human-robot interaction. Finally, we identify key areas where more research is needed to achieve fluent turn-taking in spoken interaction between man and machine.
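As a concrete illustration of the "end-of-turn detection" problem the abstract refers to, the sketch below shows the simple silence-threshold heuristic commonly used in conversational systems, whose trade-off (interruptions when the threshold is short, long response delays when it is long) motivates much of the reviewed work. This is an illustrative sketch only, not code from the article; the class name, method names, and the 0.7 s threshold are assumptions.

```python
from dataclasses import dataclass

@dataclass
class SilenceEndOfTurnDetector:
    threshold_s: float = 0.7   # silence required before the system takes the turn (assumed value)
    _silence_s: float = 0.0    # silence accumulated since the last speech frame

    def update(self, frame_is_speech: bool, frame_duration_s: float) -> bool:
        """Feed one voice-activity frame; return True once the user's turn is judged over."""
        if frame_is_speech:
            self._silence_s = 0.0
        else:
            self._silence_s += frame_duration_s
        return self._silence_s >= self.threshold_s

# Example: 1 s of speech followed by silence, processed in 20 ms frames.
detector = SilenceEndOfTurnDetector()
for is_speech in [True] * 50 + [False] * 40:
    if detector.update(is_speech, frame_duration_s=0.02):
        print("End of user turn detected")
        break
```

A longer threshold makes the system slower to respond but less likely to interrupt mid-pause; the review surveys multi-modal cues (prosody, gaze, breathing, gestures) that can resolve this trade-off better than silence alone.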

Place, publisher, year, edition, pages
Elsevier BV, 2021. Vol. 67, article id 101178
Keywords [en]
Turn-taking, Dialogue systems, Social robotics, Prosody, Gaze
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-289941
DOI: 10.1016/j.csl.2020.101178
ISI: 000606545200002
Scopus ID: 2-s2.0-85098741102
OAI: oai:DiVA.org:kth-289941
DiVA, id: diva2:1527596
Projects
tmh_turntaking
Note

QC 20210211

Available from: 2021-02-11. Created: 2021-02-11. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Skantze, Gabriel
