Exploring Turn-taking Cues in Multi-party Human-robot Discussions about Objects
Skantze, Gabriel (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology). ORCID iD: 0000-0002-8579-1790
Johansson, Martin (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology). ORCID iD: 0000-0002-3242-3935
Beskow, Jonas (KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology). ORCID iD: 0000-0003-1399-6604
2015 (English). In: Proceedings of the 2015 ACM International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2015. Conference paper, Published paper (Refereed).
Abstract [en]

In this paper, we present a dialog system that was exhibited at the Swedish National Museum of Science and Technology. Two visitors at a time could play a collaborative card sorting game together with the robot head Furhat, where the three players discuss the solution together. The cards are shown on a touch table between the players, thus constituting a target for joint attention. We describe how the system was implemented in order to manage turn-taking and attention to users and objects in the shared physical space. We also discuss how multi-modal redundancy (from speech, card movements and head pose) is exploited to maintain meaningful discussions, given that the system has to process conversational speech from both children and adults in a noisy environment. Finally, we present an analysis of 373 interactions, where we investigate the robustness of the system, to what extent the system's attention can shape the users' turn-taking behaviour, and how the system can produce multi-modal turn-taking signals (filled pauses, facial gestures, breath and gaze) to deal with processing delays in the system.
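
To make the turn-taking logic described above concrete, here is a minimal, hypothetical sketch (not the authors' implementation): it fuses silence, gaze and card-movement cues to decide whether a user has yielded the turn, and falls back to turn-holding signals (a filled pause, gaze aversion) when response computation exceeds a delay budget. All names, thresholds and cue encodings are illustrative assumptions.

    # Illustrative sketch only; thresholds and cue names are made up.
    import time
    from dataclasses import dataclass

    @dataclass
    class Cues:
        silence_ms: int        # time since the user stopped speaking
        gaze_at_robot: bool    # head-pose estimate: user looks at the robot
        card_moving: bool      # touch table reports a card being dragged

    def user_is_yielding(cues: Cues, silence_threshold_ms: int = 500) -> bool:
        """Fuse redundant cues: treat the turn as yielded when the user
        has been silent past a threshold, is looking at the robot, and is
        not busy manipulating a card."""
        return (cues.silence_ms >= silence_threshold_ms
                and cues.gaze_at_robot
                and not cues.card_moving)

    def respond(cues: Cues, compute_response, delay_budget_ms: int = 700):
        """Take the turn only when it is yielded; if computing the
        response overruns the delay budget, emit turn-holding behaviours
        rather than leaving a dead silence."""
        if not user_is_yielding(cues):
            return None  # keep listening; the user still holds the turn
        start = time.monotonic()
        response = compute_response()  # e.g. parse ASR output, pick a move
        elapsed_ms = (time.monotonic() - start) * 1000
        if elapsed_ms > delay_budget_ms:
            # A real system would produce these *while* computing,
            # e.g. from a separate thread driving the robot face and TTS.
            print("[robot] eh...        (filled pause)")
            print("[robot] <looks away> (gaze aversion, turn-holding)")
        return response

    if __name__ == "__main__":
        cues = Cues(silence_ms=650, gaze_at_robot=True, card_moving=False)
        print(respond(cues, lambda: "I think the steam engine came first."))

The design point this is meant to illustrate is the one the abstract makes: redundant cues make the yield decision robust to noisy input, and overt turn-holding signals buy the system time when processing is slow.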

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2015.
National Category
Computer Sciences; Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-180422
DOI: 10.1145/2818346.2820749
ISI: 000380609500012
Scopus ID: 2-s2.0-84959259564
OAI: oai:DiVA.org:kth-180422
DiVA id: diva2:893714
Conference
ACM International Conference on Multimodal Interaction, ICMI 2015; Seattle, United States; 9-13 November 2015
Note

QC 20160331

Available from: 2016-01-13. Created: 2016-01-13. Last updated: 2025-02-01. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Skantze, Gabriel; Johansson, Martin; Beskow, Jonas
