kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Data-driven models for timing feedback responses in a Map Task dialogue system
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.ORCID iD: 0000-0002-7412-0967
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.ORCID iD: 0000-0002-8579-1790
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.ORCID iD: 0000-0002-0397-6442
2014 (English)In: Computer speech & language (Print), ISSN 0885-2308, E-ISSN 1095-8363, Vol. 28, no 4, p. 903-922Article in journal (Refereed) Published
Abstract [en]

Traditional dialogue systems use a fixed silence threshold to detect the end of users' turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.

Place, publisher, year, edition, pages
2014. Vol. 28, no 4, p. 903-922
Keywords [en]
Spoken dialogue systems, Timing feedback, Turn-taking, User evaluation
National Category
Computer Sciences Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-147402DOI: 10.1016/j.csl.2014.02.002ISI: 000336694200005Scopus ID: 2-s2.0-84900533798OAI: oai:DiVA.org:kth-147402DiVA, id: diva2:731672
Note

QC 20140702

Available from: 2014-07-02 Created: 2014-06-27 Last updated: 2025-02-01Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Meena, RaveeshSkantze, GabrielGustafsson, Joakim

Search in DiVA

By author/editor
Meena, RaveeshSkantze, GabrielGustafsson, Joakim
By organisation
Speech Communication and Technology
In the same journal
Computer speech & language (Print)
Computer SciencesNatural Language Processing

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 338 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf