Response-conditioned Turn-taking Prediction
Jiang, Bing'er; Ekstedt, Erik; Skantze, Gabriel
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.
ORCID iDs: 0000-0003-3513-4132, 0000-0002-8579-1790
2023 (English). In: Findings of the Association for Computational Linguistics, ACL 2023, Association for Computational Linguistics (ACL), 2023, p. 12241-12248. Conference paper, Published paper (Refereed).
Abstract [en]

Previous approaches to turn-taking and response generation in conversational systems have treated them as a two-stage process: First, the end of a turn is detected (based on conversation history), then the system generates an appropriate response. Humans, however, do not take the turn just because it is likely, but also consider whether what they want to say fits the position. In this paper, we present a model (an extension of TurnGPT) that conditions the end-of-turn prediction on both conversation history and what the next speaker wants to say. We find that our model consistently outperforms the baseline model on a variety of metrics. The improvement is most prominent in two scenarios where turn predictions can be ambiguous solely from the conversation history: 1) when the current utterance contains a statement followed by a question; 2) when the end of the current utterance semantically matches the response. Treating the turn-prediction and response-ranking as a one-stage process, our findings suggest that our model can be used as an incremental response ranker, which can be applied in various settings.
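
As a rough illustration of the idea described in the abstract (not the authors' published model or code), the sketch below approximates response-conditioned turn-taking with a generic Hugging Face causal language model: after each incoming word, it scores how probable a turn shift followed by the candidate response would be, and the turn would be taken where that score peaks. The model name `gpt2`, the `<ts>` turn-shift token, and all function names are placeholder assumptions; meaningful scores would require a TurnGPT-style fine-tuned checkpoint.

```python
# Minimal sketch (assumptions, not the paper's released code): approximate
# response-conditioned end-of-turn scoring with a generic causal LM.
# "gpt2" and "<ts>" are placeholders; a TurnGPT-style fine-tuned model with a
# trained turn-shift token would be needed for the scores to be meaningful.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder base model (TurnGPT fine-tunes a GPT-2-style LM)
TS_TOKEN = "<ts>"     # hypothetical turn-shift token

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
tokenizer.add_special_tokens({"additional_special_tokens": [TS_TOKEN]})
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)
model.resize_token_embeddings(len(tokenizer))  # make room for the new token
model.eval()


@torch.no_grad()
def continuation_logprob(history: str, continuation: str) -> float:
    """Length-normalized log P(continuation | history) under the LM."""
    hist_ids = tokenizer(history, return_tensors="pt").input_ids
    cont_ids = tokenizer(continuation, return_tensors="pt").input_ids
    input_ids = torch.cat([hist_ids, cont_ids], dim=1)
    logits = model(input_ids).logits
    # Log-prob of each token, predicted from the preceding position.
    log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    cont_lp = token_lp[:, hist_ids.size(1) - 1:]  # keep only the continuation part
    return (cont_lp.sum() / cont_lp.size(1)).item()


def turn_shift_scores(history: str, incoming_words: list[str], response: str) -> list[float]:
    """Score taking the turn (with `response`) after each word of the incoming utterance."""
    scores = []
    context = history
    for word in incoming_words:
        context = (context + " " + word).strip()
        scores.append(continuation_logprob(context, f" {TS_TOKEN} {response}"))
    return scores


if __name__ == "__main__":
    history = "A: I was thinking about dinner tonight."
    incoming = "do you like sushi or should we cook".split()
    response = "I love sushi, let's go out."
    for word, score in zip(incoming, turn_shift_scores(history, incoming, response)):
        print(f"{word:>10s}  {score:8.3f}")
```

In this approximation, a candidate response that semantically matches the end of the incoming utterance raises the score at that point, which is the response-conditioning effect the abstract describes; the paper's actual model instead learns the conditioning end to end.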

Place, publisher, year, edition, pages
Association for Computational Linguistics (ACL), 2023, p. 12241-12248.
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-350243
DOI: 10.18653/v1/2023.findings-acl.776
Scopus ID: 2-s2.0-85175451617
OAI: oai:DiVA.org:kth-350243
DiVA, id: diva2:1883619
Conference
61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, July 9-14, 2023, Toronto, Canada
Projects
tmh_turntaking
Note

Part of ISBN 9781959429623

QC 20241028

Available from: 2024-07-11. Created: 2024-07-11. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Jiang, Bing'er; Ekstedt, Erik; Skantze, Gabriel
