How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-3513-4132
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8579-1790
2022 (English). In: Proceedings of the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue / [ed] Association for Computational Linguistics, Edinburgh UK: Association for Computational Linguistics, 2022, Vol. 23, p. 541-551, article id 2022.sigdial-1.51. Conference paper, Published paper (Refereed)
Abstract [en]

Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation. In this work, we investigate the role of prosody in turn-taking using the recently proposed Voice Activity Projection model, which incrementally models the upcoming speech activity of the interlocutors in a self-supervised manner, without relying on explicit annotation of turn-taking events, or the explicit modeling of prosodic features. Through manipulation of the speech signal, we investigate how these models implicitly utilize prosodic information. We show that these systems learn to utilize various prosodic aspects of speech both on aggregate quantitative metrics of long-form conversations and on single utterances specifically designed to depend on prosody.

Place, publisher, year, edition, pages
Edinburgh UK: Association for Computational Linguistics, 2022. Vol. 23, p. 541-551, article id 2022.sigdial-1.51
Keywords [en]
turn-taking, spoken dialog, voice activity projection, prosody
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-322532
DOI: 10.18653/v1/2022.sigdial-1.51
Scopus ID: 2-s2.0-85161066300
OAI: oai:DiVA.org:kth-322532
DiVA, id: diva2:1720305
Conference
the 23rd Annual Meeting of the Special Interest Group on Discourse and Dialogue, Edinburgh, UK. Association for Computational Linguistics.
Projects
tmh_turntaking
Funder
Riksbankens Jubileumsfond, P20-0484
Swedish Research Council, 2020-03812
Note

Won best paper award

QC 20221221

Part of ISBN 978-195591766-7

Available from: 2022-12-19 Created: 2022-12-19 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Ekstedt, Erik; Skantze, Gabriel
