Voice Activity Projection: Self-supervised Learning of Turn-taking Events
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-3513-4132
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8579-1790
2022 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, International Speech Communication Association, 2022, p. 5190-5194, article id 10955. Conference paper, Published paper (Refereed)
Abstract [en]

The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of voice activity of the interlocutors. We extend prior work and define the predictive task of Voice Activity Projection, a general, self-supervised objective, as a way to train turn-taking models without the need for labeled data. We highlight a theoretical weakness of prior approaches, arguing for the need to model the dependency between voice activity events in the projection window. We propose four zero-shot tasks, related to the prediction of upcoming turn-shifts and backchannels, and show that the proposed model outperforms prior work.
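The abstract argues for modeling the voice activity events in the projection window jointly rather than independently. One way to do this is to encode the whole future window of both speakers as a single discrete class. The sketch below is hypothetical and is not the authors' implementation. The function names, the 50 Hz frame rate, the bin sizes (0.2, 0.4, 0.6 and 0.8 s, spanning 2 s) and the 0.5 activity threshold are all assumptions made for illustration.

```python
from typing import List, Sequence

# ASSUMPTION (hypothetical): 50 frames per second, so the bins below are
# 0.2, 0.4, 0.6 and 0.8 s long and together span a 2 s projection window.
BIN_FRAMES = (10, 20, 30, 40)


def discretize_window(
    va: Sequence[Sequence[int]],
    bin_frames: Sequence[int] = BIN_FRAMES,
    threshold: float = 0.5,
) -> List[List[int]]:
    """Map frame-level voice activity (one 0/1 sequence per speaker) to
    binary bin states. A bin is active when the fraction of active frames
    in it exceeds `threshold` (an assumed value)."""
    states = []
    for speaker_va in va:
        if len(speaker_va) < sum(bin_frames):
            raise ValueError("window shorter than the projection horizon")
        bins, start = [], 0
        for n in bin_frames:
            chunk = speaker_va[start:start + n]
            bins.append(int(sum(chunk) / n > threshold))
            start += n
        states.append(bins)
    return states


def state_to_class(states: List[List[int]]) -> int:
    """Encode all bins of all speakers as one integer label.

    With 2 speakers x 4 bins there are 2**8 = 256 possible labels, so a
    classifier over these labels predicts the bins jointly instead of
    treating each bin as an independent binary target."""
    label = 0
    for bit in (b for speaker in states for b in speaker):
        label = (label << 1) | bit
    return label


# Speaker A talks for the first 0.6 s, speaker B from 0.6 s onward.
a = [1] * 30 + [0] * 70
b = [0] * 30 + [1] * 70
states = discretize_window([a, b])
print(states, state_to_class(states))
```

In this example, speaker A yields bins [1, 1, 0, 0] and speaker B yields [0, 0, 1, 1]. The joint label is therefore 0b11000011 = 195, a single class that represents an upcoming turn-shift from A to B.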

Place, publisher, year, edition, pages
International Speech Communication Association, 2022. p. 5190-5194, article id 10955
Keywords [en]
turn-taking, spoken dialog, voice activity projection, transformer
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-322531
DOI: 10.21437/Interspeech.2022-10955
ISI: 000900724505074
Scopus ID: 2-s2.0-85138623131
OAI: oai:DiVA.org:kth-322531
DiVA, id: diva2:1720299
Conference
23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, 18 September 2022 through 22 September 2022
Projects
tmh_turntaking
Funder
Riksbankens Jubileumsfond, P20-0484
Swedish Research Council, 2020-03812
Note

QC 20241024

Available from: 2022-12-19. Created: 2022-12-19. Last updated: 2025-02-07. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
https://www.isca-speech.org/archive/interspeech_2022/ekstedt22_interspeech.html

Authority records

Ekstedt, Erik; Skantze, Gabriel
