Voice Activity Projection: Self-supervised Learning of Turn-taking Events
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-3513-4132
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8579-1790
2022 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, The International Speech Communication Association (ISCA), 2022, pp. 5190-5194, article id 10955. Conference paper, Published paper (Refereed)
Abstract [en]

The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of the interlocutors' voice activity. We extend prior work and define the predictive task of Voice Activity Projection, a general, self-supervised objective, as a way to train turn-taking models without the need for labeled data. We highlight a theoretical weakness of prior approaches, arguing for the need to model the dependencies between voice activity events within the projection window. We propose four zero-shot tasks, related to the prediction of upcoming turn-shifts and backchannels, and show that the proposed model outperforms prior work on these tasks.
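The abstract's argument that voice activity events in the projection window should be modeled jointly can be illustrated with a minimal sketch. It assumes, hypothetically, a 10 Hz frame rate, four bins per speaker with widths of 2, 4, 6 and 8 frames, and a majority threshold for marking a bin active; these values and the names `bin_activity`, `encode_state` and `decode_state` are illustrative assumptions, not the authors' implementation. The sketch packs both speakers' binned future activity into one class index, so a single categorical output can assign probability to every joint configuration of the window, which a set of independent per-frame probabilities cannot express.

```python
from typing import List, Sequence, Tuple

# Hypothetical bin widths (in frames) for the projection window. Assuming a
# 10 Hz frame rate, these correspond to 0.2, 0.4, 0.6 and 0.8 s (2 s total).
# They are illustrative assumptions, not values stated in this record.
BIN_FRAMES: Tuple[int, ...] = (2, 4, 6, 8)

# Two speakers x len(BIN_FRAMES) binary bins -> one joint discrete label space.
NUM_STATES = 2 ** (2 * len(BIN_FRAMES))


def bin_activity(future: Sequence[int], bin_frames: Sequence[int] = BIN_FRAMES) -> List[int]:
    """Collapse one speaker's future frame-level voice activity (0/1 per frame)
    into binary bins; a bin counts as active if more than half of its frames
    are voiced (an assumed threshold)."""
    if len(future) != sum(bin_frames):
        raise ValueError("future activity must cover the whole projection window")
    bins, start = [], 0
    for width in bin_frames:
        voiced = sum(future[start:start + width])
        bins.append(1 if 2 * voiced > width else 0)
        start += width
    return bins


def encode_state(bins_a: Sequence[int], bins_b: Sequence[int]) -> int:
    """Pack both speakers' bins (speaker A first, most significant bit first)
    into a single class index, so one categorical distribution covers every
    joint configuration of the window."""
    index = 0
    for bit in list(bins_a) + list(bins_b):
        if bit not in (0, 1):
            raise ValueError("bins must be binary")
        index = (index << 1) | bit
    return index


def decode_state(index: int, n_bins: int = len(BIN_FRAMES)) -> Tuple[List[int], List[int]]:
    """Inverse of encode_state: recover (speaker A bins, speaker B bins)."""
    total = 2 * n_bins
    bits = [(index >> (total - 1 - i)) & 1 for i in range(total)]
    return bits[:n_bins], bits[n_bins:]


if __name__ == "__main__":
    # Speaker A talks for the first 0.6 s of the window, then speaker B takes over.
    future_a = [1] * 6 + [0] * 14
    future_b = [0] * 6 + [1] * 14
    label = encode_state(bin_activity(future_a), bin_activity(future_b))
    print(label, decode_state(label), NUM_STATES)
    # prints: 195 ([1, 1, 0, 0], [0, 0, 1, 1]) 256
```

Under these assumptions, a model trained with a cross-entropy loss over the 256 labels would predict a distribution over joint futures. A turn-shift cue could then, for example, be read off as the probability mass on states where speaker B is active late in the window while speaker A is silent; that readout is likewise a hypothetical illustration.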

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2022. pp. 5190-5194, article id 10955
Keywords [en]
turn-taking, spoken dialog, voice activity projection, transformer
Identifiers
URN: urn:nbn:se:kth:diva-322531
DOI: 10.21437/Interspeech.2022-10955
ISI: 000900724505074
Scopus ID: 2-s2.0-85138623131
OAI: oai:DiVA.org:kth-322531
DiVA, id: diva2:1720299
Conference
23rd Annual Conference of the International Speech Communication Association, INTERSPEECH 2022, Incheon, 18 September 2022 through 22 September 2022
Projects
tmh_turntaking
Funders
Riksbankens Jubileumsfond, P20-0484
Swedish Research Council, 2020-03812
Note

QC 20241024

Available from: 2022-12-19 Created: 2022-12-19 Last updated: 2026-03-09 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus
https://www.isca-speech.org/archive/interspeech_2022/ekstedt22_interspeech.html

Person

Ekstedt, Erik
Skantze, Gabriel
