Multilingual Turn-taking Prediction Using Voice Activity Projection
2024 (English). In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, p. 11873-11883. Conference paper, Published paper (Refereed)
Abstract [en]
This paper investigates the application of voice activity projection (VAP), a predictive turn-taking model for spoken dialogue, to multilingual data encompassing English, Mandarin, and Japanese. The VAP model continuously predicts the upcoming voice activities of participants in dyadic dialogue, leveraging a cross-attention Transformer to capture the dynamic interplay between participants. The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages. However, a multilingual model, trained on all three languages, demonstrates predictive performance on par with monolingual models across all languages. Further analyses show that the multilingual model has learned to discern the language of the input signal. We also analyze the sensitivity to pitch, a prosodic cue that is thought to be important for turn-taking. Finally, we compare two audio encoders: contrastive predictive coding (CPC), pre-trained on English, and a recent model based on multilingual wav2vec 2.0 (MMS).
Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024. p. 11873-11883
Keywords [en]
Multilingual, Spoken Dialogue System, Turn-taking, Voice Activity Projection
National Category
Natural Language Processing; General Language Studies and Linguistics; Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-348790; Scopus ID: 2-s2.0-85195914079; OAI: oai:DiVA.org:kth-348790; DiVA, id: diva2:1878700
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, May 20-25, 2024, Torino, Italy
Projects
tmh_turntaking
Note
Part of ISBN 978-249381410-4
QC 20241028
2024-06-27; 2024-06-27; 2025-02-01. Bibliographically approved