Wang, Siyang
Publications (10 of 12)
Wang, S., Székely, É. & Gustafsson, J. (2024). Contextual Interactive Evaluation of TTS Models in Dialogue Systems. In: Interspeech 2024. Paper presented at the 25th Interspeech Conference 2024, Kos Island, Greece, Sep 1-5, 2024 (pp. 2965-2969). International Speech Communication Association
2024 (English). In: Interspeech 2024, International Speech Communication Association, 2024, pp. 2965-2969. Conference paper, Published paper (Refereed)
Abstract [en]

Evaluation of text-to-speech (TTS) models is currently dominated by Mean Opinion Score (MOS) listening tests, but the validity of MOS has been increasingly questioned. MOS tests place listeners in a passive setup, in which they do not actively interact with the TTS and usually evaluate isolated utterances without context. They thus give no indication of how well a TTS model suits an interactive application such as a spoken dialogue system, in which the capability to generate appropriate speech in the dialogue context is paramount. We take a first step towards addressing this shortcoming by evaluating several state-of-the-art neural TTS models, including one that adapts to dialogue context, in a custom-built spoken dialogue system. We present the system design, experiment setup, and results. Our work is the first to evaluate TTS in contextual dialogue system interactions. We also discuss the shortcomings and future opportunities of the proposed evaluation paradigm.
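
The MOS listening tests discussed above reduce to a simple aggregate over listener ratings. As a minimal sketch (the 1-5 scale is standard practice, but the ratings below are invented for illustration and are not data from the paper), the score and an approximate confidence interval can be computed as:

```python
# Hedged sketch: aggregating Mean Opinion Score (MOS) ratings from a
# listening test. The ratings are hypothetical, not from the paper.
import math
import statistics

def mos_with_ci(ratings, z=1.96):
    """Return the mean opinion score and an approximate 95% confidence interval."""
    mean = statistics.mean(ratings)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, (mean - z * sem, mean + z * sem)

ratings = [4, 5, 3, 4, 4, 5, 3, 4]  # hypothetical listener scores on a 1-5 scale
mos, (low, high) = mos_with_ci(ratings)
```

Reporting the interval alongside the mean is what makes differences between systems interpretable; a bare MOS value says nothing about listener disagreement.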

Place, publisher, year, edition, pages
International Speech Communication Association, 2024
Keywords
evaluation methodology, human-computer interaction, spoken dialogue system, text-to-speech
National subject category
Language Processing and Computational Linguistics; Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-358876 (URN), 10.21437/Interspeech.2024-1008 (DOI), 001331850103017, 2-s2.0-85214809755 (Scopus ID)
Conference
25th Interspeech Conference 2024, Kos Island, Greece, Sep 1-5, 2024
Note

QC 20250128

Available from: 2025-01-23. Created: 2025-01-23. Last updated: 2025-12-05. Bibliographically reviewed.
Wang, S. & Székely, É. (2024). Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model. In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings. Paper presented at the Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, May 20-25, 2024 (pp. 6464-6474). European Language Resources Association (ELRA)
2024 (English). In: 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation, LREC-COLING 2024 - Main Conference Proceedings, European Language Resources Association (ELRA), 2024, pp. 6464-6474. Conference paper, Published paper (Refereed)
Abstract [en]

Recent advances in generative language modeling applied to discrete speech tokens have presented a new avenue for text-to-speech (TTS) synthesis. These speech language models (SLMs), similarly to their textual counterparts, are scalable, probabilistic, and context-aware. While they can produce diverse and natural outputs, they sometimes face issues such as unintelligibility and the inclusion of non-speech noises or hallucinations. As the adoption of this innovative paradigm in speech synthesis increases, there is a clear need for an in-depth evaluation of its capabilities and limitations. In this paper, we evaluate TTS from a discrete token-based SLM, through both automatic metrics and listening tests. We examine five key dimensions: speaking style, intelligibility, speaker consistency, prosodic variation, and spontaneous behaviour. Our results highlight the model's strength in generating varied prosody and spontaneous outputs. It is also rated higher in naturalness and context appropriateness in listening tests compared to a conventional TTS. However, the model's performance in intelligibility and speaker consistency lags behind that of traditional TTS. Additionally, we show that increasing the scale of SLMs offers a modest boost in robustness. Our findings aim to serve as a benchmark for future advancements in generative SLMs for speech synthesis.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2024
Keywords
discrete speech token, generative speech language model, text-to-speech evaluation
National subject category
Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-348777 (URN), 2-s2.0-85195990390 (Scopus ID)
Conference
Joint 30th International Conference on Computational Linguistics and 14th International Conference on Language Resources and Evaluation, LREC-COLING 2024, Hybrid, Torino, Italy, May 20-25, 2024
Note

Part of ISBN 9782493814104

QC 20240701

Available from: 2024-06-27. Created: 2024-06-27. Last updated: 2025-02-07. Bibliographically reviewed.
Wang, S., Henter, G. E., Gustafsson, J. & Székely, É. (2023). A Comparative Study of Self-Supervised Speech Representations in Read and Spontaneous TTS. In: ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings. Paper presented at 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023, Rhodes Island, Greece, Jun 4 2023 - Jun 10 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: ICASSPW 2023: 2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec2.0 as the representation medium in standard two-stage TTS, in place of conventionally used mel-spectrograms. It is however unclear which speech SSL is the better fit for TTS, and whether or not the performance differs between read and spontaneous TTS, the latter of which is arguably more challenging. This study aims to address these questions by testing several speech SSLs, including different layers of the same SSL, in two-stage TTS on both read and spontaneous corpora, while maintaining constant TTS model architecture and training settings. Results from listening tests show that the 9th layer of 12-layer wav2vec2.0 (ASR finetuned) outperforms the other tested SSLs and mel-spectrograms, in both read and spontaneous TTS. Our work sheds light on both how speech SSL can readily improve current TTS systems, and how SSLs compare in the challenging generative task of TTS. Audio examples can be found at https://www.speech.kth.se/tts-demos/ssr_tts

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
self-supervised speech representation, speech synthesis, spontaneous speech
National subject category
Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-335090 (URN), 10.1109/ICASSPW59220.2023.10193157 (DOI), 001046933700056, 2-s2.0-85165623363 (Scopus ID)
Conference
2023 IEEE International Conference on Acoustics, Speech and Signal Processing Workshops, ICASSPW 2023, Rhodes Island, Greece, Jun 4-10, 2023
Note

Part of ISBN 9798350302615

QC 20230831

Available from: 2023-08-31. Created: 2023-08-31. Last updated: 2025-02-07. Bibliographically reviewed.
Wang, S., Henter, G. E., Gustafsson, J. & Székely, É. (2023). A comparative study of self-supervised speech representations in read and spontaneous TTS. Paper presented at the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Jun 4-10, 2023, Rhodes Island, Greece.
2023 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Recent work has explored using self-supervised learning (SSL) speech representations such as wav2vec2.0 as the representation medium in standard two-stage TTS, in place of conventionally used mel-spectrograms. It is however unclear which speech SSL is the better fit for TTS, and whether or not the performance differs between read and spontaneous TTS, the latter of which is arguably more challenging. This study aims to address these questions by testing several speech SSLs, including different layers of the same SSL, in two-stage TTS on both read and spontaneous corpora, while maintaining constant TTS model architecture and training settings. Results from listening tests show that the 9th layer of 12-layer wav2vec2.0 (ASR finetuned) outperforms the other tested SSLs and mel-spectrograms, in both read and spontaneous TTS. Our work sheds light on both how speech SSL can readily improve current TTS systems, and how SSLs compare in the challenging generative task of TTS. Audio examples can be found at https://www.speech.kth.se/tts-demos/ssr_tts

Keywords
speech synthesis, self-supervised speech representation, spontaneous speech
National subject category
Other Electrical Engineering and Electronics; Other Engineering and Technologies
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-328741 (URN), 979-8-3503-0261-5 (ISBN)
Conference
2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Jun 4-10, 2023, Rhodes Island, Greece
Projects
Digital Futures project Advanced Adaptive Intelligent Systems (AAIS); Swedish Research Council project Connected (VR-2019-05003); Swedish Research Council project Perception of speaker stance (VR-2020-02396); Riksbankens Jubileumsfond project CAPTivating (P20-0298); Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation
Note

Accepted by the 2023 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops, Jun 4-10, 2023, Rhodes Island, Greece

QC 20230620

Available from: 2023-06-12. Created: 2023-06-12. Last updated: 2025-02-18. Bibliographically reviewed.
Ekstedt, E., Wang, S., Székely, É., Gustafsson, J. & Skantze, G. (2023). Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023. Paper presented at the 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland (pp. 5481-5485). International Speech Communication Association
2023 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2023, International Speech Communication Association, 2023, pp. 5481-5485. Conference paper, Published paper (Refereed)
Abstract [en]

Turn-taking is a fundamental aspect of human communication in which speakers convey their intention to either hold or yield their turn through prosodic cues. Using the recently proposed Voice Activity Projection model, we propose an automatic evaluation approach to measure these aspects for conversational speech synthesis. We investigate the ability of three commercial and two open-source text-to-speech (TTS) systems to generate turn-taking cues over simulated turns. By varying the stimuli, or controlling the prosody, we analyze the models' performance. We show that while commercial TTS systems largely provide appropriate cues, they often produce ambiguous signals, and that further improvements are possible. TTS trained on read or spontaneous speech produces strong turn-hold but weak turn-yield cues. We argue that this approach, which focuses on functional aspects of interaction, provides a useful addition to other important speech metrics, such as intelligibility and naturalness.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
human-computer interaction, text-to-speech, turn-taking
National subject category
Language Processing and Computational Linguistics; Computer Sciences; Comparative Linguistics and General Linguistics
Identifiers
urn:nbn:se:kth:diva-337873 (URN), 10.21437/Interspeech.2023-2064 (DOI), 001186650305133, 2-s2.0-85171597862 (Scopus ID)
Conference
24th Annual Conference of the International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland
Projects
tmh_turntaking
Note

QC 20241024

Available from: 2023-10-10. Created: 2023-10-10. Last updated: 2025-02-01. Bibliographically reviewed.
Mehta, S., Wang, S., Alexanderson, S., Beskow, J., Székely, É. & Henter, G. E. (2023). Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis. In: Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble. Paper presented at the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France, August 26-28, 2023 (pp. 150-156). International Speech Communication Association
2023 (English). In: Proceedings of the 12th ISCA Speech Synthesis Workshop (SSW), Grenoble, International Speech Communication Association, 2023, pp. 150-156. Conference paper, Published paper (Refereed)
Abstract [en]

With read-aloud speech synthesis achieving high naturalness scores, there is a growing research interest in synthesising spontaneous speech. However, human spontaneous face-to-face conversation has both spoken and non-verbal aspects (here, co-speech gestures). Only recently has research begun to explore the benefits of jointly synthesising these two modalities in a single system. The previous state of the art used non-probabilistic methods, which fail to capture the variability of human speech and motion, and risk producing oversmoothing artefacts and sub-optimal synthesis quality. We present the first diffusion-based probabilistic model, called Diff-TTSG, that jointly learns to synthesise speech and gestures together. Our method can be trained on small datasets from scratch. Furthermore, we describe a set of careful uni- and multi-modal subjective tests for evaluating integrated speech and gesture synthesis systems, and use them to validate our proposed approach.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
text-to-speech, speech-to-gesture, joint multimodal synthesis, deep generative model, diffusion model, evaluation
National subject category
Signal Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-368340 (URN), 10.21437/SSW.2023-24 (DOI)
Conference
12th ISCA Speech Synthesis Workshop (SSW), Grenoble, France, August 26-28, 2023
Research funder
Wallenberg AI, Autonomous Systems and Software Program (WASP), 3420 WASP SM GeH
Note

QC 20250813

Available from: 2025-08-13. Created: 2025-08-13. Last updated: 2025-08-13. Bibliographically reviewed.
Miniotaitė, J., Wang, S., Beskow, J., Gustafson, J., Székely, É. & Abelho Pereira, A. T. (2023). Hi robot, it's not what you say, it's how you say it. In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN). Paper presented at the 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Aug 28-31, 2023, Busan, South Korea (pp. 307-314). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Institute of Electrical and Electronics Engineers (IEEE), 2023, pp. 307-314. Conference paper, Published paper (Refereed)
Abstract [en]

Many robots use their voice to communicate with people in spoken language, but the voices commonly used for robots are often optimized for transactional interactions rather than social ones. This can limit their ability to create engaging and natural interactions. To address this issue, we designed a spontaneous text-to-speech tool and used it to author natural and spontaneous robot speech. A crowdsourcing evaluation methodology is proposed to compare this type of speech to natural speech and state-of-the-art text-to-speech technology, both in disembodied and embodied form. We created speech samples in a naturalistic setting of people playing tabletop games and conducted a user study evaluating Naturalness, Intelligibility, Social Impression, Prosody, and Perceived Intelligence. The speech samples were chosen to represent three contexts that are common in tabletop games, and the contexts were introduced to the participants who evaluated the speech samples. The study results show that the proposed evaluation methodology allowed for a robust analysis that successfully compared the different conditions. Moreover, the spontaneous voice met our target design goal of being perceived as more natural than a leading commercial text-to-speech.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE RO-MAN, ISSN 1944-9445
Keywords
speech synthesis, human-robot interaction, embodiment, spontaneous speech, intelligibility, naturalness
National subject category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-341972 (URN), 10.1109/RO-MAN57019.2023.10309427 (DOI), 001108678600044, 2-s2.0-85186982397 (Scopus ID)
Conference
32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), Aug 28-31, 2023, Busan, South Korea
Note

Part of proceedings ISBN 979-8-3503-3670-2

Available from: 2024-01-09. Created: 2024-01-09. Last updated: 2025-02-18. Bibliographically reviewed.
Deichler, A., Wang, S., Alexanderson, S. & Beskow, J. (2023). Learning to generate pointing gestures in situated embodied conversational agents. Frontiers in Robotics and AI, 10, Article ID 1110534.
2023 (English). In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 10, article id 1110534. Journal article (Refereed), Published
Abstract [en]

One of the main goals of robotics and intelligent agent research is to enable agents to communicate with humans in physically situated settings. Human communication consists of both verbal and non-verbal modes. Recent studies in enabling communication for intelligent agents have focused on verbal modes, i.e., language and speech. However, in a situated setting the non-verbal mode is crucial for an agent to adapt flexible communication strategies. In this work, we focus on learning to generate non-verbal communicative expressions in situated embodied interactive agents. Specifically, we show that an agent can learn pointing gestures in a physically simulated environment through a combination of imitation and reinforcement learning that achieves high motion naturalness and high referential accuracy. We compared our proposed system against several baselines in both subjective and objective evaluations. The subjective evaluation was done in a virtual reality setting where an embodied referential game is played between the user and the agent in a shared 3D space, a setup that fully assesses the communicative capabilities of the generated gestures. The evaluations show that our model achieves a higher level of referential accuracy and motion naturalness compared to a state-of-the-art supervised learning motion synthesis model, showing the promise of combining imitation and reinforcement learning for generating communicative gestures. Additionally, our system is robust in a physically simulated environment and thus has the potential to be applied to robots.

Place, publisher, year, edition, pages
Frontiers Media SA, 2023
Keywords
reinforcement learning, imitation learning, non-verbal communication, embodied interactive agents, gesture generation, physics-aware machine learning
National subject category
Human-Computer Interaction (Interaction Design)
Identifiers
urn:nbn:se:kth:diva-326625 (URN), 10.3389/frobt.2023.1110534 (DOI), 000970385800001, 37064574 (PubMedID), 2-s2.0-85153351800 (Scopus ID)
Note

QC 20230508

Available from: 2023-05-08. Created: 2023-05-08. Last updated: 2023-05-08. Bibliographically reviewed.
Székely, É., Wang, S. & Gustafsson, J. (2023). So-to-Speak: an exploratory platform for investigating the interplay between style and prosody in TTS. In: Interspeech 2023. Paper presented at the 24th Annual Conference of the International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland (pp. 2016-2017). International Speech Communication Association
2023 (English). In: Interspeech 2023, International Speech Communication Association, 2023, pp. 2016-2017. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, numerous speech synthesis systems have been proposed that feature multi-dimensional controllability, generating a level of variability that surpasses traditional TTS systems by orders of magnitude. However, it remains challenging for developers to comprehend and demonstrate the potential of these advanced systems. We introduce So-to-Speak, a customisable interface tailored for showcasing the capabilities of different controllable TTS systems. The interface allows for the generation, synthesis, and playback of hundreds of samples simultaneously, displayed on an interactive grid, with variation in both low-level prosodic features and high-level style controls. To offer insights into speech quality, automatic estimates of MOS scores are presented for each sample. So-to-Speak facilitates the audiovisual exploration of the interaction between various speech features, which can be useful in a range of applications in speech technology.

Place, publisher, year, edition, pages
International Speech Communication Association, 2023
Keywords
prosody, speaking style, speech synthesis, TTS
National subject category
Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-337833 (URN), 001186650302036, 2-s2.0-85171599228 (Scopus ID)
Conference
24th Annual Conference of the International Speech Communication Association, Interspeech 2023, August 20-24, 2023, Dublin, Ireland
Note

QC 20241011

Available from: 2023-10-09. Created: 2023-10-09. Last updated: 2025-02-07. Bibliographically reviewed.
Wang, S., Gustafsson, J. & Székely, É. (2022). Evaluating Sampling-based Filler Insertion with Spontaneous TTS. In: Calzolari, N., Bechet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mazo, H., Odijk, H. & Piperidis, S. (Eds.), LREC 2022: Thirteenth International Conference on Language Resources and Evaluation. Paper presented at the 13th International Conference on Language Resources and Evaluation (LREC), Jun 20-25, 2022, Marseille, France (pp. 1960-1969). European Language Resources Association (ELRA)
2022 (English). In: LREC 2022: Thirteenth International Conference on Language Resources and Evaluation / [ed] Calzolari, N., Bechet, F., Blache, P., Choukri, K., Cieri, C., Declerck, T., Goggi, S., Isahara, H., Maegaard, B., Mazo, H., Odijk, H. & Piperidis, S., European Language Resources Association (ELRA), 2022, pp. 1960-1969. Conference paper, Published paper (Refereed)
Abstract [en]

Inserting fillers (such as "um", "like") into clean speech text has a rich history of study. One major application is to make dialogue systems sound more spontaneous. The ambiguity of filler occurrence and inter-speaker differences make both modeling and evaluation difficult. In this paper, we study sampling-based filler insertion, a simple yet unexplored approach to inserting fillers. We propose an objective score called Filler Perplexity (FPP). We build three models trained on two single-speaker spontaneous corpora, and evaluate them with FPP and perceptual tests. We implement two innovations in perceptual tests: (1) evaluating filler insertion on dialogue system output, and (2) synthesizing speech with neural spontaneous TTS engines. FPP proves useful in analysis but does not correlate well with perceptual MOS. Perceptual results show little difference between the compared filler insertion models, including ground truth, which may be due to the ambiguity of what constitutes good filler insertion and a strong neural spontaneous TTS that produces natural speech irrespective of input. Results also show a preference for filler-inserted speech synthesized with spontaneous TTS. The same test using TTS based on read speech obtains the opposite result, which shows the importance of using spontaneous TTS in evaluating filler insertion. Audio samples: www.speech.kth.se/tts-demos/LREC22
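
The Filler Perplexity (FPP) score named above builds on the standard notion of perplexity: the exponentiated average negative log-probability a model assigns to observed tokens. The sketch below illustrates only that underlying quantity; the exact FPP formulation is not reproduced here, and the probabilities are invented for illustration.

```python
# Hedged sketch of perplexity, the quantity underlying scores such as FPP.
# The per-token probabilities are hypothetical, not from the paper.
import math

def perplexity(token_probs):
    """Perplexity of a sequence given per-token model probabilities."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

# Hypothetical probabilities a filler-insertion model might assign to the
# fillers actually observed in a held-out transcript.
probs = [0.5, 0.25, 0.125]
ppl = perplexity(probs)  # lower means the model predicts the observed fillers better
```

Perplexity equals the inverse geometric mean of the assigned probabilities, which is why a model that predicts every observed filler with probability 1 would score a perplexity of exactly 1.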

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2022
Keywords
filler insertion, spontaneous text-to-speech, spoken dialogue system
National subject category
Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-324340 (URN), 000889371702007, 2-s2.0-85144345531 (Scopus ID)
Conference
13th International Conference on Language Resources and Evaluation (LREC), Jun 20-25, 2022, Marseille, France
Note

QC 20230228

Available from: 2023-02-28. Created: 2023-02-28. Last updated: 2025-02-07. Bibliographically reviewed.