Voice Reconstruction through Large-Scale TTS Models: Comparing Zero-Shot and Fine-tuning Approaches to Personalise TTS in Assistive Communication
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 2735-2739. Conference paper, Published paper (Refereed)
Abstract [en]
Personalised synthetic speech can enhance communication for Augmentative and Alternative Communication (AAC) users, but achieving high-quality, speaker-specific voices depends on various factors such as the condition causing speech loss, and availability of recorded speech. Recent advancements in large-scale zero-shot TTS models may change the data requirements, as they have the potential to adapt to a wider range of inputs. This paper explores the potential of these pretrained models in various data availability scenarios, from extensive spontaneous speech to minimal or no unaffected speech. We evaluate a state-of-the-art TTS system on a case study involving a stroke survivor with dysarthria, leveraging both typical and atypical speech data. Additionally, we introduce a novel interactive approach using dysarthric speech as an audio prompt to enable user-guided prosody adaptation.
Place, publisher, year, edition, pages
International Speech Communication Association, 2025. p. 2735-2739
Keywords [en]
assistive communication, augmentative and alternative communication, dysarthric speech, speech synthesis
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-372804
DOI: 10.21437/Interspeech.2025-1726
Scopus ID: 2-s2.0-105020070750
OAI: oai:DiVA.org:kth-372804
DiVA, id: diva2:2013624
Conference
26th Interspeech Conference 2025, Rotterdam, Kingdom of the Netherlands, August 17-21, 2025
Note
QC 20251113
2025-11-13; 2025-11-13; 2025-11-13. Bibliographically approved