Publications (3 of 3)
Francis, J., Gustafsson, J. & Székely, É. (2025). From Static to Dynamic: Enhancing AAC with Generative Imagery and Zero-Shot TTS. In: Interspeech 2025. Paper presented at the 26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025 (pp. 4960-4962). International Speech Communication Association.
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 4960-4962. Conference paper, Published paper (Refereed).
Abstract [en]

This paper presents an Augmentative and Alternative Communication (AAC) approach for minimally verbal children with Autism Spectrum Disorder. Whereas traditional AAC systems use fixed symbol sets and pre-defined Text-to-Speech (TTS) voices, the proposed method leverages text-to-image generation and zero-shot TTS to expand expressive capabilities. Users can create visual symbols for concepts and interests, enabling richer communication. Further, zero-shot TTS allows users to upload or record personalized voices, giving them individualized output. By minimizing reliance on static symbols and voices, this approach aims to increase communicative agency, personal relevance, and social validity, areas often neglected in traditional interventions. Future research will explore long-term effects on communicative skills, user satisfaction, social engagement, and adaptability across various cultural and linguistic settings, aiming to develop more dynamic and personalized AAC solutions.

Place, publisher, year, edition, pages
International Speech Communication Association, 2025
Keywords
AAC, Human-Computer Interaction, Speech Synthesis, TTS
National Category
Natural Language Processing; Human Computer Interaction; Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-372783 (URN); 10.21437/Interspeech.2025-2815 (DOI); 2-s2.0-105020070493 (Scopus ID)
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note

QC 20251124

Available from: 2025-11-24. Created: 2025-11-24. Last updated: 2025-11-24. Bibliographically approved.
Netzorg, R., Carvalho, N., Guzman, A., Wang, L., Francis, J., Garoute, K. V., . . . Anumanchipalli, G. K. (2025). On the Production and Perception of a Single Speaker's Gender. In: Interspeech 2025. Paper presented at the 26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025 (pp. 669-673). International Speech Communication Association.
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 669-673. Conference paper, Published paper (Refereed).
Abstract [en]

A voice's gender is considered to be dictated by one's biology and cultural situation. Without modification, this determinism results in collinearity between acoustic metrics, making it difficult to disentangle each metric's contribution to gender perception. To study disentanglement on natural speech, we collaborate with a gender-affirming voice teacher to collect the Disentangled Source-Filter Dataset (DSFD): 45 minutes of audio across 25 Pitch, Resonance, and Weight voice configurations, coupled with Electroglottograph (EGG) measurements. Our analysis demonstrates that certain acoustic and physical metrics, namely average F0, ∆F, Contact Quotient (CQ), and Loudness, correlate with Pitch, Resonance, and Weight. In subsequent perceptual studies of gender, naturalness, and realness, we see that ∆F is the strongest predictor of perceived gender. Perceived naturalness and realness of a voice, however, prove to be unpredictable from these acoustic metrics.

Place, publisher, year, edition, pages
International Speech Communication Association, 2025
Keywords
gender perception, speaker identity, voice modification
National Category
Computer Vision and Learning Systems; Gender Studies
Identifiers
urn:nbn:se:kth:diva-372801 (URN); 10.21437/Interspeech.2025-2372 (DOI); 2-s2.0-105020029533 (Scopus ID)
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note

QC 20251118

Available from: 2025-11-18. Created: 2025-11-18. Last updated: 2025-11-18. Bibliographically approved.
Francis, J., Székely, É. & Gustafsson, J. (2024). ConnecTone: A Modular AAC System Prototype with Contextual Generative Text Prediction and Style-Adaptive Conversational TTS. In: Interspeech 2024. Paper presented at the 25th Interspeech Conference 2024, Kos Island, Greece, September 1-5, 2024 (pp. 1001-1002). International Speech Communication Association.
2024 (English). In: Interspeech 2024, International Speech Communication Association, 2024, p. 1001-1002. Conference paper, Published paper (Refereed).
Abstract [en]

Recent developments in generative language modeling and conversational Text-to-Speech present transformative potential for enhancing Augmentative and Alternative Communication (AAC) devices. Practical application of these technologies requires extensive research and testing. To address this, we introduce ConnecTone, a modular platform designed for rapid integration and testing of language generation and speech technology. ConnecTone implements context-sensitive generative text prediction, using conversational context from Automatic Speech Recognition inputs. The system incorporates a neural TTS that supports interpolation between reading and spontaneous conversational styles, along with adjustable prosodic features. These speech characteristics are predicted using Large Language Models, but can be adjusted by users for individual needs. We anticipate ConnecTone will enable us to rapidly evaluate and implement innovations, thereby contributing to faster benefit delivery to AAC users.

Place, publisher, year, edition, pages
International Speech Communication Association, 2024
Keywords
AAC, Human-Computer Interaction, Speech Synthesis, TTS
National Category
Natural Language Processing; Computer Sciences; Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-358873 (URN); 2-s2.0-85214814511 (Scopus ID)
Conference
25th Interspeech Conference 2024, Kos Island, Greece, September 1-5, 2024
Note

QC 20250124

Available from: 2025-01-23. Created: 2025-01-23. Last updated: 2025-01-24. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0009-0005-3693-511X
