From Static to Dynamic: Enhancing AAC with Generative Imagery and Zero-Shot TTS
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0009-0005-3693-511X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 4960-4962. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an Augmentative and Alternative Communication (AAC) approach for minimally verbal children with Autism Spectrum Disorder. Whereas traditional AAC systems use fixed symbol sets and pre-defined Text-to-Speech (TTS) voices, the proposed method leverages text-to-image generation and zero-shot TTS to expand expressive capabilities. Users can create visual symbols for concepts and interests, enabling richer communication. Further, zero-shot TTS allows users to upload or record personalized voices, giving them individualized speech output. By minimizing reliance on static symbols and voices, this approach aims to increase communicative agency, personal relevance, and social validity, areas often neglected in traditional interventions. Future research will explore long-term effects on communicative skills, user satisfaction, social engagement, and adaptability across various cultural and linguistic settings, with the aim of developing more dynamic and personalized AAC solutions.

Place, publisher, year, edition, pages
International Speech Communication Association, 2025. p. 4960-4962
Keywords [en]
AAC, Human-Computer Interaction, Speech Synthesis, TTS
National Category
Natural Language Processing; Human Computer Interaction; Other Engineering and Technologies
Identifiers
URN: urn:nbn:se:kth:diva-372783; DOI: 10.21437/Interspeech.2025-2815; Scopus ID: 2-s2.0-105020070493; OAI: oai:DiVA.org:kth-372783; DiVA, id: diva2:2016061
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note

QC 20251124

Available from: 2025-11-24. Created: 2025-11-24. Last updated: 2025-11-24. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Authority records

Francis, Juliana; Gustafsson, Joakim; Székely, Éva
