From Static to Dynamic: Enhancing AAC with Generative Imagery and Zero-Shot TTS
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0009-0005-3693-511X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 4960-4962. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an Augmentative and Alternative Communication (AAC) approach for minimally verbal children with Autism Spectrum Disorder. Whereas traditional AAC systems use fixed symbol sets and pre-defined Text-to-Speech (TTS) voices, the proposed method leverages text-to-image generation and zero-shot TTS to expand expressive capabilities. Users can create visual symbols for concepts and interests, enabling richer communication. Further, zero-shot TTS allows users to upload or record personalized voices, giving them individualized output. By minimizing reliance on static symbols and voices, this approach aims to increase communicative agency, personal relevance, and social validity, areas often neglected in traditional interventions. Future research will explore long-term effects on communicative skills, user satisfaction, social engagement, and adaptability across various cultural and linguistic settings, aiming to develop more dynamic and personalized AAC solutions.

Place, publisher, year, edition, pages
International Speech Communication Association, 2025. p. 4960-4962
Keywords [en]
AAC, Human-Computer Interaction, Speech Synthesis, TTS
Identifiers
URN: urn:nbn:se:kth:diva-372783
DOI: 10.21437/Interspeech.2025-2815
Scopus ID: 2-s2.0-105020070493
OAI: oai:DiVA.org:kth-372783
DiVA, id: diva2:2016061
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note

QC 20251124

Available from: 2025-11-24. Created: 2025-11-24. Last updated: 2025-11-24. Bibliographically approved.

Authors

Francis, Juliana; Gustafsson, Joakim; Székely, Éva
