2025 (English) In: Interspeech 2025, International Speech Communication Association, 2025, pp. 2151-2152. Conference paper, Published paper (Refereed)
Abstract [en]
A new wave of speech foundation models is emerging, capable of processing spoken language directly from audio. These models promise more expressive and emotionally aware interactions by retaining prosodic information throughout conversations. 'Hear Me Out' evaluates their ability to preserve crucial vocal cues, enabling users to explore how variations in speaker characteristics and paralinguistic features influence AI responses. Through real-time voice conversion, users can ask a question and then re-ask it in a modified voice, immediately observing differences in response tone, phrasing, and behavior. The system presents paired responses side by side, offering direct comparisons of AI interpretations of both the original and transformed voices, thereby highlighting potential biases. By inviting inquiry into speaker modeling, contextual understanding, and fairness, this immersive experience encourages users to reflect on identity and voice, and promotes inclusive future research.
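The paired-response comparison described in the abstract can be sketched as a small program. This is a hypothetical illustration only: the function names, the voice-conversion step, and the response model below are illustrative stand-ins, not the authors' actual implementation.

```python
# Sketch of the 'Hear Me Out' paired-response flow: the same question is
# posed twice -- once in the original voice, once through a voice-conversion
# step -- and both AI responses are collected for side-by-side display.
# All names here are hypothetical stand-ins, not the real system's API.

from dataclasses import dataclass


@dataclass
class PairedResponse:
    original: str   # response to the unmodified voice
    converted: str  # response to the voice-converted version


def paired_responses(audio, convert, respond) -> PairedResponse:
    """Ask the same question in two voices and collect both answers."""
    return PairedResponse(
        original=respond(audio),
        converted=respond(convert(audio)),
    )


# Toy stand-ins: a "conversion" that swaps a speaker-trait tag, and a
# "model" whose answer depends on the perceived speaker trait.
demo = paired_responses(
    "question<neutral>",
    convert=lambda a: a.replace("<neutral>", "<elderly>"),
    respond=lambda a: "answer conditioned on " + a.split("<")[1].rstrip(">"),
)
print(demo.original)
print(demo.converted)
```

Comparing `demo.original` and `demo.converted` mirrors the side-by-side presentation the demo uses to surface differences in how the model treats the two voices.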
Place, publisher, year, edition, pages
International Speech Communication Association, 2025
Keywords
bias in conversational AI, speech-to-speech conversational AI, voice conversion
HSV category
Identifiers
urn:nbn:se:kth:diva-372786 (URN)
2-s2.0-105020052310 (Scopus ID)
Conference
26th Interspeech Conference 2025, Rotterdam, the Netherlands, August 17-21, 2025
Note
QC 20251120
2025-11-20, bibliographically checked