2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, pp. 2151-2152. Conference paper, Published paper (Refereed)
Abstract [en]
A new wave of speech foundation models is emerging, capable of processing spoken language directly from audio. These models promise more expressive and emotionally aware interactions by retaining prosodic information throughout conversations. 'Hear Me Out' evaluates their ability to preserve crucial vocal cues, enabling users to explore how variations in speaker characteristics and paralinguistic features influence AI responses. Through real-time voice conversion, users can ask a question and then re-ask it with a modified voice, immediately observing differences in response tone, phrasing, and behavior. The system presents paired responses side by side, offering direct comparisons of AI interpretations of both the original and transformed voices, thereby highlighting potential biases. By prompting inquiry into speaker modeling, contextual understanding, and fairness, this immersive experience encourages users to reflect on identity and voice, and promotes inclusive future research.
Place, publisher, year, edition, pages
International Speech Communication Association, 2025
Keywords
bias in conversational AI, speech-to-speech conversational AI, voice conversion
National subject category
Language Processing and Computational Linguistics; Human-Computer Interaction (Interaction Design); Computer Science; Comparative Language Studies and General Linguistics
Identifiers
urn:nbn:se:kth:diva-372786 (URN); 2-s2.0-105020052310 (Scopus ID)
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note
QC 20251120
2025-11-20; 2025-11-20; 2025-11-20. Bibliographically reviewed