kth.sePublications KTH
Operational message
There are currently operational disruptions. Troubleshooting is in progress.
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Hear Me Out: Interactive evaluation and bias discovery platform for speech-to-speech conversational AI
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0009-0000-0554-7265
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-1643-1054
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-1175-840X
2025 (English)In: Interspeech 2025, International Speech Communication Association , 2025, p. 2151-2152Conference paper, Published paper (Refereed)
Abstract [en]

A new wave of speech foundation models is emerging, capable of processing spoken language directly from audio. These models promise more expressive and emotionally aware interactions by retaining prosodic information throughout conversations. 'Hear Me Out' evaluates their ability to preserve crucial vocal cues, enabling users to explore how variations in speaker characteristics and paralinguistic features influence AI responses. Through real-time voice conversion, users can ask a question and then re-ask it with a modified one, immediately observing differences in response tone, phrasing, and behavior. The system presents paired responses side by side, offering direct comparisons of AI interpretations of both the original and transformed voices, thereby highlighting potential biases. By creating inquiry into speaker modeling, contextual understanding, and fairness, this immersive experience encourages users to reflect on identity, voice, and also promote inclusive future research.

Place, publisher, year, edition, pages
International Speech Communication Association , 2025. p. 2151-2152
Keywords [en]
bias in conversational AI, speech-to-speech conversational AI, voice conversion
National Category
Natural Language Processing Human Computer Interaction Computer Sciences Comparative Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-372786Scopus ID: 2-s2.0-105020052310OAI: oai:DiVA.org:kth-372786DiVA, id: diva2:2015313
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, Kingdom of the, August 17-21, 2025
Note

QC 20251120

Available from: 2025-11-20 Created: 2025-11-20 Last updated: 2025-11-20Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

ScopusFulltext

Authority records

Bokkahalli Satish, Shree HarshaHenter, Gustav EjeSzékely, Éva

Search in DiVA

By author/editor
Bokkahalli Satish, Shree HarshaHenter, Gustav EjeSzékely, Éva
By organisation
Speech, Music and Hearing, TMH
Natural Language ProcessingHuman Computer InteractionComputer SciencesComparative Language Studies and Linguistics

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 55 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf