On the Production and Perception of a Single Speaker's Gender
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 669-673. Conference paper, Published paper (Refereed)
Abstract [en]
A voice's gender is considered to be dictated by one's biology and cultural situation. Without modification, this determinism results in collinearity between acoustic metrics, making it difficult to disentangle a metric's contribution to gender perception. To study disentanglement on natural speech, we collaborate with a gender-affirming voice teacher to collect the Disentangled Source-Filter Dataset (DSFD): 45 minutes of audio across 25 Pitch, Resonance, and Weight voice configurations, coupled with Electroglottograph (EGG) measurements. Our analysis demonstrates that certain acoustic and physical metrics, namely avg. F0, ∆F, Contact Quotient (CQ), and Loudness, correlate with Pitch, Resonance, and Weight. Going on to perform perceptual studies of gender, naturalness, and realness, we see that ∆F is the strongest predictor of perceived gender. Perceived naturalness and realness of a voice, however, prove to be unpredictable by these acoustic metrics.
Place, publisher, year, edition, pages
International Speech Communication Association, 2025. p. 669-673
Keywords [en]
gender perception, speaker identity, voice modification
National Category
Computer Vision and Learning Systems; Gender Studies
Identifiers
URN: urn:nbn:se:kth:diva-372801
DOI: 10.21437/Interspeech.2025-2372
Scopus ID: 2-s2.0-105020029533
OAI: oai:DiVA.org:kth-372801
DiVA, id: diva2:2014571
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note
QC 20251118
2025-11-18, 2025-11-18, 2025-11-18. Bibliographically approved