2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, p. 669-673. Conference paper, Published paper (Refereed)
Abstract [en]
A voice's gender is considered to be dictated by one's biology and cultural situation. Without modification, this determinism results in collinearity between acoustic metrics, making it difficult to disentangle each metric's contribution to gender perception. To study disentanglement on natural speech, we collaborate with a gender-affirming voice teacher to collect the Disentangled Source-Filter Dataset (DSFD): 45 minutes of audio across 25 Pitch, Resonance, and Weight voice configurations, coupled with Electroglottograph (EGG) measurements. Our analysis demonstrates that certain acoustic and physical metrics, namely average F0, ∆F, Contact Quotient (CQ), and Loudness, correlate with Pitch, Resonance, and Weight. In subsequent perceptual studies of gender, naturalness, and realness, we find that ∆F is the strongest predictor of perceived gender. Perceived naturalness and realness of a voice, however, cannot be predicted from these acoustic metrics.
Place, publisher, year, edition, pages
International Speech Communication Association, 2025
Keywords
gender perception, speaker identity, voice modification
National Category
Computer Vision and Learning Systems; Gender Studies
Identifiers
urn:nbn:se:kth:diva-372801 (URN)
10.21437/Interspeech.2025-2372 (DOI)
2-s2.0-105020029533 (Scopus ID)
Conference
26th Interspeech Conference 2025, Rotterdam, Kingdom of the Netherlands, August 17-21, 2025
Note
QC 20251118
2025-11-18, 2025-11-18, 2025-11-18. Bibliographically approved