An inclusive approach to creating a palette of synthetic voices for gender diversity
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
Department of Linguistics and Cognitive Science, University of Delaware.
2024 (English). In: Proc. Interspeech 2024, 2024, p. 3070-3074. Conference paper, Published paper (Refereed)
Abstract [en]

Mainstream text-to-speech (TTS) technologies predominantly rely on binary, cisgender speech, failing to adequately represent the diversity of gender-expansive (e.g., transgender and/or nonbinary) people. This poses challenges, particularly for users of Speech Generating Devices (SGDs) seeking TTS voices that authentically reflect their identity and desired expressive nuances. This paper introduces a novel approach for constructing a palette of controllable gender-expansive TTS voices using recordings from 14 gender-expansive speakers. We employ Constrained PCA to extract gender-independent speaker identity vectors from x-vectors, using acoustic Vocal Tract Length (aVTL) as a known component. The result is applied as a speaker embedding in neural TTS, allowing control over the aVTL and several emergent properties captured as a representation of the vocal space across speakers. In addition to quantitative metrics, we present a community evaluation conducted by nonbinary SGD users.
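
The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes one common formulation of constrained PCA, in which the part of each embedding that is linearly explained by a known variable is separated out first and ordinary PCA is then run on the remainder. The function name `constrained_pca`, the embedding dimension, and all data below are hypothetical; random numbers stand in for real x-vectors and aVTL measurements.

```python
import numpy as np

def constrained_pca(X, known, n_components):
    """Separate the part of X explained by a known variable, then run PCA on the rest.

    X: (n_speakers, dim) embeddings; known: (n_speakers,) values of the known variable.
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(known, dtype=float).reshape(-1, 1)
    Xc = X - X.mean(axis=0)   # centre the embeddings
    gc = g - g.mean(axis=0)   # centre the known variable
    # Least-squares direction in embedding space associated with the known variable.
    direction = (gc.T @ Xc) / (gc.T @ gc)   # shape (1, dim)
    residual = Xc - gc @ direction          # what the known variable does not explain
    # Ordinary PCA of the residual via singular value decomposition.
    U, S, Vt = np.linalg.svd(residual, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    return direction.ravel(), residual, scores, Vt[:n_components]

# Synthetic stand-ins (assumptions): 14 speakers, 32-dimensional embeddings.
rng = np.random.default_rng(0)
n_speakers, dim = 14, 32
avtl = rng.normal(15.0, 1.0, size=n_speakers)   # made-up aVTL values
xvecs = rng.normal(size=(n_speakers, dim)) + np.outer(avtl, rng.normal(size=dim))

direction, residual, scores, components = constrained_pca(xvecs, avtl, n_components=3)
```

Under these assumptions, a new embedding could be composed as the mean embedding plus a chosen multiple of `direction` plus a chosen combination of the rows of `components`, which gives one control tied to the known variable and further controls for the remaining variation across speakers.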

Place, publisher, year, edition, pages
2024. p. 3070-3074
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-367540
OAI: oai:DiVA.org:kth-367540
DiVA, id: diva2:1985230
Conference
Interspeech
Note

QC 20250731

Available from: 2025-07-22 Created: 2025-07-22 Last updated: 2025-07-31 Bibliographically approved

Open Access in DiVA

fulltext (1373 kB), 61 downloads
File information
File name: FULLTEXT01.pdf
File size: 1373 kB
Checksum (SHA-512): 4a0e9196ef16e132def40703995ea84f3ad0e36d08213e33b01a59f818804242613654c63bd8d486fc4af161cab1d37d468be2c97c1279bb1edee2d39a611247
Type: fulltext
Mimetype: application/pdf

Authority records

Székely, Éva
