CreakVC: A Voice Conversion Tool for Modulating Creaky Voice
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9537-8505
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2024 (English). In: Interspeech 2024, International Speech Communication Association, 2024, p. 1005-1006. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce CreakVC, a human-in-the-loop, one-shot voice conversion tool designed to modulate the level of creaky voice in converted speech. Creaky voice, which speakers often use to convey sociolinguistic cues, poses challenges for speech processing because of its complex phonation characteristics. The primary goal of CreakVC is to enable in-depth research into how these cues are perceived through systematic perceptual studies. CreakVC provides access to a diverse range of voice identities exhibiting creaky voice while keeping other parameters consistent. We developed a spectrogram-frame-level creak representation using CreaPy and fine-tuned FreeVC, a one-shot voice conversion tool, by conditioning the speaker embedding and the self-supervised audio representation on the creak representation. An integrated plotting feature lets users visualize and manipulate portions of speech to adjust creaky phonation levels precisely. Beyond research, CreakVC has potential applications in voice-interactive systems and multimedia production.

Place, publisher, year, edition, pages
International Speech Communication Association, 2024. p. 1005-1006
Keywords [en]
creaky voice, TTS, voice conversion
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-358875
Scopus ID: 2-s2.0-85214828772
OAI: oai:DiVA.org:kth-358875
DiVA, id: diva2:1930528
Conference
25th Interspeech Conference 2024, Kos Island, Greece, September 1-5, 2024
Note

QC 20250124

Available from: 2025-01-23. Created: 2025-01-23. Last updated: 2025-01-24. Bibliographically approved.

Authority records

Lameris, Harm; Gustafsson, Joakim; Székely, Éva
