Neural speech synthesis with controllable creaky voice style
Lameris, Harm. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9537-8505
Wlodarczak, Marcin. Stockholm University, Department of Linguistics. ORCID iD: 0000-0003-3824-2980
Gustafsson, Joakim. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
Székely, Éva. KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2023 (English). In: Proceedings of the 20th International Congress of Phonetic Sciences - ICPhS 2023 / [ed] Radek Skarnitzl; Jan Volín, 2023, p. 3141-3145, article id 717. Conference paper, Published paper (Refereed)
Abstract [en]

The use of creaky voice, or vocal fry, in speech has been extensively studied for its linguistic, paralinguistic, and sociolinguistic functions. However, much of the existing research on this topic is fragmented and often contradictory. In order to gain a deeper understanding of the communicative functions of creaky voice, we propose the use of comparative perceptual studies with natural-sounding speech synthesis. We present a neural speech synthesizer that produces highly natural-sounding synthetic speech with controllable creaky voice styles. In a subjective listening experiment, speech experts were able to identify the presence and intensity of creaky voice produced by the synthesizer. Our results suggest that neural speech synthesis can be a valuable tool in furthering our understanding of the communicative functions of creaky voice.

Place, publisher, year, edition, pages
2023. p. 3141-3145, article id 717
Keywords [en]
creaky voice, vocal fry, speech synthesis, TTS, spontaneous speech
National Category
Signal Processing
Identifiers
URN: urn:nbn:se:kth:diva-364768
OAI: oai:DiVA.org:kth-364768
DiVA, id: diva2:1970124
Conference
20th International Congress of Phonetic Sciences (ICPhS), Prague Congress Center, Czech Republic, August 7–11, 2023
Projects
Connected (VR-2019-05003)
STANCE (VR-2020-02396)
Prosodic functions of voice quality dynamics (VR-2019-02932)
CAPTivating (P20-0298)
Funder
Swedish Research Council, 2019-02932
Note

QC 20250616

Available from: 2025-06-16. Created: 2025-06-16. Last updated: 2025-06-16. Bibliographically approved.

Open Access in DiVA

fulltext (2835 kB), 45 downloads
File information
File name: FULLTEXT01.pdf
File size: 2835 kB
Checksum (SHA-512): 92fd8a1f49eedd1a26d9f8ee19327050400e3473ed986785c3259d6c2bd92e6ea01bb53a6e0286905d7fd503cb81def598707208945666823443943fcf04fbe9
Type: fulltext
Mimetype: application/pdf

Authority records

Lameris, Harm; Gustafsson, Joakim; Székely, Éva

Search in DiVA

By author/editor
Lameris, Harm; Wlodarczak, Marcin; Gustafsson, Joakim; Székely, Éva