Perception of smiling voice in spontaneous speech synthesis
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-0292-1164
Department of Linguistics, Stockholm University, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2021 (English). In: Proceedings of Speech Synthesis Workshop (SSW11), International Speech Communication Association, 2021, p. 108-112. Conference paper, Published paper (Refereed)
Abstract [en]

Smiling during speech production has been shown to result in perceptible acoustic differences compared to non-smiling speech. However, there is a scarcity of research on the perception of “smiling voice” in synthesized spontaneous speech. In this study, we used a sequence-to-sequence neural text-to-speech system built on conversational data to produce utterances with the characteristics of spontaneous speech. Segments of speech following laughter, and the same utterances not preceded by laughter, were compared in a perceptual experiment after removing laughter and/or breaths from the beginning of the utterance to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. Furthermore, the effect of content (positive/neutral/negative) was investigated. These results show that laughter, a spontaneous, non-elicited phenomenon in our model’s training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.

Place, publisher, year, edition, pages
International Speech Communication Association, 2021. p. 108-112
Keywords [en]
speech synthesis, text-to-speech, smiling voice, smiled speech
National Category
Natural Language Processing
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-329143
DOI: 10.21437/SSW.2021-19
OAI: oai:DiVA.org:kth-329143
DiVA, id: diva2:1768550
Conference
Speech Synthesis Workshop (SSW11), Budapest, Hungary, August 26-28, 2021
Funder
Swedish Research Council, VR-2020-02396
Swedish Research Council, VR-2019-05003
Riksbankens Jubileumsfond, P20-0298
Note

QC 20230616

Available from: 2023-06-15 Created: 2023-06-15 Last updated: 2025-02-07
Bibliographically approved

Open Access in DiVA

fulltext (286 kB), 163 downloads
File information
File name: FULLTEXT01.pdf
File size: 286 kB
Checksum (SHA-512): b276501ec612001e3c2d0bb325822984df70a4e6c2ccd6d7e008f347330051630ebbfd05e586ae09728a6c0bbb2ee0ef04f4e1d5dd77e6ca70ae8141526c017a
Type: fulltext
Mimetype: application/pdf

Authority records

Kirkland, Ambika; Gustafsson, Joakim; Székely, Éva

Total: 163 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 389 hits