Perception of smiling voice in spontaneous speech synthesis
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-0292-1164
Department of Linguistics, Stockholm University, Sweden.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
2021 (English). In: Proceedings of Speech Synthesis Workshop (SSW11), International Speech Communication Association, 2021, pp. 108-112. Conference paper, published paper (peer-reviewed)
Abstract [en]

Smiling during speech production has been shown to result in perceptible acoustic differences compared to non-smiling speech. However, there is a scarcity of research on the perception of “smiling voice” in synthesized spontaneous speech. In this study, we used a sequence-to-sequence neural text-to-speech system built on conversational data to produce utterances with the characteristics of spontaneous speech. Segments of speech following laughter, and the same utterances not preceded by laughter, were compared in a perceptual experiment after removing laughter and/or breaths from the beginning of the utterance to determine whether participants perceive the utterances preceded by laughter as sounding as if they were produced while smiling. The results showed that participants identified the post-laughter speech as smiling at a rate significantly greater than chance. Furthermore, the effect of content (positive/neutral/negative) was investigated. These results show that laughter, a spontaneous, non-elicited phenomenon in our model’s training data, can be used to synthesize expressive speech with the perceptual characteristics of smiling.

Place, publisher, year, edition, pages
International Speech Communication Association, 2021. pp. 108-112
Keywords [en]
speech synthesis, text-to-speech, smiling voice, smiled speech
National subject category (HSV)
Research program
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-329143
DOI: 10.21437/SSW.2021-19
OAI: oai:DiVA.org:kth-329143
DiVA, id: diva2:1768550
Conference
Speech Synthesis Workshop (SSW11), Budapest, Hungary, August 26-28, 2021
Funder
Swedish Research Council, VR-2020-02396
Swedish Research Council, VR-2019-05003
Riksbankens Jubileumsfond, P20-0298
Note

QC 20230616

Available from: 2023-06-15. Created: 2023-06-15. Last updated: 2025-02-07. Bibliographically approved.

Open Access in DiVA

fulltext (286 kB), 223 downloads
File information
File name: FULLTEXT01.pdf
File size: 286 kB
Checksum (SHA-512):
b276501ec612001e3c2d0bb325822984df70a4e6c2ccd6d7e008f347330051630ebbfd05e586ae09728a6c0bbb2ee0ef04f4e1d5dd77e6ca70ae8141526c017a
Type: fulltext
Mimetype: application/pdf


Person

Kirkland, Ambika; Gustafsson, Joakim; Székely, Éva
