Where's the uh, hesitation?: The interplay between filled pause location, speech rate and fundamental frequency in perception of confidence
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-0292-1164
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9537-8505
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1175-840X
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
2022 (English). In: INTERSPEECH 2022, International Speech Communication Association, 2022, p. 4990-4994. Conference paper, Published paper (Refereed)
Abstract [en]

Much of the research investigating the perception of speaker certainty has relied on either attempting to elicit prosodic features in read speech, or artificial manipulation of recorded audio. Our novel method of controlling prosody in synthesized spontaneous speech provides a powerful tool for studying speech perception and can provide better insight into the interacting effects of prosodic features on perception while also paving the way for conversational systems which are more effectively able to engage in and respond to social behaviors. Here we have used this method to examine the combined impact of filled pause location, speech rate and f0 on the perception of speaker confidence. We found an additive effect of all three features. The most confident-sounding utterances had no filler, low f0 and high speech rate, while the least confident-sounding utterances had a medial filled pause, high f0 and low speech rate. Insertion of filled pauses had the strongest influence, but pitch and speaking rate could be used to more finely control the uncertainty cues in spontaneous speech synthesis.

Place, publisher, year, edition, pages
International Speech Communication Association, 2022. p. 4990-4994
Series
Interspeech, ISSN 2308-457X
Keywords [en]
speech synthesis, speech perception, expressive speech synthesis, paralinguistics
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-324862
DOI: 10.21437/Interspeech.2022-10973
ISI: 000900724505034
Scopus ID: 2-s2.0-85140084915
OAI: oai:DiVA.org:kth-324862
DiVA, id: diva2:1745115
Conference
Interspeech Conference, September 18-22, 2022, Incheon, South Korea
Note

QC 20230322

Available from: 2023-03-22 Created: 2023-03-22 Last updated: 2023-03-22. Bibliographically approved


Authority records

Kirkland, Ambika; Lameris, Harm; Székely, Éva; Gustafsson, Joakim
