Methods of slowing down speech
Tånnander, Christina: KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH; Myndigheten för tillgängliga medier, MTM (Swedish Agency for Accessible Media). ORCID iD: 0000-0002-9659-1532
Edlund, Jens: KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9327-9482
2021 (English). In: Proceedings. 11th ISCA Speech Synthesis Workshop (SSW 11), 2021, p. 43-47. Conference paper, Published paper (Refereed)
Abstract [en]

A slower speaking rate of human or synthetic speech is often requested by, for example, language learners or people with aphasia or dementia. Slow speech produced by human speakers typically contains a larger number of pauses, and both pauses and speech have longer segment durations than in speech produced at a standard or fast speaking rate. This paper presents several methods of prolonging speech. Two speech chunks of about 30 seconds each, read by a professional voice talent at a very slow speaking rate, were used as references. Seven pairs of stimuli containing the same word sequences were produced: one pair read by the same professional at her standard speaking rate, and six pairs by a moderately slow synthetic voice trained on the same human voice. Different combinations of pause insertion and stretching were used to match the total length of the corresponding reference stimulus. Stretching was applied in different proportions to speech and non-speech, and pauses were inserted at punctuation marks, at certain phrase boundaries, between all words, or by copying the pause locations of the reference reading. A total of 128 crowdsourced listeners evaluated the 16 stimuli. The results show that all manipulated readings are less consistent with expectations of slow speech than the reference readings, but that the synthesised readings are comparable to stretched human speech. Key factors are the relation between speech and silence and the duration of talkspurts.
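The abstract describes matching a target duration by combining pause insertion with stretching applied in different proportions to speech and non-speech, and it names the speech/silence relation and talkspurt duration as key factors. The Python sketch below illustrates only that duration arithmetic; it is not the authors' implementation. It assumes the reading is already segmented into labelled speech and pause segments (the `Segment` type, the `"speech"`/`"pause"` labels and the `speech_share` parameter are hypothetical), computes uniform stretch factors so that the stretched total equals the reference length, and reports the speech/silence ratio and talkspurt durations (taken here, as an assumption, to be maximal runs of consecutive speech segments).

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Segment:
    label: str       # "speech" or "pause" (hypothetical labels for this sketch)
    duration: float  # seconds


def total(segments: List[Segment], label: Optional[str] = None) -> float:
    """Sum durations, optionally restricted to segments with the given label."""
    return sum(s.duration for s in segments if label is None or s.label == label)


def stretch_factors(segments: List[Segment], target: float,
                    speech_share: float) -> Tuple[float, float]:
    """Return (speech_factor, pause_factor) so the stretched total equals target.

    speech_share (hypothetical) is the fraction of the missing time absorbed
    by stretching speech; the remainder is absorbed by stretching pauses.
    """
    if not 0.0 <= speech_share <= 1.0:
        raise ValueError("speech_share must be in [0, 1]")
    speech = total(segments, "speech")
    pause = total(segments, "pause")
    extra = target - (speech + pause)
    if extra < 0:
        raise ValueError("target is shorter than the input; this sketch only prolongs")
    if speech == 0 and speech_share > 0:
        raise ValueError("no speech to stretch")
    if pause == 0 and speech_share < 1:
        raise ValueError("no pauses to stretch; insert pauses first or use speech_share=1")
    speech_factor = 1 + speech_share * extra / speech if speech > 0 else 1.0
    pause_factor = 1 + (1 - speech_share) * extra / pause if pause > 0 else 1.0
    return speech_factor, pause_factor


def apply_stretch(segments: List[Segment], speech_factor: float,
                  pause_factor: float) -> List[Segment]:
    """Scale every speech segment by speech_factor and every pause by pause_factor."""
    return [Segment(s.label, s.duration * (speech_factor if s.label == "speech" else pause_factor))
            for s in segments]


def talkspurts(segments: List[Segment]) -> List[float]:
    """Durations of maximal runs of consecutive speech segments."""
    spurts, current = [], 0.0
    for s in segments:
        if s.label == "speech":
            current += s.duration
        elif current > 0:
            spurts.append(current)
            current = 0.0
    if current > 0:
        spurts.append(current)
    return spurts


if __name__ == "__main__":
    # Toy segmentation (invented numbers, not data from the paper).
    reading = [Segment("speech", 2.0), Segment("pause", 0.5),
               Segment("speech", 1.5), Segment("speech", 1.0), Segment("pause", 0.5)]
    sf, pf = stretch_factors(reading, target=11.0, speech_share=0.5)
    slow = apply_stretch(reading, sf, pf)
    print("factors:", sf, pf)
    print("total after stretching:", total(slow))
    print("speech/silence ratio:", total(slow, "speech") / total(slow, "pause"))
    print("talkspurts:", talkspurts(slow))
```

Varying `speech_share` between 0 and 1 mimics applying stretching "in different proportions to speech and non-speech"; pause insertion would simply add `"pause"` segments before the factors are computed.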

Place, publisher, year, edition, pages
2021. p. 43-47
National Category
Other Engineering and Technologies
Identifiers
URN: urn:nbn:se:kth:diva-304364
DOI: 10.21437/SSW.2021-8
OAI: oai:DiVA.org:kth-304364
DiVA, id: diva2:1608081
Conference
ISCA Speech Synthesis Workshop, August 26-28, 2021, Budapest
Funder
Vinnova, 2018-02427
Note

QC 20211125

Available from: 2021-11-02. Created: 2021-11-02. Last updated: 2025-02-10. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text

Authority records

Tånnander, Christina; Edlund, Jens
