PROMIS: a statistical-parametric speech synthesis system with prominence control via a prominence network
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-5953-7310
STTS – Södermalms talteknologiservice AB.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-1399-6604
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-0397-6442
2019 (English). In: Proceedings of SSW 10 - The 10th ISCA Speech Synthesis Workshop, Vienna, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

We implement an architecture with explicit prominence learning via a prominence network in Merlin, a statistical-parametric DNN-based text-to-speech system. We build on our previous results that successfully evaluated the inclusion of an automatically extracted, speech-based prominence feature into the training and its control at synthesis time. In this work, we expand the PROMIS system by implementing the prominence network that predicts prominence values from text. We test the network predictions as well as the effects of a prominence control module based on SSML-like tags. Listening tests for the complete PROMIS system, combining a prominence feature, a prominence network and prominence control, show that it effectively controls prominence in a diagnostic set of target words. The tests also show a minor negative impact on perceived naturalness, relative to baseline, exerted by the two prominence tagging methods implemented in the control module.

Place, publisher, year, edition, pages
Vienna, 2019.
National Category
Computer Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-283137
OAI: oai:DiVA.org:kth-283137
DiVA, id: diva2:1473140
Conference
SSW 10 - The 10th ISCA Speech Synthesis Workshop
Funder
Swedish Research Council, 2017-02861
Note

QC 20201020

Available from: 2020-10-05. Created: 2020-10-05. Last updated: 2022-06-25. Bibliographically approved

Open Access in DiVA

fulltext (212 kB), 300 downloads
File information
File name: FULLTEXT01.pdf
File size: 212 kB
Checksum (SHA-512): c94767814fc2bdcda883e0f5dbad3ed8a590fc14a5ba9667bedab9e3d6a57d7a1ccea78eba3f002d8d4d513d30244cf446046d51eb0ca1aa0db3d9911c3ab313
Type: fulltext
Mimetype: application/pdf

Authority records

Malisz, Zofia; Beskow, Jonas; Gustafson, Joakim
