kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Controlling prominence realisation in parametric DNN-based speech synthesis
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.ORCID iD: 0000-0001-5953-7310
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.ORCID iD: 0000-0003-1399-6604
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.ORCID iD: 0000-0002-0397-6442
2017 (English)In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association , 2017, Vol. 2017, p. 1079-1083Conference paper, Published paper (Refereed)
Abstract [en]

This work aims to improve text-To-speech synthesis forWikipedia by advancing and implementing models of prosodic prominence. We propose a new system architecture with explicit prominence modeling and test the first component of the architecture. We automatically extract a phonetic feature related to prominence from the speech signal in the ARCTIC corpus. We then modify the label files and train an experimental TTS system based on the feature using Merlin, a statistical-parametric DNN-based engine. Test sentences with contrastive prominence on the word-level are synthesised and separate listening tests a) evaluating the level of prominence control in generated speech, and b) naturalness, are conducted. Our results show that the prominence feature-enhanced system successfully places prominence on the appropriate words and increases perceived naturalness relative to the baseline.

Place, publisher, year, edition, pages
International Speech Communication Association , 2017. Vol. 2017, p. 1079-1083
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords [en]
Deep neural networks, Prosodic prominence, Speech synthesis
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:kth:diva-222092DOI: 10.21437/Interspeech.2017-1355ISI: 000457505000226Scopus ID: 2-s2.0-85039164235OAI: oai:DiVA.org:kth-222092DiVA, id: diva2:1178938
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20180131

Available from: 2018-01-31 Created: 2018-01-31 Last updated: 2025-02-07Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Malisz, ZofiaBeskow, JonasGustafson, Joakim

Search in DiVA

By author/editor
Malisz, ZofiaBeskow, JonasGustafson, Joakim
By organisation
Speech, Music and Hearing, TMH
Natural Language Processing

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 129 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf