Controlling prominence realisation in parametric DNN-based speech synthesis
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-5953-7310
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
2017 (English). In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, International Speech Communication Association, 2017, Vol. 2017, p. 1079-1083. Conference paper, Published paper (Refereed)
Abstract [en]

This work aims to improve text-to-speech synthesis for Wikipedia by advancing and implementing models of prosodic prominence. We propose a new system architecture with explicit prominence modeling and test the first component of the architecture. We automatically extract a phonetic feature related to prominence from the speech signal in the ARCTIC corpus. We then modify the label files and train an experimental TTS system based on the feature using Merlin, a statistical-parametric DNN-based engine. Test sentences with contrastive prominence at the word level are synthesised, and separate listening tests are conducted to evaluate a) the level of prominence control in the generated speech and b) its naturalness. Our results show that the prominence feature-enhanced system successfully places prominence on the appropriate words and increases perceived naturalness relative to the baseline.

Place, publisher, year, edition, pages
International Speech Communication Association, 2017. Vol. 2017, p. 1079-1083
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X
Keywords [en]
Deep neural networks, Prosodic prominence, Speech synthesis
Identifiers
URN: urn:nbn:se:kth:diva-222092
DOI: 10.21437/Interspeech.2017-1355
Scopus ID: 2-s2.0-85039164235
OAI: oai:DiVA.org:kth-222092
DiVA, id: diva2:1178938
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20180131

Available from: 2018-01-31. Created: 2018-01-31. Last updated: 2018-01-31. Bibliographically approved.

Open Access in DiVA

No full text in DiVA
