Artificial Neural Networks in Swedish Speech Synthesis
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH, Speech Communication and Technology.
2018 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Alternative title
Artificiella neurala nätverk i svensk talsyntes (Swedish)
Abstract [en]

Text-to-speech (TTS) systems have entered our daily lives in the form of smart assistants and many other applications. Contemporary research applies machine learning and artificial neural networks (ANNs) to synthesize speech. Studies have shown that these systems outperform the older concatenative and parametric methods.

In this paper, ANN-based methods for speech synthesis are explored and one of them is implemented for the Swedish language. The implemented method, called “Tacotron”, is a first step towards end-to-end ANN-based TTS and combines many different ANN techniques. The resulting system is compared with a parametric TTS system through a strength-of-preference test carried out with 20 Swedish-speaking subjects. A statistically significant preference for the ANN-based TTS is found. Test subjects indicate that the ANN-based TTS performs better than the parametric TTS in audio quality and naturalness but sometimes falls short in intelligibility.
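The abstract reports a statistically significant preference but does not say which test produced it. As a hedged illustration only, one common choice for paired preference ratings is the exact two-sided sign test, sketched below. The rating scale (-3..+3), the function name `sign_test`, and the example ratings are all hypothetical assumptions, not the thesis's method or data.

```python
from math import comb


def sign_test(ratings):
    """Exact two-sided sign test for paired preference ratings.

    Assumption (hypothetical scale): each rating is on a symmetric scale
    such as -3..+3, where positive values mean the listener preferred the
    ANN-based TTS and negative values the parametric TTS. Zero ratings
    express no preference and are dropped, as the sign test requires.
    Returns (number positive, number negative, two-sided p-value).
    """
    pos = sum(1 for r in ratings if r > 0)
    neg = sum(1 for r in ratings if r < 0)
    n = pos + neg
    if n == 0:
        return pos, neg, 1.0
    # Under H0 (no preference) the number of positive ratings is
    # Binomial(n, 0.5); sum the probability of the smaller tail.
    k = min(pos, neg)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double the tail for a two-sided test, capped at 1.
    return pos, neg, min(1.0, 2 * tail)


# Hypothetical ratings for illustration only; not data from the thesis.
ratings = [2, 1, 3, 0, -1, 2, 1, 2, 0, 3, 1, -2, 2, 1, 1, 2, 3, 0, 1, 2]
pos, neg, p = sign_test(ratings)
print(f"{pos} prefer ANN, {neg} prefer parametric, p = {p:.4f}")
```

The sign test uses only the direction of each rating. A Wilcoxon signed-rank test (for example `scipy.stats.wilcoxon`) would also use the magnitudes, at the cost of assuming a symmetric distribution of differences. If each listener rates several sentence pairs, those ratings are not independent, and a real analysis would need to account for that.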

Abstract [sv] (English translation)

Speech synthesizers, also called TTS (text-to-speech) systems, are widely used in smart assistants and many other applications. Contemporary research applies machine learning and artificial neural networks (ANNs) to perform speech synthesis. Studies have shown that these systems perform better than the older concatenative and parametric methods.

In this report, ANN-based TTS methods are explored and one of the methods is implemented for the Swedish language. The method used is called “Tacotron” and is a first step towards end-to-end TTS based on neural networks. The method ties together several different ANN techniques. The resulting system is compared with a parametric TTS through a graded preference test involving 20 Swedish-speaking subjects. A statistically significant preference for the ANN-based TTS system is established. The subjects indicate that the ANN-based TTS system performs better than the parametric one in terms of audio quality and naturalness but shows shortcomings in intelligibility.

Place, publisher, year, edition, pages
2018, 50 pages
Series
TRITA-EECS-EX ; 2018:737
Keywords [en]
Speech Synthesis, neural, LSTM, Speech Technology, Tacotron, Attention, CNN, Neural Networks, RNN
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-239350
OAI: oai:DiVA.org:kth-239350
DiVA, id: diva2:1264794
External cooperation
Voxo AB
Educational program
Master of Science - Computer Science
Supervisors
Examiners
Available from: 2018-11-26
Created: 2018-11-21
Last updated: 2018-11-26
Bibliographically approved

Open Access in DiVA

fulltext (1727 kB), 144 downloads
File information
File name: FULLTEXT01.pdf
File size: 1727 kB
Checksum (SHA-512): e6f9872dacc150980019a186c2262216b2571977c641a8478a251bf9e6fc97c130c84a1e3ae35501b4a067b0e2add4d162282d5b55bc5027275530b1dff1c25b
Type: fulltext
Mimetype: application/pdf
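The record lists a SHA-512 checksum for `FULLTEXT01.pdf`. Below is a minimal Python sketch for checking a downloaded copy against it, using only the standard-library `hashlib` module. The helper names `sha512_hex_chunks`, `sha512_hex` and `digest_matches` are hypothetical, chosen for illustration.

```python
import hashlib

# SHA-512 checksum of FULLTEXT01.pdf as listed in this DiVA record.
EXPECTED_SHA512 = (
    "e6f9872dacc150980019a186c2262216b2571977c641a8478a251bf9e6fc97c1"
    "30c84a1e3ae35501b4a067b0e2add4d162282d5b55bc5027275530b1dff1c25b"
)


def sha512_hex_chunks(chunks):
    """Hash an iterable of byte chunks and return the hex digest."""
    h = hashlib.sha512()
    for chunk in chunks:
        h.update(chunk)
    return h.hexdigest()


def sha512_hex(path, chunk_size=1 << 16):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        # iter(callable, sentinel) keeps calling f.read until it returns b"" (EOF).
        return sha512_hex_chunks(iter(lambda: f.read(chunk_size), b""))


def digest_matches(actual_hex, expected_hex=EXPECTED_SHA512):
    """Compare two hex digests, ignoring letter case."""
    return actual_hex.lower() == expected_hex.lower()
```

Assuming the PDF has been saved under its record file name in the working directory, `digest_matches(sha512_hex("FULLTEXT01.pdf"))` should return `True` for an intact download.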
