Analysis-by-synthesis: phonetic-phonological variation in deep neural network-based text-to-speech synthesis
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-9659-1532
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4628-3769
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9327-9482
2023 (English). In: Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023 / [ed] Radek Skarnitzl and Jan Volín, Prague, Czech Republic: GUARANT International, 2023, p. 3156-3160. Conference paper, Published paper (Refereed)
Abstract [en]

Text-to-speech synthesis based on deep neural networks can generate highly humanlike speech, which revitalizes the potential for analysis-by-synthesis in speech research. We propose that neural synthesis can provide evidence that a specific distinction in its transcription system represents a robust acoustic/phonetic distinction in the speech used to train the model.

We synthesized utterances with allophones in incorrect contexts and analyzed the results phonetically. Our assumption was that if we gained control over the allophonic variation in this way, it would provide strong evidence that the variation is governed robustly by the phonological context used to create the transcriptions.

Of three allophonic variations investigated, the first, which was believed to be quite robust, gave us robust control over the variation, while the other two, which are less categorical, did not afford us such control. These findings are consistent with our hypothesis and support the notion that neural TTS can be a valuable analysis-by-synthesis tool for speech research.

Place, publisher, year, edition, pages
Prague, Czech Republic: GUARANT International, 2023. p. 3156-3160
Keywords [en]
analysis-by-synthesis, latent phonetic features, phonological variation, neural TTS
National Category
Other Engineering and Technologies
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-336586
OAI: oai:DiVA.org:kth-336586
DiVA, id: diva2:1797227
Conference
20th International Congress of Phonetic Sciences (ICPhS), August 7-11, 2023, Prague, Czech Republic
Funder
Vinnova, 2018-02427
Note

Part of ISBN 978-80-908114-2-3

QC 20230915

Available from: 2023-09-14. Created: 2023-09-14. Last updated: 2025-02-10. Bibliographically approved

Open Access in DiVA

fulltext (573 kB), 155 downloads
File information
File name: FULLTEXT01.pdf
File size: 573 kB
Checksum (SHA-512): 0833578568a5d8209c8e9da03f25811800b47cee97d6aad55ed1f61ae6b0d1cd27da52e9f538666cf606a91efdb9ec2a086d4dcaf0a7a1d8712ec09820958cb3
Type: fulltext
Mimetype: application/pdf

Authority records

Tånnander, Christina; House, David; Edlund, Jens
