Analysis-by-synthesis: phonetic-phonological variation in deep neural network-based text-to-speech synthesis
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-9659-1532
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-4628-3769
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0001-9327-9482
2023 (English). In: Proceedings of the 20th International Congress of Phonetic Sciences, Prague 2023 / [ed] Radek Skarnitzl and Jan Volín, Prague, Czech Republic: GUARANT International, 2023, pp. 3156-3160. Conference paper, Published paper (Refereed)
Abstract [en]

Text-to-speech synthesis based on deep neural networks can generate highly humanlike speech, which revitalizes the potential for analysis-by-synthesis in speech research. We propose that neural synthesis can provide evidence that a specific distinction in its transcription system represents a robust acoustic/phonetic distinction in the speech used to train the model.

We synthesized utterances with allophones in incorrect contexts and analyzed the results phonetically. Our assumption was that if we gained control over the allophonic variation in this way, it would provide strong evidence that the variation is governed robustly by the phonological context used to create the transcriptions.

Of three allophonic variations investigated, the first, which was believed to be quite robust, gave us robust control over the variation, while the other two, which are less categorical, did not afford us such control. These findings are consistent with our hypothesis and support the notion that neural TTS can be a valuable analysis-by-synthesis tool for speech research.

Place, publisher, year, edition, pages
Prague, Czech Republic: GUARANT International, 2023. pp. 3156-3160
Keywords [en]
analysis-by-synthesis, latent phonetic features, phonological variation, neural TTS
HSV category
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-336586
OAI: oai:DiVA.org:kth-336586
DiVA, id: diva2:1797227
Conference
20th International Congress of Phonetic Sciences (ICPhS), August 7-11, 2023, Prague, Czech Republic
Funder
Vinnova, 2018-02427
Note

Part of ISBN 978-80-908114-2-3

QC 20230915

Available from: 2023-09-14. Created: 2023-09-14. Last updated: 2023-09-15. Bibliographically approved.

Open Access in DiVA

fulltext (573 kB), 80 downloads
File information
File name: FULLTEXT01.pdf. File size: 573 kB. Checksum: SHA-512
0833578568a5d8209c8e9da03f25811800b47cee97d6aad55ed1f61ae6b0d1cd27da52e9f538666cf606a91efdb9ec2a086d4dcaf0a7a1d8712ec09820958cb3
Type: fulltext. Mimetype: application/pdf

Person

Tånnander, Christina; House, David; Edlund, Jens
