Towards Adaptable and Intelligible Speech Synthesis in Noisy Environments
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-1001-6415
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-1399-6604
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-0397-6442
2025 (English). In: Interspeech 2025, International Speech Communication Association, 2025, pp. 2165-2169. Conference paper, Published paper (Refereed)
Abstract [en]

We present an investigation into adaptable speech synthesis for noisy environments. Leveraging a zero-shot TTS system, we synthesized a corpus of 1,200 speech samples from 100 sentences of varying complexity, each generated at six distinct levels of vocal effort. To simulate realistic listening conditions, the synthesized speech was mixed with environmental noise recordings from a diverse range of indoor and transportation settings at nine different signal-to-noise ratios (SNRs). We assessed the intelligibility of the resulting noisy speech using ASR word error rates (WER) across conditions. Additionally, the input text was evaluated using four metrics of sentence complexity and word predictability. Several regression models that took noise type, SNR, vocal effort and text as input were trained to predict ASR WER. Results show that increased vocal effort improves intelligibility, with benefits of up to 30% in adverse conditions, most pronounced in environments with competing speech at low SNRs.
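
The two measurement steps named in the abstract, mixing speech with noise at a target SNR and scoring ASR output by word error rate, can be illustrated with a minimal sketch. This is not the authors' code: the function names `scale_noise_for_snr`, `mix_at_snr` and `word_error_rate` are hypothetical, signals are plain lists of float samples, and WER is computed with a standard word-level edit distance after lowercasing and whitespace tokenization, which may differ from the normalization used in the paper.

```python
import math


def scale_noise_for_snr(speech, noise, snr_db):
    """Scale `noise` so that speech power / noise power equals `snr_db` (in dB)."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise signal is silent; cannot scale to a target SNR")
    # SNR_dB = 10 * log10(P_speech / P_noise), solved for the noise power.
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [gain * n for n in noise]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech sample by sample at the requested SNR."""
    if len(noise) < len(speech):
        raise ValueError("noise must be at least as long as speech")
    # Assumption: the noise excerpt is simply truncated to the speech length.
    scaled = scale_noise_for_snr(speech, noise[: len(speech)], snr_db)
    return [s + n for s, n in zip(speech, scaled)]


def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Dynamic programming over one row of the edit-distance table at a time.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

In a setup like the one described, each synthesized utterance would be passed through `mix_at_snr` once per noise recording and SNR level, the mixture transcribed by an ASR system, and the transcript compared with the input sentence using `word_error_rate`.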

Place, publisher, year, edition, pages
International Speech Communication Association, 2025. pp. 2165-2169
Keywords [en]
noisy environments, speech adaptation, speech intelligibility, speech synthesis
Identifiers
URN: urn:nbn:se:kth:diva-372805
DOI: 10.21437/Interspeech.2025-2787
Scopus ID: 2-s2.0-105020064005
OAI: oai:DiVA.org:kth-372805
DiVA, id: diva2:2013493
Conference
26th Interspeech Conference 2025, Rotterdam, Netherlands, August 17-21, 2025
Note

QC 20251113

Available from: 2025-11-13. Created: 2025-11-13. Last updated: 2025-11-13. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors

Marcinek, Lubos; Beskow, Jonas; Gustafsson, Joakim
