Acoustic-to-Articulatory Mapping With Joint Optimization of Deep Speech Enhancement and Articulatory Inversion Models
Norwegian Univ Sci & Technol, Signal Proc, N-7491 Trondheim, Norway.
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. Norwegian Univ Sci & Technol, Signal Proc, N-7491 Trondheim, Norway. ORCID iD: 0000-0002-3323-5311
Norwegian Univ Sci & Technol, Signal Proc, N-7491 Trondheim, Norway.
Norwegian Univ Sci & Technol, Signal Proc, N-7491 Trondheim, Norway.
2022 (English) In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290, E-ISSN 2329-9304, Vol. 30, p. 135-147. Article in journal (Refereed) Published
Abstract [en]

We investigate the problem of speaker-independent acoustic-to-articulatory inversion (AAI) in noisy conditions within the deep neural network (DNN) framework. In contrast with recent results in the literature, we argue that a DNN vector-to-vector regression front-end for speech enhancement (DNN-SE) can play a key role in AAI when used to enhance spectral features prior to AAI back-end processing. We experimented with single- and multi-task training strategies for the DNN-SE block, finding the latter to be beneficial to AAI. Furthermore, we show that coupling a DNN-SE that produces enhanced speech features with an AAI model trained on clean speech outperforms a multi-condition AAI (AAI-MC) when tested on noisy speech. We observe a 15% relative improvement in Pearson's correlation coefficient (PCC) for our system over AAI-MC at 0 dB signal-to-noise ratio on the Haskins corpus. Our approach also compares favourably against using a conventional DSP approach to speech enhancement (MMSE with IMCRA) in the front-end. Finally, we demonstrate the utility of articulatory inversion in a downstream speech application. We report significant word error rate (WER) improvements on an automatic speech recognition task in mismatched conditions based on the Wall Street Journal corpus (WSJ) when leveraging articulatory information estimated by the AAI-MC system over spectral-alone speech features.
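
The abstract's evaluation metric can be illustrated with a minimal sketch, which is not the authors' code: it computes Pearson's correlation coefficient between a measured and an estimated articulatory trajectory, averages it over channels, and turns two PCC scores into a relative improvement. The function names, the synthetic trajectories, and the scores 0.60 and 0.69 are hypothetical and chosen only for illustration; the record does not give the absolute PCC values.

```python
import numpy as np


def pearson_cc(reference, estimate):
    """Pearson's correlation coefficient between two 1-D trajectories."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    ref_c = ref - ref.mean()  # remove the mean of each trajectory
    est_c = est - est.mean()
    return float(np.dot(ref_c, est_c) / (np.linalg.norm(ref_c) * np.linalg.norm(est_c)))


def mean_pcc(reference, estimate):
    """Average PCC over channels; inputs have shape (frames, channels)."""
    ref = np.asarray(reference, dtype=float)
    est = np.asarray(estimate, dtype=float)
    scores = [pearson_cc(ref[:, k], est[:, k]) for k in range(ref.shape[1])]
    return float(np.mean(scores))


def relative_improvement(baseline, proposed):
    """Relative gain of `proposed` over `baseline`, as a fraction."""
    return (proposed - baseline) / baseline


# Synthetic stand-ins for measured and estimated articulator trajectories.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
measured = np.stack([np.sin(2 * np.pi * 3 * t), np.cos(2 * np.pi * 2 * t)], axis=1)
estimated = measured + 0.3 * rng.standard_normal(measured.shape)

print("mean PCC:", mean_pcc(measured, estimated))
# Hypothetical PCC scores, not taken from the paper: 0.60 -> 0.69 is about a 15% relative gain.
print("relative improvement:", relative_improvement(0.60, 0.69))
```

A relative improvement is measured against the baseline score, so a 15% relative gain corresponds to a smaller absolute change in PCC.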

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. Vol. 30, p. 135-147
Keywords [en]
Noise measurement, Speech enhancement, Task analysis, Mel frequency cepstral coefficient, Training, Hidden Markov models, Deep learning, Deep neural network, acoustic-to-articulatory inversion, multi-task training, speaker independent models
National subject category
Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-307335
DOI: 10.1109/TASLP.2021.3133218
ISI: 000735507400007
Scopus ID: 2-s2.0-85121342065
OAI: oai:DiVA.org:kth-307335
DiVA, id: diva2:1631339
Note

QC 20220124

Available from: 2022-01-24 Created: 2022-01-24 Last updated: 2025-08-28 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Person

Salvi, Giampiero
