Transfer learning of articulatory information through phone information
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-3323-5311
2020 (English) In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, International Speech Communication Association, 2020, pp. 2877-2881. Conference paper, Published paper (Refereed)
Abstract [en]

Articulatory information has been argued to be useful for several speech tasks. However, in most practical scenarios this information is not readily available. We propose a novel transfer learning framework to obtain reliable articulatory information in such cases. We demonstrate its reliability both in terms of estimating parameters of speech production and its ability to enhance the accuracy of an end-to-end phone recognizer. Articulatory information is estimated from speaker-independent phonemic features, using a small speech corpus with electromagnetic articulography (EMA) measurements. Next, we employ a teacher-student model to learn estimation of articulatory features from acoustic features for the targeted phone recognition task. Phone recognition experiments demonstrate that the proposed transfer learning approach outperforms the baseline transfer learning system acquired directly from an acoustic-to-articulatory inversion (AAI) model. The articulatory features estimated by the proposed method, in conjunction with acoustic features, improved the phone error rate (PER) by 6.7% and 6% on the TIMIT core test and development sets, respectively, compared to standalone static acoustic features. Interestingly, this improvement is slightly higher than what is obtained by static+dynamic acoustic features, but with a significantly less. Adding articulatory features on top of static+dynamic acoustic features yields a small but positive PER improvement.
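The teacher-student step described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual implementation: the networks here are plain linear maps with made-up dimensions, whereas the paper uses deep models trained on real EMA and acoustic data. The sketch only shows the distillation idea: a fixed "teacher" (standing in for the pretrained phoneme-to-articulatory estimator) produces articulatory targets, and a "student" learns to predict those targets directly from acoustic input.

```python
import numpy as np

# Hypothetical dimensions; the paper's actual feature sizes are not given here.
rng = np.random.default_rng(0)
n, d_acoustic, d_artic = 200, 13, 6

# "Teacher": a fixed (pretrained) linear map standing in for the
# phoneme-to-articulatory estimator trained on the small EMA corpus.
W_teacher = rng.normal(size=(d_acoustic, d_artic))
X_acoustic = rng.normal(size=(n, d_acoustic))
Y_teacher = X_acoustic @ W_teacher  # teacher-provided articulatory targets

# "Student": learns to estimate articulatory features directly from
# acoustic features by matching the teacher's outputs (MSE objective).
W_student = np.zeros((d_acoustic, d_artic))
lr = 0.1
for _ in range(500):
    pred = X_acoustic @ W_student
    grad = X_acoustic.T @ (pred - Y_teacher) / n  # gradient of mean squared error
    W_student -= lr * grad

mse = np.mean((X_acoustic @ W_student - Y_teacher) ** 2)
print(f"student-teacher MSE: {mse:.2e}")
```

In the paper, the student's articulatory estimates are then concatenated with acoustic features as input to the end-to-end phone recognizer; that downstream step is omitted here.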

Place, publisher, year, edition, pages
International Speech Communication Association, 2020. pp. 2877-2881
Keywords [en]
Articulatory inversion, Deep learning, Speech recognition, Transfer learning, Learning systems, Speech communication, Telephone sets, Acoustic features, Articulatory features, Articulatory informations, Electromagnetic articulography, Estimating parameters, Learning frameworks, Speaker independents, Speech production
National subject category
Language Processing and Computational Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-302931
DOI: 10.21437/Interspeech.2020-1139
ISI: 000833594103003
Scopus ID: 2-s2.0-85098223486
OAI: oai:DiVA.org:kth-302931
DiVA, id: diva2:1599907
Conference
21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020, 25 October 2020 through 29 October 2020
Note

QC 20211003

Available from: 2021-10-03 Created: 2021-10-03 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text · Scopus

Person

Salvi, Giampiero
