Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligenta system, Speech, Music and Hearing, TMH, Speech Communication and Technology.ORCID iD: 0000-0003-4532-014X
2019 (English)In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 27, no 12, p. 2173-2182Article in journal (Refereed) In press
Abstract [en]

The synthesis of diphthongs in three-dimensions (3D) involves the simulation of acoustic waves propagating through a complex 3D vocal tract geometry that deforms over time. Accurate 3D vocal tract geometries can be extracted from Magnetic Resonance Imaging (MRI), but due to long acquisition times, only static sounds can be currently studied with an adequate spatial resolution. In this work, 3D dynamic vocal tract representations are built to generate diphthongs, based on a set of cross-sections extracted from MRI-based vocal tract geometries of static vowel sounds. A diphthong can then be easily generated by interpolating the location, orientation and shape of these cross-sections, thus avoiding the interpolation of full 3D geometries. Two options are explored to extract the cross-sections. The first one is based on an adaptive grid (AG), which extracts the cross-sections perpendicular to the vocal tract midline, whereas the second one resorts to a semi-polar grid (SPG) strategy, which fixes the cross-section orientations. The finite element method (FEM) has been used to solve the mixed wave equation and synthesize diphthongs [${\alpha i}$] and [${\alpha u}$] in the dynamic 3D vocal tracts. The outputs from a 1D acoustic model based on the Transfer Matrix Method have also been included for comparison. The results show that the SPG and AG provide very close solutions in 3D, whereas significant differences are observed when using them in 1D. The SPG dynamic vocal tract representation is recommended for 3D simulations because it helps to prevent the collision of adjacent cross-sections.

Place, publisher, year, edition, pages
IEEE Press, 2019. Vol. 27, no 12, p. 2173-2182
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-259580DOI: 10.1109/TASLP.2019.2942439ISI: 000492183000001Scopus ID: 2-s2.0-85073632242OAI: oai:DiVA.org:kth-259580DiVA, id: diva2:1352328
Projects
EUNISON
Note

QC 20191111

Available from: 2019-09-18 Created: 2019-09-18 Last updated: 2019-11-11Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Search in DiVA

By author/editor
Engwall, Olov
By organisation
Speech Communication and Technology
In the same journal
IEEE Transactions on Audio, Speech, and Language Processing
Computer Sciences

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 213 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf