Computational Modeling of the Vocal Tract: Applications to Speech Production
KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH. ORCID iD: 0000-0002-8991-1016
2018 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Human speech production is a complex process, involving neuromuscular control signals, the effects of the articulators' biomechanical properties, and acoustic wave propagation in a vocal tract tube of intricate shape. Modeling these phenomena may play an important role in advancing our understanding of the mechanisms involved, and may also have future medical applications, e.g., guiding doctors in diagnosis, treatment planning, and surgery prediction for related disorders such as oral cancer, cleft palate, obstructive sleep apnea, and dysphagia.

A more complete understanding requires models that represent the phenomena as faithfully as possible. Because such modeling is complex, simplifications have nevertheless been used extensively in speech production research: phonetic descriptors (such as the position and degree of the most constricted part of the vocal tract) are used as control signals, the articulators are represented as two-dimensional geometrical models, the vocal tract is treated as a smooth tube, plane wave propagation is assumed, etc.

This thesis aims first to investigate the consequences of such simplifications, and second to contribute to a unified model of the speech production process by connecting three-dimensional biomechanical modeling of the upper airway with three-dimensional acoustic simulations. The investigation of the simplifying assumptions demonstrated the influence of vocal tract geometry features (such as shape representation, bending and lip shape) on its acoustic characteristics, and showed that the type of modeling (geometrical or biomechanical) affects the spatial trajectories of the articulators, as well as the transition of formant frequencies in the spectrogram.

The unification of biomechanical and acoustic modeling in three dimensions makes it possible to realistically control the acoustic output of dynamic sounds, such as vowel-vowel utterances, by contraction of relevant muscles. This moves and shapes the speech articulators, which in turn define the vocal tract tube in which the wave propagation occurs. The main contribution of the thesis in this line of work is a novel and complex method that automatically reconstructs the shape of the vocal tract from the biomechanical model. This step is essential to link biomechanical and acoustic simulations, since the vocal tract, which anatomically is a cavity enclosed by different structures, is only implicitly defined in a biomechanical model constituted of several distinct articulators.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018. p. 105
Series
TRITA-EECS-AVL ; 2018:90
Keywords [en]
vocal tract, upper airway, speech production, biomechanical model, acoustic model, vocal tract reconstruction
National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
URN: urn:nbn:se:kth:diva-239071; ISBN: 978-91-7873-021-6 (print); OAI: oai:DiVA.org:kth-239071; DiVA, id: diva2:1263713
Public defence
2018-12-07, D2, Lindstedtsvägen 5, Stockholm, 14:00 (English)
Note

QC 20181116

Available from: 2018-11-16. Created: 2018-11-16. Last updated: 2018-11-16. Bibliographically approved
List of papers
1. Influence of lips on the production of vowels based on finite element simulations and experiments
2016 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 139, no 5, p. 2852-2859. Article in journal (Refereed) Published
Abstract [en]

Three-dimensional (3-D) numerical approaches for voice production are currently being investigated and developed. Radiation losses produced when sound waves emanate from the mouth aperture are one of the key aspects to be modeled. When doing so, the lips are usually removed from the vocal tract geometry in order to impose a radiation impedance on a closed cross-section, which speeds up the numerical simulations compared to free-field radiation solutions. However, lips may play a significant role. In this work, the lips' effects on vowel sounds are investigated by using 3-D vocal tract geometries generated from magnetic resonance imaging. To this aim, two configurations for the vocal tract exit are considered: with lips and without lips. The acoustic behavior of each is analyzed and compared by means of time-domain finite element simulations that allow free-field wave propagation and experiments performed using 3-D-printed mechanical replicas. The results show that the lips should be included in order to correctly model vocal tract acoustics not only at high frequencies, as commonly accepted, but also in the low frequency range below 4 kHz, where plane wave propagation occurs.

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2016
National Category
Language Technology (Computational Linguistics)
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-189323 (URN); 10.1121/1.4950698 (DOI); 000377715100066 (); 27250177 (PubMedID); 2-s2.0-84971216381 (Scopus ID)
Projects
EUNISON
Funder
EU, FP7, Seventh Framework Programme, 6877
Note

QC 20160704

Available from: 2016-07-02. Created: 2016-07-02. Last updated: 2018-11-16. Bibliographically approved
2. Influence of vocal tract geometry simplifications on the numerical simulation of vowel sounds
2016 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 140, no 3, p. 1707-1718. Article in journal (Refereed) Published
Abstract [en]

For many years, the vocal tract shape has been approximated by one-dimensional (1D) area functions to study the production of voice. More recently, 3D approaches allow one to deal with the complex 3D vocal tract, although area-based 3D geometries of circular cross-section are still in use. However, little is known about the influence of performing such a simplification, and some alternatives may exist between these two extreme options. To this aim, several vocal tract geometry simplifications for vowels [ɑ], [i], and [u] are investigated in this work. Six cases are considered, consisting of realistic, elliptical, and circular cross-sections interpolated through a bent or straight midline. For frequencies below 4–5 kHz, the influence of bending and cross-sectional shape has been found weak, while above these values simplified bent vocal tracts with realistic cross-sections are necessary to correctly emulate higher-order mode propagation. To perform this study, the finite element method (FEM) has been used. FEM results have also been compared to a 3D multimodal method and to a classical 1D frequency domain model.

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2016
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-192600 (URN); 10.1121/1.4962488 (DOI); 000386932500032 (); 2-s2.0-84988353352 (Scopus ID)
Projects
EUNISON
Note

QC 20161010

Available from: 2016-09-15. Created: 2016-09-15. Last updated: 2018-11-16. Bibliographically approved
3. A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences
2017 (English) In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, Vol. 2017, p. 3477-3481. Conference paper, Published paper (Refereed)
Abstract [en]

Three-dimensional computational acoustic models need very detailed 3D vocal tract geometries to generate high quality sounds. Static geometries can be obtained from Magnetic Resonance Imaging (MRI), but it is not currently possible to capture dynamic MRI-based geometries with sufficient spatial and time resolution. One possible solution consists in interpolating between static geometries, but this is a complex task. We instead propose herein to use a semi-polar grid to extract 2D cross-sections from the static 3D geometries, and then interpolate them to obtain the vocal tract dynamics. Other approaches such as the adaptive grid have also been explored. In this method, cross-sections are defined perpendicular to the vocal tract midline, as typically done in 1D to obtain the vocal tract area functions. However, intersections between adjacent cross-sections may occur during the interpolation process, especially when the vocal tract midline quickly changes its orientation. In contrast, the semi-polar grid prevents these intersections because the plane orientations are fixed over time. Finite element simulations of static vowels are first conducted, showing that 3D acoustic wave propagation is not significantly altered when the semi-polar grid is used instead of the adaptive grid. The vowel-vowel sequence [ɑi] is finally simulated to demonstrate the method.

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2017
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X ; 2017
National Category
Language Technology (Computational Linguistics)
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-212994 (URN); 10.21437/Interspeech.2017-448 (DOI); 2-s2.0-85039147985 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20170828

Available from: 2017-08-25. Created: 2017-08-25. Last updated: 2018-11-16. Bibliographically approved
4. From Tongue Movement Data to Muscle Activation – A Preliminary Study of Artisynth's Inverse Modelling
2014 (English) Conference paper, Published paper (Other academic)
Abstract [en]

Finding the muscle activations during speech production is an important part of developing a comprehensive biomechanical model of speech production. Although there are some direct ways of measuring muscle activations, such as electromyography, these methods are usually highly invasive and sometimes unreliable. Moreover, they cannot be used for all muscles. In this study we therefore explore an indirect way to estimate tongue muscle activations during speech production by combining Electromagnetic Articulography (EMA) measurements of tongue movements with the inverse modeling in Artisynth. With EMA we measure the time-varying 3D positions of four sensors attached to the tongue surface for a Swedish female subject producing vowel-vowel and vowel-consonant-vowel (VCV) sequences. The measured sensor positions are used as target points for corresponding virtual sensors introduced in the tongue model of Artisynth’s inverse modelling framework, which computes one possible combination of muscle activations that results in the observed sequence of tongue articulations. We present resynthesized tongue movements in the Artisynth model and verify the results by comparing the calculated muscle activations with the literature.

Keywords
speech, tongue, muscle activation, electromagnetic articulography, biomechanics
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-239054 (URN)
Conference
Parametric Modeling of Human Anatomy, PMHA 14, Aug 22-23, 2014, Vancouver, BC, CA
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20181116

Available from: 2018-11-15. Created: 2018-11-15. Last updated: 2018-11-16. Bibliographically approved
5. Using a Biomechanical Model and Articulatory Data for the Numerical Production of Vowels
2016 (English) In: Interspeech 2016, 2016, p. 3569-3573. Conference paper, Published paper (Refereed)
Abstract [en]

We introduce a framework to study speech production using a biomechanical model of the human vocal tract, ArtiSynth. Electromagnetic articulography data was used as input to an inverse tracking simulation that estimates muscle activations to generate 3D jaw and tongue postures corresponding to the target articulator positions. For acoustic simulations, the vocal tract geometry is needed, but since the vocal tract is a cavity rather than a physical object, its geometry does not explicitly exist in a biomechanical model. A fully-automatic method to extract the 3D geometry (surface mesh) of the vocal tract by blending geometries of the relevant articulators has therefore been developed. This automatic extraction procedure is essential, since a method with manual intervention is not feasible for large numbers of simulations or for generation of dynamic sounds, such as diphthongs. We then simulated the vocal tract acoustics by using the Finite Element Method (FEM). This requires a high quality vocal tract mesh without irregular geometry or self-intersections. We demonstrate that the framework is applicable to acoustic FEM simulations of a wide range of vocal tract deformations. In particular we present results for cardinal vowel production, with muscle activations, vocal tract geometry, and acoustic simulations.

Keywords
speech production, biomechanical articulatory model, vocal tract geometry, vocal tract acoustics, Finite Element Method
National Category
Computer Sciences; Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-192602 (URN); 10.21437/Interspeech.2016-1500 (DOI); 000409394402095 (); 2-s2.0-84994364959 (Scopus ID)
Conference
Interspeech, 8-12 Sep 2016, San Francisco
Projects
EUNISON
Note

QC 20160920

Available from: 2016-09-15. Created: 2016-09-15. Last updated: 2018-11-16. Bibliographically approved
6. Synthesis of vowels and vowel-vowel utterances using a 3D biomechanical-acoustic model
2018 (English) In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290. Article in journal (Refereed) Submitted
Abstract [en]

A link is established between a 3D biomechanical and an acoustic model, allowing for the numerical synthesis of vowel sounds by contraction of the relevant muscles. That is, the contraction of muscles in the biomechanical model displaces and deforms the articulators, which in turn deform the vocal tract shape. The mixed wave equation for the acoustic pressure and particle velocity is formulated in an arbitrary Lagrangian-Eulerian framework to account for moving boundaries. The equations are solved numerically using the finite element method. Since the activation of muscles is not fully known for a given vowel sound, an inverse method is employed to calculate a plausible activation pattern. For vowel-vowel utterances, two different approaches are utilized: linear interpolation in either muscle activation or geometrical space. Although the former is the natural choice for biomechanical modeling, the latter is used to investigate the contribution of biomechanical modeling on speech acoustics. Six vowels [ɑ, ə, ɛ, e, i, ɯ] and three vowel-vowel utterances [ɑi, ɑɯ, ɯi] are synthesized using the 3D model. Results, including articulation, formants, and spectrogram of vowel-vowel sounds, are in agreement with previous studies. Comparing the spectrogram of interpolation in muscle and geometrical space reveals differences in all frequencies, with the most extended difference in the second formant transition.

National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239056 (URN)
Projects
EUNISON
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20181116

Available from: 2018-11-15. Created: 2018-11-15. Last updated: 2018-11-16. Bibliographically approved
7. Reconstruction of vocal tract geometries from biomechanical simulations
2018 (English) In: International Journal for Numerical Methods in Biomedical Engineering, ISSN 2040-7939, E-ISSN 2040-7947. Article in journal (Refereed) Published
Abstract [en]

Medical imaging techniques are usually utilized to acquire the vocal tract geometry in 3D, which may then be used, e.g., for acoustic/fluid simulation. As an alternative, such a geometry may also be acquired from a biomechanical simulation, which allows the anatomy and/or articulation to be altered to study a variety of configurations. In a biomechanical model, each physical structure is described by its geometry and its properties (such as mass, stiffness, and muscles). In such a model, the vocal tract itself does not have an explicit representation, since it is a cavity rather than a physical structure. Instead, its geometry is defined implicitly by all the structures surrounding the cavity, and such an implicit representation may not be suitable for visualization or for acoustic/fluid simulation. In this work, we propose a method to reconstruct the vocal tract geometry at each time step during the biomechanical simulation. The complexity of the problem, which arises from model alignment artifacts, is addressed by the proposed method. In addition to the main cavity, other small cavities, including the piriform fossa, the sublingual cavity, and the interdental space, can be reconstructed. These cavities may appear or disappear depending on the position of the larynx, the mandible, and the tongue. To illustrate our method, various static and temporal geometries of the vocal tract are reconstructed and visualized. As a proof of concept, the reconstructed geometries of three cardinal vowels are further used in an acoustic simulation, and the corresponding transfer functions are derived.

Place, publisher, year, edition, pages
John Wiley & Sons, 2018
National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239055 (URN); 10.1002/cnm.3159 (DOI)
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20181116

Available from: 2018-11-15. Created: 2018-11-15. Last updated: 2018-11-16. Bibliographically approved

Open Access in DiVA

fulltext (8969 kB), 55 downloads
File information
File name: FULLTEXT01.pdf
File size: 8969 kB
Checksum SHA-512:
9c6226ddc4bd350b02cdb767140b9449dc8fc1e3a2783c8e8a83f826b40872ef98aa7e53db4c1c7089ec366d4c5f6975aaeabcd6d6d096a28d0ee01b07d815a3
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Dabbaghchian, Saeed
