Publications (10 of 16)
Dabbaghchian, S., Arnela, M., Engwall, O. & Guasch, O. (2021). Simulation of vowel-vowel utterances using a 3D biomechanical-acoustic model. International Journal for Numerical Methods in Biomedical Engineering, 37(1), Article ID e3407.
Simulation of vowel-vowel utterances using a 3D biomechanical-acoustic model
2021 (English) In: International Journal for Numerical Methods in Biomedical Engineering, ISSN 2040-7939, E-ISSN 2040-7947, Vol. 37, no 1, article id e3407. Article in journal (Refereed). Published
Abstract [en]

A link is established between biomechanical and acoustic 3D models for the numerical simulation of vowel-vowel utterances. The former rely on the activation and contraction of relevant muscles for voice production, which displace and distort the speech organs. However, biomechanical models do not provide a closed computational domain of the 3D vocal tract airway in which to simulate sound wave propagation. An algorithm is thus proposed to extract the vocal tract boundary from the surrounding anatomical structures at each time step of the transition between vowels. The resulting 3D geometries are fed into a 3D finite element acoustic model that solves the mixed wave equation for the acoustic pressure and particle velocity. An arbitrary Lagrangian-Eulerian framework is considered to account for the evolving vocal tract. Examples include six static vowels and three dynamic vowel-vowel utterances. Plausible muscle activation patterns are first determined for the static vowel sounds following an inverse method. Dynamic utterances are then generated by linearly interpolating the muscle activations of the static vowels. The results exhibit nonlinear trajectories of the vocal tract geometry, similar to those observed in electromagnetic midsagittal articulography. Clear differences appear when the generated sound is compared with that obtained from direct linear interpolation of the vocal tract geometry, that is, interpolation between the starting and ending vocal tract geometries of an utterance without resorting to any biomechanical model.
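To make the interpolation step concrete, below is a minimal Python sketch of linear interpolation between two muscle activation vectors. The muscle set, the activation values and the function name are illustrative assumptions, not values or code from the paper.

import numpy as np

# Hypothetical activation vectors (one value per muscle, normalized to [0, 1])
# determined by the inverse method for the start and end vowels of an utterance.
activation_start = np.array([0.30, 0.05, 0.60, 0.10])  # e.g. vowel [ɑ]
activation_end   = np.array([0.05, 0.70, 0.10, 0.45])  # e.g. vowel [i]

def interpolate_activations(a_start, a_end, n_steps):
    """Linearly interpolate muscle activations over n_steps time steps."""
    w = np.linspace(0.0, 1.0, n_steps)[:, None]      # interpolation weights
    return (1.0 - w) * a_start + w * a_end           # shape (n_steps, n_muscles)

# One activation vector per time step; each vector drives the biomechanical
# model, which in turn yields the vocal tract geometry for the acoustic solver.
trajectory = interpolate_activations(activation_start, activation_end, n_steps=50)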

Place, publisher, year, edition, pages
Wiley-Blackwell, 2021
Keywords
Vowel-vowel utterances, biomechanical model, acoustic model, voice production, Finite Element Method
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-282655 (URN), 10.1002/cnm.3407 (DOI), 000585908700001 (), 33070445 (PubMedID), 2-s2.0-85094678269 (Scopus ID)
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20210302

Available from: 2020-09-30. Created: 2020-09-30. Last updated: 2022-06-25. Bibliographically approved.
Arnela, M., Dabbaghchian, S., Guasch, O. & Engwall, O. (2019). MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs. IEEE Transactions on Audio, Speech, and Language Processing, 27(12), 2173-2182
MRI-based vocal tract representations for the three-dimensional finite element synthesis of diphthongs
2019 (English) In: IEEE Transactions on Audio, Speech, and Language Processing, ISSN 1558-7916, E-ISSN 1558-7924, Vol. 27, no 12, p. 2173-2182. Article in journal (Refereed). Published
Abstract [en]

The synthesis of diphthongs in three dimensions (3D) involves the simulation of acoustic waves propagating through a complex 3D vocal tract geometry that deforms over time. Accurate 3D vocal tract geometries can be extracted from Magnetic Resonance Imaging (MRI), but due to long acquisition times, only static sounds can currently be studied with an adequate spatial resolution. In this work, 3D dynamic vocal tract representations are built to generate diphthongs, based on a set of cross-sections extracted from MRI-based vocal tract geometries of static vowel sounds. A diphthong can then be easily generated by interpolating the location, orientation and shape of these cross-sections, thus avoiding the interpolation of full 3D geometries. Two options are explored to extract the cross-sections. The first is based on an adaptive grid (AG), which extracts the cross-sections perpendicular to the vocal tract midline, whereas the second resorts to a semi-polar grid (SPG) strategy, which fixes the cross-section orientations. The finite element method (FEM) has been used to solve the mixed wave equation and synthesize the diphthongs [ɑi] and [ɑu] in the dynamic 3D vocal tracts. The outputs from a 1D acoustic model based on the Transfer Matrix Method have also been included for comparison. The results show that the SPG and AG provide very close solutions in 3D, whereas significant differences are observed when using them in 1D. The SPG dynamic vocal tract representation is recommended for 3D simulations because it helps to prevent the collision of adjacent cross-sections.
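As an illustration of the cross-section interpolation described above, the following sketch blends two sets of cross-sections. The data layout (a centre point and an in-plane contour per grid plane, with orientations held fixed as in the SPG strategy) is an assumption made for the example, not the paper's actual data structure.

import numpy as np

def blend_cross_sections(cs_start, cs_end, alpha):
    """Blend two vocal tract representations with weight alpha in [0, 1].

    cs_start, cs_end: lists with one dict per grid plane, each holding a
    'center' (3,) numpy array and a 'contour' (N, 2) numpy array of in-plane
    points. Plane orientations are assumed fixed over time (SPG strategy), so
    only centres and contours are interpolated, not full 3D geometries.
    """
    blended = []
    for a, b in zip(cs_start, cs_end):
        blended.append({
            "center":  (1.0 - alpha) * a["center"]  + alpha * b["center"],
            "contour": (1.0 - alpha) * a["contour"] + alpha * b["contour"],
        })
    return blended

Sweeping alpha from 0 to 1 over the duration of the diphthong, and lofting each blended set of cross-sections into a tube mesh, gives a moving geometry of the kind used by the acoustic solver.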

Place, publisher, year, edition, pages
IEEE Press, 2019
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-259580 (URN), 10.1109/TASLP.2019.2942439 (DOI), 000492183000001 (), 2-s2.0-85073632242 (Scopus ID)
Projects
EUNISON
Note

QC 20211129

Available from: 2019-09-18. Created: 2019-09-18. Last updated: 2022-06-26. Bibliographically approved.
Dabbaghchian, S. (2018). Computational Modeling of the Vocal Tract: Applications to Speech Production. (Doctoral dissertation). KTH Royal Institute of Technology
Computational Modeling of the Vocal Tract: Applications to Speech Production
2018 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Human speech production is a complex process, involving neuromuscular control signals, the effects of the articulators' biomechanical properties, and acoustic wave propagation in a vocal tract tube of intricate shape. Modeling these phenomena may play an important role in advancing our understanding of the involved mechanisms, and may also have future medical applications, e.g., guiding doctors in diagnosis, treatment planning, and surgery prediction for related disorders such as oral cancer, cleft palate, obstructive sleep apnea, and dysphagia.

A more complete understanding requires models that are as truthful representations as possible of the phenomena. Due to the complexity of such modeling, simplifications have nevertheless been used extensively in speech production research: phonetic descriptors (such as the position and degree of the most constricted part of the vocal tract) are used as control signals, the articulators are represented as two-dimensional geometrical models, the vocal tract is considered as a smooth tube and plane wave propagation is assumed, etc.

This thesis aims at firstly investigating the consequences of such simplifications, and secondly at contributing to establishing unified modeling of the speech production process, by connecting three-dimensional biomechanical modeling of the upper airway with three-dimensional acoustic simulations. The investigation on simplifying assumptions demonstrated the influence of vocal tract geometry features — such as shape representation, bending and lip shape — on its acoustic characteristics, and that the type of modeling — geometrical or biomechanical — affects the spatial trajectories of the articulators, as well as the transition of formant frequencies in the spectrogram.

The unification of biomechanical and acoustic modeling in three dimensions makes it possible to realistically control the acoustic output of dynamic sounds, such as vowel-vowel utterances, through the contraction of relevant muscles. This moves and shapes the speech articulators, which in turn define the vocal tract tube in which the wave propagation occurs. The main contribution of the thesis in this line of work is a novel and complex method that automatically reconstructs the shape of the vocal tract from the biomechanical model. This step is essential to link biomechanical and acoustic simulations, since the vocal tract, which anatomically is a cavity enclosed by different structures, is only implicitly defined in a biomechanical model constituted of several distinct articulators.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2018. p. 105
Series
TRITA-EECS-AVL ; 2018:90
Keywords
vocal tract, upper airway, speech production, biomechanical model, acoustic model, vocal tract reconstruction
National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239071 (URN), 978-91-7873-021-6 (ISBN)
Public defence
2018-12-07, D2, Lindstedtsvägen 5, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20181116

Available from: 2018-11-16. Created: 2018-11-16. Last updated: 2022-06-26. Bibliographically approved.
Dabbaghchian, S. (2018). Growing circles: A region growing algorithm for unstructured grids and non-aligned boundaries. In: European Association for Computer Graphics - 39th Annual Conference, EUROGRAPHICS 2018. Paper presented at 39th Annual Conference on European Association for Computer Graphics, EUROGRAPHICS 2018; Delft; Netherlands; 16 April 2018 through 20 April 2018 (pp. 21-22). The Eurographics Association
Growing circles: A region growing algorithm for unstructured grids and non-aligned boundaries
2018 (English) In: European Association for Computer Graphics - 39th Annual Conference, EUROGRAPHICS 2018, The Eurographics Association, 2018, p. 21-22. Conference paper, Published paper (Refereed)
Abstract [en]

Detecting the boundary of an enclosed region is a problem that arises in applications such as human upper airway modeling. Standard algorithms fail because of inevitable errors, i.e., gaps and overlaps between the surrounding boundaries. Growing circles is an automatic approach to address this problem. A circle is centered inside the region and starts to grow by increasing its radius. Its growth is limited either by the surrounding boundaries or by reaching its maximum radius. To deal with complex shapes, many circles are used, each of which partially reconstructs the region; the whole region is then determined by the union of these partial regions. The centers of the circles and their maximum radii are calculated adaptively. The method is similar to the region growing algorithm widely used in image processing applications, but it works for unstructured grids as well as Cartesian ones. As an application, the method is used to detect the boundaries of upper airway cross-sections.
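A minimal sketch of the growing-circles idea follows, assuming the surrounding boundary is available as a 2D point cloud and that circle seeds are supplied by the caller; the adaptive placement of centres and maximum radii described in the paper is not reproduced here, and the function names are illustrative.

import numpy as np

def grown_radius(center, boundary_pts, r_max):
    """Grow a circle from `center` until it reaches the nearest boundary
    point or the maximum allowed radius, whichever comes first."""
    dists = np.linalg.norm(boundary_pts - center, axis=1)
    return min(dists.min(), r_max)

def growing_circles(seeds, boundary_pts, r_max):
    """Approximate an enclosed region as a union of discs, one per seed.

    The boundary may contain small gaps and overlaps; each disc only covers
    part of the region, and the union of all discs reconstructs it."""
    boundary_pts = np.asarray(boundary_pts, dtype=float)
    return [(s, grown_radius(s, boundary_pts, r_max))
            for s in np.asarray(seeds, dtype=float)]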

Place, publisher, year, edition, pages
The Eurographics Association, 2018
Keywords
Automatic approaches, Complex shapes, Computer graphics, Human upper airway model, Image processing, Image processing applications, Region growing algorithm, Respiratory mechanics, Standard algorithms, Unstructured grid, Upper airway
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-285009 (URN), 10.2312/egp.20181018 (DOI), 2-s2.0-85092206646 (Scopus ID)
Conference
39th Annual Conference on European Association for Computer Graphics, EUROGRAPHICS 2018; Delft; Netherlands; 16 April 2018 through 20 April 2018
Note

QC 20201229

Available from: 2020-12-29. Created: 2020-12-29. Last updated: 2025-02-07. Bibliographically approved.
Dabbaghchian, S., Arnela, M., Engwall, O. & Guasch, O. (2018). Reconstruction of vocal tract geometries from biomechanical simulations. International Journal for Numerical Methods in Biomedical Engineering
Reconstruction of vocal tract geometries from biomechanical simulations
2018 (English) In: International Journal for Numerical Methods in Biomedical Engineering, ISSN 2040-7939, E-ISSN 2040-7947. Article in journal (Refereed). Published
Abstract [en]

Medical imaging techniques are usually utilized to acquire the vocal tract geometry in 3D, which may then be used, e.g., for acoustic/fluid simulation. As an alternative, such a geometry may also be acquired from a biomechanical simulation, which makes it possible to alter the anatomy and/or articulation to study a variety of configurations. In a biomechanical model, each physical structure is described by its geometry and its properties (such as mass, stiffness, and muscles). In such a model, the vocal tract itself does not have an explicit representation, since it is a cavity rather than a physical structure. Instead, its geometry is defined implicitly by all the structures surrounding the cavity, and such an implicit representation may not be suitable for visualization or for acoustic/fluid simulation. In this work, we propose a method to reconstruct the vocal tract geometry at each time step during the biomechanical simulation. The complexity of the problem, which arises from model alignment artifacts, is addressed by the proposed method. In addition to the main cavity, other small cavities, including the piriform fossa, the sublingual cavity, and the interdental space, can be reconstructed. These cavities may appear or disappear depending on the position of the larynx, the mandible, and the tongue. To illustrate our method, various static and temporal geometries of the vocal tract are reconstructed and visualized. As a proof of concept, the reconstructed geometries of three cardinal vowels are further used in an acoustic simulation, and the corresponding transfer functions are derived.

Place, publisher, year, edition, pages
John Wiley & Sons, 2018
National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239055 (URN), 10.1002/cnm.3159 (DOI), 000458548700001 (), 30242981 (PubMedID), 2-s2.0-85056479850 (Scopus ID)
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20181116

Available from: 2018-11-15. Created: 2018-11-15. Last updated: 2022-06-26. Bibliographically approved.
Dabbaghchian, S., Arnela, M., Engwall, O. & Guasch, O. (2018). Synthesis of vowels and vowel-vowel utterances using a 3D biomechanical-acoustic model. IEEE/ACM Transactions on Audio, Speech, and Language Processing
Synthesis of vowels and vowel-vowel utterances using a 3D biomechanical-acoustic model
2018 (English) In: IEEE/ACM Transactions on Audio, Speech, and Language Processing, ISSN 2329-9290, E-ISSN 2329-9304. Article in journal (Refereed). Submitted
Abstract [en]

A link is established between a 3D biomechanical and acoustic model, allowing for the numerical synthesis of vowel sounds by contraction of the relevant muscles. That is, the contraction of muscles in the biomechanical model displaces and deforms the articulators, which in turn deform the vocal tract shape. The mixed wave equation for the acoustic pressure and particle velocity is formulated in an arbitrary Lagrangian-Eulerian framework to account for moving boundaries. The equations are solved numerically using the finite element method. Since the activation of the muscles is not fully known for a given vowel sound, an inverse method is employed to calculate a plausible activation pattern. For vowel-vowel utterances, two different approaches are utilized: linear interpolation in either muscle activation or geometrical space. Although the former is the natural choice for biomechanical modeling, the latter is used to investigate the contribution of biomechanical modeling to speech acoustics. Six vowels [ɑ, ə, ɛ, e, i, ɯ] and three vowel-vowel utterances [ɑi, ɑɯ, ɯi] are synthesized using the 3D model. Results, including articulation, formants, and spectrograms of vowel-vowel sounds, are in agreement with previous studies. Comparing the spectrograms of interpolation in muscle and geometrical space reveals differences at all frequencies, with the most extended difference in the second formant transition.
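For reference, the mixed (first-order) form of the acoustic wave equation in the pressure p and particle velocity u reads, in its standard fixed-domain version (the additional ALE terms for the moving vocal tract used in the paper are not reproduced here):

$$\frac{\partial p}{\partial t} + \rho c^{2}\,\nabla\cdot\mathbf{u} = 0, \qquad \rho\,\frac{\partial \mathbf{u}}{\partial t} + \nabla p = 0,$$

with ρ the air density and c the speed of sound; eliminating u recovers the usual second-order wave equation for p.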

National Category
Computer Sciences
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-239056 (URN)
Projects
EUNISON
Funder
EU, FP7, Seventh Framework Programme, 308874
Note

QC 20181116

Available from: 2018-11-15. Created: 2018-11-15. Last updated: 2025-08-28. Bibliographically approved.
Arnela, M., Dabbaghchian, S., Guasch, O. & Engwall, O. (2017). A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017 (pp. 3477-3481). The International Speech Communication Association (ISCA)
A semi-polar grid strategy for the three-dimensional finite element simulation of vowel-vowel sequences
2017 (English) In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, Vol. 2017, p. 3477-3481. Conference paper, Published paper (Refereed)
Abstract [en]

Three-dimensional computational acoustic models need very detailed 3D vocal tract geometries to generate high quality sounds. Static geometries can be obtained from Magnetic Resonance Imaging (MRI), but it is not currently possible to capture dynamic MRI-based geometries with sufficient spatial and time resolution. One possible solution consists in interpolating between static geometries, but this is a complex task. We instead propose herein to use a semi-polar grid to extract 2D cross-sections from the static 3D geometries, and then interpolate them to obtain the vocal tract dynamics. Other approaches such as the adaptive grid have also been explored; in that method, cross-sections are defined perpendicular to the vocal tract midline, as typically done in 1D to obtain vocal tract area functions. However, intersections between adjacent cross-sections may occur during the interpolation process, especially when the vocal tract midline quickly changes its orientation. In contrast, the semi-polar grid prevents these intersections because the plane orientations are fixed over time. Finite element simulations of static vowels are first conducted, showing that 3D acoustic wave propagation is not significantly altered when the semi-polar grid is used instead of the adaptive grid. The vowel-vowel sequence [ɑi] is finally simulated to demonstrate the method.
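To illustrate the difference between the two grids, the sketch below computes cutting-plane normals either from the midline tangents (adaptive grid) or from a fixed set of orientations (semi-polar grid). The linear sweep of angles is a deliberate simplification of an actual semi-polar grid, which combines straight and polar sections, and the function names are illustrative assumptions.

import numpy as np

def adaptive_grid_normals(midline):
    """Adaptive grid (AG): each cutting plane is perpendicular to the local
    midline direction, so the normals follow the (time-varying) midline.

    midline: (N, 3) array of points along the vocal tract midline."""
    tangents = np.gradient(midline, axis=0)
    return tangents / np.linalg.norm(tangents, axis=1, keepdims=True)

def semi_polar_grid_normals(n_planes):
    """Semi-polar grid (SPG), simplified: plane orientations sweep from
    roughly horizontal in the pharynx to vertical at the lips and do not
    change over time, which prevents adjacent cross-sections from
    intersecting during interpolation."""
    angles = np.linspace(0.0, np.pi / 2.0, n_planes)
    return np.stack([np.sin(angles),
                     np.zeros(n_planes),
                     np.cos(angles)], axis=1)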

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2017
Series
Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH, ISSN 2308-457X ; 2017
National Category
Natural Language Processing
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-212994 (URN), 10.21437/Interspeech.2017-448 (DOI), 000457505000724 (), 2-s2.0-85039147985 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20170828

Available from: 2017-08-25. Created: 2017-08-25. Last updated: 2025-02-07. Bibliographically approved.
Dabbaghchian, S., Arnela, M., Engwall, O. & Guasch, O. (2017). Synthesis of VV utterances from muscle activation to sound with a 3D model. In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017. Paper presented at 18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017 (pp. 3497-3501). The International Speech Communication Association (ISCA)
Synthesis of VV utterances from muscle activation to sound with a 3D model
2017 (English) In: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, The International Speech Communication Association (ISCA), 2017, p. 3497-3501. Conference paper, Published paper (Refereed)
Abstract [en]

We propose a method to automatically generate deformable 3D vocal tract geometries from the surrounding structures in a biomechanical model. This allows us to couple 3D biomechanics and acoustics simulations. The basis of the simulations is muscle activation trajectories in the biomechanical model, which move the articulators to the desired articulatory positions. The muscle activation trajectories for a vowel-vowel utterance are here defined through interpolation between the determined activations of the start and end vowel. The resulting articulatory trajectories of flesh points on the tongue surface and jaw are similar to corresponding trajectories measured using Electromagnetic Articulography, hence corroborating the validity of interpolating muscle activation. At each time step in the articulatory transition, a 3D vocal tract tube is created through a cavity extraction method based on first slicing the geometry of the articulators with a semi-polar grid to extract the vocal tract contour in each plane and then reconstructing the vocal tract through a smoothed 3D mesh-generation using the extracted contours. A finite element method applied to these changing 3D geometries simulates the acoustic wave propagation. We present the resulting acoustic pressure changes on the vocal tract boundary and the formant transitions for the utterance [ɑi].

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2017
National Category
Natural Language Processing
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-212993 (URN), 10.21437/Interspeech.2017-1614 (DOI), 000457505000728 (), 2-s2.0-85039149051 (Scopus ID)
Conference
18th Annual Conference of the International Speech Communication Association, INTERSPEECH 2017, Stockholm, Sweden, 20 August 2017 through 24 August 2017
Note

QC 20170828

Available from: 2017-08-25. Created: 2017-08-25. Last updated: 2025-02-07. Bibliographically approved.
Arnela, M., Guasch, O., Dabbaghchian, S. & Engwall, O. (2016). Finite element generation of vowel sounds using dynamic complex three-dimensional vocal tracts. In: Proceedings of the 23rd international congress on sound and vibration: From ancient to modern acoustics. Paper presented at 23rd International Congress on Sound and Vibration (ICSV), JUL 10-14, 2016, Athens, GREECE. INT INST ACOUSTICS & VIBRATION
Finite element generation of vowel sounds using dynamic complex three-dimensional vocal tracts
2016 (English) In: Proceedings of the 23rd international congress on sound and vibration: From ancient to modern acoustics, INT INST ACOUSTICS & VIBRATION, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

Three-dimensional (3D) numerical simulations of the vocal tract acoustics require very detailed vocal tract geometries in order to generate good quality vowel sounds. These geometries are typically obtained from Magnetic Resonance Imaging (MRI), from which a volumetric representation of the complex vocal tract shape is obtained. Static vowel sounds can then be generated using a finite element code, which simulates the propagation of acoustic waves through the vocal tract when a given train of glottal pulses is introduced at the glottal cross-section. A more challenging problem to solve is that of generating dynamic vowel sounds. On the one hand, the acoustic wave equation has to be solved in a computational domain with moving boundaries, which entails some numerical difficulties. On the other hand, the finite element meshes where acoustic wave propagation is computed have to move according to the dynamics of these very complex vocal tract shapes. In this work this problem is addressed. First, the acoustic wave equation in mixed form is expressed in an Arbitrary Lagrangian-Eulerian (ALE) framework to account for the vocal tract wall motion. This equation is numerically solved using a stabilized finite element approach. Second, the dynamic 3D vocal tract geometry is approximated by a finite set of cross-sections with complex shape. The time-evolution of these cross-sections is used to move the boundary nodes of the finite element meshes, while inner nodes are computed through diffusion. Some dynamic vowel sounds are presented as numerical examples.
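The abstract notes that boundary nodes follow the prescribed cross-section motion while inner nodes are "computed through diffusion". Below is a minimal sketch of one way such a diffusion step can be realised, via Jacobi-style Laplacian smoothing on a given mesh connectivity; the function and its arguments are illustrative assumptions, not the paper's implementation.

import numpy as np

def diffuse_inner_nodes(nodes, neighbors, is_boundary, n_iter=50):
    """Relax interior node positions by repeatedly replacing each interior
    node with the mean of its mesh neighbours (a discrete diffusion /
    Laplace solve), while boundary nodes stay at their prescribed positions.

    nodes:       (N, 3) node coordinates, with boundary nodes already moved
    neighbors:   list of N index lists giving the mesh connectivity
    is_boundary: (N,) boolean mask marking boundary nodes
    """
    x = np.asarray(nodes, dtype=float).copy()
    inner = np.where(~np.asarray(is_boundary))[0]
    for _ in range(n_iter):
        x_new = x.copy()
        for i in inner:
            x_new[i] = x[neighbors[i]].mean(axis=0)
        x = x_new
    return x

This approximates solving a Laplace (diffusion) problem for the interior node positions with the boundary motion acting as Dirichlet data, which keeps the acoustic mesh valid as the vocal tract walls move.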

Place, publisher, year, edition, pages
INT INST ACOUSTICS & VIBRATION, 2016
Series
Proceedings of the International Congress on Sound and Vibration, ISSN 2329-3675
National Category
Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-203199 (URN), 000388480401085 (), 2-s2.0-84987899372 (Scopus ID), 978-960-99226-2-3 (ISBN)
Conference
23rd International Congress on Sound and Vibration (ICSV), JUL 10-14, 2016, Athens, GREECE
Note

QC 20170314

Available from: 2017-03-14. Created: 2017-03-14. Last updated: 2025-02-09. Bibliographically approved.
Arnela, M., Blandin, R., Dabbaghchian, S., Guasch, O., Alías, F., Pelorson, X., . . . Engwall, O. (2016). Influence of lips on the production of vowels based on finite element simulations and experiments. Journal of the Acoustical Society of America, 139(5), 2852-2859
Influence of lips on the production of vowels based on finite element simulations and experiments
2016 (English) In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 139, no 5, p. 2852-2859. Article in journal (Refereed). Published
Abstract [en]

Three-dimensional (3-D) numerical approaches for voice production are currently being investigated and developed. Radiation losses produced when sound waves emanate from the mouth aperture are one of the key aspects to be modeled. When doing so, the lips are usually removed from the vocal tract geometry in order to impose a radiation impedance on a closed cross-section, which speeds up the numerical simulations compared to free-field radiation solutions. However, lips may play a significant role. In this work, the lips' effects on vowel sounds are investigated by using 3-D vocal tract geometries generated from magnetic resonance imaging. To this aim, two configurations for the vocal tract exit are considered: with lips and without lips. The acoustic behavior of each is analyzed and compared by means of time-domain finite element simulations that allow free-field wave propagation and experiments performed using 3-D-printed mechanical replicas. The results show that the lips should be included in order to correctly model vocal tract acoustics not only at high frequencies, as commonly accepted, but also in the low frequency range below 4 kHz, where plane wave propagation occurs.

Place, publisher, year, edition, pages
Acoustical Society of America (ASA), 2016
National Category
Natural Language Processing
Research subject
Speech and Music Communication
Identifiers
urn:nbn:se:kth:diva-189323 (URN), 10.1121/1.4950698 (DOI), 000377715100066 (), 27250177 (PubMedID), 2-s2.0-84971216381 (Scopus ID)
Projects
EUNISON
Funder
EU, FP7, Seventh Framework Programme, 6877
Note

QC 20160704

Available from: 2016-07-02. Created: 2016-07-02. Last updated: 2025-02-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-8991-1016
