Mapping between acoustic and articulatory gestures
2011 (English)
In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 53, no 4, p. 567-589
Article in journal (Refereed) Published
Abstract [en]
This paper proposes a definition for articulatory as well as acoustic gestures, along with a method to segment the measured articulatory trajectories and acoustic waveforms into gestures. Using a simultaneously recorded acoustic-articulatory database, the gestures are detected by finding critical points in the utterance, in both the acoustic and the articulatory representations. The acoustic gestures are parameterized using 2-D cepstral coefficients. The articulatory trajectories are essentially the horizontal and vertical movements of Electromagnetic Articulography (EMA) coils placed on the tongue, jaw and lips along the midsagittal plane. The articulatory movements are parameterized using a 2-D DCT, the same transformation that is applied to the acoustics. The relationship between the detected acoustic and articulatory gestures is studied in terms of both timing and shape. To study this relationship further, acoustic-to-articulatory inversion is performed using GMM-based regression. The accuracy of predicting the articulatory trajectories from the acoustic waveforms is on par with state-of-the-art frame-based methods with dynamical constraints (with an average error of 1.45-1.55 mm for the two speakers in the database). To evaluate the acoustic-to-articulatory inversion in a more intuitive manner, a method based on the error in the estimated critical points is suggested. Using this method, it was noted that the articulatory trajectories estimated by the acoustic-to-articulatory inversion methods were still not accurate enough to be within the perceptual tolerance of audio-visual asynchrony.
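The 2-D DCT parameterization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the segment size (20 frames by 4 channels) and the number of retained coefficients are assumptions chosen for illustration, since the abstract does not state them. The sketch treats one gesture as a (frames, channels) array, applies an orthonormal DCT-II along both axes, and keeps only the low-order coefficients.

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis as an (n, n) matrix; each row is one basis vector."""
    k = np.arange(n)[:, None]   # frequency index
    t = np.arange(n)[None, :]   # sample index
    m = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * t + 1) * k / (2 * n))
    m[0, :] /= np.sqrt(2.0)     # rescale the DC row so the matrix is orthonormal
    return m

def dct2_parameterize(segment, n_time, n_chan):
    """2-D DCT of a (frames, channels) segment, truncated to low-order coefficients.

    n_time and n_chan are hypothetical truncation sizes, not values from the paper.
    """
    dt = dct_matrix(segment.shape[0])
    dc = dct_matrix(segment.shape[1])
    coeffs = dt @ segment @ dc.T
    return coeffs[:n_time, :n_chan]

def dct2_reconstruct(coeffs, n_frames, n_channels):
    """Invert the transform, treating the discarded coefficients as zero."""
    full = np.zeros((n_frames, n_channels))
    full[:coeffs.shape[0], :coeffs.shape[1]] = coeffs
    return dct_matrix(n_frames).T @ full @ dct_matrix(n_channels)

if __name__ == "__main__":
    # Synthetic smooth "trajectories": 20 frames, 4 channels (illustrative data only).
    frames = np.linspace(0.0, 1.0, 20)[:, None]
    segment = np.sin(np.pi * frames * np.arange(1, 5)[None, :])
    params = dct2_parameterize(segment, n_time=6, n_chan=4)
    approx = dct2_reconstruct(params, 20, 4)
    print("parameter shape:", params.shape)
    print("RMS reconstruction error:", np.sqrt(np.mean((segment - approx) ** 2)))
```

Because the basis is orthonormal, keeping every coefficient reconstructs the segment exactly, while truncation gives a smoothed, fixed-size description of the gesture that could serve as the input or target of a regression model.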
Place, publisher, year, edition, pages
2011. Vol. 53, no 4, p. 567-589
Keywords [en]
Acoustic gestures, Articulatory gestures, Acoustic-to-articulatory inversion, Critical trajectory error
National Category
General Language Studies and Linguistics
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-32595
DOI: 10.1016/j.specom.2011.01.009
ISI: 000288929000008
Scopus ID: 2-s2.0-79952359521
OAI: oai:DiVA.org:kth-32595
DiVA, id: diva2:412084
Funder
Swedish Research Council, 621-2008-4490
Note
QC 20110420
2011-04-20 2011-04-18 2024-03-18 Bibliographically approved