Introducing visual cues in acoustic-to-articulatory inversion
2005 (English). In: Interspeech 2005: 9th European Conference on Speech Communication and Technology, 2005, p. 3205-3208. Conference paper, Published paper (Refereed)
Abstract [en]
The contribution of facial measures to statistical acoustic-to-articulatory inversion has been investigated. The tongue contour was estimated by linear estimation from either acoustics alone or acoustics combined with facial measures. Measures of the lateral movement of the lip corners and the vertical movement of the upper lip, the lower lip and the jaw gave a substantial improvement over the audio-only case. It was further found that adding the corresponding articulatory measures that could be extracted from a profile view of the face, i.e. the protrusion of the lips, lip corners and jaw, did not give any additional improvement of the inversion result. The present study hence suggests that audiovisual-to-articulatory inversion can be performed just as well using front-view monovision of the face as using stereovision of both the front and profile views.
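As an illustration of the kind of comparison the abstract describes, the following is a minimal sketch of linear estimation from audio-only versus audiovisual features. It is not the authors' implementation: the data are synthetic, the feature dimensions are arbitrary, and all names (`make_synthetic_data`, `fit_linear_estimator`, `compare_inversions`) are hypothetical. The synthetic set-up simply assumes that the facial measures carry some articulatory information that the acoustics lack.

```python
import numpy as np


def make_synthetic_data(n_frames=400, seed=0):
    """Generate toy data (an assumption, not the paper's corpus)."""
    rng = np.random.default_rng(seed)
    # Hidden articulatory state; acoustics see dims 0-3, the face sees dims 3-5.
    latent = rng.normal(size=(n_frames, 6))
    acoustic = latent[:, :4] @ rng.normal(size=(4, 8))
    acoustic += 0.1 * rng.normal(size=acoustic.shape)
    facial = latent[:, 3:] @ rng.normal(size=(3, 5))
    facial += 0.1 * rng.normal(size=facial.shape)
    # Stand-in for tongue contour coordinates, driven by the full state.
    tongue = latent @ rng.normal(size=(6, 10))
    tongue += 0.1 * rng.normal(size=tongue.shape)
    return acoustic, facial, tongue


def add_bias(features):
    """Append a constant column so the estimator has an offset term."""
    return np.hstack([features, np.ones((features.shape[0], 1))])


def fit_linear_estimator(features, targets):
    """Least-squares fit of targets from features plus a bias term."""
    weights, *_ = np.linalg.lstsq(add_bias(features), targets, rcond=None)
    return weights


def rmse(estimate, reference):
    """Root-mean-square error over all frames and coordinates."""
    return float(np.sqrt(np.mean((estimate - reference) ** 2)))


def compare_inversions(acoustic, facial, tongue, n_train=300):
    """Fit on the first n_train frames and report held-out RMSE per input set."""
    inputs = {
        "audio_only": acoustic,
        "audiovisual": np.hstack([acoustic, facial]),
    }
    errors = {}
    for name, features in inputs.items():
        weights = fit_linear_estimator(features[:n_train], tongue[:n_train])
        estimate = add_bias(features[n_train:]) @ weights
        errors[name] = rmse(estimate, tongue[n_train:])
    return errors


if __name__ == "__main__":
    acoustic, facial, tongue = make_synthetic_data()
    print(compare_inversions(acoustic, facial, tongue))
```

On this toy data the audiovisual estimator has a lower held-out error by construction, since the synthetic facial features were given information absent from the acoustics. The sketch says nothing about the size of the improvement reported in the paper.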
Place, publisher, year, edition, pages
2005. p. 3205-3208
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51880; Scopus ID: 2-s2.0-33745183111; OAI: oai:DiVA.org:kth-51880; DiVA, id: diva2:465174
Conference
Interspeech 2005: 9th European Conference on Speech Communication and Technology. Lisbon, Portugal. 4 September 2005 - 8 September 2005
Note
QC 20120111. tmh_import_11_12_14
2011-12-14, 2011-12-14, 2022-06-24. Bibliographically approved