Introducing visual cues in acoustic-to-articulatory inversion
2005 (English). In: Interspeech 2005: 9th European Conference on Speech Communication and Technology, 2005, pp. 3205-3208. Conference paper (Refereed)
The contribution of facial measures to statistical acoustic-to-articulatory inversion has been investigated. The tongue contour was estimated by linear estimation from either acoustics alone or acoustics combined with facial measures. Measures of the lateral movement of the lip corners and the vertical movement of the upper lip, the lower lip and the jaw gave a substantial improvement over the audio-only case. It was further found that adding the corresponding articulatory measures that could be extracted from a profile view of the face, i.e. the protrusion of the lips, lip corners and jaw, did not give any additional improvement of the inversion result. The present study hence suggests that audiovisual-to-articulatory inversion can just as well be performed using front-view monovision of the face, rather than stereovision of both the front and profile views.
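The comparison described in the abstract (a linear estimate of the tongue contour from acoustics alone versus acoustics plus facial measures) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration only: the data are synthetic, the feature dimensions, function names and noise level are hypothetical, and ordinary least squares stands in for whatever linear estimator the paper actually used. Because the synthetic tongue targets are constructed to depend on the facial features, the lower audiovisual error is built in and does not reproduce the paper's results.

```python
import numpy as np


def fit_linear_estimator(features, targets):
    """Least-squares fit of targets ~ [features, 1] @ weights (bias included)."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def predict(weights, features):
    """Apply a fitted linear estimator to new feature frames."""
    design = np.hstack([features, np.ones((features.shape[0], 1))])
    return design @ weights


def rmse(true_values, estimates):
    """Root-mean-square error over all frames and target dimensions."""
    return float(np.sqrt(np.mean((true_values - estimates) ** 2)))


# Synthetic stand-in data (hypothetical dimensions, not the paper's corpus).
rng = np.random.default_rng(0)
n_frames = 400
acoustic = rng.normal(size=(n_frames, 6))  # e.g. spectral parameters per frame
facial = rng.normal(size=(n_frames, 4))    # e.g. lip-corner, lip and jaw measures

# By construction, the tongue-contour targets depend on both feature sets.
acoustic_map = rng.normal(size=(6, 5))
facial_map = rng.normal(size=(4, 5))
tongue = (acoustic @ acoustic_map + facial @ facial_map
          + 0.1 * rng.normal(size=(n_frames, 5)))

audiovisual = np.hstack([acoustic, facial])
train, test = slice(0, 300), slice(300, None)

# Fit one estimator per input condition and score both on held-out frames.
w_audio = fit_linear_estimator(acoustic[train], tongue[train])
w_av = fit_linear_estimator(audiovisual[train], tongue[train])

rmse_audio = rmse(tongue[test], predict(w_audio, acoustic[test]))
rmse_av = rmse(tongue[test], predict(w_av, audiovisual[test]))

print(f"audio-only RMSE:  {rmse_audio:.3f}")
print(f"audiovisual RMSE: {rmse_av:.3f}")
```

Testing whether profile-view measures add anything would, under the same assumptions, amount to appending further columns to the audiovisual feature matrix and checking whether the held-out error changes.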
Computer Science; Language Technology (Computational Linguistics)
Identifiers: URN: urn:nbn:se:kth:diva-51880; ScopusID: 2-s2.0-33745183111; OAI: oai:DiVA.org:kth-51880; DiVA: diva2:465174
Interspeech 2005: 9th European Conference on Speech Communication and Technology. Lisbon, Portugal. 4 September 2005 - 8 September 2005