Trainable articulatory control models for visual speech synthesis
2004 (English). In: International Journal of Speech Technology, ISSN 1381-2416, E-ISSN 1572-8110, Vol. 7, no. 4, pp. 335-349. Article in journal (Refereed) Published
This paper deals with the problem of modelling the dynamics of articulation for a parameterised talking head based on phonetic input. Four different models are implemented and trained to reproduce the articulatory patterns of a real speaker, based on a corpus of optical measurements. Two of the models ("Cohen-Massaro" and "Öhman") are based on coarticulation models from speech production theory, and two are based on artificial neural networks, one of which is specially intended for streaming real-time applications. The different models are evaluated through comparison between predicted and measured trajectories, which shows that the Cohen-Massaro model produces the trajectories that best match the measurements. A perceptual intelligibility experiment is also carried out, where the four data-driven models are compared against a rule-based model as well as an audio-alone condition. Results show that all models give significantly increased speech intelligibility over the audio-alone case, with the rule-based model yielding the highest intelligibility score.
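The record itself contains no code. Purely as an illustration of the kind of coarticulation model named in the abstract, the following minimal Python sketch blends per-segment articulatory targets into a smooth trajectory using exponential dominance functions in the spirit of the Cohen-Massaro model; the function names, parameter values, and exact dominance shape are assumptions for illustration, not the authors' trained implementation.

```python
import numpy as np

def dominance(t, center, alpha=1.0, theta=20.0, c=1.0):
    """Exponential dominance of a segment centred at `center` (seconds).
    alpha (peak), theta (decay rate per second) and c (shape exponent)
    are illustrative values, not fitted parameters from the paper."""
    return alpha * np.exp(-theta * np.abs(t - center) ** c)

def blend_trajectory(times, segments):
    """Dominance-weighted average of segment targets at each time step.
    `segments` is a list of dicts with keys 'target' and 'center'
    (optionally 'alpha' and 'theta')."""
    num = np.zeros_like(times, dtype=float)
    den = np.zeros_like(times, dtype=float)
    for seg in segments:
        d = dominance(times, seg["center"],
                      seg.get("alpha", 1.0), seg.get("theta", 20.0))
        num += d * seg["target"]
        den += d
    return num / np.maximum(den, 1e-9)

# Example: three phonetic segments driving one articulatory parameter
times = np.linspace(0.0, 0.6, 200)
segments = [
    {"target": 0.2, "center": 0.10},
    {"target": 0.8, "center": 0.30},
    {"target": 0.4, "center": 0.50},
]
trajectory = blend_trajectory(times, segments)
```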
Place, publisher, year, edition, pages
Boston: Kluwer Academic Publishers, 2004. Vol. 7, no. 4, pp. 335-349.
Keywords
speech synthesis, facial animation, coarticulation, artificial neural networks, perceptual evaluation
Identifiers
URN: urn:nbn:se:kth:diva-12803
Scopus ID: 2-s2.0-4143072802
OAI: oai:DiVA.org:kth-12803
DiVA: diva2:318898
QC 20100511. Bibliographically approved