Data-driven synthesis of expressive visual speech using an MPEG-4 talking head
2005 (English). In: 9th European Conference on Speech Communication and Technology, Lisbon, 2005, 793-796 p. Conference paper (Refereed)
This paper describes initial experiments with synthesis of visual speech articulation for different emotions, using a newly developed MPEG-4 compatible talking head. The basic problem in combining speech and emotion in a talking head is handling the interaction between emotional expression and articulation in the orofacial region. Rather than modelling speech and emotion as two separate properties, the strategy taken here is to incorporate emotional expression into the articulation from the start. We use a data-driven approach, training the system to recreate the expressive articulation produced by an actor portraying different emotions. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. The results so far are encouraging, but more work is needed to improve the naturalness and accuracy of the synthesized speech.
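The per-emotion PCA modelling mentioned in the abstract could, in principle, be sketched as follows. This is a toy illustration with invented dimensions and variable names (frame counts, parameter counts, component counts are all assumptions), not the authors' actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for motion-capture data: 200 frames of 30 orofacial
# parameters recorded while an actor speaks with one emotion.
frames = rng.normal(size=(200, 30))

def pca_fit(X, n_components):
    """Fit PCA via SVD of the mean-centred data matrix."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:n_components]           # components: (k, n_params)

def pca_project(X, mean, components):
    """Reduce each frame to its component weights."""
    return (X - mean) @ components.T          # (n_frames, k)

def pca_reconstruct(W, mean, components):
    """Map component weights back to articulation parameters."""
    return W @ components + mean

# Fit a separate low-dimensional model for this emotion.
mean, comps = pca_fit(frames, n_components=6)
weights = pca_project(frames, mean, comps)
approx = pca_reconstruct(weights, mean, comps)

print(weights.shape)  # low-dimensional trajectory, one row per frame
```

In a talking-head pipeline, a coarticulation model would then generate trajectories in the low-dimensional weight space, and the reconstruction step would map them back to the facial animation parameters driving the MPEG-4 head.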
Place, publisher, year, edition, pages
Lisbon, 2005. 793-796 p.
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51886
Scopus ID: 2-s2.0-33745218748
OAI: oai:DiVA.org:kth-51886
DiVA: diva2:465180
9th European Conference on Speech Communication and Technology; Lisbon; 4 September 2005 through 8 September 2005
QC 20120313. Available from: 2011-12-14. Created: 2011-12-14. Last updated: 2012-03-13. Bibliographically approved.