Lip Synchronization: from Phone Lattice to PCA Eigen-projections using Neural Networks
2008 (English). In: INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, Baixas: ISCA-Inst Speech Communication Assoc, 2008, p. 2016-2019. Conference paper, Published paper (Refereed)
Abstract [en]
Lip synchronization is the process of generating natural lip movements from a speech signal. In this work, we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all the phones over a time window centered at the current time point. This representation therefore characterizes the phone recognition output, including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. Training a neural network on 30 000 data vectors from an audiovisual recording in Dutch resulted in a very good simulation of the face on independent data sets of the same or of a different speaker.
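The pipeline the abstract describes — per-frame phone posteriors stacked over a window centered on the current time point, fed to a neural network that predicts PCA eigen-projections of a 3D face model — can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: all dimensions (phone-set size, window length, number of eigen-projections, vertex count, hidden width) and the untrained random weights are assumptions.

```python
import numpy as np

# Illustrative sketch of the pipeline (all dimensions are assumptions,
# not taken from the paper): a phone recognizer emits per-frame posterior
# probabilities over all phones; a window of these frames centered on the
# current time point feeds a neural network that predicts PCA
# eigen-projections of a 3D morphable face model.

rng = np.random.default_rng(0)

N_PHONES = 40   # assumed phone-set size
WINDOW = 11     # assumed number of frames in the context window
N_PCA = 10      # assumed number of retained eigen-projections
N_VERTS = 500   # assumed number of 3D vertices in the face model
HIDDEN = 64     # assumed hidden-layer width

def window_features(posteriors, t, window=WINDOW):
    """Stack phone posteriors over a window centered at frame t (edge-padded)."""
    half = window // 2
    idx = np.clip(np.arange(t - half, t + half + 1), 0, len(posteriors) - 1)
    return posteriors[idx].ravel()          # shape: (window * n_phones,)

# Untrained stand-in weights for a one-hidden-layer regression network.
W1 = rng.standard_normal((WINDOW * N_PHONES, HIDDEN)) * 0.01
b1 = np.zeros(HIDDEN)
W2 = rng.standard_normal((HIDDEN, N_PCA)) * 0.01
b2 = np.zeros(N_PCA)

def predict_pca_coeffs(x):
    """MLP forward pass: windowed posteriors -> PCA eigen-projections."""
    h = np.tanh(x @ W1 + b1)
    return h @ W2 + b2

# PCA face model: mean shape plus a linear combination of eigenvectors.
mean_shape = rng.standard_normal(3 * N_VERTS)
components = rng.standard_normal((N_PCA, 3 * N_VERTS))

def reconstruct_face(coeffs):
    """Map eigen-projections back to flattened 3D vertex coordinates."""
    return mean_shape + coeffs @ components

# Synthetic posterior sequence: 100 frames, rows sum to 1 like real posteriors.
logits = rng.standard_normal((100, N_PHONES))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

x = window_features(posteriors, t=50)
coeffs = predict_pca_coeffs(x)
face = reconstruct_face(coeffs)
print(x.shape, coeffs.shape, face.shape)   # (440,) (10,) (1500,)
```

Training the network would amount to regressing these PCA coefficients against the per-frame face parameters recovered from the video via the 3D morphable model; the sketch only shows the feature construction and forward mapping.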
Place, publisher, year, edition, pages
Baixas: ISCA-Inst Speech Communication Assoc, 2008, p. 2016-2019
Keywords [en]
lip synchronization, speech recognition, phone lattice, 3D morphable models, principal component analysis, audio visual speech
National Category
Computer and Information Sciences
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-29854
ISI: 000277026101077
Scopus ID: 2-s2.0-84867204708
ISBN: 978-1-61567-378-0 (print)
OAI: oai:DiVA.org:kth-29854
DiVA, id: diva2:399745
Conference
9th Annual Conference of the International Speech Communication Association (INTERSPEECH 2008)
Note
QC 20110222
Available from: 2011-02-23 Created: 2011-02-17 Last updated: 2022-06-25
Bibliographically approved