Lip Synchronization: from Phone Lattice to PCA Eigen-projections using Neural Networks
2008 (English). In: INTERSPEECH 2008: 9th Annual Conference of the International Speech Communication Association, Baixas: ISCA-INST SPEECH COMMUNICATION ASSOC, 2008, pp. 2016-2019. Conference paper (Refereed).
Lip synchronization is the process of generating natural lip movements from a speech signal. In this work we address the lip-sync problem using an automatic phone recognizer that generates a phone lattice carrying posterior probabilities. The acoustic feature vector contains the posterior probabilities of all the phones over a time window centered at the current time point. Hence this representation characterizes the phone recognition output including the confusion patterns caused by its limited accuracy. A 3D face model with varying texture is computed by analyzing a video recording of the speaker using a 3D morphable model. Training a neural network using 30 000 data vectors from an audiovisual recording in Dutch resulted in a very good simulation of the face on independent data sets of the same or of a different speaker.
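The abstract's pipeline (phone posteriors stacked over a time window, fed to a neural network that predicts PCA eigen-projections of the face model) can be sketched as follows. This is an illustrative assumption, not the authors' exact configuration: the phone count, window length, number of PCA coefficients, and the one-hidden-layer network are all hypothetical choices, and training is omitted.

```python
import numpy as np

# Assumed sizes -- the paper does not specify these values here.
NUM_PHONES = 40   # phones in the recognizer's lattice
WINDOW = 11       # frames in the window centered on the current time point
NUM_PCA = 10      # PCA eigen-projections driving the 3D face model

def window_features(posteriors, t, window=WINDOW):
    """Stack phone posterior probabilities over a window centered at frame t.

    `posteriors` is a (num_frames, NUM_PHONES) array of lattice posteriors;
    frames outside the recording are zero-padded."""
    half = window // 2
    num_frames, num_phones = posteriors.shape
    feat = np.zeros((window, num_phones))
    for i, f in enumerate(range(t - half, t + half + 1)):
        if 0 <= f < num_frames:
            feat[i] = posteriors[f]
    return feat.ravel()  # shape: (window * num_phones,)

class MLP:
    """Minimal one-hidden-layer regressor mapping a posterior window to
    PCA coefficients (forward pass only; training is not shown)."""
    def __init__(self, n_in, n_hidden, n_out, rng):
        self.W1 = rng.standard_normal((n_in, n_hidden)) * 0.01
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.standard_normal((n_hidden, n_out)) * 0.01
        self.b2 = np.zeros(n_out)

    def __call__(self, x):
        h = np.tanh(x @ self.W1 + self.b1)
        return h @ self.W2 + self.b2  # linear output: face-model coefficients

rng = np.random.default_rng(0)
# Stand-in for recognizer output: each frame is a distribution over phones.
posteriors = rng.dirichlet(np.ones(NUM_PHONES), size=100)
net = MLP(WINDOW * NUM_PHONES, 32, NUM_PCA, rng)
coeffs = net(window_features(posteriors, t=50))
print(coeffs.shape)  # one vector of PCA coefficients per audio frame
```

At synthesis time, one such coefficient vector would be predicted per frame and used to deform the 3D morphable face model, yielding the animated lip movements.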
Keywords: lip synchronization, speech recognition, phone lattice, 3D morphable models, principal component analysis, audio-visual speech
Subject categories: Computer and Information Science; General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-29854
ISI: 000277026101077
Scopus ID: 2-s2.0-84867204708
ISBN: 978-1-61567-378-0
OAI: oai:DiVA.org:kth-29854
DiVA: diva2:399745
Conference: 9th Annual Conference of the International Speech Communication Association (INTERSPEECH 2008)
QC 20110222. Bibliographically approved.