Picture My Voice: Audio to Visual Speech Synthesis using Artificial Neural Networks
1999 (English). In: Proceedings of International Conference on Auditory-Visual Speech Processing / [ed] Massaro, Dominic W., 1999, 133-138 p. Conference paper (Other academic)
This paper presents an initial implementation and evaluation of a system that synthesizes visual speech directly from the acoustic waveform. An artificial neural network (ANN) was trained to map the cepstral coefficients of an individual’s natural speech to the control parameters of an animated synthetic talking head. We trained on two data sets; one was a set of 400 words spoken in isolation by a single speaker and the other a subset of extemporaneous speech from 10 different speakers. The system showed learning in both cases. A perceptual evaluation test indicated that the system’s generalization to new words by the same speaker provides significant visible information, but significantly below that given by a text-to-speech algorithm.
Place, publisher, year, edition, pages
1999. 133-138 p.
Identifiers
URN: urn:nbn:se:kth:diva-12710
OAI: oai:DiVA.org:kth-12710
DiVA: diva2:318254
International Conference on Auditory-Visual Speech Processing
QC 20100507
2010-05-07, 2010-05-07, 2010-05-11
Bibliographically approved