A Probabilistic Principal Component Analysis Based Hidden Markov Model For Audio-Visual Speech Recognition.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2008 (English). In: CONF REC ASILOMAR CONF SIGNAL, 2008, p. 2170-2173. Conference paper (Refereed)
Abstract [en]

Lipreading is one of the efficient methods proposed to improve the performance of speech recognition systems, especially in acoustically noisy environments. This paper proposes a simple audio-visual speech recognition (AVSR) system that could improve the robustness and accuracy of audio speech recognition by integrating synchronous audio and visual information. We propose a hidden Markov model (HMM) based on probabilistic principal component analysis (PCA) for visual-only speech recognition and for the visual modality of audio-visual speech recognition. The probabilistic PCA based HMM directly uses images that contain only the speaker's mouth region, without pre-processing (mouth corner detection, contour marking, etc.), and takes probabilistic PCA as the observation probability density function (PDF). We then integrate the information from the two modalities (audio and visual) to obtain a multi-stream hidden Markov model (MSHMM). We found that, without extracting specialized features before processing, probabilistic PCA could capture the principal components during training and describe the visual part of the materials. Experiments also verify that integrating the audio and visual information could help to improve recognition accuracy even at a low acoustic signal-to-noise ratio (SNR).
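
The two ideas in the abstract, a probabilistic PCA observation density and a multi-stream combination of audio and visual scores, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single loading vector (one principal axis) so that the Gaussian density N(x | mu, w w^T + sigma2 I) has a closed form, and it assumes the common multi-stream convention of a weighted sum of per-stream log-likelihoods, which the abstract does not confirm. The function names `ppca_rank1_logpdf` and `multistream_loglik` and the parameter `audio_weight` are hypothetical.

```python
import math


def ppca_rank1_logpdf(x, mu, w, sigma2):
    """Log-density of N(x | mu, w w^T + sigma2 * I) for one loading vector w.

    Assumption: a single principal axis, chosen only to keep the sketch short.
    """
    d = len(x)
    r = [xi - mi for xi, mi in zip(x, mu)]       # centred observation
    w_norm2 = sum(wi * wi for wi in w)           # squared norm of w
    wr = sum(wi * ri for wi, ri in zip(w, r))    # projection of r onto w
    r_norm2 = sum(ri * ri for ri in r)
    # Matrix determinant lemma: det(C) = sigma2^(d-1) * (sigma2 + |w|^2)
    log_det = (d - 1) * math.log(sigma2) + math.log(sigma2 + w_norm2)
    # Sherman-Morrison: r^T C^{-1} r = (|r|^2 - (w.r)^2 / (sigma2 + |w|^2)) / sigma2
    maha = (r_norm2 - wr * wr / (sigma2 + w_norm2)) / sigma2
    return -0.5 * (d * math.log(2.0 * math.pi) + log_det + maha)


def multistream_loglik(log_audio, log_visual, audio_weight):
    """Weighted sum of stream log-likelihoods (assumed convention, weights sum to 1)."""
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * log_audio + (1.0 - audio_weight) * log_visual


if __name__ == "__main__":
    # Toy 4-pixel "mouth image"; all numbers are made up for illustration.
    pixels = [0.2, 0.4, 0.1, 0.3]
    mean = [0.25, 0.35, 0.15, 0.25]
    axis = [0.5, -0.5, 0.5, -0.5]
    log_visual = ppca_rank1_logpdf(pixels, mean, axis, sigma2=0.05)
    log_audio = -3.2  # placeholder score from some audio-stream model
    print(multistream_loglik(log_audio, log_visual, audio_weight=0.3))
```

Under this convention, lowering `audio_weight` shifts trust toward the visual stream, which is the usual motivation for stream weighting when the acoustic SNR is low.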

Place, publisher, year, edition, pages
2008. 2170-2173 p.
Keyword [en]
probabilistic PCA, multi-stream hidden Markov model, audio-visual speech recognition
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-30203
DOI: 10.1109/ACSSC.2008.5074819
ISI: 000274551001195
ScopusID: 2-s2.0-70349658510
OAI: diva2:399547
42nd Asilomar Conference on Signals, Systems and Computers Pacific Grove, CA, OCT 26-29, 2008
QC 20110222
Available from: 2011-02-22 Created: 2011-02-21 Last updated: 2011-02-22
Bibliographically approved


By author/editor
Ma, Zhanyu; Leijon, Arne
