Human Audio-Visual Consonant Recognition Analyzed with Three Bimodal Integration Models
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
KTH, School of Electrical Engineering (EES), Sound and Image Processing.
2009 (English). In: INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009, 812-815 p. Conference paper, Published paper (Refereed)
Abstract [en]

With A-V recordings, ten normal-hearing people took recognition tests at different signal-to-noise ratios (SNRs). The AV recognition results are predicted by the fuzzy logical model of perception (FLMP) and the post-labelling integration model (POSTL). We also applied hidden Markov models (HMMs) and multi-stream HMMs (MSHMMs) for the recognition. As expected, all the models agree qualitatively with the results: the benefit gained from the visual signal is larger at lower acoustic SNRs. However, the FLMP severely overestimates the AV integration result, while the POSTL model underestimates it. Our automatic speech recognizers integrated the audio and visual streams efficiently. The visual automatic speech recognizer could be adjusted to correspond to human visual performance. The MSHMMs combine the audio and visual streams efficiently, but the audio automatic speech recognizer must be further improved to allow precise quantitative comparisons with human audio-visual performance.

Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2009. 812-815 p.
Keyword [en]
Audio-visual recognition, Fuzzy Logical Model of Perception, Post-Labelling Model, Hidden Markov Models, Multi-Stream Hidden Markov Models
National Category
Computer and Information Science; General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-29880
ISI: 000276842800203
Scopus ID: 2-s2.0-70450192523
ISBN: 978-1-61567-692-7 (print)
OAI: oai:DiVA.org:kth-29880
DiVA: diva2:399103
Conference
10th Annual Conference of the International Speech Communication Association
Note
QC 20110221
Available from: 2011-02-21. Created: 2011-02-17. Last updated: 2011-11-15. Bibliographically approved.

By author/editor
Ma, Zhanyu; Leijon, Arne