Audio-visual phoneme classification for pronunciation training applications
2007 (English) In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007, 57-60 p. Conference paper (Refereed)
We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal, to a principal or independent component (PCA or ICA) representation of the mouth region in video images, or to both. We investigate which representation (PCA or ICA) is most suitable, and how many components the basis requires, for automatically detecting pronunciation errors in Swedish from audio-visual input. Experiments on one speaker show that visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to classify phonemes on audio and video separately and then fuse the results than to combine the two before classification; and that PCA outperforms ICA for fewer than 50 components.
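The "classify separately, then fuse" strategy mentioned in the abstract can be illustrated with a minimal sketch. It assumes each modality's classifier already outputs per-phoneme posterior probabilities. The fusion rule (a weighted geometric mean), the function names `late_fusion` and `classify`, the weight `audio_weight`, and the posterior values are all hypothetical and are not taken from the paper, whose probabilistic kernel-based classifier is not reproduced here.

```python
import math


def late_fusion(audio_post, video_post, audio_weight=0.5):
    """Fuse per-class posteriors from two separately trained classifiers.

    Uses a weighted geometric mean (log-linear fusion), renormalised so the
    result sums to 1. This rule is an illustrative assumption, not the
    paper's fusion method.
    """
    w = audio_weight
    scores = {}
    for label in audio_post:
        # A small floor avoids log(0) when a classifier rules a class out.
        a = max(audio_post[label], 1e-12)
        v = max(video_post[label], 1e-12)
        scores[label] = math.exp(w * math.log(a) + (1.0 - w) * math.log(v))
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


def classify(posteriors):
    """Return the label with the highest posterior."""
    return max(posteriors, key=posteriors.get)


# Hypothetical posteriors for one segment: the audio classifier can barely
# separate /m/ from /n/, while the video classifier sees the lip closure.
audio = {"m": 0.48, "n": 0.47, "a": 0.05}
video = {"m": 0.80, "n": 0.15, "a": 0.05}

fused = late_fusion(audio, video)
print(classify(fused))
```

In this made-up case the audio posteriors alone are nearly tied, and the video posteriors tip the fused decision. Early fusion would instead concatenate the audio and visual feature vectors and train a single classifier on the result.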
Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007. 57-60 p.
audiovisual phoneme classification, pronunciation error detection, PCA, ICA
Computer and Information Science; General Language Studies and Linguistics
Identifiers: URN: urn:nbn:se:kth:diva-30670; ISI: 000269998600015; ScopusID: 2-s2.0-56149112494; ISBN: 978-1-60560-316-2; OAI: oai:DiVA.org:kth-30670; DiVA: diva2:403104
Interspeech Conference 2007, Antwerp, BELGIUM, AUG 27-31, 2007
Book Group Author(s): ISCA. 2011-03-11; 2011-03-04; 2011-09-13. Bibliographically approved