Audio-visual phoneme classification for pronunciation training applications
KTH, School of Computer Science and Communication (CSC), Computer Vision and Active Perception, CVAP. ORCID iD: 0000-0002-5750-9655
KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-4532-014X
KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI. ORCID iD: 0000-0001-5626-1187
2007 (English). In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007, p. 57-60. Conference paper, Published paper (Refereed)
Abstract [en]

We present a method for audio-visual classification of Swedish phonemes, to be used in computer-assisted pronunciation training. The probabilistic kernel-based method is applied to the audio signal, to a principal or independent component analysis (PCA or ICA) representation of the mouth region in video images, or to both. We investigate which representation (PCA or ICA) is most suitable, and how many components the basis requires, for automatically detecting pronunciation errors in Swedish from audio-visual input. Experiments performed on one speaker show that the visual information helps avoid classification errors that would lead to gravely erroneous feedback to the user; that it is better to perform phoneme classification on audio and video separately and then fuse the results, rather than combining them before classification; and that PCA outperforms ICA for fewer than 50 components.
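
The contrast the abstract draws between fusing after classification and combining before classification can be illustrated with a minimal sketch. This is not the paper's implementation: the weighted log-linear fusion rule, the function names `late_fusion` and `early_fusion`, the three class labels, and the posterior values are all assumptions chosen for illustration, and the record does not state which fusion rule or classifier outputs the authors used.

```python
import math


def late_fusion(p_audio, p_video, audio_weight=0.5):
    """Fuse per-class posteriors from two separately trained classifiers.

    Uses a weighted log-linear (product) rule. This rule is an assumption
    for illustration; the record does not specify the fusion rule.
    """
    if len(p_audio) != len(p_video):
        raise ValueError("both modalities must score the same classes")
    eps = 1e-12  # guards against log(0)
    log_scores = [
        audio_weight * math.log(pa + eps)
        + (1.0 - audio_weight) * math.log(pv + eps)
        for pa, pv in zip(p_audio, p_video)
    ]
    # Normalise back to a probability distribution (softmax over log scores).
    top = max(log_scores)
    unnormalised = [math.exp(s - top) for s in log_scores]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


def early_fusion(audio_features, video_features):
    """Concatenate feature vectors so a single classifier sees both modalities."""
    return list(audio_features) + list(video_features)


# Hypothetical posteriors over three hypothetical phoneme classes.
classes = ["a", "o", "u"]
p_audio = [0.50, 0.45, 0.05]  # audio alone narrowly prefers the first class
p_video = [0.20, 0.70, 0.10]  # video clearly prefers the second class

fused = late_fusion(p_audio, p_video)
best = classes[fused.index(max(fused))]
print(best, [round(p, 3) for p in fused])
```

In the late-fusion path each modality keeps its own classifier and only the class scores are combined, whereas `early_fusion` produces one longer feature vector for a single classifier.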

Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007. p. 57-60
Keywords [en]
audiovisual phoneme classification, pronunciation error detection, PCA, ICA
National Category
Computer and Information Sciences; General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:kth:diva-30670
ISI: 000269998600015
Scopus ID: 2-s2.0-56149112494
ISBN: 978-1-60560-316-2 (print)
OAI: oai:DiVA.org:kth-30670
DiVA, id: diva2:403104
Conference
Interspeech Conference 2007, Antwerp, BELGIUM, AUG 27-31, 2007
Note
Book Group Author(s): ISCA. Available from: 2011-03-11. Created: 2011-03-04. Last updated: 2022-06-25. Bibliographically approved.

By author/editor
Kjellström, Hedvig; Engwall, Olov; Bälter, Olle