Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Detection of Specific Mispronunciations using Audiovisual Features
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.ORCID iD: 0000-0003-4532-014X
Show others and affiliations
2010 (English)In: Auditory-Visual Speech Processing (AVSP) 2010, 2010Conference paper, Published paper (Refereed)
Abstract [en]

This paper introduces a general approach for binaryclassification of audiovisual data. The intended application ismispronunciation detection for specific phonemic errors, usingvery sparse training data. The system uses a Support VectorMachine (SVM) classifier with features obtained from a TimeVarying Discrete Cosine Transform (TV-DCT) on the audiolog-spectrum as well as on the image sequences. Theconcatenated feature vectors from both the modalities werereduced to a very small subset using a combination of featureselection methods. We achieved 95-100% correctclassification for each pair-wise classifier on a database ofSwedish vowels with an average of 58 instances per vowel fortraining. The performance was largely unaffected when testedon data from a speaker who was not included in the training.

Place, publisher, year, edition, pages
2010.
National Category
Computer Science Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-52156OAI: oai:DiVA.org:kth-52156DiVA: diva2:465451
Conference
International Conference on Auditory-Visual Speech Processing, AVSP 2010. Hakone, Kanagawa, Japan. September 30-October 3, 2010
Note
tmh_import_11_12_14. QC 20111215Available from: 2011-12-14 Created: 2011-12-14 Last updated: 2011-12-15Bibliographically approved

Open Access in DiVA

No full text

Search in DiVA

By author/editor
Picard, SebastienAnanthakrishnan, GopalWik, PrebenEngwall, Olov
By organisation
Speech Communication and TechnologyCentre for Speech Technology, CTT
Computer ScienceLanguage Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 42 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf