Reconstructing Tongue Movements from Audio and Video
KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. ORCID iD: 0000-0002-5750-9655
KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT. KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. ORCID iD: 0000-0003-4532-014X
KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI. ORCID iD: 0000-0001-5626-1187
2006 (English). In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, Vol. 1-5, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, p. 2238-2241. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an approach to articulatory inversion using audio and video of the user's face, requiring no special markers. The video is stabilized with respect to the face, and the mouth region is cropped out. The mouth image is projected into a learned independent component subspace to obtain a low-dimensional representation of the mouth appearance. The inversion problem is treated as one of regression; a non-linear regressor using relevance vector machines is trained with a dataset of simultaneous images of a subject's face, acoustic features and positions of magnetic coils glued to the subject's tongue. The results show the benefit of using both cues for inversion. We envisage the inversion method as part of a pronunciation training system with articulatory feedback.
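
The pipeline described in the abstract (independent component projection of the mouth image, fusion with acoustic features, non-linear regression onto coil positions) can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. All array shapes, variable names and the `fuse` helper are assumptions made for the example. scikit-learn has no relevance vector machine, so the sketch substitutes `ARDRegression` on an RBF kernel basis, which is a closely related sparse Bayesian model, and it predicts a single coil coordinate rather than full coil positions.

```python
import numpy as np
from sklearn.decomposition import FastICA
from sklearn.linear_model import ARDRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)

# Synthetic stand-ins for the recorded data (hypothetical sizes).
n_frames, n_pixels, n_acoustic = 200, 64, 12
latent = rng.laplace(size=(n_frames, 4))  # non-Gaussian sources behind the "images"
mouth_images = latent @ rng.normal(size=(4, n_pixels))
mouth_images += 0.01 * rng.normal(size=(n_frames, n_pixels))
acoustic = rng.normal(size=(n_frames, n_acoustic))
# One coil coordinate that depends non-linearly on both cues.
coil_x = np.tanh(latent[:, 0]) + 0.5 * acoustic[:, 0]
coil_x += 0.05 * rng.normal(size=n_frames)

train, test = slice(0, 150), slice(150, None)

# 1. Project cropped mouth images into a learned independent component subspace.
ica = FastICA(n_components=4, random_state=0, max_iter=1000)
visual_train = ica.fit_transform(mouth_images[train])
visual_test = ica.transform(mouth_images[test])


def fuse(visual, audio):
    """Concatenate visual and acoustic features frame by frame (hypothetical helper)."""
    return np.hstack([visual, audio])


X_train = fuse(visual_train, acoustic[train])
X_test = fuse(visual_test, acoustic[test])

# Standardize with training statistics so the kernel width is meaningful.
mean, std = X_train.mean(axis=0), X_train.std(axis=0)
X_train = (X_train - mean) / std
X_test = (X_test - mean) / std

# 2. RVM-like regressor: RBF kernel basis centred on the training frames,
#    with sparse Bayesian (ARD) weights as a stand-in for a true RVM.
gamma = 1.0 / X_train.shape[1]
phi_train = rbf_kernel(X_train, X_train, gamma=gamma)
phi_test = rbf_kernel(X_test, X_train, gamma=gamma)
model = ARDRegression()
model.fit(phi_train, coil_x[train])

# 3. Predict the coil coordinate for held-out frames.
pred = model.predict(phi_test)
rmse = float(np.sqrt(np.mean((pred - coil_x[test]) ** 2)))
print(f"held-out RMSE: {rmse:.3f}")
```

Repeating the fit with only the visual or only the acoustic columns of the fused features gives a simple way to compare single-cue and two-cue inversion on the same frames.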

Place, publisher, year, edition, pages
BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006. p. 2238-2241
Keywords [en]
audio-visual to articulatory inversion
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-38182
ISI: 000269965901297
Scopus ID: 2-s2.0-34548378893
ISBN: 978-1-60423-449-7 (print)
OAI: oai:DiVA.org:kth-38182
DiVA, id: diva2:436171
Conference
9th International Conference on Spoken Language Processing/INTERSPEECH 2006, Pittsburgh
Note
QC 20110822
Available from: 2011-08-22 Created: 2011-08-22 Last updated: 2022-06-24 Bibliographically approved

Authors
Kjellström, Hedvig; Engwall, Olov; Bälter, Olle