Synthetic visual speech driven from auditory speech
KTH, Superseded Departments (pre-2005), Speech, Music and Hearing.
ORCID iD: 0000-0003-1399-6604
1999 (English). In: Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999. Conference paper, Published paper (Refereed).
Abstract [en]

We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artificial Neural Networks (ANNs) were trained on the same database to map acoustic parameters directly to facial control parameters. These target parameter trajectories were generated by using phoneme strings from the database as input to the visual speech synthesizer. The two methods were evaluated through audiovisual intelligibility tests with ten hearing-impaired persons, and compared to “ideal” articulations (where no recognition was involved), to a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio-alone condition (54% and 34% keywords correct, respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.
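The second method lends itself to a compact illustration. Below is a minimal sketch, in Python with NumPy, of an ANN-style direct mapping from per-frame acoustic parameters to facial control parameters. The network size, feature dimensions, training loop, and the random data standing in for the telephone speech database are illustrative assumptions, not the configuration used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES = 2000   # number of training frames (assumed)
N_ACOUSTIC = 13   # one acoustic feature vector per frame (assumed)
N_FACIAL = 8      # e.g. jaw opening, lip rounding, ... (assumed)
N_HIDDEN = 32

# Placeholder data standing in for the labelled telephone-speech database;
# in the paper the target trajectories came from the rule-based visual
# synthesizer driven by the database's phoneme strings.
X = rng.normal(size=(N_FRAMES, N_ACOUSTIC))
Y = np.tanh(X @ rng.normal(size=(N_ACOUSTIC, N_FACIAL)))  # synthetic targets

# One hidden layer with tanh activation and a linear output layer.
W1 = rng.normal(scale=0.1, size=(N_ACOUSTIC, N_HIDDEN))
b1 = np.zeros(N_HIDDEN)
W2 = rng.normal(scale=0.1, size=(N_HIDDEN, N_FACIAL))
b2 = np.zeros(N_FACIAL)

lr = 0.01
for epoch in range(200):
    H = np.tanh(X @ W1 + b1)            # hidden activations
    pred = H @ W2 + b2                  # predicted facial control parameters
    err = pred - Y
    loss = np.mean(np.sum(err ** 2, axis=1))

    # Backpropagate the per-frame squared error averaged over frames.
    grad_pred = 2 * err / X.shape[0]
    grad_W2 = H.T @ grad_pred
    grad_b2 = grad_pred.sum(axis=0)
    grad_H = grad_pred @ W2.T * (1 - H ** 2)
    grad_W1 = X.T @ grad_H
    grad_b1 = grad_H.sum(axis=0)

    W2 -= lr * grad_W2
    b2 -= lr * grad_b2
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1

    if epoch % 50 == 0:
        print(f"epoch {epoch}: loss {loss:.4f}")

# At synthesis time, each incoming acoustic frame is pushed through the
# network to obtain a facial parameter vector for the talking head.
new_frame = rng.normal(size=(1, N_ACOUSTIC))
facial_params = np.tanh(new_frame @ W1 + b1) @ W2 + b2
print(facial_params.shape)  # (1, N_FACIAL)
```

The sketch only shows the frame-by-frame regression idea; the HMM method differs in that recognition produces a time-labelled phoneme string, which the rule-based synthesizer then turns into articulation.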

Place, publisher, year, edition, pages
1999.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-53577
OAI: oai:DiVA.org:kth-53577
DiVA, id: diva2:470371
Conference
Auditory-Visual Speech Processing (AVSP'99), Santa Cruz, CA, USA, August 7-10, 1999
Note
QC 20120103. Available from: 2011-12-28. Created: 2011-12-28. Last updated: 2022-06-24. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

http://www.isca-speech.org/archive_open/archive_papers/avsp99/av99_021.pdf

Authority records

Beskow, Jonas; Salvi, Giampiero

Search in DiVA

By author/editor
Agelfors, Eva; Beskow, Jonas; Granström, Björn; Salvi, Giampiero; Spens, Karl-Erik; Öhman, Tobias
By organisation
Speech, Music and Hearing
Computer and Information Sciences
