Synthetic visual speech driven from auditory speech
KTH, Former Departments, Speech, Music and Hearing.
KTH, Former Departments, Speech, Music and Hearing. ORCID iD: 0000-0003-1399-6604
KTH, Former Departments, Speech, Music and Hearing.
KTH, Former Departments, Speech, Music and Hearing.
1999 (English). In: Proceedings of Audio-Visual Speech Processing (AVSP'99), 1999. Conference paper, Published paper (Refereed)
Abstract [en]

We have developed two different methods for using auditory, telephone speech to drive the movements of a synthetic face. In the first method, Hidden Markov Models (HMMs) were trained on a phonetically transcribed telephone speech database. The output of the HMMs was then fed into a rule-based visual speech synthesizer as a string of phonemes together with time labels. In the second method, Artificial Neural Networks (ANNs) were trained on the same database to map acoustic parameters directly to facial control parameters. These target parameter trajectories were generated by using phoneme strings from a database as input to the visual speech synthesizer. The two methods were evaluated through audiovisual intelligibility tests with ten hearing-impaired persons, and compared to “ideal” articulations (where no recognition was involved), a natural face, and to the intelligibility of the audio alone. It was found that the HMM method performs considerably better than the audio-alone condition (54% and 34% keywords correct, respectively), but not as well as the “ideal” articulating artificial face (64%). The intelligibility for the ANN method was 34% keywords correct.
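The second method described above, an ANN mapping acoustic parameters frame by frame to facial control parameters, can be illustrated with a small regression sketch. This is not the authors' implementation: the feature dimensions, network size, training data, and parameter names below are invented placeholders, and the training loop is a plain gradient-descent stand-in for whatever procedure was actually used.

```python
# Hypothetical sketch (not the paper's code): a tiny feed-forward network that maps
# frame-wise acoustic parameters to facial control parameters. All sizes and data
# are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(0)

N_FRAMES = 2000    # number of speech frames (assumed)
N_ACOUSTIC = 16    # acoustic parameters per frame, e.g. spectral features (assumed)
N_FACIAL = 8       # facial control parameters, e.g. jaw opening, lip rounding (assumed)
N_HIDDEN = 32

# Placeholder arrays standing in for (acoustic features, target facial trajectories)
# derived from a phonetically transcribed database.
X = rng.standard_normal((N_FRAMES, N_ACOUSTIC))
Y = rng.standard_normal((N_FRAMES, N_FACIAL))

# One hidden layer with tanh activation and a linear output layer.
W1 = rng.standard_normal((N_ACOUSTIC, N_HIDDEN)) * 0.1
b1 = np.zeros(N_HIDDEN)
W2 = rng.standard_normal((N_HIDDEN, N_FACIAL)) * 0.1
b2 = np.zeros(N_FACIAL)

lr = 0.01
for epoch in range(200):
    # Forward pass
    H = np.tanh(X @ W1 + b1)
    pred = H @ W2 + b2
    err = pred - Y                      # mean-squared-error gradient (up to a constant)

    # Backward pass, plain batch gradient descent
    grad_W2 = H.T @ err / N_FRAMES
    grad_b2 = err.mean(axis=0)
    dH = (err @ W2.T) * (1.0 - H ** 2)  # derivative of tanh
    grad_W1 = X.T @ dH / N_FRAMES
    grad_b1 = dH.mean(axis=0)

    W2 -= lr * grad_W2; b2 -= lr * grad_b2
    W1 -= lr * grad_W1; b1 -= lr * grad_b1

# At synthesis time, each incoming acoustic frame is mapped directly to a vector of
# facial control parameters that drives the synthetic face.
new_frame = rng.standard_normal((1, N_ACOUSTIC))
facial_params = np.tanh(new_frame @ W1 + b1) @ W2 + b2
print(facial_params.shape)  # (1, N_FACIAL)
```

By contrast, the first method passes the audio through an HMM recognizer to obtain a time-labelled phoneme string, which then drives a rule-based visual synthesizer, so the face is controlled by symbolic units rather than by a direct acoustic-to-parameter mapping.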

Place, publisher, year, edition, pages
1999.
National subject category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-53577
OAI: oai:DiVA.org:kth-53577
DiVA, id: diva2:470371
Conference
Auditory-Visual Speech Processing (AVSP'99), Santa Cruz, CA, USA, August 7-10, 1999
Note
QC 20120103. Available from: 2011-12-28. Created: 2011-12-28. Last updated: 2018-01-12. Bibliographically reviewed.

Open Access in DiVA

No full text available in DiVA

Other links

http://www.isca-speech.org/archive_open/archive_papers/avsp99/av99_021.pdf

Person records BETA

Beskow, Jonas; Salvi, Giampiero

Search further in DiVA

By the author/editor
Agelfors, Eva; Beskow, Jonas; Granström, Björn; Salvi, Giampiero; Spens, Karl-Erik; Öhman, Tobias
By organisation
Speech, Music and Hearing
Computer and Information Sciences
