SynFace - Speech-Driven Facial Animation for Virtual Speech-Reading Support
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-3323-5311
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-1399-6604
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
2009 (English). In: EURASIP Journal on Audio, Speech, and Music Processing, ISSN 1687-4714, Vol. 2009, article id 191940. Article in journal (Refereed). Published.
Abstract [en]

This paper describes SynFace, a supportive technology that aims at enhancing audio-based spoken communication in adverse acoustic conditions by providing the missing visual information in the form of an animated talking head. Firstly, we describe the system architecture, consisting of a 3D animated face model controlled from the speech input by a specifically optimised phonetic recogniser. Secondly, we report on speech intelligibility experiments with a focus on multilinguality and robustness to audio quality. The system, already available for Swedish, English, and Flemish, was optimised for German and for the Swedish wide-band speech quality available in TV, radio, and Internet communication. Lastly, the paper covers experiments with nonverbal motions driven from the speech signal. It is shown that turn-taking gestures can be used to affect the flow of human-human dialogues. We have focused specifically on two categories of cues that may be extracted from the acoustic signal: prominence/emphasis and interactional cues (turn-taking/back-channelling).
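
The pipeline sketched in the abstract (incoming speech is passed through a low-latency phonetic recogniser whose output drives the articulation of a 3D talking head) can be illustrated with a minimal Python sketch. This is not the SynFace implementation: the component names, the phoneme-to-viseme table, and the sample rate are hypothetical and chosen only to show the shape of such a speech-driven animation pipeline.

# Minimal, purely illustrative sketch of a speech-driven talking-head pipeline.
# All names are hypothetical; SynFace's real recogniser and face model differ.
from dataclasses import dataclass
from typing import List

@dataclass
class PhonemeSegment:
    label: str       # recognised phoneme label, e.g. "ah"
    start_ms: float  # segment start time in milliseconds
    end_ms: float    # segment end time in milliseconds

# Hypothetical phoneme-to-viseme table; a real system uses a language-specific map.
PHONEME_TO_VISEME = {
    "p": "bilabial_closure",
    "ah": "open_jaw",
    "f": "labiodental",
    "sil": "neutral",
}

def recognise_phonemes(audio_samples: List[float], sample_rate: int = 16000) -> List[PhonemeSegment]:
    """Stand-in for the low-latency phonetic recogniser that drives the face.
    Here it simply labels the whole input as silence; a real recogniser would
    run incrementally so the face can be animated with little delay."""
    duration_ms = 1000.0 * len(audio_samples) / sample_rate
    return [PhonemeSegment("sil", 0.0, duration_ms)]

def to_animation_params(segments: List[PhonemeSegment]) -> List[dict]:
    """Map recognised phonemes to timed viseme targets for a 3D face model."""
    return [
        {
            "viseme": PHONEME_TO_VISEME.get(seg.label, "neutral"),
            "start_ms": seg.start_ms,
            "end_ms": seg.end_ms,
        }
        for seg in segments
    ]

if __name__ == "__main__":
    audio = [0.0] * 16000  # one second of silent 16 kHz audio, purely illustrative
    print(to_animation_params(recognise_phonemes(audio)))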

Place, publisher, year, edition, pages
2009. Vol. 2009, article id 191940
Keywords [en]
RECOGNITION, IMPLEMENTATION, THRESHOLD, MODELS
National Category
Specific Languages; Computer and Information Sciences; Fluid Mechanics
Identifiers
URN: urn:nbn:se:kth:diva-28195
DOI: 10.1155/2009/191940
ISI: 000285145100001
Scopus ID: 2-s2.0-76649097032
OAI: oai:DiVA.org:kth-28195
DiVA, id: diva2:384801
Funder
EU, FP7, Seventh Framework Programme, IST-045089
Swedish Research Council, 621-2005-3488
Note
QC 20110110. Available from: 2011-01-10. Created: 2011-01-10. Last updated: 2025-02-05. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Salvi, Giampiero; Beskow, Jonas

Search in DiVA

By author/editor
Salvi, Giampiero; Beskow, Jonas; Al Moubayed, Samer; Granström, Björn
By organisation
Speech Communication and Technology
Specific Languages; Computer and Information Sciences; Fluid Mechanics
