Intelligibility of an ASR-controlled synthetic talking face
KTH, Superseded Departments, Speech, Music and Hearing. ORCID iD: 0000-0002-3323-5311
2004 (English). In: Journal of the Acoustical Society of America, ISSN 0001-4966, E-ISSN 1520-8524, Vol. 115, no. 5, p. 2428. Article in journal (Refereed). Published.
Abstract [en]

The goal of the SYNFACE project is to develop a multilingual synthetic talking face, driven by an automatic speech recognizer (ASR), to assist hearing‐impaired people with telephone communication. Previous multilingual experiments with the synthetic face have shown that time‐aligned synthesized visual face movements can enhance speech intelligibility in normal‐hearing and hearing‐impaired users [C. Siciliano et al., Proc. Int. Cong. Phon. Sci. (2003)]. Similar experiments are in progress to examine whether the synthetic face remains intelligible when driven by ASR output. The recognizer produces phonetic output in real time, in order to drive the synthetic face while maintaining normal dialogue turn‐taking. Acoustic modeling was performed with a neural network, while an HMM was used for decoding. The recognizer was trained on the SpeechDAT telephone speech corpus. Preliminary results suggest that the currently achieved recognition performance of around 60% frames correct limits the usefulness of the synthetic face movements. This is particularly true for consonants, where correct place of articulation is especially important for visual intelligibility. Errors in the alignment of phone boundaries representative of those arising in the ASR output were also shown to decrease audio‐visual intelligibility.
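The abstract reports recognition performance as "frames correct": the proportion of frames whose recognized phone label matches the reference transcription. As a minimal illustrative sketch (not from the paper; function name, framing, and phone labels are invented for the example), the metric can be computed like this:

```python
# Illustrative sketch of the "frames correct" metric mentioned in the
# abstract: the percentage of frames whose recognized phone label
# matches the reference transcription. All names are hypothetical.

def frames_correct(reference, hypothesis):
    """Percent of frames where the hypothesized phone equals the reference.

    Both arguments are equal-length sequences of frame-level phone
    labels (e.g. one label per 10 ms frame).
    """
    if len(reference) != len(hypothesis):
        raise ValueError("sequences must cover the same number of frames")
    matches = sum(r == h for r, h in zip(reference, hypothesis))
    return 100.0 * matches / len(reference)

# Toy example: 6 of 10 frames match, i.e. 60% frames correct,
# roughly the level the abstract reports for the recognizer.
ref = ["s", "s", "y", "y", "n", "n", "f", "f", "ei", "s"]
hyp = ["s", "s", "i", "y", "n", "m", "f", "f", "e", "t"]
print(frames_correct(ref, hyp))  # → 60.0
```

Note that frame-level accuracy penalizes both label errors and the phone-boundary misalignments the abstract mentions, since a correctly recognized phone whose boundaries are shifted still produces mismatched frames at its edges.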

Place, publisher, year, edition, pages
2004. Vol. 115, no. 5, p. 2428.
National Category
Computer Science; Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-51872. OAI: oai:DiVA.org:kth-51872. DiVA: diva2:465166
Note
tmh_import_11_12_14. QC 20111219. Available from: 2011-12-14. Created: 2011-12-14. Last updated: 2017-12-08. Bibliographically approved.

Open Access in DiVA

No full text

Authority records

Salvi, Giampiero