Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence
KTH, School of Computer Science and Communication (CSC).
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0003-1399-6604
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-4628-3769
2010 (English). In: 3rd COST 2102 International Training School on Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues / [ed] Esposito A; Esposito AM; Martone R; Muller VC; Scarpetta G, 2010, Vol. 6456, p. 55-71. Conference paper, Published paper (Refereed)
Abstract [en]

In this chapter, we investigate the effects of facial prominence cues, in the form of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted in which speech is acoustically degraded and then presented to 12 subjects through a lip-synchronized talking head displaying head-nod and eyebrow-raising gestures. The experiment shows that perceiving visual prominence as gestures synchronized with auditory prominence significantly increases speech intelligibility compared to when these gestures are added to the speech at random. We also present a study examining how the behavior of the talking head is perceived when gestures are added at pitch movements. Eye-gaze tracking and questionnaires were used with 10 moderately hearing-impaired subjects. The gaze data show that, when gestures are coupled with pitch movements, users look at the face in a fashion similar to how they look at a natural face, as opposed to when the face carries no gestures. The questionnaire results also show that these gestures significantly increase the naturalness and helpfulness of the talking head.

Place, publisher, year, edition, pages
2010. Vol. 6456, p. 55-71
Series
Lecture Notes in Computer Science, ISSN 0302-9743; 6456
Keywords [en]
visual prosody, prominence, stress, multimodal, gaze, head-nod, eyebrows, visual synthesis, talking heads
National Category
Human Computer Interaction
Identifiers
URN: urn:nbn:se:kth:diva-34387
DOI: 10.1007/978-3-642-18184-9_6
ISI: 000290654500006
Scopus ID: 2-s2.0-79952015947
ISBN: 978-3-642-18183-2 (print)
OAI: oai:DiVA.org:kth-34387
DiVA, id: diva2:422418
Conference
3rd EUCOGII-COST 2102 International Training School, Caserta, Italy, March 15-19, 2010
Note

QC 20110613

Available from: 2011-06-13. Created: 2011-06-07. Last updated: 2024-03-15. Bibliographically approved.

Authors

Al Moubayed, Samer; Beskow, Jonas; Granström, Björn; House, David