Integrating Audio and Visual Cues for Speaker Friendliness in Multimodal Speech Synthesis
2007 (English). In: INTERSPEECH 2007: 8th Annual Conference of the International Speech Communication Association, Baixas: ISCA (International Speech Communication Association), 2007, pp. 1461-1464. Conference paper (Refereed)
This paper investigates interactions between audio and visual cues to friendliness in questions in two perception experiments. In the first experiment, manually edited parametric audio-visual synthesis was used to create the stimuli. Results were consistent with earlier findings in that a late, high final focal accent peak was perceived as friendlier than an earlier, lower focal accent peak. Friendliness was also effectively signaled by visual facial parameters such as a smile, head nod, and eyebrow raising synchronized with the final accent. Consistent additive effects between the audio and visual cues were found for the subjects both as a group and individually, showing that subjects integrate the two modalities. The second experiment used data-driven visual synthesis, for which the database was recorded by an actor instructed to portray anger and happiness. Friendliness was correlated with the happy database, but the effect was not as strong as for the parametric synthesis.
Place, publisher, year, edition, pages
Baixas: ISCA (International Speech Communication Association), 2007. pp. 1461-1464
Keywords: audio-visual speech perception, multimodal integration, human-machine interaction, audio-visual speech synthesis
Subject categories: Computer and Information Science; General Language Studies and Linguistics
Identifiers: URN: urn:nbn:se:kth:diva-30673; ISI: 000269998600366; Scopus ID: 2-s2.0-56149101236; ISBN: 978-1-60560-316-2; OAI: oai:DiVA.org:kth-30673; DiVA: diva2:403095
Conference: Interspeech 2007, Antwerp, Belgium, August 27-31, 2007
Book Group Author(s): ISCA. Available from: 2011-03-11. Created: 2011-03-04. Last updated: 2011-09-13. Bibliographically approved.