Audiovisual representation of prosody in expressive speech communication
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. ORCID iD: 0000-0002-4628-3769
2005 (English). In: Speech Communication, ISSN 0167-6393, Vol. 46, no 3-4, p. 473-484. Article in journal (Refereed), Published
Abstract [en]

Prosody in a single speaking style, often read speech, has been studied extensively in acoustic speech. During the past few years we have expanded our interest in two directions: (1) prosody in expressive speech communication and (2) prosody as an audiovisual expression. Understanding the interactions between visual expressions (primarily in the face) and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is, for obvious reasons, tightly connected to the acoustics (e.g. lip and jaw movements), but there are other articulatory movements that do not show up on the outside of the face. Furthermore, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. In this presentation we will give some examples of recent work, primarily at KTH, addressing these questions. We will report on methods for the acquisition and modeling of visual and acoustic data, and on some evaluation experiments in which audiovisual prosody is tested. The context of much of our work in this area is the creation of an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher.

Place, publisher, year, edition, pages
2005. Vol. 46, no 3-4, 473-484 p.
Keyword [en]
audiovisual prosody, multimodal communication, expressive speech, talking heads, animation, dialogue
National Category
Computer and Information Science
URN: urn:nbn:se:kth:diva-14944
DOI: 10.1016/j.specom.2005.02.017
ISI: 000230804200017
Scopus ID: 2-s2.0-21844443583
OAI: diva2:332985
QC 20100525, QC 20111011
Available from: 2010-08-05. Created: 2010-08-05. Last updated: 2011-10-11. Bibliographically approved.

Authors
Granström, Björn; House, David