The Swedish PFs-Star Multimodal Corpora
KTH, Former Departments, Speech, Music and Hearing. ORCID iD: 0000-0003-1399-6604
KTH, Former Departments, Speech, Music and Hearing.
KTH, Former Departments, Speech, Music and Hearing.
KTH, Former Departments, Speech, Music and Hearing. ORCID iD: 0000-0002-4628-3769
2004 (English). In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, pp. 34-37. Conference paper, published paper (refereed).
Abstract [en]

The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and to discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data have been evaluated through a classification test, and the results show promising identification rates for the different acted emotions. These multimodal speech corpora represent a valuable source of knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.
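The abstract does not describe the classification test itself; the following is a minimal sketch, assuming a forced-choice perception test, of how per-emotion identification rates and confusion counts could be tallied. The function names and toy data are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of scoring a forced-choice emotion identification test;
# not the procedure used in the paper.
from collections import Counter, defaultdict

def identification_rates(responses):
    """responses: list of (acted_emotion, perceived_emotion) pairs."""
    totals = Counter(acted for acted, _ in responses)
    correct = Counter(acted for acted, perceived in responses
                      if acted == perceived)
    return {emotion: correct[emotion] / totals[emotion] for emotion in totals}

def confusion_counts(responses):
    """How often each acted emotion was perceived as each response label."""
    matrix = defaultdict(Counter)
    for acted, perceived in responses:
        matrix[acted][perceived] += 1
    return matrix

if __name__ == "__main__":
    # Toy data: (acted emotion, subject's answer)
    demo = [("happy", "happy"), ("happy", "surprised"), ("angry", "angry"),
            ("sad", "sad"), ("sad", "neutral"), ("angry", "angry")]
    print(identification_rates(demo))  # {'happy': 0.5, 'angry': 1.0, 'sad': 0.5}
```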

Place, publisher, year, edition, pages
2004, pp. 34-37
Keywords [en]
Multimodal corpora collection and analysis, visual correlates of emotional speech, facial animation
National subject category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:kth:diva-6511
OAI: oai:DiVA.org:kth-6511
DiVA, id: diva2:11245
Conference
LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, Lisbon, 25 May 2004
Note
QC 20101126. Available from: 2006-12-06. Created: 2006-12-06. Last updated: 2018-01-13. Bibliographically approved.
Part of thesis
1. Expressiveness in virtual talking faces
2006 (English). Licentiate thesis, comprehensive summary (Other academic).
Abstract [en]

In this thesis, different aspects of how to make synthetic talking faces more expressive have been studied. How can we collect data for the studies? How is lip articulation affected by expressive speech? Can the recorded data be used interchangeably in different face models? Can we use eye movements in the agent for communicative purposes? The work of this thesis includes studies of these questions, as well as an experiment using a talking head as a complement to a targeted audio device in order to increase the intelligibility of the speech.

The data collection described in the first paper resulted in two multimodal speech corpora. The subsequent analysis of the recorded data showed that expressive modes strongly affect speech articulation, although further studies are needed to obtain more quantitative results, to cover more phonemes and expressions, and to generalise the results to more than one individual.

When switching the files containing facial animation parameters (FAPs) between different face models (as well as research sites), some problematic issues were encountered, despite the fact that both face models were created according to the MPEG-4 standard. The evaluation test of the implemented emotional expressions showed that the best recognition results were obtained when the face model and the FAP file originated from the same site.
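For context, MPEG-4 facial animation expresses FAP values in face-specific units (FAPUs, i.e. facial distances such as mouth width or eye separation divided by 1024), which is what should make a FAP file portable between face models in the first place. The sketch below illustrates that normalisation step only; the face measurements, function names, and example values are assumptions for illustration, not data from the thesis or the corpora.

```python
# Illustrative sketch of MPEG-4 FAPU normalisation: a displacement measured on
# one face is encoded as a FAP value in face-specific units, and the same value
# then yields a proportionally scaled displacement on another face model.
# All measurements below are made-up example numbers (in mm).

MODEL_A = {"ES": 60.0, "ENS": 38.0, "MNS": 20.0, "MW": 52.0}  # source face
MODEL_B = {"ES": 66.0, "ENS": 41.0, "MNS": 23.0, "MW": 55.0}  # target face

def mm_to_fap(displacement_mm, fapu_name, face):
    """Encode an absolute displacement as a FAP value.

    Most FAPUs are defined as a facial distance divided by 1024 (ES = eye
    separation, ENS = eye-nose separation, MNS = mouth-nose separation,
    MW = mouth width), so the FAP value counts 1/1024ths of that distance.
    """
    return displacement_mm * 1024.0 / face[fapu_name]

def fap_to_mm(fap_value, fapu_name, face):
    """Apply a FAP value on a (possibly different) face model."""
    return fap_value * face[fapu_name] / 1024.0

if __name__ == "__main__":
    # A 4 mm mouth-corner displacement captured on model A, in MW units...
    fap = mm_to_fap(4.0, "MW", MODEL_A)
    # ...comes out slightly larger on model B, whose mouth is wider.
    print(round(fap, 1), round(fap_to_mm(fap, "MW", MODEL_B), 2))
```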

The perception experiment, in which a synthetic talking head was combined with a targeted-audio parametric loudspeaker, showed that the virtual face augmented the intelligibility of speech, especially when the sound beam was directed slightly to the side of the listener, i.e. at lower sound intensities.

In the experiment with eye gaze in a virtual talking head, the possibility of achieving mutual gaze with the observer was assessed. The results indicated that this is possible, but also pointed to some design features in the face model that need to be altered in order to achieve better control of the perceived gaze direction.

Place, publisher, year, edition, pages
Stockholm: KTH, 2006. p. 23
Series
Trita-CSC-A, ISSN 1653-5723 ; 2006:28
National subject category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:kth:diva-4210 (URN)
978-91-7178-530-5 (ISBN)
Presentation
2006-12-18, Fantum, KTH, Lindstedtsvägen 24, floor 5, Stockholm, 15:00
Opponent
Supervisor
Note
QC 20101126. Available from: 2006-12-06. Created: 2006-12-06. Last updated: 2018-01-13. Bibliographically approved.

Open Access in DiVA

Full text is not available in DiVA

Other links

TMH KTH

Person records (BETA)

Beskow, Jonas; House, David

Search further in DiVA

By the author/editor
Beskow, Jonas; Cerrato, Loredana; Granström, Björn; House, David; Nordstrand, Magnus; Svanfeldt, Gunilla
By the organisation
Speech, Music and Hearing
Language Technology (Computational Linguistics)

