  • 1. Allwood, Jens
    et al.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Jokinen, Kristiina
    Navarretta, Costanza
    Paggio, Patrizia
    The MUMIN coding scheme for the annotation of feedback, turn management and sequencing phenomena. 2007. In: Language Resources and Evaluation, ISSN 1574-020X, E-ISSN 1574-0218, Vol. 41, no. 3-4, p. 273-287. Article in journal (Refereed)
    Abstract [en]

    This paper deals with a multimodal annotation scheme dedicated to the study of gestures in interpersonal communication, with particular regard to the role played by multimodal expressions for feedback, turn management and sequencing. The scheme has been developed under the framework of the MUMIN network and tested on the analysis of multimodal behaviour in short video clips in Swedish, Finnish and Danish. The preliminary results obtained in these studies show that the reliability of the categories defined in the scheme is acceptable, and that the scheme as a whole constitutes a versatile analysis tool for the study of multimodal communication behaviour.

  • 2.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Evaluation of the expressivity of a Swedish talking head in the context of human-machine interaction. 2008. In: Comunicazione parlata e manifestazione delle emozioni: Atti del I Convegno GSCP, Padova, 29 novembre - 1 dicembre 2004 / [ed] Emanuela Magno Caldognetto, Federica Cavicchio e Piero Cosi, 2008. Conference paper (Refereed)
    Abstract [en]

    This paper describes a first attempt at synthesis and evaluation of expressive visual articulation using an MPEG-4 based virtual talking head. The synthesis is data-driven, trained on a corpus of emotional speech recorded using optical motion capture. Each emotion is modelled separately using principal component analysis and a parametric coarticulation model. In order to evaluate the expressivity of the data-driven synthesis, two tests were conducted. Our talking head was used in interactions with a human being in a given realistic usage context. The interactions were presented to external observers who were asked to judge the emotion of the talking head. The participants in the experiment could only hear the voice of the user, which was a pre-recorded female voice, and see and hear the talking head. The results of the evaluation, even if constrained by the results of the implementation, clearly show that the visual expression plays a relevant role in the recognition of emotions.
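
    As a rough illustration of the per-emotion modelling mentioned in the abstract (each emotion modelled separately with principal component analysis), the sketch below shows how such models might be fitted to motion-capture frames. It is not the authors' implementation; the array shapes, the use of scikit-learn and all names are assumptions made for the example.

    ```python
    # Illustrative sketch only: per-emotion PCA models over facial marker data.
    # Assumes each emotion has an array of frames, each frame a flattened vector
    # of 3D marker coordinates (n_frames x (n_markers * 3)). Not the authors' code.
    import numpy as np
    from sklearn.decomposition import PCA

    def fit_emotion_models(frames_by_emotion, n_components=10):
        """Fit one PCA model per emotion from motion-capture frames."""
        models = {}
        for emotion, frames in frames_by_emotion.items():
            pca = PCA(n_components=n_components)
            pca.fit(frames)  # learn the principal articulation modes for this emotion
            models[emotion] = pca
        return models

    def synthesize_frame(models, emotion, weights):
        """Reconstruct a facial configuration from low-dimensional weights."""
        pca = models[emotion]
        return pca.inverse_transform(np.asarray(weights).reshape(1, -1))[0]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Toy data: 200 frames of 30 markers (x, y, z) per emotion.
        data = {e: rng.normal(size=(200, 90)) for e in ("neutral", "happy", "angry")}
        models = fit_emotion_models(data, n_components=5)
        frame = synthesize_frame(models, "happy", weights=np.zeros(5))
        print(frame.shape)  # (90,)
    ```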

  • 3.
    Beskow, Jonas
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cosi, P.
    Costantini, E.
    Nordstrand, Magnus
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pianesi, F.
    Prete, M.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    Preliminary cross-cultural evaluation of expressiveness in synthetic faces. 2004. In: Affective Dialogue Systems, Proceedings / [ed] Andre, E.; Dybkjaer, L.; Minker, W.; Heisterkamp, P., Berlin: Springer-Verlag, 2004, p. 301-304. Conference paper (Refereed)
    Abstract [en]

    This paper reports the results of a preliminary cross-evaluation experiment run in the framework of the European research project PF-Star, with the double aim of evaluating the possibility of exchanging FAP data between the involved sites and assessing the adequacy of the emotional facial gestures performed by talking heads. The results provide initial insights into the way people belonging to various cultures react to natural and synthetic facial expressions produced in different cultural settings, and into the potentials and limits of FAP data exchange.

  • 4.
    Beskow, Jonas
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordenberg, Mikael
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Superseded Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    Expressive animated agents for affective dialogue systems. 2004. In: Affective Dialogue Systems, Proceedings / [ed] Andre, E.; Dybkjaer, L.; Minker, W.; Heisterkamp, P., Berlin: Springer, 2004, Vol. 3068, p. 240-243. Conference paper (Refereed)
    Abstract [en]

    We present our current state of development regarding animated agents applicable to affective dialogue systems. A new set of tools is under development to support the creation of animated characters compatible with the MPEG-4 facial animation standard. Furthermore, we have collected a multimodal expressive speech database including video, audio and 3D point motion registration. One of the objectives of collecting the database is to examine how emotional expression influences articulatory patterns, to be able to model this in our agents. Analysis of the 3D data shows, for example, that variation in mouth width due to expression greatly exceeds that due to vowel quality.

  • 5.
    Beskow, Jonas
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Superseded Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    The Swedish PFs-Star Multimodal Corpora. 2004. In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, p. 34-37. Conference paper (Refereed)
    Abstract [en]

    The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test and the results show promising identification rates for the different acted emotions. These multimodal speech corpora will truly represent a valuable source to get more knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

  • 6.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A coding scheme for the annotation of feedback phenomena in conversational speech. 2004. In: Proc of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces / [ed] Martin, J.C., Lisboa, 2004, p. 25-28. Conference paper (Refereed)
    Abstract [en]

    A coding scheme specifically developed to label feedback phenomena in conversational speech is presented in this paper. The coding scheme allows the categorization of feedback phenomena according to their typology, direction, and communicative function in the given context. The results of the reliability tests run to verify the appropriateness of the coding scheme to code feedback phenomena in different languages and across different modalities are also presented.
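
    As a hedged illustration only, one way an annotation produced with a coding scheme of this kind (typology, direction and communicative function per feedback token) could be represented in software is a small record type. The field names and category values below are invented for the example and are not the scheme's actual labels.

    ```python
    # Hypothetical data structure for feedback annotations; the enum values are
    # placeholders, not the actual labels defined by the coding scheme.
    from dataclasses import dataclass
    from enum import Enum

    class Direction(Enum):
        GIVE = "give"      # speaker gives feedback
        ELICIT = "elicit"  # speaker elicits feedback

    class Typology(Enum):
        VOCAL = "vocal"
        GESTURAL = "gestural"
        MULTIMODAL = "multimodal"

    @dataclass
    class FeedbackAnnotation:
        start_s: float        # start time in the recording (seconds)
        end_s: float          # end time (seconds)
        typology: Typology    # how the feedback is expressed
        direction: Direction  # give vs. elicit
        function: str         # communicative function in context, e.g. "accept"
        transcript: str = ""  # optional orthographic transcription

    example = FeedbackAnnotation(
        start_s=12.34, end_s=12.71,
        typology=Typology.VOCAL, direction=Direction.GIVE,
        function="acknowledgement", transcript="mm",
    )
    ```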

  • 7.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A comparative study of verbal feedback in Italian and Swedish map-task dialogues. 2004. In: Proceedings of the Nordic Symposium on the Comparison of Spoken Languages, Copenhagen Working Papers in LSP / [ed] Copenhagen, P.; Hernrichsen, J., 2004, p. 99-126. Conference paper (Other academic)
  • 8.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Linguistic functions of head nods. 2005. In: Proceedings from The Second Nordic Conference on Multimodal Communication / [ed] Allwood, J.; Dorriots, B., Göteborg: Göteborg University, 2005, p. 137-152. Conference paper (Refereed)
    Abstract [en]

    The aim of the present study is to investigate which communicative functions head nods can have in spoken Swedish. By nod is here meant a vertical down-up movement of the head. To classify the communicative functions of head nods, 10 short video-recorded Swedish dialogues were analysed and labelled. The labels refer to the different communicative functions that the head nods carry out in the given context. The results show that the most common function carried out by head nods is that of feedback. Besides the feedback function, head nods can be produced to signal turn taking, focus and emphasis, to give affirmative responses and to show courtesy. The visual information carried by head nods in spoken communicative interactions is without doubt extremely important; therefore it should be exploited in the field of human-machine interfaces. This could be done by integrating head nods in the design and development of embodied conversational agents. Thanks to the production of head nods, embodied conversational agents might become more effective and appear more natural during their interactions with human beings.

  • 9.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    On the acoustic, prosodic and gestural characteristics of "m-like" sounds in Swedish. 2005. In: Feedback in Spoken Interaction: NordTalk Symposium 2003 / [ed] Jens Allwood, Göteborg: Göteborg University, 2005, p. 18-31. Conference paper (Refereed)
    Abstract [en]

    The aim of the present study is to verify what communicative functions "m-like" sounds can have in spoken Swedish and to investigate both the relationship between prosodic variation and communicative function and the relationship between the production of "m-like" sounds and their accompanying gestures. The main hypothesis tested is that the different communicative functions carried by these "m-like" sounds are conveyed by means of different prosodic cues. To test this hypothesis, audio-recordings of two dialogues, elicited with the map-task technique, were used. A distributional and functional analysis of "m-like" sounds was first carried out. Afterwards, an acoustic analysis of these sounds was performed to find out how prosodic variation and communicative function are related. The results show that the most common function carried out by "m-like" sounds is that of feedback. The general category of feedback can be further divided into sub-categories depending on the specific function that the short expression carries out in the given context. To each function it is possible to relate a prototypical F0 contour and acoustic characteristics. For the analysis of the accompanying gestures of "m-like" sounds, two AV recordings of spontaneous dialogues were used. The results of the distributional analysis show that 41% of all the analysed "m-like" sounds are accompanied by a gesture. The most common accompanying gestures are head movements such as nods and jerks. The relationship between the function carried by speech and the specific function of the accompanying gesture has also been coded and analysed. Gestures co-occurring with speech can either have a "non-marked/neutral" function, which means that they do not add further information to what is being said with speech, or can be produced to add to, emphasize, weaken or contradict speech. When the function of these gestures is neutral, they tend to have a minimal extent, while when their specific function is to emphasize the information expressed by speech, their extent tends to be bigger. This result might be related to the fact that gestures are often produced to emphasize information that is also focused by mechanisms like prosody in speech.
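
    Purely as an illustrative sketch of the kind of per-token acoustic measures discussed above (duration, F0 level and F0 movement for an "m-like" token), the code below summarizes one token from a pre-computed F0 track. The feature set, the simple linear-slope measure and the input format are assumptions made for the example, not the paper's actual analysis procedure.

    ```python
    # Illustrative only: summarize an "m-like" token from a pre-computed F0 track.
    # Assumes f0 is sampled at a fixed frame rate with unvoiced frames set to NaN.
    import numpy as np

    def token_features(f0_hz, frame_rate_hz, start_frame, end_frame):
        """Duration, mean F0 and overall F0 slope for one token."""
        seg = np.asarray(f0_hz[start_frame:end_frame], dtype=float)
        voiced = seg[~np.isnan(seg)]
        duration_s = (end_frame - start_frame) / frame_rate_hz
        if voiced.size < 2:
            return {"duration_s": duration_s, "mean_f0": np.nan, "f0_slope": np.nan}
        # Simple rise/fall measure: linear fit over voiced frames (Hz per second).
        t = np.flatnonzero(~np.isnan(seg)) / frame_rate_hz
        slope = np.polyfit(t, voiced, 1)[0]
        return {"duration_s": duration_s,
                "mean_f0": float(voiced.mean()),
                "f0_slope": float(slope)}

    if __name__ == "__main__":
        # Toy token: 100 ms of silence followed by a 300 ms rise from 140 to 190 Hz.
        f0 = np.concatenate([np.full(10, np.nan), np.linspace(140, 190, 30)])
        print(token_features(f0, frame_rate_hz=100, start_frame=0, end_frame=40))
    ```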

  • 10.
    Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The communicative function of "sì" in Italian and "ja" in Swedish: an acoustic analysis. 2005. In: Proceedings of Fonetik 2005 / [ed] Anders Eriksson, Jonas Lindh, Göteborg: Göteborg University, 2005, p. 41-44. Conference paper (Other academic)
    Abstract [en]

    The results of an acoustic analysis and a perceptual evaluation of the role of prosody in spontaneously produced "ja" and "sì" in Swedish and Italian are reported and discussed in this paper. The hypothesis is that pitch contour, duration cues and relative intensity can be useful in the identification of the different communicative functions of these short expressions taken out of their context. The results of the perceptual tests run to verify whether the acoustic cues alone can be used to distinguish different functions of the same lexical items are encouraging only for Italian "sì", while for Swedish "ja" they show some confusions among the different categories.

  • 11.
    Cerrato, Loredana
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ekeklint, Susanne
    Evaluating users' reactions to human-like interfaces: Prosodic and paralinguistic features as new evaluation measures for users' satisfaction. 2004. In: From Brows to Trust: Evaluating Embodied Conversational Agents / [ed] Ruttkay, Z.; Pelachaud, C., Dordrecht: Kluwer Academic Publishers, 2004, p. 101-124. Chapter in book (Refereed)
    Abstract [en]

    An increasing number of dialogue systems are deployed to provide public services in our everyday lives. They are becoming more service-minded and several of them provide different channels for interaction. The rationale is to make automatic services available in new environments and more attractive to use. From a developer perspective, this affects the complexity of the requirements elicitation activity, as new combinations and variations in end-user interaction need to be considered. The aim of our investigation is to propose new parameters and metrics to evaluate multimodal dialogue systems endowed with embodied conversational agents (ECAs). These new metrics focus on the users, rather than on the system. Our assumption is that the intentional use of prosodic variation and the production of communicative non-verbal behaviour by users can give an indication of their attitude towards the system and might also help to evaluate the users' overall experience of the interaction. To test our hypothesis we carried out analyses on different Swedish corpora of interactions between users and multimodal dialogue systems. We analysed the prosodic variation in the way the users ended their interactions with the system and we observed the production of non-verbal communicative expressions by users. Our study supports the idea that the observation of users' prosodic variation and production of communicative non-verbal behaviour during the interaction with dialogue systems could be used as an indication of whether or not the users are satisfied with the system performance.

  • 12.
    Cerrato, Loredana
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Svanfeldt, Gunilla
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A method for the detection of communicative head nods in expressive speech. 2006. In: Papers from the Second Nordic Conference on Multimodal Communication 2005 / [ed] Allwood, J.; Dorriots, B.; Nicholson, S., Göteborg: Göteborg University, 2006, p. 153-165. Conference paper (Refereed)
    Abstract [en]

    The aim of this study is to propose a method for automatic detection of head nods during the production of semi-spontaneous speech. This method also provides means for extracting certain characteristics of head nods that may vary depending on placement, function and even underlying emotional expression. The material used is part of the Swedish PF-Star corpora, which were recorded by means of an optical motion capture system (Qualisys) able to successfully register articulatory movements as well as head movements and facial expressions. The material consists of short sentences as well as of dialogic speech produced by a Swedish actor. The method for automatic head nod detection on the 3D data acquired with Qualisys is based on criteria for slope, amplitude and a minimum number of consecutive frames. The criteria are tuned on head nods that have been manually annotated. These parameters can be varied to detect different kinds of head movements and can also be combined with other parameters in order to detect facial gestures, such as eyebrow displacements. For this study we focused in particular on the detection of head nods, since in earlier studies they have been found to be important visual cues for signaling feedback and focus. In order to evaluate the method, a preliminary test was run on semi-spontaneous dialogic speech, which is also part of the Swedish PF-Star corpora and produced by the same actor who read the sentences. The results show that the parameters and the criteria that had been set on the basis of the training corpus are valid also for the dialogic speech, even if more sophisticated parameters could be useful to achieve a more precise result.
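
    The detection criteria described above (slope, amplitude and a minimum number of consecutive frames over motion-capture data) could look roughly like the sketch below. The thresholds, the one-dimensional vertical-position input and the function name are illustrative assumptions, not the values or code tuned in the paper.

    ```python
    # Illustrative threshold-based nod detector; thresholds are made up, not the
    # values tuned in the paper. Input: vertical head position per frame (mm).
    import numpy as np

    def detect_nods(y_mm, frame_rate_hz,
                    min_slope_mm_s=50.0, min_amplitude_mm=5.0, min_frames=5):
        """Return (start, end) frame indices of candidate nod movement segments."""
        y = np.asarray(y_mm, dtype=float)
        velocity = np.gradient(y) * frame_rate_hz      # mm per second
        moving = np.abs(velocity) > min_slope_mm_s     # frames with fast vertical motion
        segments, start = [], None
        for i, m in enumerate(moving):
            if m and start is None:
                start = i
            elif not m and start is not None:
                amplitude = y[start:i].max() - y[start:i].min()
                if i - start >= min_frames and amplitude >= min_amplitude_mm:
                    segments.append((start, i))
                start = None
        return segments  # segments still open at the end of the array are ignored

    if __name__ == "__main__":
        t = np.linspace(0, 1, 100)                       # one second at ~100 frames/s
        y = -8.0 * np.exp(-((t - 0.5) ** 2) / 0.005)     # a single downward dip of ~8 mm
        print(detect_nods(y, frame_rate_hz=100))         # down and up phases of the dip
    ```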

  • 13.
    Sundberg Cerrato, Loredana
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Investigating Communicative Feedback Phenomena across Languages and Modalities. 2007. Doctoral thesis, monograph (Other scientific)
    Abstract [en]

    This thesis deals with human communicative behaviour related to feedback, analysed across languages (Italian and Swedish), modalities (auditory versus visual) and different communicative situations (human-human versus human-machine dialogues).

    The aim of this study is to give more insight into how humans use communicative behaviour related to feedback and, at the same time, to suggest a method for collecting valuable data that can be used to control facial and head movements related to visual feedback in synthetic conversational agents. The study of human communicative behaviour requires good-quality materials for analysis, the support of reliable software packages for audio-visual analysis, and a specific coding scheme for the annotation of the phenomena under observation.

    The materials used for the investigations presented in this thesis span from spontaneous conversations video-recorded in real communicative situations, and semi-spontaneous dialogues obtained with different eliciting techniques, such as map-task and information-seeking scenarios, to a specific corpus of controlled interactive speech collected by means of a motion capture system. When motion capture is used, it is possible to register facial and head movements with a high degree of precision, so as to obtain valuable data useful for the implementation of facial displays in talking heads.

    A specific coding scheme has been developed, tested and used to annotate feedback. The annotation has been carried out with the support of different available software packages for audio-visual analysis.

    The procedure followed in this thesis involves initial analyses of communicative phenomena in spontaneous human-human dialogues and human-machine interaction, in order to learn about regularities in human communicative behaviour that could be transferred to talking heads. Then, for the sake of reproduction in talking heads, the investigation includes more detailed analyses of data collected in a lab environment with a novel acquisition set-up that allows capturing the dynamics of facial and head movements.

    Finally the possibilities of transferring human communicative behaviour to a talking face are discussed and some evaluation paradigms are illustrated. The idea of reproducing human behaviour in talking heads is based on the assumption that the reproduction of facial displays related to communicative phenomena such as turn management, feedback production and expression of emotions in embodied conversational agents, might result in the design of advanced systems capable of effective multi-modal interactions with humans.
