Results 1-50 of 84
  • 1.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Animated Faces for Robotic Heads: Gaze and Beyond (2011). In: Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues / [ed] Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud and Anton Nijholt, Springer Berlin/Heidelberg, 2011, pp. 19-35. Conference paper (peer-reviewed).
    Abstract [en]

    We introduce an approach to using animated faces for robotics in which a static physical object is used as a projection surface for an animation: the talking head is projected onto a 3D physical head model. In this chapter we discuss the benefits this approach offers over mechanical heads. We then investigate a phenomenon commonly referred to as the Mona Lisa gaze effect. This effect results from the use of 2D surfaces to display 3D images and causes the gaze of a portrait to seemingly follow the observer no matter where it is viewed from. The experiment investigates observers' perception of gaze direction. The analysis shows that the 3D model eliminates the effect and provides an accurate perception of gaze direction. Finally, we discuss the different requirements of gaze in interactive systems and explore the different settings these findings give access to.

  • 2.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC).
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence (2010). In: 3rd COST 2102 International Training School on Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues / [ed] Esposito A; Esposito AM; Martone R; Muller VC; Scarpetta G, 2010, Vol. 6456, pp. 55-71. Conference paper (peer-reviewed).
    Abstract [en]

    In this chapter, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study, a speech intelligibility experiment is conducted in which speech quality is acoustically degraded and the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experiment shows that perceiving visual prominence as gestures, synchronized with the auditory prominence, significantly increases speech intelligibility compared to when these gestures are randomly added to speech. We also present a study examining the perception of the behavior of the talking heads when gestures are added at pitch movements. Using eye-gaze tracking technology and questionnaires for 10 moderately hearing-impaired subjects, the gaze data show that users look at the face in a similar fashion to how they look at a natural face when gestures are coupled with pitch movements, as opposed to when the face carries no gestures. The questionnaire results also show that these gestures significantly increase the naturalness and helpfulness of the talking head.

  • 3.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Aspects of co-occurring syllables and head nods in spontaneous dialogue (2013). In: Proceedings of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013), 2013, pp. 169-172. Conference paper (peer-reviewed).
    Abstract [en]

    This paper reports on the extraction and analysis of head nods taken from motion capture data of spontaneous dialogue in Swedish. The head nods were extracted automatically and then manually classified in terms of gestures having a beat function or multifunctional gestures. Prosodic features were extracted from syllables co-occurring with the beat gestures. While the peak rotation of the nod is on average aligned with the stressed syllable, the results show considerable variation in fine temporal synchronization. The syllables co-occurring with the gestures generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. A functional analysis shows that the majority of the syllables belong to words bearing a focal accent.

  • 4.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Automatic annotation of gestural units in spontaneous face-to-face interaction (2016). In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, pp. 15-19. Conference paper (peer-reviewed).
    Abstract [en]

    Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.

  • 5.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Extracting and analysing co-speech head gestures from motion-capture data (2013). In: Proceedings of Fonetik 2013 / [ed] Eklund, Robert, Linköping University Electronic Press, 2013, pp. 1-4. Conference paper (peer-reviewed).
  • 6.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Extracting and analyzing head movements accompanying spontaneous dialogue (2013). In: Conference Proceedings TiGeR 2013: Tilburg Gesture Research Meeting, 2013. Conference paper (peer-reviewed).
    Abstract [en]

    This paper reports on a method developed for extracting and analyzing head gestures taken from motion capture data of spontaneous dialogue in Swedish. Candidate head gestures with a beat function were extracted automatically and then manually classified using a 3D player that displays time-synced audio and 3D point data of the motion capture markers together with animated characters. Prosodic features were extracted from syllables co-occurring with a subset of the classified gestures. The beat gestures show considerable variation in temporal synchronization with the syllables, while the syllables generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. Additional features for further analysis and automatic classification of the head gestures are discussed.

  • 7.
    Ambrazaitis, G.
    House, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings (2017). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 95, pp. 100-113. Journal article (peer-reviewed).
    Abstract [en]

    Facial beat gestures align with pitch accents in speech, functioning as visual prominence markers. However, it is not yet well understood whether and how gestures and pitch accents might be combined to create different types of multimodal prominence, and how specifically visual prominence cues are used in spoken communication. In this study, we explore the use and possible interaction of eyebrow (EB) and head (HB) beats with so-called focal pitch accents (FA) in a corpus of 31 brief news readings from Swedish television (four news anchors, 986 words in total), focusing on effects of position in text, information structure as well as speaker expressivity. Results reveal an inventory of four primary (combinations of) prominence markers in the corpus: FA+HB+EB, FA+HB, FA only (i.e., no gesture), and HB only, implying that eyebrow beats tend to occur only in combination with the other two markers. In addition, head beats occur significantly more frequently in the second than in the first part of a news reading. A functional analysis of the data suggests that the distribution of head beats might to some degree be governed by information structure, as the text-initial clause often defines a common ground or presents the theme of the news story. In the rheme part of the news story, FA, HB, and FA+HB are all common prominence markers. The choice between them is subject to variation, which we suggest might represent a degree of freedom for the speaker to use the markers expressively. A second main observation concerns eyebrow beats, which seem to be used mainly as a kind of intensification marker for highlighting not only contrast, but also value, magnitude, or emotionally loaded words; they can occur in any position in a text. We thus observe largely different patterns of occurrence and usage of head beats on the one hand and eyebrow beats on the other, suggesting that the two represent two separate modalities of visual prominence cuing.

  • 8.
    Ambrazaitis, G.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts (2015). In: The 3rd European Symposium on Multimodal Communication, Dublin, Ireland, 2015. Conference paper (peer-reviewed).
  • 9.
    Ambrazaitis, G.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm (2015). In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015), Vienna, Austria, 2015, pp. 42-. Conference paper (peer-reviewed).
  • 10.
    Ambrazaitis, G.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts (2015). In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., Lund, 2015, pp. 11-16. Conference paper (other academic).
  • 11.
    Artman, H.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hultén, M.
    Karlgren, K.
    Ramberg, R.
    The Interactionary as a didactic format in design education (2015). In: Proc. of KTH Scholarship of Teaching and Learning 2015, Stockholm, Sweden, 2015. Conference paper (peer-reviewed).
  • 12.
    Artman, Henrik
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Hulten, Magnus
    Linköping University.
    Designed by Engineers: An analysis of interactionaries with engineering students (2015). In: Designs for Learning, ISSN 1654-7608, Vol. 7, no. 2, pp. 28-56, article id 10.2478/dfl-2014-0062. Journal article (peer-reviewed).
    Abstract [en]

    The aim of this study is to describe and analyze learning taking place in a collaborative design exercise involving engineering students. The students perform a time-constrained, open-ended, complex interaction design task, an “interactionary”. A multimodal learning perspective is used. We have performed detailed analyses of video recordings of the engineering students, including classifying aspects of interaction. Our results show that the engineering students carry out and articulate their design work using a technology-centred approach and focus more on the function of their designs than on aspects of interaction. The engineering students mainly make use of ephemeral communication strategies (gestures and speech) rather than sketching in physical materials. We conclude that the interactionary may be an educational format that can help engineering students learn the messiness of design work. We further identify several constraints to the engineering students’ design learning and propose useful interventions that a teacher could make during an interactionary. We especially emphasize interventions that help engineering students retain aspects of human-centered design throughout the design process. This study partially replicates a previous study which involved interaction design students.

  • 13.
    Artman, Henrik
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hultén, Magnus
    Linköping University.
    Design Learning Opportunities in Engineering Education: A case study of students solving an interaction–design task (2014). In: Proc. 4th International Designs for Learning Conference, 2014. Conference paper (peer-reviewed).
    Abstract [en]

    How do engineering students embrace interaction design? We presented two groups of chemical engineering students with an interaction design brief, tasking them with producing a concept prototype of an interactive artefact. Through interaction analysis of video material, we analyse how the students gesture and use concepts pertaining to interaction. The students frequently use gestures to enhance idea generation. Sketches are used sparingly, and other design materials are hardly used at all.

  • 14.
    Beaugendre, F.
    House, David
    KTH, Former Departments, Speech Transmission and Music Acoustics.
    Hermes, D. J.
    Accentuation boundaries in Dutch, French and Swedish (2001). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 33, no. 4, pp. 305-318. Journal article (peer-reviewed).
    Abstract [en]

    This paper presents a comparative study investigating the relation between the timing of a rising or falling pitch movement and the temporal structure of the syllable it accentuates for three languages: Dutch, French and Swedish. In a perception experiment, the five-syllable utterances /mamamamama/ and /?a?a?a?a?a/ were provided with a relatively fast rising or falling pitch movement. The timing of the movement was systematically varied so that it accented the third or the fourth syllable, and subjects were asked to indicate which syllable they perceived as accented. The accentuation boundary (AB) between the third and the fourth syllable was then defined as the moment before which more than half of the subjects indicated the third syllable as accented and after which more than half of the subjects indicated the fourth syllable. The results show that there are significant differences between the three languages as to the location of the AB. In general, well-defined ABs were found for the rises. They were located in the middle of the vowel of the third syllable for French subjects, and later in that vowel for Dutch and Swedish subjects. For the falls, a clear AB was obtained only for the Dutch and the Swedish listeners; it was located at the end of the third syllable. For the French listeners, the fall did not yield a clear AB. This corroborates the absence of accentuation by means of falls in French. By varying the duration of the pitch movement, it could be shown that, in all cases in which a clear AB was found, the cue for accentuation was located at the beginning of the pitch movement.

  • 15.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation (2011). In: Proceedings of International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, pp. 103-106. Conference paper (peer-reviewed).
    Abstract [en]

    Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not fully explored or understood. While there have been extensive studies on many aspects of face-to-face interaction, these are traditionally of a qualitative nature, relying on hand-annotated corpora that are typically rather limited in extent, a natural consequence of the labour-intensive task of multimodal data annotation. In this paper we present a corpus of 60 hours of unrestricted Swedish face-to-face conversations recorded with audio, video and optical motion capture, and we describe a new project setting out to exploit primarily the kinetic data in this corpus in order to gain quantitative knowledge of human face-to-face interaction.

  • 16.
    Beskow, Jonas
    KTH, Former Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Former Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Former Departments, Speech, Music and Hearing.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Nordenberg, Mikael
    KTH, Former Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Former Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Former Departments, Speech, Music and Hearing.
    Expressive animated agents for affective dialogue systems (2004). In: Affective Dialogue Systems, Proceedings / [ed] Andre, E; Dybkjaer, L; Minker, W; Heisterkamp, P, Berlin: Springer, 2004, Vol. 3068, pp. 240-243. Conference paper (peer-reviewed).
    Abstract [en]

    We present our current state of development regarding animated agents applicable to affective dialogue systems. A new set of tools is under development to support the creation of animated characters compatible with the MPEG-4 facial animation standard. Furthermore, we have collected a multimodal expressive speech database including video, audio and 3D point motion registration. One of the objectives of collecting the database is to examine how emotional expression influences articulatory patterns, so that this can be modelled in our agents. Analysis of the 3D data shows, for example, that variation in mouth width due to expression greatly exceeds that due to vowel quality.

  • 17.
    Beskow, Jonas
    KTH, Former Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Former Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Former Departments, Speech, Music and Hearing.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Former Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Former Departments, Speech, Music and Hearing.
    The Swedish PFs-Star Multimodal Corpora (2004). In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, pp. 34-37. Conference paper (peer-reviewed).
    Abstract [en]

    The aim of this paper is to present the multimodal speech corpora collected at KTH in the framework of the European project PF-Star, and to discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test, and the results show promising identification rates for the different acted emotions. These multimodal speech corpora should be a valuable source of knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

  • 18.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hellmer, Kahl
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Project presentation: Spontal: multimodal database of spontaneous dialog (2009). In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 190-193. Conference paper (other academic).
    Abstract [en]

    We describe the ongoing Swedish speech database project Spontal: Multimodal database of spontaneous speech in dialog (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gesture involving the face and body are important in everyday, face-to-face communicative interaction, and that there is a great need for data with which we can measure these more precisely.

  • 19.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Face-to-Face Interaction and the KTH Cooking Show (2010). In: Development of Multimodal Interfaces: Active Listening and Synchrony / [ed] Esposito A; Campbell N; Vogel C; Hussain A; Nijholt A, 2010, Vol. 5967, pp. 157-168. Conference paper (peer-reviewed).
    Abstract [en]

    We share our experiences with integrating motion capture recordings in speech and dialogue research by describing (1) Spontal, a large project collecting 60 hours of video, audio and motion capture recordings of spontaneous dialogues, with special attention to motion capture and its pitfalls; (2) a tutorial in which we use motion capture, speech synthesis and an animated talking head to allow students to create an active listener; and (3) brief preliminary results in the form of visualizations of motion capture data over time in a Spontal dialogue. Given the lack of writings on the use of motion capture for speech research, we hope these accounts will prove inspirational and informative.

  • 20.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Modelling humanlike conversational behaviour (2010). In: SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, Linköping, Sweden, 2010, pp. 9-10. Conference paper (other academic).
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: modelling interactional aspects of spoken face-to-face communication.

  • 21.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University, 2010, pp. 7-10. Conference paper (other academic).
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: interactional aspects of spoken face-to-face communication.

  • 22.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents (2007). In: Verbal and Nonverbal Communication Behaviours / [ed] Esposito, A; Faundez-Zanuy, M; Keller, E; Marinaro, M, Berlin: Springer-Verlag, 2007, Vol. 4775, pp. 250-263. Conference paper (peer-reviewed).
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. However, understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is closely related to the speech acoustics, while there are other articulatory movements affecting speech acoustics that are not visible on the outside of the face. Many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. This chapter looks into the communicative function of the animated talking agent, and its effect on intelligibility and the flow of the dialogue.

  • 23.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Focal accent and facial movements in expressive speech. 2006. In: Proceedings from Fonetik 2006, Lund, June 7-9, 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, pp. 9-12. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including Certain, Confirming, Questioning, Uncertain, Happy, Angry and Neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 24.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Visual correlates to prominence in several expressive modes. 2006. In: Interspeech 2006 and 9th International Conference on Spoken Language Processing, Baixas: ISCA (International Speech Communication Association), 2006, pp. 1272-1275. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including certain, confirming, questioning, uncertain, happy, angry and neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 25.
    Blomberg, Mats
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Karlsson, Inger A.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Research Challenges in Speech Technology: A Special Issue in Honour of Rolf Carlson and Bjorn Granstrom. 2009. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, no. 7, p. 563. Journal article (Peer-reviewed)
  • 26. Boves, L.
    et al.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hinrichs, E.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Krauwer, S.
    Lemnitzer, L.
    Vainio, M.
    Wittenburg, P.
    Resources for Speech Research: Present and Future Infrastructure Needs. 2009. In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, UK, 2009, pp. 1803-1806. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper introduces the EU-FP7 project CLARIN, a joint effort of over 150 institutions in Europe, aimed at the creation of a sustainable language resources and technology infrastructure for the humanities and social sciences research community. The paper briefly introduces the vision behind the project and how it relates to speech research with a focus on the contributions that CLARIN can and will make to research in spoken language processing.

  • 27.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Towards human-like behaviour in spoken dialog systems. 2006. In: Proceedings of Swedish Language Technology Conference (SLTC 2006), Gothenburg, Sweden, 2006. Conference paper (Other academic)
    Abstract [en]

    We and others have found it fruitful to assume that users, when interacting with spoken dialogue systems, perceive the systems and their actions metaphorically. Common metaphors include the human metaphor and the interface metaphor (cf. Edlund, Heldner, & Gustafson, 2006). In the interface metaphor, the spoken dialogue system is perceived as a machine interface, often but not always a computer interface. Speech is used to accomplish what would otherwise have been accomplished by some other means of input, such as a keyboard or a mouse. In the human metaphor, on the other hand, the computer is perceived as a creature (or even a person) with human-like conversational abilities, and speech is not a substitute or one of many alternatives, but rather the primary means of communicating with this creature. We are aware that more “natural” or human-like behaviour does not automatically make a spoken dialogue system “better” (i.e. more efficient or better liked by its users). Indeed, we are quite convinced that the advantage (or disadvantage) of human-like behaviour will be highly dependent on the application. However, a dialogue system that is coherent with a human metaphor may profit from a number of characteristics.

  • 28.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Hellmer, Kahl
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Spontal: a Swedish spontaneous dialogue corpus of audio, video and motion capture. 2010. In: Proc. of the Seventh Conference on International Language Resources and Evaluation (LREC'10) / [ed] Calzolari, Nicoletta; Choukri, Khalid; Maegaard, Bente; Mariani, Joseph; Odijk, Jan; Piperidis, Stelios; Rosner, Mike; Tapias, Daniel, 2010, pp. 2992-2995. Conference paper (Peer-reviewed)
    Abstract [en]

    We present the Spontal database of spontaneous Swedish dialogues. 120 dialogues of at least 30 minutes each have been captured in high-quality audio, high-resolution video and with a motion capture system. The corpus is currently being processed and annotated, and will be made available for research at the end of the project.

  • 29.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Gesture movement profiles in dialogues from a Swedish multimodal database of spontaneous speech. 2012. In: Prosodic and Visual Resources in Interactional Grammar / [ed] Bergmann, Pia; Brenning, Jana; Pfeiffer, Martin C.; Reber, Elisabeth, Walter de Gruyter, 2012. Book chapter, part of anthology (Peer-reviewed)
  • 30.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Prosodic Features in the Perception of Clarification Ellipses. 2005. In: Proceedings of Fonetik 2005: The XVIIIth Swedish Phonetics Conference, Gothenburg, Sweden, 2005, pp. 107-110. Conference paper (Other academic)
    Abstract [en]

    We present an experiment where subjects were asked to listen to Swedish human-computer dialogue fragments where a synthetic voice makes an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and subjects were asked to judge the computer's actual intention. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 31.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    The effects of prosodic features on the interpretation of clarification ellipses. 2005. In: Proceedings of Interspeech 2005: Eurospeech, 2005, pp. 2389-2392. Conference paper (Peer-reviewed)
    Abstract [en]

    In this paper, the effects of prosodic features on the interpretation of elliptical clarification requests in dialogue are studied. An experiment is presented where subjects were asked to listen to short human-computer dialogue fragments in Swedish, where a synthetic voice was making an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and the subjects were asked to judge what was actually intended by the computer. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 32.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Question types and some prosodic correlates in 600 questions in the Spontal database of Swedish dialogues. 2012. In: Proceedings of the 6th International Conference on Speech Prosody, Vols. I and II, Shanghai, China: Tongji University Press, 2012, pp. 737-740. Conference paper (Peer-reviewed)
    Abstract [en]

    Studies of questions present strong evidence that there is no one-to-one relationship between intonation and interrogative mode. We present initial steps of a larger project investigating and describing intonational variation in the Spontal database of 120 half-hour spontaneous dialogues in Swedish, and testing the hypothesis that the concept of a standard question intonation such as a final pitch rise contrasting a final low declarative intonation is not consistent with the pragmatic use of intonation in dialogue. We report on the extraction of 600 questions from the Spontal corpus, coding and annotation of question typology, and preliminary results concerning some prosodic correlates related to question type.

  • 33.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Telling questions from statements in spoken dialogue systems. 2012. In: Proc. of SLTC 2012, Lund, Sweden, 2012. Conference paper (Peer-reviewed)
  • 34.
    Granström, Björn
    et al.
    KTH, Former Departments, Speech, Music and Hearing.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Audiovisual representation of prosody in expressive speech communication. 2004. In: Proc. of Intl. Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, pp. 393-396. Conference paper (Peer-reviewed)
  • 35.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Audiovisual representation of prosody in expressive speech communication. 2005. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 46, no. 3-4, pp. 473-484. Journal article (Peer-reviewed)
    Abstract [en]

    Prosody in a single speaking style (often read speech) has been studied extensively in acoustic speech. During the past few years we have expanded our interest in two directions: (1) prosody in expressive speech communication and (2) prosody as an audiovisual expression. Understanding the interactions between visual expressions (primarily in the face) and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is for obvious reasons tightly connected to the acoustics (e.g. lip and jaw movements), but there are other articulatory movements that do not show up on the outside of the face. Furthermore, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. In this presentation we will give some examples of recent work, primarily at KTH, addressing these questions. We will report on methods for the acquisition and modeling of visual and acoustic data, and some evaluation experiments in which audiovisual prosody is tested. The context of much of our work in this area is to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher.

  • 36.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Effective Interaction with Talking Animated Agents in Dialogue Systems. 2005. In: Advances in Natural Multimodal Dialogue Systems / [ed] van Kuppevelt, J.; Dybkjaer, L.; Bernsen, N. O., Springer Netherlands, 2005, pp. 215-243. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    At the Centre for Speech Technology at KTH, we have for the past several years been developing spoken dialogue applications that include animated talking agents. Our motivation for moving into audiovisual output is to investigate the advantages of multimodality in human-system communication. While the mainstream character animation area has focussed on the naturalness and realism of the animated agents, our primary concern has been the possible increase of intelligibility and efficiency of interaction resulting from the addition of a talking face. In our first dialogue system, Waxholm, the agent used the deictic function of indicating specific information on the screen by eye gaze. In another project, Synface, we were specifically concerned with the advantages in intelligibility that a talking face could provide. In recent studies we have investigated the use of facial gesture cues to convey such dialogue-related functions as feedback and turn-taking as well as prosodic functions such as prominence. Results show that cues such as eyebrow and head movement can independently signal prominence. Current results also indicate that there can be considerable differences in cue strengths among visual cues such as smiling and nodding and that such cues can contribute in an additive manner together with auditory prosody as cues to different dialogue functions. Results from some of these studies are presented in the chapter along with examples of spoken dialogue applications using talking heads.

  • 37.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Inside out - Acoustic and visual aspects of verbal and non-verbal communication: Keynote Paper. 2007. In: Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken / [ed] Trouvain, J.; Barry, W., 2007, pp. 11-18. Conference paper (Peer-reviewed)
    Abstract [en]

    In face-to-face communication both visual and auditory information play an obvious and significant role. In this presentation we will discuss work done, primarily at KTH, that aims at analyzing and modelling verbal and non-verbal communication from a multi-modal perspective. In our studies, it appears that both segmental and prosodic phenomena are strongly affected by the communicative context of speech interaction. One platform for modelling audiovisual speech communication is the embodied conversational agent (ECA). We will describe how ECAs have been used in our research, including examples of applications and a series of experiments for studying multimodal aspects of speech communication.

  • 38.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Measuring and modeling audiovisual prosody for animated agents. 2006. In: Proceedings of Speech Prosody 2006, Dresden, 2006. Conference paper (Peer-reviewed)
    Abstract [en]

    Understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. The context of much of our work in this area is to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher. In this presentation we will give some examples of recent work, primarily at KTH, involving the collection and analysis of a database for audiovisual prosody. We will report on methods for the acquisition and modeling of visual and acoustic data, and provide some examples of analysis of head nods and eyebrow settings.

  • 39.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Modelling and evaluating verbal and non-verbal communication in talking animated interface agents. 2007. In: Evaluation of Text and Speech Systems / [ed] Dybkjaer, L.; Hemsen, H.; Minker, W., Dordrecht: Springer-Verlag Ltd, 2007, pp. 65-98. Book chapter, part of anthology (Peer-reviewed)
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal experimental spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. Established techniques for evaluating the quality, efficiency, and other impacts of this technology have not yet appeared in standard textbooks. The focus of this chapter is to look into the communicative function of the agent, both the capability to increase intelligibility of the spoken interaction and the possibility to make the flow of the dialogue smoother, through different kinds of communicative gestures such as gestures for emphatic stress, emotions, turn-taking, and negative or positive system feedback. The chapter reviews state-of-the-art animated agent technologies and their applications primarily in dialogue systems. The chapter also includes examples of methods of evaluating communicative gestures in different contexts.

  • 40. Horne, Merle
    et al.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Svantesson, Jan-Olof
    Touati, Paul
    Gösta Bruce 1947-2010 In Memoriam. 2010. In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 67, no. 4, pp. 268-270. Journal article (Peer-reviewed)
  • 41.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Final rises and Swedish question intonation. 2004. In: Proc. of the XVIIth Swedish Phonetics Conference, Fonetik 2004, Stockholm University, 2004, pp. 56-59. Conference paper (Other academic)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are viewed in relationship to results of related perception studies and are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 42.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Final rises in spontaneous Swedish computer-directed questions: incidence and function. 2004. In: Proc. of Intl. Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, pp. 115-118. Conference paper (Peer-reviewed)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 43.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Fonetiska undersökningar av kammu [Phonetic studies of Kammu]. 2005. In: Kammu - om ett folk i Laos [Kammu: about a people in Laos] / [ed] Lundström, H.; Svantesson, J.-O., Lund: Lunds universitetshistoriska sällskap, 2005, pp. 164-167. Book chapter, part of anthology (Peer-reviewed)
  • 44.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Integrating Audio and Visual Cues for Speaker Friendliness in Multimodal Speech Synthesis. 2007. In: Interspeech 2007: 8th Annual Conference of the International Speech Communication Association, Baixas: ISCA (International Speech Communication Association), 2007, pp. 1461-1464. Conference paper (Peer-reviewed)
    Abstract [en]

    This paper investigates interactions between audio and visual cues to friendliness in questions in two perception experiments. In the first experiment, manually edited parametric audio-visual synthesis was used to create the stimuli. Results were consistent with earlier findings in that a late, high final focal accent peak was perceived as friendlier than an earlier, lower focal accent peak. Friendliness was also effectively signaled by visual facial parameters such as a smile, head nod and eyebrow raising synchronized with the final accent. Consistent additive effects were found between the audio and visual cues for the subjects as a group and individually showing that subjects integrate the two modalities. The second experiment used data-driven visual synthesis where the database was recorded by an actor instructed to portray anger and happiness. Friendliness was correlated to the happy database, but the effect was not as strong as for the parametric synthesis.

  • 45.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    On the interaction of audio and visual cues to friendliness in interrogative prosody. 2006. In: Proceedings of The Nordic Conference on Multimodal Communication, 2005, Göteborg, 2006, pp. 201-213. Conference paper (Peer-reviewed)
  • 46.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Perception and production of phrase-final intonation in Swedish questions. 2006. In: Nordic Prosody, Proceedings of the IXth Conference, Lund 2004 / [ed] Bruce, G.; Horne, M., Frankfurt am Main: Peter Lang, 2006, pp. 127-136. Conference paper (Peer-reviewed)
  • 47.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication.
    Phrase-final rises as a prosodic feature in wh-questions in Swedish human-machine dialogue. 2005. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 46, no. 3-4, pp. 268-283. Journal article (Peer-reviewed)
    Abstract [en]

    This paper examines the extent to which optional final rises occur in a set of 200 wh-questions extracted from a large corpus of computer-directed spontaneous speech in Swedish and discusses the function these rises may have in signalling dialogue acts and speaker attitude over and beyond an information question. Final rises occurred in 22% of the utterances, primarily in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). The distribution of the rises in the material is examined and evidence relating to the final rise as a signal of a social interaction oriented dialogue act is gathered from the distribution. Two separate perception tests were carried out to test the hypothesis that high and late focal accent peaks in a wh-question are perceived as friendlier and more socially interested than low and early peaks. Generally, the results were consistent with these hypotheses when the late peaks were in phrase-final position. Finally, the results of this study are discussed in terms of pragmatic and attitudinal meanings and biological codes.

  • 48.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation. 2004. In: From Traditional Phonology to Modern Speech Processing / [ed] Fant, G.; Fujisaki, H.; Cao, J.; Xu, Y., Beijing: Foreign Language Teaching and Research Press, 2004, pp. 189-204. Book chapter, part of anthology (Peer-reviewed)
  • 49.
    House, David
    KTH, Former Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation: pragmatic signals and biological codes. 2004. In: Proc. of International Symposium on Tonal Aspects of Languages: Emphasis on Tone Languages / [ed] Bel, B.; Marlein, I., Beijing, China, 2004, pp. 93-96. Conference paper (Peer-reviewed)
  • 50.
    House, David
    KTH, School of Electrical Engineering and Computer Science (EECS).
    Response to Fred Cummins: Looking for Rhythm in Speech. 2012. In: Empirical Musicology Review, ISSN 1559-5749, E-ISSN 1559-5749, Vol. 7, no. 1-2, pp. 45-48. Journal article (Peer-reviewed)
    Abstract [en]

    This commentary briefly reviews three aspects of rhythm in speech. The first concerns the issues of what to measure and how measurements should relate to rhythm's communicative functions. The second relates to how tonal and durational features of speech contribute to the percept of rhythm, noting evidence that indicates such features can be tightly language-specific. The third aspect addressed is how bodily gestures integrate with and enhance the communicative functions of speech rhythm.
