  • 1.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Animated Faces for Robotic Heads: Gaze and Beyond. 2011. In: Analysis of Verbal and Nonverbal Communication and Enactment: The Processing Issues / [ed] Anna Esposito, Alessandro Vinciarelli, Klára Vicsi, Catherine Pelachaud and Anton Nijholt, Springer Berlin/Heidelberg, 2011, p. 19-35. Conference paper (Refereed)
    Abstract [en]

    We introduce an approach to using animated faces for robotics where a static physical object is used as a projection surface for an animation. The talking head is projected onto a 3D physical head model. In this chapter we discuss the different benefits this approach adds over mechanical heads. After that, we investigate a phenomenon commonly referred to as the Mona Lisa gaze effect. This effect results from the use of 2D surfaces to display 3D images and causes the gaze of a portrait to seemingly follow the observer no matter where it is viewed from. The experiment investigates the perception of gaze direction by observers. The analysis shows that the 3D model eliminates the effect, and provides an accurate perception of gaze direction. We discuss at the end the different requirements of gaze in interactive systems, and explore the different settings these findings give access to.

  • 2.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC).
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audio-Visual Prosody: Perception, Detection, and Synthesis of Prominence. 2010. In: 3rd COST 2102 International Training School on Toward Autonomous, Adaptive, and Context-Aware Multimodal Interfaces: Theoretical and Practical Issues / [ed] Esposito A; Esposito AM; Martone R; Muller VC; Scarpetta G, 2010, Vol. 6456, p. 55-71. Conference paper (Refereed)
    Abstract [en]

    In this chapter, we investigate the effects of facial prominence cues, in terms of gestures, when synthesized on animated talking heads. In the first study a speech intelligibility experiment is conducted, where speech quality is acoustically degraded and the speech is then presented to 12 subjects through a lip-synchronized talking head carrying head-nod and eyebrow-raising gestures. The experiment shows that perceiving visual prominence as gestures, synchronized with the auditory prominence, significantly increases speech intelligibility compared to when these gestures are randomly added to speech. We also present a study examining the perception of the behavior of the talking heads when gestures are added at pitch movements. Using eye-gaze tracking and questionnaires with 10 moderately hearing-impaired subjects, the gaze data show that users look at the face in a fashion similar to how they look at a natural face when gestures are coupled with pitch movements, as opposed to when the face carries no gestures. The questionnaires also show that these gestures significantly increase the naturalness and helpfulness of the talking head.

  • 3.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Aspects of co-occurring syllables and head nods in spontaneous dialogue. 2013. In: Proceedings of 12th International Conference on Auditory-Visual Speech Processing (AVSP2013), 2013, p. 169-172. Conference paper (Refereed)
    Abstract [en]

    This paper reports on the extraction and analysis of head nods taken from motion capture data of spontaneous dialogue in Swedish. The head nods were extracted automatically and then manually classified in terms of gestures having a beat function or multifunctional gestures. Prosodic features were extracted from syllables co-occurring with the beat gestures. While the peak rotation of the nod is on average aligned with the stressed syllable, the results show considerable variation in fine temporal synchronization. The syllables co-occurring with the gestures generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. A functional analysis shows that the majority of the syllables belong to words bearing a focal accent.

  • 4.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Automatic annotation of gestural units in spontaneous face-to-face interaction. 2016. In: MA3HMI 2016 - Proceedings of the Workshop on Multimodal Analyses Enabling Artificial Agents in Human-Machine Interaction, 2016, p. 15-19. Conference paper (Refereed)
    Abstract [en]

    Speech and gesture co-occur in spontaneous dialogue in a highly complex fashion. There is a large variability in the motion that people exhibit during a dialogue, and different kinds of motion occur during different states of the interaction. A wide range of multimodal interface applications, for example in the fields of virtual agents or social robots, can be envisioned where it is important to be able to automatically identify gestures that carry information and discriminate them from other types of motion. While it is easy for a human to distinguish and segment manual gestures from a flow of multimodal information, the same task is not trivial to perform for a machine. In this paper we present a method to automatically segment and label gestural units from a stream of 3D motion capture data. The gestural flow is modeled with a 2-level Hierarchical Hidden Markov Model (HHMM) where the sub-states correspond to gesture phases. The model is trained based on labels of complete gesture units and self-adaptive manipulators. The model is tested and validated on two datasets differing in genre and in method of capturing motion, and outperforms a state-of-the-art SVM classifier on a publicly available dataset.

  • 5.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Extracting and analysing co-speech head gestures from motion-capture data. 2013. In: Proceedings of Fonetik 2013 / [ed] Eklund, Robert, Linköping University Electronic Press, 2013, p. 1-4. Conference paper (Refereed)
  • 6.
    Alexanderson, Simon
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Extracting and analyzing head movements accompanying spontaneous dialogue. 2013. In: Conference Proceedings TiGeR 2013: Tilburg Gesture Research Meeting, 2013. Conference paper (Refereed)
    Abstract [en]

    This paper reports on a method developed for extracting and analyzing head gestures taken from motion capture data of spontaneous dialogue in Swedish. Candidate head gestures with beat function were extracted automatically and then manually classified using a 3D player which displays time-synced audio and 3D point data of the motion capture markers together with animated characters. Prosodic features were extracted from syllables co-occurring with a subset of the classified gestures. The beat gestures show considerable variation in temporal synchronization with the syllables, while the syllables generally show greater intensity, higher F0, and greater F0 range when compared to the mean across the entire dialogue. Additional features for further analysis and automatic classification of the head gestures are discussed.

  • 7. Ambrazaitis, G.
    et al.
    House, David
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Multimodal prominences: Exploring the patterning and usage of focal pitch accents, head beats and eyebrow beats in Swedish television news readings. 2017. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 95, p. 100-113. Article in journal (Refereed)
    Abstract [en]

    Facial beat gestures align with pitch accents in speech, functioning as visual prominence markers. However, it is not yet well understood whether and how gestures and pitch accents might be combined to create different types of multimodal prominence, and how specifically visual prominence cues are used in spoken communication. In this study, we explore the use and possible interaction of eyebrow (EB) and head (HB) beats with so-called focal pitch accents (FA) in a corpus of 31 brief news readings from Swedish television (four news anchors, 986 words in total), focusing on effects of position in text, information structure as well as speaker expressivity. Results reveal an inventory of four primary (combinations of) prominence markers in the corpus: FA+HB+EB, FA+HB, FA only (i.e., no gesture), and HB only, implying that eyebrow beats tend to occur only in combination with the other two markers. In addition, head beats occur significantly more frequently in the second than in the first part of a news reading. A functional analysis of the data suggests that the distribution of head beats might to some degree be governed by information structure, as the text-initial clause often defines a common ground or presents the theme of the news story. In the rheme part of the news story, FA, HB, and FA+HB are all common prominence markers. The choice between them is subject to variation which we suggest might represent a degree of freedom for the speaker to use the markers expressively. A second main observation concerns eyebrow beats, which seem to be used mainly as a kind of intensification marker for highlighting not only contrast, but also value, magnitude, or emotionally loaded words; it is applicable in any position in a text. We thus observe largely different patterns of occurrence and usage of head beats on the one hand and eyebrow beats on the other, suggesting that the two represent two separate modalities of visual prominence cuing.

  • 8. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Head beats and eyebrow movements as a function of phonological prominence levels and word accents in Stockholm Swedish news broadcasts. 2015. In: The 3rd European Symposium on Multimodal Communication, Dublin, Ireland, 2015. Conference paper (Refereed)
  • 9. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Head Movements, Eyebrows, and Phonological Prosodic Prominence Levels in Stockholm. 2015. In: 13th International Conference on Auditory-Visual Speech Processing (AVSP 2015), Vienna, Austria, 2015, p. 42-. Conference paper (Refereed)
  • 10. Ambrazaitis, G.
    et al.
    Svensson Lundmark, M.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Multimodal levels of prominence: a preliminary analysis of head and eyebrow movements in Swedish news broadcasts. 2015. In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., Lund, 2015, p. 11-16. Conference paper (Other academic)
  • 11. Artman, H.
    et al.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hultén, M.
    Karlgren, K.
    Ramberg, R.
    The Interactionary as a didactic format in design education. 2015. In: Proc. of KTH Scholarship of Teaching and Learning 2015, Stockholm, Sweden, 2015. Conference paper (Refereed)
  • 12.
    Artman, Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Hulten, Magnus
    Linköpings universitet.
    Designed by Engineers: An analysis of interactionaries with engineering students. 2015. In: Designs for Learning, ISSN 1654-7608, Vol. 7, no 2, p. 28-56, article id 10.2478/dfl-2014-0062. Article in journal (Refereed)
    Abstract [en]

    The aim of this study is to describe and analyze learning taking place in a collaborative design exercise involving engineering students. The students perform a time-constrained, open-ended, complex interaction design task, an “interactionary”. A multimodal learning perspective is used. We have performed detailed analyses of video recordings of the engineering students, including classifying aspects of interaction. Our results show that the engineering students carry out and articulate their design work using a technology-centred approach and focus more on the function of their designs than on aspects of interaction. The engineering students mainly make use of ephemeral communication strategies (gestures and speech) rather than sketching in physical materials. We conclude that the interactionary may be an educational format that can help engineering students learn the messiness of design work. We further identify several constraints to the engineering students’ design learning and propose useful interventions that a teacher could make during an interactionary. We especially emphasize interventions that help engineering students retain aspects of human-centered design throughout the design process. This study partially replicates a previous study which involved interaction design students.

  • 13.
    Artman, Henrik
    et al.
    KTH, School of Computer Science and Communication (CSC), Media Technology and Interaction Design, MID.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hultén, Magnus
    Linköpings universitet.
    Design Learning Opportunities in Engineering Education: A case study of students solving an interaction-design task. 2014. In: Proc. 4th International Designs for Learning Conference, 2014. Conference paper (Refereed)
    Abstract [en]

    How do engineering students embrace interaction design? We presented two groups of chemical engineering students with an interaction design brief with the task of producing a concept prototype of an interactive artefact. Through interaction analysis of video material we analyse how the students gesture and use concepts adhering to interaction. The students frequently use gestures to enhance idea-generation. Sketches are used sparsely and other design materials were almost not used at all.

  • 14. Beaugendre, F.
    et al.
    House, David
    KTH, Superseded Departments, Speech Transmission and Music Acoustics.
    Hermes, D. J.
    Accentuation boundaries in Dutch, French and Swedish. 2001. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 33, no 4, p. 305-318. Article in journal (Refereed)
    Abstract [en]

    This paper presents a comparative study investigating the relation between the timing of a rising or falling pitch movement and the temporal structure of the syllable it accentuates for three languages: Dutch, French and Swedish. In a perception experiment, the five-syllable utterances /mamamamama/ and /ʔaʔaʔaʔaʔa/ were provided with a relatively fast rising or falling pitch movement. The timing of the movement was systematically varied so that it accented the third or the fourth syllable, and subjects were asked to indicate which syllable they perceived as accented. The accentuation boundary (AB) between the third and the fourth syllable was then defined as the moment before which more than half of the subjects indicated the third syllable as accented and after which more than half of the subjects indicated the fourth syllable. The results show that there are significant differences between the three languages as to the location of the AB. In general, for the rises, well-defined ABs were found. They were located in the middle of the vowel of the third syllable for French subjects, and later in that vowel for Dutch and Swedish subjects. For the falls, a clear AB was obtained only for the Dutch and the Swedish listeners. This was located at the end of the third syllable. For the French listeners, the fall did not yield a clear AB. This corroborates the absence of accentuation by means of falls in French. By varying the duration of the pitch movement it could be shown that, in all cases in which a clear AB was found, the cue for accentuation was located at the beginning of the pitch movement.

  • 15.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Kinetic Data for Large-Scale Analysis and Modeling of Face-to-Face Conversation. 2011. In: Proceedings of International Conference on Audio-Visual Speech Processing 2011 / [ed] Salvi, G.; Beskow, J.; Engwall, O.; Al Moubayed, S., Stockholm: KTH Royal Institute of Technology, 2011, p. 103-106. Conference paper (Refereed)
    Abstract [en]

    Spoken face-to-face interaction is a rich and complex form of communication that includes a wide array of phenomena that are not fully explored or understood. While there have been extensive studies on many aspects of face-to-face interaction, these are traditionally of a qualitative nature, relying on hand-annotated corpora, typically rather limited in extent, which is a natural consequence of the labour-intensive task of multimodal data annotation. In this paper we present a corpus of 60 hours of unrestricted Swedish face-to-face conversations recorded with audio, video and optical motion capture, and we describe a new project setting out to exploit primarily the kinetic data in this corpus in order to gain quantitative knowledge on human face-to-face interaction.

  • 16.
    Beskow, Jonas
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordenberg, Mikael
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Superseded Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    Expressive animated agents for affective dialogue systems. 2004. In: AFFECTIVE DIALOGUE SYSTEMS, PROCEEDINGS / [ed] Andre, E; Dybkjaer, L; Minker, W; Heisterkamp, P, BERLIN: SPRINGER, 2004, Vol. 3068, p. 240-243. Conference paper (Refereed)
    Abstract [en]

    We present our current state of development regarding animated agents applicable to affective dialogue systems. A new set of tools is under development to support the creation of animated characters compatible with the MPEG-4 facial animation standard. Furthermore, we have collected a multimodal expressive speech database including video, audio and 3D point motion registration. One of the objectives of collecting the database is to examine how emotional expression influences articulatory patterns, to be able to model this in our agents. Analysis of the 3D data shows for example that variation in mouth width due to expression greatly exceeds that due to vowel quality.

  • 17.
    Beskow, Jonas
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Cerrato, Loredana
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nordstrand, Magnus
    KTH, Superseded Departments, Speech, Music and Hearing.
    Svanfeldt, Gunilla
    KTH, Superseded Departments, Speech, Music and Hearing.
    The Swedish PFs-Star Multimodal Corpora. 2004. In: Proceedings of LREC Workshop on Models of Human Behaviour for the Specification and Evaluation of Multimodal Input and Output Interfaces, 2004, p. 34-37. Conference paper (Refereed)
    Abstract [en]

    The aim of this paper is to present the multimodal speech corpora collected at KTH, in the framework of the European project PF-Star, and discuss some of the issues related to the analysis and implementation of human communicative and emotional visual correlates of speech in synthetic conversational agents. Two multimodal speech corpora have been collected by means of an opto-electronic system, which allows capturing the dynamics of emotional facial expressions with very high precision. The data has been evaluated through a classification test and the results show promising identification rates for the different acted emotions. These multimodal speech corpora will truly represent a valuable source to get more knowledge about how speech articulation and communicative gestures are affected by the expression of emotions.

  • 18.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hellmer, Kahl
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Project presentation: Spontal: multimodal database of spontaneous dialog. 2009. In: Proceedings of Fonetik 2009: The XXIIth Swedish Phonetics Conference / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, p. 190-193. Conference paper (Other academic)
    Abstract [en]

    We describe the ongoing Swedish speech database project Spontal: Multimodal database of spontaneous speech in dialog (VR 2006-7482). The project takes as its point of departure the fact that both vocal signals and gestures involving the face and body are important in everyday, face-to-face communicative interaction, and that there is a great need for data with which we can measure these more precisely.

  • 19.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Face-to-Face Interaction and the KTH Cooking Show. 2010. In: DEVELOPMENT OF MULTIMODAL INTERFACES: ACTIVE LISTENING AND SYNCHRONY / [ed] Esposito A; Campbell N; Vogel C; Hussain A; Nijholt A, 2010, Vol. 5967, p. 157-168. Conference paper (Refereed)
    Abstract [en]

    We share our experiences with integrating motion capture recordings in speech and dialogue research by describing (1) Spontal, a large project collecting 60 hours of video, audio and motion capture of spontaneous dialogues, with special attention to motion capture and its pitfalls; (2) a tutorial where we use motion capture, speech synthesis and an animated talking head to allow students to create an active listener; and (3) brief preliminary results in the form of visualizations of motion capture data over time in a Spontal dialogue. We hope that, given the lack of writings on the use of motion capture for speech research, these accounts will prove inspirational and informative.

  • 20.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Modelling humanlike conversational behaviour. 2010. In: SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, Linköping, Sweden, 2010, p. 9-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: modelling interactional aspects of spoken face-to-face communication.

  • 21.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Research focus: Interactional aspects of spoken face-to-face communication. 2010. In: Proceedings from Fonetik, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University, 2010, p. 7-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: interactional aspects of spoken face-to-face communication.

  • 22.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Analysis and synthesis of multimodal verbal and non-verbal interaction for animated interface agents. 2007. In: VERBAL AND NONVERBAL COMMUNICATION BEHAVIOURS / [ed] Esposito, A; Faundez-Zanuy, M; Keller, E; Marinaro, M, BERLIN: SPRINGER-VERLAG BERLIN, 2007, Vol. 4775, p. 250-263. Conference paper (Refereed)
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. However, understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is closely related to the speech acoustics, while there are other articulatory movements affecting speech acoustics that are not visible on the outside of the face. Many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. This chapter looks into the communicative function of the animated talking agent, and its effect on intelligibility and the flow of the dialogue.

  • 23.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Focal accent and facial movements in expressive speech. 2006. In: Proceedings from Fonetik 2006, Lund, June 7-9, 2006 / [ed] Gilbert Ambrazaitis, Susanne Schötz, Lund: Lund University, 2006, p. 9-12. Conference paper (Other academic)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including Certain, Confirming, Questioning, Uncertain, Happy, Angry and Neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 24.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Visual correlates to prominence in several expressive modes. 2006. In: INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2006, p. 1272-1275. Conference paper (Refereed)
    Abstract [en]

    In this paper, we present measurements of visual, facial parameters obtained from a speech corpus consisting of short, read utterances in which focal accent was systematically varied. The utterances were recorded in a variety of expressive modes including certain, confirming, questioning, uncertain, happy, angry and neutral. Results showed that in all expressive modes, words with focal accent are accompanied by a greater variation of the facial parameters than are words in non-focal positions. Moreover, interesting differences between the expressions in terms of different parameters were found.

  • 25.
    Blomberg, Mats
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Karlsson, Inger A.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Research Challenges in Speech Technology: A Special Issue in Honour of Rolf Carlson and Bjorn Granstrom. 2009. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 51, no 7, p. 563-563. Article in journal (Refereed)
  • 26. Boves, L.
    et al.
    Carlson, Rolf
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hinrichs, E.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Krauwer, S.
    Lemnitzer, L.
    Vainio, M.
    Wittenburg, P.
    Resources for Speech Research: Present and Future Infrastructure Needs. 2009. In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, UK, 2009, p. 1803-1806. Conference paper (Refereed)
    Abstract [en]

    This paper introduces the EU-FP7 project CLARIN, a joint effort of over 150 institutions in Europe, aimed at the creation of a sustainable language resources and technology infrastructure for the humanities and social sciences research community. The paper briefly introduces the vision behind the project and how it relates to speech research with a focus on the contributions that CLARIN can and will make to research in spoken language processing.

  • 27.
    Carlson, Rolf
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Towards human-like behaviour in spoken dialog systems. 2006. In: Proceedings of Swedish Language Technology Conference (SLTC 2006), Gothenburg, Sweden, 2006. Conference paper (Other academic)
    Abstract [en]

    We and others have found it fruitful to assume that users, when interacting with spoken dialogue systems, perceive the systems and their actions metaphorically. Common metaphors include the human metaphor and the interface metaphor (cf. Edlund, Heldner, & Gustafson, 2006). In the interface metaphor, the spoken dialogue system is perceived as a machine interface – often but not always a computer interface. Speech is used to accomplish what would have otherwise been accomplished by some other means of input, such as a keyboard or a mouse. In the human metaphor, on the other hand, the computer is perceived as a creature (or even a person) with humanlike conversational abilities, and speech is not a substitute or one of many alternatives, but rather the primary means of communicating with this creature. We are aware that more “natural” or human-like behaviour does not automatically make a spoken dialogue system “better” (i.e. more efficient or more well-liked by its users). Indeed, we are quite convinced that the advantage (or disadvantage) of humanlike behaviour will be highly dependent on the application. However, a dialogue system that is coherent with a human metaphor may profit from a number of characteristics.

  • 28.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Elenius, Kjell
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hellmer, Kahl
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Spontal: a Swedish spontaneous dialogue corpus of audio, video and motion capture. 2010. In: Proc. of the Seventh conference on International Language Resources and Evaluation (LREC'10) / [ed] Calzolari, Nicoletta; Choukri, Khalid; Maegaard, Bente; Mariani, Joseph; Odjik, Jan; Piperidis, Stelios; Rosner, Mike; Tapias, Daniel, 2010, p. 2992-2995. Conference paper (Refereed)
    Abstract [en]

    We present the Spontal database of spontaneous Swedish dialogues. 120 dialogues of at least 30 minutes each have been captured in high-quality audio, high-resolution video and with a motion capture system. The corpus is currently being processed and annotated, and will be made available for research at the end of the project.

  • 29.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gesture movement profiles in dialogues from a Swedish multimodal database of spontaneous speech. 2012. In: Prosodic and Visual Resources in Interactional Grammar / [ed] Bergmann, Pia; Brenning, Jana; Pfeiffer, Martin C.; Reber, Elisabeth, Walter de Gruyter, 2012. Chapter in book (Refereed)
  • 30.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Prosodic Features in the Perception of Clarification Ellipses. 2005. In: Proceedings of Fonetik 2005: The XVIIIth Swedish Phonetics Conference, Gothenburg, Sweden, 2005, p. 107-110. Conference paper (Other academic)
    Abstract [en]

    We present an experiment where subjects were asked to listen to Swedish human-computer dialogue fragments where a synthetic voice makes an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and subjects were asked to judge the computer's actual intention. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 31.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    The effects of prosodic features on the interpretation of clarification ellipses. 2005. In: Proceedings of Interspeech 2005: Eurospeech, 2005, p. 2389-2392. Conference paper (Refereed)
    Abstract [en]

    In this paper, the effects of prosodic features on the interpretation of elliptical clarification requests in dialogue are studied. An experiment is presented where subjects were asked to listen to short human-computer dialogue fragments in Swedish, where a synthetic voice was making an elliptical clarification after a user turn. The prosodic features of the synthetic voice were systematically varied, and the subjects were asked to judge what was actually intended by the computer. The results show that an early low F0 peak signals acceptance, that a late high peak is perceived as a request for clarification of what was said, and that a mid high peak is perceived as a request for clarification of the meaning of what was said. The study can be seen as the beginnings of a tentative model for intonation of clarification ellipses in Swedish, which can be implemented and tested in spoken dialogue systems.

  • 32.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Question types and some prosodic correlates in 600 questions in the Spontal database of Swedish dialogues. 2012. In: Proceedings Of The 6th International Conference On Speech Prosody, Vols I and II, Shanghai, China: Tongji Univ Press, 2012, p. 737-740. Conference paper (Refereed)
    Abstract [en]

    Studies of questions present strong evidence that there is no one-to-one relationship between intonation and interrogative mode. We present initial steps of a larger project investigating and describing intonational variation in the Spontal database of 120 half-hour spontaneous dialogues in Swedish, and testing the hypothesis that the concept of a standard question intonation such as a final pitch rise contrasting a final low declarative intonation is not consistent with the pragmatic use of intonation in dialogue. We report on the extraction of 600 questions from the Spontal corpus, coding and annotation of question typology, and preliminary results concerning some prosodic correlates related to question type.

  • 33.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Strömbergsson, Sofia
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Telling questions from statements in spoken dialogue systems. 2012. In: Proc. of SLTC 2012, Lund, Sweden, 2012. Conference paper (Refereed)
  • 34.
    Granström, Björn
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Audiovisual representation of prosody in expressive speech communication. 2004. In: Proc of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 393-396. Conference paper (Refereed)
  • 35.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audiovisual representation of prosody in expressive speech communication. 2005. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 46, no 3-4, p. 473-484. Article in journal (Refereed)
    Abstract [en]

    Prosody in a single speaking style, often read speech, has been studied extensively in acoustic speech. During the past few years we have expanded our interest in two directions: (1) prosody in expressive speech communication and (2) prosody as an audiovisual expression. Understanding the interactions between visual expressions (primarily in the face) and the acoustics of the corresponding speech presents a substantial challenge. Some of the visual articulation is for obvious reasons tightly connected to the acoustics (e.g. lip and jaw movements), but there are other articulatory movements that do not show up on the outside of the face. Furthermore, many facial gestures used for communicative purposes do not affect the acoustics directly, but might nevertheless be connected on a higher communicative level in which the timing of the gestures could play an important role. In this presentation we will give some examples of recent work, primarily at KTH, addressing these questions. We will report on methods for the acquisition and modeling of visual and acoustic data, and some evaluation experiments in which audiovisual prosody is tested. The context of much of our work in this area is to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher.

  • 36.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Effective Interaction with Talking Animated Agents in Dialogue Systems. 2005. In: Advances in Natural Multimodal Dialogue Systems / [ed] van Kuppevelt, J.; Dybkjaer, L.; Bernsen, N. O., Springer Netherlands, 2005, p. 215-243. Chapter in book (Refereed)
    Abstract [en]

    At the Centre for Speech Technology at KTH, we have for the past several years been developing spoken dialogue applications that include animated talking agents. Our motivation for moving into audiovisual output is to investigate the advantages of multimodality in human-system communication. While the mainstream character animation area has focussed on the naturalness and realism of the animated agents, our primary concern has been the possible increase of intelligibility and efficiency of interaction resulting from the addition of a talking face. In our first dialogue system, Waxholm, the agent used the deictic function of indicating specific information on the screen by eye gaze. In another project, Synface, we were specifically concerned with the advantages in intelligibility that a talking face could provide. In recent studies we have investigated the use of facial gesture cues to convey such dialogue-related functions as feedback and turn-taking as well as prosodic functions such as prominence. Results show that cues such as eyebrow and head movement can independently signal prominence. Current results also indicate that there can be considerable differences in cue strengths among visual cues such as smiling and nodding and that such cues can contribute in an additive manner together with auditory prosody as cues to different dialogue functions. Results from some of these studies are presented in the chapter along with examples of spoken dialogue applications using talking heads.

  • 37.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Inside out - Acoustic and visual aspects of verbal and non-verbal communication: Keynote Paper. 2007. In: Proceedings of the 16th International Congress of Phonetic Sciences, Saarbrücken / [ed] Trouvain, J.; Barry, W., 2007, p. 11-18. Conference paper (Refereed)
    Abstract [en]

    In face-to-face communication both visual and auditory information play an obvious and significant role. In this presentation we will discuss work done, primarily at KTH, that aims at analyzing and modelling verbal and non-verbal communication from a multi-modal perspective. In our studies, it appears that both segmental and prosodic phenomena are strongly affected by the communicative context of speech interaction. One platform for modelling audiovisual speech communication is the ECA, embodied conversational agent. We will describe how ECAs have been used in our research, including examples of applications and a series of experiments for studying multimodal aspects of speech communication.

  • 38.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Measuring and modeling audiovisual prosody for animated agents. 2006. In: Proceedings of Speech Prosody 2006, Dresden, 2006. Conference paper (Refereed)
    Abstract [en]

    Understanding the interactions between visual expressions, dialogue functions and the acoustics of the corresponding speech presents a substantial challenge. The context of much of our work in this area is to create an animated talking agent capable of displaying realistic communicative behavior and suitable for use in conversational spoken language systems, e.g. a virtual language teacher. In this presentation we will give some examples of recent work, primarily at KTH, involving the collection and analysis of a database for audiovisual prosody. We will report on methods for the acquisition and modeling of visual and acoustic data, and provide some examples of analysis of head nods and eyebrow settings.

  • 39.
    Granström, Björn
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Modelling and evaluating verbal and non-verbal communication in talking animated interface agents. 2007. In: Evaluation of Text and Speech Systems / [ed] Dybkjaer, L.; Hemsen, H.; Minker, W., Dordrecht: Springer-Verlag Ltd, 2007, p. 65-98. Chapter in book (Refereed)
    Abstract [en]

    The use of animated talking agents is a novel feature of many multimodal experimental spoken dialogue systems. The addition and integration of a virtual talking head has direct implications for the way in which users approach and interact with such systems. Established techniques for evaluating the quality, efficiency, and other impacts of this technology have not yet appeared in standard textbooks. The focus of this chapter is to look into the communicative function of the agent, both the capability to increase intelligibility of the spoken interaction and the possibility to make the flow of the dialogue smoother, through different kinds of communicative gestures such as gestures for emphatic stress, emotions, turntaking, and negative or positive system feedback. The chapter reviews state-of-the-art animated agent technologies and their applications primarily in dialogue systems. The chapter also includes examples of methods of evaluating communicative gestures in different contexts.

  • 40. Horne, Merle
    et al.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Svantesson, Jan-Olof
    Touati, Paul
    Gösta Bruce 1947-2010 In Memoriam. 2010. In: Phonetica, ISSN 0031-8388, E-ISSN 1423-0321, Vol. 67, no 4, p. 268-270. Article in journal (Refereed)
  • 41.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Final rises and Swedish question intonation. 2004. In: Proc of The XVIIth Swedish Phonetics Conference, Fonetik 2004, Stockholm University, 2004, p. 56-59. Conference paper (Other academic)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are viewed in relationship to results of related perception studies and are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 42.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Final rises in spontaneous Swedish computer-directed questions: incidence and function. 2004. In: Proc of Intl Conference on Speech Prosody 2004 / [ed] Bel, B.; Marlin, I., Nara, Japan, 2004, p. 115-118. Conference paper (Refereed)
    Abstract [en]

    Phrase-final intonation was analysed in a subcorpus of Swedish computer-directed question utterances with the objective of investigating the extent to which final rises occur in spontaneous questions, and also to see if such rises might have pragmatic functions over and beyond the signalling of interrogative mode. Final rises occurred in 22 percent of the utterances. Final rises occurred mostly in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). These results are discussed in terms of Swedish question intonation and the pragmatic social function of rises in a biological account of intonation.

  • 43.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Fonetiska undersökningar av kammu [Phonetic investigations of Kammu]. 2005. In: Kammu - om ett folk i Laos [Kammu - on a people in Laos] / [ed] Lundström, H.; Svantesson, J.-O., Lund: Lunds universitetshistoriska sällskap, 2005, p. 164-167. Chapter in book (Refereed)
  • 44.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Integrating Audio and Visual Cues for Speaker Friendliness in Multimodal Speech Synthesis. 2007. In: INTERSPEECH 2007: 8TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION, BAIXAS: ISCA-INST SPEECH COMMUNICATION ASSOC, 2007, p. 1461-1464. Conference paper (Refereed)
    Abstract [en]

    This paper investigates interactions between audio and visual cues to friendliness in questions in two perception experiments. In the first experiment, manually edited parametric audio-visual synthesis was used to create the stimuli. Results were consistent with earlier findings in that a late, high final focal accent peak was perceived as friendlier than an earlier, lower focal accent peak. Friendliness was also effectively signaled by visual facial parameters such as a smile, head nod and eyebrow raising synchronized with the final accent. Consistent additive effects were found between the audio and visual cues for the subjects as a group and individually showing that subjects integrate the two modalities. The second experiment used data-driven visual synthesis where the database was recorded by an actor instructed to portray anger and happiness. Friendliness was correlated to the happy database, but the effect was not as strong as for the parametric synthesis.
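
    As a rough illustration of the additive pattern reported above, the sketch below computes mean friendliness ratings in a 2x2 audio-by-visual design and the size of the interaction contrast. The condition labels and rating values are invented for the example; they are not data from the experiments.

    # Hypothetical 2x2 example: audio cue (early/low vs late/high accent peak)
    # crossed with visual cue (neutral face vs smile, nod and eyebrow raise).
    from statistics import mean

    ratings = {
        ("early_low", "neutral"): [2.1, 2.4, 2.0],
        ("early_low", "smile"):   [3.0, 3.2, 2.9],
        ("late_high", "neutral"): [3.1, 3.3, 3.0],
        ("late_high", "smile"):   [4.0, 4.2, 3.9],
    }
    cell = {k: mean(v) for k, v in ratings.items()}

    # Main effect of each cue, averaged over the levels of the other cue.
    audio_effect = (cell[("late_high", "neutral")] + cell[("late_high", "smile")]
                    - cell[("early_low", "neutral")] - cell[("early_low", "smile")]) / 2
    visual_effect = (cell[("early_low", "smile")] + cell[("late_high", "smile")]
                     - cell[("early_low", "neutral")] - cell[("late_high", "neutral")]) / 2
    # With purely additive cues, this interaction contrast stays close to zero.
    interaction = (cell[("late_high", "smile")] - cell[("late_high", "neutral")]
                   - cell[("early_low", "smile")] + cell[("early_low", "neutral")])

    print(f"audio {audio_effect:.2f}, visual {visual_effect:.2f}, interaction {interaction:.2f}")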

  • 45.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    On the interaction of audio and visual cues to friendliness in interrogative prosody. 2006. In: Proceedings of The Nordic Conference on Multimodal Communication, 2005, Göteborg, 2006, p. 201-213. Conference paper (Refereed)
  • 46.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Perception and production of phrase-final intonation in Swedish questions. 2006. In: Nordic Prosody, Proceedings of the IXth Conference, Lund 2004 / [ed] Bruce, G.; Horne, M., Frankfurt am Main: Peter Lang, 2006, p. 127-136. Conference paper (Refereed)
  • 47.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Phrase-final rises as a prosodic feature in wh-questions in Swedish human-machine dialogue. 2005. In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 46, no 3-4, p. 268-283. Article in journal (Refereed)
    Abstract [en]

    This paper examines the extent to which optional final rises occur in a set of 200 wh-questions extracted from a large corpus of computer-directed spontaneous speech in Swedish and discusses the function these rises may have in signalling dialogue acts and speaker attitude over and beyond an information question. Final rises occurred in 22% of the utterances, primarily in conjunction with final focal accent. Children exhibited the largest percentage of final rises (32%), with women second (27%) and men lowest (17%). The distribution of the rises in the material is examined and evidence relating to the final rise as a signal of a social interaction oriented dialogue act is gathered from the distribution. Two separate perception tests were carried out to test the hypothesis that high and late focal accent peaks in a wh-question are perceived as friendlier and more socially interested than low and early peaks. Generally, the results were consistent with these hypotheses when the late peaks were in phrase-final position. Finally, the results of this study are discussed in terms of pragmatic and attitudinal meanings and biological codes.

  • 48.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation. 2004. In: From Traditional Phonology to Modern Speech Processing / [ed] Fant, G.; Fujisaki, H.; Cao, J.; Xu, Y., Beijing: Foreign Language Teaching and Research Press, 2004, p. 189-204. Chapter in book (Refereed)
  • 49.
    House, David
    KTH, Superseded Departments, Speech, Music and Hearing.
    Pitch and alignment in the perception of tone and intonation: pragmatic signals and biological codes. 2004. In: Proc of International Symposium on Tonal Aspects of Languages: Emphasis on Tone Languages / [ed] Bel, B.; Marlein, I., Beijing, China, 2004, p. 93-96. Conference paper (Refereed)
  • 50.
    House, David
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    On the temporal domain of co-speech gestures: syllable, phrase or talk spurt? 2015. In: Proceedings of Fonetik 2015 / [ed] Lundmark Svensson, M.; Ambrazaitis, G.; van de Weijer, J., 2015, p. 63-68. Conference paper (Other academic)
    Abstract [en]

    This study explores the use of automatic methods to detect and extract hand gesture movement co-occurring with speech. Two spontaneous dyadic dialogues were analyzed using 3D motion-capture techniques to track hand movement. Automatic speech/non-speech detection was performed on the dialogues, resulting in a series of connected talk spurts for each speaker. Temporal synchrony of onset and offset of gesture and speech was studied between the automatic hand gesture tracking and talk spurts, and compared to an earlier study of head nods and syllable synchronization. The results indicated onset synchronization between head nods and the syllable in the short temporal domain, and between the onset of longer gesture units and the talk spurt in a more extended temporal domain.
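
    A minimal sketch of the kind of onset-synchrony measure described above: each gesture unit is paired with the talk spurt whose onset is nearest, and the signed onset asynchrony is recorded. The interval data and the pairing rule are illustrative assumptions, not the procedure used in the study.

    # Hypothetical sketch: signed onset asynchrony (in seconds) between gesture
    # units and talk spurts. Positive values mean the gesture starts before speech.
    def onset_asynchronies(gesture_intervals, talk_spurts):
        """Both arguments are lists of (onset, offset) tuples in seconds."""
        asynchronies = []
        for g_onset, g_offset in gesture_intervals:
            # Pair the gesture with the talk spurt whose onset is closest in time.
            s_onset, s_offset = min(talk_spurts, key=lambda s: abs(s[0] - g_onset))
            asynchronies.append(s_onset - g_onset)
        return asynchronies

    # Invented example data: two gesture units and two talk spurts for one speaker.
    gestures = [(1.20, 2.80), (5.05, 6.40)]
    spurts = [(1.35, 3.00), (5.00, 6.50)]
    print(onset_asynchronies(gestures, spurts))  # roughly [0.15, -0.05]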
