  • 1.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Blomberg, Mats
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, N.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Talking with Furhat - multi-party interaction with a back-projected robot head (2012). In: Proceedings of Fonetik 2012, Gothenburg, Sweden, 2012, pp. 109-112. Conference paper (Other academic)
    Abstract [en]

    This is a condensed presentation of some recent work on a back-projected robotic head for multi-party interaction in public settings. We will describe some of the design strategies and give some preliminary analysis of an interaction database collected at the Robotville exhibition at the London Science Museum.

  • 2.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bollepalli, Bajibabu
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hussen-Abdelaziz, A.
    Johansson, Martin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Koutsombogera, M.
    Lopes, J. D.
    Novikova, J.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Varol, G.
    Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue (2014). Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how that is correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. Driven by the analysis of the corpus, we will also show the detailed design methodologies for an affective and multimodally rich dialogue system that allows the robot to measure incrementally the attention states and the dominance of each participant, allowing the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize the agreement and the contribution to solving the task. This project sets the first steps to explore the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team building, and collaborative task solving applications.

  • 3.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Mirning, Nicole
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tscheligi, Manfred
    Furhat goes to Robotville: a large-scale multiparty human-robot interaction data collection in a public space (2012). In: Proc. of LREC Workshop on Multimodal Corpora, Istanbul, Turkey, 2012. Conference paper (Refereed)
    Abstract [en]

    In the four days of the Robotville exhibition at the London Science Museum, UK, during which the back-projected head Furhat in a situated spoken dialogue system was seen by almost 8 000 visitors, we collected a database of 10 000 utterances spoken to Furhat in situated interaction. The data collection is an example of a particular kind of corpus collection of human-machine dialogues in public spaces that has several interesting and specific characteristics, both with respect to the technical details of the collection and with respect to the resulting corpus contents. In this paper, we take the Furhat data collection as a starting point for a discussion of the motives for this type of data collection, its technical peculiarities and prerequisites, and the characteristics of the resulting corpus.

  • 4.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Analysis of gaze and speech patterns in three-party quiz game interaction (2013). In: Interspeech 2013, 2013, pp. 1126-1130. Conference paper (Refereed)
    Abstract [en]

    In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings, and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered and the possibilities afforded by a synchronised, reliable combination of large-scale multi-party speech and gaze data, and an overview of the first analyses of the data. Index Terms: multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze.

  • 5.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Multimodal Multiparty Social Interaction with the Furhat Head (2012). Conference paper (Refereed)
    Abstract [en]

    We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is a human-like interface that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations.

  • 6.
    Bell, Linda
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing. Telia Research, Sweden.
    Boye, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing. Telia Research, Sweden.
    Real-time Handling of Fragmented Utterances (2001). In: Proceedings of the NAACL Workshop on Adaption in Dialogue Systems, 2001. Conference paper (Refereed)
    Abstract [en]

    In this paper, we discuss an adaptive method of handling fragmented user utterances to a speech-based multimodal dialogue system. Inserted silent pauses between fragments present the following problem: Does the current silence indicate that the user has completed her utterance, or is the silence just a pause between two fragments, so that the system should wait for more input? Our system incrementally classifies user utterances as either closing (more input is unlikely to come) or non-closing (more input is likely to come), partly depending on the current dialogue state. Utterances that are categorized as non-closing allow the dialogue system to await additional spoken or graphical input before responding.
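
    The closing/non-closing decision described in the abstract above can be illustrated with a minimal sketch. All thresholds, dialogue-state names and lexical cues below are hypothetical editor assumptions, not the rules of the actual system.

    # Minimal sketch of an incremental closing/non-closing classifier for
    # fragmented utterances. Thresholds, states and cue words are invented
    # for illustration only.
    def classify_fragment(fragment_text: str, silence_ms: int, dialogue_state: str) -> str:
        """Return 'closing' if more input is unlikely, else 'non-closing'."""
        words = fragment_text.strip().lower().split()
        last_word = words[-1] if words else ""
        # A fragment ending in a connective suggests the user is not finished.
        if last_word in {"and", "or", "but", "to", "from"}:
            return "non-closing"
        # After an open system question, allow a longer pause before closing.
        max_silence = 1500 if dialogue_state == "awaiting_specification" else 700
        return "non-closing" if silence_ms < max_silence else "closing"

    if __name__ == "__main__":
        print(classify_fragment("a table for two and", 900, "awaiting_specification"))
        # -> 'non-closing': the trailing 'and' indicates more input is coming
        print(classify_fragment("that is all", 900, "confirmation"))
        # -> 'closing'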

  • 7. Bell, Linda
    et al.
    Boye, Johan
    Gustafson, Joakim
    TeliaSonera.
    Heldner, Mattias
    TeliaSonera.
    Lindström, Anders
    Wirén, Mats
    The Swedish NICE Corpus: Spoken dialogues between children and embodied characters in a computer game scenario (2005). In: 9th European Conference on Speech Communication and Technology, Lisbon, Portugal, 2005, pp. 2765-2768. Conference paper (Refereed)
    Abstract [en]

    This article describes the collection and analysis of a Swedish database of spontaneous and unconstrained children-machine dialogues. The Swedish NICE corpus consists of spoken dialogues between children aged 8 to 15 and embodied fairy-tale characters in a computer game scenario. Compared to previously collected corpora of children's computer-directed speech, the Swedish NICE corpus contains extended interactions, including three-party conversation, in which the young users used spoken dialogue as the primary means of progression in the game.

  • 8.
    Bell, Linda
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Boye, Johan
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Wirén, Mats
    Modality Convergence in a Multimodal Dialogue System (2000). In: Proceedings of Götalog, 2000, pp. 29-34. Conference paper (Other academic)
    Abstract [en]

    When designing multimodal dialogue systems allowing speech as well as graphical operations, it is important to understand not only how people make use of the different modalities in their utterances, but also how the system might influence a user’s choice of modality by its own behavior. This paper describes an experiment in which subjects interacted with two versions of a simulated multimodal dialogue system. One version used predominantly graphical means when referring to specific objects; the other used predominantly verbal referential expressions. The purpose of the study was to find out what effect, if any, the system’s referential strategy had on the user’s behavior. The results provided limited support for the hypothesis that the system can influence users to adopt another modality for the purpose of referring.

  • 9.
    Bell, Linda
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Eklund, Robert
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    A Comparison of Disfluency Distribution in a Unimodal and a Multimodal Speech Interface (2000). In: Proceedings of ICSLP 00, 2000. Conference paper (Other academic)
    Abstract [en]

    In this paper, we compare the distribution of disfluencies in two human-computer dialogue corpora. One corpus consists of unimodal travel booking dialogues, which were recorded over the telephone. In this unimodal system, all components except the speech recognition were authentic. The other corpus was collected using a semi-simulated multi-modal dialogue system with an animated talking agent and a clickable map. The aim of this paper is to analyze and discuss the effects of modality, task and interface design on the distribution and frequency of disfluencies in these two corpora.

  • 10.
    Bell, Linda
    et al.
    TeliaSonera R and D, Sweden.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Children’s convergence in referring expressions to graphical objects in a speech-enabled computer game (2007). In: 8th Annual Conference of the International Speech Communication Association, Antwerp, Belgium, 2007, pp. 2788-2791. Conference paper (Refereed)
    Abstract [en]

    This paper describes an empirical study of children's spontaneous interactions with an animated character in a speech-enabled computer game. More specifically, it deals with convergence of referring expressions. 49 children were invited to play the game, which was initiated by a collaborative "put-that-there" task. In order to solve this task, the children had to refer to both physical objects and icons in a 3D environment. For physical objects, which were mostly referred to using straightforward noun phrases, lexical convergence took place in 90% of all cases. In the case of the icons, the children were more innovative and spontaneously referred to them in many different ways. Even after being prompted by the system, lexical convergence took place for only 50% of the icons. In the cases where convergence did take place, the effect of the system's prompts was quite local, and the children quickly resorted to their original way of referring when naming new icons in later tasks.

  • 11.
    Bell, Linda
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Positive and Negative User Feedback in a Spoken Dialogue Corpus (2000). In: Proceedings of ICSLP 00, 2000. Conference paper (Other academic)
    Abstract [en]

    This paper examines feedback strategies in a Swedish corpus of multimodal human-computer interaction. The aim of the study is to investigate how users provide positive and negative feedback to a dialogue system and to discuss the function of these utterances in the dialogues. User feedback in the AdApt corpus was labeled and analyzed, and its distribution in the dialogues is discussed. The question of whether it is possible to utilize user feedback in future systems is considered. More specifically, we discuss how error handling in human-computer dialogue might be improved through greater knowledge of user feedback strategies. In the present corpus, almost all subjects used positive or negative feedback at least once during their interaction with the system. Our results indicate that some types of feedback more often occur in certain positions in the dialogue. Another observation is that there appear to be great individual variations in feedback strategies, so that certain subjects give feedback at almost every turn while others rarely or never respond to a spoken dialogue system in this manner. Finally, we discuss how feedback could be used to prevent problems in human-computer dialogue.

  • 12.
    Bell, Linda
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Repetition and its phonetic realizations: investigating a Swedish database of spontaneous computer directed speech (1999). In: Proceedings of the XIVth International Congress of Phonetic Sciences / [ed] Ohala, John, 1999, pp. 1221-. Conference paper (Other academic)
    Abstract [en]

    This paper is an investigation of repetitive utterances in a Swedish database of spontaneous computer-directed speech. A spoken dialogue system was installed in a public location in downtown Stockholm and spontaneous human-computer interactions with adults and children were recorded [1]. Several acoustic and prosodic features such as duration, shifting of focus and hyperarticulation were examined to see whether repetitions could be distinguished from what the users first said to the system. The present study indicates that adults and children use partly different strategies as they attempt to resolve errors by means of repetition. As repetition occurs, duration is increased and words are often hyperarticulated or contrastively focused. These results could have implications for the development of future spoken dialogue systems with robust error handling.

  • 13.
    Bertenstam, Johan
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Blomberg, Mats
    KTH, Superseded Departments, Speech, Music and Hearing.
    Carlson, Rolf
    KTH, Superseded Departments, Speech, Music and Hearing.
    Elenius, Kjell
    KTH, Superseded Departments, Speech, Music and Hearing.
    Granström, Björn
    KTH, Superseded Departments, Speech, Music and Hearing.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Hunnicutt, Sheri
    Högberg, Jesper
    KTH, Superseded Departments, Speech, Music and Hearing.
    Lindell, Roger
    KTH, Superseded Departments, Speech, Music and Hearing.
    Neovius, Lennart
    KTH, Superseded Departments, Speech, Music and Hearing.
    Nord, Lennart
    de Serpa-Leitao, Antonio
    KTH, Superseded Departments, Speech, Music and Hearing.
    Ström, Nikko
    KTH, Superseded Departments, Speech, Music and Hearing.
    Spoken dialogue data collected in the Waxholm project (1995). In: Quarterly progress and status report: April 15, 1995 / Speech Transmission Laboratory, Stockholm: KTH, 1995, 1, pp. 50-73. Chapter in book (Other academic)
  • 14.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Face-to-Face Interaction and the KTH Cooking Show (2010). In: DEVELOPMENT OF MULTIMODAL INTERFACES: ACTIVE LISTENING AND SYNCHRONY / [ed] Esposito A; Campbell N; Vogel C; Hussain A; Nijholt A, 2010, Vol. 5967, pp. 157-168. Conference paper (Refereed)
    Abstract [en]

    We share our experiences with integrating motion capture recordings in speech and dialogue research by describing (1) Spontal, a large project collecting 60 hours of video, audio and motion capture of spontaneous dialogues, with special attention to motion capture and its pitfalls; (2) a tutorial where we use motion capture, speech synthesis and an animated talking head to allow students to create an active listener; and (3) brief preliminary results in the form of visualizations of motion capture data over time in a Spontal dialogue. We hope that, given the lack of writings on the use of motion capture for speech research, these accounts will prove inspirational and informative.

  • 15.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Jonsson, Oskar
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Speech technology in the European project MonAMI (2008). In: Proceedings of FONETIK 2008 / [ed] Anders Eriksson, Jonas Lindh, Gothenburg, Sweden: University of Gothenburg, 2008, pp. 33-36. Conference paper (Other academic)
    Abstract [en]

    This paper describes the role of speech and speech technology in the European project MonAMI, which aims at “mainstreaming accessibility in consumer goods and services, using advanced technologies to ensure equal access, independent living and participation for all”. It presents the Reminder, a prototype embodied conversational agent (ECA) which helps users to plan activities and to remember what to do. The prototype merges speech technology with other, existing technologies: Google Calendar and a digital pen and paper. The solution allows users to continue using a paper calendar in the manner they are used to, whilst the ECA provides notifications on what has been written in the calendar. Users may also ask questions such as “When was I supposed to meet Sara?” or “What’s on my schedule today?”

  • 16.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Innovative interfaces in MonAMI: The Reminder (2008). In: Perception In Multimodal Dialogue Systems, Proceedings / [ed] Andre, E; Dybkjaer, L; Minker, W; Neumann, H; Pieraccini, R; Weber, M, 2008, Vol. 5078, pp. 272-275. Conference paper (Refereed)
    Abstract [en]

    This demo paper presents the first version of the Reminder, a prototype ECA developed in the European project MonAMI, which aims at "mainstreaming accessibility in consumer goods and services, using advanced technologies to ensure equal access, independent living and participation for all". The Reminder helps users to plan activities and to remember what to do. The prototype merges ECA technology with other, existing technologies: Google Calendar and a digital pen and paper. This innovative combination of modalities allows users to continue using a paper calendar in the manner they are used to, whilst the ECA provides verbal notifications on what has been written in the calendar. Users may also ask questions such as "When was I supposed to meet Sara?" or "What's on my schedule today?"

  • 17.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tobiasson, Helena
    KTH, School of Computer Science and Communication (CSC), Human - Computer Interaction, MDI (closed 20111231).
    The MonAMI Reminder: a spoken dialogue system for face-to-face interaction (2009). In: Proceedings of the 10th Annual Conference of the International Speech Communication Association, INTERSPEECH 2009, Brighton, U.K., 2009, pp. 300-303. Conference paper (Refereed)
    Abstract [en]

    We describe the MonAMI Reminder, a multimodal spoken dialogue system which can assist elderly and disabled people in organising and initiating their daily activities. Based on deep interviews with potential users, we have designed a calendar and reminder application which uses an innovative mix of an embodied conversational agent, digital pen and paper, and the web to meet the needs of those users as well as the current constraints of speech technology. We also explore the use of head pose tracking for interaction and attention control in human-computer face-to-face interaction.

  • 18.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Modelling humanlike conversational behaviour (2010). In: SLTC 2010: The Third Swedish Language Technology Conference (SLTC 2010), Proceedings of the Conference, Linköping, Sweden, 2010, pp. 9-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is humanlike. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: modelling interactional aspects of spoken face-to-face communication.

  • 19.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    House, David
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Research focus: Interactional aspects of spoken face-to-face communication (2010). In: Proceedings from Fonetik, Lund, June 2-4, 2010 / [ed] Susanne Schötz, Gilbert Ambrazaitis, Lund, Sweden: Lund University, 2010, pp. 7-10. Conference paper (Other academic)
    Abstract [en]

    We have a visionary goal: to learn enough about human face-to-face interaction that we are able to create an artificial conversational partner that is human-like. We take the opportunity here to present four new projects inaugurated in 2010, each adding pieces of the puzzle through a shared research focus: interactional aspects of spoken face-to-face communication.

  • 20.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Experiments with Synthesis of Swedish Dialects (2009). In: Proceedings of Fonetik 2009 / [ed] Peter Branderud, Hartmut Traunmüller, Stockholm: Stockholm University, 2009, pp. 28-29. Conference paper (Other academic)
    Abstract [en]

    We describe ongoing work on synthesizing Swedish dialects with an HMM synthesizer. A prototype synthesizer has been trained on a large database for standard Swedish read by a professional male voice talent. We have selected a few untrained speakers from each of the following dialectal regions: Norrland, Dala, Göta, Gotland and the South of Sweden. The plan is to train a multi-dialect average voice, and then use 20-30 minutes of dialectal speech from one speaker to adapt either the standard Swedish voice or the average voice to the dialect of that speaker.

  • 21.
    Blomberg, Mats
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Granström, Björn
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Children and adults in dialogue with the robot head Furhat - corpus collection and initial analysis (2012). In: Proceedings of WOCCI, Portland, OR, 2012. Conference paper (Refereed)
  • 22.
    Bollepalli, Bajibabu
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    HMM based speech synthesis system for Swedish Language (2012). In: The Fourth Swedish Language Technology Conference, Lund, Sweden, 2012. Conference paper (Refereed)
  • 23.
    Bollepalli, Bajibabu
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Non-Linear Pitch Modification in Voice Conversion using Artificial Neural Networks (2013). In: Advances in nonlinear speech processing: 6th International Conference, NOLISP 2013, Mons, Belgium, June 19-21, 2013: proceedings, Springer Berlin/Heidelberg, 2013, pp. 97-103. Conference paper (Refereed)
    Abstract [en]

    The majority of current voice conversion methods do not focus on modelling local variations of the pitch contour, but only on linear modification of the pitch values, based on means and standard deviations. However, a significant amount of speaker-related information is also present in the pitch contour. In this paper we propose a non-linear pitch modification method for mapping the pitch contours of the source speaker onto the pitch contours of the target speaker. This work is done within the framework of Artificial Neural Network (ANN) based voice conversion. The pitch contours are represented with Discrete Cosine Transform (DCT) coefficients at the segmental level. The results, evaluated using subjective and objective measures, confirm that the proposed method performed better in mimicking the target speaker's speaking style when compared to the linear modification method.
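
    The representation step mentioned in the abstract (segment-level pitch contours encoded as DCT coefficients) can be sketched as below. The segment length, number of coefficients and the direct inversion are illustrative assumptions; in the paper the coefficients would be mapped by a trained neural network before reconstruction.

    # Sketch: encode a voiced segment's F0 contour as a few DCT coefficients
    # and reconstruct it. Coefficient count and segment size are placeholders.
    import numpy as np
    from scipy.fft import dct, idct

    def contour_to_dct(f0_segment: np.ndarray, n_coeffs: int = 10) -> np.ndarray:
        """Encode one voiced segment's F0 contour (Hz) as DCT coefficients."""
        log_f0 = np.log(f0_segment)              # log scale is common for F0
        return dct(log_f0, norm="ortho")[:n_coeffs]

    def dct_to_contour(coeffs: np.ndarray, length: int) -> np.ndarray:
        """Reconstruct an F0 contour of the given length from DCT coefficients."""
        padded = np.zeros(length)
        padded[: len(coeffs)] = coeffs
        return np.exp(idct(padded, norm="ortho"))

    if __name__ == "__main__":
        # A toy rising-falling contour of 50 frames.
        f0 = 120 + 30 * np.sin(np.linspace(0, np.pi, 50))
        c = contour_to_dct(f0)
        # In a full system, c would be mapped source -> target by the ANN here.
        f0_hat = dct_to_contour(c, len(f0))
        print(np.max(np.abs(f0 - f0_hat)))       # small reconstruction error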

  • 24.
    Bollepalli, Bajibabu
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Urbain, Jerome
    Raitio, Tuomo
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Cakmak, Huseyin
    A Comparative Evaluation of Vocoding Techniques for HMM-based Laughter Synthesis (2014). Conference paper (Refereed)
    Abstract [en]

    This paper presents an experimental comparison of various leading vocoders for the application of HMM-based laughter synthesis. Four vocoders, commonly used in HMM-based speech synthesis, are used in copy-synthesis and HMM-based synthesis of both male and female laughter. Subjective evaluations are conducted to assess the performance of the vocoders. The results show that all vocoders perform relatively well in copy-synthesis. In HMM-based laughter synthesis using original phonetic transcriptions, all synthesized laughter voices were significantly lower in quality than copy-synthesis, indicating a challenging task and room for improvements. Interestingly, two vocoders using rather simple and robust excitation modeling performed the best, indicating that robustness in speech parameter extraction and simple parameter representation in statistical modeling are key factors in successful laughter synthesis.

  • 25. Boye, J.
    et al.
    Gustafson, Joakim
    Wiren, M.
    Robust spoken language understanding in a computer game (2006). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 48, no. 3-4, pp. 335-353. Article in journal (Refereed)
    Abstract [en]

    We present and evaluate a robust method for the interpretation of spoken input to a conversational computer game. The scenario of the game is that of a player interacting with embodied fairy-tale characters in a 3D world via spoken dialogue (supplemented by graphical pointing actions) to solve various problems. The player himself cannot directly perform actions in the world, but interacts with the fairy-tale characters to have them perform various tasks, and to get information about the world and the problems to solve. Hence the role of spoken dialogue as the primary means of control is obvious and natural to the player. Naturally, this means that robust spoken language understanding becomes a critical component. To this end, the paper describes a semantic representation formalism and an accompanying parsing algorithm which works off the output of the speech recogniser's statistical language model. The evaluation shows that the parser is robust in the sense of considerably improving on the noisy output of the speech recogniser.
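
    As a rough illustration of robust interpretation over noisy recogniser output, the sketch below spots domain words and fills a shallow frame while skipping everything else. The vocabulary and frame are invented for the example and do not reproduce the semantic formalism described in the paper.

    # Illustrative pattern-spotting interpreter for noisy ASR output.
    import re

    ACTIONS = {"go": "MOVE", "walk": "MOVE", "open": "OPEN", "take": "TAKE"}
    OBJECTS = {"door", "key", "bridge", "lamp"}

    def interpret(asr_output: str) -> dict:
        """Fill a shallow semantic frame, ignoring words that do not match."""
        frame = {"action": None, "object": None}
        for word in re.findall(r"[a-zåäö]+", asr_output.lower()):
            if frame["action"] is None and word in ACTIONS:
                frame["action"] = ACTIONS[word]
            elif frame["object"] is None and word in OBJECTS:
                frame["object"] = word
        return frame

    if __name__ == "__main__":
        # Misrecognised filler words are simply skipped.
        print(interpret("eh could you please er open the the door"))
        # -> {'action': 'OPEN', 'object': 'door'}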

  • 26.
    Boye, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Fredriksson, M.
    Götze, Jana
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Königsmann, J.
    Walk this way: Spatial grounding for city exploration (2012). In: IWSDS, 2012. Conference paper (Refereed)
  • 27.
    Boye, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Fredriksson, Morgan
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Götze, Jana
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Königsmann, Jurgen
    KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.
    Walk this way: Spatial grounding for city exploration (2014). In: Natural interaction with robots, knowbots and smartphones, Springer-Verlag, 2014, pp. 59-67. Chapter in book (Refereed)
    Abstract [en]

    Recently there has been an interest in spatially aware systems for pedestrian routing and city exploration, due to the proliferation of smartphones with GPS receivers among the general public. Since GPS readings are noisy, giving good and well-timed route instructions to pedestrians is a challenging problem. This paper describes a spoken-dialogue prototype for pedestrian navigation in Stockholm that addresses this problem by using various grounding strategies.

  • 28.
    Boye, Johan
    et al.
    TeliaSonera.
    Gustafson, Joakim
    TeliaSonera.
    How to do dialogue in a fairy-tale world (2005). In: Proceedings of the 6th SIGDial workshop on discourse and dialogue, 2005. Conference paper (Refereed)
    Abstract [en]

    The work presented in this paper is an endeavor to create a prototype of a computer game with spoken dialogue capabilities. Advanced spoken dialogue has the potential to considerably enrich computer games, where it for example would allow players to refer to past events and to objects currently not visible on the screen. It would also allow users to interact socially and to negotiate solutions with the game characters. The game takes place in a fairy-tale world, and features two different fairy-tale characters, who can interact with the player and with each other using spoken dialogue. The fairy-tale characters are separate entities in the sense that each character has its own set of goals and its own perception of the world. This paper gives an overview of the functionality of the implemented dialogue manager in the NICE fairy-tale game system.

  • 29.
    Boye, Johan
    et al.
    TeliaSonera.
    Wirén, Mats
    TeliaSonera.
    Gustafson, Joakim
    TeliaSonera.
    Contextual reasoning in multimodal dialogue systems: two case studies (2004). In: Proceedings of The 8th Workshop on the Semantics and Pragmatics of Dialogue, Catalogue'04, Barcelona, 2004, pp. 19-21. Conference paper (Refereed)
    Abstract [en]

    This paper describes an approach to contextual reasoning for interpretation of spoken multimodal dialogue. The approach is based on combining recency-based search for antecedents with an object-oriented domain representation in such a way that the search is highly constrained by the type information of the antecedents. By furthermore representing candidate antecedents from the dialogue history and the visual context in a uniform way, a single machinery (based on β-reduction in lambda calculus) can be used for resolving many kinds of underspecified utterances. The approach has been implemented in two highly different domains.
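
    The core idea of recency-based, type-constrained antecedent search can be sketched as below. The tiny type hierarchy, the candidate objects and the flat resolution routine are editor assumptions for illustration; the paper's machinery additionally uses β-reduction over lambda terms, which is not reproduced here.

    # Sketch: pick the most recent candidate (from dialogue or visual context)
    # whose semantic type satisfies the constraint of an underspecified phrase.
    from dataclasses import dataclass
    from typing import List, Optional

    SUBTYPES = {
        "entity": {"apartment", "street"},
        "apartment": {"apartment"},
        "street": {"street"},
    }

    @dataclass
    class Candidate:
        name: str
        sem_type: str     # e.g. "apartment" or "street"
        source: str       # "dialogue" or "visual"

    def resolve(required_type: str, history: List[Candidate]) -> Optional[Candidate]:
        """Return the most recent candidate compatible with the required type."""
        for cand in history:                      # history is most-recent-first
            if cand.sem_type in SUBTYPES.get(required_type, set()):
                return cand
        return None

    if __name__ == "__main__":
        history = [
            Candidate("Sveavägen 10", "street", "visual"),
            Candidate("the two-room flat", "apartment", "dialogue"),
        ]
        # An underspecified "that one" in an apartment-typed slot skips the
        # more recent street candidate and picks the flat from the history.
        print(resolve("apartment", history))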

  • 30.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, Christina
    Swedish Agency for Accessible Media, MTM, Stockholm, Sweden.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audience response system based annotation of speech (2013). In: Proceedings of Fonetik 2013, Linköping: Linköping University, 2013, pp. 13-16. Conference paper (Other academic)
    Abstract [en]

    Manual annotators are often used to label speech. The task is associated with high costs and with great time consumption. We suggest that increased throughput can be reached, while maintaining a high measure of experimental control, by borrowing from the audience response systems used in the film and television industries. We demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with some results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.

  • 31.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, Christina
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Temporal precision and reliability of audience response system based annotation (2013). In: Proc. of Multimodal Corpora 2013, 2013. Conference paper (Refereed)
  • 32.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edelstam, Fredrik
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Human pause and resume behaviours for unobtrusive humanlike in-car spoken dialogue systems (2014). In: Proceedings of the EACL 2014 Workshop on Dialogue in Motion (DM), Gothenburg, Sweden, 2014, pp. 73-77. Conference paper (Refereed)
    Abstract [en]

    This paper presents a first, largely qualitative analysis of a set of human-human dialogues recorded specifically to provide insights into how humans handle pauses and resumptions in situations where the speakers cannot see each other, but have to rely on the acoustic signal alone. The work presented is part of a larger effort to find unobtrusive human dialogue behaviours that can be mimicked and implemented in in-car spoken dialogue systems within the EU project Get Home Safe, a collaboration between KTH, DFKI, Nuance, IBM and Daimler aiming to find ways of driver interaction that minimize safety issues. The analysis reveals several human temporal, semantic/pragmatic, and structural behaviours that are good candidates for inclusion in spoken dialogue systems.

  • 33.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ask the experts: Part II: Analysis (2010). In: Linguistic Theory and Raw Sound / [ed] Juel Henrichsen, Peter, Frederiksberg: Samfundslitteratur, 2010, pp. 183-198. Chapter in book (Refereed)
    Abstract [en]

    We present work fuelled by an urge to understand speech in its original and most fundamental context: in conversation between people. And what better way than to look to the experts? Regarding human conversation, authority lies with the speakers themselves, and asking the experts is a matter of observing and analyzing what speakers do. This is the second part of a two-part discussion which is illustrated with examples mainly from the work at KTH Speech, Music and Hearing. In this part, we discuss methods of extracting useful information from captured data, with a special focus on raw sound.

  • 34.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Cocktail: a demonstration of massively multi-component audio environments for illustration and analysis (2010). In: SLTC 2010, The Third Swedish Language Technology Conference (SLTC 2010): Proceedings of the Conference, 2010. Conference paper (Other academic)
    Abstract [en]

    We present MMAE – Massively Multi-component Audio Environments – a new concept in auditory presentation, and Cocktail – a demonstrator built on this technology. MMAE creates a dynamic audio environment by playing a large number of sound clips simultaneously at different locations in a virtual 3D space. The technique utilizes standard soundboards and is based on the Snack Sound Toolkit. The result is an efficient 3D audio environment that can be modified dynamically, in real time. Applications range from the creation of canned as well as online audio environments for games and entertainment to the browsing, analyzing and comparing of large quantities of audio data. We also demonstrate the Cocktail implementation of MMAE using several test cases as examples.

  • 35.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Hjalmarsson, Anna
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Towards human-like spoken dialogue systems (2008). In: Speech Communication, ISSN 0167-6393, E-ISSN 1872-7182, Vol. 50, no. 8-9, pp. 630-645. Article in journal (Refereed)
    Abstract [en]

    This paper presents an overview of methods that can be used to collect and analyse data on user responses to spoken dialogue system components intended to increase human-likeness, and to evaluate how well the components succeed in reaching that goal. Wizard-of-Oz variations, human-human data manipulation, and micro-domains are discussed in this context, as is the use of third-party reviewers to get a measure of the degree of human-likeness. We also present the two-way mimicry target, a model for measuring how well a human-computer dialogue mimics or replicates some aspect of human-human dialogue, including human flaws and inconsistencies. Although we have added a measure of innovation, none of the techniques is new in its entirety. Taken together and described from a human-likeness perspective, however, they form a set of tools that may widen the path towards human-like spoken dialogue systems.

  • 36.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    On the effect of the acoustic environment on the accuracy of perception of speaker orientation from auditory cues alone (2012). In: 13th Annual Conference of the International Speech Communication Association 2012, INTERSPEECH 2012, Vol. 2, 2012, pp. 1482-1485. Conference paper (Refereed)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. The feature most often implicated for detection of sound source orientation is the inter-aural level difference - a feature which, it is assumed, is more easily exploited in anechoic chambers than in everyday surroundings. We expand here on our previous studies and compare detection of speaker orientation within and outside of the anechoic chamber. Our results show that listeners find the task easier, rather than harder, in everyday surroundings, which suggests that the inter-aural level difference is not the only feature at play.
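
    The inter-aural level difference mentioned in the abstract is simply the level ratio between the two ears expressed in dB. The sketch below computes it for one stereo frame; the framing scheme and any mapping from ILD to speaker orientation are outside the example and are not taken from the paper.

    # Sketch: inter-aural level difference (ILD) in dB for one stereo frame.
    import numpy as np

    def ild_db(left: np.ndarray, right: np.ndarray, eps: float = 1e-12) -> float:
        """20*log10 of the RMS ratio between the left and right channels."""
        rms_l = np.sqrt(np.mean(left ** 2)) + eps
        rms_r = np.sqrt(np.mean(right ** 2)) + eps
        return 20.0 * np.log10(rms_l / rms_r)

    if __name__ == "__main__":
        t = np.linspace(0, 0.02, 320)                 # a 20 ms frame at 16 kHz
        src = np.sin(2 * np.pi * 1000 * t)
        # A source attenuated at the right ear gives a positive ILD.
        print(round(ild_db(src, 0.5 * src), 1))       # ~6.0 dB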

  • 37.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    Voice Technologies, Expert Functions, Teliasonera.
    Two faces of spoken dialogue systems (2006). In: Interspeech 2006 - ICSLP Satellite Workshop Dialogue on Dialogues: Multidisciplinary Evaluation of Advanced Speech-based Interactive Systems, Pittsburgh PA, USA, 2006. Conference paper (Refereed)
    Abstract [en]

    This paper is intended as a basis for discussion. We propose that users may, knowingly or subconsciously, interpret the events that occur when interacting with spoken dialogue systems in more than one way. Put differently, there is more than one metaphor people may use in order to make sense of spoken human-computer dialogue. We further suggest that different metaphors may not play well together. The analysis is consistent with many observations in human-computer interaction and has implications that may be helpful to researchers and developers alike. For example, developers may want to guide users towards a metaphor of their choice and ensure that the interaction is coherent with that metaphor; researchers may need different approaches depending on the metaphor employed in the system they study; and in both cases one would need to have very good reasons to use mixed metaphors.

  • 38.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for Speech Technology, CTT.
    Gustafson, Joakim
    Voice Technologies, Expert Functions, Teliasonera, Haninge, Sweden.
    Utterance segmentation and turn-taking in spoken dialogue systems (2005). In: Computer Studies in Language and Speech / [ed] Fisseni, B.; Schmitz, H-C.; Schröder, B.; Wagner, P., Frankfurt am Main, Germany: Peter Lang, 2005, pp. 576-587. Chapter in book (Refereed)
    Abstract [en]

    A widely used method for finding places to take the turn in spoken dialogue systems is to assume that an utterance ends where the user ceases to speak. Such endpoint detection normally triggers on a certain amount of silence, or non-speech. However, spontaneous speech frequently contains silent pauses inside sentence-like units, for example when the speaker hesitates. This paper presents /nailon/, an on-line, real-time prosodic analysis tool, and a number of experiments in which end-point detection has been augmented with prosodic analysis in order to segment the speech signal into what humans intuitively perceive as utterance-like units.
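
    The general idea of augmenting silence-triggered end-point detection with a prosodic cue can be sketched as below. The silence threshold and the pitch-slope criterion are editor assumptions for illustration, not the decision rules implemented in /nailon/.

    # Sketch: treat a pause as an utterance end only if it is long enough and
    # the pitch just before it is flat or falling (a rise often signals more
    # speech to come). Thresholds are illustrative placeholders.
    import numpy as np

    def is_endpoint(f0_tail_hz: np.ndarray, silence_ms: float,
                    min_silence_ms: float = 500.0) -> bool:
        """Decide whether a detected pause should end the current utterance."""
        if silence_ms < min_silence_ms:
            return False                          # too short to be an end-point
        voiced = f0_tail_hz[f0_tail_hz > 0]       # ignore unvoiced frames
        if len(voiced) < 2:
            return True                           # no pitch evidence: rely on silence
        slope = np.polyfit(np.arange(len(voiced)), voiced, 1)[0]
        return slope <= 0.0                       # falling/flat pitch: end of utterance

    if __name__ == "__main__":
        rising = np.array([180, 185, 0, 192, 200.0])   # final rise -> keep waiting
        print(is_endpoint(rising, 600))                # False
        falling = np.array([210, 200, 190, 0, 175.0])
        print(is_endpoint(falling, 600))               # True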

  • 39.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Heldner, Mattias
    Stockholm University, Faculty of Humanities, Department of Linguistics.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Who am I speaking at?: perceiving the head orientation of speakers from acoustic cues alone (2012). In: Proc. of LREC Workshop on Multimodal Corpora 2012, Istanbul, Turkey, 2012. Conference paper (Refereed)
    Abstract [en]

    The ability of people, and of machines, to determine the position of a sound source in a room is well studied. The related ability to determine the orientation of a directed sound source, on the other hand, is not, but the few studies there are show people to be surprisingly skilled at it. This has bearing on studies of face-to-face interaction and of embodied spoken dialogue systems, as the sound source orientation of a speaker is connected to the head pose of the speaker, which is meaningful in a number of ways. We describe in passing some preliminary findings that led us onto this line of investigation, and in detail a study in which we extend an experiment design intended to measure perception of gaze direction to test instead for perception of sound source orientation. The results corroborate those of previous studies, and further show that people are very good at performing this skill outside of studio conditions as well.

  • 40.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Investigating negotiation for load-time in the GetHomeSafe project2012In: Proc. of Workshop on Innovation and Applications in Speech Technology (IAST), Dublin, Ireland, 2012, 45-48 p.Conference paper (Refereed)
    Abstract [en]

    This paper describes ongoing work by KTH Speech, Music and Hearing in GetHomeSafe, a newly inaugurated EU project in collaboration with DFKI, Nuance, IBM and Daimler. Under the assumption that drivers will utilize technology while driving regardless of legislation, the project aims at finding out how to make the use of in-car technology as safe as possible rather than prohibiting it. We briefly describe the project in general and our own role in more detail, in particular one of our tasks: to build a system that can ask the driver, in an unobtrusive manner, whether now is a good time to speak about X, and that knows how to deal with rejection, for example by asking the driver to get back when it is a good time or by scheduling a time that will be convenient.
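    As a rough illustration of the negotiation behaviour described above, the following sketch models the ask-then-defer logic as a small state machine. The states, prompts and class interface are hypothetical and are not taken from the project's actual design.

    # Minimal sketch of a load-time negotiation policy (assumed interface).
    from enum import Enum, auto

    class State(Enum):
        IDLE = auto()
        ASKED = auto()
        DEFERRED = auto()
        DELIVERING = auto()

    class LoadTimeNegotiator:
        def __init__(self, topic: str):
            self.topic = topic
            self.state = State.IDLE

        def open(self) -> str:
            # Unobtrusive opening: ask before loading the driver with the topic.
            self.state = State.ASKED
            return f"Is now a good time to talk about {self.topic}?"

        def on_driver_reply(self, accepted: bool, wants_schedule: bool = False) -> str:
            if self.state is not State.ASKED:
                return ""
            if accepted:
                self.state = State.DELIVERING
                return f"Okay, about {self.topic}: ..."
            # Rejection handling: either wait for the driver to get back,
            # or negotiate a later time.
            self.state = State.DEFERRED
            if wants_schedule:
                return "When would be a good time to bring it up again?"
            return "No problem, just let me know when it suits you."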

  • 41.
    Edlund, Jens
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Tånnander, Christina
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Audience response system-based assessment for analysis-by-synthesis2015In: Proc. of ICPhS 2015, ICPhS , 2015Conference paper (Refereed)
  • 42.
    Gustafson, Joakim
    KTH, Superseded Departments, Speech, Music and Hearing.
    Developing Multimodal Spoken Dialogue Systems: Empirical Studies of Spoken Human–Computer Interaction2002Doctoral thesis, comprehensive summary (Other scientific)
    Abstract [en]

    This thesis presents work done during the last ten years on developing five multimodal spoken dialogue systems, and the empirical user studies that have been conducted with them. The dialogue systems have been multimodal, giving information both verbally with animated talking characters and graphically on maps and in text tables. To be able to study a wider range of user behaviour, each new system has been in a new domain and with a new set of interactional abilities. The five systems presented in this thesis are: the Waxholm system, where users could ask about the boat traffic in the Stockholm archipelago; the Gulan system, where people could retrieve information from the Yellow pages of Stockholm; the August system, a publicly available system where people could get information about the author Strindberg, KTH and Stockholm; the AdApt system, which allowed users to browse apartments for sale in Stockholm; and the Pixie system, where users could help an animated agent to fix things in a visionary apartment publicly available at the Telecom museum in Stockholm. Some of the dialogue systems have been used in controlled experiments in laboratory environments, while others have been placed in public environments where members of the general public have interacted with them. All spoken human-computer interactions have been transcribed and analyzed to increase our understanding of how people interact verbally with computers, and to obtain knowledge on how spoken dialogue systems can utilize the regularities found in these interactions. This thesis summarizes the experiences from building these five dialogue systems and presents some of the findings from the analyses of the collected dialogue corpora.

  • 43.
    Gustafson, Joakim
    et al.
    TeliaSonera AB.
    Bell, L.
    TeliaSonera AB.
    Boye, J.
    TeliaSonera AB.
    Lindström, A.
    TeliaSonera AB.
    Wirén, M.
    TeliaSonera AB.
    The NICE fairy-tale game system2004In: Proceedings of the 5th SIGdial Workshop on Discourse and Dialogue at HLT-NAACL 2004, Boston, 2004Conference paper (Refereed)
  • 44.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Bell, Linda
    KTH, Superseded Departments, Speech, Music and Hearing.
    Speech technology on trial: Experiences from the August system2000In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 6, no 3-4, 273-286 p.Article in journal (Refereed)
    Abstract [en]

    In this paper, the August spoken dialogue system is described. This experimental Swedish dialogue system, which featured an animated talking agent, was exposed to the general public during a trial period of six months. The construction of the system was partly motivated by the need to collect genuine speech data from people with little or no previous experience of spoken dialogue systems. A corpus of more than 10,000 utterances of spontaneous computer-directed speech was collected and empirical linguistic analyses were carried out. Acoustical, lexical and syntactical aspects of this data were examined. In particular, user behavior and user adaptation during error resolution were emphasized. Repetitive sequences in the database were analyzed in detail. Results suggest that computer-directed speech during error resolution is increased in duration, hyperarticulated and contains inserted pauses. Design decisions which may have influenced how the users behaved when they interacted with August are discussed and implications for the development of future systems are outlined.

  • 45.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Bell, Linda
    KTH, Superseded Departments, Speech, Music and Hearing.
    Johan, Boye
    KTH, Superseded Departments, Speech, Music and Hearing.
    Edlund, Jens
    KTH, Superseded Departments, Speech, Music and Hearing.
    Wirén, Mats
    Constraint Manipulation and Visualization in a Multimodal Dialogue System2002In: Proceedings of MultiModal Dialogue in Mobile Environments, 2002Conference paper (Other academic)
  • 46. Gustafson, Joakim
    et al.
    Boye, Johan
    Fredriksson, M.
    Johannesson, L.
    Königsmann, J.
    Providing computer game characters with conversational abilities2005In: INTELLIGENT VIRTUAL AGENTS, PROCEEDINGS, Kos, Greece, 2005, 37-51 p.Conference paper (Refereed)
    Abstract [en]

    This paper presents the NICE fairy-tale game system, which enables adults and children to engage in conversation with animated characters in a 3D world. In this paper we argue that spoken dialogue technology has the potential to greatly enrich the user's experience in future computer games. We also present some requirements that have to be fulfilled to successfully integrate spoken dialogue technology with a computer game application. Finally, we briefly describe an implemented system that provides computer game characters with some conversational abilities, and which children have interacted with in user studies.

  • 47.
    Gustafson, Joakim
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Ask the experts - Part I: Elicitation2010In: Linguistic Theory and Raw Sound / [ed] Juel Henrichsen, Peter, Samfundslitteratur , 2010, 169-182 p.Chapter in book (Refereed)
    Abstract [en]

    We present work fuelled by an urge to understand speech in its original and most fundamental context: in conversation between people. And what better way than to look to the experts? Regarding human conversation, authority lies with the speakers themselves, and asking the experts is a matter of observing and analyzing what speakers do. This is the first part of a two-part discussion which is illustrated with examples mainly from the work at KTH Speech, Music and Hearing. In this part, we discuss methods of capturing what people do when they talk to each other.

  • 48.
    Gustafson, Joakim
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    EXPROS: Tools for exploratory experimentation with prosody2008In: Proceedings of FONETIK 2008, Gothenburg, Sweden, 2008, 17-20 p.Conference paper (Other academic)
    Abstract [en]

    This demo paper presents EXPROS, a toolkit for experimentation with prosody in diphone voices. Although prosodic features play an important role in human-human spoken dialogue, they are largely unexploited in current spoken dialogue systems. The toolkit contains tools for a number of purposes: for example, extraction of prosodic features such as pitch, intensity and duration for transplantation onto synthetic utterances, and creation of purpose-built, customized MBROLA mini-voices.
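    The sketch below shows the kind of prosodic feature extraction the abstract mentions: frame-wise pitch (F0), intensity and overall duration from a mono waveform. It is a generic illustration and not the EXPROS code; the autocorrelation pitch tracker, the voicing threshold and all constants are assumptions.

    # Minimal sketch, assuming a mono float waveform `signal` at sample rate `sr`.
    import numpy as np

    def extract_prosody(signal: np.ndarray, sr: int, frame_ms: int = 25, hop_ms: int = 10):
        frame = int(sr * frame_ms / 1000)
        hop = int(sr * hop_ms / 1000)
        f0, intensity = [], []
        for start in range(0, len(signal) - frame, hop):
            x = signal[start:start + frame]
            # Intensity: RMS energy in dB (relative, arbitrary reference)
            rms = np.sqrt(np.mean(x ** 2)) + 1e-12
            intensity.append(20 * np.log10(rms))
            # Pitch: strongest autocorrelation lag within a plausible F0 range
            x = x - np.mean(x)
            ac = np.correlate(x, x, mode="full")[frame - 1:]
            lo, hi = int(sr / 400), int(sr / 75)       # search 75-400 Hz
            lag = lo + int(np.argmax(ac[lo:hi]))
            voiced = ac[lag] > 0.3 * ac[0]             # crude voicing decision
            f0.append(sr / lag if voiced else 0.0)
        duration = len(signal) / sr                    # duration in seconds
        return np.array(f0), np.array(intensity), duration

    Features extracted this way from a natural utterance could then, in principle, be transplanted onto a synthetic rendering of the same text, which is the kind of workflow the toolkit is described as supporting.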

  • 49.
    Gustafson, Joakim
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Heldner, Mattias
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Edlund, Jens
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Potential benefits of human-like dialogue behaviour in the call routing domain2008In: Perception In Multimodal Dialogue Systems, Proceedings / [ed] Andre, E; Dybkjaer, L; Minker, W; Neumann, H; Pieraccini, R; Weber, M, 2008, Vol. 5078, 240-251 p.Conference paper (Refereed)
    Abstract [en]

    This paper presents a Wizard-of-Oz (Woz) experiment in the call routing domain that took place during the development of a call routing system for the TeliaSonera residential customer care in Sweden. A corpus of 42,000 calls was used as a basis for identifying problematic dialogues and the strategies used by operators to overcome the problems. A new Woz recording was made, implementing some of these strategies. The collected data is described and discussed with a view to exploring the possible benefits of more human-like dialogue behaviour in call routing applications.

  • 50.
    Gustafson, Joakim
    et al.
    KTH, Superseded Departments, Speech, Music and Hearing.
    Larsson, A
    Hellman, K
    How do System Questions Influence Lexical Choices in User Answers?1997In: Proceedings of Eurospeech 97, 1997, 2275-2278 p.Conference paper (Other academic)
    Abstract [en]

    This paper describes some studies on the effect of the system vocabulary on the lexical choices of the users. There are many theories about human-human dialogue that could be useful in the design of spoken dialogue systems. This paper gives an overview of some of these theories and reports the results from two experiments that examine one of them, namely lexical entrainment. The first experiment was a small Wizard-of-Oz test that simulated a tourist information system with a speech interface, and the second experiment simulated a system with speech recognition that administered a questionnaire about people's plans for their vacation. Both experiments show that the subjects mostly adapt their lexical choices to the system questions: in fewer than 5% of the cases did they use an alternative main verb in the answer. These results encourage us to investigate the possibility of adding an adaptive language model to the speech recognizer in our dialogue system, where the probabilities of the words used in the system questions are increased.
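    One simple way to realise the adaptation idea mentioned at the end of the abstract is to interpolate the recognizer's language model with extra probability mass for the words of the current system question. The sketch below does this for a unigram model; the interpolation scheme, the weight and the example vocabulary are illustrative assumptions, not the authors' implementation.

    # Minimal sketch: boost LM probabilities for words in the system prompt.
    def adapt_unigram(base_lm: dict[str, float], system_prompt: str, boost: float = 0.3) -> dict[str, float]:
        """Interpolate a base unigram LM with a uniform distribution over prompt words."""
        prompt_words = {w.lower() for w in system_prompt.split()}
        prompt_mass = boost / max(len(prompt_words), 1)
        adapted = {w: (1.0 - boost) * p for w, p in base_lm.items()}
        for w in prompt_words:
            adapted[w] = adapted.get(w, 0.0) + prompt_mass
        return adapted

    # Hypothetical example: after a question about departure, "leave" gains mass,
    # reflecting the finding that users tend to reuse the system's main verb.
    lm = {"go": 0.02, "leave": 0.01, "depart": 0.005, "stay": 0.01}
    adapted = adapt_unigram(lm, "When do you want to leave ?")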
