1 - 17 of 17
  • 1.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bollepalli, Bajibabu
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hussen-Abdelaziz, A.
    Johansson, Martin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Koutsombogera, M.
    Lopes, J. D.
    Novikova, J.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Varol, G.
    Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue (2014). Conference paper (Refereed)
    Abstract [en]

    In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how these correlate with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. Driven by the analysis of the corpus, we also show the detailed design methodologies for an affective and multimodally rich dialogue system that allows the robot to incrementally measure the attention states and the dominance of each participant, allowing the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize the agreement and the contribution to solving the task. This project sets the first steps towards exploring the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team-building, and collaborative task-solving applications.

  • 2.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Bollepalli, Bajibabu
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Hussen-Abdelaziz, A.
    Johansson, Martin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Koutsombogera, M.
    Lopes, J.
    Novikova, J.
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Varol, G.
    Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor (2014). Conference paper (Refereed)
    Abstract [en]

    This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. With the participants sits a tutor that helps the participants perform the task and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies were coupled with manual annotations to build a situated model of the interaction based on the participants' personalities, their temporally changing state of attention, their conversational engagement and verbal dominance, and the way these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. At the end of this chapter we discuss the potential areas of research and development this work opens and some of the challenges that lie on the road ahead.

  • 3.
    Al Moubayed, Samer
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Skantze, Gabriel
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Gustafson, Joakim
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Multimodal Multiparty Social Interaction with the Furhat Head (2012). Conference paper (Refereed)
    Abstract [en]

    We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is a human-like interface that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously, with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations.

  • 4.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Claesson, Britt
    Derbring, Sandra
    Fredriksson, Morgan
    The Tivoli System - A Sign-driven Game for Children with Communicative Disorders (2013). Conference paper (Refereed)
  • 5.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Alexanderson, Simon
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Claesson, Britt
    Derbring, Sandra
    Fredriksson, Morgan
    Starck, J.
    Axelsson, E.
    Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders (2014). Conference paper (Refereed)
  • 6.
    Beskow, Jonas
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Web-enabled 3D Talking Avatars Based on WebGL and HTML5 (2013). Conference paper (Refereed)
  • 7. Chollet, M.
    et al.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Prendinger, H.
    Scherer, S.
    Public Speaking Training with a Multimodal Interactive Virtual Audience Framework (2015). In: ICMI '15 Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ACM Digital Library, 2015, p. 367-368. Conference paper (Refereed)
    Abstract [en]

    We have developed an interactive virtual audience platform for public speaking training. Users' public speaking behavior is automatically analyzed using multimodal sensors, and multimodal feedback is produced by virtual characters and generic visual widgets depending on the user's behavior. The flexibility of our system allows us to compare different interaction mediums (e.g. virtual reality vs. normal interaction), social situations (e.g. one-on-one meetings vs. large audiences) and trained behaviors (e.g. general public speaking performance vs. specific behaviors).

  • 8. Eyben, F.
    et al.
    Gilmartin, E.
    Joder, C.
    Marchi, E.
    Munier, C.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC).
    Weninger, F.
    Schuller, B.
    Socially Aware Many-to-Machine Communication (2012). Conference paper (Refereed)
    Abstract [en]

    This report describes the output of the project P5: Socially Aware Many-to-Machine Communication (M2M) at the eNTERFACE’12 workshop. In this project, we designed and implemented a new front-end for handling multi-user interaction in a dialog system. We exploit the Microsoft Kinect device for capturing multimodal input and extract features describing user and face positions. These data are then analyzed in real time to robustly detect speech and determine both who is speaking and whether the speech is directed to the system or not. This new front-end is integrated into the SEMAINE (Sustained Emotionally colored Machine-human Interaction using Nonverbal Expression) system. Furthermore, a multimodal corpus has been created, capturing all of the system inputs in two different scenarios involving human-human and human-computer interaction.
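One way to picture the speaker-attribution step of such a front-end is sketched below. This is not the project's actual implementation; it is a minimal illustration, assuming an energy-based voice-activity check and an estimated sound-source direction matched against Kinect-tracked face angles, with all thresholds and frame data invented for the example.

```python
# Minimal sketch (not the project's implementation): attribute detected speech
# to one of several tracked users by matching the estimated sound-source angle
# against tracked face angles. All thresholds and the frame data are assumptions.
import numpy as np

def attribute_speech(frame_energy, source_angle, face_angles,
                     energy_threshold=0.5, max_angle_gap=20.0):
    """Return the index of the speaking user, or None if no speech is detected.

    frame_energy  -- scalar audio energy for the current frame
    source_angle  -- estimated direction of arrival of the sound (degrees)
    face_angles   -- angles of the tracked faces relative to the sensor (degrees)
    """
    if frame_energy < energy_threshold:          # simple energy-based VAD
        return None
    gaps = np.abs(np.asarray(face_angles) - source_angle)
    best = int(np.argmin(gaps))                  # face closest to the sound source
    return best if gaps[best] <= max_angle_gap else None

# Hypothetical frame: two users tracked at -30 and +25 degrees,
# sound arriving from roughly +22 degrees with high energy.
print(attribute_speech(frame_energy=0.9, source_angle=22.0,
                       face_angles=[-30.0, 25.0]))   # -> 1
```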

  • 9. Koutsombogera, Maria
    et al.
    Al Moubayed, Samer
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Bollepalli, Bajibabu
    Abdelaziz, Ahmed Hussen
    Johansson, Martin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Aguas Lopes, Jose David
    Novikova, Jekaterina
    Oertel, Catharine
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Varol, Gul
    The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue (2014). Conference paper (Refereed)
    Abstract [en]

    This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich, game-solving, collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. The participants were paired into teams based on their degree of extraversion as determined by a personality test. With the participants sits a tutor that helps them perform the task, organizes and balances their interaction, and whose behavior was assessed by the participants after each interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, together with manual annotations of the tutor’s behavior, constitute the Tutorbot corpus. This corpus is exploited to build a situated model of the interaction based on the participants’ temporally changing state of attention, their conversational engagement and verbal dominance, and their correlation with the verbal and visual feedback and conversation regulatory actions generated by the tutor.

  • 10.
    Meena, Raveesh
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Dabbaghchian, Saeed
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Stefanov, Kalin
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A Data-driven Approach to Detection of Interruptions in Human-human Conversations (2014). Conference paper (Refereed)
  • 11.
    Stefanov, Kalin
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH, Speech Communication and Technology. University of Southern California.
    Salvi, Giampiero (Contributor)
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition (2019). In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920. Article in journal (Refereed)
    Abstract [en]

    This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their face. Furthermore, the method does not rely on external annotations, thus complying with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker dependent setting. However, in a speaker independent setting the proposed method yields a significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.
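A minimal sketch of the self-supervised recipe described in the abstract follows, under the assumption that the labels for the visual classifier come from an acoustic voice-activity signal aligned with each face track, so no manual annotation is required. The feature dimensions, the classifier choice, and the synthetic data are assumptions, not the paper's architecture.

```python
# Sketch only: train a visual active-speaker detector using audio-derived labels.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical per-frame visual features for one face track (e.g. mouth-region
# descriptors). The "audio" voice-activity labels are synthesized here so the
# example runs; in the real method they would come from an acoustic detector.
visual_features = rng.normal(size=(2000, 32))
audio_vad_labels = (visual_features[:, 0] + 0.5 * rng.normal(size=2000) > 0).astype(int)

# Fit the visual classifier on the audio-derived labels (the self-supervision step).
clf = LogisticRegression(max_iter=1000).fit(visual_features, audio_vad_labels)

# At test time only the visual stream is needed, e.g. in acoustically noisy conditions.
new_frames = rng.normal(size=(5, 32))
print(clf.predict(new_frames))        # 1 = predicted "speaking", 0 = "silent"
```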

  • 12.
    Stefanov, Kalin
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH, Speech Communication and Technology.
    A Kinect Corpus of Swedish Sign Language Signs (2013). In: Proceedings of the 2013 Workshop on Multimodal Corpora: Beyond Audio and Video, 2013. Conference paper (Refereed)
  • 13. Stefanov, Kalin
    et al.
    Beskow, Jonas
    A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction (2016). Conference paper (Refereed)
  • 14.
    Stefanov, Kalin
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Gesture Recognition System for Isolated Sign Language Signs (2016). Conference paper (Refereed)
  • 15. Stefanov, Kalin
    et al.
    Salvi, Giampiero
    Kontogiorgos, Dimosthenis
    Kjellström, Hedvig
    Beskow, Jonas
    Analysis and Generation of Candidate Gaze Targets in Multiparty Open-World Dialogues. Manuscript (preprint) (Other academic)
  • 16.
    Stefanov, Kalin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Salvi, Giampiero
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Kontogiorgos, Dimosthenis
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Kjellström, Hedvig
    KTH, School of Electrical Engineering and Computer Science (EECS), Robotics, Perception and Learning, RPL.
    Beskow, Jonas
    KTH, School of Electrical Engineering and Computer Science (EECS), Speech, Music and Hearing, TMH.
    Modeling of Human Visual Attention in Multiparty Open-World Dialogues (2019). In: ACM Transactions on Human-Robot Interaction, ISSN 2573-9522, Vol. 8, no. 2, article id UNSP 8. Article in journal (Refereed)
    Abstract [en]

    This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
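The LSTM-based variant described above could look roughly like the following PyTorch sketch, in which a sequence of per-frame interlocutor signals (speech activity, gaze direction, head orientation, concatenated into one feature vector) is mapped to scores over candidate gaze targets. The feature layout, layer sizes, and number of targets are assumptions rather than the paper's exact configuration.

```python
# Sketch only: sequence model from interlocutor signals to candidate gaze targets.
import torch
import torch.nn as nn

class GazeTargetLSTM(nn.Module):
    def __init__(self, n_features=12, n_targets=4, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)   # scores per candidate target

    def forward(self, x):                # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out)            # (batch, time, n_targets)

model = GazeTargetLSTM()
signals = torch.randn(2, 100, 12)        # 2 hypothetical dialogues, 100 frames each
targets = torch.randint(0, 4, (2, 100))  # frame-level gaze-target labels

logits = model(signals)
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 4), targets.reshape(-1))
loss.backward()                          # an optimizer step would follow in training
print(float(loss))
```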

  • 17.
    Stefanov, Kalin
    et al.
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Sugimoto, A.
    Beskow, Jonas
    KTH, School of Computer Science and Communication (CSC), Speech, Music and Hearing, TMH.
    Look Who’s Talking: Visual Identification of the Active Speaker in Multi-party Human-robot Interaction (2016). Conference paper (Refereed)
    Abstract [en]

    This paper presents analysis of a previously recorded multimodal interaction dataset. The primary purpose of that dataset is to explore patterns in the focus of visual attention of humans under three different conditions: two humans involved in task-based interaction with a robot; the same two humans involved in task-based interaction where the robot is replaced by a third human; and a free three-party human interaction. The paper presents a data-driven methodology for automatic visual identification of the active speaker based on facial action units (AUs). The paper also presents an evaluation of the proposed methodology on 12 different interactions with an approximate length of 4 hours. The methodology will be implemented on a robot and used to generate natural focus of visual attention behavior during multi-party human-robot interactions.
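As a rough illustration of an AU-based approach like the one above, the sketch below summarizes per-frame facial action unit intensities over short windows and trains a binary speaking/not-speaking classifier on them. The AU count, window length, classifier choice, and the synthetic data (including the toy "speaking" label) are assumptions made only to keep the example self-contained.

```python
# Sketch only: windowed AU statistics fed to a binary active-speaker classifier.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
N_FRAMES, N_AUS, WINDOW = 3000, 17, 30   # e.g. 17 AUs, 1-second windows at 30 fps

au_intensities = rng.gamma(2.0, 1.0, size=(N_FRAMES, N_AUS))

def window_features(aus, window):
    """Mean and standard deviation of each AU over non-overlapping windows."""
    n = len(aus) // window
    chunks = aus[: n * window].reshape(n, window, -1)
    return np.hstack([chunks.mean(axis=1), chunks.std(axis=1)])

X = window_features(au_intensities, WINDOW)
# Synthetic label: windows where the first (e.g. mouth-related) AU is more
# active than average stand in for "speaking" segments in this toy example.
y = (X[:, 0] > 2.0).astype(int)

clf = SVC().fit(X[:80], y[:80])          # small train/test split for illustration
print("held-out accuracy:", clf.score(X[80:], y[80:]))
```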
