Publications (10 of 15)
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2), Article 8.
Modeling of Human Visual Attention in Multiparty Open-World Dialogues
2019 (English). In: ACM Transactions on Human-Robot Interaction, ISSN 2573-9522, Vol. 8, no. 2, article id 8. Journal article (Refereed). Published
Abstract [en]

This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.

Place, publisher, year, edition, pages
Association for Computing Machinery, 2019
Keywords
Human-human interaction, open-world dialogue, eye-gaze direction, head orientation, multiparty
HSV category
Identifiers
urn:nbn:se:kth:diva-255203 (URN), 10.1145/3323231 (DOI), 000472066800003
Note

QC 20190904

Available from: 2019-09-04. Created: 2019-09-04. Last updated: 2019-10-15. Bibliographically approved.
Stefanov, K. (2019). Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition. IEEE Transactions on Cognitive and Developmental Systems
Self-Supervised Vision-Based Detection of the Active Speaker as Support for Socially-Aware Language Acquisition
2019 (English). In: IEEE Transactions on Cognitive and Developmental Systems, ISSN 2379-8920. Journal article (Refereed). Published
Abstract [en]

This paper presents a self-supervised method for visual detection of the active speaker in a multi-person spoken interaction scenario. Active speaker detection is a fundamental prerequisite for any artificial cognitive system attempting to acquire language in social settings. The proposed method is intended to complement the acoustic detection of the active speaker, thus improving the system's robustness in noisy conditions. The method can detect an arbitrary number of possibly overlapping active speakers based exclusively on visual information about their faces. Furthermore, the method does not rely on external annotations, thus remaining consistent with cognitive development. Instead, the method uses information from the auditory modality to support learning in the visual domain. This paper reports an extensive evaluation of the proposed method using a large multi-person face-to-face interaction dataset. The results show good performance in a speaker-dependent setting. However, in a speaker-independent setting the proposed method yields significantly lower performance. We believe that the proposed method represents an essential component of any artificial cognitive system or robotic platform engaging in social interactions.

HSV category
Identifiers
urn:nbn:se:kth:diva-260126 (URN), 10.1109/TCDS.2019.2927941 (DOI), 2-s2.0-85069908129 (Scopus ID)
Note

QC 20191011

Available from: 2019-09-25. Created: 2019-09-25. Last updated: 2019-10-11. Bibliographically approved.
Stefanov, K. & Beskow, J. (2016). A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction. Paper presented at the 10th Language Resources and Evaluation Conference (LREC 2016), 23-28 May. ELRA.
A Multi-party Multi-modal Dataset for Focus of Visual Attention in Human-human and Human-robot Interaction
2016 (English). Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
ELRA, 2016
HSV category
Identifiers
urn:nbn:se:kth:diva-187954 (URN)
Conference
The 10th Language Resources and Evaluation Conference (LREC 2016), 23-28 May
Note

QC 20160602

Available from: 2016-06-02. Created: 2016-06-02. Last updated: 2018-05-16. Bibliographically approved.
Chollet, M., Stefanov, K., Prendinger, H. & Scherer, S. (2015). Public Speaking Training with a Multimodal Interactive Virtual Audience Framework. In: ICMI '15 Proceedings of the 2015 ACM on International Conference on Multimodal Interaction. Paper presented at the 17th ACM International Conference on Multimodal Interaction (ICMI 2015), New York, NY (pp. 367-368). ACM Digital Library.
Public Speaking Training with a Multimodal Interactive Virtual Audience Framework
2015 (English). In: ICMI '15 Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, ACM Digital Library, 2015, pp. 367-368. Conference paper, Published paper (Refereed)
Abstract [en]

We have developed an interactive virtual audience platform for public speaking training. Users' public speaking behavior is automatically analyzed using multimodal sensors, and multimodal feedback is produced by virtual characters and generic visual widgets depending on the user's behavior. The flexibility of our system allows us to compare different interaction media (e.g., virtual reality vs. normal interaction), social situations (e.g., one-on-one meetings vs. large audiences) and trained behaviors (e.g., general public speaking performance vs. specific behaviors).

Place, publisher, year, edition, pages
ACM Digital Library, 2015
HSV category
Identifiers
urn:nbn:se:kth:diva-180569 (URN), 10.1145/2818346.2823294 (DOI), 000380609500058, 2-s2.0-84959308165 (Scopus ID)
Conference
17th ACM International Conference on Multimodal Interaction (ICMI 2015), New York, NY
Note

QC 20160125

Available from: 2016-01-19. Created: 2016-01-19. Last updated: 2016-09-20. Bibliographically approved.
Meena, R., Dabbaghchian, S. & Stefanov, K. (2014). A Data-driven Approach to Detection of Interruptions in Human-human Conversations. Paper presented at FONETIK, Stockholm, Sweden.
A Data-driven Approach to Detection of Interruptions in Human-human Conversations
2014 (English). Conference paper, Published paper (Refereed)
HSV category
Identifiers
urn:nbn:se:kth:diva-158181 (URN)
Conference
FONETIK, Stockholm, Sweden
Note

QC 20161017

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2018-01-11. Bibliographically approved.
Al Moubayed, S., Beskow, J., Bollepalli, B., Gustafson, J., Hussen-Abdelaziz, A., Johansson, M., . . . Varol, G. (2014). Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue. Paper presented at the 9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany. IEEE conference proceedings.
Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. Driven by the analysis of the corpus, we will also show the detailed design methodologies for an affective and multimodally rich dialogue system that allows the robot to incrementally measure the attention state and the dominance of each participant, allowing the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize the agreement and the contribution to solving the task. This project sets the first steps to explore the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team building, and collaborative task solving applications.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keywords
Furhat robot; Human-robot collaboration; Human-robot interaction; Multiparty interaction; Spoken dialog
HSV category
Identifiers
urn:nbn:se:kth:diva-145511 (URN), 10.1145/2559636.2563681 (DOI), 2-s2.0-84896934381 (Scopus ID)
Conference
9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany
Note

QC 20161018

Available from: 2014-05-21. Created: 2014-05-21. Last updated: 2018-01-11. Bibliographically approved.
Koutsombogera, M., Al Moubayed, S., Bollepalli, B., Abdelaziz, A. H., Johansson, M., Aguas Lopes, J. D., . . . Varol, G. (2014). The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue. Paper presented at the 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. European Language Resources Association (ELRA).
The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich game-solving collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. The participants were paired into teams based on their degree of extraversion as determined by a personality test. With the participants sits a tutor who helps them perform the task, organizes and balances their interaction, and whose behavior was assessed by the participants after each interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, together with manual annotations of the tutor's behavior, constitute the Tutorbot corpus. This corpus is exploited to build a situated model of the interaction based on the participants' temporally-changing state of attention, their conversational engagement and verbal dominance, and their correlation with the verbal and visual feedback and conversation regulatory actions generated by the tutor.

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2014
Keywords
Multimodal corpus; Multiparty Interaction; Tutor
HSV category
Identifiers
urn:nbn:se:kth:diva-173469 (URN), 000355611005138
Conference
9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland
Note

QC 20161017

Available from: 2015-09-15. Created: 2015-09-11. Last updated: 2018-01-11. Bibliographically approved.
Beskow, J., Alexanderson, S., Stefanov, K., Claesson, B., Derbring, S., Fredriksson, M., . . . Axelsson, E. (2014). Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders. Paper presented at the 6th Biennial Conference of the International Society for Augmentative and Alternative Communication, Lisbon, Portugal.
Tivoli - Learning Signs Through Games and Interaction for Children with Communicative Disorders
2014 (English). Conference paper, Published paper (Refereed)
HSV category
Identifiers
urn:nbn:se:kth:diva-165816 (URN)
Conference
6th Biennial Conference of the International Society for Augmentative and Alternative Communication, Lisbon, Portugal
Note

QC 20161018

Available from: 2015-04-29. Created: 2015-04-29. Last updated: 2018-01-11. Bibliographically approved.
Al Moubayed, S., Beskow, J., Bollepalli, B., Hussen-Abdelaziz, A., Johansson, M., Koutsombogera, M., . . . Varol, G. (2014). Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor. Paper presented at the 9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal. Springer Berlin/Heidelberg.
Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. With the participants sits a tutor that helps the participants perform the task and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies were coupled with manual annotations to build a situated model of the interaction based on the participants' personalities, their temporally-changing state of attention, their conversational engagement and verbal dominance, and the way these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. At the end of this chapter we discuss the potential areas of research and development this work opens up and some of the challenges that lie on the road ahead.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2014
Keywords
Conversational Dominance; Embodied Agent; Multimodal; Multiparty; Non-verbal Signals; Social Robot; Spoken Dialogue; Turn-taking; Tutor; Visual Attention
HSV category
Identifiers
urn:nbn:se:kth:diva-158149 (URN), 000349440300004, 2-s2.0-84927643008 (Scopus ID)
Conference
9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal
Note

QC 20161018

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2018-01-11. Bibliographically approved.
Stefanov, K. & Beskow, J. (2013). A Kinect Corpus of Swedish Sign Language Signs. In: Proceedings of the 2013 Workshop on Multimodal Corpora: Beyond Audio and Video. Paper presented at Multimodal Corpora: Beyond Audio and Video, Edinburgh, UK, 2013.
A Kinect Corpus of Swedish Sign Language Signs
2013 (English). In: Proceedings of the 2013 Workshop on Multimodal Corpora: Beyond Audio and Video, 2013. Conference paper, Published paper (Refereed)
HSV category
Identifiers
urn:nbn:se:kth:diva-137412 (URN)
Conference
Multimodal Corpora: Beyond Audio and Video, Edinburgh, UK, 2013
Note

QC 20161013

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2018-05-16. Bibliographically approved.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-0861-8660