Al Moubayed, Samer
Publications (10 of 42)
Agarwal, P., Al Moubayed, S., Alspach, A., Kim, J., Carter, E. J., Lehman, J. F. & Yamane, K. (2016). Imitating Human Movement with Teleoperated Robotic Head. In: 2016 25TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN): . Paper presented at 25th IEEE International Symposium on Robot and Human Interactive Communication (IEEE RO-MAN), AUG 26-31, 2016, Columbia Univ, Teachers Coll, New York City, NY (pp. 630-637).
Imitating Human Movement with Teleoperated Robotic Head
2016 (English). In: 2016 25TH IEEE INTERNATIONAL SYMPOSIUM ON ROBOT AND HUMAN INTERACTIVE COMMUNICATION (RO-MAN), 2016, p. 630-637. Conference paper, Published paper (Refereed)
Abstract [en]

Effective teleoperation requires real-time control of a remote robotic system. In this work, we develop a controller for realizing smooth and accurate motion of a robotic head with application to a teleoperation system for the Furhat robot head [1], which we call TeleFurhat. The controller uses the head motion of an operator measured by a Microsoft Kinect 2 sensor as reference and applies a processing framework to condition and render the motion on the robot head. The processing framework includes a pre-filter based on a moving average filter, a neural network-based model for improving the accuracy of the raw pose measurements of Kinect, and a constrained-state Kalman filter that uses a minimum jerk model to smooth motion trajectories and limit the magnitude of changes in position, velocity, and acceleration. Our results demonstrate that the robot can reproduce the human head motion in real time with a latency of approximately 100 to 170 ms while operating within its physical limits. Furthermore, viewers prefer our new method over rendering the raw pose data from Kinect.
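The abstract outlines a processing pipeline: a moving-average pre-filter, a learned correction model, and a constrained-state Kalman filter with a minimum-jerk model. As a rough illustration only, the sketch below shows a moving-average pre-filter feeding a jerk- and acceleration-limited tracker; the class names, gains, and limits are hypothetical stand-ins, not the TeleFurhat filter design itself.

```python
# Illustrative sketch only: a moving-average pre-filter followed by a
# velocity/acceleration/jerk-limited tracker. This approximates the role of the
# paper's constrained-state Kalman filter with a minimum-jerk model; the gains
# and limits below are made-up values, not the ones used for TeleFurhat.
from collections import deque

import numpy as np


class MovingAverage:
    def __init__(self, window=5):
        self.buf = deque(maxlen=window)

    def __call__(self, sample):
        self.buf.append(np.asarray(sample, dtype=float))
        return np.mean(self.buf, axis=0)


class JerkLimitedSmoother:
    """Track a target head pose while clamping velocity, acceleration, and jerk."""

    def __init__(self, dt, v_max, a_max, j_max, dim=3):
        self.dt, self.v_max, self.a_max, self.j_max = dt, v_max, a_max, j_max
        self.x = np.zeros(dim)  # pose, e.g. roll/pitch/yaw in radians
        self.v = np.zeros(dim)
        self.a = np.zeros(dim)

    def step(self, target):
        # Desired acceleration from a simple proportional-derivative law
        # (hypothetical gains), then clamp jerk, acceleration, and velocity.
        a_des = 10.0 * (target - self.x) - 5.0 * self.v
        da = np.clip(a_des - self.a, -self.j_max * self.dt, self.j_max * self.dt)
        self.a = np.clip(self.a + da, -self.a_max, self.a_max)
        self.v = np.clip(self.v + self.a * self.dt, -self.v_max, self.v_max)
        self.x = self.x + self.v * self.dt
        return self.x


if __name__ == "__main__":
    prefilter = MovingAverage(window=5)
    smoother = JerkLimitedSmoother(dt=1 / 30, v_max=2.0, a_max=10.0, j_max=100.0)
    noisy_pose = np.radians([5.0, -3.0, 10.0])  # a fake Kinect head-pose sample
    for _ in range(60):  # two seconds at 30 fps
        cmd = smoother.step(prefilter(noisy_pose))
    print("commanded pose (deg):", np.degrees(cmd))
```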

Series
IEEE RO-MAN, ISSN 1944-9445
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-200244 (URN), 10.1109/ROMAN.2016.7745184 (DOI), 000390682500081 (), 2-s2.0-85002840070 (Scopus ID), 978-1-5090-3929-6 (ISBN)
Conference
25th IEEE International Symposium on Robot and Human Interactive Communication (IEEE RO-MAN), AUG 26-31, 2016, Columbia Univ, Teachers Coll, New York City, NY
Note

QC 20170214

Available from: 2017-02-14. Created: 2017-02-14. Last updated: 2025-02-09. Bibliographically approved
Persson, A., Al Moubayed, S. & Loutfi, A. (2014). Fluent Human-Robot Dialogues About Grounded Objects in Home Environments. Cognitive Computation, 6(4), 914-927
Fluent Human-Robot Dialogues About Grounded Objects in Home Environments
2014 (English). In: Cognitive Computation, ISSN 1866-9956, E-ISSN 1866-9964, Vol. 6, no 4, p. 914-927. Article in journal (Refereed), Published
Abstract [en]

To provide spoken interaction between robots and human users, an internal representation of the robot's sensory information must be available at a semantic level and accessible to a dialogue system in order to be used in a human-like and intuitive manner. In this paper, we integrate perceptual anchoring (which creates and maintains the symbol-percept correspondence of objects) in robotics with multimodal dialogues in order to achieve a fluent interaction between humans and robots when talking about objects. These everyday objects are located in a so-called symbiotic system where humans, robots, and sensors co-operate in a home environment. To orchestrate the dialogue system, the IrisTK dialogue platform is used. IrisTK models the interaction as events exchanged between different modules, e.g. a speech recognizer and a face tracker. This system runs on a mobile robot device, which is part of a distributed sensor network. A perceptual anchoring framework recognizes objects placed in the home and maintains a consistent identity for each object, consisting of its symbolic and perceptual data. Particular effort is placed on creating flexible dialogues where requests about objects can be made in a variety of ways. Experimental validation consists of evaluating the system when many objects are possible candidates for satisfying these requests.
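As a purely illustrative sketch of the event-based coordination described above (this is not the real IrisTK API, which is a Java toolkit; all class, event, and anchor names here are hypothetical), a dialogue manager can subscribe to perception events and resolve spoken references against perceptually anchored objects:

```python
# Illustrative sketch only: event-based wiring between a (fake) speech
# recognizer and a dialogue manager that resolves references against
# perceptually anchored objects. This is not the real IrisTK API (a Java
# toolkit); all names and event types here are hypothetical.
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    name: str
    payload: dict = field(default_factory=dict)


class EventBus:
    def __init__(self):
        self.handlers = defaultdict(list)

    def subscribe(self, name, handler):
        self.handlers[name].append(handler)

    def publish(self, event):
        for handler in self.handlers[event.name]:
            handler(event)


class DialogueManager:
    """Maps recognized speech onto anchored objects (symbol -> percept data)."""

    def __init__(self, bus, anchors):
        self.anchors = anchors
        bus.subscribe("speech.recognized", self.on_speech)

    def on_speech(self, event):
        utterance = event.payload["text"].lower()
        matches = [sym for sym, data in self.anchors.items()
                   if data["label"] in utterance]
        if len(matches) == 1:
            sym = matches[0]
            print(f"Grounded request on {sym} at {self.anchors[sym]['location']}.")
        elif matches:
            print(f"Ambiguous request, candidates: {matches}")
        else:
            print("No anchored object matches that request.")


bus = EventBus()
anchors = {"cup-1": {"label": "cup", "location": "the kitchen table"}}
DialogueManager(bus, anchors)
bus.publish(Event("speech.recognized", {"text": "Where is the cup?"}))
```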

Keywords
Human-robot interaction, Perceptual anchoring, Symbol grounding, Spoken dialogue systems, Social robotics
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-158441 (URN), 10.1007/s12559-014-9291-y (DOI), 000345994900022 (), 2-s2.0-84916227381 (Scopus ID)
Funder
Swedish Research Council
Note

QC 20150108

Available from: 2015-01-08. Created: 2015-01-08. Last updated: 2024-03-15. Bibliographically approved
Al Moubayed, S., Beskow, J., Bollepalli, B., Gustafson, J., Hussen-Abdelaziz, A., Johansson, M., . . . Varol, G. (2014). Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue. In: : . Paper presented at 9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany. IEEE conference proceedings
Human-robot Collaborative Tutoring Using Multiparty Multimodal Spoken Dialogue
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe a project that explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring robot. A human-robot interaction setup is designed, and a human-human dialogue corpus is collected. The corpus targets the development of a dialogue system platform to study verbal and nonverbal tutoring strategies in multiparty spoken interactions with robots which are capable of spoken dialogue. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. Along with the participants sits a tutor (robot) that helps the participants perform the task, and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, such as a microphone array, Kinects, and video cameras, were coupled with manual annotations. These are used to build a situated model of the interaction based on the participants' personalities, their state of attention, their conversational engagement and verbal dominance, and how these correlate with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. Driven by the analysis of the corpus, we will also show the detailed design methodologies for an affective, multimodally rich dialogue system that allows the robot to measure incrementally the attention state and dominance of each participant, allowing the robot head Furhat to maintain a well-coordinated, balanced, and engaging conversation that attempts to maximize the agreement and the contribution to solving the task. This project sets the first steps to explore the potential of using multimodal dialogue systems to build interactive robots that can serve in educational, team building, and collaborative task solving applications.
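One of the interaction signals mentioned above, verbal dominance, can be illustrated with a minimal sketch: an incrementally updated share of speaking activity per participant. The tracker below and its smoothing constant are hypothetical simplifications, not the project's actual situated model, which also uses attention and visual cues.

```python
# Hypothetical simplification: estimate each participant's verbal dominance as
# an exponentially smoothed share of voice activity, updated frame by frame.
class VerbalDominanceTracker:
    def __init__(self, participants, alpha=0.05):
        # alpha controls how quickly the estimate adapts to new behaviour.
        self.alpha = alpha
        self.activity = {p: 0.0 for p in participants}

    def update(self, speaking):
        """speaking: dict of participant -> bool voice-activity flag for this frame."""
        for p, is_speaking in speaking.items():
            x = 1.0 if is_speaking else 0.0
            self.activity[p] = self.alpha * x + (1 - self.alpha) * self.activity[p]
        total = sum(self.activity.values()) or 1.0
        return {p: v / total for p, v in self.activity.items()}


tracker = VerbalDominanceTracker(["left_participant", "right_participant"])
for _ in range(100):  # the left participant talks while the other stays silent
    shares = tracker.update({"left_participant": True, "right_participant": False})
print(shares)  # the left participant's dominance share approaches 1.0
```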

Place, publisher, year, edition, pages
IEEE conference proceedings, 2014
Keywords
Furhat robot; Human-robot collaboration; Human-robot interaction; Multiparty interaction; Spoken dialog
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-145511 (URN), 10.1145/2559636.2563681 (DOI), 000455229400029 (), 2-s2.0-84896934381 (Scopus ID)
Conference
9th Annual ACM/IEEE International Conference on Human-Robot Interaction, Bielefeld, Germany
Note

QC 20161018

Available from: 2014-05-21. Created: 2014-05-21. Last updated: 2024-03-15. Bibliographically approved
Al Moubayed, S., Beskow, J. & Skantze, G. (2014). Spontaneous spoken dialogues with the Furhat human-like robot head. In: HRI '14 Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction: . Paper presented at HRI'14 2014 ACM/IEEE international conference on Human-robot interaction, Bielefeld, Germany — March 03 - 06, 2014 (pp. 326). Bielefeld, Germany
Spontaneous spoken dialogues with the Furhat human-like robot head
2014 (English). In: HRI '14 Proceedings of the 2014 ACM/IEEE international conference on Human-robot interaction, Bielefeld, Germany, 2014, p. 326. Conference paper, Published paper (Refereed)
Abstract [en]

We will show in this demonstrator an advanced multimodal and multiparty spoken conversational system using Furhat, a robot head based on projected facial animation. Furhat is an anthropomorphic robot head that utilizes facial animation for physical robot heads using back-projection. In the system, multimodality is enabled using speech and rich visual input signals such as multi-person real-time face tracking and microphone tracking. The demonstrator will showcase a system that is able to carry out social dialogue with multiple interlocutors simultaneously with rich output signals such as eye and head coordination, lip-synchronized speech synthesis, and non-verbal facial gestures used to regulate fluent and expressive multiparty conversations. The dialogue design is performed using the IrisTK [4] dialogue authoring toolkit developed at KTH. The system will also be able to act as a moderator in a quiz game, showing different strategies for regulating spoken situated interactions.

Place, publisher, year, edition, pages
Bielefeld, Germany, 2014
Keywords
Human-Robot Interaction, Multiparty interaction, human-robot collaboration, Spoken dialog, Furhat robot, conversational management
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-158150 (URN), 10.1145/2559636.2559781 (DOI), 000455229400135 ()
Conference
HRI'14 2014 ACM/IEEE international conference on Human-robot interaction, Bielefeld, Germany — March 03 - 06, 2014
Note

tmh_import_14_12_30, tmh_id_3913. QC 20150203

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2025-02-01. Bibliographically approved
Koutsombogera, M., Al Moubayed, S., Bollepalli, B., Abdelaziz, A. H., Johansson, M., Aguas Lopes, J. D., . . . Varol, G. (2014). The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue. In: : . Paper presented at 9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland. EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA
The Tutorbot Corpus - A Corpus for Studying Tutoring Behaviour in Multiparty Face-to-Face Spoken Dialogue
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This paper describes a novel experimental setup exploiting state-of-the-art capture equipment to collect a multimodally rich, game-solving, collaborative multiparty dialogue corpus. The corpus is targeted and designed towards the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. The participants were paired into teams based on their degree of extraversion as determined by a personality test. With the participants sits a tutor who helps them perform the task and organizes and balances their interaction; the tutor's behavior was assessed by the participants after each interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies, together with manual annotations of the tutor's behavior, constitute the Tutorbot corpus. This corpus is exploited to build a situated model of the interaction based on the participants' temporally-changing state of attention, their conversational engagement and verbal dominance, and their correlation with the verbal and visual feedback and conversation regulatory actions generated by the tutor.

Place, publisher, year, edition, pages
EUROPEAN LANGUAGE RESOURCES ASSOC-ELRA, 2014
Keywords
Multimodal corpus; Multiparty Interaction; Tutor
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-173469 (URN), 000355611005138 (), 2-s2.0-84990228583 (Scopus ID)
Conference
9th International Conference on Language Resources and Evaluation, Reykjavik, Iceland
Note

QC 20161017

Available from: 2015-09-15. Created: 2015-09-11. Last updated: 2024-03-15. Bibliographically approved
Al Moubayed, S., Beskow, J., Bollepalli, B., Hussen-Abdelaziz, A., Johansson, M., Koutsombogera, M., . . . Varol, G. (2014). Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor. In: : . Paper presented at 9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal. Springer Berlin/Heidelberg
Tutoring Robots: Multiparty Multimodal Social Dialogue With an Embodied Tutor
2014 (English). Conference paper, Published paper (Refereed)
Abstract [en]

This project explores a novel experimental setup towards building a spoken, multi-modally rich, and human-like multiparty tutoring agent. A setup is developed and a corpus is collected that targets the development of a dialogue system platform to explore verbal and nonverbal tutoring strategies in multiparty spoken interactions with embodied agents. The dialogue task is centered on two participants involved in a dialogue aiming to solve a card-ordering game. With the participants sits a tutor that helps them perform the task and organizes and balances their interaction. Different multimodal signals captured and auto-synchronized by different audio-visual capture technologies were coupled with manual annotations to build a situated model of the interaction based on the participants' personalities, their temporally-changing state of attention, their conversational engagement and verbal dominance, and the way these are correlated with the verbal and visual feedback, turn-management, and conversation regulatory actions generated by the tutor. At the end of this chapter we discuss the potential areas of research and development this work opens up and some of the challenges that lie in the road ahead.

Place, publisher, year, edition, pages
Springer Berlin/Heidelberg, 2014
Keywords
Conversational Dominance; Embodied Agent; Multimodal; Multiparty; Non-verbal Signals; Social Robot; Spoken Dialogue; Turn-taking; Tutor; Visual Attention
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-158149 (URN), 000349440300004 (), 2-s2.0-84927643008 (Scopus ID)
Conference
9th International Summer Workshop on Multimodal Interfaces, Lisbon, Portugal
Note

QC 20161018

Available from: 2014-12-30. Created: 2014-12-30. Last updated: 2024-03-15. Bibliographically approved
Al Moubayed, S., Heylen, D., Bohus, D., Koutsombogera, M., Papageorgiou, H., Esposito, A. & Skantze, G. (2014). UM3I 2014: International workshop on understanding and modeling multiparty, multimodal interactions. In: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction: . Paper presented at 16th ACM International Conference on Multimodal Interaction, ICMI 2014, 12 November 2014 through 16 November 2014 (pp. 537-538). Association for Computing Machinery (ACM)
UM3I 2014: International workshop on understanding and modeling multiparty, multimodal interactions
2014 (English). In: ICMI 2014 - Proceedings of the 2014 International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2014, p. 537-538. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we present a brief summary of the international workshop on Understanding and Modeling Multiparty, Multimodal Interactions. The UM3I 2014 workshop is held in conjunction with the ICMI 2014 conference. The workshop will highlight recent developments and adopted methodologies in the analysis and modeling of multiparty and multimodal interactions, the design and implementation principles of related human-machine interfaces, as well as the identification of potential limitations and ways of overcoming them.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2014
Keywords
Human-computer interaction, Modeling, Multimodality, Multiparty interaction, Summary, Workshop, Interactive computer systems, Models, User interfaces, Design and implementations, Human Machine Interface, International workshops, Multi-Modal Interactions, Multi-modality, Multi-party interactions, Human computer interaction
National Category
Other Engineering and Technologies; Computer and Information Sciences; Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-181654 (URN), 10.1145/2663204.2668321 (DOI), 2-s2.0-84947282619 (Scopus ID), 9781450328852 (ISBN)
Conference
16th ACM International Conference on Multimodal Interaction, ICMI 2014, 12 November 2014 through 16 November 2014
Note

QC 20160202

Available from: 2016-02-02. Created: 2016-02-02. Last updated: 2025-02-18. Bibliographically approved
Al Moubayed, S., Edlund, J. & Gustafson, J. (2013). Analysis of gaze and speech patterns in three-party quiz game interaction. In: Interspeech 2013: . Paper presented at 14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. ISCA 2013 (pp. 1126-1130). The International Speech Communication Association (ISCA)
Analysis of gaze and speech patterns in three-party quiz game interaction
2013 (English). In: Interspeech 2013, The International Speech Communication Association (ISCA), 2013, p. 1126-1130. Conference paper, Published paper (Refereed)
Abstract [en]

In order to understand and model the dynamics between interaction phenomena such as gaze and speech in face-to-face multiparty interaction between humans, we need large quantities of reliable, objective data of such interactions. To date, this type of data is in short supply. We present a data collection setup using automated, objective techniques in which we capture the gaze and speech patterns of triads deeply engaged in a high-stakes quiz game. The resulting corpus consists of five one-hour recordings, and is unique in that it makes use of three state-of-the-art gaze trackers (one per subject) in combination with a state-of-the-art conical microphone array designed to capture roundtable meetings. Several video channels are also included. In this paper we present the obstacles we encountered and the possibilities afforded by a synchronised, reliable combination of large-scale multi-party speech and gaze data, and an overview of the first analyses of the data.
Index Terms: multimodal corpus, multiparty dialogue, gaze patterns, multiparty gaze.
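A corpus like this ultimately needs the gaze and speech streams on one time base. The sketch below is hypothetical (invented function, fake data, and a simple nearest-sample alignment; the actual recordings were synchronised with dedicated capture tooling) and only illustrates the kind of alignment that joint gaze/speech analyses rely on.

```python
# Hypothetical sketch with fake data: aligning independently timestamped
# streams (per-subject gaze, microphone-array voice activity) onto one common
# analysis clock. The real corpus used dedicated synchronisation; this only
# illustrates the kind of alignment such analyses rely on.
import numpy as np


def resample_stream(timestamps, values, common_clock):
    """Crude alignment: for each analysis frame, take the closest recorded sample."""
    timestamps, values = np.asarray(timestamps), np.asarray(values)
    idx = np.clip(np.searchsorted(timestamps, common_clock), 1, len(timestamps) - 1)
    take_left = (common_clock - timestamps[idx - 1]) < (timestamps[idx] - common_clock)
    return values[idx - take_left.astype(int)]


clock = np.arange(0.0, 10.0, 0.02)  # 50 Hz analysis frames over 10 s of fake data
gaze = {s: resample_stream(np.arange(0, 10, 1 / 60), np.random.randn(600), clock)
        for s in ("subj1", "subj2", "subj3")}  # ~60 Hz gaze azimuth per subject
speech = resample_stream(np.arange(0, 10, 0.01), np.random.rand(1000) > 0.5, clock)

# Example joint query: how often subject 1 looks left of centre while speech is on.
print(np.mean((gaze["subj1"] < 0) & speech))
```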

Place, publisher, year, edition, pages
The International Speech Communication Association (ISCA), 2013
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-137388 (URN), 000395050000238 (), 2-s2.0-84906231582 (Scopus ID)
Conference
14th Annual Conference of the International Speech Communication Association, Lyon, France, August 25-29, 2013. ISCA 2013
Note

QC 20140603

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2025-02-01. Bibliographically approved
Edlund, J., Al Moubayed, S., Tånnander, C. & Gustafson, J. (2013). Audience response system based annotation of speech. In: Proceedings of Fonetik 2013: . Paper presented at XXVIth Annual Phonetics Meeting Fonetik 2013; Linköping, Sweden, 12–13 June, 2013 (pp. 13-16). Linköping: Linköping University
Audience response system based annotation of speech
2013 (English). In: Proceedings of Fonetik 2013, Linköping: Linköping University, 2013, p. 13-16. Conference paper, Published paper (Other academic)
Abstract [en]

Manual annotators are often used to label speech, a task associated with high costs and great time consumption. We suggest increasing throughput while maintaining a high degree of experimental control by borrowing from the audience response systems used in the film and television industries. We demonstrate a cost-efficient setup for rapid, plenary annotation of phenomena occurring in recorded speech, together with results from studies we have undertaken to quantify the temporal precision and reliability of such annotations.
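As a hypothetical sketch of the aggregation behind such plenary annotation (the function, bin size, and data below are invented for illustration; the paper's actual hardware and statistics are not shown), button presses from several annotators can be pooled into time bins to inspect agreement and timing spread:

```python
# Hypothetical sketch: pool button presses from several annotators into time
# bins so that agreement (votes per bin) and timing spread can be inspected.
import numpy as np


def pool_presses(press_times_per_annotator, bin_size=0.2, duration=60.0):
    """Count, per time bin, how many annotators pressed within that bin."""
    edges = np.arange(0.0, duration + bin_size, bin_size)
    counts = np.zeros(len(edges) - 1, dtype=int)
    for presses in press_times_per_annotator:
        hist, _ = np.histogram(presses, bins=edges)
        counts += (hist > 0).astype(int)  # each annotator counts once per bin
    return edges[:-1], counts


# Fake data: five annotators reacting to an event near t = 12.3 s with some lag.
rng = np.random.default_rng(0)
presses = [[12.3 + rng.normal(0.25, 0.1)] for _ in range(5)]
times, votes = pool_presses(presses)
peak = times[np.argmax(votes)]
print(f"peak agreement of {votes.max()}/5 annotators around t = {peak:.1f} s")
```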

Place, publisher, year, edition, pages
Linköping: Linköping University, 2013
Series
Studies in Language and Culture, ISSN 1403-2570; 21
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-137389 (URN), 978-91-7519-582-7 (ISBN), 978-91-7519-579-7 (ISBN)
Conference
XXVIth Annual Phonetics Meeting Fonetik 2013; Linköping, Sweden, 12–13 June, 2013
Note

QC 20140219

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2025-02-01. Bibliographically approved
Edlund, J., Al Moubayed, S. & Beskow, J. (2013). Co-present or Not?: Embodiment, Situatedness and the Mona Lisa Gaze Effect. In: Nakano, Yukiko; Conati, Cristina; Bader, Thomas (Ed.), Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications (pp. 185-203). London: Springer London
Co-present or Not?: Embodiment, Situatedness and the Mona Lisa Gaze Effect
2013 (English). In: Eye gaze in intelligent user interfaces: gaze-based analyses, models and applications / [ed] Nakano, Yukiko; Conati, Cristina; Bader, Thomas, London: Springer London, 2013, p. 185-203. Chapter in book (Refereed)
Abstract [en]

The interest in embodying and situating computer programmes took off in the autonomous agents community in the 90s. Today, researchers and designers of programmes that interact with people on human terms endow their systems with humanoid physiognomies for a variety of reasons. In most cases, attempts at achieving this embodiment and situatedness have taken one of two directions: virtual characters and actual physical robots. In addition, a technique that is far from new is gaining ground rapidly: projection of animated faces on head-shaped 3D surfaces. In this chapter, we provide a history of this technique; an overview of its pros and cons; and an in-depth description of the cause and mechanics of the main drawback of 2D displays of 3D faces (and objects): the Mona Lisa gaze effect. We conclude with a description of an experimental paradigm that measures perceived directionality in general and the Mona Lisa gaze effect in particular.

Place, publisher, year, edition, pages
London: Springer London, 2013
National Category
Computer Sciences; Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-137382 (URN), 10.1007/978-1-4471-4784-8_10 (DOI)
Note

Part of ISBN 978-1-4471-4783-1, 978-1-4471-4784-8

QC 20250214

Available from: 2013-12-13. Created: 2013-12-13. Last updated: 2025-02-14. Bibliographically approved