Kontogiorgos, Dimosthenis (ORCID iD: orcid.org/0000-0002-8874-6629)
Publications (9 of 9)
Kontogiorgos, D., Pereira, A. & Gustafson, J. (2019). Estimating Uncertainty in Task Oriented Dialogue. Paper presented at the 21st ACM International Conference on Multimodal Interaction, Suzhou, Jiangsu, China, October 14-18, 2019.
Estimating Uncertainty in Task Oriented Dialogue
2019 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Situated multimodal systems that instruct humans need to handle user uncertainties, as expressed in behaviour, and plan their actions accordingly. Speakers' decision to reformulate or repair previous utterances depends greatly on the listeners' signals of uncertainty. In this paper, we estimate uncertainty in a situated guided task, as conveyed in non-verbal cues expressed by the listener, and predict whether the speaker will reformulate their utterance. We use a corpus in which people give instructions on how to assemble furniture, and extract multimodal features. While uncertainty is in some cases verbally expressed, most instances are expressed non-verbally, which indicates the importance of multimodal approaches. In this work, we present a model for uncertainty estimation. Our findings indicate that uncertainty estimation from non-verbal cues works well, and can exceed human annotator performance when verbal features cannot be perceived.
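To make the described pipeline concrete, a minimal sketch of this kind of uncertainty estimator follows; the feature set, data layout, and classifier choice are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch, not the paper's code: predict whether the instructor
# will reformulate an utterance, from a listener's non-verbal features.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Assumed per-utterance listener features: gaze-on-task ratio, head-motion
# energy, response delay (s), and a binary speech-activity flag.
X = rng.random((200, 4))
# Label: 1 if the instruction was subsequently reformulated, else 0.
y = rng.integers(0, 2, size=200)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="f1")
print(f"Mean F1 over 5 folds: {scores.mean():.2f}")
```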

Keywords
situated interaction, dialogue and discourse, grounding
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-261628 (URN)
Conference
21st ACM International Conference on Multimodal Interaction, Suzhou, Jiangsu, China. October 14-18, 2019
Available from: 2019-10-08 Created: 2019-10-08 Last updated: 2019-10-10
Stefanov, K., Salvi, G., Kontogiorgos, D., Kjellström, H. & Beskow, J. (2019). Modeling of Human Visual Attention in Multiparty Open-World Dialogues. ACM Transactions on Human-Robot Interaction, 8(2), Article ID UNSP 8.
Modeling of Human Visual Attention in Multiparty Open-World Dialogues
2019 (English) In: ACM Transactions on Human-Robot Interaction, ISSN 2573-9522, Vol. 8, no. 2, article id UNSP 8. Article in journal (Refereed) Published
Abstract [en]

This study proposes, develops, and evaluates methods for modeling the eye-gaze direction and head orientation of a person in multiparty open-world dialogues, as a function of low-level communicative signals generated by their interlocutors. These signals include speech activity, eye-gaze direction, and head orientation, all of which can be estimated in real time during the interaction. By utilizing these signals and novel data representations suitable for the task and context, the developed methods can generate plausible candidate gaze targets in real time. The methods are based on Feedforward Neural Networks and Long Short-Term Memory Networks. The proposed methods are developed using several hours of unrestricted interaction data, and their performance is compared with a heuristic baseline method. The study offers an extensive evaluation of the proposed methods that investigates the contribution of different predictors to the accurate generation of candidate gaze targets. The results show that the methods can accurately generate candidate gaze targets when the person being modeled is in a listening state. However, when the person being modeled is in a speaking state, the proposed methods yield significantly lower performance.
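As an illustration of the sequence-modelling setup described above, the sketch below shows a minimal LSTM that maps a window of interlocutor signals (speech activity, gaze direction, head orientation) to a categorical gaze target; the feature dimensionality, window length, and target set are assumptions, not the paper's architecture.

```python
# Hedged sketch (assumed shapes, not the published model): an LSTM over a
# short window of low-level interlocutor signals predicting a gaze target.
import torch
import torch.nn as nn

class GazeTargetLSTM(nn.Module):
    def __init__(self, n_features=9, hidden=64, n_targets=4):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_targets)

    def forward(self, x):             # x: (batch, time, n_features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict target at end of window

# Example forward pass on a dummy 2 s window sampled at 25 Hz.
model = GazeTargetLSTM()
window = torch.randn(8, 50, 9)        # batch of 8 windows
logits = model(window)                # (8, n_targets) candidate gaze targets
print(logits.shape)
```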

Place, publisher, year, edition, pages
Association for Computing Machinery, 2019
Keywords
Human-human interaction, open-world dialogue, eye-gaze direction, head orientation, multiparty
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:kth:diva-255203 (URN), 10.1145/3323231 (DOI), 000472066800003
Note

QC 20190904

Available from: 2019-09-04 Created: 2019-09-04 Last updated: 2019-10-15. Bibliographically approved
Kontogiorgos, D. (2019). Multimodal Language Grounding for Human-Robot Collaboration: YRRSDS 2019 - Dimosthenis Kontogiorgos. In: Young Researchers Roundtable on Spoken Dialogue Systems. Paper presented at the Young Researchers Roundtable on Spoken Dialogue Systems.
Multimodal Language Grounding for Human-Robot Collaboration: YRRSDS 2019 - Dimosthenis Kontogiorgos
2019 (English) In: Young Researchers Roundtable on Spoken Dialogue Systems, 2019. Conference paper, Published paper (Refereed)
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-257885 (URN)
Conference
Young Researchers Roundtable on Spoken Dialogue Systems
Note

QC 20190909

Available from: 2019-09-07 Created: 2019-09-07 Last updated: 2019-09-09. Bibliographically approved
Kontogiorgos, D., Skantze, G., Abelho Pereira, A. T. & Gustafson, J. (2019). The Effects of Embodiment and Social Eye-Gaze in Conversational Agents. In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci). Paper presented at the 41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019.
The Effects of Embodiment and Social Eye-Gaze in Conversational Agents
2019 (English) In: Proceedings of the 41st Annual Conference of the Cognitive Science Society (CogSci), 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of conversational agents is growing at a rapid pace. Agents, however, are not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this work, we explore the effects of simulating anthropomorphism and social eye-gaze in three conversational agents. We tested whether subjects' visual attention towards the agents would be similar across different forms of embodiment and social eye-gaze. In a within-subject situated interaction study (N=30), we asked subjects to engage in task-oriented dialogue with a smart speaker and two variations of a social robot. We observed shifts in users' interactive behaviour, as shown in differences in behavioural and objective measures. With a trade-off in task performance, social facilitation is higher with more anthropomorphic social agents when performing the same task.

National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-255126 (URN)
Conference
41st Annual Meeting of the Cognitive Science Society (CogSci), Montreal, July 24-27, 2019
Note

QC 20190722

Available from: 2019-07-21 Created: 2019-07-21 Last updated: 2019-07-22. Bibliographically approved
Kontogiorgos, D., Abelho Pereira, A. T. & Gustafson, J. (2019). The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots. In: The Challenges of Working on Social Robots that Collaborate with People. Paper presented at the ACM CHI Conference on Human Factors in Computing Systems, May 4-9, 2019, Glasgow, UK.
The Trade-off between Interaction Time and Social Facilitation with Collaborative Social Robots
2019 (English) In: The Challenges of Working on Social Robots that Collaborate with People, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

The adoption of social robots and conversational agents is growing at a rapid pace. These agents, however, are still not optimised to simulate key social aspects of situated human conversational environments. Humans are intellectually biased towards social activity when facing more anthropomorphic agents or when presented with subtle social cues. In this paper, we discuss the effects of simulating anthropomorphism and non-verbal social behaviour in social robots and the implications for human-robot collaborative guided tasks. Our results indicate that it is not always favourable for agents to be anthropomorphised or to communicate with non-verbal behaviour. We found a clear trade-off between interaction time and social facilitation when controlling for anthropomorphism and social behaviour.

Keywords
social robots, smart speakers
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-251651 (URN)
Conference
The ACM CHI Conference on Human Factors in Computing Systems, May 4-9, 2019, Glasgow, UK
Available from: 2019-05-16 Created: 2019-05-16 Last updated: 2019-05-17. Bibliographically approved
Kontogiorgos, D., Avramova, V., Alexanderson, S., Jonell, P., Oertel, C., Beskow, J., . . . Gustafson, J. (2018). A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at the International Conference on Language Resources and Evaluation (LREC 2018) (pp. 119-127). Paris.
A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction
2018 (English) In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, p. 119-127. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present a corpus of multiparty situated interaction where participants collaborated on moving virtual objects on a large touch screen. A moderator facilitated the discussion and directed the interaction. The corpus contains recordings of a variety of multimodal data: we captured speech, eye gaze and gesture data using a multisensory setup (wearable eye trackers, motion capture and audio/video). Furthermore, in the description of the multimodal corpus, we investigate four different types of social gaze: referential gaze, joint attention, mutual gaze and gaze aversion, from the perspectives of both speaker and listener. We annotated the groups' object references during object manipulation tasks and analysed each group's proportional referential eye-gaze with regard to the referent object. When investigating the distributions of gaze during and before referring expressions, we could corroborate the differences in time between speakers' and listeners' eye gaze found in earlier studies. This corpus is of particular interest to researchers who are interested in social eye-gaze patterns in turn-taking and referring language in situated multi-party interaction.
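The sketch below illustrates one way to compute proportional referential eye-gaze for a single referring expression; the data layout and column names are hypothetical, not the corpus format.

```python
# Hedged illustration (hypothetical data layout): the fraction of gaze samples
# that land on the referent object during a referring expression.
import pandas as pd

gaze = pd.DataFrame({
    "time":   [0.00, 0.04, 0.08, 0.12, 0.16, 0.20],
    "target": ["obj_3", "obj_3", "moderator", "obj_3", "obj_1", "obj_3"],
})

def proportional_referential_gaze(gaze, referent, start, end):
    """Share of gaze samples on `referent` within the window [start, end]."""
    window = gaze[(gaze.time >= start) & (gaze.time <= end)]
    return (window.target == referent).mean() if len(window) else float("nan")

print(proportional_referential_gaze(gaze, "obj_3", 0.0, 0.2))  # 4 of 6 samples -> ~0.67
```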

Place, publisher, year, edition, pages
Paris, 2018
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-230238 (URN), 2-s2.0-85059891166 (Scopus ID), 979-10-95546-00-9 (ISBN)
Conference
International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180614

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2019-02-19. Bibliographically approved
Jonell, P., Oertel, C., Kontogiorgos, D., Beskow, J. & Gustafson, J. (2018). Crowdsourced Multimodal Corpora Collection Tool. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at The Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 728-734). Paris.
Crowdsourced Multimodal Corpora Collection Tool
2018 (English) In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris, 2018, p. 728-734. Conference paper, Published paper (Refereed)
Abstract [en]

In recent years, more and more multimodal corpora have been created. To our knowledge, there is no publicly available tool which allows for acquiring controlled multimodal data of people in a rapid and scalable fashion. We therefore propose (1) a novel tool which enables researchers to rapidly gather large amounts of multimodal data spanning a wide demographic range, and (2) an example of how we used this tool for the collection of our "Attentive listener" multimodal corpus. The code is released under an Apache License 2.0 and available as an open-source repository, which can be found at https://github.com/kth-social-robotics/multimodal-crowdsourcing-tool. This tool allows researchers to set up their own multimodal data collection system quickly and create their own multimodal corpora. Finally, this paper provides a discussion of the advantages and disadvantages of a crowd-sourced data collection tool, especially in comparison to lab-recorded corpora.

Place, publisher, year, edition, pages
Paris, 2018
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-230236 (URN), 979-10-95546-00-9 (ISBN)
Conference
The Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180618

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2018-11-13. Bibliographically approved
Jonell, P., Bystedt, M., Fallgren, P., Kontogiorgos, D., Lopes, J. D. A., Malisz, Z., . . . Shore, T. (2018). FARMI: A Framework for Recording Multi-Modal Interactions. In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018). Paper presented at The Eleventh International Conference on Language Resources and Evaluation (LREC 2018) (pp. 3969-3974). Paris: European Language Resources Association.
FARMI: A Framework for Recording Multi-Modal Interactions
2018 (English) In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), Paris: European Language Resources Association, 2018, p. 3969-3974. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper we present (1) a processing architecture used to collect multi-modal sensor data, both for corpora collection and real-time processing, (2) an open-source implementation thereof, and (3) a use-case where we deploy the architecture in a multi-party deception game, featuring six human players and one robot. The architecture is agnostic to the choice of hardware (e.g. microphones, cameras, etc.) and programming languages, although our implementation is mostly written in Python. In our use-case, different methods of capturing verbal and non-verbal cues from the participants were used. These were processed in real time and used to inform the robot about the participants' deceptive behaviour. The framework is of particular interest for researchers who are interested in the collection of multi-party, richly recorded corpora and the design of conversational systems. Moreover, for researchers interested in human-robot interaction, the available modules offer the possibility to easily create both autonomous and Wizard-of-Oz interactions.
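The toy sketch below illustrates the general publish/subscribe idea behind such a hardware-agnostic recording architecture: sensor modules publish timestamped frames to named topics, and loggers or real-time components subscribe independently. It is an in-process example under assumed names, not FARMI's actual API.

```python
# Hedged sketch of a publish/subscribe sensor bus (not FARMI's implementation).
import time
from collections import defaultdict

class Bus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # Register a consumer (e.g. a corpus logger or a real-time module).
        self.subscribers[topic].append(callback)

    def publish(self, topic, payload):
        # Wrap the payload with a timestamp so streams can be aligned later.
        frame = {"topic": topic, "t": time.time(), "data": payload}
        for cb in self.subscribers[topic]:
            cb(frame)

bus = Bus()
bus.subscribe("audio/vad", lambda f: print("log:", f))            # corpus logging
bus.subscribe("audio/vad", lambda f: print("robot:", f["data"]))  # real-time use
bus.publish("audio/vad", {"speaker": 3, "speaking": True})
```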

Place, publisher, year, edition, pages
Paris: European Language Resources Association, 2018
National Category
Natural Sciences; Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-230237 (URN), 979-10-95546-00-9 (ISBN)
Conference
The Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
Note

QC 20180618

Available from: 2018-06-13 Created: 2018-06-13 Last updated: 2018-06-18. Bibliographically approved
Kontogiorgos, D. & Manikas, K. (2015). Towards identifying programming expertise with the use of physiological measures. Paper presented at Eye Movements in Programming. University of Eastern Finland.
Towards identifying programming expertise with the use of physiological measures
2015 (English) Conference paper, Published paper (Refereed)
Abstract [en]

In this position paper we propose means of measuring programming expertise in novice and expert programmers. Our approach is to measure the cognitive load of programmers while they assess Java/Python code in accordance with their experience in programming. Our hypothesis is that expert programmers exhibit smaller pupillary dilation during programming problem-solving tasks. We aim to evaluate our hypothesis using the EMIP Distributed Data Collection in order to confirm or reject our approach.
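A minimal sketch of the proposed comparison follows, using illustrative numbers rather than study data: mean task-evoked pupillary dilation for experts versus novices, compared with Welch's t-test.

```python
# Hedged sketch of the proposed analysis (simulated numbers, not study data):
# experts are hypothesised to show smaller pupillary dilation (lower load).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
experts = rng.normal(0.15, 0.05, size=20)   # mean pupil dilation (mm) per task
novices = rng.normal(0.30, 0.08, size=20)

t, p = stats.ttest_ind(experts, novices, equal_var=False)  # Welch's t-test
print(f"Welch t = {t:.2f}, p = {p:.4f}")
```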

Place, publisher, year, edition, pages
University of Eastern Finland, 2015
Keywords
code comprehension, programming expertise, pupillometry
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-194351 (URN)
Conference
Eye Movements in Programming
Note

QC 20180129

Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2018-01-29. Bibliographically approved