kth.se Publications
Kontogiorgos, Dimosthenis (ORCID iD: orcid.org/0000-0002-8874-6629)
Publications (10 of 29)
Kontogiorgos, D. (2022). Mutual Understanding in Situated Interactions with Conversational User Interfaces: Theory, Studies, and Computation. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Mutual Understanding in Situated Interactions with Conversational User Interfaces: Theory, Studies, and Computation
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This dissertation presents advances in HCI through a series of studies focusing on task-oriented interactions between humans and between humans and machines. Central to this work is the notion of mutual understanding, known as grounding in psycholinguistics: in particular, how people establish understanding in conversations and what interactional phenomena are present in that process. Addressing the gap in computational models of understanding, interactions in this dissertation are observed through multisensory input and evaluated with statistical and machine-learning models. As becomes apparent, miscommunication is ordinary in human conversations, and embodied computer interfaces interacting with humans are therefore subject to a large number of conversational failures. Investigating how these interfaces can evaluate human responses to distinguish whether spoken utterances are understood is one of the central contributions of this thesis.

The first papers (Papers A and B) included in this dissertation describe studies on how humans establish understanding incrementally and how they co-produce utterances to resolve misunderstandings in joint-construction tasks. Utilising the same interaction paradigm from such human-human settings, the remaining papers describe collaborative interactions between humans and machines with two central manipulations: embodiment (Papers C, D, E, and F) and conversational failures (Papers D, E, F, and G). The methods used investigate whether embodiment affects grounding behaviours among speakers and which verbal and non-verbal channels are utilised in responding to and recovering from miscommunication. For application to robotics and conversational user interfaces, failure detection systems are developed that predict user uncertainty in real time, paving the way for new multimodal computer interfaces that are aware of dialogue breakdown and system failures.

Through the lens of Theory, Studies, and Computation, a comprehensive overview is presented on how mutual understanding has been observed in interactions between humans and between humans and machines. A summary of the literature on mutual understanding from psycholinguistics and human-computer interaction perspectives is reported. An overview is also presented of how prior knowledge in mutual understanding has been, and can be, observed through experimentation and empirical studies, along with perspectives on how knowledge acquired through observation is put into practice through the analysis and development of computational models. Derived from the literature and empirical observations, the central thesis of this dissertation is that embodiment and mutual understanding are intertwined in task-oriented interactions, both in successful communication and in situations of miscommunication.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. xxi, 139
Series
TRITA-EECS-AVL ; 2022-10
Keywords
human-computer interaction, social robots, smart-speakers, multimodal behaviours, social signal processing, common ground, dialogue and discourse, joint-construction tasks, embodiment, conversational failures
National Category
Human Computer Interaction
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-308927 (URN), 978-91-8040-137-1 (ISBN)
Public defence
2022-03-11, https://kth-se.zoom.us/j/62813774919, Kollegiesalen, Brinellvägen 8, Stockholm, 14:00 (English)
Opponent
Supervisors
Note

QC 20220216

Available from: 2022-02-16 Created: 2022-02-15 Last updated: 2022-06-25. Bibliographically approved
Kontogiorgos, D., Tran, M., Gustafsson, J. & Soleymani, M. (2021). A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures. In: ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction. Paper presented at 23rd ACM International Conference on Multimodal Interaction, ICMI 2021, 18 October 2021 through 22 October 2021 (pp. 112-120). Association for Computing Machinery (ACM)
A Systematic Cross-Corpus Analysis of Human Reactions to Robot Conversational Failures
2021 (English). In: ICMI 2021 - Proceedings of the 2021 International Conference on Multimodal Interaction, Association for Computing Machinery (ACM), 2021, p. 112-120. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we analyze multimodal behavioral responses to robot failures across different tasks. Two multimodal datasets are examined in which humans interact with guided-task robots in task-oriented dialogues. In both datasets, the robots simulated failures of conversational breakdown and miscommunication typically observed in human-robot interactions. We closely examine human reactions to these failures, looking at facial and acoustic features. Our analyses identify the behavioral features that are significant for automatic detection of such failures in interaction. We also examine human responses to different types of robot failures, and whether failures occurring early or late in the interaction cause variation in the responses. Our findings indicate that several nonverbal behaviors, e.g., gaze and speech prosody, are consistently present in responses to robots' failures, whereas linguistic features appear to be task-dependent. We discuss how these findings may generalize to other tasks, and how autonomous robots may identify opportunities to detect and recover from failures in interactions with humans.
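
As an illustration of the kind of analysis described in this abstract, the following is a minimal sketch (not the authors' code) of how per-response behavioural features could be compared between failure and non-failure windows; the feature names, input file, and the choice of Welch's t-test are assumptions made for the example.

```python
# Hypothetical sketch: comparing behavioural features in responses to robot
# failures against non-failure baselines. Feature names and the input file
# are illustrative assumptions, not the paper's released pipeline.
import pandas as pd
from scipy import stats

df = pd.read_csv("reaction_windows.csv")  # one row per annotated response window

features = ["gaze_aversion_ratio", "f0_mean", "intensity_mean", "smile_au12"]
for feat in features:
    failure = df.loc[df["failure"] == 1, feat]
    baseline = df.loc[df["failure"] == 0, feat]
    t, p = stats.ttest_ind(failure, baseline, equal_var=False)  # Welch's t-test
    print(f"{feat}: t = {t:.2f}, p = {p:.4f}")
```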

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
behavioral responses, miscommunication, social signal processing, Human robot interaction, Linguistics, Man machine systems, Signal processing, Behavioral response, Cross corpus analysis, Facial feature, Human reaction, Humans-robot interactions, Multi-modal, Multi-modal dataset, Task-oriented, Behavioral research
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-299870 (URN), 10.1145/3462244.3479887 (DOI), 2-s2.0-85119019530 (Scopus ID)
Conference
23rd ACM International Conference on Multimodal Interaction, ICMI 2021, 18 October 2021 through 22 October 2021
Note

Part of proceedings: ISBN 978-1-4503-8481-0

QC 20220602

Available from: 2021-08-18 Created: 2021-08-18 Last updated: 2022-07-06. Bibliographically approved
Kontogiorgos, D., Abelho Pereira, A. T. & Gustafsson, J. (2021). Grounding behaviours with conversational interfaces: effects of embodiment and failures. Journal on Multimodal User Interfaces, 15(2), 239-254
Grounding behaviours with conversational interfaces: effects of embodiment and failures
2021 (English). In: Journal on Multimodal User Interfaces, ISSN 1783-7677, E-ISSN 1783-8738, Vol. 15, no 2, p. 239-254. Article in journal (Refereed) Published
Abstract [en]

Conversational interfaces that interact with humans need to continuously establish, maintain and repair common ground in task-oriented dialogues. Uncertainty, repairs and acknowledgements are expressed in user behaviour as the conversational partners continuously work to maintain mutual understanding. Users change their behaviour when interacting with systems in different forms of embodiment, which affects the ability of these interfaces to observe users' recurrent social signals. Additionally, humans are intellectually biased towards social activity when facing anthropomorphic agents or when presented with subtle social cues. Two studies are presented in this paper examining how humans interact in a referential communication task with wizarded interfaces in different forms of embodiment. In study 1 (N = 30), we test whether humans respond in the same way to agents in different forms of embodiment and with different social behaviour. In study 2 (N = 44), we replicate the same task and agents but introduce conversational failures that disrupt the process of grounding. Findings indicate that it is not always favourable for agents to be anthropomorphised or to communicate with non-verbal cues, as human grounding behaviours change when embodiment and failures are manipulated.

Place, publisher, year, edition, pages
Springer Nature, 2021
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-295461 (URN), 10.1007/s12193-021-00366-y (DOI), 000632299500001 (ISI), 2-s2.0-85103164370 (Scopus ID)
Note

QC 20250331

Available from: 2021-05-20 Created: 2021-05-20 Last updated: 2025-03-31. Bibliographically approved
Kontogiorgos, D. & Gustafson, J. (2021). Measuring Collaboration Load With Pupillary Responses-Implications for the Design of Instructions in Task-Oriented HRI. Frontiers in Psychology, 12, Article ID 623657.
Measuring Collaboration Load With Pupillary Responses-Implications for the Design of Instructions in Task-Oriented HRI
2021 (English). In: Frontiers in Psychology, E-ISSN 1664-1078, Vol. 12, article id 623657. Article in journal (Refereed) Published
Abstract [en]

In face-to-face interaction, speakers establish common ground, the mutual belief of understanding, incrementally. Instead of constructing "one-shot" complete utterances, speakers tend to package pieces of information in smaller fragments (what Clark calls "installments"). The aim of this paper was to investigate how speakers' fragmented construction of utterances affects the cognitive load of the conversational partners during utterance production and comprehension. In a collaborative furniture assembly, participants instructed each other how to build an IKEA stool. Pupil diameter was measured as an outcome of effort and cognitive processing in the collaborative task. Pupillometry data and eye-gaze behaviour indicated that more cognitive resources were required by speakers to construct fragmented rather than non-fragmented utterances. Such construction of utterances by audience design was associated with higher cognitive load for speakers. We also found that listeners required fewer cognitive resources with each new speaker utterance, suggesting that speakers' efforts in the fragmented construction of utterances were successful in resolving ambiguities. The results indicated that speaking in fragments is beneficial for minimising collaboration load; however, adapting to listeners is a demanding task. We discuss implications for future empirical research on the design of task-oriented human-robot interactions, and how assistive social robots may benefit from the production of fragmented instructions.
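
As a rough sketch of the pupillometry comparison described above, the following shows a within-subject contrast of baseline-corrected pupil diameter between fragmented and non-fragmented instruction conditions; the column names, input file, and baseline-correction choice are assumptions for illustration, not the paper's analysis code.

```python
# Illustrative sketch only: within-subject comparison of baseline-corrected
# pupil diameter between fragmented and non-fragmented utterance conditions.
# Column names and the input file are assumptions, not the authors' pipeline.
import pandas as pd
from scipy import stats

samples = pd.read_csv("pupil_samples.csv")  # columns: participant, condition, phase, pupil_mm

# Subtract each participant's resting baseline from their task-phase samples.
baseline = samples[samples["phase"] == "baseline"].groupby("participant")["pupil_mm"].mean()
task = samples[samples["phase"] == "task"].copy()
task["pupil_corrected"] = task["pupil_mm"] - task["participant"].map(baseline)

# Mean corrected dilation per participant and condition, then a paired t-test.
means = task.groupby(["participant", "condition"])["pupil_corrected"].mean().unstack()
t, p = stats.ttest_rel(means["fragmented"], means["non_fragmented"])
print(f"paired t-test: t = {t:.2f}, p = {p:.4f}")
```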

Place, publisher, year, edition, pages
Frontiers Media SA, 2021
Keywords
social signal processing, pupillometry, dialogue and discourse, collaboration, common ground, least-collaborative-effort, situated interaction, referential communication
National Category
General Language Studies and Linguistics
Identifiers
urn:nbn:se:kth:diva-299683 (URN), 10.3389/fpsyg.2021.623657 (DOI), 000680354600001 (ISI), 34354623 (PubMedID), 2-s2.0-85111926686 (Scopus ID)
Note

QC 20210818

Available from: 2021-08-18 Created: 2021-08-18 Last updated: 2022-06-25. Bibliographically approved
Lee, M., Kontogiorgos, D., Torre, I., Luria, M., Tejwani, R., Dennis, M. & Abelho Pereira, A. T. (2021). Robo-Identity: Exploring Artificial Identity and Multi-Embodiment. In: ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 2021 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2021. Association for Computing Machinery (ACM)
Robo-Identity: Exploring Artificial Identity and Multi-Embodiment
2021 (English). In: ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery (ACM), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Interactive robots are becoming more commonplace and complex, but their identity has not yet been a key point of investigation. Identity is an overarching concept that combines traits like personality or a backstory (among other aspects) that people readily attribute to a robot to individuate it as a unique entity. Given people's tendency to anthropomorphize social robots, "who is a robot?" should be a guiding question above and beyond "what is a robot?" Hence, we open up a discussion on artificial identity through this workshop in a multi-disciplinary manner; we welcome perspectives on challenges and opportunities from fields of ethics, design, and engineering. For instance, dynamic embodiment, e.g., an agent that dynamically moves across one's smartwatch, smart speaker, and laptop, is a technical and theoretical problem, with ethical ramifications. Another consideration is whether multiple bodies may warrant multiple identities instead of an "all-in-one" identity. Who "lives" in which devices or bodies? Should their identity travel across different forms, and how can that be achieved in an ethically mindful manner? We bring together philosophical, ethical, technical, and designerly perspectives on exploring artificial identity.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2021
Keywords
Artificial identity, Embodiment, Migratable ai, Robo-identity
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-295462 (URN), 10.1145/3434074.3444878 (DOI), 000767970100177 (ISI), 2-s2.0-85102725996 (Scopus ID)
Conference
2021 ACM/IEEE International Conference on Human-Robot Interaction, HRI 2021
Note

QC 20220413

Available from: 2021-05-20 Created: 2021-05-20 Last updated: 2022-06-25. Bibliographically approved
Oertel, C., Jonell, P., Kontogiorgos, D., Mora, K. F., Odobez, J.-M. & Gustafsson, J. (2021). Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions. Frontiers in Robotics and AI, 8, Article ID 555913.
Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions
2021 (English). In: Frontiers in Robotics and AI, E-ISSN 2296-9144, Vol. 8, article id 555913. Article in journal (Refereed) Published
Abstract [en]

Listening to one another is essential to human-human interaction. In fact, we humans spend a substantial part of our day listening to other people, in private as well as in work settings. Attentive listening serves the function of gathering information for oneself, but at the same time, it also signals to the speaker that he/she is being heard. To deduce whether our interlocutor is listening to us, we rely on reading his/her nonverbal cues, very much like how we also use non-verbal cues to signal our own attention. Such signaling becomes more complex when we move from dyadic to multi-party interactions. Understanding how humans use nonverbal cues in a multi-party listening context not only increases our understanding of human-human communication but also aids the development of successful human-robot interactions. This paper brings together previous analyses of listener behavior in human-human multi-party interaction and provides novel insights into gaze patterns between the listeners in particular. We investigate whether the gaze patterns and feedback behavior observed in human-human dialogue are also beneficial for the perception of a robot in multi-party human-robot interaction. To answer this question, we implement an attentive listening system that generates multi-modal listening behavior based on our human-human analysis. We compare our system to a baseline system that does not differentiate between different listener types in its behavior generation, and evaluate it in terms of participants' perception of the robot, their behavior, as well as the perception of third-party observers.

Place, publisher, year, edition, pages
Frontiers Media SA, 2021
Keywords
multi-party interactions, non-verbal behaviors, eye-gaze patterns, head gestures, human-robot interaction, artificial listener, social signal processing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-299298 (URN), 10.3389/frobt.2021.555913 (DOI), 000673604300001 (ISI), 34277714 (PubMedID), 2-s2.0-85110106028 (Scopus ID)
Note

QC 20220301

Available from: 2021-08-18 Created: 2021-08-18 Last updated: 2022-06-25. Bibliographically approved
Torre, I., Dogan, F. I. & Kontogiorgos, D. (2021). Voice, Embodiment, and Autonomy as Identity Affordances. In: HRI '21 Companion: Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at HRI 2021 - Robo-Identity: Exploring Artificial Identity and Multi-Embodiment, March 2021.
Voice, Embodiment, and Autonomy as Identity Affordances
2021 (English). In: HRI '21 Companion: Companion of the 2021 ACM/IEEE International Conference on Human-Robot Interaction, 2021. Conference paper, Published paper (Refereed)
Abstract [en]

Perceived robot identity has not been discussed thoroughly in Human-Robot Interaction. In particular, very few works have explored how humans tend to perceive robots that migrate through a variety of media and devices. In this paper, we discuss some of the open challenges for artificial robot identity stemming from the robotic features of voice, embodiment, and autonomy. How does a robot's voice affect perceived robot gender identity, and can we use this knowledge to fight injustice? And how do robot autonomy and decisions affect the mental image humans form of the robot? These, among others, are open questions we wish to bring to researchers' and designers' attention, in order to influence best practices on the timely topic of artificial agent identity.

National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-291341 (URN)
Conference
HRI 2021 - Robo-Identity: Exploring Artificial Identity and Multi-Embodiment March 2021
Note

QC 20210310

Available from: 2021-03-09 Created: 2021-03-09 Last updated: 2025-02-09. Bibliographically approved
Kontogiorgos, D., Abelho Pereira, A. T., Sahindal, B., van Waveren, S. & Gustafson, J. (2020). Behavioural Responses to Robot Conversational Failures. In: HRI '20: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at International Conference on Human Robot Interaction (HRI), HRI ’20, March 23–26, 2020, Cambridge, United Kingdom. ACM Digital Library
Behavioural Responses to Robot Conversational Failures
2020 (English). In: HRI '20: Proceedings of the 2020 ACM/IEEE International Conference on Human-Robot Interaction, ACM Digital Library, 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Humans and robots will increasingly collaborate in domestic environments, which will cause users to encounter more failures in interactions. Robots should be able to infer conversational failures by detecting human users’ behavioural and social signals. In this paper, we study and analyse these behavioural cues in response to robot conversational failures. Using a guided task corpus, where robot embodiment and time pressure are manipulated, we ask human annotators to estimate whether user affective states differ during various types of robot failures. We also train a random forest classifier to detect whether a robot failure has occurred and compare results to human annotator benchmarks. Our findings show that human-like robots augment users’ reactions to failures, as shown in users’ visual attention, in comparison to non-human-like smart-speaker embodiments. The results further suggest that speech behaviours are utilised more in responses to failures when non-human-like designs are present. This is particularly important for robot failure detection mechanisms, which may need to consider the robot’s physical design in their failure detection models.
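
The abstract mentions a random forest classifier for failure detection; below is a minimal sketch of that kind of model, with the feature columns, input file, and evaluation protocol chosen as illustrative assumptions rather than the authors' actual setup.

```python
# Minimal sketch of a random forest failure detector over per-window behavioural
# features, evaluated with participant-grouped cross-validation. Feature columns
# and the input file are placeholders, not the authors' released code.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

df = pd.read_csv("behaviour_windows.csv")  # one row per annotated time window
X = df[["gaze_on_robot", "gaze_shifts", "speech_activity", "head_motion"]]
y = df["failure_occurred"]
groups = df["participant"]  # keep each participant's windows in a single fold

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=5), scoring="f1")
print("cross-validated F1:", scores.mean().round(3))
```

Grouping folds by participant keeps one person's windows out of both training and test sets at once, which avoids optimistic estimates when behaviour is highly individual.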

Place, publisher, year, edition, pages
ACM Digital Library, 2020
National Category
Other Engineering and Technologies
Identifiers
urn:nbn:se:kth:diva-267231 (URN), 10.1145/3319502.3374782 (DOI), 000570011000007 (ISI), 2-s2.0-85082009759 (Scopus ID)
Conference
International Conference on Human Robot Interaction (HRI), HRI ’20, March 23–26, 2020, Cambridge, United Kingdom
Note

QC 20200214

Available from: 2020-02-04 Created: 2020-02-04 Last updated: 2025-02-18. Bibliographically approved
Kontogiorgos, D., Sibirtseva, E. & Gustafson, J. (2020). Chinese whispers: A multimodal dataset for embodied language grounding. In: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings. Paper presented at 12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille, France, 11-16 May 2020 (pp. 743-749). European Language Resources Association (ELRA)
Chinese whispers: A multimodal dataset for embodied language grounding
2020 (English). In: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings, European Language Resources Association (ELRA), 2020, p. 743-749. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we introduce a multimodal dataset in which subjects are instructing each other how to assemble IKEA furniture. Using the concept of 'Chinese Whispers', an old children's game, we employ a novel method to avoid implicit experimenter biases. We let subjects instruct each other on the nature of the task: the process of the furniture assembly. Uncertainty, hesitations, repairs and self-corrections are naturally introduced in the incremental process of establishing common ground. The corpus consists of 34 interactions, where each subject first assembles and then instructs. We collected speech, eye-gaze, pointing gestures, and object movements, as well as subjective interpretations of mutual understanding, collaboration and task recall. The corpus is of particular interest to researchers who are interested in multimodal signals in situated dialogue, especially in referential communication and the process of language grounding.
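
To illustrate how multimodal streams like those described above might be combined in analysis, here is a hypothetical alignment of gaze samples with word-level transcriptions by timestamp; the file and column names are invented for the example and do not describe the corpus's actual release format.

```python
# Hypothetical example: aligning an eye-gaze stream with word-level speech
# transcriptions by timestamp. File names, column names, and sampling details
# are assumptions for illustration, not the corpus's documented format.
import pandas as pd

gaze = pd.read_csv("interaction_01_gaze.csv")    # columns: time_s, fixated_object
words = pd.read_csv("interaction_01_words.csv")  # columns: time_s, word, speaker

gaze = gaze.sort_values("time_s")
words = words.sort_values("time_s")

# For each spoken word, take the most recent gaze sample (nearest preceding timestamp).
aligned = pd.merge_asof(words, gaze, on="time_s", direction="backward")
print(aligned.head())
```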

Place, publisher, year, edition, pages
European Language Resources Association (ELRA), 2020
Keywords
Grounding, Multimodal interaction, Non-verbal signals, Referential communication, Situated dialogue and discourse, Incremental process, Language grounding, Multi-modal dataset, Mutual understanding, Object movements, Pointing gestures, Referential communications, Self-correction, Eye movements
National Category
Languages and Literature
Identifiers
urn:nbn:se:kth:diva-290864 (URN), 000724697200093 (ISI), 2-s2.0-85093942292 (Scopus ID)
Conference
12th International Conference on Language Resources and Evaluation, LREC 2020, Marseille, France, 11-16 May 2020
Note

Part of proceedings: ISBN 979-109554634-4

QC 20210310

Available from: 2021-03-10 Created: 2021-03-10 Last updated: 2022-11-07. Bibliographically approved