Publications (10 of 13)
Skantze, G. & Irfan, B. (2025). Applying General Turn-Taking Models to Conversational Human-Robot Interaction. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025 (pp. 859-868). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 859-868. Conference paper, published (Refereed)
Abstract [en]

Turn-taking is a fundamental aspect of conversation, but current Human-Robot Interaction (HRI) systems often rely on simplistic, silence-based models, leading to unnatural pauses and interruptions. This paper investigates, for the first time, the application of general turn-taking models, specifically TurnGPT and Voice Activity Projection (VAP), to improve conversational dynamics in HRI. These models are trained on human-human dialogue data using self-supervised learning objectives, without requiring domain-specific fine-tuning. We propose methods for using these models in tandem to predict when a robot should begin preparing responses, take turns, and handle potential interruptions. We evaluated the proposed system in a within-subject study against a traditional baseline system, using the Furhat robot with 39 adults in a conversational setting, in combination with a large language model for autonomous response generation. The results show that participants significantly prefer the proposed system, and it significantly reduces response delays and interruptions.
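The tandem use of a text-based completion model and an audio-based projection model described in the abstract can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the fusion rule, and the thresholds are assumptions.

```python
# Hypothetical sketch of fusing two general turn-taking signals: a TurnGPT-like
# model scores whether the user's utterance so far is a likely turn completion,
# while a VAP-like model projects near-future voice activity from audio.
from dataclasses import dataclass

@dataclass
class TurnTakingDecision:
    prepare_response: bool  # start generating a candidate response early
    take_turn: bool         # actually start speaking

def decide(turngpt_completion_prob: float,
           vap_user_continues_prob: float,
           prepare_threshold: float = 0.5,
           take_threshold: float = 0.7) -> TurnTakingDecision:
    """Fuse the two model outputs into a turn-taking decision.

    turngpt_completion_prob: P(current user utterance is pragmatically
        complete), from a text-based model.
    vap_user_continues_prob: P(user keeps speaking in the near future),
        from an audio-based voice activity projection model.
    """
    # Prepare a response as soon as the text looks plausibly complete;
    # only take the turn when audio also predicts the user will stop.
    prepare = turngpt_completion_prob >= prepare_threshold
    take = (turngpt_completion_prob >= take_threshold
            and vap_user_continues_prob < (1.0 - take_threshold))
    return TurnTakingDecision(prepare_response=prepare, take_turn=take)

# Example: likely completion in text, and audio predicts the user will stop.
print(decide(0.85, 0.1))  # prepare a response and take the turn
```

The split between "prepare" and "take" mirrors the abstract's distinction between preparing responses early and committing to speak, which is what reduces response delays without causing interruptions.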

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
conversational AI, human-robot interaction, large language model, turn-taking
National subject category
Language Processing and Computational Linguistics; Computer Sciences; Human-Computer Interaction (Interaction Design)
Identifiers
urn:nbn:se:kth:diva-363767 (URN), 10.1109/HRI61500.2025.10973958 (DOI), 2-s2.0-105004876033 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025
Note

Part of ISBN 9798350378931

QC 20250527

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-05-27. Bibliographically reviewed.
Irfan, B., Kuoppamäki, S., Hosseini, A. & Skantze, G. (2025). Between reality and delusion: challenges of applying large language models to companion robots for open-domain dialogues with older adults. Autonomous Robots, 49(1), Article ID 9.
2025 (English). In: Autonomous Robots, ISSN 0929-5593, E-ISSN 1573-7527, Vol. 49, no. 1, article id 9. Journal article (Refereed), Published
Abstract [en]

Throughout our lives, we interact daily in conversations with our friends and family, covering a wide range of topics, known as open-domain dialogue. As we age, these interactions may diminish due to changes in social and personal relationships, leading to loneliness in older adults. Conversational companion robots can alleviate this issue by providing daily social support. Large language models (LLMs) offer flexibility for enabling open-domain dialogue in these robots. However, LLMs are typically trained and evaluated on textual data, while robots introduce additional complexity through multi-modal interactions, which has not been explored in prior studies. Moreover, it is crucial to involve older adults in the development of robots to ensure alignment with their needs and expectations. Correspondingly, using iterative participatory design approaches, this paper exposes the challenges of integrating LLMs into conversational robots, deriving from 34 Swedish-speaking older adults' (one-to-one) interactions with a personalized companion robot, built on the Furhat robot with GPT-3.5. These challenges encompass disruptions in conversations, including frequent interruptions; slow, repetitive, superficial, incoherent, and disengaging responses; language barriers; hallucinations; and outdated information, leading to frustration, confusion, and worry among older adults. Drawing on insights from these challenges, we offer recommendations to enhance the integration of LLMs into conversational robots, encompassing both general suggestions and those tailored to companion robots for older adults.

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Large language models, Companion robot, Elderly care, Open-domain dialogue, Socially assistive robot, Participatory design
National subject category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-361621 (URN), 10.1007/s10514-025-10190-y (DOI), 001440005600001, 2-s2.0-86000731912 (Scopus ID)
Note

QC 20250324

Available from: 2025-03-24. Created: 2025-03-24. Last updated: 2025-03-24. Bibliographically reviewed.
Irfan, B. & Skantze, G. (2025). Between You and Me: Ethics of Self-Disclosure in Human-Robot Interaction. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025 (pp. 1357-1362). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 1357-1362. Conference paper, published (Refereed)
Abstract [en]

As we move toward a future where robots are increasingly part of daily life, the privacy risks associated with interactions, particularly those relying on cloud-based large language models (LLMs), are becoming more pressing. Users may unknowingly share sensitive information in environments such as homes or hospitals. To explore these risks, we conducted a study with 39 native English speakers using a Furhat robot with an integrated LLM. Participants discussed two moral dilemmas: (i) dishonesty, sharing personal stories of justified lying, and (ii) robot disobedience, discussing whether robots should disobey commands. On average, participants disclosed personal stories 45% of the time when asked in both scenarios. The main reason for non-disclosure was difficulty recalling examples quickly (33.3-56%), rather than reluctance to share (7.2-16%). However, most participants reported little discomfort or concern about sharing personal information with the robot, indicating limited awareness of the privacy risks involved in such disclosures.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
ethics, human-robot interaction, large language model, moral dilemmas, privacy, self-disclosure
National subject category
Human-Computer Interaction (Interaction Design); Robotics and Automation; Ethics
Identifiers
urn:nbn:se:kth:diva-363756 (URN), 10.1109/HRI61500.2025.10974215 (DOI), 2-s2.0-105004876468 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025
Note

Part of ISBN 9798350378931

QC 20250525

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-05-25. Bibliographically reviewed.
Irfan, B., Miniota, J., Thunberg, S., Lagerstedt, E., Kuoppamäki, S., Skantze, G. & Abelho Pereira, A. T. (2025). Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES). IEEE Transactions on Affective Computing
2025 (English). In: IEEE Transactions on Affective Computing, E-ISSN 1949-3045. Journal article (Refereed), Epub ahead of print
Abstract [en]

Understanding user enjoyment is crucial in human-robot interaction (HRI), as it can impact interaction quality and influence user acceptance and long-term engagement with robots, particularly in the context of conversations with social robots. However, current assessment methods rely solely on self-reported questionnaires, failing to capture interaction dynamics. This work introduces the Human-Robot Interaction Conversational User Enjoyment Scale (HRI CUES), a novel 5-point scale to assess user enjoyment from an external perspective (e.g. by an annotator) for conversations with a robot. The scale was developed through rigorous evaluations and discussions among three annotators with relevant expertise, using open-domain conversations with a companion robot that was powered by a large language model, and was applied to each conversation exchange (i.e. a robot-participant turn pair) alongside overall interaction. It was evaluated on 25 older adults' interactions with the companion robot, corresponding to 174 minutes of data, showing moderate to good alignment between annotators. Although the scale was developed and tested in the context of older adult interactions with a robot, its basis in general and non-task-specific indicators of enjoyment supports its broader applicability. The study further offers insights into understanding the nuances and challenges of assessing user enjoyment in robot interactions, and provides guidelines on applying the scale to other domains and populations. The dataset is available online.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-374884 (URN), 10.1109/TAFFC.2025.3590359 (DOI), 2-s2.0-105011494748 (Scopus ID)
Note

QC 20260107

Available from: 2026-01-06. Created: 2026-01-06. Last updated: 2026-01-07. Bibliographically reviewed.
Irfan, B., Churamani, N., Zhao, M., Ayub, A. & Rossi, S. (2025). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Overcoming Inequalities with Adaptation. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025 (pp. 1970-1972). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 1970-1972. Conference paper, published (Refereed)
Abstract [en]

Global inequalities in access to essential resources such as education, healthcare, and technology continue to widen social and economic disparities, especially in underserved and underrepresented communities. The growing integration of foundation models and other machine learning systems in robots offers promising and personalized solutions that can adapt to various individuals, situations, and environments, potentially addressing some of these gaps. By learning from interactions and evolving with local conditions, these systems can provide individualized support, such as assisting older adults with daily tasks, aiding children with special needs in learning environments, or empowering people with disabilities to live more independently. Building trust and fostering collaboration between humans and robots will help ensure that these systems meet the unique needs of all individuals, especially within long-term human-robot interaction (HRI). With this year's theme of 'Overcoming Inequalities with Adaptation', in line with the overall theme of the conference 'Robots for a Sustainable World', the fifth edition of the 'Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI)' workshop aims to bring together insights across diverse disciplines, exploring how continually evolving robots can effectively operate in diverse environments, promoting greater equity, inclusivity, and empowerment for individuals and communities. The workshop aims to facilitate collaborations across diverse scientific perspectives through a keynote presentation, panel discussions, and in-depth discussions on the contributed talks, attempting to shape a more sustainable and equitable future through adaptive advancements in long-term HRI.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
adaptation, continual learning, Human-Robot Interaction (HRI), inequalities, lifelong learning, long-term HRI, personalization, user modeling, workshop
National subject category
Human-Computer Interaction (Interaction Design); Robotics and Automation
Identifiers
urn:nbn:se:kth:diva-363762 (URN), 10.1109/HRI61500.2025.10973812 (DOI), 2-s2.0-105004879520 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025
Note

Part of ISBN 9798350378931

QC 20250525

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-05-25. Bibliographically reviewed.
Janssens, R., Pereira, A., Skantze, G., Irfan, B. & Belpaeme, T. (2025). Online Prediction of User Enjoyment in Human-Robot Dialogue with LLMs. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025 (pp. 1363-1367). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, pp. 1363-1367. Conference paper, published (Refereed)
Abstract [en]

Large Language Models (LLMs) allow social robots to engage in unconstrained open-domain dialogue, but often make mistakes when employed in real-world interactions, requiring adaptation of LLMs to specific conversational contexts. However, LLM adaptation techniques require a feedback signal, ideally for multiple alternative utterances. At the same time, human-robot dialogue data is scarce and research often relies on external annotators. A tool for automatic prediction of user enjoyment in human-robot dialogue is therefore needed. We investigate the possibility of predicting user enjoyment turn-by-turn using an LLM, giving it a proposed robot utterance within the dialogue context, but without access to user response. We compare this performance to the system's enjoyment ratings when user responses are available and to assessments by expert human annotators, in addition to self-reported user perceptions. We evaluate the proposed LLM predictor in a human-robot interaction (HRI) dataset with conversation transcripts of 25 older adults' 7-minute dialogues with a companion robot. Our results show that an LLM is capable of predicting user enjoyment, without loss of performance despite the lack of user response and even achieving performance similar to that of human expert annotators. Furthermore, results show that the system surpasses expert annotators in its correlation with the user's self-reported perceptions of the conversation. This work presents a tool to remove the reliance on external annotators for enjoyment evaluation and paves the way toward real-time adaptation in human-robot dialogue.
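The turn-by-turn setup described in the abstract, rating a proposed robot utterance from the dialogue context alone, without the user's response, might be prompted along these lines. The prompt wording, the 1-5 scale, and the parsing are assumptions for illustration, not the paper's actual implementation.

```python
# Illustrative sketch (not the authors' code): build a prompt asking an LLM to
# predict user enjoyment for a *proposed* robot utterance given only the
# dialogue history, then parse a numeric rating from the model's free-text reply.

def build_enjoyment_prompt(context: list[tuple[str, str]],
                           proposed_utterance: str) -> str:
    history = "\n".join(f"{speaker}: {text}" for speaker, text in context)
    return (
        "You are rating a conversation between an older adult and a companion robot.\n"
        f"Dialogue so far:\n{history}\n"
        f"Proposed next robot utterance: {proposed_utterance}\n"
        "On a scale of 1 (not enjoyable) to 5 (very enjoyable), how much would "
        "the user likely enjoy this response? Answer with a single digit."
    )

def parse_rating(llm_reply: str) -> int:
    """Extract the first digit 1-5 from the model's reply; fall back to neutral."""
    for ch in llm_reply:
        if ch in "12345":
            return int(ch)
    return 3  # neutral fallback when no rating is found

prompt = build_enjoyment_prompt(
    [("Robot", "Did you do anything fun this weekend?"),
     ("User", "I visited my grandchildren.")],
    "That sounds lovely! What did you do together?")
print(parse_rating("I would rate this a 4."))  # → 4
```

Because the prompt scores a candidate utterance before it is spoken, the same scheme could in principle rank several alternative utterances, which is the feedback signal the abstract says LLM adaptation techniques need.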

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
human-robot interaction, large language model, open-domain dialogue, prediction, user enjoyment
National subject category
Language Processing and Computational Linguistics; Computer Sciences; Robotics and Automation; Human-Computer Interaction (Interaction Design)
Identifiers
urn:nbn:se:kth:diva-363754 (URN), 10.1109/HRI61500.2025.10973944 (DOI), 2-s2.0-105004873166 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, March 4-6, 2025
Note

Part of ISBN 9798350378931

QC 20250525

Available from: 2025-05-21. Created: 2025-05-21. Last updated: 2025-05-25. Bibliographically reviewed.
Marcinek, L., Irfan, B., Skantze, G., Abelho Pereira, A. T. & Gustafsson, J. (2025). Role of Reasoning in LLM Enjoyment Detection: Evaluation Across Conversational Levels for Human-Robot Interaction. In: Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin (Ed.), Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue. Paper presented at The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, August 25-27, 2025 (pp. 573-590). SIGDIAL
2025 (English). In: Proceedings of the 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue / [ed] Frédéric Béchet, Fabrice Lefèvre, Nicholas Asher, Seokhwan Kim, Teva Merlin, SIGDIAL, 2025, pp. 573-590. Conference paper, published (Refereed)
Abstract [en]

User enjoyment is central to developing conversational AI systems that can recover from failures and maintain interest over time. However, existing approaches often struggle to detect subtle cues that reflect user experience. Large Language Models (LLMs) with reasoning capabilities have outperformed standard models on various other tasks, suggesting potential benefits for enjoyment detection. This study investigates whether models with reasoning capabilities outperform standard models when assessing enjoyment in a human-robot dialogue corpus at both turn and interaction levels. Results indicate that reasoning capabilities have complex, model-dependent effects rather than universal benefits. While performance was nearly identical at the interaction level (0.44 vs 0.43), reasoning models substantially outperformed at the turn level (0.42 vs 0.36). Notably, LLMs correlated better with users’ self-reported enjoyment metrics than human annotators, despite achieving lower accuracy against human consensus ratings. Analysis revealed distinctive error patterns: non-reasoning models showed bias toward positive ratings at the turn level, while both model types exhibited central tendency bias at the interaction level. These findings suggest that reasoning should be applied selectively based on model architecture and assessment context, with assessment granularity significantly influencing relative effectiveness.

Place, publisher, year, edition, pages
SIGDIAL, 2025
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-374881 (URN)
Conference
The 26th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Avignon, France, August 25-27, 2025
Note

Part of ISBN 979-8-89176-329-6

QC 20260107

Available from: 2026-01-06. Created: 2026-01-06. Last updated: 2026-01-07. Bibliographically reviewed.
Santana, R., Irfan, B., Lagerstedt, E., Skantze, G. & Abelho Pereira, A. T. (2025). Speech-to-Joy: Self-Supervised Features for Enjoyment Prediction in Human-Robot Conversation. In: Ram Subramanian, Yukiko I. Nakano, Tom Gedeon, Mohan Kankanhalli, Tanaya Guha, Jainendra Shukla, Gelareh Mohammadi, Oya Celiktutan (Ed.), Proceedings of the 27th International Conference on Multimodal Interaction, ICMI 2025. Paper presented at The 27th International Conference on Multimodal Interaction, ICMI 2025, Canberra, Australia, October 13-17, 2025 (pp. 238-248). Association for Computing Machinery (ACM)
2025 (English). In: Proceedings of the 27th International Conference on Multimodal Interaction, ICMI 2025 / [ed] Ram Subramanian, Yukiko I. Nakano, Tom Gedeon, Mohan Kankanhalli, Tanaya Guha, Jainendra Shukla, Gelareh Mohammadi, Oya Celiktutan, Association for Computing Machinery (ACM), 2025, pp. 238-248. Conference paper, published (Refereed)
Abstract [en]

Conversational systems that interact or collaborate with people must understand not only task success but also the quality of human experience. We present Speech-to-Joy, a lightweight framework that learns to predict users’ own post-interaction enjoyment ratings using latent embeddings from audio and text modalities. Evaluated on a corpus of human-robot dialogues, the model’s predicted enjoyment correlates strongly and significantly with user self-reports, outperforming both an experienced HRI annotator and heavier LLM-based uni- and multimodal baselines. Notably, even the unimodal audio branch - using only frozen speech embeddings - surpasses all baselines, and a late-fusion of text and audio achieves the highest performance. Designed for real-time inference on resource-limited platforms, Speech-to-Joy replaces ad-hoc emotion heuristics with a direct and user-centered measure of enjoyment. This work paves the way for optimizing interactions with robots and other conversational systems through the lens that matters most: the user’s own experience.
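The late-fusion design mentioned in the abstract, where each frozen unimodal branch produces its own enjoyment score before the scores are combined, can be illustrated with a minimal sketch. Everything here (the linear heads, the toy embeddings, the weighted-average fusion) is an assumption for illustration, not the Speech-to-Joy implementation.

```python
# Minimal late-fusion sketch: two frozen unimodal encoders (audio, text) each
# feed a lightweight regression head predicting an enjoyment score; the final
# score is a weighted average of the two unimodal predictions.
# Embeddings here are stand-in lists of floats.

def linear_head(embedding: list[float], weights: list[float], bias: float) -> float:
    """A lightweight regression head over a frozen embedding."""
    return sum(e * w for e, w in zip(embedding, weights)) + bias

def late_fusion(audio_emb, text_emb, audio_head, text_head, alpha: float = 0.5) -> float:
    """Combine unimodal predictions after each branch runs independently."""
    audio_score = linear_head(audio_emb, *audio_head)
    text_score = linear_head(text_emb, *text_head)
    return alpha * audio_score + (1 - alpha) * text_score

# Toy example with 3-dim embeddings and hand-picked head parameters.
audio_head = ([0.2, 0.1, 0.0], 2.0)   # (weights, bias)
text_head = ([0.0, 0.3, 0.1], 1.5)
score = late_fusion([1.0, 2.0, 0.5], [0.5, 1.0, 2.0], audio_head, text_head)
print(round(score, 3))  # → 2.2
```

Keeping the encoders frozen and the heads this small is what makes such a design cheap enough for real-time inference on resource-limited platforms, as the abstract emphasizes.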

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-374889 (URN), 10.1145/3716553.3750747 (DOI), 2-s2.0-105022238812 (Scopus ID)
Conference
The 27th International Conference on Multimodal Interaction, ICMI 2025, Canberra, Australia, October 13-17, 2025
Note

Part of ISBN 979-8-4007-1499-3

QC 20260107

Available from: 2026-01-06. Created: 2026-01-06. Last updated: 2026-01-07. Bibliographically reviewed.
Irfan, B., Staffa, M., Bobu, A. & Churamani, N. (2024). Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI): Open-World Learning. In: HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024, Boulder, United States of America, March 11-15, 2024 (pp. 1323-1325). Association for Computing Machinery (ACM)
2024 (English). In: HRI 2024 Companion - Companion of the 2024 ACM/IEEE International Conference on Human-Robot Interaction, Association for Computing Machinery (ACM), 2024, pp. 1323-1325. Conference paper, published (Refereed)
Abstract [en]

The complex and largely unstructured nature of real-world situations makes it challenging for conventional closed-world robot learning solutions to adapt to such interaction dynamics. These challenges become particularly pronounced in long-term interactions where robots need to go beyond their past learning to continuously evolve with changing environment settings and personalize towards individual user behaviors. In contrast, open-world learning embraces the complexity and unpredictability of the real world, enabling robots to be “lifelong learners” that continuously acquire new knowledge and navigate novel challenges, making them more context-aware while intuitively engaging the users. Adopting the theme of “open-world learning”, the fourth edition of the “Lifelong Learning and Personalization in Long-Term Human-Robot Interaction (LEAP-HRI)” workshop seeks to bring together interdisciplinary perspectives on real-world applications in human-robot interaction (HRI), including education, rehabilitation, elderly care, service, and companionship. The goal of the workshop is to foster collaboration and understanding across diverse scientific communities through invited keynote presentations and in-depth discussions facilitated by contributed talks, a break-out session, and a debate.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Series
ACM/IEEE International Conference on Human-Robot Interaction, ISSN 2167-2148
Keywords
Adaptation, Continual Learning, Human-Robot Interaction, Lifelong Learning, Open-World Learning, Personalization, Workshop
National subject category
Human-Computer Interaction (Interaction Design)
Identifiers
urn:nbn:se:kth:diva-344806 (URN), 10.1145/3610978.3638159 (DOI), 001255070800287, 2-s2.0-85188103162 (Scopus ID)
Conference
19th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2024, Boulder, United States of America, March 11-15, 2024
Note

QC 20240402

Part of ISBN 9798400703232

Available from: 2024-03-28. Created: 2024-03-28. Last updated: 2024-09-03. Bibliographically reviewed.
Abelho Pereira, A. T., Marcinek, L., Miniotaitė, J., Thunberg, S., Lagerstedt, E., Gustafsson, J., . . . Irfan, B. (2024). Multimodal User Enjoyment Detection in Human-Robot Conversation: The Power of Large Language Models. Paper presented at 26th International Conference on Multimodal Interaction (ICMI), San Jose, USA, November 4-8, 2024 (pp. 469-478). Association for Computing Machinery (ACM)
2024 (English). Conference paper, published (Refereed)
Abstract [en]

Enjoyment is a crucial yet complex indicator of positive user experience in Human-Robot Interaction (HRI). While manual enjoyment annotation is feasible, developing reliable automatic detection methods remains a challenge. This paper investigates a multimodal approach to automatic enjoyment annotation for HRI conversations, leveraging large language models (LLMs), visual, audio, and temporal cues. Our findings demonstrate that both text-only and multimodal LLMs with carefully designed prompts can achieve performance comparable to human annotators in detecting user enjoyment. Furthermore, results reveal a stronger alignment between LLM-based annotations and user self-reports of enjoyment compared to human annotators. While multimodal supervised learning techniques did not improve all of our performance metrics, they could successfully replicate human annotators and highlighted the importance of visual and audio cues in detecting subtle shifts in enjoyment. This research demonstrates the potential of LLMs for real-time enjoyment detection, paving the way for adaptive companion robots that can dynamically enhance user experiences.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Affect Recognition, Human-Robot Interaction, Large Language Models, Multimodal, Older Adults, User Enjoyment
National subject category
Language Processing and Computational Linguistics
Identifiers
urn:nbn:se:kth:diva-359146 (URN), 10.1145/3678957.3685729 (DOI), 001433669800051, 2-s2.0-85212589337 (Scopus ID)
Conference
26th International Conference on Multimodal Interaction (ICMI), San Jose, USA, November 4-8, 2024
Note

QC 20250127

Available from: 2025-01-27. Created: 2025-01-27. Last updated: 2025-04-30. Bibliographically reviewed.
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-7983-079X
