2024 (English) Conference paper, Published paper (Refereed)
Abstract [en]
Enjoyment is a crucial yet complex indicator of positive user experience in Human-Robot Interaction (HRI). While manual enjoyment annotation is feasible, developing reliable automatic detection methods remains a challenge. This paper investigates a multimodal approach to automatic enjoyment annotation for HRI conversations, leveraging large language models (LLMs), visual, audio, and temporal cues. Our findings demonstrate that both text-only and multimodal LLMs with carefully designed prompts can achieve performance comparable to human annotators in detecting user enjoyment. Furthermore, results reveal a stronger alignment between LLM-based annotations and user self-reports of enjoyment compared to human annotators. While multimodal supervised learning techniques did not improve all of our performance metrics, they could successfully replicate human annotators and highlighted the importance of visual and audio cues in detecting subtle shifts in enjoyment. This research demonstrates the potential of LLMs for real-time enjoyment detection, paving the way for adaptive companion robots that can dynamically enhance user experiences.
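As an illustration only (not code from the paper), the sketch below shows what a minimal text-only, prompt-based enjoyment annotator of the kind described in the abstract might look like. The 5-point rating scale, the prompt wording, the model name, and the use of the OpenAI chat API are assumptions made for this example, not the authors' actual setup.

```python
# Illustrative sketch only: a text-only LLM annotator for per-turn user
# enjoyment in an HRI conversation. Scale, prompt, and model are assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

ANNOTATION_PROMPT = (
    "You are annotating a conversation between an older adult and a companion "
    "robot. Rate the user's enjoyment in their latest turn on a 1-5 scale "
    "(1 = clear displeasure, 3 = neutral, 5 = clear enjoyment). "
    "Answer with a single integer."
)

def annotate_enjoyment(dialogue_history: str, user_turn: str) -> int:
    """Return a 1-5 enjoyment rating for the given user turn (hypothetical helper)."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[
            {"role": "system", "content": ANNOTATION_PROMPT},
            {
                "role": "user",
                "content": f"Dialogue so far:\n{dialogue_history}\n\n"
                           f"Latest user turn:\n{user_turn}",
            },
        ],
        temperature=0.0,  # deterministic output for annotation consistency
    )
    return int(response.choices[0].message.content.strip())
```

In a multimodal variant of this idea, descriptions of visual and audio cues (e.g., facial expression or prosody summaries) could be appended to the prompt, but how the paper actually fuses those modalities is not specified here.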
Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Affect Recognition, Human-Robot Interaction, Large Language Models, Multimodal, Older Adults, User Enjoyment
National Category
Natural Language Processing
Identifiers
urn:nbn:se:kth:diva-359146 (URN), 10.1145/3678957.3685729 (DOI), 2-s2.0-85212589337 (Scopus ID)
Conference
26th International Conference on Multimodal Interaction (ICMI), San José, Costa Rica, November 4-8, 2024
Note
QC 20250127
Available from: 2025-01-27. Created: 2025-01-27. Last updated: 2025-02-07. Bibliographically approved.