Publications (8 of 8)
Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Are You an Expert? Instruction Adaptation Using Multi-Modal Affect Detections with Thermal Imaging and Context. Paper presented at the 34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, August 25-29, 2025.
Are You an Expert? Instruction Adaptation Using Multi-Modal Affect Detections with Thermal Imaging and Context
2025 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-369162 (URN)
Conference
34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, August 25-29, 2025.
Available from: 2025-08-29 Created: 2025-08-29 Last updated: 2025-09-05. Bibliographically approved
Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Context Matters: Understanding Socially Appropriate Affective Responses Via Sentence Embeddings. In: Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. Paper presented at the 16th International Conference on Social Robotics, ICSR + AI 2024, Odense, Denmark, October 23-26, 2024 (pp. 78-91). Springer Nature.
Context Matters: Understanding Socially Appropriate Affective Responses Via Sentence Embeddings
2025 (English) In: Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings, Springer Nature, 2025, p. 78-91. Conference paper, Published paper (Refereed)
Abstract [en]

As AI systems increasingly engage in social interactions, comprehending human social dynamics is crucial. Affect recognition enables systems to respond appropriately to emotional nuances in social situations. However, existing multimodal approaches fail to account for the social appropriateness of detected emotions within their contexts. This paper presents a novel methodology that leverages sentence embeddings to distinguish socially appropriate from inappropriate interactions, enabling more context-aware AI systems. Our approach measures the semantic distance between facial expression descriptions and predefined reference points. We evaluate the method on a benchmark dataset and in a real-world robot deployment in a library, combining GPT-4(V) for expression descriptions with ada-2 for sentence embeddings to detect socially inappropriate interactions. Our results underscore the importance of contextual factors for effective social interaction understanding through context-aware affect recognition, contributing to the development of socially intelligent AI capable of interpreting and responding to human affect appropriately.
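As a rough illustration of the embedding-distance idea described in the abstract, the sketch below scores a facial-expression description by its cosine similarity to two reference sentences. The reference sentences, the decision rule, and the use of the OpenAI client are illustrative assumptions; the paper's actual reference points and thresholds are not given here.

```python
# Minimal sketch of embedding-distance appropriateness scoring, assuming
# OpenAI's ada-2 embeddings. The reference sentences and decision rule
# below are hypothetical placeholders, not the paper's actual method.
import numpy as np
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-ada-002", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical reference points for the two classes.
appropriate_ref = embed("The person smiles warmly during a friendly conversation.")
inappropriate_ref = embed("The person laughs while the other person is visibly distressed.")

def is_socially_inappropriate(expression_description: str) -> bool:
    """Classify a GPT-4(V)-style expression description by which
    reference point it is semantically closer to."""
    e = embed(expression_description)
    return cosine(e, inappropriate_ref) > cosine(e, appropriate_ref)
```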

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
embeddings, human-robot interaction, machine learning, social representation
National Category
Sociology (Excluding Social Work, Social Anthropology, Demography and Criminology); Robotics and Automation
Identifiers
urn:nbn:se:kth:diva-362501 (URN)
10.1007/978-981-96-3522-1_9 (DOI)
001531722800009 ()
2-s2.0-105002016733 (Scopus ID)
Conference
16th International Conference on Social Robotics, ICSR + AI 2024, Odense, Denmark, October 23-26, 2024
Note

Part of ISBN 9789819635214

QC 20250428

Available from: 2025-04-16 Created: 2025-04-16 Last updated: 2025-12-08. Bibliographically approved
Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Fusion in Context: A Multimodal Approach to Affective State Recognition. Paper presented at the 34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, August 25-29, 2025.
Fusion in Context: A Multimodal Approach to Affective State Recognition
2025 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-369160 (URN)
Conference
34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, August 25-29, 2025.
Note

QC 20250905

Available from: 2025-08-29 Created: 2025-08-29 Last updated: 2025-09-05. Bibliographically approved
Mohamed, Y. (2025). Multi-Modal Affective State Detection For Dyadic Interactions Using Thermal Imaging and Context. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
Multi-Modal Affective State Detection For Dyadic Interactions Using Thermal Imaging and Context
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Until recently, most robotic systems have operated with limited emotional intelligence, primarily responding to pre-programmed cues rather than adapting to human emotional states. Affect recognition in human-robot interaction therefore remains a significant, twofold challenge: robots must not only detect emotional expressions but also interpret them within their social context. This requires systems capable of collecting information from their surroundings, analyzing it, and generalizing across different interaction scenarios and cultural contexts to handle more complex situations.

This thesis tackles affect recognition using multi-modal approaches that combine thermal imaging, facial expression analysis, and contextual understanding. Thermal imaging offers unique insights into physiological responses associated with emotional states, complementing traditional vision-based approaches while maintaining non-contact operation. The integration of thermal imaging, facial expression analysis, and contextual understanding creates a comprehensive multi-modal framework that addresses the key challenges in affect recognition, such as varying lighting conditions, occlusions, and ambiguous emotional expressions. This combination provides complementary information streams that enhance robustness in real-world environments, making it an effective case study for developing context-aware emotional intelligence in robotics.

We introduce a novel context-aware transformer architecture that processes multiple data streams while maintaining temporal relationships and contextual understanding. Each modality contributes complementary information about the user's emotional state, while the context processing ensures situation-appropriate interpretation: for instance, distinguishing a smile that signals enjoyment during a collaborative task from one masking nervousness in a stressful situation. This contextual awareness is crucial for appropriate robot responses in real-world deployments.

The research contributions span four areas: (1) developing robust thermal feature extraction techniques that capture subtle emotional responses, (2) creating a transformer-based architecture for multi-modal fusion that effectively incorporates situational information, (3) implementing real-time processing pipelines that enable practical deployment in human-robot interaction scenarios, and (4) validating these approaches through extensive real-world interaction studies. Results show recognition accuracy improving from 77% with traditional approaches to 89% with our context-aware multi-modal system, demonstrating the ability to understand and appropriately respond to human emotions in dynamic social situations.
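A minimal PyTorch sketch of the kind of context-aware multi-modal fusion the abstract describes is given below. The dimensions, modality count, class count, and the choice to classify from a dedicated context token are illustrative assumptions, not the thesis's exact architecture.

```python
# A sketch of transformer-based multi-modal fusion with a context token.
# All hyperparameters are assumptions made for illustration only.
import torch
import torch.nn as nn

class ContextAwareFusion(nn.Module):
    def __init__(self, d_model=128, n_classes=6, n_modalities=3):
        super().__init__()
        # Project each modality's feature stream into a shared space.
        self.proj = nn.ModuleList(
            [nn.LazyLinear(d_model) for _ in range(n_modalities)])
        # Learned embeddings mark which modality each token came from.
        self.modality_emb = nn.Embedding(n_modalities, d_model)
        self.context_proj = nn.LazyLinear(d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, streams, context):
        # streams: list of (batch, time_i, feat_i) tensors, one per modality
        # context: (batch, context_dim) situational descriptor
        tokens = [self.proj[i](x) + self.modality_emb.weight[i]
                  for i, x in enumerate(streams)]
        ctx = self.context_proj(context).unsqueeze(1)  # context as extra token
        enc = self.encoder(torch.cat([ctx] + tokens, dim=1))
        return self.head(enc[:, 0])  # classify from the context token
```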

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. x, 56
Series
TRITA-EECS-AVL ; 2025:74
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-368995 (URN)
9789181063431 (ISBN)
Public defence
2025-09-26, D37, Lindstedtsvägen 9, Stockholm, 13:00 (English)
Note

QC 20250905

Available from: 2025-09-05 Created: 2025-08-25 Last updated: 2025-09-29. Bibliographically approved
Mohamed, Y., Güneysu Özgür, A., Lemaignan, S. & Leite, I. (2024). Multi-modal Affect Detection Using Thermal and Optical Imaging in a Gamified Robotic Exercise. International Journal of Social Robotics, 16(5), 981-997
Multi-modal Affect Detection Using Thermal and Optical Imaging in a Gamified Robotic Exercise
2024 (English) In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805, Vol. 16, no. 5, p. 981-997. Article in journal (Refereed)
Abstract [en]

Affect recognition, the ability to detect and interpret emotional states, has the potential to be a valuable tool in healthcare. In particular, it can be useful in gamified therapy, which uses gaming techniques to motivate patients and keep them engaged in therapeutic activities. This study examines the accuracy of machine learning models that use thermal imaging and action unit data for affect classification in a gamified robot therapy scenario. A self-report survey and three machine learning models were used to assess emotions, including frustration, boredom, and enjoyment, in participants during different phases of the game. The multimodal approach combining thermal imaging and action units with an LSTM model achieved the highest emotion classification accuracy, 77% over a 7-s sliding window, while thermal imaging had the lowest standard deviation across participants. The results suggest that thermal imaging and action units can be effective in detecting affective states and are a promising non-intrusive method for healthcare applications such as gamified therapy.
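The following sketch shows the shape of the described setup: an LSTM classifying a 7-second sliding window of per-frame features. The feature size, frame rate, and hidden dimension are assumptions; the paper's exact preprocessing and model configuration are not reproduced here.

```python
# Minimal sketch of a sliding-window LSTM affect classifier, assuming
# thermal and action-unit features are already extracted per frame.
# FPS, feature count, and hidden size are illustrative assumptions.
import torch
import torch.nn as nn

FPS = 30
WINDOW = 7 * FPS  # 7-second sliding window

class AffectLSTM(nn.Module):
    def __init__(self, n_features=22, hidden=64, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # frustration/boredom/enjoyment

    def forward(self, x):
        # x: (batch, WINDOW, n_features) = thermal ROI temps + action units
        _, (h, _) = self.lstm(x)
        return self.head(h[-1])  # classify from the final hidden state

model = AffectLSTM()
window = torch.randn(1, WINDOW, 22)  # one 7-s window of fused features
logits = model(window)
```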

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Action units, Emotionally aware systems, Frustration, Human–robot interaction, Multi-modal affect recognition, Thermal imaging
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-350001 (URN)
10.1007/s12369-023-01066-1 (DOI)
001090565600001 ()
2-s2.0-85175291284 (Scopus ID)
Note

QC 20240705

Available from: 2024-07-05 Created: 2024-07-05 Last updated: 2025-09-22. Bibliographically approved
Mohamed, Y., Ballardini, G., Parreira, M. T., Lemaignan, S. & Leite, I. (2022). Automatic Frustration Detection Using Thermal Imaging. In: Proceedings of the 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI '22). Paper presented at the 17th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 7-10, 2022, online (pp. 451-460). Institute of Electrical and Electronics Engineers (IEEE).
Automatic Frustration Detection Using Thermal Imaging
2022 (English) In: Proceedings of the 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI '22), Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 451-460. Conference paper, Published paper (Refereed)
Abstract [en]

To achieve seamless interactions, robots have to be capable of reliably detecting affective states in real time. One of the possible states that humans go through while interacting with robots is frustration. Detecting frustration from RGB images can be challenging in some real-world situations; thus, we investigate in this work whether thermal imaging can be used to create a model that is capable of detecting frustration induced by cognitive load and failure. To train our model, we collected a data set from 18 participants experiencing both types of frustration induced by a robot. The model was tested using features from several modalities: thermal, RGB, Electrodermal Activity (EDA), and all three combined. When data from both frustration cases were combined and used as training input, the model reached an accuracy of 89% with just RGB features, 87% using only thermal features, 84% using EDA, and 86% when using all modalities. Furthermore, the highest accuracy for the thermal data was reached using three facial regions of interest: nose, forehead and lower lip.
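As a sketch of the thermal-feature idea, the snippet below averages per-pixel temperatures over the three facial regions the paper found most informative. The ROI boxes and the synthetic frame are hypothetical; in practice the regions would track facial landmarks per frame.

```python
# Sketch of per-frame thermal features for the nose, forehead, and lower
# lip. ROI coordinates are hypothetical stand-ins for a landmark tracker.
import numpy as np

def roi_temperatures(thermal_frame: np.ndarray, rois: dict) -> dict:
    """Mean temperature per region of interest.

    thermal_frame: 2-D array of per-pixel temperatures (e.g. degrees C).
    rois: region name -> (x, y, w, h) bounding box from a landmark tracker.
    """
    feats = {}
    for name, (x, y, w, h) in rois.items():
        patch = thermal_frame[y:y + h, x:x + w]
        feats[name] = float(patch.mean())
    return feats

# Hypothetical boxes; in practice these follow facial landmarks per frame.
rois = {"nose": (70, 90, 20, 20),
        "forehead": (55, 30, 50, 25),
        "lower_lip": (72, 125, 16, 10)}
frame = np.random.normal(34.0, 0.5, size=(160, 160))  # fake thermal frame
print(roi_temperatures(frame, rois))
```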

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Series
ACM IEEE International Conference on Human-Robot Interaction, ISSN 2167-2121
Keywords
Human-robot interaction, Thermal imaging, Frustration, cognitive load, Action units
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-322478 (URN)
10.1109/HRI53351.2022.9889545 (DOI)
000869793600050 ()
2-s2.0-85140750883 (Scopus ID)
Conference
17th Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI), March 7-10, 2022, online
Note

Part of proceedings: ISBN 978-1-6654-0731-1

QC 20221216

Available from: 2022-12-16 Created: 2022-12-16 Last updated: 2025-08-25. Bibliographically approved
Mohamed, Y. (2021). Predicting Human Interactivity State from Surrounding Social Signals. Boulder, Colorado, USA: IEEE.
Predicting Human Interactivity State from Surrounding Social Signals
2021 (English) Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Boulder, Colorado, USA: IEEE, 2021
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-292956 (URN)
10.1145/3434074.3447230 (DOI)
000767970100110 ()
Note

QC 20210427

Available from: 2021-04-18 Created: 2021-04-18 Last updated: 2022-11-02. Bibliographically approved
Mohamed, Y. & Lemaignan, S. (2021). ROS for Human-Robot Interaction. In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). Paper presented at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 27 - October 1, 2021, Prague, Czech Republic (online) (pp. 3020-3027). IEEE.
ROS for Human-Robot Interaction
2021 (English) In: 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, 2021, p. 3020-3027. Conference paper, Published paper (Refereed)
Abstract [en]

Integrating real-time, complex social signal processing into robotic systems - especially in real-world, multiparty interaction situations - is a challenge faced by many in the Human-Robot Interaction (HRI) community. The difficulty is compounded by the lack of any standard model for human representation that would facilitate the development and interoperability of social perception components and pipelines. We introduce in this paper a set of conventions and standard interfaces for HRI scenarios, designed to be used with the Robot Operating System (ROS). It directly aims at promoting interoperability and re-usability of core functionality between the many HRI-related software tools, from skeleton tracking, to face recognition, to natural language processing. Importantly, these interfaces are designed to be relevant to a broad range of HRI applications, from high-level crowd simulation, to group-level social interaction modelling, to detailed modelling of human kinematics. We demonstrate these interfaces by providing a reference pipeline implementation, packaged to be easily downloaded and evaluated by the community.
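A minimal sketch of consuming such interfaces from Python with rospy follows. The /humans/faces/tracked topic and the hri_msgs/IdsList message are assumptions based on the ROS4HRI conventions (REP-155); the released hri_msgs package is the authoritative reference for the actual definitions.

```python
# Sketch of subscribing to the list of currently tracked faces under the
# ROS4HRI conventions. Topic name and message type are assumptions here;
# verify against the released hri_msgs package (REP-155).
import rospy
from hri_msgs.msg import IdsList  # assumed message type

def on_tracked_faces(msg: IdsList):
    # Each id names a face; per-face topics hang under /humans/faces/<id>/.
    rospy.loginfo("currently tracked faces: %s", ", ".join(msg.ids))

rospy.init_node("face_listener")
rospy.Subscriber("/humans/faces/tracked", IdsList, on_tracked_faces)
rospy.spin()
```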

Place, publisher, year, edition, pages
IEEE, 2021
Series
IEEE International Conference on Intelligent Robots and Systems, ISSN 2153-0858
National Category
Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-310037 (URN)
10.1109/IROS51168.2021.9636816 (DOI)
000755125502060 ()
2-s2.0-85124368180 (Scopus ID)
Conference
IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), September 27 - October 1, 2021, Prague, Czech Republic (online)
Note

QC 20220324

Part of conference proceedings: ISBN 978-1-6654-1714-3

Available from: 2022-03-24 Created: 2022-03-24 Last updated: 2022-06-25. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-5660-5330