Multi-modal Affect Detection Using Thermal and Optical Imaging in a Gamified Robotic Exercise
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5660-5330
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Digital Futures. ORCID iD: 0000-0003-2282-9939
PAL Robotics, Barcelona, Spain. ORCID iD: 0000-0002-3391-8876
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent Systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-2212-4325
2024 (English) In: International Journal of Social Robotics, ISSN 1875-4791, E-ISSN 1875-4805, Vol. 16, no. 5, pp. 981-997. Article in journal (Refereed), Published
Abstract [en]

Affect recognition, or the ability to detect and interpret emotional states, has the potential to be a valuable tool in healthcare. In particular, it can be useful in gamified therapy, which uses gaming techniques to motivate patients and keep them engaged in therapeutic activities. This study examines the accuracy of machine learning models that use thermal imaging and action unit data for affect classification in a gamified robot therapy scenario. A self-report survey and three machine learning models were used to assess emotions including frustration, boredom, and enjoyment in participants during different phases of the game. The results showed that the multimodal approach, combining thermal imaging and action units with an LSTM model, achieved the highest accuracy of 77% for emotion classification over a 7-s sliding window, while thermal imaging had the lowest standard deviation across participants. The results suggest that thermal imaging and action units can be effective in detecting affective states and are a promising non-intrusive method for recognizing internal states in healthcare applications such as gamified therapy.
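
As an illustration of the multimodal approach described above, the sketch below shows how per-frame thermal features and facial action-unit intensities could be concatenated and classified with an LSTM over a 7-second sliding window. It is a minimal sketch in PyTorch; the feature dimensions, layer sizes, frame rate, and class set are illustrative assumptions, not the paper's exact architecture or preprocessing.

import torch
import torch.nn as nn

class MultimodalAffectLSTM(nn.Module):
    # Early-fusion LSTM over concatenated thermal and action-unit (AU) features.
    # All dimensions below are assumptions chosen for illustration.
    def __init__(self, thermal_dim=8, au_dim=17, hidden_dim=64, n_classes=3):
        super().__init__()
        self.lstm = nn.LSTM(thermal_dim + au_dim, hidden_dim, batch_first=True)
        self.classifier = nn.Linear(hidden_dim, n_classes)  # e.g. frustration, boredom, enjoyment

    def forward(self, thermal_seq, au_seq):
        # thermal_seq: (batch, T, thermal_dim), au_seq: (batch, T, au_dim),
        # where T is the number of frames in one 7-second sliding window.
        fused = torch.cat([thermal_seq, au_seq], dim=-1)
        _, (h_n, _) = self.lstm(fused)
        return self.classifier(h_n[-1])  # logits over the affect classes

# Usage: a batch of 4 windows, 7 s at an assumed ~30 fps (about 210 frames each).
model = MultimodalAffectLSTM()
logits = model(torch.randn(4, 210, 8), torch.randn(4, 210, 17))
print(logits.shape)  # torch.Size([4, 3])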

Place, publisher, year, edition, pages
Springer Nature, 2024. Vol. 16, no. 5, pp. 981-997
Keywords [en]
Action units, Emotionally aware systems, Frustration, Human–robot interaction, Multi-modal affect recognition, Thermal imaging
National subject category
Human-Computer Interaction (Interaction Design)
Identifiers
URN: urn:nbn:se:kth:diva-350001, DOI: 10.1007/s12369-023-01066-1, ISI: 001090565600001, Scopus ID: 2-s2.0-85175291284, OAI: oai:DiVA.org:kth-350001, DiVA id: diva2:1882428
Note

QC 20260127

Available from: 2024-07-05 Created: 2024-07-05 Last updated: 2026-01-27 Bibliographically approved
Part of thesis
1. Multi-Modal Affective State Detection For Dyadic Interactions Using Thermal Imaging and Context
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Until recently, most robotic systems have operated with limited emotional intelligence, primarily responding to pre-programmed cues rather than adapting to human emotional states. Affect recognition in human-robot interaction thus remains a significant, twofold challenge: robots must not only detect emotional expressions but also interpret them within their social context. This requires systems capable of collecting information from their surroundings, analyzing it, and then generalizing across different interaction scenarios and cultural contexts to handle more complex situations.

This thesis tackles affect recognition using multi-modal approaches that combine thermal imaging, facial expression analysis, and contextual understanding. Thermal imaging offers unique insights into physiological responses associated with emotional states, complementing traditional vision-based approaches while maintaining non-contact operation. The integration of thermal imaging, facial expression analysis, and contextual understanding creates a comprehensive multi-modal framework that addresses the key challenges in affect recognition, such as varying lighting conditions, occlusions, and ambiguous emotional expressions. This combination provides complementary information streams that enhance robustness in real-world environments, making it an effective case study for developing context-aware emotional intelligence in robotics.

We introduce a novel context-aware transformer architecture that processes multiple data streams while maintaining temporal relationships and contextual understanding. Each modality contributes complementary information about the user's emotional state, while the context processing ensures situation-appropriate interpretation, for instance distinguishing a smile indicating enjoyment during a collaborative task from one masking nervousness in a stressful situation. This contextual awareness is crucial for appropriate robot responses in real-world deployments.
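
To make this concrete, the following minimal sketch shows one way per-frame thermal and action-unit embeddings could be fused by a transformer encoder whose self-attention is conditioned on a learned context token (e.g. an identifier for the interaction phase). The modality dimensions, the context vocabulary, and the token-based conditioning are assumptions made for illustration (in PyTorch); the thesis's actual architecture may differ.

import torch
import torch.nn as nn

class ContextAwareFusionTransformer(nn.Module):
    # Sketch of context-aware multi-modal fusion; all sizes are illustrative assumptions.
    def __init__(self, thermal_dim=8, au_dim=17, d_model=64, n_contexts=4, n_classes=3):
        super().__init__()
        # Project each modality's per-frame features into a shared embedding space.
        self.thermal_proj = nn.Linear(thermal_dim, d_model)
        self.au_proj = nn.Linear(au_dim, d_model)
        # The context (e.g. game phase) enters as a learned token prepended to the
        # sequence, so attention can condition every frame on the situation.
        self.context_embed = nn.Embedding(n_contexts, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, thermal_seq, au_seq, context_id):
        # thermal_seq: (B, T, thermal_dim), au_seq: (B, T, au_dim), context_id: (B,)
        tokens = self.thermal_proj(thermal_seq) + self.au_proj(au_seq)  # (B, T, d_model)
        ctx = self.context_embed(context_id).unsqueeze(1)               # (B, 1, d_model)
        encoded = self.encoder(torch.cat([ctx, tokens], dim=1))         # (B, T+1, d_model)
        return self.classifier(encoded[:, 0])  # classify from the context token

# Usage: two 50-frame sequences with two different (hypothetical) context ids.
model = ContextAwareFusionTransformer()
logits = model(torch.randn(2, 50, 8), torch.randn(2, 50, 17), torch.tensor([0, 2]))
print(logits.shape)  # torch.Size([2, 3])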

The research contributions span four areas: (1) developing robust thermal feature extraction techniques that capture subtle emotional responses, (2) creating a transformer-based architecture for multi-modal fusion that effectively incorporates situational information, (3) implementing real-time processing pipelines that enable practical deployment in human-robot interaction scenarios, and (4) validating these approaches through extensive real-world interaction studies. Results show recognition accuracy improving from 77% with traditional approaches to 89% with our context-aware multi-modal system, demonstrating the ability to understand and respond appropriately to human emotions in dynamic social situations.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. x, 56
Series
TRITA-EECS-AVL ; 2025:74
National subject category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-368995 (URN), 9789181063431 (ISBN)
Public defence
2025-09-26, D37, Lindstedtsvägen 9, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20250905

Available from: 2025-09-05 Created: 2025-08-25 Last updated: 2025-09-29 Bibliographically approved

Open Access in DiVA

Full text not available in DiVA

Other links

Publisher's full text
Scopus

Person

Mohamed, Youssef; Güneysu Özgür, Arzu; Leite, Iolanda

Search further in DiVA

By the author/editor
Mohamed, Youssef; Güneysu Özgür, Arzu; Lemaignan, Séverin; Leite, Iolanda
By the organisation
Robotics, Perception and Learning (RPL); Digital Futures
In the same journal
International Journal of Social Robotics
Human-Computer Interaction (Interaction Design)
