Are You an Expert? Instruction Adaptation Using Multi-Modal Affect Detections with Thermal Imaging and Context
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0001-5660-5330
PAL Robotics, Barcelona, Spain.
Umeå University, Umeå, Sweden. ORCID iD: 0000-0003-2282-9939
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0000-0002-1170-7162
2025 (English) Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
2025.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-369162
OAI: oai:DiVA.org:kth-369162
DiVA, id: diva2:1993093
Conference
IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, Aug 25-29, 2025.
Available from: 2025-08-29 Created: 2025-08-29 Last updated: 2025-09-05. Bibliographically approved.
In thesis
1. Multi-Modal Affective State Detection For Dyadic Interactions Using Thermal Imaging and Context
2025 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Until recently, most robotic systems have operated with limited emotional intelligence, primarily responding to pre-programmed cues rather than adapting to human emotional states. Affect recognition in human-robot interaction thus remains a significant, twofold challenge: robots must not only detect emotional expressions, but also interpret them within their social context. This requires systems capable of collecting information from their surroundings, analyzing it, and generalizing across different interaction scenarios and cultural contexts to handle more complex situations.

This thesis tackles affect recognition using multi-modal approaches that combine thermal imaging, facial expression analysis, and contextual understanding. Thermal imaging offers unique insight into the physiological responses associated with emotional states, complementing traditional vision-based approaches while remaining non-contact. Integrating these three modalities yields a comprehensive multi-modal framework that addresses key challenges in affect recognition, such as varying lighting conditions, occlusions, and ambiguous emotional expressions. The complementary information streams enhance robustness in real-world environments, making the combination an effective case study for developing context-aware emotional intelligence in robotics.

We introduce a novel context-aware transformer architecture that processes multiple data streams while maintaining temporal relationships and contextual understanding. Each modality contributes complementary information about the user's emotional state, while the context processing ensures situation-appropriate interpretation: for instance, distinguishing a smile that signals enjoyment during a collaborative task from one that masks nervousness in a stressful situation. This contextual awareness is crucial for appropriate robot responses in real-world deployments.
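As a rough illustration of attention-based multi-modal fusion of this kind (a minimal sketch, not the thesis's implementation; all class names, feature dimensions, and weights below are invented for the example, and the real system operates on temporal streams rather than single vectors):

```python
import numpy as np

rng = np.random.default_rng(42)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ModalityFusion:
    """Toy context-aware fusion: one self-attention layer over
    per-modality tokens (thermal, facial, context), mean-pooled
    into an affect-class distribution. Dimensions are illustrative."""

    def __init__(self, dims, d=16, n_classes=4):
        # One linear projection per modality into a shared token space.
        self.proj = [rng.standard_normal((di, d)) / np.sqrt(di) for di in dims]
        # Single-head attention weights and an output classifier head.
        self.Wq = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wk = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wv = rng.standard_normal((d, d)) / np.sqrt(d)
        self.Wo = rng.standard_normal((d, n_classes)) / np.sqrt(d)

    def __call__(self, feats):
        # Stack modality features as a short token sequence: (3, d).
        tokens = np.stack([f @ P for f, P in zip(feats, self.proj)])
        q, k, v = tokens @ self.Wq, tokens @ self.Wk, tokens @ self.Wv
        # (3, 3) cross-modal attention: context can re-weight the other cues.
        attn = softmax(q @ k.T / np.sqrt(k.shape[1]))
        fused = (attn @ v).mean(axis=0)       # pool attended tokens
        return softmax(fused @ self.Wo)       # affect-class probabilities

# Hypothetical feature sizes for thermal, facial, and context inputs.
fusion = ModalityFusion(dims=[8, 12, 5])
probs = fusion([rng.standard_normal(8),
                rng.standard_normal(12),
                rng.standard_normal(5)])
```

The key design point the sketch mirrors is that context enters as a token alongside the sensory modalities, so attention can modulate how thermal and facial evidence are weighted rather than treating context as a post-hoc filter.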

The research contributions span four areas: (1) developing robust thermal feature extraction techniques that capture subtle emotional responses, (2) creating a transformer-based architecture for multi-modal fusion that effectively incorporates situational information, (3) implementing real-time processing pipelines that enable practical deployment in human-robot interaction scenarios, and (4) validating these approaches through extensive real-world interaction studies. Results show recognition accuracy improving from 77% with traditional approaches to 89% with our context-aware multi-modal system, demonstrating the ability to understand and appropriately respond to human emotions in dynamic social situations.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. x, 56
Series
TRITA-EECS-AVL ; 2025:74
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-368995 (URN)
9789181063431 (ISBN)
Public defence
2025-09-26, D37, Lindstedtsvägen 9, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20250905

Available from: 2025-09-05 Created: 2025-08-25 Last updated: 2025-09-29. Bibliographically approved.

Open Access in DiVA

fulltext (1046 kB), 69 downloads
File information
File name: FULLTEXT01.pdf
File size: 1046 kB
Checksum (SHA-512): f2c9a91ae058668c92d4623e464c8959ab130c5e27bddf9a5fef82976f420276131c80c848e36b7c646a30918ea420eb79d83f9c763d1cac068c5368681398a7
Type: fulltext
Mimetype: application/pdf

Authority records

Mohamed, Youssef; Güneysu, Arzu; Jensfelt, Patric; Smith, Christian

Total: 69 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are now no longer available.
