Publications (10 of 189)
Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Are You an Expert? Instruction Adaptation Using Multi-Modal Affect Detections with Thermal Imaging and Context. Paper presented at IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, Aug 25-29, 2025.
2025 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-369162 (URN)
Conference
IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, Aug 25-29, 2025.
Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Context Matters: Understanding Socially Appropriate Affective Responses Via Sentence Embeddings. In: Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings. Paper presented at 16th International Conference on Social Robotics, ICSR + AI 2024, Odense, Denmark, October 23-26, 2024 (pp. 78-91). Springer Nature
2025 (English) In: Social Robotics - 16th International Conference, ICSR + AI 2024, Proceedings, Springer Nature, 2025, p. 78-91. Conference paper, Published paper (Refereed)
Abstract [en]

As AI systems increasingly engage in social interactions, comprehending human social dynamics is crucial. Affect recognition enables systems to respond appropriately to emotional nuances in social situations. However, existing multimodal approaches fail to account for the social appropriateness of detected emotions within their contexts. This paper presents a novel methodology that leverages sentence embeddings to distinguish socially appropriate from inappropriate interactions, enabling more context-aware AI systems. Our approach measures the semantic distance between facial expression descriptions and predefined reference points. We evaluate our method on a benchmark dataset and in a real-world robot deployment in a library, combining GPT-4(V) for expression descriptions and ada-2 for sentence embeddings to detect socially inappropriate interactions. Our results underscore the importance of contextual factors in affect recognition and contribute to the development of socially intelligent AI capable of interpreting and responding to human affect appropriately.
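
The mechanism is simple enough to sketch. Below is a minimal illustration of the embedding-distance idea, not the authors' code: a toy bag-of-words embedding stands in for ada-2, and the GPT-4(V) expression description is given as a plain string, so the example runs without any API access.

```python
# Toy sketch: classify an expression description by its semantic distance
# to predefined reference points, as in the paper's methodology.
import numpy as np

def embed(sentence: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for a sentence-embedding model such as ada-2."""
    vec = np.zeros(dim)
    for token in sentence.lower().split():
        vec[hash(token) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b))  # inputs are already unit-normalized

# Predefined reference points for this context (a quiet library).
appropriate = embed("calm friendly smile while reading quietly")
inappropriate = embed("angry shouting face disturbing other visitors")

# Observed description (in the paper, produced by GPT-4(V)).
observed = embed("angry shouting face aimed at the robot")

# Label by which reference point the description is semantically closer to.
label = ("inappropriate"
         if cosine(observed, inappropriate) > cosine(observed, appropriate)
         else "appropriate")
print(label)  # -> inappropriate
```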

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
embeddings, human-robot interaction, machine learning, social representation
National Category
Sociology (Excluding Social Work, Social Anthropology, Demography and Criminology); Robotics and automation
Identifiers
urn:nbn:se:kth:diva-362501 (URN); 10.1007/978-981-96-3522-1_9 (DOI); 001531722800009 (ISI); 2-s2.0-105002016733 (Scopus ID)
Conference
16th International Conference on Social Robotics, ICSR + AI 2024, Odense, Denmark, October 23-26, 2024
Note

Part of ISBN 9789819635214

Mohamed, Y., Lemaignan, S., Güneysu, A., Jensfelt, P. & Smith, C. (2025). Fusion in Context: A Multimodal Approach to Affective State Recognition. Paper presented at 34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, Aug 25-29, 2025.
2025 (English) Conference paper, Published paper (Refereed)
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-369160 (URN)
Conference
34th IEEE International Conference on Robot and Human Interactive Communication, Eindhoven University of Technology, Eindhoven, The Netherlands, Aug 25-29, 2025.
Zhang, Q., Khoche, A., Yang, Y., Ling, L., Mansouri, S. S., Andersson, O. & Jensfelt, P. (2025). HiMo: High-Speed Objects Motion Compensation in Point Clouds. IEEE Transactions on Robotics, 41, 5896-5911
2025 (English) In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 41, p. 5896-5911. Article in journal (Refereed) Published
Abstract [en]

LiDAR point clouds are essential for autonomous vehicles, but motion distortions from dynamic objects degrade data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments such as highways and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for non-ego motion compensation, correcting the representation of dynamic objects in point clouds. During the development of HiMo, we observed that existing self-supervised scene flow estimators often produce degenerate or inconsistent estimates under high-speed distortion. We therefore also propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. Since well-established motion distortion metrics are absent from the literature, we introduce two evaluation metrics: compensation accuracy at the point level and shape similarity of objects. We validate HiMo through extensive experiments on Argoverse 2, ZOD, and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles. Our findings show that HiMo improves the geometric consistency and visual fidelity of dynamic objects in LiDAR point clouds, benefiting downstream tasks such as semantic segmentation and 3D detection. See https://kin-zhang.github.io/HiMo for more details.
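
Once a flow estimate is available, the compensation step reduces to a per-point correction. The following is a hedged sketch of that step under simple assumptions (linear motion within one sweep, known per-point timestamps), not the HiMo pipeline itself:

```python
# Sketch: shift each LiDAR point to a common reference time using an
# estimated per-point scene flow, so fast objects stop smearing.
import numpy as np

def compensate(points, timestamps, flow, t_ref=1.0, sweep_period=1.0):
    """
    points:     (N, 3) points, already ego-motion compensated.
    timestamps: (N,) per-point capture time within [0, sweep_period].
    flow:       (N, 3) estimated displacement per sweep (e.g., from a
                scene flow network such as SeFlow++).
    Returns the points moved to where they would be at time t_ref.
    """
    dt = (t_ref - timestamps) / sweep_period   # fraction of sweep remaining
    return points + flow * dt[:, None]         # linear motion model

# A point captured early in the sweep on a vehicle moving 2 m per sweep.
pts = np.array([[10.0, 0.0, 0.0]])
ts = np.array([0.2])
fl = np.array([[2.0, 0.0, 0.0]])
print(compensate(pts, ts, fl))                 # -> [[11.6  0.  0.]]
```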

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Autonomous Driving Navigation, Computer Vision for Transportation, Motion Compensation, Range Sensing
National Category
Computer graphics and computer vision; Robotics and automation; Vehicle and Aerospace Engineering; Signal Processing
Identifiers
urn:nbn:se:kth:diva-372474 (URN); 10.1109/TRO.2025.3619042 (DOI); 2-s2.0-105019222489 (Scopus ID)
Bruns, L., Zhang, J. & Jensfelt, P. (2025). Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Paper presented at 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, February 26 - March 6, 2025 (pp. 2900-2909). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 2900-2909. Conference paper, Published paper (Refereed)
Abstract [en]

Neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a novel RGB-D neural mapping framework in which the scene is represented by a collection of lightweight neural fields that are dynamically anchored to the pose graph of a sparse visual SLAM system. Our approach can integrate large-scale loop closures while requiring only minimal reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale mapping that takes multiple loop closures into account during the optimization, and we show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available open-source at https://github.com/KTH-RPL/neural_graph_mapping.
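
The efficiency argument rests on where the fields are anchored. A minimal sketch of that anchoring idea follows, with a toy analytic field standing in for the paper's lightweight neural fields:

```python
# Sketch: fields stored in keyframe-local frames follow their pose-graph
# anchors rigidly, so a loop closure only updates poses, not map content.
import numpy as np

class AnchoredField:
    def __init__(self, keyframe_id: int):
        self.keyframe_id = keyframe_id       # anchor node in the pose graph

    def query_local(self, p_local: np.ndarray) -> float:
        # Placeholder for a small neural field evaluated in the local frame.
        return float(np.linalg.norm(p_local) - 1.0)  # toy unit-sphere SDF

def query_world(field, p_world, poses):
    """poses: dict keyframe_id -> 4x4 world-from-keyframe transform."""
    T = np.linalg.inv(poses[field.keyframe_id])      # keyframe-from-world
    p_local = (T @ np.append(p_world, 1.0))[:3]
    return field.query_local(p_local)

poses = {0: np.eye(4)}
field = AnchoredField(keyframe_id=0)
print(query_world(field, np.array([2.0, 0.0, 0.0]), poses))   # 1.0

# A loop closure shifts keyframe 0 by 1 m in x; the field follows its
# anchor and the map content needs no reintegration.
poses[0] = np.eye(4)
poses[0][0, 3] = 1.0
print(query_world(field, np.array([2.0, 0.0, 0.0]), poses))   # 0.0
```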

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:kth:diva-372392 (URN); 10.1109/WACV61041.2025.00287 (DOI); 001481328900277 (ISI); 2-s2.0-105003634119 (Scopus ID)
Conference
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, February 26 - March 6, 2025
Note

Part of ISBN 9798331510831, 9798331510848

Zangeneh, F., Dekel, A., Pieropan, A. & Jensfelt, P. (2025). Quantifying Epistemic Uncertainty in Absolute Pose Regression. In: Image Analysis - 23rd Scandinavian Conference, SCIA 2025, Proceedings. Paper presented at 23rd Scandinavian Conference on Image Analysis, SCIA 2025, Reykjavik, Iceland, June 23-25, 2025 (pp. 180-195). Springer Nature
2025 (English) In: Image Analysis - 23rd Scandinavian Conference, SCIA 2025, Proceedings, Springer Nature, 2025, p. 180-195. Conference paper, Published paper (Refereed)
Abstract [en]

Visual relocalization is the task of estimating the camera pose given an image it views. Absolute pose regression offers a solution to this task by training a neural network that directly regresses the camera pose from image features. While attractive in terms of memory and compute efficiency, absolute pose regression's predictions are inaccurate and unreliable outside the training domain. In this work, we propose a novel method for quantifying the epistemic uncertainty of an absolute pose regression model by estimating the likelihood of observations within a variational framework. Beyond providing a measure of confidence in predictions, our approach offers a unified model that also handles observation ambiguities, probabilistically localizing the camera in the presence of repetitive structures. Our method outperforms existing approaches in capturing the relation between uncertainty and prediction error.
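
A sketch of the general recipe, under the assumption that a variational autoencoder over the regressor's image features serves as the likelihood model (the architecture and sizes below are illustrative, not the paper's):

```python
# Sketch: use the ELBO of a feature-space VAE as a proxy log-likelihood;
# features the VAE explains poorly lie outside the training domain, so the
# corresponding pose prediction deserves low confidence.
import torch
import torch.nn as nn

class FeatureVAE(nn.Module):
    def __init__(self, feat_dim: int = 128, latent_dim: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent_dim)
        self.logvar = nn.Linear(64, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(),
                                 nn.Linear(64, feat_dim))

    def elbo(self, x: torch.Tensor) -> torch.Tensor:
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparam.
        recon = self.dec(z)
        rec_ll = -((x - recon) ** 2).sum(dim=-1)     # Gaussian log-lik.
        kl = 0.5 * (mu ** 2 + logvar.exp() - 1 - logvar).sum(dim=-1)
        return rec_ll - kl                           # per-sample ELBO

vae = FeatureVAE()                 # would be trained on in-domain features
feats = torch.randn(4, 128)        # image features from the pose regressor
uncertainty = -vae.elbo(feats)     # low ELBO -> high epistemic uncertainty
print(uncertainty.shape)           # torch.Size([4])
```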

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Camera Relocalization, Uncertainty Estimation, VAEs
National Category
Computer graphics and computer vision; Signal Processing
Identifiers
urn:nbn:se:kth:diva-368911 (URN); 10.1007/978-3-031-95918-9_13 (DOI); 001553877800013 (ISI); 2-s2.0-105009846579 (Scopus ID)
Conference
23rd Scandinavian Conference on Image Analysis, SCIA 2025, Reykjavik, Iceland, June 23-25, 2025
Note

Part of ISBN 9783031959172

Zhang, Q., Yang, Y., Li, P., Andersson, O. & Jensfelt, P. (2025). SeFlow: A Self-supervised Scene Flow Method in Autonomous Driving. In: Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A. & Ricci, E. (Eds.), Computer Vision - ECCV 2024, Part I. Paper presented at 18th European Conference on Computer Vision (ECCV), Sep 29 - Oct 4, 2024, Milan, Italy (pp. 353-369). Springer Nature, 15059
2025 (English) In: Computer Vision - ECCV 2024, Part I / [ed] Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A. & Ricci, E., Springer Nature, 2025, Vol. 15059, p. 353-369. Conference paper, Published paper (Refereed)
Abstract [en]

Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. This detailed, point-level information can help autonomous vehicles accurately predict and understand dynamic changes in their surroundings. Current state-of-the-art methods require annotated data to train scene flow networks, and the expense of labeling inherently limits their scalability. Self-supervised approaches can overcome these limitations, yet face two principal challenges that hinder optimal performance: point distribution imbalance and disregard for object-level motion constraints. In this paper, we propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline. We demonstrate that classifying static and dynamic points helps design targeted objective functions for different motion patterns. We also emphasize the importance of internal cluster consistency and correct object point association to refine the scene flow estimation, in particular on object details. Our real-time capable method achieves state-of-the-art performance on the self-supervised scene flow task on the Argoverse 2 and Waymo datasets. The code is open-sourced at https://github.com/KTH-RPL/SeFlow.
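
The split into targeted objectives can be sketched compactly. The following illustrates the idea rather than SeFlow's actual loss functions: static points are penalized for moving at all, while dynamic points are penalized for missing the next scan.

```python
# Sketch: separate objectives for static and dynamic points, assuming an
# upstream classifier has already labeled each point.
import torch

def targeted_loss(p0, flow, p1, is_dynamic):
    """
    p0: (N, 3) points at time t; flow: (N, 3) predicted flow;
    p1: (M, 3) points at time t+1; is_dynamic: (N,) bool labels
    (in the paper, from an unsupervised dynamic classification step).
    """
    static_loss = flow[~is_dynamic].pow(2).sum(-1).mean()  # stay put
    warped = p0[is_dynamic] + flow[is_dynamic]             # move forward
    nn_dist = torch.cdist(warped, p1).min(dim=1).values    # to next scan
    return static_loss + nn_dist.mean()

p0 = torch.randn(32, 3)
flow = torch.randn(32, 3, requires_grad=True)
p1 = torch.randn(40, 3)
labels = torch.arange(32) % 2 == 0         # toy half-static, half-dynamic
targeted_loss(p0, flow, p1, labels).backward()
```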

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 15059
Keywords
3D scene flow, self-supervised, autonomous driving, large-scale point cloud
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-357529 (URN); 10.1007/978-3-031-73232-4_20 (DOI); 001346378300020 (ISI); 2-s2.0-85206389477 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV), Sep 29 - Oct 4, 2024, Milan, Italy
Note

Part of ISBN 978-3-031-73231-7; 978-3-031-73232-4

Khoche, A., Zhang, Q., Sánchez, L. P., Asefaw, A., Mansouri, S. S. & Jensfelt, P. (2025). SSF: Sparse Long-Range Scene Flow for Autonomous Driving. In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025. Paper presented at 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19-23, 2025 (pp. 6394-6400). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 6394-6400. Conference paper, Published paper (Refereed)
Abstract [en]

Scene flow enables an understanding of the motion characteristics of the environment in the 3D world. It gains particular significance at long range, where object-based perception methods may fail due to sparse observations far away. Although significant advancements have been made in scene flow pipelines to handle large-scale point clouds, a gap remains in scalability with respect to range. We attribute this limitation to the common design choice of using dense feature grids, which scale quadratically with range. In this paper, we propose Sparse Scene Flow (SSF), a general pipeline for long-range scene flow that adopts a sparse-convolution-based backbone for feature extraction. This approach introduces a new challenge: a mismatch in the size and ordering of sparse feature maps between time-sequential point scans. To address this, we propose a sparse feature fusion scheme that augments the feature maps with virtual voxels at missing locations. Additionally, we propose a range-wise metric that implicitly gives greater importance to faraway points. Our method, SSF, achieves state-of-the-art results on the Argoverse 2 dataset, demonstrating strong performance in long-range scene flow estimation. Our code is open-sourced at https://github.com/KTH-RPL/SSF.git.
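
The fusion scheme addresses a concrete mismatch: two sparse feature maps cover different voxel sets. A minimal sketch of the virtual-voxel idea, using plain dictionaries in place of a sparse tensor library:

```python
# Sketch: align two sparse feature maps by taking the union of their voxel
# coordinates and inserting zero "virtual voxel" features where one scan
# has no data, so per-voxel fusion becomes a simple concatenation.
import numpy as np

def fuse_sparse(feat_t0: dict, feat_t1: dict, dim: int = 4):
    """feat_*: {voxel coordinate (tuple): feature vector of shape (dim,)}"""
    coords = sorted(set(feat_t0) | set(feat_t1))   # union, fixed ordering
    zero = np.zeros(dim)
    f0 = np.stack([feat_t0.get(c, zero) for c in coords])
    f1 = np.stack([feat_t1.get(c, zero) for c in coords])
    return coords, np.concatenate([f0, f1], axis=1)

t0 = {(0, 0): np.ones(4), (5, 2): 2 * np.ones(4)}
t1 = {(0, 0): 3 * np.ones(4), (9, 9): np.ones(4)}  # different voxel set
coords, fused = fuse_sparse(t0, t1)
print(len(coords), fused.shape)                    # 3 (3, 8)
```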

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer graphics and computer vision; Computer Sciences; Condensed Matter Physics; Signal Processing
Identifiers
urn:nbn:se:kth:diva-371385 (URN); 10.1109/ICRA55743.2025.11128770 (DOI); 2-s2.0-105016555490 (Scopus ID); 979-8-3315-4139-2 (ISBN)
Conference
2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19-23, 2025
Note

Part of ISBN 979-8-3315-4139-2

Stower, R., Gautier, A., Wozniak, M. K., Jensfelt, P., Tumova, J. & Leite, I. (2025). Take a Chance on Me: How Robot Performance and Risk Behaviour Affects Trust and Risk-Taking. In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction. Paper presented at 20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, Mar 4-6, 2025 (pp. 391-399). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: HRI 2025 - Proceedings of the 2025 ACM/IEEE International Conference on Human-Robot Interaction, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 391-399. Conference paper, Published paper (Refereed)
Abstract [en]

Real-world human-robot interactions often encompass uncertainty. This uncertainty can be handled in different ways, for example by designing robot planners to be more or less risk-tolerant. However, how users actually perceive different risk-taking behaviours in robots has yet to be described. Additionally, in the absence of guarantees on optimal robot performance, the interaction between risk and performance in shaping user perceptions is also unclear. To address this gap, we conducted a user study with 84 participants investigating how robot performance and risk behaviour affect users' trust and risk-taking decisions. Participants collaborated with a Franka robot arm to perform a block-stacking task. We compared a robot which displays consistent but sub-optimal behaviours to a robot displaying risky but occasionally optimal behaviour. Risky robot behaviour led to higher trust than consistent behaviour when the robot was on average good at stacking blocks (high expectation), but lower trust when the robot was on average bad at stacking blocks (low expectation). Individual risk-willingness also predicted the likelihood of selecting the risky robot over the consistent robot for future interactions, but only when the average expectation was low. These findings have implications for risk-aware planning and decision-making in mixed human-robot systems.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
collaborative robot, failure, risk-taking, trust, user study
National Category
Robotics and automation; Human Computer Interaction
Identifiers
urn:nbn:se:kth:diva-363768 (URN); 10.1109/HRI61500.2025.10973966 (DOI); 2-s2.0-105004879443 (Scopus ID)
Conference
20th Annual ACM/IEEE International Conference on Human-Robot Interaction, HRI 2025, Melbourne, Australia, Mar 4-6, 2025
Note

Part of ISBN 9798350378931

Gaspar Sánchez, J. M., Bruns, L., Tumova, J., Jensfelt, P. & Törngren, M. (2025). Transitional Grid Maps: Joint Modeling of Static and Dynamic Occupancy. IEEE Open Journal of Intelligent Transportation Systems, 6, 1-10
2025 (English) In: IEEE Open Journal of Intelligent Transportation Systems, E-ISSN 2687-7813, Vol. 6, p. 1-10. Article in journal (Refereed) Published
Abstract [en]

Autonomous agents rely on sensor data to construct representations of their environments, which are essential for predicting future events and planning actions. However, sensor measurements suffer from limited range, occlusions, and noise. These challenges become more evident in highly dynamic environments. This work proposes a probabilistic framework to jointly infer which parts of an environment are statically and which parts are dynamically occupied. We formulate the problem as a Bayesian network and introduce minimal assumptions that significantly reduce its complexity. Based on these, we derive Transitional Grid Maps (TGMs), an efficient analytical solution. Using real data, we demonstrate how this approach produces better maps than the state of the art by keeping track of both static and dynamic elements and, as a side effect, can help improve existing SLAM algorithms.
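
The flavour of the model can be sketched as a per-cell Bayes filter over three states. The transition and measurement probabilities below are illustrative assumptions, not values derived in the paper:

```python
# Sketch: per-cell belief over {free, static, dynamic} with a predict step
# (static occupancy persists, dynamic occupancy tends to vacate) followed
# by a Bayes update on the current occupancy measurement.
import numpy as np

STATES = ("free", "static", "dynamic")

# Hypothetical transition matrix P(state_t | state_{t-1}); rows sum to 1.
T = np.array([[0.95, 0.00, 0.05],   # free: mostly stays free
              [0.02, 0.98, 0.00],   # static occupancy persists
              [0.60, 0.00, 0.40]])  # dynamic objects tend to move on

# Hypothetical measurement model P(z | state) for z in {miss, hit}.
Z = np.array([[0.9, 0.1],
              [0.1, 0.9],
              [0.3, 0.7]])

def step(belief: np.ndarray, z_hit: bool) -> np.ndarray:
    predicted = T.T @ belief                  # transition (predict)
    posterior = Z[:, int(z_hit)] * predicted  # Bayes update on measurement
    return posterior / posterior.sum()

belief = np.ones(3) / 3                       # uninformed prior
for z in (True, True, True, False):           # three hits, then a miss
    belief = step(belief, z)
print(dict(zip(STATES, belief.round(3))))
```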

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer Sciences; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-359349 (URN); 10.1109/ojits.2024.3521449 (DOI); 2-s2.0-85210909052 (Scopus ID)
Identifiers
ORCID iD: orcid.org/0000-0002-1170-7162
