Publications (10 of 14)
BagheriFard, Z., Guarese, R., Quintero, L., Johnson, F., Edvinsson, B. & Romero, M. (2026). Enhancing Manufacturing Training Through Augmented Situated Visualization. In: Human-Computer Interaction – INTERACT 2025 - 20th IFIP TC 13 International Conference, 2025, Proceedings. Paper presented at 20th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2025, Belo Horizonte, Brazil, Sep 8 2025 - Sep 12 2025 (pp. 66-71). Springer Nature
Enhancing Manufacturing Training Through Augmented Situated Visualization
2026 (English). In: Human-Computer Interaction – INTERACT 2025 - 20th IFIP TC 13 International Conference, 2025, Proceedings, Springer Nature, 2026, p. 66-71. Conference paper, Published paper (Refereed)
Abstract [en]

Traditional training methods in industrial environments often lack real-time guidance and interactive feedback, which can make knowledge sharing challenging. Augmented Situated Visualization (SV) can improve industrial training by providing in-depth, spatially relevant instructions, which are particularly valuable when safety procedures are crucial. This work describes a tool for SV deployed on a commercial headset and investigates how two different SV patterns, 2D labels and 3D ghosts, impact user experience, workload, discomfort, task completion time, and memory recall in a training scenario for machine maintenance.

Place, publisher, year, edition, pages
Springer Nature, 2026
Keywords
AR, HCI, Industrial Training, Situated Visualization
National Category
Human Computer Interaction; Production Engineering, Human Work Science and Ergonomics; Computer Sciences; Computer Systems
Identifiers
urn:nbn:se:kth:diva-371719 (URN), 10.1007/978-3-032-05008-3_14 (DOI), 2-s2.0-105016569722 (Scopus ID)
Conference
20th IFIP TC 13 International Conference on Human-Computer Interaction, INTERACT 2025, Belo Horizonte, Brazil, Sep 8 2025 - Sep 12 2025
Note

Part of ISBN 9783032050076

QC 20251022

Available from: 2025-10-22. Created: 2025-10-22. Last updated: 2025-10-22. Bibliographically approved
Guarese, R., Gokan Khan, M., Lassiter, D., Vachier, J., Johnson, F. & Edvinsson, B. (2025). A Scoping Review and Expert Recommendations for Immersive Solutions towards Predictive Maintenance. In: Proceedings 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW). Paper presented at IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2025 - Abstracts and Workshops, Saint Malo, France, March 8-12, 2025. Institute of Electrical and Electronics Engineers (IEEE)
A Scoping Review and Expert Recommendations for Immersive Solutions towards Predictive Maintenance
2025 (English). In: Proceedings 2025 IEEE Conference on Virtual Reality and 3D User Interfaces Abstracts and Workshops (VRW), Institute of Electrical and Electronics Engineers (IEEE), 2025. Conference paper, Published paper (Refereed)
Abstract [en]

Under an industrial-academic partnership project, the present work aims to map and catalog the different applications of Augmented and Virtual Reality in predictive maintenance (PdM) practices. Through a preliminary scoping review targeting two major digital libraries in computing and engineering, we address the key attributes of the immersive technologies and solutions used for PdM across several industries. By categorizing the surveyed prototypes according to 10 parameters covering their interaction, visualization, and research methods, we expose the state of the art and valuable knowledge gaps within immersive PdM. Following this analysis, we conducted a workshop with three manufacturing experts to discuss the future of maintenance interfaces and distilled their feedback into recommendations for what to explore further within immersive PdM.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Predictive Maintenance, Virtual Reality, Augmented Reality, Digital Twin, Multimodal Interaction
National Category
Human Computer Interaction
Research subject
Computer Science; Production Engineering
Identifiers
urn:nbn:se:kth:diva-363247 (URN), 10.1109/VRW66409.2025.00217 (DOI), 001535113600211 (ISI), 2-s2.0-105005148446 (Scopus ID)
Conference
IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2025 - Abstracts and Workshops, Saint Malo, France, March 8-12, 2025
Projects
SMART Pharmaceutical Manufacturing
Note

Part of ISBN 979-8-3315-1484-6

QC 20250929

Available from: 2025-05-09. Created: 2025-05-09. Last updated: 2025-12-05. Bibliographically approved
Chhatre, K., Guarese, R., Matviienko, A. & Peters, C. (2025). Evaluating Speech and Video Models for Face-Body Congruence. In: I3D Companion '25: Companion Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games. Paper presented at ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games-I3D 2025, NJIT, Jersey City, NJ, USA, 7-9 May 2025. Association for Computing Machinery (ACM)
Evaluating Speech and Video Models for Face-Body Congruence
2025 (English). In: I3D Companion '25: Companion Proceedings of the ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games, Association for Computing Machinery (ACM), 2025. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Animations produced by generative models are often evaluated using objective quantitative metrics that do not fully capture perceptual effects in immersive virtual environments. To address this gap, we present a preliminary perceptual evaluation of generative models for animation synthesis, conducted via a VR-based user study (N = 48). Our investigation specifically focuses on animation congruency—ensuring that generated facial expressions and body gestures are both congruent with and synchronized to driving speech. We evaluated two state-of-the-art methods: a speech-driven full-body animation model and a video-driven full-body reconstruction model, assessing their capability to produce congruent facial expressions and body gestures. Our results demonstrate a strong user preference for combined facial and body animations, highlighting that congruent multimodal animations significantly enhance perceived realism compared to animations featuring only a single modality. By incorporating VR-based perceptual feedback into training pipelines, our approach provides a foundation for developing more engaging and responsive virtual characters.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Computer graphics, Animation
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-363248 (URN), 10.1145/3722564.3728374 (DOI), 001502592200005 (ISI)
Conference
ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games-I3D 2025, NJIT, Jersey City, NJ, USA, 7-9 May 2025
Note

Part of ISBN 9798400718335

QC 20250509

Available from: 2025-05-09. Created: 2025-05-09. Last updated: 2025-08-15. Bibliographically approved
Chhatre, K., Guarese, R., Matviienko, A. & Peters, C. (2025). Evaluation of generative models for emotional 3D animation generation in VR. Frontiers in Computer Science, 7, Article ID 1598099.
Evaluation of generative models for emotional 3D animation generation in VR
2025 (English). In: Frontiers in Computer Science, E-ISSN 2624-9898, Vol. 7, article id 1598099. Article in journal (Refereed), Published
Abstract [en]

Introduction: Social interactions incorporate various nonverbal signals to convey emotions alongside speech, including facial expressions and body gestures. Generative models have demonstrated promising results in creating full-body nonverbal animations synchronized with speech; however, evaluations using statistical metrics in 2D settings fail to fully capture user-perceived emotions, limiting our understanding of the effectiveness of these models.

Methods: To address this, we evaluate emotional 3D animation generative models within an immersive Virtual Reality (VR) environment, emphasizing user-centric metrics (emotional arousal, realism, naturalness, enjoyment, diversity, and interaction quality) in a real-time human-agent interaction scenario. Through a user study (N = 48), we systematically examine perceived emotional quality for three state-of-the-art speech-driven 3D animation methods across two specific emotions: happiness (high arousal) and neutral (mid arousal). Additionally, we compare these generative models against real human expressions obtained via a reconstruction-based method to assess their strengths and limitations and how closely they replicate real human facial and body expressions.

Results: Our results demonstrate that methods explicitly modeling emotions lead to higher recognition accuracy compared to those focusing solely on speech-driven synchrony. Users rated the realism and naturalness of happy animations significantly higher than those of neutral animations, highlighting the limitations of current generative models in handling subtle emotional states.

Discussion: Generative models underperformed compared to reconstruction-based methods in facial expression quality, and all methods received relatively low ratings for animation enjoyment and interaction quality, emphasizing the importance of incorporating user-centric evaluations into generative model development. Finally, participants positively recognized animation diversity across all generative models.

Place, publisher, year, edition, pages
Frontiers Media SA, 2025
Keywords
3D emotional animation, generative models, nonverbal communication, user-centric evaluation, virtual reality
National Category
Human Computer Interaction; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-369923 (URN), 10.3389/fcomp.2025.1598099 (DOI), 001549678200001 (ISI), 2-s2.0-105013367950 (Scopus ID)
Note

QC 20250918

Available from: 2025-09-18. Created: 2025-09-18. Last updated: 2025-09-18. Bibliographically approved
Li, T., Moradi, M., Gokan Khan, M., Guarese, R., Kronqvist, J., Romero, M., . . . Wang, X. V. (2025). Fusing model-based and data-driven prognostic methods for real-time model updating. Mechanical systems and signal processing, 238, Article ID 113200.
Fusing model-based and data-driven prognostic methods for real-time model updating
2025 (English). In: Mechanical systems and signal processing, ISSN 0888-3270, E-ISSN 1096-1216, Vol. 238, article id 113200. Article in journal (Refereed), Published
Abstract [en]

Prognostic methods broadly fall into two categories—model-based and data-driven—both of which have shown effectiveness across a range of engineering applications. Model-based approaches require an explicit representation of the degradation process, defining failure as the point when the physical damage state exceeds a predetermined threshold. Data-driven methods, on the other hand, leverage sensor data to directly predict end-of-life (EOL) or related prognostic information. Although both approaches offer insights that could be complementary and potentially fused, most existing fusion methods either combine the outputs from multiple methods or adopt a data-driven method to assist the model-based method. To further enhance the prognostic performance, this study proposes a fusion-based prognostic approach in which the output of one method is actively used to update the model of the other through either the crossover operator or the likelihood function. The proposed approach is validated using both an aluminum fatigue dataset and the Prognostics and Health Management (PHM) 2010 cutter wear dataset, demonstrating improved prognostic accuracy compared to either method used independently.
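
The mutual-updating idea can be illustrated with a toy particle filter in Python, in which a hypothetical data-driven damage estimate shapes the likelihood used to reweight physics-based particles. Everything below (models, rates, thresholds) is invented for illustration and does not reproduce the paper's method or datasets.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy setup: damage grows by an uncertain rate each cycle and the part
    # fails when damage exceeds a threshold (hypothetical numbers throughout).
    N_PARTICLES = 500
    FAILURE_THRESHOLD = 1.0
    damage = rng.normal(0.10, 0.02, N_PARTICLES)  # particle damage states
    rate = rng.normal(0.015, 0.005, N_PARTICLES)  # particle growth rates

    def data_driven_damage_estimate(sensor_feature):
        """Stand-in for a trained regressor mapping sensor data to damage."""
        return 0.10 + 0.016 * sensor_feature

    for cycle, sensor_feature in enumerate([1, 2, 3, 4], start=1):
        # 1. Propagate particles through the physics-based degradation model.
        damage = damage + rate + rng.normal(0, 0.002, N_PARTICLES)

        # 2. Use the data-driven estimate to build the likelihood, one way the
        #    two methods can update each other.
        y_hat = data_driven_damage_estimate(sensor_feature)
        weights = np.exp(-0.5 * ((damage - y_hat) / 0.02) ** 2)
        weights /= weights.sum()

        # 3. Resample so particles concentrate where both methods agree.
        idx = rng.choice(N_PARTICLES, N_PARTICLES, p=weights)
        damage, rate = damage[idx], rate[idx]

        # 4. Extrapolate each particle to estimate remaining useful life (RUL).
        rul = np.maximum((FAILURE_THRESHOLD - damage) / np.maximum(rate, 1e-6), 0)
        print(f"cycle {cycle}: median RUL ~ {np.median(rul):.1f} cycles")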

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Data-driven prognostics, Fusion, Model-based prognostics, Mutual updating, Particle filter, Prognostics and health management
National Category
Other Civil Engineering; Control Engineering
Identifiers
urn:nbn:se:kth:diva-369345 (URN), 10.1016/j.ymssp.2025.113200 (DOI), 2-s2.0-105013295168 (Scopus ID)
Note

QC 20250923

Available from: 2025-09-04. Created: 2025-09-04. Last updated: 2025-09-23. Bibliographically approved
Gokan Khan, M., Guarese, R., Johnson, F., Wang, X. V., Bergman, A., Edvinsson, B., . . . Kronqvist, J. (2025). PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models. IEEE Access, 1-1
PerfCam: Digital Twinning for Production Lines Using 3D Gaussian Splatting and Vision Models
2025 (English). In: IEEE Access, E-ISSN 2169-3536, p. 1-1. Article in journal (Refereed), Epub ahead of print
Abstract [en]

We introduce PerfCam, an open source Proof-of-Concept (PoC) digital twinning framework that combines camera and sensory data with 3D Gaussian Splatting and computer vision models for digital twinning, object tracking, and Key Performance Indicators (KPIs) extraction in industrial production lines. By utilizing 3D reconstruction and Convolutional Neural Networks (CNNs), PerfCam offers a semi-automated approach to object tracking and spatial mapping, enabling highly accurate digital twins that capture real-time KPIs such as availability, performance, Overall Equipment Effectiveness (OEE), and rate of conveyor belts in the production line. We validate the effectiveness of PerfCam through a practical deployment within realistic test production lines in the pharmaceutical industry and contribute an openly published dataset to support further research and development in the field. The results demonstrate PerfCam’s ability to deliver actionable insights through its precise digital twin capabilities, underscoring its value as an effective tool for developing usable digital twins in smart manufacturing environments and extracting operational analytics.
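
The availability, performance, and OEE figures named above follow the standard OEE decomposition (OEE = availability x performance x quality). The sketch below shows how counts of the kind a camera-based tracker could extract might feed those KPIs; the field names and numbers are hypothetical and are not taken from PerfCam's code or dataset.

    from dataclasses import dataclass

    @dataclass
    class ShiftCounts:
        planned_time_min: float    # scheduled production time
        downtime_min: float        # stop time detected on the line
        ideal_rate_per_min: float  # nameplate output rate of the line
        total_units: int           # units counted passing the conveyor
        good_units: int            # units passing quality checks

    def oee(c: ShiftCounts) -> dict:
        """Standard OEE decomposition from aggregated shift counts."""
        run_time = c.planned_time_min - c.downtime_min
        availability = run_time / c.planned_time_min
        performance = c.total_units / (run_time * c.ideal_rate_per_min)
        quality = c.good_units / c.total_units
        return {
            "availability": availability,
            "performance": performance,
            "quality": quality,
            "oee": availability * performance * quality,
        }

    # Example shift: 480 min planned, 45 min of stops, 800 units at 2 units/min.
    print(oee(ShiftCounts(480, 45, 2.0, 800, 776)))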

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Production Line, Visual Model, Digital Twin, Convolutional Neural Network, Computer Vision, Sensor Data, 3D Reconstruction
National Category
Computer Sciences
Research subject
Computer Science; Industrial Engineering and Management
Identifiers
urn:nbn:se:kth:diva-363250 (URN), 10.1109/access.2025.3567702 (DOI), 001492129400039 (ISI), 2-s2.0-105004694919 (Scopus ID)
Projects
SMART Pharmaceutical Manufacturing
Funder
AstraZeneca, KTH-RPROJ-0146472
Note

QC 20250509

Available from: 2025-05-09. Created: 2025-05-09. Last updated: 2025-09-22. Bibliographically approved
Vasiliu, M. M., Guarese, R., Jaatinen, J., Johnson, F., Edvinsson, B. & Romero, M. (2025). Towards Enhancing Industrial Training Through Conversational AI. In: CUI '25: Proceedings of the 7th ACM Conference on Conversational User Interfaces. Paper presented at 7th ACM Conference on Conversational User Interfaces, CUI ’25, July 08–10, 2025, Waterloo, ON, Canada. Association for Computing Machinery (ACM)
Towards Enhancing Industrial Training Through Conversational AI
2025 (English). In: CUI '25: Proceedings of the 7th ACM Conference on Conversational User Interfaces, Association for Computing Machinery (ACM), 2025. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Conversational AI (CAI) has proven effective in educational settings; however, its potential in industrial training, where higher precision and reliability are required, remains under-explored. This work-in-progress paper proposes a study to examine how AI persona design (Machine vs. Expert Operator) and voice embodiment (Diegetic vs. Disembodied) influence cognitive load, task efficiency, and usability in industrial training. By training a large language model (LLM) on Standard Operating Procedure (SOP) data, this project aims to develop a CAI assistant that provides real-time, easy-to-access information during task execution, seeking to enhance training efficiency and reduce reliance on text-heavy manuals through a user-centered approach.
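
As a rough sketch of how SOP text might be turned into instruction-response pairs for such an assistant (the procedure, steps, and file layout below are invented for illustration; the paper does not specify its data format or training setup):

    import json

    # Hypothetical SOP excerpt; the study's actual SOP data is not public.
    sop = {
        "procedure": "Filter replacement on packaging line 3",
        "steps": [
            "Stop the line and engage the lockout switch.",
            "Open the filter housing with the T20 driver.",
            "Replace the filter cartridge and re-seal the housing.",
        ],
    }

    # One plausible way to build fine-tuning records from the SOP steps.
    with open("sop_training_data.jsonl", "w", encoding="utf-8") as f:
        for i, step in enumerate(sop["steps"], start=1):
            record = {
                "instruction": f"What is step {i} of: {sop['procedure']}?",
                "response": step,
            }
            f.write(json.dumps(record, ensure_ascii=False) + "\n")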

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
Natural language interfaces
National Category
Natural Language Processing
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-367516 (URN), 10.1145/3719160.3737643 (DOI), 001539402100008 (ISI), 2-s2.0-105011598225 (Scopus ID)
Conference
7th ACM Conference on Conversational User Interfaces, CUI ’25, July 08–10, 2025, Waterloo, ON, Canada
Note

QC 20250729

Available from: 2025-07-18. Created: 2025-07-18. Last updated: 2025-12-08. Bibliographically approved
Renata, A., Guarese, R., Takac, M. & Zambetta, F. (2024). Assessment of embodied visuospatial perspective taking in augmented reality: insights from a reaction time task. Frontiers in Virtual Reality, 5
Assessment of embodied visuospatial perspective taking in augmented reality: insights from a reaction time task
2024 (English). In: Frontiers in Virtual Reality, E-ISSN 2673-4192, Vol. 5. Article in journal (Refereed), Published
Abstract [en]

Understanding another person’s visual perceptions is known as visuospatial perspective taking, with evidence to date demonstrating it is delineated across two levels, depending on how different that perspective is. Some strategies for visuospatial perspective taking have also been found to involve embodied cognition. However, the current generalisation of these findings is limited due to experimental setup and the use of computer monitors as the interface for experimental tasks. Augmented reality interfaces could extend the generalisation of these findings by situating virtual stimuli in the real environment, thus providing a higher degree of ecological validity and experimental standardisation. This study aimed to observe visuospatial perspective taking in augmented reality. This was achieved in participant experiments (N = 24) using the Left-Right behavioural speeded decision task, which requires participants to discriminate between target objects relative to the perspective of an avatar. Angular disparity and posture congruence between the avatar and participant were manipulated between trials to delineate between the two levels of visuospatial perspective taking and to understand its potentially embodied nature. Although generalised linear mixed modelling indicated that angular disparity increased task difficulty, findings on posture congruence were, unexpectedly, less clear. Together, this suggests that visuospatial perspective taking in this study can be delineated across two levels. Further implications for embodied cognition and empathy research are discussed.
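
A minimal sketch of the kind of mixed-model analysis described above, using statsmodels on simulated per-trial data (participant counts, column names, and effect sizes are invented; a linear mixed model on log reaction time stands in for the generalised linear mixed model the study reports):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)

    # Hypothetical within-subject design: 24 participants, 40 trials each,
    # angular disparity and posture congruence varied across trials.
    n_participants, n_trials = 24, 40
    n = n_participants * n_trials
    df = pd.DataFrame({
        "participant": np.repeat(np.arange(n_participants), n_trials),
        "angle": rng.choice([0, 60, 120, 180], n),
        "posture_congruent": rng.choice([0, 1], n),
    })
    # Simulate slower responses at larger angular disparities (toy effect only).
    df["log_rt"] = 6.5 + 0.002 * df["angle"] + rng.normal(0, 0.15, n)

    # Random intercept per participant; fixed effects for the two manipulations.
    model = smf.mixedlm("log_rt ~ angle * posture_congruent", df,
                        groups=df["participant"])
    print(model.fit().summary())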

Place, publisher, year, edition, pages
Frontiers Media SA, 2024
Keywords
visuospatial perspective taking, augmented reality, cognitive empathy, reaction time, embodied cognition
National Category
Computer and Information Sciences Human Computer Interaction
Research subject
Human-computer Interaction; Technology and Health; Computer Science
Identifiers
urn:nbn:se:kth:diva-357762 (URN), 10.3389/frvir.2024.1422467 (DOI), 001368487200001 (ISI), 2-s2.0-85211101217 (Scopus ID)
Note

QC 20241217

Available from: 2024-12-16. Created: 2024-12-16. Last updated: 2025-01-17. Bibliographically approved
Guarese, R., van Schyndel, R., Polson, D. & Zambetta, F. (2024). Augmenting the dark: Exploring assistive micro-guidance in sonified mixed reality. Paper presented at 26th Symposium on Virtual and Augmented Reality (SVR), Manaus/AM, Brazil, September 30th to October 3rd, 2024. Anais Estendidos do XXVI Simpósio de Realidade Virtual e Aumentada (SVR Estendido 2024), 90-95
Augmenting the dark: Exploring assistive micro-guidance in sonified mixed reality
2024 (English). In: Anais Estendidos do XXVI Simpósio de Realidade Virtual e Aumentada (SVR Estendido 2024), p. 90-95. Article in journal (Refereed), Published
Abstract [en]

This thesis proposes a series of user evaluations of spatialized sonification methods rendered as AR in simulated and real-life scenarios. It proposes and promotes next-generation micro-guidance methods for low-visibility and vision-impaired (VI) scenarios. In 2D hand-guidance, results (N=47) outlined that sound spatiality methods had the most promising performance in time taken and distance from target. When assessing vertical hand-guidance in a 3D task (N=19), results indicated a significantly higher accuracy for a novel height-to-pitch method. Finally, a significant disparity was found between VI (N=20) and sighted (N=77) people regarding sighted people’s empathy with the VI community. After an AR blindness embodiment experience, sighted people’s (N=15) empathetic and sympathetic responses towards said community significantly increased. Ultimately, this thesis evaluates how audio AR can help users to have accurate and safe performances in day-to-day manual tasks.
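
The height-to-pitch idea mentioned above can be sketched as a simple mapping from hand height to tone frequency; the ranges, base frequency, and exponential scaling below are assumptions for illustration, not the thesis' actual sonification parameters.

    import math

    MIN_HEIGHT_M, MAX_HEIGHT_M = 0.7, 1.6  # assumed reachable hand heights
    BASE_FREQ_HZ, OCTAVES = 220.0, 2.0     # assumed A3 up to two octaves above

    def height_to_pitch(height_m: float) -> float:
        """Map a height within the reachable range onto a tone frequency."""
        t = (height_m - MIN_HEIGHT_M) / (MAX_HEIGHT_M - MIN_HEIGHT_M)
        t = min(max(t, 0.0), 1.0)
        # Exponential scaling so equal height steps sound like equal pitch steps.
        return BASE_FREQ_HZ * math.pow(2.0, OCTAVES * t)

    # Example: the user raises a hand until its tone matches the target's tone.
    target_hz = height_to_pitch(1.2)
    for hand_height in (0.8, 1.0, 1.2):
        print(f"hand at {hand_height:.1f} m -> {height_to_pitch(hand_height):.0f} Hz"
              f" (target {target_hz:.0f} Hz)")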

Place, publisher, year, edition, pages
Sociedade Brasileira de Computação (SBC), 2024
Keywords
sonification, micro-guidance, augmented reality, visual impairment, empathy
National Category
Human Computer Interaction
Research subject
Human-computer Interaction
Identifiers
urn:nbn:se:kth:diva-357765 (URN), 10.5753/svr_estendido.2024.243612 (DOI)
Conference
26th Symposium on Virtual and Augmented Reality (SVR), Manaus/AM, Brazil, September 30th to October 3rd, 2024
Note

No ISSN

QC 20250110

Available from: 2024-12-16. Created: 2024-12-16. Last updated: 2025-03-20. Bibliographically approved
Wrife, A., Guarese, R., Iop, A. & Romero, M. (2024). Comparative analysis of spatiotemporal playback manipulation on virtual reality training for External Ventricular Drainage. Computers & graphics, 124, Article ID 104106.
Comparative analysis of spatiotemporal playback manipulation on virtual reality training for External Ventricular Drainage
2024 (English). In: Computers & graphics, ISSN 0097-8493, E-ISSN 1873-7684, Vol. 124, article id 104106. Article in journal (Refereed), Published
Abstract [en]

Extensive research has been conducted in multiple surgical specialities where Virtual Reality (VR) has been utilised, such as spinal neurosurgery. However, cranial neurosurgery remains relatively unexplored in this regard. This work explores the impact of adopting VR to study External Ventricular Drainage (EVD). In this study, pre-recorded motion-capture data of an EVD procedure is visualised on a VR headset and compared against a desktop monitor condition. Participants (N = 20) were tasked with identifying and marking a key moment in the recordings. Objective and subjective metrics were recorded, such as completion time, temporal and spatial error distances, workload, and usability. The results showed that the task was completed on average twice as fast in VR as on the desktop. However, the desktop condition was less error-prone. Subjective feedback showed a slightly higher preference for the VR environment concerning usability, while maintaining a comparable workload. Overall, VR displays are promising as an alternative tool for educational and training purposes in cranial surgery.
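
The temporal and spatial error distances named above are straightforward to compute once the marked moment and position are logged; the sketch below uses invented coordinates and is not the study's data or exact metric definitions.

    import numpy as np

    # Hypothetical ground truth for the key moment in the recording.
    true_time_s = 42.8
    true_point_m = np.array([0.12, 1.35, 0.40])  # 3D position in metres

    # Hypothetical participant marking.
    marked_time_s = 44.1
    marked_point_m = np.array([0.15, 1.30, 0.46])

    temporal_error_s = abs(marked_time_s - true_time_s)
    spatial_error_m = float(np.linalg.norm(marked_point_m - true_point_m))

    print(f"temporal error: {temporal_error_s:.1f} s, "
          f"spatial error: {spatial_error_m * 100:.1f} cm")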

Place, publisher, year, edition, pages
Elsevier BV, 2024
Keywords
Virtual reality, Surgical simulations, External ventricular drainage, Motion capture, Interaction controls
National Category
Surgery
Identifiers
urn:nbn:se:kth:diva-355300 (URN), 10.1016/j.cag.2024.104106 (DOI), 001334942500001 (ISI), 2-s2.0-85206016553 (Scopus ID)
Note

QC 20241030

Available from: 2024-10-30. Created: 2024-10-30. Last updated: 2024-10-30. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-1206-5701
