Empowering natural human–robot collaboration through multimodal language models and spatial intelligence: Pathways and perspectives
2026 (English). In: Robotics and Computer-Integrated Manufacturing, ISSN 0736-5845, E-ISSN 1879-2537, Vol. 97, article id 103064. Article, review/survey (Refereed). Published
Abstract [en]
Industry 5.0 advocates human-centric smart manufacturing (HSM), with growing attention to proactive human–robot collaboration (HRC). Meanwhile, the rapid development of multimodal large language models (MLLMs) and embodied intelligence is driving an unprecedented evolution. This work aims to leverage these opportunities to enhance robots’ learning and cognitive capabilities, enabling seamless and natural interaction. However, current research often overlooks human–robot symbiosis and pays little attention to specialized models and practical applications. This review adheres to a human-centric vision, taking language as the pivot that connects humans with large models. To the best of our knowledge, this is the first attempt to integrate HRC, MLLMs, and embodied intelligence into a holistic view. The review first introduces representative foundation models and summarizes state-of-the-art methods in the “Perception-Cognition-Actuation” loop. It then discusses pathways and platforms for efficient spatial skill learning, followed by an analysis of four key questions from the “Why, How, What, Where” perspectives. Finally, it highlights future challenges and potential research directions. We hope this work helps fill the research gap between HRC and MLLMs, offering a systematic pathway for developing human-centered collaborative systems and promoting further exploration and innovation in this field. The resources are available at: https://github.com/WuDuidi/MLLM-HRC-Survey.
Place, publisher, year, edition, pages
Elsevier BV, 2026. Vol. 97, article id 103064
Keywords [en]
Embodied intelligence, Human–robot collaboration, Large language model, Robot learning, Smart manufacturing
National Category
Production Engineering, Human Work Science and Ergonomics; Robotics and automation
Identifiers
URN: urn:nbn:se:kth:diva-368601; DOI: 10.1016/j.rcim.2025.103064; ISI: 001514039100001; Scopus ID: 2-s2.0-105007620255; OAI: oai:DiVA.org:kth-368601; DiVA, id: diva2:1990132
Note
QC 20250819
2025-08-19, 2025-08-19, 2025-09-26. Bibliographically approved