KTH Publications (kth.se)
Publications (10 of 10)
Zhang, Q., Khoche, A., Yang, Y., Ling, L., Mansouri, S. S., Andersson, O. & Jensfelt, P. (2025). HiMo: High-Speed Objects Motion Compensation in Point Clouds. IEEE Transactions on Robotics, 41, 5896-5911
2025 (English). In: IEEE Transactions on Robotics, ISSN 1552-3098, E-ISSN 1941-0468, Vol. 41, p. 5896-5911. Article in journal (Refereed), Published
Abstract [en]

LiDAR point clouds are essential for autonomous vehicles, but motion distortions from dynamic objects degrade the data quality. While previous work has considered distortions caused by ego motion, distortions caused by other moving objects remain largely overlooked, leading to errors in object shape and position. This distortion is particularly pronounced in high-speed environments such as highways and in multi-LiDAR configurations, a common setup for heavy vehicles. To address this challenge, we introduce HiMo, a pipeline that repurposes scene flow estimation for non-ego motion compensation, correcting the representation of dynamic objects in point clouds. During the development of HiMo, we observed that existing self-supervised scene flow estimators often produce degenerate or inconsistent estimates under high-speed distortion. We therefore further propose SeFlow++, a real-time scene flow estimator that achieves state-of-the-art performance on both scene flow and motion compensation. Since well-established motion distortion metrics are absent from the literature, we introduce two evaluation metrics: compensation accuracy at the point level and shape similarity of objects. We validate HiMo through extensive experiments on Argoverse 2, ZOD, and a newly collected real-world dataset featuring highway driving and multi-LiDAR-equipped heavy vehicles. Our findings show that HiMo improves the geometric consistency and visual fidelity of dynamic objects in LiDAR point clouds, benefiting downstream tasks such as semantic segmentation and 3D detection. See https://kin-zhang.github.io/HiMo for more details.
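The core correction step can be sketched in a few lines. This is a minimal sketch under our own assumptions (the estimator is taken to output per-point flow vectors covering one full sweep, and per-point capture timestamps are available); `compensate_points` is a hypothetical helper, not HiMo's actual interface:

```python
import numpy as np

def compensate_points(points, flow, timestamps, sweep_duration, t_ref=None):
    """Shift each point to where its object would be at a common reference time.

    points         : (N, 3) positions as captured
    flow           : (N, 3) estimated per-point motion over one full sweep (m)
    timestamps     : (N,) capture time of each point within the sweep (s)
    sweep_duration : duration of one LiDAR sweep (s)
    t_ref          : reference time to compensate to (default: end of sweep)
    """
    if t_ref is None:
        t_ref = sweep_duration
    # Fraction of the per-sweep motion between each point's capture time and t_ref.
    frac = (t_ref - timestamps) / sweep_duration
    return points + flow * frac[:, None]

# A point captured halfway through a 0.1 s sweep, on an object moving 30 m/s
# along x (3 m of motion per sweep), is shifted forward by half the flow.
pts = np.array([[10.0, 0.0, 0.0]])
flw = np.array([[3.0, 0.0, 0.0]])
ts = np.array([0.05])
out = compensate_points(pts, flw, ts, sweep_duration=0.1)  # -> [[11.5, 0.0, 0.0]]
```

Ego-motion deskewing works in the same spirit; the difficulty HiMo addresses is supplying a reliable per-point flow for the other, non-ego movers.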

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
Autonomous Driving Navigation, Computer Vision for Transportation, Motion Compensation, Range Sensing
National Category
Computer graphics and computer vision; Robotics and automation; Vehicle and Aerospace Engineering; Signal Processing
Identifiers
urn:nbn:se:kth:diva-372474 (URN); 10.1109/TRO.2025.3619042 (DOI); 2-s2.0-105019222489 (Scopus ID)
Note

QC 20251107

Available from: 2025-11-07. Created: 2025-11-07. Last updated: 2025-11-07. Bibliographically approved.
Zhang, Q., Yang, Y., Li, P., Andersson, O. & Jensfelt, P. (2025). SeFlow: A Self-supervised Scene Flow Method in Autonomous Driving. In: Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A. & Ricci, E. (Eds.), Computer Vision - ECCV 2024, Pt I. Paper presented at the 18th European Conference on Computer Vision (ECCV), Sep 29 - Oct 4, 2024, Milan, Italy (pp. 353-369). Springer Nature, 15059
2025 (English). In: Computer Vision - ECCV 2024, Pt I / [ed] Roth, S., Russakovsky, O., Sattler, T., Varol, G., Leonardis, A. & Ricci, E., Springer Nature, 2025, Vol. 15059, p. 353-369. Conference paper, Published paper (Refereed)
Abstract [en]

Scene flow estimation predicts the 3D motion at each point in successive LiDAR scans. This detailed, point-level information can help autonomous vehicles accurately predict and understand dynamic changes in their surroundings. Current state-of-the-art methods require annotated data to train scene flow networks, and the expense of labeling inherently limits their scalability. Self-supervised approaches can overcome these limitations, yet face two principal challenges that hinder optimal performance: point distribution imbalance and disregard for object-level motion constraints. In this paper, we propose SeFlow, a self-supervised method that integrates efficient dynamic classification into a learning-based scene flow pipeline. We demonstrate that classifying static and dynamic points helps design targeted objective functions for different motion patterns. We also emphasize the importance of internal cluster consistency and correct object point association to refine the scene flow estimation, in particular on object details. Our real-time-capable method achieves state-of-the-art performance on the self-supervised scene flow task on the Argoverse 2 and Waymo datasets. The code is open-sourced at https://github.com/KTH-RPL/SeFlow.
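The idea of class-targeted objectives can be illustrated with a toy loss. This is our own simplification, not SeFlow's actual objective: the real method also enforces cluster consistency and is trained without ground-truth flow, using pseudo-targets instead.

```python
import numpy as np

def seflow_style_loss(pred_flow, target_flow, is_dyn):
    """Average the static and dynamic errors separately so the (typically few)
    dynamic points are not drowned out by the static majority."""
    static_err = np.linalg.norm(pred_flow[~is_dyn], axis=1)                 # want ~0
    dynamic_err = np.linalg.norm((pred_flow - target_flow)[is_dyn], axis=1)
    loss = 0.0
    if static_err.size:
        loss += static_err.mean()   # static points are pushed toward zero flow
    if dynamic_err.size:
        loss += dynamic_err.mean()  # dynamic points toward their (pseudo-)target
    return loss
```

Averaging per class rather than over all points is one simple way to counter the point distribution imbalance the abstract mentions.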

Place, publisher, year, edition, pages
Springer Nature, 2025
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 15059
Keywords
3D scene flow, self-supervised, autonomous driving, large-scale point cloud
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-357529 (URN); 10.1007/978-3-031-73232-4_20 (DOI); 001346378300020 (); 2-s2.0-85206389477 (Scopus ID)
Conference
18th European Conference on Computer Vision (ECCV), Sep 29 - Oct 4, 2024, Milan, Italy
Note

Part of ISBN 978-3-031-73231-7; 978-3-031-73232-4

QC 20241209

Available from: 2024-12-09. Created: 2024-12-09. Last updated: 2025-02-07. Bibliographically approved.
Khoche, A., Zhang, Q., Sánchez, L. P., Asefaw, A., Mansouri, S. S. & Jensfelt, P. (2025). SSF: Sparse Long-Range Scene Flow for Autonomous Driving. In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025. Paper presented at the 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19-23, 2025 (pp. 6394-6400). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English). In: 2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 6394-6400. Conference paper, Published paper (Refereed)
Abstract [en]

Scene flow enables an understanding of the motion characteristics of the environment in the 3D world. It gains particular significance at long range, where object-based perception methods might fail due to sparse faraway observations. Although significant advancements have been made in scene flow pipelines to handle large-scale point clouds, a gap remains in scalability with respect to range. We attribute this limitation to the common design choice of using dense feature grids, which scale quadratically with range. In this paper, we propose Sparse Scene Flow (SSF), a general pipeline for long-range scene flow that adopts a sparse-convolution-based backbone for feature extraction. This approach introduces a new challenge: a mismatch in size and ordering of sparse feature maps between time-sequential point scans. To address this, we propose a sparse feature fusion scheme that augments the feature maps with virtual voxels at missing locations. Additionally, we propose a range-wise metric that implicitly gives greater importance to faraway points. Our method, SSF, achieves state-of-the-art results on the Argoverse 2 dataset, demonstrating strong performance in long-range scene flow estimation. Our code is open-sourced at https://github.com/KTH-RPL/SSF.git.
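The virtual-voxel fusion idea can be sketched as follows. This is a plain-Python sketch under our own assumptions about the data layout (`fuse_sparse_features` is a hypothetical name); the actual SSF implementation operates on sparse-convolution tensors.

```python
import numpy as np

def fuse_sparse_features(coords_t0, feats_t0, coords_t1, feats_t1):
    """Align two sparse voxel feature maps by padding each with zero-valued
    'virtual voxels' at coordinates only the other frame occupies.
    coords_*: (N, 3) int voxel indices; feats_*: (N, C) features.
    Returns the union coordinates and matched (M, 2C) fused features."""
    key = lambda cs: {tuple(c): i for i, c in enumerate(cs)}
    k0, k1 = key(coords_t0), key(coords_t1)
    union = sorted(set(k0) | set(k1))
    C = feats_t0.shape[1]
    fused = np.zeros((len(union), 2 * C))
    for row, c in enumerate(union):
        if c in k0:
            fused[row, :C] = feats_t0[k0[c]]   # frame t0's channels
        if c in k1:
            fused[row, C:] = feats_t1[k1[c]]   # frame t1's channels
    return np.array(union), fused
```

Zero-padding at the union of coordinates makes the two sparse maps index-aligned, so they can be concatenated channel-wise just like dense grids, without a dense grid's quadratic memory cost.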

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer graphics and computer vision; Computer Sciences; Condensed Matter Physics; Signal Processing
Identifiers
urn:nbn:se:kth:diva-371385 (URN); 10.1109/ICRA55743.2025.11128770 (DOI); 2-s2.0-105016555490 (Scopus ID); 979-8-3315-4139-2 (ISBN)
Conference
2025 IEEE International Conference on Robotics and Automation, ICRA 2025, Atlanta, United States of America, May 19 2025 - May 23 2025
Note

Part of ISBN 979-8-3315-4139-2

QC 20251009

Available from: 2025-10-09. Created: 2025-10-09. Last updated: 2025-10-09. Bibliographically approved.
Jia, M., Zhang, Q., Yang, B., Wu, J., Liu, M. & Jensfelt, P. (2024). BeautyMap: Binary-Encoded Adaptable Ground Matrix for Dynamic Points Removal in Global Maps. IEEE Robotics and Automation Letters, 9(7), 6256-6263
2024 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no 7, p. 6256-6263. Article in journal (Refereed), Published
Abstract [en]

Global point clouds that correctly represent the static environment can facilitate accurate localization and robust path planning. However, dynamic objects introduce undesired 'ghost' tracks that are mixed up with the static environment. Existing dynamic-point removal methods normally fail to balance computational efficiency and accuracy. In response, we present 'BeautyMap' to efficiently remove dynamic points while retaining static features for high-fidelity global maps. Our approach utilizes a binary-encoded matrix to efficiently extract environment features. With a bit-wise comparison between the matrix of each frame and that of the corresponding map region, we can extract potential dynamic regions. We then use coarse-to-fine hierarchical segmentation along the z-axis to handle terrain variations. The final static restoration module accounts for the range visibility of each scan and protects static points that are out of sight. Comparative experiments underscore BeautyMap's superior performance in both accuracy and efficiency against other dynamic-point removal methods.
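The bit-wise encoding can be illustrated as follows; the grid layout and the exact comparison rule here are our simplification for illustration, not BeautyMap's precise scheme.

```python
import numpy as np

# Each (x, y) grid cell's vertical occupancy is packed into the bits of one
# integer: bit k set == something occupies height slice k. A whole z-column is
# then compared with a single bitwise operation instead of a per-voxel loop.
def potential_dynamic_columns(map_bits, scan_bits, scan_visible):
    # Occupied in the map, inside the scan's observed volume, yet empty in the
    # current scan -> candidate dynamic bits for further checking.
    return map_bits & scan_visible & ~scan_bits

map_bits = np.array([0b0110], dtype=np.uint64)   # map: height slices 1-2 occupied
scan_bits = np.array([0b0010], dtype=np.uint64)  # scan: only slice 1 occupied
visible = np.array([0b1111], dtype=np.uint64)    # scan observed slices 0-3
# Slice 2 is occupied in the map but observed empty now -> potentially dynamic.
```

Packing a z-column into one machine word is what buys the efficiency: one integer AND replaces dozens of per-voxel comparisons.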

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Autonomous agents, mapping, object detection, segmentation and categorization
National Category
Computer Sciences; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-366404 (URN); 10.1109/LRA.2024.3402625 (DOI); 001235498200001 (); 2-s2.0-85193483859 (Scopus ID)
Note

QC 20250708

Available from: 2025-07-08. Created: 2025-07-08. Last updated: 2025-07-08. Bibliographically approved.
Zhang, Q., Yang, Y., Fang, H., Geng, R. & Jensfelt, P. (2024). DeFlow: Decoder of Scene Flow Network in Autonomous Driving. In: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024. Paper presented at the 2024 IEEE International Conference on Robotics and Automation, ICRA 2024, Yokohama, Japan, May 13-17, 2024 (pp. 2105-2111). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE International Conference on Robotics and Automation, ICRA 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 2105-2111. Conference paper, Published paper (Refereed)
Abstract [en]

Scene flow estimation determines a scene's 3D motion field by predicting the motion of points in the scene, especially to aid tasks in autonomous driving. Many networks that take large-scale point clouds as input use voxelization to create a pseudo-image, enabling real-time operation. However, the voxelization process often results in the loss of point-specific features, which poses a challenge to recovering those features for scene flow tasks. Our paper introduces DeFlow, which enables a transition from voxel-based features to point features using Gated Recurrent Unit (GRU) refinement. To further enhance scene flow estimation performance, we formulate a novel loss function that accounts for the data imbalance between static and dynamic points. Evaluations on the Argoverse 2 scene flow task reveal that DeFlow achieves state-of-the-art results on large-scale point cloud data, demonstrating better performance and efficiency than other networks. The code is available at https://github.com/KTH-RPL/deflow.
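The voxel-to-point transition can be sketched as a feature gather plus each point's continuous in-cell offset. This is a minimal sketch with hypothetical names; DeFlow additionally refines the recovered features with GRU iterations, omitted here.

```python
import numpy as np

def voxel_to_point_features(points, voxel_size, voxel_feats, grid_origin):
    """Recover a per-point representation from a voxel feature grid: each point
    gathers its cell's feature and keeps its continuous offset inside the cell,
    which is exactly the information voxelization discards."""
    rel = (points - grid_origin) / voxel_size
    idx = np.floor(rel).astype(int)                             # (N, 3) voxel index
    offset = rel - idx                                          # point-specific remainder
    gathered = voxel_feats[idx[:, 0], idx[:, 1], idx[:, 2]]     # (N, C)
    return np.concatenate([gathered, offset], axis=1)           # (N, C + 3)
```

Concatenating the in-cell offset gives every point in the same voxel a distinct descriptor, which the subsequent refinement can exploit.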

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer graphics and computer vision; Signal Processing
Identifiers
urn:nbn:se:kth:diva-367438 (URN); 10.1109/ICRA57147.2024.10610278 (DOI); 001294576201114 (); 2-s2.0-85188850619 (Scopus ID)
Conference
2024 IEEE International Conference on Robotics and Automation, ICRA 2024, Yokohama, Japan, May 13-17, 2024
Note

Part of ISBN 9798350384574

QC 20250718

Available from: 2025-07-18. Created: 2025-07-18. Last updated: 2025-07-18. Bibliographically approved.
Duberg, D., Zhang, Q., Jia, M. & Jensfelt, P. (2024). DUFOMap: Efficient Dynamic Awareness Mapping. IEEE Robotics and Automation Letters, 9(6), 5038-5045
2024 (English). In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 9, no 6, p. 5038-5045. Article in journal (Refereed), Published
Abstract [en]

The dynamic nature of the real world is one of the main challenges in robotics. The first step in dealing with it is to detect which parts of the world are dynamic. A typical benchmark task is to create a map that contains only the static part of the world to support, for example, localization and planning. Current solutions are often applied in post-processing, where parameter tuning allows the user to adjust the setting for a specific dataset. In this letter, we propose DUFOMap, a novel dynamic awareness mapping framework designed for efficient online processing. Despite having the same parameter settings for all scenarios, it performs better or is on par with state-of-the-art methods. Ray casting is utilized to identify and classify fully observed empty regions. Since these regions have been observed empty, it follows that anything inside them at another time must be dynamic. Evaluation is carried out in various scenarios, including outdoor environments in KITTI and Argoverse 2, open areas on the KTH campus, and with different sensor types. DUFOMap outperforms the state of the art in terms of accuracy and computational efficiency.
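The void-region reasoning can be sketched in a few lines. This is a simplified illustration assuming an axis-aligned voxel grid and fixed-step ray sampling; DUFOMap itself builds on a hierarchical map structure with proper ray traversal, and the function names here are ours.

```python
import numpy as np

def mark_void_voxels(sensor, hits, voxel_size, grid_shape):
    """Ray-cast from the sensor to each hit and mark traversed voxels as
    observed-empty ('void'). Anything occupying a void voxel at another
    time must be dynamic."""
    void = np.zeros(grid_shape, dtype=bool)
    for hit in hits:
        ray = hit - sensor
        dist = np.linalg.norm(ray)
        # Sample strictly before the hit so the endpoint voxel stays unmarked.
        for s in np.arange(0.5 * voxel_size, dist - 0.5 * voxel_size, 0.5 * voxel_size):
            v = (sensor + ray / dist * s) // voxel_size
            void[tuple(int(i) for i in v)] = True
    return void

def flag_dynamic(points, void, voxel_size):
    """A point is dynamic if it sits inside a voxel once observed empty."""
    idx = (points // voxel_size).astype(int)
    return void[idx[:, 0], idx[:, 1], idx[:, 2]]
```

Because the rule only needs "this region was once fully observed empty", it works online with one fixed parameter set, which is the property the abstract emphasizes.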

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Mapping, object detection, robotics and automation in construction, segmentation and categorization
National Category
Robotics and automation; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-366806 (URN); 10.1109/LRA.2024.3387658 (DOI); 001205782500001 (); 2-s2.0-85190348409 (Scopus ID)
Note

QC 20250710

Available from: 2025-07-10. Created: 2025-07-10. Last updated: 2025-07-10. Bibliographically approved.
Yang, Y., Zhang, Q., Ikemura, K., Batool, N. & Folkesson, J. (2024). Hard Cases Detection in Motion Prediction by Vision-Language Foundation Models. In: 35th IEEE Intelligent Vehicles Symposium, IV 2024. Paper presented at the 35th IEEE Intelligent Vehicles Symposium, IV 2024, Jeju Island, Korea, Jun 2-5, 2024 (pp. 2405-2412). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 35th IEEE Intelligent Vehicles Symposium, IV 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 2405-2412. Conference paper, Published paper (Refereed)
Abstract [en]

Addressing hard cases in autonomous driving, such as anomalous road users, extreme weather conditions, and complex traffic interactions, presents significant challenges. To ensure safety, it is crucial for autonomous driving systems to detect and manage these scenarios effectively. However, the rarity and high-risk nature of these cases demand extensive, diverse datasets for training robust models. Vision-Language Foundation Models (VLMs), being trained on extensive datasets, have shown remarkable zero-shot capabilities. This work explores the potential of VLMs in detecting hard cases in autonomous driving. We demonstrate the capability of VLMs such as GPT-4V to detect hard cases in traffic-participant motion prediction at both the agent and scenario levels. We introduce a feasible pipeline in which VLMs, fed sequential image frames with designed prompts, effectively identify challenging agents or scenarios, which are verified by existing prediction models. Moreover, by taking advantage of this VLM-based hard-case detection, we further improve the training efficiency of the existing motion prediction pipeline by performing data selection on the training samples suggested by GPT. We show the effectiveness and feasibility of our pipeline, incorporating VLMs with state-of-the-art methods, on the nuScenes dataset. The code is accessible at https://github.com/KTH-RPL/Detect-VLM.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-351756 (URN); 10.1109/IV55156.2024.10588694 (DOI); 001275100902068 (); 2-s2.0-85199784263 (Scopus ID)
Conference
35th IEEE Intelligent Vehicles Symposium, IV 2024, Jeju Island, Korea, Jun 2 2024 - Jun 5 2024
Note

Part of ISBN 9798350348811

QC 20240815

Available from: 2024-08-13. Created: 2024-08-13. Last updated: 2025-02-07. Bibliographically approved.
Yang, Y., Zhang, Q., Li, C., Simões Marta, D., Batool, N. & Folkesson, J. (2024). Human-Centric Autonomous Systems With LLMs for User Command Reasoning. In: 2024 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2024. Paper presented at the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Jan 4-8, 2024, Waikoloa, HI (pp. 988-994). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE Winter Conference on Applications of Computer Vision Workshops, WACVW 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 988-994. Conference paper, Published paper (Refereed)
Abstract [en]

Autonomous driving has made remarkable advancements in recent years, evolving into a tangible reality. However, human-centric, large-scale adoption hinges on meeting a variety of multifaceted requirements. To ensure that the autonomous system meets the user's intent, it is essential to accurately discern and interpret user commands, especially in complex or emergency situations. To this end, we propose to leverage the reasoning capabilities of Large Language Models (LLMs) to infer system requirements from in-cabin users' commands. Through a series of experiments covering different LLM models and prompt designs, we explore the few-shot multivariate binary classification accuracy of system requirements from natural-language textual commands. We confirm the general ability of LLMs to understand and reason about prompts but underline that their effectiveness is conditioned on the quality of both the LLM model and the design of appropriate sequential prompts. Code and models are publicly available at https://github.com/KTH-RPL/DriveCmd_LLM.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
IEEE Winter Conference on Applications of Computer Vision Workshops, ISSN 2572-4398
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-351635 (URN); 10.1109/WACVW60836.2024.00108 (DOI); 001223022200040 (); 2-s2.0-85188691382 (Scopus ID)
Conference
IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Jan 4-8, 2024, Waikoloa, HI
Note

QC 20240813

Part of ISBN 979-8-3503-7028-7, 979-8-3503-7071-3

Available from: 2024-08-13. Created: 2024-08-13. Last updated: 2024-10-11. Bibliographically approved.
Zhang, Q., Duberg, D., Geng, R., Jia, M., Wang, L. & Jensfelt, P. (2023). A Dynamic Points Removal Benchmark in Point Cloud Maps. In: 2023 IEEE 26th International Conference on Intelligent Transportation Systems, ITSC 2023. Paper presented at the 26th IEEE International Conference on Intelligent Transportation Systems, ITSC 2023, Bilbao, Spain, Sep 24-28, 2023 (pp. 608-614). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE 26th International Conference on Intelligent Transportation Systems, ITSC 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 608-614. Conference paper, Published paper (Refereed)
Abstract [en]

In the field of robotics, the point cloud has become an essential map representation. From the perspective of downstream tasks like localization and global path planning, points corresponding to dynamic objects will adversely affect performance. Existing methods for removing dynamic points in point clouds often lack clarity in comparative evaluations and comprehensive analysis. Therefore, we propose an easy-to-extend, unified benchmarking framework for evaluating techniques that remove dynamic points from maps. It includes refactored state-of-the-art methods and novel metrics to analyze the limitations of these approaches, enabling researchers to dive deep into the underlying reasons behind those limitations. The benchmark makes use of several datasets with different sensor types. All code and datasets related to our study are publicly available for further development and utilization.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-344365 (URN); 10.1109/ITSC57777.2023.10422094 (DOI); 001178996700090 (); 2-s2.0-85186537890 (Scopus ID)
Conference
26th IEEE International Conference on Intelligent Transportation Systems, ITSC 2023, Bilbao, Spain, Sep 24 2023 - Sep 28 2023
Note

Part of ISBN 9798350399462

QC 20240315

Available from: 2024-03-13. Created: 2024-03-13. Last updated: 2025-12-05. Bibliographically approved.
Yang, Y., Zhang, Q., Gilles, T., Batool, N. & Folkesson, J. (2023). RMP: A Random Mask Pretrain Framework for Motion Prediction. In: 2023 IEEE 26th International Conference on Intelligent Transportation Systems, ITSC 2023. Paper presented at the 26th IEEE International Conference on Intelligent Transportation Systems, ITSC 2023, Bilbao, Spain, Sep 24-28, 2023 (pp. 3717-3723). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 2023 IEEE 26th International Conference on Intelligent Transportation Systems, ITSC 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 3717-3723. Conference paper, Published paper (Refereed)
Abstract [en]

Although pretraining techniques are growing in popularity, little work has been done on pretrained, learning-based motion prediction methods in autonomous driving. In this paper, we propose a framework to formalize the pretraining task for trajectory prediction of traffic participants. Within our framework, inspired by random masked models in natural language processing (NLP) and computer vision (CV), objects' positions at random timesteps are masked and then filled in by the learned neural network (NN). By changing the mask profile, our framework can easily switch among a range of motion-related tasks. We show that our proposed pretraining framework is able to deal with noisy inputs and improves motion prediction accuracy and miss rate, especially for objects occluded over time, by evaluating it on the Argoverse and nuScenes datasets.
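The masking mechanic can be sketched in a few lines. This is our own minimal sketch: the names and the zero placeholder are illustrative, and the actual framework trains a trajectory network under these masks rather than a NumPy function.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed for a reproducible sketch

def random_mask(trajectory, mask_ratio=0.3):
    """Hide positions at random timesteps; the network is trained to fill
    them back in, as in masked language modelling."""
    T = trajectory.shape[0]
    masked = rng.random(T) < mask_ratio
    corrupted = trajectory.copy()
    corrupted[masked] = 0.0          # placeholder token for hidden steps
    return corrupted, masked

def reconstruction_loss(pred, target, masked):
    # Supervise only the hidden timesteps.
    return np.abs((pred - target)[masked]).mean()
```

Changing which timesteps may be masked switches the task: masking only the tail of a trajectory turns the same setup into forecasting, which is the "mask profile" flexibility the abstract describes.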

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Series
IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC, ISSN 2153-0009
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-344363 (URN); 10.1109/ITSC57777.2023.10422522 (DOI); 001178996703113 (); 2-s2.0-85186535191 (Scopus ID)
Conference
26th IEEE International Conference on Intelligent Transportation Systems, ITSC 2023, Bilbao, Spain, Sep 24 2023 - Sep 28 2023
Note

Part of ISBN 979-835039946-2

QC 20240315

Available from: 2024-03-13. Created: 2024-03-13. Last updated: 2025-02-07. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-7882-948X
