S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving
2025 (English)
In: Proceedings - 2025 IEEE Winter Conference on Applications of Computer Vision, WACV 2025, Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 1660-1670
Conference paper, Published paper (Refereed)
Abstract [en]
Recent self-supervised clustering-based pre-training techniques like DINO and CrIBo have shown impressive results for downstream detection and segmentation tasks. However, real-world applications such as autonomous driving face challenges with imbalanced object class and size distributions and complex scene geometries. In this paper, we propose S3PT, a novel scene semantics and structure guided clustering approach that provides more scene-consistent objectives for self-supervised training. Specifically, our contributions are threefold. First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals. Second, we introduce object diversity consistent spatial clustering to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs. Third, we propose depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation at the feature level. Our learned representations significantly improve performance in downstream semantic segmentation and 3D object detection tasks on the nuScenes, nuImages, and Cityscapes datasets and show promising domain translation properties.
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025. p. 1660-1670
Keywords [en]
3D object detection, autonomous driving, self-supervised learning, semantic segmentation, visual foundational models
National Category
Computer graphics and computer vision
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-363208
DOI: 10.1109/WACV61041.2025.00169
Scopus ID: 2-s2.0-105003631689
OAI: oai:DiVA.org:kth-363208
DiVA, id: diva2:1956915
Conference
2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025, Tucson, United States of America, Feb 28 2025 - Mar 4 2025
Note
Part of ISBN 9798331510831
QC 20250512
2025-05-07 2025-05-07 2025-05-12
Bibliographically approved