Addressing Data Annotation Challenges in Multiple Sensors: A Solution for Scania Collected Datasets
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. ORCID iD: 0009-0009-6935-6797
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL.
Autonomous Transport Solutions Lab, Scania Group, Södertälje, Sweden.
Autonomous Transport Solutions Lab, Scania Group, Södertälje, Sweden.
2024 (English). In: 2024 European Control Conference, ECC 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 1032-1038. Conference paper, Published paper (Refereed)
Abstract [en]

Data annotation for autonomous vehicles is a critical step in developing Deep Neural Network (DNN) based models and in evaluating the performance of the perception system. It often takes the form of adding 3D bounding boxes to time-sequential, registered series of point sets captured by active sensors such as Light Detection and Ranging (LiDAR) and Radio Detection and Ranging (RADAR). When annotating data from multiple active sensors, the points must be transformed to a consistent coordinate frame and motion-compensated to a common timestamp. Highly dynamic objects pose a unique challenge, however, because each sensor captures them at a different timestamp. Unless the speed of an object is known, its position appears to differ between sensor outputs. Thus, even after motion compensation, highly dynamic objects observed by multiple sensors do not line up in the same frame, and human annotators struggle to place a single bounding box that captures all points of each object. This article addresses this challenge, primarily in the context of Scania-collected datasets. The proposed solution takes the track of an annotated object as input and uses Moving Horizon Estimation (MHE) to robustly estimate its speed. The estimated speed profile is then used to correct the position of the annotated box and to add boxes to object clusters missed by the original annotation.
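
The idea of estimating speed from an annotated track and using it to shift a box to another sensor's timestamp can be illustrated with a minimal sketch. It is a simplified stand-in and not the paper's formulation: it fits a constant-velocity model by least squares over a sliding window, which is only the simplest unconstrained case of a moving-horizon estimator, and it works in one dimension. The function names, the `horizon` parameter, and the sample values are all hypothetical.

```python
from typing import List, Sequence


def window_velocity(times: Sequence[float], positions: Sequence[float]) -> float:
    """Least-squares slope of position over time (constant-velocity fit)."""
    n = len(times)
    t_mean = sum(times) / n
    p_mean = sum(positions) / n
    num = sum((t - t_mean) * (p - p_mean) for t, p in zip(times, positions))
    den = sum((t - t_mean) ** 2 for t in times)
    if den == 0.0:
        raise ValueError("timestamps in the window must not all be equal")
    return num / den


def moving_horizon_speed(
    times: Sequence[float], positions: Sequence[float], horizon: int = 5
) -> List[float]:
    """Speed estimate per track sample, fitted on the last `horizon` samples.

    Assumption: a plain sliding-window fit with no arrival cost, noise
    model, or constraints, unlike a full MHE formulation.
    """
    if len(times) != len(positions) or len(times) < 2:
        raise ValueError("need at least two (time, position) samples")
    if horizon < 2:
        raise ValueError("horizon must cover at least two samples")
    speeds = []
    for k in range(len(times)):
        end = max(k, 1)  # the first sample borrows the second one
        start = max(0, end - horizon + 1)
        speeds.append(
            window_velocity(times[start : end + 1], positions[start : end + 1])
        )
    return speeds


def shift_box(position: float, speed: float, box_time: float, sensor_time: float) -> float:
    """Move a box centre from its annotation time to another sensor's timestamp."""
    return position + speed * (sensor_time - box_time)


# Hypothetical track: box centres along the driving direction, 10 Hz annotation.
track_times = [0.0, 0.1, 0.2, 0.3, 0.4]
track_positions = [0.0, 1.0, 2.0, 3.0, 4.0]
speeds = moving_horizon_speed(track_times, track_positions, horizon=3)
corrected = shift_box(track_positions[2], speeds[2], box_time=0.2, sensor_time=0.25)
```

With the speed known, the same box can be placed correctly in each sensor's point set by evaluating `shift_box` at that sensor's capture time.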

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024. p. 1032-1038
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-351940
DOI: 10.23919/ECC64448.2024.10590958
ISI: 001290216500156
Scopus ID: 2-s2.0-85200563187
OAI: oai:DiVA.org:kth-351940
DiVA, id: diva2:1890156
Conference
2024 European Control Conference, ECC 2024, Stockholm, Sweden, Jun 25 2024 - Jun 28 2024
Note

Part of ISBN [9783907144107]

QC 20240830

Available from: 2024-08-19. Created: 2024-08-19. Last updated: 2026-03-26. Bibliographically approved
In thesis
1. Beyond Standard Assumptions in Autonomous Driving Perception
2026 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Autonomous driving perception is commonly developed and evaluated under a set of enabling assumptions: that multi-sensor evidence is physically consistent at the frame level, that geometry is sufficiently dense to support reliable inference about other traffic participants and the surrounding environment, and that learning can rely on either abundant human labels or self-supervised objectives derived from the sensor stream. This thesis examines what remains feasible when these assumptions no longer hold, and develops methods and design principles for perception under asynchronous sensing, long-range sparsity, and weak or unreliable supervision.

We first study physical inconsistency in multi-sensor data. We show that rolling and asynchronous acquisition, motion during aggregation, and annotation practices that implicitly assume temporal coherence can render the perception problem ill-posed before any representation choice is made. We therefore treat data preparation, motion compensation, and annotation consistency as integral parts of the perception pipeline, since errors at this stage can propagate directly into annotation, training, and evaluation.

We then examine representation under long-range sparsity. We show that long-range performance is limited not only by model capacity, but by the representations used to encode and expose ambiguous evidence. In particular, object-centric outputs and dense internal representations can force premature commitment when available evidence collapses at distance. To study this, we present results on long-range 3D object detection and sparse long-range scene flow, showing both the limits of object-centric perception under weak observability and the value of motion-centric estimation as range increases.

Finally, we study learning signals when labels and geometry-derived self-supervision become unreliable. We show that motion supervision can be recovered by importing physically grounded constraints from complementary modalities, using radar Doppler to guide LiDAR scene flow learning. We further show that scalable semantic supervision can be obtained from foundation-model priors through curriculum-based synthetic-to-real adaptation, which anchors language-aligned representations to real LiDAR characteristics.
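
The physical constraint that radar Doppler offers can be illustrated with a small sketch: a Doppler measurement observes only the radial component of a point's velocity, so a predicted LiDAR flow vector can be checked against it along the line of sight. This is an illustration under stated assumptions and not the thesis's method. It assumes the sensor sits at the origin, ego-motion has already been compensated, and positive Doppler speed means motion away from the sensor. The function names and values are hypothetical.

```python
import math
from typing import Sequence


def radial_speed(point: Sequence[float], velocity: Sequence[float]) -> float:
    """Component of `velocity` along the line of sight from the origin to `point`."""
    rng = math.sqrt(sum(c * c for c in point))
    if rng == 0.0:
        raise ValueError("point must not coincide with the sensor origin")
    return sum(p * v for p, v in zip(point, velocity)) / rng


def doppler_residual(
    point: Sequence[float], flow: Sequence[float], dt: float, doppler_speed: float
) -> float:
    """Difference between the flow-implied radial speed and the Doppler speed.

    `flow` is the predicted displacement of the point over `dt` seconds.
    A residual near zero means the flow agrees with the radar along the
    line of sight; the tangential component stays unconstrained.
    """
    velocity = [f / dt for f in flow]
    return radial_speed(point, velocity) - doppler_speed


# Hypothetical point 10 m ahead, predicted to move 1 m forward and 0.5 m sideways in 0.1 s.
residual = doppler_residual(
    point=(10.0, 0.0, 0.0), flow=(1.0, 0.5, 0.0), dt=0.1, doppler_speed=10.0
)
```

Because only the radial component is observed, a residual like this can guide learning but cannot by itself determine the full flow vector.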

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2026. p. 103
Series
TRITA-EECS-AVL ; 2026:22
Keywords
Autonomous Driving, Computer Vision, Robotics
National Category
Computer graphics and computer vision; Robotics and automation
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-378742
ISBN: 978-91-8106-558-9
Public defence
2026-04-17, Kollegiesalen, Brinellvägen 8, Stockholm, 09:00 (English)
Note

Zoom link: https://kth-se.zoom.us/s/68091974260

Available from: 2026-03-27. Created: 2026-03-26. Last updated: 2026-04-08. Bibliographically approved

Authority records

Khoche, Ajinkya; Asefaw, Aron; Jensfelt, Patric
