KTH Publications
Publications (10 of 13)
Bruns, L., Barroso-Laguna, A., Cavallari, T., Monszpart, Á., Munukutla, S., Prisacariu, V. A. & Brachmann, E. (2025). ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). Paper presented at IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawai'i, October 19-23, 2025 (pp. 26751-26761).
2025 (English) In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025, p. 26751-26761. Conference paper, Published paper (Refereed)
Abstract [en]

Scene coordinate regression (SCR) has established itself as a promising learning-based approach to visual relocalization. After mere minutes of scene-specific training, SCR models estimate camera poses of query images with high accuracy. Still, SCR methods fall short of the generalization capabilities of more classical feature-matching approaches. When imaging conditions of query images, such as lighting or viewpoint, are too different from the training views, SCR models fail. Failing to generalize is an inherent limitation of previous SCR frameworks, since their training objective is to encode the training views in the weights of the coordinate regressor itself. The regressor essentially overfits to the training views, by design. We propose to separate the coordinate regressor and the map representation into a generic transformer and a scene-specific map code. This separation allows us to pre-train the transformer on tens of thousands of scenes. More importantly, it allows us to train the transformer to generalize from mapping images to unseen query images during pre-training. We demonstrate on multiple challenging relocalization datasets that our method, ACE-G, leads to significantly increased robustness while keeping the computational footprint attractive.
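As a hedged illustration of the separation the abstract describes, the toy sketch below (invented names, not the authors' implementation) fits only a per-scene map code while a shared regressor stays frozen, standing in for the pre-trained transformer:

```python
# Toy sketch of the regressor/map-code separation (not the authors' code):
# a regressor shared across scenes, conditioned on a small per-scene map code.
# Only the map code is fitted for a new scene; the regressor stays frozen.

class SharedRegressor:
    """Stands in for the pre-trained, scene-agnostic transformer."""

    def __init__(self, scale):
        self.scale = scale  # frozen after large-scale pre-training

    def predict(self, query_feature, map_code):
        # Toy conditioning: mix the query feature with the scene's map code.
        return [self.scale * q + c for q, c in zip(query_feature, map_code)]


def fit_map_code(regressor, features, coords, dim=3, lr=0.5, steps=200):
    """Mapping phase: gradient descent on the scene-coordinate error with
    respect to the code only; regressor weights are never touched."""
    code = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for f, t in zip(features, coords):
            pred = regressor.predict(f, code)
            for i in range(dim):
                grad[i] += 2.0 * (pred[i] - t[i]) / len(features)
        code = [c - lr * g for c, g in zip(code, grad)]
    return code
```

In ACE-G the shared component is a transformer pre-trained on tens of thousands of scenes; here a single scalar weight stands in for it so that only the division of roles is visible.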

Keywords
visual relocalization, pose estimation, visual localization
National Category
Computer Vision and Learning Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-372329 (URN)
Conference
IEEE/CVF International Conference on Computer Vision (ICCV), Honolulu, Hawai'i, October 19-23, 2025
Note

QC 20251105

Available from: 2025-11-05 Created: 2025-11-05 Last updated: 2025-11-05 Bibliographically approved
Bruns, L. (2025). Improving Spatial Understanding Through Learning and Optimization. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Spatial understanding comprises various abilities, from pose estimation of objects and cameras within a scene to shape completion given partial observations. These abilities are what enable humans to intuitively navigate and interact with the world. Despite significant progress in large-scale learning, computers still lack the intuitive spatial understanding that humans possess. In robotics, this gap limits the applicability of classical robotics pipelines in real-world environments; in augmented reality, it limits the achievable fidelity as well as the interaction of virtual content with real-world objects.

This thesis investigates ways to improve spatial understanding of computers using different learning- and optimization-based techniques. Learning-based methods are employed to learn useful priors about the objects and the 3D world, whereas optimization-based techniques are used to find models of objects and scenes aligning well with a set of observations. Within this framework, we investigate and propose methods for three subproblems of spatial understanding.

First, we propose a modular framework for categorical object pose and shape estimation, which combines a pre-trained generative shape model with a discriminative initialization network that regresses an initial pose and latent shape description from a partial point cloud of an object. By combining the generative shape model with a differentiable renderer, we further perform iterative, joint pose and shape optimization from one or multiple views. Our approach outperforms existing methods, especially on unconstrained orientations, while achieving competitive results for upright, tabletop objects.

Second, we investigate the use of neural fields for dense, volumetric mapping. Specifically, we propose to represent the scene by a set of spatially constrained, movable neural fields anchored to a pose graph. We formulate the optimization problem of the multi-field scene representation as independent optimization of each field, demonstrating that this approach allows real-time loop closure integration, avoids transition artifacts at field boundaries, and outperforms current neural-field-based SLAM systems on larger scenes in which significant drift can accumulate.

Third, we investigate large-scale pre-training for visual relocalization using scene coordinate regression. We split up the scene-specific regressor into a scene-agnostic regressor and a scene-specific latent map code, and propose a pre-training scheme for the scene-agnostic coordinate regressor to better generalize from mapping images to query images containing different viewpoints, lighting changes, and objects. We demonstrate that our approach outperforms existing methods under such dynamic mapping-query splits.


Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. viii, 75
Series
TRITA-EECS-AVL ; 2025:97
Keywords
Spatial Understanding, Pose and Shape Estimation, Volumetric Mapping, Visual Relocalization, Rumslig Förståelse, Pose- och Formbestämning, Volymetrisk Kartläggning, Visuell Relokalisering
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:kth:diva-372393 (URN)978-91-8106-446-9 (ISBN)
Public defence
2025-12-05, https://kth-se.zoom.us/s/65134312330, F3 (Flodis), Lindstedtsvägen 26 & 28, KTH Campus, Stockholm, 13:00 (English)
Note

QC 20251106

Available from: 2025-11-06 Created: 2025-11-05 Last updated: 2025-12-09 Bibliographically approved
Bruns, L., Zhang, J. & Jensfelt, P. (2025). Neural Graph Map: Dense Mapping with Efficient Loop Closure Integration. In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). Paper presented at 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, February 26 - March 6, 2025 (pp. 2900-2909). Institute of Electrical and Electronics Engineers (IEEE)
2025 (English) In: 2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Institute of Electrical and Electronics Engineers (IEEE), 2025, p. 2900-2909. Conference paper, Published paper (Refereed)
Abstract [en]

Neural field-based SLAM methods typically employ a single monolithic field as their scene representation. This prevents efficient incorporation of loop closure constraints and limits scalability. To address these shortcomings, we propose a novel RGB-D neural mapping framework in which the scene is represented by a collection of lightweight neural fields that are dynamically anchored to the pose graph of a sparse visual SLAM system. Our approach shows the ability to integrate large-scale loop closures while requiring only minimal reintegration. Furthermore, we verify the scalability of our approach by demonstrating successful building-scale mapping that takes multiple loop closures into account during the optimization, and show that our method outperforms existing state-of-the-art approaches on large scenes in terms of quality and runtime. Our code is available open-source at https://github.com/KTH-RPL/neural_graph_mapping.
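To illustrate the anchoring idea, a minimal sketch under assumed 2-D poses (class names invented; not the authors' code): geometry stored in a pose-graph node's local frame follows the node when the graph is corrected.

```python
# Minimal sketch of anchoring map fields to pose-graph nodes: each field
# stores its content in the frame of one node, so a loop closure only moves
# the anchors and the fields follow rigidly, with no reintegration of the
# fields themselves.
from dataclasses import dataclass
from math import cos, sin


@dataclass
class GraphNode:
    pose: tuple  # (x, y, theta): node pose, corrected by the SLAM back end


@dataclass
class AnchoredField:
    anchor: GraphNode
    points_local: list  # stand-in for the neural field's stored geometry

    def points_world(self):
        # Rigidly transform the stored geometry by the current anchor pose.
        x, y, th = self.anchor.pose
        return [(x + cos(th) * px - sin(th) * py,
                 y + sin(th) * px + cos(th) * py)
                for px, py in self.points_local]
```

After a loop closure rewrites the anchor poses, `points_world()` reflects the correction immediately; nothing stored inside the field needs to be rebuilt.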

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer Vision and Learning Systems
Identifiers
urn:nbn:se:kth:diva-372392 (URN)10.1109/WACV61041.2025.00287 (DOI)001481328900277 (ISI)2-s2.0-105003634119 (Scopus ID)
Conference
2025 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), Tucson, AZ, USA, February 26 - March 6, 2025
Note

Part of ISBN 9798331510831, 9798331510848

QC 20251106

Available from: 2025-11-05 Created: 2025-11-05 Last updated: 2025-11-06 Bibliographically approved
Gaspar Sánchez, J. M., Bruns, L., Tumova, J., Jensfelt, P. & Törngren, M. (2025). Transitional Grid Maps: Joint Modeling of Static and Dynamic Occupancy. IEEE Open Journal of Intelligent Transportation Systems, 6, 1-10
2025 (English) In: IEEE Open Journal of Intelligent Transportation Systems, E-ISSN 2687-7813, Vol. 6, p. 1-10. Article in journal (Refereed), Published
Abstract [en]

Autonomous agents rely on sensor data to construct representations of their environments, which are essential for predicting future events and planning their actions. However, sensor measurements suffer from limited range, occlusions, and sensor noise. These challenges become more evident in highly dynamic environments. This work proposes a probabilistic framework to jointly infer which parts of an environment are statically and which parts are dynamically occupied. We formulate the problem as a Bayesian network and introduce minimal assumptions that significantly reduce the complexity of the problem. From these assumptions, we derive Transitional Grid Maps (TGMs), an efficient analytical solution. Using real data, we demonstrate how this approach produces better maps than the state of the art by keeping track of both static and dynamic elements and, as a side effect, can help improve existing SLAM algorithms.
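A hedged numeric sketch of the joint static/dynamic idea (illustrative transition rates and sensor model; not the TGM derivation itself): each cell carries P(static) and P(dynamic), the remainder being P(free); static mass persists across time steps while dynamic mass can enter or leave, and a binary Bayes step folds in measurements.

```python
# Numeric sketch of joint static/dynamic occupancy in one grid cell.
# All probabilities below are illustrative, not values from the paper.

def predict_step(p_static, p_dynamic, p_enter=0.1, p_exit=0.3):
    """One time step: static occupancy persists, dynamic occupancy may leave
    the cell, and free space may become dynamically occupied."""
    p_free = 1.0 - p_static - p_dynamic
    new_dynamic = p_dynamic * (1.0 - p_exit) + p_free * p_enter
    return p_static, new_dynamic


def measurement_update(p_static, p_dynamic, hit, p_hit_occ=0.9, p_hit_free=0.2):
    """Binary Bayes update with a simple occupied-vs-free beam model."""
    p_free = 1.0 - p_static - p_dynamic
    if hit:
        l_s, l_d, l_f = p_hit_occ, p_hit_occ, p_hit_free
    else:
        l_s, l_d, l_f = 1.0 - p_hit_occ, 1.0 - p_hit_occ, 1.0 - p_hit_free
    z = l_s * p_static + l_d * p_dynamic + l_f * p_free  # normalizer
    return l_s * p_static / z, l_d * p_dynamic / z
```

In this toy, persistently observed occupancy accumulates in the static component, because the prediction step keeps damping the dynamic component while measurements reinforce both.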

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computer Sciences; Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-359349 (URN)10.1109/ojits.2024.3521449 (DOI)2-s2.0-85210909052 (Scopus ID)
Note

QC 20250130

Available from: 2025-01-30 Created: 2025-01-30 Last updated: 2025-05-27 Bibliographically approved
Zangeneh, F., Bruns, L., Dekel, A., Pieropan, A. & Jensfelt, P. (2024). Conditional Variational Autoencoders for Probabilistic Pose Regression. In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024. Paper presented at 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Abu Dhabi, United Arab Emirates, Oct 14 2024 - Oct 18 2024 (pp. 2794-2800). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English) In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 2794-2800. Conference paper, Published paper (Refereed)
Abstract [en]

Robots rely on visual relocalization to estimate their pose from camera images when they lose track. One of the challenges in visual relocalization is repetitive structures in the robot's operation environment. This calls for probabilistic methods that support multiple hypotheses for the robot's pose. We propose such a probabilistic method to predict the posterior distribution of camera poses given an observed image. Our proposed training strategy results in a generative model of camera poses given an image, which can be used to draw samples from the pose posterior distribution. Our method is streamlined, theoretically well-founded, and outperforms existing methods on localization in the presence of ambiguities.
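The sampling interface such a generative model exposes can be sketched as follows (a toy stand-in, not the paper's network: the invented `toy_decoder` fabricates a bimodal posterior like one caused by a repeated structure):

```python
# Toy sketch of drawing pose hypotheses from a conditional generative model.
import random


def toy_decoder(image_embedding, z):
    # Fabricated bimodal posterior, as for a scene with a repeated structure:
    # the latent selects one of two plausible poses and perturbs it slightly.
    base = 0.0 if z[0] < 0.0 else 10.0
    return (base + 0.01 * z[1],)


def sample_pose_posterior(decoder, image_embedding, n=1000, seed=0):
    """Draw z ~ N(0, I) and decode each latent into a pose hypothesis."""
    rng = random.Random(seed)
    return [decoder(image_embedding, (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)))
            for _ in range(n)]
```

The sample set approximates the pose posterior: in an ambiguous scene it covers all plausible modes instead of collapsing to a single best guess.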

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-359873 (URN)10.1109/IROS58592.2024.10802091 (DOI)001411890000287 (ISI)2-s2.0-85216445787 (Scopus ID)
Conference
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Abu Dhabi, United Arab Emirates, Oct 14 2024 - Oct 18 2024
Note

Part of ISBN 9798350377705

QC 20250213

Available from: 2025-02-12 Created: 2025-02-12 Last updated: 2025-11-16 Bibliographically approved
Zangeneh, F., Bruns, L., Dekel, A., Pieropan, A. & Jensfelt, P. (2023). A Probabilistic Framework for Visual Localization in Ambiguous Scenes. In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation. Paper presented at 2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023 (pp. 3969-3975). Institute of Electrical and Electronics Engineers (IEEE)
2023 (English) In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 3969-3975. Conference paper, Published paper (Refereed)
Abstract [en]

Visual localization allows autonomous robots to relocalize when losing track of their pose by matching their current observation with past ones. However, ambiguous scenes pose a challenge for such systems, as repetitive structures can be viewed from many distinct, equally likely camera poses, which means it is not sufficient to produce a single best pose hypothesis. In this work, we propose a probabilistic framework that, for a given image, predicts the arbitrarily shaped posterior distribution of its camera pose. We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution. Our method outperforms existing methods on localization in ambiguous scenes. We open-source our approach and share our recorded data sequence at github.com/efreidun/vapor.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer graphics and computer vision; Robotics and automation; Signal Processing
Identifiers
urn:nbn:se:kth:diva-336775 (URN)10.1109/ICRA48891.2023.10160466 (DOI)001036713003052 (ISI)2-s2.0-85168671933 (Scopus ID)
Conference
2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023
Note

Part of ISBN 9798350323658

QC 20230920

Available from: 2023-09-20 Created: 2023-09-20 Last updated: 2025-11-16 Bibliographically approved
Bruns, L. & Jensfelt, P. (2023). On the Evaluation of RGB-D-Based Categorical Pose and Shape Estimation. In: Petrovic, I., Menegatti, E. & Markovic, I. (Eds.), Intelligent Autonomous Systems 17, IAS-17. Paper presented at 17th International Conference on Intelligent Autonomous Systems (IAS), JUN 13-16, 2022, Zagreb, CROATIA (pp. 360-377). Springer Nature, 577
2023 (English) In: Intelligent Autonomous Systems 17, IAS-17 / [ed] Petrovic, I., Menegatti, E., Markovic, I., Springer Nature, 2023, Vol. 577, p. 360-377. Conference paper, Published paper (Refereed)
Abstract [en]

Recently, various methods for 6D pose and shape estimation of objects have been proposed. Typically, these methods evaluate their pose estimation in terms of average precision and reconstruction quality in terms of chamfer distance. In this work, we take a critical look at this predominant evaluation protocol, including metrics and datasets. We propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison. We find that existing methods do not generalize well to unconstrained orientations and are actually heavily biased towards objects being upright. We provide an easy-to-use evaluation toolbox with well-defined metrics, methods, and dataset interfaces, which allows evaluation and comparison with various state-of-the-art approaches (https://github.com/roym899/pose_and_shape_evaluation).

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Networks and Systems, ISSN 2367-3370
Keywords
Pose estimation, Shape reconstruction, RGB-D-based perception
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-328417 (URN)10.1007/978-3-031-22216-0_25 (DOI)000992458200025 (ISI)2-s2.0-85148744517 (Scopus ID)
Conference
17th International Conference on Intelligent Autonomous Systems (IAS), JUN 13-16, 2022, Zagreb, CROATIA
Note

QC 20230613

Available from: 2023-06-13 Created: 2023-06-13 Last updated: 2025-02-07 Bibliographically approved
Bruns, L. & Jensfelt, P. (2023). RGB-D-based categorical object pose and shape estimation: Methods, datasets, and evaluation. Robotics and Autonomous Systems, 168, Article ID 104507.
2023 (English) In: Robotics and Autonomous Systems, ISSN 0921-8890, E-ISSN 1872-793X, Vol. 168, article id 104507. Article in journal (Refereed), Published
Abstract [en]

Recently, various methods for 6D pose and shape estimation of objects at a per-category level have been proposed. This work provides an overview of the field in terms of methods, datasets, and evaluation protocols. First, an overview of existing works and their commonalities and differences is provided. Second, we take a critical look at the predominant evaluation protocol, including metrics and datasets. Based on the findings, we propose a new set of metrics, contribute new annotations for the Redwood dataset, and evaluate state-of-the-art methods in a fair comparison. The results indicate that existing methods do not generalize well to unconstrained orientations and are actually heavily biased towards objects being upright. We provide an easy-to-use evaluation toolbox with well-defined metrics, methods, and dataset interfaces, which allows evaluation and comparison with various state-of-the-art approaches (https://github.com/roym899/pose_and_shape_evaluation).
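The chamfer distance named above as the predominant reconstruction metric can be sketched in a few lines (a minimal reference formulation; the toolbox's own implementation may differ in normalization or squaring):

```python
# Minimal chamfer-distance sketch: symmetric mean nearest-neighbor distance
# between two point sets (here 2-D tuples; any dimension works with math.dist).
from math import dist


def chamfer_distance(a, b):
    """Sum of the mean nearest-neighbor distances in both directions."""
    def one_way(src, dst):
        return sum(min(dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

Because the metric averages nearest-neighbor distances, it rewards surface coverage but is insensitive to the orientation biases the evaluation highlights, which is one motivation for proposing additional metrics.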

Place, publisher, year, edition, pages
Elsevier BV, 2023
Keywords
Pose estimation, RGB-D-based perception, Shape estimation, Shape reconstruction
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-336565 (URN)10.1016/j.robot.2023.104507 (DOI)001090698300001 (ISI)2-s2.0-85169011550 (Scopus ID)
Note

QC 20230918

Available from: 2023-09-18 Created: 2023-09-18 Last updated: 2025-11-05 Bibliographically approved
Bruns, L. & Jensfelt, P. (2022). SDFEst: Categorical Pose and Shape Estimation of Objects From RGB-D Using Signed Distance Fields. IEEE Robotics and Automation Letters, 7(4), 9597-9604
2022 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 7, no 4, p. 9597-9604. Article in journal (Refereed), Published
Abstract [en]

Rich geometric understanding of the world is an important component of many robotic applications such as planning and manipulation. In this paper, we present a modular pipeline for pose and shape estimation of objects from RGB-D images given their category. The core of our method is a generative shape model, which we integrate with a novel initialization network and a differentiable renderer to enable 6D pose and shape estimation from a single or multiple views. We investigate the use of discretized signed distance fields as an efficient shape representation for fast analysis-by-synthesis optimization. Our modular framework enables multi-view optimization and extensibility. We demonstrate the benefits of our approach over state-of-the-art methods in several experiments on both synthetic and real data. We open-source our approach at https://github.com/roym899/sdfest.
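The analysis-by-synthesis optimization the abstract describes can be sketched in one dimension (a hedged toy: the real system renders depth images of an SDF shape through a differentiable renderer, whereas an invented renderer and a finite-difference gradient stand in here):

```python
# One-dimensional sketch of analysis-by-synthesis pose refinement: render a
# prediction from the current pose estimate, compare it to the observation,
# and descend on the squared error.

def optimize_pose(observed, render, pose, lr=0.1, steps=200, eps=1e-5):
    """Iteratively refine a scalar pose so render(pose) matches observed."""
    def loss(p):
        return sum((r - o) ** 2 for r, o in zip(render(p), observed)) / len(observed)

    for _ in range(steps):
        # Central finite difference stands in for the renderer's gradient.
        grad = (loss(pose + eps) - loss(pose - eps)) / (2.0 * eps)
        pose -= lr * grad
    return pose
```

A toy renderer such as `lambda p: [p + d for d in template]` (each template depth offset by the pose) suffices to see the loop converge to the true offset.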

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022
Keywords
RGB-D perception, deep learning for visual perception
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-316243 (URN)10.1109/LRA.2022.3189792 (DOI)000831182500039 (ISI)2-s2.0-85134261140 (Scopus ID)
Note

QC 20220817

Available from: 2022-08-17 Created: 2022-08-17 Last updated: 2025-11-05 Bibliographically approved
Heiden, E., Palmieri, L., Bruns, L., Arras, K. O., Sukhatme, G. S. & Koenig, S. (2021). Bench-MR: A Motion Planning Benchmark for Wheeled Mobile Robots. IEEE Robotics and Automation Letters, 6(3), 4536-4543
2021 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 6, no 3, p. 4536-4543. Article in journal (Refereed), Published
Abstract [en]

Planning smooth and energy-efficient paths for wheeled mobile robots is a central task for applications ranging from autonomous driving to service and intralogistic robotics. Over the past decades, several sampling-based motion-planning algorithms, extend functions, and post-smoothing algorithms have been introduced for such motion-planning systems. Choosing the best combination of components for an application is a tedious exercise, even for expert users. We therefore present Bench-MR, the first open-source motion-planning benchmarking framework designed for sampling-based motion planning for nonholonomic, wheeled mobile robots. Unlike related software suites, Bench-MR is an easy-to-use and comprehensive benchmarking framework that provides a large variety of sampling-based motion-planning algorithms, extend functions, collision checkers, post-smoothing algorithms, and optimization criteria. It aids practitioners and researchers in designing, testing, and evaluating motion-planning systems and comparing them against the state of the art on complex navigation scenarios through many performance metrics. Through several experiments, we demonstrate how Bench-MR can be used to gain extensive insights from the benchmarking results it generates.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
Keywords
Planning, Benchmark testing, Mobile robots, Navigation, Robot kinematics, Open source software, Collision avoidance, Nonholonomic motion planning, wheeled robots, software tools for benchmarking and reproducibility
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-295374 (URN)10.1109/LRA.2021.3068913 (DOI)000640765600015 (ISI)2-s2.0-85103236568 (Scopus ID)
Note

QC 20210524

Available from: 2021-05-24 Created: 2021-05-24 Last updated: 2025-02-09 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0001-8747-6359
