Camera Relocalization through Distribution Modeling
KTH, School of Electrical Engineering and Computer Science (EECS), Intelligent systems, Robotics, Perception and Learning, RPL. Univrses AB, Stockholm, Sweden. ORCID iD: 0000-0001-7819-3541
2025 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Relocalization is a key component of robot navigation: in order to move successfully within an environment, a robot must know its location in relation to that environment. Cameras are inexpensive sensors that enable relocalization by comparing visual observations with a model of the scene. To this end, camera relocalization, which also finds applications in augmented reality, has long been a topic of research, leading to elaborately designed pipelines for accurate camera pose estimation. Recently, a paradigm shift has seen explicit models of the scene replaced by implicit ones, where the scene is encoded in the weights of neural networks. This shift simplifies relocalization pipelines but leaves open a fundamental challenge: scenes with repetitive structures often produce ambiguous observations, meaning that the same visual input can correspond to multiple distinct camera poses. This thesis addresses this challenge, with a particular focus on implicit relocalization methods. It critically examines the assumptions underlying existing paradigms such as Absolute Pose Regression (APR) and Scene Coordinate Regression (SCR) about the uniqueness of appearances. As its central contribution, the thesis proposes to model the full distribution of possible solutions, which can be arbitrarily shaped, rather than attempting to recover a single best estimate. To this end, it proposes to leverage Conditional Variational Autoencoders (C-VAEs) as generative models capable of representing both distributions over poses and distributions over points. Furthermore, likelihood estimation within this framework provides a principled means of attaching confidence measures to predictions. These contributions, together with the suggested applications and directions for future work, lay a foundation for simplifying relocalization pipelines by more effectively handling ambiguities in observations.
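As a rough illustration of the central idea, the sketch below decodes pose hypotheses (a translation and a unit quaternion) from latent codes conditioned on an image embedding, the way a C-VAE decoder would sample from a learned pose distribution. All dimensions and the randomly initialised weights are invented stand-ins for a trained model, not the thesis implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: a 32-d image embedding conditions an 8-d latent.
EMB, LAT, HID = 32, 8, 64

# Randomly initialised weights stand in for a trained C-VAE decoder.
W1 = rng.normal(0, 0.1, (EMB + LAT, HID))
W2 = rng.normal(0, 0.1, (HID, 7))  # 3-d translation + 4-d quaternion

def sample_poses(embedding, n_samples):
    """Draw pose hypotheses by decoding latents z ~ N(0, I) conditioned on the image."""
    z = rng.standard_normal((n_samples, LAT))
    cond = np.concatenate([np.tile(embedding, (n_samples, 1)), z], axis=1)
    h = np.tanh(cond @ W1)          # one hidden layer is enough for the sketch
    out = h @ W2
    t, q = out[:, :3], out[:, 3:]
    q = q / np.linalg.norm(q, axis=1, keepdims=True)  # project to unit quaternions
    return t, q

t, q = sample_poses(rng.standard_normal(EMB), 100)
```

With a trained decoder, repeated draws of `z` cover all the poses consistent with the observation, which is what makes the arbitrarily shaped distributions mentioned above representable.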

Abstract [sv]

Relocalization is a key component of robot navigation: to move successfully within an environment, a robot must know its position relative to that environment. Cameras are cost-effective sensors that enable relocalization by comparing visual observations with a model of the scene. Camera relocalization, which also finds applications in augmented reality, has therefore long been a research topic, leading to carefully designed pipelines for accurate camera pose estimation. Recently, a paradigm shift has seen explicit models of the scene replaced by implicit ones, where the scene is encoded in the weights of neural networks. This shift simplifies relocalization pipelines but leaves a fundamental challenge open: scenes with repetitive structures often produce ambiguous observations, meaning that the same visual input can correspond to several distinct camera poses. This thesis addresses this challenge, with a particular focus on implicit relocalization methods. It critically examines the assumptions behind existing paradigms such as Absolute Pose Regression (APR) and Scene Coordinate Regression (SCR), which typically presuppose a unique solution. As its central contribution, the thesis proposes modeling the full distribution of possible solutions, which can be arbitrarily shaped, rather than attempting to find a single best estimate. To this end, it proposes leveraging Conditional Variational Autoencoders (C-VAEs) as generative models capable of representing both distributions over poses and distributions over points. Moreover, likelihood estimation within this framework provides a principled way of attaching confidence measures to predictions. These contributions, together with the suggested applications and directions for future work, lay a foundation for simplifying relocalization pipelines by handling ambiguity in observations more effectively.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025, p. xii, 41
Series
TRITA-EECS-AVL ; 2025:106
National Category
Computer Vision and Learning Systems
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-372920
ISBN: 978-91-8106-468-1 (print)
OAI: oai:DiVA.org:kth-372920
DiVA, id: diva2:2014039
Public defence
2025-12-11, https://kth-se.zoom.us/j/68470117111, D3, Lindstedtsvägen 5, KTH Campus, Stockholm, 14:00 (English)
Note

QC 20251117

Available from: 2025-11-17. Created: 2025-11-16. Last updated: 2025-11-17. Bibliographically approved.
List of papers
1. A Probabilistic Framework for Visual Localization in Ambiguous Scenes
2023 (English) In: Proceedings - ICRA 2023: IEEE International Conference on Robotics and Automation, Institute of Electrical and Electronics Engineers (IEEE), 2023, p. 3969-3975. Conference paper, Published paper (Refereed)
Abstract [en]

Visual localization allows autonomous robots to relocalize when losing track of their pose by matching their current observation with past ones. However, ambiguous scenes pose a challenge for such systems, as repetitive structures can be viewed from many distinct, equally likely camera poses, which means it is not sufficient to produce a single best pose hypothesis. In this work, we propose a probabilistic framework that for a given image predicts the arbitrarily shaped posterior distribution of its camera pose. We do this via a novel formulation of camera pose regression using variational inference, which allows sampling from the predicted distribution. Our method outperforms existing methods on localization in ambiguous scenes. We open-source our approach and share our recorded data sequence at github.com/efreidun/vapor.
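Why a single best pose hypothesis is not sufficient in an ambiguous scene can be seen in a small numeric experiment (all numbers are made up for illustration): with two equally likely poses, the mean of a bimodal predictive distribution lands between the modes, while at least one sample drawn from the distribution lies close to the true pose:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy ambiguous scene: the same observation could come from either of two poses.
modes = np.array([[-2.0, 0.0, 0.0], [2.0, 0.0, 0.0]])

# A bimodal predictive distribution: pick a mode, then add small noise.
picks = rng.integers(0, 2, size=200)
samples = modes[picks] + rng.normal(0, 0.05, (200, 3))

# A single-best-estimate regressor collapses to the mean, far from both modes.
point_estimate = samples.mean(axis=0)

gt = modes[0]  # suppose the true camera sits at the first mode
err_point = np.linalg.norm(point_estimate - gt)
err_best_sample = np.min(np.linalg.norm(samples - gt, axis=1))
```

Here `err_best_sample` is a fraction of the mode spacing while `err_point` is on the order of the full spacing, which is the failure mode a sampled posterior avoids.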

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
National Category
Computer graphics and computer vision; Robotics and automation; Signal Processing
Identifiers
urn:nbn:se:kth:diva-336775 (URN)
10.1109/ICRA48891.2023.10160466 (DOI)
001036713003052 ()
2-s2.0-85168671933 (Scopus ID)
Conference
2023 IEEE International Conference on Robotics and Automation, ICRA 2023, London, United Kingdom of Great Britain and Northern Ireland, May 29 2023 - Jun 2 2023
Note

Part of ISBN 9798350323658

QC 20230920

Available from: 2023-09-20. Created: 2023-09-20. Last updated: 2025-11-16. Bibliographically approved.
2. Conditional Variational Autoencoders for Probabilistic Pose Regression
2024 (English) In: 2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 2794-2800. Conference paper, Published paper (Refereed)
Abstract [en]

Robots rely on visual relocalization to estimate their pose from camera images when they lose track. One of the challenges in visual relocalization is repetitive structures in the robot's operating environment. This calls for probabilistic methods that support multiple hypotheses for the robot's pose. We propose such a probabilistic method to predict the posterior distribution of camera poses given an observed image. Our proposed training strategy results in a generative model of camera poses given an image, which can be used to draw samples from the pose posterior distribution. Our method is streamlined and well-founded in theory, and outperforms existing methods on localization in the presence of ambiguities.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-359873 (URN)
10.1109/IROS58592.2024.10802091 (DOI)
001411890000287 ()
2-s2.0-85216445787 (Scopus ID)
Conference
2024 IEEE/RSJ International Conference on Intelligent Robots and Systems, IROS 2024, Abu Dhabi, United Arab Emirates, Oct 14 2024 - Oct 18 2024
Note

Part of ISBN 9798350377705

QC 20250213

Available from: 2025-02-12. Created: 2025-02-12. Last updated: 2025-11-16. Bibliographically approved.
3. Quantifying Epistemic Uncertainty in Absolute Pose Regression
2025 (English) In: Image Analysis - 23rd Scandinavian Conference, SCIA 2025, Proceedings, Springer Nature, 2025, p. 180-195. Conference paper, Published paper (Refereed)
Abstract [en]

Visual relocalization is the task of estimating the pose of a camera from the image it views. Absolute pose regression offers a solution to this task by training a neural network to regress the camera pose directly from image features. While attractive in terms of memory and compute efficiency, absolute pose regression yields predictions that are inaccurate and unreliable outside the training domain. In this work, we propose a novel method for quantifying the epistemic uncertainty of an absolute pose regression model by estimating the likelihood of observations within a variational framework. Beyond providing a measure of confidence in predictions, our approach offers a unified model that also handles observation ambiguities, probabilistically localizing the camera in the presence of repetitive structures. Our method outperforms existing approaches in capturing the relation between uncertainty and prediction error.
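One way to read "estimating the likelihood of observations" as a confidence signal: estimate the marginal likelihood of an observation under a latent-variable generative model by Monte Carlo over the latent prior, and treat a low value as high epistemic uncertainty. The toy model below (a linear decoder with Gaussian observation noise, all parameters invented) sketches that idea only; it is not the paper's estimator:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_marginal_likelihood(x, decode, n=1000, sigma=0.5, lat=4):
    """Monte-Carlo estimate of log p(x) = log E_{z~N(0,I)}[p(x | z)]
    for a latent-variable model with isotropic Gaussian observation noise."""
    z = rng.standard_normal((n, lat))
    mu = decode(z)                           # (n, d) predicted observations
    d = x.shape[0]
    log_px_z = (-0.5 * np.sum((x - mu) ** 2, axis=1) / sigma**2
                - 0.5 * d * np.log(2 * np.pi * sigma**2))
    m = log_px_z.max()
    return m + np.log(np.mean(np.exp(log_px_z - m)))   # log-sum-exp for stability

# Toy decoder: embeds a 4-d latent linearly into 8-d observation space.
A = rng.normal(0, 1.0, (4, 8))
decode = lambda z: z @ A

x_in = decode(rng.standard_normal((1, 4)))[0]   # lies on the model's manifold
x_out = x_in + 10.0                             # far outside the training domain
ll_in = log_marginal_likelihood(x_in, decode)
ll_out = log_marginal_likelihood(x_out, decode)
```

The out-of-distribution observation receives a much lower estimated likelihood, which is the behaviour one would use to flag unreliable pose predictions.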

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Camera Relocalization, Uncertainty Estimation, VAEs
National Category
Computer graphics and computer vision; Signal Processing
Identifiers
urn:nbn:se:kth:diva-368911 (URN)
10.1007/978-3-031-95918-9_13 (DOI)
001553877800013 ()
2-s2.0-105009846579 (Scopus ID)
Conference
23rd Scandinavian Conference on Image Analysis, SCIA 2025, Reykjavik, Iceland, June 23-25, 2025
Note

Part of ISBN 9783031959172

QC 20250822

Available from: 2025-08-22. Created: 2025-08-22. Last updated: 2025-12-08. Bibliographically approved.
4. Visible Structure Retrieval for Lightweight Image-Based Relocalisation
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Accurate camera pose estimation from an image observation in a previously mapped environment is commonly done through structure-based methods: by finding correspondences between 2D keypoints in the image and 3D structure points in the map. To make this correspondence search tractable in large scenes, existing pipelines either rely on search heuristics or perform image retrieval to reduce the search space by comparing the current image to a database of past observations. However, these approaches result in elaborate pipelines or in storage requirements that grow with the number of past observations. In this work, we propose a new paradigm for making structure-based relocalisation tractable. Instead of relying on image retrieval or search heuristics, we learn a direct mapping from image observations to the visible scene structure in a compact neural network. Given a query image, a forward pass through our novel visible structure retrieval network yields the subset of 3D structure points in the map that the image views, thus reducing the search space of 2D-3D correspondences. We show that our proposed method achieves localisation accuracy comparable to the state of the art while requiring a smaller computational and storage footprint.
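The shape of the retrieval step described above can be sketched as follows: a network scores every map point for visibility from the query view, and descriptor matching then runs only against the top-scoring subset. The "network" here is a random projection and every size is invented, so this shows only the structure of the pipeline, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

n_map_points = 10_000          # 3D points in the scene map (hypothetical)
desc_dim = 32                  # descriptor / embedding dimension (hypothetical)

map_desc = rng.standard_normal((n_map_points, desc_dim))  # one descriptor per map point

def retrieve_visible(image_embedding, predict_scores, keep=500):
    """Keep only the map points a visibility network scores highest,
    shrinking the 2D-3D correspondence search space."""
    scores = predict_scores(image_embedding)   # one visibility score per map point
    return np.argsort(scores)[-keep:]

# Stand-in for the trained retrieval network: a random linear scoring head.
W = rng.standard_normal((desc_dim, n_map_points)) * 0.1
predict_scores = lambda emb: emb @ W

visible = retrieve_visible(rng.standard_normal(desc_dim), predict_scores)

# Descriptor matching now runs against 500 candidates instead of 10,000.
query_kp = rng.standard_normal((100, desc_dim))            # 100 image keypoints
best = np.argmax(query_kp @ map_desc[visible].T, axis=1)   # nearest candidate per keypoint
```

The storage win comes from the scoring network replacing a database of past images: its size is fixed by the map, not by the number of observations ever recorded.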

National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-372919 (URN)
Note

Accepted at the 36th British Machine Vision Conference (BMVC) 2025

QC 20251117

Available from: 2025-11-16. Created: 2025-11-16. Last updated: 2025-11-17. Bibliographically approved.

Open Access in DiVA

fulltext (4295 kB), 64 downloads
File information
File name: FULLTEXT01.pdf
File size: 4295 kB
Checksum (SHA-512): 2704dbb59aa0b26fd7d4d7081209a6a3673f1e1310847f4a8b4ebe668a3a3b85f6a432318bc93d17ce96124ebcf50eb5a379cf3dca69d54fc11c5c187f7b0385
Type: fulltext
Mimetype: application/pdf

Authority records

Zangeneh, Fereidoon
