Tang, Jiexiong
Publications (8 of 8)
Tang, J. (2020). Deep Learning Assisted Visual Odometry. (Doctoral dissertation). KTH Royal Institute of Technology
Deep Learning Assisted Visual Odometry
2020 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The capability to autonomously explore and interact with the environment has always been greatly demanded of robots. Various sensor-based SLAM methods have been investigated and served this purpose over the past decades. Vision intuitively provides a 3D understanding of the surroundings and contains a vast amount of information that requires high-level intelligence to interpret. Sensors like LIDAR return the range measurement directly; motion estimation and scene reconstruction using a camera is a harder problem. In this thesis, we are particularly interested in the tracking front-end of vision-based SLAM, i.e. Visual Odometry (VO), with a focus on deep learning approaches. Recently, learning-based methods have dominated most vision applications and are gradually appearing in our daily life and in real-world applications. Unlike classical methods, deep learning-based methods can potentially tackle some of the intrinsic problems in multi-view geometry and directly improve the performance of crucial procedures of VO, for example correspondence estimation, dense reconstruction and semantic representation.

In this work, we propose novel learning schemes for assisting both direct and indirect visual odometry methods. For the direct approaches, we investigate mainly the monocular setup. The lack of a baseline that provides scale, as in stereo, is one of the well-known intrinsic problems in this case. We propose a coupled single-view depth and normal estimation method to reduce scale drift and address the lack of observations of the absolute scale. This is achieved by providing priors for the depth optimization. Moreover, we utilize higher-order geometric information to guide the dense reconstruction in a sparse-to-dense manner. For the indirect methods, we propose novel feature learning based methods which noticeably improve feature matching performance in comparison with common classical feature detectors and descriptors. Finally, we discuss potential ways to make the training self-supervised. This is accomplished by incorporating differential motion estimation into the training while performing multi-view adaptation to maximize repeatability and matching performance. We also investigate using a different type of supervisory signal for the training: we add a higher-level proxy task and show that it is possible to train a feature extraction network even without an explicit loss for it.

In summary, this thesis presents successful examples of incorporating deep learning techniques to assist a classical visual odometry system. The results are promising and have been extensively evaluated on challenging benchmarks, real robots and handheld cameras. The problem we investigate is still at an early stage, but it is attracting more and more interest from researchers in related fields.
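To make the overall pipeline concrete, below is a minimal sketch, not code from the thesis, of a learning-assisted VO front-end in which a learned detector/descriptor network replaces hand-crafted features while pose recovery stays classical. The extract_keypoints function is a hypothetical placeholder for such a network; OpenCV handles matching and two-view geometry, and the recovered translation is only defined up to scale, which is exactly the monocular scale ambiguity discussed above.

```python
# Minimal sketch of a learning-assisted VO front-end (illustrative only).
# `extract_keypoints(image)` is a hypothetical stand-in for a learned
# detector/descriptor network; matching and pose recovery stay classical.
import cv2
import numpy as np

def relative_pose(img0, img1, K, extract_keypoints):
    kpts0, desc0 = extract_keypoints(img0)   # (N, 2) pixel coords, (N, D) descriptors
    kpts1, desc1 = extract_keypoints(img1)

    # Brute-force nearest-neighbour matching on the learned descriptors.
    matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
    matches = matcher.match(desc0.astype(np.float32), desc1.astype(np.float32))
    p0 = np.float32([kpts0[m.queryIdx] for m in matches])
    p1 = np.float32([kpts1[m.trainIdx] for m in matches])

    # Classical two-view geometry: essential matrix with RANSAC, then pose recovery.
    E, inliers = cv2.findEssentialMat(p0, p1, K, method=cv2.RANSAC,
                                      prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p0, p1, K, mask=inliers)
    return R, t  # rotation and unit-norm translation: scale is unobservable monocularly
```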

Abstract [sv]

The ability to autonomously explore and interact with an environment has always been desirable in robots. Various sensor-based SLAM methods have been developed and used for this purpose over the past decades. Computer vision can intuitively be used for 3D understanding but relies on a vast amount of information that requires a high level of intelligence to interpret. Sensors such as LIDAR return the range of each measurement point directly, which makes motion estimation and scene reconstruction more straightforward than with a camera. In this thesis, we are particularly interested in camera-based SLAM and, more specifically, the first part of such a system, i.e. what is normally called visual odometry (VO). We focus on strategies based on deep learning. Recently, learning-based methods have come to dominate most camera applications and are gradually appearing in our daily lives. Unlike classical methods, deep learning-based methods can potentially tackle some of the inherent problems in camera-based systems and improve the performance of key parts of VO, for example correspondence estimation, dense reconstruction and semantic representation. In this work, we propose new learning schemes to support both direct and indirect visual odometry methods. For the direct methods, we mainly investigate the single-camera case. The lack of a baseline, as in stereo, that provides the scale of a scene has been one of the well-known problems in this case. We propose a method that couples the estimation of depth and normals, based on a single image. To address the problems of estimating the absolute scale and drift in these estimates, the predicted depth is used as an initial guess for the depth optimization. In addition, we use geometric information to guide the dense reconstruction in a sparse-to-dense manner. For the indirect methods, we propose new keypoint-based methods that noticeably improve matching performance compared with classical methods. Finally, we discuss potential ways to make the learning self-supervised. This is accomplished by integrating the estimation of the incremental motion into the training. We also investigate how a so-called proxy task can be used to generate an implicit supervisory signal and show that we can train a keypoint-generating network in this way.

In summary, this thesis presents several working examples of how deep learning techniques can assist a classical visual odometry system. The results are promising and have been evaluated in extensive and challenging scenarios, on datasets, real robots as well as handheld cameras. The problem we investigate is still at an early research stage, but it is now also attracting researchers from related fields.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2020
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-273749 (URN) 978-91-7873-550-1 (ISBN)
Public defence
2020-06-12, U1, https://kth-se.zoom.us/w/69450802964?tk=N7iFB13No_I0ip6YuiqlOvmrTXcCPmzXkcWcUSgbnow.DQIAAAAQK5cnFBZva1hiYUpCS1M5aWdjLXFkUWNCOWRBAAAAAAAAAAAAAAAAAAAAAAAAAAAA&uuid=WN_9RUa0Q0iRZC4gGmHG-ravw, Stockholm, 10:00 (English)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP)
Note

QC 20200527

Available from: 2020-05-27 Created: 2020-05-26 Last updated: 2025-05-23. Bibliographically approved.
Tang, J., Kim, H., Guizilini, V., Pillai, S. & Rares, A. (2020). Neural Outlier Rejection For Self-Supervised Keypoint Learning. In: 8th International Conference on Learning Representations, ICLR 2020. Paper presented at 8th International Conference on Learning Representations, ICLR 2020. International Conference on Learning Representations, ICLR
Neural Outlier Rejection For Self-Supervised Keypoint Learning
2020 (English) In: 8th International Conference on Learning Representations, ICLR 2020, International Conference on Learning Representations, ICLR, 2020. Conference paper, Published paper (Refereed)
Abstract [en]

Identifying salient points in images is a crucial component for visual odometry, Structure-from-Motion or SLAM algorithms. Recently, several learned keypoint methods have demonstrated compelling performance on challenging benchmarks. However, generating consistent and accurate training data for interest-point detection in natural images still remains challenging, especially for human annotators. We introduce IO-Net (i.e. InlierOutlierNet), a novel proxy task for the self-supervision of keypoint detection, description and matching. By making the sampling of inlier-outlier sets from point-pair correspondences fully differentiable within the keypoint learning framework, we show that we are able to simultaneously self-supervise keypoint description and improve keypoint matching. Second, we introduce KeyPointNet, a keypoint-network architecture that is especially amenable to robust keypoint detection and description. We design the network to allow local keypoint aggregation to avoid artifacts due to the spatial discretizations commonly used for this task, and we improve fine-grained keypoint descriptor performance by taking advantage of efficient sub-pixel convolutions to upsample the descriptor feature maps to a higher operating resolution. Through extensive experiments and ablative analysis, we show that the proposed self-supervised keypoint learning method greatly improves the quality of feature matching and homography estimation on challenging benchmarks over the state of the art.
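As a rough illustration of the descriptor upsampling mentioned above, the sketch below uses PyTorch's PixelShuffle (an efficient sub-pixel convolution) to raise the resolution of a coarse descriptor feature map. The channel sizes and single-layer head are illustrative assumptions, not the paper's KeyPointNet architecture.

```python
# Illustrative sub-pixel-convolution descriptor head (assumed channel sizes,
# not the paper's KeyPointNet): a 3x3 convolution predicts upscale**2 * desc_dim
# channels, and PixelShuffle rearranges them into a higher-resolution map.
import torch.nn as nn
import torch.nn.functional as F

class SubPixelDescriptorHead(nn.Module):
    def __init__(self, in_channels=256, desc_dim=256, upscale=2):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, desc_dim * upscale ** 2,
                              kernel_size=3, padding=1)
        self.shuffle = nn.PixelShuffle(upscale)

    def forward(self, features):
        x = self.shuffle(self.conv(features))   # (B, desc_dim, H*upscale, W*upscale)
        return F.normalize(x, p=2, dim=1)       # L2-normalized dense descriptors
```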

Place, publisher, year, edition, pages
International Conference on Learning Representations, ICLR, 2020
Keywords
Benchmarking, Learning systems, Statistics, Descriptors, Key point matching, Keypoint descriptions, Keypoint detection, Keypoints, Outliers rejections, Performance, Salient points, Structure from motion algorithm, Visual odometry, Network architecture
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-327295 (URN) 2-s2.0-85138175088 (Scopus ID)
Conference
8th International Conference on Learning Representations, ICLR 2020
Note

QC 20230524

Available from: 2023-05-24 Created: 2023-05-24 Last updated: 2025-02-07. Bibliographically approved.
Tang, J., Hanme, K., Vitor, G., Sudeep, P. & Rares, A. (2020). Neural Outlier Rejection for Self-Supervised Keypoint Learning. In: International Conference on Learning Representations (ICLR), Apr 26th through May 1st, 2020. Paper presented at International Conference on Learning Representations (ICLR) 2020.
Neural Outlier Rejection for Self-Supervised Keypoint Learning
2020 (English) In: International Conference on Learning Representations (ICLR), Apr 26th through May 1st, 2020, 2020. Conference paper, Poster (with or without abstract) (Refereed)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-273745 (URN)
Conference
International Conference on Learning Representations (ICLR) 2020
Note

QC 20200702

Available from: 2020-05-26 Created: 2020-05-26 Last updated: 2025-02-07. Bibliographically approved.
Tang, J., Rares, A., Guizilini, V., Pillai, S., Kim, H., Jensfelt, P. & Gaidon, A. (2020). Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation. In: Proceedings of the 2020 Conference on Robot Learning, CoRL 2020. Paper presented at 4th Conference on Robot Learning, CoRL 2020, Virtual/Online, United States of America, Nov 16 2020 - Nov 18 2020 (pp. 2085-2103). ML Research Press
Self-Supervised 3D Keypoint Learning for Ego-Motion Estimation
2020 (English) In: Proceedings of the 2020 Conference on Robot Learning, CoRL 2020, ML Research Press, 2020, p. 2085-2103. Conference paper, Published paper (Refereed)
Abstract [en]

Detecting and matching robust viewpoint-invariant keypoints is critical for visual SLAM and Structure-from-Motion. State-of-the-art learning-based methods generate training samples via homography adaptation to create 2D synthetic views with known keypoint matches from a single image. This approach, however, does not generalize to non-planar 3D scenes with illumination variations commonly seen in real-world videos. In this work, we propose self-supervised learning of depth-aware keypoints directly from unlabeled videos. We jointly learn keypoint and depth estimation networks by combining appearance and geometric matching via a differentiable structure-from-motion module based on Procrustean residual pose correction. We describe how our self-supervised keypoints can be integrated into state-of-the-art visual odometry frameworks for robust and accurate ego-motion estimation of autonomous vehicles in real-world conditions.
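A closed-form rigid alignment of matched, depth-lifted 3D keypoints is the building block behind a Procrustes-style pose step; the sketch below is a generic orthogonal-Procrustes (Kabsch) solver, not the paper's differentiable residual-correction module.

```python
# Generic orthogonal-Procrustes (Kabsch) alignment of matched 3D keypoints
# (illustrative; not the paper's differentiable residual pose module).
import torch

def procrustes_pose(X, Y):
    """X, Y: (N, 3) matched 3D points. Returns R (3, 3), t (3,) with Y ~ X @ R.T + t."""
    mu_x, mu_y = X.mean(dim=0), Y.mean(dim=0)
    Xc, Yc = X - mu_x, Y - mu_y
    U, S, Vh = torch.linalg.svd(Yc.T @ Xc)          # SVD of the cross-covariance
    d = torch.sign(torch.linalg.det(U @ Vh))        # reflection guard: det(R) = +1
    D = torch.diag(torch.stack([torch.ones_like(d), torch.ones_like(d), d]))
    R = U @ D @ Vh
    t = mu_y - R @ mu_x
    return R, t
```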

Place, publisher, year, edition, pages
ML Research Press, 2020
Keywords
Keypoints, Monocular, Self-supervised-learning, Visual odometry
National Category
Computer graphics and computer vision; Robotics and automation
Identifiers
urn:nbn:se:kth:diva-339686 (URN) 2-s2.0-85175858250 (Scopus ID)
Conference
4th Conference on Robot Learning, CoRL 2020, Virtual/Online, United States of America, Nov 16 2020 - Nov 18 2020
Note

QC 20231116

Available from: 2023-11-16 Created: 2023-11-16 Last updated: 2025-02-05. Bibliographically approved.
Tang, J., Ericson, L., Folkesson, J. & Jensfelt, P. (2019). GCNv2: Efficient Correspondence Prediction for Real-Time SLAM. IEEE Robotics and Automation Letters, 4(4), 3505-3512
GCNv2: Efficient Correspondence Prediction for Real-Time SLAM
2019 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 4, no 4, p. 3505-3512. Article in journal (Refereed), Published
Abstract [en]

In this letter, we present a deep learning-based network, GCNv2, for the generation of keypoints and descriptors. GCNv2 is built on our previous method, GCN, a network trained for 3D projective geometry. GCNv2 is designed with a binary descriptor vector like the ORB feature so that it can easily replace ORB in systems such as ORB-SLAM2. GCNv2 significantly improves the computational efficiency over GCN, which was only able to run on desktop hardware. We show how a modified version of ORB-SLAM2 using GCNv2 features runs on a Jetson TX2, an embedded low-power platform. Experimental results show that GCNv2 retains accuracy comparable to GCN and that it is robust enough to be used for control of a flying drone. Source code is available at: https://github.com/jiexiong2016/GCNv2_SLAM.
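Why a binary descriptor makes a learned feature a near drop-in for ORB: ORB-SLAM2 matches 256-bit descriptors by Hamming distance, so a learned descriptor packed into the same layout can reuse that machinery unchanged. The sketch below is purely illustrative; thresholding at zero is an assumption, not GCNv2's actual binarization or training procedure.

```python
# Illustrative binarization of a real-valued descriptor into an ORB-compatible
# 32-byte layout, followed by Hamming-distance matching as ORB-SLAM2 uses.
import cv2
import numpy as np

def binarize_descriptors(desc_float):
    """desc_float: (N, 256) real-valued descriptors -> (N, 32) packed uint8 rows."""
    bits = (desc_float > 0).astype(np.uint8)   # 256 bits per keypoint (threshold is illustrative)
    return np.packbits(bits, axis=1)           # same byte layout as ORB descriptors

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
# matches = matcher.match(binarize_descriptors(d0), binarize_descriptors(d1))
```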

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
National Category
Robotics and automation
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-257883 (URN) 10.1109/LRA.2019.2927954 (DOI) 000477983400013 () 2-s2.0-85069905338 (Scopus ID)
Funder
Wallenberg AI, Autonomous Systems and Software Program (WASP); Swedish Foundation for Strategic Research, Fact; Swedish Research Council
Note

QC 20190909

Available from: 2019-09-06 Created: 2019-09-06 Last updated: 2025-02-09. Bibliographically approved.
Tang, J., Rares, A., Vitor, G., Sudeep, P., Hanme, K. & Adrien, G. (2019). Self-Supervised 3D Keypoint Learning for Ego-motion Estimation.
Self-Supervised 3D Keypoint Learning for Ego-motion Estimation
2019 (English) Report (Other academic)
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-273748 (URN)
Note

QC 20200702

Available from: 2020-05-26 Created: 2020-05-26 Last updated: 2025-02-07. Bibliographically approved.
Tang, J., Folkesson, J. & Jensfelt, P. (2019). Sparse2Dense: From Direct Sparse Odometry to Dense 3-D Reconstruction. IEEE Robotics and Automation Letters, 4(2), 530-537
Sparse2Dense: From Direct Sparse Odometry to Dense 3-D Reconstruction
2019 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 4, no 2, p. 530-537. Article in journal (Refereed), Published
Abstract [en]

In this letter, we propose a new deep learning-based dense monocular simultaneous localization and mapping (SLAM) method. Compared to existing methods, the proposed framework constructs a dense three-dimensional (3-D) model via sparse-to-dense mapping using learned surface normals. With single-view learned depth estimation as a prior for monocular visual odometry, we obtain both accurate positioning and high-quality depth reconstruction. The depth and normals are predicted by a single network trained in a tightly coupled manner. Experimental results show that our method significantly improves the performance of visual tracking and depth prediction in comparison to the state of the art in deep monocular dense SLAM.
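As an illustration of why surface normals help sparse-to-dense reconstruction, the following sketch propagates a single sparse depth to a neighbouring pixel under a local-plane assumption. It is a geometric toy example, not the paper's learned pipeline, and the function name is a placeholder.

```python
# Toy illustration of normal-guided depth propagation: one sparse depth plus a
# surface normal defines a local plane, which yields depth at nearby pixels.
import numpy as np

def propagate_depth(K, p0, d0, normal, p):
    """K: (3, 3) intrinsics; p0, p: (u, v) pixels; d0: depth at p0;
    normal: (3,) unit surface normal at p0 in the camera frame."""
    K_inv = np.linalg.inv(K)
    ray0 = K_inv @ np.array([p0[0], p0[1], 1.0])   # back-projected ray through p0
    ray = K_inv @ np.array([p[0], p[1], 1.0])      # back-projected ray through p
    X0 = d0 * ray0                                 # known 3D point on the plane
    # Plane constraint: normal . X = normal . X0 with X = d * ray  =>  solve for d.
    return float(normal @ X0) / float(normal @ ray)
```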

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
Visual-based navigation, SLAM, deep learning in robotics and automation
National Category
Robotics and automation
Identifiers
urn:nbn:se:kth:diva-243927 (URN) 10.1109/LRA.2019.2891433 (DOI) 000456673300007 () 2-s2.0-85063310740 (Scopus ID)
Available from: 2019-03-13 Created: 2019-03-13 Last updated: 2025-02-09. Bibliographically approved.
Tang, J., Folkesson, J. & Jensfelt, P. (2018). Geometric Correspondence Network for Camera Motion Estimation. IEEE Robotics and Automation Letters, 3(2), 1010-1017
Geometric Correspondence Network for Camera Motion Estimation
2018 (English) In: IEEE Robotics and Automation Letters, E-ISSN 2377-3766, Vol. 3, no 2, p. 1010-1017. Article in journal (Refereed), Published
Abstract [en]

In this paper, we propose a new learning scheme for generating geometric correspondences to be used for visual odometry. A convolutional neural network (CNN) and a recurrent neural network (RNN) are trained together to detect the locations of keypoints and to generate the corresponding descriptors in one unified structure. The network is optimized by warping points from the source frame to the reference frame with a rigid body transform, essentially learning from warping. The overall training is focused on movements of the camera rather than movements within the image, which leads to better consistency in the matching and ultimately better motion estimation. Experimental results show that the proposed method achieves better results than both related deep learning and hand-crafted methods. Furthermore, as a demonstration of the promise of our method, we use a naive SLAM implementation based on these keypoints and obtain performance on par with ORB-SLAM.
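The "learning from warping" supervision can be sketched as reprojecting keypoints from the source frame into the reference frame with a known rigid-body transform and using the reprojected locations as matching targets. The snippet below is a generic PyTorch illustration under assumed tensor shapes, not the paper's training code.

```python
# Generic "learning from warping" target generation: lift source keypoints with
# depth to 3D, apply the rigid-body transform, and project into the reference frame.
import torch

def warp_points(uv, depth, K, R, t):
    """uv: (N, 2) source pixels; depth: (N,); K: (3, 3); R, t: source-to-reference pose."""
    ones = torch.ones(uv.shape[0], 1, dtype=uv.dtype)
    rays = (torch.linalg.inv(K) @ torch.cat([uv, ones], dim=1).T).T   # (N, 3) bearing vectors
    X_src = depth.unsqueeze(1) * rays                                 # lift to 3D
    X_ref = (R @ X_src.T).T + t                                       # rigid-body transform
    proj = (K @ X_ref.T).T                                            # project into reference image
    return proj[:, :2] / proj[:, 2:3]                                 # perspective divide -> (N, 2)
```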

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
Visual-based navigation, SLAM, deep learning in robotics and automation
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-223775 (URN) 10.1109/LRA.2018.2794624 (DOI) 000424646100022 () 2-s2.0-85063305858 (Scopus ID)
Note

QC 20180307

Available from: 2018-03-07 Created: 2018-03-07 Last updated: 2024-03-18. Bibliographically approved.