3D YOLO: End-to-End 3D Object Detection Using Point Clouds
KTH, School of Electrical Engineering and Computer Science (EECS).
2018 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis.
Alternative title
3D YOLO: Objektdetektering i 3D med LiDAR-data (Swedish)
Abstract [en]

For safe and reliable driving, it is essential that an autonomous vehicle can accurately perceive the surrounding environment. Modern sensor technologies used for perception, such as LiDAR and RADAR, deliver a large set of 3D measurement points known as a point cloud. There is a huge need to interpret the point cloud data to detect other road users, such as vehicles and pedestrians.
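To make the data format concrete, here is a minimal sketch (not from the thesis) of how a LiDAR point cloud is commonly represented: an N×4 array of (x, y, z, reflectance) measurements per sweep, which is the layout used by the KITTI Velodyne scans mentioned later. The coordinate ranges below are illustrative assumptions.

```python
import numpy as np

# Hypothetical example: one LiDAR sweep as an (N, 4) array of
# (x, y, z, reflectance) values, the typical KITTI point cloud layout.
rng = np.random.default_rng(0)
num_points = 100_000
point_cloud = np.column_stack([
    rng.uniform(-50.0, 50.0, num_points),  # x: forward/backward (m), assumed range
    rng.uniform(-50.0, 50.0, num_points),  # y: left/right (m), assumed range
    rng.uniform(-3.0, 1.0, num_points),    # z: height (m), assumed range
    rng.uniform(0.0, 1.0, num_points),     # reflectance, normalised to [0, 1]
])
print(point_cloud.shape)  # (100000, 4)
```

A detector consuming this input must cope with the fact that N varies between sweeps and that the points are unordered, which is what motivates the feature learning stage described below.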

Many research studies have proposed image-based models for 2D object detection. This thesis takes it a step further and aims to develop a LiDAR-based 3D object detection model that operates in real time, with an emphasis on autonomous driving scenarios. We propose 3D YOLO, an extension of YOLO (You Only Look Once), one of the fastest state-of-the-art 2D object detectors for images. The proposed model takes point cloud data as input and outputs 3D bounding boxes with class scores in real time. Most existing 3D object detectors rely on hand-crafted features, whereas our model follows an end-to-end learning approach that removes the need for manual feature engineering.

The 3D YOLO pipeline consists of two networks: (a) the Feature Learning Network, an artificial neural network that transforms the input point cloud into a new feature space; and (b) 3DNet, a novel convolutional neural network architecture based on YOLO that learns the shape descriptions of the objects.
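The two-stage idea above can be sketched in a highly simplified form: stage (a) maps the unordered points into a fixed-size feature grid that stage (b), the YOLO-style convolutional network, could then consume. This is an illustrative toy (the grid shape, spatial extent, and the max-pooled reflectance stand-in for a learned per-point feature are all assumptions, not the thesis implementation).

```python
import numpy as np

def voxelize(points, grid=(10, 10, 4), extent=((0, 40), (-20, 20), (-2, 2))):
    """Max-pool a simple per-point feature into a fixed voxel grid.

    points: (N, 4) array of (x, y, z, reflectance).
    Returns an array of shape `grid`; a learned Feature Learning Network
    would produce richer per-voxel features, this uses raw reflectance.
    """
    lows = np.array([e[0] for e in extent], dtype=float)
    highs = np.array([e[1] for e in extent], dtype=float)
    sizes = (highs - lows) / np.array(grid)
    features = np.zeros(grid)
    idx = ((points[:, :3] - lows) / sizes).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(grid)), axis=1)
    for (i, j, k), r in zip(idx[inside], points[inside, 3]):
        features[i, j, k] = max(features[i, j, k], r)
    return features

points = np.array([
    [10.0,  0.0, 0.0, 0.9],  # strong return in front of the sensor
    [10.5,  0.2, 0.1, 0.4],  # lands in the same voxel; max pooling keeps 0.9
    [35.0, 15.0, 1.0, 0.7],
])
grid = voxelize(points)
print(grid.shape)  # (10, 10, 4)
print(grid.max())  # 0.9
```

The resulting dense grid is what makes a single-shot convolutional detector like YOLO applicable: the convolution layers slide over a regular tensor rather than over raw, variable-length point lists.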

Our experiments on the KITTI dataset show that 3D YOLO achieves high accuracy and outperforms the state-of-the-art LiDAR-based models in efficiency. This makes it a suitable candidate for deployment in autonomous vehicles.

Abstract [sv] (translated)

For autonomous vehicles to have a good perception of their surroundings, modern sensors such as LiDAR and RADAR are used. These generate a large set of 3-dimensional data points known as point clouds. In the development of autonomous vehicles there is a great need to interpret LiDAR data and to classify other road users. A large number of studies have addressed 2D object detection, which analyses images to detect vehicles, but we are interested in 3D object detection using only LiDAR data. We therefore introduce the model 3D YOLO, which builds on YOLO (You Only Look Once), one of the fastest state-of-the-art models in 2D object detection for images. 3D YOLO takes a point cloud as input and produces 3D boxes that mark the detected objects and indicate each object's category. We trained and evaluated the model on the public KITTI dataset. Our results show that 3D YOLO is faster than today's state-of-the-art LiDAR-based models while maintaining high accuracy, which makes it a good candidate for use in autonomous vehicles.

Place, publisher, year, edition, pages
2018, p. 47
Series
TRITA-EECS-EX ; 2018:539
Keywords [en]
Computer Vision, Machine Learning, Autonomous Vehicles, Autonomous Cars, LiDAR, Point Cloud, Deep Learning, Object Detection, YOLO
National Category
Computer Sciences; Computer Vision and Robotics (Autonomous Systems)
Identifiers
URN: urn:nbn:se:kth:diva-234242
OAI: oai:DiVA.org:kth-234242
DiVA, id: diva2:1245296
Educational program
Master of Science - Machine Learning
Presentation
2018-06-18, Mötesrum 1537, Lindstedtsvägen 3, E-huset, huvudbyggnaden, floor 5, KTH Campus, Stockholm, 14:00 (English)
Available from: 2018-09-19. Created: 2018-09-04. Last updated: 2018-09-19. Bibliographically approved.

Open Access in DiVA

fulltext (5781 kB), 2213 downloads
File information
File name: FULLTEXT01.pdf
File size: 5781 kB
Checksum (SHA-512): 6452737075887900bbbd013cd13a161002257aea2a4135532c5afdd6eb6779cfe870045892a1a348bf64afd44f583783fcd1bb5877a8027d9b7450b9daedd7f2
Type: fulltext
Mimetype: application/pdf

By author/editor
Al Hakim, Ezeddin