Integrating Vision Transformers and Neuromorphic Models: A Method for Object Detection in Event-Based Datasets
KTH, School of Electrical Engineering and Computer Science (EECS).
2024 (English). Independent thesis, Advanced level (degree of Master, Two Years), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

Event-based vision offers significant advantages for high-speed and low-power applications by capturing visual information through asynchronous event cameras. However, object detection in event-based datasets remains challenging due to the sparse and asynchronous nature of the data. This thesis explores the integration of neuromorphic models, specifically Leaky Integrate-and-Fire (LIF) neurons and their variants, with Vision Transformers to enhance object detection in event-based datasets. By replacing the recurrent layers in the Recurrent Vision Transformer (RVT) architecture with neuromorphic models, the study aims to leverage the temporal processing capabilities of Spiking Neural Networks (SNNs). Experiments were conducted using the Prophesee Gen1 Automotive Detection Dataset, and the models were evaluated using standard object detection metrics. The results indicate that while pure neuromorphic models underperformed compared to the baseline RVT, hybrid models integrating both neuromorphic and recurrent components demonstrated potential, especially with extended training iterations. The study highlights challenges in training neuromorphic models within such architectures, including hardware limitations and the need for optimized training strategies. This research contributes to the understanding of integrating neuromorphic computing with Vision Transformers and provides insights for future work in enhancing object detection in event-based vision applications.
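The Leaky Integrate-and-Fire dynamics mentioned above can be illustrated with a minimal discrete-time update. This is a generic NumPy sketch, not the thesis's actual model; the decay factor `beta`, threshold `v_th`, and the subtractive ("soft") reset are illustrative assumptions:

```python
import numpy as np

def lif_step(v, x, beta=0.9, v_th=1.0):
    """One discrete-time Leaky Integrate-and-Fire update.

    v    -- membrane potential from the previous time step
    x    -- input current at this time step
    beta -- leak (decay) factor, 0 < beta < 1 (assumed value)
    v_th -- firing threshold (assumed value)
    Returns (spike, new membrane potential). The potential is
    reset by subtracting the threshold wherever a spike fired.
    """
    v = beta * v + x                      # leaky integration
    spike = (v >= v_th).astype(v.dtype)   # emit 1 where threshold is crossed
    v = v - spike * v_th                  # subtractive (soft) reset
    return spike, v

# Drive one neuron with a constant input over a few time steps:
# the potential charges, crosses the threshold, fires, and resets.
v = np.zeros(1)
for t in range(5):
    s, v = lif_step(v, np.array([0.3]))
```

The binary spike output is what makes such neurons attractive for sparse, asynchronous event data, but also what complicates gradient-based training, since the thresholding step is non-differentiable.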

Abstract [sv]

Event-based vision offers great advantages for high-performance and low-power applications by capturing visual information through asynchronous event cameras. However, object detection in event-based datasets remains challenging due to the sparse and asynchronous nature of the data. This thesis investigates the integration of neuromorphic models, specifically Leaky Integrate-and-Fire (LIF) neurons and their variants, with Vision Transformers to improve object detection in event-based datasets. By replacing the recurrent layers in the Recurrent Vision Transformer (RVT) architecture with neuromorphic models, the study aims to exploit the temporal processing capabilities of Spiking Neural Networks (SNNs). Experiments were carried out using the Prophesee Gen1 Automotive Detection Dataset, and the models were evaluated with standard object detection metrics. The results show that while pure neuromorphic models performed worse than the RVT baseline, hybrid models integrating both neuromorphic and recurrent components showed potential, particularly with extended training iterations. The study highlights challenges in training neuromorphic models within such architectures, including hardware limitations and the need for optimized training strategies. This research contributes to the understanding of integrating neuromorphic computing with Vision Transformers and provides insights for future work on improving object detection in event-based vision applications.

Place, publisher, year, edition, pages
2024, p. 46
Series
TRITA-EECS-EX ; 2024:910
Keywords [en]
Event-based vision, Spiking Neural Networks, Vision Transformers, Neuromorphic models, Object detection, Leaky Integrate-and-Fire neurons.
Keywords [sv]
Event-based vision, Spiking Neural Networks, Vision Transformers, Neuromorphic models, Object detection, Leaky Integrate-and-Fire neurons.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-360876
OAI: oai:DiVA.org:kth-360876
DiVA, id: diva2:1942312
Available from: 2025-03-13. Created: 2025-03-04. Last updated: 2025-03-13. Bibliographically approved.

Open Access in DiVA

fulltext (1206 kB), 47 downloads
File information
File name: FULLTEXT01.pdf
File size: 1206 kB
Checksum (SHA-512): dea4db2abd73a52ac515604ae5a5e8c9498a8906e83ed08c848b84f8ce7255e2c1b2ddc2e5c7b98374b316ae96c44ac851cddabc4196b33fde0f26d26dd856d3
Type: fulltext
Mimetype: application/pdf

By organisation
School of Electrical Engineering and Computer Science (EECS)
Computer and Information Sciences

Total: 47 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 407 hits