Integrating Vision Transformers and Neuromorphic Models: A Method for Object Detection in Event-Based Datasets
2024 (English) Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Event-based vision offers significant advantages for high-speed and low-power applications by capturing visual information through asynchronous event cameras. However, object detection in event-based datasets remains challenging due to the sparse and asynchronous nature of the data. This thesis explores the integration of neuromorphic models, specifically Leaky Integrate-and-Fire (LIF) neurons and their variants, with Vision Transformers to enhance object detection in event-based datasets. By replacing the recurrent layers in the Recurrent Vision Transformer (RVT) architecture with neuromorphic models, the study aims to leverage the temporal processing capabilities of Spiking Neural Networks (SNNs). Experiments were conducted using the Prophesee Gen1 Automotive Detection Dataset, and the models were evaluated using standard object detection metrics. The results indicate that while pure neuromorphic models underperformed compared to the baseline RVT, hybrid models integrating both neuromorphic and recurrent components demonstrated potential, especially with extended training iterations. The study highlights challenges in training neuromorphic models within such architectures, including hardware limitations and the need for optimized training strategies. This research contributes to the understanding of integrating neuromorphic computing with Vision Transformers and provides insights for future work in enhancing object detection in event-based vision applications.
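The Leaky Integrate-and-Fire dynamics named in the abstract can be illustrated with a minimal single-neuron simulation. This is an illustrative sketch only, not code from the thesis; the decay factor `beta`, the `threshold`, and the subtractive-reset convention are assumptions chosen for the example:

```python
def lif_forward(inputs, beta=0.9, threshold=1.0):
    """Simulate one Leaky Integrate-and-Fire neuron over a sequence
    of input currents.

    Membrane potential: v[t] = beta * v[t-1] + I[t].
    A spike is emitted when v crosses `threshold`, after which the
    potential is reduced by the threshold (soft reset, one common
    convention in SNN implementations).
    """
    v = 0.0
    spikes = []
    for current in inputs:
        v = beta * v + current      # leaky integration of input current
        if v >= threshold:
            spikes.append(1)
            v -= threshold          # subtractive (soft) reset
        else:
            spikes.append(0)
    return spikes

# A constant sub-threshold input spikes periodically once the leaked
# membrane potential accumulates past the threshold.
print(lif_forward([0.4] * 6))       # → [0, 0, 1, 0, 0, 1]
```

In the thesis, units of this kind replace the LSTM-style recurrent layers of the Recurrent Vision Transformer, so the spiking state carries temporal context between event frames instead of a gated hidden state.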
Place, publisher, year, edition, pages
2024, p. 46
Series
TRITA-EECS-EX ; 2024:910
Keywords [en]
Event-based vision, Spiking Neural Networks, Vision Transformers, Neuromorphic models, Object detection, Leaky Integrate-and-Fire neurons.
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-360876
OAI: oai:DiVA.org:kth-360876
DiVA, id: diva2:1942312
Available from: 2025-03-13 Created: 2025-03-04 Last updated: 2025-03-13 Bibliographically approved