kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Pooling On-the-Go for NoC-Based Convolutional Neural Network Accelerator
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems.ORCID iD: 0000-0002-4911-0257
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems.ORCID iD: 0000-0001-8488-3506
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems.ORCID iD: 0000-0003-0061-3475
2025 (English)In: Embedded Computer Systems: Architectures, Modeling, and Simulation - 24th International Conference, SAMOS 2024, Proceedings, Springer Nature , 2025, p. 109-118Conference paper, Published paper (Refereed)
Abstract [en]

Due to the complexity and diversity of deep convolutional neural networks (CNNs), Network-on-chip (NoC) based CNN accelerators have grown in popularity to improve inference efficiency and flexibility. Current optimization approaches focus on computational-heavy layers. Therefore, pooling layers are often ignored and processed individually using general processing units. In this work, we explore the acceleration of pooling layers by in-network processing. We propose a pooling on-the-go method to do the pooling operations while transmitting its prior layer outputs. Consequently, we combine the pooling layer with its prior convolution layer to remove unnecessary data movements. We demonstrate our method on a cycle-accurate NoC-CNN accelerator simulator on two CNN models, LeNet and VGG16. The results show that the processing time of individual pooling layers is almost eliminated by around 99%. Compared with the pooling standalone baseline, we can achieve 1.09x speedup in the full LeNet model, and up to 1.16x speedup in the combined layers that our approach applies.

Place, publisher, year, edition, pages
Springer Nature , 2025. p. 109-118
Keywords [en]
CNN Accelerator, In-network Processing, Network-on-Chip, Pooling
National Category
Computer graphics and computer vision
Identifiers
URN: urn:nbn:se:kth:diva-360913DOI: 10.1007/978-3-031-78380-7_9ISI: 001447102500009Scopus ID: 2-s2.0-85218439695OAI: oai:DiVA.org:kth-360913DiVA, id: diva2:1942576
Conference
24th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2024, Samos, Greece, Jun 29 2024 - Jul 4 2024
Note

Part of ISBN 9783031783791

QC 20250310

Available from: 2025-03-05 Created: 2025-03-05 Last updated: 2025-05-27Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Zhu, WenyaoChen, YizhiLu, Zhonghai

Search in DiVA

By author/editor
Zhu, WenyaoChen, YizhiLu, Zhonghai
By organisation
Electronics and Embedded systems
Computer graphics and computer vision

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 97 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf