Network-on-Chip (NoC) based Deep Neural Net-work (DNN) accelerators are widely adopted, but their performance is still not satisfactory as the network congestion may enlarge the inference latency. In this work, we leverage the idea of in-network processing and propose a computation-while-blocking method to conduct activation in network that improves inference latency for NoC-based DNN accelerators. Our approach offloads the non-linear activation from processing elements (PEs) to network routers. Based on a cycle-accurate NoC-DNN simulator, we experiment on a popular neural network model LeNet. The proposed approach can achieve up to 12% speedup in the first layer, and an overall around 6% decrease in total cycles compared to the baseline.
QC 20240704
Part of ISBN 979-8-3503-6034-9