Enabling Energy-Efficient Inference for Self-Attention Mechanisms in Neural Networks
2022 (English). In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022): Intelligent Technology in the Post-Pandemic Era, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 25-28. Conference paper, Published paper (Refereed)
Abstract [en]
Specialized accelerators tailored for neural networks have become a promising research topic in recent years. Existing neural network accelerators are usually designed for convolutional neural networks (CNNs) or recurrent neural networks (RNNs); less attention has been paid to attention mechanisms, an emerging neural network primitive with the ability to identify relations among input entities. Self-attention-oriented models such as the Transformer have achieved great performance in natural language processing, computer vision, and machine translation. However, the self-attention mechanism has an intrinsically expensive computational workload, which increases quadratically with the number of input entities. Therefore, in this work, we propose a software-hardware co-design solution for energy-efficient self-attention inference. A prediction-based approximate self-attention mechanism is introduced to substantially reduce runtime as well as power consumption, and a specialized hardware architecture is then designed to further increase the speedup. The design is implemented on a Xilinx XC7Z035 FPGA, and the results show that energy efficiency is improved by 5.7x with less than 1% accuracy loss.
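To illustrate the quadratic cost mentioned in the abstract, here is a minimal sketch of standard scaled dot-product self-attention in plain Python. It is not the paper's prediction-based approximation or its hardware design, which the abstract does not detail. The function names `self_attention` and `attention_macs` are hypothetical. The operation count covers only the two matrix products and ignores softmax and the input projections.

```python
import math


def self_attention(q, k, v):
    """Standard scaled dot-product self-attention over lists of row vectors.

    Illustrative baseline only; not the approximate mechanism from the paper.
    """
    d = len(q[0])
    out = []
    for qi in q:
        # One score per (query, key) pair: n scores per query, n*n overall.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        # Numerically stable softmax over the n scores of this query.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Weighted sum of the value rows.
        out.append([sum(w * vj[c] for w, vj in zip(weights, v))
                    for c in range(len(v[0]))])
    return out


def attention_macs(n, d):
    """Multiply-accumulates for Q*K^T plus weights*V with n inputs of width d.

    Assumption: softmax and the Q/K/V projections are not counted.
    """
    return 2 * n * n * d
```

Under this count, doubling the number of input entities `n` quadruples the work: `attention_macs(256, 64)` is four times `attention_macs(128, 64)`. This growth is what motivates approximate schemes such as the one the paper proposes.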
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022. p. 25-28
Keywords [en]
Self-attention, approximate computing, VLSI
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-321316
DOI: 10.1109/AICAS54282.2022.9869924
ISI: 000859273200008
Scopus ID: 2-s2.0-85139001907
OAI: oai:DiVA.org:kth-321316
DiVA, id: diva2:1710212
Conference
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) - Intelligent Technology in the Post-Pandemic Era, June 13-15, 2022, Incheon, South Korea
Note
QC 20221111
Part of proceeding: ISBN 978-1-6654-0996-4
2022-11-11 2022-11-11 2022-11-11 Bibliographically approved