Enabling Energy-Efficient Inference for Self-Attention Mechanisms in Neural Networks
Univ Shanghai Sci & Technol, Inst Photon Chips, Shanghai, Peoples R China.
Nanjing Univ, Sch Elect Sci & Engn, Nanjing, Peoples R China.
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems. ORCID iD: 0000-0003-0061-3475
Univ Zurich, Inst Neuroinformat, Zurich, Switzerland; Swiss Fed Inst Technol, Zurich, Switzerland. ORCID iD: 0000-0002-3284-4078
2022 (English). In: 2022 IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS 2022): Intelligent Technology in the Post-Pandemic Era, Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 25-28. Conference paper, Published paper (Refereed)
Abstract [en]

The study of specialized accelerators tailored for neural networks has become a promising topic in recent years. Existing neural network accelerators are usually designed for convolutional neural networks (CNNs) or recurrent neural networks (RNNs); however, less attention has been paid to attention mechanisms, an emerging neural network primitive with the ability to identify the relations within input entities. Self-attention-oriented models such as the Transformer have achieved great performance in natural language processing, computer vision, and machine translation. However, the self-attention mechanism has an intrinsically expensive computational workload, which increases quadratically with the number of input entities. Therefore, in this work, we propose a software-hardware co-design solution for energy-efficient self-attention inference. A prediction-based approximate self-attention mechanism is introduced to substantially reduce the runtime as well as power consumption, and a specialized hardware architecture is then designed to further increase the speedup. The design is implemented on a Xilinx XC7Z035 FPGA, and the results show that the energy efficiency is improved by 5.7x with less than 1% accuracy loss.
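The abstract gives only the headline numbers, so the following minimal NumPy sketch illustrates where the quadratic workload comes from and how a prediction step can remove most of it. The low-precision predictor and the margin-based keep rule below are illustrative assumptions; the record does not describe the paper's actual prediction mechanism.

```python
# Minimal sketch of exact vs. prediction-pruned self-attention.
# Assumptions: plain scaled dot-product attention; the few-bit predictor
# and the margin rule are stand-ins for the paper's prediction mechanism,
# which this record does not detail.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv):
    # Exact attention: the (n, n) score matrix is the quadratic workload.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = (q @ k.T) / np.sqrt(k.shape[-1])   # O(n^2 * d) multiplies
    return softmax(scores) @ v

def approx_self_attention(x, Wq, Wk, Wv, bits=4, margin=2.0):
    # Illustrative prediction-based approximation: a cheap low-precision
    # dot product predicts which scores will dominate the softmax, and only
    # those positions are kept at full precision.
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    step = max(np.abs(q).max(), np.abs(k).max()) / (2 ** (bits - 1) - 1)
    pred = (np.round(q / step) @ np.round(k / step).T) * step ** 2
    pred /= np.sqrt(k.shape[-1])
    # Keep entries whose predicted score is near the row maximum; the rest
    # contribute almost nothing after softmax and are masked out.
    keep = pred >= pred.max(axis=-1, keepdims=True) - margin
    # In software we still form the full score matrix and mask it; dedicated
    # hardware would instead skip the masked dot products entirely, which is
    # where the runtime and energy savings come from.
    scores = np.where(keep, (q @ k.T) / np.sqrt(k.shape[-1]), -np.inf)
    return softmax(scores) @ v

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 64, 32                               # n input entities
    x = rng.standard_normal((n, d))
    Wq, Wk, Wv = (rng.standard_normal((d, d)) * d ** -0.5 for _ in range(3))
    exact = self_attention(x, Wq, Wk, Wv)
    approx = approx_self_attention(x, Wq, Wk, Wv)
    print("mean abs error:", np.abs(exact - approx).mean())
```

The mask is cheap to predict but only pays off in hardware, where the skipped score computations never happen; this is the gap the paper's specialized architecture is designed to exploit.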

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2022, p. 25-28
Keywords [en]
Self-attention, approximate computing, VLSI
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-321316
DOI: 10.1109/AICAS54282.2022.9869924
ISI: 000859273200008
Scopus ID: 2-s2.0-85139001907
OAI: oai:DiVA.org:kth-321316
DiVA, id: diva2:1710212
Conference
IEEE International Conference on Artificial Intelligence Circuits and Systems (AICAS) - Intelligent Technology in the Post-Pandemic Era, June 13-15, 2022, Incheon, South Korea
Note

QC 20221111

Part of proceeding: ISBN 978-1-6654-0996-4

Available from: 2022-11-11. Created: 2022-11-11. Last updated: 2022-11-11. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text
Scopus

Authority records

Lu, Zhonghai

Search in DiVA

By author/editor
Lu, Zhonghai; Gao, Chang
By organisation
Electronics and Embedded systems
Electrical Engineering, Electronic Engineering, Information Engineering
