IECA: An In-Execution Configuration CNN Accelerator With 30.55 GOPS/mm² Area Efficiency
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems; Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China. ORCID iD: 0000-0002-9155-1451
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.
2021 (English). In: IEEE Transactions on Circuits and Systems Part 1: Regular Papers, ISSN 1549-8328, E-ISSN 1558-0806, Vol. 68, no 11, p. 4672-4685. Article in journal (Refereed). Published.
Abstract [en]

It remains challenging for a Convolutional Neural Network (CNN) accelerator to maintain high hardware utilization and low processing latency with restricted on-chip memory. This paper presents an In-Execution Configuration Accelerator (IECA) that realizes an efficient control scheme, exploring architectural data reuse, unified in-execution control, and pipelined latency hiding to minimize configuration overhead outside the computation itself. The proposed IECA achieves row-wise convolution with tiny distributed buffers and reduces the total on-chip memory size by removing 40% of redundant memory storage with shared delay chains. By exploiting a reconfigurable Sequence Mapping Table (SMT) and Finite State Machine (FSM) control, the chip realizes cycle-accurate Processing Element (PE) control, automatic loop tiling, and latency hiding without extra time slots for pre-configuration. Evaluated on AlexNet and VGG-16, the IECA retains over 97.3% PE utilization and hides over 95.6% of memory access time on average. The chip is designed and fabricated in a UMC 55-nm process, runs at 250 MHz, and achieves an area efficiency of 30.55 GOPS/mm² and 0.244 GOPS/KGE (kilo-gate-equivalent), improvements of more than 2.0x and 2.1x, respectively, over previous related works. The IEC control scheme occupies only 0.55% of the 2.75 mm² core area.
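The figures quoted in the abstract can be combined into a rough back-of-the-envelope check. The sketch below is illustrative only: it assumes that both efficiency metrics are normalized against the same throughput and that the area efficiency refers to the 2.75 mm² core, neither of which the abstract states explicitly. The function name `implied_figures` and the dictionary keys are hypothetical, chosen for this example.

```python
# Figures quoted directly in the abstract.
AREA_EFF_GOPS_PER_MM2 = 30.55   # area efficiency
GATE_EFF_GOPS_PER_KGE = 0.244   # gate efficiency (KGE = kilo-gate-equivalent)
CORE_AREA_MM2 = 2.75            # core area
CLOCK_MHZ = 250.0               # operating frequency
IEC_AREA_FRACTION = 0.0055      # 0.55% of the core used by IEC control


def implied_figures():
    """Derive secondary quantities from the abstract's headline numbers.

    ASSUMPTION: the area efficiency is throughput divided by the 2.75 mm^2
    core area, and the gate efficiency uses the same throughput. The abstract
    does not confirm either, so the results are estimates only.
    """
    throughput_gops = AREA_EFF_GOPS_PER_MM2 * CORE_AREA_MM2
    gate_count_kge = throughput_gops / GATE_EFF_GOPS_PER_KGE
    # Operations per clock cycle implied by the throughput at 250 MHz.
    ops_per_cycle = throughput_gops * 1e9 / (CLOCK_MHZ * 1e6)
    # Silicon area taken by the IEC control logic (direct from the abstract).
    iec_area_mm2 = IEC_AREA_FRACTION * CORE_AREA_MM2
    return {
        "throughput_gops": throughput_gops,
        "gate_count_kge": gate_count_kge,
        "ops_per_cycle": ops_per_cycle,
        "iec_area_mm2": iec_area_mm2,
    }


if __name__ == "__main__":
    for name, value in implied_figures().items():
        print(f"{name}: {value:.4f}")
```

Under these assumptions the numbers work out to roughly 84 GOPS of throughput, about 344 KGE of logic, about 336 operations per cycle, and about 0.015 mm² for the IEC control logic. Only the last value follows from the abstract without extra assumptions.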

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. Vol. 68, no 11, p. 4672-4685
Keywords [en]
Convolutional neural network (CNN), area-efficient, accelerator, in-execution configuration
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-305370
DOI: 10.1109/TCSI.2021.3108762
ISI: 000716698600026
Scopus ID: 2-s2.0-85119498618
OAI: oai:DiVA.org:kth-305370
DiVA, id: diva2:1616524
Note

QC 20211203

Available from: 2021-12-03 Created: 2021-12-03 Last updated: 2022-10-24 Bibliographically approved

Authority records

Huan, Yuxiang
