kth.sePublications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
CNN Sensor Analytics With Hybrid-Float6 Quantization on Low-Power Embedded FPGAs
Univ Bremen, Inst Electrodynam & Microelect, D-28359 Bremen, Germany..
Show others and affiliations
2023 (English)In: IEEE Access, E-ISSN 2169-3536, Vol. 11, p. 4852-4868Article in journal (Refereed) Published
Abstract [en]

The use of artificial intelligence (AI) in sensor analytics is entering a new era based on the use of ubiquitous embedded connected devices. This transformation requires the adoption of design techniques that reconcile accurate results with sustainable system architectures. As such, improving the efficiency of AI hardware engines as well as backward compatibility must be considered. In this paper, we present the Hybrid-Float6 (HF6) quantization and its dedicated hardware design. We propose an optimized multiply-accumulate (MAC) hardware by reducing the mantissa multiplication to a multiplexor-adder operation. We exploit the intrinsic error tolerance of neural networks to further reduce the hardware design with approximation. To preserve model accuracy, we present a quantization-aware training (QAT) method, which in some cases improves accuracy. We demonstrate this concept in 2D convolution layers. We present a lightweight tensor processor (TP) implementing a pipelined vector dot-product. For compatibility and portability, the 6-bit floating-point (FP) is wrapped in the standard FP format, which is automatically extracted by the proposed hardware. The hardware/software architecture is compatible with TensorFlow (TF) Lite. We evaluate the applicability of our approach with a CNN-regression model for anomaly localization in a structural health monitoring (SHM) application based on acoustic emission (AE). The embedded hardware/software framework is demonstrated on XC7Z007S as the smallest Zynq-7000 SoC. The proposed implementation achieves a peak power efficiency and run-time acceleration of 5.7 GFLOPS/s/W and 48.3x, respectively.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE) , 2023. Vol. 11, p. 4852-4868
Keywords [en]
Hardware, Quantization (signal), Field programmable gate arrays, Tensors, Computational modeling, Convolutional neural networks, Computer architecture, structural health monitoring, hardware accelerator, TensorFlow Lite, embedded systems, FPGA, custom floating-point
National Category
Computer Systems Embedded Systems
Identifiers
URN: urn:nbn:se:kth:diva-324310DOI: 10.1109/ACCESS.2023.3235866ISI: 000917229200001Scopus ID: 2-s2.0-85147312363OAI: oai:DiVA.org:kth-324310DiVA, id: diva2:1739635
Note

QC 20230227

Available from: 2023-02-27 Created: 2023-02-27 Last updated: 2023-02-27Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full textScopus

Authority records

Chen, Yizhi

Search in DiVA

By author/editor
Chen, Yizhi
By organisation
Electronic and embedded systems
In the same journal
IEEE Access
Computer SystemsEmbedded Systems

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 166 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf