Publications (10 of 233)
Wang, D., Yan, X., Yu, Y., Stathis, D., Hemani, A., Lansner, A., . . . Zou, Z. (2025). Scalable Multi-FPGA HPC Architecture for Associative Memory System. IEEE Transactions on Biomedical Circuits and Systems, 19(2), 454-468
2025 (English). In: IEEE Transactions on Biomedical Circuits and Systems, ISSN 1932-4545, E-ISSN 1940-9990, Vol. 19, no. 2, p. 454-468. Article in journal (Refereed). Published.
Abstract [en]

Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions like associative memory. However, the current approach using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200x10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72x and a power reduction exceeding 5.28x under the same network configuration. Compared with the state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task.
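As a rough illustration of the packet-based spike synchronization idea described in the abstract, a spike packet can carry just the indices of the units that fired in a timestep. The helper names below are hypothetical; the architecture's actual packet format is not specified here.

```python
def pack_spikes(active_units):
    """Packet-based spike synchronization (sketch): instead of streaming
    one bit per unit every timestep, send only the indices of the units
    that spiked, keeping inter-FPGA traffic proportional to activity."""
    return sorted(active_units)

def unpack_spikes(packet, num_units):
    """Expand a received packet back into a dense spike vector."""
    spikes = [0] * num_units
    for idx in packet:
        spikes[idx] = 1
    return spikes
```

Traffic then scales with spiking activity rather than with the total number of units, which is what makes inter-board synchronization affordable for sparse spiking workloads.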

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
multi-FPGA, scalability, Associative memory, high performance computing (HPC), spiking neural network (SNN), Bayesian confidence propagation neural network (BCPNN)
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-363869 (URN)
10.1109/TBCAS.2024.3446660 (DOI)
001458211300017 ()
39163180 (PubMedID)
2-s2.0-85201785533 (Scopus ID)
Note

QC 20250526

Available from: 2025-05-26. Created: 2025-05-26. Last updated: 2025-05-26. Bibliographically approved.
Pudi, D., Yu, Y., Stathis, D., Prajapati, S. K., Boppu, S., Hemani, A. & Cenkeramaddi, L. R. (2024). Application Level Synthesis: Creating Matrix-Matrix Multiplication Library: A Case Study. IEEE Access, 12, 155885-155903
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 155885-155903. Article in journal (Refereed). Published.
Abstract [en]

Efficiently synthesizing an entire application that consists of multiple algorithms for hardware implementation is a difficult and unsolved problem. One of the main challenges is the lack of good algorithmic libraries. A good algorithmic library should contain algorithmic implementations that are physically composable and whose cost metrics are accurately predictable. Physical composability and cost predictability can be achieved using a novel framework called SiLago. By physically abutting small hardware blocks together like Lego bricks, the SiLago framework eliminates time-consuming logic and physical synthesis and immediately gives post-layout-accurate cost estimates. In this paper, we build a library for the matrix-matrix multiplication algorithm based on the SiLago framework as a case study, because matrix-matrix multiplication is a fundamental operation in scientific computing that appears frequently in applications such as signal processing, image processing, pattern recognition, and robotics. This paper demonstrates the methodology to construct such a library of composable and predictable algorithms so that application-level synthesis tools can use it to explore the design space for an entire application. Specifically, we present an algorithm for matrix decomposition, several mapping strategies for selected kernel functions, an algorithm to construct the mapping of each matrix-matrix multiplication, and finally, the method to calculate the cost estimate of each solution.
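The decomposition idea can be illustrated with a generic blocked matrix-matrix multiplication. This sketch only mirrors the general tiling principle, where each tile-sized sub-product could be mapped to one hardware kernel instance; it is not the paper's SiLago-specific mapping.

```python
import numpy as np

def blocked_matmul(A, B, tile=4):
    """Blocked matrix-matrix multiplication: decompose C = A @ B into
    tile x tile sub-problems; each inner product of tiles stands in for
    one invocation of a hardware matmul kernel."""
    n, k = A.shape
    k2, m = B.shape
    assert k == k2, "inner dimensions must match"
    C = np.zeros((n, m))
    for i in range(0, n, tile):
        for j in range(0, m, tile):
            for p in range(0, k, tile):
                # one "kernel call": a small tile-sized matmul,
                # accumulated into the output tile
                C[i:i+tile, j:j+tile] += A[i:i+tile, p:p+tile] @ B[p:p+tile, j:j+tile]
    return C
```

Slicing handles ragged edges automatically, so matrix dimensions need not be multiples of the tile size.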

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Coarse grain reconfigurable architecture, field programmable gate array (FPGA), dynamically reconfigurable resource array (DRRA), distributed memory architecture (DiMArch), matrix multiplication, high-level synthesis, hardware accelerators, hardware-software co-design
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-356494 (URN)
10.1109/ACCESS.2024.3484175 (DOI)
001347210500001 ()
2-s2.0-85207469544 (Scopus ID)
Note

QC 20241115

Available from: 2024-11-15. Created: 2024-11-15. Last updated: 2024-11-15. Bibliographically approved.
Pudi, D., Tiwari, U., Boppu, S., Yu, Y. & Hemani, A. (2024). Automating functional unit and register binding for synchoros CGRA platform. Design automation for embedded systems, 28(2), 155-186
2024 (English). In: Design automation for embedded systems, ISSN 0929-5585, E-ISSN 1572-8080, Vol. 28, no. 2, p. 155-186. Article in journal (Refereed). Published.
Abstract [en]

Coarse-grain reconfigurable architectures, which provide high computing throughput, low cost, scalability, and energy efficiency, have grown in popularity in recent years. SiLago is a new VLSI design framework comprising two coarse-grain reconfigurable fabrics: a dynamically reconfigurable resource array and a distributed memory architecture. It employs the Vesyla compiler to map streaming applications onto these fabrics. Binding is a critical high-level synthesis step that maps operations and variables to functional units and storage elements in the design; it influences design performance metrics such as power, latency, and area. The current version of Vesyla does not support automatic binding: the binding has to be specified manually through pragmas, which makes it less flexible. This paper proposes approaches to automate binding in Vesyla. We present a list-scheduling-based approach to automate functional unit binding and an integer linear programming approach to automate register binding. Furthermore, we determine the binding of various Basic Linear Algebra Subprograms (BLAS) routines and image processing tasks using the proposed approaches. Finally, we compare automatic and manual binding with respect to power dissipation and latency on various benchmarks. The experimental results show that the proposed automatic binding consumes significantly less power at nearly the same latency as manual binding.
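List scheduling itself can be sketched in a few lines. The toy scheduler below (hypothetical function names, no binding costs) only illustrates the ready-list idea underlying a list-scheduling-based binding approach: each cycle, schedule as many ready operations as there are functional units.

```python
def list_schedule(ops, deps, num_units):
    """Toy list scheduling: ops is a list of operation ids, deps maps an
    op to the set of ops that must finish before it starts, and num_units
    limits how many ops can execute in the same cycle.
    Returns a dict {op: cycle}. Assumes deps form a DAG."""
    schedule, done, cycle = {}, set(), 0
    while len(done) < len(ops):
        # ready list: unscheduled ops whose predecessors are all done
        ready = [o for o in ops if o not in done and deps.get(o, set()) <= done]
        for o in ready[:num_units]:       # occupy at most num_units FUs
            schedule[o] = cycle
        done.update(ready[:num_units])
        cycle += 1
    return schedule
```

A real binder would also order the ready list by a priority function (e.g. critical-path length) and account for multi-cycle operations; both are omitted here for brevity.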

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
Binding, Coarse-grain reconfigurable architecture, Distributed memory architecture, Dynamically reconfigurable resource array, High-level synthesis, Integer linear programming, SiLago, Vesyla
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-366798 (URN)
10.1007/s10617-024-09286-y (DOI)
001244630800001 ()
2-s2.0-85195687192 (Scopus ID)
Note

QC 20250710

Available from: 2025-07-10. Created: 2025-07-10. Last updated: 2025-07-10. Bibliographically approved.
Abbas, H., Shahzad, M., Safdar, M. & Hemani, A. (2024). DUDE: Decryption, Unpacking, Deobfuscation, and Endian Conversion Framework for Embedded Devices Firmware. IEEE Transactions on Dependable and Secure Computing, 21(4), 2917-2929
2024 (English). In: IEEE Transactions on Dependable and Secure Computing, ISSN 1545-5971, E-ISSN 1941-0018, Vol. 21, no. 4, p. 2917-2929. Article in journal (Refereed). Published.
Abstract [en]

Commercial-Off-The-Shelf (COTS) embedded devices rely on vendor-specific firmware to perform essential tasks. These firmware images have been under active analysis by researchers to check security features and identify possible vendor backdoors. However, consistently unpacking newly created filesystem formats has been exceptionally challenging, and vendors frequently use encryption and obfuscation to thwart unpacking attempts. The available literature and tools are insufficient for handling the encrypted, obfuscated, big-endian cramfs, or custom filesystem formats found in firmware under test. This study introduces DUDE, an automated framework that provides novel functionality, outperforming cutting-edge tools in the decryption, unpacking, deobfuscation, and endian conversion of firmware. DUDE supports endian conversion for big-endian compressed romfs filesystem formats and deobfuscation of obfuscated signatures for successful unpacking. Decryption support for encrypted binaries from the D-Link and MOXA series has also been added, allowing easier analysis and access to the contents of these firmware files. Additionally, the framework supports the extraction of special filesystem formats commonly found in firmware samples from various vendors. A remarkable 78% (1424 out of 1814) of firmware binaries from different vendors were successfully unpacked using the proposed framework, surpassing the capabilities of the commercially available tools combined on a single platform.
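Endian conversion of word-aligned fields can be illustrated with a minimal byte-swapping sketch. This is only a model of the principle: a real converter must be format-aware and swap only structured header fields, not raw data regions.

```python
import struct

def convert_u32_endianness(data: bytes) -> bytes:
    """Byte-swap every 32-bit word: read the buffer as big-endian
    unsigned words and re-emit them little-endian, so that tools
    expecting little-endian fields can parse them."""
    assert len(data) % 4 == 0, "buffer must be word-aligned"
    n = len(data) // 4
    words = struct.unpack(f">{n}I", data)   # interpret as big-endian
    return struct.pack(f"<{n}I", *words)    # re-serialize little-endian
```

For example, the cramfs magic number 0x28CD3D45 stored big-endian becomes directly readable as a little-endian integer after conversion.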

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Binary analysis, deobfuscation, Encryption, EPROM, filesystem unpacking, Hardware, Internet of Things, Microprogramming, reverse engineering, Sockets, Task analysis
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-350059 (URN)
10.1109/TDSC.2023.3320675 (DOI)
001270728400058 ()
2-s2.0-85173414016 (Scopus ID)
Note

QC 20240706

Available from: 2024-07-06. Created: 2024-07-06. Last updated: 2025-03-27. Bibliographically approved.
Yousefzadeh, S., Yu, Y., Peter, A., Stathis, D. & Hemani, A. (2024). Exploration of Custom Floating-Point Formats: A Systematic Approach. In: Proceedings - 2024 27th Euromicro Conference on Digital System Design, DSD 2024: . Paper presented at 27th Euromicro Conference on Digital System Design, DSD 2024, Paris, France, Aug 28 2024 - Aug 30 2024 (pp. 266-273). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: Proceedings - 2024 27th Euromicro Conference on Digital System Design, DSD 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 266-273. Conference paper, Published paper (Refereed).
Abstract [en]

The remarkable advancements in AI algorithms over the past three decades have been paralleled by an exponential growth in their complexity, with parameter counts soaring from 60,000 in LeNet in the late 1980s to a staggering 175 billion in GPT-3. To mitigate this surge in memory footprint, approximate computing has emerged as a promising strategy, deploying the minimal resolution necessary to maintain acceptable accuracy. Yet current practice is hindered by two major challenges: a) identifying the optimal resolution and representation format for each tensor remains a manual, ad hoc task, and b) the representation, typically floating point (FP), is confined to the standardized formats predominantly supported by commercial-off-the-shelf (COTS) products such as GPUs. This paper tackles these issues by introducing a systematic approach to exploring the FP representation design space to find the ideal FP format for each tensor, thereby leveraging the full potential of FP quantization techniques. The approach targets custom hardware, enabling access to arbitrary FP formats, but also allows users to limit the exploration to standard FP formats, making it compatible with COTS hardware. Additionally, the proposed method explores Block Floating-Point (BFP) formats and automatically decides the block size. A heuristic-based search method is proposed to handle the large design space. The approach is general, and the heuristic is not biased towards any specific category of algorithms. We apply the method to a Self-Organizing Map (SOM) for bacterial genome identification and to the LeNet-5 neural network, demonstrating memory footprint reductions of around 94% and 96%, respectively, compared to a conventional 32-bit FP baseline.
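The kind of quantizer such a design-space exploration needs can be sketched as follows. This is an illustrative rounding model (IEEE-style bias, round-to-nearest, saturation to the largest normal value), not the paper's tool; a production quantizer would also model special values and rounding modes.

```python
import math

def quantize_custom_fp(x, exp_bits, man_bits):
    """Round x to the nearest value representable in a custom FP format
    with exp_bits exponent bits (bias = 2**(exp_bits-1) - 1) and
    man_bits explicit mantissa bits. Overflow saturates to the largest
    normal; values below the smallest normal fall on the subnormal grid."""
    if x == 0.0:
        return 0.0
    bias = 2 ** (exp_bits - 1) - 1
    _, e = math.frexp(abs(x))          # abs(x) = m * 2**e with 0.5 <= m < 1
    e -= 1                             # switch to 1.0 <= mantissa < 2
    e = max(e, 1 - bias)               # clamp to subnormal spacing
    step = 2.0 ** (e - man_bits)       # quantization step at this exponent
    q = round(abs(x) / step) * step
    max_normal = (2 - 2.0 ** -man_bits) * 2.0 ** bias
    return math.copysign(min(q, max_normal), x)
```

Sweeping `exp_bits` and `man_bits` over a tensor's values and measuring the resulting error is exactly the per-tensor trade-off such an exploration automates.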

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
approximate computing, block floating point, design space exploration, floating point, quantization
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-358137 (URN)
10.1109/DSD64264.2024.00043 (DOI)
001414927800034 ()
2-s2.0-85211894711 (Scopus ID)
Conference
27th Euromicro Conference on Digital System Design, DSD 2024, Paris, France, Aug 28 2024 - Aug 30 2024
Note

Part of ISBN 979-8-3503-8038-5

QC 20250115

Available from: 2025-01-07. Created: 2025-01-07. Last updated: 2025-03-24. Bibliographically approved.
Wang, D., Wang, Y., Yang, Y., Stathis, D., Hemani, A., Lansner, A., . . . Zou, Z. (2024). FPGA-Based HPC for Associative Memory System. In: 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024. Paper presented at the 29th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 22-25, 2024, BrainKorea Four 21, Incheon, South Korea (pp. 52-57). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 29th Asia and South Pacific Design Automation Conference, ASP-DAC 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 52-57. Conference paper, Published paper (Refereed).
Abstract [en]

Associative memory plays a crucial role in the cognitive capabilities of the human brain. The Bayesian Confidence Propagation Neural Network (BCPNN) is a cortex model capable of emulating brain-like cognitive capabilities, particularly associative memory. However, the existing GPU-based approach for BCPNN simulations faces challenges in terms of time overhead and power efficiency. In this paper, we propose a novel FPGA-based high performance computing (HPC) design for the BCPNN-based associative memory system. Our design endeavors to maximize the spatial and timing utilization of FPGA while adhering to the constraints of the available hardware resources. By incorporating optimization techniques including shared parallel computing units, hybrid-precision computing for a hybrid update mechanism, and the globally asynchronous and locally synchronous (GALS) strategy, we achieve a maximum network size of 150x10 and a peak working frequency of 100 MHz for the BCPNN-based associative memory system on the Xilinx Alveo U200 Card. The tradeoff between performance and hardware overhead of the design is explored and evaluated. Compared with the GPU counterpart, the FPGA-based implementation demonstrates significant improvements in both performance and energy efficiency, achieving a maximum latency reduction of 33.25x, and a power reduction of over 6.9x, all while maintaining the same network configuration.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Series
Asia and South Pacific Design Automation Conference Proceedings, ISSN 2153-6961
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-346310 (URN)
10.1109/ASP-DAC58780.2024.10473880 (DOI)
001196002900009 ()
2-s2.0-85189308319 (Scopus ID)
Conference
29th Asia and South Pacific Design Automation Conference (ASP-DAC), Jan 22-25, 2024, BrainKorea Four 21, Incheon, South Korea
Note

QC 20240513

Part of ISBN 979-8-3503-9354-5

Available from: 2024-05-13. Created: 2024-05-13. Last updated: 2024-07-23. Bibliographically approved.
Pudi, D., Malviya, S., Boppu, S., Yu, Y., Hemani, A. & Cenkeramaddi, L. R. (2024). Integer Linear Programming-Based Simultaneous Scheduling and Binding for SiLago Framework. IEEE Access, 12, 124081-124094
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 124081-124094. Article in journal (Refereed). Published.
Abstract [en]

Coarse-Grained Reconfigurable Array (CGRA) architectures are potential high-performance and power-efficient platforms. However, mapping applications efficiently onto a CGRA, which includes scheduling and binding operations on functional units and variables on registers, is a daunting problem. SiLago is a recently developed VLSI design framework comprising two large-scale reconfigurable fabrics: a Dynamically Reconfigurable Resource Array (DRRA) and a Distributed Memory Architecture (DiMArch). It uses the Vesyla compiler to map applications onto these fabrics. The present version of Vesyla executes binding and scheduling sequentially, with binding first, followed by scheduling. In this paper, we propose an Integer Linear Programming (ILP)-based exact method that solves scheduling and binding simultaneously and delivers better solutions when mapping applications onto these fabrics. The proposed ILP combines two objective functions, one for scheduling and one for binding, coupled through weighting factors α and β so that the user can prioritize scheduling, binding, or both, depending on the requirements. We determined the binding and execution time of image processing tasks and various Basic Linear Algebra Subprograms (BLAS) routines using the proposed ILP for multiple combinations of the weighting factors. Furthermore, we compared the latency and power dissipation of several benchmarks between the existing and proposed approaches. The experimental results demonstrate that the proposed method substantially reduces power consumption and latency compared to the existing method.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Random access memory, Radio frequency, Registers, Switches, Dynamic scheduling, Reconfigurable architectures, Distributed management, Memory management, Integer linear programming, Scheduling, Power demand, Coarse-grain reconfigurable architecture, dynamically reconfigurable resource array, distributed memory architecture, high-level synthesis, binding
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-354595 (URN)
10.1109/ACCESS.2024.3453503 (DOI)
001311208000001 ()
2-s2.0-85203629716 (Scopus ID)
Note

QC 20241008

Available from: 2024-10-08. Created: 2024-10-08. Last updated: 2024-10-08. Bibliographically approved.
Xu, J., Zheng, Y., Li, F., Stathis, D., Shen, R., Chu, H., . . . Hemani, A. (2024). Modeling Cycle-to-Cycle Variation in Memristors for In-Situ Unsupervised Trace-STDP Learning. IEEE Transactions on Circuits and Systems - II - Express Briefs, 71(2), 627-631
2024 (English). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 71, no. 2, p. 627-631. Article in journal (Refereed). Published.
Abstract [en]

Evaluating the computational accuracy of Spiking Neural Networks (SNNs) with in-situ learning on large-scale memristor crossbars remains a challenge due to the lack of a versatile model for the variations of non-ideal memristors. This brief proposes a novel behavioral variation model along with a four-stage pipeline for physical memristors. The proposed variation model combines both absolute and relative variations and can therefore better characterize the cycle-to-cycle (C2C) variations of different memristors in practice. The model has been used to simulate the behavior of two physical memristors. Adopting the non-ideal memristor model, the trace-based spike-timing-dependent plasticity (STDP) unsupervised in-memristor learning system is simulated. Although the synaptic-level weight simulation shows a performance degradation, with increases of 7.99% and 4.07% in the relative root mean square error (RRMSE), the network-level simulation shows no accuracy loss on the MNIST benchmark. Furthermore, the impacts of absolute and relative C2C variations on network performance are simulated and analyzed through two sets of univariate experiments.
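A minimal sketch of a variation model that combines absolute and relative components might look as follows. The Gaussian distributions and the parameter values here are illustrative assumptions, not the paper's fitted model; they only show how the two error sources enter a weight update differently.

```python
import random

def noisy_update(w, dw, sigma_rel=0.05, sigma_abs=0.01):
    """Apply one conductance update with cycle-to-cycle variation:
    the programmed step dw is perturbed by a relative Gaussian error
    (proportional to the step) and an absolute Gaussian error
    (independent of the step) is added on top."""
    rel = random.gauss(0.0, sigma_rel)   # scales with the update size
    ab = random.gauss(0.0, sigma_abs)    # fixed noise floor of the device
    return w + dw * (1.0 + rel) + ab
```

For large updates the relative term dominates; for tiny updates the absolute term does, which is why a model with only one of the two cannot fit both regimes of a real device.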

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Memristors, Correlation, Integrated circuit modeling, Behavioral sciences, Mathematical models, Computational modeling, Task analysis, Memristor, non-ideality, variation model, trace-based STDP, in-situ unsupervised learning
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-345567 (URN)
10.1109/TCSII.2023.3309329 (DOI)
001167527900030 ()
2-s2.0-85169689021 (Scopus ID)
Note

QC 20240412

Available from: 2024-04-12. Created: 2024-04-12. Last updated: 2024-04-12. Bibliographically approved.
Mirsalari, S. A., Yousefzadeh, S., Hemani, A. & Tagliavini, G. (2024). Unleashing 8-Bit Floating Point Formats Out of the Deep-Learning Domain. In: 2024 31st IEEE International Conference on Electronics, Circuits and Systems, ICECS 2024: . Paper presented at 31st IEEE International Conference on Electronics, Circuits and Systems, ICECS 2024, Nancy, France, Nov 18 2024 - Nov 20 2024. Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 31st IEEE International Conference on Electronics, Circuits and Systems, ICECS 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Reduced-precision floating-point (FP) arithmetic is a technology trend to minimize memory usage and execution time on power-constrained devices. This paper explores the potential applications of the 8-bit FP format beyond the classical deep learning use cases. We comprehensively analyze alternative FP8 formats, considering the allocation of mantissa and exponent bits. Additionally, we examine the impact on energy efficiency, accuracy, and execution time of several digital signal processing and classical machine learning kernels using the parallel ultra-low-power (PULP) platform based on the RISC-V instruction set architecture. Our findings show that using appropriate exponent choice and scaling methods results in acceptable errors compared to FP32. Our study facilitates the adoption of FP8 formats outside the deep learning domain to achieve consistent energy efficiency and speed improvements without compromising accuracy. On average, our results indicate speedup of 3.14x, 6.19x, 11.11x, and 18.81x on 1, 2, 4, and 8 cores, respectively. Furthermore, the vectorized implementation of FP8 in the same setup delivers remarkable energy savings of 2.97x, 5.07x, 7.37x, and 15.05x.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
approximate computing, float8, Parallel ultra-low-power platform, RISC-V, smallFloat data types
National Category
Computer Systems Computer Sciences
Identifiers
urn:nbn:se:kth:diva-360165 (URN)
10.1109/ICECS61496.2024.10848785 (DOI)
001445799800055 ()
2-s2.0-85217619865 (Scopus ID)
Conference
31st IEEE International Conference on Electronics, Circuits and Systems, ICECS 2024, Nancy, France, Nov 18 2024 - Nov 20 2024
Note

Part of ISBN 979-8-3503-7720-0

QC 20250224

Available from: 2025-02-19. Created: 2025-02-19. Last updated: 2025-05-05. Bibliographically approved.
Wang, D., Xu, J., Li, F., Zhang, L., Cao, C., Stathis, D., . . . Zou, Z. (2023). A Memristor-Based Learning Engine for Synaptic Trace-Based Online Learning. IEEE Transactions on Biomedical Circuits and Systems, 17(5), 1153-1165
2023 (English). In: IEEE Transactions on Biomedical Circuits and Systems, ISSN 1932-4545, E-ISSN 1940-9990, Vol. 17, no. 5, p. 1153-1165. Article in journal (Refereed). Published.
Abstract [en]

The memristor has been extensively used to facilitate synaptic online learning in brain-inspired spiking neural networks (SNNs). However, current memristor-based work cannot support the widely used yet sophisticated trace-based learning rules, including the trace-based Spike-Timing-Dependent Plasticity (STDP) and Bayesian Confidence Propagation Neural Network (BCPNN) learning rules. This paper proposes a learning engine that implements trace-based online learning, consisting of memristor-based blocks and analog computing blocks. The memristor mimics the synaptic trace dynamics by exploiting the nonlinear physical properties of the device. The analog computing blocks perform addition, multiplication, logarithmic, and integral operations. By organizing these building blocks, a reconfigurable learning engine is architected and realized to simulate the STDP and BCPNN online learning rules, using memristors and 180 nm analog CMOS technology. The results show that the proposed learning engine achieves an energy consumption of 10.61 pJ and 51.49 pJ per synaptic update for the STDP and BCPNN learning rules, respectively: a 147.03× and 93.61× reduction compared to the 180 nm ASIC counterparts, and a 9.39× and 5.63× reduction compared to the 40 nm ASIC counterparts. Compared with the state-of-the-art Loihi and eBrainII, the learning engine reduces the energy per synaptic update by 11.31× and 13.13× for the trace-based STDP and BCPNN learning rules, respectively.
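The synaptic trace dynamics that the memristor blocks emulate can be sketched as a low-pass filter of the spike train: the trace decays exponentially and is bumped on each spike. The time constant and unit increment below are illustrative values, not the paper's parameters.

```python
import math

def update_trace(trace, spike, dt=1.0, tau=20.0):
    """One timestep of a synaptic trace as used by trace-based
    STDP/BCPNN rules: exponential decay with time constant tau,
    incremented by 1.0 when a spike arrives."""
    trace *= math.exp(-dt / tau)
    if spike:
        trace += 1.0
    return trace

# run a pre-synaptic trace over a short spike train
z = 0.0
for s in [1, 0, 0, 1, 0]:
    z = update_trace(z, s)
```

Between spikes the trace encodes how recently the neuron fired, which is the quantity the trace-based plasticity rules correlate across pre- and post-synaptic sides.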

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Bayesian confidence propagation neural network (BCPNN), learning engine, memristor, online learning, spike-timing-dependent plasticity (STDP), spiking neural network (SNN), trace dynamics
National Category
Biomedical Laboratory Science/Technology
Identifiers
urn:nbn:se:kth:diva-349842 (URN)
10.1109/TBCAS.2023.3291021 (DOI)
001122543600001 ()
37390002 (PubMedID)
2-s2.0-85163535883 (Scopus ID)
Note

QC 20240703

Available from: 2024-07-03. Created: 2024-07-03. Last updated: 2024-07-03. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-0565-9376