Publications (5 of 5)
Ji, H., Li, S., Cao, Y., Ding, C., Xu, J., Tan, Q., . . . Zou, Z. (2025). A Computation and Energy Efficient Hardware Architecture for SSL Acceleration. In: ASP-DAC 2025 - 30th Asia and South Pacific Design Automation Conference, Proceedings. Paper presented at 30th Asia and South Pacific Design Automation Conference, ASP-DAC 2025, Tokyo, Japan, January 20-23, 2025 (pp. 23-29). Association for Computing Machinery (ACM)
2025 (English). In: ASP-DAC 2025 - 30th Asia and South Pacific Design Automation Conference, Proceedings, Association for Computing Machinery (ACM), 2025, p. 23-29. Conference paper, Published paper (Refereed)
Abstract [en]

In Computer Vision (CV), the deployment of Convolutional Neural Networks (CNNs) is often hindered by their substantial computational requirements and their need for large labeled datasets. Self-supervised learning (SSL) is an effective approach to reducing the reliance on labeled data, with the option of using augmentation methods to infer and train CNNs. Excluding irrelevant features accelerates learning and improves optimization. We propose a Field-Programmable Gate Array (FPGA)-based hardware accelerator architecture tailored for the SSL framework, leveraging the FPGA's parallelism and reconfigurability to expedite block matching, optimize sparse convolutions, and manage data reuse, significantly improving resource and energy efficiency. We implement and evaluate the design on a Xilinx ZCU102 FPGA running at 200 MHz. The FPGA-accelerated similarity-finding part achieves a latency of 0.0106 seconds with low hardware overhead, surpassing GPU and CPU. For the sparse CNN acceleration part, processing VGG16 and ResNet50, our design achieves up to 3.08× higher throughput and 1.5× higher energy efficiency than related FPGA-based works.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2025
Keywords
CNN, computation efficiency, energy efficiency, FPGA acceleration, self-supervised learning
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-361963 (URN); 10.1145/3658617.3697548 (DOI); 001476945200004 (); 2-s2.0-105000311281 (Scopus ID)
Conference
30th Asia and South Pacific Design Automation Conference, ASP-DAC 2025, Tokyo, Japan, January 20-23, 2025
Note

Part of ISBN 9798400706356

QC 20250404

Available from: 2025-04-03. Created: 2025-04-03. Last updated: 2025-10-06. Bibliographically approved
Wang, D., Yan, X., Yu, Y., Stathis, D., Hemani, A., Lansner, A., . . . Zou, Z. (2025). Scalable Multi-FPGA HPC Architecture for Associative Memory System. IEEE Transactions on Biomedical Circuits and Systems, 19(2), 454-468
2025 (English). In: IEEE Transactions on Biomedical Circuits and Systems, ISSN 1932-4545, E-ISSN 1940-9990, Vol. 19, no 2, p. 454-468. Article in journal (Refereed), Published
Abstract [en]

Associative memory is a cornerstone of cognitive intelligence within the human brain. The Bayesian confidence propagation neural network (BCPNN), a cortex-inspired model with high biological plausibility, has proven effective in emulating high-level cognitive functions like associative memory. However, the current approach using GPUs to simulate BCPNN-based associative memory tasks encounters challenges in latency and power efficiency as the model size scales. This work proposes a scalable multi-FPGA high performance computing (HPC) architecture designed for the associative memory system. The architecture integrates a set of hypercolumn unit (HCU) computing cores for intra-board online learning and inference, along with a spike-based synchronization scheme for inter-board communication among multiple FPGAs. Several design strategies, including population-based model mapping, packet-based spike synchronization, and cluster-based timing optimization, are presented to facilitate the multi-FPGA implementation. The architecture is implemented and validated on two Xilinx Alveo U50 FPGA cards, achieving a maximum model size of 200x10 and a peak working frequency of 220 MHz for the associative memory system. Both the memory-bounded spatial scalability and compute-bounded temporal scalability of the architecture are evaluated and optimized, achieving a maximum scale-latency ratio (SLR) of 268.82 for the two-FPGA implementation. Compared to a two-GPU counterpart, the two-FPGA approach demonstrates a maximum latency reduction of 51.72x and a power reduction exceeding 5.28x under the same network configuration. Compared with the state-of-the-art works, the two-FPGA implementation exhibits a high pattern storage capacity for the associative memory task.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
multi-FPGA, scalability, Associative memory, high performance computing (HPC), spiking neural network (SNN), Bayesian confidence propagation neural network (BCPNN)
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-363869 (URN); 10.1109/TBCAS.2024.3446660 (DOI); 001458211300017 (); 39163180 (PubMedID); 2-s2.0-85201785533 (Scopus ID)
Note

QC 20250526

Available from: 2025-05-26. Created: 2025-05-26. Last updated: 2025-05-26. Bibliographically approved
Xu, J., Zheng, Y., Li, F., Stathis, D., Shen, R., Chu, H., . . . Hemani, A. (2024). Modeling Cycle-to-Cycle Variation in Memristors for In-Situ Unsupervised Trace-STDP Learning. IEEE Transactions on Circuits and Systems - II - Express Briefs, 71(2), 627-631
2024 (English). In: IEEE Transactions on Circuits and Systems - II - Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 71, no 2, p. 627-631. Article in journal (Refereed), Published
Abstract [en]

Evaluating the computational accuracy of Spiking Neural Networks (SNNs) implemented as in-situ learning on large-scale memristor crossbars remains a challenge due to the lack of a versatile model for the variations in non-ideal memristors. This brief proposes a novel behavioral variation model along with a four-stage pipeline for physical memristors. The proposed variation model combines both absolute and relative variations. Therefore, it can better characterize different memristor cycle-to-cycle (C2C) variations in practice. The proposed variation model has been used to simulate the behavior of two physical memristors. Adopting the non-ideal memristor model, the trace-based spike-timing-dependent plasticity (STDP) unsupervised in-memristor learning system is simulated. Although the synaptic-level weight simulation shows a 7.99% performance degradation and a 4.07% increase in the relative root mean square error (RRMSE), the network-level simulation results show no accuracy loss on the MNIST benchmark. Furthermore, the impacts of absolute and relative C2C variations on network performance are simulated and analyzed through two sets of univariate experiments.
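
The abstract above reports a relative root mean square error (RRMSE) but does not state its normalization. As a minimal sketch only, the following assumes one common convention: the RMS of the error divided by the RMS of the reference values. The function name `rrmse` and this choice of normalization are illustrative assumptions, not taken from the paper.

```python
import math


def rrmse(reference, estimate):
    """Relative RMSE: RMS of the error divided by RMS of the reference.

    Assumption: this normalization is one common convention; the paper
    summarized above may define RRMSE differently.
    """
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Sum of squared errors between estimate and reference.
    err_sq = sum((e - r) ** 2 for r, e in zip(reference, estimate))
    # Sum of squared reference values, used for normalization.
    ref_sq = sum(r ** 2 for r in reference)
    if ref_sq == 0:
        raise ValueError("reference must not be all zeros")
    # The 1/n factors of both RMS terms cancel in the ratio.
    return math.sqrt(err_sq / ref_sq)


if __name__ == "__main__":
    ideal = [1.0, 1.0, 1.0, 1.0]      # hypothetical ideal weights
    measured = [1.1, 0.9, 1.1, 0.9]   # hypothetical non-ideal weights
    print(f"RRMSE = {rrmse(ideal, measured):.4f}")
```

With the hypothetical values in the usage block, every weight is off by 0.1 against a reference of 1.0, so the ratio comes out at about 0.1 (10%).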

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Memristors, Correlation, Integrated circuit modeling, Behavioral sciences, Mathematical models, Computational modeling, Task analysis, Memristor, non-ideality, variation model, trace-based STDP, in-situ unsupervised learning
National Category
Bioinformatics (Computational Biology)
Identifiers
urn:nbn:se:kth:diva-345567 (URN); 10.1109/TCSII.2023.3309329 (DOI); 001167527900030 (); 2-s2.0-85169689021 (Scopus ID)
Note

QC 20240412

Available from: 2024-04-12. Created: 2024-04-12. Last updated: 2024-04-12. Bibliographically approved
Wang, D., Xu, J., Li, F., Zhang, L., Cao, C., Stathis, D., . . . Zou, Z. (2023). A Memristor-Based Learning Engine for Synaptic Trace-Based Online Learning. IEEE Transactions on Biomedical Circuits and Systems, 17(5), 1153-1165
2023 (English). In: IEEE Transactions on Biomedical Circuits and Systems, ISSN 1932-4545, E-ISSN 1940-9990, Vol. 17, no 5, p. 1153-1165. Article in journal (Refereed), Published
Abstract [en]

The memristor has been extensively used to facilitate the synaptic online learning of brain-inspired spiking neural networks (SNNs). However, current memristor-based work cannot support the widely used yet sophisticated trace-based learning rules, including the trace-based Spike-Timing-Dependent Plasticity (STDP) and the Bayesian Confidence Propagation Neural Network (BCPNN) learning rules. This paper proposes a learning engine to implement trace-based online learning, consisting of memristor-based blocks and analog computing blocks. The memristor is used to mimic the synaptic trace dynamics by exploiting the nonlinear physical property of the device. The analog computing blocks are used for the addition, multiplication, logarithmic and integral operations. By organizing these building blocks, a reconfigurable learning engine is architected and realized to simulate the STDP and BCPNN online learning rules, using memristors and 180 nm analog CMOS technology. The results show that the proposed learning engine can achieve energy consumption of 10.61 pJ and 51.49 pJ per synaptic update for the STDP and BCPNN learning rules, respectively, with a 147.03× and 93.61× reduction compared to the 180 nm ASIC counterparts, and also a 9.39× and 5.63× reduction compared to the 40 nm ASIC counterparts. Compared with the state-of-the-art work of Loihi and eBrainII, the learning engine can reduce the energy per synaptic update by 11.31× and 13.13× for trace-based STDP and BCPNN learning rules, respectively.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
Bayesian confidence propagation neural network (BCPNN), learning engine, memristor, online learning, spike-timing-dependent plasticity (STDP), spiking neural network (SNN), trace dynamics
National Category
Biomedical Laboratory Science/Technology
Identifiers
urn:nbn:se:kth:diva-349842 (URN); 10.1109/TBCAS.2023.3291021 (DOI); 001122543600001 (); 37390002 (PubMedID); 2-s2.0-85163535883 (Scopus ID)
Note

QC 20240703

Available from: 2024-07-03. Created: 2024-07-03. Last updated: 2024-07-03. Bibliographically approved
Xu, J., Zheng, Y., Sheng, C., Cai, Y., Stathis, D., Shen, R., . . . Hemani, A. (2023). Optoelectronic memristor model for optical synaptic circuit of spiking neural networks. In: 21st IEEE Interregional NEWCAS Conference, NEWCAS 2023: Proceedings. Paper presented at 21st IEEE Interregional NEWCAS Conference, NEWCAS 2023, Edinburgh, United Kingdom of Great Britain and Northern Ireland, Jun 26 2023 - Jun 28 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: 21st IEEE Interregional NEWCAS Conference, NEWCAS 2023: Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

Optoelectronic memristors are suitable candidates for hardware implementation of optical synapses in spiking neural networks (SNNs), thanks to their electrical and optical characteristics. To study the feasibility of memristor-based optical synapses in SNNs, a behavior model for optoelectronic memristors is proposed in this paper, including electrical programming modeling and photocurrent read modeling. Based on the model, the behavior of a molecular ferroelectric (MF)/semiconductor interfacial memristor is simulated. This paper also proposes an optical synaptic circuit for the trace-based spike-timing-dependent plasticity (STDP) learning rule. The electrical characteristics of the memristor are explored and exploited to emulate the trace in the pairwise nearest-neighbor STDP, while the optical characteristics are utilized for non-destructive readout and weight calculation. Synaptic-level simulation results show a 99.96% correlation coefficient (CC) and a 1.91% relative root mean square error (RRMSE) in the approximate weight computation. Extending the simulation to the network level, the optoelectronic memristor-based unsupervised STDP learning system can achieve a 92.07 ± 0.64% accuracy on the MNIST benchmark.
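
For readers unfamiliar with the pairwise nearest-neighbor, trace-based STDP rule named in the abstract above, the following is a minimal software sketch of the textbook form of that rule. It is not the paper's memristor model or circuit. The function name `simulate_stdp` and all parameter values (time constants, learning rates, weight bounds) are hypothetical choices for illustration.

```python
import math


def simulate_stdp(pre_spikes, post_spikes, dt=1.0,
                  tau_plus=20.0, tau_minus=20.0,
                  a_plus=0.01, a_minus=0.012, w0=0.5):
    """Textbook pair-based STDP with nearest-neighbor traces.

    pre_spikes / post_spikes: equal-length sequences of 0/1 per time step.
    All parameter values are illustrative assumptions.
    """
    if len(pre_spikes) != len(post_spikes):
        raise ValueError("spike trains must have equal length")
    x = 0.0  # presynaptic trace
    y = 0.0  # postsynaptic trace
    w = w0
    decay_x = math.exp(-dt / tau_plus)
    decay_y = math.exp(-dt / tau_minus)
    for pre, post in zip(pre_spikes, post_spikes):
        # Both traces decay exponentially between spikes.
        x *= decay_x
        y *= decay_y
        # Pre spike: depress in proportion to the postsynaptic trace.
        if pre:
            w -= a_minus * y
        # Post spike: potentiate in proportion to the presynaptic trace.
        if post:
            w += a_plus * x
        # Nearest-neighbor variant: a spike resets its trace to 1 instead
        # of adding to it, so only the most recent spike is remembered.
        if pre:
            x = 1.0
        if post:
            y = 1.0
        # Keep the weight in an assumed [0, 1] range.
        w = min(1.0, max(0.0, w))
    return w


if __name__ == "__main__":
    # Pre fires 5 steps before post: potentiation expected.
    print(simulate_stdp([1, 0, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1]))
    # Post fires 5 steps before pre: depression expected.
    print(simulate_stdp([0, 0, 0, 0, 0, 1], [1, 0, 0, 0, 0, 0]))
```

Resetting the trace to 1 rather than incrementing it is what distinguishes the nearest-neighbor scheme from the all-to-all scheme, where each spike would add to the existing trace.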

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
memristor model, optical synapse, Optoelectric memristor, STDP learning rule, trace dynamics
National Category
Bioinformatics (Computational Biology); Communication Systems
Identifiers
urn:nbn:se:kth:diva-336780 (URN); 10.1109/NEWCAS57931.2023.10198087 (DOI); 001050763800058 (); 2-s2.0-85168549775 (Scopus ID)
Conference
21st IEEE Interregional NEWCAS Conference, NEWCAS 2023, Edinburgh, United Kingdom of Great Britain and Northern Ireland, Jun 26 2023 - Jun 28 2023
Note

Part of ISBN 9798350300246

QC 20230920

Available from: 2023-09-20. Created: 2023-09-20. Last updated: 2023-10-23. Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0002-6192-558X
