Publications (10 of 290)
Cui, L., Wu, F., Liu, X., Zhang, M., Lu, Z., Li, M. & Xie, C. (2025). AsLDPC: Improving Decoding Performance with Absorbing Set Characteristic Aware Low-Density Parity-Check Code. IEEE Transactions on Consumer Electronics
AsLDPC: Improving Decoding Performance with Absorbing Set Characteristic Aware Low-Density Parity-Check Code
2025 (English). In: IEEE Transactions on Consumer Electronics, ISSN 0098-3063, E-ISSN 1558-4127. Article in journal (Refereed), Published
Abstract [en]

Not-AND (NAND) flash memory has been widely used in consumer electronics in recent years. Because NAND flash memory is subject to various interferences, its data reliability has declined severely. Low-density parity-check (LDPC) codes are adopted for popular triple-level cell (TLC) NAND flash due to their strong error-correction capability. LDPC decoding is an iterative procedure operating on log-likelihood ratios (LLRs). Absorbing sets cause inaccurate LLRs to propagate continuously, making it difficult for decoding to converge and severely degrading decoding performance. To improve decoding performance after LLR quantization, we propose an absorbing set characteristic aware LDPC algorithm, called AsLDPC, which modifies the LLRs of some nodes in the absorbing sets in two steps. In Step One, we select variable nodes that fall into absorbing sets based on the sign characteristics of their LLRs in late iterations and modify their initial prior messages. In Step Two, we adjust the LLRs of the check nodes satisfied by those variable nodes and use the revised LLRs to continue the iteration. AsLDPC abstracts the process of finding the node locations in absorbing sets into formulas, finds the erroneous nodes through calculation, and then modifies their LLR messages. Simulation results demonstrate that AsLDPC significantly reduces the frame error rate (FER), by 58% and 48.1% for two LDPC codes, respectively, compared with the quasi-uniform quantization scheme, while also reducing decoding latency. Additionally, AsLDPC outperforms the uniform algorithm in storage space efficiency and maintains accurate decoding capability for raw bit error rates (RBER) below 1.2e-2.
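As an editorial aid (not the authors' implementation), the sketch below illustrates, under assumed data structures, the kind of two-step LLR modification the abstract describes: variable nodes suspected of lying in an absorbing set are flagged from the signs of their LLRs in late iterations, their prior LLRs are adjusted, and the messages of the check nodes connected to them are revised before iteration continues. The sign test, the damping factor, and the dense parity-check matrix `H` are illustrative assumptions.

```python
import numpy as np

def suspect_absorbing_set_nodes(llr_history, window=3):
    """Flag variable nodes whose LLR sign has stayed flipped relative to the
    initial prior over the last `window` iterations -- an illustrative proxy
    for the late-iteration sign characteristic mentioned in the abstract."""
    llr_history = np.asarray(llr_history)             # (n_iterations, n_vars)
    recent = np.sign(llr_history[-window:])
    initial = np.sign(llr_history[0])
    persistently_flipped = np.all(recent != initial, axis=0)
    return np.where(persistently_flipped)[0]

def modify_llrs(prior_llr, check_llr, H, suspects, damp=0.5):
    """Step One: damp the prior LLRs of suspected variable nodes.
    Step Two: damp the messages of check nodes connected to those nodes,
    then decoding continues with the revised LLRs."""
    prior_llr, check_llr = prior_llr.copy(), check_llr.copy()
    prior_llr[suspects] *= damp
    connected_checks = np.where(H[:, suspects].any(axis=1))[0]
    check_llr[connected_checks] *= damp
    return prior_llr, check_llr
```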

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
Keywords
absorbing set, low-density parity-check (LDPC) code, NAND flash, quantization
National Category
Telecommunications
Identifiers
urn:nbn:se:kth:diva-369055 (URN), 10.1109/TCE.2025.3588178 (DOI), 2-s2.0-105010890949 (Scopus ID)
Note

QC 20250916

Available from: 2025-09-16 Created: 2025-09-16 Last updated: 2025-09-16. Bibliographically approved
Zhu, W., Chen, Y. & Lu, Z. (2025). Pooling On-the-Go for NoC-Based Convolutional Neural Network Accelerator. In: Embedded Computer Systems: Architectures, Modeling, and Simulation - 24th International Conference, SAMOS 2024, Proceedings. Paper presented at 24th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2024, Samos, Greece, Jun 29 2024 - Jul 4 2024 (pp. 109-118). Springer Nature
Pooling On-the-Go for NoC-Based Convolutional Neural Network Accelerator
2025 (English). In: Embedded Computer Systems: Architectures, Modeling, and Simulation - 24th International Conference, SAMOS 2024, Proceedings, Springer Nature, 2025, p. 109-118. Conference paper, Published paper (Refereed)
Abstract [en]

Due to the complexity and diversity of deep convolutional neural networks (CNNs), Network-on-Chip (NoC) based CNN accelerators have grown in popularity as a way to improve inference efficiency and flexibility. Current optimization approaches focus on computation-heavy layers; pooling layers are therefore often ignored and processed separately on general processing units. In this work, we explore the acceleration of pooling layers through in-network processing. We propose a pooling-on-the-go method that performs pooling operations while the outputs of the preceding layer are being transmitted. Consequently, we fuse the pooling layer with its preceding convolution layer to remove unnecessary data movement. We demonstrate our method using a cycle-accurate NoC-CNN accelerator simulator on two CNN models, LeNet and VGG16. The results show that the processing time of the standalone pooling layers is almost entirely eliminated (reduced by around 99%). Compared with the pooling-standalone baseline, we achieve a 1.09x speedup for the full LeNet model and up to a 1.16x speedup for the combined layers to which our approach applies.
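A minimal, hypothetical sketch of the functional effect of pooling on the go: each max-pooling window is reduced as its convolution outputs stream past, so only pooled values need to reach the next layer's PEs. The streaming order and the 2x2 max pooling are assumptions, and the NoC transport itself is abstracted away.

```python
import numpy as np

def pool_on_the_go(conv_out, pool=2):
    """Functional view of pooling on the go: each pool x pool window is
    reduced as soon as its convolution outputs 'stream past', emulating max
    pooling performed during transmission instead of in a separate layer."""
    h, w = conv_out.shape
    pooled = np.full((h // pool, w // pool), -np.inf)
    for r in range(h - h % pool):          # outputs stream out row by row
        for c in range(w - w % pool):
            pr, pc = r // pool, c // pool
            pooled[pr, pc] = max(pooled[pr, pc], conv_out[r, c])
    return pooled

# e.g. pool_on_the_go(np.arange(16.0).reshape(4, 4)) -> [[5., 7.], [13., 15.]]
```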

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
CNN Accelerator, In-network Processing, Network-on-Chip, Pooling
National Category
Computer graphics and computer vision
Identifiers
urn:nbn:se:kth:diva-360913 (URN), 10.1007/978-3-031-78380-7_9 (DOI), 001447102500009, 2-s2.0-85218439695 (Scopus ID)
Conference
24th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2024, Samos, Greece, Jun 29 2024 - Jul 4 2024
Note

Part of ISBN 9783031783791

QC 20250310

Available from: 2025-03-05 Created: 2025-03-05 Last updated: 2025-05-27. Bibliographically approved
Chen, Y., Zhu, W. & Lu, Z. (2025). Travel Time-Based Task Mapping for NoC-Based DNN Accelerator. In: Embedded Computer Systems: Architectures, Modeling, and Simulation - 24th International Conference, SAMOS 2024, Proceedings. Paper presented at 24th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2024, Samos, Greece, June 29 - July 4, 2024 (pp. 76-92). Springer Nature
Travel Time-Based Task Mapping for NoC-Based DNN Accelerator
2025 (English). In: Embedded Computer Systems: Architectures, Modeling, and Simulation - 24th International Conference, SAMOS 2024, Proceedings, Springer Nature, 2025, p. 76-92. Conference paper, Published paper (Refereed)
Abstract [en]

Network-on-Chip (NoC) based architectures have recently been proposed to accelerate deep neural networks in specialized hardware. Since the hardware configuration is fixed after manufacture, proper task mapping has attracted researchers' interest. We propose a travel-time-based task mapping method that allocates uneven numbers of tasks across different Processing Elements (PEs). The approach uses the travel times recorded in a sampling window and thereby implicitly exploits static NoC architecture information and dynamic NoC congestion status. Furthermore, we examine the effectiveness of our method under various configurations, including different mapping iterations, flit sizes, and NoC architectures. Our method achieves up to a 12.1% improvement over even mapping and static distance mapping for a single layer. For a complete NN example, it achieves 10.37% and 13.75% overall improvements over row-major mapping and distance-based mapping, respectively. While ideal travel-time-based mapping (post-run) achieves a 10.37% overall improvement over row-major mapping, we adopt a sampling window to map tasks efficiently at run time, achieving an 8.17% improvement (sampling window of 10).
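The following is a rough sketch, not the authors' algorithm, of how uneven task counts could be derived from travel times observed in a sampling window: each PE receives a share inversely proportional to its average travel time, so congested or distant PEs get fewer tasks. The function name and the rounding rule are illustrative assumptions.

```python
def allocate_tasks(total_tasks, travel_times):
    """Give each PE a task count inversely proportional to its travel time
    averaged over the sampling window, so congested or distant PEs receive
    fewer tasks; leftover tasks from rounding go to the fastest PEs."""
    weights = [1.0 / t for t in travel_times]
    scale = total_tasks / sum(weights)
    counts = [int(w * scale) for w in weights]
    leftover = total_tasks - sum(counts)
    fastest_first = sorted(range(len(counts)), key=lambda i: travel_times[i])
    for i in fastest_first[:leftover]:
        counts[i] += 1
    return counts

# e.g. allocate_tasks(100, [5.0, 8.0, 12.0, 20.0]) -> [44, 28, 18, 10]
```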

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
DNN accelerator, Network-on-Chip, Task mapping
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-360912 (URN), 10.1007/978-3-031-78377-7_6 (DOI), 001447099800006, 2-s2.0-85218456046 (Scopus ID)
Conference
24th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2024, Samos, Greece, June 29 - July 4, 2024
Note

Part of ISBN 9783031783760

QC 20250310

Available from: 2025-03-05 Created: 2025-03-05 Last updated: 2025-06-02. Bibliographically approved
Zhu, W., Chen, Y. & Lu, Z. (2024). Activation in Network for NoC-Based Deep Neural Network Accelerator. In: 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024 - Proceedings. Paper presented at 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024, Hsinchu, Taiwan, Apr 22 2024 - Apr 25 2024. Institute of Electrical and Electronics Engineers (IEEE)
Activation in Network for NoC-Based Deep Neural Network Accelerator
2024 (English). In: 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Network-on-Chip (NoC) based Deep Neural Network (DNN) accelerators are widely adopted, but their performance is still not satisfactory because network congestion can increase inference latency. In this work, we leverage the idea of in-network processing and propose a computation-while-blocking method that performs activation in the network to improve inference latency for NoC-based DNN accelerators. Our approach offloads the non-linear activation from processing elements (PEs) to network routers. Using a cycle-accurate NoC-DNN simulator, we experiment with the popular neural network model LeNet. The proposed approach achieves up to a 12% speedup in the first layer and an overall decrease of around 6% in total cycles compared to the baseline.
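To illustrate the computation-while-blocking idea, here is a hypothetical sketch of a router-side step: when a flit cannot advance in the current cycle, the router applies the activation function to its payload so the destination PE receives already-activated values. The flit representation and the `output_port_busy` flag are assumptions for illustration only.

```python
def route_with_in_network_activation(flit, output_port_busy,
                                     relu=lambda x: max(x, 0.0)):
    """Computation-while-blocking in miniature: if the flit cannot advance
    this cycle, the router applies the activation to its payload now, so the
    destination PE receives already-activated values."""
    if output_port_busy and not flit.get("activated", False):
        flit["payload"] = [relu(v) for v in flit["payload"]]
        flit["activated"] = True
    return flit

# e.g. route_with_in_network_activation({"payload": [-1.5, 2.0]}, True)
#      -> {"payload": [0.0, 2.0], "activated": True}
```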

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Deep neural networks, DNN accelerator, In-network processing, Network-on-Chip
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-349914 (URN), 10.1109/VLSITSA60681.2024.10546384 (DOI), 001253001400044, 2-s2.0-85196721436 (Scopus ID)
Conference
2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024, Hsinchu, Taiwan, Apr 22 2024 - Apr 25 2024
Note

QC 20240704

Part of ISBN 979-8-3503-6034-9

Available from: 2024-07-03 Created: 2024-07-03 Last updated: 2024-09-03. Bibliographically approved
Zhu, W., Liu, Z., Chen, Y., Chen, D. & Lu, Z. (2024). Amputee Gait Phase Recognition Using Multiple GMM-HMM. IEEE Access, 12, 193796-193806
Amputee Gait Phase Recognition Using Multiple GMM-HMM
2024 (English). In: IEEE Access, E-ISSN 2169-3536, Vol. 12, p. 193796-193806. Article in journal (Refereed), Published
Abstract [en]

Gait analysis supports clinical assessment and enables comfortable prosthetic designs for lower-limb amputees, and accurate gait phase recognition is a key component of it. However, gait phase detection remains a challenge due to the individual nature of prosthetic sockets and limbs. For the first time, we present a gait phase recognition approach for transfemoral amputees based on intra-socket pressure measurement. We propose a multiple GMM-HMM (Hidden Markov Model with Gaussian Mixture Model emissions) method to label gait events during walking. For each gait phase in the gait cycle, a separate GMM-HMM is trained on the collected pressure data. We use gait phase recognition accuracy as the primary metric. Evaluation on six human subjects during walking shows a high accuracy of over 99% for single-subject, around 97.4% for multiple-subject, and up to 84.5% for unseen-subject scenarios. We compare our approach with the widely used CHMM (Continuous HMM) and LSTM (Long Short-Term Memory) based methods, demonstrating better recognition accuracy across all scenarios.
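As a hedged illustration of the multiple GMM-HMM idea (one model per gait phase, maximum-likelihood labeling), the sketch below uses the hmmlearn library; the segmentation of pressure data into per-phase training segments, the state and mixture counts, and the diagonal covariances are assumptions, not the paper's settings.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM

def train_phase_models(segments_by_phase, n_states=3, n_mix=2):
    """Fit one GMM-HMM per gait phase on that phase's pressure segments
    (each segment: array of shape (segment_length, n_pressure_sensors))."""
    models = {}
    for phase, segments in segments_by_phase.items():
        X = np.vstack(segments)
        lengths = [len(s) for s in segments]
        model = GMMHMM(n_components=n_states, n_mix=n_mix,
                       covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[phase] = model
    return models

def recognize_phase(models, segment):
    """Label a segment with the phase whose model scores it highest."""
    return max(models, key=lambda phase: models[phase].score(segment))
```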

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Hidden Markov models, Sockets, Pressure measurement, Prosthetics, Legged locomotion, Accuracy, Gaussian mixture model, Foot, Viterbi algorithm, Phase measurement, Gait phase recognition, hidden Markov model, lower limb prosthesis
National Category
Signal Processing
Identifiers
urn:nbn:se:kth:diva-358816 (URN), 10.1109/ACCESS.2024.3516520 (DOI), 001383061300030, 2-s2.0-85212783100 (Scopus ID)
Note

QC 20250122

Available from: 2025-01-22 Created: 2025-01-22 Last updated: 2025-01-22. Bibliographically approved
Xue, Y., Ji, J., Yu, X., Zhou, S., Li, S., Li, X., . . . Fu, Y. (2024). Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning. IEEE Transactions on Computers, 73(12), 2882-2896
Automatic Generation and Optimization Framework of NoC-Based Neural Network Accelerator Through Reinforcement Learning
2024 (English). In: IEEE Transactions on Computers, ISSN 0018-9340, E-ISSN 1557-9956, Vol. 73, no 12, p. 2882-2896. Article in journal (Refereed), Published
Abstract [en]

Choices of dataflow, i.e., intra-core neural network (NN) computation loop-nest scheduling and inter-core hardware mapping strategies, play a critical role in the performance and energy efficiency of NoC-based neural network accelerators. Confronted with an enormous dataflow exploration space, this paper proposes an automatic framework for generating and optimizing full-layer mappings based on two reinforcement learning algorithms, A2C and PPO. Combining soft and hard constraints, this work transforms the mapping configuration into a sequential decision problem and aims to find performance- and energy-efficient hardware mappings for NoC systems. We evaluate the proposed framework on 10 experimental neural networks. The results show that compared with direct-X mapping, direct-Y mapping, GA-based mapping, and NN-aware mapping, our optimization framework reduces the average execution time of the 10 experimental NNs by 9.09%, improves throughput by 11.27%, reduces energy by 12.62%, and reduces the time-energy product (TEP) by 14.49%. The results also show that the performance enhancement is related to the coefficient of variation of the neural network to be computed.
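As a small illustration of casting mapping optimization as a sequential decision problem, the sketch below shows one plausible reward shaping that combines execution time and energy while treating hard and soft constraints differently; the weights, penalty values, and function name are hypothetical and not taken from the paper.

```python
def mapping_reward(exec_time, energy, hard_violations, soft_violations,
                   w_time=1.0, w_energy=1.0, soft_penalty=0.1):
    """Reward for one completed mapping episode: minimize a weighted
    time/energy cost; a hard-constraint violation makes the mapping invalid,
    while soft-constraint violations are merely penalized."""
    if hard_violations > 0:
        return -1e6                    # infeasible mapping, strongly discouraged
    cost = w_time * exec_time + w_energy * energy
    return -(cost + soft_penalty * soft_violations)

# An RL agent (e.g. A2C or PPO) would choose the core for each layer/tile in
# sequence and receive this reward once the full mapping has been simulated.
```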

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
hardware mapping, Network-on-chip, neural networks, reinforcement learning
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-367187 (URN), 10.1109/TC.2024.3441822 (DOI), 001351576000018, 2-s2.0-85201268286 (Scopus ID)
Note

QC 20250716

Available from: 2025-07-16 Created: 2025-07-16 Last updated: 2025-07-16. Bibliographically approved
Fan, W., Li, S., Zhu, L., Lu, Z., Li, L. & Fu, Y. (2024). Communication Synchronization-Aware Arbitration Policy in NoC-Based DNN Accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs, 71(10), 4521-4525
Communication Synchronization-Aware Arbitration Policy in NoC-Based DNN Accelerators
2024 (English). In: IEEE Transactions on Circuits and Systems II: Express Briefs, ISSN 1549-7747, E-ISSN 1558-3791, Vol. 71, no 10, p. 4521-4525. Article in journal (Refereed), Published
Abstract [en]

In NoC-based neural network accelerators, many-to-one and many-to-many are prevalent traffic patterns. Under these patterns, communication between Processing Elements (PEs) of adjacent layers must be synchronized to optimize latency, since the last received packet determines the end time of a layer's computation. This brief proposes the Communication Synchronization-aware Arbitration Policy (CSAP) to address this problem; it uses a negative feedback mechanism to regulate the packet sending rate of each source node. Compared with the local-age-based policy, CSAP decreases execution time by 4.69%-12.55% across neural networks of different scales, while incurring only 1.06% additional hardware overhead in the router.
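The following is a speculative sketch of a negative-feedback rate regulator in the spirit of CSAP: a source node that runs ahead of the slowest peer (whose last packet gates the next layer's computation) lengthens its sending interval, and a lagging node shortens it. The progress metric, gain, and interval bounds are illustrative assumptions rather than the brief's actual mechanism.

```python
def update_send_interval(interval, my_progress, slowest_progress,
                         gain=0.05, min_interval=1, max_interval=64):
    """Negative feedback on the packet sending rate: a source node running
    ahead of the slowest peer lengthens its interval (sends more slowly),
    while a lagging node shortens it, keeping packet arrivals synchronized."""
    error = my_progress - slowest_progress        # >0: ahead, <0: behind
    interval = interval * (1.0 + gain * error)
    return int(round(min(max(interval, min_interval), max_interval)))

# e.g. a node 10 packets ahead with interval 4: 4 * (1 + 0.05 * 10) = 6 cycles
```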

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Synchronization, Biological neural networks, Artificial neural networks, Traffic control, Neurons, Hardware, Pipelines, Network-on-chip (NoC), deep neural network (DNN), accelerator, arbitration policy
National Category
Computer Engineering Computer Sciences
Identifiers
urn:nbn:se:kth:diva-355347 (URN), 10.1109/TCSII.2024.3395054 (DOI), 001322634100028, 2-s2.0-85192197943 (Scopus ID)
Note

QC 20241031

Available from: 2024-10-31 Created: 2024-10-31 Last updated: 2024-10-31. Bibliographically approved
Lu, Z. & Liu, M. (2024). Computational Network-on-Chip as Convolution Engine. In: 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024 - Proceedings. Paper presented at 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024, Hsinchu, Taiwan, Apr 22 2024 - Apr 25 2024. Institute of Electrical and Electronics Engineers (IEEE)
Computational Network-on-Chip as Convolution Engine
2024 (English). In: 2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024 - Proceedings, Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Inspired by PiN, Processing in Network-on-Chip (NoC), we propose a computational NoC as a convolution engine for accelerating convolutional neural networks in hardware. In contrast to traditional compute architectures, where computation and communication are conducted serially and separately, our computational NoC enables in-transit computation, meaning that computation is performed while packets are transferred in the network. In this paper, we present the router architecture that supports the novel in-transit computation concept and use a running example to detail the entire convolution process in the computational NoC. Finally, we show simulated performance results in comparison with a traditional NoC-based convolution engine.
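To make the in-transit computation concept concrete, here is a purely illustrative software analogy (not the router microarchitecture from the paper): a packet carrying a partial sum picks up one multiply-accumulate contribution at every router it traverses, so the convolution result is complete on arrival. The path and the coordinate-keyed lookups are assumptions.

```python
def traverse_and_accumulate(path, activations, weights, partial_sum=0.0):
    """In-transit computation analogy: as the packet hops along `path`, the
    router at each hop adds its local activation * weight contribution, so
    the packet arrives at the destination carrying the finished sum."""
    for hop in path:
        partial_sum += activations[hop] * weights[hop]
    return partial_sum

# e.g. with dicts keyed by router coordinates:
# traverse_and_accumulate([(0, 0), (0, 1), (1, 1)], acts, wts)
```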

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Computational network, Hardware accelerator, Network-on-Chip, Neural network
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-349913 (URN), 10.1109/VLSITSA60681.2024.10546460 (DOI), 001253001400119, 2-s2.0-85196730872 (Scopus ID)
Conference
2024 International VLSI Symposium on Technology, Systems and Applications, VLSI TSA 2024, Hsinchu, Taiwan, Apr 22 2024 - Apr 25 2024
Note

Part of ISBN 9798350360349

QC 20240708

Available from: 2024-07-03 Created: 2024-07-03 Last updated: 2024-09-03. Bibliographically approved
Dong, X., Gao, C., Lu, Z., Zhang, W., Zhao, Y. & Jiang, X. (2024). Gait Recognition Based on Modified OVR-CSP Fusion Feature and LSTM. In: 2024 7th International Conference on Advanced Algorithms and Control Engineering, ICAACE 2024. Paper presented at 7th International Conference on Advanced Algorithms and Control Engineering, ICAACE 2024, Hybrid, Shanghai, China, Mar 1 2024 - Mar 3 2024 (pp. 1551-1554). Institute of Electrical and Electronics Engineers (IEEE)
Gait Recognition Based on Modified OVR-CSP Fusion Feature and LSTM
2024 (English). In: 2024 7th International Conference on Advanced Algorithms and Control Engineering, ICAACE 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 1551-1554. Conference paper, Published paper (Refereed)
Abstract [en]

This paper proposes a gait recognition method based on a modified OVR-CSP fusion feature of plantar pressure and Long Short-Term Memory classification (referred to as the OVR-CSP-LSTM model). Ten subjects performed four types of gait experiments: normal-speed walking, fast walking, slow walking, and walking that imitates a stroke gait. We transfer the Common Spatial Pattern (CSP) feature extraction method, commonly used for EEG, to plantar pressure signals, concatenate the OVR-CSP features of the 2-class, 3-class, and 4-class configurations, and adopt a Long Short-Term Memory network (LSTM) for classification. The intra-patient and inter-patient modes of the 10 subjects are modeled and compared, and the recognition performance under different numbers of sensors and different combinations of sensor positions is also studied. The experimental results show that the proposed model performs well in both modes. The method proposed in this article is expected to be applicable to multi-sensor signal processing and classification with spatial characteristics.
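A brief sketch, under assumed data shapes, of one-vs-rest CSP feature extraction for multi-channel plantar pressure: each class's spatial filters come from a generalized eigenproblem against the pooled covariance of the remaining classes, and log-variance features are concatenated across classes before classification (e.g., by an LSTM). The filter count and covariance inputs are assumptions; the paper's modified CSP and the splicing of 2-/3-/4-class features are not reproduced here.

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_target, cov_rest, n_filters=2):
    """Spatial filters from the generalized eigenproblem of one class's
    covariance against the pooled covariance of the remaining classes."""
    vals, vecs = eigh(cov_target, cov_target + cov_rest)
    order = np.argsort(vals)
    picks = np.concatenate([order[:n_filters], order[-n_filters:]])
    return vecs[:, picks]                        # (n_sensors, 2 * n_filters)

def ovr_csp_features(trial, covs_by_class):
    """One-vs-rest CSP: concatenate log-variance features of the trial
    (shape: n_sensors x n_samples) filtered by each class's CSP filters;
    the resulting vector would then be fed to an LSTM classifier."""
    feats = []
    for c, cov_c in covs_by_class.items():
        cov_rest = sum(v for k, v in covs_by_class.items() if k != c)
        W = csp_filters(cov_c, cov_rest)
        filtered = W.T @ trial
        feats.append(np.log(filtered.var(axis=1)))
    return np.concatenate(feats)
```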

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Long Short-Term Memory, Modified OVR-CSP fusion feature, OVR-CSP-LSTM, Plantar pressure sensor
National Category
Control Engineering
Identifiers
urn:nbn:se:kth:diva-350711 (URN), 10.1109/ICAACE61206.2024.10548462 (DOI), 2-s2.0-85197917501 (Scopus ID)
Conference
7th International Conference on Advanced Algorithms and Control Engineering, ICAACE 2024, Hybrid, Shanghai, China, Mar 1 2024 - Mar 3 2024
Note

Part of ISBN 9798350361445

QC 20240719

Available from: 2024-07-17 Created: 2024-07-17 Last updated: 2024-07-19. Bibliographically approved
Lu, Z. & Otto, A. (2024). Health condition estimation for discrete power electronic devices under package failure. In: 2024 IEEE International Conference on Prognostics and Health Management, ICPHM 2024. Paper presented at 2024 IEEE International Conference on Prognostics and Health Management, ICPHM 2024, June 17-19, 2024, Spokane, United States of America (pp. 336-347). Institute of Electrical and Electronics Engineers (IEEE)
Health condition estimation for discrete power electronic devices under package failure
2024 (English). In: 2024 IEEE International Conference on Prognostics and Health Management, ICPHM 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 336-347. Conference paper, Published paper (Refereed)
Abstract [en]

Health Condition (HC) estimation of discrete power electronic devices is an important task for ensuring their reliable operation in mission-critical applications. Taking bond-wire failure as a specific type of package failure, we address the HC estimation problem directly using a dataset from accelerated aging tests. We first present the new concept of an HC reference curve, which represents the general health status over the device's lifetime via a health score. In particular, it is defined in relation to the device's Remaining Useful Lifetime (RUL) and captures the nonlinear health degradation. Based on the HC reference curve, we mathematically formulate the HC estimation problem and then develop an HC estimation flow using the random forest algorithm. Finally, our experiments demonstrate the effectiveness of the HC estimation method and the suitability of estimating the device's health status in comparison with RUL estimation, a common yet indirect approach.
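An illustrative sketch, not the paper's formulation, of how an HC reference curve and a random forest estimator could fit together: a nonlinear curve maps normalized RUL to a health score in [0, 1], those scores label the aging-test samples, and a regressor learns to predict the score from measured features. The curve shape, its exponent, and the feature set are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def hc_reference(rul_fraction, k=3.0):
    """Illustrative HC reference curve: health score in [0, 1] that degrades
    slowly early in life and drops faster towards end of life (rul_fraction
    is RUL divided by total lifetime; k shapes the nonlinearity)."""
    return np.asarray(rul_fraction, dtype=float) ** (1.0 / k)

def fit_hc_estimator(features, rul_fraction, n_trees=100):
    """Label aging-test samples with the reference curve, then train a random
    forest to estimate the health score directly from measured features."""
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    model.fit(features, hc_reference(rul_fraction))
    return model
```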

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
decision tree, Health condition, Health score, package failure, power electronic devices, random forest, reliability, Remaining useful lifetime
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-353567 (URN), 10.1109/ICPHM61352.2024.10626431 (DOI), 001298819500045, 2-s2.0-85202352678 (Scopus ID)
Conference
2024 IEEE International Conference on Prognostics and Health Management, ICPHM 2024, June 17-19, 2024, Spokane, United States of America
Note

Part of ISBN 9798350374476

QC 20240927

Available from: 2024-09-19 Created: 2024-09-19 Last updated: 2024-11-11. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0061-3475