Energy-Aware Fault-Tolerant CGRAs Addressing Application with Different Reliability Needs
2013 (English). In: Digital System Design (DSD), 2013 Euromicro Conference on, IEEE conference proceedings, 2013, 525-534 p. Conference paper (Refereed)
In this paper, we propose a polymorphic fault-tolerant architecture that can be tailored at run time to efficiently support the reliability needs of multiple applications. Today, coarse-grained reconfigurable architectures (CGRAs) host multiple applications with potentially different reliability needs. Providing platform-wide worst-case (maximum) protection to all applications is neither optimal nor desirable. To reduce the fault-tolerance overhead, adaptive fault-tolerance strategies have been proposed. These techniques assess the reliability requirements of each application and adjust the fault-tolerance intensity (and hence the overhead) accordingly. However, existing flexible reliability schemes only allow shifting between different levels of modular redundancy (duplication, triplication, etc.) and deal with only a single class of faults (e.g., soft errors). To complement these strategies, we propose energy-aware fault tolerance that, in addition to modular redundancy, can also provide low-cost, sub-modular (e.g., residue mod 3) redundancy to cover both permanent and temporary faults. Our solution relies on an agent-based control layer and a configurable fault-tolerance data path. The control layer identifies the application class and configures the data path to provide the needed reliability. Simulation results using selected algorithms (FFT, matrix multiplication, and FIR filter) showed that the proposed method provides flexible protection with an energy overhead ranging from 3.125% to 107% for different reliability levels. Synthesis results confirmed that, compared to state-of-the-art adaptive reliability techniques, the proposed architecture significantly reduces the area overhead of the self-checking (59.1%) and fault-tolerant (7.1%) versions.
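The abstract contrasts modular redundancy (duplication, triplication) with sub-modular residue mod 3 checking. The Python sketch below illustrates these standard techniques on a single multiplication. It is not the paper's data path: the names `Protection`, `protected_mul`, `residue_check`, and the `CLASS_TO_PROTECTION` table (a hypothetical stand-in for the agent-based control layer) are assumptions made for illustration, as is the choice of which reliability class maps to which scheme. The residue check relies on a well-known property: a correct product satisfies (a·b) mod 3 = ((a mod 3)(b mod 3)) mod 3, and because 2^k mod 3 is never 0, any single-bit error in the result changes its residue.

```python
from collections import Counter
from enum import Enum
from typing import Callable, Optional, Sequence

# Optional fault injector: maps a correct result to a (possibly) corrupted one.
Fault = Optional[Callable[[int], int]]


class Protection(Enum):
    """Hypothetical protection levels (illustrative names, not the paper's)."""
    NONE = 0          # no redundancy
    RESIDUE_MOD3 = 1  # sub-modular: 2-bit residue channel, detects errors
    DMR = 2           # duplication: compare two copies, detects errors
    TMR = 3           # triplication: majority vote, masks one faulty copy


# Hypothetical stand-in for the agent-based control layer: a lookup from an
# application's reliability class to a data-path configuration.
CLASS_TO_PROTECTION = {
    "best_effort": Protection.NONE,
    "low_cost_self_checking": Protection.RESIDUE_MOD3,
    "self_checking": Protection.DMR,
    "fault_tolerant": Protection.TMR,
}


class FaultDetected(Exception):
    """Raised when a checking mode observes inconsistent results."""


def mul(a: int, b: int, fault: Fault = None) -> int:
    """Multiply on one simulated processing element, optionally corrupted."""
    result = a * b
    return fault(result) if fault else result


def residue_check(a: int, b: int, result: int) -> bool:
    """Return True if `result` agrees with the independent mod-3 channel.

    Since 2**k % 3 is 1 or 2 for every k, any single-bit error changes the
    residue and is detected; errors whose net change is a multiple of 3 escape.
    """
    return result % 3 == (a % 3) * (b % 3) % 3


def protected_mul(a: int, b: int, level: Protection,
                  faults: Sequence[Fault] = (None, None, None)) -> int:
    """Multiply a and b under the requested protection level."""
    if level is Protection.NONE:
        return mul(a, b, faults[0])
    if level is Protection.RESIDUE_MOD3:
        result = mul(a, b, faults[0])
        if not residue_check(a, b, result):
            raise FaultDetected("residue mismatch")
        return result
    if level is Protection.DMR:
        r0, r1 = mul(a, b, faults[0]), mul(a, b, faults[1])
        if r0 != r1:
            raise FaultDetected("duplicate mismatch")
        return r0
    # TMR: majority vote over three independently computed copies.
    value, votes = Counter(mul(a, b, f) for f in faults[:3]).most_common(1)[0]
    if votes < 2:
        raise FaultDetected("no majority")
    return value


if __name__ == "__main__":
    def flip_bit4(r: int) -> int:
        return r ^ (1 << 4)  # simulated single-bit upset in the product

    level = CLASS_TO_PROTECTION["fault_tolerant"]
    print(protected_mul(7, 9, level, (flip_bit4, None, None)))  # 63 (masked)
    try:
        protected_mul(7, 9, Protection.RESIDUE_MOD3, (flip_bit4, None, None))
    except FaultDetected as err:
        print("detected:", err)  # detected: residue mismatch
```

The sketch shows why a residue code is attractive as a low-cost option: the checking channel only operates on values in {0, 1, 2}, rather than duplicating the full-width operation as DMR does. The trade-off is coverage. For example, an error that flips bits 0 and 1 from 0 to 1 adds 3 to the result and goes unnoticed, and residue checking (like DMR) only detects errors, whereas TMR can mask a single faulty copy.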
Place, publisher, year, edition, pages: IEEE conference proceedings, 2013. 525-534 p.
Keywords: Circuit faults; Computer architecture; Digital signal processing; Fault tolerant systems; Redundancy; Adaptive systems; CGRAs; Energy aware; Fault tolerance; Low power
National Category: Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-132293
DOI: 10.1109/DSD.2013.62
ISI: 000337235200072
ScopusID: 2-s2.0-84890066827
OAI: oai:DiVA.org:kth-132293
DiVA: diva2:659478
Conference: Digital System Design (DSD), 2013 Euromicro Conference on