A Design of Autonomous Error-Tolerant Architectures for Massively Parallel Computing
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China. ORCID iD: 0000-0002-6554-2041
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.
Fudan Univ, State Key Lab ASIC & Syst, Shanghai 200433, Peoples R China.
KTH, School of Information and Communication Technology (ICT). ORCID iD: 0000-0002-7589-9749
2018 (English). In: IEEE Transactions on Very Large Scale Integration (VLSI) Systems, ISSN 1063-8210, E-ISSN 1557-9999, Vol. 26, no 10, p. 2143-2154. Article in journal (Refereed). Published
Abstract [en]

Massively parallel computing systems, composed of many processors connected on chips, will become increasingly complex and unreliable. This paper presents an error-tolerant design based on the autonomous error-tolerant (AET) architecture, which aims to provide a self-repairing capability. A nearby error sensing mechanism is designed to discover faults, and an active evolution scheme is studied to handle unrecoverable errors. A circuit backup switching mechanism is proposed to bypass failed nodes. A board-level prototype is implemented based on dual-core embedded processors. The analysis shows that the error-tolerant capability of the proposed architecture is better than that of the conventional multimodular redundant system when the failure rate of a single core is less than 0.7. The error-tolerant capability is verified in an AET test system consisting of 16 processors. The results show that the relative variation of the overall performance of the AET system is not changed by the high reliability requirements of the system. An experimental comparison shows that, when the AET architecture and the triple modular redundancy method are basically consistent in reliability, the former has lower power consumption for both logical-level and physical-level error tolerance.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018. Vol. 26, no 10, p. 2143-2154
Keywords [en]
Error tolerant, nanosystem, self-reparation, sensing
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-237109
DOI: 10.1109/TVLSI.2018.2846298
ISI: 000446332500029
Scopus ID: 2-s2.0-85049490607
OAI: oai:DiVA.org:kth-237109
DiVA, id: diva2:1259557
Note

QC 20181030

Available from: 2018-10-30. Created: 2018-10-30. Last updated: 2018-10-30. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

By author/editor
Liu, Lizheng; Ma, Ning; Huan, Yuxiang; Zou, Zhuo; Zheng, Lirong