Methods for Fault Tolerance in Networks-on-Chip
2013 (English). In: ACM Computing Surveys, ISSN 0360-0300, E-ISSN 0010-4892, Vol. 46, no 1, 8- p. Article in journal (Refereed). Published.
Networks-on-Chip constitute the interconnection architecture of future, massively parallel multiprocessors that assemble hundreds to thousands of processing cores on a single chip. Their integration is enabled by the ongoing miniaturization of chip manufacturing technologies following Moore's Law. This miniaturization comes at the cost of the circuit elements' increased susceptibility to failure. Research on fault-tolerant Networks-on-Chip tries to mitigate partial failure and its effect on network performance and reliability by exploiting various forms of redundancy at suitable network layers. This article reviews the failure mechanisms, fault models, diagnosis techniques, and fault-tolerance methods in on-chip networks, and surveys and summarizes the research of the last ten years. It is structured along three communication layers: the data link, the network, and the transport layers. The most important results are summarized, and open research problems and challenges are highlighted to guide future research on this topic.
Keywords: Design, Reliability, Network-on-Chip, failure mechanisms, dependability, fault models, diagnosis, fault tolerance, reconfiguration
Identifiers: URN: urn:nbn:se:kth:diva-139229; DOI: 10.1145/2522968.2522976; ISI: 000327453300008; Scopus ID: 2-s2.0-84887417271; OAI: oai:DiVA.org:kth-139229; DiVA: diva2:687290
QC 20140114. 2014-01-14, 2014-01-08, 2014-01-17. Bibliographically approved.