Cognitive and Self-Adaptive SoCs with Self-Healing Run-Time-Reconfigurable RecoBlocks
Navas, Byron. KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems. ORCID iD: 0000-0003-0748-125X
2015 (English) Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

In contrast to classical Field-Programmable Gate Arrays (FPGAs), partial and run-time reconfigurable (RTR) FPGAs can selectively reconfigure partitions of their hardware almost immediately while they are still powered and operative. In this way, RTR FPGAs combine the flexibility of software with the high efficiency of hardware. However, their potential cannot be fully exploited because of the increased complexity of the design process and the intricacy of generating partial reconfigurations. FPGAs are often seen as a single auxiliary area used to accelerate algorithms for specific problems. However, when several RTR partitions are implemented and combined with a processor system, new opportunities and challenges appear due to the creation of a heterogeneous RTR embedded system-on-chip (SoC).

The aim of this thesis is to investigate how the flexibility, reusability, and productivity in the design process of partial and RTR embedded SoCs can be improved to enable research and development of novel applications in areas such as hardware acceleration, dynamic fault-tolerance, self-healing, self-awareness, and self-adaptation. To address this question, this thesis proposes a solution based on modular reconfigurable IP-cores and design-and-reuse principles to reduce the design complexity and maximize the productivity of such FPGA-based SoCs. The research presented in this thesis found inspiration in several related topics and sciences such as reconfigurable computing, dependability and fault-tolerance, complex adaptive systems, bio-inspired hardware, organic and autonomic computing, psychology, and machine learning.

The outcome of this thesis demonstrates that the proposed solution addressed the research question and enabled investigation in initially unexpected fields. The particular contributions of this thesis are: (1) the RecoBlock SoC concept and platform with its flexible and reusable array of RTR IP-cores, (2) a simplified method to transform complex algorithms modeled in Matlab into relocatable partial reconfigurations adapted to an improved RecoBlock IP-core architecture, (3) the self-healing RTR fault-tolerant (FT) schemes, especially the Upset-Fault-Observer (UFO), which reuses available RTR IP-cores to self-assemble hardware redundancy during run-time, (4) the concept of Cognitive Reconfigurable Hardware (CRH), which defines a development path to achieve self-adaptation and cognitive development, (5) an adaptive self-aware and fault-tolerant RTR SoC that learns to adapt the RTR FT schemes to performance goals under uncertainty using rule-based decision making, and (6) a method based on online and model-free reinforcement learning that uses a Q-algorithm to self-optimize the activation of dynamic FT schemes in performance-aware RecoBlock SoCs.

The vision of this thesis proposes a new class of self-adaptive and cognitive hardware systems consisting of arrays of modular RTR IP-cores. Such a system becomes self-aware of its internal performance and learns to self-optimize the decisions that trigger the adequate self-organization of these RTR cores, i.e., to create dynamic hardware redundancy and self-healing, particularly while working in uncertain environments.

Abstract [sv] (English translation)

Partial and run-time reconfiguration (RTR) means that one part of an integrated circuit can be reconfigured while the operation of the remaining part continues. Modern Field Programmable Gate Array (FPGA) circuits are often partially and run-time reconfigurable and thereby combine the flexibility of software with the efficiency of hardware. Unfortunately, the increased design complexity prevents their full potential from being exploited. Today, FPGAs are mostly seen as hardware accelerators, but entirely new possibilities arise from combining a multiprocessor system with several reconfigurable partitions that can be reconfigured independently of one another during system operation.

The goal of the thesis is to investigate how the development process for partially and run-time reconfigurable FPGAs can be improved to enable research and development of new applications in areas such as hardware acceleration and self-healing and self-adaptive systems. The thesis proposes that a solution based on modular reconfigurable hardware cores, combined with principles of reusability, can reduce the complexity of the development process and lead to higher productivity in the development of embedded run-time reconfigurable systems. The research in the thesis was inspired by several related areas, such as reconfigurability, dependability and fault tolerance, complex adaptive systems, bio-inspired hardware, organic and autonomic systems, psychology, and machine learning.

The results of the thesis show that the proposed solution has potential in various application areas. The thesis makes the following contributions: (1) the RecoBlock system-on-chip platform consisting of several reconfigurable hardware cores, (2) a simplified method for implementing Matlab models in reconfigurable partitions, (3) methods for self-healing RTR fault-tolerant systems, e.g., the Upset-Fault-Observer, which self-assembles hardware redundancy during operation, (4) the development of the concept of cognitive reconfigurable hardware, (5) the use of the concept and the platform to implement circuits that can be used in an unknown environment thanks to their ability to make rule-based decisions, and (6) a reinforcement learning method that uses a Q-algorithm for dynamic fault tolerance in performance-aware RecoBlock SoCs.

The vision of the thesis is a new class of self-adaptive and cognitive hardware systems consisting of modular run-time reconfigurable hardware cores. These systems become self-aware of their internal performance and can, through learning, optimize their decisions for self-organization of the reconfigurable cores. This creates dynamic hardware redundancy and self-healing systems that are better equipped to operate in an unknown environment.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. xiv, 83 p.
Series
TRITA-ICT-ECS AVH, ISSN 1653-6363 ; 15:22
National Category
Electrical Engineering, Electronic Engineering, Information Engineering Embedded Systems Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-178000; ISBN: 978-91-7595-768-5 (print); OAI: oai:DiVA.org:kth-178000; DiVA: diva2:875482
Public defence
2015-12-17, Sal C, Elektrum, KTH-ICT, Kista, 13:00 (English)
Opponent
Supervisors
Note

QC 20151201

Available from: 2015-12-01 Created: 2015-12-01 Last updated: 2015-12-02 Bibliographically approved
List of papers
1. The RecoBlock SoC Platform: A Flexible Array of Reusable Run-Time-Reconfigurable IP-Blocks
2013 (English) In: Design, Automation & Test in Europe Conference & Exhibition (DATE), 2013, 2013, 833-838 p. Conference paper, Published paper (Refereed)
Abstract [en]

Run-time reconfigurable (RTR) FPGAs combine the flexibility of software with the high efficiency of hardware. Still, their potential cannot be fully exploited due to the increased complexity of the design process. Consequently, to enable an efficient design flow, we devise a set of prerequisites to increase the flexibility and reusability of current FPGA-based RTR architectures. We apply these principles to design and implement the RecoBlock SoC platform, whose main characteristics are (1) an RTR plug-and-play IP-core whose functionality is configured at run-time, (2) flexible inter-block communication configured via software, and (3) built-in buffers to support data-driven streams and inter-process communication. We illustrate the potential of our platform with a tutorial case study that uses an adaptive streaming application to investigate different combinations of reconfigurable arrays and schedules. The experiments underline the benefits of the platform and show its resource utilization.

Series
Design, Automation, and Test in Europe Conference and Exhibition. Proceedings, ISSN 1530-1591
Keyword
reconfigurable architectures, partial and run-time reconfiguration, system-on-chip, adaptivity, embedded systems
National Category
Embedded Systems Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-121778 (URN), 10.7873/DATE.2013.176 (DOI), 2-s2.0-84885655414 (Scopus ID), 978-1-4673-5071-6 (ISBN)
Conference
Design, Automation & Test in Europe (DATE'13); Grenoble, France, 18-22 March 2013
Note

QC 20130822

Available from: 2013-05-04 Created: 2013-05-04 Last updated: 2015-12-01 Bibliographically approved
2. Towards the generic reconfigurable accelerator: Algorithm development, core design, and performance analysis
2013 (English) Conference paper, Published paper (Refereed)
Abstract [en]

Adoption of reconfigurable computing is limited in part by the lack of simplified, economic, and reusable solutions. The significant speedup and energy saving can increase performance but also design complexity; in particular for heterogeneous SoCs blending several CPUs, GPUs, and FPGA-Accelerator Cores. On the other hand, implementing complex algorithms in hardware requires modeling and verification, not only HDL generation. Most approaches are too specific without looking for reusability. Therefore, we present a solution based on: (1) a design methodology to develop algorithms accelerated in reconfigurable/non-reconfigurable IP-Cores, using common access tools, and contemplating verification from model to embedded software stages; (2) a generic accelerator core design that enables relocation and reuse almost independently of the algorithm, and data-flow driven execution models; and (3) a performance analysis of the acceleration mechanisms included in our system (i.e., accelerator core, burst I/O transfers, and reconfiguration pre-fetch). In consequence, the implemented system accelerates algorithms (e.g., FIR and Kalman filters) with speedups up to 3 orders of magnitude, compared to processor implementations.

Place, publisher, year, edition, pages
IEEE conference proceedings, 2013
Keyword
field programmable gate arrays, reconfigurable architectures, heterogeneous SoC, performance analysis, reconfigurable computing, Acceleration, Algorithm design and analysis, Hardware, MATLAB, algorithm development, design methodology, embedded system, hardware accelerator, partial and run-time reconfiguration, reconfiguration techniques, system-on-chip
National Category
Embedded Systems Other Electrical Engineering, Electronic Engineering, Information Engineering Computer Systems
Identifiers
urn:nbn:se:kth:diva-143222 (URN), 10.1109/ReConFig.2013.6732334 (DOI), 000349244200077 (), 2-s2.0-84894465436 (Scopus ID)
Conference
2013 International Conference on Reconfigurable Computing and FPGAs, ReConFig 2013; Cancun; Mexico
Note

Byron Navas is funded by ESPE. QC 20140626

Available from: 2014-03-19 Created: 2014-03-19 Last updated: 2015-12-04 Bibliographically approved
3. The Upset-Fault-Observer: A Concept for Self-healing Adaptive Fault Tolerance
2014 (English) In: Proceedings of the 2014 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2014, IEEE Computer Society, 2014, 89-96 p. Conference paper, Published paper (Refereed)
Abstract [en]

Advancing integration reaching atomic scales makes components highly defective and unstable during their lifetime. This demands paradigm shifts in electronic systems design. FPGAs are particularly sensitive to cosmic and other kinds of radiation that produce single-event upsets (SEUs) in configuration and internal memories. Typical fault-tolerance (FT) techniques combine triple-modular-redundancy (TMR) schemes with run-time reconfiguration (RTR). However, even the most successful approaches disregard the low suitability of fine-grain redundancy in nano-scale design, the poor scalability and programmability of application-specific architectures, the small performance-consumption ratio of board-level designs, or the scarce optimization capability of rigid redundancy structures. In that context, we introduce an innovative solution that exploits the flexibility, reusability, and scalability of a modular RTR SoC approach and reuses existing RTR IP-cores in order to assemble different TMR schemes during run-time. Thus, the system can adaptively trigger the adequate self-healing strategy according to execution environment metrics and user-defined goals. Specifically, the paper presents: (a) the upset-fault-observer (UFO), an innovative run-time self-test and recovery strategy that delivers FT on request over several function cores but saves the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles, (b) run-time reconfigurable TMR schemes and self-repair mechanisms, and (c) an adaptive software organization model to manage the proposed FT strategies.
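
The TMR principle mentioned in this abstract can be illustrated with a small software model. The sketch below is not taken from the paper; it is a minimal bitwise 2-of-3 majority voter, and the function name `tmr_vote` and its return convention are assumptions made for illustration only.

```python
from typing import Optional, Tuple


def tmr_vote(a: int, b: int, c: int) -> Tuple[int, Optional[int]]:
    """Bitwise 2-of-3 majority vote over three replica outputs.

    Returns the voted word and the index (0, 1, or 2) of the single
    replica that disagrees with it, or None if no single replica can
    be blamed. Hypothetical helper, not the paper's hardware voter.
    """
    # Each output bit is 1 when at least two of the three inputs have it set.
    voted = (a & b) | (a & c) | (b & c)
    disagreeing = [i for i, r in enumerate((a, b, c)) if r != voted]
    # Exactly one disagreeing replica is the single-fault case TMR can mask.
    faulty = disagreeing[0] if len(disagreeing) == 1 else None
    return voted, faulty
```

In a self-healing scheme, the returned index would indicate which replica to repair, for example by reloading its partial configuration.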

Place, publisher, year, edition, pages
IEEE Computer Society, 2014
Series
NASA/ESA Conference on Adaptive Hardware and Systems, ISSN 1939-7003
Keyword
partial and run-time-reconfiguration, fault-tolerance, self-healing, self-configuration, system-on-chip, hardware systems, reconfigurable IP-cores, adaptive embedded systems, reconfigurable computing
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-158304 (URN), 10.1109/AHS.2014.6880163 (DOI), 000345896600013 (), 2-s2.0-84906705557 (Scopus ID), 978-1-4799-5356-1 (ISBN)
Conference
2014 NASA/ESA Conference on Adaptive Hardware and Systems, AHS 2014, Leicester, United Kingdom, 14 July 2014 through 18 July 2014
Note

QC 20150107

Available from: 2015-01-07 Created: 2015-01-07 Last updated: 2015-12-01 Bibliographically approved
4. On providing scalable self-healing adaptive fault-tolerance to RTR SoCs
2014 (English) In: Proceedings of ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on, 2014, 1-6 p. Conference paper, Published paper (Refereed)
Abstract [en]

The dependability of heterogeneous many-core FPGA-based systems is threatened by higher failure rates caused by disruptive scales of integration, increased design complexity, and radiation sensitivity. Triple-modular redundancy (TMR) and run-time reconfiguration (RTR) are traditional fault-tolerant (FT) techniques used to increase dependability. However, hardware redundancy is expensive, and most approaches have poor scalability, flexibility, and programmability. Therefore, innovative solutions are needed to reduce the redundancy cost while still preserving acceptable levels of dependability. In this context, this paper presents the implementation of a self-healing adaptive fault-tolerant SoC that reuses RTR IP-cores in order to self-assemble different TMR schemes during run-time. The presented system demonstrates the feasibility of the Upset-Fault-Observer concept, which provides a run-time self-test and recovery strategy that delivers fault-tolerance over functions accelerated in RTR cores while reducing the redundancy scalability cost by running periodic reconfigurable TMR scan-cycles. In addition, this paper experimentally evaluates the trade-offs of the implemented reconfigurable TMR schemes by characterizing important fault-tolerance metrics, i.e., recovery time (self-repair and self-replicate), detection latency, self-assembly latency, throughput reduction, and increase of physical resources.
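
One possible reading of the periodic TMR scan-cycle idea is sketched below as a pure software model. It assumes that the observer visits one function core at a time, temporarily configures two spare replicas with the same function, votes, and reloads the configuration of a core that is outvoted. The `Core` class, the `upset` flag, and `scan_cycle` are hypothetical names; the paper's actual hardware and software organization may differ.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Core:
    """Toy model of an RTR IP-core (illustrative assumption only)."""
    name: str
    golden: Callable[[int], int]  # stands in for the partial configuration
    upset: bool = False           # True models an SEU-corrupted configuration

    def run(self, x: int) -> int:
        y = self.golden(x)
        # A corrupted core is modeled as flipping the least significant bit.
        return y ^ 1 if self.upset else y

    def reconfigure(self) -> None:
        # Models reloading the partial configuration to clear the upset.
        self.upset = False


def scan_cycle(cores: List[Core], test_input: int) -> List[str]:
    """Visit each core once, vote it against two spares, repair on mismatch."""
    repaired = []
    for core in cores:
        # Two spare cores are temporarily given the same function, so only
        # one TMR group exists at a time instead of triplicating every core.
        spares = [Core(f"{core.name}-r{i}", core.golden) for i in (1, 2)]
        outputs = [core.run(test_input)] + [s.run(test_input) for s in spares]
        voted = max(set(outputs), key=outputs.count)
        if outputs[0] != voted:
            core.reconfigure()
            repaired.append(core.name)
    return repaired
```

The point of the model is the cost argument: redundancy is assembled on demand for the core under observation, so the number of spare cores stays constant as the number of function cores grows.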

Keyword
Fault tolerant systems, Hardware, Redundancy, Software, System-on-chip, Self-healing, Adaptive-computer-systems, FPGA, Partial and run-time reconfiguration, Space applications, Dependability
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Embedded Systems Computer Systems
Identifiers
urn:nbn:se:kth:diva-160878 (URN), 10.1109/ReConFig.2014.7032541 (DOI), 2-s2.0-84946690245 (Scopus ID), 978-147995944-0 (ISBN)
Conference
ReConFigurable Computing and FPGAs (ReConFig), 2014 International Conference on, Cancun, Mexico, 8-10 December 2014
Note

QC 20150410

Available from: 2015-03-02 Created: 2015-03-02 Last updated: 2015-12-01 Bibliographically approved
5. Towards cognitive reconfigurable hardware: Self-aware learning in RTR fault-tolerant SoCs
2015 (English) In: Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), 2015, Institute of Electrical and Electronics Engineers (IEEE), 2015, 7238103. Conference paper, Published paper (Refereed)
Abstract [en]

Traditional embedded systems are evolving into power-and-performance-domain self-aware intelligent systems in order to overcome complexity and uncertainty. Without human control, they need to keep operative states in applications such as drone-based delivery or robotic space landing. Nowadays, the partial and run-time reconfiguration (RTR) of FPGA-based Systems-on-chip (SoC) can enable dynamic hardware acceleration or self-healing structures, but this conversely increases system-memory traffic. This paper introduces the basis of cognitive reconfigurable hardware and presents the design of an FPGA-based RTR SoC that becomes conscious of its monitored hardware and learns to make decisions that maintain a desired system performance, particularly when triggering hardware acceleration and dynamic fault-tolerant (FT) schemes on RTR cores. Self-awareness is achieved by evaluating monitored metrics in critical AXI-cores, supported by hardware performance counters. We suggest a reinforcement-learning algorithm that helps the system to search out when and which reconfigurable FT-scheme can be triggered. Executing random sequences of an embedded benchmark suite simulates unpredictability and bus traffic. The evaluation shows the effectiveness and implications of our approach.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2015
Keyword
cognitive hardware, partial and run-time reconfiguration, FPGA, complex adaptive systems, self-awareness, self-healing, machine learning, dynamic fault-tolerance
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Computer Systems Embedded Systems
Identifiers
urn:nbn:se:kth:diva-177998 (URN), 10.1109/ReCoSoC.2015.7238103 (DOI), 000380396200026 (), 2-s2.0-84954191077 (Scopus ID), 978-1-4673-7942-7 (ISBN)
External cooperation:
Conference
Reconfigurable Communication-centric Systems-on-Chip (ReCoSoC), Bremen, June 29 2015-July 1 2015
Note

QC 20151201

Available from: 2015-12-01 Created: 2015-12-01 Last updated: 2016-09-07 Bibliographically approved
6. Reinforcement Learning Based Self-Optimization of Dynamic Fault-Tolerant Schemes in Performance-Aware RecoBlock SoCs
2015 (English) Report (Other academic)
Abstract [en]

Partial and run-time reconfiguration (RTR) technology has increased the range of opportunities and applications in the design of systems-on-chip (SoCs) based on Field-Programmable Gate Arrays (FPGAs). Nevertheless, RTR adds another layer of complexity to the design process, particularly when embedded FPGAs have to deal with power and performance constraints in uncertain environments. Embedded systems will need to make autonomous decisions, develop cognitive properties such as self-awareness, and finally become self-adaptive in order to be deployed in the real world. Classic off-line modeling and programming methods are inadequate to cope with unpredictable environments. Reinforcement learning (RL) methods have been successfully explored to solve these complex optimization problems, mainly on workstation computers, yet they are rarely implemented in embedded systems. Disruptive integration technologies reaching atomic scales will increase the probability of fabrication errors and the sensitivity to electromagnetic radiation that can generate single-event upsets (SEUs) in the configuration memory of FPGAs. Dynamic FT schemes are promising RTR hardware redundancy structures that improve dependability, but on the other hand, they increase memory system traffic. This article presents an FPGA-based SoC that is self-aware of its monitored hardware and utilizes an online RL method to self-optimize the decisions that maintain the desired system performance, particularly when triggering hardware acceleration and dynamic FT schemes on RTR IP-cores. Moreover, this article describes the main features of the RecoBlock SoC concept, gives an overview of RL theory, shows the Q-learning algorithm adapted for the dynamic fault-tolerance optimization problem, and presents its simulation in Matlab. Based on this investigation, the Q-learning algorithm will be implemented and verified on the RecoBlock SoC platform.
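
For readers unfamiliar with Q-learning, the following is a minimal sketch of the standard tabular update and epsilon-greedy action selection. It is not the adapted algorithm from the report (which is simulated in Matlab). The integer encoding of states and actions, and the function names `choose_action` and `q_update`, are assumptions for illustration; in the report's setting a state might summarize monitored performance and an action might select an FT scheme.

```python
import random
from typing import Dict, List, Tuple

QTable = Dict[Tuple[int, int], float]  # (state, action) -> estimated value


def choose_action(q: QTable, state: int, actions: List[int],
                  epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy selection: explore with probability epsilon."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    # Exploit: pick the action with the highest current estimate.
    return max(actions, key=lambda a: q.get((state, a), 0.0))


def q_update(q: QTable, state: int, action: int, reward: float,
             next_state: int, actions: List[int],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Standard Q-learning update:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a)).
    """
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + alpha * (reward + gamma * best_next - old)
    return q[(state, action)]
```

Because the update needs only the observed reward and the next state, it requires no model of the environment, which is what makes the method online and model-free.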

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. 30 p.
Series
TRITA-ICT/ECS, 15:27
Keyword
cognitive hardware, partial and run-time reconfiguration, FPGA, autonomic computing, self-awareness, self-healing, machine learning, dynamic fault-tolerance, complex adaptive systems
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Computer Systems Embedded Systems
Identifiers
urn:nbn:se:kth:diva-177999 (URN), KTH/ICT/ECS/R-15-27-SE (ISRN)
Note

QC 20151201

Available from: 2015-12-01 Created: 2015-12-01 Last updated: 2015-12-01 Bibliographically approved

Open Access in DiVA

Thesis (6749 kB), 318 downloads
File information
File name: FULLTEXT01.pdf; File size: 6749 kB; Checksum SHA-512:
c63dd6004f75205f3ca2506c619631ba3c7bc2b3a6b08896da8729039a1839689aea7937ad42a16f05534ebeea12730110b35858d67731c33ada9d4cfa0ec173
Type: fulltext; Mimetype: application/pdf

Authority records BETA

Navas, Byron
