Åpne denne publikasjonen i ny fane eller vindu >>Vise andre…
2017 (engelsk)Inngår i: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era, Springer, 2017, s. 47-94Kapittel i bok, del av antologi (Fagfellevurdert)
Abstract [en]
The dark silicon constraint will restrict the VLSI designers to utilize an increasingly smaller percentage of transistors as we progress deeper into nano-scale regime because of the power delivery and thermal dissipation limits. The best way to deal with the dark silicon constraint is to use the transistors that can be turned on as efficiently as possible. Inspired by this rationale, the VLSI design community has adopted customization as the principal means to address the dark silicon constraint. Two categories of customization, often in tandem have been adopted by the community. The first is the processors that are heterogeneous in functionality and/or have ability to more efficiently match varying functionalities and runtime load. The second category of customization is based on the fact that hardware implementations often offer 2–6 orders more efficiency compared to software. For this reason, designers isolate the power and performance critical functionality and map them to custom hardware implementations called accelerators. Both these categories of customizations are partial in being compute centric and still implement the bulk of functionality in the inefficient software style. In this chapter, we propose a contrarian approach: implement the bulk of functionality in hardware style and only retain control intensive and flexibility critical functionality in small simple processors that we call flexilators. We propose using a micro-architecture level coarse grain reconfigurable fabric as the alternative to the Boolean level standard cells and LUTs of the FPGAs as the basis for dynamically reconfigurable hardware implementation. This coarse grain reconfigurable fabric allows dynamic creation of arbitrarily wide and deep datapath with their hierarchical control that can be coupled with a cluster of storage resources to create private execution partitions that host individual applications. Multiple such partitions can be created that can operate at different voltage frequency operating points. Unused resources can be put into a range of low power modes. This CGRA fabric allows not just compute centric customization but also interconnect, control, storage and access to storage can be customized. The customization is not only possible at compile/build time but also at runtime to match the available resources and runtime load conditions. This complete, micro-architecture level hardware centric customization overcomes the limitations of partial compute centric customization offered by the state-of-the-art accelerator-rich heterogeneous multi-processor implementation style by extracting more functionality and performance from the limited number of transistors that can be turned on. Besides offering complete and more effective customization and a hardware centric implementation style, we also propose a methodology that dramatically reduces the cost of customization. This methodology is based on a concept called SiLago (Silicon Large Grain Objects) method. The core idea behind the SiLago method is to use large grain micro-architecture level hardened and characterized blocks, the SiLago blocks, as the atomic physical design building blocks and a grid based structured layout scheme that enables composition of the SiLago fabric simply by abutting the blocks to produce a timing and DRC clean GDSII design. Effectively, the SiLago method raises the abstraction of the physical design to micro-architectural level from the present Boolean level standard cell and LUT based physical design. This significantly improves the efficiency and predictability of synthesis from higher levels of abstraction. In addition, it also enables true system-level synthesis that by virtue of correct-by-construction guarantee eliminates the costly functional verification step. The proposed solution allows a fully customized design with dynamic fine grain power management to be automatically generated from Simulink down to GDSII with computational and silicon efficiencies that are modestly lower than ASIC. The micro-architecture level SiLago block based design process with correct by construction guarantee is 5–6 orders more efficient and 2 orders more accurate compared to the Boolean standard cell based design flows.
sted, utgiver, år, opplag, sider
Springer, 2017
HSV kategori
Identifikatorer
urn:nbn:se:kth:diva-216341 (URN)10.1007/978-3-319-31596-6_3 (DOI)2-s2.0-85027724442 (Scopus ID)
Merknad
QC 20241113
Part of ISBN 9783319315966, 9783319315942
2017-10-232017-10-232024-11-13bibliografisk kontrollert