Publications (10 of 32)
Hemani, A., Farahini, N., Jafri, S., Sohofi, H., Li, S. & Paul, K. (2017). The silago solution: Architecture and design methods for a heterogeneous dark silicon aware coarse grain reconfigurable fabric. In: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era (pp. 47-94). Springer
2017 (English). In: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era, Springer, 2017, pp. 47-94. Chapter in book, part of anthology (Refereed)
Abstract [en]

The dark silicon constraint will restrict the VLSI designers to utilize an increasingly smaller percentage of transistors as we progress deeper into nano-scale regime because of the power delivery and thermal dissipation limits. The best way to deal with the dark silicon constraint is to use the transistors that can be turned on as efficiently as possible. Inspired by this rationale, the VLSI design community has adopted customization as the principal means to address the dark silicon constraint. Two categories of customization, often in tandem have been adopted by the community. The first is the processors that are heterogeneous in functionality and/or have ability to more efficiently match varying functionalities and runtime load. The second category of customization is based on the fact that hardware implementations often offer 2–6 orders more efficiency compared to software. For this reason, designers isolate the power and performance critical functionality and map them to custom hardware implementations called accelerators. Both these categories of customizations are partial in being compute centric and still implement the bulk of functionality in the inefficient software style. In this chapter, we propose a contrarian approach: implement the bulk of functionality in hardware style and only retain control intensive and flexibility critical functionality in small simple processors that we call flexilators. We propose using a micro-architecture level coarse grain reconfigurable fabric as the alternative to the Boolean level standard cells and LUTs of the FPGAs as the basis for dynamically reconfigurable hardware implementation. This coarse grain reconfigurable fabric allows dynamic creation of arbitrarily wide and deep datapath with their hierarchical control that can be coupled with a cluster of storage resources to create private execution partitions that host individual applications. 
Multiple such partitions can be created that can operate at different voltage frequency operating points. Unused resources can be put into a range of low power modes. This CGRA fabric allows not just compute centric customization but also interconnect, control, storage and access to storage can be customized. The customization is not only possible at compile/build time but also at runtime to match the available resources and runtime load conditions. This complete, micro-architecture level hardware centric customization overcomes the limitations of partial compute centric customization offered by the state-of-the-art accelerator-rich heterogeneous multi-processor implementation style by extracting more functionality and performance from the limited number of transistors that can be turned on. Besides offering complete and more effective customization and a hardware centric implementation style, we also propose a methodology that dramatically reduces the cost of customization. This methodology is based on a concept called SiLago (Silicon Large Grain Objects) method. The core idea behind the SiLago method is to use large grain micro-architecture level hardened and characterized blocks, the SiLago blocks, as the atomic physical design building blocks and a grid based structured layout scheme that enables composition of the SiLago fabric simply by abutting the blocks to produce a timing and DRC clean GDSII design. Effectively, the SiLago method raises the abstraction of the physical design to micro-architectural level from the present Boolean level standard cell and LUT based physical design. This significantly improves the efficiency and predictability of synthesis from higher levels of abstraction. In addition, it also enables true system-level synthesis that by virtue of correct-by-construction guarantee eliminates the costly functional verification step. 
The proposed solution allows a fully customized design with dynamic fine grain power management to be automatically generated from Simulink down to GDSII with computational and silicon efficiencies that are modestly lower than ASIC. The micro-architecture level SiLago block based design process with correct by construction guarantee is 5–6 orders more efficient and 2 orders more accurate compared to the Boolean standard cell based design flows.

Place, publisher, year, edition, pages
Springer, 2017
Identifiers
urn:nbn:se:kth:diva-216341 (URN); 10.1007/978-3-319-31596-6_3 (DOI); 2-s2.0-85027724442 (Scopus ID)
Note

QC 20241113

Part of ISBN 9783319315966, 9783319315942

Available from: 2017-10-23. Created: 2017-10-23. Last updated: 2024-11-13. Bibliographically approved.
Farahini, N., Hemani, A., Sohofi, H. & Li, S. (2015). Physical Design Aware System Level Synthesis of Hardware. In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015. Paper presented at International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015, 19-23 July 2015 (pp. 141-148). IEEE
2015 (English). In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015, IEEE, 2015, pp. 141-148. Conference paper, Published paper (Refereed)
Abstract [en]

In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and algorithmic level. We propose a grid-based regular physical design platform composed of large grain hardened building blocks called SiLago blocks. This platform is divided into regions which are specialized for different functionalities like computation, storage, system control, etc. The characterized micro-architectural operations of the SiLago platform serve as the interface to meet-in-the-middle high-level and system-level syntheses framework. This framework was used to generate three hardware macro instances, derived from SiLago platform for three applications from signal processing domain. Results show two orders of magnitude improvements in efficiency of the system-level design space exploration and synthesis time, with average loss in design quality of 18% for energy and 54% for area compared to the commercial SOC flow.

Place, publisher, year, edition, pages
IEEE, 2015
Identifiers
urn:nbn:se:kth:diva-185777 (URN); 10.1109/SAMOS.2015.7363669 (DOI); 000380507900020 (); 2-s2.0-84963655342 (Scopus ID)
Conference
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015, 19-23 July 2015
Note

QC 20160429

Available from: 2016-04-27. Created: 2016-04-27. Last updated: 2022-06-22. Bibliographically approved.
Li, S. (2015). System-Level Architectural Hardware Synthesis for Digital Signal Processing Sub-Systems. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2015 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents a novel system-level synthesis framework called System-Level Architectural Synthesis Framework (SYLVA), which synthesizes Digital Signal Processing (DSP) sub-systems modeled by synchronous data flow into hardware implementations in Application-Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or Coarse-Grained Reconfigurable Architecture (CGRA) style. SYLVA synthesizes in terms of pre-characterized Function Implementations (FIMPs). It explores the design space in three dimensions, number of FIMPs, type of FIMPs, and pipeline parallelism between the producing and consuming FIMPs. SYLVA also introduces timing and interface model of FIMPs to enable reuse and automatic generation of Global Interconnect and Control (GLIC) to glue the FIMPs together into a working system. SYLVA has been evaluated by applying it to several real and synthetic DSP applications and the experimental results are analyzed for the design space exploration, the GLIC synthesis, the code generation, and the CGRA floorplanning features. The conclusion from the experimental results is that by exploring the multi-dimensional design space in terms of pre-characterized FIMPs, SYLVA explores a richer design space and does it more effectively compared to the existing High-Level Synthesis (HLS) tools to improve both engineering and computational efficiency.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. pp. xxii, 193
Series
TRITA-ICT ; 2015:28
Research subject
Electrical and Systems Engineering
Identifiers
urn:nbn:se:kth:diva-180441 (URN); 978-91-7595-799-9 (ISBN)
Public defence
2016-02-18, Sal/hall C, Elektrum, KTH-ICT, Kista, 13:00 (English)
Note

QC 20160125

Available from: 2016-01-25. Created: 2016-01-13. Last updated: 2022-06-23. Bibliographically approved.
Li, S. & Hemani, A. (2014). Accurate and efficient three level design space exploration based on constraints satisfaction optimization problem solver. In: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014. Paper presented at 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 11 May 2014 through 13 May 2014.
2014 (English). In: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 2014. Conference paper, Published paper (Refereed)
Identifiers
urn:nbn:se:kth:diva-167931 (URN); 10.1109/FCCM.2014.56 (DOI); 000410585800046 (); 2-s2.0-84912520888 (Scopus ID); 9781479951116 (ISBN)
Conference
22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 11 May 2014 through 13 May 2014
Note

QC 20150605

Available from: 2015-06-05. Created: 2015-05-22. Last updated: 2022-06-23. Bibliographically approved.
Li, S. & Hemani, A. (2014). Case Study: Constraint Programming in a System Level Synthesis Framework. In: PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2014. Paper presented at 20th International Conference on the Principles and Practice of Constraint Programming (CP), SEP 08-12, 2014, Lyon, FRANCE (pp. 846-861).
2014 (English). In: PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2014, 2014, pp. 846-861. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents a case study of using a constraint programming solver in a system level synthesis framework called SYLVA. The solver is used to find the repetition vector of a synchronous data flow graph and serving as the design space exploration engine, which rapidly finds qualified system implementations by solving a constraint satisfaction optimization problem. Each system implementation is a combination of a number of function implementation instances and their cycle accurate execution schedules. The problem to be solved is automatically generated based on the user inputs: 1) a system model to be synthesized, 2) a library containing all the usable function implementations, 3) the performance/cost constraints, and 4) the optimization objectives. Use of constraints programming technique enabled a low cost development of design space exploration engine in addition to gaining ease of use.
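The repetition vector mentioned in this abstract is the smallest positive integer firing count per actor that balances token production and consumption on every edge of a synchronous data flow graph. The following is a minimal sketch of that computation, not the SYLVA implementation: the paper uses a constraint programming solver, whereas this example propagates the balance equations directly with exact fractions. The function name `repetition_vector` and the edge tuple layout are assumptions made for illustration.

```python
from fractions import Fraction
from math import gcd, lcm


def repetition_vector(actors, edges):
    """Return the smallest positive integer firing counts for an SDF graph.

    actors: list of actor names (hypothetical input format).
    edges:  list of (src, dst, produced, consumed) tuples, where `produced`
            tokens are written per firing of src and `consumed` tokens are
            read per firing of dst.
    """
    # Balance equation per edge: q[src] * produced == q[dst] * consumed.
    neighbours = {}
    for src, dst, produced, consumed in edges:
        neighbours.setdefault(src, []).append((dst, Fraction(produced, consumed)))
        neighbours.setdefault(dst, []).append((src, Fraction(consumed, produced)))

    # Fix the first actor at rate 1 and propagate rational rates over the graph.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        actor = stack.pop()
        for other, ratio in neighbours.get(actor, []):
            implied = rate[actor] * ratio
            if other not in rate:
                rate[other] = implied
                stack.append(other)
            elif rate[other] != implied:
                raise ValueError("inconsistent rates: no repetition vector exists")

    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the rational rates to the smallest integer solution.
    scale = lcm(*(r.denominator for r in rate.values()))
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = gcd(*counts.values())
    return {a: c // common for a, c in counts.items()}


# A -> B produces 2 and consumes 3; B -> C produces 1 and consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
# prints {'A': 3, 'B': 2, 'C': 1}
```

In the example, A fires 3 times to produce 6 tokens that B consumes in 2 firings, and B's 2 output tokens are consumed by a single firing of C.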

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8656
Keywords
System Level Synthesis, Design Space Exploration, Constraint Programming
Identifiers
urn:nbn:se:kth:diva-158857 (URN); 10.1007/978-3-319-10428-7_60 (DOI); 000345088200060 (); 2-s2.0-84906222505 (Scopus ID); 978-3-319-10428-7 (ISBN); 978-3-319-10427-0 (ISBN)
Conference
20th International Conference on the Principles and Practice of Constraint Programming (CP), SEP 08-12, 2014, Lyon, FRANCE
Note

QC 20150116

Available from: 2015-01-16. Created: 2015-01-12. Last updated: 2022-06-23. Bibliographically approved.
Li, S. & Hemani, A. (2014). Three-Dimensional Design Space Exploration for System Level Synthesis. In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD). Paper presented at 17th Euromicro Conference on Digital System Design (DSD), AUG 27-29, 2014, Verona, ITALY (pp. 419-426).
2014 (English). In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2014, pp. 419-426. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose an efficient and effective three-dimensional design space exploration method for mapping a DSP system in synchronous data flow graph format onto an RTL or lower level hardware description using constraint programming. The three dimensions are 1) schedule level parallelism (The parallelism of the executions for one DSP function, fully parallel, semi-parallel or fully serial), 2) function level parallelism (how many function implementations are used to implement each of the DSP functions), and 3) arithmetic level parallelism (how the function implementations are implemented). The design space exploration problem is formulated as a constraints satisfaction optimization problem and solved by the constraint programming solver in Google's or-tools. The proposed method is compared against two state-of-the-art commercial HLS tools for four realistic examples and one synthetic example. The metrics compared are runtime, accuracy and quality of results in terms of resource usage. We show on average, the proposed method is 85.22% faster compared to HLS tools, 4.3% more accurate and 8.27% better in quality of results. For the latter we have conservatively assumed the same function execution parallelism.
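To make the idea of design space exploration as a constraint satisfaction optimization problem concrete, here is a small sketch under stated assumptions. It is not the paper's formulation and it does not use or-tools: it enumerates all candidates in plain Python so that it stays dependency-free. It covers only two of the three dimensions (which implementation is chosen and how many instances are used), and the library contents, firing counts, and the simple period model are invented for illustration.

```python
from itertools import product
from math import ceil

# Hypothetical library: function -> list of (implementation, latency, area).
LIBRARY = {
    "fft": [("fft_serial", 8, 10), ("fft_parallel", 2, 35)],
    "fir": [("fir_serial", 4, 5), ("fir_parallel", 1, 18)],
}
# Hypothetical number of firings each function needs per system iteration.
FIRINGS = {"fft": 4, "fir": 8}
MAX_INSTANCES = 4


def explore(max_period):
    """Return (area, period, choice) with minimum area, or None if infeasible.

    Assumed model: n instances of one implementation share the firings of a
    function, so that function takes ceil(firings / n) * latency cycles, and
    the system period is the slowest function (functions run concurrently).
    """
    functions = sorted(LIBRARY)
    options = [
        [
            (name, n, ceil(FIRINGS[f] / n) * latency, n * area)
            for name, latency, area in LIBRARY[f]
            for n in range(1, MAX_INSTANCES + 1)
        ]
        for f in functions
    ]
    best = None
    for combo in product(*options):
        period = max(option[2] for option in combo)
        area = sum(option[3] for option in combo)
        if period > max_period:
            continue  # violates the performance constraint
        if best is None or area < best[0]:
            choice = {f: (o[0], o[1]) for f, o in zip(functions, combo)}
            best = (area, period, choice)
    return best


print(explore(16))
# prints (30, 16, {'fft': ('fft_serial', 2), 'fir': ('fir_serial', 2)})
```

Tightening the constraint to `explore(8)` switches both functions to a single parallel implementation with a total area of 53, which illustrates how the cheapest feasible point moves as the performance bound changes. A constraint programming solver replaces the exhaustive loop when the space is too large to enumerate.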

Keywords
System Level Synthesis, Design Space Exploration, Constraint Programming
Identifiers
urn:nbn:se:kth:diva-172644 (URN); 10.1109/DSD.2014.45 (DOI); 000358409000055 (); 2-s2.0-84928822237 (Scopus ID); 978-1-4799-5793-4 (ISBN)
Conference
17th Euromicro Conference on Digital System Design (DSD), AUG 27-29, 2014, Verona, ITALY
Note

QC 20150827

Available from: 2015-08-27. Created: 2015-08-27. Last updated: 2022-06-23. Bibliographically approved.
Farahini, N., Li, S., Tajammul, M. A., Shami, M. A., Chen, G., Hemani, A. & Ye, W. (2013). 39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation. In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS). Paper presented at 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013; Beijing; China; 19 May 2013 through 23 May 2013 (pp. 1448-1451). IEEE
2013 (English). In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2013, pp. 1448-1451. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels: FFT for the LTE standard and the Correlation Pool for the UMTS standard to be executed in a mutually exclusive manner. The CGRA multi-mode accelerator achieved computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and silicon efficiency of 56.20 GOPS/mm2. By analyzing the code and inferring the unused features of the fully programmable solution, an in-house developed tool was used to automatically customize the design to run just the two kernels and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm2. Corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm2. Though the ASIC’s silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than the ASIC solution.
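The efficiency figures quoted in this abstract are easier to compare as ratios against the ASIC baseline. The short sketch below only restates the numbers given above and divides them; the table name `EFFICIENCY` and the helper `relative_to` are illustrative names, not anything from the paper.

```python
# (GOPS/watt, GOPS/mm2) as quoted in the abstract.
EFFICIENCY = {
    "CGRA (fully programmable)": (39.94, 56.20),
    "CGRA (customized)": (49.05, 107.57),
    "ASIC": (63.84, 90.91),
}


def relative_to(baseline, table=EFFICIENCY):
    """Return each design's efficiencies as a fraction of the baseline's."""
    base_watt, base_area = table[baseline]
    return {
        name: (round(watt / base_watt, 2), round(area / base_area, 2))
        for name, (watt, area) in table.items()
    }


for name, (per_watt, per_mm2) in relative_to("ASIC").items():
    print(f"{name}: {per_watt:.2f}x GOPS/watt, {per_mm2:.2f}x GOPS/mm2")
```

With these inputs the fully programmable CGRA reaches about 0.63 of the ASIC's computational efficiency and 0.62 of its silicon efficiency, while the customized CGRA reaches about 0.77 and 1.18 respectively.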

Place, publisher, year, edition, pages
IEEE, 2013
Series
IEEE International Symposium on Circuits and Systems, ISSN 0271-4310
Keywords
Coarse-grain reconfigurable architectures, Efficiency metrics, Engineering efficiency, Fully programmables, Industrial case study, Multi-standard, Silicon efficiency, UMTS standard
Identifiers
urn:nbn:se:kth:diva-132265 (URN); 10.1109/ISCAS.2013.6572129 (DOI); 000332006801171 (); 2-s2.0-84883388914 (Scopus ID); 9781467357609 (ISBN)
Conference
2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013; Beijing; China; 19 May 2013 through 23 May 2013
Note

QC 20131104

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved.
Li, S., Malik, J. S., Liu, S. & Hemani, A. (2013). A code generation method for system-level synthesis on ASIC, FPGA and manycore CGRA. In: MES '13 Proceedings of the First International Workshop on Many-core Embedded Systems. Paper presented at 1st International Workshop on Many-Core Embedded Systems, MES 2013, in Conjunction with the 40th Annual IEEE/ACM International Symposium on Computer Architecture, ISCA 2013; Tel-Aviv; Israel; 24 June 2013 through 24 June 2013 (pp. 25-32). ACM
2013 (English). In: MES '13 Proceedings of the First International Workshop on Many-core Embedded Systems, ACM, 2013, pp. 25-32. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a code generation method that translates an intermediate Register-Transfer Level (RTL) model of a system into its corresponding VHDL code for ASIC and FPGAs and MATLAB functions for manycores CGRAs. The intermediate representation consists of Function Implementation (FIMPs) and the glue logic. FIMPs are VHDL design units for the ASIC and FPGA implementation styles and MATLAB function templates for the CGRA implementation style, while the glue logic is a compact data structure storing Global Interconnect and Control (GLIC) information. The automatically generated implementation codes increase the resource usage by 1.5% on the average while reducing total design effort by two orders of magnitudes.

Place, publisher, year, edition, pages
ACM, 2013
Keywords
code generation, function implementation, global interconnect and control, system-level synthesis
Identifiers
urn:nbn:se:kth:diva-132292 (URN); 10.1145/2489068.2489072 (DOI); 2-s2.0-84882284535 (Scopus ID); 978-145032063-4 (ISBN)
Conference
1st International Workshop on Many-Core Embedded Systems, MES 2013, in Conjunction with the 40th Annual IEEE/ACM International Symposium on Computer Architecture, ISCA 2013; Tel-Aviv; Israel; 24 June 2013 through 24 June 2013
Note

QC 20131113

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved.
Li, S., Farahini, N. & Hemani, A. (2013). Global control and storage synthesis for a system level synthesis approach. In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013. Paper presented at 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013; Seattle, WA; United States; 28 April 2013 through 30 April 2013 (pp. 6546036). IEEE
2013 (English). In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013, IEEE, 2013, p. 6546036. Conference paper, Published paper (Refereed)
Abstract [en]

SYLVA is a System Level Architectural Synthesis Framework that translates Synchronous Data Flow (SDF) models of DSP sub-systems like modems and codecs into hardware implementation in ASIC/Standard Cells, FPGAs or CGRAs (Coarse Grain Reconfigurable Fabric).

Place, publisher, year, edition, pages
IEEE, 2013
Keywords
Architectural synthesis, Coarse-grain reconfigurable, Global control, Hardware implementations, Sub-systems, Synchronous data flow, System level synthesis, System levels
Identifiers
urn:nbn:se:kth:diva-133889 (URN); 10.1109/FCCM.2013.61 (DOI); 000326442500051 (); 2-s2.0-84881144555 (Scopus ID); 978-0-7695-4969-9 (ISBN)
Conference
21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013; Seattle, WA; United States; 28 April 2013 through 30 April 2013
Note

QC 20131112

Available from: 2013-11-12. Created: 2013-11-11. Last updated: 2022-06-23. Bibliographically approved.
Li, S. & Hemani, A. (2013). Global interconnect and control synthesis in system level architectural synthesis framework. In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013. Paper presented at 16th Euromicro Conference on Digital System Design, DSD 2013; Santander; Spain; 4 September 2013 through 6 September 2013 (pp. 11-17). New York: IEEE
2013 (English). In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, New York: IEEE, 2013, pp. 11-17. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe the procedure of the Global Interconnect and Control (GLIC) synthesis step in a system level synthesis framework to automatically generate GLIC logics from a scheduled SDF. The generated GLIC logics consist of control FSMs, interconnect and data buffers to glue existing function implementations to construct the system, which is modeled by the scheduled SDF. The experimental result shows that GLIC synthesis is able to generate compact (5.7%, 0.6% and 0.9% of area usage for three examples implemented in 65nm ASIC) control, interconnect and data buffers while saving huge amount of manual effort and time (0.5s, 2.4s and 4.3s run time on a 2.8GHz x86 microprocessor for the three examples).

Place, publisher, year, edition, pages
New York: IEEE, 2013
Keywords
Global Interconnect and Control Synthesis, High Level Synthesis, System Level Synthesis
Identifiers
urn:nbn:se:kth:diva-132295 (URN); 10.1109/DSD.2013.12 (DOI); 000337235200002 (); 2-s2.0-84890043994 (Scopus ID); 978-0-7695-5074-9 (ISBN)
Conference
16th Euromicro Conference on Digital System Design, DSD 2013; Santander; Spain; 4 September 2013 through 6 September 2013
Note

QC 20140312

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-4157-4487