Publications (10 of 32)
Hemani, A., Farahini, N., Jafri, S., Sohofi, H., Li, S. & Paul, K. (2017). The silago solution: Architecture and design methods for a heterogeneous dark silicon aware coarse grain reconfigurable fabric. In: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era (pp. 47-94). Springer
2017 (English). In: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era, Springer, 2017, p. 47-94. Chapter in book (Refereed)
Abstract [en]

The dark silicon constraint will restrict VLSI designers to using an increasingly smaller percentage of transistors as we progress deeper into the nano-scale regime, because of power delivery and thermal dissipation limits. The best way to deal with the dark silicon constraint is to use the transistors that can be turned on as efficiently as possible. Inspired by this rationale, the VLSI design community has adopted customization as the principal means to address the dark silicon constraint. Two categories of customization, often used in tandem, have been adopted by the community. The first is processors that are heterogeneous in functionality and/or are able to match varying functionalities and runtime load more efficiently. The second category of customization is based on the fact that hardware implementations are often 2–6 orders of magnitude more efficient than software. For this reason, designers isolate the power- and performance-critical functionality and map it to custom hardware implementations called accelerators. Both categories of customization are partial in that they are compute centric and still implement the bulk of functionality in the inefficient software style. In this chapter, we propose a contrarian approach: implement the bulk of functionality in hardware style and retain only control-intensive and flexibility-critical functionality in small, simple processors that we call flexilators. We propose using a micro-architecture level coarse grain reconfigurable fabric, as the alternative to the Boolean level standard cells and LUTs of FPGAs, as the basis for dynamically reconfigurable hardware implementation. This coarse grain reconfigurable fabric allows dynamic creation of arbitrarily wide and deep datapaths with their hierarchical control, which can be coupled with a cluster of storage resources to create private execution partitions that host individual applications.
Multiple such partitions can be created, and they can operate at different voltage-frequency operating points. Unused resources can be put into a range of low power modes. This CGRA fabric allows customization not just of compute but also of interconnect, control, storage and access to storage. The customization is possible not only at compile/build time but also at runtime, to match the available resources and runtime load conditions. This complete, micro-architecture level hardware centric customization overcomes the limitations of the partial compute centric customization offered by the state-of-the-art accelerator-rich heterogeneous multi-processor implementation style by extracting more functionality and performance from the limited number of transistors that can be turned on. Besides offering complete and more effective customization and a hardware centric implementation style, we also propose a methodology that dramatically reduces the cost of customization. This methodology is based on a concept called the SiLago (Silicon Large Grain Objects) method. The core idea behind the SiLago method is to use large grain micro-architecture level hardened and characterized blocks, the SiLago blocks, as the atomic physical design building blocks, together with a grid based structured layout scheme that enables composition of the SiLago fabric simply by abutting the blocks to produce a timing and DRC clean GDSII design. Effectively, the SiLago method raises the abstraction of the physical design to the micro-architectural level from the present Boolean level standard cell and LUT based physical design. This significantly improves the efficiency and predictability of synthesis from higher levels of abstraction. In addition, it enables true system-level synthesis that, by virtue of a correct-by-construction guarantee, eliminates the costly functional verification step.
The proposed solution allows a fully customized design with dynamic fine grain power management to be automatically generated from Simulink down to GDSII, with computational and silicon efficiencies that are modestly lower than those of an ASIC. The micro-architecture level SiLago block based design process with its correct-by-construction guarantee is 5–6 orders of magnitude more efficient and 2 orders of magnitude more accurate than the Boolean standard cell based design flows.

Place, publisher, year, edition, pages
Springer, 2017
National Category
Embedded Systems
Identifiers
urn:nbn:se:kth:diva-216341 (URN); 10.1007/978-3-319-31596-6_3 (DOI); 2-s2.0-85027724442 (Scopus ID)
Note

QC 20241113

Part of ISBN 9783319315966, 9783319315942

Available from: 2017-10-23. Created: 2017-10-23. Last updated: 2024-11-13. Bibliographically approved
Farahini, N., Hemani, A., Sohofi, H. & Li, S. (2015). Physical Design Aware System Level Synthesis of Hardware. In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015. Paper presented at International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015, 19-23 July 2015 (pp. 141-148). IEEE
2015 (English). In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015, IEEE, 2015, p. 141-148. Conference paper, Published paper (Refereed)
Abstract [en]

In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and the algorithmic level. We propose a grid-based regular physical design platform composed of large grain hardened building blocks called SiLago blocks. This platform is divided into regions which are specialized for different functionalities such as computation, storage and system control. The characterized micro-architectural operations of the SiLago platform serve as the interface to a meet-in-the-middle high-level and system-level synthesis framework. This framework was used to generate three hardware macro instances, derived from the SiLago platform, for three applications from the signal processing domain. Results show two orders of magnitude improvement in the efficiency of system-level design space exploration and in synthesis time, with an average loss in design quality of 18% for energy and 54% for area compared to the commercial SOC flow.

Place, publisher, year, edition, pages
IEEE, 2015
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-185777 (URN); 10.1109/SAMOS.2015.7363669 (DOI); 000380507900020 (); 2-s2.0-84963655342 (Scopus ID)
Conference
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015, 19-23 July 2015
Note

QC 20160429

Available from: 2016-04-27. Created: 2016-04-27. Last updated: 2022-06-22. Bibliographically approved
Li, S. (2015). System-Level Architectural Hardware Synthesis for Digital Signal Processing Sub-Systems. (Doctoral dissertation). Stockholm: KTH Royal Institute of Technology
2015 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents a novel system-level synthesis framework called System-Level Architectural Synthesis Framework (SYLVA), which synthesizes Digital Signal Processing (DSP) sub-systems modeled by synchronous data flow into hardware implementations in Application-Specific Integrated Circuit (ASIC), Field-Programmable Gate Array (FPGA) or Coarse-Grained Reconfigurable Architecture (CGRA) style. SYLVA synthesizes in terms of pre-characterized Function Implementations (FIMPs). It explores the design space in three dimensions, number of FIMPs, type of FIMPs, and pipeline parallelism between the producing and consuming FIMPs. SYLVA also introduces timing and interface model of FIMPs to enable reuse and automatic generation of Global Interconnect and Control (GLIC) to glue the FIMPs together into a working system. SYLVA has been evaluated by applying it to several real and synthetic DSP applications and the experimental results are analyzed for the design space exploration, the GLIC synthesis, the code generation, and the CGRA floorplanning features. The conclusion from the experimental results is that by exploring the multi-dimensional design space in terms of pre-characterized FIMPs, SYLVA explores a richer design space and does it more effectively compared to the existing High-Level Synthesis (HLS) tools to improve both engineering and computational efficiency.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2015. p. xxii, 193
Series
TRITA-ICT ; 2015:28
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-180441 (URN); 978-91-7595-799-9 (ISBN)
Public defence
2016-02-18, Sal/hall C, Elektrum, KTH-ICT, Kista, 13:00 (English)
Note

QC 20160125

Available from: 2016-01-25. Created: 2016-01-13. Last updated: 2022-06-23. Bibliographically approved
Li, S. & Hemani, A. (2014). Accurate and efficient three level design space exploration based on constraints satisfaction optimization problem solver. In: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014. Paper presented at 22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 11 May 2014 through 13 May 2014.
2014 (English). In: Proceedings - 2014 IEEE 22nd International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 2014. Conference paper, Published paper (Refereed)
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-167931 (URN); 10.1109/FCCM.2014.56 (DOI); 000410585800046 (); 2-s2.0-84912520888 (Scopus ID); 9781479951116 (ISBN)
Conference
22nd IEEE International Symposium on Field-Programmable Custom Computing Machines, FCCM 2014, 11 May 2014 through 13 May 2014
Note

QC 20150605

Available from: 2015-06-05. Created: 2015-05-22. Last updated: 2022-06-23. Bibliographically approved
Li, S. & Hemani, A. (2014). Case Study: Constraint Programming in a System Level Synthesis Framework. In: PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2014. Paper presented at 20th International Conference on the Principles and Practice of Constraint Programming (CP), SEP 08-12, 2014, Lyon, FRANCE (pp. 846-861).
2014 (English). In: PRINCIPLES AND PRACTICE OF CONSTRAINT PROGRAMMING, CP 2014, 2014, p. 846-861. Conference paper, Published paper (Refereed)
Abstract [en]

This article presents a case study of using a constraint programming solver in a system level synthesis framework called SYLVA. The solver is used to find the repetition vector of a synchronous data flow graph and serves as the design space exploration engine, which rapidly finds qualified system implementations by solving a constraint satisfaction optimization problem. Each system implementation is a combination of a number of function implementation instances and their cycle accurate execution schedules. The problem to be solved is automatically generated based on the user inputs: 1) a system model to be synthesized, 2) a library containing all the usable function implementations, 3) the performance/cost constraints, and 4) the optimization objectives. The use of constraint programming enabled low-cost development of the design space exploration engine, in addition to gaining ease of use.
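To make the notion of a repetition vector concrete, the sketch below computes one for a small synchronous data flow graph. It is only an illustration: the abstract states that SYLVA obtains the repetition vector with a constraint programming solver, whereas this sketch swaps in direct propagation of the balance equations with exact fractions. The function name `repetition_vector`, the edge tuple layout and the example graph are assumptions made for this example, not taken from the paper.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def repetition_vector(actors, edges):
    """Smallest positive firing counts that balance every edge.

    edges: (producer, consumer, tokens_produced, tokens_consumed) tuples
    (a hypothetical layout chosen for this sketch).
    Balance equation per edge: q[producer] * produced == q[consumer] * consumed.
    """
    neighbours = {a: [] for a in actors}
    for src, dst, produced, consumed in edges:
        neighbours[src].append((dst, Fraction(produced, consumed)))
        neighbours[dst].append((src, Fraction(consumed, produced)))

    # Fix the first actor's rate to 1 and propagate along the edges.
    rate = {actors[0]: Fraction(1)}
    stack = [actors[0]]
    while stack:
        a = stack.pop()
        for b, ratio in neighbours[a]:
            expected = rate[a] * ratio
            if b not in rate:
                rate[b] = expected
                stack.append(b)
            elif rate[b] != expected:
                raise ValueError("inconsistent rates: no repetition vector exists")
    if len(rate) != len(actors):
        raise ValueError("graph is not connected")

    # Scale the fractional rates to integers, then reduce by their common divisor.
    scale = reduce(lambda x, y: x * y // gcd(x, y),
                   (r.denominator for r in rate.values()), 1)
    counts = {a: int(r * scale) for a, r in rate.items()}
    common = reduce(gcd, counts.values())
    return {a: n // common for a, n in counts.items()}


# Hypothetical three-actor chain: A produces 2 tokens, B consumes 3; B produces 1, C consumes 2.
print(repetition_vector(["A", "B", "C"], [("A", "B", 2, 3), ("B", "C", 1, 2)]))
```

For a graph whose rates cannot be balanced, the function raises `ValueError` rather than returning a partial answer.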

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8656
Keywords
System Level Synthesis, Design Space Exploration, Constraint Programming
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-158857 (URN); 10.1007/978-3-319-10428-7_60 (DOI); 000345088200060 (); 2-s2.0-84906222505 (Scopus ID); 978-3-319-10428-7 (ISBN); 978-3-319-10427-0 (ISBN)
Conference
20th International Conference on the Principles and Practice of Constraint Programming (CP), SEP 08-12, 2014, Lyon, FRANCE
Note

QC 20150116

Available from: 2015-01-16. Created: 2015-01-12. Last updated: 2022-06-23. Bibliographically approved
Li, S. & Hemani, A. (2014). Three-Dimensional Design Space Exploration for System Level Synthesis. In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD). Paper presented at 17th Euromicro Conference on Digital System Design (DSD), AUG 27-29, 2014, Verona, ITALY (pp. 419-426).
2014 (English). In: 2014 17TH EUROMICRO CONFERENCE ON DIGITAL SYSTEM DESIGN (DSD), 2014, p. 419-426. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we propose an efficient and effective three-dimensional design space exploration method for mapping a DSP system in synchronous data flow graph format onto an RTL or lower level hardware description using constraint programming. The three dimensions are 1) schedule level parallelism (the parallelism of the executions of one DSP function: fully parallel, semi-parallel or fully serial), 2) function level parallelism (how many function implementations are used to implement each of the DSP functions), and 3) arithmetic level parallelism (how the function implementations are implemented). The design space exploration problem is formulated as a constraint satisfaction optimization problem and solved by the constraint programming solver in Google's or-tools. The proposed method is compared against two state-of-the-art commercial HLS tools for four realistic examples and one synthetic example. The metrics compared are runtime, accuracy and quality of results in terms of resource usage. We show that, on average, the proposed method is 85.22% faster than the HLS tools, 4.3% more accurate and 8.27% better in quality of results. For the latter, we have conservatively assumed the same function execution parallelism.
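The kind of constraint satisfaction optimization problem described above can be illustrated with a toy example. The sketch below is not the authors' formulation and does not use or-tools: it swaps in exhaustive enumeration over a tiny, entirely hypothetical library, choosing for each function an implementation variant (arithmetic level) and an instance count (function level). The schedule is derived from the instance count rather than explored separately, and functions are assumed to run one after another. All names and numbers (`LIBRARY`, `REPETITIONS`, `explore`, the area and cycle figures) are made up for illustration.

```python
from itertools import product
from math import ceil

# Hypothetical library: function -> {variant: (area per instance, cycles per execution)}
LIBRARY = {
    "fft": {"serial": (10, 8), "parallel": (30, 2)},
    "fir": {"serial": (4, 4), "parallel": (12, 1)},
}
# Hypothetical number of executions of each function per graph iteration
REPETITIONS = {"fft": 2, "fir": 4}


def explore(max_cycles):
    """Return (area, cycles, choice) with minimum area under a latency bound, or None."""
    functions = list(LIBRARY)
    # Every (variant, instance count) pair is a candidate for each function.
    options = [
        [(variant, n) for variant in LIBRARY[f] for n in range(1, REPETITIONS[f] + 1)]
        for f in functions
    ]
    best = None
    for choice in product(*options):
        area = cycles = 0
        for f, (variant, n) in zip(functions, choice):
            unit_area, unit_cycles = LIBRARY[f][variant]
            area += unit_area * n
            # Executions are spread evenly over the n instances.
            cycles += ceil(REPETITIONS[f] / n) * unit_cycles
        if cycles <= max_cycles and (best is None or area < best[0]):
            best = (area, cycles, dict(zip(functions, choice)))
    return best


print(explore(16))
print(explore(5))
```

Enumeration is only practical for a handful of functions; a constraint programming solver, as used in the paper, prunes the search instead of visiting every combination.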

Keywords
System Level Synthesis, Design Space Exploration, Constraint Programming
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-172644 (URN); 10.1109/DSD.2014.45 (DOI); 000358409000055 (); 2-s2.0-84928822237 (Scopus ID); 978-1-4799-5793-4 (ISBN)
Conference
17th Euromicro Conference on Digital System Design (DSD), AUG 27-29, 2014, Verona, ITALY
Note

QC 20150827

Available from: 2015-08-27. Created: 2015-08-27. Last updated: 2022-06-23. Bibliographically approved
Farahini, N., Li, S., Tajammul, M. A., Shami, M. A., Chen, G., Hemani, A. & Ye, W. (2013). 39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation. In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS). Paper presented at 2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013; Beijing; China; 19 May 2013 through 23 May 2013 (pp. 1448-1451). IEEE
2013 (English). In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2013, p. 1448-1451. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels: FFT for the LTE standard and the Correlation Pool for the UMTS standard to be executed in a mutually exclusive manner. The CGRA multi-mode accelerator achieved computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and silicon efficiency of 56.20 GOPS/mm². By analyzing the code and inferring the unused features of the fully programmable solution, an in-house developed tool was used to automatically customize the design to run just the two kernels and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm². Corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm². Though the ASIC’s silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than the ASIC solution.
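The efficiency figures quoted in this abstract can be put side by side with a few lines of arithmetic. The sketch below only restates the six numbers from the abstract and divides them by a chosen reference design; the dictionary keys and the helper name `relative_to` are labels invented for this example.

```python
# (computational efficiency in GOPS/watt, silicon efficiency in GOPS/mm^2),
# as quoted in the abstract; the key names are labels chosen for this sketch.
DESIGNS = {
    "CGRA, fully programmable": (39.94, 56.20),
    "CGRA, customized": (49.05, 107.57),
    "ASIC": (63.84, 90.91),
}


def relative_to(reference, designs=DESIGNS):
    """Express every design's two efficiencies as a ratio to the reference design."""
    ref_comp, ref_sil = designs[reference]
    return {
        name: (comp / ref_comp, sil / ref_sil)
        for name, (comp, sil) in designs.items()
    }


for name, (comp, sil) in relative_to("ASIC").items():
    print(f"{name}: {comp:.2f}x computational, {sil:.2f}x silicon efficiency vs ASIC")
```

Calling `relative_to("CGRA, fully programmable")` instead shows the gain obtained from the automatic customization step.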

Place, publisher, year, edition, pages
IEEE, 2013
Series
IEEE International Symposium on Circuits and Systems, ISSN 0271-4310
Keywords
Coarse-grain reconfigurable architectures, Efficiency metrics, Engineering efficiency, Fully programmables, Industrial case study, Multi-standard, Silicon efficiency, UMTS standard
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering; Telecommunications; Signal Processing
Identifiers
urn:nbn:se:kth:diva-132265 (URN); 10.1109/ISCAS.2013.6572129 (DOI); 000332006801171 (); 2-s2.0-84883388914 (Scopus ID); 9781467357609 (ISBN)
Conference
2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013; Beijing; China; 19 May 2013 through 23 May 2013
Note

QC 20131104

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved
Li, S., Malik, J. S., Liu, S. & Hemani, A. (2013). A code generation method for system-level synthesis on ASIC, FPGA and manycore CGRA. In: MES '13 Proceedings of the First International Workshop on Many-core Embedded Systems. Paper presented at 1st International Workshop on Many-Core Embedded Systems, MES 2013, in Conjunction with the 40th Annual IEEE/ACM International Symposium on Computer Architecture, ISCA 2013; Tel-Aviv; Israel; 24 June 2013 through 24 June 2013 (pp. 25-32). ACM
2013 (English). In: MES '13 Proceedings of the First International Workshop on Many-core Embedded Systems, ACM, 2013, p. 25-32. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a code generation method that translates an intermediate Register-Transfer Level (RTL) model of a system into its corresponding VHDL code for ASICs and FPGAs and MATLAB functions for manycore CGRAs. The intermediate representation consists of Function Implementations (FIMPs) and the glue logic. FIMPs are VHDL design units for the ASIC and FPGA implementation styles and MATLAB function templates for the CGRA implementation style, while the glue logic is a compact data structure storing Global Interconnect and Control (GLIC) information. The automatically generated implementation code increases the resource usage by 1.5% on average while reducing total design effort by two orders of magnitude.

Place, publisher, year, edition, pages
ACM, 2013
Keywords
code generation, function implementation, global interconnect and control, system-level synthesis
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-132292 (URN); 10.1145/2489068.2489072 (DOI); 2-s2.0-84882284535 (Scopus ID); 978-145032063-4 (ISBN)
Conference
1st International Workshop on Many-Core Embedded Systems, MES 2013, in Conjunction with the 40th Annual IEEE/ACM International Symposium on Computer Architecture, ISCA 2013; Tel-Aviv; Israel; 24 June 2013 through 24 June 2013
Note

QC 20131113

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved
Li, S., Farahini, N. & Hemani, A. (2013). Global control and storage synthesis for a system level synthesis approach. In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013. Paper presented at 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013; Seattle, WA; United States; 28 April 2013 through 30 April 2013 (pp. 6546036). IEEE
2013 (English). In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013, IEEE, 2013, p. 6546036. Conference paper, Published paper (Refereed)
Abstract [en]

SYLVA is a System Level Architectural Synthesis Framework that translates Synchronous Data Flow (SDF) models of DSP sub-systems like modems and codecs into hardware implementation in ASIC/Standard Cells, FPGAs or CGRAs (Coarse Grain Reconfigurable Fabric).

Place, publisher, year, edition, pages
IEEE, 2013
Keywords
Architectural synthesis, Coarse-grain reconfigurable, Global control, Hardware implementations, Sub-systems, Synchronous data flow, System level synthesis, System levels
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-133889 (URN); 10.1109/FCCM.2013.61 (DOI); 000326442500051 (); 2-s2.0-84881144555 (Scopus ID); 978-0-7695-4969-9 (ISBN)
Conference
21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013; Seattle, WA; United States; 28 April 2013 through 30 April 2013
Note

QC 20131112

Available from: 2013-11-12. Created: 2013-11-11. Last updated: 2022-06-23. Bibliographically approved
Li, S. & Hemani, A. (2013). Global interconnect and control synthesis in system level architectural synthesis framework. In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013. Paper presented at 16th Euromicro Conference on Digital System Design, DSD 2013; Santander; Spain; 4 September 2013 through 6 September 2013 (pp. 11-17). New York: IEEE
2013 (English). In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, New York: IEEE, 2013, p. 11-17. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe the procedure of the Global Interconnect and Control (GLIC) synthesis step in a system level synthesis framework, which automatically generates GLIC logic from a scheduled SDF. The generated GLIC logic consists of control FSMs, interconnect and data buffers that glue existing function implementations together to construct the system modeled by the scheduled SDF. The experimental results show that GLIC synthesis is able to generate compact control, interconnect and data buffers (5.7%, 0.6% and 0.9% of area usage for three examples implemented in 65nm ASIC) while saving a huge amount of manual effort and time (0.5s, 2.4s and 4.3s run time on a 2.8GHz x86 microprocessor for the three examples).

Place, publisher, year, edition, pages
New York: IEEE, 2013
Keywords
Global Interconnect and Control Synthesis, High Level Synthesis, System Level Synthesis
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-132295 (URN); 10.1109/DSD.2013.12 (DOI); 000337235200002 (); 2-s2.0-84890043994 (Scopus ID); 978-0-7695-5074-9 (ISBN)
Conference
16th Euromicro Conference on Digital System Design, DSD 2013; Santander; Spain; 4 September 2013 through 6 September 2013
Note

QC 20140312

Available from: 2013-10-25. Created: 2013-10-25. Last updated: 2022-06-23. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-4157-4487