SiLago: Enabling System Level Automation Methodology to Design Custom High-Performance Computing Platforms: Toward Next Generation Hardware Synthesis Methodologies
KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2016. 56 p.
Series
TRITA-ICT, 2016:05
Keyword [en]
System Level Synthesis, High Level Synthesis, VLSI Design Methodology, Brain-like Computation, Neuromorphic Hardware, Address Generation, Thread Level Parallelism
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Electrical Engineering
Identifiers
URN: urn:nbn:se:kth:diva-185787
ISBN: 978-91-7595-900-9 (print)
OAI: oai:DiVA.org:kth-185787
DiVA: diva2:924088
Public defence
2016-05-17, Sal B, Electrum 229, Isafjordsgatan 22, Kista, Stockholm, 20:24 (English)
Opponent
Supervisors
Note

QC 20160428

Available from: 2016-04-28; Created: 2016-04-27; Last updated: 2016-04-28; Bibliographically approved
List of papers
1. Physical Design Aware System Level Synthesis of Hardware
2015 (English). In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015, IEEE, 2015, p. 141-148. Conference paper, Published paper (Refereed)
Abstract [en]

In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and the algorithmic level. We propose a grid-based regular physical design platform composed of large-grain hardened building blocks called SiLago blocks. This platform is divided into regions that are specialized for different functionalities such as computation, storage, and system control. The characterized micro-architectural operations of the SiLago platform serve as the interface to a meet-in-the-middle framework for high-level and system-level synthesis. This framework was used to generate three hardware macro instances, derived from the SiLago platform, for three applications from the signal processing domain. Results show two orders of magnitude improvement in the efficiency of system-level design space exploration and in synthesis time, with an average loss in design quality of 18% for energy and 54% for area compared to the commercial SoC flow.

Place, publisher, year, edition, pages
IEEE, 2015
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-185777 (URN); 10.1109/SAMOS.2015.7363669 (DOI); 000380507900020 (); 2-s2.0-84963655342 (Scopus ID)
External cooperation:
Conference
International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS) 2015, 19-23 July 2015
Note

QC 20160429

Available from: 2016-04-27; Created: 2016-04-27; Last updated: 2016-09-05; Bibliographically approved
2. SiLago: A Structured Layout Scheme to Enable Efficient High Level and System Level Synthesis
2016 (English). Report (Other academic)
Series
TRITA-ICT, 2016:13
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-185781 (URN); 978-91-7595-974-0 (ISBN)
Note

QC 20160429

Available from: 2016-04-27; Created: 2016-04-27; Last updated: 2016-04-29; Bibliographically approved
3. AlgoSil: A High Level Synthesis Tool targeting Micro-architecture Level Physical Design Platform
2016 (English). Report (Other academic)
Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2016
Series
TRITA-ICT, 2016:14
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-185782 (URN); 978-91-7595-973-3 (ISBN)
Note

QC 20160429

Available from: 2016-04-27; Created: 2016-04-27; Last updated: 2016-04-29; Bibliographically approved
4. 39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation
2013 (English). In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE, 2013, p. 1448-1451. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels, FFT for the LTE standard and the Correlation Pool for the UMTS standard, to be executed in a mutually exclusive manner. The CGRA multi-mode accelerator achieved computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and silicon efficiency of 56.20 GOPS/mm². By analyzing the code and inferring the unused features of the fully programmable solution, an in-house developed tool was used to automatically customize the design to run just the two kernels, and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm². Corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm². Though the ASIC’s silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than the ASIC solution.
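
As a rough illustration of how the efficiency figures quoted in this abstract relate to one another, the sketch below computes the ratios between the three design points. Only the numeric values come from the abstract; the dictionary keys and the helper name `ratio` are illustrative assumptions, not names from the paper.

```python
# Efficiency figures quoted in the abstract: (GOPS/watt, GOPS/mm^2).
# The labels are illustrative names, not identifiers from the paper.
DESIGNS = {
    "cgra_programmable": (39.94, 56.20),
    "cgra_customized": (49.05, 107.57),
    "asic": (63.84, 90.91),
}


def ratio(a: str, b: str) -> tuple[float, float]:
    """Return (computational, silicon) efficiency of design a relative to b."""
    (a_watt, a_area), (b_watt, b_area) = DESIGNS[a], DESIGNS[b]
    return a_watt / b_watt, a_area / b_area


if __name__ == "__main__":
    pairs = [("cgra_customized", "cgra_programmable"), ("asic", "cgra_customized")]
    for a, b in pairs:
        per_watt, per_area = ratio(a, b)
        print(f"{a} vs {b}: {per_watt:.2f}x GOPS/watt, {per_area:.2f}x GOPS/mm^2")
```

On these figures, the customization step raises computational efficiency by roughly 1.2x and silicon efficiency by roughly 1.9x over the fully programmable CGRA.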

Place, publisher, year, edition, pages
IEEE, 2013
Series
IEEE International Symposium on Circuits and Systems, ISSN 0271-4310
Keyword
Coarse-grain reconfigurable architectures, Efficiency metrics, Engineering efficiency, Fully programmables, Industrial case study, Multi-standard, Silicon efficiency, UMTS standard
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Telecommunications Signal Processing
Identifiers
urn:nbn:se:kth:diva-132265 (URN); 10.1109/ISCAS.2013.6572129 (DOI); 000332006801171 (); 2-s2.0-84883388914 (Scopus ID); 9781467357609 (ISBN)
Conference
2013 IEEE International Symposium on Circuits and Systems, ISCAS 2013; Beijing; China; 19 May 2013 through 23 May 2013
Note

QC 20131104

Available from: 2013-10-25; Created: 2013-10-25; Last updated: 2017-03-27; Bibliographically approved
5. Distributed Runtime Computation of Constraints for Multiple Inner Loops
2013 (English). In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, New York: IEEE, 2013, p. 389-395. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a hardware solution for runtime computation of loop constraints and synchronizing delays for multiple inner loops in parallel distributed implementations of digital signal processing sub-systems. Methods to map and generate the runtime computation code for loop constraints and synchronizing delays are also presented. Compared to the traditional methods, the proposed solution achieves 55% average code compaction and 32.7% average performance improvement. The solution has a modest hardware cost that increases linearly with the dimension of the architecture and has no performance penalty. Results from multiple realistic examples are presented, analyzed and compared to the traditional methods.

Place, publisher, year, edition, pages
New York: IEEE, 2013
Keyword
Streaming address generation, CGRA, Inner loop acceleration, Code compaction
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-132294 (URN); 10.1109/DSD.2013.49 (DOI); 000337235200053 (); 2-s2.0-84890042733 (Scopus ID); 978-076955074-9 (ISBN)
Conference
16th Euromicro Conference on Digital System Design, DSD 2013; Santander, Spain, 4-6 September 2013
Note

QC 20140211

Available from: 2013-10-25; Created: 2013-10-25; Last updated: 2016-04-28; Bibliographically approved
6. Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric
2014 (English). In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 8, p. 788-802. Article in journal (Refereed), Published
Abstract [en]

This paper presents a hardware-based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category is the low-cost AGU, which generates addresses within given address bounds for affine functions of up to two variables. Such low-cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category is relatively more complex; it is also distributed, but shared among a few storage units, and it handles more complex address generation requirements such as dynamic computation of the address bounds that are then used to configure the AGUs, and transformation of non-affine functions to affine functions by computing the affine factor outside the loop. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction, and compared to the centralized runtime address generation scheme, it achieved 32.7% average performance improvement.
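
The two resource categories described in this abstract can be pictured with a small software model. The sketch below is an assumption-laden illustration, not the paper's hardware: `affine_agu` models a low-cost AGU for an affine function of two loop variables, and `configure_rows` models the shared unit that evaluates a non-affine term (here a hypothetical `r * r`) outside the inner loop and hands the AGU an affine problem. All function and parameter names are invented for this example.

```python
from typing import Iterator


def affine_agu(base: int, stride_i: int, stride_j: int,
               bound_i: int, bound_j: int) -> Iterator[int]:
    """Yield base + stride_i*i + stride_j*j over a two-level loop nest.

    Hypothetical model of a low-cost AGU; the bounds are configured
    before the stream starts.
    """
    for i in range(bound_i):
        for j in range(bound_j):
            yield base + stride_i * i + stride_j * j


def configure_rows(base: int, width: int, rows: list[int],
                   cols: int) -> list[list[int]]:
    """Hypothetical model of the shared, more complex unit.

    For each outer iteration it computes the non-affine part once,
    outside the inner loop, and configures the AGU with the result.
    """
    streams = []
    for r in rows:
        # r * r is non-affine in r; after this step the inner loop
        # only sees an affine function of the column index.
        row_base = base + (r * r) * width
        streams.append(list(affine_agu(row_base, 0, 1, 1, cols)))
    return streams


if __name__ == "__main__":
    print(list(affine_agu(0, 4, 1, 2, 3)))
    print(configure_rows(100, 10, [0, 1, 2], 2))
```

The split mirrors the design choice stated in the abstract: the per-port generator stays simple, and anything that cannot be expressed as an affine function is resolved once per outer iteration by the shared resource.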

Keyword
Streaming address generation, CGRA, Parallel distributed DSP, Code compaction
National Category
Computer Science
Identifiers
urn:nbn:se:kth:diva-159997 (URN); 10.1016/j.micpro.2014.05.009 (DOI); 000347755200006 (); 2-s2.0-84910626332 (Scopus ID)
Note

QC 20150225

Available from: 2015-02-25; Created: 2015-02-12; Last updated: 2017-12-04; Bibliographically approved
7. Atomic stream computation unit based on micro-thread level parallelism
2015 (English). In: IEEE 26th Application-specific Systems, Architectures and Processors (ASAP) 2015, IEEE, 2015, p. 25-29. Conference paper, Published paper (Refereed)
Abstract [en]

The increasing demand for higher image resolution and communication bandwidth requires streaming applications to deal with ever-increasing dataset sizes. Further, with technology scaling, the cost of moving data is falling at a slower pace than the cost of computing. These trends have motivated the proposed micro-architectural reorganization of stream processors, which divides the stream computation into functional computation, address constraints computation and address generation, and deploys independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at the instruction level. The proposed scheme has two benefits: a more efficient sequencer logic and energy savings in address generation and transportation. These benefits are quantified for a set of streaming applications and show average improvements of 39% in silicon efficiency of the sequencer logic and 23% in total computational efficiency.

Place, publisher, year, edition, pages
IEEE, 2015
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-185779 (URN); 10.1109/ASAP.2015.7245700 (DOI); 000380462200004 (); 2-s2.0-84955568733 (Scopus ID); 978-147991924-6 (ISBN)
Conference
IEEE 26th International Conference on Application-specific Systems, Architectures and Processors (ASAP), 2015
Note

QC 20160429

Available from: 2016-04-27; Created: 2016-04-27; Last updated: 2016-08-23; Bibliographically approved
8. A conceptual custom super-computer design for real-time simulation of human brain
2013 (English). In: 2013 21st Iranian Conference on Electrical Engineering, ICEE 2013, 2013, p. 1-6. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we introduce BRIC, a novel custom multi-chip digital computer architecture for simulating, in real time, a model of the human brain in the form of a spiking Bayesian Confidence Propagation Neural Network (BCPNN). The design is conceptually dimensioned for technology available in 2015-2020, with the estimated size of a pizza box, consuming less than 3 kW of power, delivering 800 Teraflops/sec (single precision multiply operation) and 30 TB of memory. To the best of our knowledge, this will be the smallest and lowest-power real-time brain simulation engine if manufactured. The silicon and computational efficiencies come from the use of 3D memory stacking, innovation in algorithms and architectural customization. The chip will be programmable, allowing experimentation with variants of the BCPNN brain model.

Series
Iranian Conference on Electrical Engineering, ISSN 2164-7054
Keyword
belief networks, bioelectric phenomena, biology computing, brain models, digital simulation, neural nets, parallel architectures, parallel machines, real-time systems, 3D memory stacking, BCPNN brain model, BRIC, architectural customization, computational efficiencies, conceptual custom super-computer design, human brain model, multichip digital computer architecture, real-time brain simulation engine, real-time simulation, spiking Bayesian confidence propagation neural network, Bandwidth, Brain modeling, Computational modeling, Fabrics, Memory management, Neurons, Random access memory
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-132288 (URN); 10.1109/IranianCEE.2013.6599755 (DOI); 000333194300234 (); 2-s2.0-84886879786 (Scopus ID); 978-1-4673-5634-3 (ISBN)
Conference
2013 21st Iranian Conference on Electrical Engineering, ICEE 2013; Mashhad; Iran; 14 May 2013 through 16 May 2013
Note

QC 20140122

Available from: 2013-10-25; Created: 2013-10-25; Last updated: 2016-04-28; Bibliographically approved
9. A scalable custom simulation machine for the Bayesian Confidence Propagation Neural Network model of the brain
2014 (English). In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE, 2014, p. 578-585. Conference paper, Published paper (Refereed)
Abstract [en]

A multi-chip custom digital super-computer called eBrain, for simulating the Bayesian Confidence Propagation Neural Network (BCPNN) model of the human brain, has been proposed. It uses Hybrid Memory Cubes (HMCs), 3D-stacked DRAM memories, to store synaptic weights; these are integrated with a custom-designed logic chip that implements the BCPNN model. At the 22 nm node, eBrain executes BCPNN in real time at 740 TFlops/s while accessing 30 TB of synaptic weights with a bandwidth of 112 TB/s and consuming less than 6 kW of power in the typical case. This efficiency is three orders of magnitude better than that of general-purpose supercomputers in the same technology node.
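
To put the figures in this abstract side by side, the sketch below derives two common figures of merit from them: memory bandwidth per floating-point operation and a lower bound on operations per watt. The constants are taken from the abstract; the variable and function names are illustrative assumptions, and treating "less than 6 kW" as an upper bound on power is an interpretation made here, not a result reported by the paper.

```python
# Figures quoted in the abstract; names are illustrative.
PEAK_FLOPS = 740e12           # 740 TFlops/s
BANDWIDTH_BYTES_PER_S = 112e12  # 112 TB/s
POWER_WATTS_MAX = 6e3         # "less than 6 kW", used as an upper bound


def bytes_per_flop(bandwidth: float, flops: float) -> float:
    """Memory bytes available per floating-point operation."""
    return bandwidth / flops


def min_flops_per_watt(flops: float, max_power: float) -> float:
    """Lower bound on efficiency, since actual power is below max_power."""
    return flops / max_power


if __name__ == "__main__":
    bpf = bytes_per_flop(BANDWIDTH_BYTES_PER_S, PEAK_FLOPS)
    fpw = min_flops_per_watt(PEAK_FLOPS, POWER_WATTS_MAX)
    print(f"{bpf:.3f} bytes per flop")
    print(f"at least {fpw / 1e9:.0f} GFLOPS per watt")
```

Read this way, the quoted numbers correspond to about 0.15 bytes of memory traffic per operation and more than 120 GFLOPS per watt.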

Place, publisher, year, edition, pages
IEEE, 2014
Series
Proceedings of the Asia and South Pacific Design Automation Conference, ASP-DAC
Keyword
3d-stacked drams, Human brain, Hybrid memory, Logic chips, Neural network model, Simulation machine, Synaptic weight, Technology nodes
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-145446 (URN); 10.1109/ASPDAC.2014.6742953 (DOI); 000350791700104 (); 2-s2.0-84897883326 (Scopus ID); 978-147992816-3 (ISBN)
Conference
2014 19th Asia and South Pacific Design Automation Conference, ASP-DAC 2014; Suntec; Singapore; 20 January 2014 through 23 January 2014
Note

QC 20140522

Available from: 2014-05-22; Created: 2014-05-21; Last updated: 2016-04-28; Bibliographically approved

Open Access in DiVA

fulltext (2939 kB), 237 downloads
File information
File name: FULLTEXT01.pdf; File size: 2939 kB; Checksum (SHA-512):
d72ca92209b57767500ab0a9c0bf61205086769ec5d2743a7bd0a42940d1afb5998f83c1ff9a1ca8f15f7238bafac8b82086c42c6562d77662fee0887a2e5ea8
Type: fulltext; Mimetype: application/pdf

Search in DiVA

By author/editor
Farahini, Nasim
By organisation
Electronics and Embedded Systems
Electrical Engineering, Electronic Engineering, Information Engineering
