Reducing the Configuration Overhead of the Distributed Two-level Control System
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems. ORCID iD: 0000-0003-2396-3590
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems. ORCID iD: 0000-0002-5697-4272
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems. ORCID iD: 0000-0003-0565-9376
2022 (English). In: PROCEEDINGS OF THE 2022 DESIGN, AUTOMATION & TEST IN EUROPE CONFERENCE & EXHIBITION (DATE 2022), IEEE, 2022, p. 104-107. Conference paper, Published paper (Refereed)
Abstract [en]

With the growing demand for more efficient hardware accelerators for streaming applications, a novel Coarse-Grained Reconfigurable Architecture (CGRA) that uses a Distributed Two-Level Control (D2LC) system has been proposed in the literature. Even though its highly distributed and parallel structure makes it fast and energy-efficient, the single-issue instruction channel between the level-1 and level-2 controllers in each D2LC cell becomes the performance bottleneck. In this paper, we improve the design to mimic a multi-issue architecture by inserting shadow instruction buffers between the level-1 and level-2 controllers. Together with a zero-overhead hardware loop, the improved D2LC architecture enables efficient overlap between loop iterations. We also propose a complete constraint-programming-based instruction scheduling algorithm to support these hardware features. Experimental results show that the improved D2LC architecture achieves up to a 25% reduction in instruction execution cycles and up to a 35% reduction in the energy-delay product.
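
As a rough illustration of why overlapping loop iterations lowers the cycle count, the sketch below compares back-to-back execution with execution in which a new iteration starts every few cycles, and computes an energy-delay product (energy multiplied by delay). It is a generic textbook model, not the paper's scheduler; the function names and all numbers are hypothetical and are not taken from the paper's experiments.

```python
def cycles_sequential(iterations: int, body_latency: int) -> int:
    """Cycles when each iteration starts only after the previous one ends."""
    return iterations * body_latency


def cycles_overlapped(iterations: int, body_latency: int,
                      initiation_interval: int) -> int:
    """Cycles when a new iteration starts every `initiation_interval` cycles.

    The last iteration starts at (iterations - 1) * initiation_interval
    and needs `body_latency` more cycles to finish.
    """
    if iterations == 0:
        return 0
    return (iterations - 1) * initiation_interval + body_latency


def energy_delay_product(energy: float, delay: float) -> float:
    """Energy-delay product: energy multiplied by execution time."""
    return energy * delay


def relative_reduction(before: float, after: float) -> float:
    """Fractional reduction from `before` to `after`."""
    return (before - after) / before


# Hypothetical loop: 10 iterations, 4-cycle body, new iteration every 3 cycles.
seq = cycles_sequential(10, 4)
ovl = cycles_overlapped(10, 4, 3)
print(seq, ovl, relative_reduction(seq, ovl))
```

In this model the saving grows with the gap between the body latency and the initiation interval; when the two are equal, overlap gives no benefit.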

Place, publisher, year, edition, pages
IEEE, 2022. p. 104-107
Series
Design Automation and Test in Europe Conference and Exhibition, ISSN 1530-1591
Keywords [en]
Loop acceleration, Instruction scheduling, CGRA, Two-level control, Constraint programming
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
URN: urn:nbn:se:kth:diva-311689
DOI: 10.23919/date54114.2022.9774741
ISI: 000819484300024
Scopus ID: 2-s2.0-85130840236
OAI: oai:DiVA.org:kth-311689
DiVA, id: diva2:1655361
Conference
25th Design, Automation and Test in Europe Conference and Exhibition (DATE), 14-23 March 2022
Note

Part of proceedings: ISBN 978-3-9819263-6-1

QC 20220503

QC 20220121

Available from: 2022-05-02 Created: 2022-05-02 Last updated: 2023-02-21 Bibliographically approved
In thesis
1. Synchoros VLSI Design Style
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Computers have become as essential to everyday life as electricity, communications and transport. That is evident from the amount of electricity we spend to power our computing systems, which some reports estimate at ≈ 7% of total consumption worldwide. This trend is very worrisome, and the development of computing systems with lower power consumption is essential. This is even more important for battery-powered computers deployed in the field. The industry and the scientific community have realised that general-purpose computing platforms cannot offer that level of computational efficiency and that customisation is the solution to this problem. Application-Specific Integrated Circuits (ASICs) provide the highest efficiency among the mainstream implementation styles. ASICs have been shown to provide 100 to 1000× better computational efficiency than general-purpose computing platforms. However, the design cost of ASICs restricts them to products that have a large volume or a large profit margin. In essence, to achieve ASIC-like computational efficiency, design efficiency becomes the bottleneck. Synchoros VLSI design has been proposed to non-incrementally lower the design cost of custom ASIC-like solutions. Synchoros VLSI design is a novel concept that can reduce the design cost of ASICs and their manufacturing. In synchoros design, space is discretised, and the final design emerges by the abutment of synchoros micro-architecture level design objects called SiLago (Silicon Lego) blocks. The SiLago framework has the potential to reduce the design cost of ASICs and their manufacturing. This thesis contributes to three research areas of synchoros VLSI design. The first area concerns composition by abutment. In this contribution, a design is proposed to show how a clock tree can be created by abutting fragments inside the SiLago blocks. Additionally, the clock tree created by abutment was validated with EDA tools, and its cost metrics were compared with those of a functionally equivalent clock tree created by conventional EDA flows. The second area is the enhancement of the micro-architectural framework. These contributions include SiLago blocks tailored for neural network computation and architectural enhancements that improve the efficiency of executing streaming applications in the SiLago framework. Furthermore, a novel genome recognition application based on a self-organising map (SOM) was mapped to the SiLago framework. The third area of contribution is the implementation of a model of cortex as a tiled ASIC design using custom 3D DRAM vaults for synaptic storage. This is preparatory work to identify the SiLago blocks needed to support the implementation of spiking neuromorphic structures and, more generally, applications of ordinary differential equations.

Abstract [sv] (translated to English)

Computers have become as indispensable to everyday life as electricity, communications and transport. This is confirmed by the amount of electricity we spend on our computers. Some reports estimate it at as much as ≈ 7% of the world's total electricity demand. This development is very worrying, and it is of the utmost importance that we develop more energy-efficient computers. This is even more important for battery-powered computers. Both industry and the research community have realised that standardised computers and platforms cannot offer the same performance and energy efficiency as computing systems purpose-built for specific applications. Application-Specific Integrated Circuits (ASICs) are what is most often used when the goal is the highest performance. They have been shown to achieve 100 to 1000 times higher performance than standardised computing systems. The drawback of ASICs is that their development costs are very high, so they can only be used in products that are mass-produced or that have a high profit margin. Development costs have thus become the bottleneck and the main obstacle on the way to ASIC-like performance. To get around this bottleneck and drastically reduce the development costs of ASIC-like circuits, the design method synchoros very large-scale integration (VLSI) has been proposed. Synchoros VLSI is a new concept that can reduce the development and production costs of ASICs. The method discretises space so that the design emerges by joining together smaller, synchoros SiLago (Silicon Lego) components. SiLago and its framework can potentially reduce the development and production costs of ASICs. This thesis contributes to three research areas within synchoros VLSI. The first area concerns design by abutment. Here, a design is proposed for a clock tree that is created by abutting SiLago components. The clock tree is verified with electronic design automation (EDA) tools, and its performance is compared with that of a clock tree created with a conventional EDA tool. The second area concerns how the SiLago components can be improved. The contributions in this area describe SiLago components for neural networks and for improved performance when executing streaming applications. In addition, an interesting gene identification system based on a self-organising map is designed for SiLago. The third contribution is a model of the cerebral cortex implemented as a tiled ASIC circuit on a custom-built three-dimensional DRAM vault for synaptic storage. This contribution is preparatory and investigates what is required to implement spiking neuromorphic structures and ordinary differential equations in general.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2022. p. ix, 61
Series
TRITA-EECS-AVL ; 2022:30
Keywords
VLSI, ASIC, CGRA, hardware architectures, synchoros VLSI, SiLago, eBrain, BCPNN
National Category
Other Electrical Engineering, Electronic Engineering, Information Engineering Embedded Systems
Research subject
Electrical Engineering
Identifiers
urn:nbn:se:kth:diva-311977 (URN)
978-91-8040-214-9 (ISBN)
Public defence
2022-05-27, https://kth-se.zoom.us/j/66321917120, Ka-Sal C, Electrum, Kungliga Tekniska Högskolan, Kistagången 16, Kista, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20220506

Available from: 2022-05-06 Created: 2022-05-06 Last updated: 2022-06-25 Bibliographically approved
2. High-Level Synthesis for SiLago: Advances in Optimization of High-Level Synthesis Tool and Neural Network Algorithms
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Embedded hardware designs and their automation improve energy and engineering efficiency. However, these two goals are often contradictory. The attempts to improve energy efficiency often come at the cost of engineering efficiency and vice-versa. High-level synthesis (HLS) is a good example of this challenge. It has been researched for more than three decades. Nevertheless, it has not become a mainstream design flow component concerning custom hardware synthesis due to the big efficiency gap between the HLS-generated hardware design and the manual RTL design.

This thesis attempts to address the HLS challenge. We divide the research challenge of improving state-of-the-art HLS into three components: 1) the hardware architecture and its underlying VLSI design style, 2) the design automation algorithms and data structures, and 3) the optimization of the algorithm to be mapped.

The SiLago hardware platform has been reported as a prominent hardware architecture that can deliver ASIC-like efficiency and could be an ideal HLS hardware platform. It has the following features: 1) SiLago embodies parallel distributed two-level control. 2) SiLago blocks are hardened blocks that can create valid VLSI designs by abutment without involving logic or physical synthesis.

Consequently, when targeting the SiLago hardware platform, the SiLago HLS tool generates not a single controller but multiple collaborative controllers, each of which is a hierarchy of two levels. The distributed two-level control scheme poses unique challenges in synchronization and scheduling tasks. Unique data structures and instruction scheduling models are developed for the SiLago HLS tool to support the distributed two-level control scheme. The SiLago HLS tool also generates a valid GDSII macro whose average energy, area, and performance are not estimated but known with post-layout accuracy thanks to the predictable SiLago hardware blocks. Moreover, the SiLago HLS tool is not intended for the end-user. It is designed to develop a library of algorithm implementations used by the application-level synthesis (ALS) tool in the SiLago framework. The application is defined as a hierarchy of algorithms. This library would include algorithms that vary in their function, dimension, and degree of parallelism. The ALS tool explores the design space in terms of number and type of algorithm implementation, rather than arithmetic resources, as HLS tools do.

Algorithms are often developed by domain experts. For efficient implementation in hardware, such algorithms often need to be optimized with the hardware platform in mind. Two algorithm instances have been chosen for demonstration purposes. The first instance is a self-organizing map (SOM) based genome recognition algorithm. The second example concerns a complex model of cortex called Bayesian confidence propagation neural network (BCPNN). As developed by computational neuroscientists, the original model demands too much memory storage and memory access.

This thesis addresses the latter two components because the first component has been addressed in the literature. We first demonstrate the design of the SiLago HLS tool, which supports hardware features such as the distributed two-level control system. We then use the two complex algorithm instances, SOM and BCPNN, to demonstrate both general-purpose and algorithm-specific hardware-oriented algorithm optimization techniques. With the research carried out in this thesis, the SiLago HLS framework is greatly improved.

Abstract [sv] (translated to English)

The automation of embedded systems increases engineers' productivity and reduces the systems' energy consumption. These goals are often conflicting, since higher engineering productivity comes at the expense of energy consumption and vice versa. High-level synthesis (HLS) exemplifies the dilemma. Although HLS has been researched for more than three decades, the design methodology has not become mainstream in electronic design because of the large efficiency gap between the HLS-generated and the manually created RTL design.

The thesis addresses this dilemma. We treat the research challenges of improving HLS in three parts: 1) the hardware architecture and the underlying VLSI design methodology, 2) the algorithms and data structures of electronic design automation, and 3) the optimization of the algorithm to be implemented.

The SiLago platform has been shown to be a prominent hardware architecture that can achieve ASIC-like performance while also being ideal for HLS. The platform has the following properties: 1) SiLago embodies the parallel distributed two-level control paradigm, 2) SiLago components are pre-synthesized, and functional VLSI designs can be created from them by abutment without further logic or physical synthesis.

Therefore, the SiLago HLS tool does not create a single controller but several collaborating controllers. Each of these consists of a two-level hierarchy. This entails unique synchronization and scheduling challenges. Unique data structures and scheduling models have been developed for SiLago HLS to support this two-level control paradigm. In addition, the SiLago HLS tool creates GDSII macros whose average energy consumption, area, and performance are not estimated but determined with post-layout precision thanks to SiLago's pre-synthesized components. The target audience of the SiLago HLS tool is not end users but developers who build algorithm libraries that are then used by SiLago's application-level synthesis (ALS) tool. The application is viewed as a hierarchy of algorithms. The library can contain algorithms that differ in their properties, such as function, dimension, and degree of parallelism. The ALS tool explores the design space in terms of the number and type of algorithms, rather than arithmetic resources, as competing HLS tools do.

Algorithms are often developed by domain experts. To be realized in hardware, they must be optimized with the target platform in mind. Two algorithms have been chosen as demonstration examples. The first example is a gene identification algorithm based on a self-organizing map (SOM). The second example is an advanced model of the cerebral cortex, a Bayesian confidence propagation neural network (BCPNN). Because the model was developed by computational neuroscientists, it demands too much storage and transfer capacity.

The thesis deals with the latter two parts, since the first has already been addressed by others. We show how SiLago HLS supports distributed two-level control systems. In addition, using the two algorithm examples mentioned, SOM and BCPNN, we demonstrate algorithm-specific and platform-specific optimization techniques. The research described in the thesis has significantly improved the SiLago HLS framework.

Place, publisher, year, edition, pages
Sweden: KTH Royal Institute of Technology, 2022. p. 76
Series
TRITA-EECS-AVL ; 2022:48
Keywords
Electronic Design Automation (EDA), Computer Aided Design (CAD), Algorithm-level Synthesis, SiLago, Optimization Techniques, Neural Network
National Category
Embedded Systems
Research subject
Information and Communication Technology
Identifiers
urn:nbn:se:kth:diva-317555 (URN)
978-91-8040-300-9 (ISBN)
Public defence
2022-10-06, Ka-Sal C, Electrum, Kungliga Tekniska Högskolan, Kistagången 16, Stockholm, 13:00 (English)
Opponent
Supervisors
Note

QC 20220914

Available from: 2022-09-14 Created: 2022-09-13 Last updated: 2022-09-14 Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Yang, Yu; Stathis, Dimitrios; Hemani, Ahmed
