Scheduling Persistent and Fully Cooperative Instructions
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems. ORCID iD: 0000-0003-2396-3590
KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems. ORCID iD: 0000-0003-0565-9376
Indian Inst Technol Delhi, Dept Comp Sci & Engn, Delhi, India.
2021 (English). In: 2021 24th Euromicro Conference on Digital System Design (DSD 2021) / [ed] Leporati, F; Vitabile, S; Skavhaug, A. Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 229-237. Conference paper, Published paper (Refereed)
Abstract [en]

A parallel, distributed two-level control system has been adopted in streaming application accelerators that implement atomic vector operations. Each instruction of such an architecture deals with one aspect (arithmetic, interconnect, storage, etc.) of an atomic vector operation. Such instructions are persistent and fully cooperative. Their lifetimes vary with the vector size and the degree of parallelism, and more complex constraints are required to express the cooperation among them. Conventional instruction behavior models are no longer suitable for such instructions. Therefore, we develop a novel instruction behavior model to address the scheduling aspect of the instruction set required by such an architecture. Based on the behavior model, we formally define the scheduling problem and formulate it as a constraint satisfaction optimization problem (CSOP). However, the naive CSOP formulation quickly becomes unscalable. Thus, a heuristic-enhanced scheduling algorithm is introduced to make the CSOP approach scalable. The enhanced algorithm's scalability is validated by a large set of experiments varying in problem size.
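
As an illustration only, the idea of scheduling persistent, cooperating instructions as a constraint satisfaction optimization problem can be sketched with a tiny exhaustive search. This is not the behavior model or the heuristic algorithm of the paper. The instruction names, lifetimes, resource names, constraint kinds (`"eq"`, `"ge"`) and the `schedule` function are all hypothetical, chosen to show that each instruction occupies a resource for its whole lifetime and that cooperation is expressed as timing relations between start times.

```python
def schedule(lifetimes, resources, constraints, horizon):
    """Return (makespan, starts) with minimal makespan, or None if infeasible.

    lifetimes:   {name: cycles the instruction stays active} (hypothetical values)
    resources:   {name: resource occupied during the whole lifetime}
    constraints: (a, b, kind, d) tuples, meaning start[b] - start[a] == d
                 for kind "eq" and start[b] - start[a] >= d for kind "ge"
    horizon:     start times are searched in range(horizon)
    """
    names = list(lifetimes)
    best = None

    def consistent(starts):
        # Check every timing constraint whose two instructions are placed.
        for a, b, kind, d in constraints:
            if a in starts and b in starts:
                gap = starts[b] - starts[a]
                if kind == "eq" and gap != d:
                    return False
                if kind == "ge" and gap < d:
                    return False
        return True

    def overlaps(starts, name, t):
        # A persistent instruction blocks its resource for its full lifetime.
        for other, s in starts.items():
            same = resources[other] == resources[name]
            if same and t < s + lifetimes[other] and s < t + lifetimes[name]:
                return True
        return False

    def search(i, starts, makespan):
        nonlocal best
        # Bound: a partial makespan can only grow, so prune if not better.
        if best is not None and makespan >= best[0]:
            return
        if i == len(names):
            best = (makespan, dict(starts))
            return
        name = names[i]
        for t in range(horizon):
            if overlaps(starts, name, t):
                continue
            starts[name] = t
            if consistent(starts):
                search(i + 1, starts, max(makespan, t + lifetimes[name]))
            del starts[name]

    search(0, {}, 0)
    return best


# Hypothetical vector operation: load -> route -> multiply-accumulate -> store.
lifetimes = {"load": 4, "route": 4, "mac": 4, "store": 2}
resources = {"load": "sram", "route": "noc", "mac": "dpu", "store": "sram"}
constraints = [
    ("load", "route", "eq", 1),   # route must start exactly 1 cycle after load
    ("route", "mac", "eq", 1),    # mac must start exactly 1 cycle after route
    ("mac", "store", "ge", 4),    # store may start once mac has finished
]
result = schedule(lifetimes, resources, constraints, horizon=12)
print(result)
```

The search tries every start time for every instruction, so its cost grows exponentially with the number of instructions. That mirrors the scalability problem the abstract attributes to a naive formulation, and is the reason a heuristic is needed for realistic problem sizes.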

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. p. 229-237
Keywords [en]
Instruction scheduling, CGRA, Two-level control, Constraint programming
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-307013
DOI: 10.1109/DSD53832.2021.00044
ISI: 000728394500035
Scopus ID: 2-s2.0-85125768770
OAI: oai:DiVA.org:kth-307013
DiVA, id: diva2:1626833
Conference
24th Euromicro Conference on Digital System Design (DSD), September 1-3, 2021, Palermo, Italy
Note

Part of proceedings ISBN 978-1-6654-2703-6

Not a duplicate of DiVA 1588102, which has the same title but belongs to a different conference.

QC 20220112

Available from: 2022-01-12 Created: 2022-01-12 Last updated: 2022-09-13. Bibliographically approved
In thesis
1. High-Level Synthesis for SiLago: Advances in Optimization of High-Level Synthesis Tool and Neural Network Algorithms
2022 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Embedded hardware designs and their automation improve energy and engineering efficiency. However, these two goals are often contradictory: attempts to improve energy efficiency often come at the cost of engineering efficiency and vice versa. High-level synthesis (HLS) is a good example of this challenge. It has been researched for more than three decades. Nevertheless, it has not become a mainstream design flow component for custom hardware synthesis because of the large efficiency gap between the HLS-generated hardware design and the manual RTL design.

This thesis attempts to address the HLS challenge. We divide the research challenge of improving state-of-the-art HLS into three components: 1) the hardware architecture and its underlying VLSI design style, 2) the design automation algorithms and data structures, and 3) the optimization of the algorithm to be mapped.

The SiLago hardware platform has been reported as a prominent hardware architecture that can deliver ASIC-like efficiency and could be an ideal HLS hardware platform. It has the following features: 1) SiLago embodies parallel distributed two-level control. 2) SiLago blocks are hardened blocks that can create valid VLSI designs by abutment without involving logic or physical synthesis.

Consequently, when targeting the SiLago hardware platform, the SiLago HLS tool generates not a single controller but multiple collaborative controllers, each of which is a hierarchy of two levels. The distributed two-level control scheme poses unique challenges in synchronization and scheduling tasks. Unique data structures and instruction scheduling models are developed for the SiLago HLS tool to support the distributed two-level control scheme. The SiLago HLS tool also generates a valid GDSII macro whose average energy, area, and performance are not estimated but known with post-layout accuracy thanks to the predictable SiLago hardware blocks. Moreover, the SiLago HLS tool is not intended for the end-user. It is designed to develop a library of algorithm implementations used by the application-level synthesis (ALS) tool in the SiLago framework. The application is defined as a hierarchy of algorithms. This library would include algorithms that vary in their function, dimension, and degree of parallelism. The ALS tool explores the design space in terms of number and type of algorithm implementation, rather than arithmetic resources, as HLS tools do.

Algorithms are often developed by domain experts. For efficient implementation in hardware, such algorithms often need to be optimized with the hardware platform in mind. Two algorithm instances have been chosen for demonstration purposes. The first instance is a self-organizing map (SOM) based genome recognition algorithm. The second instance is a complex model of the cortex called the Bayesian confidence propagation neural network (BCPNN). As developed by computational neuroscientists, the original model demands too much memory storage and memory access.

This thesis addresses the latter two components because the first component has been addressed in the literature. We first demonstrate the design of the SiLago HLS tool to support hardware features such as the distributed two-level control system. We then use the two complex algorithm instances, SOM and BCPNN, to demonstrate both general-purpose and algorithm-specific hardware-oriented algorithm optimization techniques. With the research carried out in this thesis, the SiLago HLS framework is greatly improved.

Abstract [sv] (English translation)

Automation of embedded systems increases engineers' productivity and reduces the systems' energy consumption. These goals are often in conflict, since higher engineering productivity comes at the expense of energy consumption and vice versa. High-level synthesis (HLS) exemplifies the dilemma. Although HLS has been researched for more than three decades, the design methodology has not become mainstream in electronic design because of the large efficiency gap between the HLS-generated and the manually created RTL design.

The thesis addresses this dilemma. We treat the research challenges of improving HLS in three parts: 1) hardware architecture and the underlying VLSI design methodology, 2) the algorithms and data structures of electronic design, and 3) optimization of the algorithm to be implemented.

The SiLago platform has been shown to be a prominent hardware architecture that can achieve ASIC-like performance while also being ideal for HLS. The platform has the following properties: 1) SiLago embodies the parallel distributed two-level control paradigm. 2) SiLago components are pre-synthesized, and functional VLSI designs can be created from them by abutment without further logic or physical synthesis.

Therefore, the SiLago HLS tool creates not a single controller but several cooperating controllers, each of which consists of a two-level hierarchy. This entails unique synchronization and scheduling challenges. Unique data structures and scheduling models have been developed for SiLago HLS to support this two-level control paradigm. In addition, the SiLago HLS tool creates GDSII macros whose average energy consumption, area, and performance are not estimated but determined with post-layout precision thanks to SiLago's pre-synthesized components. The target audience of the SiLago HLS tool is not end users but developers who build algorithm libraries that are then used by SiLago's application-level synthesis (ALS) tool. The application is viewed as a hierarchy of algorithms. The library can contain algorithms whose properties differ, such as different functions, dimensions, and parallelization options. The ALS tool explores the design space in terms of the number and types of algorithms, rather than arithmetic resources, as competing HLS tools do.

Algorithms are often developed by domain experts. To be realized in hardware, they must be optimized with the target platform in mind. Two algorithms have been chosen as demonstration examples. The first example is a gene identification algorithm based on a self-organizing map (SOM). The second example is an advanced model of the cerebral cortex, a Bayesian confidence propagation neural network (BCPNN). Because the model was developed by computational neuroscientists, it requires too much storage and transfer capacity.

The thesis addresses the latter two parts, since the first has already been treated by others. We show how SiLago HLS supports distributed two-level control systems. In addition, using the two algorithm examples mentioned, SOM and BCPNN, we demonstrate algorithm-specific and platform-specific optimization techniques. The research described in the thesis has significantly improved the SiLago HLS framework.

Place, publisher, year, edition, pages
Sweden: KTH Royal Institute of Technology, 2022. p. 76
Series
TRITA-EECS-AVL ; 2022:48
Keywords
Electronic Design Automation (EDA), Computer Aided Design (CAD), Algorithm-level Synthesis, SiLago, Optimization Techniques, Neural Network
National Category
Embedded Systems
Research subject
Information and Communication Technology
Identifiers
URN: urn:nbn:se:kth:diva-317555
ISBN: 978-91-8040-300-9
Public defence
2022-10-06, Ka-Sal C, Electrum, Kungliga Tekniska Högskolan, Kistagången 16, Stockholm, 13:00 (English)
Note

QC 20220914

Available from: 2022-09-14 Created: 2022-09-13 Last updated: 2022-09-14. Bibliographically approved

Authority records

Yang, Yu; Hemani, Ahmed
