Change search
Refine search result
1 - 19 of 19
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Rows per page
  • 5
  • 10
  • 20
  • 50
  • 100
  • 250
Sort
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
  • Standard (Relevance)
  • Author A-Ö
  • Author Ö-A
  • Title A-Ö
  • Title Ö-A
  • Publication type A-Ö
  • Publication type Ö-A
  • Issued (Oldest first)
  • Issued (Newest first)
  • Created (Oldest first)
  • Created (Newest first)
  • Last updated (Oldest first)
  • Last updated (Newest first)
  • Disputation date (earliest first)
  • Disputation date (latest first)
Select
The maximal number of hits you can export is 250. When you want to export more records please use the 'Create feeds' function.
  • 1. Azad, S. P.
    et al.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Customization methodology of a Coarse Grained Reconfigurable architecture2015In: NORCHIP 2014 - 32nd NORCHIP Conference: The Nordic Microelectronics Event, 2015Conference paper (Refereed)
    Abstract [en]

    Mapping algorithms on CGRAs can lead to an inefficient implementation and hardware under-utilization if there is a mismatch between the granularity of reconfigurable processing unit and the algorithm. In this paper, we introduce a tool that takes the hardware configuration of a set of applications, identifies the unused parts of the CGRA, and let the user sweep the design space from fully programmable to fully customized by eliminating the unused components. User can select among multiple design points according to the application specification. This method is very useful to design multi-mode ASIC accelerators. The fully customized hardware generated using our tool has a negligible area and power overhead compared to the equivalent ASIC but can be generated significantly faster.

  • 2.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    SiLago: Enabling System Level Automation Methodology to Design Custom High-Performance Computing Platforms: Toward Next Generation Hardware Synthesis Methodologies2016Doctoral thesis, comprehensive summary (Other academic)
  • 3.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    A conceptual custom super-computer design for real-time simulation of human brain2013In: 2013 21st Iranian Conference on Electrical Engineering, ICEE 2013, 2013, p. 1-6Conference paper (Refereed)
    Abstract [en]

    In this paper, we introduce BRIC, a novel custom multi-chip digital computer architecture for simulating in realtime a model of human brain in form of a spiking Bayesian Confidence Propagation Neural Network (BCPNN). The design is conceptually dimensioned for available technology in 2015-2020 with the estimated size of a pizza box, consuming less than 3 kWs of power, delivering 800 Teraflops/sec (single precision multiply operation) and 30 TBs of memory. To the best of our knowledge, this will be the smallest and lowest power real-time brain simulation engine if manufactured. The silicon and computational efficiencies come from use of 3D memory stacking, innovation in algorithm and architectural customization. The chip will be programmable allowing experimentation with variants of the BCPNN brain model.

  • 4.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Atomic stream computation unit based on micro-thread level parallelism2015In: IEEE 26th Application-specific Systems, Architectures and Processors (ASAP) 2015, IEEE , 2015, p. 25-29Conference paper (Refereed)
    Abstract [en]

    The increasing demand for higher resolution of images and communication bandwidth requires the streaming applications to deal with ever increasing size of datasets. Further, with technology scaling the cost of moving data is reducing at a slower pace compared to the cost of computing. These trends have motivated the proposed micro-architectural reorganization of stream processors by dividing the stream computation into functional computation, address constraints computation and address generation and deploying independent, distributed micro-threads to implement them. This scheme is an alternative to parallelizing them at instruction level. The proposed scheme has two benefits: a more efficient sequencer logic and energy savings in address generation and transportation. These benefits are quantified for a set of streaming applications and show average percentage improvement of 39 in silicon efficiency of the sequencer logic and 23 in total computational efficiency.

  • 5.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Jafri, S. M. A. H.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    SiLago: A Structured Layout Scheme to Enable Efficient High Level and System Level Synthesis2016Report (Other academic)
  • 6.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Lansner, Anders
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Clermidy, F.
    Svensson, C.
    A scalable custom simulation machine for the Bayesian Confidence Propagation Neural Network model of the brain2014In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE , 2014, p. 578-585Conference paper (Refereed)
    Abstract [en]

    A multi-chip custom digital super-computer called eBrain for simulating Bayesian Confidence Propagation Neural Network (BCPNN) model of the human brain has been proposed. It uses Hybrid Memory Cube (HMC), the 3D stacked DRAM memories for storing synaptic weights that are integrated with a custom designed logic chip that implements the BCPNN model. In 22nm node, eBrain executes BCPNN in real time with 740 TFlops/s while accessing 30 TBs synaptic weights with a bandwidth of 112 TBs/s while consuming less than 6 kWs power for the typical case. This efficiency is three orders better than general purpose supercomputers in the same technology node.

  • 7.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Indian Institute of Technology, Delhi, India.
    Distributed Runtime Computation of Constraints for Multiple Inner Loops2013In: Proceedings - 16th Euromicro Conference on Digital System Design, DSD 2013, New York: IEEE , 2013, p. 389-395Conference paper (Refereed)
    Abstract [en]

    This paper presents hardware solution for runtime computation of loop constraints and synchronizing delays for multiple inner loops in parallel distributed implementation of digital signal processing sub-systems. Methods to map and generate the runtime computation code for loop constraints and synchronizing delays are also presented. Compared to the traditional methods, the proposed solution achieves 55% average code compaction and 32.7% average performance improvement. The solution has modest hardware cost that increases linearly with the dimension of the architecture and has no performance penalty. Results from multiple realistic examples are presented, analyzed and compared to the traditional methods.

  • 8.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    AlgoSil: A High Level Synthesis Tool targeting Micro-architecture Level Physical Design Platform2016Report (Other academic)
  • 9.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Jafri, Syed M. A. H.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tajammul, Muhammad Adeel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Parallel distributed scalable runtime address generation scheme for a coarse grain reconfigurable computation and storage fabric2014In: Microprocessors and microsystems, ISSN 0141-9331, E-ISSN 1872-9436, Vol. 38, no 8, p. 788-802Article in journal (Refereed)
    Abstract [en]

    This paper presents a hardware based solution for a scalable runtime address generation scheme for DSP applications mapped to a parallel distributed coarse grain reconfigurable computation and storage fabric. The scheme can also deal with non-affine functions of multiple variables that typically correspond to multiple nested loops. The key innovation is the judicious use of two categories of address generation resources. The first category of resource is the low cost AGU that generates addresses for given address bounds for affine functions of up to two variables. Such low cost AGUs are distributed and associated with every read/write port in the distributed memory architecture. The second category of resource is relatively more complex but is also distributed but shared among a few storage units and is capable of handling more complex address generation requirements like dynamic computation of address bounds that are then used to configure the AGUs, transformation of non-affine functions to affine function by computing the affine factor outside the loop, etc. The runtime computation of the address constraints results in negligibly small overhead in latency, area and energy while it provides substantial reduction in program storage, reconfiguration agility and energy compared to the prevalent pre-computation of address constraints. The efficacy of the proposed method has been validated against the prevalent address generation schemes for a set of six realistic DSP functions. Compared to the pre-computation method, the proposed solution achieved 75% average code compaction and compared to the centralized runtime address generation scheme, the proposed solution achieved 32.7% average performance improvement.

  • 10.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Li, Shuo
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Physical Design Aware System Level Synthesis of Hardware2015In: Proceedings - Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS), 2015, IEEE , 2015, p. 141-148Conference paper (Refereed)
    Abstract [en]

    In spite of decades of research, only a small percentage of hardware is designed using high-level synthesis because of the large gap between the abstraction levels of standard cells and algorithmic level. We propose a grid-based regular physical design platform composed of large grain hardened building blocks called SiLago blocks. This platform is divided into regions which are specialized for different functionalities like computation, storage, system control, etc. The characterized micro-architectural operations of the SiLago platform serve as the interface to meet-in-the-middle high-level and system-level syntheses framework. This framework was used to generate three hardware macro instances, derived from SiLago platform for three applications from signal processing domain. Results show two orders of magnitude improvements in efficiency of the system-level design space exploration and synthesis time, with average loss in design quality of 18% for energy and 54% for area compared to the commercial SOC flow.

  • 11.
    Farahini, Nasim
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Li, Shuo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Tajammul, Muhammad Adeel
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Shami, Muhammad Ali
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Chen, Guo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Ye, Wei
    Huawei, Wireless Beijing Division, China.
    39.9 GOPs/watt multi-mode CGRA accelerator for a multi-standard basestation2013In: 2013 IEEE International Symposium on Circuits and Systems (ISCAS), IEEE , 2013, p. 1448-1451Conference paper (Refereed)
    Abstract [en]

    This paper presents an industrial case study of using a Coarse Grain Reconfigurable Architecture (CGRA) for a multi-mode accelerator for two kernels: FFT for the LTE standard and the Correlation Pool for the UMTS standard to be executed in a mutually exclusive manner. The CGRA multi-mode accelerator achieved computational efficiency of 39.94 GOPS/watt (OP is multiply-add) and silicon efficiency of 56.20 GOPS/mm2. By analyzing the code and inferring the unused features of the fully programmable solution, an in-house developed tool was used to automatically customize the design to run just the two kernels and the two efficiency metrics improved to 49.05 GOPS/watt and 107.57 GOPS/mm2. Corresponding numbers for the ASIC implementation are 63.84 GOPS/watt and 90.91 GOPS/mm2. Though the ASIC’s silicon and computational efficiency numbers are slightly better, the engineering efficiency of the pre-verified/characterized CGRA solution is at least 10X better than the ASIC solution.

  • 12.
    Hemani, Ahmed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Farahini, Nasim
    Jafri, Syed
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Sohofi, Hassan
    KTH, School of Information and Communication Technology (ICT).
    Li, Shuo
    KTH, School of Information and Communication Technology (ICT).
    Paul, K.
    The silago solution: Architecture and design methods for a heterogeneous dark silicon aware coarse grain reconfigurable fabric2017In: The Dark Side of Silicon: Energy Efficient Computing in the Dark Silicon Era, Springer, 2017, p. 47-94Chapter in book (Refereed)
    Abstract [en]

    The dark silicon constraint will restrict the VLSI designers to utilize an increasingly smaller percentage of transistors as we progress deeper into nano-scale regime because of the power delivery and thermal dissipation limits. The best way to deal with the dark silicon constraint is to use the transistors that can be turned on as efficiently as possible. Inspired by this rationale, the VLSI design community has adopted customization as the principal means to address the dark silicon constraint. Two categories of customization, often in tandem have been adopted by the community. The first is the processors that are heterogeneous in functionality and/or have ability to more efficiently match varying functionalities and runtime load. The second category of customization is based on the fact that hardware implementations often offer 2–6 orders more efficiency compared to software. For this reason, designers isolate the power and performance critical functionality and map them to custom hardware implementations called accelerators. Both these categories of customizations are partial in being compute centric and still implement the bulk of functionality in the inefficient software style. In this chapter, we propose a contrarian approach: implement the bulk of functionality in hardware style and only retain control intensive and flexibility critical functionality in small simple processors that we call flexilators. We propose using a micro-architecture level coarse grain reconfigurable fabric as the alternative to the Boolean level standard cells and LUTs of the FPGAs as the basis for dynamically reconfigurable hardware implementation. This coarse grain reconfigurable fabric allows dynamic creation of arbitrarily wide and deep datapath with their hierarchical control that can be coupled with a cluster of storage resources to create private execution partitions that host individual applications. Multiple such partitions can be created that can operate at different voltage frequency operating points. Unused resources can be put into a range of low power modes. This CGRA fabric allows not just compute centric customization but also interconnect, control, storage and access to storage can be customized. The customization is not only possible at compile/build time but also at runtime to match the available resources and runtime load conditions. This complete, micro-architecture level hardware centric customization overcomes the limitations of partial compute centric customization offered by the state-of-the-art accelerator-rich heterogeneous multi-processor implementation style by extracting more functionality and performance from the limited number of transistors that can be turned on. Besides offering complete and more effective customization and a hardware centric implementation style, we also propose a methodology that dramatically reduces the cost of customization. This methodology is based on a concept called SiLago (Silicon Large Grain Objects) method. The core idea behind the SiLago method is to use large grain micro-architecture level hardened and characterized blocks, the SiLago blocks, as the atomic physical design building blocks and a grid based structured layout scheme that enables composition of the SiLago fabric simply by abutting the blocks to produce a timing and DRC clean GDSII design. Effectively, the SiLago method raises the abstraction of the physical design to micro-architectural level from the present Boolean level standard cell and LUT based physical design. This significantly improves the efficiency and predictability of synthesis from higher levels of abstraction. In addition, it also enables true system-level synthesis that by virtue of correct-by-construction guarantee eliminates the costly functional verification step. The proposed solution allows a fully customized design with dynamic fine grain power management to be automatically generated from Simulink down to GDSII with computational and silicon efficiencies that are modestly lower than ASIC. The micro-architecture level SiLago block based design process with correct by construction guarantee is 5–6 orders more efficient and 2 orders more accurate compared to the Boolean standard cell based design flows.

  • 13.
    Jafri, Syed
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics.
    SiLago-CoG: Coarse-Grained Grid-Based Design for Near Tape-Out Power Estimation Accuracy at High Level2017In: 2017 IEEE Computer Society Annual Symposium on VLSI ISVLSI 2017: 3-5 July 2017, Bochum, North Rhine-Westfalia, Germany : proceedings, IEEE Computer Society, 2017, Vol. 2017, p. 25-31, article id 7987490Conference paper (Refereed)
    Abstract [en]

    It is well known that ASICs have orders of magnitude higher power efficiency than general propose processors. However, due to the high engineering and manufacturing cost only handful of companies can afford to design ASICs. To reduce this cost numerous high-level synthesis tools have emerged since last 2-3 decades. In spite of these tools, ASIC design is still considered expensive because they fail to accurately predict the cost metrics. The inaccuracy is costly as it results in multiple iterations between RTL, logic synthesis, and physical design. The major reason behind this inaccuracy, at high level, is unavailability of information like wiring, orientation, and placement of hardware blocks. To tackle this issue, recent works have proposed to raise the abstraction of the physical design from standard cells to micro-architectural blocks physically organized in a structured grid based layout scheme. While these works have been successful in accurately predicting area and timing, to the best of our knowledge their effectiveness in accurately estimating power is yet to be determined. SiLago-CoG provides an efficient technique to characterize these blocks and estimate power at high level. Simulation and synthesis results reveal that SiLago-CoG provides up to 15X better power estimates in 680X less time at the cost of up to 50% additional area, compared to state-of-the-art.

  • 14.
    Jafri, Syed. M. A. H.
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bag, Ozan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Kolin, Paul
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware Coarse-Grained Reconfigurable Architectures using Dynamically Reconfigurable Isolation Cells2013In: Proceedings Of The Fourteenth International Symposium On Quality Electronic Design (ISQED 2013), 2013, p. 104-111Conference paper (Refereed)
    Abstract [en]

    This paper presents a self adaptive architecture to enhance the energy efficiency of coarse-grained reconfigurable architectures (CGRAs). Today, platforms host multiple applications, with arbitrary inter-application communication and concurrency patterns. Each application itself can have multiple versions (implementations with different degree of parallelism) and the optimal version can only be determined at runtime. For such scenarios, traditional worst case designs and compile time mapping decisions are neither optimal nor desirable. Existing solutions to this problem employ costly dedicated hardware to configure the operating point at runtime (using DVFS). As an alternative to dedicated hardware, we propose exploiting the reconfiguration features of modern CGRAs. Our solution relies on dynamically reconfigurable isolation cells (DRICs) and autonomous parallelism, voltage, and frequency selection algorithm (APVFS). The DRICs reduce the overheads of DVFS circuitry by configuring the existing resources as isolation cells. APVFS ensures high efficiency by dynamically selecting the parallelism, voltage and frequency trio, which consumes minimum power to meet the deadlines on available resources. Simulation results using representative applications (Matrix multiplication, FIR, and FFT) showed up to 23% and 51% reduction in power and energy, respectively, compared to traditional DVFS designs. Synthesis results have confirmed significant reduction in area overheads compared to state of the art DVFS methods.

  • 15. Jafri, Syed M. A. H.
    et al.
    Ozbag, Ozan
    KTH.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Paul, Kolin
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics and Embedded Systems.
    Plosila, Juha
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Industrial and Medical Electronics.
    Architecture and Implementation of Dynamic Parallelism, Voltage and Frequency Scaling (PVFS) on CGRAs2015In: ACM Journal on Emerging Technologies in Computing Systems, ISSN 1550-4832, Vol. 11, no 4, article id 40Article in journal (Refereed)
    Abstract [en]

    In the era of platforms hosting multiple applications with arbitrary performance requirements, providing a worst-case platform-wide voltage/frequency operating point is neither optimal nor desirable. As a solution to this problem, designs commonly employ dynamic voltage and frequency scaling (DVFS). DVFS promises significant energy and power reductions by providing each application with the operating point (and hence the performance) tailored to its needs. To further enhance the optimization potential, recent works interleave dynamic parallelism with conventional DVFS. The induced parallelism results in performance gains that allow an application to lower its operating point even further (thereby saving energy and power consumption). However, the existing works employ costly dedicated hardware (for synchronization) and rely solely on greedy algorithms to make parallelism decisions. To efficiently integrate parallelism with DVFS, compared to state-of-the-art, we exploit the reconfiguration (to reduce DVFS synchronization overheads) and enhance the intelligence of the greedy algorithm (to make optimal parallelism decisions). Specifically, our solution relies on dynamically reconfigurable isolation cells and an autonomous parallelism, voltage, and frequency selection algorithm. The dynamically reconfigurable isolation cells reduce the area overheads of DVFS circuitry by configuring the existing resources to provide synchronization. The autonomous parallelism, voltage, and frequency selection algorithm ensures high power efficiency by combining parallelism with DVFS. It selects that parallelism, voltage, and frequency trio which consumes minimum power to meet the deadlines on available resources. Synthesis and simulation results using various applications/algorithms (WLAN, MPEG4, FFT, FIR, matrix multiplication) show that our solution promises significant reduction in area and power consumption (23% and 51%) compared to state-of-the-art.

  • 16.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Bag, Ozan
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Paul, Kolin
    Plosila, Juha
    University of Turku, Finland.
    Tenhunen, Hannu
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Energy-Aware CGRAs using Dynamically Re-configurable isolation Cells2013Conference paper (Refereed)
    Abstract [en]

    This paper presents a self adaptive architectureto enhance the energy efficiency of coarse-grained reconfigurablearchitectures (CGRAs). Today, platforms host multipleapplications, with arbitrary inter-application communication andconcurrency patterns. Each application itself can have multipleversions (implementations with different degree of parallelism)and the optimal version can only be determined at runtime. Forsuch scenarios, traditional worst case designs and compile timemapping decisions are neither optimal nor desirable. Existingsolutions to this problem employ costly dedicated hardware toconfigure the operating point at runtime (using DVFS). As analternative to dedicated hardware, we propose exploiting thereconfiguration features of modern CGRAs. Our solution relieson dynamically reconfigurable isolation cells (DRICs) and autonomousparallelism, voltage, and frequency selection algorithm(APVFS). The DRICs reduce the overheads of DVFS circuitryby configuring the existing resources as isolation cells. APVFSensures high efficiency by dynamically selecting the parallelism,voltage and frequency trio, which consumes minimum powerto meet the deadlines on available resources. Simulation resultsusing representative applications (Matrix multiplication, FIR,and FFT) showed up to 23% and 51% reduction in powerand energy, respectively, compared to traditional DVFS designs.Synthesis results have confirmed significant reduction in areaoverheads compared to state of the art DVFS methods.

  • 17.
    Lansner, Anders
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Biology, CB.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Spiking brain models: Computation, memory and communication constraints for custom hardware implementation2014In: 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC), IEEE , 2014, p. 556-562Conference paper (Refereed)
    Abstract [en]

    We estimate the computational capacity required to simulate in real time the neural information processing in the human brain. We show that the computational demands of a detailed implementation are beyond reach of current technology, but that some biologically plausible reductions of problem complexity can give performance gains between two and six orders of magnitude, which put implementations within reach of tomorrow's technology.

  • 18.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Global control and storage synthesis for a system level synthesis approach2013In: Proceedings - 21st Annual International IEEE Symposium on Field-Programmable Custom Computing Machines, FCCM 2013, IEEE , 2013, p. 6546036-Conference paper (Refereed)
    Abstract [en]

    SYLVA is a System Level Architectural Synthesis Framework that translates Synchronous Data Flow (SDF) models of DSP sub-systems like modems and codecs into hardware implementation in ASIC/Standard Cells, FPGAs or CGRAs (Coarse Grain Reconfigurable Fabric).

  • 19.
    Li, Shuo
    et al.
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Farahini, Nasim
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Rosvall, Kathrin
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    Sander, Ingo
    KTH, School of Information and Communication Technology (ICT), Electronic Systems.
    System level synthesis of hardware for DSP applications using pre-characterized function implementations2013In: 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS), IEEE , 2013Conference paper (Refereed)
    Abstract [en]

    SYLVA is a system level synthesis framework that transforms DSP sub-systems modeled as synchronous data flow into hardware implementations in ASIC, FPGAs or CGRAs. SYLVA synthesizes in terms of pre-characterized function implementations (FTMPs). It explores the design space in three dimensions, number of FTMPs, type of FTMPs and pipeline parallelism between the producing and consuming FTMPs. We introduce timing and interface model of FTMPs to enable reuse and automatic generation of Global Interconnect and Control (GLIC) to glue the FTMPs together into a working system. SYLVA has been evaluated by applying it to five realistic DSP applications and results analyzed for design space exploration, efficacy in generating GLIC by comparing to manually generated GLIC and accuracy of design space exploration by comparing the area and energy costs considered during the design space exploration based on pre-characterized FIMPs and the final results.

1 - 19 of 19
CiteExportLink to result list
Permanent link
Cite
Citation style
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf