  • 1.
    Chaourani, Panagiotis
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics, Integrated devices and circuits.
    Stathis, Dimitrios
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Rodriguez, Saul
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Hellström, Per-Erik
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Rusu, Ana
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    A Study on Monolithic 3-D RF/AMS ICs: Placing Digital Blocks Under Inductors. 2018. In: IEEE SOI-3D-Subthreshold Microelectronics Technology Unified Conference (S3S), IEEE conference proceedings, 2018. Conference paper (Refereed)
    Abstract [en]

    The placement of bottom-tier blocks under top-tier inductors could significantly improve the area-efficiency of M3D RF/AMS circuits, paving the way for new applications of this integration technology. This work investigates the potential of placing digital blocks in the bottom tier, underneath top-tier inductors. A design-technology co-optimization flow is applied and a number of design guidelines are suggested. These guidelines ensure high electromagnetic isolation between the two tiers, with minimum penalties on the loading of bottom-tier wires, as well as on the inductor’s performance.

  • 2.
    Jafri, Syed Mohammad Asad Hassan
    et al.
    KTH.
    Hemani, Ahmed
    KTH.
    Stathis, Dimitrios
    KTH.
    Can a reconfigurable architecture beat ASIC as a CNN accelerator? 2018. In: Proceedings - 2017 17th International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation, SAMOS 2017, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 97-104. Conference paper (Refereed)
    Abstract [en]

    To exploit the high accuracy, inherent redundancy, and embarrassingly parallel nature of Convolutional Neural Networks (CNNs) for intelligent embedded systems, many dedicated CNN accelerators have been presented. These accelerators are optimized to employ compression, tiling, and layer merging for a specific data flow/parallelism pattern. However, the dimensions of a CNN differ widely from one application to another (and also from one layer to another). Therefore, the optimal parallelism and data flow pattern also differ significantly across CNN layers. An efficient accelerator should have the flexibility not only to support different data flow patterns efficiently but also to interleave and cascade them. Achieving this flexibility requires configuration overheads. This paper analyzes whether the reconfiguration overheads for interleaving and cascading multiple data flow and parallelism patterns are justified. To answer this question, we first design a reconfigurable CNN accelerator, called ReCon. ReCon is then compared with state-of-the-art accelerators. Post-layout synthesis results reveal that ReCon provides up to 2.2X higher throughput and up to 2.3X better energy efficiency at the cost of 26-35% additional area.

  • 3.
    Stathis, Dimitrios
    KTH, School of Information and Communication Technology (ICT).
    A SystemC model for the eBrain. 2017. Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
    Abstract [en]

    The development of neural networks has become one of the most interesting topics in the scientific community. Systems that are based on brain behavior can find applications in a wide variety of fields, from simulating the brain to better understand it (applications in neuroscience), to control theory and supercomputing. Brain-like systems could possibly be a new kind of computer architecture that will lead us away from the classic von Neumann architecture. That can help us bypass the problems that we now face, with Moore’s law slowing down and complex problems becoming all the more common. With brain-like computing, we might be on the road to computer systems that are no longer programmed but taught. To date, the most common platforms for simulating such systems are GPGPUs and supercomputers. But they lack scalability, and real-time simulations are far from trivial. Because of that, there is an interest in custom hardware implementations of such systems (in ASICs or FPGAs).

    In this work, we focus on the ASIC design of such a system, specifically on the characterization and design space exploration of the eBrain architecture, a hardware architecture for the BCPNN model. During the design process of an ASIC, characterizing it requires simulating the synthesized physical design of the RTL model. Those kinds of simulations require an extensive amount of time. In this thesis, to tackle this problem, a SystemC model of the architecture is developed. This model can be modified to fit different configurations of a general hardware architecture. The SystemC model can be used to reduce the amount of time the simulation requires and, by using back-annotated data from synthesized parts of the hardware architecture, to provide us with accurate characterization of the design. In this work, we go through the basics of the BCPNN and the eBrain architecture. Then we develop a model that can emulate the behavior of the eBrain architecture in a probabilistic manner. A specific configuration is chosen to be explored. Furthermore, floating-point units are synthesized at the physical level in order to be able to back-annotate their power measurements to the model. Moreover, the BCPNN equations are explored and implemented at the RTL level with the use of the floating-point units. Finally, an example configuration is simulated and its results are presented.

  • 4.
    Stathis, Dimitrios
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems.
    Yang, Yu
    KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems.
    Tewari, Saurabh
    IIT Delhi, India.
    Hemani, Ahmed
    KTH, School of Electrical Engineering and Computer Science (EECS), Electrical Engineering, Electronics and Embedded systems, Electronic and embedded systems.
    Paul, Kolin
    IIT Delhi, India.
    Grabherr, Manfred
    Uppsala University, Sweden.
    Ahmad, Rafi
    Inland University of Norway.
    Approximate Computing Applied to Bacterial Genome Identification using Self-Organizing Maps. 2019. In: 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), IEEE Computer Society, 2019, p. 560-567, article id 8839522. Conference paper (Refereed)
    Abstract [en]

    In this paper, we explore the design space of a self-organizing map (SOM) used for rapid and accurate identification of bacterial genomes. This is an important health care problem because even in Europe, 70% of prescriptions for antibiotics are wrong. The SOM is trained on Next Generation Sequencing (NGS) data and is able to identify the exact strain of bacteria. This is in contrast to conventional methods that require genome assembly to identify the bacterial strain. The SOM has been implemented as a synchoros VLSI design and shown to have 3-4 orders of magnitude better computational efficiency compared to GPUs. To further lower the energy consumption, we exploit the robustness of the SOM by successively lowering the resolution to gain further improvements in efficiency and lower the implementation cost without substantially sacrificing the accuracy. We do an in-depth analysis of the reduction in resolution vs. loss in accuracy as the basis for designing a system with the lowest cost and acceptable accuracy, using NGS data from samples containing multiple bacteria from the labs of one of the co-authors. The objective of this method is to design a bacterial recognition system for battery-operated clinical use where area, power and performance are of critical importance. We demonstrate that accepting a 39% loss in accuracy with a 12-bit representation, or a 1% loss with a 16-bit representation, can yield significant savings in energy and area.

  • 5.
    Yang, Y.
    et al.
    KTH.
    Jafri, Syed
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Hemani, Ahmed
    KTH, School of Information and Communication Technology (ICT), Electronics.
    Stathis, Dimitrios
    KTH, School of Information and Communication Technology (ICT), Electronics.
    MTP-caffe: Memory, timing, and power aware tool for mapping CNNs to GPUs. 2017. In: ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), 2017, p. 31-36. Conference paper (Refereed)
    Abstract [en]

    In the recent past, Convolutional Neural Networks (CNNs) have attracted intense research interest. The high processing requirements of CNNs and the availability of efficient mapping tools have made GPUs a popular CNN accelerator. To extract the maximum performance, the mapping tools transform the unsupported convolutions to GPU-supported matrix multiplications. However, this transformation incurs significant memory overheads (3-5X). Furthermore, since the tool is unaware of the GPU architecture, even after the transformation the performance and power are sub-optimal. To tackle this problem, we present MTP-Caffe, which complements Caffe by making it memory, timing, and power aware. It analyses the CNN structure and the GPU architecture to convert a CNN into smaller parts, tailored for GPU resources. Simulation results reveal that MTP-Caffe not only eliminates the additional memory overheads but also provides up to 21% speedup and up to 23.5% less power.

  • 6.
    Yang, Yu
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Stathis, Dimitrios
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Sharma, Prashant
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Paul, K.
    Hemani, Ahmed
    KTH, School of Electrical Engineering and Computer Science (EECS), Electronics.
    Grabherr, M.
    Ahmad, R.
    RiBoSOM: Rapid bacterial genome identification using self-organizing map implemented on the synchoros SiLago platform. 2018. In: ACM International Conference Proceeding Series, Association for Computing Machinery (ACM), 2018, p. 105-114. Conference paper (Refereed)
    Abstract [en]

    Artificial Neural Networks (ANNs) have been applied to many traditional machine learning applications in image and speech processing. More recently, ANNs have caught the attention of the bioinformatics community for their ability not only to speed up identification by avoiding genome assembly but also to work with imperfect data sets containing duplications. ANNs for bioinformatics also have the added attraction of better scaling for massive parallelism compared to traditional bioinformatics algorithms. In this paper, we have adapted Self-organizing Maps for rapid identification of bacterial genomes, in a design called BioSOM. BioSOM has been implemented on a design of two coarse-grain reconfigurable fabrics customized for dense linear algebra and streaming scratchpad memory, respectively. These fabrics are implemented in a novel synchoros VLSI design style that enables composition by abutment. The synchoricity empowers rapid and accurate synthesis from Matlab models to create a near-ASIC-like efficient solution. This platform, called SiLago (Silicon Lego), is benchmarked against a GPU implementation. The SiLago implementations of BioSOMs in four different dimensions (128, 256, 512 and 1024 neurons) were trained for two E. coli strains of bacteria with 40K training vectors. The results of the SiLago implementation were benchmarked against a GPU GTX 1070 implementation in the CUDA framework. The comparison reveals a 4 to 140X speedup and 4 to 5 orders of magnitude improvement in energy-delay product compared to the implementation on the GPU. This extreme efficiency comes with the added benefit of automated generation of a GDSII-level design from Matlab by using the synchoros VLSI design style.
