  • 1. Johnsson, L.
    Netzer, Gilbert
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    The impact of Moore's Law and loss of Dennard scaling: Are DSP SoCs an energy efficient alternative to x86 SoCs?
    2016. In: Journal of Physics, Conference Series, ISSN 1742-6588, E-ISSN 1742-6596, Vol. 762, no 1, article id 012022. Article in journal (Refereed)
    Abstract [en]

    Moore's law, the doubling of transistors per unit area for each CMOS technology generation, is expected to continue throughout the decade, while Dennard voltage scaling, which resulted in constant power per unit area, stopped about a decade ago. The semiconductor industry's response to the loss of Dennard scaling and the consequent challenges in managing power distribution and dissipation has been a leveling off of clock rates, a die performance gain reduced from about a factor of 2.8 to 1.4 per technology generation, and multi-core processor dies with increased cache sizes. Increased cache sizes offer performance benefits for many applications as well as energy savings. Accessing data in cache is considerably more energy efficient than accessing main memory. Further, caches consume less power than a corresponding amount of functional logic. As feature sizes continue to be scaled down, an increasing fraction of the die must be "underutilized" or "dark" due to power constraints. With power being a prime design constraint, there is a concerted effort to find chip architectures significantly more energy efficient than those dominant in servers today, with chips potentially incorporating several types of cores to cover a range of applications, or different functions in an application, as is already common in the mobile processor market. Digital Signal Processors (DSPs), largely targeting the embedded and mobile processor markets, have typically been designed for a power consumption of 10% or less of a typical x86 CPU, yet with much more than 10% of the floating-point capability of x86 CPUs of the same technology generation. Thus, DSPs could potentially offer an energy efficient alternative to x86 CPUs. Here we report an assessment of the Texas Instruments TMS320C6678 DSP with regard to its energy efficiency for two common HPC benchmarks: STREAM (memory system benchmark) and HPL (CPU benchmark).

  • 2.
    Netzer, Gilbert
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Johnsson, Lennart
    University of Houston.
    Ahlin, Daniel
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Instrumentation for accurate energy-to-solution measurements of a Texas Instruments TMS320C6678 digital signal processor and its DDR3 memory
    2014. In: E2SC ’14: Proceedings of the 2nd International Workshop on Energy Efficient Supercomputing, 2014, p. 89-98. Conference paper (Refereed)
    Abstract [en]

    Architectural choices for High-Performance Computing systems have once again become interesting, with energy efficiency for targeted workloads now being a major decision factor. A detailed understanding of the energy consumption of major system components during code execution is critical for evolving architectures towards enhanced energy efficiency. The focus of this paper is on the measurement system hardware and software we designed and implemented for the assessment of the energy-to-solution of HPC workloads for the Texas Instruments TMS320C6678 (6678) Digital Signal Processor. The 6678’s thermal design power falls between x86 server processors and mobile CPUs, and so do its floating-point and memory system capabilities. Yet, compared to those types of processors in corresponding CMOS technology, it offers a potentially significant energy advantage. The measurement system is described together with a thorough error analysis. Measurements are processed out-of-band, minimizing the impact on the measured system. Sample observations of the energy efficiency of the 6678 and its memory system are included for illustration.

  • 3.
    Netzer, Gilbert
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Johnsson, Lennart
    University of Houston.
    Ahlin, Daniel
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Stotzer, Eric
    Texas Instruments.
    Varis, Pekka
    Texas Instruments.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Exploiting DMA for Performance and Energy Optimized STREAM on a DSP
    2014. In: IPDPSW ’14: Proceedings of the 2014 IEEE International Parallel & Distributed Processing Symposium Workshops, 2014, p. 805-814. Conference paper (Refereed)
    Abstract [en]

    Energy efficiency is of major concern in HPC. DSP architectures have the potential to offer highly competitive energy efficiency for applications requiring 64-bit floating-point precision. For STREAM, we achieved 1.47 GB/J energy efficiency and 96% DDR3 memory bandwidth utilization on the Texas Instruments TMS320C6678 DSP by using its DMA engines for prefetching to avoid cache misses, which cause pipeline stalls in the DSP’s cores, and to prevent write-allocate loads, which would significantly reduce performance. The DMA engines were also used to coordinate the DSP’s cores and schedule main memory accesses to improve DDR3 bandwidth utilization. We briefly describe the instrumentation that we designed and implemented for accurate measurement of the core-related, on-chip memory, and DDR3 power consumption and the effectiveness of the DSP’s power saving mechanisms to trade off performance and energy efficiency.

  • 4.
    Netzer, Gilbert
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    QHDL: a Low-Level Circuit Description Language for Quantum Computing
    2023. In: Proceedings of the 20th ACM International Conference on Computing Frontiers 2023, CF 2023, Association for Computing Machinery (ACM), 2023, p. 201-204. Conference paper (Refereed)
    Abstract [en]

    This paper proposes a descriptive language called QHDL, akin to VHDL, to program gate-based quantum computing systems. Unlike other popular quantum programming languages, QHDL targets low-level quantum computing programming and aims to provide a common framework for programming FPGAs and gate-based quantum computing systems. The paper presents an initial implementation and design principles of the QHDL framework, including a compiler and quantum computer simulator. We discuss the challenges of low-level integration of streaming models and quantum computing for programming FPGAs and gate-based quantum computing systems.
