Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels
Forschungszentrum Jülich GmbH, 52428, Jülich, Germany.
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0001-7296-7817
2024 (English). In: Euro-Par 2024: Parallel Processing - 30th European Conference on Parallel and Distributed Processing, Proceedings, Springer Nature, 2024, p. 47-61. Conference paper, Published paper (Refereed)
Abstract [en]

Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner onto typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, writing them by hand becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It automates the generation of micro-kernels and therefore makes it possible to generate a large range of such kernels. The results provide insights to both micro-architecture architects and micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.

Place, publisher, year, edition, pages
Springer Nature, 2024. p. 47-61
Keywords [en]
assembly generator, dense matrix-matrix multiplication, gem5 simulations, Processor micro-architectures, SIMD/vector instructions
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-353525
DOI: 10.1007/978-3-031-69766-1_4
ISI: 001308370400004
Scopus ID: 2-s2.0-85202745849
OAI: oai:DiVA.org:kth-353525
DiVA, id: diva2:1899200
Conference
30th International Conference on Parallel and Distributed Computing, Euro-Par 2024, August 26-30, 2024, Madrid, Spain
Note

QC 20241023

Available from: 2024-09-19. Created: 2024-09-19. Last updated: 2024-10-23. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records

Pleiter, Dirk
