Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC
Université Paris-Saclay, CEA, List, 91120, Palaiseau, France.
Université Paris-Saclay, CEA, List, 91120, Palaiseau, France.
Université Paris-Saclay, CEA, List, 91120, Palaiseau, France.
Jülich Supercomputing Centre, Institute for Advanced Simulation, Forschungszentrum Jülich GmbH, Jülich, Germany.
2024 (English). In: Architecture of Computing Systems - 37th International Conference, ARCS 2024, Proceedings, Springer Nature, 2024, p. 251-265. Conference paper, Published paper (Refereed)
Abstract [en]

The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e. are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, with each processing unit having its own local memory. Therefore, for each processing unit access to local memory regions is faster compared to accessing memory at non-local regions. Architectures with hybrid memory technologies result in further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, while VPSim offers greater simulation speed, with a high-level view of the simulated system. We highlight the performance impact of design trade-offs regarding NUMA node organization and System Level Cache (SLC) group assignment, as well as Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations for different NoC bandwidths shows reduced NoC latency and high memory bandwidth improvement when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the memory bandwidth performance gap and NoC queuing latency impact when comparing local vs. remote memory accesses. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.

Place, publisher, year, edition, pages
Springer Nature, 2024, p. 251-265
Keywords [en]
benchmarking, co-design, High Performance Computing (HPC), Non-Uniform Memory Access (NUMA), simulation
National Category
Computer Systems
Identifiers
URN: urn:nbn:se:kth:diva-352150
DOI: 10.1007/978-3-031-66146-4_17
ISI: 001293533700017
Scopus ID: 2-s2.0-85201001415
OAI: oai:DiVA.org:kth-352150
DiVA, id: diva2:1891388
Conference
37th International Conference on Architecture of Computing Systems, ARCS 2024, Potsdam, Germany, May 14-16, 2024
Note

Part of ISBN: 9783031661457

QC 20241004

Available from: 2024-08-22. Created: 2024-08-22. Last updated: 2024-10-04. Bibliographically approved.

Authority records

Pleiter, Dirk
