Publications (10 of 20)
Bouvry, P., Brorsson, M., Canal, R., Eftekhari, A., Höfinger, S., Smets, D., . . . Silvano, C. (2025). The European master for HPC curriculum. Journal of Parallel and Distributed Computing, 201, Article ID 105081.
2025 (English) In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 201, article id 105081. Article in journal (Refereed) Published
Abstract [en]

The use of High-Performance Computing (HPC) is crucial for addressing various grand challenges. While significant investments are made in digital infrastructures that comprise HPC resources, their realisation, operation, and, in particular, their use critically depend on suitably trained experts. In this paper, we present the results of an effort to design and implement a pan-European reference curriculum for a master's degree in HPC.

Keywords
Computing education, High-performance computing, Master in HPC, Model curricula
National Category
Identifiers
urn:nbn:se:kth:diva-362519 (URN), 10.1016/j.jpdc.2025.105081 (DOI), 001466093600001 (ISI), 2-s2.0-105001852460 (Scopus ID)
Note

QC 20250422

Available from: 2025-04-16 Created: 2025-04-16 Last updated: 2025-05-22 Bibliographically approved
Pichetti, L., De Sensi, D., Sivalingam, K., Nassyr, S., Cesarini, D., Turisini, M., . . . Vella, F. (2024). Benchmarking Ethernet Interconnect for HPC/AI workloads. In: Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at 2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, United States of America, Nov 17 2024 - Nov 22 2024 (pp. 869-875). Institute of Electrical and Electronics Engineers (IEEE)
2024 (English) In: Proceedings of SC 2024-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 869-875. Conference paper, Published paper (Refereed)
Abstract [en]

Interconnects have always played a cornerstone role in HPC. Since the inception of the Top500 ranking, interconnect statistics have been dominated by two competing technologies: InfiniBand and Ethernet. However, even if Ethernet is very popular due to versatility and cost-effectiveness, InfiniBand used to provide higher bandwidth and continues to feature lower latency. Industry seeks a further evolution of the Ethernet standards to enable fast and low-latency interconnect for emerging AI workloads by offering competitive, open-standard solutions. This paper analyzes the early results obtained from two systems relying on an HPC Ethernet interconnect, one relying on 100G and the other on 200G Ethernet. Preliminary findings indicate that the Ethernet-based networks exhibit competitive performance, closely aligning with InfiniBand, especially for large message exchanges.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
ethernet, gigabit ethernet, hpc/ai workloads, infiniband, interconnect
National Category
Identifiers
urn:nbn:se:kth:diva-360171 (URN), 10.1109/SCW63240.2024.00124 (DOI), 2-s2.0-85217184791 (Scopus ID)
Conference
2024 Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC Workshops 2024, Atlanta, United States of America, Nov 17 2024 - Nov 22 2024
Note

Part of ISBN 979-8-3503-5554-3

QC 20250224

Available from: 2025-02-19 Created: 2025-02-19 Last updated: 2025-03-24 Bibliographically approved
Zaourar, L., Benazouz, M., Mouhagir, A., Falquez, C., Portero, A., Ho, N., . . . Pleiter, D. (2024). Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC. In: Architecture of Computing Systems - 37th International Conference, ARCS 2024, Proceedings. Paper presented at 37th International Conference on Architecture of Computing Systems, ARCS 2024, Potsdam, Germany, May 14-16, 2024 (pp. 251-265). Springer Nature
2024 (English) In: Architecture of Computing Systems - 37th International Conference, ARCS 2024, Proceedings, Springer Nature, 2024, p. 251-265. Conference paper, Published paper (Refereed)
Abstract [en]

The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e. are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, with each processing unit having its own local memory. Therefore, for each processing unit access to local memory regions is faster compared to accessing memory at non-local regions. Architectures with hybrid memory technologies result in further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, while VPSim offers greater simulation speed, with a high-level view of the simulated system. We highlight the performance impact of design trade-offs regarding NUMA node organization and System Level Cache (SLC) group assignment, as well as Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations for different NoC bandwidths shows reduced NoC latency and high memory bandwidth improvement when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the memory bandwidth performance gap and NoC queuing latency impact when comparing local vs. remote memory accesses. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
benchmarking, co-design, High Performance Computing (HPC), Non-Uniform Memory Access (NUMA), simulation
National Category
Identifiers
urn:nbn:se:kth:diva-352150 (URN), 10.1007/978-3-031-66146-4_17 (DOI), 001293533700017 (ISI), 2-s2.0-85201001415 (Scopus ID)
Conference
37th International Conference on Architecture of Computing Systems, ARCS 2024, Potsdam, Germany, May 14-16, 2024
Note

Part of ISBN: 9783031661457

QC 20241004

Available from: 2024-08-22 Created: 2024-08-22 Last updated: 2024-10-04 Bibliographically approved
Saglam, B., Ho, N., Falquez, C., Portero, A., Schätzle, F., Suarez, E. & Pleiter, D. (2024). Data Prefetching on Processors with Heterogeneous Memory. In: MEMSYS 2024 - Proceedings of the International Symposium on Memory Systems. Paper presented at 10th International Symposium on Memory Systems, MEMSYS 2024, Washington, United States of America, Sep 30 2024 - Oct 3 2024 (pp. 45-60). Association for Computing Machinery (ACM)
2024 (English) In: MEMSYS 2024 - Proceedings of the International Symposium on Memory Systems, Association for Computing Machinery (ACM), 2024, p. 45-60. Conference paper, Published paper (Refereed)
Abstract [en]

Heterogeneous memory architectures, such as a mix of High Bandwidth Memory (HBM) and Double Data Rate (DDR), offer flexible performance optimization by leveraging the high bandwidth of HBM along with the high capacity of DDR. However, these architectures present challenges in balancing bandwidth and capacity to maximize overall system performance and complicate hardware design. In a flat memory organization mixing HBM and DDR, prefetchers must carefully reduce prefetch requests on DDR when transitioning from HBM to avoid performance degradation due to potential bandwidth saturation. Traditional hardware prefetchers, which typically assume a homogeneous memory, are unaware of this circumstance, so they may not be effective in heterogeneous memory architectures. The paper enhances the aggressiveness of prefetchers in this kind of architecture. Our technique enables a prefetcher to dynamically determine the optimal prefetch degree and distance based on memory type. It balances prefetch aggressiveness and timeliness through an adaptive strategy informed by bandwidth utilization and prefetch metrics learned for each memory type. We evaluated the technique within the Stride and Stream Prefetchers at L2 in a gem5 model of a 20-core Arm Neoverse V1-like architecture, a mix of HBM2 and DDR5. The simulation results, focusing on scientific benchmarks, showed that the technique effectively guides prefetchers to near-optimal static configurations. On HBM2, the adaptation strategy detects bandwidth availability and prefetches more aggressively to boost performance, achieving speedups of 1.3× to 2.3×. On DDR5, when faced with saturated bandwidth contention, the adaptation strategy switches to conservative prefetching mode to mitigate performance degradation.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
Hardware Prefetcher, Hybrid Memory, NUMA
National Category
Identifiers
urn:nbn:se:kth:diva-359656 (URN), 10.1145/3695794.3695800 (DOI), 2-s2.0-85216078744 (Scopus ID)
Conference
10th International Symposium on Memory Systems, MEMSYS 2024, Washington, United States of America, Sep 30 2024 - Oct 3 2024
Note

Part of ISBN 9798400710919

QC 20250206

Available from: 2025-02-06 Created: 2025-02-06 Last updated: 2025-02-06 Bibliographically approved
Nassyr, S. & Pleiter, D. (2024). Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels. In: Euro-Par 2024: Parallel Processing - 30th European Conference on Parallel and Distributed Processing, Proceedings. Paper presented at 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024, August 26-30, 2024, Madrid, Spain (pp. 47-61). Springer Nature
2024 (English) In: Euro-Par 2024: Parallel Processing - 30th European Conference on Parallel and Distributed Processing, Proceedings, Springer Nature, 2024, p. 47-61. Conference paper, Published paper (Refereed)
Abstract [en]

Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner on typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables automatisation of the generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights both to micro-architecture architects and to micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.

Place, publisher, year, edition, pages
Springer Nature, 2024
Keywords
assembly generator, dense matrix-matrix multiplication, gem5 simulations, Processor micro-architectures, SIMD/vector instructions
National Category
Identifiers
urn:nbn:se:kth:diva-353525 (URN), 10.1007/978-3-031-69766-1_4 (DOI), 001308370400004 (ISI), 2-s2.0-85202745849 (Scopus ID)
Conference
30th International Conference on Parallel and Distributed Computing, Euro-Par 2024, August 26-30, 2024, Madrid, Spain
Note

QC 20241023

Available from: 2024-09-19 Created: 2024-09-19 Last updated: 2024-10-23 Bibliographically approved
Long, S., Pleiter, D., Patrascoiu, M., Padrin, C., Carpene, M., More, S. & Carpio, M. (2024). Integrating FTS in the Fenix HPC infrastructure. In: Espinal, X., DeVita, R., Laycock, P. & Shadura, O. (Ed.), 26th international conference on computing in high energy and nuclear physics, CHEP 2023. Paper presented at 26th International Conference on Computing in High Energy and Nuclear Physics (CHEP), May 08-12, 2023, Norfolk, VA, USA. EDP Sciences, 295, Article ID 01037.
2024 (English) In: 26th international conference on computing in high energy and nuclear physics, CHEP 2023 / [ed] Espinal, X., DeVita, R., Laycock, P. & Shadura, O., EDP Sciences, 2024, Vol. 295, article id 01037. Conference paper, Published paper (Refereed)
Abstract [en]

As compute requirements in experimental high-energy physics are expected to significantly increase, there is a need for leveraging high-performance computing (HPC) resources. However, HPC systems are currently organised and operated in a way that does not make this easy. Here we will focus on a specific e-infrastructure that incorporates HPC resources, namely Fenix, which is based on a consortium of 6 leading European supercomputing centres. Fenix was initiated through the Human Brain Project (HBP) but also provides resources to other research communities in Europe. The Fenix sites are integrated into a common AAI and provide a so-called Archival Data Repository that can be accessed through a Swift API. In this paper, we report on our efforts to realise a data transfer service that allows data to be exchanged with the Fenix e-infrastructure. This has been enabled by implementing support of Swift in FTS3 and related software components. We will, in particular, discuss how FTS3 has been integrated into the Fenix AAI, which largely follows the architectural principles of the European Open Science Cloud (EOSC). Furthermore, we show how end-users can use this service through a WebFTS service that has been integrated into the science gateway of the HBP, which is also known as the HBP Collaboratory. Finally, we discuss how transfer commands can be automatically distributed over several FTS3 instances to optimise transfer between different Fenix sites.

Place, publisher, year, edition, pages
EDP Sciences, 2024
Series
EPJ Web of Conferences, ISSN 2100-014X
National Category
Identifiers
urn:nbn:se:kth:diva-353118 (URN), 10.1051/epjconf/202429501037 (DOI), 001244151900037 (ISI), 2-s2.0-85212211207 (Scopus ID)
Conference
26th International Conference on Computing in High Energy and Nuclear Physics (CHEP), May 08-12, 2023, Norfolk, VA, USA
Note

QC 20240912

Available from: 2024-09-12 Created: 2024-09-12 Last updated: 2025-01-07 Bibliographically approved
Kierans, D. & Pleiter, D. (2024). Realising Distributed Digital Twins within Federated Digital Infrastructures. In: DiDit 2024 - Proceedings of the 1st International Workshop on Distributed Digital Twins, co-located with DisCoTec 2024 - 19th International Federated Conference on Distributed Computing Techniques, DisCoTec 2024. Paper presented at 1st International Workshop on Distributed Digital Twins, DiDit 2024, Groningen, Netherlands, Jun 17 2024. CEUR-WS
2024 (English) In: DiDit 2024 - Proceedings of the 1st International Workshop on Distributed Digital Twins, co-located with DisCoTec 2024 - 19th International Federated Conference on Distributed Computing Techniques, DisCoTec 2024, CEUR-WS, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

Digital twins are a concept that initially became popular in an industrial context to support product life cycle management. Over time, the number of domains where this concept is applied has grown significantly. This includes, in particular, domains where distributed digital infrastructures become mandatory for operating digital twins. A prominent example is the earth systems, weather, and climate domain. In the area of digital infrastructures, we observe significant efforts towards the federation of computing, storage, and data management services. In the context of digital twins, we consider efforts of particular interest that aim for distributed digital infrastructures based on geographically distributed resources with services operated by a diversity of organisations. In this paper, we review a selection of use cases for distributed digital twins. Analysing these use cases and their implementation leads us to a set of common features and requirements. The key idea of this paper is to link these to earlier identified research and development challenges in the area of federated digital infrastructures.

Place, publisher, year, edition, pages
CEUR-WS, 2024
Keywords
Digital twins, federated digital infrastructures, high-performance computing (HPC)
National Category
Identifiers
urn:nbn:se:kth:diva-367168 (URN), 2-s2.0-85204366913 (Scopus ID)
Conference
1st International Workshop on Distributed Digital Twins, DiDit 2024, Groningen, Netherlands, Jun 17 2024
Note

QC 20250715

Available from: 2025-07-15 Created: 2025-07-15 Last updated: 2025-07-15 Bibliographically approved
Portero, A., Falquez, C., Ho, N., Petrakis, P., Nassyr, S., Marazakis, M., . . . Suarez, E. (2023). COMPESCE: A Co-design Approach for Memory Subsystem Performance Analysis in HPC Many-Cores. In: Architecture of Computing Systems: 36th International Conference, ARCS 2023, Proceedings. Paper presented at 36th International Conference on Architecture of Computing Systems, ARCS 2023, June 13-15, 2023, Athens, Greece (pp. 105-119). Springer Nature
2023 (English) In: Architecture of Computing Systems: 36th International Conference, ARCS 2023, Proceedings, Springer Nature, 2023, p. 105-119. Conference paper, Published paper (Refereed)
Abstract [en]

This paper explores the memory subsystem design through gem5 simulations of a non-uniform memory access (NUMA) architecture with ARM cores equipped with vector engines and connected to a Network-on-Chip (NoC) following the Coherent Hub Interface (CHI) protocol. The study quantifies the benefits of vectorization, prefetching, and multichannel NoC configurations using a benchmark for generating memory patterns and indexed accesses. The outcomes provide insights into improving bus utilization and bandwidth and reducing stalls in the system. The paper proposes hardware/software (HW/SW) advancements to reach and use the HBM device at more than 80% utilization at the memory controllers in the simulated manycore system.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Co-design, gem5, HPC, Network on Chip
National Category
Identifiers
urn:nbn:se:kth:diva-337883 (URN), 10.1007/978-3-031-42785-5_8 (DOI), 001293532100008 (ISI), 2-s2.0-85171444909 (Scopus ID)
Conference
36th International Conference on Architecture of Computing Systems, ARCS 2023, June 13-15, 2023, Athens, Greece
Note

Part of ISBN: 9783031427848

QC 20241004

Available from: 2023-10-10 Created: 2023-10-10 Last updated: 2024-10-04 Bibliographically approved
Smail, R. E., Batelaan, M., Horsley, R., Nakamura, Y., Perlt, H., Pleiter, D., . . . Zanotti, J. M. (2023). Constraining beyond the standard model nucleon isovector charges. Physical Review D: covering particles, fields, gravitation, and cosmology, 108(9), Article ID 094511.
2023 (English) In: Physical Review D: covering particles, fields, gravitation, and cosmology, ISSN 2470-0010, E-ISSN 2470-0029, Vol. 108, no 9, article id 094511. Article in journal (Refereed) Published
Abstract [en]

At the TeV scale, low-energy precision observations of neutron characteristics provide unique probes of novel physics. Precision studies of neutron decay observables are susceptible to beyond the Standard Model (BSM) tensor and scalar interactions, while the neutron electric dipole moment, $d_n$, also has high sensitivity to new BSM CP-violating interactions. To fully utilize the potential of future experimental neutron physics programs, matrix elements of appropriate low-energy effective operators within neutron states must be precisely calculated. We present results from the QCDSF/UKQCD/CSSM Collaboration for the isovector charges $g_T$, $g_A$ and $g_S$ of the nucleon, $\Sigma$ and $\Xi$ baryons using lattice QCD methods and the Feynman-Hellmann theorem. We use a flavor symmetry breaking method to systematically approach the physical quark mass using ensembles that span five lattice spacings and multiple volumes. We extend this existing flavor-breaking expansion to also account for lattice spacing and finite volume effects in order to quantify all systematic uncertainties. Our final estimates of the nucleon isovector charges are $g_T = 1.010(21)_{\text{stat}}(12)_{\text{sys}}$, $g_A = 1.253(63)_{\text{stat}}(41)_{\text{sys}}$ and $g_S = 1.08(21)_{\text{stat}}(03)_{\text{sys}}$ renormalized, where appropriate, at $\mu = 2$ GeV in the $\overline{\text{MS}}$ scheme.

Place, publisher, year, edition, pages
American Physical Society (APS), 2023
National Category
Identifiers
urn:nbn:se:kth:diva-340967 (URN), 10.1103/PhysRevD.108.094511 (DOI), 001119009700015 (ISI), 2-s2.0-85178092762 (Scopus ID)
Note

QC 20231218

Available from: 2023-12-18 Created: 2023-12-18 Last updated: 2024-02-29 Bibliographically approved
Brank, B. & Pleiter, D. (2023). CPU Architecture Modelling and Co-design. In: High Performance Computing - 38th International Conference, ISC High Performance 2023, Proceedings. Paper presented at 38th International Conference on High Performance Computing, ISC High Performance 2023, Hamburg, Germany, May 21 2023 - May 25 2023 (pp. 3-21). Springer Nature
2023 (English) In: High Performance Computing - 38th International Conference, ISC High Performance 2023, Proceedings, Springer Nature, 2023, p. 3-21. Conference paper, Published paper (Refereed)
Abstract [en]

Co-design has become an established process for developing both high-performance computing (HPC) architectures (and, more specifically, CPU architectures) and HPC applications. The co-design process is frequently based on models. This paper discusses an approach to CPU architecture modelling and its relation to modelling theory. The approach is implemented using the gem5 simulator for Arm-based CPU architectures and applied for the purpose of generating co-design knowledge using two applications that are widely used on HPC systems.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Arm, computer architecture modelling, computer architecture simulation, gem5, GPAW, Graviton 2, GROMACS, HPC applications, HPC architectures
National Category
Identifiers
urn:nbn:se:kth:diva-338629 (URN), 10.1007/978-3-031-32041-5_1 (DOI), 2-s2.0-85161134699 (Scopus ID)
Conference
38th International Conference on High Performance Computing, ISC High Performance 2023, Hamburg, Germany, May 21 2023 - May 25 2023
Note

Part of ISBN 9783031320408

QC 20231102

Available from: 2023-11-02 Created: 2023-11-02 Last updated: 2023-11-02 Bibliographically approved
Organisations
Identifiers
ORCID iD: orcid.org/0000-0001-7296-7817