Publications (10 of 26)
Karp, M., Stanly, R., Mukha, T., Galimberti, L., Toosi, S., Song, H., . . . Schlatter, P. (2026). Effects of lower floating-point precision on scale-resolving numerical simulations of turbulence. Journal of Computational Physics, 549, Article ID 114600.
2026 (English). In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 549, article ID 114600. Journal article (peer-reviewed). Published.
Abstract [en]

Modern computing clusters offer specialized hardware for reduced-precision arithmetic, which can significantly shorten the time to solution. The speedup comes from reduced data movement as well as a higher rate of arithmetic operations. However, for high-fidelity simulations of turbulence, such as direct and large-eddy simulation, the impact of reduced precision on the computed solution and the resulting uncertainty across flow solvers and flow cases has not been explored in detail, which limits the optimal utilization of new high-performance computing systems. In this work, the effect of reduced precision is studied using four diverse computational fluid dynamics (CFD) solvers (two incompressible, Neko and Simson, and two compressible, PadeLibs and SSDC) and four test cases: turbulent channel flow at Re_τ = 550 and higher, forced transition in a channel, flow over a cylinder at Re_D = 3900, and compressible flow over a wing section at Re_c = 50000. We observe that the flow physics are remarkably robust with respect to reductions in floating-point precision, and that other forms of uncertainty, due to, for example, time averaging, often have a much larger impact on the computed result. Our results indicate that different terms in the Navier–Stokes equations can be computed to a lower floating-point accuracy without affecting the results. In particular, standard IEEE single precision can be used effectively for the entirety of the simulation, showing no significant discrepancies from double-precision results across the solvers and cases considered. Potential pitfalls are also discussed.

Place, publisher, year, edition, pages
Elsevier BV, 2026
Keywords
Computational fluid dynamics, Direct numerical simulation, Floating-point precision, Turbulence
Identifiers
urn:nbn:se:kth:diva-375324 (URN); 10.1016/j.jcp.2025.114600 (DOI); 001654296600002 (ISI); 2-s2.0-105025717580 (Scopus ID)
Note

Not duplicate with DiVA 2002138

QC 20260112

Available from: 2026-01-12. Created: 2026-01-12. Last updated: 2026-01-12. Bibliographically approved.
Jansson, N., Karp, M., Wahlgren, J., Markidis, S. & Schlatter, P. (2025). Design of Neko—A Scalable High‐Fidelity Simulation Framework With Extensive Accelerator Support. Concurrency and Computation, 37(2), Article ID e8340.
2025 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 37, no. 2, article ID e8340. Journal article (peer-reviewed). Published.
Abstract [en]

Recent trends and advancements in including more diverse and heterogeneous hardware in High-Performance Computing (HPC) are challenging scientific software developers in their pursuit of efficient numerical methods with sustained performance across a diverse set of platforms. As a result, researchers are today forced to refactor their codes to leverage these powerful new heterogeneous systems. We present our design considerations for Neko, a portable framework for high-fidelity spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and facilitating various hardware backends ranging from general-purpose processors and accelerators down to exotic vector processors and Field-Programmable Gate Arrays (FPGAs). Focusing on the performance and portability of Neko, we describe the framework's device abstraction layer, which manages device memory, data transfer, and kernel launches from Fortran, allowing the solver to be written in a hardware-neutral yet performant way. Accelerator-specific optimizations are also discussed, with auto-tuning of key kernels and various communication strategies using device-aware MPI. Finally, we present performance measurements on a wide range of computing platforms, including the EuroHPC pre-exascale system LUMI, where Neko achieves excellent parallel efficiency for a large direct numerical simulation (DNS) of turbulent fluid flow using up to 80% of the entire LUMI supercomputer.

Place, publisher, year, edition, pages
Wiley, 2025
Identifiers
urn:nbn:se:kth:diva-358042 (URN); 10.1002/cpe.8340 (DOI); 001387473600001 (ISI); 2-s2.0-85213688601 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; Swedish e-Science Research Center, SESSI; EU, Horizon Europe, 101093393
Note

QC 20250122

Available from: 2025-01-03. Created: 2025-01-03. Last updated: 2025-01-22. Bibliographically approved.
Karp, M., Suarez, E., Meinke, J. H., Andersson, M. I., Schlatter, P., Markidis, S. & Jansson, N. (2025). Experience and analysis of scalable high-fidelity computational fluid dynamics on modular supercomputing architectures. The international journal of high performance computing applications, 39(3), 329-344
2025 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 39, no. 3, p. 329-344. Journal article (peer-reviewed). Published.
Abstract [en]

The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the Modular Supercomputing Architecture (MSA). We observe that for our simulations, the communication overhead and load balancing issues incurred by incorporating different computing architectures are seldom worthwhile, especially when I/O is also considered. However, when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can spread out to more than one module and obtain a faster time to solution.

Place, publisher, year, edition, pages
SAGE Publications, 2025
Identifiers
urn:nbn:se:kth:diva-358044 (URN); 10.1177/10943420241303163 (DOI); 001366656300001 (ISI); 2-s2.0-105003765421 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; Swedish e-Science Research Center, SESSI; EU, Horizon 2020, 955606
Note

QC 20260123

Available from: 2025-01-03. Created: 2025-01-03. Last updated: 2026-01-23. Bibliographically approved.
Baconnet, V., Karp, M., Hanifi, A., Lengani, D., Simoni, D. & Henningson, D. S. (2025). Investigation of the Dynamics of Secondary Flow Vortex Systems in Low-Pressure Turbines Using Direct Numerical Simulation. In: Proceedings of ASME Turbo Expo 2025: Turbomachinery Technical Conference and Exposition, GT 2025. Paper presented at 70th ASME Turbo Expo 2025: Turbomachinery Technical Conference and Exposition, GT 2025, Memphis, United States of America, June 16-20, 2025. ASME International, Article ID V012T36A005.
2025 (English). In: Proceedings of ASME Turbo Expo 2025: Turbomachinery Technical Conference and Exposition, GT 2025, ASME International, 2025, article ID V012T36A005. Conference paper, published paper (peer-reviewed).
Abstract [en]

In this work, Direct Numerical Simulation is performed on a low-pressure turbine blade with parallel end-walls, in a linear cascade environment at an exit Reynolds number of 1.5 × 10^5. Our simulations are performed with Neko, a framework for high-order spectral elements for heterogeneous computing architectures. Secondary flow structures and associated losses are presented in configurations with and without free-stream turbulence and with a Blasius boundary layer inflow profile. Instantaneous and mean flow visualizations validate the classical secondary flow structures reported in the literature. The results highlight strong vortex cores at the outflow and large contributions to losses from the passage vortex and trailing shed vortex (or counter vortex). The application of turbulent structures at the inflow does not affect the formation of the horseshoe vortex or the vortex cores at the outlet, but still suppresses the shedding at midspan. Proper Orthogonal Decomposition (POD) is applied to provide an overall picture of the flow structures in the entire domain. Without free-stream turbulence, the most energetic modes are found to be linked to the shedding at midspan and the secondary flow structures. Fourier analysis of the POD time series shows low frequencies associated with the secondary structures. POD modes for the simulation with free-stream turbulence show identical secondary flow structures, with additional streamwise-elongated streaky structures in the blade boundary layer and without any modes related to shedding.

Place, publisher, year, edition, pages
ASME International, 2025
Keywords
Direct Numerical Simulation, Low-Pressure Turbines, Proper Orthogonal Decomposition, Secondary Flows
Research program
Engineering Mechanics
Identifiers
urn:nbn:se:kth:diva-370454 (URN); 10.1115/GT2025-151623 (DOI); 001560879500036 (ISI); 2-s2.0-105014734713 (Scopus ID)
Conference
70th ASME Turbo Expo 2025: Turbomachinery Technical Conference and Exposition, GT 2025, Memphis, United States of America, June 16-20, 2025
Note

Part of ISBN 9780791888889

QC 20250930

Available from: 2025-09-30. Created: 2025-09-30. Last updated: 2026-01-09. Bibliographically approved.
Andersson, M., Karp, M., Jansson, N. & Markidis, S. (2025). Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe. In: Proceedings 31st European Conference on Parallel and Distributed Processing: Heteropar 2025, 23rd International Workshop. Paper presented at 31st European Conference on Parallel and Distributed Processing, Dresden, Germany, August 25–29, 2025. Springer.
2025 (English). In: Proceedings 31st European Conference on Parallel and Distributed Processing: Heteropar 2025, 23rd International Workshop, Springer, 2025. Conference paper, published paper (peer-reviewed).
Abstract [en]

With the emergence of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, efficiently targeting diverse hardware architectures has become a major challenge for HPC application developers. The increasing hardware diversity in HPC systems often necessitates the development of architecture-specific code, hindering the sustainability of large-scale scientific applications. In this work, we leverage DaCe, a data-centric parallel programming framework, to automate the generation of high-performance kernels. DaCe enables automatic code generation for multicore processors and various accelerators, reducing the burden on developers who would otherwise need to rewrite code for each new architecture. Our study demonstrates DaCe's capabilities by applying its automatic code generation to a critical computational kernel used in Computational Fluid Dynamics (CFD). Specifically, we focus on Neko, a Fortran-based solver that employs the spectral-element method, which relies on small tensor operations. We detail the formulation of this computational kernel using DaCe's Stateful Dataflow Multigraph (SDFG) representation and discuss how this approach facilitates high-performance code generation. Additionally, we outline the workflow for seamlessly integrating DaCe's generated code into the Neko solver. Our results highlight the portability and performance of the generated code across multiple platforms, including Nvidia GH200, Nvidia A100, and AMD MI250X GPUs, with competitive performance results. By demonstrating the potential of automatic code generation, we emphasize the feasibility of using portable solutions to ensure the long-term sustainability of large-scale scientific applications. 

Place, publisher, year, edition, pages
Springer, 2025
Research program
Computer Science
Identifiers
urn:nbn:se:kth:diva-368966 (URN)
Conference
31st European Conference on Parallel and Distributed Processing, Dresden, Germany, August 25–29, 2025
Note

QC 20251204

Available from: 2025-08-23. Created: 2025-08-23. Last updated: 2025-12-04. Bibliographically approved.
Massaro, D., Karp, M., Jansson, N., Markidis, S. & Schlatter, P. (2024). Direct numerical simulation of the turbulent flow around a Flettner rotor. Scientific Reports, 14(1), Article ID 3004.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no. 1, article ID 3004. Journal article (peer-reviewed). Published.
Abstract [en]

The three-dimensional turbulent flow around a Flettner rotor, i.e. an engine-driven rotating cylinder in an atmospheric boundary layer, is studied via direct numerical simulations (DNS) for three different rotation speeds (α). This technology offers a sustainable alternative mainly for marine propulsion, underscoring the critical importance of comprehending the characteristics of such flow. In this study, we evaluate the aerodynamic loads produced by the rotor of height h, with a specific focus on the changes in lift and drag force along the vertical axis of the cylinder. Correspondingly, we observe that vortex shedding is inhibited at the highest α values investigated. However, in the case of intermediate α, vortices continue to be shed in the upper section of the cylinder (y/h>0.3). As the cylinder begins to rotate, a large-scale motion becomes apparent on the high-pressure side, close to the bottom wall. We offer both a qualitative and quantitative description of this motion, outlining its impact on the wake deflection. This finding is significant as it influences the rotor wake to an extent of approximately one hundred diameters downstream. In practical applications, this phenomenon could influence the performance of subsequent boats and have an impact on the cylinder drag, affecting its fuel consumption. This fundamental study, which investigates a limited yet significant (for DNS) Reynolds number and explores various spinning ratios, provides valuable insights into the complex flow around a Flettner rotor. The simulations were performed using a modern GPU-based spectral element method, leveraging the power of modern supercomputers towards fundamental engineering problems.

Place, publisher, year, edition, pages
Springer Nature, 2024
Identifiers
urn:nbn:se:kth:diva-344051 (URN); 10.1038/s41598-024-53194-x (DOI); 001158746700070 (ISI); 38321050 (PubMedID); 2-s2.0-85184207516 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20240301

Available from: 2024-02-29. Created: 2024-02-29. Last updated: 2025-12-08. Bibliographically approved.
Karp, M. (2024). Direct Numerical Simulation of Turbulence on Heterogenous Computer Systems: Architectures, Algorithms, and Applications. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2024 (English). Doctoral thesis, comprising papers (Other academic).
Abstract [en]

Direct numerical simulations (DNS) of turbulence have a virtually unbounded need for computing power. To carry out these simulations, software, computer architectures, and algorithms must operate as efficiently as possible to amortize the large computational cost. However, in a computing landscape increasingly incorporating heterogeneous computer systems, changes are necessary. In this thesis, we consider how DNS can be carried out efficiently on upcoming heterogeneous computer systems. This work relates to developing algorithms for upcoming heterogeneous computer architectures, overcoming software challenges associated with large-scale DNS on these platforms, and applying these developments to new flow cases that were previously too costly to carry out. We consider in particular the spectral element method for DNS and evaluate how this method maps to field-programmable gate arrays, graphics processing units, as well as conventional processors. We also consider the issue of trading arithmetic operations for less communication, reducing the cost of solving the linear systems that arise in the spectral element method. Our developments are incorporated into the spectral element framework Neko, enabling Neko to strong-scale efficiently on the largest supercomputers in the world. Finally, we have carried out several DNS such as the simulation of a Flettner rotor in a turbulent boundary layer and simulating Rayleigh-Bénard convection at very high Rayleigh numbers. The developments in this thesis enable the high-fidelity simulation of turbulence on emerging computer systems with high parallel efficiency and performance.

Abstract [sv] (translated to English)

Direct numerical simulation (DNS) of turbulence requires enormous amounts of computing power. To carry out simulations such as DNS, software, computer architectures, and algorithms must work together as efficiently as possible. Today, supercomputers are changing rapidly and incorporating new heterogeneous computer architectures. This means that new approaches are necessary to make use of all the available computing power. In this thesis, we focus on DNS on heterogeneous, large-scale computer systems to enable new simulations of turbulent flows. To reach this goal, we investigate new computer architectures, analyze and improve the numerical methods and algorithms we use, and finally apply our developments to new simulations of turbulence. We focus in particular on the spectral element method (SEM) for DNS and investigate how it behaves on field-programmable gate arrays, graphics cards, and conventional processors. We also contribute an analysis of how we solve the linear system that forms the core of SEM, in order to better utilize the available computing power and reduce the amount of data that needs to be transferred. Our improvements are incorporated into the SEM solver Neko and enable Neko to scale efficiently on the largest supercomputers in the world. We then use this framework to carry out several large-scale simulations. We carry out the first simulation of a Flettner rotor and its interaction with turbulent shear flow, as well as a simulation of Rayleigh-Bénard convection in a cylindrical domain at very high Rayleigh numbers. The thesis enables detailed numerical simulation of turbulence with high scalability and performance in today's changing computing landscape.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2024. p. 54
Series
TRITA-EECS-AVL ; 2024:36
Keywords
High Performance Computing, Turbulence, Computational Fluid Dynamics, Heterogeneous Computer Architectures
Research program
Computer Science
Identifiers
urn:nbn:se:kth:diva-345851 (URN); 978-91-8040-910-0 (ISBN)
Public defence
2024-05-24, https://kth-se.zoom.us/s/61541415709, Kollegiesalen, Brinellvägen 6, Stockholm, 09:15 (English)
Funder
Swedish e-Science Research Center, SESSI
Note

QC 20240423

Available from: 2024-04-23. Created: 2024-04-22. Last updated: 2025-12-02. Bibliographically approved.
Jansson, N., Karp, M., Markidis, S. & Schlatter, P. (2024). Neko: A Modern, Portable, and Scalable Framework for Extreme-Scale Computational Fluid Dynamics. In: 2024 IEEE International Conference on Cluster Computing workshops, cluster workshops 2024. Paper presented at 2024 International Conference on Cluster Computing, September 24-27, 2024, Kobe, Japan (pp. 156-157). Institute of Electrical and Electronics Engineers (IEEE).
2024 (English). In: 2024 IEEE International Conference on Cluster Computing workshops, cluster workshops 2024, Institute of Electrical and Electronics Engineers (IEEE), 2024, p. 156-157. Conference paper, published paper (peer-reviewed).
Abstract [en]

Recent trends and advancements in including more diverse and heterogeneous hardware in High-Performance Computing are challenging scientific software developers in their pursuit of good performance and efficient numerical methods. As a result, the well-known maxim "software outlives hardware" may no longer necessarily hold true, and researchers are today forced to refactor their codes to leverage these powerful new heterogeneous systems. We present Neko, a portable framework for high-fidelity spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and facilitating various hardware backends ranging from general-purpose processors and accelerators down to exotic vector processors and Field Programmable Gate Arrays (FPGAs) via Neko's device abstraction layer. Focusing on Neko's performance and exascale readiness, we outline the optimisation and algorithmic work necessary to ensure scalability and performance portability across a wide range of platforms. Finally, we present performance measurements on a wide range of accelerated computing platforms, including the EuroHPC pre-exascale systems LUMI and Leonardo, where Neko achieves excellent parallel efficiency for an extreme-scale direct numerical simulation (DNS) of turbulent thermal convection using up to 80% of the entire LUMI supercomputer.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
Accelerators, Spectral element method, Direct numerical simulation
Identifiers
urn:nbn:se:kth:diva-361618 (URN); 10.1109/CLUSTERWorkshops61563.2024.00036 (DOI); 001422214200025 (ISI); 2-s2.0-85211790427 (Scopus ID)
Conference
2024 International Conference on Cluster Computing, September 24-27, 2024, Kobe, Japan
Note

QC 20250326

Available from: 2025-03-26. Created: 2025-03-26. Last updated: 2025-03-27. Bibliographically approved.
Jansson, N., Karp, M., Podobas, A., Markidis, S. & Schlatter, P. (2024). Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. Computers & Fluids, 275, Article ID 106243.
2024 (English). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, article ID 106243. Journal article (peer-reviewed). Published.
Abstract [en]

Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area of great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive, as the range of active spatial and temporal scales quickly widens. Scale-resolving simulations in particular, including large-eddy simulation (LES) and direct numerical simulation (DNS), thus need to rely on modern, efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to refactor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

Place, publisher, year, edition, pages
Elsevier BV, 2024
Identifiers
urn:nbn:se:kth:diva-344896 (URN); 10.1016/j.compfluid.2024.106243 (DOI); 001216304000001 (ISI); 2-s2.0-85189508362 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; EU, Horizon 2020, 823691; EU, Horizon 2020, 801039
Note

QC 20240403

Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2025-12-05. Bibliographically approved.
Andersson, M., Karp, M. & Markidis, S. (2024). Towards Performance Portable Kernels for Computational Fluid Dynamics Using DaCe. In: 53rd International Conference on Parallel Processing, ICPP 2024 - Workshops Proceedings. Paper presented at 53rd International Conference on Parallel Processing, ICPP 2024, August 12-15, 2024, Gotland, Sweden (pp. 110-111). Association for Computing Machinery (ACM).
2024 (English). In: 53rd International Conference on Parallel Processing, ICPP 2024 - Workshops Proceedings, Association for Computing Machinery (ACM), 2024, p. 110-111. Conference paper, published paper (peer-reviewed).
Abstract [en]

With the rise of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, the demand for efficient code targeting diverse hardware accelerators poses a critical challenge for HPC application developers. This hardware diversity in HPC systems necessitates the development of new code tailored to specific architectures, which, in turn, hampers the sustainability of large scientific application development. In this work, we rely on DaCe [1, 2], a data-centric parallel programming framework, to automate the generation of high-performance kernels. DaCe can automatically generate code for multicore processors and various accelerators, relieving the programmer of the burden of rewriting code for a new architecture. Our work demonstrates the automatic code generation capabilities of DaCe, applied to a critical high-performance computational kernel for Computational Fluid Dynamics code. Specifically, we focus on the Fortran-based solver Neko [4], which is based on the Spectral Element Method. This method relies on small-sized matrix multiplications akin to BLAS dgemm operations. We describe the formulation of this computational kernel through DaCe's Stateful Dataflow Multigraph (SDFG) representation. We discuss how this representation facilitates high-performance code generation and detail the workflow for integration of DaCe's automatically generated code into the Neko solver. Initial work targets the Nvidia GH200. By showcasing the potential of automatic code generation, we highlight the feasibility of supporting the long-term sustainability of large-scale scientific applications by using portable solutions for critical computational kernels of large-scale codes.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2024
Keywords
FEM, High-Order Methods, Performance, Portability, SEM
Identifiers
urn:nbn:se:kth:diva-353518 (URN); 10.1145/3677333.3678270 (DOI); 001298775800004 (ISI); 2-s2.0-85202810034 (Scopus ID)
Conference
53rd International Conference on Parallel Processing, ICPP 2024, August 12-15, 2024, Gotland, Sweden
Note

Part of ISBN 9798400718021

QC 20241008

Available from: 2024-09-19. Created: 2024-09-19. Last updated: 2024-10-08. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-3374-8093