Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
Andersson, Måns. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0002-6384-2630
Karp, Martin. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0003-3374-8093
Jansson, Niclas. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. ORCID iD: 0000-0002-5020-1631
Markidis, Stefano. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0003-0639-0639
2025 (English). In: Proceedings 31st European Conference on Parallel and Distributed Processing: HeteroPar 2025, 23rd International Workshop, Springer, 2025. Conference paper, Published paper (Refereed)
Abstract [en]

With the emergence of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, efficiently targeting diverse hardware architectures has become a major challenge for HPC application developers. The increasing hardware diversity in HPC systems often necessitates architecture-specific code, which hinders the sustainability of large-scale scientific applications. In this work, we leverage DaCe, a data-centric parallel programming framework, to automate the generation of high-performance kernels. DaCe enables automatic code generation for multicore processors and various accelerators, relieving developers of the need to rewrite code for each new architecture. We demonstrate DaCe's capabilities by applying its automatic code generation to a critical computational kernel in Computational Fluid Dynamics (CFD). Specifically, we focus on Neko, a Fortran-based solver that employs the spectral-element method, which relies on small tensor operations. We detail the formulation of this kernel in DaCe's Stateful Dataflow Multigraph (SDFG) representation and discuss how this approach facilitates high-performance code generation. Additionally, we outline the workflow for integrating DaCe's generated code into the Neko solver. Our results show that the generated code is portable across multiple platforms, including Nvidia GH200, Nvidia A100, and AMD MI250X GPUs, and achieves competitive performance. By demonstrating the potential of automatic code generation, we emphasize the feasibility of portable solutions for ensuring the long-term sustainability of large-scale scientific applications.
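Spectral-element solvers such as Neko apply operators element by element using sum factorization: a small 1D matrix is contracted against one index of each element's tensor-product array of nodal values. The NumPy sketch below illustrates that kind of small tensor operation for a local gradient. It is not the kernel from the paper and does not use DaCe; the function name local_grad, the array layout u[e, k, j, i] (element index, then three local indices) and the generic matrix D are assumptions made for illustration.

```python
import numpy as np

def local_grad(u, D):
    """Hypothetical sketch: apply a 1D matrix D along each local direction.

    u: nodal values of shape (nelv, n, n, n), indexed u[e, k, j, i]
       (assumed layout: element first, fastest-varying index i last).
    D: (n, n) matrix, e.g. a 1D differentiation matrix (assumed given).
    Returns (ur, us, ut), the contractions along the i, j and k indices.
    """
    ur = np.einsum("ai,ekji->ekja", D, u)  # ur[e,k,j,a] = sum_i D[a,i] u[e,k,j,i]
    us = np.einsum("aj,ekji->ekai", D, u)  # us[e,k,a,i] = sum_j D[a,j] u[e,k,j,i]
    ut = np.einsum("ak,ekji->eaji", D, u)  # ut[e,a,j,i] = sum_k D[a,k] u[e,k,j,i]
    return ur, us, ut
```

Each contraction costs O(n^4) operations per element rather than the O(n^6) of applying a dense n^3 by n^3 matrix, which is why these small, regular contractions are a natural target for generated kernels.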

Place, publisher, year, edition, pages
Springer, 2025.
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-368966
OAI: oai:DiVA.org:kth-368966
DiVA, id: diva2:1991539
Conference
31st European Conference on Parallel and Distributed Processing, Dresden, Germany, August 25–29, 2025
Note

QC 20251204

Available from: 2025-08-23. Created: 2025-08-23. Last updated: 2025-12-04. Bibliographically approved
In thesis
1. Methods for Solving Large-scale Linear Systems in Scientific Computing: Preconditioners and Performance Portability
2025 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Large-scale simulations play a crucial role in scientific discovery and industrial applications. Many of these simulations require solving large linear systems, which commonly arise in the modeling of fluids, electromagnetic fields, and other physical phenomena. Solving such systems is often computationally expensive and time-consuming, making it a critical factor in overall simulation performance.

This thesis focuses on two types of linear systems that frequently arise in modeling and optimization: saddle-point problems and large-scale Poisson equations. Saddle-point problems naturally occur in coupled systems, such as fluid dynamics involving velocity and pressure, and can often be reformulated as optimization problems. Poisson’s equation, on the other hand, frequently acts as a performance bottleneck in large simulations.

A Jacobi-preconditioned conjugate gradient method and a constraint-preconditioned GMRES method are evaluated on optimization problems arising in radiotherapy treatment planning; both methods demonstrate good convergence properties. Several of the evaluated preconditioners are based on domain decomposition on distributed systems, where the quality of the preconditioner is weighed against the communication cost.
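For reference, a Jacobi-preconditioned conjugate gradient method uses the inverse of the matrix diagonal as the preconditioner inside the standard CG recurrence. The following is a minimal NumPy sketch assuming a dense symmetric positive definite matrix with a nonzero diagonal; the function name jacobi_pcg, the relative-residual stopping test and its defaults are illustrative choices, not the setup used in the thesis, and the radiotherapy problems are not reproduced.

```python
import numpy as np

def jacobi_pcg(A, b, tol=1e-10, max_iter=1000):
    """Conjugate gradient with a Jacobi (diagonal) preconditioner.

    Assumes A is symmetric positive definite with a nonzero diagonal.
    Stops when ||r|| <= tol * ||b||; returns (x, iterations used).
    """
    inv_diag = 1.0 / np.diag(A)      # M^{-1} = diag(A)^{-1}
    x = np.zeros(len(b))
    r = b - A @ x                    # initial residual
    z = inv_diag * r                 # preconditioned residual
    p = z.copy()                     # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        Ap = A @ p
        alpha = rz / (p @ Ap)        # step length along p
        x += alpha * p
        r -= alpha * Ap
        z = inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # new A-conjugate direction
        rz = rz_new
    return x, max_iter
```

Applying a Jacobi preconditioner is an elementwise scaling that needs no communication between subdomains, which places it at the cheap end of the trade-off between preconditioner quality and communication cost.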

A novel Anderson-accelerated matrix-splitting method is proposed, which behaves similarly to inexact left-preconditioned GMRES. Matrix-splitting techniques are especially suitable for saddle-point problems, since such systems admit many natural splittings.
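Anderson acceleration can be wrapped around any stationary splitting iteration x_{k+1} = x_k + M^{-1}(b - A x_k), where A = M - N. The sketch below is a generic windowed least-squares formulation applied to a user-supplied M^{-1} solve; it is not the thesis's proposed method, does not model inexact solves or saddle-point structure, and the names anderson_splitting and M_solve as well as the window size m are hypothetical.

```python
import numpy as np

def anderson_splitting(A, b, M_solve, m=5, tol=1e-10, max_iter=500):
    """Generic Anderson-accelerated splitting iteration for A x = b.

    M_solve(r) must return M^{-1} r for a chosen splitting A = M - N
    (hypothetical interface). m is the history window; m = 0 gives the
    plain splitting iteration. Returns (x, iterations used).
    """
    x = np.zeros(len(b))
    g_hist, f_hist = [], []
    b_norm = np.linalg.norm(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * b_norm:
            return x, k
        g = x + M_solve(r)           # one unaccelerated splitting step
        f = g - x                    # fixed-point residual, M^{-1} r
        g_hist.append(g)
        f_hist.append(f)
        if len(f_hist) > m + 1:      # keep at most m differences
            g_hist.pop(0)
            f_hist.pop(0)
        if len(f_hist) == 1:
            x = g
        else:
            # Columns are consecutive differences of the stored vectors.
            dF = np.diff(np.array(f_hist), axis=0).T
            dG = np.diff(np.array(g_hist), axis=0).T
            # Mixing coefficients: minimise ||f - dF @ gamma||_2.
            gamma, *_ = np.linalg.lstsq(dF, f, rcond=None)
            x = g - dG @ gamma
    return x, max_iter
```

Setting m = 0 recovers the plain splitting iteration, which gives a simple baseline for comparing iteration counts.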

Beyond algorithmic choices, performance is also influenced by modern computing architectures, which are increasingly heterogeneous. Efficient use of these systems often requires hardware-specific implementations, which can be costly to develop and maintain. To address this, various strategies introduce portability layers that abstract away hardware details while maintaining performance.

This thesis presents two approaches for solving large-scale Poisson equations using different portability models. Both methods demonstrate promising results in terms of performance and portability.

Abstract [sv]

Large-scale simulations play a crucial role in scientific research and industrial applications. Many of these simulations require solving large systems of linear equations, which often arise when modeling, for example, fluid flows, electromagnetic fields, and other physical phenomena. Solving these systems is often both computationally demanding and time-consuming, and it therefore constitutes an important bottleneck in simulation performance.

This thesis focuses on two types of linear systems that often arise in modeling and optimization: saddle-point problems and large-scale Poisson equations. Saddle-point problems occur naturally in coupled systems, such as in fluid mechanics where velocity and pressure interact, and can often be reformulated as optimization problems. The Poisson equation often acts as a performance-limiting factor in large simulations.

A new matrix-splitting method for solving linear systems is proposed which, with Anderson acceleration, behaves similarly to inexact left-preconditioned GMRES. Matrix splitting is particularly well suited for saddle-point problems. Specifically, constraint-preconditioned GMRES is evaluated on an optimization problem arising in radiotherapy planning. The method shows good convergence compared with traditional direct solvers.

Beyond algorithmic choices, solution time is also affected by modern computer architectures, which are becoming increasingly heterogeneous. Running simulations efficiently on these systems often requires hardware-specific implementations, which can be resource-intensive. To simplify this, various strategies have been developed in which portability layers efficiently handle the translation into hardware-specific code.

The thesis presents two methods for solving large-scale Poisson equations using two different portability models. Both methods show good results in terms of both performance and portability.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. p. xii, 91
Series
TRITA-EECS-AVL ; 2025:75
National Category
Computer Sciences; Computational Mathematics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-368974 (URN)
978-91-8106-348-6 (ISBN)
Public defence
2025-09-25, https://kth-se.zoom.us/j/65542778560, Kollegiesalen, Brinellvägen 6, Stockholm, 10:00 (English)
Note

QC 20250827

Available from: 2025-08-28. Created: 2025-08-25. Last updated: 2025-09-01. Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Preprint

Authority records

Andersson, Måns; Karp, Martin; Jansson, Niclas; Markidis, Stefano
