kth.sePublications KTH
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Methods for Solving Large-scale Linear Systems in Scientific Computing: Preconditioners and Performance Portability
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).ORCID iD: 0000-0002-6384-2630
2025 (English)Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Large-scale simulations play a crucial role in scientific discovery and industrial applications. Many of these simulations require solving large linear systems, which commonly arise in the modeling of fluids, electromagnetic fields,and other physical phenomena. Solving such systems is often computationally expensive and time-consuming, making it a critical component in simulation performance.

This thesis focuses on two types of linear systems that frequently arise in modeling and optimization: saddle-point problems and large-scale Poisson equations. Saddle-point problems naturally occur in coupled systems, such as fluid dynamics involving velocity and pressure, and can often be reformulated as optimization problems. Poisson’s equation, on the other hand, frequently acts as a performance bottleneck in large simulations.

A Jacobi preconditioned Conjugate Gradient and a constraint preconditioned GMRES are evaluated on optimization problems arising in radiotherapy treatment planning; the methods demonstrate good convergence properties. Several preconditioners that were evaluated consider domain decomposition on distributed systems where the quality of the preconditioner is weighted against the communication costs.

A novel Anderson accelerated matrix-splitting method is proposed that behaves similarly to inexact left-preconditioned GMRES. Matrix splitting techniques are especially suitable for saddle-point problems as there are many natural splittings for such systems.

Beyond algorithmic choices, performance is also influenced by modern computing architectures, which are increasingly heterogeneous. Efficient use of these systems often requires hardware-specific implementations, which can be costly to develop and maintain. To address this, various strategies introduce portability layers that abstract away hardware details while maintaining performance.

This thesis presents two approaches for solving large-scale Poisson equations using different portability models. Both methods demonstrate promising results in terms of performance and portability.

Abstract [sv]

Storskaliga simuleringar spelar en avgörande roll inom vetenskaplig forskning och industriella tillämpningar. Många av dessa simuleringar kräver lösning av stora linjära ekvationssystem, som ofta uppstår vid modellering av exempelvis vätskeflöden, elektromagnetiska fält och andra fysikaliska fenomen. Att lösa dessa systemär ofta både beräkningsmässigt krävande och tidsödande, och utgör därför en viktig flaskhals i simuleringarnas prestanda.

Denna avhandling fokuserar på två typer av linjära system som ofta uppstår inom modellering och optimering: sadelpunktsproblem och storskaliga Poisson-ekvationer. Sadelpunktsproblem förekommer naturligt i kopplade system, som inom strömningsmekanik där hastighet och tryck samverkar, och kan ofta omformuleras som optimeringsproblem. Poisson-ekvationen fungerar ofta som en prestandabegränsande faktor i stora simuleringar.

En ny metod för att lösa linjära system med matrisuppdelning föreslås,som beter sig likt inexakt vänster-preconditionerad GMRES med Anderson-acceleration. Matrisuppdelningär särskilt väl lämpad för sadelpunktsproblem. Specifikt utvärderas constraint-preconditionerad GMRES på ett optimeringsproblem som uppstår vid strålterapiplanering. Metoden visar god konvergens i jämförelse med traditionella direkta lösare.

Utöver algoritmval påverkas lösningstidenäven av moderna datorarkitekturer, som blir allt mer heterogena. För att effektivt kunna utföra simuleringar av dessa system krävs ofta hårdvaruspecifik implementation, vilket kan vara resurskrävande. För att förenkla detta har olika strategier utvecklats där portabilitetslager hanteraröversättningen till hårdvaruspecifik kod på ett effektivt sätt.

Avhandlingen presenterar två metoder för att lösa storskaliga Poissonekvationer med hjälp av två olika modeller för portabilitet. Båda metoderna visar goda resultat avseende både prestanda och portabilitet.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2025. , p. xii, 91
Series
TRITA-EECS-AVL ; 2025:75
National Category
Computer Sciences Computational Mathematics
Research subject
Computer Science
Identifiers
URN: urn:nbn:se:kth:diva-368974ISBN: 978-91-8106-348-6 (print)OAI: oai:DiVA.org:kth-368974DiVA, id: diva2:1991879
Public defence
2025-09-25, https://kth-se.zoom.us/j/65542778560, Kollegiesalen, Brinellvägen 6, Stockholm, 10:00 (English)
Opponent
Supervisors
Note

QC 20250827

Available from: 2025-08-28 Created: 2025-08-25 Last updated: 2025-09-01Bibliographically approved
List of papers
1. Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems
Open this publication in new window or tab >>Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems
2024 (English)In: 2024 SIAM Conference on Parallel Processing for Scientific Computing, PP 2024, Society for Industrial and Applied Mathematics Publications , 2024, p. 39-51Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.

Place, publisher, year, edition, pages
Society for Industrial and Applied Mathematics Publications, 2024
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-347317 (URN)001282154500004 ()2-s2.0-85194149187 (Scopus ID)
Conference
22nd SIAM Conference on Parallel Processing for Scientific Computing, PP 2024, Baltimore, United States of America, Mar 5 2024 - Mar 8 2024
Note

QC 20240612

Part of ISBN 978-171389347-9

Available from: 2024-06-10 Created: 2024-06-10 Last updated: 2025-08-25Bibliographically approved
2. Preconditioned CG and GMRES for interior point methods with applications in radiation therapy
Open this publication in new window or tab >>Preconditioned CG and GMRES for interior point methods with applications in radiation therapy
2025 (English)In: Journal of Computational Science, ISSN 1877-7503, E-ISSN 1877-7511, Vol. 90, article id 102652Article in journal (Refereed) Published
Abstract [en]

Interior point methods (IPMs) are widely used for different types of mathematical optimization problems. Many implementations of IPMs in use today rely on direct linear solvers to solve systems of equations in each iteration. The need to solve ever larger optimization problems more efficiently and the rise of hardware accelerators for general-purpose computing has led to much interest in using iterative linear solvers instead, though inevitable ill-conditioning of the linear systems remains a challenge. We investigate the use of the Krylov solvers CG, MINRES and GMRES for IPMs applied to optimization problems from radiation therapy. We implement a prototype IPM for convex quadratic programs and consider two different preconditioning strategies depending on the characteristics of the problem. One where a doubly augmented re-formulation of the Karush–Kuhn–Tucker linear system, originally proposed by Forsgren and Gill, is used together with a Jacobi-preconditioned conjugate gradient solver, and another with constraint preconditioning applied to a symmetric indefinite formulation of the linear system. Our results indicate that the proposed formulation provides sufficient accuracy. Furthermore, profiling of our prototype code shows that it is suitable for GPU acceleration, which may further improve its performance. The constraint preconditioner is shown to work well for cases with few, dense linear constraints, with a substantial improvement in linear solver convergence compared to the doubly augmented version. Our results indicate that our method can find solutions of acceptable accuracy in reasonable time for realistic problems from radiation therapy, as well as a simple test problem from support vector machine training. This is an extended version of a conference paper presented at the International Conference for Computational Science 2024 (Málaga, Spain) (Liu et al. 2024).

Place, publisher, year, edition, pages
Elsevier BV, 2025
Keywords
Interior point method, Krylov solver, Radiation therapy
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-368764 (URN)10.1016/j.jocs.2025.102652 (DOI)001521588200001 ()2-s2.0-105008828457 (Scopus ID)
Note

QC 20250821

Available from: 2025-08-21 Created: 2025-08-21 Last updated: 2025-10-03Bibliographically approved
3. Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
Open this publication in new window or tab >>Portable High-Performance Kernel Generation for a Computational Fluid Dynamics Code with DaCe
2025 (English)In: Proceedings 31st European Conference on Parallel and Distributed Processing: Heteropar 202523RD International Workshop, Springer , 2025Conference paper, Published paper (Refereed)
Abstract [en]

With the emergence of new high-performance computing (HPC) accelerators, such as Nvidia and AMD GPUs, efficiently targeting diverse hardware architectures has become a major challenge for HPC application developers. The increasing hardware diversity in HPC systems often necessitates the development of architecture-specific code, hindering the sustainability of large-scale scientific applications. In this work, we leverage DaCe, a data-centric parallel programming framework, to automate the generation of high-performance kernels. DaCe enables automatic code generation for multicore processors and various accelerators, reducing the burden on developers who would otherwise need to rewrite code for each new architecture. Our study demonstrates DaCe's capabilities by applying its automatic code generation to a critical computational kernel used in Computational Fluid Dynamics (CFD). Specifically, we focus on Neko, a Fortran-based solver that employs the spectral-element method, which relies on small tensor operations. We detail the formulation of this computational kernel using DaCe's Stateful Dataflow Multigraph (SDFG) representation and discuss how this approach facilitates high-performance code generation. Additionally, we outline the workflow for seamlessly integrating DaCe's generated code into the Neko solver. Our results highlight the portability and performance of the generated code across multiple platforms, including Nvidia GH200, Nvidia A100, and AMD MI250X GPUs, with competitive performance results. By demonstrating the potential of automatic code generation, we emphasize the feasibility of using portable solutions to ensure the long-term sustainability of large-scale scientific applications. 

Place, publisher, year, edition, pages
Springer, 2025
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-368966 (URN)
Conference
31st European Conference on Parallel and Distributed Processing, Dresden, Germany, August 25–29, 2025
Note

QC 20251204

Available from: 2025-08-23 Created: 2025-08-23 Last updated: 2025-12-04Bibliographically approved
4. A Parallel and Highly-Portable HPC Poisson Solver: Preconditioned Bi-CGSTAB with alpaka
Open this publication in new window or tab >>A Parallel and Highly-Portable HPC Poisson Solver: Preconditioned Bi-CGSTAB with alpaka
Show others...
2025 (English)In: 2025 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Institute of Electrical and Electronics Engineers (IEEE) , 2025, p. 473-483Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the design, implementation, and performance analysis of a parallel and GPU-accelerated Poisson solver based on the Preconditioned Bi-Conjugate Gradient Stabilized (Bi-CGSTAB) method. The implementation utilizes the MPI standard for distributed-memory parallelism, while on-node computation is handled using the alpaka framework: this ensures both shared-memory parallelism and inherent performance portability across different hardware architectures. We evaluate the solver’s performances on CPUs and GPUs (NVIDIA Hopper H100 and AMD MI250X), comparing different preconditioning strategies, including Block Jacobi and Chebyshev iteration, and analyzing the performances both at single and multi-node level. The execution efficiency is characterized with a strong scaling test and using the AMD Omnitrace profiling tool. Our results indicate that a communication-free preconditioner based on the Chebyshev iteration can speed up the solver by more than six times. The solver shows comparable performances across different GPU architectures, achieving a speed-up in computation up to 50 times compared to the CPU implementation. In addition, it shows a strong scaling efficiency greater than 90% up to 64 devices.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2025
National Category
Computational Mathematics
Research subject
Applied and Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-368965 (URN)10.1109/IPDPSW66978.2025.00073 (DOI)2-s2.0-105015570366 (Scopus ID)
Conference
IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Milan, 03-07 June, 2025
Note

Part of ISBN 9798331526436

QC 20250924

Available from: 2025-08-23 Created: 2025-08-23 Last updated: 2025-09-24Bibliographically approved
5. Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software
Open this publication in new window or tab >>Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software
2023 (English)In: PPAM 2022. Lecture Notes in Computer Science, vol 13826., Springer Nature , 2023, p. 333-345Conference paper, Published paper (Refereed)
Abstract [en]

GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFT PACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact the GROMACS performance. Finally, we discuss performance opportunities with a particular interest in developing GROMACS for the FFT calculations.

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science ; 13826
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326454 (URN)10.1007/978-3-031-30442-2_25 (DOI)001547183900025 ()2-s2.0-85161390275 (Scopus ID)
Conference
PPAM 14th INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING AND APPLIED MATHEMATICS
Note

QC 20230515

Available from: 2023-05-02 Created: 2023-05-02 Last updated: 2025-12-08Bibliographically approved

Open Access in DiVA

fulltext(3304 kB)555 downloads
File information
File name FULLTEXT01.pdfFile size 3304 kBChecksum SHA-512
3e12c26900f07ebd29eabcbb305f9838c8e6416fb1d07911f9b249d58a841642111879f5b05308e243334c5dac2723d51c1f9bdbae59138902216608adcf3335
Type fulltextMimetype application/pdf

Authority records

Andersson, Måns

Search in DiVA

By author/editor
Andersson, Måns
By organisation
Computational Science and Technology (CST)
Computer SciencesComputational Mathematics

Search outside of DiVA

GoogleGoogle Scholar
Total: 555 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

isbn
urn-nbn

Altmetric score

isbn
urn-nbn
Total: 2252 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf