Publications (10 of 178)
Andersson, M. I., Liu, F. & Markidis, S. (2024). Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems. In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP). Paper presented at the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP24), March 5-8, 2024, Baltimore, Maryland (pp. 39-52). Society for Industrial & Applied Mathematics (SIAM)
2024 (English). In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), Society for Industrial & Applied Mathematics (SIAM), 2024, p. 39-52. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems, with application to electromagnetics problems such as wave scattering and eddy currents. While it has been shown that Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to problems arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems from previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.
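The abstract builds on Anderson acceleration of a fixed-point iteration. The sketch below assumes NumPy and shows a generic windowed Anderson (type II) update applied to a simple Jacobi sweep on a small complex-symmetric tridiagonal system A = W + iT. The Jacobi map is a stand-in chosen for brevity and is not the PMHSS splitting from the paper. The function names, the window size `m`, and the test matrix are all hypothetical.

```python
import numpy as np


def anderson(g, x0, m=5, tol=1e-10, max_iter=200):
    """Anderson-accelerated fixed-point iteration for x = g(x) (type II, window m)."""
    x = np.asarray(x0, dtype=complex)
    g_prev = f_prev = None
    dG, dF = [], []  # histories of differences of g(x_k) and of residuals f_k
    for k in range(max_iter):
        gx = g(x)
        f = gx - x  # fixed-point residual
        if np.linalg.norm(f) < tol:
            return x, k
        if f_prev is not None:
            dF.append(f - f_prev)
            dG.append(gx - g_prev)
            if len(dF) > m:  # keep only the last m differences
                dF.pop(0)
                dG.pop(0)
        g_prev, f_prev = gx, f
        if dF:
            # Least-squares mixing coefficients: min ||f - dF @ gamma||
            gamma, *_ = np.linalg.lstsq(np.column_stack(dF), f, rcond=None)
            x = gx - np.column_stack(dG) @ gamma
        else:
            x = gx  # first step is a plain fixed-point update
    return x, max_iter


# Hypothetical test problem: A = W + iT with W = tridiag(-1, 4, -1) and T = I.
n = 50
W = 4 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = W + 1j * np.eye(n)
b = np.ones(n, dtype=complex)
d = 4 + 1j  # diagonal entries of A, used by the Jacobi sweep


def jacobi(x):
    # One Jacobi sweep: x + D^{-1} (b - A x)
    return x + (b - A @ x) / d


x, iters = anderson(jacobi, np.zeros(n))
print(iters, np.linalg.norm(A @ x - b))
```

On a linear map like this one, the least-squares step makes Anderson acceleration behave much like a windowed Krylov method, which is the GMRES connection the abstract mentions.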

Place, publisher, year, edition, pages
Society for Industrial & Applied Mathematics (SIAM), 2024
National Category
Computational Mathematics
Research subject
Applied and Computational Mathematics, Numerical Analysis; Applied and Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-343465 (URN)
10.1137/1.9781611977967.4 (DOI)
Conference
2024 SIAM Conference on Parallel Processing for Scientific Computing (PP24), March 5-8, 2024, Baltimore, Maryland
Note

QC 20240219

Part of ISBN 978-1-61197-796-7

Available from: 2024-02-14. Created: 2024-02-14. Last updated: 2024-02-19. Bibliographically approved
Massaro, D., Karp, M., Jansson, N., Markidis, S. & Schlatter, P. (2024). Direct numerical simulation of the turbulent flow around a Flettner rotor. Scientific Reports, 14(1), Article ID 3004.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no 1, article id 3004. Article in journal (Refereed), Published
Abstract [en]

The three-dimensional turbulent flow around a Flettner rotor, i.e. an engine-driven rotating cylinder in an atmospheric boundary layer, is studied via direct numerical simulations (DNS) for three different rotation speeds (α). This technology offers a sustainable alternative, mainly for marine propulsion, which underscores the importance of understanding the characteristics of such flows. In this study, we evaluate the aerodynamic loads produced by the rotor of height h, with a specific focus on the changes in lift and drag force along the vertical axis of the cylinder. We observe that vortex shedding is inhibited at the highest α values investigated. However, at intermediate α, vortices continue to be shed in the upper section of the cylinder (y/h > 0.3). As the cylinder begins to rotate, a large-scale motion becomes apparent on the high-pressure side, close to the bottom wall. We offer both a qualitative and a quantitative description of this motion, outlining its impact on the wake deflection. This finding is significant because the motion influences the rotor wake up to approximately one hundred diameters downstream. In practical applications, this phenomenon could influence the performance of subsequent boats and affect the cylinder drag, and thus its fuel consumption. This fundamental study, which investigates a limited yet significant (for DNS) Reynolds number and explores various spinning ratios, provides valuable insights into the complex flow around a Flettner rotor. The simulations were performed using a GPU-based spectral element method, leveraging the power of modern supercomputers for fundamental engineering problems.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-344051 (URN)
10.1038/s41598-024-53194-x (DOI)
2-s2.0-85184207516 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20240301

Available from: 2024-02-29. Created: 2024-02-29. Last updated: 2024-03-01. Bibliographically approved
Andersson, M. & Markidis, S. (2023). A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: 2023. Paper presented at HPC Asia: International Conference on High Performance Computing in Asia-Pacific Region, Singapore, 27 February - 2 March 2023. Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: 2023, Association for Computing Machinery (ACM), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

With the emergence of new computer architectures, portability and performance portability have become significant concerns in developing HPC applications. This work reports our experience and lessons learned using DaCe to create and optimize batched Discrete Fourier Transform (DFT) calculations on different single-node computer systems. The batched DFT calculation is an essential component of FFT algorithms and is widely used in computer science, numerical analysis, and signal processing. We implement the batched DFT with three complex-valued array data layouts and compare them with an implementation using the native complex type. DaCe uses Stateful DataFlow multiGraphs (SDFGs) as an intermediate representation (IR), which can be optimized through transformations and from which code is generated for different architectures. We present several performance results showcasing the potential of DaCe for expressing HPC applications on different computer systems.
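To make the layout comparison concrete, here is a NumPy sketch (not DaCe) of a batched DFT computed as a dense matrix product in three layouts: native complex, split (separate real and imaginary arrays), and interleaved (a trailing axis of size 2 holding real/imaginary pairs). These particular layouts and the function names are assumptions for illustration; the abstract does not name the layouts used in the paper. The split form uses F = C - iS with C_kn = cos(2πkn/N) and S_kn = sin(2πkn/N).

```python
import numpy as np


def dft_matrices(n):
    k = np.arange(n)
    angle = 2 * np.pi * np.outer(k, k) / n
    return np.cos(angle), np.sin(angle)  # C and S, with F = C - iS


def batched_dft_native(x):
    """x: complex array of shape (batch, n); transform along the last axis."""
    c, s = dft_matrices(x.shape[-1])
    return x @ (c - 1j * s)  # F is symmetric, so x @ F equals (F @ x.T).T


def batched_dft_split(xr, xi):
    """Split layout: real and imaginary parts stored in separate real arrays."""
    c, s = dft_matrices(xr.shape[-1])
    # (C - iS)(xr + i xi) = (C xr + S xi) + i (C xi - S xr)
    return xr @ c + xi @ s, xi @ c - xr @ s


def batched_dft_interleaved(xri):
    """Interleaved layout: shape (batch, n, 2) holding (re, im) pairs."""
    yr, yi = batched_dft_split(xri[..., 0], xri[..., 1])
    return np.stack([yr, yi], axis=-1)


rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16)) + 1j * rng.standard_normal((8, 16))
ref = np.fft.fft(x, axis=-1)  # reference result in the native complex layout
yr, yi = batched_dft_split(x.real, x.imag)
yri = batched_dft_interleaved(np.stack([x.real, x.imag], axis=-1))
```

All three variants compute the same transform. What changes is the memory access pattern: the split layout streams two real arrays, while the interleaved layout strides through pairs. Those access patterns are what a tool like DaCe can transform and tune per architecture.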

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-325875 (URN)
10.1145/3578178.3578239 (DOI)
2-s2.0-85149440955 (Scopus ID)
Conference
HPC Asia: International Conference on High Performance Computing in Asia-Pacific Region, Singapore, 27 February - 2 March 2023
Note

QC 20231002

Available from: 2023-04-18. Created: 2023-04-18. Last updated: 2023-10-02. Bibliographically approved
Andersson, M., Natarajan Arul, M., Podobas, A. & Markidis, S. (2023). Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software. In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. Paper presented at PPAM, 14th International Conference on Parallel Processing and Applied Mathematics (pp. 333-345). Springer Nature
2023 (English). In: PPAM 2022. Lecture Notes in Computer Science, vol 13826, Springer Nature, 2023, p. 333-345. Conference paper, Published paper (Refereed)
Abstract [en]

GROMACS is one of the most widely used HPC software packages for Molecular Dynamics (MD) simulation. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact GROMACS performance. Finally, we discuss performance opportunities, with particular interest in the development of the FFT calculations in GROMACS.
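The 3D FFT singled out in the abstract is separable: it can be computed as three passes of batched 1D FFTs, one per axis. In a distributed slab or pencil decomposition, the data must be redistributed between passes, typically with MPI all-to-all transposes, and that redistribution is where the communication cost arises. The NumPy sketch below only checks the single-process decomposition. The MPI steps appear as comments, the grid is random stand-in data, and none of this is GROMACS code.

```python
import numpy as np


def fft3d_by_axes(grid):
    """3D FFT computed as successive batched 1D FFTs along each axis."""
    out = np.fft.fft(grid, axis=2)  # pass 1: local 1D FFTs along z
    # In a distributed pencil decomposition, an all-to-all transpose
    # would go here so that each rank holds complete lines along y.
    out = np.fft.fft(out, axis=1)   # pass 2: 1D FFTs along y
    # ...and another transpose here to make lines along x local.
    out = np.fft.fft(out, axis=0)   # pass 3: 1D FFTs along x
    return out


rng = np.random.default_rng(1)
grid = rng.standard_normal((8, 12, 10))  # stand-in for a PME charge grid
result = fft3d_by_axes(grid)
```

The arithmetic in each pass is cheap and local. As the process count grows, each rank's share of that work shrinks, but every rank still takes part in the transposes, so the all-to-all steps come to dominate.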

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science ; 13826
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326454 (URN)
10.1007/978-3-031-30442-2_25 (DOI)
Conference
PPAM, 14th International Conference on Parallel Processing and Applied Mathematics
Note

QC 20230515

Available from: 2023-05-02. Created: 2023-05-02. Last updated: 2023-05-22. Bibliographically approved
Williams, J. J., Araújo De Medeiros, D., Peng, I. B. & Markidis, S. (2023). Characterizing the Performance of the Implicit Massively Parallel Particle-in-Cell iPIC3D Code. In: SC23 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis. Paper presented at SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, CO, USA, 12-17 November 2023. Denver, Colorado, USA
2023 (English). In: SC23 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, Colorado, USA, 2023. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Optimizing iPIC3D, an implicit Particle-in-Cell (PIC) code, for large-scale 3D plasma simulations is crucial for space and astrophysical applications. This work focuses on characterizing iPIC3D's communication efficiency through strategic measures such as optimal node placement, communication and computation overlap, and load balancing. Profiling and tracing tools are employed to analyze iPIC3D's communication efficiency and provide practical recommendations. Implementing optimized communication protocols enables more precise simulations of the Geospace Environmental Modeling (GEM) magnetic reconnection challenge in plasma physics. This approach captures the complexities of 3D plasma simulations, particularly in magnetic reconnection, advancing space and astrophysical research.

Place, publisher, year, edition, pages
Denver, Colorado, USA, 2023
Keywords
iPIC3D, Magnetic Reconnection, Implicit PIC, Space Weather, Performance Analysis, Profiling and Tracing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-339780 (URN)
Conference
SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, CO, USA, 12-17 November, 2023
Note

QC 20231120

Available from: 2023-11-17. Created: 2023-11-17. Last updated: 2023-11-20. Bibliographically approved
Liu, F., Andersson, M., Fredriksson, A. & Markidis, S. (2023). Distributed Objective Function Evaluation for Optimization of Radiation Therapy Treatment Plans. In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. Paper presented at PPAM 2022: Parallel Processing and Applied Mathematics. Springer Nature
2023 (English). In: PPAM 2022. Lecture Notes in Computer Science, vol 13826, Springer Nature, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

The modern workflow for radiation therapy treatment planning involves mathematical optimization to determine optimal treatment machine parameters for each patient case. The optimization problems can be computationally expensive and require iterative optimization algorithms to solve. In this work, we investigate a method for distributing the calculation of objective functions and gradients for radiation therapy optimization problems across computational nodes. We test our approach on the TROTS dataset, which consists of optimization problems from real clinical patient cases, using the IPOPT optimization solver in a leader/follower approach to parallelization. We show that our approach can utilize multiple computational nodes efficiently, with a speedup of approximately 2 to 3.5 times compared to the serial version.
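A leader/follower split of objective and gradient evaluation works when the objective is a sum of independent per-term contributions. The sketch below assumes a hypothetical weighted least-squares objective f(x) = Σ_i w_i ||D_i x - d_i||², with the D_i standing in for dose-influence matrices. It uses Python threads in place of compute nodes and does not use IPOPT or the TROTS data. All names and sizes are illustrative.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def partial_objective(terms, x):
    """Follower work: objective value and gradient for a subset of terms."""
    f, grad = 0.0, np.zeros_like(x)
    for w, D, d in terms:
        r = D @ x - d
        f += w * (r @ r)
        grad += 2 * w * (D.T @ r)
    return f, grad


def distributed_objective(terms, x, n_workers=3):
    """Leader: split terms across followers, then sum values and gradients."""
    chunks = [terms[i::n_workers] for i in range(n_workers)]  # round-robin split
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(lambda c: partial_objective(c, x), chunks))
    f = sum(r[0] for r in results)
    grad = sum(r[1] for r in results)
    return f, grad


rng = np.random.default_rng(2)
n_vars = 20
terms = [(rng.uniform(0.5, 2.0), rng.standard_normal((30, n_vars)),
          rng.standard_normal(30)) for _ in range(7)]
x = rng.standard_normal(n_vars)
f_dist, g_dist = distributed_objective(terms, x)
f_serial, g_serial = partial_objective(terms, x)
```

The leader only exchanges x, one scalar, and one gradient vector per follower per evaluation. Those small messages are what let such a scheme scale across nodes, while the optimizer itself stays serial on the leader.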

Place, publisher, year, edition, pages
Springer Nature, 2023
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-327020 (URN)
10.1007/978-3-031-30442-2_29 (DOI)
Conference
PPAM 2022: Parallel Processing and Applied Mathematics
Note

QC 20230523

Available from: 2023-05-17. Created: 2023-05-17. Last updated: 2023-05-23. Bibliographically approved
Markidis, S. (2023). Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google's qsim. In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023. Paper presented at 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023 (pp. 1478-1486). Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM), 2023, p. 1478-1486. Conference paper, Published paper (Refereed)
Abstract [en]

Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google's qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and use the HIPIFY tool to provide a qsim HIP backend tailored for AMD GPUs. Our performance analysis centers on evaluating the HIP backend's capabilities on a computing node equipped with the AMD MI250X GPU and the AMD EPYC Trento CPU. We use the Random Quantum Circuit (RQC) sampling benchmark with a circuit of 30 qubits. The qsim HIP backend on the AMD GPU outperforms the CPU version by a wide margin, running seven to nine times faster. Our investigation also compares qsim's performance on the Nvidia A100 and AMD MI250X GPUs. The Nvidia A100 consistently outperforms the AMD MI250X, and this performance gap widens further with optimal gate fusion configurations. For instance, a two-gate fusion configuration exhibits a 5% difference, whereas a four-gate fusion setup reveals a 44% performance gap. Our work highlights the substantial performance advantage of GPU-based quantum simulation over traditional CPU approaches. Despite a performance lag compared to the qsim CUDA backend, the AMD HIP qsim backend emerges as a competitive alternative poised for further optimization.
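Gate fusion, whose configuration drives the A100/MI250X gap reported above, merges consecutive gates into a single matrix so the simulator sweeps the state vector fewer times. The NumPy sketch below shows the idea for two single-qubit gates on the same qubit in a toy state-vector simulator. It is not qsim's implementation. The helper `apply_1q` is hypothetical, and treating qubit 0 as the most significant axis is an assumed convention.

```python
import numpy as np


def apply_1q(state, gate, q, n):
    """Apply a 2x2 gate to qubit q of an n-qubit state vector."""
    psi = state.reshape([2] * n)
    psi = np.tensordot(gate, psi, axes=([1], [q]))  # contracted axis moves to front
    psi = np.moveaxis(psi, 0, q)                    # restore the original axis order
    return psi.reshape(-1)


H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)   # Hadamard gate
T = np.diag([1, np.exp(1j * np.pi / 4)])        # T (pi/8) gate

n = 3
rng = np.random.default_rng(3)
state = rng.standard_normal(2**n) + 1j * rng.standard_normal(2**n)
state /= np.linalg.norm(state)

# Unfused: two passes over the state vector.
unfused = apply_1q(apply_1q(state, H, 1, n), T, 1, n)
# Fused: one pass with the product T @ H (the later gate multiplies on the left).
fused = apply_1q(state, T @ H, 1, n)
```

For a 30-qubit state vector, each pass touches 2^30 complex amplitudes. Fusing gates therefore trades a small dense matrix product for fewer memory-bound sweeps, which is why fusion settings interact strongly with GPU memory bandwidth.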

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
AMD GPUs, HIP, MI250x, qsim, quantum computer simulator, state vector simulator
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-341469 (URN)
10.1145/3624062.3624223 (DOI)
2-s2.0-85178136529 (Scopus ID)
Conference
2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023
Note

QC 20240108

Part of ISBN 979-840070785-8

Available from: 2024-01-08. Created: 2024-01-08. Last updated: 2024-01-08. Bibliographically approved
Afzal, A., Hager, G., Wellein, G. & Markidis, S. (2023). Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers. Paper presented at 14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, Gdansk, Poland, Sep 11 2022 - Sep 14 2022 (pp. 155-170). Springer Nature
2023 (English). In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers, Springer Nature, 2023, p. 155-170. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the utility of data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with a regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and the MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also pave the way towards a more general classification of parallel program dynamics.
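Principal component analysis of per-process observables can be sketched with a singular value decomposition. The example below assumes a matrix of MPI time per time step, with one row per time step and one column per process. The matrix is filled with synthetic data: a disturbance travelling across ranks plus noise. The sketch illustrates the generic technique only. It is not the authors' analysis pipeline, and it does not reproduce their phase space plot. The function name is hypothetical.

```python
import numpy as np


def pca_explained_variance(samples):
    """Rows are observations (time steps), columns are features (processes)."""
    centered = samples - samples.mean(axis=0)
    # Squared singular values of the centered data give the component variances.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    var = s**2 / (samples.shape[0] - 1)
    return var / var.sum(), vt  # explained-variance ratios and principal axes


rng = np.random.default_rng(4)
steps, procs = 200, 16
t = np.arange(steps)[:, None]
ranks = np.arange(procs)[None, :]
# Synthetic MPI time: a disturbance travelling across ranks, plus small noise.
mpi_time = (1e-3 * (1 + np.sin(2 * np.pi * (t - 5 * ranks) / 50))
            + 1e-5 * rng.standard_normal((steps, procs)))
ratios, axes = pca_explained_variance(mpi_time)
```

A travelling wave sin(ωt - φ_r) expands into sin(ωt)cos(φ_r) - cos(ωt)sin(φ_r), so two components carry almost all of the variance. This kind of low-dimensional structure makes a compact per-step data set informative without a full MPI trace.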

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Asynchronous MPI execution, Data analytic techniques, Machine learning techniques, Parallel distributed computing, Scalability and bottleneck
National Category
Computer Sciences; Computer Engineering
Identifiers
urn:nbn:se:kth:diva-338614 (URN)
10.1007/978-3-031-30442-2_12 (DOI)
2-s2.0-85161384728 (Scopus ID)
Conference
14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, Gdansk, Poland, Sep 11 2022 - Sep 14 2022
Note

Part of ISBN 9783031304415

QC 20231106

Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2023-11-06. Bibliographically approved
Jansson, N., Karp, M., Perez, A., Mukha, T., Ju, Y., Liu, J., . . . Markidis, S. (2023). Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations. In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Nov 12-17, Denver, CO, USA (pp. 1-9). Association for Computing Machinery (ACM), Article ID 5.
2023 (English). In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM), 2023, p. 1-9, article id 5. Conference paper, Published paper (Refereed)
Abstract [en]

We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences; Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-340333 (URN)
10.1145/3581784.3627039 (DOI)
2-s2.0-85179549233 (Scopus ID)
Conference
SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Nov 12-17, Denver, CO, USA
Funder
Swedish Research Council, 2019-04723
Swedish e-Science Research Center
EU, Horizon 2020, 101093393, 101092621, 956748
Note

Part of ISBN 9798400701092

QC 20231204

Available from: 2023-12-04. Created: 2023-12-04. Last updated: 2024-03-18. Bibliographically approved
Pornthisan, N. & Markidis, S. (2023). Fast Electromagnetic Field Pattern Calculation with Fourier Neural Operators. In: Computational Science – ICCS 2023 - 23rd International Conference, Proceedings. Paper presented at 23rd International Conference on Computational Science, ICCS 2023, Prague, Czechia, Jul 3 2023 - Jul 5 2023 (pp. 247-255). Springer Nature
2023 (English). In: Computational Science – ICCS 2023 - 23rd International Conference, Proceedings, Springer Nature, 2023, p. 247-255. Conference paper, Published paper (Refereed)
Abstract [en]

Calculating the field pattern arising from an array of radiating sources is a central problem in Computational ElectroMagnetics (CEM) and a critical operation for designing and developing antenna systems. Yet it is computationally expensive with traditional numerical approaches, such as finite-difference methods in the time and spectral domains. To address this issue, we develop a new data-driven surrogate model for fast and accurate calculation of the field radiation pattern. The method is based on the Fourier Neural Operator (FNO) technique. We achieve a 31x performance improvement compared to the Meep CEM solver running on a desktop laptop CPU, at the cost of a small accuracy loss.
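The core of a Fourier Neural Operator is a spectral convolution layer with four steps: transform the input to Fourier space, keep only the lowest modes, mix channels with learned complex weights per mode, and transform back. The NumPy sketch below shows a single 1D layer with random weights. The shapes, the `modes` parameter, and the function name are assumptions for illustration. The paper's model is not reproduced here, including its dimensionality, training, and the full architecture with pointwise linear terms and nonlinearities.

```python
import numpy as np


def spectral_conv_1d(x, weights, modes):
    """One FNO-style spectral convolution.

    x:       real input of shape (in_channels, n)
    weights: complex array of shape (in_channels, out_channels, modes)
    """
    n = x.shape[-1]
    x_hat = np.fft.rfft(x, axis=-1)  # (in_channels, n // 2 + 1)
    out_hat = np.zeros((weights.shape[1], x_hat.shape[-1]), dtype=complex)
    # Mix channels mode by mode, only for the retained low-frequency modes.
    out_hat[:, :modes] = np.einsum("ik,iok->ok", x_hat[:, :modes], weights)
    return np.fft.irfft(out_hat, n=n, axis=-1)  # back to physical space


rng = np.random.default_rng(5)
in_ch, out_ch, n, modes = 2, 3, 64, 12
x = rng.standard_normal((in_ch, n))
w = (rng.standard_normal((in_ch, out_ch, modes))
     + 1j * rng.standard_normal((in_ch, out_ch, modes)))
y = spectral_conv_1d(x, w, modes)
```

Because the learned weights live on a fixed set of Fourier modes, evaluating the layer costs a pair of FFTs plus a small per-mode product. That low per-evaluation cost is what makes such surrogates fast compared with a full-wave solve.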

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Computational Electromagnetics, Dipole Antenna Array, Electromagnetic Field Pattern, Fourier Neural Operator
National Category
Computer Sciences; Telecommunications
Identifiers
urn:nbn:se:kth:diva-336726 (URN)
10.1007/978-3-031-36021-3_24 (DOI)
2-s2.0-85169680148 (Scopus ID)
Conference
23rd International Conference on Computational Science, ICCS 2023, Prague, Czechia, Jul 3 2023 - Jul 5 2023
Note

Part of ISBN 9783031360206

QC 20230919

Available from: 2023-09-19. Created: 2023-09-19. Last updated: 2023-09-19. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0639-0639