Publications (10 of 184)
Andersson, M. I., Liu, F. & Markidis, S. (2024). Anderson Accelerated PMHSS for Complex-Symmetric Linear Systems. In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP). Paper presented at the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), SIAM Conference on Parallel Processing for Scientific Computing (PP24), March 5 - 8, 2024, Baltimore, Maryland (pp. 39-52). Society for Industrial & Applied Mathematics (SIAM)
2024 (English). In: Proceedings of the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), Society for Industrial & Applied Mathematics (SIAM), 2024, p. 39-52. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the design and development of an Anderson Accelerated Preconditioned Modified Hermitian and Skew-Hermitian Splitting (AA-PMHSS) method for solving complex-symmetric linear systems with application to electromagnetics problems, such as wave scattering and eddy currents. While it has been shown that the Anderson acceleration of real linear systems is essentially equivalent to GMRES, we show here that the formulation using Anderson acceleration leads to a more performant method. We show relatively good robustness compared to existing preconditioned GMRES methods and significantly better performance due to the faster evaluation of the preconditioner. In particular, AA-PMHSS can be applied to solve problems and equations arising from complex-valued systems, such as time-harmonic eddy current simulations discretized with the Finite Element Method. We also evaluate three test systems present in previous literature. We show that the method is competitive with two types of preconditioned GMRES, which share the significant advantage of having a convergence rate that is independent of the discretization size.

Place, publisher, year, edition, pages
Society for Industrial & Applied Mathematics (SIAM), 2024
National Category
Computational Mathematics
Research subject
Applied and Computational Mathematics, Numerical Analysis; Applied and Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-343465 (URN); 10.1137/1.9781611977967.4 (DOI)
Conference
the 2024 SIAM Conference on Parallel Processing for Scientific Computing (PP), SIAM Conference on Parallel Processing for Scientific Computing (PP24) March 5 - 8, 2024 Baltimore, Maryland
Note

QC 20240219

Part of ISBN 978-1-61197-796-7

Available from: 2024-02-14. Created: 2024-02-14. Last updated: 2024-02-19. Bibliographically approved
Massaro, D., Karp, M., Jansson, N., Markidis, S. & Schlatter, P. (2024). Direct numerical simulation of the turbulent flow around a Flettner rotor. Scientific Reports, 14(1), Article ID 3004.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no 1, article id 3004. Article in journal (Refereed), Published
Abstract [en]

The three-dimensional turbulent flow around a Flettner rotor, i.e. an engine-driven rotating cylinder in an atmospheric boundary layer, is studied via direct numerical simulations (DNS) for three different rotation speeds (α). This technology offers a sustainable alternative mainly for marine propulsion, underscoring the critical importance of comprehending the characteristics of such flow. In this study, we evaluate the aerodynamic loads produced by the rotor of height h, with a specific focus on the changes in lift and drag force along the vertical axis of the cylinder. Correspondingly, we observe that vortex shedding is inhibited at the highest α values investigated. However, in the case of intermediate α, vortices continue to be shed in the upper section of the cylinder (y/h>0.3). As the cylinder begins to rotate, a large-scale motion becomes apparent on the high-pressure side, close to the bottom wall. We offer both a qualitative and quantitative description of this motion, outlining its impact on the wake deflection. This finding is significant as it influences the rotor wake to an extent of approximately one hundred diameters downstream. In practical applications, this phenomenon could influence the performance of subsequent boats and have an impact on the cylinder drag, affecting its fuel consumption. This fundamental study, which investigates a limited yet significant (for DNS) Reynolds number and explores various spinning ratios, provides valuable insights into the complex flow around a Flettner rotor. The simulations were performed using a modern GPU-based spectral element method, leveraging the power of modern supercomputers towards fundamental engineering problems.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-344051 (URN); 10.1038/s41598-024-53194-x (DOI); 2-s2.0-85184207516 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20240301

Available from: 2024-02-29. Created: 2024-02-29. Last updated: 2024-04-22. Bibliographically approved
Jansson, N., Karp, M., Podobas, A., Markidis, S. & Schlatter, P. (2024). Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. Computers & Fluids, 275, 106243-106243, Article ID 106243.
2024 (English). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, p. 106243-106243, article id 106243. Article in journal (Refereed), Published
Abstract [en]

Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area with great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive as the range of active spatial and temporal scales is quickly widening. Specifically scale-resolving simulations, including large-eddy simulation (LES) and direct numerical simulations (DNS), thus need to rely on modern efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on-par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

Place, publisher, year, edition, pages
Elsevier BV, 2024
National Category
Fluid Mechanics and Acoustics Computational Mathematics Computer Sciences
Identifiers
urn:nbn:se:kth:diva-344896 (URN); 10.1016/j.compfluid.2024.106243 (DOI); 2-s2.0-85189508362 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; EU, Horizon 2020, 823691; EU, Horizon 2020, 801039
Note

QC 20240403

Available from: 2024-04-02. Created: 2024-04-02. Last updated: 2024-04-22. Bibliographically approved
Peng, I. B., Schulz, M., Haus, U. U., Prunty, C., Marcuello, P., Danovaro, E., . . . Markidis, S. (2024). OpenCUBE: Building an Open Source Cloud Blueprint with EPI Systems. In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, 2023, Revised Selected Papers. Paper presented at International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 2023 - Sep 1 2023 (pp. 260-264). Springer Nature
2024 (English). In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, 2023, Revised Selected Papers, Springer Nature, 2024, p. 260-264. Conference paper, Published paper (Refereed)
Abstract [en]

OpenCUBE aims to develop an open-source full software stack for Cloud computing blueprint deployed on EPI hardware, adaptable to emerging workloads across the computing continuum. OpenCUBE prioritizes energy awareness and utilizes open APIs, Open Source components, advanced SiPearl Rhea processors, and RISC-V accelerator. The project leverages representative workloads, such as cloud-native workloads and workflows of weather forecast data management, molecular docking, and space weather, for evaluation and validation.

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science
Keywords
Computing continuum, Converged HPC and Cloud, EPI, Open-source, RISC-V
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-346145 (URN); 10.1007/978-3-031-48803-0_29 (DOI); 2-s2.0-85190984237 (Scopus ID)
Conference
International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 2023 - Sep 1 2023
Note

Part of proceedings ISBN: 978-303148802-3

QC 20240506

Available from: 2024-05-03. Created: 2024-05-03. Last updated: 2024-05-06. Bibliographically approved
Andersson, M. & Markidis, S. (2023). A Case Study on DaCe Portability & Performance for Batched Discrete Fourier Transforms. In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: 2023. Paper presented at HPC Asia: International Conference on High Performance Computing in Asia-Pacific Region Singapore, Singapore, 27 February - 2 March 2023. Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of the International Conference on High Performance Computing in Asia-Pacific Region: 2023, Association for Computing Machinery (ACM), 2023. Conference paper, Published paper (Refereed)
Abstract [en]

With the emergence of new computer architectures, portability and performance-portability become significant concerns for developing HPC applications. This work reports our experience and lessons learned using DaCe to create and optimize batched Discrete Fourier Transform (DFT) calculations on different single node computer systems. The batched DFT calculation is an essential component in FFT algorithms and is widely used in computer science, numerical analysis, and signal processing. We implement the batched DFT with three complex-value array data layouts and compare them with the native complex type implementation. We use DaCe, which relies on Stateful DataFlow multiGraphs (SDFG) as an intermediate representation (IR) which can be optimized through transforms and then generates code for different architectures. We present several performance results showcasing the potential of DaCe for expressing HPC applications on different computer systems.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-325875 (URN); 10.1145/3578178.3578239 (DOI); 2-s2.0-85149440955 (Scopus ID)
Conference
HPC Asia: International Conference on High Performance Computing in Asia-Pacific Region Singapore, Singapore, 27 February - 2 March 2023
Note

QC 20231002

Available from: 2023-04-18. Created: 2023-04-18. Last updated: 2023-10-02. Bibliographically approved
Andersson, M., Natarajan Arul, M., Podobas, A. & Markidis, S. (2023). Breaking Down the Parallel Performance of GROMACS, a High-Performance Molecular Dynamics Software. In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. Paper presented at PPAM 14th INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING AND APPLIED MATHEMATICS (pp. 333-345). Springer Nature
2023 (English). In: PPAM 2022. Lecture Notes in Computer Science, vol 13826, Springer Nature, 2023, p. 333-345. Conference paper, Published paper (Refereed)
Abstract [en]

GROMACS is one of the most widely used HPC software packages using the Molecular Dynamics (MD) simulation technique. In this work, we quantify GROMACS parallel performance using different configurations, HPC systems, and FFT libraries (FFTW, Intel MKL FFT, and FFTPACK). We break down the cost of each GROMACS computational phase and identify non-scalable stages, such as MPI communication during the 3D FFT computation when using a large number of processes. We show that the Particle-Mesh Ewald phase and the 3D FFT calculation significantly impact the GROMACS performance. Finally, we discuss performance opportunities with a particular interest in developing GROMACS for the FFT calculations.

Place, publisher, year, edition, pages
Springer Nature, 2023
Series
Lecture Notes in Computer Science ; 13826
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-326454 (URN); 10.1007/978-3-031-30442-2_25 (DOI)
Conference
PPAM 14th INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING AND APPLIED MATHEMATICS
Note

QC 20230515

Available from: 2023-05-02. Created: 2023-05-02. Last updated: 2023-05-22. Bibliographically approved
Williams, J. J., Araújo De Medeiros, D., Peng, I. B. & Markidis, S. (2023). Characterizing the Performance of the Implicit Massively Parallel Particle-in-Cell iPIC3D Code. In: SC23 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis. Paper presented at SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, CO, USA, 12-17 November, 2023. Denver, Colorado, USA
2023 (English). In: SC23 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, Colorado, USA, 2023. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Optimizing iPIC3D, an implicit Particle-in-Cell (PIC) code, for large-scale 3D plasma simulations is crucial for space and astrophysical applications. This work focuses on characterizing iPIC3D’s communication efficiency through strategic measures like optimal node placement, communication and computation overlap, and load balancing. Profiling and tracing tools are employed to analyze iPIC3D’s communication efficiency and provide practical recommendations. Implementing optimized communication protocols addresses the Geospace Environmental Modeling (GEM) magnetic reconnection challenges in plasma physics with more precise simulations. This approach captures the complexities of 3D plasma simulations, particularly in magnetic reconnection, advancing space and astrophysical research.

Place, publisher, year, edition, pages
Denver, Colorado, USA, 2023
Keywords
iPIC3D, Magnetic Reconnection, Implicit PIC, Space Weather, Performance Analysis, Profiling and Tracing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-339780 (URN)
Conference
SC23: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Denver, CO, USA, 12-17 November, 2023
Note

QC 20231120

Available from: 2023-11-17. Created: 2023-11-17. Last updated: 2023-11-20. Bibliographically approved
Liu, F., Andersson, M., Fredriksson, A. & Markidis, S. (2023). Distributed Objective Function Evaluation for Optimization of Radiation Therapy Treatment Plans. In: PPAM 2022. Lecture Notes in Computer Science, vol 13826. Paper presented at PPAM 2022: Parallel Processing and Applied Mathematics. Springer Nature
2023 (English). In: PPAM 2022. Lecture Notes in Computer Science, vol 13826, Springer Nature, 2023. Conference paper, Published paper (Refereed)
Abstract [en]

The modern workflow for radiation therapy treatment planning involves mathematical optimization to determine optimal treatment machine parameters for each patient case. The optimization problems can be computationally expensive, requiring iterative optimization algorithms to solve. In this work, we investigate a method for distributing the calculation of objective functions and gradients for radiation therapy optimization problems across computational nodes. We test our approach on the TROTS dataset, which consists of optimization problems from real clinical patient cases, using the IPOPT optimization solver in a leader/follower type approach for parallelization. We show that our approach can utilize multiple computational nodes efficiently, with a speedup of approximately 2-3.5 times compared to the serial version.

Place, publisher, year, edition, pages
Springer Nature, 2023
National Category
Engineering and Technology
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-327020 (URN); 10.1007/978-3-031-30442-2_29 (DOI)
Conference
PPAM 2022: Parallel Processing and Applied Mathematics
Note

QC 20230523

Available from: 2023-05-17. Created: 2023-05-17. Last updated: 2024-04-22. Bibliographically approved
Markidis, S. (2023). Enabling Quantum Computer Simulations on AMD GPUs: a HIP Backend for Google's qsim. In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023. Paper presented at 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023 (pp. 1478-1486). Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM), 2023, p. 1478-1486. Conference paper, Published paper (Refereed)
Abstract [en]

Quantum computer simulators play a critical role in supporting the development and validation of quantum algorithms and hardware. This study focuses on porting Google's qsim, a quantum computer simulator, to AMD Graphics Processing Units (GPUs). We leverage the existing qsim CUDA backend and harness the HIPIFY tool to provide a qsim HIP backend tailored for AMD GPUs. Our performance analysis centers on evaluating the HIP backend's capabilities, executed on a computing node equipped with the AMD MI250X GPU and the AMD EPYC Trento CPU. We use the Random Quantum Circuit (RQC) sampling benchmark, employing a circuit featuring 30 qubits. The qsim HIP backend on AMD GPU outperforms the CPU version by a remarkable margin, achieving seven to nine times faster speeds. Our investigation also compares qsim's performance on the Nvidia A100 and AMD MI250X GPUs. The Nvidia A100 consistently outperforms the AMD MI250x counterpart, and this performance gap further widens with optimal gate fusion configurations. For instance, a two-gate fusion configuration exhibits a 5% difference, whereas a four-gate fusion setup reveals a large 44% performance gap. Our work highlights the substantial performance advantage of GPU-based quantum simulation over traditional CPU approaches. Despite a performance lag compared to the qsim CUDA backend, the AMD HIP qsim backend emerges as a competitive alternative poised for further optimization.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
Keywords
AMD GPUs, HIP, MI250x, qsim, quantum computer simulator, state vector simulator
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-341469 (URN); 10.1145/3624062.3624223 (DOI); 2-s2.0-85178136529 (Scopus ID)
Conference
2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023
Note

QC 20240108

Part of ISBN 979-840070785-8

Available from: 2024-01-08. Created: 2024-01-08. Last updated: 2024-01-08. Bibliographically approved
Afzal, A., Hager, G., Wellein, G. & Markidis, S. (2023). Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications. In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers. Paper presented at 14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, Gdansk, Poland, Sep 11 2022 - Sep 14 2022 (pp. 155-170). Springer Nature
2023 (English). In: Parallel Processing and Applied Mathematics - 14th International Conference, PPAM 2022, Revised Selected Papers, Springer Nature, 2023, p. 155-170. Conference paper, Published paper (Refereed)
Abstract [en]

This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new “phase space plot,” we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.

Place, publisher, year, edition, pages
Springer Nature, 2023
Keywords
Asynchronous MPI execution, Data analytic techniques, Machine learning techniques, Parallel distributed computing, Scalability and bottleneck
National Category
Computer Sciences Computer Engineering
Identifiers
urn:nbn:se:kth:diva-338614 (URN); 10.1007/978-3-031-30442-2_12 (DOI); 2-s2.0-85161384728 (Scopus ID)
Conference
14th International Conference on Parallel Processing and Applied Mathematics, PPAM 2022, Gdansk, Poland, Sep 11 2022 - Sep 14 2022
Note

Part of ISBN 9783031304415

QC 20231106

Available from: 2023-11-06. Created: 2023-11-06. Last updated: 2023-11-06. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-0639-0639
