KTH Publications
Publications (10 of 15)
Hegde, P. R., Marcandelli, P., He, Y., Pennati, L., Williams, J. J., Peng, I. B. & Markidis, S. (2026). A hybrid quantum-classical particle-in-cell method for plasma simulations. Future Generation Computer Systems, 175, Article ID 108087.
2026 (English). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 175, article id 108087. Article in journal (Refereed). Published.
Abstract [en]

We present a hybrid quantum-classical electrostatic Particle-in-Cell (PIC) method in which the electrostatic-field Poisson solver is implemented on a quantum computer simulator as a hybrid classical-quantum Neural Network (HNN), trained with both data-driven and physics-informed learning approaches. The HNN is trained on classical PIC simulation results and executed via a PennyLane quantum simulator. The remaining computational steps, including particle motion and field interpolation, are performed on a classical system. To evaluate the accuracy and computational cost of this hybrid approach, we test the hybrid quantum-classical electrostatic PIC method on the two-stream instability, a standard benchmark in plasma physics. Our results show that the quantum Poisson solver achieves accuracy comparable to classical methods, and the study provides insights into the feasibility of using quantum computing and HNNs for plasma simulations. We also discuss the computational overhead associated with current quantum computer simulators, highlighting both the challenges and the potential advantages of hybrid quantum-classical numerical methods.
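
For context, the field solve that the HNN replaces is the standard electrostatic Poisson problem of the PIC cycle: the deposited charge density determines the potential and electric field, which in turn pushes the particles.

```latex
\nabla^{2}\phi = -\frac{\rho}{\varepsilon_{0}}, \qquad
\mathbf{E} = -\nabla\phi, \qquad
m_{p}\,\frac{d\mathbf{v}_{p}}{dt} = q_{p}\,\mathbf{E}(\mathbf{x}_{p})
```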

Place, publisher, year, edition, pages
Elsevier BV, 2026
Keywords
Hybrid Quantum-Classical Computing, Particle-in-Cell (PIC) Method, Electrostatic Poisson Solver, Quantum Neural Networks (QNNs)
National Category
Fusion, Plasma and Space Physics; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-368973 (URN); 10.1016/j.future.2025.108087 (DOI); 001561183000001; 2-s2.0-105013835560 (Scopus ID)
Note

QC 20250825

Available from: 2025-08-25. Created: 2025-08-25. Last updated: 2025-09-17. Bibliographically approved.
Williams, J. J., Costea, S., Araújo De Medeiros, D., Trilaksono, J., Hegde, P. R., Tskhakaya, D., . . . Markidis, S. (2026). Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale. The International Journal of High Performance Computing Applications
2026 (English). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including turbulence and confinement, which are essential for optimizing fusion reactor performance. Transitioning to exascale simulations introduces significant challenges, with traditional file input/output (I/O) inefficiencies remaining a key bottleneck. This work advances BIT1, an electrostatic PIC MC code, by improving the particle mover with OpenMP task-based parallelism, integrating openPMD’s streaming API, and enabling in-memory data streaming with the ADIOS2 Sustainable Staging Transport (SST) engine to enhance I/O performance, computational efficiency, and system storage utilization. We employ profiling tools such as gprof, perf, IPM and Darshan, which provide insights into computation, communication, and I/O operations. We implement time-dependent data checkpointing with the openPMD API enabling seamless data movement and in-situ visualization for real-time analysis without interrupting the simulation. We demonstrate improvements in simulation runtime, data accessibility and real-time insights by comparing traditional file I/O with the ADIOS2 BP4 and SST backends. The proposed hybrid BIT1 openPMD SST enhancement introduces a new paradigm for real-time scientific discovery in plasma simulations, enabling faster insights and more efficient use of exascale computing resources.
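
As a rough illustration of the streaming pattern described above (a minimal sketch, not the actual BIT1/openPMD integration; the series and record names are hypothetical), openPMD's streaming API selects the ADIOS2 SST engine through the .sst filename extension, so each closed iteration is staged in memory for an attached consumer such as an in-situ visualization reader:

```cpp
// Minimal openPMD streaming-writer sketch (hypothetical names; assumes
// openPMD-api built with ADIOS2 SST support).
#include <openPMD/openPMD.hpp>
#include <cstdint>
#include <vector>

int main()
{
    using namespace openPMD;
    // The ".sst" extension selects ADIOS2's Sustainable Staging Transport,
    // so data is staged in memory rather than written to files on disk.
    Series series("simData.sst", Access::CREATE);

    std::vector<double> phi(1024, 0.0); // stand-in for a PIC field slice

    for (std::uint64_t step = 0; step < 10; ++step)
    {
        // writeIterations() yields streaming-friendly iteration handles.
        auto it = series.writeIterations()[step];
        auto rc = it.meshes["phi"][MeshRecordComponent::SCALAR];
        rc.resetDataset({Datatype::DOUBLE, {phi.size()}});
        rc.storeChunk(phi, {0}, {phi.size()});
        it.close(); // flushes this step to any attached SST reader
    }
    return 0;
}
```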

Place, publisher, year, edition, pages
London, United Kingdom: Sage Publications, 2026
Keywords
Hybrid MPI+OMP Parallel Programming, openPMD, ADIOS2, In-Memory Data Streaming, In-situ Visualization, Distributed Computing, Efficient Data Processing, Large-Scale PIC MC Simulations
National Category
Fusion, Plasma and Space Physics; Computer Sciences; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-373650 (URN); 10.1177/10943420251409229 (DOI); 001667829900001; 2-s2.0-105028315172 (Scopus ID)
Note

QC 20260204

Available from: 2025-12-04. Created: 2025-12-04. Last updated: 2026-02-04. Bibliographically approved.
Williams, J. J., Liu, F., Trilaksono, J., Tskhakaya, D., Costea, S., Kos, L., . . . Markidis, S. (2025). Accelerating Particle-in-Cell Monte Carlo Simulations with MPI, OpenMP/OpenACC and Asynchronous Multi-GPU Programming. Journal of Computational Science, 88, Article ID 102590.
2025 (English). In: Journal of Computational Science, ISSN 1877-7503, E-ISSN 1877-7511, Vol. 88, article id 102590. Article in journal (Refereed). Published.
Abstract [en]

As fusion energy devices advance, plasma simulations play a critical role in fusion reactor design. Particle-in-Cell Monte Carlo simulations are essential for modelling plasma-material interactions and analysing power load distributions on tokamak divertors. Previous work introduced hybrid parallelization in BIT1 using MPI and OpenMP/OpenACC for shared-memory and multicore CPU processing. In this extended work, we integrate MPI with OpenMP and OpenACC, focusing on asynchronous multi-GPU programming with OpenMP Target Tasks using the "nowait" and "depend" clauses, and OpenACC Parallel with the "async(n)" clause. Our results show significant performance improvements: 16 MPI ranks plus OpenMP threads reduced simulation runtime by 53% on a petascale EuroHPC supercomputer, while the OpenACC multicore implementation achieved a 58% reduction compared to the MPI-only version. Scaling to 64 MPI ranks, OpenACC outperformed OpenMP, achieving a 24% improvement in the particle mover function. On the HPE Cray EX supercomputer, OpenMP and OpenACC consistently reduced simulation times, with a 37% reduction at 100 nodes. Results from MareNostrum 5, a pre-exascale EuroHPC supercomputer, highlight OpenACC's effectiveness, with the "async(n)" configuration delivering notable performance gains. However, OpenMP asynchronous configurations outperform OpenACC at larger node counts, particularly for extreme scaling runs. As BIT1 scaled asynchronously to 128 GPUs, OpenMP asynchronous multi-GPU configurations outperformed OpenACC in runtime, demonstrating superior scalability that continued up to 400 GPUs with further runtime improvements. Speedup and parallel efficiency (PE) studies reveal OpenMP asynchronous multi-GPU achieving an 8.77x speedup (54.81% PE) and OpenACC achieving an 8.14x speedup (50.87% PE) on MareNostrum 5, surpassing the CPU-only version. At higher node counts, PE declined across all implementations due to communication and synchronization costs. However, the asynchronous multi-GPU versions maintained better PE, demonstrating the benefits of asynchronous multi-GPU execution in reducing scalability bottlenecks. While the CPU-only implementation is faster in some cases, OpenMP's asynchronous multi-GPU approach delivers better GPU performance through asynchronous data transfer and task dependencies, ensuring data consistency and avoiding race conditions. Using NVIDIA Nsight tools, we confirmed BIT1's overall efficiency for large-scale plasma simulations, leveraging current and future exascale supercomputing infrastructures. Asynchronous data transfers and dedicated GPU assignments to MPI ranks enhance performance, with OpenMP's asynchronous multi-GPU implementation using OpenMP Target Tasks with "nowait" and "depend" clauses outperforming other configurations. This makes OpenMP the preferred application programming interface when performance portability, high throughput, and efficient GPU utilization are critical, enabling BIT1 to fully exploit modern supercomputing architectures and advance fusion energy research. MareNostrum 5 brings us closer to achieving exascale performance.
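
The asynchronous offloading idioms named in this abstract look roughly as follows. This is a generic sketch with hypothetical buffers (not BIT1 source code), and the two models are shown side by side only for comparison; a real build would normally enable one model at a time.

```cpp
// Sketch of the asynchronous offload idioms named above (hypothetical
// buffers; not BIT1 code). Requires a compiler with OpenMP target
// offload and/or OpenACC support.
#include <cstddef>

void advance_async(double *a, double *b, std::size_t n)
{
    // OpenMP target task: "nowait" defers the offload as an asynchronous
    // task; "depend" orders it against other tasks touching the buffer.
    #pragma omp target teams distribute parallel for \
            map(tofrom: a[0:n]) nowait depend(inout: a[0:n])
    for (std::size_t i = 0; i < n; ++i)
        a[i] += 1.0;

    // OpenACC counterpart: async(1) enqueues the kernel on queue 1, so
    // the host thread continues immediately instead of blocking.
    #pragma acc parallel loop copy(b[0:n]) async(1)
    for (std::size_t i = 0; i < n; ++i)
        b[i] += 1.0;

    #pragma omp taskwait // wait for the deferred OpenMP target task
    #pragma acc wait(1)  // drain OpenACC queue 1
}
```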

Place, publisher, year, edition, pages
Netherlands: Elsevier BV, 2025
Keywords
Hybrid Programming, OpenMP, Task-Based Parallelism, Dependency Management, OpenACC, Asynchronous Execution, Multi-GPU Offloading, Overlapping Kernels, Large-Scale PIC Simulations
National Category
Computer Systems; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-362742 (URN); 10.1016/j.jocs.2025.102590 (DOI); 001482576300001; 2-s2.0-105003577843 (Scopus ID)
Funder
Swedish Research Council, 2022-06725; KTH Royal Institute of Technology, 101093261
Note

QC 20250619

Available from: 2025-04-24. Created: 2025-04-24. Last updated: 2025-06-19. Bibliographically approved.
Araújo De Medeiros, D., Williams, J. J., Wahlgren, J., Saud Maia Leite, L. & Peng, I. B. (2025). ARC-V: Vertical Resource Adaptivity for HPC Workloads in Containerized Environments. In: 31st International European Conference on Parallel and Distributed Computing. Paper presented at The 31st International European Conference on Parallel and Distributed Computing (Euro-Par ’25), Dresden, Germany, 25-29 Aug 2025. Springer Nature
2025 (English). In: 31st International European Conference on Parallel and Distributed Computing, Springer Nature, 2025. Conference paper, Published paper (Refereed).
Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
Vertical scaling, HPC workloads, Cloud Computing, Resource Adaptivity, Memory Resource Provisioning
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-363170 (URN); 10.1007/978-3-031-99854-6_12 (DOI); 2-s2.0-105015430232 (Scopus ID)
Conference
The 31st International European Conference on Parallel and Distributed Computing (Euro-Par ’25), Dresden, Germany, 25-29 Aug 2025
Note

QC 20250923

Available from: 2025-05-06. Created: 2025-05-06. Last updated: 2025-09-23. Bibliographically approved.
Trilaksono, J., Ulbl, P., Williams, J. J., Pfeiler, C.-M., Finkbeiner, M., Dannert, T., . . . Jenko, F. (2025). OpenACC and OpenMP-Accelerated Fortran/C++ Gyrokinetic Fusion Code GENE-X for Heterogeneous Architectures. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2025. Paper presented at PASC25: Platform for Advanced Scientific Computing Conference, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Brugg-Windisch, Switzerland, June 16-18, 2025. New York, NY, United States: Association for Computing Machinery (ACM)
2025 (English). In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2025, New York, NY, United States: Association for Computing Machinery (ACM), 2025. Conference paper, Published paper (Refereed).
Abstract [en]

Achieving net-positive fusion energy and its commercialization requires not only engineering marvels but also state-of-the-art, massively parallel codes that can handle reactor-scale simulations. The GENE-X code is a global continuum gyrokinetic turbulence code designed to predict energy confinement and heat exhaust for future fusion reactors. GENE-X is capable of simulating plasma turbulence from the core region to the wall of a magnetic confinement fusion (MCF) device. Originally written in Fortran 2008, GENE-X leverages MPI+OpenMP for parallel computing. In this paper, we augment the Fortran-based compute operators in GENE-X with a C++17 layer, exposing them to a wide array of C++-compatible tools. Here we focus on offloading the augmented operators to GPUs via directive-based programming models such as OpenACC and OpenMP offload. The performance of GENE-X is comprehensively characterized, e.g., by roofline analysis on a single GPU and scaling analysis on multiple GPUs. The major compute operators achieve significant performance improvements, shifting the bottleneck to inter-GPU communication. We discuss additional opportunities to further enhance performance, such as reducing memory traffic and improving memory utilization efficiency.
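
A minimal sketch of the interoperability pattern described here, under stated assumptions (the operator name, the toy stencil, and the argument list are hypothetical, not GENE-X internals): a Fortran routine declared with ISO_C_BINDING calls into a C++ translation unit, where the loop is offloaded with a directive.

```cpp
// C++ side of a Fortran/C++ interop layer (hypothetical names; not the
// actual GENE-X operators). The Fortran caller would declare:
//   interface
//     subroutine apply_stencil(f, g, n) bind(C, name="apply_stencil")
//       use iso_c_binding
//       real(c_double) :: f(*), g(*)
//       integer(c_int), value :: n
//     end subroutine
//   end interface
extern "C" void apply_stencil(const double *f, double *g, int n)
{
    // Directive-based offload of the augmented operator; the same loop
    // could equally carry "#pragma omp target teams distribute parallel for".
    #pragma acc parallel loop copyin(f[0:n]) copyout(g[0:n])
    for (int i = 0; i < n; ++i)
    {
        const int im = (i > 0) ? i - 1 : 0;         // clamped left edge
        const int ip = (i < n - 1) ? i + 1 : n - 1; // clamped right edge
        g[i] = 0.5 * (f[im] + f[ip]);               // toy 1D stencil
    }
}
```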

Place, publisher, year, edition, pages
New York, NY, United States: Association for Computing Machinery (ACM), 2025
Keywords
GPU, OpenACC, OpenMP, MPI, CUDA-aware MPI, NVIDIA, Fortran, C++, Fusion Plasma, Gyrokinetics, Turbulence
National Category
Fusion, Plasma and Space Physics; Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-365479 (URN); 10.1145/3732775.3733587 (DOI); 001547217100015; 2-s2.0-105010440977 (Scopus ID)
Conference
PASC25: Platform for Advanced Scientific Computing Conference, FHNW University of Applied Sciences and Arts Northwestern Switzerland, Brugg-Windisch, Switzerland, June 16-18, 2025
Note

Part of ISBN 9798400718861

QC 20250701

Available from: 2025-06-23. Created: 2025-06-23. Last updated: 2025-12-08. Bibliographically approved.
Williams, J. J., Costea, S., Malony, A. D., Tskhakaya, D., Kos, L., Podolnik, A., . . . Markidis, S. (2025). Understanding the Impact of OpenPMD on BIT1, a Particle-in-Cell Monte Carlo Code, Through Instrumentation, Monitoring, and In-Situ Analysis. In: Euro-Par 2024: Parallel Processing Workshops - Euro-Par 2024 International Workshops, Proceedings. Paper presented at 30th International Conference on Parallel and Distributed Computing, Euro-Par 2024, Madrid, Spain, Aug 26-30, 2024 (pp. 214-226). Springer Nature
2025 (English). In: Euro-Par 2024: Parallel Processing Workshops - Euro-Par 2024 International Workshops, Proceedings, Springer Nature, 2025, p. 214-226. Conference paper, Published paper (Refereed).
Abstract [en]

Particle-in-Cell Monte Carlo simulations on large-scale systems play a fundamental role in understanding the complexities of plasma dynamics in fusion devices. Efficient handling and analysis of vast datasets are essential for advancing these simulations. Previously, we addressed this challenge by integrating openPMD with BIT1, a Particle-in-Cell Monte Carlo code, streamlining data streaming and storage. This integration not only enhanced data management but also improved write throughput and storage efficiency. In this work, we delve deeper into the impact of BIT1 openPMD BP4 instrumentation, monitoring, and in-situ analysis. Utilizing cutting-edge profiling and monitoring tools such as gprof, CrayPat, Cray Apprentice2, IPM, and Darshan, we dissect BIT1’s performance post-integration, shedding light on computation, communication, and I/O operations. Fine-grained instrumentation offers insights into BIT1’s runtime behavior, while immediate monitoring aids in understanding system dynamics and resource utilization patterns, facilitating proactive performance optimization. Advanced visualization techniques further enrich our understanding, enabling the optimization of BIT1 simulation workflows aimed at controlling plasma-material interfaces with improved data analysis and visualization at every checkpoint without causing any interruption to the simulation.

Place, publisher, year, edition, pages
Springer Nature, 2025
Keywords
ADIOS2, Cray Apprentice2, CrayPat, Darshan, Distributed Storage, Efficient Data Processing, gprof, In-Situ Analysis, IPM, Large-Scale PIC Simulations, openPMD, Parallel I/O, Performance Monitoring and Analysis
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-368833 (URN); 10.1007/978-3-031-90200-0_18 (DOI); 001554561400018; 2-s2.0-105009220990 (Scopus ID)
Conference
30th International Conference on Parallel and Distributed Computing, Euro-Par 2024, Madrid, Spain, Aug 26-30, 2024
Note

Part of ISBN 9783031901997

QC 20250902

Available from: 2025-09-02. Created: 2025-09-02. Last updated: 2025-12-05. Bibliographically approved.
Trilaksono, J., Williams, J. J., Ulbl, P., Dannert, T., Laure, E., Markidis, S. & Jenko, F. (2024). Characterizing the Performance of the GENE-X Code for Gyrokinetic Turbulence Simulations. In: SC24 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 17-22. Paper presented at SC24: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Atlanta, GA, USA, November 17-22, 2024. Atlanta, Georgia, USA
2024 (English). In: SC24 Proceedings: The International Conference for High Performance Computing, Networking, Storage, and Analysis, November 17–22, Atlanta, Georgia, USA, 2024. Conference paper, Poster (with or without abstract) (Refereed).
Abstract [en]

Simulating plasma turbulence in the edge region of a magnetic confinement fusion (MCF) device is crucial for identifying optimal operational scenarios for future fusion energy commercialization. GENE-X, an Eulerian electromagnetic gyrokinetic code, can simulate plasma turbulence throughout an MCF device, including the edge region. This work focuses on characterizing GENE-X's performance, such as the elapsed time during the time integration phase, memory usage, and file I/O. Two cases with different MPI decomposition schemes are analyzed using GENE-X's built-in profiler, along with profiling and monitoring tools such as IPM and Darshan. This study aims to provide a preliminary view of the HPC characteristics of the code to assist in future optimization efforts.

Place, publisher, year, edition, pages
Atlanta, Georgia, USA, 2024
Keywords
Performance Monitoring and Analysis, IPM, Darshan, Distributed Storage, Tokamaks, Magnetic Confinement Fusion, Plasma turbulence, Large-Scale Gyrokinetic Simulations
National Category
Fusion, Plasma and Space Physics; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-356335 (URN)
Conference
SC24: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Atlanta, GA, USA, November 17-22, 2024
Note

QC 20241127

Available from: 2024-11-13. Created: 2024-11-13. Last updated: 2024-11-27. Bibliographically approved.
Williams, J. J., Araújo De Medeiros, D., Costea, S., Tskhakaya, D., Poeschel, F., Widera, R., . . . Markidis, S. (2024). Enabling High-Throughput Parallel I/O in Particle-in-Cell Monte Carlo Simulations with openPMD and Darshan I/O Monitoring. In: 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops). Paper presented at 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), September 24-27, Kobe, Japan. Institute of Electrical and Electronics Engineers (IEEE)
2024 (English). In: 2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), Institute of Electrical and Electronics Engineers (IEEE), 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Large-scale HPC simulations of plasma dynamics in fusion devices require efficient parallel I/O to avoid slowing down the simulation and to enable the post-processing of critical information. Such complex simulations lacking parallel I/O capabilities may encounter performance bottlenecks, hindering their effectiveness in data-intensive computing tasks. In this work, we focus on introducing and enhancing the efficiency of parallel I/O operations in Particle-in-Cell Monte Carlo simulations. We first evaluate the scalability of BIT1, a massively parallel electrostatic PIC MC code, determining its initial write throughput capabilities and performance bottlenecks using Darshan, an HPC I/O performance monitoring tool. We design and develop an adaptor to the openPMD I/O interface that streams PIC particle and field information through the highly efficient ADIOS2 interface using the BP4 backend, which is aggressively optimized for I/O efficiency. Next, we explore advanced optimization techniques such as data compression, aggregation, and Lustre file striping, achieving write throughput improvements while enhancing data storage efficiency. Finally, we analyze the enhanced high-throughput parallel I/O and storage capabilities achieved through the integration of openPMD with rapid metadata extraction in the BP4 format. Our study demonstrates that the integration of openPMD and advanced I/O optimizations significantly enhances BIT1's I/O performance and storage capabilities, successfully introducing high-throughput parallel I/O and surpassing the capabilities of traditional file I/O.
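
A minimal sketch of how such backend choices can be expressed through openPMD's JSON options string (values are illustrative assumptions, not the paper's actual configuration; exact engine parameter and operator names depend on the ADIOS2 version and build):

```cpp
// Selecting the ADIOS2 BP4 engine plus compression and aggregation via
// openPMD JSON options (illustrative values; "blosc" requires an ADIOS2
// built with Blosc support).
#include <openPMD/openPMD.hpp>
#include <string>

int main()
{
    using namespace openPMD;
    std::string const options = R"({
        "adios2": {
            "engine": {
                "type": "bp4",
                "parameters": { "NumAggregators": "8" }
            },
            "dataset": {
                "operators": [ { "type": "blosc" } ]
            }
        }
    })";
    Series series("diags.bp", Access::CREATE, options);
    // ... store particle and field records as usual ...
    // Lustre striping is set outside the code, e.g. on the output
    // directory beforehand:  lfs setstripe -c 8 <output_dir>
    return 0;
}
```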

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2024
Keywords
openPMD, Darshan, ADIOS2, Parallel I/O, Efficient Data Processing, Distributed Storage, Large-Scale PIC Simulations
National Category
Computer Sciences; Fusion, Plasma and Space Physics; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-351318 (URN); 10.1109/CLUSTERWorkshops61563.2024.00022 (DOI); 001422214200011; 2-s2.0-85204321286 (Scopus ID)
Conference
2024 IEEE International Conference on Cluster Computing Workshops (CLUSTER Workshops), September 24-27, Kobe, Japan
Note

QC 20241113

Part of ISBN 979-8-3503-8346-1

Available from: 2024-08-07. Created: 2024-08-07. Last updated: 2025-03-24. Bibliographically approved.
Coti, C., Pfau-Kempf, Y., Battarbee, M., Ganse, U., Shende, S., Huck, K., . . . Palmroth, M. (2024). Integration of Modern HPC Performance Tools in Vlasiator for Exascale Analysis and Optimization. In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 27-31, San Francisco, California, USA. Paper presented at IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - Workshop, San Francisco, CA, USA, May 27-31, 2024.
2024 (English). In: IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), May 27-31, San Francisco, California, USA, 2024. Conference paper, Published paper (Refereed).
Abstract [en]

Key to the success of developing high-performance applications for present and future heterogeneous supercomputers will be the systematic use of measurement and analysis to understand factors that affect delivered performance in the context of parallelization strategy, heterogeneous programming methodology, data partitioning, and scalable algorithm design. The evolving complexity of future exascale platforms makes it unrealistic for application teams to implement their own tools. Similarly, it is naive to expect available robust performance tools to work effectively out-of-the-box, without integration and specialization with respect to application-specific requirements and knowledge. Vlasiator is a powerful massively parallel code for accurate magnetospheric and solar wind plasma simulations. It is being ported to the LUMI HPC system for advanced modeling of the Earth’s magnetosphere and surrounding solar wind. Building on a preexisting Vlasiator performance API called Phiprof, our work significantly advances the performance measurement and analysis capabilities offered to Vlasiator using the TAU, APEX, and IPM tools. The results presented show in-depth characterization of node-level CPU/GPU and MPI communication performance. We highlight the integration of high-level Phiprof events with detailed performance data to expose opportunities for performance tuning. Our results provide important insights for optimizing Vlasiator for the upcoming exascale machines.

Keywords
Compiler and Tools for Parallel Programming, System and Performance Monitoring, Extreme-scale Algorithms, Performance Measurement, Performance Tools and Simulators
National Category
Computer Sciences; Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-351023 (URN); 10.1109/IPDPSW63119.2024.00170 (DOI); 001284697300022; 2-s2.0-85200729957 (Scopus ID)
Conference
IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024 - Workshop, San Francisco, CA, USA, May 27-31, 2024
Note

Part of ISBN 979-8-3503-6460-6

QC 20240726

Available from: 2024-07-26. Created: 2024-07-26. Last updated: 2024-10-01. Bibliographically approved.
Williams, J. J., Tskhakaya, D., Costea, S., Peng, I. B., Garcia-Gasulla, M. & Markidis, S. (2024). Leveraging HPC Profiling and Tracing Tools to Understand the Performance of Particle-in-Cell Monte Carlo Simulations. In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers. Paper presented at International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 - Sep 1, 2023 (pp. 123-134). Springer Nature, Vol. 14351
2024 (English). In: Euro-Par 2023: Parallel Processing Workshops - Euro-Par 2023 International Workshops, Limassol, Cyprus, August 28 – September 1, 2023, Revised Selected Papers, Springer Nature, 2024, Vol. 14351, p. 123-134. Conference paper, Published paper (Refereed).
Abstract [en]

Large-scale plasma simulations are critical for designing and developing next-generation fusion energy devices and modeling industrial plasmas. BIT1 is a massively parallel Particle-in-Cell code designed specifically for studying plasma-material interaction in fusion devices. Its most salient characteristic is the inclusion of collision Monte Carlo models for different plasma species. In this work, we characterize the single-node, multi-node, and I/O performance of the BIT1 code in two realistic cases using several HPC profilers, such as perf, IPM, Extrae/Paraver, and Darshan. We find that the on-node performance of the BIT1 sorting function is the main performance bottleneck. Strong scaling tests show a parallel performance of 77% and 96% on 2,560 MPI ranks for the two test cases. We demonstrate that communication, load imbalance, and self-synchronization are important factors impacting the performance of BIT1 in large-scale runs.
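
For reference, the 77% and 96% figures quoted here follow the standard strong-scaling definitions of speedup and parallel efficiency relative to a baseline rank count $N_{\mathrm{ref}}$:

```latex
S(N) = \frac{T(N_{\mathrm{ref}})}{T(N)}, \qquad
\mathrm{PE}(N) = \frac{N_{\mathrm{ref}}}{N}\, S(N),
```

where $T(N)$ is the runtime on $N$ MPI ranks.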

Place, publisher, year, edition, pages
Springer Nature, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 14351
Keywords
Large-Scale PIC Simulations, Performance Monitoring and Analysis, PIC Performance Bottleneck
National Category
Fusion, Plasma and Space Physics
Identifiers
urn:nbn:se:kth:diva-346537 (URN); 10.1007/978-3-031-50684-0_10 (DOI); 001279250600010; 2-s2.0-85192270259 (Scopus ID)
Conference
International workshops held at the 29th International Conference on Parallel and Distributed Computing, Euro-Par 2023, Limassol, Cyprus, Aug 28 - Sep 1, 2023
Note

QC 20240521

Part of ISBN 978-303150683-3

Available from: 2024-05-16. Created: 2024-05-16. Last updated: 2024-09-10. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0003-2095-3063
