Publications (10 of 121)
Williams, J. J., Costea, S., Araújo De Medeiros, D., Trilaksono, J., Hegde, P. R., Tskhakaya, D., . . . Markidis, S. (2026). Integrating High Performance In-Memory Data Streaming and In-Situ Visualization in Hybrid MPI+OpenMP PIC MC Simulations Towards Exascale. The international journal of high performance computing applications
2026 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846. Article in journal (Refereed). Epub ahead of print
Abstract [en]

Efficient simulation of complex plasma dynamics is crucial for advancing fusion energy research. Particle-in-Cell (PIC) Monte Carlo (MC) simulations provide insights into plasma behavior, including turbulence and confinement, which are essential for optimizing fusion reactor performance. Transitioning to exascale simulations introduces significant challenges, with traditional file input/output (I/O) inefficiencies remaining a key bottleneck. This work advances BIT1, an electrostatic PIC MC code, by improving the particle mover with OpenMP task-based parallelism, integrating openPMD’s streaming API, and enabling in-memory data streaming with the ADIOS2 Sustainable Staging Transport (SST) engine to enhance I/O performance, computational efficiency, and system storage utilization. We employ profiling tools such as gprof, perf, IPM and Darshan, which provide insights into computation, communication, and I/O operations. We implement time-dependent data checkpointing with the openPMD API enabling seamless data movement and in-situ visualization for real-time analysis without interrupting the simulation. We demonstrate improvements in simulation runtime, data accessibility and real-time insights by comparing traditional file I/O with the ADIOS2 BP4 and SST backends. The proposed hybrid BIT1 openPMD SST enhancement introduces a new paradigm for real-time scientific discovery in plasma simulations, enabling faster insights and more efficient use of exascale computing resources.

Place, publisher, year, edition, pages
London, United Kingdom: Sage Publications, 2026
Keywords
Hybrid MPI+OMP Parallel Programming, openPMD, ADIOS2, In-Memory Data Streaming, In-situ Visualization, Distributed Computing, Efficient Data Processing, Large-Scale PIC MC Simulations
National Category
Fusion, Plasma and Space Physics Computer Sciences Computer Systems
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-373650 (URN), 10.1177/10943420251409229 (DOI), 001667829900001 (), 2-s2.0-105028315172 (Scopus ID)
Note

QC 20260204

Available from: 2025-12-04 Created: 2025-12-04 Last updated: 2026-02-04 Bibliographically approved
Williams, J. J., Bhole, A., Kierans, D., Hoelzl, M., Holod, I., Tang, W., . . . Markidis, S. (2024). Understanding Large-Scale Plasma Simulation Challenges for Fusion Energy on Supercomputers. In: 50th European Physical Society Conference on Plasma Physics, Magnetic Confinement Fusion Plasma, P2-097, July 8-12, Salamanca, Spain. Paper presented at 50th European Physical Society Conference on Plasma Physics (EPS 2024), Magnetic Confinement Fusion Plasma, P2-097, July 8-12, Salamanca, Spain. European Physical Society (EPS)
2024 (English). In: 50th European Physical Society Conference on Plasma Physics, Magnetic Confinement Fusion Plasma, P2-097, July 8-12, Salamanca, Spain, European Physical Society (EPS), 2024. Conference paper, Poster (with or without abstract) (Refereed)
Abstract [en]

Understanding plasma instabilities is essential for achieving sustainable fusion energy, with large-scale plasma simulations playing a crucial role in both the design and development of next-generation fusion energy devices and the modelling of industrial plasmas. To achieve sustainable fusion energy, it is essential to accurately model and predict plasma behavior under extreme conditions, requiring sophisticated simulation codes capable of capturing the complex interaction between plasma dynamics, magnetic fields, and material surfaces. In this work, we conduct a comprehensive HPC analysis of two prominent plasma simulation codes, BIT1 and JOREK, to advance understanding of plasma behavior in fusion energy applications. Our focus is on evaluating JOREK's computational efficiency and scalability for simulating non-linear MHD phenomena in tokamak fusion devices. The motivation behind this work stems from the urgent need to advance our understanding of plasma instabilities in magnetically confined fusion devices. Enhancing JOREK's performance on supercomputers improves fusion plasma code predictability, enabling more accurate modelling and faster optimization of fusion designs, thereby contributing to sustainable fusion energy. In prior studies, we analysed BIT1, a massively parallel Particle-in-Cell (PIC) code for studying plasma-material interactions in fusion devices. Our investigations into BIT1's computational requirements and scalability on advanced supercomputing architectures yielded valuable insights. Through detailed profiling and performance analysis, we have identified the primary bottlenecks and implemented optimization strategies, significantly enhancing parallel performance. This previous work serves as a foundation for our present endeavours.

Place, publisher, year, edition, pages
European Physical Society (EPS), 2024
National Category
Fusion, Plasma and Space Physics Computer Sciences
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-351022 (URN)
Conference
50th European Physical Society Conference on Plasma Physics (EPS 2024), Magnetic Confinement Fusion Plasma, P2-097, July 8-12, Salamanca, Spain
Note

QC 20240726

Vol. 48A, ISBN: 111-22-33333-44-5

Available from: 2024-07-26 Created: 2024-07-26 Last updated: 2025-09-08 Bibliographically approved
Jansson, N., Karp, M., Perez, A., Mukha, T., Ju, Y., Liu, J., . . . Markidis, S. (2023). Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations. In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, NOV 12–17 DENVER, CO, USA (pp. 1-9). Association for Computing Machinery (ACM), Article ID 5.
2023 (English). In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM), 2023, p. 1-9, article id 5. Conference paper, Published paper (Refereed)
Abstract [en]

We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolving the long-standing question regarding the ultimate regime in RBC.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-340333 (URN), 10.1145/3581784.3627039 (DOI), 001461755900003 (), 2-s2.0-85179549233 (Scopus ID)
Conference
SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, NOV 12–17 DENVER, CO, USA
Funder
Swedish Research Council, 2019-04723; Swedish e‐Science Research Center; EU, Horizon 2020, 101093393, 101092621, 956748
Note

Part of ISBN 9798400701092

QC 20231204

Available from: 2023-12-04 Created: 2023-12-04 Last updated: 2025-12-08 Bibliographically approved
Atzori, M., Köpp, W., Chien, W. D., Massaro, D., Mallor, F., Peplinski, A., . . . Weinkauf, T. (2021). In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst.
2021 (English). Report (Other academic)
Abstract [en]

In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with ParaView Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess in-situ visualization's impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load-balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulation.

National Category
Mechanical Engineering
Research subject
Engineering Mechanics
Identifiers
urn:nbn:se:kth:diva-295679 (URN)
Funder
Swedish Foundation for Strategic Research, BD15-0082; European Commission, 800999 (SAGE2)
Note

QC 20210525

Available from: 2021-05-25 Created: 2021-05-25 Last updated: 2024-03-15 Bibliographically approved
O'Donncha, F., Iakymchuk, R., Akhriev, A., Gschwandtner, P., Thoman, P., Heller, T., . . . Fahringer, T. (2020). AllScale toolchain pilot applications: PDE based solvers using a parallel development environment. Computer Physics Communications, 251, Article ID 107089.
2020 (English). In: Computer Physics Communications, ISSN 0010-4655, E-ISSN 1879-2944, Vol. 251, article id 107089. Article in journal (Refereed). Published
Abstract [en]

AllScale is a programming environment targeting simplified development of highly scalable parallel applications by dividing development responsibilities into silos. The front-end AllScale API provides a simple C++ development environment through a suite of parallel constructs expressions denoting tasks operating concurrently. This interfaces with the other components of the toolchain (core-level API, compiler and runtime) which manages tasks related to the machine and system level, hidden to the user. The paper describes the development of two large-scale parallel applications within the AllScale API, namely, an advection–diffusion model with data assimilation and a Lagrangian space-weather simulation model based on a particle-in-cell method. We present mathematical formulations and implementations and evaluate parallel constructs developed using the AllScale API. The performance of the applications from the perspective of both parallel scalability, and more importantly productivity are assessed. We demonstrate how the AllScale API can greatly improve developer productivity while maintaining parallel performance in two applications with distinct numerical characteristics. Code complexity metrics demonstrate reduction in application specific implementations of up to 30% while performance tests on three different compute systems demonstrate comparable parallel scalability to an MPI version of the code.

Place, publisher, year, edition, pages
Elsevier B.V., 2020
Keywords
Advection–diffusion, Data assimilation, HPC, Numerical solvers, Partial differential equation, Particle-in-cell, Advection, Application programming interfaces (API), Codes (symbols), Machine components, Partial differential equations, Productivity, Scalability, Development environment, Mathematical formulation, Numerical characteristics, Particle in cell, Programming environment, Space weather simulation, C++ (programming language)
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-268441 (URN), 10.1016/j.cpc.2019.107089 (DOI), 000528002400010 (), 2-s2.0-85076580760 (Scopus ID)
Note

QC 20260130

Available from: 2020-04-23 Created: 2020-04-23 Last updated: 2026-01-30 Bibliographically approved
Marco, K., Gong, J., Axner, L., Laure, E. & Jan, N. (2020). GPU-acceleration of A High Order Finite Difference Code Using Curvilinear Coordinates. In: Proceedings of the 2020 International Conference on Computing, Networks and Internet of Things. Paper presented at the 2020 International Conference on Computing, Networks and Internet of Things (pp. 41-47). Association for Computing Machinery (ACM)
2020 (English). In: Proceedings of the 2020 International Conference on Computing, Networks and Internet of Things, Association for Computing Machinery (ACM), 2020, p. 41-47. Conference paper, Published paper (Refereed)
Abstract [en]

GPU-accelerated computing is becoming a popular technology due to the emergence of techniques such as OpenACC, which makes it easy to port codes in their original form to GPU systems using compiler directives, thereby speeding up computation times relatively simply. In this study we have developed an OpenACC implementation of the high order finite difference CFD solver ESSENSE for simulating compressible flows. The solver is based on summation-by-parts form difference operators, and the boundary and interface conditions are weakly implemented using simultaneous approximation terms. This case study focuses on porting code to GPUs for the most time-consuming parts, namely sparse matrix vector multiplications and the evaluations of fluxes. The resulting OpenACC implementation is used to simulate the Taylor-Green vortex, which produces a maximum speed-up of 61.3 on a single V100 GPU compared to the serial CPU version.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2020
Keywords
Computational fluid dynamics, GPU programming, High order finite difference method, OpenACC
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-273805 (URN), 10.1145/3398329.3398336 (DOI), 2-s2.0-85086223863 (Scopus ID)
Conference
the 2020 International Conference on Computing, Networks and Internet of Things
Note

QC 20200819

Available from: 2020-06-26 Created: 2020-06-26 Last updated: 2023-03-30 Bibliographically approved
Ahmed, L., Alogheli, H., McShane, S. A., Alvarsson, J., Berg, A., Larsson, A., . . . Spjuth, O. (2020). Predicting target profiles with confidence as a service using docking scores. Journal of Cheminformatics, 12(1), Article ID 62.
2020 (English). In: Journal of Cheminformatics, E-ISSN 1758-2946, Vol. 12, no 1, article id 62. Article in journal (Refereed). Published
Abstract [en]

Background: Identifying and assessing ligand-target binding is a core component in early drug discovery as one or more unwanted interactions may be associated with safety issues. Contributions: We present an open-source, extendable web service for predicting target profiles with confidence using machine learning for a panel of 7 targets, where models are trained on molecular docking scores from a large virtual library. The method uses conformal prediction to produce valid measures of prediction efficiency for a particular confidence level. The service also offers the possibility to dock chemical structures to the panel of targets with QuickVina on individual compound basis. Results: The docking procedure and resulting models were validated by docking well-known inhibitors for each of the 7 targets using QuickVina. The model predictions showed comparable performance to molecular docking scores against an external validation set. The implementation as publicly available microservices on Kubernetes ensures resilience, scalability, and extensibility.

Place, publisher, year, edition, pages
Springer Nature, 2020
Keywords
Predicted target profiles, Virtual screening, Drug discovery, Conformal prediction, AutoDock Vina, Apache Spark
National Category
Medicinal Chemistry
Identifiers
urn:nbn:se:kth:diva-285717 (URN), 10.1186/s13321-020-00464-1 (DOI), 000578080500001 (), 2-s2.0-85092745291 (Scopus ID)
Note

QC 20201126

Available from: 2020-11-26 Created: 2020-11-26 Last updated: 2022-06-25 Bibliographically approved
Ali, M., Laure, E., Zhang, B., et al. (2020). Workshop 13: PDSEC Parallel and Distributed Scientific and Engineering Computing. In: Proceedings 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops IPDPSW 2020. Paper presented at 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, USA, May 18-22, 2020 (pp. 680-681). Institute of Electrical and Electronics Engineers (IEEE)
2020 (English). In: Proceedings 2020 IEEE 34th International Parallel and Distributed Processing Symposium Workshops IPDPSW 2020, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 680-681. Conference paper, Published paper (Other academic)
Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2020
National Category
Software Engineering Computer Sciences
Identifiers
urn:nbn:se:kth:diva-377851 (URN), 10.1109/IPDPSW50202.2020.00122 (DOI), 2-s2.0-105030994120 (Scopus ID)
Conference
2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), New Orleans, LA, USA, May 18-22, 2020
Note

Part of ISBN 9781728174457

QC 20260306

Available from: 2026-03-06 Created: 2026-03-06 Last updated: 2026-03-06 Bibliographically approved
Aguilar, X., Jordan, H., Heller, T., Hirsch, A., Fahringer, T. & Laure, E. (2019). An On-Line Performance Introspection Framework for Task-Based Runtime Systems. In: 19th International Conference on Computational Science, ICCS 2019. Paper presented at 19th International Conference on Computational Science, ICCS 2019, Faro, Portugal, 12-14 June 2019 (pp. 238-252). Springer Verlag
2019 (English). In: 19th International Conference on Computational Science, ICCS 2019, Springer Verlag, 2019, p. 238-252. Conference paper, Published paper (Refereed)
Abstract [en]

The expected high levels of parallelism together with the heterogeneity and complexity of new computing systems pose many challenges to current software. New programming approaches and runtime systems that can simplify the development of parallel applications are needed. Task-based runtime systems have emerged as a good solution to cope with high levels of parallelism, while providing software portability, and easing program development. However, these runtime systems require real-time information on the state of the system to properly orchestrate program execution and optimise resource utilisation. In this paper, we present a lightweight monitoring infrastructure developed within the AllScale Runtime System, a task-based runtime system for extreme scale. This monitoring component provides real-time introspection capabilities that help the runtime scheduler in its decision-making process and adaptation, while introducing minimum overhead. In addition, the monitoring component provides several post-mortem reports as well as real-time data visualisation that can be of great help in the task of performance debugging.

Place, publisher, year, edition, pages
Springer Verlag, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11536
Keywords
AllScale, HPX, Performance introspection, Performance monitoring, Real-time visualisation, Runtime system, Computer software portability, Computer systems programming, Data visualization, Decision making, Visualization, Real time, Runtime systems, Real time systems
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-262387 (URN), 10.1007/978-3-030-22734-0_18 (DOI), 000589288200018 (), 2-s2.0-85067610766 (Scopus ID)
Conference
19th International Conference on Computational Science, ICCS 2019, Faro, Portugal, 12-14 June 2019
Note

QC 20191028

Part of ISBN 9783030227333

Available from: 2019-10-28 Created: 2019-10-28 Last updated: 2024-10-15 Bibliographically approved
Rivas Gomez, S., Markidis, S., Laure, E., Brabazon, K., Perks, O. & Narasimhamurthy, S. (2019). Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018. Paper presented at 20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018 (pp. 921-927). Institute of Electrical and Electronics Engineers (IEEE)
2019 (English). In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 921-927. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process becomes unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
High Performance Computing, MapReduce, MPI One Sided Communication
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-246358 (URN), 10.1109/HPCC/SmartCity/DSS.2018.00153 (DOI), 000468511200121 (), 2-s2.0-85062487109 (Scopus ID)
Conference
20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018
Note

QC 20190319

Part of ISBN 9781538666142

Available from: 2019-03-19 Created: 2019-03-19 Last updated: 2024-10-18 Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9901-9857