Publications (10 of 67)
Jansson, N., Karp, M., Wahlgren, J., Markidis, S. & Schlatter, P. (2025). Design of Neko—A Scalable High‐Fidelity Simulation Framework With Extensive Accelerator Support. Concurrency and Computation, 37(2), Article ID e8340.
2025 (English). In: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 37, no. 2, article id e8340. Article in journal (Refereed). Published.
Abstract [en]

Recent trends and advancements toward including more diverse and heterogeneous hardware in High-Performance Computing (HPC) are challenging scientific software developers in their pursuit of efficient numerical methods with sustained performance across a diverse set of platforms. As a result, researchers are today forced to re-factor their codes to leverage these powerful new heterogeneous systems. We present our design considerations of Neko—a portable framework for high-fidelity spectral element flow simulations. Unlike prior works, Neko adopts a modern object-oriented Fortran 2008 approach, allowing multi-tier abstractions of the solver stack and facilitating various hardware backends ranging from general-purpose processors and accelerators down to exotic vector processors and Field-Programmable Gate Arrays (FPGAs). Focusing on the performance and portability of Neko, we describe the framework's device abstraction layer, which manages device memory, data transfer and kernel launches from Fortran, allowing for a solver written in a hardware-neutral yet performant way. Accelerator-specific optimizations are also discussed, with auto-tuning of key kernels and various communication strategies using device-aware MPI. Finally, we present performance measurements on a wide range of computing platforms, including the EuroHPC pre-exascale system LUMI, where Neko achieves excellent parallel efficiency for a large direct numerical simulation (DNS) of turbulent fluid flow using up to 80% of the entire LUMI supercomputer.
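
The device abstraction pattern described above can be sketched in a few lines. The following is a hypothetical Python illustration of the general idea (Neko itself implements this in object-oriented Fortran 2008, and none of the names below are taken from its API): the solver calls a hardware-neutral interface, and a backend object decides how memory is allocated, how data moves, and how a kernel is launched.

```python
# Hypothetical sketch of a backend-dispatch pattern, not Neko's actual API.
class DeviceBackend:
    """Minimal interface every hardware backend must provide."""
    def alloc(self, n): raise NotImplementedError
    def memcpy_h2d(self, host, dev): raise NotImplementedError
    def launch(self, kernel, *args): raise NotImplementedError

class CPUBackend(DeviceBackend):
    """Plain host backend; a GPU backend would wrap CUDA/HIP calls instead."""
    def alloc(self, n): return [0.0] * n
    def memcpy_h2d(self, host, dev): dev[:] = host
    def launch(self, kernel, *args): kernel(*args)

def axpy(alpha, x, y):
    # hardware-neutral "kernel": y <- y + alpha * x
    for i in range(len(x)):
        y[i] += alpha * x[i]

backend = CPUBackend()
x, y = backend.alloc(4), backend.alloc(4)
backend.memcpy_h2d([1.0, 2.0, 3.0, 4.0], x)
backend.launch(axpy, 2.0, x, y)
print(y)  # [2.0, 4.0, 6.0, 8.0]
```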

Place, publisher, year, edition, pages
Wiley, 2025
National Category
Computational Mathematics, Computer Sciences
Identifiers
urn:nbn:se:kth:diva-358042 (URN); 10.1002/cpe.8340 (DOI); 001387473600001; 2-s2.0-85213688601 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; Swedish e-Science Research Center, SESSI; EU, Horizon Europe, 101093393
Massaro, D., Karp, M., Jansson, N., Markidis, S. & Schlatter, P. (2024). Direct numerical simulation of the turbulent flow around a Flettner rotor. Scientific Reports, 14(1), Article ID 3004.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no. 1, article id 3004. Article in journal (Refereed). Published.
Abstract [en]

The three-dimensional turbulent flow around a Flettner rotor, i.e. an engine-driven rotating cylinder in an atmospheric boundary layer, is studied via direct numerical simulations (DNS) for three different rotation speeds (α). This technology offers a sustainable alternative mainly for marine propulsion, underscoring the critical importance of comprehending the characteristics of such flow. In this study, we evaluate the aerodynamic loads produced by the rotor of height h, with a specific focus on the changes in lift and drag force along the vertical axis of the cylinder. Correspondingly, we observe that vortex shedding is inhibited at the highest α values investigated. However, in the case of intermediate α, vortices continue to be shed in the upper section of the cylinder (y/h>0.3). As the cylinder begins to rotate, a large-scale motion becomes apparent on the high-pressure side, close to the bottom wall. We offer both a qualitative and quantitative description of this motion, outlining its impact on the wake deflection. This finding is significant as it influences the rotor wake to an extent of approximately one hundred diameters downstream. In practical applications, this phenomenon could influence the performance of subsequent boats and have an impact on the cylinder drag, affecting its fuel consumption. This fundamental study, which investigates a limited yet significant (for DNS) Reynolds number and explores various spinning ratios, provides valuable insights into the complex flow around a Flettner rotor. The simulations were performed using a modern GPU-based spectral element method, leveraging the power of modern supercomputers towards fundamental engineering problems.
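
For readers unfamiliar with the notation, α in the abstract is the spinning ratio of the rotor. One common normalization (the exact definitions used in the paper may differ) is, in LaTeX notation:

```latex
\alpha = \frac{\Omega D}{2\,U_\infty}, \qquad
C_l(y) = \frac{F'_l(y)}{\tfrac{1}{2}\,\rho\,U_\infty^{2}\,D}, \qquad
C_d(y) = \frac{F'_d(y)}{\tfrac{1}{2}\,\rho\,U_\infty^{2}\,D},
```

where Ω is the rotation rate, D the cylinder diameter, U∞ the free-stream velocity, ρ the density, and F'_l, F'_d the lift and drag forces per unit height at vertical position y along the rotor of height h.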

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-344051 (URN); 10.1038/s41598-024-53194-x (DOI); 38321050 (PubMedID); 2-s2.0-85184207516 (Scopus ID)
Funder
KTH Royal Institute of Technology
Karp, M., Suarez, E., Meinke, J. H., Andersson, M. I., Schlatter, P., Markidis, S. & Jansson, N. (2024). Experience and analysis of scalable high-fidelity computational fluid dynamics on modular supercomputing architectures. The international journal of high performance computing applications
2024 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846. Article in journal (Refereed). Epub ahead of print.
Abstract [en]

The never-ending computational demand from simulations of turbulence makes computational fluid dynamics (CFD) a prime application use case for current and future exascale systems. High-order finite element methods, such as the spectral element method, have been gaining traction as they offer high performance on both multicore CPUs and modern GPU-based accelerators. In this work, we assess how high-fidelity CFD using the spectral element method can exploit the modular supercomputing architecture at scale through domain partitioning, where the computational domain is split between a Booster module powered by GPUs and a Cluster module with conventional CPU nodes. We investigate several different flow cases and computer systems based on the Modular Supercomputing Architecture (MSA). We observe that for our simulations, the communication overhead and load-balancing issues incurred by incorporating different computing architectures mean that such a split is seldom worthwhile, especially when I/O is also considered, but when the simulation at hand requires more than the combined global memory on the GPUs, utilizing additional CPUs to increase the available memory can be fruitful. We support our results with a simple performance model to assess when running across modules might be beneficial. As MSA is becoming more widespread and efforts to increase system utilization are growing more important, our results give insight into when and how a monolithic application can utilize and spread out to more than one module and obtain a faster time to solution.
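
The abstract mentions a simple performance model for deciding when a split across modules pays off. The toy sketch below only illustrates that kind of reasoning and is not the model from the paper; the throughputs and the coupling overhead are assumed placeholder numbers.

```python
# Toy per-time-step cost model: GPU-only run vs. a run split across a GPU
# "Booster" module and a CPU "Cluster" module (illustrative numbers only).

def step_time_gpu_only(work, gpu_rate):
    return work / gpu_rate

def step_time_split(work, frac_gpu, gpu_rate, cpu_rate, coupling_overhead):
    # Both modules advance in lock-step, so the slower partition sets the pace,
    # and the inter-module halo exchange adds a fixed overhead per step.
    t_gpu = frac_gpu * work / gpu_rate
    t_cpu = (1.0 - frac_gpu) * work / cpu_rate
    return max(t_gpu, t_cpu) + coupling_overhead

work = 1.0e9                        # work per step, arbitrary units
gpu_rate, cpu_rate = 50.0e9, 5.0e9  # assumed module throughputs
print(f"GPU only     : {1e3 * step_time_gpu_only(work, gpu_rate):.1f} ms/step")
for frac in (0.95, 0.90, 0.80):
    t = step_time_split(work, frac, gpu_rate, cpu_rate, coupling_overhead=2e-3)
    print(f"GPU frac {frac:.2f}: {1e3 * t:.1f} ms/step")
```

With these placeholder numbers the split never beats the GPU-only run, which mirrors the paper's observation that offloading part of the domain to CPUs is mainly attractive once the problem no longer fits in GPU memory.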

Place, publisher, year, edition, pages
SAGE Publications, 2024
National Category
Computer Sciences, Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-358044 (URN); 10.1177/10943420241303163 (DOI); 001366656300001; 2-s2.0-85210745928 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; Swedish e-Science Research Center, SESSI; EU, Horizon 2020, 955606
Jansson, N., Karp, M., Podobas, A., Markidis, S. & Schlatter, P. (2024). Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. Computers & Fluids, 275, Article ID 106243.
2024 (English). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, article id 106243. Article in journal (Refereed). Published.
Abstract [en]

Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area with great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive as the range of active spatial and temporal scales widens quickly. Scale-resolving simulations in particular, including large-eddy simulation (LES) and direct numerical simulations (DNS), thus need to rely on modern efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open-source software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.
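
The performance of such high-order solvers rests on the tensor-product structure of the spectral element operators. The snippet below is a generic illustration of that structure (polynomial order, matrices and field values are arbitrary placeholders, not Neko's data structures):

```python
import numpy as np

# Derivatives on one (N+1)^3 spectral element are small dense matrix products
# applied along each direction, rather than one large matrix-vector product.
N = 7                                    # polynomial order (illustrative)
D = np.random.rand(N + 1, N + 1)         # placeholder 1-D differentiation matrix
u = np.random.rand(N + 1, N + 1, N + 1)  # field values on one element

ur = np.einsum('ia,ajk->ijk', D, u)      # d/dr: contract D along the first index
us = np.einsum('ja,iak->ijk', D, u)      # d/ds: contract D along the second index
ut = np.einsum('ka,ija->ijk', D, u)      # d/dt: contract D along the third index

# Roughly 6*(N+1)^4 flops per element instead of (N+1)^6 for a dense operator,
# which is why these kernels map well onto both multicore CPUs and GPUs.
print(ur.shape, us.shape, ut.shape)
```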

Place, publisher, year, edition, pages
Elsevier BV, 2024
National Category
Fluid Mechanics, Computational Mathematics, Computer Sciences
Identifiers
urn:nbn:se:kth:diva-344896 (URN); 10.1016/j.compfluid.2024.106243 (DOI); 2-s2.0-85189508362 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; EU, Horizon 2020, 823691; EU, Horizon 2020, 801039
Jansson, N., Karp, M., Perez, A., Mukha, T., Ju, Y., Liu, J., . . . Markidis, S. (2023). Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations. In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Nov 12–17, 2023, Denver, CO, USA (pp. 1-9). Association for Computing Machinery (ACM), Article ID 5.
2023 (English). In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM), 2023, p. 1-9, article id 5. Conference paper, Published paper (Refereed).
Abstract [en]

We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.
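
For context, the "ultimate regime" question concerns how the Nusselt number Nu (dimensionless heat transport) scales with the Rayleigh number Ra at very high Ra. In the commonly used shorthand (details and logarithmic corrections vary between theories):

```latex
\mathrm{Nu} \sim \mathrm{Ra}^{1/3} \;\;\text{(classical regime)}
\qquad \text{vs.} \qquad
\mathrm{Nu} \sim \mathrm{Ra}^{1/2} \;\;\text{(ultimate regime, up to logarithmic corrections)}
```

Distinguishing these exponents requires simulations at extreme Ra, which is what motivates the scale of the runs described above.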

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences, Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-340333 (URN); 10.1145/3581784.3627039 (DOI); 2-s2.0-85179549233 (Scopus ID)
Conference
SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, Nov 12–17, 2023, Denver, CO, USA
Funder
Swedish Research Council, 2019-04723; Swedish e-Science Research Center; EU, Horizon 2020, 101093393, 101092621, 956748
Note

Part of ISBN 9798400701092
Chien, S. W. D., Sato, K., Podobas, A., Jansson, N., Markidis, S. & Honda, M. (2023). Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications. In: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023. Paper presented at 7th Asia-Pacific Workshop on Networking, APNET 2023, Jun 29 - Jun 30, 2023, Hong Kong, China (pp. 172-173). Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023, Association for Computing Machinery (ACM), 2023, p. 172-173. Conference paper, Published paper (Refereed).
Abstract [en]

Cloud providers have begun to offer managed services to attract scientific applications, which have traditionally been executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications or the MPI library, and no support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application, Neko, by 13-40%.
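
The core idea of overlapping computation with write-back through local storage can be illustrated with a small, self-contained sketch. This is not iFast (which works by intercepting MPI-IO from a preloaded library with no application changes); it only shows the general staging pattern with made-up file names:

```python
import os, queue, shutil, tempfile, threading

pending = queue.Queue()

def drain(shared_dir):
    """Background thread: move staged checkpoints to the slow shared storage."""
    while True:
        local_path = pending.get()
        if local_path is None:
            break
        shutil.copy(local_path, shared_dir)  # slow PFS write, off the critical path
        os.remove(local_path)

def write_checkpoint(step, data, local_dir):
    """Critical path: a fast local write, then hand-off to the drainer."""
    path = os.path.join(local_dir, f"chkpt_{step:06d}.bin")
    with open(path, "wb") as f:
        f.write(data)
    pending.put(path)

local_dir, shared_dir = tempfile.mkdtemp(), tempfile.mkdtemp()  # shared_dir stands in for the PFS
drainer = threading.Thread(target=drain, args=(shared_dir,))
drainer.start()
for step in range(3):
    write_checkpoint(step, b"\x00" * 1024, local_dir)  # the solver would compute here
pending.put(None)                                       # signal that no more checkpoints follow
drainer.join()
print(sorted(os.listdir(shared_dir)))
```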

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Systems, Computer Sciences
Identifiers
urn:nbn:se:kth:diva-338993 (URN); 10.1145/3600061.3603122 (DOI); 001147804500029; 2-s2.0-85173833099 (Scopus ID)
Conference
7th Asia-Pacific Workshop on Networking, APNET 2023, Jun 29 - Jun 30, 2023, Hong Kong, China
Note

Part of ISBN 9798400707827
Ju, Y., Li, M., Perez, A., Bellentani, L., Jansson, N., Markidis, S., . . . Laure, E. (2023). In-Situ Techniques on GPU-Accelerated Data-Intensive Applications. In: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023. Paper presented at 19th IEEE International Conference on e-Science, e-Science 2023, Limassol, Cyprus, Oct 9 2023 - Oct 14 2023. Institute of Electrical and Electronics Engineers (IEEE)
2023 (English). In: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023, Institute of Electrical and Electronics Engineers (IEEE), 2023. Conference paper, Published paper (Refereed).
Abstract [en]

The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges for applications such as Molecular Dynamics (MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of data for further visualization or analysis. At the same time, checkpointing is crucial for long runs on HPC clusters, due to limited walltimes and/or failures of system components, and typically requires the storage of large amounts of data. Thus, restricted IO performance and storage capacity can lead to bottlenecks for the performance of full application workflows (as compared to computational kernels without IO). In-situ techniques, where data is processed while still in memory rather than written out over the IO subsystem, can help to tackle these problems. In contrast to traditional post-processing methods, in-situ techniques can reduce or avoid the need to write or read data via the IO subsystem. They offer a promising approach for applications aiming to leverage the full power of large-scale HPC systems. In-situ techniques can also be applied to hybrid computational nodes on HPC systems consisting of graphics processing units (GPUs) and central processing units (CPUs). On one node, the GPUs have significant performance advantages over the CPUs, so current approaches for GPU-accelerated applications often focus on maximizing GPU usage, leaving CPUs underutilized. In-situ tasks that use the CPUs to analyze or preprocess data concurrently with the running simulation offer a way to improve this underutilization.
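
A minimal sketch of the in-situ idea on a hybrid node, under the assumption that the simulation runs on the GPU while idle CPU cores reduce each in-memory snapshot to a few statistics (the field generator and the analysis below are placeholders, not the paper's workflow):

```python
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def analyse(field):
    # stand-in CPU analysis: a few scalar reductions per snapshot
    return float(field.mean()), float(field.std()), float(np.abs(field).max())

def simulate_step(step, shape=(64, 64, 64)):
    # placeholder for a GPU time step; here just a reproducible random field
    return np.random.default_rng(step).standard_normal(shape)

if __name__ == "__main__":
    # CPU workers digest snapshots while the "simulation" keeps producing them,
    # so only small reduced statistics ever need to go through the IO subsystem.
    with ProcessPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(analyse, simulate_step(s)) for s in range(8)]
        stats = [f.result() for f in futures]
    print(stats[0])
```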

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2023
Keywords
CPU, GPU, HPC, in-situ
National Category
Computer Sciences, Computer Systems
Identifiers
urn:nbn:se:kth:diva-338984 (URN); 10.1109/e-Science58273.2023.10254865 (DOI); 2-s2.0-85174292669 (Scopus ID)
Conference
19th IEEE International Conference on e-Science, e-Science 2023, Limassol, Cyprus, Oct 9 2023 - Oct 14 2023
Note

Part of ISBN 9798350322231
Karp, M., Liu, F., Stanly, R., Rezaeiravesh, S., Jansson, N., Schlatter, P. & Markidis, S. (2023). Uncertainty Quantification of Reduced-Precision Time Series in Turbulent Channel Flow. In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023. Paper presented at 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023 (pp. 387-390). Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM), 2023, p. 387-390. Conference paper, Published paper (Refereed).
Abstract [en]

With increased computational power available through the use of low-precision arithmetic, a relevant question is how lower precision affects simulation results, especially for chaotic systems where analytical round-off estimates are non-trivial to obtain. In this work, we consider how the uncertainty of the time series of a direct numerical simulation of turbulent channel flow at Reτ = 180 is affected when restricted to a reduced-precision representation. We utilize a non-overlapping batch means estimator and find that the mean statistics can, in this case, be obtained with significantly fewer mantissa bits than conventional IEEE-754 double precision, but that the mean values are more sensitive in the middle of the channel than in the near-wall region. This indicates that computations in the near-wall region, where the majority of the computational effort is required, may benefit from the low-precision floating-point units found in upcoming computer hardware.
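
A non-overlapping batch means estimator of the kind mentioned above can be written in a few lines. The sketch below shows the standard textbook form on a synthetic correlated signal; the paper's estimator and parameters may differ.

```python
import numpy as np

def batch_means(x, n_batches=32):
    """Mean and its standard error from non-overlapping batch averages."""
    n = (len(x) // n_batches) * n_batches        # drop the remainder samples
    batches = x[:n].reshape(n_batches, -1).mean(axis=1)
    mean = batches.mean()
    sem = batches.std(ddof=1) / np.sqrt(n_batches)
    return mean, sem

# correlated toy signal (AR(1) process) standing in for a turbulence statistic
rng = np.random.default_rng(0)
x = np.empty(100_000)
x[0] = 0.0
for i in range(1, len(x)):
    x[i] = 0.95 * x[i - 1] + rng.standard_normal()

mean, sem = batch_means(x)
print(f"mean = {mean:.3f} +/- {sem:.3f}")
```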

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Fluid Mechanics
Identifiers
urn:nbn:se:kth:diva-341470 (URN); 10.1145/3624062.3624105 (DOI); 2-s2.0-85178155242 (Scopus ID)
Conference
2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023
Note

Part of ISBN 979-840070785-8
Karp, M., Podobas, A., Kenter, T., Jansson, N., Plessl, C., Schlatter, P. & Markidis, S. (2022). A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges. In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region. Paper presented at HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region (pp. 125-136). Association for Computing Machinery (ACM)
2022 (English). In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2022, p. 125-136. Conference paper, Published paper (Refereed).
Abstract [en]

The impending termination of Moore’s law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand.

In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work – which often focuses on accelerating small kernels – we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and an upwards of 2x reduction in runtime for the local evaluation of the Laplace operator.

We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.
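
The data-movement argument for computing geometric factors on the fly can be made concrete with a back-of-the-envelope model. The numbers below (bandwidth, compute rate, flop counts) are assumptions chosen only to illustrate the trade-off, not figures from the paper:

```python
# Toy I/O-cost model for one local Laplace-operator evaluation per element.
def bytes_per_element(n, store_geom):
    pts = (n + 1) ** 3
    fields = 2                        # read u, write Au
    geom = 6 if store_geom else 0     # six symmetric metric terms per point if stored
    return 8 * pts * (fields + geom)  # double precision

def time_per_element(n, store_geom, bw_bytes_s=30e9, flops_s=500e9):
    flops = 12 * (n + 1) ** 4         # rough tensor-product flop count
    if not store_geom:
        flops += 50 * (n + 1) ** 3    # assumed extra cost of recomputing the metrics
    # bandwidth-bound vs. compute-bound: whichever takes longer dominates
    return max(bytes_per_element(n, store_geom) / bw_bytes_s, flops / flops_s)

for store in (True, False):
    label = "stored factors " if store else "on-the-fly     "
    print(label, f"{1e6 * time_per_element(7, store):.2f} us/element")
```

In this toy setting the on-the-fly variant wins because the stored metric terms dominate the bytes moved per element; the paper quantifies the actual gain on the Stratix 10 hardware.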

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-309190 (URN); 10.1145/3492805.3492808 (DOI); 2-s2.0-85122641610 (Scopus ID)
Conference
HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region
Atzori, M., Köpp, W., Chien, W. D., Massaro, D., Mallor, F., Peplinski, A., . . . Weinkauf, T. (2022). In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst. Journal of Supercomputing, 78(3), 3605-3620
2022 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 78, no. 3, p. 3605-3620. Article in journal (Refereed). Published.
Abstract [en]

In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for ParaView Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH’s Beskow Cray XC40 supercomputer and assess in situ visualization’s impact on the Nek5000 performance. In our case study, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.
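
The relative parallel efficiency quoted above is the usual strong-scaling measure. One common definition (the reference core count P0 used in the study is not repeated here) is:

```latex
E_{\mathrm{rel}}(P) = \frac{P_0\,T(P_0)}{P\,T(P)}
```

where T(P) is the execution time on P cores, so ≈ 21 % on 2048 cores means the in-situ runs gain far less from added cores than the ≈ 99 % efficiency of Nek5000 alone.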

Place, publisher, year, edition, pages
Springer, 2022
Keywords
Computational fluid dynamics, High-performance computing, In situ visualization, Catalysts, Data visualization, Efficiency, Image enhancement, Scalability, Supercomputers, Visualization, Application scenario, High performance computing systems, High-fidelity simulations, High-performance simulation, Large scale turbulence, Parallel efficiency, Relative efficiency, Technical challenges, In situ processing
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-311178 (URN); 10.1007/s11227-021-03990-3 (DOI); 000680293400003; 35210696 (PubMedID); 2-s2.0-85111797526 (Scopus ID)
Identifiers
ORCID iD: orcid.org/0000-0002-5020-1631
