Publications (10 of 13)
Massaro, D., Karp, M., Jansson, N., Markidis, S. & Schlatter, P. (2024). Direct numerical simulation of the turbulent flow around a Flettner rotor. Scientific Reports, 14(1), Article ID 3004.
2024 (English). In: Scientific Reports, E-ISSN 2045-2322, Vol. 14, no 1, article id 3004. Article in journal (Refereed), Published
Abstract [en]

The three-dimensional turbulent flow around a Flettner rotor, i.e. an engine-driven rotating cylinder in an atmospheric boundary layer, is studied via direct numerical simulations (DNS) for three different rotation speeds (α). This technology offers a sustainable alternative mainly for marine propulsion, underscoring the critical importance of comprehending the characteristics of such flow. In this study, we evaluate the aerodynamic loads produced by the rotor of height h, with a specific focus on the changes in lift and drag force along the vertical axis of the cylinder. Correspondingly, we observe that vortex shedding is inhibited at the highest α values investigated. However, in the case of intermediate α, vortices continue to be shed in the upper section of the cylinder (y/h>0.3). As the cylinder begins to rotate, a large-scale motion becomes apparent on the high-pressure side, close to the bottom wall. We offer both a qualitative and quantitative description of this motion, outlining its impact on the wake deflection. This finding is significant as it influences the rotor wake to an extent of approximately one hundred diameters downstream. In practical applications, this phenomenon could influence the performance of subsequent boats and have an impact on the cylinder drag, affecting its fuel consumption. This fundamental study, which investigates a limited yet significant (for DNS) Reynolds number and explores various spinning ratios, provides valuable insights into the complex flow around a Flettner rotor. The simulations were performed using a modern GPU-based spectral element method, leveraging the power of modern supercomputers towards fundamental engineering problems.

Place, publisher, year, edition, pages
Springer Nature, 2024
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-344051 (URN); 10.1038/s41598-024-53194-x (DOI); 2-s2.0-85184207516 (Scopus ID)
Funder
KTH Royal Institute of Technology
Note

QC 20240301

Available from: 2024-02-29 Created: 2024-02-29 Last updated: 2024-04-22. Bibliographically approved
Karp, M. (2024). Direct Numerical Simulation of Turbulence on Heterogenous Computer Systems: Architectures, Algorithms, and Applications. (Doctoral dissertation). Stockholm, Sweden: KTH Royal Institute of Technology
2024 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Direct numerical simulations (DNS) of turbulence have a virtually unbounded need for computing power. To carry out these simulations, software, computer architectures, and algorithms must operate as efficiently as possible to amortize the large computational cost. However, in a computing landscape increasingly incorporating heterogeneous computer systems, changes are necessary. In this thesis, we consider how DNS can be carried out efficiently on upcoming heterogeneous computer systems. This work relates to developing algorithms for upcoming heterogeneous computer architectures, overcoming software challenges associated with large-scale DNS on these platforms, and applying these developments to new flow cases that were previously too costly to carry out. We consider in particular the spectral element method for DNS and evaluate how this method maps to field-programmable gate arrays, graphics processing units, as well as conventional processors. We also consider the issue of trading arithmetic operations for less communication, reducing the cost of solving the linear systems that arise in the spectral element method. Our developments are incorporated into the spectral element framework Neko, enabling Neko to strong-scale efficiently on the largest supercomputers in the world. Finally, we have carried out several DNS such as the simulation of a Flettner rotor in a turbulent boundary layer and simulating Rayleigh-Bénard convection at very high Rayleigh numbers. The developments in this thesis enable the high-fidelity simulation of turbulence on emerging computer systems with high parallel efficiency and performance.

Abstract [sv]

Direct numerical simulation (DNS) of turbulence requires enormous amounts of computing power. Carrying out simulations such as DNS requires software, computer architectures, and algorithms to work together as efficiently as possible. Today, supercomputers are changing rapidly and incorporating new heterogeneous computer architectures. This means that new approaches are necessary to make use of all the available computing power. In this thesis, we focus on DNS on heterogeneous, large-scale computer systems in order to enable new simulations of turbulent flows. To reach this goal, we investigate new computer architectures, analyze and improve the numerical methods and algorithms we use, and finally apply our developments to new simulations of turbulence. We focus in particular on the spectral element method (SEM) for DNS and investigate how it behaves on field-programmable gate arrays, graphics cards, and conventional processors. We also contribute an analysis of how we solve the linear system that forms the core of SEM, in order to better exploit the available computing power and reduce the amount of data that needs to be transferred. Our improvements are incorporated into the SEM solver Neko and enable Neko to scale efficiently on the largest supercomputers in the world. We then use this framework to carry out several large-scale simulations. We carry out the first simulation of a Flettner rotor and its interaction with turbulent shear flow, as well as a simulation of Rayleigh-Bénard convection in a cylindrical domain at very high Rayleigh numbers. The thesis enables detailed numerical simulation of turbulence with high scalability and performance in today's changing computing landscape.

Place, publisher, year, edition, pages
Stockholm, Sweden: KTH Royal Institute of Technology, 2024. p. 54
Series
TRITA-EECS-AVL ; 2024:36
Keywords
High Performance Computing, Turbulence, Computational Fluid Dynamics, Heterogenous Computer Architectures, Högprestandaberäkningar, Turbulens, Numerisk Strömingsmekanik, Heterogena Datorarkitekturer
National Category
Computer Sciences; Fluid Mechanics and Acoustics
Research subject
Computer Science
Identifiers
urn:nbn:se:kth:diva-345851 (URN); 978-91-8040-910-0 (ISBN)
Public defence
2024-05-24, https://kth-se.zoom.us/s/61541415709, Kollegiesalen, Brinellvägen 6, Stockholm, 09:15 (English)
Funder
Swedish e‐Science Research Center, SESSI
Note

QC 20240423

Available from: 2024-04-23 Created: 2024-04-22 Last updated: 2024-05-15. Bibliographically approved
Jansson, N., Karp, M., Podobas, A., Markidis, S. & Schlatter, P. (2024). Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics. Computers & Fluids, 275, 106243-106243, Article ID 106243.
2024 (English). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, article id 106243. Article in journal (Refereed), Published
Abstract [en]

Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area of great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive, as the range of active spatial and temporal scales quickly widens. Scale-resolving simulations in particular, including large-eddy simulation (LES) and direct numerical simulation (DNS), thus need to rely on modern, efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to refactor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.

Place, publisher, year, edition, pages
Elsevier BV, 2024
National Category
Fluid Mechanics and Acoustics; Computational Mathematics; Computer Sciences
Identifiers
urn:nbn:se:kth:diva-344896 (URN); 10.1016/j.compfluid.2024.106243 (DOI); 2-s2.0-85189508362 (Scopus ID)
Funder
Swedish Research Council, 2019-04723; EU, Horizon 2020, 823691; EU, Horizon 2020, 801039
Note

QC 20240403

Available from: 2024-04-02 Created: 2024-04-02 Last updated: 2024-04-22. Bibliographically approved
Jansson, N., Karp, M., Perez, A., Mukha, T., Ju, Y., Liu, J., . . . Markidis, S. (2023). Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations. In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, NOV 12–17 DENVER, CO, USA (pp. 1-9). Association for Computing Machinery (ACM), Article ID 5.
2023 (English). In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM), 2023, p. 1-9, article id 5. Conference paper, Published paper (Refereed)
Abstract [en]

We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Computer Sciences; Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-340333 (URN); 10.1145/3581784.3627039 (DOI); 2-s2.0-85179549233 (Scopus ID)
Conference
SC: The International Conference for High Performance Computing, Networking, Storage, and Analysis, NOV 12–17 DENVER, CO, USA
Funder
Swedish Research Council, 2019-04723; Swedish e‐Science Research Center; EU, Horizon 2020, 101093393, 101092621, 956748
Note

Part of ISBN 9798400701092

QC 20231204

Available from: 2023-12-04 Created: 2023-12-04 Last updated: 2024-04-22. Bibliographically approved
Karp, M., Liu, F., Stanly, R., Rezaeiravesh, S., Jansson, N., Schlatter, P. & Markidis, S. (2023). Uncertainty Quantification of Reduced-Precision Time Series in Turbulent Channel Flow. In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023. Paper presented at 2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023 (pp. 387-390). Association for Computing Machinery (ACM)
2023 (English). In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM), 2023, p. 387-390. Conference paper, Published paper (Refereed)
Abstract [en]

With increased computational power available through the use of low-precision arithmetic, a relevant question is how lower precision affects simulation results, especially for chaotic systems where analytical round-off estimates are non-trivial to obtain. In this work, we consider how the uncertainty of the time series of a direct numerical simulation of turbulent channel flow at Reτ = 180 is affected when restricted to a reduced-precision representation. We utilize a non-overlapping batch means estimator and find that the mean statistics can, in this case, be obtained with significantly fewer mantissa bits than conventional IEEE-754 double precision, but that the mean values are observed to be more sensitive in the middle of the channel than in the near-wall region. This indicates that the near-wall region, where the majority of the computational effort is required, may benefit from the low-precision floating-point units found in upcoming computer hardware.
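
The non-overlapping batch means estimator named in this abstract can be illustrated with a minimal sketch. This is the textbook form of the estimator, not the authors' code; the function name `batch_means`, the batch count, and the synthetic autocorrelated signal are assumptions chosen for illustration and do not represent turbulence data.

```python
import math
import random


def batch_means(series, n_batches):
    """Non-overlapping batch means.

    Splits the series into n_batches equal, non-overlapping batches and
    returns (mean, standard error of the mean) estimated from the batch
    averages. Samples left over after the last full batch are dropped.
    """
    batch_len = len(series) // n_batches
    if n_batches < 2 or batch_len < 1:
        raise ValueError("need at least two non-empty batches")
    means = [
        sum(series[i * batch_len:(i + 1) * batch_len]) / batch_len
        for i in range(n_batches)
    ]
    grand_mean = sum(means) / n_batches
    # Sample variance of the batch averages (unbiased, n_batches - 1).
    var_of_means = sum((m - grand_mean) ** 2 for m in means) / (n_batches - 1)
    return grand_mean, math.sqrt(var_of_means / n_batches)


# Synthetic correlated signal (AR(1) process), purely a stand-in.
random.seed(0)
signal, x = [], 0.0
for _ in range(20000):
    x = 0.9 * x + random.gauss(0.0, 1.0)
    signal.append(x)

mean, stderr = batch_means(signal, n_batches=20)
print(f"mean = {mean:.4f}, standard error = {stderr:.4f}")
```

Batches need to be long compared with the correlation time of the signal for the batch averages to be approximately independent, which is why the batch count is kept small relative to the series length here.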

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2023
National Category
Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-341470 (URN); 10.1145/3624062.3624105 (DOI); 2-s2.0-85178155242 (Scopus ID)
Conference
2023 International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Denver, United States of America, Nov 12 2023 - Nov 17 2023
Note

QC 20240109

Part of ISBN 979-840070785-8

Available from: 2024-01-09 Created: 2024-01-09 Last updated: 2024-04-22. Bibliographically approved
Karp, M., Podobas, A., Kenter, T., Jansson, N., Plessl, C., Schlatter, P. & Markidis, S. (2022). A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges. In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region. Paper presented at HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region (pp. 125-136). Association for Computing Machinery (ACM)
2022 (English). In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2022, p. 125-136. Conference paper, Published paper (Refereed)
Abstract [en]

The impending termination of Moore’s law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand.

In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work, which often focuses on accelerating small kernels, we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and a reduction in runtime of upwards of 2x for the local evaluation of the Laplace operator.

We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-309190 (URN); 10.1145/3492805.3492808 (DOI); 2-s2.0-85122641610 (Scopus ID)
Conference
HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region
Note

QC 20220223

Available from: 2022-02-22 Created: 2022-02-22 Last updated: 2024-04-22. Bibliographically approved
Karp, M., Jansson, N., Podobas, A., Schlatter, P. & Markidis, S. (2022). Reducing Communication in the Conjugate Gradient Method: A Case Study on High-Order Finite Elements. In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022. Paper presented at 2022 Platform for Advanced Scientific Computing Conference, PASC 2022, 27 June 2022 through 29 June 2022, Basel, Switzerland. Association for Computing Machinery (ACM), Article ID 2.
2022 (English). In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022, Association for Computing Machinery (ACM), 2022, article id 2. Conference paper, Published paper (Refereed)
Abstract [en]

Currently, a major bottleneck for several scientific computations is communication, both communication between different processors, so-called horizontal communication, and vertical communication between different levels of the memory hierarchy. With this bottleneck in mind, we target a notoriously communication-bound solver at the core of many high-performance applications, namely the conjugate gradient method (CG). To reduce the communication we present lower bounds on the vertical data movement in CG and go on to make a CG solver with reduced data movement. Using our theoretical analysis we apply our CG solver on a high-performance discretization used in practice, the spectral element method (SEM). Guided by our analysis, we show that for the Poisson equation on modern GPUs we can improve the performance by 30% by both rematerializing the discrete system and by reformulating the system to work on unique degrees of freedom. In order to investigate how horizontal communication can be reduced, we compare CG to two communication-reducing techniques, namely communication-avoiding and pipelined CG. We strong scale up to 4096 CPU cores and showcase performance improvements of upwards of 70% for pipelined CG compared to standard CG when applied on SEM at scale. We show that in addition to improving the scaling capabilities of the solver, initial measurements indicate that the convergence of SEM is largely unaffected by pipelined CG.
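
For orientation, the standard conjugate gradient iteration that this abstract takes as its baseline can be sketched as follows. This is the textbook algorithm in plain Python, not the reduced-communication, communication-avoiding, or pipelined variants studied in the paper; the helper names and the 2x2 test system are assumptions made for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product; streams all of A (vertical data movement)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    """Inner product; in a distributed solver this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive definite A, starting from x = 0."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x for the zero initial guess
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rs_old / dot(p, Ap)        # first reduction of the iteration
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)                 # second reduction of the iteration
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Small symmetric positive definite example system (illustrative values).
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(A, b)
print(solution)
```

Each iteration contains two inner products, which become global reductions when the vectors are distributed across processors, and one operator application that streams the operator and vectors through the memory hierarchy. These correspond to the horizontal and vertical communication the abstract distinguishes.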

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
National Category
Information Systems
Identifiers
urn:nbn:se:kth:diva-317542 (URN); 10.1145/3539781.3539785 (DOI); 2-s2.0-85134847143 (Scopus ID)
Conference
2022 Platform for Advanced Scientific Computing Conference, PASC 2022, 27 June 2022 through 29 June 2022, Basel, Switzerland
Note

QC 20220913

Part of proceedings: ISBN 978-145039410-9

Available from: 2022-09-13 Created: 2022-09-13 Last updated: 2024-04-22. Bibliographically approved
Vincent, J., Gong, J., Karp, M., Peplinski, A., Jansson, N., Podobas, A., . . . Schlatter, P. (2022). Strong Scaling of OpenACC enabled Nek5000 on several GPU based HPC systems. In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region. Paper presented at HPC Asia2022: International Conference on High Performance Computing in Asia-Pacific Region Virtual Event Japan January 12 - 14, 2022 (pp. 94-102). Association for Computing Machinery (ACM)
2022 (English). In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2022, p. 94-102. Conference paper, Published paper (Refereed)
Abstract [en]

We present new results on the strong parallel scaling of the OpenACC-accelerated implementation of the high-order spectral element fluid dynamics solver Nek5000. The test case considered consists of a direct numerical simulation of fully developed turbulent flow in a straight pipe, at two different Reynolds numbers, Reτ = 360 and Reτ = 550, based on friction velocity and pipe radius. The strong scaling is tested on several GPU-enabled HPC systems, including the Swiss Piz Daint system, TACC's Longhorn, Jülich's JUWELS Booster, and Berzelius in Sweden. The performance results show that a speed-up of 3-5 can be achieved using the GPU-accelerated version compared with the CPU version on these different systems. The run-time for 20 timesteps reduces from 43.5 to 13.2 seconds when increasing the number of GPUs from 64 to 512 for the Reτ = 550 case on the JUWELS Booster system. This illustrates the potential of the GPU-accelerated version for high throughput. At the same time, the strong scaling limit is significantly larger for GPUs, at about 2000 - 5000 elements per rank, compared to about 50 - 100 for a CPU rank.
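
As a worked reading of the run-time figures quoted in this abstract (43.5 s on 64 GPUs, 13.2 s on 512 GPUs), the short sketch below computes the implied speed-up and strong-scaling efficiency. The function name `strong_scaling` is an assumption for illustration; only the four input numbers come from the abstract.

```python
def strong_scaling(t_base, t_scaled, n_base, n_scaled):
    """Return (speed-up, parallel efficiency) for a fixed-size problem.

    Efficiency is the measured speed-up divided by the ideal speed-up,
    which equals the factor by which the device count was increased.
    """
    speedup = t_base / t_scaled
    ideal = n_scaled / n_base
    return speedup, speedup / ideal


# Figures from the abstract: Re_tau = 550 case on JUWELS Booster.
speedup, efficiency = strong_scaling(43.5, 13.2, 64, 512)
print(f"speed-up: {speedup:.2f}x, efficiency: {efficiency:.0%}")
# prints "speed-up: 3.30x, efficiency: 41%"
```

An 8x increase in GPU count thus yields roughly a 3.3x shorter run-time, which is consistent with the abstract's remark that GPUs reach their strong scaling limit at a much larger number of elements per rank than CPUs.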

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2022
Series
ACM International Conference Proceeding Series
National Category
Computer Sciences; Fluid Mechanics and Acoustics
Identifiers
urn:nbn:se:kth:diva-309189 (URN); 10.1145/3492805.3492818 (DOI); 2-s2.0-85122621284 (Scopus ID)
Conference
HPC Asia2022: International Conference on High Performance Computing in Asia-Pacific Region Virtual Event Japan January 12 - 14, 2022
Note

QC 20220223

Part of conference proceedings: ISBN 978-145038498-8

Available from: 2022-02-22 Created: 2022-02-22 Last updated: 2024-03-18. Bibliographically approved
Karp, M., Podobas, A., Jansson, N., Kenter, T., Plessl, C., Schlatter, P. & Markidis, S. (2021). High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection. In: Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium, May 17-21, 2021 Portland, Oregon, USA. Paper presented at 35th IEEE International Parallel & Distributed Processing Symposium. Institute of Electrical and Electronics Engineers (IEEE)
2021 (English). In: Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium, May 17-21, 2021 Portland, Oregon, USA, Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper, Published paper (Refereed)
Abstract [en]

 Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the general-purpose architectures' comforts in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study modern FPGAs' applicability in accelerating the Spectral Element Method (SEM) core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double-precision that we empirically evaluate on the latest Stratix 10 GX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM-accelerator, which we use to project future FPGAs' performance and role to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have? 

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-296311 (URN); 10.1109/IPDPS49936.2021.00116 (DOI); 000695273000108 (); 2-s2.0-85110962155 (Scopus ID)
Conference
35th IEEE International Parallel & Distributed Processing Symposium
Funder
Swedish e‐Science Research Center
Note

Part of proceedings: ISBN 978-1-6654-4066-0, QC 20230117

Available from: 2021-06-02 Created: 2021-06-02 Last updated: 2024-04-22. Bibliographically approved
Karp, M., Podobas, A., Jansson, N., Kenter, T., Plessl, C., Schlatter, P. & Markidis, S. (2020). Appendix to High-Performance Spectral Element Methods on Field-Programmable Gate Arrays.
2020 (English). Other (Other academic)
Abstract [en]

In this Appendix we display some results we omitted from our article "High-Performance Spectral Element Methods on Field-Programmable Gate Arrays". In particular, we showcase the measured bandwidth for the FPGA we used (Stratix 10) as well as the performance of our accelerator at different stages of optimization. In addition to this, we illustrate more practical aspects of our performance/resource modeling.

Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard's scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the comforts of general-purpose architectures in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a good balance between complexity and performance. In this paper, we study modern FPGAs' applicability for use in accelerating the Spectral Element Method (SEM) core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator that we empirically evaluate on the latest Stratix 10 SX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM-accelerator, which we use to project the performance and role of future FPGAs to accelerate CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-284225 (URN)
Note

QC 20201026

Available from: 2020-10-17 Created: 2020-10-17 Last updated: 2022-10-24. Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0003-3374-8093
