kth.se Publications
Search results 1 - 50 of 65
  • 1.
    Abreu, Rodrigo
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Adaptive computation of aeroacoustic sources for a rudimentary landing gear using Lighthill's analogy (2011). In: 17th AIAA/CEAS Aeroacoustics Conference 2011: 32nd AIAA Aeroacoustics Conference, 2011. Conference paper (Refereed)
    Abstract [en]

    We present our simulation results for the benchmark problem of the flow past a Rudimentary Landing Gear (RLG) using a General Galerkin (G2) finite element method, also referred to as Adaptive DNS/LES. In G2 no explicit subgrid model is used; instead the computational mesh is adaptively refined with respect to an a posteriori error estimate of a quantity of interest in the computation, in this case the drag force on the RLG. Turbulent boundary layers are modeled using a simple wall layer model with the shear stress at walls proportional to the skin friction, which here is assumed to be small and, therefore, can be approximated by zero skin friction. We compare our results with experimental data and other state of the art computations, where we find good agreement in sound pressure levels, surface velocities and flow separation. We also compare with detailed surface pressure experimental data, where we find largely good agreement, apart from some local differences for which we discuss possible explanations.
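    A schematic form of the duality-based a posteriori error estimate that drives this kind of goal-oriented refinement (standard in the adaptivity literature; the notation is assumed here, not taken from the paper):

        % M(.) : output functional (here the drag force), u : exact solution,
        % u_h : finite element solution, R(u_h) : equation residual,
        % phi : solution of an associated dual (adjoint) problem.
        \[
          M(u) - M(u_h) \;\approx\; \sum_{K \in \mathcal{T}_h} \int_K R(u_h)\,\varphi \, dx
        \]
        % Cells K carrying large contributions are marked for refinement.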

  • 2.
    Atzori, Marco
    et al.
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Rezaei, Mohamad
    PDC Center for High Performance Computing, KTH Royal Institute of Technology.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    In-situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst (2021). Report (Other academic)
    Abstract [en]

    In-situ visualization on HPC systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We design and develop in-situ visualization with Paraview Catalyst in Nek5000, a massively parallel Fortran and C code for computational fluid dynamics applications. We perform strong scalability tests up to 2,048 cores on KTH's Beskow Cray XC40 supercomputer and assess in-situ visualization's impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in-situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ~21% on 2,048 cores (the relative efficiency of Nek5000 without in-situ operations is ~99%). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in-situ processing time between rank 0 and all other ranks. Better scaling and load-balancing in the parallel image composition would considerably improve the performance and scalability of Nek5000 with in-situ capabilities in large-scale simulation.
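    For reference, the quoted relative parallel efficiencies follow from the usual strong-scaling definition; a minimal sketch, with illustrative placeholder timings rather than measurements from the report:

        # Relative parallel efficiency for strong scaling:
        # E(n) = (t_ref * n_ref) / (t_n * n), with the smallest run as baseline.
        def relative_efficiency(n_ref, t_ref, n, t):
            return (t_ref * n_ref) / (t * n)

        # Placeholder timings (seconds per step), not data from the report.
        print(f"E = {relative_efficiency(256, 80.0, 2048, 48.0):.2f}")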

  • 3.
    Atzori, Marco
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Köpp, Wiebke
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Chien, Wei Der
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Theoretical Computer Science, TCS. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics.
    Mallor, Fermin
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Peplinski, Adam
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Rezaei, Mohammadtaghi
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Vinuesa, Ricardo
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Laure, E.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    In situ visualization of large-scale turbulence simulations in Nek5000 with ParaView Catalyst (2022). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 78, no. 3, p. 3605-3620. Article in journal (Refereed)
    Abstract [en]

    In situ visualization on high-performance computing systems allows us to analyze simulation results that would otherwise be impossible, given the size of the simulation data sets and offline post-processing execution time. We develop an in situ adaptor for Paraview Catalyst and Nek5000, a massively parallel Fortran and C code for computational fluid dynamics. We perform a strong scalability test up to 2048 cores on KTH’s Beskow Cray XC40 supercomputer and assess in situ visualization’s impact on the Nek5000 performance. In our study case, a high-fidelity simulation of turbulent flow, we observe that in situ operations significantly limit the strong scalability of the code, reducing the relative parallel efficiency to only ≈ 21 % on 2048 cores (the relative efficiency of Nek5000 without in situ operations is ≈ 99 %). Through profiling with Arm MAP, we identified a bottleneck in the image composition step (that uses the Radix-kr algorithm) where a majority of the time is spent on MPI communication. We also identified an imbalance of in situ processing time between rank 0 and all other ranks. In our case, better scaling and load-balancing in the parallel image composition would considerably improve the performance of Nek5000 with in situ capabilities. In general, the result of this study highlights the technical challenges posed by the integration of high-performance simulation codes and data-analysis libraries and their practical use in complex cases, even when efficient algorithms already exist for a certain application scenario.

  • 4.
    Bale, Rahul
    et al.
    RIKEN Center for Computational Science, Kobe, Japan.
    Patankar, Neelesh A.
    Department of Mechanical Engineering, Northwestern University, USA.
    Jansson, Niclas
    RIKEN Center for Computational Science, Kobe, Japan.
    Onishi, Keiji
    RIKEN Center for Computational Science, Kobe, Japan.
    Tsubokura, Makoto
    Department of Computational Science, Graduate School of System Informatics, Kobe University and RIKEN Center for Computational Science, Kobe, Japan.
    Stencil Penalty approach based constraint immersed boundary method (2020). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 200, p. 104457. Article in journal (Refereed)
    Abstract [en]

    The constraint-based immersed boundary (cIB) method has been shown to be accurate for low and moderate Reynolds number (Re) flows when the immersed body constraint is imposed as a volumetric constraint force. When the IB is modelled as a zero-thickness interface, where it is no longer possible to model a volumetric constraint force, we found that cIB is not able to produce accurate results. The main source of inaccuracies in the cIB method is the distribution of the pressure field around the IB surface. An IB surface results in a jump in the pressure field across the IB. Evaluation of the discrete gradient of pressure close to the IB leads to a pressure gradient that does not satisfy the Neumann boundary condition for pressure at the IB. Furthermore, a non-zero discrete pressure gradient on the IB results in spurious flow at grid points close to the IB. We present a novel numerical formulation which adapts the cIB formulation for ‘zero-thickness’ immersed bodies. In order to impose the Neumann boundary condition on pressure on the IB more accurately, we introduce an additional body force to the momentum equation. A WENO based stencil penalization technique is used to define the new force term. Due to the more accurate imposition of the Neumann pressure boundary condition on the IB, spurious flow is reduced and the accuracy of the no penetration velocity boundary condition on the IB is improved.
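    Schematically, and with notation assumed here rather than copied from the paper, the formulation augments the incompressible Navier-Stokes momentum equation with a constraint force and the proposed additional force targeting the pressure condition on the interface:

        % u : velocity, p : pressure, f_IB : constraint force imposing the body
        % velocity on the interface, f_p : additional WENO-stencil-penalty force
        % introduced so that the discrete pressure gradient better satisfies the
        % Neumann condition on the immersed boundary Gamma_IB.
        \[
          \rho\left(\frac{\partial \mathbf{u}}{\partial t}
            + \mathbf{u}\cdot\nabla\mathbf{u}\right)
          = -\nabla p + \mu\,\Delta\mathbf{u} + \mathbf{f}_{\mathrm{IB}} + \mathbf{f}_{p},
          \qquad \nabla\cdot\mathbf{u} = 0,
          \qquad \left.\frac{\partial p}{\partial n}\right|_{\Gamma_{\mathrm{IB}}} = 0
        \]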

  • 5.
    Chien, Steven W.D.
    et al.
    University of Edinburgh, United Kingdom.
    Sato, Kento
    RIKEN Center for Computational Science Japan.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Honda, Michio
    University of Edinburgh, United Kingdom.
    Improving Cloud Storage Network Bandwidth Utilization of Scientific Applications (2023). In: Proceedings of the 7th Asia-Pacific Workshop on Networking, APNET 2023, Association for Computing Machinery (ACM), 2023, p. 172-173. Conference paper (Refereed)
    Abstract [en]

    Cloud providers began to provide managed services to attract scientific applications, which have been traditionally executed on supercomputers. One example is AWS FSx for Lustre, a fully managed parallel file system (PFS) released in 2018. However, due to the nature of scientific applications, the frontend storage network bandwidth is left completely idle for the majority of its lifetime. Furthermore, the pricing model does not match the scalability requirement. We propose iFast, a novel host-side caching mechanism for scientific applications that improves storage bandwidth utilization and end-to-end application performance by overlapping compute and data writeback through inexpensive local storage. iFast supports the Message Passing Interface (MPI) library that is widely used by scientific applications and is implemented as a preloaded library. It requires no change to applications, the MPI library, or support from cloud operators. We demonstrate how iFast can accelerate the end-to-end time of a representative scientific application, Neko, by 13-40%.
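    iFast itself is a preloaded library intercepting MPI I/O; the following is only a toy, single-process illustration of the underlying idea of overlapping computation with write-back from cheap local storage to a slower shared file system (paths and timings are hypothetical):

        import os, queue, shutil, threading, time

        # Toy compute/write-back overlap (not iFast's implementation): steps write
        # checkpoints to fast local storage, a background thread drains them to the
        # slow shared file system while the next step computes.
        LOCAL_DIR, SHARED_DIR = "/tmp/local_cache", "/tmp/shared_fs"  # hypothetical
        os.makedirs(LOCAL_DIR, exist_ok=True)
        os.makedirs(SHARED_DIR, exist_ok=True)

        pending = queue.Queue()

        def writeback_worker():
            while True:
                path = pending.get()
                if path is None:
                    break
                shutil.copy(path, SHARED_DIR)   # slow transfer, overlapped with compute
                pending.task_done()

        worker = threading.Thread(target=writeback_worker)
        worker.start()

        for step in range(5):
            time.sleep(0.1)                     # stand-in for a compute step
            path = os.path.join(LOCAL_DIR, f"checkpoint_{step}.dat")
            with open(path, "wb") as f:         # fast local write
                f.write(b"\0" * 1024)
            pending.put(path)                   # hand off to background write-back

        pending.join()
        pending.put(None)
        worker.join()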

  • 6.
    de Abreu, Rodrigo Vilela
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Adaptive Computation of Aeroacoustic Sources for Rudimentary Landing Gear (2010). In: Benchmark problems for Airframe Noise Computations I, Stockholm 2010, 2010. Conference paper (Other academic)
  • 7.
    Dykes, Tim
    et al.
    HPE HPC/AI EMEA Research Lab.
    Foyer, Clément
    HPE HPC/AI EMEA Research Lab, HPC Research Group, Univ. of Bristol.
    Richardson, Harvey
    HPE HPC/AI EMEA Research Lab.
    Svedin, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Tate, Adrian
    Numerical Algorithms Group Ltd. (NAG).
    McIntosh-Smith, Simon
    HPC Research Group, Univ. of Bristol.
    Mamba: Portable Array-based Abstractions for Heterogeneous High-Performance Systems (2021). In: 2021 International Workshop on Performance, Portability and Productivity in HPC (P3HPC), Institute of Electrical and Electronics Engineers (IEEE), 2021. Conference paper (Refereed)
    Abstract [en]

    High performance computing architectures have become increasingly heterogeneous in recent times. This growing architectural variety presents a multi-faceted portability problem affecting applications, libraries, programming models, languages, compilers, run-times, and system software. Approaches for performance portability typically focus heavily on efficient usage of parallel compute architectures and less on data locality abstractions and complex memory systems, with minimal support afforded to effective memory management in traditional HPC languages such as C and Fortran. We present Mamba, a library to facilitate usage of heterogeneous memory systems by high performance application/library developers through high level array-based abstractions for memory management supported by a low-level generic memory API. We detail the library design and implementation, demonstrating generic memory allocation, data layout specification, array tiling and heterogeneous transport. We evaluate performance in the context of a typical matrix transposition, DNA sequencing benchmark, and an application use case for high-order spectral element based incompressible flow.
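    Mamba is a C library, so the following is a purely hypothetical Python sketch of the kind of array-tiling abstraction the abstract describes; the class and its interface are invented here for illustration:

        import numpy as np

        # Hypothetical array-tiling abstraction (interface invented here; Mamba
        # itself is a C library with its own API).
        class TiledArray:
            def __init__(self, data, tile_shape):
                self.data, self.tile_shape = data, tile_shape

            def tiles(self):
                """Yield views of the underlying array, one tile at a time."""
                tr, tc = self.tile_shape
                rows, cols = self.data.shape
                for i in range(0, rows, tr):
                    for j in range(0, cols, tc):
                        yield self.data[i:i + tr, j:j + tc]

        # Example: traverse a 4x4 matrix tile by tile, e.g. to match a memory hierarchy.
        a = TiledArray(np.arange(16.0).reshape(4, 4), tile_shape=(2, 2))
        for tile in a.tiles():
            tile *= 2.0  # each tile is a view, so this updates the array in place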

  • 8.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    de Abreu, Rodrigo Vilela
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Degirmenci, Niyazi Cem
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Müller, Kaspar
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Nazarov, Murtazo
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Spühler, Jeannette Hiromi
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Unicorn: Parallel adaptive finite element simulation of turbulent flow and fluid-structure interaction for deforming domains and complex geometry (2013). In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 80, no. SI, p. 310-319. Article in journal (Refereed)
    Abstract [en]

    We present a framework for adaptive finite element computation of turbulent flow and fluid structure interaction, with focus on general algorithms that allow for complex geometry and deforming domains. We give basic models and finite element discretization methods, adaptive algorithms and strategies for efficient parallel implementation. To illustrate the capabilities of the computational framework, we show a number of application examples from aerodynamics, aero-acoustics, biomedicine and geophysics. The computational tools are free to download open source as Unicorn, and as a high performance branch of the finite element problem solving environment DOLFIN, both part of the FEniCS project.
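    The adaptive algorithms referred to above follow the usual solve-estimate-mark-refine cycle; a generic sketch of that loop, where the callables are hypothetical placeholders rather than Unicorn's actual API:

        # Generic solve-estimate-mark-refine loop (schematic; solve, estimate and
        # refine are hypothetical placeholders, not Unicorn's interface).
        def adaptive_solve(mesh, tolerance, solve, estimate, refine, max_iterations=20):
            for _ in range(max_iterations):
                solution = solve(mesh)                  # primal (and dual) solve
                indicators = estimate(mesh, solution)   # per-cell a posteriori error
                if sum(indicators) < tolerance:
                    break
                # Mark, e.g., the ten percent of cells with the largest error.
                threshold = sorted(indicators, reverse=True)[len(indicators) // 10]
                marked = [i for i, e in enumerate(indicators) if e >= threshold]
                mesh = refine(mesh, marked)             # local refinement, e.g. bisection
            return solution, mesh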

  • 9.
    Hoffman, Johan
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Degirmenci, Niyazi Cem
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Nazarov, Murtazo
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Unicorn: A unified continuum mechanics solver (2012). In: Lecture Notes in Computational Science and Engineering, Springer Science and Business Media Deutschland GmbH, 2012, p. 339-361. Chapter in book (Refereed)
    Abstract [en]

    This chapter provides a description of the technology of Unicorn focusing on simple, efficient and general algorithms and software for the Unified Continuum (UC) concept and the adaptive General Galerkin (G2) discretization as a unified approach to continuum mechanics. We describe how Unicorn fits into the FEniCS framework, how it interfaces to other FEniCS components, what interfaces and functionality Unicorn provides itself and how the implementation is designed. We also present some examples in fluid–structure interaction and adaptivity computed with Unicorn. 

  • 10.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Degirmenci, Niyazi Cem
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Nazarov, Murtazo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Unicorn: a unified continuum mechanics solver (2012). In: Automated Solution of Differential Equations by the Finite Element Method / [ed] Anders Logg, Kent-Andre Mardal, Garth Wells, Springer Berlin/Heidelberg, 2012. Chapter in book (Refereed)
  • 11.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC). Basque Center for Applied Mathematics (BCAM), Bilbao, Spain.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Basque Center for Applied Mathematics (BCAM), Bilbao, Spain.
    Degirmenci, Niyazi Cem
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Spühler, Jeannette Hiromi
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Vilela de Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Larcher, Aurélien
    Norwegian University of Science and Technology, Trondheim, Norway.
    FEniCS-HPC: Coupled Multiphysics in Computational Fluid Dynamics (2017). In: High-Performance Scientific Computing: Jülich Aachen Research Alliance (JARA) High-Performance Computing Symposium / [ed] Edoardo Di Napoli, Marc-André Hermanns, Hristo Iliev, Andreas Lintermann, Alexander Peyser, Springer, 2017, p. 58-69. Conference paper (Refereed)
    Abstract [en]

    We present a framework for coupled multiphysics in computational fluid dynamics, targeting massively parallel systems. Our strategy is based on general problem formulations in the form of partial differential equations and the finite element method, which open for automation, and optimization of a set of fundamental algorithms. We describe these algorithms, including finite element matrix assembly, adaptive mesh refinement and mesh smoothing; and multiphysics coupling methodologies such as unified continuum fluid-structure interaction (FSI), and aeroacoustics by coupled acoustic analogies. The framework is implemented as FEniCS open source software components, optimized for massively parallel computing. Examples of applications are presented, including simulation of aeroacoustic noise generated by an airplane landing gear, simulation of the blood flow in the human heart, and simulation of the human voice organ.

  • 12.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    FEniCS-HPC: Automated predictive high-performance finite element computing with applications in aerodynamics (2016). In: Proceedings of the 11th International Conference on Parallel Processing and Applied Mathematics, PPAM 2015, Springer-Verlag New York, 2016, Vol. 9573, p. 356-365. Conference paper (Refereed)
    Abstract [en]

    Developing multiphysics finite element methods (FEM) and scalable HPC implementations can be very challenging in terms of software complexity and performance, even more so with the addition of goal-oriented adaptive mesh refinement. To manage the complexity we in this work present general adaptive stabilized methods with automated implementation in the FEniCS-HPC automated open source software framework. This allows taking the weak form of a partial differential equation (PDE) as input in near-mathematical notation and automatically generating the low-level implementation source code and auxiliary equations and quantities necessary for the adaptivity. We demonstrate new optimal strong scaling results for the whole adaptive framework applied to turbulent flow on massively parallel architectures down to 25000 vertices per core with ca. 5000 cores with the MPI-based PETSc backend and for assembly down to 500 vertices per core with ca. 20000 cores with the PGAS-based JANPACK backend. As a demonstration of the high impact of the combination of the scalability together with the adaptive methodology allowing prediction of gross quantities in turbulent flow we present an application in aerodynamics of a full DLR-F11 aircraft in connection with the HiLift-PW2 benchmarking workshop with good match to experiments.
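    The near-mathematical weak-form input referred to above is in the style of the FEniCS form language; a minimal sketch in legacy DOLFIN Python notation, using a plain Poisson model problem rather than the paper's turbulence solver:

        from dolfin import *  # legacy FEniCS/DOLFIN; a Poisson model problem

        mesh = UnitSquareMesh(32, 32)
        V = FunctionSpace(mesh, "CG", 1)
        u, v = TrialFunction(V), TestFunction(V)
        f = Constant(1.0)

        a = inner(grad(u), grad(v)) * dx   # weak form in near-mathematical notation
        L = f * v * dx
        bc = DirichletBC(V, Constant(0.0), "on_boundary")

        uh = Function(V)
        solve(a == L, uh, bc)              # low-level assembly code is generated automatically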

  • 13.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). Basque Ctr Appl Math, Spain.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). Basque Ctr Appl Math, Spain.
    Jansson, Niclas
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    De Abreu, Rodrigo Vilela
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Towards a parameter-free method for high Reynolds number turbulent flow simulation based on adaptive finite element approximation (2015). In: Computer Methods in Applied Mechanics and Engineering, ISSN 0045-7825, E-ISSN 1879-2138, Vol. 288, p. 60-74. Article in journal (Refereed)
    Abstract [en]

    We present work towards a parameter-free method for turbulent flow simulation based on adaptive finite element approximation of the Navier-Stokes equations at high Reynolds numbers. In this model, viscous dissipation is assumed to be dominated by turbulent dissipation proportional to the residual of the equations, and skin friction at solid walls is assumed to be negligible compared to inertial effects. The result is a computational model without empirical data, where the only parameter is the local size of the finite element mesh. Under adaptive refinement of the mesh based on a posteriori error estimation, output quantities of interest in the form of functionals of the finite element solution converge to become independent of the mesh resolution, and thus the resulting method has no adjustable parameters. No ad hoc design of the mesh is needed; instead the mesh is optimised based on solution features, and in particular no boundary layer mesh is needed. We connect the computational method to the mathematical concept of a dissipative weak solution of the Euler equations, as a model of high Reynolds number turbulent flow, and we highlight a number of benchmark problems for which the method is validated.

  • 14.
    Hoffman, Johan
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Johan
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Johnson, Claes
    KTH, School of Computer Science and Communication (CSC).
    Vilela de Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC).
    Turbulent flow and fluid–structure interaction (2012). In: Lecture Notes in Computational Science and Engineering, Springer Science and Business Media Deutschland GmbH, 2012, p. 543-552. Chapter in book (Refereed)
    Abstract [en]

    The FEniCS Project aims towards the goals of generality, efficiency, and simplicity, concerning mathematical methodology, implementation and application, and the Unicorn project is an implementation aimed at FSI and high Re turbulent flow guided by these principles. Unicorn is based on the DOLFIN/FFC/FIAT suite and the linear algebra package PETSc. We here present some key elements of Unicorn, and a set of computational results from applications. The details of the Unicorn implementation are described in Chapter 18. 

  • 15.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Johnson, Claes
    Vilela de Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Turbulent flow and fluid-structure interaction (2011). In: Automated Solution of Differential Equations by the Finite Element Method / [ed] Anders Logg, Kent-Andre Mardal, Garth Wells, Springer Berlin/Heidelberg, 2011. Chapter in book (Refereed)
  • 16.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). Basque Center for Applied Mathematics, Spain .
    Jansson, Niclas
    RIKEN Advanced Institute for Computational Science, Japan .
    Vilela De Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Time-resolved adaptive FEM simulation of the DLR-F11 aircraft model at high Reynolds number (2014). In: 52nd AIAA Aerospace Sciences Meeting - AIAA Science and Technology Forum and Exposition, SciTech 2014, 2014. Conference paper (Other academic)
    Abstract [en]

    We present a time-resolved, adaptive finite element method for aerodynamics, together with the results from the HiLiftPW-2 workshop, where this method is used to compute the flow past a DLR-F11 aircraft model at realistic Reynolds number. The mesh is automatically constructed by the method as part of the computation, and no explicit turbulence model is needed. The effect of unresolved turbulent boundary layers is modeled by a simple parametrization of the wall shear stress in terms of the skin friction. In the extreme case of very high Reynolds numbers we approximate the small skin friction by zero skin friction, corresponding to a free slip boundary condition, which results in a computational model without any model parameter that needs tuning. Thus, the simulation methodology bypasses the main challenges posed by high Reynolds number CFD: the design of an optimal computational mesh, turbulence (or subgrid) modeling, and the cost of boundary layer resolution. The results from HiLiftPW-2 presented in this report show good agreement with experimental data for a range of different angles of attack, while using orders of magnitude fewer degrees of freedom than what is needed in state of the art methods such as RANS.

  • 17.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vilela De Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW.
    Computation of slat noise sources using adaptive FEM and Lighthill's analogy (2013). In: 19th AIAA/CEAS Aeroacoustics Conference, 2013. Conference paper (Refereed)
    Abstract [en]

    This is a summary of preliminary results from simulations with the 30P30N high-lift device. We used the General Galerkin finite element method (G2), where no explicit subgrid model is used, and where the computational mesh is adaptively refined with respect to a posteriori error estimates for a quantity of interest. The mesh is fully unstructured and the solutions are time-resolved, which are key ingredients for solving challenging industrial applications in the field of aeroacoustics. We present preliminary results containing time-averaged quantities and snapshots of unsteady quantities, all reasonably agreeing with previous computational efforts. One important finding is that the use of adaptively generated meshes seems to be a more efficient way of computing aeroacoustic sources than using "handmade" meshes.

  • 18.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Basque Center for Applied Mathematics, Bilbao, Spain.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST). Basque Center for Applied Mathematics, Bilbao, Spain.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Vilela de Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Johnson, Claes
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Computability and Adaptivity in CFD (2018). In: Encyclopedia of Computational Mechanics / [ed] Erwin Stein, René de Borst, Thomas J. R. Hughes, John Wiley & Sons, 2018. Chapter in book (Refereed)
  • 19.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Vilela de Abreu, Rodrigo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Degirmenci, Niyazi Cem
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Müller, Kaspar
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Nazarov, Murtazo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Spühler, Jeannette Hiromi
    Unicorn: Parallel adaptive finite element simulation of turbulent flow and fluid-structure interaction for deforming domains and complex geometry (2011). Report (Other academic)
    Abstract [en]

    We present a framework for adaptive finite element computation of turbulent flow and fluid-structure interaction, with focus on general algorithms that allow for complex geometry and deforming domains. We give basic models and finite element discretization methods, adaptive algorithms and strategies for efficient parallel implementation. To illustrate the capabilities of the computational framework, we show a number of application examples from aerodynamics, aero-acoustics, biomedicine and geophysics. The computational tools are free to download open source as Unicorn, and as a high performance branch of the finite element problem solving environment DOLFIN, both part of the FEniCS project.

  • 20.
    Hoffman, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    A computational study of turbulent flow separation for a circular cylinder using skin friction boundary conditions (2011). In: Quality and Reliability of Large-Eddy Simulations II, Springer Netherlands, 2011, Vol. 16, no. 1, p. 57-68. Conference paper (Refereed)
    Abstract [en]

    In this paper we present a computational study of turbulent flow separation for a circular cylinder at high Reynolds numbers. We use a stabilized finite element method together with skin friction boundary conditions, where we study flow separation with respect to the decrease of a friction parameter. In particular, we consider the case of zero friction corresponding to pure slip boundary conditions, for which we observe an inviscid separation mechanism of large scale streamwise vortices, identified in our earlier work. We compare our computational results to experiments for very high Reynolds numbers. In particular, we connect the pattern of streamwise vorticity in our computations to experimental findings of spanwise 3d cell structures reported in the literature.
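    In schematic form, with notation assumed here, such skin friction boundary conditions combine no-penetration with a friction condition on the tangential stress, and letting the friction parameter go to zero recovers pure slip:

        % n : wall normal, tau_k : tangent vectors, sigma : stress tensor,
        % beta : friction parameter; beta -> 0 gives the free slip case.
        \[
          \mathbf{u}\cdot\mathbf{n} = 0, \qquad
          \beta\,\mathbf{u}\cdot\boldsymbol{\tau}_k
            + (\sigma\mathbf{n})\cdot\boldsymbol{\tau}_k = 0, \quad k = 1,2
        \]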

  • 21.
    Jansson, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Simulation of 3D unsteady incompressible flow past a NACA 0012 wing section. Manuscript (preprint) (Other academic)
  • 22.
    Jansson, Johan
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Simulation of 3D unsteady incompressible flow past a NACA 0012 wing section (2012). Report (Other academic)
    Abstract [en]

    We present computational simulations of three-dimensional unsteady high Reynolds number incompressible flow past a NACA 0012 wing profile, for a range of angles of attack, from low lift through stall. A stabilized finite element method is used, referred to as General Galerkin (G2), with adaptive mesh refinement with respect to the error in target output, such as aerodynamic forces. Computational predictions of aerodynamic forces are validated against experimental data.

  • 23.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    A Hybrid MPI+PGAS Approach to Improve Strong Scalability Limits of Finite Element Solvers (2020). In: Proceedings - IEEE International Conference on Cluster Computing, ICCC, Institute of Electrical and Electronics Engineers (IEEE), 2020, p. 303-313. Conference paper (Refereed)
    Abstract [en]

    Current finite element codes scale reasonably well as long as each core has a sufficient amount of local work that can balance communication costs. However, achieving efficient performance at exascale will require unreasonably large problem sizes, in particular for low-order methods, where the small amount of work per element already is a limiting factor on current post petascale machines. Key bottlenecks for these methods are sparse matrix assembly, where communication latency starts to limit performance as the number of cores increases, and linear solvers, where efficient overlapping is necessary to amortize communication and synchronization cost of sparse matrix vector multiplication and dot products. We present our work on improving strong scalability limits of message passing based general low-order finite element based solvers. Using lightweight one-sided communication offered by partitioned global address space languages (PGAS), we demonstrate that performance critical, latency sensitive sparse matrix assembly can achieve almost an order of magnitude better scalability. Linear solvers are also addressed via a signaling put algorithm for low-cost point-to-point synchronization, achieving similar performance as message passing based linear solvers. We introduce a new hybrid MPI+PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend with a new library written in Unified Parallel C (UPC). A detailed description of the implementation and the hybrid interface to FEniCS is given, and the feasibility of the approach is demonstrated via a performance study of the hybrid implementation on Cray XC40 machines.
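    The paper's one-sided layer is written in UPC; as a rough illustration of one-sided communication only, here is an analogous put in MPI via mpi4py, using fence synchronization for simplicity rather than the paper's signaling-put scheme:

        from mpi4py import MPI
        import numpy as np

        # Rough sketch of one-sided communication (the paper's backend is UPC and
        # uses a signaling put rather than collective fences).
        comm = MPI.COMM_WORLD
        rank = comm.Get_rank()

        local = np.zeros(4, dtype="d")
        win = MPI.Win.Create(local, comm=comm)   # expose local buffer to remote puts

        win.Fence()
        if rank == 0 and comm.Get_size() > 1:
            payload = np.arange(4, dtype="d")
            win.Put([payload, MPI.DOUBLE], target_rank=1)  # write into rank 1's window
        win.Fence()

        if rank == 1:
            print("received:", local)
        win.Free()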

  • 24.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    High performance adaptive finite element methods for turbulent fluid flow (2011). Licentiate thesis, comprehensive summary (Other academic)
    Abstract [en]

    Understanding the mechanics of turbulent fluid flow is of key importance for industry and society, as for example in aerodynamics and aero-acoustics. The massive computational cost for resolving all turbulent scales in a realistic problem makes direct numerical simulation of the underlying Navier-Stokes equations impossible. Recent advances in adaptive finite element methods offer a new powerful tool in Computational Fluid Dynamics (CFD). The computational cost for simulating turbulent flow can be minimized where the mesh is adaptively resolved, based on a posteriori error control. These adaptive methods have been implemented for efficient serial computations, but the extension to an efficient parallel solver is a challenging task.

    This work concerns the development of an adaptive finite element method for modern parallel computer architectures. We present efficient data structures and data decomposition methods for distributed unstructured tetrahedral meshes. Our work also concerns an efficient parallelization of local mesh refinement methods such as recursive longest edge bisection.

    We also address the load balance problem with the development of an a priori predictive dynamic load balancing method. Current results are encouraging with almost linear strong scaling to thousands of cores on several modern architectures.

  • 25.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    High Performance Adaptive Finite Element Methods: With Applications in Aerodynamics (2013). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The massive computational cost for resolving all scales in a turbulent flow makes a direct numerical simulation of the underlying Navier-Stokes equations impossible in most engineering applications. Recent advances in adaptive finite element methods offer a new powerful tool in Computational Fluid Dynamics (CFD). The computational cost for simulating turbulent flow can be minimized by adaptive resolution of the mesh, based on a posteriori error estimation. Such adaptive methods have previously been implemented for efficient serial computations, but the extension to an efficient parallel solver is a challenging task. This work concerns the development of an adaptive finite element method that enables efficient computation of time resolved approximations of turbulent flow for complex geometries with a posteriori error control. We present efficient data structures and data decomposition methods for distributed unstructured tetrahedral meshes. Our work also concerns an efficient parallelization of local mesh refinement methods such as recursive longest edge bisection, and the development of an a priori predictive dynamic load balancing method, based on a weighted dual graph. We also address the challenges of emerging supercomputer architectures with the development of new hybrid parallel programming models, combining traditional message passing with lightweight one-sided communication. Our implementation has proven to be both general and efficient, scaling up to more than twelve thousand cores.

  • 26.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Improving Strong Scalability Limits of Finite Element Based Solvers (2019). Conference paper (Refereed)
    Abstract [en]

    Current finite element codes scale reasonably well as long as each core has a sufficient amount of local work that can balance communication costs. However, achieving efficient performance at exascale will require unreasonably large problem sizes, in particular for low-order methods, where the small amount of work per element already is a limiting factor on current post petascale machines. One of the key bottlenecks for these methods is sparse matrix assembly, where communication latency starts to limit performance as the number of cores increases. We present our work on improving strong scalability limits of message passing based general low-order finite element based solvers. Using lightweight one-sided communication, we demonstrate that the scalability of performance critical, latency sensitive kernels can achieve almost an order of magnitude better scalability. We introduce a new hybrid MPI/PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend with a new library written in UPC. A detailed description of the implementation and the hybrid interface to FEniCS is given, and we present a detailed performance study of the hybrid implementation on Cray XC40 machines.

  • 27.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Optimizing Sparse Matrix Assembly in Finite Element Solvers with One-Sided Communication (2013). In: High Performance Computing for Computational Science - VECPAR 2012, Springer Berlin/Heidelberg, 2013, p. 128-139. Conference paper (Refereed)
    Abstract [en]

    In parallel finite element solvers, sparse matrix assembly is often a bottleneck. Implemented using message passing, latency from message matching starts to limit performance as the number of cores increases. We here address this issue by using our own stack based representation of the sparse matrix, and a hybrid parallel programming model combining traditional message passing with one-sided communication. This gives a significantly faster insertion rate compared to state of the art implementations on a Cray XE6.
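    The stack-based representation amounts to deferred triplet insertion; a single-process sketch of the idea with SciPy, illustrative only and not the paper's implementation:

        import numpy as np
        from scipy.sparse import coo_matrix

        # Deferred ("stack-based") sparse assembly: push (i, j, v) triplets during
        # element assembly, build the global matrix once at the end. Duplicate
        # entries are summed on conversion, as assembly requires.
        rows, cols, vals = [], [], []
        elements = [(0, 1), (1, 2), (2, 3)]          # toy 1D mesh connectivity
        ke = np.array([[1.0, -1.0], [-1.0, 1.0]])    # toy element stiffness matrix

        for dofs in elements:
            for a, i in enumerate(dofs):
                for b, j in enumerate(dofs):
                    rows.append(i); cols.append(j); vals.append(ke[a, b])

        A = coo_matrix((vals, (rows, cols)), shape=(4, 4)).tocsr()
        print(A.toarray())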

  • 28.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Spectral Element Simulations on the NEC SX-Aurora TSUBASA (2021). In: HPC Asia 2021: The International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM), 2021. Conference paper (Refereed)
    Abstract [en]

    Following the recent transition in the high performance computing landscape to more heterogeneous architectures, application developers are faced with the challenge of ensuring good performance across a diverse set of platforms. In this paper, we present our work on porting the spectral element code Nek5000 to the recent vector architecture SX-Aurora TSUBASA. Using Nek5000's mini-app Nekbone, we formulate suitable loop transformations in key kernels, allowing for better vectorization, increasing the baseline performance by a factor of six. Using the new transformations, we demonstrate that the main compute intensive matrix-vector and matrix-matrix multiplication kernels achieve close to half the peak performance of an SX-Aurora core. Our work also addresses the gather-scatter operations, a key kernel for efficient matrix-free spectral element formulation. We introduce a new implementation of Nek5000's gather-scatter library with mesh topology awareness for improved vectorization via exploitation of the SX-Aurora's hardware gather-scatter instructions, improving performance by up to 116%. A detailed description of the implementation is given together with a performance study, comparing both single node performance and strong scalability characteristics, running across multiple SX-Aurora cards.
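    The gather-scatter operation central to matrix-free spectral element formulations sums element-local values at coincident grid points; a minimal NumPy sketch of the operation itself, not of the vectorized SX-Aurora implementation:

        import numpy as np

        # Minimal gather-scatter sketch: element-local values are summed into
        # shared global points (gather), then read back to the elements (scatter).
        gid = np.array([0, 1, 1, 2, 2, 3])      # global id of each element-local dof
        u_local = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])

        u_global = np.zeros(4)
        np.add.at(u_global, gid, u_local)       # gather: sum shared contributions
        u_local = u_global[gid]                 # scatter: redistribute summed values
        print(u_local)                          # duplicated points now agree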

  • 29.
    Jansson, Niclas
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Towards a Parallel Algebraic Multigrid Solver Using Partitioned Global Address Space (2013). Report (Other academic)
    Abstract [en]

    The Algebraic Multigrid (AMG) method has over the years developed into an efficient tool for solving unstructured linear systems. The need to solve large industrial problems discretized on unstructured meshes has been a key motivation for devising a parallel AMG method. Despite some success, the key part of the AMG algorithm, the coarsening step, is far from trivial to parallelize efficiently. We here introduce a novel parallelization of the Ruge-Stüben coarsening algorithm, that retains the good interpolation properties of the original method. Our parallelization is based on the Partitioned Global Address Space (PGAS) abstraction, which allows for a simple, yet efficient implementation. The solver is described in detail and a performance study on a Cray XE6 is presented.
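    Classical Ruge-Stüben coarsening starts from a strength-of-connection test; a small dense-matrix sketch of that first step, the textbook criterion rather than code from the report:

        import numpy as np

        # Classical Ruge-Stuben strength-of-connection test (textbook form):
        # j strongly influences i if -a_ij >= theta * max_k(-a_ik), k != i.
        def strong_connections(A, theta=0.25):
            n = A.shape[0]
            strong = np.zeros_like(A, dtype=bool)
            for i in range(n):
                off = -A[i].copy()
                off[i] = -np.inf                 # exclude the diagonal
                cutoff = theta * off.max()
                if cutoff > 0:                   # rows without negative couplings stay empty
                    strong[i] = off >= cutoff
                strong[i, i] = False
            return strong

        A = np.array([[ 2.0, -1.0,  0.0],
                      [-1.0,  2.0, -1.0],
                      [ 0.0, -1.0,  2.0]])
        print(strong_connections(A))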

  • 30.
    Jansson, Niclas
    et al.
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Bale, Rahul
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Onishi, Keiji
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Tsubokura, Makoto
    Kobe University and RIKEN Advanced Institute for Computational Science, Kobe Japan.
    CUBE: A scalable framework for large-scale industrial simulations (2019). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no. 4, p. 678-698. Article in journal (Refereed)
  • 31.
    Jansson, Niclas
    et al.
    RIKEN Advanced Institute for Computational Science.
    Bale, Rahul
    RIKEN Advanced Institute for Computational Science.
    Onishi, Keiji
    RIKEN Advanced Institute for Computational Science.
    Tsubokura, Makoto
    Department of Computational Science, Graduate School of System Informatics, Kobe University and RIKEN Advanced Institute for Computational Science.
    Dynamic Load Balancing for Large-Scale Multiphysics Simulations (2017). In: High-Performance Scientific Computing: Jülich Aachen Research Alliance (JARA) High-Performance Computing Symposium / [ed] Edoardo Di Napoli, Marc-André Hermanns, Hristo Iliev, Andreas Lintermann, Alexander Peyser, 2017, p. 13-23. Conference paper (Refereed)
    Abstract [en]

    In parallel computing load balancing is an essential component of any efficient and scalable simulation code. Static data decomposition methods have proven to work well for symmetric workloads. But, in today's multiphysics simulations, with asymmetric workloads, this imbalance prevents good scalability on future generations of parallel architectures. We present our work on developing a general dynamic load balancing framework for multiphysics simulations on hierarchical Cartesian meshes. Using a weighted dual graph based workload estimation and constrained multilevel graph partitioning, the required runtime for industrial applications could be reduced by 40%, running on the K computer.

  • 32.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    A Hybrid MPI/PGAS Finite Element Solver (2012). Report (Other academic)
    Abstract [en]

    We present our work on developing a hybrid parallel programming model for a general finite element solver. The main focus of our work is to demonstrate that legacy codes with high latency, two-sided communication in the form of message passing can be improved using lightweight one-sided communication. We introduce a new hybrid MPI/PGAS implementation of the open source finite element framework FEniCS, replacing the linear algebra backend (PETSc) with a new library written in UPC.  A detailed description of the linear algebra backend implementation and the hybrid interface to FEniCS is given. We also present a detailed analysis of the performance of this hybrid solver on the Cray XE6 Lindgren at PDC/KTH including a comparison with the MPI only implementation, where we find that the hybrid implementation results in improvements of up to 33% in communication intensive parts of the solver.

  • 33.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Computer simulation of incompressible flow past a circular cylinder at very high Reynolds numbers (2011). Manuscript (preprint) (Other academic)
  • 34.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Improving Parallel Performance of FEniCS Finite Element Computations by Hybrid MPI/PGAS. Manuscript (preprint) (Other academic)
    Abstract [en]

    We present our work on developing a hybrid parallel programming model for a general finite element solver. The main focus of our work is to demonstrate that legacy codes with high latency, two-sided communication in the form of message passing can be improved using lightweight one-sided communication. We introduce a new hybrid MPI/PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend (PETSc) with a new library written in UPC. A detailed description of the linear algebra backend implementation and the hybrid interface to FEniCS is given. We also present a detailed analysis of the performance of this hybrid solver on the Cray XE6 Lindgren at PDC/KTH including a comparison with the MPI only implementation, where we find that the hybrid implementation results in significant improvements in performance of the solver.

  • 35.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Improving Parallel Performance of FEniCS Finite Element Computations by Hybrid MPI/PGAS (2013). Report (Other academic)
    Abstract [en]

    We present our work on developing a hybrid parallel programming model for a general finite element solver. The main focus of our work is to demonstrate that legacy codes with high latency, two-sided communication in the form of message passing can be improved using lightweight one-sided communication. We introduce a new hybrid MPI/PGAS implementation of the open source general finite element framework FEniCS, replacing the linear algebra backend (PETSc) with a new library written in UPC. A detailed description of the linear algebra backend implementation and the hybrid interface to FEniCS is given. We also present a detailed analysis of the performance of this hybrid solver on the Cray XE6 Lindgren at PDC/KTH including a comparison with the MPI only implementation, where we find that the hybrid implementation results in significant improvements in performance of the solver.

  • 36.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Framework For Massively Parallel Adaptive Finite Element Computational Fluid Dynamics On Tetrahedral Meshes2012In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197, Vol. 34, no 1, p. C24-C42Article in journal (Refereed)
    Abstract [en]

    In this paper we describe a general adaptive finite element framework for unstructured tetrahedral meshes without hanging nodes, suitable for large-scale parallel computations. Our framework is designed to scale linearly to several thousand processors, using fully distributed and efficient algorithms. The key components of our implementation, the local mesh refinement and load balancing algorithms, are described in detail. Finally, we present a theoretical and experimental performance study of our framework, used in a large-scale computational fluid dynamics computation, and we compare the scaling and complexity of different algorithms on different massively parallel architectures.

  • 37.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Nazarov, Murtazo
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Adaptive simulation of turbulent flow past a full car model2011In: State of the Practice Reports, SC'11, 2011Conference paper (Refereed)
    Abstract [en]

    The massive computational cost of resolving all turbulent scales makes a direct numerical simulation of the underlying Navier-Stokes equations impossible in most engineering applications. We present recent advances in parallel adaptive finite element methodology that enable us to efficiently compute time-resolved approximations for complex geometries with error control. In this paper we present an LES simulation of turbulent flow past a full car model, where we adaptively refine the unstructured mesh to minimize the error in drag prediction. The simulation was partly carried out on the new Cray XE6 at PDC/KTH, where the solver shows near-optimal strong and weak scaling for the entire adaptive process.

  • 38.
    Jansson, Niclas
    et al.
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Jansson, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Hoffman, Johan
    KTH, School of Computer Science and Communication (CSC), Numerical Analysis, NA.
    Adaptive finite element computational fluid dynamics for large-scale massively parallel computing2012In: SIAM Journal on Scientific Computing, ISSN 1064-8275, E-ISSN 1095-7197Article in journal (Refereed)
  • 39.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Karp, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Perez, Adalberto
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Mukha, Timofey
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Ju, Yi
    Max Planck Computing and Data Facility, Garching, Germany.
    Liu, Jiahui
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Pall, Szilard
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    Max Planck Computing and Data Facility, Garching, Germany.
    Weinkauf, Tino
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schumacher, Jörg
    Technische Universität Ilmenau, Ilmenau, Germany.
    Schlatter, Philipp
    Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Germany.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Exploring the Ultimate Regime of Turbulent Rayleigh–Bénard Convection Through Unprecedented Spectral-Element Simulations2023In: SC '23: Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, Association for Computing Machinery (ACM) , 2023, p. 1-9, article id 5Conference paper (Refereed)
    Abstract [en]

    We detail our developments in the high-fidelity spectral-element code Neko that are essential for unprecedented large-scale direct numerical simulations of fully developed turbulence. Major innovations are a modular multi-backend design enabling performance portability across a wide range of GPUs and CPUs, a GPU-optimized preconditioner with task overlapping for the pressure-Poisson equation, and in-situ data compression. We carry out initial runs of Rayleigh–Bénard Convection (RBC) at extreme scale on the LUMI and Leonardo supercomputers. We show how Neko is able to strongly scale to 16,384 GPUs and obtain results that are not possible without careful consideration and optimization of the entire simulation workflow. These developments in Neko will help resolve the long-standing question regarding the ultimate regime in RBC.

  • 40.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Karp, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Software and Computer systems, SCS.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics, Turbulent simulations laboratory.
    Neko: A modern, portable, and scalable framework for high-fidelity computational fluid dynamics2024In: Computers & Fluids, ISSN 0045-7930, E-ISSN 1879-0747, Vol. 275, p. 106243-106243, article id 106243Article in journal (Refereed)
    Abstract [en]

    Computational fluid dynamics (CFD), in particular applied to turbulent flows, is a research area of great engineering and fundamental physical interest. However, already at moderately high Reynolds numbers the computational cost becomes prohibitive, as the range of active spatial and temporal scales widens quickly. Scale-resolving simulations in particular, including large-eddy simulation (LES) and direct numerical simulation (DNS), thus need to rely on modern efficient numerical methods and corresponding software implementations. Recent trends and advancements, including more diverse and heterogeneous hardware in High-Performance Computing (HPC), are challenging software developers in their pursuit of good performance and numerical stability. The well-known maxim “software outlives hardware” may no longer necessarily hold true, and developers are today forced to re-factor their codebases to leverage these powerful new systems. In this paper, we present Neko, a new portable framework for high-order spectral element discretization, targeting turbulent flows in moderately complex geometries. Neko is fully available as open-source software. Unlike prior works, Neko adopts a modern object-oriented approach in Fortran 2008, allowing multi-tier abstractions of the solver stack and facilitating hardware backends ranging from general-purpose processors (CPUs) down to exotic vector processors and FPGAs. We show that Neko’s performance and accuracy are comparable to NekRS, and thus on par with Nek5000’s successor on modern CPU machines. Furthermore, we develop a performance model, which we use to discuss challenges and opportunities for high-order solvers on emerging hardware.
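
    A rough C analogue of the multi-tier backend abstraction described above (Neko itself implements this with Fortran 2008 abstract types; all names below are hypothetical): solver-level code calls through a small table of function pointers, and each hardware backend registers its own kernels.

        /* backend.c - hypothetical sketch of multi-backend dispatch */
        #include <stdio.h>

        typedef struct {
            const char *name;
            /* each backend supplies its own kernel implementations */
            void (*axpy)(int n, double a, const double *x, double *y);
        } backend_t;

        static void axpy_cpu(int n, double a, const double *x, double *y) {
            for (int i = 0; i < n; i++) y[i] += a * x[i];
        }

        static const backend_t cpu = { "cpu", axpy_cpu };
        /* a device build would register, e.g., { "hip", axpy_hip } instead,
           leaving all solver-level code unchanged */

        static void solver_step(const backend_t *be, int n,
                                double a, const double *x, double *y) {
            be->axpy(n, a, x, y);   /* dispatch to the active backend */
        }

        int main(void) {
            double x[4] = {1, 1, 1, 1}, y[4] = {0};
            solver_step(&cpu, 4, 2.0, x, y);
            printf("%s: y[0] = %f\n", cpu.name, y[0]);
            return 0;
        }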

  • 41.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science.
    Karp, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Wahlgren, Jacob
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Germany.
    Design of Neko - A Scalable High-Fidelity Simulation Framework with Extensive Accelerator SupportManuscript (preprint) (Other academic)
  • 42.
    Jansson, Niclas
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Towards a Parallel Algebraic Multigrid Solver Using PGAS2018In: 2018 Workshop on High Performance Computing Asia, New York, NY, USA: Association for Computing Machinery (ACM), 2018, p. 31-38Conference paper (Refereed)
    Abstract [en]

    The Algebraic Multigrid (AMG) method has over the years developed into an efficient tool for solving unstructured linear systems. The need to solve large industrial problems discretized on unstructured meshes has been a key motivation for devising a parallel AMG method. Despite some success, the key part of the AMG algorithm, the coarsening step, is far from trivial to parallelize efficiently. We here introduce a novel parallelization of the inherently sequential Ruge-Stüben coarsening algorithm that retains most of the good interpolation properties of the original method. Our parallelization is based on the Partitioned Global Address Space (PGAS) abstraction, which greatly simplifies the parallelization as compared to traditional message passing based implementations. The coarsening algorithm and solver are described in detail, and a performance study on a Cray XC40 is presented.
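
    For orientation, a minimal sequential sketch of the classical Ruge-Stüben first pass that the paper parallelizes; the CSR-like strong-coupling arrays and the greedy selection below are illustrative assumptions, not the authors' code. Points with the largest measure become C-points, and the points strongly coupled to them become F-points.

        /* rs_first_pass.c - illustrative sequential Ruge-Stuben first pass */
        #include <stdlib.h>

        enum { UNDECIDED, COARSE, FINE };

        /* strong_ptr/strong_idx: CSR-like lists of each point's strong couplings */
        void rs_first_pass(int n, const int *strong_ptr,
                           const int *strong_idx, int *state) {
            int *measure = calloc(n, sizeof(int));

            /* initial measure: how many points each point strongly influences */
            for (int i = 0; i < n; i++)
                for (int k = strong_ptr[i]; k < strong_ptr[i + 1]; k++)
                    measure[strong_idx[k]]++;

            for (;;) {
                /* greedily pick the undecided point with the largest measure */
                int best = -1;
                for (int i = 0; i < n; i++)
                    if (state[i] == UNDECIDED &&
                        (best < 0 || measure[i] > measure[best]))
                        best = i;
                if (best < 0) break;   /* every point is now C or F */

                state[best] = COARSE;
                for (int k = strong_ptr[best]; k < strong_ptr[best + 1]; k++) {
                    int j = strong_idx[k];
                    if (state[j] == UNDECIDED) {
                        state[j] = FINE;
                        /* neighbours of new F-points become more attractive
                           C-point candidates */
                        for (int m = strong_ptr[j]; m < strong_ptr[j + 1]; m++)
                            measure[strong_idx[m]]++;
                    }
                }
            }
            free(measure);
        }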

  • 43.
    Ju, Yi
    et al.
    Max Planck Computing and Data Facility, Max Planck Computing and Data Facility.
    Li, Mingshuai
    Technical University of Munich, Technical University of Munich.
    Perez, Adalberto
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Bellentani, Laura
    CINECA, Cineca.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. Friedrich-Alexander-Universität Erlangen-Nürnberg.
    Laure, Erwin
    Max Planck Computing and Data Facility, Max Planck Computing and Data Facility.
    In-Situ Techniques on GPU-Accelerated Data-Intensive Applications2023In: Proceedings 2023 IEEE 19th International Conference on e-Science, e-Science 2023, Institute of Electrical and Electronics Engineers (IEEE) , 2023Conference paper (Refereed)
    Abstract [en]

    The computational power of High-Performance Computing (HPC) systems is constantly increasing; however, their input/output (IO) performance grows relatively slowly, and their storage capacity is also limited. This imbalance presents significant challenges for applications such as Molecular Dynamics (MD) and Computational Fluid Dynamics (CFD), which generate massive amounts of data for further visualization or analysis. At the same time, checkpointing is crucial for long runs on HPC clusters, due to limited walltimes and/or failures of system components, and typically requires the storage of large amounts of data. Thus, restricted IO performance and storage capacity can lead to bottlenecks for the performance of full application workflows (as compared to computational kernels without IO). In-situ techniques, where data is further processed while still in memory rather than written out over the IO subsystem, can help to tackle these problems. In contrast to traditional post-processing methods, in-situ techniques can reduce or avoid the need to write or read data via the IO subsystem. They offer a promising approach for applications aiming to leverage the full power of large-scale HPC systems. In-situ techniques can also be applied to hybrid computational nodes on HPC systems consisting of graphics processing units (GPUs) and central processing units (CPUs). On such a node, the GPUs have significant performance advantages over the CPUs; current approaches for GPU-accelerated applications therefore often focus on maximizing GPU usage, leaving the CPUs underutilized. In-situ tasks that use the CPUs to analyze or preprocess data concurrently with the running simulation offer a possibility to reduce this underutilization.
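
    A minimal, self-contained sketch of this CPU overlap in C with POSIX threads; the step and analysis kernels are stand-ins invented for illustration, not the paper's workflow. While the solver (a GPU in practice) advances the current step, a CPU thread runs in-situ analysis on the previous step's buffer instead of writing it to disk.

        /* in_situ.c - hypothetical sketch of overlapping simulation and analysis */
        #include <pthread.h>
        #include <stdio.h>

        #define N 1024
        static double buffer[2][N];        /* double-buffered field data */

        /* stand-in for the (GPU) solver advancing one time step */
        static void step(double *f) { for (int i = 0; i < N; i++) f[i] += 1.0; }

        /* stand-in for in-situ analysis, e.g. statistics or compression */
        static void *analyze(void *arg) {
            const double *f = (const double *)arg;
            double mean = 0.0;
            for (int i = 0; i < N; i++) mean += f[i];
            printf("in-situ mean: %f\n", mean / N);
            return NULL;
        }

        int main(void) {
            pthread_t tid;
            for (int i = 0; i < 8; i++) {
                /* analyze the previous step on a CPU thread while the
                   simulation advances the current one */
                if (i > 0) pthread_create(&tid, NULL, analyze, buffer[(i + 1) % 2]);
                step(buffer[i % 2]);
                if (i > 0) pthread_join(&tid, NULL);
            }
            return 0;
        }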

  • 44.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Optimization of Tensor-product Operations in Nekbone on GPUs2020Conference paper (Refereed)
    Abstract [en]

    In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor operations. Nekbone is a proxy app for Nek5000 and has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue this effort and further optimize the main tensor-product operation in Nekbone. Our optimization is done in CUDA and uses a different, two-dimensional thread structure that performs the computations layer by layer. This enables us to use loop unrolling and to utilize registers and shared memory efficiently. Our implementation is then compared, on both the Pascal and Volta GPU architectures, to previous GPU versions of Nekbone as well as to a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of the peak performance for both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.
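
    The dominant kernel is, in essence, a batch of small dense contractions. The following plain-C version (a CPU sketch for illustration, not the paper's CUDA kernel) applies an n-by-n operator D along the first dimension of an n*n*n element, which is the operation the 2D thread layout, unrolling, registers and shared memory optimize on the GPU.

        /* tensor_apply.c - reference tensor-product contraction for one element */
        /* w(i,j,k) = sum_l D(i,l) * u(l,j,k); D column-major, fields stored
           as u[l + n*(j + n*k)] */
        void tensor_apply_x(int n, const double *D, const double *u, double *w) {
            for (int k = 0; k < n; k++)
                for (int j = 0; j < n; j++)
                    for (int i = 0; i < n; i++) {
                        double s = 0.0;
                        for (int l = 0; l < n; l++)
                            s += D[i + n * l] * u[l + n * (j + n * k)];
                        w[i + n * (j + n * k)] = s;
                    }
        }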

  • 45.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Reducing Communication in the Conjugate Gradient Method: A Case Study on High-Order Finite Elements2022In: Proceedings of the Platform for Advanced Scientific Computing Conference, PASC 2022, Association for Computing Machinery (ACM) , 2022, article id 2Conference paper (Refereed)
    Abstract [en]

    Currently, a major bottleneck for several scientific computations is communication, both communication between different processors, so-called horizontal communication, and vertical communication between different levels of the memory hierarchy. With this bottleneck in mind, we target a notoriously communication-bound solver at the core of many high-performance applications, namely the conjugate gradient method (CG). To reduce the communication, we present lower bounds on the vertical data movement in CG and go on to construct a CG solver with reduced data movement. Using our theoretical analysis, we apply our CG solver to a high-performance discretization used in practice, the spectral element method (SEM). Guided by our analysis, we show that for the Poisson equation on modern GPUs we can improve the performance by 30%, both by rematerializing the discrete system and by reformulating the system to work on unique degrees of freedom. To investigate how horizontal communication can be reduced, we compare CG to two communication-reducing techniques, namely communication-avoiding and pipelined CG. We strong-scale up to 4096 CPU cores and showcase performance improvements of upwards of 70% for pipelined CG compared to standard CG when applied to SEM at scale. In addition to improving the scaling capabilities of the solver, initial measurements indicate that the convergence of SEM is largely unaffected by pipelined CG.
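
    For reference, a serial skeleton of standard CG in C, with comments marking the two inner products that become global reductions (MPI_Allreduce) and thus synchronization points at scale; pipelined CG reorders the recurrences so that a reduction can overlap the matrix-vector product. The helper layout is an illustrative assumption, not the paper's solver.

        /* cg.c - serial CG skeleton; dots marked where allreduces occur at scale */
        double dot(int n, const double *a, const double *b) {
            double s = 0.0;          /* distributed CG: MPI_Allreduce on s */
            for (int i = 0; i < n; i++) s += a[i] * b[i];
            return s;
        }

        /* assumes r = b - A*x and p = r on entry; spmv computes y = A*x */
        void cg(int n, void (*spmv)(int, const double *, double *),
                double *x, double *r, double *p, double *Ap,
                int maxit, double tol) {
            double rr = dot(n, r, r);
            for (int it = 0; it < maxit && rr > tol * tol; it++) {
                spmv(n, p, Ap);                     /* local work + halo exchange */
                double alpha = rr / dot(n, p, Ap);  /* 1st global reduction */
                for (int i = 0; i < n; i++) {
                    x[i] += alpha * p[i];
                    r[i] -= alpha * Ap[i];
                }
                double rr_new = dot(n, r, r);       /* 2nd global reduction */
                for (int i = 0; i < n; i++)
                    p[i] = r[i] + (rr_new / rr) * p[i];
                rr = rr_new;
            }
        }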

  • 46.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Liu, Felix
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Raysearch Laboratories..
    Stanly, Ronith
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Rezaeiravesh, Saleh
    The University of Manchester, Manchester, United Kingdom.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. Friedrich-Alexander-Universität (FAU) Erlangen-Nürnberg, Erlangen, Germany.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Uncertainty Quantification of Reduced-Precision Time Series in Turbulent Channel Flow2023In: Proceedings of 2023 SC Workshops of the International Conference on High Performance Computing, Network, Storage, and Analysis, SC Workshops 2023, Association for Computing Machinery (ACM) , 2023, p. 387-390Conference paper (Refereed)
    Abstract [en]

    With increased computational power through the use of low-precision arithmetic, a relevant question is how lower precision affects simulation results, especially for chaotic systems where analytical round-off estimates are non-trivial to obtain. In this work, we consider how the uncertainty of the time series of a direct numerical simulation of turbulent channel flow at Re_τ = 180 is affected when restricted to a reduced-precision representation. We utilize a non-overlapping batch means estimator and find that the mean statistics can, in this case, be obtained with significantly fewer mantissa bits than conventional IEEE-754 double precision, but that the mean values are observed to be more sensitive in the middle of the channel than in the near-wall region. This indicates that the near-wall region, where the majority of the computational effort is spent, could benefit from the low-precision floating-point units found in upcoming computer hardware.
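
    The non-overlapping batch means estimator is a standard technique; here is a minimal sketch in C (a hypothetical function, not the authors' implementation). The series is split into m batches of length b, and the standard error of the mean is estimated from the variance of the batch means, which absorbs the autocorrelation within batches.

        /* nobm.c - non-overlapping batch means standard-error estimate */
        #include <math.h>

        /* x[0..n-1]: time series; b: batch size; requires at least two batches */
        double nobm_stderr(const double *x, int n, int b, double *mean_out) {
            int m = n / b;
            double mean = 0.0;
            for (int i = 0; i < m * b; i++) mean += x[i];
            mean /= (double)(m * b);

            /* variance of the batch means estimates the variance of the
               overall mean despite autocorrelation within batches */
            double s2 = 0.0;
            for (int j = 0; j < m; j++) {
                double bm = 0.0;
                for (int i = 0; i < b; i++) bm += x[j * b + i];
                bm /= (double)b;
                s2 += (bm - mean) * (bm - mean);
            }
            s2 /= (double)(m - 1);

            if (mean_out) *mean_out = mean;
            return sqrt(s2 / (double)m);   /* standard error of the mean */
        }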

  • 47.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. PDC Centre for High Performance Computing, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Hart, Alistair
    Hewlett Packard Enterpise (HPE), UK.
    Wahlgren, Jacob
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Large-scale direct numerical simulations of turbulence using GPUs and modern Fortran2023In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846Article in journal (Refereed)
    Abstract [en]

    We present our approach to performing direct numerical simulations of turbulence with applications in sustainable shipping. We use modern Fortran and the spectral element method to leverage and scale on supercomputers powered by the Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still providing support for user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world’s first direct numerical simulation of the flow around a Flettner rotor at Re = 30,000 and its interaction with a turbulent boundary layer. We present a performance comparison between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has a similar power efficiency based on readings from on-chip energy sensors.

  • 48.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Kenter, Tobias
    Paderborn University.
    Plessl, Christian
    Paderborn University.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Appendix to High-Performance Spectral Element Methods on Field-Programmable Gate Arrays2020Other (Other academic)
    Abstract [en]

    In this appendix we display some results we omitted from our article "High-Performance Spectral Element Methods on Field-Programmable Gate Arrays". In particular, we showcase the measured bandwidth for the FPGA we used (Stratix 10) as well as the performance of our accelerator at different stages of optimization. In addition to this, we illustrate more practical aspects of our performance/resource modeling.

    Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the comforts of general-purpose architectures in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a good balance between complexity and performance. In this paper, we study modern FPGAs' applicability for use in accelerating the Spectral Element Method (SEM), which is core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator that we empirically evaluate on the latest Stratix 10 SX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM accelerator, which we use to project the performance and role of future FPGAs in accelerating CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

  • 49.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Kenter, Tobias
    Plessl, Christian
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection2021In: Proceedings of the 35th IEEE International Parallel & Distributed Processing Symposium, May 17-21, 2021, Portland, Oregon, USA, Institute of Electrical and Electronics Engineers (IEEE) , 2021Conference paper (Refereed)
    Abstract [en]

    Improvements in computer systems have historically relied on two well-known observations: Moore's law and Dennard scaling. Today, both these observations are ending, forcing computer users, researchers, and practitioners to abandon the comforts of general-purpose architectures in favor of emerging post-Moore systems. Among the most salient of these post-Moore systems is the Field-Programmable Gate Array (FPGA), which strikes a convenient balance between complexity and performance. In this paper, we study modern FPGAs' applicability in accelerating the Spectral Element Method (SEM), which is core to many computational fluid dynamics (CFD) applications. We design a custom SEM hardware accelerator operating in double precision that we empirically evaluate on the latest Stratix 10 GX-series FPGAs and position its performance (and power-efficiency) against state-of-the-art systems such as ARM ThunderX2, NVIDIA Pascal/Volta/Ampere Tesla-series cards, and general-purpose manycore CPUs. Finally, we develop a performance model for our SEM accelerator, which we use to project future FPGAs' performance and role in accelerating CFD applications, ultimately answering the question: what characteristics would a perfect FPGA for CFD applications have?

  • 50.
    Karp, Martin
    et al.
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Podobas, Artur
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Kenter, Tobias
    Paderborn University.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Plessl, Christian
    Paderborn University.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays: Design, Evaluation, and Future Challenges2022In: HPCAsia2022: International Conference on High Performance Computing in Asia-Pacific Region, Association for Computing Machinery (ACM) , 2022, p. 125-136Conference paper (Refereed)
    Abstract [en]

    The impending termination of Moore’s law motivates the search for new forms of computing to continue the performance scaling we have grown accustomed to. Among the many emerging Post-Moore computing candidates, perhaps none is as salient as the Field-Programmable Gate Array (FPGA), which offers the means of specializing and customizing the hardware to the computation at hand.

    In this work, we design a custom FPGA-based accelerator for a computational fluid dynamics (CFD) code. Unlike prior work – which often focuses on accelerating small kernels – we target the entire Poisson solver on unstructured meshes based on the high-fidelity spectral element method (SEM) used in modern state-of-the-art CFD systems. We model our accelerator using an analytical performance model based on the I/O cost of the algorithm. We empirically evaluate our accelerator on a state-of-the-art Intel Stratix 10 FPGA in terms of performance and power consumption and contrast it against existing solutions on general-purpose processors (CPUs). Finally, we propose a data-movement-reducing technique where we compute geometric factors on the fly, which yields significant (700+ Gflop/s) single-precision performance and upwards of a 2x reduction in runtime for the local evaluation of the Laplace operator.

    We end the paper by discussing the challenges and opportunities of using reconfigurable architecture in the future, particularly in the light of emerging (not yet available) technologies.
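
    As a concrete illustration of the on-the-fly geometric factors mentioned above, here is a small C sketch (a hypothetical 2D scalar version, far simpler than the paper's accelerator): instead of streaming stored factors from memory, each quadrature point rebuilds G = det(J) * inv(J) * inv(J)^T from the four Jacobian entries, trading flops for memory traffic.

        /* geofac.c - hypothetical on-the-fly geometric factors, 2D scalar case */
        /* Given the 2x2 Jacobian J (row-major: J[0]=J11, J[1]=J12, J[2]=J21,
           J[3]=J22) of the reference-to-physical map at one quadrature point,
           return the symmetric factors g11, g12, g22. Recomputing these per
           point avoids streaming them from memory. */
        void geometric_factors_2d(const double J[4], double G[3]) {
            double detJ = J[0] * J[3] - J[1] * J[2];
            /* inv(J) = (1/detJ) * [ J22 -J12; -J21 J11 ] */
            double a =  J[3] / detJ, b = -J[1] / detJ;
            double c = -J[2] / detJ, d =  J[0] / detJ;
            G[0] = detJ * (a * a + b * b);   /* g11 */
            G[1] = detJ * (a * c + b * d);   /* g12 = g21 */
            G[2] = detJ * (c * c + d * d);   /* g22 */
        }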
