Publications (10 of 31)
Eliasson, P., Gong, J. & Nordström, J. (2018). A stable and conservative coupling of the unsteady compressible Navier-Stokes equations at interfaces using finite difference and finite volume methods. In: AIAA Aerospace Sciences Meeting, 2018. Paper presented at AIAA Aerospace Sciences Meeting, 2018, Kissimmee, United States, 8 January 2018 through 12 January 2018. American Institute of Aeronautics and Astronautics Inc, AIAA (210059).
A stable and conservative coupling of the unsteady compressible Navier-Stokes equations at interfaces using finite difference and finite volume methods
2018 (English). In: AIAA Aerospace Sciences Meeting, 2018, American Institute of Aeronautics and Astronautics Inc, AIAA, 2018, no. 210059. Conference paper, Published paper (Refereed).
Abstract [en]

Stable and conservative interface boundary conditions are developed for the unsteady compressible Navier-Stokes equations using finite difference and finite volume methods. The finite difference approach is based on summation-by-part operators and can be made higher order accurate with boundary conditions imposed weakly. The finite volume approach is an edge- and dual grid-based approach for unstructured grids, formally second order accurate in space, with weak boundary conditions as well. Stable and conservative weak boundary conditions are derived for interfaces between finite difference methods, for finite volume methods and for the coupling between the two approaches. The three types of interface boundary conditions are demonstrated for two test cases. Firstly, inviscid vortex propagation with a known analytical solution is considered. The results show expected error decays as the grid is refined for various couplings and spatial accuracy of the finite difference scheme. The second test case involves viscous laminar flow over a cylinder with vortex shedding. Calculations with various coupling and spatial accuracies of the finite difference solver show that the couplings work as expected and that the higher order finite difference schemes provide enhanced vortex propagation.
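To make the coupling idea concrete, here is a standard one-dimensional SBP-SAT sketch for the model problem $u_t + u_x = 0$ on two abutting blocks. It illustrates the kind of weak interface treatment the abstract describes, not the paper's actual Navier-Stokes scheme, and the penalty values below are a textbook choice.

```latex
% Semi-discrete two-block coupling with SAT penalty terms.
% P is the SBP norm matrix, Q satisfies Q + Q^T = diag(-1,0,...,0,1),
% u_N and v_0 are the two solution copies at the interface,
% e_N and e_0 pick out the interface point in each block.
\begin{aligned}
\frac{\mathrm{d}u}{\mathrm{d}t} &= -P_L^{-1} Q_L\,u
    + \sigma_L\,P_L^{-1} e_N\,(u_N - v_0),\\
\frac{\mathrm{d}v}{\mathrm{d}t} &= -P_R^{-1} Q_R\,v
    + \sigma_R\,P_R^{-1} e_0\,(v_0 - u_N).
\end{aligned}
```

With the upwind choice $\sigma_L = 0$, $\sigma_R = -1$, the energy method gives the interface contribution $\tfrac{\mathrm{d}}{\mathrm{d}t}(\|u\|_{P_L}^2 + \|v\|_{P_R}^2) = -(u_N - v_0)^2$, so the coupling is stable, and $\sigma_L - \sigma_R = 1$ makes it conservative.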

Place, publisher, year, edition, pages
American Institute of Aeronautics and Astronautics Inc, AIAA, 2018
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-225496 (URN); 10.2514/6.2018-0597 (DOI); 2-s2.0-85044411159 (Scopus ID); 9781624105241 (ISBN)
Conference
AIAA Aerospace Sciences Meeting, 2018, Kissimmee, United States, 8 January 2018 through 12 January 2018
Funder
Swedish e-Science Research Center; The Swedish Foundation for International Cooperation in Research and Higher Education (STINT); VINNOVA
Note

QC 20180406

Available from: 2018-04-06. Created: 2018-04-06. Last updated: 2018-04-06. Bibliographically approved.
Zhang, M., Melin, T., Gong, J., Barth, M. & Axner, L. (2018). Mixed Fidelity Aerodynamic and Aero-Structural Optimization for Wings. In: 2018 International Conference on High Performance Computing & Simulation. Paper presented at HPC and Modeling & Simulation for the 21st Century, Orléans, France (pp. 476-483).
Mixed Fidelity Aerodynamic and Aero-Structural Optimization for Wings
2018 (English). In: 2018 International Conference on High Performance Computing & Simulation, 2018, p. 476-483. Conference paper, Published paper (Refereed).
Abstract [en]

Automatic multidisciplinary design optimization is one of the challenges faced in designing efficient aircraft wings. In this paper we present mixed-fidelity aerodynamic and aero-structural optimization methods for wing design. A novel shape design methodology has been developed, based on a combination of automatic aerodynamic optimization for a reference aircraft model and aero-structural optimization for an uninhabited air vehicle (UAV) with a high-aspect-ratio wing. This paper is a significant step towards fully automating the core processes of aerodynamic and aero-structural optimization that traditionally require specialist skills, from creating the mesh for the wing simulation to executing the high-fidelity computational fluid dynamics (CFD) analysis code. Our results confirm that the simulation tools can enable a far broader range of engineering researchers and developers to design aircraft in simpler and more efficient ways. This is a vital step in the evolution of wing design processes, as the extremely expensive laboratory experiments traditionally used in wing design can now be replaced by more cost-effective high-performance computing (HPC) simulations that use accurate numerical methods.

Keywords
Multidisciplinary design optimization (MDO); Computational fluid dynamics (CFD); High performance computing
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-232360 (URN); 10.1109/HPCS.2018.00081 (DOI); 000450677700064 (); 2-s2.0-85057381095 (Scopus ID); 978-1-5386-7877-0 (ISBN)
Conference
HPC and Modeling & Simulation for the 21st Century, Orléans, France
Funder
Swedish e‐Science Research Center
Note

QC 20180808

Available from: 2018-07-20. Created: 2018-07-20. Last updated: 2018-12-10. Bibliographically approved.
Gong, J., Markidis, S., Laure, E., Otten, M., Fischer, P. & Min, M. (2016). Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations. Journal of Supercomputing, 72(11), 4160-4180
Nekbone performance on GPUs with OpenACC and CUDA Fortran implementations
2016 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 72, no. 11, p. 4160-4180. Article in journal (Refereed). Published.
Abstract [en]

We present a hybrid GPU implementation and performance analysis of Nekbone, which represents one of the core kernels of the incompressible Navier-Stokes solver Nek5000. The implementation is based on OpenACC and CUDA Fortran for local parallelization of the compute-intensive matrix-matrix multiplication part, which minimizes modification of the existing CPU code while extending the simulation capability of the code to GPU architectures. Our discussion includes the GPU results of OpenACC interoperating with CUDA Fortran and the gather-scatter operations with GPUDirect communication. We demonstrate performance of up to 552 Tflops on 16,384 GPUs of the OLCF Cray XK7 Titan.

Place, publisher, year, edition, pages
Springer, 2016
Keywords
Nekbone/Nek5000, OpenACC, CUDA Fortran, GPUDirect, Gather-scatter communication, Spectral element discretization
National Category
Electrical Engineering, Electronic Engineering, Information Engineering
Identifiers
urn:nbn:se:kth:diva-198970 (URN); 10.1007/s11227-016-1744-5 (DOI); 000387234200007 (); 2-s2.0-84978656496 (Scopus ID)
Funder
Swedish e‐Science Research Center
Note

QC 20170116

Available from: 2017-01-16. Created: 2016-12-22. Last updated: 2017-08-16. Bibliographically approved.
Offermans, N., Marin, O., Schanen, M., Gong, J., Fischer, P. & Schlatter, P. (2016). On the strong scaling of the spectral element solver Nek5000 on petascale systems. In: Proceedings of the 2016 Exascale Applications and Software Conference (EASC2016): April 25-29 2016, Stockholm, Sweden. Paper presented at 2016 Exascale Applications and Software Conference, EASC 2016, Stockholm, Sweden, 25 April 2016 through 29 April 2016. Association for Computing Machinery (ACM), Article ID a5.
On the strong scaling of the spectral element solver Nek5000 on petascale systems
2016 (English). In: Proceedings of the 2016 Exascale Applications and Software Conference (EASC2016): April 25-29 2016, Stockholm, Sweden, Association for Computing Machinery (ACM), 2016, article id a5. Conference paper, Published paper (Refereed).
Abstract [en]

The present work is targeted at performing a strong scaling study of the high-order spectral element fluid dynamics solver Nek5000. Prior studies such as [5] indicated a recommendable metric for strong scalability from a theoretical viewpoint, which we test here extensively on three parallel machines with different performance characteristics and interconnect networks, namely Mira (IBM Blue Gene/Q), Beskow (Cray XC40) and Titan (Cray XK7). The test cases considered for the simulations correspond to a turbulent flow in a straight pipe at four different friction Reynolds numbers Reτ = 180, 360, 550 and 1000. Considering the linear model for parallel communication, we quantify the machine characteristics in order to better assess the scaling behavior of the code. Subsequently, sampling and profiling tools are used to measure the computation and communication times over a large range of compute cores. We also study the effect of the two coarse-grid solvers XXT and AMG on the computational time. Super-linear scaling due to a reduction in cache misses is observed on each machine. The strong scaling limit is attained for roughly 5,000 to 10,000 degrees of freedom per core on Mira and 30,000 to 50,000 on Beskow, with only a small impact of the problem size for both machines, and ranges between 10,000 and 220,000 depending on the problem size on Titan. This work aims at being a reference for Nek5000 users and also serves as a basis for potential issues to address as the community heads towards exascale supercomputers.

Place, publisher, year, edition, pages
Association for Computing Machinery (ACM), 2016
Series
ACM International Conference Proceeding Series
Keywords
Benchmarking, Computational fluid dynamics, Nek5000, Scaling, Degrees of freedom (mechanics), Reynolds number, Supercomputers, Computational time, Interconnect networks, Parallel communication, Parallel machine, Performance characteristics, Spectral element, Application programs
National Category
Mechanical Engineering
Identifiers
urn:nbn:se:kth:diva-207506 (URN); 10.1145/2938615.2938617 (DOI); 2-s2.0-85014776002 (Scopus ID); 9781450341226 (ISBN)
Conference
2016 Exascale Applications and Software Conference, EASC 2016, Stockholm, Sweden, 25 April 2016 through 29 April 2016
Funder
Swedish e‐Science Research Center
Note

QC 20170814

Available from: 2017-06-07. Created: 2017-06-07. Last updated: 2019-05-16. Bibliographically approved.
Bongo, L. A., Ciegis, R., Frasheri, N., Gong, J., Kimovski, D., Kropf, P., . . . Wyrzykowski, R. (2015). Applications for Ultrascale Computing. Supercomputing Frontiers and Innovations, 2(1), 19-48
Applications for Ultrascale Computing
2015 (English). In: Supercomputing Frontiers and Innovations, ISSN 2409-6008, Vol. 2, no. 1, p. 19-48. Article in journal (Refereed). Published.
Abstract [en]

Studies of complex physical and engineering systems, represented by multi-scale and multi-physics computer simulations, have an increasing demand for computing power, especially when the simulations of realistic problems are considered. This demand is driven by the increasing size and complexity of the studied systems or by time constraints. Ultrascale computing systems offer a possible solution to this problem. Future ultrascale systems will be large-scale complex computing systems combining technologies from high performance computing, distributed systems, big data, and cloud computing. Thus, the challenge of developing and programming complex algorithms on these systems is twofold. Firstly, the complex algorithms have to be either developed from scratch, or redesigned in order to yield high performance, while retaining correct functional behaviour. Secondly, ultrascale computing systems impose a number of non-functional cross-cutting concerns, such as fault tolerance or energy consumption, which can significantly impact the deployment of applications on large complex systems. This article discusses the state of the art of programming for current and future large-scale systems with an emphasis on complex applications. We derive a number of programming and execution support requirements by studying several computing applications that the authors are currently developing and discuss their potential and necessary upgrades for ultrascale execution.

National Category
Computer Systems; Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-171356 (URN); 10.14529/jsfi150102 (DOI)
Note

QC 20150817

Available from: 2015-07-27. Created: 2015-07-27. Last updated: 2017-08-16. Bibliographically approved.
Ivanov, I., Machado, R., Rahn, M., Akhmetova, D., Laure, E., Gong, J., . . . Markidis, S. (2015). Evaluating New Communication Models in the Nek5000 Code for Exascale. Paper presented at EASC2015. Epigram.
Evaluating New Communication Models in the Nek5000 Code for Exascale
2015 (English). Conference paper, Oral presentation with published abstract (Other academic).
Place, publisher, year, edition, pages
Epigram, 2015
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-181105 (URN)
Conference
EASC2015
Note

QC 20160129

Available from: 2016-01-29. Created: 2016-01-29. Last updated: 2016-01-29. Bibliographically approved.
Ivanov, I., Gong, J., Akhmetova, D., Peng, I. B., Markidis, S., Laure, E., . . . Fischer, P. (2015). Evaluation of Parallel Communication Models in Nekbone, a Nek5000 mini-application. In: 2015 IEEE International Conference on Cluster Computing. Paper presented at IEEE Cluster 2015 (pp. 760-767). IEEE.
Evaluation of Parallel Communication Models in Nekbone, a Nek5000 mini-application
2015 (English). In: 2015 IEEE International Conference on Cluster Computing, IEEE, 2015, p. 760-767. Conference paper, Published paper (Refereed).
Abstract [en]

Nekbone is a proxy application of Nek5000, a scalable Computational Fluid Dynamics (CFD) code used for modelling incompressible flows. The Nekbone mini-application is used by several international co-design centers to explore new concepts in computer science and to evaluate their performance. We present the design and implementation of a new communication kernel in the Nekbone mini-application with the goal of studying the performance of different parallel communication models. First, a new MPI blocking communication kernel has been developed to solve Nekbone problems in a three-dimensional Cartesian mesh and process topology. The new MPI implementation delivers a 13% performance improvement compared to the original implementation. The new MPI communication kernel consists of approximately 500 lines of code against the original 7,000 lines of code, allowing experimentation with new approaches to Nekbone parallel communication. Second, the MPI blocking communication in the new kernel was changed to MPI non-blocking communication. Third, we developed a new Partitioned Global Address Space (PGAS) communication kernel based on the GPI-2 library. This approach reduces the synchronization among neighbor processes; in our tests on 8,192 processes, the GPI-2 communication kernel is on average 3% faster than the new MPI non-blocking communication kernel. In addition, we have used OpenMP in all versions of the new communication kernel. Finally, we highlight the future steps for using the new communication kernel in the parent application Nek5000.

Place, publisher, year, edition, pages
IEEE, 2015
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-181104 (URN); 10.1109/CLUSTER.2015.131 (DOI); 000378648100121 (); 2-s2.0-84959298440 (Scopus ID)
Conference
IEEE Cluster 2015
Note

QC 20160205

Available from: 2016-01-29. Created: 2016-01-29. Last updated: 2018-01-10. Bibliographically approved.
Gong, J., Markidis, S., Schliephake, M., Laure, E., Henningson, D., Schlatter, P., . . . Fischer, P. (2015). Nek5000 with OpenACC. In: Solving software challenges for exascale. Paper presented at 2nd International Conference on Exascale Applications and Software (EASC), April 2-3, 2014, Stockholm, Sweden (pp. 57-68).
Nek5000 with OpenACC
2015 (English). In: Solving software challenges for exascale, 2015, p. 57-68. Conference paper, Published paper (Refereed).
Abstract [en]

Nek5000 is a computational fluid dynamics code based on the spectral element method, used for the simulation of incompressible flows. We follow up on an earlier study which ported a simplified version of Nek5000 to a GPU-accelerated system by presenting the hybrid CPU/GPU implementation of the full Nek5000 code using OpenACC. The matrix-matrix multiplication, the Nek5000 gather-scatter operator and a preconditioned Conjugate Gradient solver have been implemented using OpenACC for multi-GPU systems. We report a speed-up of 1.3 on a single node of a Cray XK6 when using OpenACC directives in Nek5000. On 512 nodes of the Titan supercomputer, the speed-up approaches 1.4. A performance analysis of the Nek5000 code using the Score-P and Vampir performance monitoring tools shows that overlapping GPU kernels with host-accelerator memory transfers would considerably increase the performance of the OpenACC version of Nek5000.

Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8759
Keywords
GPU programming, Nek5000, OpenACC, Spectral element method
National Category
Computer Sciences
Identifiers
urn:nbn:se:kth:diva-170716 (URN); 10.1007/978-3-319-15976-8_4 (DOI); 000355749700004 (); 2-s2.0-84928882903 (Scopus ID); 978-3-319-15975-1 (ISBN); 978-3-319-15976-8 (ISBN)
Conference
2nd International Conference on Exascale Applications and Software (EASC), APR 02-03, 2014, Stockholm, SWEDEN
Note

QC 20150706

Available from: 2015-07-06. Created: 2015-07-03. Last updated: 2018-01-11. Bibliographically approved.
Gong, J., Markidis, S., Schliephake, M., Laure, E., Cebamanos, L., Hart, A., . . . Fischer, P. (2015). NekBone with Optimized OpenACC directives. Paper presented at the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015) (pp. 63-70).
NekBone with Optimized OpenACC directives
2015 (English). Conference paper, Published paper (Refereed).
Abstract [en]

Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies which may be suitable for future Exascale systems. Here, we present performance results for NekBone, a benchmark of the Nek5000 code, implemented with optimized OpenACC directives and GPUDirect communication. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The optimized NekBone version reaches 78 Gflops on a single node. In addition, a performance of 609 Tflops has been reached on 16,384 GPUs of the Titan supercomputer at Oak Ridge National Laboratory.

 

Publisher
p. 8
Keywords
NekBone/Nek5000, OpenACC, Spectral element method, GPUDirect
National Category
Engineering and Technology
Identifiers
urn:nbn:se:kth:diva-180731 (URN); 978-84-608-2581-4 (ISBN)
Conference
the Second International Workshop on Sustainable Ultrascale Computing Systems (NESUS 2015)
Note

QC 20160308

Available from: 2016-01-21. Created: 2016-01-21. Last updated: 2017-08-16. Bibliographically approved.
Markidis, S., Gong, J., Schliephake, M., Laure, E., Hart, A., Henty, D., . . . Fischer, P. (2015). OpenACC acceleration of the Nek5000 spectral element code. The international journal of high performance computing applications, 29(3), 311-319
OpenACC acceleration of the Nek5000 spectral element code
2015 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 29, no. 3, p. 311-319. Article in journal (Refereed). Published.
Abstract [en]

We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code has been used as the base and enhanced with OpenACC directives. Profiling NekBone provided an assessment of the suitability of the code for GPU systems and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve OpenACC implementation leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version reaches 43 Gflops on a single node. In addition, we ported and optimized NekBone for parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.

Place, publisher, year, edition, pages
Sage Publications, 2015
National Category
Computer Sciences; Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-171357 (URN); 10.1177/1094342015576846 (DOI); 000358414200006 (); 2-s2.0-84938095938 (Scopus ID)
Funder
Swedish e‐Science Research Center
Note

QC 20150804

Available from: 2015-07-27. Created: 2015-07-27. Last updated: 2018-01-11. Bibliographically approved.
Identifiers
ORCID iD: orcid.org/0000-0002-3859-9480
