kth.se Publications
1 - 10 of 10
  • 1. Aldinucci, Marco
    Brorsson, Mats
    KTH, School of Information and Communication Technology (ICT), Software and Computer systems, SCS.
    D'Agostino, Daniele
    Daneshtalab, Masoud
    KTH, School of Information and Communication Technology (ICT), Electronics, Electronic and embedded systems.
    Kilpatrick, Peter
    Leppanen, Ville
    Preface (2017). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 31, no 3, p. 179-180. Article in journal (Refereed)
  • 2. Berman, F.
    Chien, A.
    Cooper, K.
    Dongarra, J.
    Foster, I.
    Gannon, D.
    Johnsson, Lennart
    KTH, Superseded Departments (pre-2005), Numerical Analysis and Computer Science, NADA.
    Kennedy, K.
    Kesselman, C.
    Mellor-Crummey, J.
    Reed, D.
    Torczon, L.
    Wolski, R.
    The GrADS project: Software support for high-level grid application development (2001). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 15, no 4, p. 327-344. Article in journal (Refereed)
    Abstract [en]

    Advances in networking technologies will soon make it possible to use the global information infrastructure in a qualitatively different way, as a computational as well as an information resource. As described in the recent book The Grid: Blueprint for a New Computing Infrastructure, this Grid will connect the nation's computers, databases, instruments, and people in a seamless web of computing and distributed intelligence, which can be used on demand as a problem-solving resource in many fields of human endeavor, in particular science and engineering. The availability of grid resources will give rise to dramatically new classes of applications, in which computing resources are no longer localized but, rather, distributed, heterogeneous, and dynamic; computation is increasingly sophisticated and multidisciplinary; and computation is integrated into our daily lives and, hence, subject to stricter time constraints than at present. The impact of these new applications will be pervasive, ranging from new systems for scientific inquiry, through computing support for crisis management, to the use of ambient computing to enhance personal mobile computing environments. To realize this vision, significant scientific and technical obstacles must be overcome. Principal among these is usability. The goal of the Grid Application Development Software (GrADS) project is to simplify distributed heterogeneous computing in the same way that the World Wide Web simplified information sharing over the Internet. To that end, the project is exploring the scientific and technical problems that must be solved to make it easier for ordinary scientific users to develop, execute, and tune applications on the Grid.
In this paper, the authors describe the vision and strategies underlying the GrADS project, including the base software architecture for grid execution and performance monitoring, strategies and tools for construction of applications from libraries of grid-aware components, and development of innovative new science and engineering applications that can exploit these new technologies to run effectively in grid environments.

  • 3. Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Graillat, Stef
    Sorbonne Univ, Paris, France.
    Defour, David
    Univ Perpignan, Perpignan, France.
    Quintana-Orti, Enrique S.
    Univ Jaume I, Castellon de la Plana, Spain.
    Hierarchical approach for deriving a reproducible unblocked LU factorization (2019). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no 5, p. 791-803. Article in journal (Refereed)
    Abstract [en]

    We propose a reproducible variant of the unblocked LU factorization for graphics processing units (GPUs). For this purpose, we build upon Level-1/2 BLAS kernels that deliver correctly rounded and reproducible results for the dot (inner) product, vector scaling, and the matrix-vector product. In addition, we outline a strategy to enhance the accuracy of the triangular solve via iterative refinement. Following a bottom-up approach, we finally construct a reproducible unblocked implementation of the LU factorization for GPUs, which accommodates partial pivoting for stability and can eventually be integrated into a high-performance and stable algorithm for the (blocked) LU factorization.

  • 4. Jansson, Niclas
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Bale, Rahul
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Onishi, Keiji
    RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    Tsubokura, Makoto
    Kobe University and RIKEN Advanced Institute for Computational Science, Kobe, Japan.
    CUBE: A scalable framework for large-scale industrial simulations (2019). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no 4, p. 678-698. Article in journal (Refereed)
  • 5. Karp, Martin
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Massaro, Daniele
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Jansson, Niclas
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. PDC Centre for High Performance Computing, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Hart, Alistair
    Hewlett Packard Enterprise (HPE), UK.
    Wahlgren, Jacob
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. SimEx/FLOW, Engineering Mechanics, KTH Royal Institute of Technology, Stockholm, Sweden.
    Markidis, Stefano
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). Division of Computational Science and Technology, EECS, KTH Royal Institute of Technology, Stockholm, Sweden.
    Large-scale direct numerical simulations of turbulence using GPUs and modern Fortran (2023). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846. Article in journal (Refereed)
    Abstract [en]

    We present our approach to performing direct numerical simulations of turbulence, with applications in sustainable shipping. We use modern Fortran and the spectral element method to exploit, and scale on, supercomputers powered by Nvidia A100 and the recent AMD Instinct MI250X GPUs, while still supporting user software developed in Fortran. We demonstrate the efficiency of our approach by performing the world’s first direct numerical simulation of the flow around a Flettner rotor at Re = 30,000 and its interaction with a turbulent boundary layer. We present a performance comparison between the AMD Instinct MI250X and Nvidia A100 GPUs for scalable computational fluid dynamics. Our results show that one MI250X offers performance on par with two A100 GPUs and has similar power efficiency, based on readings from on-chip energy sensors.

  • 6. Kurzak, Jakub
    Mirkovic, Dragan
    Petitt, Montgomery B.
    Johnsson, Lennart
    University of Houston.
    Automatic Generation of FFT for Translations of Multipole Expansions in Spherical Harmonics (2008). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 22, no 2, p. 219-230. Article in journal (Refereed)
    Abstract [en]

    The fast multipole method (FMM) is an efficient algorithm for calculating electrostatic interactions in molecular simulations and a promising alternative to Ewald summation methods. Translation of multipole expansions in spherical harmonics is the most important operation of the fast multipole method, and fast Fourier transform (FFT) acceleration of this operation is among the fastest methods of improving its performance. The technique relies on highly optimized implementations of fast Fourier transform routines for the desired expansion sizes, which need to incorporate knowledge of symmetries and zero elements in the input arrays. Here a method is presented for automatic generation of such highly optimized routines.

  • 7. Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Hart, Alistair
    Henty, David
    Heisey, Katherine
    Fischer, Paul
    OpenACC acceleration of the Nek5000 spectral element code (2015). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 29, no 3, p. 311-319. Article in journal (Refereed)
    Abstract [en]

    We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method, used for the simulation of incompressible flow. The original NekBone Fortran source code was used as the base and enhanced with OpenACC directives. Profiling NekBone provided an assessment of the code's suitability for GPU systems and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve OpenACC implementation yields little performance improvement: on a single node, performance rises from 16 Gflops without OpenACC to 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version reaches 43 Gflops on a single node. In addition, we ported and optimized NekBone for parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Cray XK7 supercomputer Titan at Oak Ridge National Laboratory.

  • 8. Mirkovic, D.
    Johnsson, Lennart
    Automatic performance tuning for fast Fourier transforms (2004). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 18, no 1, p. 47-64. Article in journal (Refereed)
    Abstract [en]

    In this paper we discuss architecture-specific performance tuning for fast Fourier transforms (FFTs) implemented in the UHFFT library. The UHFFT library is an adaptive and portable software library for FFTs developed by the authors. We present the optimization methods used at different levels, starting with the algorithm selection used for the library code generation and ending with the actual implementation and specification of the appropriate compiler optimization options. We report on the performance results for several modern microprocessor architectures.

  • 9. Otten, Matthew
    Gong, Jing
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Mametjanov, Azamat
    Vose, Aaron
    Levesque, John
    Fischer, Paul
    Min, Misun
    An MPI/OpenACC implementation of a high-order electromagnetics solver with GPUDirect communication (2016). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 30, no 3, p. 320-334. Article in journal (Refereed)
    Abstract [en]

    We present performance results and an analysis of a message passing interface (MPI)/OpenACC implementation of an electromagnetic solver based on a spectral-element discontinuous Galerkin discretization of the time-dependent Maxwell equations. The OpenACC implementation covers all solution routines, including a highly tuned element-by-element operator evaluation and a GPUDirect gather-scatter kernel to effect nearest neighbor flux exchanges. Modifications are designed to make effective use of vectorization, streaming, and data management. Performance results using up to 16,384 graphics processing units of the Cray XK7 supercomputer Titan show more than 2.5x speedup over central processing unit-only performance on the same number of nodes (262,144 MPI ranks) for problem sizes of up to 6.9 billion grid points. We discuss performance-enhancement strategies and the overall potential of GPU-based computing for this class of problems.

  • 10. Simmendinger, Christian
    T Syst Solut Res, Stuttgart, Germany.
    Iakymchuk, Roman
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Cebamanos, Luis
    Univ Edinburgh, EPCC, Edinburgh, Midlothian, Scotland.
    Akhmetova, Dana
    KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST).
    Bartsch, Valeria
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Rotaru, Tiberiu
    Fraunhofer ITWM, Kaiserslautern, Germany.
    Rahn, Mirko
    Fraunhofer ITWM, HPC Dept, Kaiserslautern, Germany.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden; KTH Royal Inst Technol, PDC Ctr, High Performance Comp Ctr, Stockholm, Sweden.
    Markidis, Stefano
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). KTH Royal Inst Technol, High Performance Comp, Stockholm, Sweden.
    Interoperability strategies for GASPI and MPI in large-scale scientific applications (2019). In: The International Journal of High Performance Computing Applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no 3, p. 554-568. Article in journal (Refereed)
    Abstract [en]

    One of the main hurdles for partitioned global address space (PGAS) approaches is the dominance of the message passing interface (MPI), which, as a de facto standard, appears in the code base of many applications. To take advantage of PGAS APIs such as the global address space programming interface (GASPI) without a major change in the code base, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication- and performance-critical parts of the Ludwig and iPIC3D applications. To address the performance limitations we discovered, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.
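Item 3 builds on BLAS kernels whose dot product is correctly rounded and therefore reproducible. The abstract does not say how those GPU kernels achieve this, so the following is only a minimal Python sketch of the underlying property, not the authors' implementation: it swaps in exact rational accumulation (the standard `fractions` module) as a simple, slow stand-in. Ordinary floating-point accumulation rounds after every addition, so its result can depend on summation order; accumulating exactly and rounding once gives the same double for any order of the terms. The function names are hypothetical.

```python
from fractions import Fraction


def naive_dot(x, y):
    """Ordinary left-to-right floating-point accumulation.

    Every += rounds to the nearest double, so reordering the terms
    can change the final result.
    """
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s


def correctly_rounded_dot(x, y):
    """Hypothetical reproducible dot product (illustrative stand-in).

    Each float converts to a Fraction without error and Fraction
    arithmetic is exact, so the accumulated sum is the exact dot product.
    float() then rounds it once to the nearest double. Because the exact
    sum does not depend on order, neither does the rounded result.
    """
    acc = Fraction(0)
    for a, b in zip(x, y):
        acc += Fraction(a) * Fraction(b)
    return float(acc)


if __name__ == "__main__":
    ones = [1.0, 1.0, 1.0]
    order_a = [1e17, 1.0, -1e17]
    order_b = [1e17, -1e17, 1.0]  # same terms, different order
    print(naive_dot(order_a, ones), naive_dot(order_b, ones))  # 0.0 1.0
    print(correctly_rounded_dot(order_a, ones),
          correctly_rounded_dot(order_b, ones))  # 1.0 1.0
```

In the example, 1e17 + 1.0 rounds back to 1e17 (doubles near 1e17 are spaced 16 apart), so the naive sum loses the 1.0 in one ordering but not the other. Exact rationals are far too slow for GPU kernels; the sketch only shows the property being targeted, namely that a single final rounding of an exact sum cannot depend on the order in which partial sums are formed.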

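The single-node rates quoted in item 7 and the run sizes quoted in item 9 imply a few figures that the abstracts leave implicit. These are simple derived ratios, not results reported by the authors; the last line relies on the item 9 abstract's statement that the CPU-only comparison used the same number of nodes as GPUs (16,384), i.e. one GPU per Titan node.

```latex
\begin{aligned}
\text{Item 7, naive OpenACC vs.\ no OpenACC:}\quad & \frac{20\ \text{Gflops}}{16\ \text{Gflops}} = 1.25\\
\text{Item 7, optimized vs.\ no OpenACC:}\quad & \frac{43\ \text{Gflops}}{16\ \text{Gflops}} \approx 2.69\\
\text{Item 7, optimized vs.\ naive OpenACC:}\quad & \frac{43\ \text{Gflops}}{20\ \text{Gflops}} = 2.15\\
\text{Item 9, MPI ranks per node (CPU-only runs):}\quad & \frac{262{,}144\ \text{ranks}}{16{,}384\ \text{nodes}} = 16
\end{aligned}
```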