1 - 14 of 14
  • 1.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gimenez, Judit
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Scalability analysis of Dalton, a molecular structure program. 2013. In: Future generations computer systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 8, p. 2197-2204. Article in journal (Refereed)
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck for larger process numbers. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and allowing it to run on a much larger number of cores.

  • 2.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Gimenez, Judit
    Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona, Spain.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Scaling Dalton, a molecular electronic structure program. 2011. In: Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden, IEEE conference proceedings, 2011, p. 256-262. Conference paper (Refereed)
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized and has a leading position in the calculation of molecular properties, with a large worldwide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck for larger process numbers, which further increases the parallel efficiency.

  • 3.
    Gong, Jing
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Hart, Alistair
    Cray Inc.
    Henty, David
    University of Edinburgh.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fischer, Paul
    Argonne National Laboratory.
    Heisey, Katherine
    Argonne National Laboratory.
    OpenACC Acceleration of Nek5000: a Spectral Element Code. 2013. Conference paper (Other academic)
  • 4.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Cebamanos, Luis
    Hart, Alistair
    Min, Misun
    Fischer, Paul
    NekBone with Optimized OpenACC directives. 2015. Conference paper (Refereed)
    Abstract [en]

    Accelerators and, in particular, Graphics Processing Units (GPUs) have emerged as promising computing technologies that may be suitable for future exascale systems. Here, we present performance results of NekBone, a benchmark of the Nek5000 code, implemented with optimized OpenACC directives and GPUDirect communications. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. An optimized NekBone version reaches a performance of 78 Gflops on a single node. In addition, a performance of 609 Tflops has been reached on 16,384 GPUs of the Titan supercomputer at Oak Ridge National Laboratory.

     

  • 5.
    Gong, Jing
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Henningson, Dan
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schlatter, Philipp
    KTH, School of Engineering Sciences (SCI), Mechanics, Stability, Transition and Control. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Peplinski, Adam
    Hart, Alistair
    Doleschal, Jens
    Henty, David
    Fischer, Paul
    Nek5000 with OpenACC. 2015. In: Solving software challenges for exascale, 2015, p. 57-68. Conference paper (Refereed)
    Abstract [en]

    Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flows. We follow up on an earlier study, which ported the simplified version of Nek5000 to a GPU-accelerated system, by presenting the hybrid CPU/GPU implementation of the full Nek5000 code using OpenACC. The matrix-matrix multiplication, the Nek5000 gather-scatter operator and a preconditioned Conjugate Gradient solver have been implemented using OpenACC for multi-GPU systems. We report a speed-up of 1.3 on a single node of a Cray XK6 when using OpenACC directives in Nek5000. On 512 nodes of the Titan supercomputer, the speed-up approaches 1.4. A performance analysis of the Nek5000 code using the Score-P and Vampir performance monitoring tools shows that overlapping GPU kernels with host-accelerator memory transfers would considerably increase the performance of the OpenACC version of the Nek5000 code.

  • 6. Hilbel, T.
    et al.
    Lux, R. L.
    Dietzsch, J.
    Schliephake, Michael
    Katus, H. A.
    Performance and productivity benefits using multi-core processors for the analysis of digital long-term ECG recordings. 2008. In: Computers in Cardiology, 2008, 2008, p. 1069-1072. Conference paper (Refereed)
    Abstract [en]

    Modern Holter recorders allow the acquisition of 12-lead ECGs with a sampling rate of 1 kHz or higher and a resolution of 16 bits over more than 24 h. While large volumes of data can easily be stored on flash memory cards, the analysis of these biosignals requires considerable calculation power and network bandwidth. In general, processing time is important when hundreds of digital recordings from large study groups need to be analyzed. The objective of the following investigation is to address the question: Can performance and productivity benefits in ECG analysis be achieved using multi-core processor technology? Because these processors have two or more processing cores, they can perform parallel processing. The results show that segmenting the Holter data and running the same program multiple times simultaneously can dramatically speed up the computation.

  • 7.
    Markidis, Stefano
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gong, Jing
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Hart, Alistair
    Henty, David
    Heisey, Katherine
    Fischer, Paul
    OpenACC acceleration of the Nek5000 spectral element code. 2015. In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 29, no 3, p. 311-319. Article in journal (Refereed)
    Abstract [en]

    We present a case study of porting NekBone, a skeleton version of the Nek5000 code, to a parallel GPU-accelerated system. Nek5000 is a computational fluid dynamics code based on the spectral element method used for the simulation of incompressible flow. The original NekBone Fortran source code has been used as the base and enhanced by OpenACC directives. The profiling of NekBone provided an assessment of the suitability of the code for GPU systems, and indicated possible kernel optimizations. Porting NekBone to GPU systems required little effort and a small number of additional lines of code (approximately one OpenACC directive per 1000 lines of code). The naïve implementation using OpenACC leads to little performance improvement: on a single node, from 16 Gflops obtained with the version without OpenACC, we reached 20 Gflops with the naïve OpenACC implementation. An optimized NekBone version reaches 43 Gflops on a single node. In addition, we ported and optimized NekBone for parallel GPU systems, reaching a parallel efficiency of 79.9% on 1024 GPUs of the Titan XK7 supercomputer at the Oak Ridge National Laboratory.

  • 8.
    Markidis, Stefano
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Henty, David
    University of Edinburgh.
    Richardson, Harvey
    Cray Inc.
    Hart, Alistair
    Cray Inc.
    Gray, Alan
    University of Edinburgh.
    Lecomber, David
    Allinea Software Limited.
    Hilbrich, Tobias
    Technische Universität Dresden.
    Doleschal, Jens
    Technische Universität Dresden.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Paving the path to exascale computing with CRESTA development environment. 2013. Conference paper (Other academic)
    Abstract [en]

    The development and implementation of efficient computer codes for exascale supercomputers will require combined advancement of all development environment components: compilers, automatic tuning frameworks, run-time systems, debuggers, and performance monitoring and analysis tools. The exascale era poses unprecedented challenges. Because accelerators are increasingly common among the fastest supercomputers and will play a role in exascale computing, compilers will need to support hybrid computer architectures and generate efficient code while hiding the complexity of programming accelerators. Hand optimization of the code will be very difficult on exascale machines and will be increasingly assisted by automatic tuners. Application tuning will focus more on the parallel aspects of the computation because of the large amount of available parallelism. The application workload will be distributed over millions of processes; implementing ad hoc strategies directly in the application will probably be infeasible, whereas an adaptive run-time system will provide automatic load balancing. Debuggers and performance monitoring tools will deal with millions of processes and with huge amounts of data from application and hardware counters, but they will still be required to minimize overhead and retain scalability. In this talk, we present how the development environment of the CRESTA exascale EC project meets all these challenges by advancing the state of the art in the field.

    An investigation of compiler support for hybrid GPU programming, the design concepts, and the main characteristics of the alpha prototype implementation of the CRESTA development environment components for exascale computing are presented. A performance study of OpenACC compiler directives has been carried out, showing very promising results and indicating OpenACC as a viable approach for programming hybrid exascale supercomputers. A new Domain-Specific Language (DSL) has been defined for the expression of parallel auto-tuning at very large scale. The focus is on the extension of the auto-tuning approach into the parallel domain to enable tuning of communication-related aspects of applications. A new adaptive run-time system has been designed to schedule processes depending on the resource availability, on the workload, and on the run-time analysis of the application performance. The Allinea DDT debugger and the Dresden University of Technology MUST MPI correctness checker are being extended to provide a unified interface, to improve scalability, and to include new disruptive technology based on statistical analysis of the run-time behavior of the application for anomaly detection. The new exascale prototypes of the Dresden University of Technology Vampir, VampirTrace and Score-P performance monitoring and analysis tools have been released. The new features include the possibility of applying filtering techniques before loading performance data to drastically reduce memory needs during the performance analysis. The initial evaluation study of the development environment targets the CRESTA project applications to determine how the development environment could be coupled into a production suite for exascale computing.

  • 9. Saini, Subash
    et al.
    Talcott, Dale
    Rabenseifner, Rolf
    Schliephake, Michael
    High-Performance Computing-Center (HLRS), Stuttgart, Germany.
    Benkert, Katharina
    Performance Comparison of Cray XT4 with SGI Altix 4700, IBM POWER5+, SGI ICE 8200, and NEC SX-8 using HPCC and NPB Benchmarks. 2008. In: Proceedings of the Cray Users Group Conference 2008 (CUG 2008), 2008. Conference paper (Refereed)
  • 10.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Design and Implementation of a Runtime System for Parallel Numerical Simulations on Large-Scale Clusters. 2011. In: Proceedings Of The International Conference On Computational Science (ICCS) / [ed] Sato, M; Matsuoka, S; Sloot, PMA; VanAlbada, GD; Dongarra, J, Elsevier, 2011, Vol. 4, p. 2105-2114. Conference paper (Refereed)
    Abstract [en]

    The execution of scientific codes will introduce a number of new challenges and intensify some old ones on new high-performance computing infrastructures. Petascale computers are large systems with complex designs using heterogeneous technologies that make the programming and porting of applications difficult, particularly if one wants to use the maximum peak performance of the system. In this paper we present the design and first prototype of a runtime system for parallel numerical simulations on large-scale systems. The proposed runtime system addresses the challenges of performance, scalability, and programmability of large-scale HPC systems. We also present initial results of our prototype implementation using a molecular dynamics application kernel.

  • 11.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Communication Performance Analysis of CRESTA’s Co-Design Application NEK5000. 2012. In: Workshop Preparing Applications for Exascale Through Co-design in International Conference on High Performance Computing, 2012. Conference paper (Refereed)
  • 12.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Performance Analysis of Irregular Collective Communication with the Crystal Router Algorithm. 2015. In: Solving software challenges for exascale, 2015, p. 130-140. Conference paper (Refereed)
    Abstract [en]

    In order to achieve exascale performance it is important to detect potential bottlenecks and identify strategies to overcome them. For this, both applications and system software must be analysed and potentially improved. The EU FP7 project Collaborative Research into Exascale Systemware, Tools & Applications (CRESTA) chose the approach of co-designing advanced simulation applications and system software as well as development tools. In this paper, we present the results of a co-design activity focused on the simulation code NEK5000 that aims at performance improvements of collective communication operations. We have analysed the algorithms that form the core of NEK5000's communication module in order to assess its viability on recent computer architectures before starting to improve its performance. Our results show that the crystal router algorithm performs well in sparse, irregular collective operations for medium and large processor numbers, but improvements will be needed for the even larger system sizes of the future. We sketch the needed improvements, which will make the communication algorithms beneficial also for other applications that need to implement latency-dominated communication schemes with short messages. The latency-optimised communication operations will also be used in a runtime system providing dynamic load balancing, under development within CRESTA.

  • 13.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Towards improving the communication performance of CRESTA's co-design application NEK5000. 2012. In: Proceedings - 2012 SC Companion: High Performance Computing, Networking Storage and Analysis, SCC 2012, IEEE, 2012, p. 669-674. Conference paper (Refereed)
    Abstract [en]

    In order to achieve exascale performance, all aspects of applications and system software need to be analysed and potentially improved. The EU FP7 project 'Collaborative Research into Exascale Systemware, Tools & Applications' (CRESTA) uses co-design of advanced simulation applications and system software as well as related development tools as a key element in its approach towards exascale. In this paper we present first results of a co-design activity using the highly scalable application NEK5000. We have analysed the communication structure of NEK5000 and propose new, optimised collective communication operations that will improve the performance of NEK5000 and prepare it for use on the several million cores available in future HPC systems. The latency-optimised communication operations can also be beneficial in other contexts; for instance, we expect them to become an important building block for a runtime system providing dynamic load balancing, also under development within CRESTA.

  • 14.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Heisey, Katherine
    Argonne National Laboratory.
    Fischer, Paul
    Argonne National Laboratory.
    Design, implementation and use of mampicl, the multi-algorithm MPI collective library. 2013. Conference paper (Other academic)