1 - 16 of 16
  • 1.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Towards Scalable Performance Analysis of MPI Parallel Applications (2015). Licentiate thesis, comprehensive summary (Other academic).
    Abstract [en]

    A considerable fraction of scientific discovery nowadays relies on computer simulations. High Performance Computing (HPC) provides scientists with the means to simulate processes ranging from climate modeling to protein folding. However, achieving good application performance and making optimal use of HPC resources is a heroic task due to the complexity of parallel software. Therefore, performance tools and runtime systems that help users execute applications in an optimal way are of utmost importance in the landscape of HPC. In this thesis, we explore different techniques to tackle the challenges of collecting, storing, and using fine-grained performance data. First, we investigate the automatic use of real-time performance data in order to run applications in an optimal way. To that end, we present a prototype of an adaptive task-based runtime system that uses real-time performance data for task scheduling. This runtime system has a performance monitoring component that provides real-time access to the performance behavior of an application while it runs. The implementation of this monitoring component is presented and evaluated within this thesis. Second, we explore lossless compression approaches for MPI monitoring. One of the main problems that performance tools face is the huge amount of fine-grained data that can be generated from an instrumented application. Collecting fine-grained data from a program is the best method to uncover the root causes of performance bottlenecks; however, it is infeasible with extremely parallel applications or applications with long execution times. On the other hand, collecting coarse-grained data is scalable but sometimes not enough to discern the root cause of a performance problem. Thus, we propose a new method for performance monitoring of MPI programs using event flow graphs. Event flow graphs incur very low overhead in terms of execution time and storage size, and can be used to reconstruct fine-grained trace files of application events ordered in time.
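
    The abstract describes event flow graphs only at a high level. As a rough, hypothetical sketch (names and representation are ours, not the thesis's): each node is an MPI event signature and each weighted edge counts how often one event immediately followed another on a rank, so long iterative runs collapse onto a small graph.

        // C++ sketch (illustrative only): a per-rank event flow graph.
        #include <cstdint>
        #include <map>
        #include <string>
        #include <utility>

        struct EventFlowGraph {
            // (previous event, next event) -> number of observed transitions
            std::map<std::pair<std::string, std::string>, std::uint64_t> edges;
            std::string previous;

            // Assumed to be called from an MPI wrapper on every intercepted
            // call, e.g. record("MPI_Send@solver.c:120").
            void record(const std::string& event) {
                if (!previous.empty()) ++edges[{previous, event}];
                previous = event;
            }
        };

    Because repeated iterations only increment existing edge counters, the structure stays small for long runs, which is what keeps both the execution-time and storage overheads low.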

  • 2.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fuerlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Automatic On-Line Detection of MPI Application Structure with Event Flow Graphs (2015). In: Euro-Par 2015: Parallel Processing, Springer Berlin/Heidelberg, 2015, p. 70-81. Conference paper (Refereed).
    Abstract [en]

    The deployment of larger and larger HPC systems challenges the scalability of both applications and analysis tools. Performance analysis toolsets provide users with means to spot bottlenecks in their applications by either collecting aggregated statistics or generating lossless time-stamped traces. While obtaining detailed trace information is the best method to examine the behavior of an application in detail, it is infeasible at extreme scales due to the huge volume of data generated. In this context, knowing the application structure, and particularly the nesting of loops in iterative applications, is of great importance as it allows, among other things, reducing the amount of data collected by focusing on important sections of the code. In this paper we demonstrate how the loop nesting structure of an MPI application can be extracted on-line from its event flow graph without the need for any explicit source code instrumentation. We show how this knowledge of the application structure can be used to compute postmortem statistics as well as to reduce the amount of redundant data collected. To that end, we present a usage scenario where this structure information is utilized on-line (while the application runs) to intelligently collect fine-grained data for only a few iterations of an application, considerably reducing the amount of data gathered.
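
    The paper's actual on-line detection algorithm is not reproduced in this listing. As a generic illustration of the underlying idea: program loops appear as cycles in the event flow graph, so a depth-first search that reports back edges exposes candidate loop heads.

        // Generic illustration (not the paper's algorithm): DFS over an
        // event flow graph; an edge to a node still on the DFS stack is a
        // back edge, and its target is a candidate loop head.
        #include <string>
        #include <unordered_map>
        #include <unordered_set>
        #include <utility>
        #include <vector>

        using Graph = std::unordered_map<std::string, std::vector<std::string>>;

        void dfs(const Graph& g, const std::string& node,
                 std::unordered_set<std::string>& onStack,
                 std::unordered_set<std::string>& done,
                 std::vector<std::pair<std::string, std::string>>& backEdges) {
            onStack.insert(node);
            auto it = g.find(node);
            if (it != g.end()) {
                for (const auto& next : it->second) {
                    if (onStack.count(next))            // back edge -> loop head
                        backEdges.push_back({node, next});
                    else if (!done.count(next))
                        dfs(g, next, onStack, done, backEdges);
                }
            }
            onStack.erase(node);
            done.insert(node);
        }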

  • 3.
    Aguilar, Xavier
    et al.
    KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Fürlinger, K.
    Laure, Erwin
    KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Online MPI trace compression using event flow graphs and wavelets (2016). In: Procedia Computer Science, Elsevier, 2016, p. 1497-1506. Conference paper (Refereed).
    Abstract [en]

    Performance analysis of scientific parallel applications is essential to use High Performance Computing (HPC) infrastructures efficiently. Nevertheless, collecting detailed data from large-scale parallel programs and long-running applications is infeasible due to the huge amount of performance information generated. Even though there are no technological constraints on storing terabytes of performance data, the constant flushing of such data to disk introduces a massive overhead into the application that makes the performance measurements worthless. This paper explores the use of event flow graphs together with wavelet analysis and EZW encoding to provide MPI event traces that are orders of magnitude smaller while preserving accurate information on time-stamped events. Our mechanism compresses the performance data online while the application runs, thus reducing the pressure put on the I/O system due to buffer flushing. As a result, we achieve lower application perturbation, reduced performance data output, and the possibility to monitor longer application runs.
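
    EZW encoding and the paper's full pipeline are beyond this listing. As a hedged sketch of the wavelet step alone: one level of a Haar transform over a series of event timings concentrates the signal in a few coefficients, leaving near-zero details that a downstream encoder can compress aggressively.

        // One level of a Haar wavelet transform over event timings (length
        // assumed even). Smooth regions of the series yield near-zero detail
        // coefficients. Illustrative fragment only; the paper's pipeline
        // applies multiple levels plus EZW encoding of the coefficients.
        #include <cmath>
        #include <vector>

        void haarStep(const std::vector<double>& in,
                      std::vector<double>& approx, std::vector<double>& detail) {
            const double s = std::sqrt(2.0);
            approx.clear(); detail.clear();
            for (std::size_t i = 0; i + 1 < in.size(); i += 2) {
                approx.push_back((in[i] + in[i + 1]) / s);
                detail.push_back((in[i] - in[i + 1]) / s);
            }
        }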

  • 4.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universität (LMU).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    MPI Trace Compression Using Event Flow Graphs (2014). Conference paper (Refereed).
    Abstract [en]

    Understanding how parallel applications behave is crucial for using high-performance computing (HPC) resources efficiently. However, the task of performance analysis is becoming increasingly difficult due to the growing complexity of scientific codes and the size of machines. Even though many tools have been developed over the past years to help in this task, current approaches either offer only an overview of the application, discarding temporal information, or generate huge trace files that are often difficult to handle.

    In this paper we propose the use of event flow graphs for monitoring MPI applications, a new and different approach that balances the low overhead of profiling tools with the abundance of information available from tracers. Event flow graphs are captured with very low overhead, require orders of magnitude less storage than standard trace files, and can still recover the full sequence of events in the application. We test this new approach with the NERSC-8/Trinity Benchmark suite and achieve compression ratios up to 119x.
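
    The exact edge-annotation scheme that lets the papers recover full event sequences is not given in the abstract, so the following deliberately naive sketch only demonstrates that ordered edge information suffices to replay the sequence; the real technique stores far less by exploiting the regular structure of iterative codes.

        // Hedged sketch of trace replay: assume each edge recorded the
        // global positions at which it was taken; sorting those positions
        // rebuilds the full ordered event sequence. (Storing every position
        // would defeat compression; this only shows the replay direction.)
        #include <cstdint>
        #include <map>
        #include <string>
        #include <utility>
        #include <vector>

        using Edge = std::pair<std::string, std::string>;

        std::vector<std::string> replay(
            const std::map<Edge, std::vector<std::uint64_t>>& edgePositions) {
            std::map<std::uint64_t, Edge> order;   // position -> edge taken
            for (const auto& ep : edgePositions)
                for (auto pos : ep.second) order[pos] = ep.first;

            std::vector<std::string> events;
            for (const auto& entry : order) {
                const Edge& e = entry.second;
                if (events.empty()) events.push_back(e.first);
                events.push_back(e.second);
            }
            return events;
        }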

  • 5.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Visual MPI Performance Analysis using Event Flow Graphs (2015). In: Procedia Computer Science, ISSN 1877-0509, E-ISSN 1877-0509, Vol. 51, p. 1353-1362. Article in journal (Refereed).
    Abstract [en]

    Event flow graphs used in the context of performance monitoring combine the scalability and low overhead of profiling methods with the lossless information recording of tracing tools. In other words, they capture statistics on the performance behavior of parallel applications while preserving the temporal ordering of events. Event flow graphs require significantly less storage than regular event traces and can still be used to recover the full ordered sequence of events performed by the application. In this paper we explore the usage of event flow graphs in the context of visual performance analysis. We show that graphs can be used to quickly spot performance problems, helping to better understand the behavior of an application. We demonstrate our performance analysis approach with MiniFE, a mini-application that mimics the key performance aspects of finite-element applications in High Performance Computing (HPC).
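
    One common way to obtain the kind of picture the paper analyzes (our choice of tooling, not necessarily the authors') is to dump the graph in Graphviz DOT format and render it with dot.

        // Emit an event flow graph as Graphviz DOT, renderable with e.g.
        // `dot -Tpdf graph.dot -o graph.pdf`. Graphviz is our assumption
        // here; the paper's own visualization tooling may differ.
        #include <cstdint>
        #include <iostream>
        #include <map>
        #include <string>
        #include <utility>

        void writeDot(const std::map<std::pair<std::string, std::string>,
                                     std::uint64_t>& edges, std::ostream& out) {
            out << "digraph efg {\n";
            for (const auto& e : edges)
                out << "  \"" << e.first.first << "\" -> \"" << e.first.second
                    << "\" [label=\"" << e.second << "\"];\n";
            out << "}\n";
        }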

  • 6.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Fürlinger, Karl
    Ludwig-Maximilians-Universität München.
    Online Performance Data Introspection with IPM (2014). In: Proceedings of the 15th IEEE International Conference on High Performance Computing and Communications (HPCC 2013), IEEE Computer Society, 2014, p. 728-734. Conference paper (Refereed).
    Abstract [en]

    Exascale systems will be heterogeneous architectures with multiple levels of concurrency and energy constraints. In such a complex scenario, performance monitoring and runtime systems play a major role in obtaining good application performance and scalability. Furthermore, online access to performance data becomes a necessity to decide how to schedule resources and orchestrate computational elements: processes, threads, tasks, etc. We present the Performance Introspection API, an extension of the IPM tool that provides online runtime access to performance data from an application while it runs. We describe its design and implementation and show its overhead on several test benchmarks. We also present a real test case using the Performance Introspection API in conjunction with processor frequency scaling to reduce power consumption.
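
    IPM's actual Performance Introspection API is not reproduced here; the hypothetical interface below only illustrates the kind of in-process query such a component could answer while the application runs, e.g. from a frequency-scaling policy like the paper's power test case.

        // Hypothetical illustration only: this is NOT IPM's real API.
        #include <cstdint>
        #include <map>
        #include <string>

        struct RegionStats {
            std::uint64_t callCount = 0;
            double totalSeconds = 0.0;
        };

        class Introspection {
            std::map<std::string, RegionStats> live_;
        public:
            // Updated by (assumed) instrumentation wrappers as the app runs.
            void record(const std::string& region, double seconds) {
                auto& s = live_[region];
                ++s.callCount;
                s.totalSeconds += seconds;
            }
            // Online query: callable at any point during the run.
            RegionStats query(const std::string& region) const {
                auto it = live_.find(region);
                return it == live_.end() ? RegionStats{} : it->second;
            }
        };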

  • 7.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Gimenez, Judit
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC. KTH, Centres, SeRC - Swedish e-Science Research Centre.
    Scalability analysis of Dalton, a molecular structure program (2013). In: Future Generation Computer Systems, ISSN 0167-739X, E-ISSN 1872-7115, Vol. 29, no 8, p. 2197-2204. Article in journal (Refereed).
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized in the calculation of molecular properties, where it holds a leading position, and has a large worldwide user community (over 2000 licenses issued). In this paper, we present a performance characterization and optimization of Dalton. We also propose a solution that prevents the master/worker design of Dalton from becoming a performance bottleneck at larger process counts. With these improvements we obtain speedups of 4x, increasing the parallel efficiency of the code and making it possible to run it on a much larger number of cores.
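
    The abstract does not spell out the proposed fix, so the following is a generic MPI illustration, not necessarily Dalton's actual solution: splitting ranks into groups, each with its own sub-master, makes task handout and result collection fan out hierarchically and reduces the fan-in at any single master.

        #include <mpi.h>

        int main(int argc, char** argv) {
            MPI_Init(&argc, &argv);
            int rank, size;
            MPI_Comm_rank(MPI_COMM_WORLD, &rank);
            MPI_Comm_size(MPI_COMM_WORLD, &size);

            const int groups = 4;                 // assumed group count
            MPI_Comm group_comm;
            MPI_Comm_split(MPI_COMM_WORLD, rank % groups, rank, &group_comm);

            int group_rank;
            MPI_Comm_rank(group_comm, &group_rank);
            // group_rank == 0 acts as a sub-master for its group; only the
            // sub-masters talk to the global master, shrinking its fan-in.

            MPI_Comm_free(&group_comm);
            MPI_Finalize();
            return 0;
        }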

  • 8.
    Aguilar, Xavier
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Vahtras, Olav
    KTH, School of Biotechnology (BIO), Theoretical Chemistry and Biology. KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Gimenez, Judit
    Barcelona Supercomputing Center, Universitat Politecnica de Catalunya, Barcelona, Spain.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Scaling Dalton, a molecular electronic structure program (2011). In: Seventh International Conference on e-Science, e-Science 2011, 5-8 December 2011, Stockholm, Sweden, IEEE conference proceedings, 2011, p. 256-262. Conference paper (Refereed).
    Abstract [en]

    Dalton is a molecular electronic structure program featuring common methods of computational chemistry that are based on pure quantum mechanics (QM) as well as hybrid quantum mechanics/molecular mechanics (QM/MM). It is specialized in the calculation of molecular properties, where it holds a leading position, and has a large worldwide user community (over 2000 licenses issued). In this paper, we present a characterization and performance optimization of Dalton that increases the scalability and parallel efficiency of the application. We also propose a solution that helps prevent the master/worker design of Dalton from becoming a performance bottleneck at larger process counts and further increases parallel efficiency.

  • 9. Gonzalez-Alvarez, C
    et al.
    Servat, H
    Cabrera-Benítez, D
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Pons, C
    Fernandez-Recio, J
    Jimenez-Gonzalez, D
    Drug design on the Cell BE (2010). In: Scientific Computing with Multicore and Accelerators / [ed] J. Kurzak, D. A. Bader, J. Dongarra, CRC Press, 2010, p. 331-350. Chapter in book (Refereed).
  • 10.
    Labarta, Jesús
    et al.
    Barcelona Supercomputing Center.
    Gimenez, Judit
    Barcelona Supercomputing Center.
    Martinez, Eloy
    Barcelona Supercomputing Center.
    Gonzalez, Pedro
    Barcelona Supercomputing Center.
    Servat, Harald
    Barcelona Supercomputing Center.
    Llort, Germán
    Barcelona Supercomputing Center.
    Aguilar, Xavier
    Barcelona Supercomputing Center.
    Scalability of tracing and visualization tools (2005). In: Parallel Computing: Current & Future Issues of High-End Computing / [ed] Gerhard R. Joubert, Wolfgang E. Nagel, Frans J. Peters, Oscar G. Plata, P. Tirado, Emilio L. Zapata, Jülich: Central Institute for Applied Mathematics, 2005, p. 869-876. Conference paper (Refereed).
    Abstract [en]

    Extending the capability of performance tools to deal with the larger and larger machines being deployed is necessary in order to understand their actual behavior and to identify how to achieve performance expectations in the frequent case that these are not met on a first try. Trace-based tools such as Paraver provide extremely powerful and flexible analysis capabilities to identify performance problems not detectable by profile-based tools.

    Scaling up the usability of trace-based tools requires new techniques in both the acquisition and visualization phases. The CEPBA-tools approach distributes the functionalities required to tackle large systems across three different levels. Different acquisition techniques are used in the instrumentation package to control the data captured and maximize the ratio of information to file size. An intermediate-level set of tools is used to summarize the generated Paraver traces into smaller traces, with the same format, but where some of the information has been summarized. Examples of filter functionalities at this level include the summarization of certain events into periodic software counters and the selection of specific time intervals or events. At the final level, different rendering techniques have been introduced in Paraver to visualize traces of many processes while still being able to convey to the analyst the information relevant to identify problems at a very coarse level, as well as the capability to dig down to very detailed levels.

    The paper describes in detail the techniques used along those lines in the CEPBA-tools environment to support the analysis of applications run on large systems.
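
    As a small sketch of the intermediate summarization level described above (names and layout are ours, not CEPBA-tools'): events of one type are collapsed into periodic software-counter records over fixed time windows, shrinking the trace while keeping its coarse temporal behavior.

        // Collapse events of a given type into per-window counters.
        // Assumes the trace is sorted by timestamp.
        #include <cstdint>
        #include <string>
        #include <vector>

        struct Event   { double time; std::string type; };
        struct Counter { double windowStart; std::uint64_t count; };

        std::vector<Counter> summarize(const std::vector<Event>& trace,
                                       const std::string& type, double window) {
            std::vector<Counter> out;
            for (const auto& e : trace) {
                if (e.type != type) continue;
                double start =
                    window * static_cast<std::uint64_t>(e.time / window);
                if (out.empty() || out.back().windowStart != start)
                    out.push_back({start, 0});
                ++out.back().count;
            }
            return out;
        }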

  • 11.
    Markidis, Stefano
    et al.
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Schliephake, Michael
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Henty, David
    University of Edinburgh.
    Richardson, Harvey
    Cray Inc.
    Hart, Alistair
    Cray Inc.
    Gray, Alan
    University of Edinburgh.
    Lecomber, David
    Allinea Software Limited.
    Hilbrich, Tobias
    Technische Universität Dresden.
    Doleschal, Jens
    Technische Universität Dresden.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz).
    Paving the path to exascale computing with CRESTA development environment (2013). Conference paper (Other academic).
    Abstract [en]

    The development and implementation of efficient computer codes for exascale supercomputers will require combined advancement of all development environment components: compilers, automatic tuning frameworks, run-time systems, debuggers, and performance monitoring and analysis tools. The exascale era poses unprecedented challenges. Because the presence of accelerators is more and more common among the fastest supercomputers and will play a role in exascale computing, compilers will need to support hybrid computer architectures and generate efficient code hiding the complexity of programming accelerators. Hand optimization of the code will be very difficult on exascale machines and will be increasingly assisted by automatic tuners. Application tuning will be more focused on the parallel aspects of the computation because of the large amount of available parallelism. The application workload will be distributed over millions of processes, and implementing ad-hoc strategies directly in the application will probably be infeasible, whereas an adaptive run-time system will provide automatic load balancing. Debuggers and performance monitoring tools will deal with millions of processes and with huge amounts of data from application and hardware counters, yet they will still be required to minimize overhead and retain scalability. In this talk, we present how the development environment of the CRESTA exascale EC project meets all these challenges by advancing the state of the art in the field.

    An investigation of compiler support for hybrid GPU programming, the design concepts, and the main characteristics of the alpha prototype implementation of the CRESTA development environment components for exascale computing are presented. A performance study of OpenACC compiler directives has been carried out, showing very promising results and indicating that OpenACC is a viable approach for programming hybrid exascale supercomputers. A new Domain-Specific Language (DSL) has been defined for the expression of parallel auto-tuning at very large scale. The focus is on extending the auto-tuning approach into the parallel domain to enable tuning of communication-related aspects of applications. A new adaptive run-time system has been designed to schedule processes depending on resource availability, on the workload, and on run-time analysis of the application's performance. The Allinea DDT debugger and the Dresden University of Technology MUST MPI correctness checker are being extended to provide a unified interface, to improve scalability, and to include new disruptive technology based on statistical analysis of the run-time behavior of the application for anomaly detection. The new exascale prototypes of the Dresden University of Technology Vampir, VampirTrace, and Score-P performance monitoring and analysis tools have been released. The new features include the possibility of applying filtering techniques before loading performance data, to drastically reduce memory needs during performance analysis. The initial evaluation study of the development environment targets the CRESTA project applications to determine how the development environment could be coupled into a production suite for exascale computing.
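
    As a flavor of the OpenACC directives whose performance such a study would time (our toy example, not a CRESTA benchmark), a single pragma asks the compiler to offload a loop to an accelerator and manage its data movement.

        // SAXPY with OpenACC: the directive requests offload of the loop
        // and specifies which arrays move to and from the device.
        void saxpy(int n, float a, const float* x, float* y) {
            #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
            for (int i = 0; i < n; ++i)
                y[i] = a * x[i] + y[i];
        }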

  • 12. Pons, Carles
    et al.
    Jimenez-Gonzalez, Daniel
    Gonzalez-Alvarez, Cecilia
    Servat, Harald
    Cabrera-Benitez, Daniel
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Fernandez-Recio, Juan
    Cell-Dock: high-performance protein-protein docking (2012). In: Bioinformatics, ISSN 1367-4803, E-ISSN 1367-4811, Vol. 28, no 18, p. 2394-2396. Article in journal (Refereed).
    Abstract [en]

    The application of docking to large-scale experiments and the explicit treatment of protein flexibility are among the new challenges in structural bioinformatics that will require large computer resources and more efficient algorithms. Highly optimized fast Fourier transform (FFT) approaches are broadly used in docking programs, but their already optimal code implementation leaves hardware acceleration as the only option to significantly reduce the computational cost of these tools. In this work we present Cell-Dock, an FFT-based docking algorithm adapted to the Cell BE processor. We show that Cell-Dock runs faster than FTDock, with maximum speedups above 200x, while achieving results of similar quality.
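
    Cell-Dock itself targets 3-D grids on the Cell BE; reduced to one dimension with a deliberately naive DFT (our simplification, purely for clarity), the core FFT-based scoring idea is that the correlation of two discretized shapes over all relative shifts equals the inverse transform of DFT(a) times the conjugate of DFT(b).

        // Correlation theorem behind FFT-based docking, in 1-D. The naive
        // DFT here is O(n^2); real codes use optimized O(n log n) FFTs.
        #include <complex>
        #include <vector>

        using cvec = std::vector<std::complex<double>>;

        cvec dft(const cvec& x, bool inverse) {
            const double pi = 3.14159265358979323846;
            const double sign = inverse ? 1.0 : -1.0;
            cvec y(x.size());
            for (std::size_t k = 0; k < x.size(); ++k)
                for (std::size_t n = 0; n < x.size(); ++n)
                    y[k] += x[n] *
                            std::polar(1.0, sign * 2 * pi * k * n / x.size());
            if (inverse)
                for (auto& v : y) v /= double(x.size());
            return y;
        }

        // Circular correlation score of shape grids a and b for all shifts.
        cvec correlate(const cvec& a, const cvec& b) {
            cvec fa = dft(a, false), fb = dft(b, false);
            for (std::size_t i = 0; i < fa.size(); ++i)
                fa[i] *= std::conj(fb[i]);
            return dft(fa, true);
        }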

  • 13.
    Schliephake, Michael
    et al.
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Laure, Erwin
    KTH, School of Computer Science and Communication (CSC), Centres, Centre for High Performance Computing, PDC.
    Design and Implementation of a Runtime System for Parallel Numerical Simulations on Large-Scale Clusters (2011). In: Proceedings of the International Conference on Computational Science (ICCS) / [ed] Sato, M; Matsuoka, S; Sloot, PMA; VanAlbada, GD; Dongarra, J, Elsevier, 2011, Vol. 4, p. 2105-2114. Conference paper (Refereed).
    Abstract [en]

    The execution of scientific codes will introduce a number of new challenges and intensify some old ones on new high-performance computing infrastructures. Petascale computers are large systems with complex designs using heterogeneous technologies that make the programming and porting of applications difficult, particularly if one wants to use the maximum peak performance of the system. In this paper we present the design and first prototype of a runtime system for parallel numerical simulations on large-scale systems. The proposed runtime system addresses the challenges of performance, scalability, and programmability of large-scale HPC systems. We also present initial results of our prototype implementation using a molecular dynamics application kernel.

  • 14.
    Servat, Harald
    et al.
    Barcelona Supercomputing Center.
    Gonzalez, Cecilia
    Barcelona Supercomputing Center.
    Aguilar, Xavier
    Barcelona Supercomputing Center.
    Cabrera, Daniel
    Barcelona Supercomputing Center.
    Jimenez, Daniel
    Barcelona Supercomputing Center.
    Drug Design on the Cell Broadband Engine (2007). In: Parallel Architecture and Compilation Techniques: Conference Proceedings, PACT, IEEE Computer Society, 2007, p. 425. Conference paper (Refereed).
    Abstract [en]

    We evaluate a well-known protein docking application from the bioinformatics field, Fourier Transform Docking (FTDock) (Gabb et al., 1997), on a Blade with two 3.2 GHz Cell Broadband Engine (BE) processors (Kahle et al., 2005). FTDock is a geometric complementarity approximation of the protein docking problem, and uses 3D FFTs to reduce the complexity of the algorithm. FTDock achieves a significant speedup when its most time-consuming functions are offloaded to the SPEs and vectorized. We show the evolution of the performance impact of offloading and vectorizing two functions of FTDock (CM and SC) on 1 SPU. We show the total execution time of FTDock when CM and SC run on the PPU (bar 1), CM is offloaded (bar 2), CM is also vectorized (bar 3), SC is offloaded (bar 4), and SC is also vectorized (bar 5). Parallelizing the functions that are not offloaded, using OpenMP for instance, on the dual-thread PPE helps to increase PPE pipeline use, system throughput, and the scalability of the application.

  • 15.
    Servat, Harald
    et al.
    Barcelona Supercomputing Center.
    González-Alvarez, Cecilia
    Barcelona Supercomputing Center.
    Aguilar, Xavier
    Barcelona Supercomputing Center.
    Cabrera-Benitez, Daniel
    Barcelona Supercomputing Center.
    Jiménez-González, Daniel
    Barcelona Supercomputing Center.
    Drug Design Issues on the Cell BE (2008). In: High Performance Embedded Architectures and Compilers / [ed] Per Stenström and Michel Dubois and Manolis Katevenis and Rajiv Gupta and Theo Ungerer, Springer, 2008, p. 176-190. Conference paper (Refereed).
    Abstract [en]

    Structure alignment prediction between proteins (protein docking) is crucial for drug design, and a challenging problem for bioinformatics, pharmaceutics, and current and future processors, because it is a very time-consuming process. Here, we analyze a well-known protein docking application from the bioinformatics field, Fourier Transform Docking (FTDock), on a 3.2 GHz Cell Broadband Engine (BE) processor. FTDock is a geometric complementarity approximation of the protein docking problem, and the baseline of several protein docking algorithms currently in use. In particular, we measure the performance impact of reducing, tuning, and overlapping memory accesses, and the efficiency of different parallelization strategies (SIMD, MPI, OpenMP, etc.) in porting this biomedical application to the Cell BE. Results show the potential of the Cell BE processor for drug design applications, but also that there are important memory and computer architecture aspects that should be considered.
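
    Cell BE SPE intrinsics are hardware-specific and not shown in the abstract; as a portable stand-in for the vectorization half of such a port (our substitution, not the paper's code), an omp simd hint over contiguous data expresses the same several-elements-per-instruction idea.

        // Portable SIMD hint: process several grid cells per instruction.
        // Compile with e.g. `g++ -fopenmp-simd`.
        #include <vector>

        void scoreCells(std::vector<float>& score, const std::vector<float>& a,
                        const std::vector<float>& b) {
            #pragma omp simd
            for (std::size_t i = 0; i < score.size(); ++i)
                score[i] = a[i] * b[i];   // one multiply per grid cell
        }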

  • 16.
    Thoman, Peter
    et al.
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Dichev, Kiril
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    Heller, Thomas
    Univ Erlangen Nurnberg, D-91058 Erlangen, Germany.
    Iakymchuk, Roman
    KTH.
    Aguilar, Xavier
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Hasanov, Khalid
    IBM Ireland, Dublin 15, Ireland.
    Gschwandtner, Philipp
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Lemarinier, Pierre
    IBM Ireland, Dublin 15, Ireland.
    Markidis, Stefano
    KTH, School of Computer Science and Communication (CSC), Computational Science and Technology (CST).
    Jordan, Herbert
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Fahringer, Thomas
    Univ Innsbruck, A-6020 Innsbruck, Austria.
    Katrinis, Kostas
    IBM Ireland, Dublin 15, Ireland.
    Laure, Erwin
    KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC.
    Nikolopoulos, Dimitrios S.
    Queens Univ Belfast, Belfast BT7 1NN, Antrim, North Ireland.
    A taxonomy of task-based parallel programming technologies for high-performance computing (2018). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no 4, p. 1422-1434. Article in journal (Refereed).
    Abstract [en]

    Task-based programming models for shared memory, such as Cilk Plus and OpenMP 3, are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.
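
    As a minimal example of the task-based style the survey classifies (using OpenMP tasks, one of the shared-memory models it names): each recursive call is spawned as a task and the runtime scheduler maps tasks onto threads.

        // Canonical OpenMP-tasks Fibonacci. Compile with `g++ -fopenmp`.
        #include <cstdio>

        long fib(int n) {
            if (n < 2) return n;
            long a, b;
            #pragma omp task shared(a)     // spawn fib(n-1) as a task
            a = fib(n - 1);
            #pragma omp task shared(b)     // spawn fib(n-2) as a task
            b = fib(n - 2);
            #pragma omp taskwait           // join both children
            return a + b;
        }

        int main() {
            long result;
            #pragma omp parallel
            #pragma omp single             // one thread seeds the task tree
            result = fib(20);
            std::printf("fib(20) = %ld\n", result);
            return 0;
        }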
