Publications (10 of 91)
Aguilar, X., Jordan, H., Heller, T., Hirsch, A., Fahringer, T. & Laure, E. (2019). An On-Line Performance Introspection Framework for Task-Based Runtime Systems. In: 19th International Conference on Computational Science, ICCS 2019. Paper presented at 19th International Conference on Computational Science, ICCS 2019, Faro, Portugal, 12-14 June 2019 (pp. 238-252). Springer Verlag
2019 (English). In: 19th International Conference on Computational Science, ICCS 2019, Springer Verlag, 2019, p. 238-252. Conference paper, Published paper (Refereed)
Abstract [en]

The expected high levels of parallelism together with the heterogeneity and complexity of new computing systems pose many challenges to current software. New programming approaches and runtime systems that can simplify the development of parallel applications are needed. Task-based runtime systems have emerged as a good solution to cope with high levels of parallelism, while providing software portability, and easing program development. However, these runtime systems require real-time information on the state of the system to properly orchestrate program execution and optimise resource utilisation. In this paper, we present a lightweight monitoring infrastructure developed within the AllScale Runtime System, a task-based runtime system for extreme scale. This monitoring component provides real-time introspection capabilities that help the runtime scheduler in its decision-making process and adaptation, while introducing minimum overhead. In addition, the monitoring component provides several post-mortem reports as well as real-time data visualisation that can be of great help in the task of performance debugging.

Place, publisher, year, edition, pages
Springer Verlag, 2019
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 11536
Keywords
AllScale, HPX, Performance introspection, Performance monitoring, Real-time visualisation, Runtime system, Computer software portability, Computer systems programming, Data visualization, Decision making, Visualization, Real time, Runtime systems, Real time systems
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-262387 (URN); 10.1007/978-3-030-22734-0_18 (DOI); 2-s2.0-85067610766 (Scopus ID); 9783030227333 (ISBN)
Conference
19th International Conference on Computational Science, ICCS 2019, Faro, Portugal, 12-14 June 2019
Note

QC 20191028

Available from: 2019-10-28; Created: 2019-10-28; Last updated: 2019-10-28; Bibliographically approved
Rivas Gomez, S., Markidis, S., Laure, E., Brabazon, K., Perks, O. & Narasimhamurthy, S. (2019). Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks. In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018. Paper presented at 20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018 (pp. 921-927). Institute of Electrical and Electronics Engineers (IEEE)
2019 (English). In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 921-927. Conference paper, Published paper (Refereed)
Abstract [en]

In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase the performance in situations where the workload per process becomes unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
High Performance Computing, MapReduce, MPI One Sided Communication
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-246358 (URN); 10.1109/HPCC/SmartCity/DSS.2018.00153 (DOI); 000468511200121 (); 2-s2.0-85062487109 (Scopus ID); 9781538666142 (ISBN)
Conference
20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018
Note

QC 20190319

Available from: 2019-03-19; Created: 2019-03-19; Last updated: 2019-11-01; Bibliographically approved
Souza, A., Rezaei, M., Laure, E. & Tordsson, J. (2019). Hybrid Resource Management for HPC and Data Intensive Workloads. In: 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID). Paper presented at 19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGRID), MAY 14-17, 2019, Larnaca, CYPRUS (pp. 399-409). IEEE
2019 (English). In: 2019 19TH IEEE/ACM INTERNATIONAL SYMPOSIUM ON CLUSTER, CLOUD AND GRID COMPUTING (CCGRID), IEEE, 2019, p. 399-409. Conference paper, Published paper (Refereed)
Abstract [en]

High Performance Computing (HPC) and Data Intensive (DI) workloads have been executed on separate clusters using different tools for resource and application management. With increasing convergence, where modern applications are composed of both types of jobs in complex workflows, this separation becomes a growing overhead and the need for a common platform increases. Executing both workload classes on the same clusters not only enables hybrid workflows, but can also increase system efficiency, as available hardware often is not fully utilized by applications. While HPC systems are typically managed in a coarse grained fashion, with exclusive resource allocations, DI systems employ a finer grained regime, enabling dynamic allocation and control based on application needs. On the path to full convergence, a useful and less intrusive step is a hybrid resource management system allowing the execution of DI applications on top of standard HPC scheduling systems. In this paper we present the architecture of a hybrid system enabling dual-level scheduling for DI jobs in HPC infrastructures. Our system takes advantage of real-time resource profiling to efficiently co-schedule HPC and DI applications. The architecture is easily extensible to current and new types of distributed applications, allowing efficient combination of hybrid workloads on HPC resources with increased job throughput and higher overall resource utilization. The implementation is based on the Slurm and Mesos resource managers for HPC and DI jobs. Experimental evaluations in a real cluster based on a set of representative HPC and DI applications demonstrate that our hybrid architecture improves resource utilization by 20%, with a 12% decrease in queue makespan while still meeting all deadlines for HPC jobs.

Place, publisher, year, edition, pages
IEEE, 2019
Series
IEEE-ACM International Symposium on Cluster Cloud and Grid Computing, ISSN 2376-4414
Keywords
Resource Management, High Performance Computing, Data Intensive Computing, Mesos, Slurm, Bootstrapping
National Category
Computational Mathematics
Identifiers
urn:nbn:se:kth:diva-260219 (URN); 10.1109/CCGRID.2019.000.54 (DOI); 000483058700045 (); 2-s2.0-85069469164 (Scopus ID); 978-1-7281-0912-1 (ISBN)
Conference
19th Annual IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing (CCGRID), MAY 14-17, 2019, Larnaca, CYPRUS
Note

QC 20190930

Available from: 2019-09-30; Created: 2019-09-30; Last updated: 2019-09-30; Bibliographically approved
Simmendinger, C., Iakymchuk, R., Cebamanos, L., Akhmetova, D., Bartsch, V., Rotaru, T., . . . Markidis, S. (2019). Interoperability strategies for GASPI and MPI in large-scale scientific applications. The international journal of high performance computing applications, 33(3), 554-568
2019 (English). In: The international journal of high performance computing applications, ISSN 1094-3420, E-ISSN 1741-2846, Vol. 33, no 3, p. 554-568. Article in journal (Refereed), Published
Abstract [en]

One of the main hurdles of partitioned global address space (PGAS) approaches is the dominance of message passing interface (MPI), which as a de facto standard appears in the code basis of many applications. To take advantage of the PGAS APIs like global address space programming interface (GASPI) without a major change in the code basis, interoperability between MPI and PGAS approaches needs to be ensured. In this article, we consider an interoperable GASPI/MPI implementation for the communication/performance crucial parts of the Ludwig and iPIC3D applications. To address the discovered performance limitations, we develop a novel strategy for significantly improved performance and interoperability between both APIs by leveraging GASPI shared windows and shared notifications. First results with a corresponding implementation in the MiniGhost proxy application and the Allreduce collective operation demonstrate the viability of this approach.

Place, publisher, year, edition, pages
SAGE PUBLICATIONS LTD, 2019
Keywords
Interoperability, GASPI, MPI, iPIC3D, Ludwig, MiniGhost, halo exchange, Allreduce
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-254034 (URN); 10.1177/1094342018808359 (DOI); 000468919900011 (); 2-s2.0-85059353725 (Scopus ID)
Note

QC 20190814

Available from: 2019-08-14; Created: 2019-08-14; Last updated: 2019-08-14; Bibliographically approved
Otero, E., Gong, J., Min, M., Fischer, P., Schlatter, P. & Laure, E. (2019). OpenACC acceleration for the PN-PN-2 algorithm in Nek5000. Journal of Parallel and Distributed Computing, 132, 69-78
2019 (English). In: Journal of Parallel and Distributed Computing, ISSN 0743-7315, E-ISSN 1096-0848, Vol. 132, p. 69-78. Article in journal (Refereed), Published
Abstract [en]

Due to its high performance and throughput capabilities, GPU-accelerated computing is becoming a popular technology in scientific computing, in particular using programming models such as CUDA and OpenACC. The main advantage of OpenACC is that it allows codes to be ported in their "original" form to GPU systems through compiler directives, thus enabling an incremental approach. An OpenACC implementation is applied to the CFD code Nek5000 for simulation of incompressible flows, based on the spectral-element method. The work follows up on previous implementations and now focuses on the PN-PN-2 method for the spatial discretization of the Navier-Stokes equations. Performance results of the ported code show a speed-up of up to 3.1 on multi-GPU for a polynomial order N > 11.

Place, publisher, year, edition, pages
Academic Press, 2019
Keywords
Nek5000; OpenACC; GPU programming; Spectral element method; High performance computing
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-253811 (URN); 10.1016/j.jpdc.2019.05.010 (DOI); 000476580400006 (); 2-s2.0-85066835225 (Scopus ID)
Funder
EU, Horizon 2020; Swedish e‐Science Research Center; Swedish Foundation for Strategic Research
Note

QC 20190625

Available from: 2019-06-18; Created: 2019-06-18; Last updated: 2019-08-16; Bibliographically approved
Sishtla, C. P., Olshevsky, V., Chien, W. D., Markidis, S. & Laure, E. (2019). Particle-in-Cell Simulations of Plasma Dynamics in Cometary Environment. In: Journal of Physics: Conference Series. Paper presented at 13th International Conference on Numerical Modeling of Space Plasma Flows, ASTRONUM 2018; Panama City Beach; United States; 25 June 2018 through 29 June 2018. Institute of Physics Publishing (IOPP), 1225(1), Article ID 012009.
2019 (English). In: Journal of Physics: Conference Series, Institute of Physics Publishing (IOPP), 2019, Vol. 1225, no 1, article id 012009. Conference paper, Published paper (Refereed)
Abstract [en]

We perform and analyze global Particle-in-Cell (PIC) simulations of the interaction between solar wind and an outgassing comet with the goal of studying the plasma kinetic dynamics of a cometary environment. To achieve this, we design and implement a new numerical method in the iPIC3D code to model outgassing from the comet: new plasma particles are ejected from the comet "surface" at each computational cycle. Our simulations show that a bow shock is formed as a result of the interaction between solar wind and outgassed particles. The analysis of distribution functions for the PIC simulations shows that, at the bow shock, part of the incoming solar wind ions are reflected while electrons are heated. This work attempts to reveal kinetic effects in the atmosphere of an outgassing comet using a fully kinetic Particle-in-Cell model.

Place, publisher, year, edition, pages
Institute of Physics Publishing (IOPP), 2019
Series
Journal of Physics: Conference Series, ISSN 17426588 ; 1225
National Category
Physical Sciences
Identifiers
urn:nbn:se:kth:diva-262635 (URN); 10.1088/1742-6596/1225/1/012009 (DOI); 000478669600009 (); 2-s2.0-85068062214 (Scopus ID)
Conference
13th International Conference on Numerical Modeling of Space Plasma Flows, ASTRONUM 2018; Panama City Beach; United States; 25 June 2018 through 29 June 2018
Note

QC 20191018

Available from: 2019-10-18; Created: 2019-10-18; Last updated: 2019-11-07; Bibliographically approved
Narasimhamurthy, S., Danilov, N., Wu, S., Umanesan, G., Markidis, S., Rivas-Gomez, S., . . . de Witt, S. (2019). SAGE: Percipient Storage for Exascale Data Centric Computing. Parallel Computing, 83, 22-33
2019 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 83, p. 22-33. Article in journal (Refereed), Published
Abstract [en]

We aim to implement a Big Data/Extreme Computing (BDEC) capable system infrastructure as we head towards the era of Exascale computing - termed SAGE (Percipient StorAGe for Exascale Data Centric Computing). The SAGE system will be capable of storing and processing immense volumes of data at the Exascale regime, and provide the capability for Exascale class applications to use such a storage infrastructure. SAGE addresses the increasing overlaps between Big Data Analysis and HPC in an era of next-generation data centric computing that has developed due to the proliferation of massive data sources, such as large, dispersed scientific instruments and sensors, whose data needs to be processed, analysed and integrated into simulations to derive scientific and innovative insights. Indeed, Exascale I/O, as a problem that has not been sufficiently dealt with for simulation codes, is appropriately addressed by the SAGE platform. The objective of this paper is to discuss the software architecture of the SAGE system and look at early results we have obtained employing some of its key methodologies, as the system continues to evolve.

Place, publisher, year, edition, pages
Elsevier, 2019
Keywords
SAGE architecture, Object storage, Mero, Clovis, PGAS I/O, MPI I/O, MPI streams
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-254119 (URN); 10.1016/j.parco.2018.03.002 (DOI); 000469898400003 (); 2-s2.0-85044917976 (Scopus ID)
Note

QC 20190624

Available from: 2019-06-24; Created: 2019-06-24; Last updated: 2019-06-24; Bibliographically approved
Thoman, P., Dichev, K., Heller, T., Iakymchuk, R., Aguilar, X., Hasanov, K., . . . Nikolopoulos, D. S. (2018). A taxonomy of task-based parallel programming technologies for high-performance computing. Journal of Supercomputing, 74(4), 1422-1434
2018 (English). In: Journal of Supercomputing, ISSN 0920-8542, E-ISSN 1573-0484, Vol. 74, no 4, p. 1422-1434. Article in journal (Refereed), Published
Abstract [en]

Task-based programming models for shared memory - such as Cilk Plus and OpenMP 3 - are well established and documented. However, with the increase in parallel, many-core, and heterogeneous systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing (HPC), no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

Place, publisher, year, edition, pages
SPRINGER, 2018
Keywords
High-performance computing, Task-based parallelism, Taxonomy, API, Runtime system, Scheduler, Monitoring framework, Fault tolerance
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-226199 (URN); 10.1007/s11227-018-2238-4 (DOI); 000428284000002 (); 2-s2.0-85041817729 (Scopus ID)
Note

QC 20180518

Available from: 2018-05-18; Created: 2018-05-18; Last updated: 2019-08-20; Bibliographically approved
Thoman, P., Hasanov, K., Dichev, K., Iakymchuk, R., Aguilar, X., Gschwandtner, P., . . . Fahringer, T. (2018). A Taxonomy of Task-Based Technologies for High-Performance Computing. In: Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K. (Ed.), PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2017), PT II. Paper presented at 12th International Conference on Parallel Processing and Applied Mathematics (PPAM), SEP 10-13, 2017, Lublin, POLAND (pp. 264-274). SPRINGER INTERNATIONAL PUBLISHING AG
2018 (English). In: PARALLEL PROCESSING AND APPLIED MATHEMATICS (PPAM 2017), PT II / [ed] Wyrzykowski, R., Dongarra, J., Deelman, E., Karczewski, K., SPRINGER INTERNATIONAL PUBLISHING AG, 2018, p. 264-274. Conference paper, Published paper (Refereed)
Abstract [en]

Task-based programming models for shared memory - such as Cilk Plus and OpenMP 3 - are well established and documented. However, with the increase in heterogeneous, many-core and parallel systems, a number of research-driven projects have developed more diversified task-based support, employing various programming and runtime features. Unfortunately, despite the fact that dozens of different task-based systems exist today and are actively used for parallel and high-performance computing, no comprehensive overview or classification of task-based technologies for HPC exists. In this paper, we provide an initial task-focused taxonomy for HPC technologies, which covers both programming interfaces and runtime mechanisms. We demonstrate the usefulness of our taxonomy by classifying state-of-the-art task-based environments in use today.

Place, publisher, year, edition, pages
SPRINGER INTERNATIONAL PUBLISHING AG, 2018
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 10778
Keywords
Task-based parallelism, Taxonomy, API, Runtime system, Scheduler, Monitoring framework, Fault tolerance
National Category
Computer and Information Sciences
Identifiers
urn:nbn:se:kth:diva-245025 (URN); 10.1007/978-3-319-78054-2_25 (DOI); 000458563900025 (); 2-s2.0-85044764286 (Scopus ID); 978-3-319-78054-2 (ISBN)
Conference
12th International Conference on Parallel Processing and Applied Mathematics (PPAM), SEP 10-13, 2017, Lublin, POLAND
Note

QC 20190305

Available from: 2019-03-05; Created: 2019-03-05; Last updated: 2019-03-05; Bibliographically approved
Chien, S. W. D., Markidis, S., Sishtla, C. P., Santos, L., Herman, P., Narasimhamurthy, S. & Laure, E. (2018). Characterizing Deep-Learning I/O Workloads in TensorFlow. In: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis. Paper presented at 3rd IEEE/ACM Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2018; Dallas; United States; 12 November 2018 (pp. 54-63). Institute of Electrical and Electronics Engineers (IEEE)
2018 (English). In: Proceedings of PDSW-DISCS 2018: 3rd Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, Held in conjunction with SC 2018: The International Conference for High Performance Computing, Networking, Storage and Analysis, Institute of Electrical and Electronics Engineers (IEEE), 2018, p. 54-63. Conference paper, Published paper (Refereed)
Abstract [en]

The performance of Deep-Learning (DL) computing frameworks relies on the performance of data ingestion and checkpointing. In fact, during training, a considerably high number of relatively small files are first loaded and pre-processed on CPUs and then moved to the accelerator for computation. In addition, checkpointing and restart operations are carried out to allow DL computing frameworks to restart quickly from a checkpoint. Because of this, I/O affects the performance of DL applications. In this work, we characterize the I/O performance and scaling of TensorFlow, an open-source programming framework developed by Google and specifically designed for solving DL problems. To measure TensorFlow I/O performance, we first design a micro-benchmark to measure TensorFlow reads, and then use a TensorFlow mini-application based on AlexNet to measure the performance cost of I/O and checkpointing in TensorFlow. To improve the checkpointing performance, we design and implement a burst buffer. We find that increasing the number of threads increases TensorFlow bandwidth by a maximum of 2.3x and 7.8x on our benchmark environments. The use of the TensorFlow prefetcher results in a complete overlap of computation on the accelerator and the input pipeline on the CPU, eliminating the effective cost of I/O on the overall performance. The use of a burst buffer to checkpoint to a fast, small-capacity storage and asynchronously copy the checkpoints to a slower, large-capacity storage resulted in a performance improvement of 2.6x with respect to checkpointing directly to slower storage on our benchmark environment.
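
The burst-buffer idea described in this abstract (checkpoint to fast storage, then copy to slower storage in the background) can be illustrated with a minimal sketch. This is not the authors' implementation and it does not use TensorFlow; the class name `BurstBufferCheckpointer`, the directory layout, and the choice of a background thread with `shutil.copy2` are assumptions made purely for illustration.

```python
import shutil
import tempfile
import threading
from pathlib import Path


class BurstBufferCheckpointer:
    """Hypothetical helper: write to fast storage, drain to slow storage asynchronously."""

    def __init__(self, fast_dir, slow_dir):
        self.fast_dir = Path(fast_dir)
        self.slow_dir = Path(slow_dir)
        self.fast_dir.mkdir(parents=True, exist_ok=True)
        self.slow_dir.mkdir(parents=True, exist_ok=True)
        self._pending = []  # background copy threads not yet joined

    def save(self, name, payload):
        """Block only for the fast write; the copy to slow storage runs in a thread."""
        fast_path = self.fast_dir / name
        fast_path.write_bytes(payload)
        copier = threading.Thread(
            target=shutil.copy2, args=(fast_path, self.slow_dir / name)
        )
        copier.start()
        self._pending.append(copier)
        return fast_path

    def wait(self):
        """Wait until every background copy has finished."""
        for copier in self._pending:
            copier.join()
        self._pending.clear()


if __name__ == "__main__":
    # Temporary directories stand in for the fast and slow storage tiers.
    with tempfile.TemporaryDirectory() as root:
        ckpt = BurstBufferCheckpointer(Path(root) / "fast", Path(root) / "slow")
        ckpt.save("step_0001.ckpt", b"model state")
        ckpt.wait()
        print(sorted(p.name for p in (Path(root) / "slow").iterdir()))
```

In this sketch the training loop would only pay for the write to the fast tier; calling `wait()` before exit ensures the slower tier holds every checkpoint. Any speed-up depends on the storage hardware and is not implied by the code itself.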

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2018
Keywords
Parallel I/O, Input Pipeline, Deep Learning, TensorFlow
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-248377 (URN); 10.1109/PDSW-DISCS.2018.00011 (DOI); 000462205000006 (); 2-s2.0-85063062239 (Scopus ID)
Conference
3rd IEEE/ACM Joint International Workshop on Parallel Data Storage and Data Intensive Scalable Computing Systems, PDSW-DISCS 2018; Dallas; United States; 12 November 2018
Note

QC 20190405

Available from: 2019-04-05; Created: 2019-04-05; Last updated: 2019-04-05; Bibliographically approved
Identifiers
ORCID iD: orcid.org/0000-0002-9901-9857