High-Performance I/O Programming Models for Exascale Computing
KTH, School of Electrical Engineering and Computer Science (EECS), Computational Science and Technology (CST).
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

The success of exascale supercomputers is largely dependent on novel breakthroughs that meet the increasing demands for high-performance I/O on HPC systems. Scientists are aggressively taking advantage of the compute power available on petascale supercomputers to run larger-scale and higher-fidelity simulations. At the same time, data-intensive workloads have recently become dominant as well. Such use cases inherently place additional stress on the I/O subsystem, mostly due to the elevated number of I/O transactions.

As a consequence, three critical challenges arise that are of paramount importance at exascale. First, while the concurrency of next-generation supercomputers is expected to increase by up to 1000x, the bandwidth and access latency of the I/O subsystem are projected to remain roughly constant in comparison. Storage is, therefore, on the verge of becoming a serious bottleneck. Second, although upcoming supercomputers are expected to integrate emerging non-volatile memory technologies to compensate for some of these limitations, existing programming models and interfaces (e.g., MPI-IO) might not provide any clear technical advantage when targeting distributed intra-node storage, let alone byte-addressable persistent memories. And third, even though increasingly heterogeneous compute nodes can provide benefits in terms of performance and thermal dissipation, this technological transformation implicitly increases programming complexity, making it difficult for scientific applications to take advantage of these developments.

In this thesis, we explore how programming models and interfaces must evolve to address the aforementioned challenges. We present MPI storage windows, a novel concept that proposes using the MPI one-sided communication model and MPI windows as a unified interface to program memory and storage. We then demonstrate how MPI one-sided communication can benefit data analytics frameworks that follow a decoupled strategy, while integrating seamless fault tolerance and out-of-core execution. Furthermore, we introduce persistent coarrays to enable transparent resiliency in Coarray Fortran, supporting the "failed images" feature recently introduced into the standard. Finally, we propose a global memory abstraction layer, inspired by the memory-mapped I/O mechanism of the OS, that exposes different storage technologies through conventional memory operations.

The outcomes of these contributions are expected to have a considerable impact on a wide variety of scientific HPC applications, on both current and next-generation supercomputers.

Abstract [sv]

The success of exascale supercomputers will largely depend on new breakthroughs that meet the growing demand for high-performance I/O in high-performance computing. Scientists today exploit the compute power available on petascale supercomputers to run larger, higher-fidelity simulations. At the same time, data-intensive applications have become common. These place additional strain on the I/O subsystem, primarily through the larger number of I/O transactions.

Consequently, several critical challenges arise that are of paramount importance for exascale computing. While the concurrency of next-generation supercomputers is expected to increase by up to three orders of magnitude, the bandwidth and access latency of the I/O subsystem are projected to remain roughly unchanged. Storage is therefore on the verge of becoming a serious bottleneck. Upcoming supercomputers are expected to integrate new non-volatile memory technologies to compensate for these limitations, but existing programming models and interfaces (e.g., MPI-IO) may not provide any clear technical advantage when applied to distributed intra-node storage, let alone byte-addressable persistent memories. Even though increasingly heterogeneous compute nodes can provide benefits in terms of performance and thermal dissipation, this technological transformation will increase programming complexity, which will make it harder for scientific applications to benefit from these developments.

This thesis explores how programming models and interfaces must evolve to address the aforementioned limitations. We present MPI storage windows, a novel concept that uses the MPI one-sided communication model together with MPI windows as a unified interface to program memory and storage. We then demonstrate how MPI one-sided communication can benefit data analytics frameworks through a decoupled strategy, while integrating seamless fault tolerance and out-of-core execution. Furthermore, persistent coarrays are introduced to enable transparent resilience in Coarray Fortran, supporting the "failed images" feature recently introduced into the standard. Finally, a global memory abstraction layer is proposed that, inspired by the memory-mapped I/O mechanism of the operating system, exposes different storage technologies through conventional memory operations.

The outcomes of these contributions are expected to have a considerable impact on high-performance computing across several scientific application areas, on both existing and next-generation supercomputers.

Place, publisher, year, edition, pages
KTH Royal Institute of Technology, 2019, p. 135
Series
TRITA-EECS-AVL ; 2019:77
National Category
Computer Engineering
Identifiers
URN: urn:nbn:se:kth:diva-263196
ISBN: 978-91-7873-344-6 (print)
OAI: oai:DiVA.org:kth-263196
DiVA, id: diva2:1367264
Public defence
2019-11-29, B1, Brinellvägen 23, Bergs, floor 1, KTH Campus, Stockholm, 10:00 (English)
Note

QC 20191105

Available from: 2019-11-05. Created: 2019-11-01. Last updated: 2019-11-05. Bibliographically approved.
List of papers
1. MPI windows on storage for HPC applications
2018 (English). In: Parallel Computing, ISSN 0167-8191, E-ISSN 1872-7336, Vol. 77, p. 38-56. Article in journal (Refereed). Published.
Abstract [en]

Upcoming HPC clusters will feature hybrid memories and storage devices per compute node. In this work, we propose to use the MPI one-sided communication model and MPI windows as a unified interface for programming memory and storage. We describe the design and implementation of MPI storage windows, and present their benefits for out-of-core execution, parallel I/O, and fault tolerance. In addition, we explore the integration of heterogeneous window allocations, where memory and storage share a unified virtual address space. When performing large, irregular memory operations, we verify that MPI windows on local storage incur a 55% performance penalty on average. When using a Lustre parallel file system, "asymmetric" performance is observed, with over 90% degradation in write operations. Nonetheless, experimental results of a Distributed Hash Table, the HACC I/O kernel mini-application, and a novel MapReduce implementation based on MPI one-sided communication indicate that the overall penalty of MPI windows on storage can be negligible in most real-world applications.
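To make the interface more concrete, the following C sketch shows how a storage-backed window could be requested through MPI_Win_allocate and an MPI_Info hint. The info keys ("alloc_type", "storage_alloc_filename") and the target path are illustrative assumptions for this sketch, not necessarily the hints used by the actual MPI storage windows implementation.

/* Minimal sketch of allocating an MPI window that may be backed by storage.
 * The info keys below are hypothetical; the real implementation may differ. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "alloc_type", "storage");                  /* assumed key */
    MPI_Info_set(info, "storage_alloc_filename", "/tmp/win.dat"); /* assumed key */

    double *base;
    MPI_Win  win;
    /* With MPI_INFO_NULL this is an ordinary in-memory window, so the same
     * code path works whether or not the storage hints are honored. */
    MPI_Win_allocate(1 << 20, sizeof(double), info, MPI_COMM_WORLD, &base, &win);

    MPI_Win_lock_all(0, win);
    base[0] = 42.0;            /* accessed exactly like regular window memory */
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);        /* an implementation could flush to the file here */
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}

The point of the design is that existing one-sided code needs no changes: only the allocation hints decide whether the window lives in memory, on storage, or on a combination of both.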

Place, publisher, year, edition, pages
Elsevier, 2018
Keywords
MPI windows on storage, Out-of-core computation, Parallel I/O
National Category
Computer Systems
Identifiers
urn:nbn:se:kth:diva-235114 (URN)
10.1016/j.parco.2018.05.007 (DOI)
000441688300003 ()
2-s2.0-85048347715 (Scopus ID)
Note

QC 20180919

Available from: 2018-09-19. Created: 2018-09-19. Last updated: 2019-11-01. Bibliographically approved.
2. Decoupled Strategy for Imbalanced Workloads in MapReduce Frameworks
2019 (English). In: Proceedings - 20th International Conference on High Performance Computing and Communications, 16th International Conference on Smart City and 4th International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Institute of Electrical and Electronics Engineers (IEEE), 2019, p. 921-927. Conference paper, Published paper (Refereed).
Abstract [en]

In this work, we consider the integration of MPI one-sided communication and non-blocking I/O in HPC-centric MapReduce frameworks. Using a decoupled strategy, we aim to overlap the Map and Reduce phases of the algorithm by allowing processes to communicate and synchronize using solely one-sided operations. Hence, we effectively increase performance in situations where the workload per process becomes unexpectedly unbalanced. Using a Word-Count implementation and a large dataset from the Purdue MapReduce Benchmarks Suite (PUMA), we demonstrate that our approach can provide up to 23% performance improvement on average compared to a reference MapReduce implementation that uses state-of-the-art MPI collective communication and I/O.
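As a rough illustration of this one-sided style (and not of the paper's actual framework), the C sketch below lets rank 0 pull partial word counts from every process with passive-target MPI_Get operations and reduce them locally. A barrier stands in for the one-sided readiness signalling a fully decoupled design would use; the Map logic is replaced by dummy values.

/* Illustrative sketch: reducing per-process partial counts with MPI one-sided
 * communication only. Error handling and the real Map phase are omitted. */
#include <mpi.h>
#include <stdlib.h>

#define NUM_KEYS 1024

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    long *counts;              /* partial counts exposed through the window */
    MPI_Win win;
    MPI_Win_allocate(NUM_KEYS * sizeof(long), sizeof(long),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &counts, &win);

    /* "Map" phase: each process fills its own partial counts (dummy values). */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    for (int k = 0; k < NUM_KEYS; k++)
        counts[k] = rank + k;
    MPI_Win_unlock(rank, win); /* publish the local updates to the window */

    MPI_Barrier(MPI_COMM_WORLD); /* placeholder for one-sided readiness flags */

    /* "Reduce" phase: rank 0 pulls every partial result and accumulates it. */
    if (rank == 0) {
        long *partial = malloc(NUM_KEYS * sizeof(long));
        long *total   = calloc(NUM_KEYS, sizeof(long));
        for (int p = 0; p < size; p++) {
            MPI_Win_lock(MPI_LOCK_SHARED, p, 0, win);
            MPI_Get(partial, NUM_KEYS, MPI_LONG, p, 0, NUM_KEYS, MPI_LONG, win);
            MPI_Win_unlock(p, win);          /* completes the transfer */
            for (int k = 0; k < NUM_KEYS; k++)
                total[k] += partial[k];
        }
        free(partial);
        free(total);
    }

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Because the reducer only issues Get operations, a slow mapper never blocks the others inside a collective call, which is where the benefit under imbalanced workloads comes from.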

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
Keywords
High Performance Computing, MapReduce, MPI One Sided Communication
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-246358 (URN)
10.1109/HPCC/SmartCity/DSS.2018.00153 (DOI)
000468511200121 ()
2-s2.0-85062487109 (Scopus ID)
9781538666142 (ISBN)
Conference
20th International Conference on High Performance Computing and Communications, 16th IEEE International Conference on Smart City and 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, 28 June 2018 through 30 June 2018
Note

QC 20190319

Available from: 2019-03-19. Created: 2019-03-19. Last updated: 2019-11-01. Bibliographically approved.
3. Persistent Coarrays: Integrating MPI Storage Windows in Coarray Fortran
2019 (English). In: Proceedings of the 26th European MPI Users' Group Meeting (EuroMPI 2019), ACM Digital Library, 2019, p. 1-8, article id 3. Conference paper, Published paper (Refereed).
Abstract [en]

The integration of novel hardware and software components in HPC is expected to considerably worsen the Mean Time Between Failures (MTBF) of scientific applications, while simultaneously increasing the programming complexity of these clusters. In this work, we present the initial steps towards the integration of transparent resilience support inside Coarray Fortran. In particular, we propose persistent coarrays, an extension of OpenCoarrays that integrates MPI storage windows to leverage its transport layer and seamlessly map coarrays to files on storage. Preliminary results indicate that our approach provides clear benefits on representative workloads, while incurring minimal source-code changes.
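The C sketch below illustrates only the underlying persistence idea (it is not the OpenCoarrays or MPI storage windows code): each image's coarray memory is a memory-mapped file exposed as an MPI window, so the remotely accessible data survives a restart of the program. File names and sizes are made up for the example.

/* Conceptual sketch of a persistent, remotely accessible allocation:
 * a file-backed buffer exposed through an MPI window. */
#include <mpi.h>
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <stdio.h>

#define N 1024

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* One backing file per image; the naming scheme is illustrative only. */
    char path[64];
    snprintf(path, sizeof(path), "/tmp/coarray_rank%d.bin", rank);
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    ftruncate(fd, N * sizeof(double));

    /* Map the file and expose it as this image's remotely accessible memory. */
    double *coarray = mmap(NULL, N * sizeof(double), PROT_READ | PROT_WRITE,
                           MAP_SHARED, fd, 0);
    MPI_Win win;
    MPI_Win_create(coarray, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    /* Normal execution: local and remote updates land in the mapped file. */
    MPI_Win_lock(MPI_LOCK_EXCLUSIVE, rank, 0, win);
    coarray[0] += 1.0;     /* the value accumulates across successive runs */
    MPI_Win_unlock(rank, win);

    if (rank == 0)
        printf("coarray[0] = %f (persists across restarts)\n", coarray[0]);

    MPI_Win_free(&win);
    msync(coarray, N * sizeof(double), MS_SYNC);  /* flush updates to the file */
    munmap(coarray, N * sizeof(double));
    close(fd);
    MPI_Finalize();
    return 0;
}

In the actual proposal this mapping is hidden inside the OpenCoarrays transport layer, which is why existing coarray codes need essentially no changes to become resilient.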

Place, publisher, year, edition, pages
ACM Digital Library, 2019
Keywords
MPI storage windows, coarray fortran, persistent coarrays
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-263193 (URN)
10.1145/3343211.3343214 (DOI)
978-1-4503-7175-9 (ISBN)
Conference
26th European MPI Users' Group Meeting (EuroMPI 2019), Zürich, Switzerland, September 11-13, 2019
Note

QC 20191111

Available from: 2019-11-01. Created: 2019-11-01. Last updated: 2019-11-11. Bibliographically approved.
4. uMMAP-IO: User-level Memory-mapped I/O for HPC
2019 (English). In: Proceedings of the 26th IEEE International Conference on High-Performance Computing, Data, and Analytics (HiPC'19), Institute of Electrical and Electronics Engineers (IEEE), 2019. Conference paper, Published paper (Refereed).
Abstract [en]

The integration of local storage technologies alongside traditional parallel file systems on HPC clusters is expected to raise the programming complexity of scientific applications aiming to take advantage of the increased level of heterogeneity. In this work, we present uMMAP-IO, a user-level memory-mapped I/O implementation that simplifies data management on multi-tier storage subsystems. Compared to the memory-mapped I/O mechanism of the OS, our approach features per-allocation configurable settings (e.g., segment size) and transparently enables access to a diverse range of memory and storage technologies, such as burst buffer I/O accelerators. Preliminary results indicate that uMMAP-IO provides at least 5-10x better performance on representative workloads in comparison with the standard memory-mapped I/O of the OS, and approximately 20-50% degradation on average, at up to 8192 processes, compared to conventional memory allocations without storage support.
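As a rough sketch of what user-level memory-mapped I/O means in general (and not of uMMAP-IO's actual code or API), the C program below reserves an address range with no access rights and services the resulting segmentation faults by reading file segments on demand, with the segment size chosen per allocation rather than fixed by the kernel. The path, sizes, and eviction policy are illustrative; error handling and signal-safety details are omitted for brevity.

/* Generic sketch of user-level memory-mapped I/O via fault-driven demand
 * paging on a POSIX system; uMMAP-IO's implementation may differ. */
#define _GNU_SOURCE
#include <signal.h>
#include <sys/mman.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <stdint.h>

#define SEGMENT 4096              /* per-allocation segment size (configurable) */
#define LENGTH  (64 * SEGMENT)

static char *region;              /* reserved, initially inaccessible range */
static int   backing_fd;          /* file that backs the mapping */

static void fault_handler(int sig, siginfo_t *si, void *ctx)
{
    (void)sig; (void)ctx;
    /* Locate the segment containing the faulting address. */
    uintptr_t off = ((uintptr_t)si->si_addr - (uintptr_t)region)
                    / SEGMENT * SEGMENT;
    /* Grant access and fill the segment from the backing file on demand. */
    mprotect(region + off, SEGMENT, PROT_READ | PROT_WRITE);
    pread(backing_fd, region + off, SEGMENT, (off_t)off);
}

int main(void)
{
    backing_fd = open("/tmp/ummap_demo.bin", O_RDWR | O_CREAT, 0600);
    ftruncate(backing_fd, LENGTH);

    /* Reserve the range with PROT_NONE so that the first touch of every
     * segment raises SIGSEGV, which we turn into an on-demand read. */
    region = mmap(NULL, LENGTH, PROT_NONE,
                  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    struct sigaction sa = {0};
    sa.sa_flags = SA_SIGINFO;
    sa.sa_sigaction = fault_handler;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, NULL);

    region[5 * SEGMENT] = 42;     /* faults, then loads and writes segment 5 */
    printf("segment 5, byte 0: %d\n", region[5 * SEGMENT]);

    /* A write-back/eviction policy (pwrite of dirty segments) would go here. */
    munmap(region, LENGTH);
    close(backing_fd);
    return 0;
}

Because the paging logic runs in user space, the segment size and the choice of backing device can differ per allocation, which is the configurability the abstract contrasts with the OS mechanism.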

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2019
National Category
Computer Engineering
Identifiers
urn:nbn:se:kth:diva-263194 (URN)
Conference
26th IEEE International Conference on High-Performance Computing, Data, and Analytics (HiPC'19), 17-20 December 2019, Hyderabad, India
Note

QC 20191111

Paper accepted for publication, but not published yet.

eCF Paper Id: 1570876046477

Available from: 2019-11-01. Created: 2019-11-01. Last updated: 2019-11-11. Bibliographically approved.

Open Access in DiVA

fulltext (7790 kB)
File information
File name: FULLTEXT01.pdf
File size: 7790 kB
Checksum (SHA-512): 14ffac18d81b0131f450635e00d1ee32af254e7edacc8eda8daa8ee554425166a3c4a4ca937a7067b215cd1df52851638ddac82e08369a2a8200b2fb35c7a8a9
Type: fulltext
Mimetype: application/pdf


By author/editor
Rivas-Gomez, Sergio
By organisation
Computational Science and Technology (CST)
Computer Engineering
