Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Exploring the performance benefit of hybrid memory system on HPC environments
KTH, Skolan för datavetenskap och kommunikation (CSC), Beräkningsvetenskap och beräkningsteknik (CST).
Vise andre og tillknytning
2017 (engelsk)Inngår i: Proceedings - 2017 IEEE 31st International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Institute of Electrical and Electronics Engineers (IEEE), 2017, s. 683-692, artikkel-id 7965110Konferansepaper, Publicerat paper (Fagfellevurdert)
Abstract [en]

Hardware accelerators have become a de-facto standard to achieve high performance on current supercomputers and there are indications that this trend will increase in the future. Modern accelerators feature high-bandwidth memory next to the computing cores. For example, the Intel Knights Landing (KNL) processor is equipped with 16 GB of high-bandwidth memory (HBM) that works together with conventional DRAM memory. Theoretically, HBM can provide ∼4× higher bandwidth than conventional DRAM. However, many factors impact the effective performance achieved by applications, including the application memory access pattern, the problem size, the threading level and the actual memory configuration. In this paper, we analyze the Intel KNL system and quantify the impact of the most important factors on the application performance by using a set of applications that are representative of scientific and data-analytics workloads. Our results show that applications with regular memory access benefit from MCDRAM, achieving up to 3× performance when compared to the performance obtained using only DRAM. On the contrary, applications with random memory access pattern are latency-bound and may suffer from performance degradation when using only MCDRAM. For those applications, the use of additional hardware threads may help hide latency and achieve higher aggregated bandwidth when using HBM.

sted, utgiver, år, opplag, sider
Institute of Electrical and Electronics Engineers (IEEE), 2017. s. 683-692, artikkel-id 7965110
Serie
IEEE International Symposium on Parallel and Distributed Processing Workshops, ISSN 2164-7062
Emneord [en]
application performance on Intel KNL, HBM, hybrid memory system, Intel Knights Landing (KNL), MCDRAM
HSV kategori
Identifikatorer
URN: urn:nbn:se:kth:diva-213533DOI: 10.1109/IPDPSW.2017.115ISI: 000417418900077Scopus ID: 2-s2.0-85028050020ISBN: 9781538634080 (tryckt)OAI: oai:DiVA.org:kth-213533DiVA, id: diva2:1138053
Konferanse
31st IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2017, Orlando, United States, 29 May 2017 through 2 June 2017
Merknad

QC 20170904

Tilgjengelig fra: 2017-09-04 Laget: 2017-09-04 Sist oppdatert: 2018-01-29bibliografisk kontrollert
Inngår i avhandling
1. Data Movement on Emerging Large-Scale Parallel Systems
Åpne denne publikasjonen i ny fane eller vindu >>Data Movement on Emerging Large-Scale Parallel Systems
2017 (engelsk)Doktoravhandling, med artikler (Annet vitenskapelig)
Abstract [en]

Large-scale HPC systems are an important driver for solving computational problems in scientific communities. Next-generation HPC systems will not only grow in scale but also in heterogeneity. This increased system complexity entails more challenges to data movement in HPC applications. Data movement on emerging HPC systems requires asynchronous fine-grained communication and efficient data placement in the main memory. This thesis proposes an innovative programming model and algorithm to prepare HPC applications for the next computing era: (1) a data streaming model that supports emerging data-intensive applications on supercomputers, (2) a decoupling model that improves parallelism and mitigates the impact of imbalance in applications, (3) a new framework and methodology for predicting the impact of largescale heterogeneous memory systems on HPC applications, and (4) a data placement algorithm that uses a set of rules and a decision tree to determine the data-to-memory mapping in heterogeneous main memory.

The proposed approaches in this thesis are evaluated on multiple supercomputers with different processors and interconnect networks. The evaluation uses a diverse set of applications that represent conventional scientific applications and emerging data-analytic workloads on HPC systems. The experimental results on the petascale testbed show that the approaches obtain increasing performance improvements as system scale increases and this trend supports the approaches as a valuable contribution towards future HPC systems.

Abstract [sv]

Storskaliga HPC-system är en viktig drivkraft för att lösa datorproblem i vetenskapliga samhällen. Nästa generations HPC-system kommer inte bara att växa i skala utan också i heterogenitet. Denna ökade systemkomplexitet medför flera utmaningar för dataförflyttning i HPC-applikationer. Dataförflyttning på nya HPC-system kräver asynkron, finkorrigerad kommunikation och en effektiv dataplacering i huvudminnet.

Denna avhandling föreslår en innovativ programmeringsmodell och algoritm för att förbereda HPC-applikationer för nästa generation: (1) en dataströmningsmodell som stöder nya dataintensiva applikationer på superdatorer, (2) en kopplingsmodell som förbättrar parallelliteten och minskar obalans i applikationer, (3) en ny metologi och struktur för att förutse effekten av storskaliga, heterogena minnessystem på HPC-applikationer, och (4) en datalägesalgoritm som använder en uppsättning av regler och ett beslutsträd för att bestämma kartläggningen av data-till-minnet i det heterogena huvudminnet.

Den föreslagna programmeringsmodellen i denna avhandling är utvärderad på flera superdatorer med olika processorer och sammankopplingsnät. Utvärderingen använder en mängd olika applikationer som representerar konventionella vetenskapliga applikationer och nya dataanalyser på HPC-system. Experimentella resultat på testbädden i petascala visar att programmeringsmodellen förbättrar prestandan när systemskalan ökar. Denna trend indikerar att modellen är ett värdefullt bidrag till framtida HPC-system.

sted, utgiver, år, opplag, sider
KTH Royal Institute of Technology, 2017. s. 116
Serie
TRITA-CSC-A, ISSN 1653-5723 ; 2017:25
HSV kategori
Forskningsprogram
Datalogi
Identifikatorer
urn:nbn:se:kth:diva-218338 (URN)978-91-7729-592-1 (ISBN)
Disputas
2017-12-18, F3, Lindstedtsvägen 26, Stockholm, 10:00 (engelsk)
Opponent
Veileder
Merknad

QC 20171128

Tilgjengelig fra: 2017-11-28 Laget: 2017-11-27 Sist oppdatert: 2018-01-13bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekstScopus

Søk i DiVA

Av forfatter/redaktør
Peng, I. B.Laure, ErwinMarkidis, Stefano
Av organisasjonen

Søk utenfor DiVA

GoogleGoogle Scholar

doi
isbn
urn-nbn

Altmetric

doi
isbn
urn-nbn
Totalt: 37 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • harvard1
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf