Performance Analysis of Irregular Collective Communication with the Crystal Router Algorithm
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre. ORCID iD: 0000-0002-5415-1248
KTH, School of Computer Science and Communication (CSC), High Performance Computing and Visualization (HPCViz). KTH, Centres, SeRC - Swedish e-Science Research Centre. ORCID iD: 0000-0002-9901-9857
2015 (English). In: Solving software challenges for exascale, 2015, p. 130-140. Conference paper, Published paper (Refereed)
Abstract [en]

To achieve exascale performance, it is important to detect potential bottlenecks and identify strategies to overcome them. For this, both applications and system software must be analysed and potentially improved. The EU FP7 project Collaborative Research into Exascale Systemware, Tools & Applications (CRESTA) chose to co-design advanced simulation applications and system software as well as development tools. In this paper, we present the results of a co-design activity focused on the simulation code NEK5000 that aims at performance improvements of collective communication operations. We have analysed the algorithms that form the core of NEK5000's communication module in order to assess its viability on recent computer architectures before starting to improve its performance. Our results show that the crystal router algorithm performs well in sparse, irregular collective operations for medium and large processor counts, but improvements will be needed for the even larger system sizes of the future. We sketch the needed improvements, which will also make the communication algorithms beneficial for other applications that need to implement latency-dominated communication schemes with short messages. The latency-optimised communication operations will also be used in a runtime system providing dynamic load balancing, which is under development within CRESTA.

Place, publisher, year, edition, pages
2015. p. 130-140
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8759
Keywords [en]
Collective operations, MPI, Performance tuning
Identifiers
URN: urn:nbn:se:kth:diva-170717
DOI: 10.1007/978-3-319-15976-8_10
ISI: 000355749700010
Scopus ID: 2-s2.0-84928920465
ISBN: 978-3-319-15975-1 (print)
ISBN: 978-3-319-15976-8 (print)
OAI: oai:DiVA.org:kth-170717
DiVA, id: diva2:839900
Conference
2nd International Conference on Exascale Applications and Software (EASC), April 2-3, 2014, Stockholm, Sweden
Note

QC 20150706

Available from: 2015-07-06. Created: 2015-07-03. Last updated: 2018-01-11. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authors

Schliephake, Michael; Laure, Erwin
