Endre søk
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Initial results on computational performance of Intel many integrated core, sandy bridge, and graphical processing unit architectures: implementation of a 1D c++/OpenMP electrostatic particle-in-cell code
KTH, Skolan för datavetenskap och kommunikation (CSC), Centra, Parallelldatorcentrum, PDC.ORCID-id: 0000-0003-0639-0639
Vise andre og tillknytning
2015 (engelsk)Inngår i: Concurrency and Computation, ISSN 1532-0626, E-ISSN 1532-0634, Vol. 27, nr 3, s. 581-593Artikkel i tidsskrift (Fagfellevurdert) Published
Abstract [en]

We present initial comparison performance results for Intel many integrated core (MIC), Sandy Bridge (SB), and graphical processing unit (GPU). A 1D explicit electrostatic particle-in-cell code is used to simulate a two-stream instability in plasma. We compare the computation times for various number of cores/threads and compiler options. The parallelization is implemented via OpenMP with a maximum thread number of 128. Parallelization and vectorization on the GPU is achieved with modifying the code syntax for compatibility with CUDA. We assess the speedup due to various auto-vectorization and optimization level compiler options. Our results show that the MIC is several times slower than SB for a single thread, and it becomes faster than SB when the number of cores increases with vectorization switched on. The compute times for the GPU are consistently about six to seven times faster than the ones for MIC. Compared with SB, the GPU is about two times faster for a single thread and about an order of magnitude faster for 128 threads. The net speedup, however, for MIC and GPU are almost the same. An initial attempt to offload parts of the code to the MIC coprocessor shows that there is an optimal number of threads where the speedup reaches a maximum.

sted, utgiver, år, opplag, sider
2015. Vol. 27, nr 3, s. 581-593
Emneord [en]
coprocessor, many integrated cores, particle-in-cell, heterogeneous computing
HSV kategori
Identifikatorer
URN: urn:nbn:se:kth:diva-162942DOI: 10.1002/cpe.3248ISI: 000350142600004Scopus ID: 2-s2.0-84923530126OAI: oai:DiVA.org:kth-162942DiVA, id: diva2:800209
Merknad

QC 20150402

Tilgjengelig fra: 2015-04-02 Laget: 2015-03-26 Sist oppdatert: 2018-01-11bibliografisk kontrollert

Open Access i DiVA

Fulltekst mangler i DiVA

Andre lenker

Forlagets fulltekstScopus

Personposter BETA

Markidis, Stefano

Søk i DiVA

Av forfatter/redaktør
Markidis, Stefano
Av organisasjonen
I samme tidsskrift
Concurrency and Computation

Søk utenfor DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric

doi
urn-nbn
Totalt: 97 treff
RefereraExporteraLink to record
Permanent link

Direct link
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Annet format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annet språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf