Towards an Efficient Spectral Element Solver for Poisson’s Equation on Heterogeneous Platforms
KTH, School of Electrical Engineering and Computer Science (EECS).
2022 (English). Independent thesis, Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits. Student thesis
Alternative title
Mot en effektiv spektrala element-lösare för Poissons ekvation på heterogena plattformar (Swedish)
Abstract [en]

Neko is a project at KTH to refactor the widely used fluid dynamics solver Nek5000 to support modern hardware. Many parts of the solver need adapting for use on GPUs, one of which is the main communication kernel, the Gather-Scatter (GS) routine. The kernel uses atomic operations to avoid race conditions, which can be inefficient. To avoid atomics, elements were grouped so that whenever multiple writes to the same address are necessary, they always come in blocks. Each block can then be assigned to a single thread and handled sequentially, removing the need for atomic operations altogether. Within the scope of the thesis, a Poisson solver was also ported from CPU to Nvidia GPUs. To optimise the Poisson solver, a batched matrix multiplication kernel was developed that performs small matrix multiplications in bulk to better utilise the GPU. Further optimisations using shared memory and kernel unification were made. The performance of the different implementations was tested on two systems, one with a GTX1660 and one with dual Nvidia A100 GPUs. The results show only small differences in performance between the two versions of the GS kernel when only computational cost is considered, and in a multi-rank setup the communication time completely overwhelms any potential difference. The shared memory matrix multiplication kernel yielded around a 20% performance boost for the Poisson solver. Both versions vastly outperformed cuBLAS. The unified kernel also had a large positive impact on performance, yielding up to a 50% increase in throughput.
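The grouping idea described in the abstract can be illustrated with a small sketch. It is written in plain Python rather than CUDA so that it runs anywhere, and it is not the thesis implementation: the function names (`build_blocks`, `gather_blocked`, `gather_reference`) and the data layout (a permutation plus block offsets) are assumptions made for illustration only. Each iteration of the outer loop in `gather_blocked` stands in for one GPU thread that owns one block, so every output address has exactly one writer.

```python
def build_blocks(gids):
    """Group local indices that share a global id into contiguous blocks.

    Returns (perm, offsets, block_gids): perm lists local indices sorted by
    global id, block b covers perm[offsets[b]:offsets[b + 1]], and all of its
    entries map to block_gids[b]. Hypothetical layout, for illustration.
    """
    perm = sorted(range(len(gids)), key=lambda i: gids[i])
    offsets, block_gids = [], []
    for pos, i in enumerate(perm):
        if pos == 0 or gids[i] != gids[perm[pos - 1]]:
            offsets.append(pos)          # a new block starts here
            block_gids.append(gids[i])
    offsets.append(len(perm))            # sentinel marking the end of the last block
    return perm, offsets, block_gids


def gather_blocked(local, perm, offsets, block_gids, n_global):
    """Sum local values into global slots, one block per (simulated) thread."""
    out = [0.0] * n_global
    for b in range(len(block_gids)):     # on a GPU: one thread per block
        acc = 0.0
        for pos in range(offsets[b], offsets[b + 1]):
            acc += local[perm[pos]]      # sequential accumulation inside the block
        out[block_gids[b]] = acc         # single writer, so no atomic is needed
    return out


def gather_reference(local, gids, n_global):
    """Ungrouped version: the += below would have to be atomic on a GPU."""
    out = [0.0] * n_global
    for value, g in zip(local, gids):
        out[g] += value
    return out
```

Both functions compute the same sums; the difference is only in who writes where. As the abstract reports, the measured benefit of removing atomics was small in practice.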

Abstract [sv] (English translation)

Neko is a KTH project that aims to further develop the popular computational fluid dynamics program Nek5000 for modern computer systems. Particular emphasis has been placed on supporting heterogeneous platforms with dedicated accelerators for floating-point computations. The most common such accelerator today is the graphics card (GPU). An important part of Neko is the Gather-Scatter (GS) function, which is the main communication function between processes in the program. In the GS function, race conditions can arise when several threads write to the same memory address simultaneously. This can be avoided with atomic operations, but using them can hurt performance. In this master's thesis, an alternative implementation was developed in which the elements in the GS algorithm were grouped so that all operations on the same element come in blocks. They can then easily be processed in sequence, which avoids the need for atomic operations. Within the scope of the thesis, a numerical solver for Poisson's equation was implemented for GPUs. The code was optimised by performing matrix multiplications in bulk, and further by using shared memory. Performance was evaluated on two computer systems, one with a GTX1660 and one with two A100 GPUs. Only small differences were seen between the GS implementations, with a slight advantage of about 5% higher performance for the grouped variant in high-resolution domains. The Poisson solver showed high performance figures compared with the cuBLAS library.
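As a rough illustration of what performing small matrix multiplications in bulk means, the following plain-Python sketch multiplies a whole batch of small matrices in a single call. It is not the thesis kernel: the function name `batched_matmul` is hypothetical, and the local copies of the operands only stand in for staging data in GPU shared memory, where each batch entry would typically be handled by its own thread block.

```python
def batched_matmul(a_batch, b_batch):
    """Multiply matching pairs of small matrices, one pair per batch entry.

    a_batch[i] is an n-by-k matrix and b_batch[i] is a k-by-m matrix, both
    given as lists of rows. Illustrative sketch, not the thesis kernel.
    """
    results = []
    for a, b in zip(a_batch, b_batch):        # on a GPU: one thread block per entry
        n, k, m = len(a), len(b), len(b[0])
        # Local copies stand in for loading both operands into shared memory once.
        a_tile = [row[:] for row in a]
        b_tile = [row[:] for row in b]
        c = [[sum(a_tile[i][p] * b_tile[p][j] for p in range(k))
              for j in range(m)]
             for i in range(n)]
        results.append(c)
    return results
```

Handling the whole batch in one launch, rather than one launch per small product, is what lets many small multiplications keep a GPU occupied.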

Place, publisher, year, edition, pages
2022, p. 53
Series
TRITA-EECS-EX ; 2022:724
Keywords [en]
Neko, CUDA, Heterogeneous hardware, GPU, Gather-Scatter, HPC, CFD
Keywords [sv] (English translation)
Neko, CUDA, Heterogeneous platforms, GPU, Gather-Scatter, High-performance computing, Computational fluid dynamics
National Category
Computer and Information Sciences
Identifiers
URN: urn:nbn:se:kth:diva-321533
OAI: oai:DiVA.org:kth-321533
DiVA, id: diva2:1711497
Subject / course
Computer Science
Educational program
Master of Science - Computer Science
Supervisors
Examiners
Available from: 2022-12-19. Created: 2022-11-17. Last updated: 2022-12-19. Bibliographically approved

Open Access in DiVA

fulltext (2354 kB), 300 downloads
File information
File name: FULLTEXT01.pdf
File size: 2354 kB
Checksum (SHA-512):
4191f9a57bbc62d7249c38d91eae8f6274a395a460af86693953b3c023b664adce26c520cc0b39d62e6b0720a62ba854675bd4004f40fe75f87542a0b6fab23e
Type: fulltext
Mimetype: application/pdf