Optimization of Tensor-product Operations in Nekbone on GPUs
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0003-3374-8093
KTH, School of Electrical Engineering and Computer Science (EECS), Centres, Centre for High Performance Computing, PDC. ORCID iD: 0000-0002-5020-1631
KTH, School of Electrical Engineering and Computer Science (EECS), Computer Science, Computational Science and Technology (CST). ORCID iD: 0000-0001-5452-6794
KTH, School of Engineering Sciences (SCI), Centres, Linné Flow Center, FLOW. KTH, Centres, SeRC - Swedish e-Science Research Centre. KTH, School of Engineering Sciences (SCI), Engineering Mechanics, Fluid Mechanics and Engineering Acoustics. ORCID iD: 0000-0001-9627-5903
2020 (English). Conference paper, poster (with or without abstract) (Refereed)
Abstract [en]

In the CFD solver Nek5000, the computation is dominated by the evaluation of small tensor-product operations. Nekbone, a proxy app for Nek5000, has previously been ported to GPUs with a mixed OpenACC and CUDA approach. In this work, we continue that effort and further optimize the main tensor-product operation in Nekbone. Our optimization is implemented in CUDA and uses a 2D thread structure that performs the computations layer by layer, which enables loop unrolling and efficient use of registers and shared memory. We compare our implementation on both the Pascal and Volta GPU architectures against previous GPU versions of Nekbone and against a measured roofline. The results show that our implementation outperforms previous GPU Nekbone implementations by 6-10%. Compared to the measured roofline, we obtain 77-92% of peak performance on both Nvidia P100 and V100 GPUs for inputs with 1024-4096 elements and polynomial degree 9.
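The tensor-product operation the abstract refers to is the standard sum-factorization evaluation used in spectral-element codes: a small 1D operator matrix is applied along each of the three dimensions of every element's local data, turning one large Kronecker-product multiply into three batches of small matrix products. As a minimal reference sketch only (plain Python, not the poster's CUDA kernel; the function and variable names are illustrative, not from the paper):

```python
def apply_tensor_product(D, u, n):
    """Apply (D ⊗ I ⊗ I), (I ⊗ D ⊗ I), and (I ⊗ I ⊗ D) to a
    flattened n*n*n element array u, returning the three directional
    results. This is sum factorization: O(n**4) work per element
    instead of the O(n**6) of forming the Kronecker product."""
    idx = lambda i, j, k: (i * n + j) * n + k  # row-major 3D indexing
    ur = [0.0] * (n ** 3)
    us = [0.0] * (n ** 3)
    ut = [0.0] * (n ** 3)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                r = s = t = 0.0
                for l in range(n):  # contract D along each dimension
                    r += D[i][l] * u[idx(l, j, k)]
                    s += D[j][l] * u[idx(i, l, k)]
                    t += D[k][l] * u[idx(i, j, l)]
                ur[idx(i, j, k)] = r
                us[idx(i, j, k)] = s
                ut[idx(i, j, k)] = t
    return ur, us, ut
```

In the GPU optimization described above, the analogue of the inner contraction is restructured so that a 2D thread block sweeps one k-layer of the element at a time, keeping D and the active layers in shared memory and registers.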

Place, publisher, year, edition, pages
2020.
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:kth:diva-296314
OAI: oai:DiVA.org:kth-296314
DiVA, id: diva2:1559582
Conference
The International Conference for High Performance Computing, Networking, Storage, and Analysis, 2020
Funder
Swedish e-Science Research Centre (SeRC); EU, Horizon 2020, 823691
Note

QC 20210607

Available from: 2021-06-02. Created: 2021-06-02. Last updated: 2022-06-25. Bibliographically approved.

Open Access in DiVA

fulltext (214 kB), 141 downloads
File information
File name: FULLTEXT01.pdf
File size: 214 kB
Checksum: SHA-512
f29b6d8f364a0a6318d45b1e4a6a9951b4b3a747a76b76b832de2e3818aa386c33500614c64ed3c1ad6ac0b90a478e6f0fc446793ff0be1e711e88e6db51744f
Type: fulltext
Mimetype: application/pdf

Authority records

Karp, Martin; Jansson, Niclas; Podobas, Artur; Schlatter, Philipp; Markidis, Stefano
