cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs
NYU, Courant Inst Math Sci, New York, NY 10003 USA.
Princeton Univ, PACM, Princeton, NJ 08544 USA.
KTH, School of Engineering Sciences (SCI), Mathematics (Dept.), Mathematics (Div.). ORCID iD: 0000-0002-3377-813x
Lawrence Berkeley Natl Lab, Natl Energy Res Sci Comp Ctr, Berkeley, CA USA.
2021 (English). In: 2021 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Institute of Electrical and Electronics Engineers (IEEE), 2021, p. 688-697. Conference paper, Published paper (Refereed)
Abstract [en]

Nonuniform fast Fourier transforms dominate the computational cost in many applications including image reconstruction and signal processing. We thus present a general-purpose GPU-based CUDA library for type 1 (nonuniform to uniform) and type 2 (uniform to nonuniform) transforms in dimensions 2 and 3, in single or double precision. It achieves high performance for a given user-requested accuracy, regardless of the distribution of nonuniform points, via cache-aware point reordering, and load-balanced blocked spreading in shared memory. At low accuracies, this gives on-GPU throughputs around 10^9 nonuniform points per second, and (even including host-device transfer) is typically 4-10x faster than the latest parallel CPU code FINUFFT (at 28 threads). It is competitive with two established GPU codes, being up to 90x faster at high accuracy and/or type 1 clustered point distributions. Finally, we demonstrate a 5-12x speedup versus CPU in an X-ray diffraction 3D iterative reconstruction task at 10^-12 accuracy, observing excellent multi-GPU weak scaling up to one rank per GPU.

Place, publisher, year, edition, pages
Institute of Electrical and Electronics Engineers (IEEE), 2021. p. 688-697
Keywords [en]
Nonuniform FFT, GPU, load balancing
Identifiers
URN: urn:nbn:se:kth:diva-302675
DOI: 10.1109/IPDPSW52791.2021.00105
ISI: 000689576200084
Scopus ID: 2-s2.0-85114440464
OAI: oai:DiVA.org:kth-302675
DiVA, id: diva2:1599130
Conference
35th IEEE International Parallel and Distributed Processing Symposium (IPDPS), JUN 17-21, 2021, Portland, OR
Note

QC 20210930

Available from: 2021-09-30 Created: 2021-09-30 Last updated: 2022-06-25. Bibliographically approved

Person

Andén, Joakim
