Communication Efficient Multi–Processor FFT
1992 (English)In: Journal of Computational Physics, ISSN 0021-9991, E-ISSN 1090-2716, Vol. 102, no 2, 381-397 p.Article in journal (Refereed) Published
Computing the fast Fourier transform on a distributed memory architecture by a direct pipelined radix-2, a bi-section, or a multisection algorithm, all yield the same communications requirement, if communication for all FFT stages can be performed concurrently, the input data is in normal order, and the data allocation is consecutive. With a cyclic data allocation, or bit-reversed input data and a consecutive allocation, multi-sectioning offers a reduced communications requirement by approximately a factor of two. For a consecutive data allocation, normal input order, a decimation-in-time FFT requires that P/N + d−2 twiddle factors be stored for P elements distributed evenly over N processors, and the axis that is subject to transformation be distributed over 2d processors. No communication of twiddle factors is required. The same storage requirements hold for a decimation-in-frequency FFT, bit-reversed input order, and consecutive data allocation. The opposite combination of FFT type and data ordering requires a factor of log2N more storage for N processors. The peak performance for a Connection Machine system CM-200 implementation is 12.9 Gflops/s in 32-bit precision, and 10.7 Gflops/s in 64-bit precision for unordered transforms local to each processor. The corresponding execution rates for ordered transforms are 11.1 Gflops/s and 8.5 Gflops/s, respectively. For distributed one- and two-dimensional transforms the peak performance for unordered transforms exceeds 5 Gflops/s in 32-bit precision and 3 Gflops/s in 64-bit precision. Three-dimensional transforms execute at a slightly lower rate. Distributed ordered transforms execute at a rate of about 1/2 to 2/3 of the unordered transforms.
Place, publisher, year, edition, pages
1992. Vol. 102, no 2, 381-397 p.
Computer and Information Science
IdentifiersURN: urn:nbn:se:kth:diva-91002DOI: 10.1016/0021-9991(92)90380-HOAI: oai:DiVA.org:kth-91002DiVA: diva2:507662
NR 201408052012-03-052012-03-05Bibliographically approved