Scheduling FFT Computation on SMP and Multi-core Systems
2007 (English). In: Proceedings of the 21st annual international conference on Supercomputing, 2007, pp. 293-301. Conference paper (Refereed)
The increased complexity of memory systems, introduced to narrow the gap between processor and memory speeds, has made it increasingly hard for compilers to optimize arbitrary code within a palatable amount of time. With the emergence of multicore (CMP), multiprocessor (SMP) and hybrid shared memory multiprocessor architectures, achieving high efficiency is becoming even more challenging. To address the challenge of achieving high efficiency in performance-critical applications, domain-specific frameworks have been developed that aid compilers in scheduling the computations. We have developed a portable framework for the Fast Fourier Transform (FFT) that achieves high efficiency by automatically adapting to various architectural features. Adapting to parallel architectures by searching through all combinations of schedules (plans) is an expensive task, even when the search is conducted in parallel. In this paper, we develop heuristics to simplify the generation of better schedules for parallel FFT computations on CMP/SMP systems. We evaluate the performance of OpenMP and PThreads implementations of the FFT on a number of recent architectures. The performance of parallel FFT schedules is compared with that of the best plan generated for the sequential FFT, and the speedup for different numbers of processors is reported. Finally, we present a performance comparison between the UHFFT and FFTW implementations.
Computer and Information Science
Identifiers: URN: urn:nbn:se:kth:diva-63147; DOI: 10.1145/1274971.1275011; OAI: oai:DiVA.org:kth-63147; DiVA: diva2:481698
21st ACM International Conference on Supercomputing. Seattle, WA, USA. June 16-20, 2007
QC 20120524. 2012-01-22, 2012-01-22, 2012-05-24. Bibliographically approved