Development of a Parallel Two-Dimensional Mixed-Radix FFT Routine
Carracciuolo L.
1997
Abstract
In this paper we describe a parallel FFT routine for MIMD distributed-memory machines that performs two-dimensional mixed-radix complex-to-complex FFTs. It was developed for the software library of PINEAPL, an ESPRIT project funded by the European Commission whose main aim is to produce a general-purpose library of parallel numerical software suitable for a wide range of computationally intensive industrial applications. The parallel FFT algorithm used in the routine is based on a row-block distribution of the matrix to be transformed and has two main computational kernels: one executes multiple one-dimensional FFTs, and the other computes the transpose of a matrix distributed in row-block form. The routine is implemented in Fortran 77 and uses the BLACS communication library. Its accuracy has been analyzed by comparing experimental results on an IBM RISC System/6000 with known theoretical error bounds. Its parallel performance has been evaluated on an Intel iPSC/860 by measuring execution time, efficiency and scaled efficiency. Satisfactory results have been obtained.
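The two kernels described above can be illustrated with a minimal sequential sketch that simulates P processes in one address space. All function names here (`block_rows`, `fft_mixed_radix`, `transpose_row_block`, `fft2_row_block`) are hypothetical. The list-based exchange stands in for BLACS message passing, the recursive decimation-in-time factorization is one common mixed-radix scheme and not necessarily the one used in the PINEAPL routine, and the final transpose back to the original layout is an assumption (a routine may instead return the result in transposed order).

```python
import cmath


def block_rows(n_rows, nprocs, rank):
    """Row range [lo, hi) owned by `rank` in a row-block distribution."""
    base, extra = divmod(n_rows, nprocs)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi


def smallest_factor(n):
    """Smallest prime factor of n (n >= 2)."""
    f = 2
    while f * f <= n:
        if n % f == 0:
            return f
        f += 1
    return n


def fft_mixed_radix(x):
    """Forward 1D FFT by recursive mixed-radix decimation in time."""
    n = len(x)
    if n == 1:
        return list(x)
    p = smallest_factor(n)
    if p == n:
        # Prime length: fall back to a direct O(n^2) DFT.
        return [sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
                for k in range(n)]
    m = n // p
    # Transform the p interleaved subsequences x[r], x[r+p], x[r+2p], ...
    subs = [fft_mixed_radix(x[r::p]) for r in range(p)]
    # Combine them with twiddle factors W_n^(r*k).
    return [sum(cmath.exp(-2j * cmath.pi * r * k / n) * subs[r][k % m] for r in range(p))
            for k in range(n)]


def transpose_row_block(local, n_rows, n_cols, nprocs):
    """Simulated all-to-all transpose of a row-block distributed matrix.

    local[p] holds the rows of an n_rows x n_cols matrix owned by process p;
    the result holds the row blocks of its n_cols x n_rows transpose.
    """
    result = []
    for q in range(nprocs):
        c_lo, c_hi = block_rows(n_cols, nprocs, q)
        new_block = [[0j] * n_rows for _ in range(c_hi - c_lo)]
        for p in range(nprocs):
            r_lo, _ = block_rows(n_rows, nprocs, p)
            # Piece that process p would send to q: its rows, columns [c_lo, c_hi).
            for i, row in enumerate(local[p]):
                for c in range(c_lo, c_hi):
                    new_block[c - c_lo][r_lo + i] = row[c]
        result.append(new_block)
    return result


def fft2_row_block(a, nprocs):
    """2D FFT of matrix `a` using row FFTs, a distributed transpose, row FFTs."""
    n1, n2 = len(a), len(a[0])
    local = [a[lo:hi] for lo, hi in (block_rows(n1, nprocs, p) for p in range(nprocs))]
    # Kernel 1: each process transforms the rows it owns.
    local = [[fft_mixed_radix(row) for row in blk] for blk in local]
    # Kernel 2: global transpose, so original columns become local rows.
    local = transpose_row_block(local, n1, n2, nprocs)
    local = [[fft_mixed_radix(row) for row in blk] for blk in local]
    # Assumption: transpose back to restore the original row-block layout.
    local = transpose_row_block(local, n2, n1, nprocs)
    return [row for blk in local for row in blk]


def dft2(a):
    """Direct O((n1*n2)^2) 2D DFT, used only as a reference."""
    n1, n2 = len(a), len(a[0])
    return [[sum(a[r][c] * cmath.exp(-2j * cmath.pi * (r * k1 / n1 + c * k2 / n2))
                 for r in range(n1) for c in range(n2))
             for k2 in range(n2)] for k1 in range(n1)]
```

For example, `fft2_row_block(a, 4)` on a 6 x 10 matrix (mixed radices 2·3 and 2·5, with row counts not divisible by the number of processes) should agree with `dft2(a)` up to rounding error. In a real MIMD code each process holds only its own block, and the inner loop over `p` in `transpose_row_block` corresponds to the pieces exchanged during the all-to-all communication step.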