Massimiliano Guarrasi (Cineca), Giovanni Erbacci (Cineca) and Andrew Emerson (Cineca)
In this paper we present the work carried out by CINECA within the PRACE-2IP project to improve the performance of the FFTW library by refining the auto-tuning mechanism already implemented in it. This optimization was realized with the following activities:
In particular, we compared the performance of the standard Slab Decomposition algorithm already present in the library with that of a 2D Domain Decomposition, and found that on massively parallel supercomputers the new algorithm performs significantly better.
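To illustrate why the choice of decomposition matters at large process counts, the following minimal sketch computes the local array shape each process would own for an n x n x n grid under the two schemes. It is not the FFTW API and not the implementation described in the whitepaper; the function names (`block_range`, `slab_local_shape`, `pencil_local_shape`, `idle_ranks_slab`) and the even block distribution are assumptions made only for this example.

```python
def block_range(n, parts, index):
    """Return the half-open range [start, stop) of block `index` when n items
    are split as evenly as possible into `parts` contiguous blocks."""
    base, extra = divmod(n, parts)
    start = index * base + min(index, extra)
    stop = start + base + (1 if index < extra else 0)
    return start, stop


def slab_local_shape(n, nprocs, rank):
    """Slab (1D) decomposition: only the first axis is split across processes,
    so at most n processes can receive a non-empty slab."""
    start, stop = block_range(n, nprocs, rank)
    return (stop - start, n, n)


def pencil_local_shape(n, prows, pcols, rank):
    """2D (pencil) decomposition: the first two axes are split over a
    prows x pcols process grid, so up to n * n processes can hold data."""
    row, col = divmod(rank, pcols)
    start0, stop0 = block_range(n, prows, row)
    start1, stop1 = block_range(n, pcols, col)
    return (stop0 - start0, stop1 - start1, n)


def idle_ranks_slab(n, nprocs):
    """Count the processes left without data by the slab decomposition."""
    return sum(
        1 for rank in range(nprocs)
        if slab_local_shape(n, nprocs, rank)[0] == 0
    )


if __name__ == "__main__":
    n, nprocs = 8, 16
    print("idle ranks with slabs:", idle_ranks_slab(n, nprocs))
    print("pencil shape on rank 5:", pencil_local_shape(n, 4, 4, 5))
```

With an 8 x 8 x 8 grid and 16 processes, the slab scheme has only 8 planes to hand out, so half of the processes receive no data, whereas a 4 x 4 process grid gives every process a pencil of the same size. This scaling limit of slabs (at most n processes for an n x n x n grid) is the general motivation for 2D decompositions on massively parallel machines.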
Read more: PRACE whitepaper