Intel's latest Xeon Phi processor, Knights Landing (KNL), offers a theoretical peak performance of over 2.6 TFLOPS. However, obtaining maximum performance on KNL still requires significant refactoring and optimization of application codes to exploit its key architectural innovations: wide vector units, a many-core node design, and a deep memory hierarchy. This paper describes the experience and insights gained in porting and running FEFLO (a typical edge-based Finite Element code for the solution of compressible and incompressible FLOws) on the KNL platform. In particular, it considers optimizations that extract on-node parallelism via vectorization and multithreading, as well as optimizations that improve inter-node communication. Together, these optimizations yielded a 2.3x performance gain on a 16-node run of FEFLO, with the potential for larger gains as the code is scaled beyond 16 nodes. The paper also explores how the different configurations of KNL's on-package MCDRAM (Multi-Channel DRAM) affect FEFLO's performance. Finally, it compares the performance of the optimized versions of FEFLO on KNL and on Haswell (Intel Xeon).
Published on 01/01/2018
DOI: 10.1002/fld.4474
Licence: CC BY-NC-SA