This work investigates the application and interaction of optimization techniques and performance models in a computational fluid dynamics (CFD) approach employing an OpenMP parallelized, explicit, weakly compressible, finite difference–based solver for the incompressible Navier–Stokes equations using a five-point wide stencil. The presented loop and stencil optimizations lead to a 6.8× increase in per core throughput. In order to verify optimal CPU utilization, performance models are applied to the tuned code. Three different performance models are considered: a roofline-based model, utilizing purely theoretical figures, one which is enhanced by measurements, and the execution cache memory model. It is shown that the models provide reliable estimates for simple benchmarks, such as seven-point stencils for scalar Laplacians, but the estimate quality is significantly worse for the complex and tuned stencil. While it is possible to include even more details in the model, it eventually leads to a state in which it purely reproduces the benchmarks from which it was derived. Thus, the applied general-purpose performance models are found to inaccurately predict the actual performance. They overestimate the achievable performance by more than about 97% for highly tuned code. Through further code tuning, 66% of the predicted performance could be achieved.
Published on 01/01/2018
DOI: 10.1177/1094342018774126
Licence: CC BY-NC-SA license
Are you one of the authors of this document?