== Abstract ==
<pdf>Media:Draft_Sanchez Pinedo_3374061271282_abstract.pdf</pdf>
Exascale HPC systems are just about to become available. Such enormous simulation capabilities from the hardware perspective, however, require software that can make efficient use of this massively parallel hardware. For the domain-decomposition-based algorithms generally used for CFD, it has become apparent that running one MPI process, i.e., one domain, per CPU core is no longer apt when there are hundreds of cores available per cluster node: there are simply too many processes per node that need to communicate with other remote processes. Message aggregation is thus a key aspect of highly scalable parallel codes. A 2-level parallelization featuring a shared-memory level in addition to the MPI process level seems a promising solution. The CODA (CFD for ONERA, DLR, Airbus) software for high-fidelity compressible-flow simulations of industrial configurations implements such a 2-level domain-decomposition hybrid-parallel approach. The processing of unstructured meshes is particularly challenging with respect to load balancing and non-linear data access. For maximum parallel scalability, CODA's parallelization not only overlaps communication with computation on the process level, but also provides a dedicated Single Program, Multiple Data (SPMD) programming model for the shared-memory level. Here, the design principles of this parallelization concept are outlined, followed by details of its implementation in the Flucs infrastructure (FLIS), which has become part of CODA. Moreover, results of scalability studies with CODA are presented, which demonstrate the extreme scalability realized with this parallelization approach. The studies were performed on two distinct HPC clusters, one based on Intel's Xeon Scalable Processor, the other on AMD's EPYC.
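To make the 2-level idea more concrete, the following is a minimal, hypothetical sketch (not CODA/FLIS code) of one hybrid MPI + thread solver step: each MPI process owns a subdomain, exchanges aggregated halo messages with non-blocking calls, and lets the shared-memory level update the interior cells while the messages are in flight. The simple ring topology, buffer sizes, and function names are illustrative assumptions only; CODA's actual SPMD shared-memory model and communication layer are more elaborate than this.

<syntaxhighlight lang="cpp">
// Illustrative sketch of a 2-level (MPI + threads) step with
// communication/computation overlap; not taken from CODA/FLIS.
#include <mpi.h>
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Stand-in for the per-domain solver work on a range of cells.
static void update_cells(std::vector<double>& cells, std::size_t begin, std::size_t end) {
    for (std::size_t i = begin; i < end; ++i)
        cells[i] *= 0.5;  // placeholder for the actual flux/residual computation
}

int main(int argc, char** argv) {
    // Request thread support, since the shared-memory level runs alongside MPI.
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Message aggregation: all halo data for a neighbour travels in one message.
    const int left  = (rank - 1 + size) % size;
    const int right = (rank + 1) % size;
    constexpr int halo_size = 1024;              // aggregated halo per neighbour
    std::vector<double> cells(1 << 20, 1.0);     // local subdomain data
    std::vector<double> send_left(halo_size), send_right(halo_size);
    std::vector<double> recv_left(halo_size), recv_right(halo_size);

    // 1) Start the halo exchange with non-blocking MPI calls.
    MPI_Request reqs[4];
    MPI_Irecv(recv_left.data(),  halo_size, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(recv_right.data(), halo_size, MPI_DOUBLE, right, 1, MPI_COMM_WORLD, &reqs[1]);
    MPI_Isend(send_left.data(),  halo_size, MPI_DOUBLE, left,  1, MPI_COMM_WORLD, &reqs[2]);
    MPI_Isend(send_right.data(), halo_size, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[3]);

    // 2) Shared-memory level: each thread updates its slice of the interior
    //    cells while the halo messages are still in flight.
    const unsigned nthreads = std::max(1u, std::thread::hardware_concurrency());
    const std::size_t chunk = cells.size() / nthreads;
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < nthreads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = (t + 1 == nthreads) ? cells.size() : begin + chunk;
        workers.emplace_back(update_cells, std::ref(cells), begin, end);
    }
    for (auto& w : workers) w.join();

    // 3) Only now wait for the halo data, then finish the boundary cells.
    MPI_Waitall(4, reqs, MPI_STATUSES_IGNORE);
    // ... boundary update using recv_left / recv_right would go here ...

    MPI_Finalize();
    return 0;
}
</syntaxhighlight>

The point of the ordering is that the interior update hides the latency of the halo exchange: as long as the interior work per subdomain takes longer than the message transfer, the communication cost effectively disappears from the critical path.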
Published on 24/11/22
Accepted on 24/11/22
Submitted on 24/11/22
Volume Computational Fluid Dynamics, 2022
DOI: 10.23967/eccomas.2022.208
Licence: CC BY-NC-SA