D1.5 Third release of MAX software: Final report on restructuring, exascale readiness and inter-code libraries

Baroni, Stefano; Carnimeo, Ivan; Degomme, Augustin; Delugas, Pietro; De Gironcoli, Stefano Maria; Marini, Andrea; Sangalli, Davide; Varsano, Daniele; Ferrari Ruffino, Fabrizio; Ferretti, Andrea; Garcia, Alberto; Genovese, Luigi; Giannozzi, Paolo; Kozhevnikov, Anton; Marri, Ivan; Spallanzani, Nicola; Wortmann, Daniel

This report summarizes the restructuring tasks carried out in WP1 to prepare the MAX codes for the forthcoming pre-exascale and exascale HPC platforms. The first milestone of this endeavour was reached with the preparation of the Software Development Plan, in which we identified the code functionalities to be modularised, the data structures to be encapsulated within each module, and the APIs needed to access these data and functionalities. Leveraging this intra-code design work, we were also able to identify modules that were feasible and profitable to recast as standalone libraries, eventually redesigning them for autonomous reuse outside the codes. The modularization work was essentially completed by the second year of the project, when the flagship codes had already acquired their current structure and most of the planned libraries had reached the production or final testing phase. Although further adjustments to the code structures were needed afterwards, D1.3 and D1.4 can be considered the second milestone of this Work Package. The last milestone of WP1 deals with making the software architecture of the MAX flagship codes robust and resilient against a hardware evolution scenario in which multiple, diverse HPC architectures are expected to emerge and become available. The technical details of these developments, together with the new features of the latest releases, are reported in the code-specific sections of this document. These sections show that, during the last year, there has been a significant effort to improve support for heterogeneous computing by offloading kernels and data structures to GPGPUs, with renewed attention to avoiding or removing instruction sets that are too specific to the CUDA programming model. Thanks to the previous restructuring work, it was possible to target and localise most of these activities in the computationally relevant kernels.
For some actions it was instead necessary to introduce offloading instructions in the science-specific layers of the codes. In these cases, directive-based programming models (such as OpenACC or OpenMP 5) are progressively replacing platform- or compiler-specific solutions (e.g. CUDA Fortran). To this end, the DevXlib library, designed and developed within MAX, provides an API and a macro-based abstraction layer to manage different programming models. While most of the earlier development on this side was tested only on CUDA cards, in this last year MAX code developers have started experimenting with the other accelerator cards that will be used in the forthcoming pre-exascale and (possibly) exascale machines. The work on this side is progressing rapidly, again thanks to the internal reorganisation of the codes already achieved. In conclusion, this further confirms that the restructuring of the MAX flagship codes has successfully prepared them for fast adaptation to the forthcoming pre-exascale, exascale, and even post-exascale HPC technologies.
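The idea of a macro abstraction layer over directive-based programming models can be illustrated with a minimal sketch, written in C for brevity. It is not DevXlib's actual API: the macro names `DEV_PRAGMA` and `DEV_LOOP_IN_INOUT`, the build flags `USE_OPENACC` and `USE_OPENMP_OFFLOAD`, and the kernel `dev_axpy` are all hypothetical. The point is that the kernel body is written once, and a compile-time switch decides whether the loop is offloaded through OpenACC, through OpenMP 5 target offload, or simply run on the host.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: turns macro text into a _Pragma (C99) directive. */
#define DEV_PRAGMA(x) _Pragma(#x)

/* Hypothetical backend selection, chosen at compile time by a build flag.
   Each branch expands to a parallel-loop directive that copies `in` to the
   device and copies `inout` to the device and back. */
#if defined(USE_OPENACC)
#  define DEV_LOOP_IN_INOUT(in, inout, n) \
     DEV_PRAGMA(acc parallel loop copyin(in[0:n]) copy(inout[0:n]))
#elif defined(USE_OPENMP_OFFLOAD)
#  define DEV_LOOP_IN_INOUT(in, inout, n) \
     DEV_PRAGMA(omp target teams distribute parallel for map(to: in[0:n]) map(tofrom: inout[0:n]))
#else
   /* Host fallback: the macro expands to nothing, leaving a plain serial loop. */
#  define DEV_LOOP_IN_INOUT(in, inout, n)
#endif

/* Hypothetical kernel: y <- a*x + y, with the same source for every backend. */
void dev_axpy(size_t n, double a, const double *x, double *y) {
    assert(n == 0 || (x != NULL && y != NULL));
    DEV_LOOP_IN_INOUT(x, y, n)
    for (size_t i = 0; i < n; ++i) {
        y[i] += a * x[i];
    }
}
```

Compiled without either flag, `dev_axpy` is an ordinary host loop. With `-DUSE_OPENACC` or `-DUSE_OPENMP_OFFLOAD` (plus the compiler's OpenACC or OpenMP offload option), the same source is offloaded. This mirrors, in a simplified form, why the science-specific layers do not need to change when the target programming model does.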