CNR Institutional Research Information System

We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting Nvidia Graphics Processing Unit (GPU) accelerators. The work extends previous efforts of some of the authors in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the GPU kernels. Strong and weak scalability results of the new version of the library on well-known benchmark test cases are discussed. Comparisons with the Nvidia AmgX solution show a speedup, in the solve phase, up to 2.0x.

A multi-GPU aggregation-based AMG preconditioner for iterative linear solvers

M Bernaschi^Software;A Celestini^{Membro del Collaboration Group};F Vella^{Membro del Collaboration Group};P D'Ambra^{Conceptualization}

2023

Abstract

We present and release in open source format a sparse linear solver which efficiently exploits heterogeneous parallel computers. The solver can be easily integrated into scientific applications that need to solve large and sparse linear systems on modern parallel computers made of hybrid nodes hosting Nvidia Graphics Processing Unit (GPU) accelerators. The work extends previous efforts of some of the authors in the exploitation of a single GPU accelerator and proposes an implementation, based on the hybrid MPI-CUDA software environment, of a Krylov-type linear solver relying on an efficient Algebraic MultiGrid (AMG) preconditioner already available in the BootCMatchG library. Our design for the hybrid implementation has been driven by the best practices for minimizing data communication overhead when multiple GPUs are employed, yet preserving the efficiency of the GPU kernels. Strong and weak scalability results of the new version of the library on well-known benchmark test cases are discussed. Comparisons with the Nvidia AmgX solution show a speedup, in the solve phase, up to 2.0x.

Scheda breve

Scheda completa

Scheda completa (DC)

	Anno
	
				2023
			
	Strutture organizzative
	
				Istituto Applicazioni del Calcolo ''Mauro Picone''
			
	Parole chiave
	
				GPU accelerators
heterogeneous computing
iterative sparse linear solvers
parallel numerical algorithms
scalability
			
	Appare nelle tipologie:
	
				01.01 Articolo in rivista

File in questo prodotto:

File	Dimensione	Formato
IEEE2023.pdf non disponibili Descrizione: articolo in rivista Tipologia: Versione Editoriale (PDF) Licenza: NON PUBBLICO - Accesso privato/ristretto Dimensione 983.89 kB Formato Adobe PDF Visualizza/Apri Richiedi una copia	983.89 kB	Adobe PDF	Visualizza/Apri Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/456142

Citazioni

ND

8

5

social impact