An LU factorization algorithm with maximum efficiency on parallel-vector computers

Corana, A; Martini, C; Morando, M; Ridella, S; Rolando, C

A technique for solving dense linear systems is presented that reaches maximum performance on attached processors such as the FPS-120, FPS 5000 and X64 series, using Fortran with calls to the vector routines. Starting from Dongarra's LU factorization algorithm, the key idea is to carry out a pseudo-transposition of the lower triangular matrix L (including the main diagonal) around the minor diagonal. The pseudo-transposition makes it possible to perform all the matrix-vector operations involved in LU factorization with stride-1 dot products only, which, using the TM Auxiliary Memory and the TMDOT routine, can be executed on the FPS processor at maximum speed. Since the algorithm uses only vector instructions, it is fully portable to all FPS 38/64-bit machines and, in general, to all vector computers with a similar memory structure. Furthermore, the algorithm can easily be translated into the new FORTRAN 8X, which will probably become the standard for future SIMD computers for numerical applications. The algorithm has been implemented on an FPS-100, yielding an asymptotic speed r_∞ = 8 MegaFLOPS (the FPS-100 peak performance) and a half-performance length N_1/2 = 235. The N_1/2 value could be lowered by coding some critical parts in the APAL assembly language, at the cost of code portability.
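The abstract does not spell out the algorithm, so the following Python sketch is only an illustration under stated assumptions, not the authors' code or the TMDOT interface. It assumes a Crout-style dot-product LU (A = LU with U unit upper triangular, so L carries the main diagonal), performs no pivoting, and reads "pseudo-transposition around the minor diagonal" as storing L[i][p] at row n-1-p, column n-1-i of a Fortran-style column-major array, simulated here with a flat list. The names `dot`, `lu_antidiag` and `unpack` are hypothetical.

```python
# Hypothetical sketch, not the authors' code. Assumptions: Crout-style LU
# (A = L*U, U unit upper triangular, L keeps the main diagonal), no pivoting,
# and "pseudo-transposition about the minor diagonal" read as storing
# L[i][p] at row n-1-p, column n-1-i of a column-major n x n array.


def dot(x, xoff, incx, y, yoff, incy, count):
    """Strided dot product in the spirit of BLAS DDOT (0-based offsets)."""
    total = 0.0
    for t in range(count):
        total += x[xoff + t * incx] * y[yoff + t * incy]
    return total


def lu_antidiag(a):
    """Factor the n x n matrix a (list of rows) into a flat column-major list s.

    U[p][j] (p < j) is stored at s[p + j*n]; L[i][p] (i >= p) at
    s[(n-1-p) + (n-1-i)*n]. The input a is left unchanged.
    """
    n = len(a)
    s = [0.0] * (n * n)

    def l_base(i):
        # Address of L[i][0]; L[i][p] sits at l_base(i) - p, so row i of L
        # is one contiguous run of a single column, walked with increment -1.
        return (n - 1) + (n - 1 - i) * n

    for k in range(n):
        # Column k of L: L[i][k] = a[i][k] - (row i of L) . (column k of U)
        for i in range(k, n):
            s[l_base(i) - k] = a[i][k] - dot(s, l_base(i), -1, s, k * n, 1, k)
        pivot = s[l_base(k) - k]
        if pivot == 0.0:
            raise ZeroDivisionError("zero pivot: this sketch does not pivot")
        # Row k of U: U[k][j] = (a[k][j] - (row k of L) . (column j of U)) / L[k][k]
        for j in range(k + 1, n):
            s[k + j * n] = (a[k][j] - dot(s, l_base(k), -1, s, j * n, 1, k)) / pivot
    return s


def unpack(s, n):
    """Rebuild explicit L and U (lists of rows) from the packed array s."""
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i in range(n):
        for p in range(i + 1):
            L[i][p] = s[(n - 1 - p) + (n - 1 - i) * n]
        for j in range(i + 1, n):
            U[i][j] = s[i + j * n]
    return L, U


if __name__ == "__main__":
    a = [[4.0, 3.0, 2.0], [2.0, 1.0, 3.0], [3.0, 2.0, 1.0]]
    L, U = unpack(lu_antidiag(a), 3)
    print(L)  # lower triangular, diagonal included
    print(U)  # unit upper triangular
```

Under this reading the reflection keeps L in the lower triangle (row n-1-p is never smaller than column n-1-i when i >= p), so L and the strict upper triangle of U share one n x n array without colliding, and every inner operation is a dot product of a contiguous run of L with a contiguous column of U. In this sketch the L operand is walked with increment -1 and the U operand with +1; how the paper arranges stride-1 access for both TMDOT operands is not reproduced here.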

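To read the reported figures r_∞ = 8 MegaFLOPS and N_1/2 = 235, one can assume Hockney's two-parameter model r(N) = r_∞ · N / (N + N_1/2), taking N to be the matrix order. The abstract does not state which model underlies N_1/2, so this is an assumption, and the function names `rate` and `order_for_fraction` below are hypothetical.

```python
# Hockney-style performance model (assumed): r(N) = r_inf * N / (N + n_half),
# with N taken to be the order of the matrix being factorized.
R_INF_MFLOPS = 8.0  # asymptotic rate reported for the FPS-100
N_HALF = 235.0      # half-performance length reported in the abstract


def rate(n, r_inf=R_INF_MFLOPS, n_half=N_HALF):
    """Predicted rate in MegaFLOPS for a matrix of order n."""
    return r_inf * n / (n + n_half)


def order_for_fraction(f, n_half=N_HALF):
    """Matrix order at which the model reaches the fraction f of r_inf (0 < f < 1).

    Solves f = n / (n + n_half) for n.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    return n_half * f / (1.0 - f)


if __name__ == "__main__":
    for n in (50, 100, 235, 500, 1000):
        print(f"N = {n:5d}: {rate(n):.2f} MegaFLOPS")
    print(f"90% of r_inf near N = {order_for_fraction(0.9):.0f}")
```

Under this model the rate is exactly half of r_∞ at N = 235 and reaches 90% of r_∞ only around N ≈ 2115, which illustrates why lowering N_1/2 (for example through the APAL coding mentioned in the abstract) matters for moderate problem sizes.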
1988

Year: 1988
Organizational units: Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT
Keywords: algorithms; LU factorization; vector computers; efficiency; performance evaluation
Item type: 02.01 Contribution in volume (chapter or essay)

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/309933

CNR Institutional Research Information System
