Ring algorithms on heterogeneous clusters with PVM: performance analysis and modeling

A Corana
2003

Abstract

We analyze a ring algorithm for the computation of long-range interactions, and a modified version, which uses a box-assisted approach with linked lists, for the computation of short-range interactions. The general problem is exemplified by the computation of the distances between the points in a set and of their histogram. The algorithms, originally developed for homogeneous parallel systems, where they yield a nearly linear speed-up, are ported to heterogeneous systems (e.g. networks of workstations, NOW). The main part of our work analyzes the performance obtainable on such systems using a virtual ring of processes and assigning to each node a number of processes proportional to its relative speed. Following our analysis, we implemented a computer simulator, which allows us to investigate some interesting properties of ring algorithms and to predict the experimental results with good accuracy. Simulations and trials show that the use of multiple processes per node greatly reduces load imbalance, which is the major cause of performance loss, without significant context-switching overhead, allowing good performance even on highly heterogeneous systems. The short-range interaction problem is interesting because, by varying the neighbourhood size, we are able to vary the computation-to-communication ratio of the algorithm. The proposed analysis is general and applies to any regular data-parallel ring-based application.
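As an illustration of the long-range case, the following is a minimal sketch of a ring scheme for the histogram of all pairwise distances, simulated sequentially in a single Python process rather than with PVM. The function name `ring_distance_histogram`, its parameters, the round-robin distribution of points, and the handling of the last step for an even number of processes are assumptions made for this example; they are not taken from the paper.

```python
import math
from itertools import combinations


def ring_distance_histogram(points, n_procs, n_bins, r_max):
    """Sequentially simulate a ring of n_procs processes (hypothetical helper).

    Each simulated process owns one block of points and a partial histogram.
    Distances larger than r_max are counted in the last bin.
    """
    # Round-robin distribution of the points among the simulated processes.
    blocks = [points[p::n_procs] for p in range(n_procs)]
    width = r_max / n_bins
    partial = [[0] * n_bins for _ in range(n_procs)]

    def add(hist, a, b):
        k = min(int(math.dist(a, b) / width), n_bins - 1)
        hist[k] += 1

    # Step 0: every process handles the pairs inside its own block.
    for p in range(n_procs):
        for a, b in combinations(blocks[p], 2):
            add(partial[p], a, b)

    # Ring steps: at step s, process p holds the block that started on
    # process (p - s) mod n_procs. In a real ring the block would arrive
    # by a message from the previous neighbour; here it is indexed directly.
    for step in range(1, n_procs // 2 + 1):
        last_even_step = n_procs % 2 == 0 and step == n_procs // 2
        for p in range(n_procs):
            # Assumption: on the last step of an even ring each block pair
            # meets twice, so only the first half of the ring computes.
            if last_even_step and p >= n_procs // 2:
                continue
            visiting = blocks[(p - step) % n_procs]
            for a in blocks[p]:
                for b in visiting:
                    add(partial[p], a, b)

    # Final reduction of the partial histograms.
    return [sum(h[k] for h in partial) for k in range(n_bins)]
```

In this sketch every unordered pair of points is counted exactly once, so the result does not depend on `n_procs`; only the distribution of work among the simulated processes changes.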
Istituto di Elettronica e di Ingegneria dell'Informazione e delle Telecomunicazioni - IEIIT
heterogeneous computing systems; ring algorithms; computation of long- and short-range interactions; PVM; performance evaluation

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/49142