Improving Efficiency in Parallel Computing Leveraging Local Synchronization
Franco Cicirelli;Andrea Giordano;Carlo Mastroianni
2019
Abstract
In parallel computing, a complex task is typically split among many computing resources, each of which performs a portion of the task in parallel. Except for a very limited class of applications, the computing resources need to coordinate with each other in order to carry out the parallel execution consistently. As a consequence, a synchronization overhead arises, which can significantly impair overall execution performance. Typically, synchronization is achieved by adopting a centralized synchronization barrier involving all the computing resources. In many application domains, though, this kind of global synchronization can be relaxed and a leaner synchronization scheme, namely local synchronization, can be exploited. With local synchronization, each computing resource needs to synchronize only with a subset of the other computing resources. In this work, we evaluate the performance of the local synchronization mechanism compared with the global synchronization scenario. The key performance indicator is the efficiency index, that is, the speedup normalized with respect to the number of computing nodes. The efficiency trend is evaluated both analytically and through numerical simulation. More specifically, the analytical study exploits extreme value theory for the case of global synchronization, whereas max-plus algebra is used for the case of local synchronization.
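As a rough illustration of the comparison described in the abstract, the following minimal sketch simulates both synchronization schemes and reports the efficiency index (speedup divided by the number of nodes). It is not the authors' simulator. The ring topology, the exponentially distributed step durations, the zero communication cost, and all names (`simulate`, `radius`) are assumptions made only for this example.

```python
import random


def simulate(n_nodes, n_steps, radius=1, seed=0):
    """Return (efficiency_global, efficiency_local) for a toy workload.

    Assumptions (not from the paper): nodes sit on a ring, each step's
    compute time is Exp(1), and synchronization itself costs nothing.
    """
    rng = random.Random(seed)
    t_global = 0.0                 # completion time under a global barrier
    t_local = [0.0] * n_nodes      # per-node completion times, local sync
    total_work = 0.0               # serial execution time (sum of all work)

    for _ in range(n_steps):
        durations = [rng.expovariate(1.0) for _ in range(n_nodes)]
        total_work += sum(durations)

        # Global barrier: every step lasts as long as the slowest node.
        t_global += max(durations)

        # Local sync: node i starts once it and its ring neighbours within
        # `radius` have finished the previous step (a max-plus recurrence).
        t_local = [
            max(t_local[(i + d) % n_nodes] for d in range(-radius, radius + 1))
            + durations[i]
            for i in range(n_nodes)
        ]

    # Efficiency = speedup / n_nodes = serial time / (n_nodes * parallel time).
    eff_global = total_work / (n_nodes * t_global)
    eff_local = total_work / (n_nodes * max(t_local))
    return eff_global, eff_local


if __name__ == "__main__":
    for n in (4, 16, 64):
        eg, el = simulate(n, n_steps=500)
        print(f"nodes={n:3d}  global={eg:.3f}  local={el:.3f}")
```

In this toy model the local scheme can never finish later than the global one, because each node waits only for a subset of the nodes that a global barrier would wait for. How large the gap is depends on the assumed duration distribution and neighbourhood size.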


