CNR Institutional Research Information System

Effective Data Access Patterns on Massively Parallel Processors

Capannini G; Baraglia R; Silvestri F; Nardini F M

2014

Abstract

The new generation of microprocessors incorporates a very large number of cores on the same chip, trading single-core performance for the total amount of work done across multiple threads of execution. Graphics Processing Units (GPUs) are an example of this kind of architecture. The first generation of GPUs was designed to support a fixed set of rendering functions. Nowadays, GPUs are becoming easier to program, so they can be used for applications that have traditionally been handled by CPUs. The reasons for using General-Purpose GPUs (GPGPUs) in high-performance computing are raw computing power, good performance per watt, and low cost. However, some important issues limit wider exploitation of GPGPUs. The main one concerns the heterogeneous and distributed nature of the memory hierarchy. As a consequence, the speed-up of some applications depends on accessing data efficiently enough that all cores can work at the same time. This chapter discusses the characteristics and issues of the memory systems of this kind of architecture. We analyze these architectures from a theoretical point of view using K-model, a model that captures their performance constraints and is used to estimate the complexity of an algorithm defined on it. The chapter also describes how K-model can guide the design of data access patterns for efficient GPU algorithms. To this end, we use K-model to derive efficient realizations of two popular algorithms, prefix sum and sorting. By means of reproducible experiments, we validate the theoretical results, showing that optimizing an algorithm according to K-model corresponds to an actual improvement in practice.

	Year
				2014
	Organizational units
				Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
	ISBN
				978-1-118-71205-4
	Keywords
				k-model
				GPU computing
	Appears in types:
				02.01 Contribution in volume (Chapter or Essay)

Files in this record:

File	Description	Type	Size	Format	Access
prod_332959-doc_103224.pdf	Effective data access patterns on massively parallel processors	Publisher's version (PDF)	463.19 kB	Adobe PDF	Authorized users only

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/20.500.14243/290266
