Although sort has been extensively studied in many research works, it still remains a challenge in particular if we consider the implications of novel processor technologies such as manycores (i.e. GPUs, Cell/BE, multicore, etc.). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and we discuss their viability on large datasets (such as those managed by search engines). In order to fully exploit the potentiality of the underlying architecture, we designed an optimized version of sorting network in the K-model, a novel computational model designed to consider all the important features of many-core architectures. According to K-model, our bitonic sorting network mapping improves the three main aspects of many-core architectures, i.e. the processors exploitation, and the on-chip/off-chip memory bandwidth utilization. Furthermore we are able to attain a space complexity of O(1). We experimentally compare our solution with state-of-the-art ones (namely, quick-sort and radix-sort) on GPUs. We also compute the complexity in the K-model for such algorithms. The conducted evaluation highlight that our bitonic sorting network is faster than quick-sort and slightly slower than radix, yet being an in-place solution it consumes less memory than both algorithms.

Sorting on GPUs for large scale datasets: a thorough comparison

2012

Abstract

Although sort has been extensively studied in many research works, it still remains a challenge in particular if we consider the implications of novel processor technologies such as manycores (i.e. GPUs, Cell/BE, multicore, etc.). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and we discuss their viability on large datasets (such as those managed by search engines). In order to fully exploit the potentiality of the underlying architecture, we designed an optimized version of sorting network in the K-model, a novel computational model designed to consider all the important features of many-core architectures. According to K-model, our bitonic sorting network mapping improves the three main aspects of many-core architectures, i.e. the processors exploitation, and the on-chip/off-chip memory bandwidth utilization. Furthermore we are able to attain a space complexity of O(1). We experimentally compare our solution with state-of-the-art ones (namely, quick-sort and radix-sort) on GPUs. We also compute the complexity in the K-model for such algorithms. The conducted evaluation highlight that our bitonic sorting network is faster than quick-sort and slightly slower than radix, yet being an in-place solution it consumes less memory than both algorithms.
2012
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Algorithms performance sorting
Algorithms performance sorting
Stream programming
Graphical processor unit
Bitonic sorting network
File in questo prodotto:
File Dimensione Formato  
prod_199631-doc_43759.pdf

solo utenti autorizzati

Descrizione: Sorting on GPUs for large scale datasets: a thorough comparison
Tipologia: Versione Editoriale (PDF)
Dimensione 536.22 kB
Formato Adobe PDF
536.22 kB Adobe PDF   Visualizza/Apri   Richiedi una copia
prod_199631-doc_77988.pdf

solo utenti autorizzati

Descrizione: Sorting on GPUs for large scale datasets: a thorough comparison
Tipologia: Versione Editoriale (PDF)
Dimensione 298.9 kB
Formato Adobe PDF
298.9 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/18714
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 19
  • ???jsp.display-item.citation.isi??? 8
social impact