In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a omega(k)-bit, (k greater than or equal to 2), sequence using as basic building blocks linear arrays of at most omega(2) shift switches, where omega is a small power of 2. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most omega(2). We adopt a VLSI delay model where the "length" of a bus is proportional with the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a omega(k)-bit binary sequence in the time of 2k - 2 broadcasts, while the corresponding prefix sums can be computed in the time of 3k - 4 broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most w?, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a k omega(k)-bit binary sequence in the time of 3k + [log(omega) k] - 3 broadcasts. Using this design, the corresponding prefix sums can be computed in the time of 4k + [log(omega) k] - 5 broadcasts.

Scalable hardware-algorithms for binary prefix sums

2000

Abstract

In this work, we address the problem of designing efficient and scalable hardware-algorithms for computing the sum and prefix sums of a omega(k)-bit, (k greater than or equal to 2), sequence using as basic building blocks linear arrays of at most omega(2) shift switches, where omega is a small power of 2. An immediate consequence of this feature is that in our designs broadcasts are limited to buses of length at most omega(2). We adopt a VLSI delay model where the "length" of a bus is proportional with the number of devices on the bus. We begin by discussing a hardware-algorithm that computes the sum of a omega(k)-bit binary sequence in the time of 2k - 2 broadcasts, while the corresponding prefix sums can be computed in the time of 3k - 4 broadcasts. Quite remarkably, in spite of the fact that our hardware-algorithm uses only linear arrays of size at most w?, the total number of broadcasts involved is less than three times the number required by an "ideal" design. We then go on to propose a second hardware-algorithm, operating in pipelined fashion, that computes the sum of a k omega(k)-bit binary sequence in the time of 3k + [log(omega) k] - 3 broadcasts. Using this design, the corresponding prefix sums can be computed in the time of 4k + [log(omega) k] - 5 broadcasts.
2000
Istituto di Scienza e Tecnologie dell'Informazione "Alessandro Faedo" - ISTI
Hardware-algorithms
Shift switching
Binary prefix sums
Binary counting
Scalable architectures
Pipelining
Special purpose and application based systems
File in questo prodotto:
File Dimensione Formato  
prod_406021-doc_141942.pdf

solo utenti autorizzati

Descrizione: Scalable hardware-algorithms for binary prefix sums
Tipologia: Versione Editoriale (PDF)
Dimensione 387.97 kB
Formato Adobe PDF
387.97 kB Adobe PDF   Visualizza/Apri   Richiedi una copia

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/20.500.14243/388099
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 20
  • ???jsp.display-item.citation.isi??? 14
social impact