# Consiglio Mazionale delle Ricezche

# ISTITUTO DI ELABORAZIONE DELLA INFORMAZIONE

PISA

VLSI LINEAR TRANSFORMATIONS OF BANDED MATRICES

B. Codenotti, G. Lotti

Nota interna B84-17

Settembre 1984

## VLSI LINEAR TRANSFORMATIONS OF BANDED MATRICES

# B. CODENOTTI1 and G. LOTTI2

1 Istituto di Elaborazione dell'Informazione - CNR, via S. Maria 46, PISA, ITALY.

<sup>2</sup>Dipartimento di Scienze dell'Informazione - University of Pisa, Corso Italia 40, PISA, ITALY.

#### AESTRACT

The complexity of the multiplication of a banded matrix by a vector is studied, with respect to the VLSI model.

Upper and lower bounds are showed, which improve the known results.

An upper bound, which is optimal up to logarithmic factors, is obtained for the case of fixed band.

Key Words - VLSI Design, Area-Time Complexity, Banded Matrix, Recursive algorithm.

#### 1. INTRODUCTION

In this paper, the linear transformation of a vector by a (2k+1)-diagonal matrix with variable entries, is studied.

This problem has been investigated in [KUNG], where the systolic arrays structure has been proposed, producing  $AT^2 = O(n^2k)$ .

By comparing this result with the lower bound obtained by using the well known tecnique of minimal information flow [THO], it seems possible to improve either the lower and the upper bound.

In fact, the minimal information flow results to be dependent on k and not on the size n of the matrix, while simple considerations based on the number of inputs, yield an  $\Omega$  (nk) lower bound.

On the other hand, the above mentioned upper bound can be also improved, in the case of (2k+1)-diagonal matrices, with k not increasing with n.

This assumption does not influence the validity of our result, since (2k+1)-diagonal matrices with fixed k represent a very important class of banded matrices (e.g. matrices arising from differential problems).

In section 3, the upper bound  $AT^2 = O(n k log^3 n)$ , is obtained, which meets the lower bound of section 2 up to logarithmic factors.

#### 2. LOWER BOUNDS

In this section, the well known technique introduced by Thompson [THO] to give lower bounds to the minimal information flow across a VISI circuit is considered. This technique has been successfully applied to the case of a linear transform, whose matrix has constant entries (e.g. DFT transform).

Our problem can be seen as a linear transform defined by a (2k+1)-diagonal matrix A(x), i.e. A(x)=(a), a=0 for ij ij

|i-j|>k, with varible entries.

Let

$$A(x) = \begin{bmatrix} A(x) & A(x) \\ 11 & 12 \\ A(x) & A(x) \\ 21 & 22 \end{bmatrix}$$
, the computation of the n-vector

z(x) = A(x) y is now splitted as a

$$z(x) = A(x)y + A(x)y$$
,

$$z_{2}(x) = A_{21}(x)y + A_{22}(x)y$$

where  $z(x) \cap z(x) = 0$ ,  $z(x) \cup z(x) = z(x)$ , |z(x)| = |z(x)| = n/2,

$$y \cap y = 0$$
,  $y \cup y = y$ ,  $|y| + |y| = n$ .

The straightforward extension of Thompson approach yields the following lower bound to the minimal flow I:

It is trivial to see that the above relation, in the case of banded matrices, gives a lower bound not depending on the size

n of the matrix. A more significant lower bound can be obtained by the following arguments.

If p=0(nk) and r= $\Theta$ (nk) are respectively the number of input gates and of input variables, then A= $\Omega$ (p) and T= $\Omega$ (r/p), that is AT<sup>2</sup>= $\Omega$ (n<sup>2</sup>k<sup>2</sup>/p)= $\Omega$ (nk).

The best known upper bound is AT2=O(n2k) [Kung] obtained by sistolic arrays. In the next section an area-time upper bound of order O(n log3n) is shown.

## 3. UPPER BOUNDS

Let n=2, let A be a (2k+1)-diagonal matrix of size B, partitioned as

$$A = \begin{bmatrix} U & V \\ W & Z \end{bmatrix}, \text{ where } U \text{ and } Z \text{ are } (2k+1) \text{-diagonal square}$$
 matrices of size n/2.

The computation of the matrix-vector product z=Ay can be performed as follows.

z , z are n/2-vectors; then 1 2

$$\begin{array}{rclcrcl}
 z & = & U & y & + & V & y \\
 1 & & & 1 & & 2 \\
 2 & = & W & y & + & Z & y & & \\
 2 & & & 1 & & 2 & & \\
 \end{array}$$

Formula (3.1) leads to the recursive design of fig. 1, where gates of type (1), (2), (3) transmit respectively the entries



Fig. 1 Recursive module structure.

of U, y, V; modules of type I perform the transform of size n/2; modules of type II perform the matrix-vector product of size k, which requires  $A = O(k \log k)$  and  $T = O(\log k)$ , using the well known structure of mesh of trees [LEI].

In order to allow a correct recursive implementation, the r=(k+1) (2n-k) inputs have to be sent with the following order

a a ... a b a ... a b ... a ... a b, 11 21 k1 1 12 k+1,2 2 n-k+1,n nn n throught the r input gates.

In fig.2, the recursive circuit is shown in more detail, in the case  $n=\mu$ , k=1.

The inner recursive step consists of  $(k+1) \times (k+1)$  (kxk) matrix-vector product, for k odd (even), which can be performed by using mesh of trees.

Let T(n), H(n), L(n) be respectively the time, the height and the width required by the circuit related to the problem of size n.

It readily follows that:  $T(n) = \max \left\{ T(n/2) , O(\log k) \right\} + O(1) = O(\log n),$ 

H(n) = 2H(n/2) + O(nk) = O(nk log n),

 $L(n) = L(n/2) + O(k \log k) = O(\log n),$ 

from which an AT2 upper bound of order O(nk log3n) is obtained.



Fig. 2. Detailed design for n=4 and k=1.

#### REFERENCES

- [1] Kung, H.T. and C.E. Leiserson, Algorithms of VISI Processor Arrays, in: C.A. Mead and L.A. Conway, eds., Introduction to VLSI Systems (Addison-Wesley, Reading, MA, 1980) pp. 271-292.
- [2] F.T.LEIGHTON, New Lower Bound Tecniques for VISI.

  Proceedings of the 22nd Annual IEEE Symposium on Foundations of

  Computer Science (1981), 1-12.
- [3] C.D. THCMPSON, Area-Time Complexity for VLSI. Proc. 11th Annual ACM Symp. on the Theory of Computing (SIGACT) (1979) 81-88.