
Go to COE498 Experiment | 1 | 2 | 3 | 4 | Part List | Lab manuals | ECE Lab home

COE498 Advanced Computer System Design Lab

Chapter 2

Experiment 2

Systolic-Array Implementation of Matrix-By-Matrix Multiplication

2.1 Objective

The multiplication of matrices is a very common operation in engineering and scientific problems. A sequential implementation of this operation is very time consuming for large matrices; the brute-force solution takes O(n³) time for n x n matrices. For this reason, several parallel algorithms have been developed to solve the problem more efficiently. This experiment presents a simple parallel algorithm for matrix multiplication; the objective is a "hardwired" (more precisely, systolic-array) implementation of that algorithm.

2.2 What You Need

Eight 8-bit shift registers.

Eight LED bar displays and associated SIP resistors (common lead).

Two EPM7064/68s or EPM7096/68s.

Other miscellaneous chips depending on your design.

c₁₁ = a₁₁ ● b₁₁ + a₁₂ ● b₂₁ + a₁₃ ● b₃₁ + a₁₄ ● b₄₁c₁₂ = a₁₁ ● b₁₂ + a₁₂ ● b₂₂ + a₁₃ ● b₃₂ + a₁₄ ● b₄₂c₁₃ = a₁₁ ● b₁₃ + a₁₂ ● b₂₃ + a₁₃ ● b₃₃ + a₁₄ ● b₄₃c₁₄ = a₁₁ ● b₁₄ + a₁₂ ● b₂₄ + a₁₃ ● b₃₄ + a₁₄ ● b₄₄c₂₁ = a₂₁ ● b₁₁ + a₂₂ ● b₂₁ + a₂₃ ● b₃₁ + a₂₄ ● b₄₁c₂₂ = a₂₁ ● b₁₂ + a₂₂ ● b₂₂ + a₂₃ ● b₃₂ + a₂₄ ● b₄₂c₂₃ = a₂₁ ● b₁₃ + a₂₂ ● b₂₃ + a₂₃ ● b₃₃ + a₂₄ ● b₄₃c₂₄ = a₂₁ ● b₁₄ + a₂₂ ● b₂₄ + a₂₃ ● b₃₄ + a₂₄ ● b₄₄c₃₁ = a₃₁ ● b₁₁ + a₃₂ ● b₂₁ + a₃₃ ● b₃₁ + a₃₄ ● b₄₁c₃₂ = a₃₁ ● b₁₂ + a₃₂ ● b₂₂ + a₃₃ ● b₃₂ + a₃₄ ● b₄₂c₃₃ = a₃₁ ● b₁₃ + a₃₂ ● b₂₃ + a₃₃ ● b₃₃ + a₃₄ ● b₄₃c₃₄ = a₃₁ ● b₁₄ + a₃₂ ● b₂₄ + a₃₃ ● b₃₄ + a₃₄ ● b₄₄c₄₁ = a₄₁ ● b₁₁ + a₄₂ ● b₂₁ + a₄₃ ● b₃₁ + a₄₄ ● b₄₁c₄₂ = a₄₁ ● b₁₂ + a₄₂ ● b₂₂ + a₄₃ ● b₃₂ + a₄₄ ● b₄₂c₄₃ = a₄₁ ● b₁₃ + a₄₂ ● b₂₃ + a₄₃ ● b₃₃ + a₄₄ ● b₄₃c₄₄ = a₄₁ ● b₁₄ + a₄₂ ● b₂₄ + a₄₃ ● b₃₄ + a₄₄ ● b₄₄

Figure 2.1: Multiplication of matrices of size 4 x 4.

2.3 Introduction

Two-dimensional, mesh-connected parallel computers are often used in a systolic-array configuration for the multiplication of matrices. For the sake of simplicity, we assume input matrices of size 4 x 4 containing one-bit integer elements. Figure 2.1 shows the operations to be performed. The symbols ● and + represent integer multiplication and addition, respectively.
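As a point of comparison for the hardware, the sixteen sums of Figure 2.1 can be written as a brute-force triple loop. The Python sketch below is illustrative only: the function name matmul_reference and the sample one-bit matrices A and B are assumptions, not part of the lab hand-out, and indices are zero-based rather than one-based.

```python
def matmul_reference(a, b):
    """Brute-force product of two n x n matrices given as lists of rows."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for r in range(n):
                # c_ij accumulates a_ir * b_rj, exactly as in Figure 2.1
                c[i][j] += a[i][r] * b[r][j]
    return c


# Hypothetical one-bit test matrices (not taken from the lab manual).
A = [[1, 0, 1, 1],
     [0, 1, 1, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 1]]
B = [[1, 1, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 1],
     [0, 1, 0, 1]]

C = matmul_reference(A, B)
for row in C:
    print(row)
```

The three nested loops make the O(n³) cost of the sequential approach explicit; the result C can serve as the expected value when checking simulation waveforms.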

The two matrices A and B are shifted into the boundary processors in column 1 and row 1, respectively, as shown in Figure 2.2. The leading and trailing 0s in rows and columns are employed so that elements a_ir and b_rj arrive at processor P_ij simultaneously, allowing the operation a_ir ● b_rj to be performed. c_ij is initialized to 0 in P_ij, for all i, j = 1, 2, 3, 4. At the end, processor P_ij will contain c_ij, for 1 ≤ i, j ≤ 4.
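The role of the leading and trailing 0s can be made concrete by generating the padded input streams. The sketch below assumes the common skew in which row i of A is delayed by i steps and column j of B is delayed by j steps (zero-based); the exact padding drawn in Figure 2.2 may differ, and the name skew_streams is hypothetical.

```python
def skew_streams(a, b):
    """Build zero-padded input streams for an n x n systolic array.

    west[i][t]  is the value fed into row i from the west at step t.
    north[j][t] is the value fed into column j from the north at step t.
    Assumption: row i of A and column j of B are delayed by i and j steps.
    """
    n = len(a)
    steps = 3 * n - 2  # enough steps for the last operands to reach the far corner
    west = [[0] * steps for _ in range(n)]
    north = [[0] * steps for _ in range(n)]
    for i in range(n):
        for r in range(n):
            west[i][i + r] = a[i][r]   # row i of A, shifted right by i slots
    for j in range(n):
        for r in range(n):
            north[j][j + r] = b[r][j]  # column j of B, shifted by j slots
    return west, north


# Small 2 x 2 example with distinct values so the padding is easy to see.
west, north = skew_streams([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(west)   # [[1, 2, 0, 0], [0, 3, 4, 0]]
print(north)  # [[5, 7, 0, 0], [0, 6, 8, 0]]
```

With this skew, a_ir enters row i at step i + r and needs j more steps to reach P_ij, while b_rj enters column j at step j + r and needs i more steps; both therefore arrive at step i + j + r.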

Figure 2.2: A 4 x 4 mesh (systolic array) of processors for matrix multiplication.

Whenever a processor P_ij receives two inputs b and a from the north and the west, respectively, it performs the following set of operations, in this order:

it calculates a ● b;

it adds the result to the previous value c_ij, and stores the result in c_ij;

it sends a to P_i,j+1, unless j = 4; and

it sends b to P_i+1,j, unless i = 4.

This algorithm takes time O(n), for n x n matrices.
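The per-processor rules and the linear running time can be checked with a small software model before any hardware is built. The following Python sketch is one possible simulation, not the lab's reference design: it assumes row i of A and column j of B are delayed by i and j steps (zero-based), and the names systolic_matmul, a_reg and b_reg are hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate an n x n systolic mesh; return (c, number_of_steps)."""
    n = len(a)
    c = [[0] * n for _ in range(n)]
    a_reg = [[0] * n for _ in range(n)]  # a value currently held by each processor
    b_reg = [[0] * n for _ in range(n)]  # b value currently held by each processor
    steps = 3 * n - 2                    # under the assumed skew
    for t in range(steps):
        # Shift phase: sweep from the far corner so that every processor
        # reads the value its west/north neighbour held in the previous step.
        for i in reversed(range(n)):
            for j in reversed(range(n)):
                if j > 0:
                    a_reg[i][j] = a_reg[i][j - 1]
                else:
                    r = t - i  # boundary input from the west, 0 when padded
                    a_reg[i][j] = a[i][r] if 0 <= r < n else 0
                if i > 0:
                    b_reg[i][j] = b_reg[i - 1][j]
                else:
                    r = t - j  # boundary input from the north, 0 when padded
                    b_reg[i][j] = b[r][j] if 0 <= r < n else 0
        # Compute phase: every processor multiplies and accumulates.
        for i in range(n):
            for j in range(n):
                c[i][j] += a_reg[i][j] * b_reg[i][j]
    return c, steps


# Hypothetical one-bit test matrices (not taken from the lab manual).
A = [[1, 0, 1, 1], [0, 1, 1, 0], [1, 1, 0, 1], [0, 0, 1, 1]]
B = [[1, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 1], [0, 1, 0, 1]]
C, steps = systolic_matmul(A, B)
print(C, steps)
```

Under these assumptions the last pair of operands meets in the far-corner processor at step 3n - 2, which is 10 steps for n = 4. The step count grows linearly with n because all n² processors work in parallel.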

2.4 Experiment

Implement this parallel algorithm directly in hardware using shift registers and the two Altera chips. Optimize your design with respect to the size of operands. Use LED bar graphs to display the intermediate and final results.
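One way to approach the operand-size optimization is to work out how wide each processor's multiplier and accumulator must be. The Python sketch below is only a sizing aid under the stated one-bit assumption; the helper name accumulator_bits is hypothetical and is not prescribed by the lab.

```python
from itertools import product


def accumulator_bits(n, operand_bits=1):
    """Bits needed to hold a sum of n products of unsigned operands."""
    max_operand = (1 << operand_bits) - 1
    max_sum = n * max_operand * max_operand  # every product at its maximum
    return max_sum.bit_length()


# For one-bit operands, integer multiplication coincides with logical AND,
# so each processor's multiplier can be a single AND gate.
assert all(a * b == (a & b) for a, b in product((0, 1), repeat=2))

print(accumulator_bits(4))     # 3: c_ij ranges from 0 to 4
print(accumulator_bits(4, 8))  # 18: for comparison, 8-bit operands
```

For the 4 x 4 one-bit case, each c_ij therefore fits in a 3-bit counter that increments whenever both incoming bits are 1.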

The proper operation of the entire design and each subsection is to be simulated in the Altera simulation software before the chips are programmed and the board is wired. The waveforms from these simulations should be included in the lab report.
