1. Introduction
2. Tests
3. How to compile
4. How to run





1. Introduction

assess.c assesses the performance of a parallel machine using
a set of architecture-independent operations that measure communication
and synchronization efficiency.
Currently, the only standard supported is the BSP Worldwide standard.
The code has been debugged and tested under the BSPlib library;
it is likely to work with the PUB library as well.
The file ai.h is a crude (and naive) first step toward allowing other
communication libraries to be used.

If you plan to set up a PC cluster under BSPlib, the file cluster.how may
be of help.

2. Tests
The code performs the following tests.
A. Checking the synchronization periodicity of the interprocessor
   communication network. Timings are provided for the following tests;
   in each, the elapsed time x-y is reported (y is read before the
   operation, x after it).

   Test 1: Time for barrier synchronization
            y=time;
            sync();
            x=time;
   Test 2: A more realistic measure of synchronization periodicity, in
           which a rudimentary computation is performed
            y=time;
            compute (eg z=z+1)
            sync();
            x=time;
   Test 3: A communication is performed; the pattern is a full relation
           (or total exchange) where each processor sends every other
           processor an integer (32 bits on most machines).
            y=time;
            for i,j=0..p-1 do
              proc i sends proc j one integer
            sync();
            x=time;
   Test 4: A rudimentary communication is performed; the pattern is simpler
           than before
            y=time;
            proc 0 sends proc p-1 one integer
            sync();
            x=time;
   Test 5: A rudimentary communication is performed; the pattern is simpler
           than Test 3 but not as simple as Test 4. A scatter-like operation
           is performed.
            y=time;
            for i=0..p-1 do
              proc 0 sends proc i one integer
            sync();
            x=time;

    All five tests can then be used to assess, say, the value of the BSP model
    parameter L. Bspprobe of BSPlib performs Tests 1 and 4, and the value of
    L that it reports is normally that of Test 1.

    A derivation of L, however, can only be completed after the B tests are
    performed.
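
    The y/x timing pattern shared by Tests 1-5 can be sketched roughly as
    follows. This is an illustration only, not the code in assess.c. The names
    time_step, timing_stats, demo_clock and demo_step are hypothetical. Under
    BSPlib one would presumably pass a clock built on bsp_time() and a step
    that ends with bsp_sync(); here a fake clock stands in so the sketch
    compiles without any BSP library.

```c
#include <assert.h>
#include <stddef.h>

/* Min, max and average elapsed time over a number of runs. */
typedef struct {
    double min, max, avg;
} timing_stats;

typedef double (*clock_fn)(void); /* returns the current time            */
typedef void (*step_fn)(void);    /* the operation being timed (a step)  */

/* Time `step` `runs` times: y=time; step; x=time; elapsed is x-y. */
timing_stats time_step(clock_fn now, step_fn step, int runs)
{
    assert(now != NULL && step != NULL && runs > 0);
    timing_stats s = {0.0, 0.0, 0.0};
    double total = 0.0;
    for (int r = 0; r < runs; r++) {
        double y = now();
        step();               /* e.g. compute, communicate, then sync */
        double x = now();
        double elapsed = x - y;
        if (r == 0 || elapsed < s.min) s.min = elapsed;
        if (r == 0 || elapsed > s.max) s.max = elapsed;
        total += elapsed;
    }
    s.avg = total / runs;
    return s;
}

/* Hypothetical stand-ins so the sketch can be exercised without BSPlib. */
double demo_clock(void)
{
    /* Fake clock: successive calls return 1, 3, 6, 10, 15, 21, ... */
    static double t = 0.0, inc = 0.0;
    inc += 1.0;
    t += inc;
    return t;
}

void demo_step(void)
{
    /* Placeholder for the timed operation; does nothing. */
}
```

    Passing the clock and the step as function pointers keeps the timing loop
    independent of the communication library, which is in the spirit of what
    ai.h attempts.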

B. Throughput is measured by performing two tests.
  a. A full relation is performed where each processor sends and receives
     2^i p integers. Each processor sends to every other processor
     2^i integers.
  b. Same as above, except that each processor sends to every other
     processor a number n of integers that is chosen uniformly at
     random in the interval [ 2^(i-1) .. (3/2) 2^i ].
     Such a pattern of communication is observed, for example, in randomized
     parallel sorting.
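
The per-destination message sizes of the two B tests can be sketched as
below. This is a minimal illustration, not taken from assess.c: the function
names are hypothetical, rand() is an assumed random source (its modulo
reduction is slightly biased), and i is limited to 1..14 so the interval
width fits within the minimum RAND_MAX guaranteed by the C standard.

```c
#include <assert.h>
#include <stdlib.h>

/* Test B.a: every processor sends 2^i integers to each destination. */
long fixed_len(int i)
{
    assert(i >= 0 && i <= 14);
    return 1L << i;
}

/* Test B.b, lower end of the interval: 2^(i-1). */
long len_lo(int i)
{
    assert(i >= 1 && i <= 14);
    return 1L << (i - 1);
}

/* Test B.b, upper end of the interval: (3/2) 2^i = 3 * 2^(i-1). */
long len_hi(int i)
{
    assert(i >= 1 && i <= 14);
    return 3L << (i - 1);
}

/* Test B.b: a size n drawn (approximately) uniformly from [lo .. hi]. */
long random_len(int i)
{
    long lo = len_lo(i), hi = len_hi(i);
    return lo + rand() % (hi - lo + 1);
}
```

The midpoint of the interval is (2^(i-1) + 3*2^(i-1))/2 = 2^i, so on
average test b moves the same amount of data per destination as test a.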


3. How to compile

% make assess

4. How to run

If you run assess on a cluster of PCs under BSPlib, make sure you have
a valid .bsptcphosts file (Note: if you have SMP boxes in the cluster
and your local machine is one of them, make sure it is included in
the last line of the .bsptcphosts file).

Under BSPlib I run assess as follows.

% bsprun -npes 4 -noload assess 8192 10 1 A

-noload is optional, but I always use it for the following reason.
  If your .bsptcphosts lists 4 dual-processor machines and you
  want one process per machine, then without this option
  you may get two processes on the same machine (one per CPU).
8192  : first parameter; must be a power of two. Gives the maximum 2^i p of
        the B tests. For test B.a the maximum h-relation is
        8192 * nprocs/2
10    : number of runs per test; the average (and in some tests, the min
        and max) is given in the output
1     : shift count s. After an h-relation is realized, the next h-relation
        is of size h' = h << s; with s=1, h' = h << 1 = 2*h. If you want
        h'=8h, set this parameter to 3.
A     : name of the output file. A brief summary of the experiments is
        reported on standard output (including host names of processors).
        A more detailed output of the experiments is written to this
        file (i.e. A).
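
The effect of the first and third parameters on the sequence of h-relation
sizes can be sketched as follows. This is an illustration under assumptions:
h_schedule is a hypothetical helper, and the starting size (1 in the usage
below) is a guess, since the size at which assess actually starts is not
stated here.

```c
#include <assert.h>
#include <stddef.h>

/* Write h, h<<shift, (h<<shift)<<shift, ... into out[] while the value
 * does not exceed max_h. Returns the number of sizes written (at most cap). */
size_t h_schedule(unsigned long start, unsigned long max_h, unsigned shift,
                  unsigned long *out, size_t cap)
{
    assert(start > 0 && shift > 0 && shift < 32 && out != NULL);
    size_t n = 0;
    unsigned long h = start;
    while (h <= max_h && n < cap) {
        out[n++] = h;
        /* Stop if the next size would exceed max_h; this check also
         * keeps the shift below from overflowing. */
        if (h > (max_h >> shift))
            break;
        h <<= shift;
    }
    return n;
}
```

With a maximum of 8192 and a start of 1, a shift of 1 gives the 14 sizes
1, 2, 4, ..., 8192, while a shift of 3 gives the 5 sizes 1, 8, 64, 512, 4096.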
