1. Introduction
2. Tests
3. How to compile
4. How to run


1. Introduction

assess.c assesses the performance of a parallel machine, using a set of
architecture-independent operations to measure communication and
synchronization efficiency.
Currently, the only standard supported is the BSP worldwide standard.
The program can, however, run transparently under both BSPlib and LAMMPI.
The code has been debugged and tested under the BSPlib library.

If you plan to set up a PC cluster under BSPlib, the file cluster.how may
be of help.

2. Tests
The code performs the following tests.
A. Checking the synchronization periodicity of the interprocessor
   communication network. Timings are provided for the following tests;
   in each one the elapsed time x-y (x is read after y) is reported.

   Test 1: Time for barrier synchronization
            y=time;
            sync();
            x=time;
   Test 2: A more realistic measure of synchronization periodicity, in
           which a rudimentary computation is performed
            y=time;
            compute (e.g. z=z+1);
            sync();
            x=time;
   Test 3: A communication is performed; the pattern is a full relation
           (or total exchange) in which each processor sends every other
           processor one integer (32 bits on most machines).
            y=time;
            for i,j=0..p-1 do
              proc i sends proc j one integer
            sync();
            x=time;
   Test 4: A rudimentary communication is performed; the pattern is simpler
           than in Test 3.
            y=time;
            proc 0 sends proc p-1 one integer
            sync();
            x=time;
   Test 5: A rudimentary communication is performed; the pattern is simpler
           than in Test 3 but not as simple as in Test 4. A scatter-like
           operation is performed.
            y=time;
            for i=0..p-1 do
              proc 0 sends proc i one integer
            sync();
            x=time;

    All five tests can then be used to assess, say, the value of the BSP
    model parameter L. Bspprobe of BSPlib performs Tests 1 and 4, and the
    value of L that it reports is normally that of Test 1.

    A derivation of L, however, can only be completed after the B tests
    have been performed.
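
The timing pattern shared by Tests 1 to 5 can be sketched in C as follows.
This is a minimal illustration and not the code in assess.c: the names
now_seconds, time_superstep, no_op_barrier and tiny_compute are
hypothetical, and the clock is the C11 timespec_get rather than a BSP
timer. Under BSPlib one would presumably read bsp_time() and pass
bsp_sync as the barrier.

```c
#include <time.h>

/* Hypothetical helper: wall-clock seconds from the C11 calendar clock.
   A BSPlib build would more likely call bsp_time() here. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

typedef void (*step_fn)(void);

/* Time one superstep: optional local work, then the barrier.
   Returns the elapsed time between the two clock reads. */
static double time_superstep(step_fn work, step_fn barrier)
{
    double y = now_seconds();      /* clock read before the superstep */
    if (work)
        work();                    /* e.g. the z=z+1 of Test 2        */
    barrier();                     /* sync()                          */
    double x = now_seconds();      /* clock read after the superstep  */
    return x - y;
}

/* Stand-ins so the sketch compiles without a BSP library. */
static void no_op_barrier(void) { }
static volatile long counter;
static void tiny_compute(void) { counter = counter + 1; }
```

Calling time_superstep(NULL, no_op_barrier) mirrors Test 1, and
time_superstep(tiny_compute, no_op_barrier) mirrors Test 2; the
communication tests would pass a function that issues the sends.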

B. Throughput is measured by performing two tests.
  a. A full relation is performed in which each processor sends and
     receives 2^i p integers. Each processor sends every other processor
     2^i integers.
  b. Same as above, except that each processor sends every other
     processor a number n of integers chosen uniformly at random in the
     interval [ 2^(i-1) .. (3/2) 2^i ].
     Such a communication pattern is observed, for example, in randomized
     parallel sorting.
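
The choice of n in test B.b can be sketched in C as below. This is only
an illustration under stated assumptions, not the routine used in
assess.c: random_block_size is a hypothetical name, the generator is the
standard rand(), and i is assumed to be at least 1 and small enough that
the interval width stays below RAND_MAX.

```c
#include <stdlib.h>

/* Hypothetical helper: number of integers one processor sends to one
   other processor in test B.b, drawn from [2^(i-1), (3/2)*2^i].
   Assumes i >= 1 and 2^i + 1 <= RAND_MAX. */
static long random_block_size(int i)
{
    long lo = 1L << (i - 1);     /* 2^(i-1)                           */
    long hi = 3L << (i - 1);     /* (3/2) * 2^i = 3 * 2^(i-1)         */
    long width = hi - lo + 1;    /* number of candidate values        */

    /* rand() % width is only approximately uniform (modulo bias);
       good enough for a sketch, not for careful statistics. */
    return lo + (long)(rand() % width);
}
```

For i = 1 the interval is [1 .. 3], so every call returns 1, 2 or 3.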


3. How to compile

   First edit ai.h and decide whether you want to use BSPlib or LAMMPI
   by defining or undefining the appropriate macros.

   Then type
% make allbsp
   or
% make allmpi
   depending on the previous choice.
   If all went well, you are ready to run the code.

4. How to run

If you run assess on a cluster of PCs under BSPlib, make sure you have
a valid .bsptcphosts file. (Note: if you have SMP boxes in the cluster
and your local machine is one of them, make sure it is listed on the
last line of the .bsptcphosts file.) If you use LAM instead, make sure
that your local machine is the first one in the LAM hosts definition
file you pass to lamboot.

Under BSPlib I run assess as follows.

% bsprun -noload -local -npes 4 bspas 8192 10 1 OUT

-noload is optional, but I always use it for the following reason:
  if your .bsptcphosts lists 4 dual-processor machines and you want
  one processor per machine, then without this option you may get
  two processes on the same machine (one per CPU).
8192  : The first parameter is a power of two. It gives the maximum 2^i p
        of the B tests. For test B, a maximum h-relation is
        8192 * nprocs/2, where nprocs is the number of processors.
10    : The number of runs per test; the average (and in some tests, the
        min and max) is given in the output.
1     : After an h-relation is realized, the following h-relation is of
        size h' = h << 1 = 2*h. If you want h'=8h, set this parameter to 3.
OUT   : A brief report of the experiments is written to standard output
        (including the host names of the processors). A more detailed
        report is written to the file OUT.
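
How the three numeric arguments interact can be sketched in C as below.
This is a hedged illustration, not the logic of assess.c: count_sizes,
run_stats and summarize are hypothetical names, and the sweep is assumed
to start at size 1, which the text above does not state.

```c
#include <stddef.h>

/* Hypothetical: how many sizes are tried when the size starts at 1
   (an assumption) and is replaced by h << shift while h <= max_h.
   Requires shift >= 1, otherwise the loop would never end. */
static int count_sizes(long max_h, int shift)
{
    int count = 0;
    for (long h = 1; h <= max_h; h <<= shift)
        count++;
    return count;
}

struct run_stats { double min, max, avg; };

/* Summarize the timings of n >= 1 repeated runs of one test. */
static struct run_stats summarize(const double *t, size_t n)
{
    struct run_stats s = { t[0], t[0], 0.0 };
    double sum = 0.0;
    for (size_t k = 0; k < n; k++) {
        if (t[k] < s.min) s.min = t[k];
        if (t[k] > s.max) s.max = t[k];
        sum += t[k];
    }
    s.avg = sum / (double)n;
    return s;
}
```

Under these assumptions, count_sizes(8192, 1) gives 14 sizes
(1, 2, 4, ..., 8192), while count_sizes(8192, 3) gives 5
(1, 8, 64, 512, 4096). summarize would be applied to the timings of the
runs requested by the second argument (10 in the example).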
