You might have noticed points missing here and there from the programming
part of homework 1. I'll make some announcements in class, but there are
some things that need to be addressed.

1. The purpose of this homework problem was to familiarize you with
the environment that you will be using later in the class (cluster, compiler).
The programming part was obvious: 10-20 lines of code that could be done
in half an hour, half of which was just a rewrite of the other half
(mmulrm vs mmulcm). In addition you had to read two documents and follow
them faithfully (phw2.ps and uniinstall.txt).

2. You were to get 4 points just for figuring out how to create a directory
homework with subdirectories hw[123456] in it. Some of you managed
not to create the directory (e.g. homework), or mislabeled the subdirectories,
or misplaced hw2 material in the hw1 subdirectory. You probably missed
the comment in hw2.ps: "There will be no files in directory hw1 as
there was no programming component in Homework 1."
If you skip reading the notes you will have more problems in HW3 (installation
of the cluster version of BSPlib).

3. Many of you embedded printf/cin/cout statements in your multiplication
   code. Before you give production code away, REMOVE such print statements.
   If your executables include debug/db info, remove it as well.

4. Some of you timed your functions by embedding a variety of timers within
   the functions mmulrm and mmulcm. If you want to time an executable, the
   easy way to do it is
   % time executable
     on the command line.

   If you want to time a function f(), do (pseudocode below)
     t1 = sometimeCfunction ();//Use the one suitable for your
        f()
     t2 = sometimeCfunction ();
     output t2-t1
   
   If you want to print something
   DON'T DO
     t1 = sometimeCfunction ();//Use the one suitable for your
     print "MY SUPER  FUNCTION"
        f()
     t2 = sometimeCfunction ();
     output t2-t1

    you could do
     print "MY SUPER FUNCTION"
     t1 = sometimeCfunction ();//Use the one suitable for your
        f()
     t2 = sometimeCfunction ();
     output t2-t1

    In the first of these two examples, the print statement is timed, and
    so t2-t1 is the time of f() + the time of print.
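
    The pattern above can be sketched in C as follows. This is only a
    sketch: it assumes the standard clock() call (which reports CPU time)
    as the "sometimeCfunction", and f() and time_f() are hypothetical names
    made up for the illustration.

```c
#include <time.h>

/* Hypothetical stand-in for the function you want to time. */
double f(long n)
{
    double sum = 0.0;
    long i;
    for (i = 0; i < n; i++)
        sum += (double)i * 0.5;
    return sum;
}

/* Returns the CPU seconds spent in f(n) and stores f's result in *result.
   Nothing but the call to f() sits between the two clock() calls
   (assumption: clock() is the timer suitable for your system). */
double time_f(long n, double *result)
{
    clock_t t1, t2;

    t1 = clock();
    *result = f(n);
    t2 = clock();
    return (double)(t2 - t1) / CLOCKS_PER_SEC;
}
```

    Any label or result should be printed before calling time_f() or after
    it returns, never between the two clock() calls.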

5. Initialize variables, arrays etc. Don't expect that everything will work
   nicely. Something is going to crash anyway sooner or later.
   Program defensively, not optimistically.
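
   As one possible illustration of defensive initialization, the sketch
   below allocates a matrix that is zeroed on creation and refuses sizes
   that cannot be honored. The helper name alloc_zero_matrix() is
   hypothetical, not something the homework asked for.

```c
#include <stdlib.h>

/* Hypothetical helper: allocate an n x n matrix of doubles, all zero.
   Returns NULL for n == 0, for sizes whose byte count would overflow,
   or when the allocation itself fails. The caller must check for NULL
   and free() the result.
   Assumption: all-bits-zero represents 0.0, which holds on IEEE 754
   platforms. */
double *alloc_zero_matrix(size_t n)
{
    if (n == 0 || n > (size_t)-1 / n / sizeof(double))
        return NULL;
    return calloc(n * n, sizeof(double));
}
```

   A caller would test the returned pointer before using it rather than
   assume the allocation succeeded.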

6. Even though the two functions are no more than 10 lines each, many of you
   forgot to do the obvious:
   a.  Rename one of the two functions. mmulcm was used twice, or mmulrm twice,
       for both copies of the code.

   b.  Change the indexing. A C[i*n+j] was never turned into C[i+n*j] in the
       other function. As a result half of the submitted implementations
       were buggy!

   If you don't pay attention to such tiny issues now, problems will pile up
   later. BE MORE CAREFUL.
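
   To make the difference concrete, here is a minimal sketch of the two
   functions side by side. The signatures are an assumption (the handout
   defines the real ones); the point is only that every index expression
   changes from i*n+j form to i+n*j form, not just the function name.

```c
/* Row-major storage: element (i,j) of an n x n matrix is at i*n + j.
   Hypothetical signature; check the handout for the required one. */
void mmulrm(const double *A, const double *B, double *C, int n)
{
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double s = 0.0;
            for (k = 0; k < n; k++)
                s += A[i*n + k] * B[k*n + j];
            C[i*n + j] = s;
        }
}

/* Column-major storage: element (i,j) is at i + n*j.
   All three matrices use the column-major index. */
void mmulcm(const double *A, const double *B, double *C, int n)
{
    int i, j, k;
    for (i = 0; i < n; i++)
        for (j = 0; j < n; j++) {
            double s = 0.0;
            for (k = 0; k < n; k++)
                s += A[i + n*k] * B[k + n*j];
            C[i + n*j] = s;
        }
}
```

   Testing both functions on the same small matrix, stored once in each
   layout, is a quick way to catch a copy that was never converted.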

7. Some of you managed to write 10-liners that caused the C compiler to generate
   warnings or errors (e.g. inconsistency between C++-isms and C-isms).
   If you plan to write portable code, aim at the lowest common denominator;
   don't change the compiler options to suit your style.

   If your 10-line code generates warnings, fix your code; do not look for
   some esoteric compiler options that stop the printing of the warnings.
   Warnings mean that you are doing something that might be risky. Avoid it.
   It is easier to eliminate warnings from a 10-liner than to figure out what
   is going on in a 1000-liner.
   It is easier to eliminate warnings from a 10-liner than to figure out how to
   shut them off with gcc options.

8. Starting with HW2 I'll assign bonus points to the fastest implementation
   that is also complete and elegant (tie breaker). 

   In HW2 user7 got 10 bonus points.

--------------------------------------------------------------------
   ALGORITHMIC PROBLEMS.
9. Some of you are still having problems with parallel algorithm design.

   In sequential algorithm design you only deal with one dimension: TIME.

   In parallel algorithm design you have 3 or 4 dimensions:
    TIME (T)
    PROCESSOR SIZE (P)
    WORK (W=PT)
    ACTUAL WORK (W_2)
   Efficiency means minimizing all of them, not just time.

   Sometimes one needs to minimize all of them to get to the best algorithm
   possible.
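
   As a small worked example of W=PT, take three processor/time
   combinations that come up in these notes (all bounds are order of
   magnitude, as stated there):

```latex
\begin{align*}
\text{ranking: } P = n^2,\ T = \lg n
  &\;\Rightarrow\; W = PT = n^2 \lg n \\
\text{ranking: } P = n^2/\lg n,\ T = \lg n
  &\;\Rightarrow\; W = PT = n^2 \\
\text{sequential merging: } P = 1,\ T = n
  &\;\Rightarrow\; W = PT = n
\end{align*}
```

   The first two rows have the same T but different W, which is why time
   alone does not tell you which algorithm is better.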


10. One of the problems had to do with ranking/sorting.

    On the least powerful versions of the CRCW PRAM one can use n**2
    processors to solve this problem in lgn time (order of magnitude).
    You can use n**2 / lgn processors as well to attain the same time
    bound.
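
    The notes do not spell out the n**2-processor algorithm. One common
    way to get these bounds, assumed here, is ranking by counting: compare
    every pair of keys and count, for each key, how many keys precede it.
    The sketch below is a sequential simulation of that idea; rank_keys()
    is a hypothetical name. On a PRAM each (i,j) comparison would go to its
    own processor, and the n results for a fixed i would be summed in lgn
    steps.

```c
#include <stddef.h>

/* Sequential simulation of ranking by counting (assumed algorithm).
   rank[i] = number of keys that must come before key[i] in sorted
   order; ties are broken by original position so ranks are distinct. */
void rank_keys(const int *key, size_t *rank, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        size_t r = 0;
        for (j = 0; j < n; j++)   /* one comparison per (i,j) pair */
            if (key[j] < key[i] || (key[j] == key[i] && j < i))
                r++;
        rank[i] = r;
    }
}
```

    Writing key[i] to position rank[i] of an output array then sorts the
    keys.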


11. Yet for the merging problem you were suggesting
    a. The same algorithm as in 10 (to merge you can just sort).
         . This is too much. The sequential algorithm requires linear time/work.
         . A parallel algorithm must do about the same work.
    b. More powerful PRAMs.
         . You can do it with less powerful ones. The CRCW combining PRAMs that
            have +, MIN, etc. capabilities are just for fun, not for designing
            algorithms.
    c. Algorithms that take
         . n steps with more than one processor.
         . nlgn steps with more than one processor.
         . nlgn steps with 1 or more processors.

     If I do it sequentially I can still do it in n steps and use only one
     processor. This beats all case (c) algorithms.
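
     For reference, the sequential baseline that any parallel merging
     algorithm has to beat can be sketched as below. The name
     merge_sorted() is hypothetical; the loop does one comparison per
     output element, so its work is linear in the total number of keys.

```c
#include <stddef.h>

/* Merge sorted arrays a (na keys) and b (nb keys) into out, which must
   have room for na + nb keys. Each loop iteration writes one output
   key, so the total number of steps is na + nb. */
void merge_sorted(const int *a, size_t na,
                  const int *b, size_t nb, int *out)
{
    size_t i = 0, j = 0, k = 0;

    while (i < na && j < nb)
        out[k++] = (b[j] < a[i]) ? b[j++] : a[i++];
    while (i < na)
        out[k++] = a[i++];
    while (j < nb)
        out[k++] = b[j++];
}
```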

12. Bonus points.

    . The bonus point problem was for you to think about this case. It's not
    a requirement to solve it. If you decide to solve it, however, THINK BEFORE
    YOU WRITE THE SOLUTION.

    a. Solutions such as 11c will be ignored.
    b. Solutions such as 11a will be ignored.

    c. One way to gain some points and do it in lgn*lgn time on
       an EREW PRAM is to
       emulate a CREW on an EREW, i.e. turn the Concurrent Reads into
       broadcasts at a penalty of lgn.

    d. Those of you who have already taken other classes (e.g. CIS 667/CIS 467h)
       or are taking other classes: use what you learned in those other classes.

       E.g. a comparison network for bitonic merging done on an EREW PRAM
       allows you to merge two sorted sequences of n keys each in time
       T=lgn with P=n/2 processors and W=W_2 = nlgn (order of magnitude).
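
       A sequential simulation of such a bitonic merging network might look
       like the sketch below. It assumes n is a power of two, and
       bitonic_merge() is a hypothetical name. Within one stage no two
       compare-exchanges touch the same cell, so on an EREW PRAM they could
       be shared among the processors with no concurrent reads or writes;
       the number of stages is lgn (order of magnitude).

```c
#include <stddef.h>

/* Merge sorted a and b (n keys each, n a power of two) into out,
   which must hold 2*n keys. Sequential simulation of a bitonic
   merging network. */
void bitonic_merge(const int *a, const int *b, size_t n, int *out)
{
    size_t i, half, m = 2 * n;

    /* a followed by b reversed is a bitonic sequence:
       it rises and then falls. */
    for (i = 0; i < n; i++) {
        out[i] = a[i];
        out[n + i] = b[n - 1 - i];
    }

    /* One stage per distance n, n/2, ..., 1. In each stage, cell i
       is compared with cell i + half when bit "half" of i is 0. */
    for (half = n; half >= 1; half /= 2) {
        for (i = 0; i < m; i++) {
            if ((i & half) == 0 && out[i] > out[i + half]) {
                int tmp = out[i];
                out[i] = out[i + half];
                out[i + half] = tmp;
            }
        }
    }
}
```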