You might have noticed points missing here and there from the programming part of homework 1. I'll make some announcements in class, but there are some things that need to be addressed.

1. The purpose of this homework problem was to familiarize you with the environment that you will be using later in the class (cluster, compiler). The programming part was obvious: 10-20 lines of code that could be done in half an hour; half of it was just a rewrite of the other half (mmulrm vs mmulcm). In addition you had to read two documents and follow them faithfully (phw2.ps and uniinstall.txt).

2. You were to get 4 points by just figuring out how to create a directory homework and subdirectories hw[123456] in it. Some of you managed not to create the directory (e.g. homework), or mislabeled the subdirectories, or misplaced hw2 material in the hw1 subdirectory. You probably missed the comment in hw2.ps "There will be no files in directory hw1 as there was no programming component in Homework 1." If you skip reading the notes you will have more problems in HW3 (installation of the cluster version of BSPlib).

3. Many of you embedded printf/cin/cout statements in your multiplication code. Before you give production code away REMOVE such print statements. If you include debug/db info in your executables, remove it as well.

4. Some of you timed your function by embedding a variety of timers within the functions mmulrm and mmulcm. If you want to time an executable, the easy way to do it is
      % time executable
   on the command line. If you want to time a function f(),
   do the following (pseudocode below):
      t1 = sometimeCfunction ();//Use the one suitable for your
      f()
      t2 = sometimeCfunction ();
      output t2-t1
   If you want to print something, DON'T DO
      t1 = sometimeCfunction ();//Use the one suitable for your
      print "MY SUPER FUNCTION"
      f()
      t2 = sometimeCfunction ();
      output t2-t1
   You could do
      print "MY SUPER FUNCTION"
      t1 = sometimeCfunction ();//Use the one suitable for your
      f()
      t2 = sometimeCfunction ();
      output t2-t1
   In the DON'T example the print statement is timed, and so t2-t1 is time of f() + time of print.

5. Initialize variables, arrays etc. Don't expect that everything will work nicely. Something is going to crash anyway, sooner or later. Program defensively, not optimistically.

6. Even though the two functions are no more than 10 lines each, many of you forgot to do the obvious.
   a. Rename one of the two functions. mmulcm was used twice, or mmulrm twice, for both copies of the code.
   b. Change the indexing. A C[i*n+j] was never turned into C[i+n*j] in the other function.
   As a result half of the submitted implementations were buggy! If you don't pay attention to such tiny issues now, problems will pile up later. BE MORE CAREFUL.

7. Some of you managed to write 10-liners that caused the C compiler to generate warnings or errors (e.g. inconsistency between C++-isms and C-isms). If you plan to write portable code, aim at the lowest common denominator; don't change the compiler options to suit your style. If your 10-line code generates warnings, fix your code; do not look for some esoteric compiler options that stop the printing of the warnings. Warnings mean that you are doing something that might be risky. Avoid it. It is easier to eliminate warnings from 10-liners than to figure out what is going on in a 1000-liner. It is easier to eliminate warnings from 10-liners than to figure out how to shut them off with gcc options.

8. Starting with HW2 I'll assign bonus points to the fastest implementation that is also complete and elegant (tie breaker). In HW2 user7 got 10 bonus points.
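To make items 4 and 6 concrete, here is a minimal sketch in plain C, not the reference solution. The function signatures, the mmul_fn typedef and the time_mmul helper are assumptions made for illustration (the prototypes required by the homework may differ), and clock() from <time.h> simply stands in for sometimeCfunction. The matrices are taken to be n x n and stored in flat arrays.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical prototype; the homework's actual signatures may differ. */
typedef void (*mmul_fn)(int n, const double *A, const double *B, double *C);

/* Row-major: element (i,j) of an n x n matrix lives at index i*n + j. */
void mmulrm(int n, const double *A, const double *B, double *C)
{
    int i, j, k;
    for (i = 0; i < n; i++) {
        for (j = 0; j < n; j++) {
            double sum = 0.0;          /* initialize before accumulating */
            for (k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}

/* Column-major: element (i,j) lives at index i + n*j.
   Every index expression changes, not just the function name. */
void mmulcm(int n, const double *A, const double *B, double *C)
{
    int i, j, k;
    for (j = 0; j < n; j++) {
        for (i = 0; i < n; i++) {
            double sum = 0.0;
            for (k = 0; k < n; k++)
                sum += A[i + n * k] * B[k + n * j];
            C[i + n * j] = sum;
        }
    }
}

/* Time one call of f in CPU seconds.
   Nothing but the call sits between the two clock readings. */
double time_mmul(mmul_fn f, int n, const double *A, const double *B, double *C)
{
    clock_t t1, t2;
    assert(f != NULL && n > 0);
    t1 = clock();
    f(n, A, B, C);
    t2 = clock();
    return (double)(t2 - t1) / CLOCKS_PER_SEC;
}
```

A caller would print any label first and only then call time_mmul(mmulrm, n, A, B, C), so that no I/O is charged to the function being measured. Keeping the timer in a separate helper also leaves mmulrm and mmulcm free of timing and print statements.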
--------------------------------------------------------------------
ALGORITHMIC PROBLEMS.

9. Some of you are still having problems with parallel algorithm design. In sequential algorithm design you only deal with one dimension: TIME. In parallel algorithm design you have 3 or 4 dimensions:
      TIME (T)
      PROCESSOR SIZE (P)
      WORK (W=PT)
      ACTUAL WORK (W_2)
   Efficiency means minimizing all of them, not just time. Sometimes one needs to minimize all of them to get to the best algorithm possible.

10. One of the problems had to do with ranking/sorting. On the least powerful versions of the CRCW PRAM one can use n**2 procs to solve this problem in lgn time (order of magnitude). You can use n**2 / lgn processors as well to attain the same time bound.

11. Yet for the merging problem you were suggesting
   a. The same algorithm as in 10 (to merge you can just sort).
      . This is too much. The sequential algorithm requires linear time/work.
      . The parallel algorithm must do about the same work.
   b. More powerful PRAMs.
      . You can do it with less powerful ones. The CRCW combining PRAMs that have +, MIN, etc. capabilities are just for fun, not for designing algorithms.
   c. Algorithms that take
      . n steps with more than one processor.
      . nlgn steps with more than one processor.
      . nlgn steps with 1 or more processors.
      If I do it sequentially I can still do it in n steps and use only one processor. This beats all case (c) algorithms.

12. Bonus points.
   . The bonus point problem was for you to think about this case. It's not a requirement to solve it. If you decide to solve it, however, THINK BEFORE YOU WRITE THE SOLUTION.
   a. Solutions such as 11c will be ignored.
   b. Solutions such as 11a will be ignored.
   c. One way to gain some points and do it in lgn*lgn time on an EREW PRAM is to emulate a CREW on an EREW, i.e. turn the Concurrent Reads into broadcasts at a penalty of lgn.
   d. Those of you who have already taken other classes (e.g. CIS 667/CIS 467h) or are taking other classes, use what you learnt in those other classes.
      E.g. a comparison network for bitonic merging done on an EREW PRAM allows you to merge two sorted sequences of n keys each in time T=lgn with P=n/2 processors and W=W_2=nlgn (order of magnitude).
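The bitonic merging network mentioned in 12d can be sketched as a sequential simulation in C. This is an illustration only, not a PRAM program and not a model solution. The names bitonic_merge and compare_exchange are made up for this example, and the code assumes that the total number of keys, 2n, is a power of two. The outer loop runs over the lg(2n) stages of the network; within one stage the n compare-exchange operations touch disjoint pairs of cells, so on an EREW PRAM they could all run in the same step without any concurrent read or write.

```c
#include <stddef.h>

/* Compare-exchange: leave the smaller key at lo and the larger at hi. */
static void compare_exchange(int *a, size_t lo, size_t hi)
{
    if (a[lo] > a[hi]) {
        int tmp = a[lo];
        a[lo] = a[hi];
        a[hi] = tmp;
    }
}

/* Merge sorted x[0..n-1] and sorted y[0..n-1] into out[0..2n-1].
   Assumption: 2n is a power of two. */
void bitonic_merge(const int *x, const int *y, size_t n, int *out)
{
    size_t total = 2 * n;
    size_t i, half;

    /* x ascending followed by y reversed forms a bitonic sequence. */
    for (i = 0; i < n; i++) {
        out[i] = x[i];
        out[n + i] = y[n - 1 - i];
    }

    /* One pass of the outer loop is one stage of the network.
       The pairs (i, i + half) within a stage are pairwise disjoint. */
    for (half = total / 2; half >= 1; half /= 2) {
        for (i = 0; i < total; i++) {
            if ((i & half) == 0)
                compare_exchange(out, i, i + half);
        }
    }
}
```

Counting operations in the sketch gives lg(2n) stages of n comparators each, which matches the order-of-magnitude bounds T=lgn and W=nlgn quoted above.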