For the testing, I ran your code using double for FTYPE, created some artificial A and B, and compared your results to the output of my own version of a matrix multiplication routine (column-major storage only). Below you may review the performance of your functions (cm is for mmulcm and rm is for mmulrm) on two instances: 256x256 and 512x512 matrices. Numbers are in Mflop/s (millions of floating-point operations per second); the flop count in matrix multiplication is n*n*n multiplications plus n*n*(n-1) additions. Higher numbers are better. The instructor's own implementation is alexg (mmulcm only).

n=        256        512
          cm/rm      cm/rm
alexg     98/        70.00/
user2     27/23      20.60/14.10
user3     31/33      20.90/21.06
user4     30/28      21.00/18.04
user5     32/27      20.88/18.05
user6     33/27      20.77/14.40
user7     38/36      21.03/21.19
user8     36/29      21.13/18.09

User7's implementation is marginally faster than the remaining ones. He receives 10 (announced) bonus points for that.
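For anyone who wants to reproduce this kind of measurement, here is a minimal sketch of how such a test could be set up. It is not the harness that was actually used. The names flop_count, ref_mmul_cm, fill_artificial, max_abs_diff, mmul_fn and mflops are hypothetical, and the argument list (n, A, B, C) is an assumption: the real mmulcm/mmulrm signatures may differ. Only FTYPE, the column-major layout and the flop count come from the description above.

```c
#include <assert.h>
#include <time.h>

typedef double FTYPE;   /* the tests above used double for FTYPE */

/* Flop count from the text: n*n*n multiplications plus n*n*(n-1) additions. */
double flop_count(int n)
{
    double d = (double)n;
    return d * d * d + d * d * (d - 1.0);
}

/* Hypothetical reference routine: C = A*B for n x n matrices in
   column-major storage, so element (i,j) is stored at index i + j*n. */
void ref_mmul_cm(int n, const FTYPE *A, const FTYPE *B, FTYPE *C)
{
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < n; i++)
            C[i + j * n] = 0.0;
        for (int k = 0; k < n; k++) {
            FTYPE b = B[k + j * n];
            for (int i = 0; i < n; i++)     /* unit stride down column k of A */
                C[i + j * n] += A[i + k * n] * b;
        }
    }
}

/* Artificial inputs (an assumption): A(i,j) = i+1 and B = 2*I, so A*B = 2*A. */
void fill_artificial(int n, FTYPE *A, FTYPE *B)
{
    for (int j = 0; j < n; j++)
        for (int i = 0; i < n; i++) {
            A[i + j * n] = (FTYPE)(i + 1);
            B[i + j * n] = (i == j) ? 2.0 : 0.0;
        }
}

/* Largest absolute entrywise difference between two n x n results. */
double max_abs_diff(int n, const FTYPE *X, const FTYPE *Y)
{
    double m = 0.0;
    for (long i = 0; i < (long)n * n; i++) {
        double d = (double)(X[i] - Y[i]);
        if (d < 0.0) d = -d;
        if (d > m) m = d;
    }
    return m;
}

/* Assumed common signature for the routine being timed. */
typedef void (*mmul_fn)(int, const FTYPE *, const FTYPE *, FTYPE *);

/* Mflop/s over reps calls, measured with CPU time from clock(). */
double mflops(mmul_fn f, int n, const FTYPE *A, const FTYPE *B, FTYPE *C, int reps)
{
    assert(reps > 0);
    clock_t t0 = clock();
    for (int r = 0; r < reps; r++)
        f(n, A, B, C);
    double secs = (double)(clock() - t0) / CLOCKS_PER_SEC;
    if (secs <= 0.0)
        return 0.0;     /* run was too short for clock() to resolve */
    return reps * flop_count(n) / secs / 1.0e6;
}
```

To use it, allocate three n*n arrays of FTYPE, call fill_artificial, run the routine under test and ref_mmul_cm into separate output arrays, check that max_abs_diff is zero (or tiny for inputs that do not multiply exactly), and then call mflops with the routine under test. For small n a single call finishes faster than clock() can resolve, so use several repetitions.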