Write a CUDA program that computes, in parallel, the dot product of a vector with each row of a matrix. The inputs are a data matrix, in a format similar to the one used by the Chi2 program, and a vector, each in a separate file. The program should output the resulting dot products, one per row of the matrix.

For example, if the input matrix is

    1 2 0
    1 1 0
    1 2 1

and w = (2, 4, 6), then your program should output

    10 6 16

Compute the dot products in parallel in your kernel function. You will have to transpose the data matrix in order to get coalesced memory access.

Submit your assignment by copying your program to your AFS course folder /afs/cad/courses/ccs/s16/cs/732/004/. The assignment is due on Feb 8th, 2016.
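
The role of the transpose can be illustrated with a minimal sketch. It is one possible layout, not a reference solution. It assumes one thread per matrix row, single-precision values, and inputs already loaded into host memory; file parsing is omitted because the Chi2 file format is not described here. The names rowDotKernel and rowDotProducts are hypothetical, and error checking of the CUDA runtime calls is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per row of the original matrix (an assumption of this sketch).
// dataT holds the transposed matrix, so element (row, j) of the original
// matrix is stored at dataT[j * rows + row].
__global__ void rowDotKernel(const float *dataT, const float *w,
                             float *out, int rows, int cols)
{
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= rows) return;

    float sum = 0.0f;
    for (int j = 0; j < cols; ++j) {
        // For a fixed j, consecutive threads read consecutive addresses,
        // which is what allows the loads to be coalesced.
        sum += dataT[j * rows + row] * w[j];
    }
    out[row] = sum;
}

// Hypothetical host helper: transposes the row-major input on the CPU,
// copies it to the GPU, launches the kernel, and returns the results.
std::vector<float> rowDotProducts(const std::vector<float> &data,
                                  const std::vector<float> &w,
                                  int rows, int cols)
{
    // Transpose: data is rows x cols, dataT is cols x rows.
    std::vector<float> dataT(data.size());
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            dataT[j * rows + i] = data[i * cols + j];

    float *dDataT = nullptr, *dW = nullptr, *dOut = nullptr;
    cudaMalloc((void **)&dDataT, dataT.size() * sizeof(float));
    cudaMalloc((void **)&dW, cols * sizeof(float));
    cudaMalloc((void **)&dOut, rows * sizeof(float));

    cudaMemcpy(dDataT, dataT.data(), dataT.size() * sizeof(float),
               cudaMemcpyHostToDevice);
    cudaMemcpy(dW, w.data(), cols * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 256;                             // arbitrary block size
    int blocks = (rows + threads - 1) / threads;   // round up to cover all rows
    rowDotKernel<<<blocks, threads>>>(dDataT, dW, dOut, rows, cols);

    // This blocking copy also waits for the kernel to finish.
    std::vector<float> out(rows);
    cudaMemcpy(out.data(), dOut, rows * sizeof(float),
               cudaMemcpyDeviceToHost);

    cudaFree(dDataT);
    cudaFree(dW);
    cudaFree(dOut);
    return out;
}
```

Calling rowDotProducts with the nine matrix values from the example above in row-major order, w = (2, 4, 6), rows = 3 and cols = 3 should return the three dot products for that example. Without the transpose, each thread would walk its own row, so neighbouring threads would read addresses cols elements apart and the loads would not coalesce.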