Implement stochastic gradient descent (SGD) in the backpropagation program that you wrote in assignment 3. In the original SGD algorithm we update the weights using the gradient of a single datapoint:

SGD algorithm:
    Initialize random weights
    for(e = 0 to n_epochs):
        Shuffle the rows (or row indices)
        for j = 0 to rows-1:
            Determine the gradient using just the jth datapoint
            Update the weights with the gradient
        Recalculate the objective

We will modify this into the mini-batch version and implement it for this assignment.

I. Mini-batch SGD algorithm:
    Initialize random weights
    for(e = 0 to n_epochs):
        for j = 0 to rows-1:
            Shuffle the rows (or row indices)
            Select the first k datapoints, where k is the mini-batch size
            Determine the gradient using just the selected k datapoints
            Update the weights with the gradient
        Recalculate the objective

(An illustrative sketch of this loop appears at the end of this handout.)

Your input, output, and command line parameters are the same as in assignment 3, except that we also take the batch size k as input. We leave the offset of the final layer at zero for now.

Test your program on the XOR dataset, where the first column is the label and the remaining two columns are the features:

     1    0    0
     1    1    1
    -1    0    1
    -1    1    0

1. Test your program on the breast cancer and ionosphere datasets given on the website. Is the mini-batch version faster than the original one? How does the accuracy compare?

2. Is the search faster or more accurate if you keep track of the best objective in the inner loop?

Submit your assignment by copying your program to your AFS course folder /afs/cad/courses/ccs/S20/cs/677/850/. The assignment is due at 11:30am on June 13th, 2020.
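Appendix: a minimal Python sketch of the mini-batch SGD loop above, for illustration only. Since assignment 3 is not reproduced here, the sketch assumes a single-hidden-layer network with sigmoid hidden units and a linear output whose offset is fixed to zero, a half-squared-error objective, and labels in {-1, +1}; the function names (minibatch_sgd, predict, objective) and the hyperparameter defaults are illustrative, not part of the required interface.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(X, W, b, w):
    # Hidden layer with sigmoid activations; linear output whose
    # offset is fixed to zero, as the handout specifies.
    H = sigmoid(X @ W + b)
    return H @ w

def objective(X, y, W, b, w):
    # Half squared error over the full dataset.
    return 0.5 * np.sum((predict(X, W, b, w) - y) ** 2)

def minibatch_sgd(X, y, hidden=4, k=2, eta=0.5, n_epochs=2000, seed=0):
    rng = np.random.default_rng(seed)
    rows, cols = X.shape
    W = rng.uniform(-0.5, 0.5, (cols, hidden))   # input-to-hidden weights
    b = rng.uniform(-0.5, 0.5, hidden)           # hidden-layer offsets
    w = rng.uniform(-0.5, 0.5, hidden)           # hidden-to-output weights
    idx = np.arange(rows)
    for epoch in range(n_epochs):
        for _ in range(rows):
            rng.shuffle(idx)                     # shuffle the row indices
            batch = idx[:k]                      # select the first k datapoints
            Xb, yb = X[batch], y[batch]
            H = sigmoid(Xb @ W + b)
            err = H @ w - yb                     # output minus target
            # Backpropagate the half-squared-error gradient on the mini-batch.
            grad_w = H.T @ err
            grad_H = np.outer(err, w) * H * (1.0 - H)
            grad_W = Xb.T @ grad_H
            grad_b = grad_H.sum(axis=0)
            w -= eta * grad_w
            W -= eta * grad_W
            b -= eta * grad_b
        obj = objective(X, y, W, b, w)           # recalculate the objective
    return W, b, w

# XOR dataset from this handout: first column is the label.
data = np.array([[1, 0, 0], [1, 1, 1], [-1, 0, 1], [-1, 1, 0]], dtype=float)
y, X = data[:, 0], data[:, 1:]
W, b, w = minibatch_sgd(X, y)
print(np.sign(predict(X, W, b, w)))              # compare against y

The learning rate, number of epochs, and hidden-layer size usually need tuning per dataset; the defaults above are only a starting point for the XOR data.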
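For question 2, one way to keep track of the best objective in the inner loop is to evaluate the full objective after every weight update and keep a copy of the best weights seen so far. A hypothetical helper for that bookkeeping (the name keep_best and its interface are illustrative, not required by the assignment):

import copy

def keep_best(best, obj, weights):
    # Return whichever (objective, weights) pair has the lower objective.
    # Deep-copy the candidate so later in-place updates don't overwrite it.
    if best is None or obj < best[0]:
        return (obj, copy.deepcopy(weights))
    return best

Because this evaluates the objective on the full dataset after every mini-batch update rather than once per epoch, it adds an extra forward pass per update; the bookkeeping trades wall-clock time for a potentially better final solution, which is exactly the tradeoff question 2 asks you to measure.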