In this course project we encourage you to develop your own methods for learning and classification. You may form a team of up to two members, and you may practice on datasets from the UCI repository as well as the ones used in class. We will test your program on the dataset provided for the project.
This is a simulated dataset of single nucleotide polymorphism (SNP) genotype data containing 29623 SNPs (the total number of features). Of these SNPs, 15 are causal, meaning that they and their neighboring SNPs discriminate between cases and controls; the remainder are noise. The training set contains 4000 cases and 4000 controls. Your task is to predict the labels of 2000 test individuals whose true labels are known only to the instructor and the TA.
The datasets and the training labels are available immediately following the link to this project file. The training dataset is called traindata.gz (gzipped), the training labels are in trueclass, and the test dataset is called testdata.gz (also gzipped). You may use cross-validation to evaluate the accuracy of your method and to estimate its parameters. The winner will be the submission with the highest test-set accuracy achieved with the fewest features.
Your project must be in Python, and you cannot use numpy or scipy. You may use the sklearn support vector machine, logistic regression, naive Bayes, linear regression, and dimensionality reduction modules, but not the feature selection ones. Each is available by importing the respective module; for example, to use SVM, write "from sklearn import svm". You may also make system calls to external C programs for classification, such as svmlight, liblinear, fest, and bmrm.
Your program should take as input the training dataset, the trueclass label file for the training points, and the test dataset. It should output its predictions for the test dataset labels in the same format as in the class assignments. It should also output the number of features used for the final prediction and their column numbers.
If all features were used, write "ALL" instead of listing every column number. Your score is accuracy / (number of features used). To qualify for full points, you need an accuracy of at least 63%. Submit your assignment by copying it into the directory /afs/cad/courses/ccs/f17/cs/675/001/. For example, if your UCID is abc12, copy your Python script into /afs/cad/courses/ccs/f17/cs/675/001/abc12. Submit a hard copy in class as well. Your completed script is due before 11:30am on Dec 4th, 2017.
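Because numpy is not allowed, the data have to be loaded with the standard library. The sketch below is one possible way to do it. It assumes that each line of traindata.gz and testdata.gz holds one individual as whitespace-separated integer genotype codes, and that each line of trueclass has the form "<label> <row_index>". Both layouts are assumptions, so check them against the actual files. The function names are hypothetical.

```python
import gzip
from array import array


def parse_matrix(lines):
    """Parse one individual per non-empty line of whitespace-separated integers.

    Each row is stored as array('b') (one signed byte per genotype) to keep
    memory low without numpy. ASSUMPTION: every value fits in -128..127.
    """
    return [array("b", (int(v) for v in line.split()))
            for line in lines if line.strip()]


def parse_labels(lines):
    """Parse lines of the ASSUMED form "<label> <row_index>" into {row_index: label}."""
    labels = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            labels[int(fields[1])] = int(fields[0])
    return labels


def read_matrix(path):
    # gzip.open reads traindata.gz / testdata.gz; plain open reads uncompressed files.
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as f:
        return parse_matrix(f)


def read_labels(path):
    # trueclass is assumed to be an uncompressed text file.
    with open(path) as f:
        return parse_labels(f)
```

For example, train = read_matrix("traindata.gz") and label_of = read_labels("trueclass") give the rows and a {row_index: label} map; sorting the keys of label_of yields aligned row and label lists. The array('b') choice matters at this size: 8000 x 29623 genotypes is about 237 MB at one byte each, whereas lists of Python ints need roughly 8 bytes per entry for the list pointers alone.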
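Since the sklearn feature selection modules are off-limits, any ranking of SNPs has to be written by hand. One common univariate option, shown here only as an illustration and not as a required method, is a Pearson chi-square test of independence between each SNP's genotype and the class label. The sketch assumes genotypes coded 0, 1, 2 and labels coded 0, 1; the function names are hypothetical.

```python
def chi_square_scores(rows, labels):
    """Pearson chi-square statistic for each column (SNP) vs. the class label.

    rows   : sequences of genotype codes, ASSUMED to be 0, 1 or 2
    labels : class labels, ASSUMED to be 0 or 1, aligned with rows
    Larger scores mean a stronger association between that SNP and the label.
    """
    labels = list(labels)
    n = len(rows)
    class_totals = [labels.count(0), labels.count(1)]
    scores = []
    for j in range(len(rows[0])):
        # counts[c][g]: number of individuals in class c with genotype g at SNP j
        counts = [[0, 0, 0], [0, 0, 0]]
        for row, c in zip(rows, labels):
            counts[c][row[j]] += 1
        chi2 = 0.0
        for g in range(3):
            genotype_total = counts[0][g] + counts[1][g]
            for c in range(2):
                expected = class_totals[c] * genotype_total / n
                if expected > 0:  # skip cells for an absent genotype or empty class
                    chi2 += (counts[c][g] - expected) ** 2 / expected
        scores.append(chi2)
    return scores


def top_k_features(scores, k):
    """Column indices of the k highest-scoring SNPs, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

In pure Python this touches about 2.4 x 10^8 cells on the full training set, so expect it to take minutes. Because neighbors of the causal SNPs also discriminate, the top of the ranking may contain adjacent columns carrying overlapping signal, so k top-ranked SNPs may represent fewer than k independent signals.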
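Cross-validation can likewise be written without numpy. The sketch below is classifier-agnostic: the hypothetical callback train_and_predict might, for instance, rank SNPs on the training fold, keep the top few, and fit sklearn's svm.LinearSVC on them. All names here are illustrative assumptions.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices 0..n-1 with a fixed seed and deal them into k folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(rows, labels, train_and_predict, k=5, seed=0):
    """Mean held-out accuracy of train_and_predict over k folds.

    train_and_predict(train_rows, train_labels, test_rows) must return one
    predicted label per test row. Do any SNP ranking inside it, on the
    training fold only, so the held-out fold never influences which SNPs
    are selected.
    """
    accuracies = []
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(rows)) if i not in held]
        predictions = train_and_predict([rows[i] for i in train_idx],
                                        [labels[i] for i in train_idx],
                                        [rows[i] for i in held_out])
        correct = sum(p == labels[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return sum(accuracies) / len(accuracies)
```

Ranking features on all 8000 training individuals before splitting would let the held-out folds influence the choice of SNPs and inflate the estimated accuracy, which is why the selection belongs inside the callback.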
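The scoring rule and the "ALL" convention are easy to encode. In the sketch below, the prediction line layout "<label> <row_index>" (mirroring the assumed trueclass format), 0-based row and column numbering, and the "Number of features:" and "Features:" prefixes are all assumptions; check them against the format used in the class assignments.

```python
def project_score(accuracy, n_features_used):
    """Score as defined for the project: accuracy / (number of features)."""
    return accuracy / n_features_used


def format_output(predictions, feature_columns, total_features):
    """Return output lines: test predictions, then the features used.

    ASSUMPTIONS: each prediction is written as "<label> <row_index>" with
    0-based rows; the "Number of features:" and "Features:" prefixes are
    hypothetical placeholders.
    """
    lines = ["%d %d" % (label, i) for i, label in enumerate(predictions)]
    lines.append("Number of features: %d" % len(feature_columns))
    if len(feature_columns) == total_features:
        lines.append("Features: ALL")  # the assignment asks for "ALL" if every feature was used
    else:
        lines.append("Features: " + " ".join(str(c) for c in feature_columns))
    return lines
```

As a worked example of the score, 64% accuracy with 16 SNPs gives 0.64 / 16 = 0.04, while the same accuracy using all 29623 SNPs gives about 2.2 x 10^-5. Printing "\n".join(format_output(...)) writes the result to standard output.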