CS 675

CS 675: Introduction to Machine Learning
Spring 2020

Instructor: Usman Roshan
Office: GITC 4214B
Ph: 973-596-2872
Email: usman@njit.edu
Grader: Smit Girish Purohit
Email: sp2497@njit.edu

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with Kernels by Schölkopf and Smola (Recommended)
Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar (Recommended)

Grading: 20% mid-term, 30% final exam, 15% course projects, 35% programming assignments
Grading instructions
Course Overview: This course is a hands-on introduction to machine learning and contains both theory and application. We will cover classification and regression algorithms in supervised learning such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply algorithms to solve problems on real data such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Course plan:

Topic	Date	Notes
Introduction, Bayesian learning, and Python	01/22/20	Introduction Background Basic statistics More basic probability and statistics Applied statistics Linear algebra background More linear algebra Unix and login to NJIT machines Basic Unix command sheet Instructions for AFS login Textbook reading: All of chapter 1, 2.1, 2.4, 2.5, 2.6, 2.7
Bayesian learning	01/27/2020	Bayesian learning Bayesian decision theory example problem Textbook reading: 4.1 to 4.5, 5.1, 5.2, 5.4, 5.5
Python	01/29/2020	Python More on Python Python cheat sheet Python practice problems Python example 1 Python example 2 Python example 3
Nearest means and naive Bayes	02/03/2020	Nearest mean algorithm Naive Bayes algorithm Assignment 1 Predicted labels for naive Bayes on breast cancer trainlabels.0 mean initialized to 0.01
Kernel nearest means	02/05/2020	Nearest means in Python (part 1) Nearest means in Python (part 2) Datasets Balanced error Balanced error in Perl Kernels More on kernels Kernel nearest means Script to compute average test error Script to compute average test error Textbook reading: 13.5, 13.6, 13.7
Separating hyperplanes and least squares	02/10/2020	Mean balanced cross-validation error on real data Hyperplanes as classifiers Least squares Textbook reading: 10.2, 10.3, 10.6, 11.2, 11.3, 11.5, 11.7
Multi-layer perceptrons	02/12/2020	Multi-layer perceptrons Assignment 2: Implement gradient descent for least squares Predicted labels for least squares ionosphere trainlabels.0 training, eta=.0001, stop=.001 Least squares in Perl Approximations by superpositions of sigmoidal functions (Cybenko 1989) Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991) The expressive power of neural networks: A view from the width (Lu et al. 2017)
Support vector machines	02/17/2020	Textbook reading: 13.1 to 13.3 Support vector machines Assignment 3: Implement hinge loss gradient descent Predicted labels for hinge loss on ionosphere trainlabels.0 training, eta=.001, stop=.001 Efficiency of coordinate descent methods on huge-scale optimization problems Hardness of separating hyperplanes Learning Linear and Kernel Predictors with the 0-1 Loss Function
More on kernels	02/19/2020	Kernels Multiple kernel learning by Lanckriet et al. Multiple kernel learning by Gonen and Alpaydin
Logistic regression	02/24/20	Assignment 3b: Implement SVM (hinge+regularizer) gradient descent Logistic regression Solver for regularized risk minimization Textbook reading: 10.7 Assignment 4: Implement logistic discrimination algorithm Predicted labels for logistic on climate trainlabels.0 training, eta=.001, stop=.001
Empirical and regularized risk minimization	02/26/20	Empirical risk minimization Regularized risk minimization Regularization and overfitting Solver for regularized risk minimization Advanced topics Convexity, classification, and risk bounds Does Distributionally Robust Supervised Learning Give Robust Classifiers? Curriculum Loss: Robust Learning and Generalization against Label Corruption Adversarial Machine Learning at Scale Revisiting Adversarial Risk Distributionally Robust Optimization: A Review Workshop on Distributionally Robust Optimization Efficient Stochastic Gradient Descent for Distributionally Robust Learning
Mid-term exam review	03/02/20	Midterm exam review sheet
Mid-term exam	03/04/20
Feature selection	03/09/20	Assignment 5: Adaptive step size for least squares and hinge Feature selection Feature selection (additional notes) NIPS 2003 feature selection contest Contest website Challenge results Challenge results II A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets Feature selection with SVMs and F-score Ranking genomic causal variants with chi-square and SVM
Dimensionality reduction	03/11/20, 03/23/20	Unsupervised dimensionality reduction Dimensionality reduction (additional notes) Proof of JL Lemma Textbook reading: Chapter 6 sections 6.1, 6.3, and 6.6 Course project 1 Training dataset Training labels Test dataset Python function to cross validate linear SVM C
Dimensionality reduction	03/25/20	Supervised dimensionality reduction Maximum margin criterion Laplacian linear discriminant analysis
Decision trees, bagging, boosting, and stacking	03/30/20	Decision trees, bagging, boosting, and stacking Decision trees (additional notes) Ensemble methods (additional notes) Assignment 6: Implement a decision stump in Python Neural Network Ensembles Univariate vs. multivariate trees Gradient boosted trees: Slides by Tianqi Chen Textbook reading: Chapters 9 and 17 sections 9.2, 17.4, 17.6, 17.7
Ensemble methods, random projections, and stacking	04/01/20	Stacking Random projections in dimensionality reduction Assignment 7: Implement a bagged decision stump in Python
Regression	04/06/20, 04/08/20	Regression Textbook reading: Chapter 4 section 4.6, Chapter 10 section 10.8, Chapter 13 section 13.10
Unsupervised learning - clustering	04/13/20	Clustering Assignment 8: Implement k-means clustering in Python Tutorial on spectral clustering K-means via PCA Convergence properties of k-means Textbook reading: Chapter 7 sections 7.1, 7.3, 7.7, and 7.8
Clustering	04/15/20
Feature learning, representation learning	04/20/20	Extreme learning machines Random Bits Regression: a Strong General Predictor for Big Data Exploring classification, clustering, and its limits in a compressed hidden space of a single layer neural network with random weights Learning Feature Representations with K-means Analysis of single-layer networks in unsupervised feature learning On Random Weights and Unsupervised Feature Learning A k-means based feature learning method for protein sequence classification Feature learning with k-means Course project 2 Random hyperplanes Predicted labels of ionosphere on trainlabels.0 in the new feature space of 10K features (error=5.5%) Results with random hyperplanes
Time series data, text document classification, and other topics		Time series methods Course project 3 Project 3 Weekly sales transaction dataset Text encoding Project 4 (extra credit) Spam train Spam test Python regular expressions Perl regular expressions Word tagging with nltk Semi-supervised and self-supervised classification Missing data (A study on missing data methods)
Hidden Markov models		Hidden Markov models Textbook reading: Chapter 15 (all of it)
Big data		Big data Mini-batch k-means Stochastic gradient descent Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent Mapreduce for machine learning on multi-core
Comparison of classifiers and big data, ROC, multiclass, statistical significance in comparing classifiers		Comparing classifiers ROC area under curve Multiclass (Multiclass: one vs all) Statistical significance Comparison of classifiers Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? An Empirical Comparison of Supervised Learning Algorithms Statistical Comparisons of Classifiers over Multiple Data Sets
Some advanced topics and papers		Classification boundaries (Code) Convolutional neural networks for image recognition Gradient-based learning applied to document recognition Representation learning Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition (Cover 1965) ImageNet classification with deep neural networks (Krizhevsky et al. 2012) Random projections preserve margin Random projections preserve margin II Python Image Library
Final review		Review of most things covered in the course Final exam review sheet
Final	TBA