CS 675: Introduction to Machine Learning
Fall 2019

Instructor: Usman Roshan
Office: GITC 4214B
Ph: 973-596-2872
Email: usman@njit.edu
Graders with office hours:

Kalp Dalal, kdd32@njit.edu, Tue 4-6 GITC 4321
Smit, sp2497@njit.edu, Wed 4-6 GITC 4321
Pooja, pk534@njit.edu, Mon 4:30-5:30, Th 3:30-4:30 GITC 4321
Soumya, sb2356@njit.edu, Tue 12-1, 6-7 GITC 4321
Kishan, kpp73@njit.edu, Tue 10:30-12:30 GITC 4321

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with Kernels by Schölkopf and Smola (Recommended)
Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar (Recommended)

Grading: 20% mid-term, 30% final exam, 15% course projects, 35% programming assignments
Course Overview: This course is a hands-on introduction to machine learning and contains both theory and application. We will cover classification and regression algorithms in supervised learning such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply algorithms to solve problems on real data such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Course plan:

Topic	Date	Notes
Introduction, Bayesian learning, and Python	09/04/19	Introduction; Background; Basic statistics; More basic probability and statistics; Applied statistics; Linear algebra background; More linear algebra; Unix and login to NJIT machines; Basic Unix command sheet; Instructions for AFS login; Textbook reading: All of chapter 1, 2.1, 2.4, 2.5, 2.6, 2.7
Bayesian learning	09/09/19	Bayesian learning; Bayesian decision theory example problem; Textbook reading: 4.1 to 4.5, 5.1, 5.2, 5.4, 5.5
Python	09/11/19	Python; More on Python; Python cheat sheet; Python practice problems; Python example 1; Python example 2; Python example 3
Nearest means and naive Bayes	09/16/19	Nearest mean algorithm; Naive Bayes algorithm; Assignment 1; Predicted labels for naive Bayes on breast cancer trainlabels.0, mean initialized to 0.01
Kernel nearest means	09/16/19	Nearest means in Python (part 1); Nearest means in Python (part 2); Datasets; Balanced error; Balanced error in Perl; Kernels; More on kernels; Kernel nearest means; Script to compute average test error; Script to compute average test error; Textbook reading: 13.5, 13.6, 13.7
Separating hyperplanes and least squares	09/18/19	Mean balanced cross-validation error on real data; Hyperplanes as classifiers; Least squares; Textbook reading: 10.2, 10.3, 10.6, 11.2, 11.3, 11.5, 11.7
Multi-layer perceptrons	09/25/19	Multi-layer perceptrons; Assignment 2: Implement gradient descent for least squares; Predicted labels for least squares ionosphere trainlabels.0 training, eta=.0001, stop=.001; Least squares in Perl; Approximation by superpositions of a sigmoidal function (Cybenko 1989); Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991); The expressive power of neural networks: A view from the width (Lu et al. 2017)
Support vector machines		Textbook reading: 13.1 to 13.3; Support vector machines; Assignment 3: Implement hinge loss gradient descent; Predicted labels for hinge loss on ionosphere trainlabels.0 training, eta=.001, stop=.001; Efficiency of coordinate descent methods on huge-scale optimization problems; Hardness of separating hyperplanes; Learning Linear and Kernel Predictors with the 0-1 Loss Function
More on kernels		Kernels; Multiple kernel learning by Lanckriet et al.; Multiple kernel learning by Gönen and Alpaydin
Logistic regression	10/07/19	Logistic regression; Solver for regularized risk minimization; Textbook reading: 10.7; Assignment 4: Implement logistic discrimination algorithm; Predicted labels for logistic on climate trainlabels.0 training, eta=.001, stop=.001
Empirical and regularized risk minimization	10/09/19	Empirical risk minimization; Regularized risk minimization; Regularization and overfitting; Solver for regularized risk minimization; Advanced topics; Convexity, classification, and risk bounds; Does Distributionally Robust Supervised Learning Give Robust Classifiers?; Curriculum Loss: Robust Learning and Generalization against Label Corruption; Revisiting Adversarial Risk; Distributionally Robust Optimization: A Review; Workshop on Distributionally Robust Optimization; Efficient Stochastic Gradient Descent for Distributionally Robust Learning
Mid-term exam review	10/14/19	Midterm exam review sheet
Mid-term exam	10/16/19
Feature selection	10/21/19	Assignment 5: Adaptive step size for least squares and hinge; Feature selection; Feature selection (additional notes); NIPS 2003 feature selection contest; Contest website; Challenge results; Challenge results II; A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets; Feature selection with SVMs and F-score; Ranking genomic causal variants with chi-square and SVM
Dimensionality reduction	10/23/19	Unsupervised dimensionality reduction; Dimensionality reduction (additional notes); Proof of JL Lemma; Textbook reading: Chapter 6 sections 6.1, 6.3, and 6.6; Course project 1; Training dataset; Training labels; Test dataset; Python function to cross validate linear SVM C
Dimensionality reduction	10/28/19	Supervised dimensionality reduction; Maximum margin criterion; Laplacian linear discriminant analysis
Decision trees, bagging, boosting, and stacking	10/30/19	Decision trees, bagging, boosting, and stacking; Decision trees (additional notes); Ensemble methods (additional notes); Assignment 6: Implement a decision stump in Python; Univariate vs. multivariate trees; Gradient boosted trees: Slides by Tianqi Chen; Textbook reading: Chapters 9 and 17, sections 9.2, 17.4, 17.6, 17.7
Ensemble methods, random projections, and stacking	11/06/19	Stacking; Random projections in dimensionality reduction; Assignment 7: Implement a bagged decision stump in Python
Regression	11/11/19	Regression; Textbook reading: Chapter 4 section 4.6, Chapter 10 section 10.8, Chapter 13 section 13.10
Unsupervised learning - clustering	11/13/19	Clustering; Assignment 8: Implement k-means clustering in Python; Tutorial on spectral clustering; K-means via PCA; Convergence properties of k-means; Textbook reading: Chapter 7 sections 7.1, 7.3, 7.7, and 7.8
Clustering	11/18/19
Clustering	11/20/19
Feature learning	11/25/19	Extreme learning machines; Random Bits Regression: a Strong General Predictor for Big Data; Exploring classification, clustering, and its limits in a compressed hidden space of a single layer neural network with random weights; Learning Feature Representations with K-means; Analysis of single-layer networks in unsupervised feature learning; On Random Weights and Unsupervised Feature Learning; Feature learning with k-means; Course project 2; Random hyperplanes; Predicted labels of ionosphere on trainlabels.0 in the new feature space of 10K features (error=5.5%); Results with random hyperplanes
Time series data, text document classification, and other topics	12/02/19	Time series methods; Text encoding; Weekly sales transaction dataset (Time series contest); Semi-supervised and self-supervised classification; Missing data (A study on missing data methods)
Hidden Markov models	12/04/19	Hidden Markov models; Textbook reading: Chapter 15 (all of it)
Big data	12/09/19	Big data; Mini-batch k-means; Stochastic gradient descent; Mapreduce for machine learning on multi-core
Comparison of classifiers and big data, ROC, multiclass, statistical significance in comparing classifiers	12/11/19	Comparing classifiers; ROC area under curve; Multiclass (Multiclass: one vs all); Statistical significance; Comparison of classifiers; Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?; An Empirical Comparison of Supervised Learning Algorithms; Statistical Comparisons of Classifiers over Multiple Data Sets
Some advanced topics and papers		Classification boundaries (Code); Convolutional neural networks for image recognition; Gradient-based learning applied to document recognition; Representation learning; Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition (Cover 1965); ImageNet classification with deep convolutional neural networks (Krizhevsky et al. 2012); Random projections preserve margin; Random projections preserve margin II; Python Imaging Library
Final review		Review of most things covered in the course; Final exam review sheet
Final	TBA