CS 675

CS 675: Introduction to Machine learning
Fall 2017

Instructor: Usman Roshan
Office: GITC 4207
Ph: 973-596-2872
Office hours: MW 1-2:30, Tue 1-3
Email: usman@njit.edu

TA: Yunzhe Xue
Office hours:
Email: yx277@njit.edu

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with kernels by Scholkopf and Smola (Recommended)
Foundations of Machine Learning by Rostamizadeh, Talwalkar, and Mohri (Recommended)

Grading: 25% mid-term, 30% final exam, 15% course project, 30% programming assignments
Course Overview: This course is a hands-on introduction to machine learning and contains both theory and application. We will cover classification and regression algorithms in supervised learning such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply algorithms to solve problems on real data such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Course plan:

Topic	Date	Notes
Introduction, Bayesian learning, and Python	09/06/2017	Introduction Background Basic statistics More basic probability and statistics Applied statistics Linear algebra background More linear algebra Unix and login to NJIT machines Basic Unix command sheet Instructions for AFS login Textbook reading: All of chapter 1, 2.1, 2.4, 2.5, 2.6, 2.7
Bayesian learning	09/11/2017	Bayesian learning Bayesian decision theory example problem Textbook reading: 4.1 to 4.5, 5.1, 5.2, 5.4, 5.5
Python	09/13/2017	Python More on Python Python cheat sheet Python practice problems Python example 1 Python example 2 Python example 3
Nearest means and naive-bayes	09/18/2017	Nearest mean algorithm Nearest means (part I) Nearest means (part II) Naive Bayes algorithm Assignment 1
Kernel nearest means	09/20/2017	Datasets Balanced error Balanced error in Perl Kernels More on kernels Kernel nearest means Script to compute average test error Textbook reading: 13.5, 13.6, 13.7
Separating hyperplanes and least squares	09/25/2017	Hyperplanes as classifiers Least squares Textbook reading: 10.2, 10.3, 10.6, 11.2, 11.3, 11.5, 11.7
Multi-layer perceptrons	09/27/2017	Multi-layer perceptrons Assignment 2: Implement gradient descent for least squares Predicted labels for ionosphere trainlabels.0 training and eta=.0001 Least squares in Perl
Support vector machines	10/02/2017	Textbook reading: 13.1 to 13.3 Support vector machines Assignment 3: Implement hinge loss gradient descent Efficiency of coordinate descent methods on huge-scale optimization problems Hardness of separating hyperplanes Learning Linear and Kernel Predictors with the 01 Loss Function
More on kernels	10/04/2017	Kernels Multiple kernel learning by Lanckriet et. al. Multiple kernel learning by Gonen and Alpaydin
Logistic regression	10/09/2017	Logistic regression Solver for regularized risk minimization Textbook reading: 10.7 Assignment 4: Implement logistic discrimination algorithm
Empirical and regularized risk minimization	10/11/2017	Empirical risk minimization Regularized risk minimization Regularization and overfitting Solver for regularized risk minimization
Mid-term exam review	10/16/2017	Midterm exam review sheet
Mid-term exam	10/18/2017
Feature selection	10/23/2017	Feature selection Feature selection (additional notes) NIPS 2003 feature selection contest Contest website Challenge results Challenge results II A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets Feature selection with SVMs and F-score Ranking genomic causal variants with chi-square and SVM
Dimensionality reduction	10/25/2017	Unsupervised dimensionality reduction Dimensionality reduction (additional notes) Proof of JL Lemma Textbook reading: Chapter 6 sections 6.1, 6.3, and 6.6 Course project Training dataset Training labels Test dataset
Dimensionality reduction	10/30/2017	Supervised dimensionality reduction Maximum margin criterion Laplacian linear discriminant analysis
Decision trees, bagging, boosting, and stacking	11/01/2017	Decision trees, bagging, boosting, and stacking Decision trees (additional notes) Ensemble methods (additional notes) Assignment 5: Implement a decision stump in Python Univariate vs. multivariate trees Gradient boosted trees: Slides by Tianqi Chen Textbook reading: Chapters 9 and 17 sections 9.2, 17.4, 17.6, 17.7
Ensemble methods, random projections, and stacking	11/06/2017	Stacking Random projections in dimensionality reduction Assignment 6: Implement a bagged decision stump in Python
Regression	11/08/2017	Regression Textbook reading: Chapter 4 section 4.6, Chapter 10 section 10.8, Chapter 13 section 13.10
Unsupervised learning - clustering	11/13/2017	Clustering Assignment 7: Implement k-means clustering in Python Tutorial on spectral clustering K-means via PCA Convergence properties of k-means Textbook reading: Chapter 7 sections 7.1, 7.3, 7.7, and 7.8
Clustering	11/15/2017
Clustering	11/20/2017
Feature learning	11/27/2017	Random Bits Regression: a Strong General Predictor for Big Data Learning Feature Representations with K-means Analysis of single-layer networks in unsupervised feature learning On Random Weights and Unsupervised Feature Learning Feature learning with k-means Assignment 8 (optional extra credit) Random hyperplanes
Hidden Markov models	11/29/2017	Hidden Markov models Textbook reading: Chapter 15 (all of it)
Convolutional neural networks and multi-class classification	12/04/2017	Convolutional neural networks for image recognition Gradient based learning applied in document recognition
Comparison of classifiers and big data	12/06/2017	ROC area under curve Comparison of classifiers Do we Need Hundreds of Classifiers to Solve Real World Classification Problems? An Empirical Comparison of Supervised Learning Algorithms Statistical Comparisons of Classifiers over Multiple Data Sets Big data Mini-batch k-means Stochastic gradient descent Mapreduce for machine learning on multi-core
Some advanced topics and papers	12/11/2017	Representation learning Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition (Cover 1965) Approximations by superpositions of sigmoidal functions (Cybenko 1989) Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991) ImageNet classification with deep neural networks (Krizhevsky et. al. 2012) Random projections preserve margin Random projections preserve margin II Python Image Library
Review for final	12/13/2016	Final exam for review sheet