CS 675: Introduction to Machine Learning
Fall 2017

Instructor: Usman Roshan
Office: GITC 4207
Ph: 973-596-2872
Office hours: MW 1-2:30, Tue 1-3
Email: usman@njit.edu

TA: Yunzhe Xue
Office hours:
Email: yx277@njit.edu

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with Kernels by Schölkopf and Smola (Recommended)
Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar (Recommended)


Grading: 25% mid-term, 30% final exam, 15% course project, 30% programming assignments
Course Overview: This course is a hands-on introduction to machine learning and covers both theory and application. We will cover supervised classification and regression algorithms such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply these algorithms to solve problems on real data such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Course plan:

Topic
Date
Notes
Introduction, Bayesian learning, and Python
09/06/2017
Introduction
Background Unix and login to NJIT machines
Bayesian learning
09/11/2017
Bayesian learning
Bayesian decision theory example problem
Textbook reading: 4.1 to 4.5, 5.1, 5.2, 5.4, 5.5
Python
09/13/2017
Python
More on Python
Python cheat sheet
Python practice problems
Python example 1
Python example 2
Python example 3
Nearest means and naive Bayes
09/18/2017
Nearest mean algorithm
Nearest means (part I)
Nearest means (part II)
Naive Bayes algorithm
Assignment 1
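The following is a minimal sketch of the nearest mean classifier covered in this lecture, written with NumPy; the toy arrays, the 0/1 label convention, and the function names are illustrative assumptions and do not reflect the data format used in Assignment 1.

    import numpy as np

    def train_nearest_means(X, y):
        # One mean vector per class, computed from the training rows.
        return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

    def predict_nearest_means(means, X):
        # Assign each row to the class whose mean is closest in Euclidean distance.
        labels = list(means.keys())
        dists = np.stack([np.linalg.norm(X - means[c], axis=1) for c in labels], axis=1)
        return np.array(labels)[dists.argmin(axis=1)]

    # Hypothetical toy data, not the course datasets.
    X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
    y = np.array([0, 0, 1, 1])
    print(predict_nearest_means(train_nearest_means(X, y), X))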
Kernel nearest means
09/20/2017
Datasets
Balanced error
Balanced error in Perl
Kernels
More on kernels
Kernel nearest means (see the sketch below)
Script to compute average test error
Textbook reading: 13.5, 13.6, 13.7
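A minimal sketch of kernel nearest means: the squared distance to a class mean in feature space expands entirely into kernel evaluations, ||phi(x) - mu_c||^2 = k(x,x) - (2/n_c) sum_i k(x,x_i) + (1/n_c^2) sum_{i,j} k(x_i,x_j), so no explicit feature map is needed. The Gaussian (RBF) kernel and the gamma value below are illustrative assumptions, not choices fixed by the course materials.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        # Gaussian kernel matrix between the rows of A and the rows of B.
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-gamma * sq)

    def kernel_nearest_means(Xtrain, ytrain, Xtest, gamma=1.0):
        # Classify each test point by its feature-space distance to each class mean,
        # computed with kernel evaluations only.
        classes = np.unique(ytrain)
        kxx = np.ones(len(Xtest))          # k(x, x) = 1 for the RBF kernel
        dists = []
        for c in classes:
            Xc = Xtrain[ytrain == c]
            Kxc = rbf_kernel(Xtest, Xc, gamma)
            Kcc = rbf_kernel(Xc, Xc, gamma)
            dists.append(kxx - 2.0 * Kxc.mean(axis=1) + Kcc.mean())
        return classes[np.argmin(np.stack(dists, axis=1), axis=1)]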
Separating hyperplanes and least squares
09/25/2017
Hyperplanes as classifiers
Least squares
Textbook reading: 10.2, 10.3, 10.6, 11.2, 11.3, 11.5, 11.7
Multi-layer perceptrons
09/27/2017
Multi-layer perceptrons
Assignment 2: Implement gradient descent for least squares (see the sketch below)
Predicted labels for ionosphere (trained with trainlabels.0 and eta = 0.0001)
Least squares in Perl
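A minimal sketch of gradient descent on the least squares objective for Assignment 2, assuming a dense data matrix with rows as examples and numeric labels; the eta = 0.0001 default mirrors the value mentioned above, while the epoch count and function names are illustrative, and the assignment's required input/output format is not assumed.

    import numpy as np

    def least_squares_gd(X, y, eta=0.0001, epochs=1000):
        # Batch gradient descent on L(w, b) = sum_i (w . x_i + b - y_i)^2.
        n, d = X.shape
        w = np.zeros(d)
        b = 0.0
        for _ in range(epochs):
            r = X @ w + b - y            # residuals
            w -= eta * 2.0 * (X.T @ r)   # gradient with respect to w
            b -= eta * 2.0 * r.sum()     # gradient with respect to b
        return w, b

    def predict(w, b, X):
        # For +1/-1 labels, the sign of the linear output is the predicted class.
        return np.sign(X @ w + b)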
Support vector machines
10/02/2017
Textbook reading: 13.1 to 13.3
Support vector machines
Assignment 3: Implement hinge loss gradient descent (see the sketch below)
Efficiency of coordinate descent methods on huge-scale optimization problems
Hardness of separating hyperplanes
Learning Linear and Kernel Predictors with the 0-1 Loss Function
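A minimal sketch of subgradient descent on the hinge loss for Assignment 3, assuming labels in {-1, +1} and no regularization term; the step size and epoch count are illustrative choices.

    import numpy as np

    def hinge_loss_gd(X, y, eta=0.001, epochs=1000):
        # Subgradient descent on L(w, b) = sum_i max(0, 1 - y_i (w . x_i + b)).
        # Only margin-violating examples (y_i (w . x_i + b) < 1) contribute:
        # their subgradient is -y_i x_i for w and -y_i for b.
        n, d = X.shape
        w = np.zeros(d)
        b = 0.0
        for _ in range(epochs):
            viol = y * (X @ w + b) < 1.0
            w += eta * (y[viol][:, None] * X[viol]).sum(axis=0)
            b += eta * y[viol].sum()
        return w, b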
More on kernels
10/04/2017
Kernels
Multiple kernel learning by Lanckriet et al.
Multiple kernel learning by Gonen and Alpaydin
Logistic regression
10/09/2017
Logistic regression
Solver for regularized risk minimization
Textbook reading: 10.7
Assignment 4: Implement logistic discrimination algorithm
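A minimal sketch of the logistic discrimination algorithm for Assignment 4 above, assuming labels in {0, 1}, a single weight vector plus bias, and no regularization; the learning rate and epoch count are illustrative.

    import numpy as np

    def logistic_gd(X, y, eta=0.01, epochs=1000):
        # Gradient descent on the negative log-likelihood with p_i = sigmoid(w . x_i + b):
        # L(w, b) = -sum_i [ y_i log p_i + (1 - y_i) log(1 - p_i) ].
        n, d = X.shape
        w = np.zeros(d)
        b = 0.0
        for _ in range(epochs):
            p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
            g = p - y                    # derivative of the loss w.r.t. the linear output
            w -= eta * (X.T @ g)
            b -= eta * g.sum()
        return w, b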
Empirical and regularized risk minimization
10/11/2017
Empirical risk minimization
Regularized risk minimization
Regularization and overfitting
Solver for regularized risk minimization
Mid-term exam review
10/16/2017
Midterm exam review sheet
Mid-term exam
10/18/2017
Feature selection
10/23/2017
Feature selection
Feature selection (additional notes)
A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets
Feature selection with SVMs and F-score
Ranking genomic causal variants with chi-square and SVM
Dimensionality reduction
10/25/2017
Unsupervised dimensionality reduction
Dimensionality reduction (additional notes)
Proof of JL Lemma
Textbook reading: Chapter 6 sections 6.1, 6.3, and 6.6
Course project
Training dataset
Training labels
Test dataset
Dimensionality reduction
10/30/2017
Supervised dimensionality reduction
Maximum margin criterion
Laplacian linear discriminant analysis
Decision trees, bagging, boosting, and stacking
11/01/2017
Decision trees, bagging, boosting, and stacking
Decision trees (additional notes)
Ensemble methods (additional notes)
Assignment 5: Implement a decision stump in Python (see the sketch below)
Univariate vs. multivariate trees
Gradient boosted trees: Slides by Tianqi Chen
Textbook reading: Chapter 9 section 9.2; Chapter 17 sections 17.4, 17.6, and 17.7
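A minimal sketch of a decision stump for Assignment 5: an exhaustive search over (feature, threshold, sign) that minimizes training error, assuming labels in {-1, +1}; a sorted-threshold search would be faster, but this keeps the idea plain.

    import numpy as np

    def best_stump(X, y):
        # Try every feature, every observed threshold, and both orientations;
        # keep the split with the fewest training errors.
        n, d = X.shape
        best = (0, 0.0, 1, n + 1)                    # (feature, threshold, sign, errors)
        for j in range(d):
            for t in np.unique(X[:, j]):
                pred = np.where(X[:, j] <= t, -1, 1)
                for s in (1, -1):
                    errs = int(np.sum(s * pred != y))
                    if errs < best[3]:
                        best = (j, t, s, errs)
        return best[:3]

    def stump_predict(stump, X):
        j, t, s = stump
        return s * np.where(X[:, j] <= t, -1, 1)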
Ensemble methods, random projections, and stacking
11/06/2017
Stacking
Random projections in dimensionality reduction
Assignment 6: Implement a bagged decision stump in Python
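A minimal sketch of a bagged decision stump for Assignment 6 above, assuming the best_stump and stump_predict functions from the Assignment 5 sketch are available and that labels are in {-1, +1}; the number of bootstrap rounds and the majority-vote tie-break are illustrative choices.

    import numpy as np

    def bagged_stumps(X, y, rounds=50, seed=0):
        # Train one stump per bootstrap sample (drawn with replacement).
        rng = np.random.default_rng(seed)
        n = len(y)
        return [best_stump(X[idx], y[idx])
                for idx in (rng.integers(0, n, size=n) for _ in range(rounds))]

    def bagged_predict(stumps, X):
        # Majority vote over the stumps; ties go to +1.
        votes = np.sum([stump_predict(s, X) for s in stumps], axis=0)
        return np.where(votes >= 0, 1, -1)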
Regression
11/08/2017
Regression
Textbook reading: Chapter 4 section 4.6, Chapter 10 section 10.8, Chapter 13 section 13.10
Unsupervised learning - clustering
11/13/2017
Clustering
Assignment 7: Implement k-means clustering in Python (see the sketch below)
Tutorial on spectral clustering
K-means via PCA
Convergence properties of k-means
Textbook reading: Chapter 7 sections 7.1, 7.3, 7.7, and 7.8
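A minimal sketch of Lloyd's algorithm for Assignment 7, assuming k is given, centers are initialized from randomly chosen rows, and convergence is declared when the centers stop moving; the assignment's I/O format and any restart strategy are not assumed.

    import numpy as np

    def kmeans(X, k, iters=100, seed=0):
        # Alternate between assigning points to the nearest center and
        # recomputing each center as the mean of its assigned points.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)]
        for _ in range(iters):
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = dists.argmin(axis=1)
            new_centers = np.array([X[labels == c].mean(axis=0) if np.any(labels == c)
                                    else centers[c] for c in range(k)])
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels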
Clustering
11/15/2017
Clustering
11/20/2017
Feature learning
11/27/2017
Random Bits Regression: a Strong General Predictor for Big Data
Learning Feature Representations with K-means
Analysis of single-layer networks in unsupervised feature learning
On Random Weights and Unsupervised Feature Learning
Feature learning with k-means

Assignment 8 (optional extra credit)
Random hyperplanes
Hidden Markov models
11/29/2017
Hidden Markov models
Textbook reading: Chapter 15 (all of it)
Convolutional neural networks and multi-class classification
12/04/2017
Convolutional neural networks for image recognition
Gradient-based learning applied to document recognition
Comparison of classifiers and big data
12/06/2017
ROC area under curve
Comparison of classifiers
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
An Empirical Comparison of Supervised Learning Algorithms
Statistical Comparisons of Classifiers over Multiple Data Sets

Big data
Mini-batch k-means
Stochastic gradient descent
Map-Reduce for machine learning on multicore
Some advanced topics and papers
12/11/2017
Representation learning

Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition (Cover 1965)
Approximations by superpositions of sigmoidal functions (Cybenko 1989)
Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991)
ImageNet classification with deep convolutional neural networks (Krizhevsky et al. 2012)
Random projections preserve margin
Random projections preserve margin II

Python Imaging Library
Review for final
12/13/2017
Final exam review sheet