Course schedule. Each row lists a Topic, its Date, and Notes (lecture slides, readings, assignments, and related links).
Topic: Introduction, Bayesian learning, and Python
Notes:
  Introduction
  Background
  Unix and login to NJIT machines
Topic: Bayesian learning
Notes:
  Bayesian learning
  Bayesian decision theory example problem (see the worked sketch below)
  Textbook reading: sections 4.1 to 4.5, 5.1, 5.2, 5.4, and 5.5
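The Bayesian decision theory example above boils down to Bayes' rule followed by choosing the class with the largest posterior. A tiny Python sketch with made-up priors and likelihoods (not the numbers from the linked example problem):

# Choose the class with the largest posterior P(c | x) proportional to P(x | c) P(c).
# All numbers here are invented for illustration.
priors = {"spam": 0.4, "ham": 0.6}
likelihood = {"spam": 0.8, "ham": 0.1}         # P(word present | class)

posterior = {c: likelihood[c] * priors[c] for c in priors}
total = sum(posterior.values())
posterior = {c: p / total for c, p in posterior.items()}
print(posterior)                               # spam ~ 0.84, ham ~ 0.16 -> decide "spam"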
Topic: Python
Notes:
  Python
  More on Python
  Python cheat sheet
  Python practice problems
  Python example 1
  Python example 2
  Python example 3
Topic: Nearest means and naive Bayes
Notes:
  Nearest mean algorithm (see the sketch below)
  Naive Bayes algorithm
  Practice problem 1
  Predicted labels for naive Bayes on breast cancer trainlabels.0, mean initialized to 0.01
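The nearest mean algorithm listed above amounts to computing one mean vector per class and assigning each point to the closest mean. A minimal numpy sketch, assuming the data is already loaded into arrays with 0/1 labels (the course's file format and any variance scaling are not handled here):

# A minimal nearest-means (nearest-centroid) classifier sketch.
import numpy as np

def nearest_means_fit(X, y):
    """Compute one mean vector per class label."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def nearest_means_predict(means, X):
    """Assign each row of X to the class with the closest mean (Euclidean distance)."""
    classes = list(means.keys())
    M = np.vstack([means[c] for c in classes])               # one row per class mean
    d = ((X[:, None, :] - M[None, :, :]) ** 2).sum(axis=2)   # squared distances to each mean
    return np.array(classes)[d.argmin(axis=1)]

# toy usage
X = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
y = np.array([0, 0, 1, 1])
means = nearest_means_fit(X, y)
print(nearest_means_predict(means, X))                       # expected: [0 0 1 1]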
Topic: Kernel nearest means
Notes:
  Nearest means in Python (part 1)
  Nearest means in Python (part 2)
  Datasets
  Balanced error (see the sketch below)
  Balanced error in Perl
  Kernels
  More on kernels
  Kernel nearest means
  Script to compute average test error
  Textbook reading: sections 13.5, 13.6, and 13.7
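Balanced error, linked above with a Perl version, is the average of the per-class error rates, so a large majority class cannot mask mistakes on the minority class. A short Python equivalent, assuming plain label arrays rather than the course's file format:

# Balanced error: average of the per-class error rates.
import numpy as np

def balanced_error(y_true, y_pred):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    errs = [(y_pred[y_true == c] != c).mean() for c in np.unique(y_true)]
    return float(np.mean(errs))

print(balanced_error([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))   # 0.5: class 1 is always wrong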
Topic: Separating hyperplanes and least squares
Notes:
  Mean balanced cross-validation error on real data
  Hyperplanes as classifiers
  Least squares (see the sketch below)
  Textbook reading: sections 10.2, 10.3, 10.6, 11.2, 11.3, 11.5, and 11.7
  Project 1
  Project 1 template code as a starting point
  Dataset format for the project
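Least squares can be used directly as a linear classifier: encode the labels as ±1, solve for the weight vector, and classify by the sign of the linear output. A sketch using numpy's least-squares solver on a toy dataset (the project's data format is not handled here):

# Least-squares hyperplane classifier: minimize ||Xw - y||^2 with labels in {-1, +1}.
import numpy as np

def least_squares_train(X, y):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # append a bias column
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)      # normal-equations solution
    return w

def least_squares_predict(w, X):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.sign(Xb @ w)                          # side of the hyperplane

X = np.array([[0., 0.], [0., 1.], [5., 5.], [6., 5.]])
y = np.array([-1., -1., 1., 1.])
w = least_squares_train(X, y)
print(least_squares_predict(w, X))                  # expected: [-1. -1.  1.  1.]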
Topic: Multi-layer perceptrons
Notes:
  Multi-layer perceptrons
  Practice problem 2: Implement gradient descent for least squares (see the sketch below)
  Least squares output for toy data with seed=10
  Predicted labels for least squares on ionosphere trainlabels.0 training, eta=.0001, stop=.001
  Objective values for least squares gradient descent on ionosphere trainlabels.0 training, eta=.0001, stop=.001
  Least squares in Perl
  Approximations by superpositions of sigmoidal functions (Cybenko, 1989)
  Approximation Capabilities of Multilayer Feedforward Networks (Hornik, 1991)
  The expressive power of neural networks: A view from the width (Lu et al., 2017)
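Practice problem 2 asks for gradient descent on the least-squares objective; the eta=.0001 and stop=.001 values above suggest a fixed step size and a stopping rule on the change in the objective. A hedged sketch along those lines (the assignment's exact objective scaling, data loading, and stopping rule may differ):

# Gradient descent for f(w) = ||Xw - y||^2 with a fixed step size eta
# and a stopping threshold on the change in the objective.
import numpy as np

def least_squares_gd(X, y, eta=0.0001, stop=0.001, max_iter=100000):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias column
    w = np.zeros(Xb.shape[1])
    prev = np.inf
    for _ in range(max_iter):
        grad = 2.0 * Xb.T @ (Xb @ w - y)            # gradient of the squared error
        w -= eta * grad
        obj = float(((Xb @ w - y) ** 2).sum())
        if abs(prev - obj) < stop:                  # stop when the objective stabilizes
            break
        prev = obj
    return w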
Topic: Support vector machines
Notes:
  Textbook reading: sections 13.1 to 13.3
  Support vector machines
  Practice problem 3: Implement hinge loss gradient descent (see the sketch below)
  Predicted labels for hinge loss on ionosphere trainlabels.0 training, eta=.001, stop=.001
  Objective values for hinge loss gradient descent on ionosphere trainlabels.0 training, eta=.001, stop=.001
  Efficiency of coordinate descent methods on huge-scale optimization problems
  Hardness of separating hyperplanes
  Learning Linear and Kernel Predictors with the 01 Loss Function
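Practice problem 3 is the analogous descent loop for the hinge loss: only points that violate the margin contribute to the subgradient. A sketch assuming ±1 labels and the same fixed-step, objective-change stopping rule as above (the assignment's exact conventions may differ):

# (Sub)gradient descent for the unregularized hinge loss sum_i max(0, 1 - y_i w.x_i).
import numpy as np

def hinge_gd(X, y, eta=0.001, stop=0.001, max_iter=100000):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias column
    w = np.zeros(Xb.shape[1])
    prev = np.inf
    for _ in range(max_iter):
        margins = y * (Xb @ w)
        active = margins < 1                        # points violating the margin
        grad = -(Xb[active] * y[active][:, None]).sum(axis=0)   # hinge subgradient
        w -= eta * grad
        obj = float(np.maximum(0.0, 1.0 - y * (Xb @ w)).sum())
        if abs(prev - obj) < stop:
            break
        prev = obj
    return w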
Topic: More on kernels
Notes:
  Kernels (see the sketch below)
  Multiple kernel learning, by Lanckriet et al.
  Multiple kernel learning, by Gonen and Alpaydin
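A common concrete example for the kernel notes above is the Gaussian (RBF) kernel; the full kernel matrix only needs pairwise squared distances. A numpy sketch with an assumed bandwidth parameter sigma:

# Gaussian (RBF) kernel matrix K[i, j] = exp(-||x_i - z_j||^2 / (2 * sigma^2)).
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    # squared Euclidean distances via ||x||^2 + ||z||^2 - 2 x.z
    sq = (X ** 2).sum(axis=1)[:, None] + (Z ** 2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))   # clamp tiny negatives

X = np.array([[0., 0.], [1., 0.]])
print(gaussian_kernel(X, X, sigma=1.0))   # diagonal entries are exactly 1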
Topic: Logistic regression
Notes:
  Regularization and overfitting
  Logistic regression
  Textbook reading: section 10.7
  Practice problem 4: Implement the logistic discrimination algorithm (see the sketch below)
  Predicted labels for logistic regression on climate trainlabels.0 training, eta=.001, stop=.001
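Practice problem 4's logistic discrimination fits the same gradient-descent skeleton, now on the logistic loss. A sketch assuming ±1 labels; it ignores numerical-overflow safeguards and the assignment's exact data format:

# Gradient descent for the logistic loss sum_i log(1 + exp(-y_i w.x_i)).
import numpy as np

def logistic_gd(X, y, eta=0.001, stop=0.001, max_iter=100000):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias column
    w = np.zeros(Xb.shape[1])
    prev = np.inf
    for _ in range(max_iter):
        m = y * (Xb @ w)
        obj = float(np.log1p(np.exp(-m)).sum())     # current objective
        if abs(prev - obj) < stop:
            break
        prev = obj
        p = 1.0 / (1.0 + np.exp(m))                 # sigmoid(-m), the per-point weight
        w -= eta * (-(Xb * (y * p)[:, None]).sum(axis=0))
    return w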
Topic: Empirical and regularized risk minimization
Notes:
  Practice problem 5: Adaptive step size for hinge loss
  Empirical risk minimization
  Regularized risk minimization (see the sketch below)
  Solver for regularized risk minimization
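Regularized risk minimization adds a penalty on the weights to the empirical risk; with the hinge loss and an L2 penalty this is the primal SVM objective. The sketch below also shows one simple adaptive-step heuristic (halving eta when the objective goes up) in the spirit of practice problem 5; the lambda value, the regularized bias, and the heuristic itself are assumptions, not the course's prescription:

# Subgradient descent for sum_i max(0, 1 - y_i w.x_i) + (lam / 2) * ||w||^2.
import numpy as np

def reg_hinge_gd(X, y, lam=0.1, eta=0.001, stop=0.001, max_iter=100000):
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias column (regularized here for simplicity)
    w = np.zeros(Xb.shape[1])
    prev = np.inf
    for _ in range(max_iter):
        margins = y * (Xb @ w)
        obj = float(np.maximum(0.0, 1.0 - margins).sum() + 0.5 * lam * (w @ w))
        if abs(prev - obj) < stop:
            break
        if obj > prev:
            eta *= 0.5                              # simple adaptive step: back off if we overshot
        prev = obj
        active = margins < 1
        grad = -(Xb[active] * y[active][:, None]).sum(axis=0) + lam * w
        w -= eta * grad
    return w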
Topic: Mid-term exam review
Notes:
  Midterm exam review sheet
Topic: Mid-term exam
Topic: Feature selection
Notes:
  Feature selection
  Feature selection (additional notes)
  A comparison of univariate and multivariate gene selection techniques for classification of cancer datasets
  Feature selection with SVMs and F-score
  Ranking genomic causal variants with chi-square and SVM
  Feature selection exercise
  Training dataset
  Training labels
  Test dataset
  Python function to cross-validate the linear SVM C parameter (see the sketch below)
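The exercise above combines feature ranking with choosing the linear SVM C by cross-validation. A sketch of that workflow using scikit-learn (an assumption; the course's own script may use a different ranking score, and ranking on the full dataset before cross-validation gives an optimistic error estimate, so a fold-wise version is preferable for reporting):

# Univariate feature ranking plus a small cross-validated search over the linear SVM C.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

def rank_features(X, y):
    """Score each feature by |class-mean difference| / pooled standard deviation."""
    a, b = X[y == 0], X[y == 1]
    pooled = np.sqrt(a.var(axis=0) + b.var(axis=0)) + 1e-12
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled

def select_C(X, y, top_k=10, Cs=(0.01, 0.1, 1.0, 10.0)):
    keep = np.argsort(rank_features(X, y))[::-1][:top_k]      # indices of the top-ranked features
    cv_err = {C: 1.0 - cross_val_score(LinearSVC(C=C), X[:, keep], y, cv=5).mean() for C in Cs}
    return keep, min(cv_err, key=cv_err.get), cv_err          # best C has the lowest CV error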
Topic: Dimensionality reduction
Notes:
  Unsupervised dimensionality reduction
  Dimensionality reduction (additional notes)
  Proof of the JL lemma
  Random projections in dimensionality reduction (see the sketch below)
  Textbook reading: Chapter 6, sections 6.1, 6.3, and 6.6
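Random projections, and the JL lemma behind them, say that a random Gaussian map to k dimensions roughly preserves pairwise distances. A minimal sketch with made-up dimensions:

# Project d-dimensional data to k dimensions with a random Gaussian matrix scaled by 1/sqrt(k).
import numpy as np

def random_project(X, k, seed=0):
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((X.shape[1], k)) / np.sqrt(k)
    return X @ R

rng = np.random.default_rng(1)
X = rng.standard_normal((50, 1000))
Z = random_project(X, 100)
orig = np.linalg.norm(X[0] - X[1])
proj = np.linalg.norm(Z[0] - Z[1])
print(round(proj / orig, 2))   # typically close to 1: the distance is roughly preserved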
Topic: Dimensionality reduction
Notes:
  Supervised dimensionality reduction
  Maximum margin criterion
  Laplacian linear discriminant analysis
Topic: Decision trees, bagging, boosting, and stacking
Notes:
  Decision trees, bagging, boosting, and stacking
  Decision trees (additional notes)
  Ensemble methods (additional notes)
  Practice problem 6: Implement a decision stump in Python (see the sketch below)
  Neural Network Ensembles
  Univariate vs. multivariate trees
  Gradient boosted trees: slides by Tianqi Chen
  Textbook reading: Chapter 9 section 9.2; Chapter 17 sections 17.4, 17.6, and 17.7
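A decision stump (practice problem 6) is a one-split tree: exhaustively try every feature, threshold, and sign, and keep the split with the lowest training error. A brute-force sketch assuming ±1 labels and a small dense dataset:

# Brute-force decision stump: pick the (feature, threshold, sign) minimizing training error.
import numpy as np

def train_stump(X, y):
    best = (np.inf, 0, 0.0, 1)                      # (error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                pred = np.where(X[:, j] <= t, s, -s)
                err = (pred != y).mean()
                if err < best[0]:
                    best = (err, j, float(t), s)
    return best

def stump_predict(stump, X):
    _, j, t, s = stump
    return np.where(X[:, j] <= t, s, -s)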
Topic: Ensemble methods, random projections, and stacking
Notes:
  Stacking
  Practice problem 7: Implement a bagged decision stump in Python (see the sketch below)
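Practice problem 7 wraps the stump in bagging: draw bootstrap samples, fit one stump per sample, and take a majority vote. A self-contained sketch (the compact stump trainer repeats the brute-force idea from the previous sketch); the number of estimators and the tie-breaking rule are arbitrary choices:

# Bagged decision stumps with a majority vote, labels in {-1, +1}.
import numpy as np

def fit_stump(X, y):
    best = (np.inf, 0, 0.0, 1)                      # (error, feature, threshold, sign)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            for s in (1, -1):
                err = (np.where(X[:, j] <= t, s, -s) != y).mean()
                if err < best[0]:
                    best = (err, j, float(t), s)
    return best[1:]                                 # (feature, threshold, sign)

def bagged_stumps(X, y, n_estimators=25, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stumps = []
    for _ in range(n_estimators):
        idx = rng.integers(0, n, size=n)            # bootstrap sample with replacement
        stumps.append(fit_stump(X[idx], y[idx]))
    return stumps

def bagged_predict(stumps, X):
    votes = np.array([np.where(X[:, j] <= t, s, -s) for j, t, s in stumps])
    return np.sign(votes.sum(axis=0) + 1e-9)        # majority vote; break ties toward +1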
Topic: Regression
Notes:
  Regression
  Textbook reading: Chapter 4 section 4.6, Chapter 10 section 10.8, Chapter 13 section 13.10
Topic: Unsupervised learning - clustering
Notes:
  Clustering
  Practice problem 8: Implement k-means clustering in Python (see the sketch below)
  Tutorial on spectral clustering
  K-means via PCA
  Convergence properties of k-means
  Textbook reading: Chapter 7, sections 7.1, 7.3, 7.7, and 7.8
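Practice problem 8's k-means is Lloyd's algorithm: alternate nearest-center assignment with recomputing each center as the mean of its cluster. A numpy sketch; initialization from random data points and the empty-cluster handling are arbitrary choices here:

# Lloyd's algorithm for k-means clustering.
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]     # start from k random points
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)                                  # nearest-center assignment
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):                              # converged
            break
        centers = new
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(0, 1, (20, 2)),
               np.random.default_rng(2).normal(8, 1, (20, 2))])
centers, labels = kmeans(X, 2)
print(centers.round(1))   # two centers, near (0, 0) and (8, 8)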
Topic: Clustering
Topic: Feature learning, representation learning
Notes:
  Extreme learning machines
  Random Bits Regression: a Strong General Predictor for Big Data
  Exploring classification, clustering, and its limits in a compressed hidden space of a single layer neural network with random weights
  Learning Feature Representations with K-means
  Analysis of single-layer networks in unsupervised feature learning
  On Random Weights and Unsupervised Feature Learning
  A k-means based feature learning method for protein sequence classification
  Feature learning with k-means
  Project 2
  Predicted labels of ionosphere on trainlabels.0 in the new feature space of 10K features (error=5.5%)
  Results with random hyperplanes (see the sketch below)
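The random-hyperplane results above fit the same recipe as extreme learning machines: map the data through many fixed random hyperplanes with a nonlinearity, then train an ordinary linear classifier in that space. A sketch of the feature map only; the sign nonlinearity, the 10K width, and the Gaussian weights are illustrative assumptions:

# Random-hyperplane feature map: each new feature is the side of a random hyperplane.
import numpy as np

def random_hyperplane_features(X, n_features=1000, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_features))   # random projection directions
    b = rng.standard_normal(n_features)                 # random offsets
    return np.sign(X @ W + b)                           # +1/-1 side of each random hyperplane

rng = np.random.default_rng(3)
X = rng.standard_normal((5, 34))                        # e.g., ionosphere has 34 features
Z = random_hyperplane_features(X, n_features=10000)
print(Z.shape)                                          # (5, 10000): the new feature space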
Topic: Time series data, text document classification, and other topics
Notes:
  Time series methods (see the sketch below)
  Time series exercise
  Weekly sales transaction dataset
  Text encoding
  Python regular expressions
  Perl regular expressions
  Word tagging with nltk
  Semi-supervised and self-supervised classification
  Missing data (A study on missing data methods)
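One common way to make time series (such as the weekly sales transaction data above) digestible to the classifiers from earlier weeks is to cut each series into fixed-length sliding windows; whether the course exercise uses this exact representation is not specified here. A small sketch:

# Sliding-window representation of a time series.
import numpy as np

def windows(series, width=5, step=1):
    series = np.asarray(series)
    idx = np.arange(0, len(series) - width + 1, step)
    return np.stack([series[i:i + width] for i in idx])   # one row per window

sales = [3, 4, 2, 5, 7, 6, 8, 9, 7, 10]                    # toy weekly sales counts
print(windows(sales, width=5, step=2))                     # three overlapping windows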
Topic: Hidden Markov models
Notes:
  Hidden Markov models (see the sketch below)
  Textbook reading: Chapter 15 (all of it)
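The central HMM computation is the forward recursion: the probability of the observation sequence is accumulated one step at a time, summing over hidden states. A toy sketch with a two-state, two-symbol model and made-up probabilities:

# Forward algorithm for a discrete HMM: compute P(observations) by dynamic programming.
import numpy as np

pi = np.array([0.6, 0.4])                    # initial state probabilities
A = np.array([[0.7, 0.3],                    # A[i, j] = P(next state j | current state i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                    # B[i, k] = P(symbol k | state i)
              [0.2, 0.8]])
obs = [0, 1, 0]

alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]            # propagate, then weight by emission probability
print(alpha.sum())                           # P(observation sequence under this model)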
Topic: Big data
Notes:
  Big data
  Mini-batch k-means
  Stochastic gradient descent (see the sketch below)
  Towards Optimal One Pass Large Scale Learning with Averaged Stochastic Gradient Descent
  Map-Reduce for machine learning on multicore
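Stochastic (mini-batch) gradient descent, listed above, replaces the full-dataset gradient with the gradient on a small random batch, which is what lets the earlier descent loops scale to big data. A sketch on the least-squares objective; the batch size, epoch count, and step size are arbitrary here:

# Mini-batch stochastic gradient descent for ||Xw - y||^2.
import numpy as np

def sgd_least_squares(X, y, eta=0.001, epochs=20, batch=10, seed=0):
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])   # bias column
    w = np.zeros(Xb.shape[1])
    n = Xb.shape[0]
    for _ in range(epochs):
        order = rng.permutation(n)                  # shuffle once per epoch
        for start in range(0, n, batch):
            idx = order[start:start + batch]
            grad = 2.0 * Xb[idx].T @ (Xb[idx] @ w - y[idx]) / len(idx)   # mini-batch gradient
            w -= eta * grad
    return w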
Topic: Comparison of classifiers and big data, ROC, multiclass, statistical significance in comparing classifiers
Notes:
  Comparing classifiers (see the sketch below)
  Comparison of classifiers
  Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
  An Empirical Comparison of Supervised Learning Algorithms
  Statistical Comparisons of Classifiers over Multiple Data Sets
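For comparing two classifiers over many datasets, the Demsar paper listed above recommends the Wilcoxon signed-rank test on the paired per-dataset errors. A sketch using scipy with entirely hypothetical error rates:

# Wilcoxon signed-rank test on paired per-dataset error rates of two classifiers.
from scipy.stats import wilcoxon

# hypothetical cross-validated error rates of two classifiers on ten datasets
errors_a = [0.12, 0.20, 0.31, 0.08, 0.25, 0.18, 0.22, 0.15, 0.28, 0.10]
errors_b = [0.15, 0.22, 0.30, 0.12, 0.27, 0.21, 0.25, 0.15, 0.33, 0.13]

stat, p = wilcoxon(errors_a, errors_b)
print(p)   # a small p-value suggests the difference is unlikely to be chance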
Topic: Robust machine learning
Notes:
  Classification boundaries (code)
  Robust machine learning (notes in Google Drive folder)
  Extra credit assignment (see the evaluation sketch below)
  MNIST 0 vs 1 train (input to your program)
  MNIST 0 vs 1 trainlabels (input to your program)
  MNIST 0 vs 1 test
  MNIST 0 vs 1 testlabels
  MNIST 0 vs 1 fog corruption
  MNIST 0 vs 1 brightness corruption
  MNIST 0 vs 1 stripe corruption
  MNIST 0 vs 1 scale corruption
  MNIST 0 vs 1 translate corruption
  MNIST 0 vs 1 corruption labels
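The extra credit assignment evaluates a classifier on corrupted copies of the MNIST 0-vs-1 test set. The sketch below only illustrates that evaluation pattern, with a synthetic brightness shift, fake data, and a placeholder predictor; the actual corruption files, formats, and models are the ones linked above:

# Robustness check pattern: measure error on clean data, then again on a corrupted copy.
import numpy as np

def error(pred, y):
    return (pred != y).mean()

rng = np.random.default_rng(0)
X_test = rng.uniform(0, 1, (100, 784))        # stand-in for flattened 28x28 digit images
y_test = rng.integers(0, 2, 100)              # stand-in labels

def brightness(X, shift=0.3):
    return np.clip(X + shift, 0.0, 1.0)       # brighten every pixel and clip to [0, 1]

# `model_predict` stands for whatever classifier was trained on the clean training set.
def model_predict(X):
    return (X.mean(axis=1) > 0.55).astype(int)   # placeholder rule, not a real model

print(error(model_predict(X_test), y_test))               # clean test error
print(error(model_predict(brightness(X_test)), y_test))   # error under brightness corruption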
Topic: Final review
Notes:
  Review of most things covered in the course
  Final exam review sheet
Topic: Final
Date: TBA