DS 675: Machine Learning
Spring 2025

Instructor: Usman Roshan
Office: GITC 2106
Office Hours: M 4:45-5:25, T 1:45-2:25, W 1:45-2:25, Th 1:45-2:25
Ph: 973-596-2872
Email: usman@njit.edu

TA: Mugunthan Shandirasegaran
Email: ms3537@njit.edu

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with Kernels by Schölkopf and Smola (Recommended)
Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar (Recommended)

Grading: 35% mid-term, 65% course project

Course Overview: This course is a hands-on introduction to machine learning, covering both theory and application. We will cover classification and regression algorithms in supervised learning, such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply these algorithms to problems on real data, such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Deadlines and exam dates: one-page project description due Feb 17; mid-term exam April 7; final project PowerPoint due May 5

Course material:

Each topic below is followed by its lecture notes and readings.

Linear modeling
- Background: linear models
- Least squares notes
- Least squares gradient descent algorithm (see the sketch below)
- Regularization
- Stochastic gradient descent pseudocode
- Stochastic gradient descent (original paper)
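
A minimal sketch of least squares by gradient descent, in the spirit of the notes above; the data, step size, penalty, and iteration count are illustrative choices, not the ones from class.

```python
# Least squares (with an optional ridge penalty) fit by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 points, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
eta, lam = 0.01, 0.01                              # step size and regularization strength
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w  # gradient of MSE + lam*||w||^2
    w -= eta * grad
print(w)                                           # near the true coefficients (1, -2, 0.5)
```

Swapping the full gradient for a gradient computed on one random example per step gives the stochastic variant described in the pseudocode above.
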
Neural networks
- Multilayer perceptrons
- Basic single hidden layer neural network
- Backpropagation

- Approximations by superpositions of sigmoidal functions (Cybenko 1989)
- Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991)
- The Power of Depth for Feedforward Neural Networks (Eldan and Shamir 2016)
- The expressive power of neural networks: A view from the width (Lu et al. 2017)

- Convolution and single layer neural networks: objective and optimization
- Softmax and cross-entropy loss (see the sketch below)
- ReLU activation single layer neural networks: objective and optimization
- Multilayer neural network: objective and optimization

- Image localization and segmentation
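
A minimal sketch of the softmax and cross-entropy loss from the notes above, in plain NumPy; the scores and labels are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])       # one row of class scores per example
labels = np.array([0, 1])                  # true class indices
probs = softmax(scores)
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()  # average cross-entropy
print(probs, loss)
```
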
Machine learning - running linear models in Python scikit-learn
- Scikit-learn linear models
- Scikit-learn support vector machines
- SVM in Python scikit-learn (see the sketch below)
- Breast cancer training data
- Breast cancer test data
- Linear data
- Nonlinear data
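
A minimal sketch of running scikit-learn's linear models and SVM; since the course distributes its own breast cancer training and test files, the built-in dataset and random split below are stand-ins.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)     # linear models train better on scaled features
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```
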
Cross validation and balanced accuracy
- Cross validation (see the sketch below)
- Training vs. validation accuracy
- Balanced error
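
A minimal sketch of cross-validation with balanced accuracy in scikit-learn; the classifier, dataset, and fold count are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print(scores.mean())
```

Balanced accuracy averages per-class recall, so a majority-class predictor no longer looks good on imbalanced data.
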
Deep learning - running neural networks in scikit-learn
- Scikit-learn MLPClassifier
- Scikit-learn MLP code (see the sketch below)
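
A minimal sketch of scikit-learn's MLPClassifier; the hidden layer size, dataset, and iteration cap are illustrative rather than the course's settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)                  # one hidden layer of 100 ReLU units
print(mlp.score(X_test, y_test))
```
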
Multiclass classification - linear models and neural networks
- Multiclass classification
- Different multiclass methods
- One-vs-all method (see the sketch below)
- Tree-based multiclass
- Multiclass neural network softmax objective
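
A minimal sketch of the one-vs-all method, wrapping a binary linear SVM so it handles the ten digit classes; dataset and settings are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
ova = OneVsRestClassifier(LinearSVC(max_iter=10000))  # one binary SVM per class
print(cross_val_score(ova, X, y, cv=3).mean())        # predict the class whose SVM scores highest
```
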
Deep learning - running neural networks in Keras on tabular data
- Categorical variables
- One-hot encoding in scikit-learn
- Keras multilayer perceptron on tabular data (see the sketch below)
- Keras multilayer perceptron on tabular data with feature spaces
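
A minimal sketch of one-hot encoding a categorical column in scikit-learn and feeding the result to a Keras multilayer perceptron; the four-row table is made up, and the `sparse_output` argument assumes scikit-learn 1.2 or later.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder

color = np.array([["red"], ["blue"], ["green"], ["red"]])  # one categorical column
numeric = np.array([[1.0], [0.5], [2.0], [1.5]])           # one numeric column
y = np.array([1, 0, 1, 0])

enc = OneHotEncoder(sparse_output=False)                   # scikit-learn >= 1.2
X = np.hstack([enc.fit_transform(color), numeric])         # one-hot columns + numeric column

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, verbose=0)
```
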
Convolutions and image classification
- Image classification code
- Convolutions
- Popular convolutions in image processing

- Convolutions (additional notes)
- Convolutions - example 1
- Convolutions - example 2
- Convolutions - example 3
- Convolutions - example 4

- Convolutional neural network (additional slides by Yunzhe Xue)
- Convolution and single layer neural networks: objective and optimization
- Training and designing convolutional neural networks

- Flower image classification with CNNs code (see the sketch below)
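
A minimal sketch of designing and training a small convolutional network in Keras; random arrays stand in for the flower images used in class, and the layer sizes are illustrative.

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(64, 32, 32, 3).astype("float32")  # 64 fake 32x32 RGB images
y = np.random.randint(0, 5, size=64)                 # 5 fake classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),                  # halve the spatial resolution
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```
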
Neural networks: gradient descent, optimization, batch normalization, common architectures, data augmentation
- Optimization in neural networks
- Stochastic gradient descent pseudocode
- Stochastic gradient descent (original paper)

- Image classification code v2

- Batch normalization
- Batch normalization paper
- How does batch normalization help optimization? (paper)

- Gradient descent optimization
- An overview of gradient descent optimization algorithms

- On training deep networks
- The Loss Surfaces of Multilayer Networks

- Common architectures

- Transfer learning by Yunzhe Xue
- Transfer learning in Keras (see the sketch below)
- Pre-trained models in Keras

- Understanding data augmentation for classification
- SMOTE: Synthetic Minority Over-sampling Technique
- Dataset Augmentation in Feature Space
- Improved Regularization of Convolutional Neural Networks with Cutout
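
A minimal sketch of transfer learning with a pre-trained Keras model: freeze ImageNet convolutional features and train only a new classification head. MobileNetV2, the input size, the ten classes, and the random placeholder data are all illustrative choices (the pre-trained weights download on first use).

```python
import numpy as np
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                                # freeze the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # new task-specific head
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

X = np.random.rand(32, 96, 96, 3).astype("float32")  # placeholder images
y = np.random.randint(0, 10, size=32)
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```
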
Kernels
- Kernels
- More on kernels (see the sketch below)
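
A minimal sketch of why kernels matter: on two concentric circles a linear SVM fails while an RBF-kernel SVM separates the classes; the dataset is illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)
for kernel in ("linear", "rbf"):
    print(kernel, cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean())
# The RBF kernel implicitly maps points into a space where the circles become separable.
```
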
Logistic regression
- Logistic regression (see the sketch below)
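
A minimal sketch of logistic regression fit by gradient descent in plain NumPy; the data, step size, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)  # 0/1 labels from a linear rule

w = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))                 # sigmoid gives P(y=1 | x)
    w -= 0.1 * X.T @ (p - y) / len(y)              # gradient of the average logistic loss
print(w)                                           # points in the direction (2, -1)
```
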

Empirical and regularized risk minimization
- Empirical risk minimization
- Regularized risk minimization
- Regularization and overfitting

Support vector machine
- Support vector machines

Decision trees and random forests
- Decision trees, bagging, boosting, and stacking (see the sketch below)
- Decision trees (additional notes)
- Ensemble methods (additional notes)
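
A minimal sketch comparing a single decision tree to a random forest in scikit-learn; the dataset and forest size are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # bagged trees over random feature subsets
for clf in (tree, forest):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```
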

Feature selection
- Feature selection (see the sketch below)
- Feature selection (additional notes)
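
A minimal sketch of univariate feature selection in scikit-learn; keeping the ten highest-scoring features is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(f_classif, k=10).fit(X, y)  # rank features by ANOVA F-statistic
X_small = selector.transform(X)
print(X.shape, "->", X_small.shape)                # (569, 30) -> (569, 10)
```
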
Dimensionality reduction
- Dimensionality reduction (see the sketch below)
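
A minimal sketch of dimensionality reduction with PCA in scikit-learn; projecting to two components is an illustrative choice (e.g., for plotting).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)  # project 64-dimensional digits onto the top 2 principal components
print(X.shape, "->", X2.shape)             # (1797, 64) -> (1797, 2)
```
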

Clustering
- Clustering (see the sketch below)
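
A minimal sketch of k-means clustering in scikit-learn; choosing ten clusters matches the ten digit classes only because we know that in advance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print(labels[:20])  # cluster assignments found without using the labels y
```
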

Maximum likelihood (see the sketch below)
- Bayesian learning
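
A minimal sketch of maximum likelihood for a Gaussian in NumPy: the MLE of the mean is the sample mean, and the MLE of the variance divides by n rather than n - 1; the sample itself is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)  # sample from N(3, 2^2)

mu_hat = x.mean()                              # maximizes the Gaussian log-likelihood over mu
var_hat = ((x - mu_hat) ** 2).mean()           # MLE of the variance (biased: divides by n)
print(mu_hat, var_hat)                         # near 3 and 4
```
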

Autoencoders
- Generative models and networks
- Autoencoder (see the sketch below)
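
A minimal sketch of an autoencoder in Keras: compress the 64-pixel digits to a two-dimensional code and reconstruct them; the layer sizes and training settings are illustrative.

```python
import tensorflow as tf
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
X = (X / 16.0).astype("float32")                     # scale pixel values to [0, 1]

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),    # encoder
    tf.keras.layers.Dense(2, activation="relu"),     # low-dimensional code
    tf.keras.layers.Dense(32, activation="relu"),    # decoder
    tf.keras.layers.Dense(64, activation="sigmoid"), # reconstructed pixels
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)  # target equals input
```
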