DS 675: Machine Learning
Spring 2025

Instructor: Usman Roshan
Office: GITC 2106
Office Hours: M 4:45-5:25, T 1:45-2:25, W 1:45-2:25, Th 1:45-2:25
Ph: 973-596-2872
Email: usman@njit.edu

TA: Mugunthan Shandirasegaran
Email: ms3537@njit.edu

Textbooks:
Introduction to Machine Learning by Ethem Alpaydin (Not required but strongly recommended)
Learning with Kernels by Schölkopf and Smola (Recommended)
Foundations of Machine Learning by Mohri, Rostamizadeh, and Talwalkar (Recommended)

Grading: 35% mid-term, 65% course project

Course Overview: This course is a hands-on introduction to machine learning, covering both theory and application. We will cover classification and regression algorithms in supervised learning, such as naive Bayes, nearest neighbor, decision trees, random forests, linear regression, logistic regression, neural networks, and support vector machines. We will also cover dimensionality reduction, unsupervised learning (clustering), feature selection, kernel methods, hidden Markov models, gradient descent, big data methods, and representation learning. We will apply these algorithms to problems on real data, such as digit recognition, text document classification, and prediction of cancer and molecular activity.

Deadlines and exam dates: one-page project description due Feb 17; mid-term exam April 7; final project PowerPoint due May 5

Course material:

Each topic below is followed by its lecture notes and readings.

Linear modeling
- Background: linear models
- Least squares notes
- Least squares gradient descent algorithm (see the sketch below)
- Regularization
- Stochastic gradient descent pseudocode
- Stochastic gradient descent (original paper)
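
A minimal sketch of least squares by gradient descent, in the spirit of the notes above; the data, step size, penalty, and iteration count are illustrative choices, not the ones from class.

```python
# Least squares (with an optional ridge penalty) fit by gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                      # 100 points, 3 features
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

w = np.zeros(3)
eta, lam = 0.01, 0.01                              # step size and regularization strength
for _ in range(1000):
    grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * lam * w  # gradient of MSE + lam*||w||^2
    w -= eta * grad
print(w)                                           # near the true coefficients (1, -2, 0.5)
```

Swapping the full gradient for a gradient computed on one random example per step gives the stochastic variant described in the pseudocode above.
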
Neural networks
- Multilayer perceptrons
- Basic single hidden layer neural network
- Backpropagation

- Approximations by superpositions of sigmoidal functions (Cybenko 1989)
- Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991)
- The Power of Depth for Feedforward Neural Networks (Eldan and Shamir 2016)
- The expressive power of neural networks: A view from the width (Lu et al. 2017)

- Convolution and single layer neural networks: objective and optimization
- Softmax and cross-entropy loss (see the sketch below)
- ReLU activation single layer neural networks: objective and optimization
- Multilayer neural network: objective and optimization

- Image localization and segmentation
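
A minimal sketch of the softmax and cross-entropy loss from the notes above, in plain NumPy; the scores and labels are made up for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

scores = np.array([[2.0, 1.0, 0.1],
                   [0.5, 2.5, 0.0]])       # one row of class scores per example
labels = np.array([0, 1])                  # true class indices
probs = softmax(scores)
loss = -np.log(probs[np.arange(len(labels)), labels]).mean()  # average cross-entropy
print(probs, loss)
```
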
Machine learning - running linear models in Python scikit-learn
- Scikit-learn linear models
- Scikit-learn support vector machines
- SVM in Python scikit-learn (see the sketch below)
- Breast cancer training data
- Breast cancer test data
- Linear data
- Nonlinear data
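
A minimal sketch of running scikit-learn's linear models and SVM; since the course distributes its own breast cancer training and test files, the built-in dataset and random split below are stand-ins.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scaler = StandardScaler().fit(X_train)     # linear models train better on scaled features
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

for clf in (LogisticRegression(max_iter=1000), LinearSVC()):
    clf.fit(X_train, y_train)
    print(type(clf).__name__, clf.score(X_test, y_test))
```
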
Cross validation and balanced accuracy
- Cross validation (see the sketch below)
- Training vs. validation accuracy
- Balanced error
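
A minimal sketch of cross-validation with balanced accuracy in scikit-learn; the classifier, dataset, and fold count are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(clf, X, y, cv=5, scoring="balanced_accuracy")
print(scores.mean())
```

Balanced accuracy averages per-class recall, so a majority-class predictor no longer looks good on imbalanced data.
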
Deep learning - running neural networks in scikit-learn
- Scikit-learn MLPClassifier
- Scikit-learn MLP code (see the sketch below)
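
A minimal sketch of scikit-learn's MLPClassifier; the hidden layer size, dataset, and iteration cap are illustrative rather than the course's settings.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

mlp = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
mlp.fit(X_train, y_train)                  # one hidden layer of 100 ReLU units
print(mlp.score(X_test, y_test))
```
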
Multiclass classification - linear models and neural networks
- Multiclass classification
- Different multiclass methods
- One-vs-all method (see the sketch below)
- Tree-based multiclass
- Multiclass neural network softmax objective
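
A minimal sketch of the one-vs-all method, wrapping a binary linear SVM so it handles the ten digit classes; dataset and settings are illustrative.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import cross_val_score
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
ova = OneVsRestClassifier(LinearSVC(max_iter=10000))  # one binary SVM per class
print(cross_val_score(ova, X, y, cv=3).mean())        # predict the class whose SVM scores highest
```
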
Deep learning - running neural networks in Keras on tabular data
- Categorical variables
- One-hot encoding in scikit-learn
- Keras multilayer perceptron on tabular data (see the sketch below)
- Keras multilayer perceptron on tabular data with feature spaces
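
A minimal sketch of one-hot encoding a categorical column in scikit-learn and feeding the result to a Keras multilayer perceptron; the four-row table is made up, and the `sparse_output` argument assumes scikit-learn 1.2 or later.

```python
import numpy as np
import tensorflow as tf
from sklearn.preprocessing import OneHotEncoder

color = np.array([["red"], ["blue"], ["green"], ["red"]])  # one categorical column
numeric = np.array([[1.0], [0.5], [2.0], [1.5]])           # one numeric column
y = np.array([1, 0, 1, 0])

enc = OneHotEncoder(sparse_output=False)                   # scikit-learn >= 1.2
X = np.hstack([enc.fit_transform(color), numeric])         # one-hot columns + numeric column

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=10, verbose=0)
```
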
Convolutions and image classification
- Image classification code
- Convolutions
- Popular convolutions in image processing

- Convolutions (additional notes)
- Convolutions - example 1
- Convolutions - example 2
- Convolutions - example 3
- Convolutions - example 4

- Convolutional neural network (additional slides by Yunzhe Xue)
- Convolution and single layer neural networks: objective and optimization
- Training and designing convolutional neural networks

- Flower image classification with CNNs code (see the sketch below)
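
A minimal sketch of designing and training a small convolutional network in Keras; random arrays stand in for the flower images used in class, and the layer sizes are illustrative.

```python
import numpy as np
import tensorflow as tf

X = np.random.rand(64, 32, 32, 3).astype("float32")  # 64 fake 32x32 RGB images
y = np.random.randint(0, 5, size=64)                 # 5 fake classes

model = tf.keras.Sequential([
    tf.keras.Input(shape=(32, 32, 3)),
    tf.keras.layers.Conv2D(16, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),                  # halve the spatial resolution
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(5, activation="softmax"),  # class probabilities
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=16, verbose=0)
```
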
Neural networks: gradient descent, optimization, batch normalization, common architectures, data augmentation
- Optimization in neural networks
- Stochastic gradient descent pseudocode
- Stochastic gradient descent (original paper)

- Image classification code v2

- Batch normalization
- Batch normalization paper
- How does batch normalization help optimization? (paper)

- Gradient descent optimization
- An overview of gradient descent optimization algorithms

- On training deep networks
- The Loss Surfaces of Multilayer Networks

- Common architectures

- Transfer learning by Yunzhe Xue
- Transfer learning in Keras (see the sketch below)
- Pre-trained models in Keras

- Understanding data augmentation for classification
- SMOTE: Synthetic Minority Over-sampling Technique
- Dataset Augmentation in Feature Space
- Improved Regularization of Convolutional Neural Networks with Cutout
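
A minimal sketch of transfer learning with a pre-trained Keras model: freeze ImageNet convolutional features and train only a new classification head. MobileNetV2, the input size, the ten classes, and the random placeholder data are all illustrative choices (the pre-trained weights download on first use).

```python
import numpy as np
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights="imagenet")
base.trainable = False                                # freeze the pre-trained features

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),  # new task-specific head
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])

X = np.random.rand(32, 96, 96, 3).astype("float32")  # placeholder images
y = np.random.randint(0, 10, size=32)
model.fit(X, y, epochs=1, batch_size=8, verbose=0)
```
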
Kernels
- Kernels
- More on kernels (see the sketch below)
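
A minimal sketch of why kernels matter: on two concentric circles a linear SVM fails while an RBF-kernel SVM separates the classes; the dataset is illustrative.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)
for kernel in ("linear", "rbf"):
    print(kernel, cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean())
# The RBF kernel implicitly maps points into a space where the circles become separable.
```
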
Logistic regression
- Logistic regression (see the sketch below)
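
A minimal sketch of logistic regression fit by gradient descent in plain NumPy; the data, step size, and iteration count are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X @ np.array([2.0, -1.0]) > 0).astype(float)  # 0/1 labels from a linear rule

w = np.zeros(2)
for _ in range(2000):
    p = 1 / (1 + np.exp(-(X @ w)))                 # sigmoid gives P(y=1 | x)
    w -= 0.1 * X.T @ (p - y) / len(y)              # gradient of the average logistic loss
print(w)                                           # points in the direction (2, -1)
```
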

Empirical and regularized risk minimization
- Empirical risk minimization
- Regularized risk minimization
- Regularization and overfitting

Support vector machine
- Support vector machines

Decision trees and random forests
- Decision trees, bagging, boosting, and stacking (see the sketch below)
- Decision trees (additional notes)
- Ensemble methods (additional notes)
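
A minimal sketch comparing a single decision tree to a random forest in scikit-learn; the dataset and forest size are illustrative.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
tree = DecisionTreeClassifier(random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)  # bagged trees over random feature subsets
for clf in (tree, forest):
    print(type(clf).__name__, cross_val_score(clf, X, y, cv=5).mean())
```
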

Feature selection
- Feature selection (see the sketch below)
- Feature selection (additional notes)
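
A minimal sketch of univariate feature selection in scikit-learn; keeping the ten highest-scoring features is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_breast_cancer(return_X_y=True)
selector = SelectKBest(f_classif, k=10).fit(X, y)  # rank features by ANOVA F-statistic
X_small = selector.transform(X)
print(X.shape, "->", X_small.shape)                # (569, 30) -> (569, 10)
```
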
Dimensionality reduction
- Dimensionality reduction (see the sketch below)
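
A minimal sketch of dimensionality reduction with PCA in scikit-learn; projecting to two components is an illustrative choice (e.g., for plotting).

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)
X2 = PCA(n_components=2).fit_transform(X)  # project 64-dimensional digits onto the top 2 principal components
print(X.shape, "->", X2.shape)             # (1797, 64) -> (1797, 2)
```
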

Clustering
- Clustering (see the sketch below)
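
A minimal sketch of k-means clustering in scikit-learn; choosing ten clusters matches the ten digit classes only because we know that in advance.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
print(labels[:20])  # cluster assignments found without using the labels y
```
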

Maximum likelihood (see the sketch below)
- Bayesian learning
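
A minimal sketch of maximum likelihood for a Gaussian in NumPy: the MLE of the mean is the sample mean, and the MLE of the variance divides by n rather than n - 1; the sample itself is illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=1000)  # sample from N(3, 2^2)

mu_hat = x.mean()                              # maximizes the Gaussian log-likelihood over mu
var_hat = ((x - mu_hat) ** 2).mean()           # MLE of the variance (biased: divides by n)
print(mu_hat, var_hat)                         # near 3 and 4
```
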

Autoencoders
- Generative models and networks
- Autoencoder (see the sketch below)
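
A minimal sketch of an autoencoder in Keras: compress the 64-pixel digits to a two-dimensional code and reconstruct them; the layer sizes and training settings are illustrative.

```python
import tensorflow as tf
from sklearn.datasets import load_digits

X, _ = load_digits(return_X_y=True)
X = (X / 16.0).astype("float32")                     # scale pixel values to [0, 1]

autoencoder = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu"),    # encoder
    tf.keras.layers.Dense(2, activation="relu"),     # low-dimensional code
    tf.keras.layers.Dense(32, activation="relu"),    # decoder
    tf.keras.layers.Dense(64, activation="sigmoid"), # reconstructed pixels
])
autoencoder.compile(optimizer="adam", loss="mse")
autoencoder.fit(X, X, epochs=20, batch_size=64, verbose=0)  # target equals input
```
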