CS 732: Advanced Machine Learning
Spring 2015

Instructor: Usman Roshan
Office: GITC 3802
Ph: 973-596-2872
Office hours: Tue and Thu 2:30 to 5
Email: usman@cs.njit.edu

Textbook: Not required
Grading: 20% programming assignments, 10% paper presentation, 70% project
Course Overview: This course will cover advanced topics in machine learning. We will begin with the CUDA and OpenCL languages for parallel programming on Graphics Processing Units (GPUs) and with OpenMP for multi-core programming. Although machine learning and parallel programming are separate topics, we include GPU and multi-core computing in this course because of their applications to machine learning on large datasets. Throughout the semester we will then discuss recent papers in machine learning on topics such as deep learning, representation learning, optimization algorithms, algorithms for big datasets, and advanced Bayesian methods. Students will present recent papers and undertake a machine learning project. The project may be theoretical or experimental in nature: for example, students may take on a challenge dataset from Kaggle or study the theoretical error bounds of a contemporary classifier.

Course plan:

Topic
Date
Notes
Introduction to GPU computing
01/23/2015
Introduction
Basic Unix command sheet
Instructions for AFS login
Kaggle
CUDA exercise
01/30/2015
Chi8 (Makefile and sample script file included)
Singular value decomposition on GPU using CUDA
Fast support vector machine training and classification on graphics processors
GPU library for deep learning
CUDA exercise and OpenCL
02/06/2015
CUDA to OpenCL slides
libOpenCL.so (NVIDIA library file for OpenCL code)
OpenCL files
Simulated GWAS
Class labels for above data
Chi-square 2-df test in parallel on a GPU
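The chi-square 2-df test above is computed independently for every SNP, which is what makes it a natural fit for a GPU. As a rough CPU-side sketch (our own illustration, not the course's CUDA/OpenCL code), the per-SNP statistic over a 3x2 genotype-by-class contingency table can be written as:

```python
import numpy as np

def chi2_2df(genotypes, labels):
    """Pearson chi-square 2-df statistic for one SNP.
    genotypes: 0/1/2 genotype codes; labels: 0/1 case-control status.
    (Hypothetical CPU sketch of the per-SNP work a GPU kernel would do.)"""
    obs = np.zeros((3, 2))
    for g, y in zip(genotypes, labels):
        obs[g, y] += 1
    row = obs.sum(axis=1, keepdims=True)
    col = obs.sum(axis=0, keepdims=True)
    exp = row * col / obs.sum()
    # Skip empty cells; otherwise this is the standard Pearson statistic
    mask = exp > 0
    return float(((obs - exp) ** 2 / np.where(mask, exp, 1))[mask].sum())
```

A CUDA or OpenCL kernel would assign one thread (work-item) per SNP and run exactly this computation in parallel across all SNPs of the GWAS matrix.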
OpenCL
02/13/2015
Assignment 1
Optimization
02/20/2015
Do we Need Hundreds of Classifiers to Solve Real World Classification Problems?
An Empirical Comparison of Supervised Learning Algorithms
Algorithms for Direct 0-1 Loss Optimization in Binary Classification
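The 0-1 loss paper above optimizes classification error directly instead of a convex surrogate such as the hinge or logistic loss. A minimal illustration of the idea, restricted to a one-dimensional threshold classifier where exhaustive search over candidate thresholds is feasible (the setup and names are ours, not the paper's algorithm):

```python
import numpy as np

def best_threshold_01(x, y):
    """Direct 0-1 loss minimization for a 1-D threshold classifier:
    scan every candidate threshold and sign, keep the fewest errors.
    y is in {-1, +1}."""
    xs = np.sort(x)
    cands = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1]))
    best_t, best_s, best_err = cands[0], 1, np.inf
    for t in cands:
        for s in (1, -1):                      # orientation of the threshold
            err = (np.where(s * (x - t) > 0, 1, -1) != y).sum()
            if err < best_err:
                best_t, best_s, best_err = t, s, err
    return best_t, best_s, int(best_err)
```

In higher dimensions the 0-1 loss is non-convex and NP-hard to optimize exactly, which is why the paper studies specialized search heuristics; this sketch only shows the objective being minimized.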
Optimization
02/27/2015
Greedy function approximation: a gradient boosting machine
Stochastic gradient boosting
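Both boosting papers above follow Friedman's scheme: repeatedly fit a weak learner to the negative gradient of the loss at the current predictions and add it with a small learning rate. A minimal squared-loss sketch with one-feature regression stumps (a simplified illustration we wrote, not Friedman's full algorithm):

```python
import numpy as np

def fit_stump(x, r):
    """Best single-split regression stump on 1-D feature x fitting residuals r."""
    best_sse, best = np.inf, None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        if len(left) == 0 or len(right) == 0:
            continue
        pred = np.where(x <= t, left.mean(), right.mean())
        sse = ((r - pred) ** 2).sum()
        if sse < best_sse:
            best_sse, best = sse, (t, left.mean(), right.mean())
    return best

def gradient_boost(x, y, rounds=20, lr=0.1):
    """Squared-loss gradient boosting: each round fits a stump to the
    residuals y - f, which are the negative gradient of 0.5*(y - f)^2."""
    f = np.full(len(y), y.mean())
    stumps = []
    for _ in range(rounds):
        t, lmean, rmean = fit_stump(x, y - f)
        f += lr * np.where(x <= t, lmean, rmean)
        stumps.append((t, lmean, rmean))
    return f, stumps
```

Friedman's stochastic variant additionally subsamples the training data each round; production systems also use multi-feature trees rather than stumps.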
More optimization and deep learning
03/06/2015
Deep learning slides by Payam
Deep learning tutorial with Python and Theano
Learning Feature Representations with K-means
Analysis of single-layer networks in unsupervised feature learning
Building high-level features using large-scale unsupervised learning
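The K-means feature-learning papers above (Coates et al.) show that a single layer of K-means centroids plus a simple encoding can rival far more elaborate unsupervised methods. A small sketch of that pipeline, assuming the "triangle" encoding from the paper (helper names and hyperparameters are ours):

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm; returns the k centroids."""
    rng = np.random.default_rng(seed)
    C = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1)
        assign = d.argmin(1)
        for j in range(k):
            pts = X[assign == j]
            if len(pts):
                C[j] = pts.mean(0)
    return C

def triangle_features(X, C):
    """Coates et al.-style 'triangle' encoding: f_k = max(0, mu - d_k),
    where d_k is the distance to centroid k and mu is the mean distance."""
    d = np.sqrt(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1))
    mu = d.mean(1, keepdims=True)
    return np.maximum(0.0, mu - d)
```

The resulting sparse, non-negative features are then fed to a linear classifier; in the papers the centroids are learned on whitened image patches.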
Optimization and deep learning
03/13/2015
Deep learning slides by Peter
Deep learning
03/27/2015
Deep learning slides for sentiment analysis by Chaoran
Semi-Supervised Recursive Autoencoders for Predicting Sentiment Distributions
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
C-SVDDNet: An Effective Single-Layer Network for Unsupervised Feature Learning
Results from a Semi-Supervised Feature Learning Competition
Good Friday
04/03/2015
MapReduce
04/10/2015
Assignment 2
MapReduce slides by Wadood (and a PDF version of the paper)
Representation Learning: A Review and New Perspectives
Map-Reduce for Machine Learning on Multicore
An Iterative MapReduce Approach to Frequent Subgraph Mining in Biological Datasets
Stochastic Gradient Boosted Distributed Decision Trees
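The Chu et al. paper above casts many learners in "summation form": mappers compute partial sufficient statistics over chunks of the data, and a reducer adds them up. A sketch for least-squares linear regression, where the statistics are X^T X and X^T y (function names are ours; a real deployment would run the mappers on separate cores or machines):

```python
import numpy as np
from functools import reduce

def mapper(chunk):
    """Emit this chunk's partial sufficient statistics (X^T X, X^T y)."""
    X, y = chunk
    return X.T @ X, X.T @ y

def reducer(a, b):
    """Combine two partial statistics by elementwise addition."""
    return a[0] + b[0], a[1] + b[1]

def mapreduce_linreg(chunks):
    """Solve the normal equations from statistics summed across chunks."""
    XtX, Xty = reduce(reducer, map(mapper, chunks))
    return np.linalg.solve(XtX, Xty)
```

Because addition is associative and commutative, the reduce step can be applied in any order, which is what lets Hadoop-style frameworks parallelize it freely.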
C-SVDDNet and cuBLAS
04/17/2015
K-means deep learning slides by Ling and Ruihua
Concentration inequalities
Large-scale parallelized sparse principal component analysis
Parallel GPU Implementation of Iterative PCA Algorithms
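The iterative PCA papers above exploit the fact that each iteration reduces to dense matrix-vector products, exactly the workload cuBLAS accelerates. A CPU sketch of the simplest variant, power iteration for the leading principal component (our own illustration, not the papers' code):

```python
import numpy as np

def top_pc_power(X, iters=200, seed=0):
    """Leading principal component by power iteration on the sample
    covariance. Each step is one matrix-vector product, the kind of
    dense BLAS call that maps directly onto cuBLAS on a GPU."""
    Xc = X - X.mean(0)
    C = Xc.T @ Xc / (len(X) - 1)          # sample covariance matrix
    rng = np.random.default_rng(seed)
    v = rng.normal(size=C.shape[1])
    v /= np.linalg.norm(v)
    for _ in range(iters):
        v = C @ v                          # power step
        v /= np.linalg.norm(v)
    return v
```

Further components are obtained by deflation or by block variants; the large-scale sparse-PCA paper adds a sparsity constraint on top of this iteration.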
Metagenomics and SVM
04/24/2015
Chi2 opencl implementation
Machine learning for metagenomics by Abdulrhman
Basic support vector machine by Mutthapa
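As a companion to the basic SVM presentation above, here is a small sketch of training a linear SVM by stochastic sub-gradient descent on the hinge loss, in the style of Pegasos (a simplification we chose for illustration, not the course's implementation; no bias term, labels in {-1, +1}):

```python
import numpy as np

def svm_sgd(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM via stochastic sub-gradient descent on
    lam/2*||w||^2 + mean(max(0, 1 - y_i * w.x_i))."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            t += 1
            eta = 1.0 / (lam * t)          # Pegasos-style decaying step size
            if y[i] * (w @ X[i]) < 1:      # margin violated: hinge sub-gradient
                w = (1 - eta * lam) * w + eta * y[i] * X[i]
            else:                          # only the regularizer contributes
                w = (1 - eta * lam) * w
    return w
```

The dual, kernelized formulation covered in the slides solves the same problem via quadratic programming; this primal sketch is what GPU SVM trainers typically parallelize.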
SVM papers
Paper 1 Paper 2 Paper 3 Paper 4 Paper 5 Paper 6 Paper 7 Paper 8
05/01/2015
Wadood's solutions to assignments
Python deep learning by Peter
Projects
05/05/2015
Python deep learning II by Peter
sgescript for deep learning
PCA on CPU vs GPU by Han
Theory of VC bounds by Ruihua
K-means deep learning by Ling
Projects
05/11/2015
Projects
Payam: Genomic deep learning
Peter: Python Theano deep learning traindata.gz testdata.gz trainlabels
Abdulrhman: Metagenomics and machine learning (paper: An Efficient Comparative Machine Learning-based Metagenomics Binning Technique Via Using Random Forest)
Ling: C-SVDDNET feature learning
Chaoran: Web crawling for e-health
Han: PCA in CUDA (with cuBLAS)
Ruihua: Theoretical error bounds of the SVM
Wadood: MapReduce Slides
Muthupa: Basic support vector machine
Additional readings
Decision and regression trees: Slides by Patrick Breheny
Regression trees: Slides by Cosma Shalizi
Boosted trees: Slides by Tianqi Chen