CS 698: Current topics in data science
Spring 2017

Instructor: Usman Roshan
Office: GITC 3802
Ph: 973-596-2872
Office hours: Wed 2 to 5
Email: usman@njit.edu

Textbook: Not required
Grading: 20% programming assignments, 30% paper presentation, 50% project
Course Overview: This course covers current topics in data science. We will begin with the CUDA and OpenCL languages for parallel programming on Graphics Processing Units (GPUs) and OpenMP for multi-core programming. Although machine learning and parallel programming are separate topics, we include GPU and multi-core computing in this course because of their applications to machine learning on large datasets. We will then discuss recent machine learning papers throughout the semester, on topics such as deep learning, representation learning, optimization algorithms, algorithms for big datasets, and advanced Bayesian methods. Students will present recent papers and work on GPU assignments. Students have the option of doing a deep learning assignment or a project of their own choosing.

Course plan:

Topic
Date
Notes
Introduction to GPU computing
01/19/2017
Introduction
Basic Unix command sheet
Instructions for AFS login
CUDA exercise
01/26/2017
Parallel chi-square 2-df test
Chi-square 2-df test in parallel on a GPU
Simulated GWAS
Class labels for above data
Assignment 1
Chi8 (Makefile and sample script file included)

Stacking and random hyperplanes
Random hyperplanes and stacking
02/02/2017
Student talks:
Random hyperplanes by Jay Patel
Is stacking better than selecting the best classifier? by Kalyani

Papers

Projects

Snow day
02/09/2017
No class - to be rescheduled
02/16/2017
OpenCL
02/23/2017
Proof of Johnson-Lindenstrauss lemma for random projections from Foundations of Machine Learning by Mohri, Rostamizadeh, Talwalkar
Is margin preserved after random projection?

Student talks:
Extreme learning machine by Abdulrhman Aljouie (slides)

CUDA to OpenCL slides
libOpenCL.so (NVIDIA library file for OpenCL code)
Chi2 OpenCL implementation
OpenCL files
Assignment 2
Student talks
03/03/2017
Student talks:
Learning Feature Representations with K-means by Shuai Zhao (slides)
Student talks
03/09/2017
Student talks:
Analysis of single-layer networks in unsupervised feature learning by Shuai Zhao (slides)

Student talks
03/10/2017
Student talks:
Random Projections for Support Vector Machines by Kshitija Pansare (slides)
Random Projections for Support Vector Machines (Full proofs)

Support vector machines
Spring break
03/17/2017
Student talks
03/23/2017
Student talks:
Decision trees, random forests, boosting, and ensembles by Zhiqi Peng

Basic statistics
Applied statistics
Error bounds
Student talks
04/06/2017
Student talks:
Deep learning
Deep neural networks are easily fooled: high confidence predictions for unrecognizable images
Slides (part I)
Slides (part II)
Student talks
04/07/2017
Student talks:
Map-Reduce for Machine Learning on Multicore
Slides
Student talks
04/13/2017
Student talks:
Slides (Coordinate descent talk)
Slides (Randomness in neural networks)
Slides (SVMs)

Assignment 3
Datasets
Results on 52 datasets (CSV format)
CIFAR CNN in Keras thanks to Girish Sukhwani
MNIST CNN in Keras thanks to Girish Sukhwani
SGE script for Keras jobs
Student talks
04/20/2017
Student talks:
Slides (Coordinate descent talk)
Slides (Empirical performance of programs)
Slides (Bias, variance, 0/1 loss)

Student talks
04/27/2017
Student talks:
Slides (Bias, variance, 0/1 loss)
Slides (Bayesian learning)
Slides (Building high-level features with unsupervised learning)

Final projects, CKB 341, 11:30 to 2pm
05/09/2017
Papers

Projects:
  • Jay: Random hyperplanes
  • Kalyani: Stacking