CS 698: Current topics in data science
Spring 2018

Instructor: Usman Roshan
Office: GITC 4207
Ph: 973-596-2872
Office hours: M: 1 to 3, W: 11:30 to 2:30
TA: Chaoran Cheng
Email: usman@njit.edu

Textbook: Not required
Grading: 30% programming assignments, 20% paper presentation, 50% project
Course Overview: This course covers current topics in data science. We begin with the CUDA and OpenCL languages for parallel programming on Graphics Processing Units (GPUs), followed by OpenMP for multi-core programming. Although machine learning and parallel programming are separate topics, we include GPU and multi-core computing in this course because of their applications in machine learning on large datasets. Throughout the semester we will then discuss recent papers in machine learning on topics such as deep learning, representation learning, optimization algorithms, algorithms for big datasets, and advanced Bayesian methods. Students will present recent papers and work on parallel programming assignments. Students have the option of doing a deep learning assignment or a project of their own choosing.

Course plan:

Topic
Date
Notes
Projects

Luisa: Sentiment detection from images
Aradhya: Word2vec and CNN for text data
Fadi: Medical image recognition with convolutional neural networks
Elizabeth: Decision trees and random forests
Dhara and Joseph: Sound classification
Sneha and Nehal: Support vector machines and kernels
Raj: Sentiment analysis from text using CNN
Shih and Yucong: Facial recognition with CNN
Kaizheng: Driverless car
Akhil: Neural networks
Akanksha: Dimensionality reduction on medical data
Xuwen:
Yue: Clustering
Le: Deep learning
Maryam and Samtha: Predicting movement of objects
Introduction to GPU computing
01/18/2018
Introduction
Papers
Projects
Basic Unix command sheet
Instructions for AFS login

Some topics to start with
Representation learning
Geometrical and Statistical properties of systems of linear inequalities with applications in pattern recognition (Cover 1965)
Approximation by superpositions of a sigmoidal function (Cybenko 1989)
Approximation Capabilities of Multilayer Feedforward Networks (Hornik 1991)
ImageNet classification with deep neural networks (Krizhevsky et al. 2012)
Random projections preserve margin
Random projections preserve margin II

For image recognition projects
Python Imaging Library
CUDA exercise
01/26/2018
Parallel chi-square 2-df test
Chi-square 2-df test in parallel on a GPU
Simulated GWAS
Class labels for above data
Assignment 1
Basic machine learning and Python scikit-learn
02/01/2018
Basic machine learning background with Python scikit-learn

Datasets
OpenCL
02/08/2018
CUDA to OpenCL slides
libOpenCL.so (NVIDIA library file for OpenCL code)
Chi2 OpenCL implementation
OpenCL files
Assignment 2
Deep learning for lung cancer and facial image detection
02/15/2018
Fadi: PPT
Papers: P1, P2

Luisa: PPT
Papers: Face expression recognition with a 2-channel convolutional neural network

Trees, forests, and sound classification
02/22/2018
Elizabeth: Decision trees, random forests, and boosting
Dhara: CNN for sound classification (Paper)

Python Imaging Library
Image classification code
Word to vector representations and autoencoders
03/01/2018
Aradhya: Word2vec (Paper)
CNNs for text, multi-layer perceptrons
03/08/2018
Aradhya: CNNs for text (Papers)

Multi-layer perceptrons
Scikit-learn MLP code
Spring break
03/15/2018
CNNs for text, predicting motion of objects with CNNs
03/22/2018
Raj: CNNs for text
Samtha: Learning Physical Intuition of Block Towers (P1) (P2)
Maryam: Predicting movement of objects (Paper)

Convolutional neural networks for image recognition
Flower image classification with CNNs
Assignment 3
Support vector machines, kernels, and driverless cars, mid-term review
03/29/2018
Sneha and Nehal: Support vector machines and kernels
Kaizheng: Driverless cars (P1)

Mid-term review sheet
Face recognition, object detection, clustering, and mid-term
04/05/2018
Shih: Deep face recognition (P1)
Yucong: Rich feature hierarchies for object detection (P1)
Yue: Clustering ensemble method based on swarm intelligence (P1)

04/12/2018
Joseph: Unsupervised sound classification (P1)
Xuwen: SVM for cancer therapy (P1)
Le: Deep learning for image steganalysis (P1)
Akanksha: Principal component analysis
Akhil: Neural networks
Projects
04/19/2018
Fadi: PPT
Elizabeth: PPT
Dhara: PPT
Joseph: PPT
Aradhya: PPT
Raj: PPT
Samtha: PPT
Maryam: PPT
Projects
04/26/2018
Le Li: PPT
Luisa: PPT
Sneha and Nehal: PPT
Kaizheng: PPT
Yucong: PPT
Shih: PPT
Yue: PPT
Xuwen:
Akanksha: PPT
Akhil: PPT