
This course covers new architectures and programming techniques for large-scale distributed systems and web services. The topics include cloud computing, data processing in large clusters, distributed data-parallel computing, distributed storage systems, virtualization, distributed debugging, secure distributed computing, and transactional memory for multicore architectures. Students will study state-of-the-art solutions for large-scale distributed systems developed by Google, Amazon, Microsoft, Yahoo, Sun, Intel, VMware, and others. Students will also apply what they learn in a semester-long project using the Amazon Web Services platform.
Prerequisites: CS 656 or CS 633, or the instructor's permission. If you have not taken CS 656 or CS 633 but would like to take this class, come and talk with me about your background (if your background is sufficient, I will give you permission to register). Basic Unix/Linux skills and good programming skills are necessary for the project.
No textbook is required for this class. Each lecture is based on recent papers and articles covering a specific topic. Every week, the instructor will introduce the topic (the lecture slides will be posted before each class) and then moderate the discussion of the papers assigned for that week. Students are required to read the papers before class and participate in the discussions. Additionally, each lecture will include design reviews and Q&A sessions for the semester-long project.
Each student is required to prepare a PowerPoint presentation for one paper. After discussion with the instructor, this presentation will be incorporated into that week's lecture slides. In class, the instructor will present the context and main ideas of that week's topic, while the student will discuss the technical details of his/her assigned paper.
Students will work in teams of three to design and implement Internet services or systems using Apache's Hadoop and the Amazon Web Services platform. Specific project ideas will be provided in the first weeks of classes.
There will be one individual programming assignment before the project is handed out. This assignment will consist of a few short programs to help students become familiar with Hadoop and the Amazon Web Services platform.
There will be two exams: a midterm and a final. Both exams are open book (i.e., papers and notes are allowed). The final exam will cover only the material taught after the midterm. A student who misses an exam may take a make-up only after providing written documentation from the Dean of Students.
| Week | Topic | Readings |
| --- | --- | --- |
| 1 | Course overview. Introduction to distributed systems and parallel computing. Slides. | |
| 2 | Cloud Computing. Amazon Web Services. Slides. | |
| 3 | Data Processing in Large Clusters I. Google's MapReduce. Apache's Hadoop. Slides. Programming assignment handed out. | |
| 4 | Data Processing in Large Clusters II. Yahoo's Pig Latin. Slides. | |
| 5 | Distributed Data-Parallel Computing. Microsoft's Dryad and DryadLINQ. Slides. | |
| 6 | Distributed Storage Systems I. Google's GFS and BigTable. Slides. Programming assignment due. | |
| 7 | Midterm. Project ideas presentation. | |
| 8 | Distributed Storage Systems II. Amazon's Dynamo. Virtualization I. VMware virtual machine monitor. Slides. | |
| 9 | Virtualization II. Xen virtual machine monitor; virtual machine migration. Slides. | |
| 10 | System Debugging and Testing. Sun's DTrace. Slides. | |
| 11 | Secure Distributed Computing. Slides. | |
| 12 | Transactional Memory/Multi-core Architectures. Slides. | |
| 13 | More on Cloud Services. Grid Engine. Slides. | |
| 14 | Final project presentations. | |
The NJIT Honor Code will be upheld, and any violations will be brought to the immediate attention of the Dean of Students.
The students will be consulted and must agree to any modifications or deviations from the syllabus throughout the course of the semester.