
This course covers new architectures and programming techniques for large-scale distributed systems and web services. Topics include cloud computing, data processing in large clusters, distributed data-parallel computing, distributed storage systems, virtualization, distributed debugging, secure distributed computing, and transactional memory for multicore architectures. Students will study state-of-the-art solutions for large-scale distributed systems developed by Google, Amazon, Microsoft, Yahoo, Sun, Intel, VMware, and others. Students will also apply what they learn in a semester-long project using the Amazon Web Services platform.
Prerequisites: CS 656 or CS 633, or the instructor's permission. If you have not taken CS 656 or CS 633 but would like to take this class, come and talk with me about your background; if it is strong enough, I will give you permission to register. Basic Unix/Linux skills and good programming skills are necessary for the project.
No textbook is required for this class. Each lecture is based on recent papers and articles covering a specific topic. Every week, the instructor will introduce the topic (the lecture slides will be posted before each class) and then moderate the discussion of the papers assigned for that week. Students are required to read the papers before class and participate in the discussions. Additionally, each lecture will include design reviews and Q&A sessions for the semester-long project.
Students will work in teams of three to design and implement Internet services or systems using Apache's Hadoop and the Amazon Web Services platform. Specific project ideas will be provided in the first weeks of class.
There will be one individual programming assignment before the project is handed out. This assignment will consist of a few short programs to help students get used to Hadoop and the Amazon Web Services platform.
| Week | Topic | Readings |
|------|-------|----------|
| 1 | Introduction. Internet-scale distributed systems. Web services. | |
| 2 | Cloud Computing. Amazon's EC2 and S3. | |
| 3 | Data Processing in Large Clusters I. Google's MapReduce. Apache's Hadoop. Programming assignment handed out. | |
| 4 | Data Processing in Large Clusters II. Yahoo's Pig Latin. | |
| 5 | Distributed Data-Parallel Computing. Microsoft's Dryad and DryadLINQ. Programming assignment due. Projects handed out. | |
| 6 | Distributed Storage Systems I. Google's GFS and BigTable. | |
| 7 | Distributed Storage Systems II. Amazon's Dynamo. | |
| 8 | Midterm. Discussion of midterm solutions. | |
| 9 | Virtualization I. VMware and Xen virtual machine monitors. | |
| 10 | Virtualization II. VMware and Xen virtual machine migration. | |
| 11 | System Debugging and Testing. Sun's DTrace. | |
| 12 | Secure Distributed Computing. | |
| 13 | Transactional Memory/Multi-core Architectures. | |
| 14 | Final project presentations. | |
The NJIT Honor Code will be upheld, and any violations will be brought to the immediate attention of the Dean of Students.
Students will be consulted about, and must agree to, any modifications to or deviations from the syllabus during the semester.