CS 698-108: Advanced Topics in Web/Distributed Systems, Spring '10

Fridays, 6-9:05pm, FMH 106
Tentative Syllabus (some of the details may change before the first day of classes)

Instructor

  Cristian Borcea
  Office: GITC 4303
  Phone: 973 596-3662
  Office Hours: TBA

Short description

This course covers new architectures and programming techniques for large-scale distributed systems and web services. The topics include cloud computing, data processing in large clusters, distributed data-parallel computing, distributed storage systems, virtualization, distributed debugging, secure distributed computing, and transactional memory for multicore architectures. Students will study state-of-the-art solutions for large-scale distributed systems developed by companies such as Google, Amazon, Microsoft, Yahoo, Sun, Intel, and VMware. Students will also apply what they learn in a semester-long project using the Amazon Web Services platform.

Who should take this course

This class is aimed at all graduate students (both M.S. and Ph.D.) who want to learn how recent advances in distributed systems have changed the way large-scale Internet services and applications are designed, implemented, and deployed. By studying real-world systems developed in industry during the past few years, students will acquire cutting-edge knowledge that may be a major advantage when searching for a job. Finally, this class can help students find topics for research projects and theses.

Prerequisites

CS 656 or CS 633, or the instructor's permission. If you have not taken CS 656 or CS 633 but would like to take this class, come and talk with me about your background; if it is good enough, I will give you permission to register. Basic Unix/Linux skills and good programming skills are necessary for the project.

Lectures and Readings

There is no book required for this class. Each lecture is based on recent papers/articles covering a specific topic. Every week, the instructor will introduce the topic (the lecture slides will be posted before each class) and then will moderate the discussions of the papers assigned for that week. Students are required to read the papers before the class and participate in the discussions. Additionally, each lecture will include design reviews and Q/A sessions for the semester-long project.

Project

Students will work in teams of three to design and implement Internet services or systems using Apache's Hadoop and the Amazon Web Services platform. Specific project ideas will be provided in the first weeks of classes.

Programming Assignment

There will be one individual programming assignment before the project is handed out. This assignment will consist of a few short programs to help students get used to Hadoop and the Amazon Web Services platform.

Grading

Schedule

Week Topic Readings
1 Introduction. Internet-scale distributed systems. Web services.
2 Cloud Computing. Amazon's EC2 and S3.
3 Data Processing in Large Clusters I. Google's MapReduce. Apache's Hadoop. Programming assignment handed out.
4 Data Processing in Large Clusters II. Yahoo's Pig Latin.
5 Distributed Data-Parallel Computing. Microsoft's Dryad and DryadLINQ. Programming assignment due. Projects handed out.
6 Distributed Storage Systems I. Google's GFS and BigTable.
7 Distributed Storage Systems II. Amazon's Dynamo.
8 Midterm. Discussion of midterm solutions.
9 Virtualization I. VMware and Xen virtual machine monitors.
10 Virtualization II. VMware and Xen virtual machine migration.
11 System Debugging and Testing. Sun's DTrace.
12 Secure Distributed Computing.
13 Transactional Memory/Multi-core Architectures.
14 Final project presentations.

Honor Code

The NJIT Honor Code will be upheld, and any violations will be brought to the immediate attention of the Dean of Students.

Modifications to Syllabus

Students will be consulted about, and must agree to, any modifications to or deviations from the syllabus during the semester.