Status: May 20, 2002


 

»CIS 435: Advanced Data Structures and Algorithm Design«

Sections 002 & 004

Spring 2002

 


Instructor:

Artur Czumaj

Office:  4103 (GITC Building, 4th floor)
Tel.:   (973) 596-3369
Fax:   (973) 596-5777
EMail:  czumaj@cis.njit.edu
WWW:  http://web.njit.edu/~czumaj/
  
TA:  Seung-Yeop Lee (sxl5937@oak.njit.edu)

Schedule:

Class Hours:   Monday   1:00 pm - 2:25 pm   KUPF 211
    Wednesday   1:00 pm - 2:25 pm   KUPF 211


Office Hours:
  Monday   10:45 am - 12:10 pm
    Wednesday   4:45 pm - 5:55 pm

By special appointment:
  Friday   12:15 pm - 12:55 pm


TA's Office Hours:
  Tuesday   11:00 am - 12:30 pm   GITC 4217
    Friday   4:00 pm - 5:30 pm   GITC 4217


Exams and Homeworks:

  • Problem Set 1 was due February 13, 2002.
    The text is available in postscript format and in (Adobe) PDF format
    Solutions are available in postscript format and in (Adobe) PDF format

  • Problem Set 2 was due February 27, 2002.
    The text is available in postscript format and in (Adobe) PDF format

  • The Midterm exam took place on Wednesday, March 27, 2002.
    It was an open textbook exam (means - students can use both textbooks and class notes, but cannot use any other textbooks etc).
    Information about the topics covered are available here.
    A set of problems for preparations for the midterm exam is available here in postscript format and in Adobe PDF format
    Grades for the midterm (the 4 last digits of students ID (or SSN) are used for identification purposes)

  • Problem Set 3 was due April 10, 2002.
    The text is available in postscript format and in (Adobe) PDF format

  • Problem Set 4 was due April 24, 2002.
    The text is available in postscript format and in (Adobe) PDF format

  • Problem Set 5 will be due May 6, 2002.
    The text is available in postscript format and in (Adobe) PDF format
    NEW! Grades for the 5th homework (the 4 last digits of students ID (or SSN) are used for identification purposes).

  • NEW! Information about the topics covered are available here.
    A set of problems for preparations for the final exam is available here in postscript format and in Adobe PDF format
    Grades will be available on Friday, May 17, from the registrar office.
    Student having ID 4701 is not on the instructor's grade roster and therefore the instructor is unable to send his grade to the registrar's office.
    There has been also some other student who - as far as the instructor can see - for the first time showed up in the class during the final exam. Also in that case, the instructor is unable to send the grade to the registrar's office.

  • A handout with some basic facts from calculus/math is available in postscript and in Adobe PDF format
  • A handout wiht some basic facts about sorting and selection algorithms is available in postscript and in Adobe PDF format


Grading and Policies:

  • Grading will be based on 5 problems sets (homeworks, possibly including one programming assignment), an in-class midterm, and a final.

  • Five homeworks will be handled out. Each homework is worth 50 points. One homework with the lowest grade will be dropped from grade consideration. Thus, the homeworks contribute a total of at most 200 points towards the final grade. Homeworks will be posted on the course Web-page http://web.njit.edu/~czumaj/TEACHING/CIS435-S02.html both in postscript and in Adobe Acrobat format.

    • Homeworks are due at the start of class on their due date. They can be submitted either per email czumaj@cis.njit.edu or be delivered in a hard-copy at the beginning of the class. Handwritten or typed solutions will be accepted.
    • No later solutions will be taken into consideration!
    • If the homework is to be submitted per email, it must be in one of the following formats: postscript file (preferable), pdf-file (2nd preference), ASCII text, Word format.
    • Solutions must be readable (especially handwritting!!!), concise and complete.
    • Show your work, as partial credit will be given. You will be graded not only for the correctness and efficiency of your answers, but also on your clarity. Be neat.
    • Bonus points will be awarded for answers that are particularly clever, simple, or efficient.
    • Each time you must design an algorithm, try to design it as efficient as possible. The faster algorithm you will describe the better grade you will get.
    • Unless it is stated otherwise, each algorithm described must be provided with a full analysis of its correctness and its running-time.
    • The work you turn in must be your own personal work, composed and written by you. If you discuss a problem with a fellow student, you must mention this clearly in your homework (name the fellow student before the solution of the problem in question). Your work will then be compared to the other student's work to verify that your solution was written by you and reflect your own personalö effort. If you don't report it, it will be considered a violation of the course rules, as described in the General Collaboration Policy section. (Collaboration of this kind is not allowed in the midterm and final exams.)
    • Cheating: it is surprisingly easy to detect when students are not doing their own work. Working with others on the assignments does not mean dividing the problems among you, and then swapping the answers.

  • The midterm exam took place on Wednesday, March 27, 2002. It was an open textbook exam (means - students can use both textbooks and class notes, but cannot use any other textbooks etc).
    Information about the topics covered are available here.
    A set of problems for preparations for the midterm exam is available here in postscript format and in Adobe PDF format
    The midterm exam was worth 150 points.

  • The final exam will take place from 11:30am to 2:00pm on Tuesday, May 14, 2002. It will be in room KUPF 211.
    The final exam is cumulative (that is, all course material will be examined) and will be open-textbook and open class-notes only. It is worth 300 points.
    NEW! Information about the topics covered are available here.
    A set of problems for preparations for the final exam is available here in postscript format and in Adobe PDF format
    Grades will be available on Friday, May 17, from the registrar office.
    Student having ID 4701 is not on the instructor's grade roster and therefore the instructor is unable to send his grade to the registrar's office.
    There has been also some other student who - as far as the instructor can see - for the first time showed up in the class during the final exam. Also in that case, the instructor is unable to send the grade to the registrar's office.
Prerequisite: CIS 114 (contact the instructor immediately if you didn't take this course)


Contents:

The main aim of the course is to teach students to understand algorithms and data structures. We will discuss algorithms and data structures for most fundamental problems. For those problems we shall focus on the design of efficient algorithms and the use of proper data structures. Some issues of the analysis of the algorithms (like correctness, running-time analysis, etc.) will be also covered.

From the NJIT undergraduate course catalog: The course deals with advanced topics in data structures and algorithm, including mathematical induction, analysis and complexity of algorithms, and algorithms involving sequences, sets, and graphs such as searching, sorting, order statistics, sequence comparisons, graph traversals, etc. Optional topics include geometric, algebraic, and numeric algorithms.

Specific Contents:

  1. (1/23/02)
  2. (1/28/02)
    • What is an algorithm, what is a data structure, and what is an abstract data structure.
    • Big-Oh, Big-Omega, and Big-Theta notation.
    • Short discussion about a minimum-finding algorithm.

  3. (1/30/02)
    • Analysis of the minimum-finding algorithm (input in an n-array, analysis of its running time - O(n), exact bound for the number of comparisons - (n-1), a rather complete proof of its correctness).
    • Algorithm for finding an element in an array (for an n-array: running time of O(n) & n-1 comparisons).
    • Algorithm for finding an element in a sorted array; discussion of an iterative algorithm and a recursive one; analysis of their worst-case running time (for an n-array: running time of O(log(n)) & about 2*(log_2(n)+1) comparisons); use of recurrences.
    • Brief discussion about Worst-case vs. Average-case complexity (and even a shorter statement about the best-case complexity).
    • Recommended reading:
      Main textbook (2nd ed.): Try to read, chapters 1, 2 and 3, but do it slowly ...
      Sedgewick (C++): Read Chapter 2

  4. (2/4/02)
    • Brief discussion about the 1st homework (what has to be done/requirements, etc.)
    • Sorting algorithms (problem definition)
    • Insertion sort (based on "inserting" an element to a sorted sequence; analysis of its running time (worst-case and average-case runtime of Theta(n^2) comparisons))
    • Selection sort (based on finding a minimum in the remaining unsorted part of the input; analysis of its running time (worst-case runtime of Theta(n^2) comparisons))
    • Partitioning-based algorithms for sorting (main ideas behind merge-sort and quick-sort)
    • Recommended reading:
      Main textbook (2nd ed.): Solve Problem 2-2 in Chapter 2 (page 38)

  5. (2/6/02)
    • Homework (for self-work, don't submit it to the instructor nor the TA): analyze Bubble-sort algorithm (see, e.g., Problem 2-2 in Chapter 2 (page 38) in the main textbook (2nd edition))
    • Continuation: basic idea behind divide-and-conquer (partitioning-based) algorithms for sorting
    • Solving recurrences of the form T(n) = O(n) + T(l) + T(n-l);
      in particular, showing that if l = n/2, that is, for the recurrence T(n) = O(n) + 2 T(n/2), the solution is of the form T(n) = O(n log(n))
      ... but if l=n-1, that is for the recurrence T(n) = O(n) + T(n-1) + T(1), the solution is of the form T(n) = O(n^2)
    • Substituion method for solving recurrences (first guess the upper bound for T(n) and then prove it formally by induction)
    • Merge-Sort algorithm - main ideas behind it, why it's correct, analysis of its running time (recurrence T(1) = O(1) & T(n) = O(n) + 2 T(n/2) => T(n) = O(n log(n)))
    • Warm up before Quick-sort ...

  6. (2/11/02)
    • Quick-sort: algorithm, few words about its correctness, its worst-case running time analysis
    • Recommended reading:
      (Main textbook, 2nd edition) read very carefully Chapter 7, in particular Chapters 7.1 and 7.2
      (Sedgewick) read very carefully Chapter 7, in particular Chapters 7.1 and 7.2

  7. (2/13/02)
    • Discussion of the 1st Problem set
    • Solving recurrences using the "Master Method"
    • Quick-sort, continuation of the analysis: 4 ways of improvement of Quick-sort in practical implementations (nonrecursive Quick-sort, using Insertion-sort for small problem instances, randomized Quick-sort, median-of-three implementation of Quick-sort)
    • Recommended reading:
      For a non-recursive implementation of Quick-sort and for other implementation details, see Chapter 7.3 (Program 7.3) in Sedgewick's book
    • A handout wiht some basic facts from calculus/math is available in postscript and in Adobe PDF format

  8. (2/18/02)
    • Quick-sort, continuation of the analysis
    • (Tentatively I planned to talk about "Average-case (expected) running time analysis of Quick-sort", but after all I skip this topic - at least for a while
    • Selection Problem and the algorithm by Hoare
    • Heap data structure
    • Heap sort algorithm - introduction
    • Recommended reading:
      (Main textbook, 2nd edition) Chapters 6 and 7, and Problem 7-1
      (Sedgewick) Chapters 7 and 9

  9. (2/20/02)
    • Heap sort algorithm, continuation
    • Priority queues (and their implementation using heaps)
    • No comparison-based sorting algorithm can run in o(n olog(n)) time (a lower bound)
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 6 (Heap-sort)
      (Sedgewick) Chapter 9 (heap-sort)

  10. (2/25/02)
    • Main idea of the proof that no comparison-based sorting algorithm can run in o(n olog(n)) time (a lower bound)
    • Linear-time sorting algorithms (counting sort and bucket sort)
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 8
      (Sedgewick) Chapter 10

  11. (2/27/02)
    • Discussion about the second Problem Set and presentation of the solutions
    • Linear-time sorting algorithms (bucket sort and radix sort)
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 8
      (Sedgewick) Chapter 10

  12. (3/4/02)
    • Notion of stable sorting algorithms and In-place sorting algorithms
    • Bucket sort can be made stable
    • Radix sort (as an application of the use of stable sorting algorithms)
    • Using radix sort to sort integer numbers from {0,1,...,n^k-1} in time O(kn)
    • Selection algorithms and introduction to a linear-time algorithm for selection
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 8 & 9
      (Sedgewick) Chapter 10

  13. (3/6/02)
    • Linear-time algorithm for selection
    • Selection algorithms using heap operations
    • Summary of selection algorithms
    • Introduction to binary search trees
    • Recommended reading:
      (Main textbook, 2nd edition) Chapters 9 and 12
      (Sedgewick) Chapters 10 and 12

  14. (3/11/02)
    • Binary search trees
    • O(n)-time sorting when the input is given as a binary search tree
    • Inorder, preorder, postorder tree traversals
    • Creating, searching, and updating (insert & delete) in a binary search tree (also operations min-finding, successor...)
    • All operations require (in the worst-case) time proportional to the depth of the tree
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 12
      (Sedgewick) Chapter 12

  15. (3/13/02)
    • Why binary search trees are useful (static vs. dynamic dictionaries)
    • Selection algorithms using binary search trees (only O(height)-time provided binary search trees are augmented by "size" of each subtree)
    • Balanced binary search trees
    • Analysis of red-black trees
      • Definition (red-black properties)
      • Height of red-black trees
      • O(log(n))-time algorithms for Search, Maximum, Minimum, Successor, Predecessor, Selection
      • Rotations in binary search trees
      • Example of how to insert new nodes to maintain the red-black properties
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 13
      (Sedgewick) Chapter 13 (in particular 13.4)

  16. (3/25/02)
    • Discussion about the requirements for the midterm exam
    • Brief discussion of some of the problems put on the class web page for the practice before the midterm
    • Analysis of red-black trees: inserting a node
    • Students are supposed to read in the textbook(s) about deleting a node from a red-black tree
    • Recommended reading:
      (Main textbook, 2nd edition) Chapter 13
      (Sedgewick) Chapter 13 (in particular 13.4)

  17. (3/27/02)
    • Midterm
      It was an open textbook exam (means - students can use both textbooks and class notes, but cannot use any other textbooks etc).
      Information about the topics covered are available here.
      A set of problems for preparations for the midterm exam is available here in postscript format and in Adobe PDF format
      A handout wiht some basic facts from calculus/math is available in postscript and in Adobe PDF format
      A handout wiht some basic facts about sorting and selection algorithms is available in postscript and in Adobe PDF format
    • Grades for the midterm (the 4 last digits of students ID (or SSN) are used for identification purposes)

  18. (4/1/02)
    • Discussion about the midterm exam

  19. (4/3/02)
    • Hashing
      • Simple hashing strategies: direct-address tables
      • Hash tables and hash functions
      • Hashing with chaining (implementation of dictionary operations: searching, deletion, insertion)
      • If hash functions are chosen at random ("under assumption of simple uniform hashing") all dictionary operations require O(1) time on average
      • Basic methods to design hash functions
    • Recommended reading:
      (Main textbook, 2nd edition) read very carefully Chapter 11
      (Sedgewick) read very carefully Chapter 14, in particular Chapters 14.1 and 14.2

  20. (4/8/02)
    • Hashing, continuation
    • Hash functions - what makes a good hash function
      • why random hash functions are not so useful
      • the division method
      • the multiplication method
    • Opena ddressing
      • Implementation of insertion and searching (constant average running time)
      • hash methods: lieanr probing, quadratic probing, and double hashing
    • Matrix multiplication (an algorithm multiplying an n x m matrix by an m x k matrix in time O(nmk))
    • Recommended reading:
      (Main textbook, 2nd edition) read very carefully Chapter 11 and Chapter 15.2 (about matrix multiplications)
      (Sedgewick) read very carefully Chapter 14, in particular Chapters 14.1 and 14.2

  21. (4/10/02)
    • Dynamic Programming technique on an example of mutrix-chain multiplication
    • Matrix-chain multiplication problem
      • why the order in which a sequence of matrices is multiplied does matter
      • number of "orders" in which a sequence of n matrices can be multiplied (it is of order 4^{n} / n^{1.5})
      • a recursion to compute the number of orders
      • the structure of an optimal order of matrix-chain mulitplication
      • an O(n^3)-time algorithm that computes the optimal order of matrix-chain mulitplication
    • Recommended reading:
      (Main textbook, 2nd edition) read very carefully Chapter 15.2

  22. (4/15/02)
    • Dynamic programming technique cont.
    • Dynamic programming algorithm for the longest common subsequence problem (running time O(n^2))
    • Dynamic programming algorithm for optimal binary search trees (running time O(n^3))
    • Recommended reading:
      (Main textbook, 2nd edition) read very carefully Chapter 15

  23. (4/17/02)
    • Dynamic programming technique cont.
    • Dynamic programming algorithm for optimal binary search trees (running time O(n^3))
    • General properties of dynamic programming technique
    • Graphs, their definitions (undirected vs. directed), their applications, and their most basic properties

  24. (4/22/02)
    • Graphs, their definitions (undirected vs. directed), their applications, and their most basic properties
    • Graphs representations (adjacency matrices and adjacency lists)
    • Basic classes of graphs: planar graphs, forests, trees
    • Spanning trees and minimum spanning trees
    • Kruskal's (greedy) algorithm for finding minimum spanning trees
    • Implementations details: sorting (or priority queues) and UNION/FIND data structures

  25. (4/24/02)
    • Kruskal's algorithm for finding minimum spanning trees
    • Very efficient implementations of UNION/FIND data structures for disjoint sets

  26. (4/29/02)
    • Traversing undirected graphs: depth-first search and breadth-first search algorithms
    • Analysis of the running time of depth-first search
    • Finding connected components in an undirected graph
    • String matching
    • "Easy" algorithm for string matching (running time O(n m))

  27. (5/1/02)
    • String matching (continuation)
    • Brute-force algorithm for string matching (running time O(n m))
    • String matching algorithm using integer representation of texts (running time O(n + m) but assuming constant-time operations on huge numbers)
    • Karp-Rabin string matching algorithm (modification of the algorithm above but using simple hashing to deal with small numbers): worst-case running time O(n m) and "typical" (average) running time O(n + m)
    • Knuth-Morris-Pratt algorithm for string matching (worst-case running time O(n + m))
    • Boyer-Moore string matching algorithm (use of the occurrence shift heuristic: worst-case running time O(n + m) and typical running time O(n + m) or even better)

  28. (5/6/02)
    • Discussion about the final exam

  29. NEW! (5/14/02) Final Exam
    • Final exam took place from 11:30am to 2:00pm on Tuesday, May 14, 2002, room KUPF 211.
      The final exam is cumulative (that is, all course material will be examined) and will be open-textbook and open class-notes only. It is worth 300 points.
      Information about the topics covered are available here.
      A set of problems for preparations for the final exam is available here in postscript format and in Adobe PDF format

    • Grades will be available on Friday, May 17, from the registrar office.
      Student having ID 4701 is not on the instructor's grade roster and therefore the instructor is unable to send his grade to the registrar's office.
      There has been also some other student who - as far as the instructor can see - for the first time showed up in the class during the final exam. Also in that case, the instructor is unable to send the grade to the registrar's office.

Topics to be covered (tentative):

References in parenthesizes below refer to the chapters in the (2nd edition of the) main textbook.

  1. General issues: (Ch 1)
    • what is an "algorithm design"
    • what are "data structures"
    • what does it mean to "analyze an algorithm"; worst-case and average-case analysis
    • we shall learn about techniques, tools, tricks plus a necessary information about the complexity issues
  2. Basic "mathematics":
    • Big-Oh notation (Ch 3-4)
    • solving recurrences (Ch 4)
    • summations, sets, relations, functions, graphs, trees (Ch A and B)
  3. Sorting and order statistics: (Ch 1, 6-9)
    • the importance of sorting (in practice and in the understanding of the algorithm design)
    • Insertion sort, Selection sort, Merge sort, Quick sort, Heap sort, Radix and Bucket sort (optionally: sorting in external memory)
    • medians and selection
  4. Basic data structures: (Ch 10-13, 18, 21)
    • lists, stacks, and queues
    • trees
    • dictionaries
    • priority queues
    • search trees and binary search trees
    • balanced binary search trees (Red-Black trees)
    • (optionally: splay trees and B-trees)
    • hashing
    • data structures for disjoint sets (Ch 21, 23)
  5. Basic techniques and their applications: (Ch 1, 15-17, 23-25)
    • divide-and-conquer
    • dynamic programming
    • greedy algorithms
    • amortize analysis
    For each of the techniques we shall present its applications and, in particular, we shall introduce graph problems here (Ch 23-25).
  6. Additional problems:
    • string matching (Ch 32)
    • polynomial and matrix multiplication (optionally: Fast Fourier Transform and basic algebraic algorithms) (Ch 28 and 30)
    • problems that are "hard": NP-completeness and NP-hardness (Ch 34-35)


References:

The difference between the main textbook and the optional one is that the book by Cormen et al. is more theoretical and might be more complicated for students; the book by Sedgewick contains more implementations of algorithms (in C or C++).


Artur Czumaj, May 20, 2002