Final Status: December 20, 2002
won't be updated in future


 

»CIS 435: Advanced Data Structures and Algorithm Design«

Section 101

Fall 2002

 


Instructor:

Prof. Dr. Artur Czumaj

Office:  4103 (GITC Building, 4th floor)
Tel.:   (973) 596-3369
Fax:   (973) 596-5777
EMail:  czumaj@cis.njit.edu
WWW:  http://web.njit.edu/~czumaj/

Teaching Assistant:  Seung-Yeop Lee (sxl5937@oak.njit.edu)

Office: GITC 4217

 

 
Schedule:

Class Hours:   Wednesday   6:00 pm - 9:05 pm   FAC 321


Office Hours:
  Wednesday   4:00 pm - 5:25 pm
    Thursday   4:00 pm - 5:25 pm

By special appointment:
  Friday   12:15 pm - 12:55 pm


TA's Office Hours:
  Monday   4:00 pm - 5:25 pm   GITC 4217 
  Thursday   11:00 am - 12:25 pm  

 

 


Grading and Policies:

  • Grading will be based on 5 problems sets (homework assignments), an in-class midterm, and a final.

    The problem sets will count for 2/7 of the final grade, the midterm for another 2/7, and the final exam for 3/7. If there is a large discrepancy between your homework and exam grades, the grades will be balanced at student's discretion.

  • Five homework assignments will be handled out. Each homework is worth 100 points. One homework with the lowest grade will be dropped from grade consideration. Thus, the homework assignments contribute a total of at most 400 points towards the final grade. Homework will be posted on the course Web-page http://web.njit.edu/~czumaj/TEACHING/CIS435-F02.html in Adobe Acrobat format.

    • Homework assignments are due at the start of class on their due date. They can be submitted either per email czumaj@cis.njit.edu or be delivered in a hard-copy at the beginning of the class. Handwritten or typed solutions will be accepted.
    • No later solutions will be taken into consideration!
    • If the homework is to be submitted per email, it must be in one of the following formats: Adobe PDF-file, ASCII text, Microsoft Word format.
    • Solutions must be readable (especially handwriting!!!), concise and complete.
    • Show your work, as partial credit will be given. You will be graded not only for the correctness and efficiency of your answers, but also on your clarity. Be neat.
    • Bonus points will be awarded for answers that are particularly clever, simple, or efficient.
    • Each time you must design an algorithm, try to design it as efficient as possible. The faster algorithm you will describe the better grade you will get.
    • Unless it is stated otherwise, each algorithm described must be provided with a full analysis of its correctness and its running-time.
    • The work you turn in must be your own personal work, composed and written by you. If you discuss a problem with a fellow student, you must mention this clearly in your homework (name the fellow student before the solution of the problem in question). Your work will then be compared to the other student's work to verify that your solution was written by you and reflect your own personal effort. If you don't report it, it will be considered a violation of the course rules, as described in the General Collaboration Policy section. (Collaboration of this kind is not allowed in the midterm and final exams.)
    • Cheating: it is surprisingly easy to detect when students are not doing their own work. Working with others on the assignments does not mean dividing the problems among you, and then swapping the answers.

  • The midterm exam took place on October 23, 2002. It was an open textbook exam (means - students can use both textbooks and class notes, but cannot use any other textbooks etc).

    Information about topics covered by the midterm exam

    Self-test problems for the midterm

    Handout 1: Useful mathematical background

    Handout 2: Sorting and selection algorithms (for the midterm look only at the part about sorting algorithms we discussed in the class)

    The midterm exam was worth 400 points.

  • The final exam took place on Wednesday, December 18, at 6pm. (Room FAC 321)

    The final exam is cumulative (that is, all course material will be examined).

    The final exam is open-textbook and open class-notes only.

    It is worth 600 points.

    Information about topics covered by the final exam

    Self-test problems for the final

    Grades for the final

     

  • Final grades are available here - and are also available from the registrar office

 


 

Prerequisite: CIS 114 or CIS 335 (contact the instructor immediately if you didn't take any of these courses)


Contents:

The main aim of the course is to teach students to understand algorithms and data structures. We will discuss algorithms and data structures for most fundamental problems. For those problems we shall focus on the design of efficient algorithms and the use of proper data structures. Some issues of the analysis of the algorithms (like correctness, running-time analysis, etc.) will be also covered.

From the NJIT undergraduate course catalog: The course deals with advanced topics in data structures and algorithm, including mathematical induction, analysis and complexity of algorithms, and algorithms involving sequences, sets, and graphs such as searching, sorting, order statistics, sequence comparisons, graph traversals, etc. Optional topics include geometric, algebraic, and numeric algorithms.

Topics to be covered (tentative):

References in parenthesizes below refer to the chapters in the (2nd edition of the) main textbook.

  1. General issues: (Ch 1)
    • what is an "algorithm design"
    • what are "data structures"
    • what does it mean to "analyze an algorithm"; worst-case and average-case analysis
    • we shall learn about techniques, tools, tricks plus a necessary information about the complexity issues

  2. Basic "mathematics":
    • Big-Oh notation (Ch 3-4)
    • solving recurrences (Ch 4)
    • summations, sets, relations, functions, graphs, trees (Ch A and B)

  3. Sorting and order statistics: (Ch 1, 6-9)
    • the importance of sorting (in practice and in the understanding of the algorithm design)
    • Insertion sort, Selection sort, Merge sort, Quick sort, Heap sort, Radix and Bucket sort (optionally: sorting in external memory)
    • medians and selection

  4. Basic data structures: (Ch 10-13, 18, 21)
    • lists, stacks, and queues
    • trees
    • dictionaries
    • priority queues
    • search trees and binary search trees
    • balanced binary search trees (Red-Black trees)
    • (optionally: splay trees and B-trees)
    • hashing
    • data structures for disjoint sets (Ch 21, 23)

  5. Basic techniques and their applications: (Ch 1, 15-17, 23-25)
    • divide-and-conquer
    • dynamic programming
    • greedy algorithms
    • amortize analysis
    For each of the techniques we shall present its applications and, in particular, we shall introduce graph problems here (Ch 23-25).


  6. Additional problems:
    • string matching (Ch 32)
    • polynomial and matrix multiplication (optionally: Fast Fourier Transform and basic algebraic algorithms) (Ch 28 and 30)
    • problems that are "hard": NP-completeness and NP-hardness (Ch 34-35)


Specific Contents:

  1. (9/4/02)
    • Discussion of the syllabus, requirements, topics to be covered, textbooks.
    • Quiz and Questionnaire.
    • What is an algorithm, what is a data structure, and what is an abstract data structure.
    • Algorithms (and their analyses) for the "simple" searching problem.
      • Simple linear-time algorithm - discussion of correctness and its running time.
      • Faster algorithm when the input sequence is sorted (only design and brief running-time analysis).
    • Suggested reading: Chapters 1 and 2 from the textbook (2nd edition).
  2. (9/11/02)
    • Growth of functions (Section 1 from the textbook (2nd edition)).
    • Asymptotic notation: big Theta, big O, big Omega (Chapter 3.1 and Problems from Chapter 3 (pages 57-60) from the textbook (2nd edition)).
    • Searching algorithms once again (linear search and bisection search)
      • counting the number of comparisons (linear search - n and bisection search - log n)
      • estimating the running time in Big-Theta notation (linear search - Theta(n) and bisection search - Theta(log n))
    • Insertion Sort (Chapters 2.1 and 2.2 from the textbook (2nd edition)).
      • The main idea of Insertion Sort and the process of the design of the algorithm.
      • Worst-case running time analysis in the Big-Theta notation (Theta(n^2)).
    • Selection Sort
      • The main idea of Selection Sort and the process of the design of the algorithm.
      • An algorithm for finding a smallest element in an array.
      • Worst-case running time analysis in the Big-Theta notation (Theta(n^2)).
    • Suggested reading: Chapters 1, 2.1, 2.2, 3.1 from the textbook (2nd edition).
  3. (9/18/02)
    • Further discussion on properties of Insertion Sort and Selection Sort (Chapter 2)
      • Insertion Sort and Selection Sort have both the worst-case and the average-case running time of Theta(n^2)
      • Insertion Sort and Selection Sort are both in-place
      • Insertion Sort and Selection Sort are both stable
      • Selection Sort performs the minimum number of keys replacements
    • Merge Sort (Chapter 2.3)
      • Follows the idea of divide-and-conquer
      • Main idea: sort recursively the first half of the elements, then sort recursively the second half of the elements, and then merge two sorted sequence
      • Merging of two sorted sequences of N elements each can be done in time Theta(N)
      • Worst-case and average-case running times are given by the recurrence: T(n) = T(n/2) + T(n/2) + Theta(n), what gives the running time Theta(n log(n))
      • Merge Sort is stable but is not in-place (because of merging phase)
    • Quick Sort (Chapter 7)
      • Also follows the idea of divide-and-conquer
      • Main idea: pick a pivot element (for example, A[1]) and partition the sequence into 2 parts - elements smaller than the pivot and the elements greater than or equal to the pivot; sort each of the parts
      • Partitioning with respect to the pivot element can be done in time Theta(n)
      • Worst-case running time occurs when one of the parts has n-1 elements - in this case the running time is given by the recurrence T(n) = Theta(n) + T(n-1), what gives the worst-case running time of Theta(n^2)
      • Quick Sort is not called "quick" because it's fast in the worst-case ... it's because it's fast on average, in a "typical" case
      • In average-case each part is of similar case and the average-case running time is Theta(n log(n))
      • Quick Sort is stable but is not in-place (because of the recursive calls)
    • Suggested reading: Chapter 2 (especially 2.3) and Chapter 7 from the textbook (2nd edition).
  4. (9/25/02)
    • Very long and detailed discussion about Homework 1.
    • The master method (Chapter 4.3 and 4.4): a "cookbook" method for solving recurrences of the form

      T(n) = a T(n/b) + f(n)

      where a >= 1 and b > 1 are constants, and f(n) is an asymptotically positive function.

    • Suggested reading: Chapter 4 (especially 4.3 and 4.4) from the textbook (2nd edition).
  5. (10/2/02)
    • Quick Sort (Chapter 7) - continuation
      • Theta(n)-time implementations of Partition procedure ("easy" algorithm that uses additional space and a slightly more difficult algorithm that is in-place - for more details see Chapter 7.1)
      • Analysis of the average-case running time of Quick Sort (analysis following Problem 7-2 in the textbook)
        • Describing the average-case running time in a form of a recurrence
          AT(n) = Theta(n) + 1/n * sum(k=1 ... n) (AT(k-1) + AT(n-k))
        • Solving the recurrence to get the average-case running time of Theta(n log(n))
      • Discussion about efficient implementations of Quick Sort
        Pivot choice:
        • pick 3 elements and then choose as the pivot the one with the medium value
        • choose a random element as the pivot.
    • Selection problem (Chapter 9): given an array of n elements A[1 ... n] and an integer k, find the kth smallest element in array A
      • Easy algorithm for selection: sort array A and return A[k]
        • if one uses Merge Sort algorithm then the worst-case running time is Theta(n log(n))
    • Quick-sort-like algorithm for Selection (by Hoare) (for a similar descrption, see Chapter 9.2 in the textbook)
      • Theta(n^2) worst-case running time
      • Theta(n) average-case running time
    • Heap Sort - introduction (Chapter 6.1)
      • Definition of a heap - full binary tree with the heap property
      • Implementation of a heap in an array
    • Suggested reading: Chapters 6.1, 7, and 9.2 from the textbook (2nd edition).
  6. (10/9/02)
    • Very long and detailed discussion about Homework 2.
    • How to implement trees (binary and without any upper bound on degrees) in standard programming languages (we discussed only C++) (Chapter 10.4)
    • Analysis of Heap-sort (Chapter 6)
      • Definition of a heap (structure and Max-heap property)
      • Heap representation in an array
      • How to use heap to sort: main idea behind Heap-sort algorithm
      • Maintaining heap property - Max-Heapify procedure
      • Building a heap in O(n) time (using Max-Heapify procedure)
      • An efficient implementation of heaps - Heap-Sort algorithm running in time O(n log(n))
    • Suggested reading: Chapter 6 and 10.4 from the textbook (2nd edition).
  7. (10/16/02)
    • Canceled (because of the power outage on the campus).
    • Even though it has been very unfortunate that the last class before the midterm was canceled, the exam date remains unchanged and the midterm will take place on October 23.
  8. (10/23/02)
  9. (10/30/02)
    • Priority queues - an abstract (and non-trivial) data structures
    • Efficient implementation of priority queues using heaps
    • Lower bound for sorting: every comparison-based sorting algorithm requires Omega(n log(n)) comparisons to sort n elements
    • Linear-time sorting algorithms: counting sorting and radix sort
    • Medians and order statistics
    • Suggested reading: Chapters 6.5, 8 and 9 from the textbook (2nd edition).
  10. (11/6/02)
    • Efficient algorithms for selection (discussion of 4 basic algorithms - Selection-Sort-like, heap-sort based, sorting based, Hoare's partition algorithm)
    • Linear time algorithm for selection
    • Optimal bounds/algorithms for finding the minimum element, two minimal elements, and the smallest and the largest elements (via "tournament tree")
    • Dictionary - data structures to implement SEARCH, INSERT, and DELETE operations on sets
    • Hashing - efficient approach to implement dictionaries
    • Hashing with chaining (short discussion about the average running time of implementations of dictionary operations under the assumption that the hash functions are random)
    • What are good hash functions
    • Suggested reading: Chapters 9 (especially 9.1 and 9.3) and 11 (except 11.4) from the textbook (2nd edition).
  11. (11/13/02)
    • Hashing - continuation
    • Binary search trees
    • Suggested reading: Chapters 11.4, 12, and 13 from the textbook (2nd edition).
  12. (11/20/02)
    • Special lecture: distinguished guest, Professor Wojciech Rytter
    • Red-black trees - binary search trees of logarithmic height
    • AVL trees
  13. (12/04/02)
    • Graphs: definitions of undirected and directed graphs
    • Graphs representations
    • Basic algorithms for graphs: graph traversing
    • Breadth-first search (BFS): meta-algorithm and its use for finding connected components and for finding a shortest path from a vertex to all other vertices in the graph (with no weights on the edges)
    • Depth-first search (DFS): meta-algorithm and its use for finding connected components
  14. (12/11/02)
    • Last day of classes ...
    • Depth-first search (DFS) - more properties
    • Topological sorting
    • Minimum spanning trees (Kruskal's algorithm)
    • Finding single source shortest paths in a graph (Bellman -Ford algorithm and Dijkstra's algorithm)
  15. (12/18/02)
    Final exam (Wednesday, December 18, 2002, 6-9pm)

 


Homework Assignments: (solutions have been removed)

  1. Problem Set 1 - was due September 25, 2002.
  2. Problem Set 2 - was due October 9, 2002.
  3. Problem Set 3 - was due November 6, 2002.
  4. Problem Set 4 - was due December 2, 2002.
  5. Problem Set 5 - was due December 12, 2002.


References:

There are also many other good textbooks available. I especially recommend the textbook »Algorithms in C++: Fundamentals, Data Structure, Sorting, Searching (Parts 1-4)« by R. Sedgewick, Addison-Wesley Pub Co; ISBN: 0201350882. (The book in C is also fine.)


Adobe and PDF files:

All of the course materials are in Adobe PDF format, and so you will need Adobe Acrobat or Acrobat Reader to read the files. A free copy of Adobe Acrobat Reader is available from Adobe web page at http://www.adobe.com/products/acrobat/readstep.html.


 

Artur Czumaj, December 20, 2002