| |
|
FINAL STATE: May 16, 2003
No changes are planned in the future
| |
»CIS 610: Data Structures and Algorithms«
Sections 106/108
Spring 2003
|
|
|
Class Hours: |
|
Wednesday |
|
6:00 pm - 9:05 pm |
|
TIER LECT 1 |
Office Hours: |
|
Tuesday |
|
4:00 pm - 5:25 pm |
|
|
|
Wednesday |
|
4:00 pm - 5:25 pm |
By special appointment: |
|
Monday |
|
12:15 pm - 12:55 pm |
TA's Office Hours: |
|
Monday |
|
11:00 am - 12:25 pm
|
|
GITC 4217 |
|
|
|
Thursday |
|
4:00 pm - 5:25 pm
|
|
|
- Grading will be based on 6 homework assignments,
an in-class midterm, and
a final.
The homework assignments will count for 5/14 (~36%) of the final grade,
the midterm for 4/14 (~29%), and the final exam for 5/14 (~36%).
If there is a large discrepancy between your homework and
exam grades, the grades will be balanced at student's discretion.
- Six homework assignments will be handled out.
Each homework is worth 100 points.
One homework with the lowest grade will be dropped from grade consideration.
Thus, the homework assignments contribute a total of at most 500 points towards
the final grade. Homework will be posted on the course Web-page
http://web.njit.edu/~czumaj/TEACHING/CIS610-S03.html
in Adobe Acrobat format.
2 homework assignments will be programming
assignments.
The other 4 will be problem sets.
-
Homework assignments are due at the beginning of the class on the due date.
-
No later solutions will be taken into consideration!
-
The two programming assignments must be done in either C or C++.
-
On all programming assignments, you need to hand in a hard copy of the following
(all parts must be handed together, and properly stapled together):
- A brief description of the algorithm together and its time complexity
analysis.
- The source code.
- The input files, and the output.
-
More details on programming assignments will be available here in some later
time.
-
Solutions to the four problem sets
can be submitted either per email
to the TA (sxl5937@oak.njit.edu),
or hand in a hard-copy at the beginning of the class.
Handwritten or typed solutions will be accepted.
-
If the homework is to be submitted per email, it must be in one
of the following formats: Adobe PDF-file, ASCII text,
Microsoft Word format.
-
Solutions must be readable (especially handwriting!!!), concise and complete.
-
Show your work, as partial credit will be given.
You will be graded not only for the correctness and efficiency of your answers, but also on your clarity.
Be neat.
-
Bonus points will be awarded for answers that are particularly clever, simple, or efficient.
-
Each time you must design an algorithm, try to design it as efficient as possible.
The faster algorithm you will describe the better grade you will get.
-
Unless it is stated otherwise, each algorithm described must be provided with a full analysis
of its correctness and its running-time.
-
The work you turn in must be your own personal work, composed and written by
you (unless stated very clearly
otherwise).
If you discuss a problem with a fellow student, you must mention this clearly
in your homework (name the fellow student before the solution of the problem in question).
Your work will then be compared to the other student's work to verify that your
solution was written by you and reflect your own
personal effort.
If you don't report it, it will be considered a violation of the course rules,
as described in the General Collaboration Policy section.
(Collaboration of this kind is not allowed in the midterm and final exams.)
-
Cheating: it is surprisingly easy to detect when students are not doing their own work.
Working with others on the assignments does not mean dividing the problems among you, and then swapping the answers.
-
The midterm exam took place on March 12, 2003.
It was an open textbook exam (means - students can use both textbooks
and class notes, but cannot use any other textbooks etc).
The exam covered all topics discussed so far. Students had 2,5 hours to solve 4 problems and earn up to 400 points.
-
The final exam will take place in
the Finals week, on Wednesday, May 14, from 6:00-9:00pm, in room TIER LECT 1
(normal classroom).
The final exam is cumulative (that is, all course material will be examined).
The final exam is open-textbook and open class-notes only.
It is worth 500 points.
Information about the requirements for the final is available.
Scores for the final exam.
Grades are available from the registrar office (and won't be posted on this web page).
Prerequisite:
- CIS 335 (Data Structures & Algorithms), or
CIS 505 or equivalent
- Math 226 (Discrete Math)
Contact the instructor immediately if you didn't take any of these courses.
Additionally, students must be able to program in either C or
C++ and to understand well programs written in C++.
The main aim of the course is to teach students to
understand algorithms and data structures.
We will discuss algorithms and data structures for most fundamental
problems. For those problems we shall focus on the design of efficient
algorithms and the use of proper data structures. Some issues of
the analysis of the algorithms (like correctness, running-time
analysis, etc.) will be also covered.
From the NJIT graduate course catalog:
Intensive study of the fundamentals of data structures and
algorithms. Presents the definitions, representations, processing algorithms
for data structures, data structures, algorithms and their applications
including linked lists, various tree organizations, hash tables, strings,
storage allocation, algorithms for searching and sorting, and a selected
collection of other algorithms. Programs are assigned to give students
experience in algorithms, data structure design and implementation.
|
Course outline / topics to be covered (tentative):
|
- General introduction:
- what is an "algorithm design"
- what are "data structures"
- what does it mean to "analyze an algorithm";
worst-case and average-case analysis
- we shall learn about techniques, tools, tricks plus
a necessary information about the complexity issues
- Basics:
- analysis techniques
- induction, recursion
- Big-Oh notation
- solving recurrences
- summations, sets, relations, functions, graphs, trees
- Basic data structures
(self-review):
- data representation
- linked-lists
- arrays and matrices
- stacks
- queues
- trees
- Search Trees:
- binary search trees
- balanced binary search trees (either AVL trees or Red-Black trees)
- Hashing
- Sorting algorithms and order statistics:
- the importance of sorting (in practice and
in the understanding of the algorithm design)
- basic algorithms: Insertion sort, Selection sort, Merge sort, Quick sort,
and Heap sort
- integer sorting: Counting sort, Radix sort, and Bucket sort
- medians and selection
- Graph algorithms
- Basic algorithm design techniques and their applications:
- divide-and-conquer
- greedy algorithms
- dynamic programming
- amortized analysis
-
(1/22/03)
- Discussion of the syllabus, requirements, topics to be covered, textbooks.
- What is an algorithm, what is a data structure, and what is an abstract data structure.
- Algorithms versus Implementations of algorithms.
- A simple algorithm for finding a minimum element in an array - analysis of the running time.
- Algorithms for the "simple" searching problem.
- Simple linear-time algorithm.
- Faster algorithm when the input sequence is sorted (only design and brief running-time analysis).
-
(1/29/03)
- Binary search algorithm
- Iterative algorithm and the analysis of its running time (worst-case running time is Theta(log(n)))
- Recursive algorithm (worst-case running time analysis is obtained by solving the recurrence
T(1) = Theta(1) and T(n) = Theta(1) + T(n/2); solution to that recurrence is T(n) = Theta(log(n)))
- Analysis of the running time of iterative algorithms.
- Analysis of the running time of recursive algorithms by reducing the worst-case running-time bound to solving a recurrence.
- Quiz and Questionnaire.
- Basic method of solving recurrences: substitution method.
- Master theorem: a generic method to solve recurrences of the form T(n) = a T(n/b) + f(n).
- Handout 1
with some basic facts from calculus, definitions of Big-Oh, Big-Delta,
and Big-Omega notation, and description of Master Theorem.
-
Suggested reading: Section 2 in the textbook; read especially Section 2.4
(and check all definitions of O, Omega, and Theta, and go through basic
properties of these notations described in Figures 2.15 and 2.16), Section 2.5, and Section 2.3.
- The following algorithms from the textbooks were presented in the class:
Programs
2.28 (Sequential search),
2.30 (Binary search),
2.3 (A function to evaluate a polynomial),
2.4 (Horner's rule for polynomial evaluation),
2.7 and 2.12 (Selection sort),
2.14 and 2.15 (Insertion sort);
(many of the algorithms were presented in a slightly different form).
-
(2/5/03)
- Discussion of the homework assignment.
- Techniques to prove that a given function has a given order (that is, that f(n)
= O(g(n)), f(n) = Omega(g(n)), and f(n) = Theta(g(n))).
- Methods to solve recurrences: substitution method and Master's
theorem. Examples for using Master's theorem (and when this method doesn't
work).
- Merge-Sort: its implementation and the running time analysis.
- In-place and stable algorithms.
- Quick-Sort: its implementation, worst-case running-time analysis, and
brief discussion why Quick-Sort works very good for "typical data."
(We discussed two version of the Partition algorithm - one uses an additional
array and another is in-place.)
-
(2/12/03)
- Discussion about Homework 1.
- Algorithms for maintaining dictionaries:
- what is a dictionary
- implementing dictionary using double-linked lists
(worst-case running time of Search, Insert, Delete is Theta(n))
- implementing dictionary using arrays
(worst-case running time of Search is Theta(log(n)) and
worst-case running time of Insert, Delete is Theta(n))
- hashing - main concepts
- dealing with collisions in hashing
- hashing with linear open addressing
- hashing with chains
-
Suggested reading: Section 7.1, 7.2, 7.4 in the textbook.
It is strongly advised that students will read Sections 3-6 on their own.
These topics will not be discussed in the class because it is assumed that they
have been covered in prior courses that form prerequisite for CIS 610.
However, these topics are crucial to understand the concepts discussed in the class!!!
-
(2/19/03)
- Hashing: discussion of further properties, implementations, and the choice of hash functions.
- Skip-list: basic ideas, and basic properties ("Typical" running time of Search, Insertion, and Delete is O(log(n))).
- Lempel-Ziv-Welch compression: basic ideas and basic properties.
- Trees: basic definitions, basic properties, and representation.
- Binary trees and their use: representation, tree traversals (pre-order, in-order, and post-order).
Two main examples: expression trees and their use and sorting in binary search trees.
-
Suggested reading: Sections 7.3 - 7.5 and 8.1 - 8.9 in the textbook.
-
(2/26/03)
- Data structures for maintaining disjoint sets (Online equivalence classes, Section 8.10.2 in the textbook)
- Operations Make-Set, Find, Union
- Simple algorithm using arrays (worst-case running-time O(n) per operation Union).
- Simple approach using trees (Program 8.15 in the textbook worst-case running-time O(n) per Union/Find).
- Heuristics improving the running-time:
- Performing Union operations according to either the size (weight) of each of the two trees
or the height of each of the tree.
- Performing Find with path compression.
- With the two heuristics above the running time is O(n + f alpha(n+f,n)), see Theorem 8.2 in the Textbook,
where alpha is an extremely slowly growing function called an inverse Ackerman function,
and we assume one performs f Finds, u Unions, and u > n/2,
where n is the initial number of sets
- Priority Queues and their applications.
- Using priority queues to implement "scheduling problems" and sorting.
- Using linked-lists to implement priority queues.
- Using heaps to implement priority queues.
- Definition of a max-heap and its basic properties.
- Implementation of MaxHeap(), Insert(x), DeleteMax(), Initialize(A[1...n]).
- Heap can be implemented in-place, that is, it is an implicit data structure:
we "see a tree" but "work on an array".
- MaxHeap() can be performed in time O(1).
- Insert(x) and DeleteMax() can be performed in time O(log(n)), where n is
the number of the keys in the heap.
- Initialization(A[1...n]) can be performed in time O(n), where n is
the number of the keys in the heap.
- HeapSort - an optimal sorting algorithm that uses MaxHeap and runs in time O(n log(n)).
-
Suggested reading: Sections 3.8.3, 8.10.2, 9.1-9.3, 9.5.1 in the textbook.
-
(3/5/03)
- Huffman codes: properties, definitions, and efficient implementation.
- Lower bound for sorting: Every comparison-based algorithm that sorts n keys requires Omega(n log(n)) time.
(Textbook: Section 14.4.2)
- Sketch of the proof - reduction to (binary) computation trees, in which each non-leaf node corresponds
to a test if one input key is smaller/greater than another input key, each leaf-node corresponds
to the output permutation of the input, and the height of the tree is a lower bound for
the worst-case running time for sorting. Since there must be at least n! leaves, the
height of any (binary) computation tree must be at least log(n!), and since
log(n!) = Theta(n log(n)), this proves the promised lower bound.
Note: this lower bound is not discussed in the textbook but all students should be aware of this result
and
understand the main idea of its proof.
- Sorting algorithms beating the lower bound: non-comparison-based sorting.
- Counting sort: a sorting algorithm that sorts n keys, each key being an
integer from {1 ... M} can be performed in time O(n + M).
Note: counting sort is not discussed in the textbook. Instead, a similar algorithm (Bin sort)
is discussed.
- Radix sort.
- Discussion on the midterm exam.
- Suggested reading: Sections 9.5.3, 14.4.2, 3.8.1 (Bin sort), 3.8.2 in the textbook.
-
(3/12/03)
-
(3/26/03)
- Discussion about midterm problems.
- Selection algorithms:
- Finding minimum (maximum) element requires exactly n - 1 comparisons.
- Finding the smallest and the 2nd smallest element using n - 2 + log(n) comparisons.
(corrected!!!)
- Finding the smallest and the largest element using 3n/2 - 2 comparisons.
(corrected!!!)
- Finding the kth smallest element can be done in O(n log(n))-time
by reducing to sorting.
- Finding the kth smallest element can be done in O(n k)-time by
calling k times the minimum-finding algorithm.
- Finding the kth smallest element can be done in O(n + k log(n))-time
by first building a priority queue and then making k calls to Delete-Min.
- Finding the kth smallest element can be done in average O(n)-time
using an algorithm due to Hoare; the algorithm uses the same approach as Quick-Sort algorithm.
(The worst-case running time of this algorithm is Theta(n2)).
- It is possible to find the kth smallest element in O(n)-time by using a (complicated)
variant of Hoare's algorithm. (This algorithm hasn't been discussed in the class.)
- Finding the kth smallest element can be efficiently implemented using binary search trees.
- Binary search trees: repeating the definition and main properties (including a O(n)-time
sorting algorithm once a binary search tree is built).
- Balanced binary search trees: why do we need them.
- AVL-trees: definition and basic properties.
- Suggested reading: Chapter 10, 11 (only 11.1 11.2.1, and 11.2.2), and 14.2.4 in the textbook.
-
(4/2/03)
- AVL-trees: efficient algorithms for Insertion and Deletion.
- Red-black trees: definition and basic properties (in particular, O(log n) depth).
- B-trees: definition and basic properties.
- Graphs: basic definitions, basic properties, and graph representations (adjacency matrices and adjacency lists).
- Suggested reading: Chapters 11 and 12 in the textbook.
-
(4/9/03)
- Depth-first search (DFS): traversing a graph to find all vertices reachable from a given vertex.
Running time O(|V| + |E|) in the adjacency list graph representation and O(|V|2)
in the adjacency matrix representation.
- Further application of DFS: partitioning of the vertex set into connected components.
- Breadth-first search (BFS): another way of traversing a graph.
(Running-time complexity is the same as that of DFS.)
- Application to computing an Erdos number.
- Allows to solve the following problem (in time proportional to the number of vertices
and edges in the adjacency list graph representation): for a given vertex v, compute
the distance from v to any other vertex in the graph, where the distance means the minimum number
of edges one has to traverse to reach a given vertex from vertex v.
- Topological sorting (or how should we help our younger brother to dress himself in the morning):
- An O(|V|+|E|)-time algorithm (in the adjacency list graph representation).
- Main property: topological sorting exists if and only if the underlying directed graph is acyclic.
- Main idea: if a directed graph is acyclic then it does have a vertex of in-degree 0.
- Recursive algorithm: find a vertex of in-degree 0;
remove it from the graph, and
number the resulted graph with the larger numbers.
- Spanning trees and Minimum spanning trees.
- Kruskal's algorithm to find a minimum spanning tree.
- Description of the algorithm.
- Using "clever" data structures to obtain an efficient implementation: using priority queue and Union-Find data structure.
- Efficient implementation gives the running time of O(|E| log|E|)
- Suggested reading: Sections 12, 13.3.3, and 13.3.6 in the textbook.
Instructor's advise: read this part of the material in the textbook!!!
-
(4/16/03)
- Minimum spanning tree: Prim's algorithm.
- Data structures used for an efficient implementation of Prim's algorithm (O(|E| log|E|)-time).
- Single-source shortest path problem in weighted graphs.
- Dijkstra algorithm: efficient implementation using priority queues (O((|E| + |V|) log(|E|+|V|))-time).
- All-pairs shortest path problem in weighted graphs.
- "Trivial algorithm": apply |V| times Dijktra's algorithm.
- Slow but simple algorithm with the running time of O(|V|4) -
"recursive" computations of A[i,j,k] = length of the shortest path from i to j that uses at most k edges.
- Improvement to the running time of O(|V|3 log|V|).
- Matrix multiplication: given two n x n matrices, their product can be computed in O(n3)time.
- Computing the kth power of a matrix A can be done in O(n3 log(k))time.
- Suggested reading: Sections 13.3.6 and 15.2.4 in the textbook.
-
(4/23/03)
- Matrix multiplications vs. matrix additions (standard algorithm requires O(n m k) time
to multiply a n x m matrix by an m x k matrix, and only O(n m) time to add
an n x m matrix to an n x m matrix.
- Divide-and-conquer method: Strassen's algorithm for matrix multiplications (only to multply an n x n matrix by an n x n matrix):
- Recursive call to 7 multiplications of n/2 x n/2 by n/2 x n/2 matrices and about 20 non-recursive additions.
- Running time is the solution of a recurrence T(n) = 7 T(n/2) + O(n2), what gives
the running time of O(nlog(7)) ~ O(n2.81)
- Hot to avoid recursive calls in Merge-sort.
- Matrix multiplication chains
- Assume that the cost of multiplying an n x m matrix by an m x k matrix is O(n m k).
- Given a sequence of N matrices, M1, M2, ..., MN,
such that each matrix Mi is of dimensions si-1 x si.
Find an optimal order to multiply the matrices to minimize the total number of operations.
- For example, if N = 3, s0 = 1000000, s1 = 1,
s2 = 1000000, and s3 = 1, then
- The cost of multiplying (M1 x M2) x M3 is 2 x 1000000000000 operations, while
- the cost of multiplying M1 x (M2 x M3) is 2 x 1000000 operations.
- Dynamic programming algorithm that solves this problem in O(n3) time.
- Knapsack problem:
- Computationally a very hard problem (so-called NP-hard).
- ... but has a pseudopolynomial-time algorithm:
- Dynamic programming algorithm that solves the problem in time O(n B),
assuming that the weights are integer, where n is the number of objects and
B is the capacity of the knapsack.
- Suggested reading: Sections 14.1 (especially, Example 14.3), 15.2.1, and 15.2.3 in the textbook.
-
(4/30/03)
- Last class: Solving sample problems before the final (Prof. Rytter).
-
(5/14/03)
- Final Exam will take place in
the Finals week, on Wednesday, May 14, from 6:00-9:00pm, in room TIER LECT 1
(normal classroom)
-
The final exam is cumulative (that is, all course material will be examined).
-
The final exam is open-textbook and open class-notes only.
-
It is worth 500 points.
-
Information about the requirements for the final is available.
Scores for the final exam.
- Grades are available from the registrar office (and won't be posted on this web page).
-
Homework 1
was due Wednesday, February 12, 6:00 pm.
Sample
solutions to Homework 1.
-
Programming Assignment A
was due Friday, March 14, 6:00 pm.
-
Homework 2
was due Wednesday, March 26, 6:00 pm.
Sample
solutions to Homework 2.
-
Homework 3
was due Wednesday, April 9, 6:00 pm.
Sample
solutions to Homework 3.
-
Programming Assignment B
was due
Sunday, May 4, 11:59 pm.
-
Homework 4
was due Wednesday, April 30, 6:00 pm.
Sample
solutions to Homework 4.
Grades before the final (include everything but the FINAL exam).
Towards the final grade only five out of six homework are counted -
one with the lowest score is dropped;
please check if the right homework has been dropped ...
- MAIN TEXTBOOK:
Sartaj SAHNI:
»Data Structures, Algorithms, and Applications in C++«,
McGraw-Hill, 1998.
- Additional references:
- T.H. Cormen, C.E. Leiserson, R.L. Rivest, C. Stein:
»Introduction to Algorithms«,
The MIT Press 2001, 2nd edition (though the 1st edition is also fine).
- R. Sedgewick:
»Algorithms in C++: Fundamentals,
Data Structure, Sorting, Searching (Parts 1-4)«,
Addison-Wesley Pub Co; ISBN: 0201350882.
(The book in C is also fine.)
All of the course materials are in Adobe PDF format, and so you will
need Adobe Acrobat or Acrobat Reader to read the files.
A free copy of Adobe Acrobat Reader is available from Adobe web page at
http://www.adobe.com/products/acrobat/readstep.html.
Artur Czumaj,
May 16, 2003
| |
|