CIS 750 (Lecture Notes)
The copyrighted material downloadable from this page is to be
used only by the students enrolled in CIS 750 under Prof. Gerbessiotis.
Distribution of this material outside this group is
NOT allowed for any reason.
C1. Midterm Performance and PS comments (INACTIVE)
C2. Lecture Summaries (ONLY POSTSCRIPT LINKS ACTIVE)
DISCLAIMER: The included material DOES NOT substitute for the textbook
for this class. It should be used in conjunction with the textbook and the
material presented in class. If a statement in these "notes" seems to be
incorrect, report it to the instructor so that it can be fixed immediately.
These "notes" are distributed to the students of CIS 750
offered in Fall at the New Jersey Institute of Technology;
distribution outside this group of students is prohibited.
The material below will be uploaded in due time; an upload message will
appear as soon as the corresponding document is uploaded.
-
Subject 0 High Performance Computing (Web Searching): Course Overview.
In Postscript and PDF
(** Jan 23, 2006 **)
-
Subject 1 Fundamentals of Web Searching.
In Postscript and PDF
(** Jan 30, 2006 **)
The Jan 23, 2006 versions are also available in PS and PDF.
-
Subject 2 Document Preprocessing: Parsing and Tokenization
In Postscript and PDF
(** Feb 6, 2006 **)
There is a typo on page 21 in the Golomb code (with $b=3$) of 8: a 0 is missing
from the code of the remainder $r=1$. For $b=3$, $r=1$ is represented as $10$; concatenated
after $110$ ($q+1$ in unary, here $q=2$), it gives $11010$ for 8.
-
Subject 3 Document Processing: Indexing
In Postscript and PDF
(** Feb 13, 2006 **)
-
Subject 4 String Matching (Review Material)
In Postscript and PDF
(** Feb 13, 2006 **)
-
Subject 5 Modeling, Retrieval Evaluation, and Ranking
In Postscript and PDF
(** Feb 23, 2006 **)
Typos
- Page 23, code line 6: Dest[v] was meant to be D[v]. The a in line 13 is d (=0.85).
- Page 15, line 1: "page if it referenced" should be "page if it is referenced".
- Page 18, line 8: "The hub function ... will become after scaling a(a_i)" should be
"... h(a_i)".
- Page 17, h(5): The value of h(5) is 0, not the indicated 1. This means that
sqrt(60) ≈ 7.74 will be used for normalization. As a result, the new h(5) will also be 0.
Given the precision indicated, no other change in values is required.
The converged values after 15 iterations are h = (0.65, 0.36, 0.65, 0.00, 0.00) and
a = (0.0, 0.0, 0.0, 0.78, 0.61).
The updated notes are available in Postscript and PDF.
(** Mar 20, 2006 **)
-
Subject 6 Parallel Computation: An Introduction
In Postscript and PDF
(** Mar 6, 2006 **)
-
Subject 7 The Parallel Random Access Machine
In Postscript and PDF
(** Mar 6, 2006 **)
-
Subject 8 Architecture-Independent Parallel Modeling: The Bulk-Synchronous Parallel and the LogP Models.
In Postscript and PDF
(** Mar 27, 2006 **)
-
Subject 9 The Oxford BSPlib Toolset.
In Postscript and PDF
(** Mar 27, 2006 **)
-
Subject 10 The Message Passing Interface: MPI and MPI-2
In Postscript and PDF
(** Mar 27, 2006 **)
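The Golomb-code correction under Subject 2 (page 21) can be checked mechanically. Below is a minimal C sketch of a Golomb encoder under the convention the correction implies: an integer $x \ge 1$ is split as $x-1 = qb + r$, $q+1$ is written in unary as $q$ ones followed by a zero, and $r$ is written in truncated (minimal) binary. The function name golomb_encode and its string-of-bits output are illustrative assumptions, not code from the notes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: writes the Golomb code of x (x >= 1) with parameter
 * b (b >= 1) into out as '0'/'1' characters and returns the number of bits.
 * Assumed convention: x - 1 = q*b + r; q+1 in unary (q ones, then a zero);
 * r in truncated binary. out needs room for q + 1 + ceil(log2 b) + 1 chars. */
size_t golomb_encode(unsigned x, unsigned b, char *out)
{
    assert(x >= 1 && b >= 1);
    unsigned q = (x - 1) / b;
    unsigned r = (x - 1) % b;
    size_t n = 0;

    /* Unary part: q ones followed by a terminating zero (q + 1 bits). */
    for (unsigned i = 0; i < q; i++)
        out[n++] = '1';
    out[n++] = '0';

    /* Truncated binary part: with k = ceil(log2 b) and t = 2^k - b, the first
     * t remainders use k - 1 bits; the others are written as r + t in k bits. */
    unsigned k = 0;
    while ((1u << k) < b)
        k++;
    unsigned t = (1u << k) - b;
    unsigned value = r, bits = k;
    if (r < t)
        bits = k - 1;
    else
        value = r + t;
    for (unsigned i = bits; i > 0; i--)
        out[n++] = ((value >> (i - 1)) & 1u) ? '1' : '0';

    out[n] = '\0';
    return n;
}
```

With $b=3$, golomb_encode(8, 3, buf) stores 11010 in buf, in agreement with the corrected value, and golomb_encode(7, 3, buf) gives 1100 (the remainder $r=0$ needs only one bit). When $b$ is a power of two, $t=0$ and every remainder takes exactly $\log_2 b$ bits, so the code reduces to a Rice code.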
C2.a Presentations
- M. Robbins, "Open Source Search Technologies - Lucene and Nutch"
PowerPoint slides.
C3. Course-related papers and links
- Google-related papers
Click here. For local copies of some of these papers, check the list
below.
- Google Cluster Architecture
In PDF.
- The Google File System
In PDF.
- Google MapReduce
In PDF.
- Who links to Whom?
In PDF.
- Google by Brin and Page
http://www-db.stanford.edu/~backrub/google.html.
Mirrored locally in PDF.
- Searching the Web by Arasu et al.
http://oak.cs.ucla.edu/~cho/papers/cho-toit01.pdf.
- Recent Google DocIDs
docID.txt, picked at random from Google.
- Hubs and Authorities paper by Kleinberg (1998)
NEW link (Mar 14).
- Parallel Computing Related Papers
- Gustafson paper
in PDF.
- Computational requirements for weather-related tasks.
http://ct.gsfc.nasa.gov/Briefing_05-23-02.pdf.
- The PRAM paper by Fortune and Wyllie
in PDF.
- The Parallel Prefix Computations paper by Ladner and Fischer
in PDF.
- Brent's paper
in PDF.
- L. G. Valiant's BSP paper (CACM, August 1990)
in PDF. Also a BSP patent.
- The LogP model by Culler, Karp, Patterson et al.
in PDF.
- BSPlib documentation by J. D. Hill et al.
in Postscript.
- MPI-related papers with sample code, etc.
http://www.cs.berkeley.edu/~bonachea/upc/mpi2.html,
papers/mpi/MPI1av.pdf,
papers/mpi/MPI2.pdf,
papers/mpi/MPI2a.pdf (limitations of Put/Get ops),
papers/mpi/MPI_Day1.pdf.
Also
papers/mpi/RMACompaq.pdf,
papers/mpi/RMAHP.pdf,
papers/mpi/RMAMPIBSP.pdf,
papers/mpi/mpi2.txt.
- MPI Tutorial at LLNL
http://www.llnl.gov/computing/tutorials/workshops/workshop/mpi/MAIN.html.
- MPI-Forum
http://www.mpi-forum.org/index.html.
- MPI-2 C++ Bindings
Click here.
- MPI-2 Manual/Documentation
Click here.
- Web page for the book "Using MPI-2: etc"
Click here.
- C Programming Tutorial at Cambridge University (UK).
- LAM-MPI (available in Red Hat Linux distributions)
http://www.lam-mpi.org.
- MPICH2 (another MPI implementation)
Click here.
- BSP Worldwide
Click here.
- BSPlib
The most recent version is Version 1.4 of BSPlib,
also available here in .tar.gz format.
The documentation of BSPlib is in Postscript.
Brief instructions to install it on a single-processor machine
(e.g., a PC with Linux or a Sun workstation with Solaris): uniinstall.txt.
C4. Other Information
The following material might be useful.
- HTML Guides
- WebCrawlers
- Search Engines
- Studies
C5. Hints for Programming Assignments: Source Code and links
- Traverse a directory structure: simplistic and incomplete code
(click here) and more complete code (click here).
- Checksums and hashes
Adler-32 checksum.
CRCs.
- Porter's stemming algorithm with links and source code
http://www.tartarus.org/~martin/PorterStemmer/.
- PATRICIA trie original paper
Click here; click for a local (PDF) copy.
- Trie vs Hashing comparison
PDF version.
- How to create a directory (C function call)
Click here.
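For the checksum hint, here is a minimal C sketch of Adler-32 (as specified in RFC 1950) and of the widely used reflected CRC-32 with polynomial 0xEDB88320 (the variant used by zlib, gzip and PNG). Both are written for clarity rather than speed, and the function names are our own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ADLER_MOD 65521u /* largest prime smaller than 2^16 */

/* Adler-32: two running sums modulo 65521; s1 starts at 1, s2 at 0.
 * The result packs s2 in the high 16 bits and s1 in the low 16 bits. */
uint32_t adler32_checksum(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t s1 = 1, s2 = 0;
    for (size_t i = 0; i < len; i++) {
        s1 = (s1 + buf[i]) % ADLER_MOD;
        s2 = (s2 + s1) % ADLER_MOD;
    }
    return (s2 << 16) | s1;
}

/* Bitwise CRC-32: reflected polynomial 0xEDB88320, initial value and final
 * XOR 0xFFFFFFFF. A 256-entry lookup table is the usual speed-up. */
uint32_t crc32_checksum(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            /* Shift right; if the bit shifted out was 1, XOR in the polynomial. */
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return crc ^ 0xFFFFFFFFu;
}
```

For instance, adler32_checksum on the 9 bytes of "Wikipedia" gives 0x11E60398, and crc32_checksum on "123456789" gives the standard check value 0xCBF43926. Adler-32 is cheaper to compute but is weaker than CRC-32 on very short inputs.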
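For the two directory hints, here is a minimal POSIX C sketch (assuming a Linux or Solaris system): ensure_dir creates a directory with mkdir and treats an existing directory as success, and walk_dir recursively visits a tree with opendir, readdir and lstat, without following symbolic links. The function names, the 4096-byte path buffer and the callback interface are illustrative assumptions, not the linked course code.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Create directory path with permissions 0755 (subject to the umask).
 * An already existing directory counts as success. Returns 0 or -1. */
int ensure_dir(const char *path)
{
    struct stat st;
    if (mkdir(path, 0755) == 0)
        return 0;
    if (errno == EEXIST && stat(path, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;
    return -1;
}

/* Recursively visit every entry below dir, calling visit(path, is_dir) if
 * visit is non-NULL. Uses lstat, so symbolic links are reported but never
 * followed. Returns the number of regular files found, or -1 if dir cannot
 * be opened; unreadable subdirectories are skipped. */
long walk_dir(const char *dir, void (*visit)(const char *path, int is_dir))
{
    assert(dir != NULL);
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    long files = 0;
    char path[4096];                       /* assumed large enough */
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;                      /* skip self and parent links */
        int len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path)
            continue;                      /* name too long: skip it */
        struct stat st;
        if (lstat(path, &st) != 0)
            continue;
        int is_dir = S_ISDIR(st.st_mode) ? 1 : 0;
        if (visit != NULL)
            visit(path, is_dir);
        if (is_dir) {
            long sub = walk_dir(path, visit);
            if (sub > 0)
                files += sub;
        } else if (S_ISREG(st.st_mode)) {
            files++;
        }
    }
    closedir(d);
    return files;
}
```

A callback that prints only entries with is_dir == 0 can be passed as the second argument to list the files an indexer would process; the return value counts the regular files found.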