Hydra is a Beowulf cluster built by Microway, with 67 nodes running RHEL 3.0 (kernel 2.6). Each node contains two AMD Opteron 250 CPUs, each with 1 MB of L2 cache, and 4 GB of RAM. The nodes are interconnected with Myrinet. The default Message-Passing Interface (MPI) implementation is mpich compiled with the Portland Group compilers. Only batch processing is supported; currently there is a 20-node queue for serial jobs and a 46-node queue for parallel jobs. Both queues are managed via OpenPBS, and the job scheduler is FIFO (first-in, first-out). On the master node, execution time for user programs is limited to 1 hour; even so, please avoid running jobs on the master node.
The cluster was funded in part by a grant from the National Science Foundation. The projects that have priority on this cluster, as well as the short proposal submission/evaluation cycle through which all DMS members and members of the greater NJIT research community become eligible to use the cluster, are described on separate pages. This page covers the following topics:
Login Info :: Run Scripts :: Compilers & Libraries :: Documents :: Other Software :: How to Print :: Benchmarks :: Job Monitoring :: Workshop :: Links & Code Examples
Use the Problem Reporting facility provided in these pages to report any problems you encounter. Up-to-date information on cluster load, memory usage, etc. is also available in these pages.
Logins to hydra are currently managed locally due to problems with building
OpenAFS on Linux kernel 2.6. The goal is to eventually get OpenAFS
running on hydra and convert all logins to use AFS/Kerberos authentication; no timeframe has been set for doing this.
Direct connections to hydra are made through ssh (use the -X switch to allow X11 applications to display on your local desktop) and must be made from within the NJIT network. To connect from outside the NJIT network, either use a VPN client or use one of the public AFS systems as a jump point into hydra. Connections are accepted only on the master node, not on the slave nodes.
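For example, a login from inside the NJIT network might look like the following (the hostname hydra.njit.edu and the username jdoe are placeholders; substitute your own):
ssh -X jdoe@hydra.njit.edu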
OpenPBS (Open Portable Batch System) is used to submit parallel (MPI), serial, and multi-threaded jobs on hydra. When a job is submitted via PBS, it is placed into the PBS scheduler, and the PBS server executes it when the requested resources become available. Prior to execution, PBS runs /var/spool/pbs/mom_priv/prologue, which enables user access on each node that the job will run on; after execution has completed, PBS runs /var/spool/pbs/mom_priv/epilogue, which disables user access on each of those nodes. These prologue/epilogue scripts are run automatically on every node allocated to a user's job. The number of nodes for MPI jobs is specified in the PBS run script that is used to submit the job to the parallel queue.
The following queues are set up on hydra: serial (20 nodes) and parallel (46 nodes). The distribution of nodes between these two queues will change in the near future in favor of the parallel queue. Jobs using threads (High-Performance Fortran) or OpenMP (f90 with OpenMP directives) are limited to 2 threads and must be placed in the serial queue.
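For OpenMP jobs, the 2-thread limit is honored by setting the thread count in the run script before the program starts; a minimal sketch in the csh syntax used elsewhere on this page (OMP_NUM_THREADS is the standard OpenMP environment variable; the program name is a placeholder):
setenv OMP_NUM_THREADS 2
./my_omp_prog
Such a script must be submitted to the serial queue, as noted above.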
The following scripts can be cut and pasted into your own text file (a minimal sketch of an MPI script appears after the list):
- Example PBS Script for submitting an MPI Job
- Example PBS Script for submitting a Serial Job
- Example PBS Script for submitting an OMP Job
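The linked example scripts are not reproduced on this page; as a rough guide, a minimal MPI run script might look like the following sketch. It assumes the parallel queue, csh, and the mpich setup described above; the job name, node count, walltime, and program name my_mpi_prog are placeholders:
#!/bin/csh
#PBS -N my_mpi_job
#PBS -q parallel
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
# change to the directory the job was submitted from
cd $PBS_O_WORKDIR
# count the processors PBS allocated (listed one per line in $PBS_NODEFILE)
set NPROCS = `wc -l < $PBS_NODEFILE`
# launch the MPI program on the allocated nodes
mpirun -np $NPROCS -machinefile $PBS_NODEFILE ./my_mpi_prog
Submitting this script requests 4 nodes with 2 processors per node, i.e. 8 MPI processes.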
A PBS job is submitted to the queue using the qsub command, specifying the PBS script file as an argument to qsub. When the job is submitted, the job-id is printed on return of the qsub command. After the job is submitted to the queue, the status of the job can be checked with the qstat command or the xpbs utility (see "Job Monitoring" section below). A job can be deleted from the queue by giving the job-id as an argument to the qdel command or by using xpbs. For a detailed listing of PBS commands and xpbs usage, see the pbs and/or xpbs man/info pages by doing "man pbs" and/or "man xpbs" (you can replace man with info for a more navigable presentation).
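For example, a typical submit/check/delete sequence might look like the following (the script name run_mpi.pbs and the job-id 1234.hydra are placeholders; qsub prints the actual job-id):
qsub run_mpi.pbs
qstat 1234.hydra
qdel 1234.hydra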
There are three printers defined in hydra's print queue:
- math615bw - HP LaserJet 4050 printer located in Cullimore 615 (default)
- math615co - HP Color LaserJet 4500 printer located in Cullimore 615
- math511bw - HP LaserJet 2300 printer located in Cullimore 512
A user may print directly to their office printer, which is locally connected to their office workstation, provided that their system has been set up to accept print requests from remote systems. A user may request that their system be configured to accept remote print jobs by sending mail to sys@oak.njit.edu. Once a user's workstation is properly configured to accept remote print requests, jobs may be printed to the user's locally attached printer by using the lprw script. This script requires that two environment variables be set:
RPRINTER - set this to the remote print queue
RHOST - set this to the remote host on which the print queue defined in RPRINTER is located
Example usage of lprw:
setenv RPRINTER office
setenv RHOST remote
lprw file.ps
The above example prints the file "file.ps" on the "office"
print queue which is located on the machine "remote."
NAS MPI & OMP Benchmarks
The stommel code timings
I also wanted to get an idea of how computation time scales with problem size in an MPI solution of a typical two-dimensional elliptic problem. To that end, the stommel code (used in MPI/OpenMP courses at the San Diego Supercomputing Center) was run for 75000 Jacobi iterations (no stopping criterion) on grids of size n00xn00 with n = 2, 4, 8, 16. For the serial runs, CPU time is reported. For the parallel runs, both CPU time and wall time are reported for 1d and 2d domain decompositions on 4, 8, and 16 processors (PBS requests of the form nodes=N:ppn=2 with N = 2, 4, and 8). We also report timings with 32 processors for the 1600x1600 and 3200x3200 domains.
serial run timings
size        CPU time (hh:mm:ss)
200x200     00:02:33
400x400     00:15:01
800x800     01:09:56
1600x1600   04:41:36
3200x3200   19:19:19
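As a rough consistency check (my own arithmetic, not part of the measured data): each doubling of n quadruples the number of grid points, and with the iteration count fixed at 75000 the serial times above correspond to 153, 901, 4196, 16896, and 69559 seconds, i.e. growth factors of about 5.9, 4.7, 4.0, and 4.1 per doubling. This is close to the expected factor of 4, with the larger ratio at the smallest size presumably due to cache effects (a 200x200 grid of doubles still fits in the 1 MB L2 cache).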
Parallel run timings (CPU and wall time) were recorded for the following cases:
- 200x200, 1d-decomposition
- 200x200, 2d-decomposition
- 400x400, 1d-decomposition
- 400x400, 2d-decomposition
- 800x800, 1d-decomposition
- 800x800, 2d-decomposition
- 1600x1600, 1d-decomposition
- 1600x1600, 2d-decomposition
- 3200x3200, 1d-decomposition
- 3200x3200, 2d-decomposition
The PBS run scripts described above provide for sending e-mail to users, alerting them of job start and job completion. While a job is running, it can be monitored with either qstat or xpbs. Information on how to use these two utilities can be obtained by issuing "info qstat" or "info xpbs" (without the quotation marks) at the hydra command line.
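These notifications correspond to the standard PBS mail directives in the run script; a minimal sketch (the address is a placeholder):
#PBS -m abe
#PBS -M jdoe@njit.edu
Here -m abe requests mail when the job aborts, begins, and ends, and -M sets the destination address.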