hydra.njit.edu

Hydra is a 67-node Beowulf cluster built by Microway, running RHEL 3.0 (kernel 2.6). Each node contains 2 AMD Opteron 250 CPUs with 1 MB of L2 cache each, and 4 GB of RAM. The nodes are interconnected with Myrinet. The default Message-Passing Interface implementation is mpich compiled with the Portland Group compilers. Only batch processing is supported; currently there is a 20-node queue for serial jobs and a 46-node queue for parallel jobs. Both queues are managed via OpenPBS, and the job scheduler is FIFO (first-in, first-out). On the master node, execution time for user programs is limited to 1 hour; even so, please avoid running jobs on the master node.

The cluster was funded in part by a grant from the National Science Foundation. The projects that have priority on this cluster are described here. All DMS members, as well as members of the greater NJIT research community, are eligible to use the cluster after a short proposal submission/evaluation cycle described here. Please use the following links for further information (use "Back" to return here):
Login Info :: Run Scripts :: Compilers & Libraries :: Documents :: Other Software
How to Print :: Benchmarks :: Job Monitoring :: Workshop :: Links & Code Examples
Use the Problem Reporting facility provided in these pages to report any problems you may encounter. Click here to get current information on cluster load, memory usage, etc.

The information on this page is OBSOLETE. I will be updating this page as more information on the reconfigured Hydra is made available by the SysAdmins.


Login Information

Logins to hydra are currently managed locally due to problems with building OpenAFS on Linux kernel 2.6. The goal is to eventually get OpenAFS running on hydra and convert all logins to AFS/Kerberos authentication. There is currently no timeframe set for doing this.

Direct connections to hydra are made through ssh (use the -X switch to allow X11 applications to display on your local desktop) and must be made from within the NJIT network. To connect to hydra from outside the NJIT network, either use a VPN client or use one of the public AFS systems as a jump point into hydra. Connections are allowed only to the master node, not to the slave nodes.
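For example, from a machine on the NJIT network (ucid below is a placeholder for your own NJIT login name):

ssh -X ucid@hydra.njit.edu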


Back to Top of Page

Run Scripts for Serial, OMP, & MPI Jobs

OpenPBS (Open Portable Batch System) is used to submit parallel (MPI), serial, and multi-threaded jobs on hydra. When a job is submitted via PBS, it is placed in the queue by the PBS scheduler, and the PBS server executes it when the requested resources become available. Prior to execution, PBS runs /var/spool/pbs/mom_priv/prologue, which enables user access on each node the job will run on. After execution has completed, PBS runs /var/spool/pbs/mom_priv/epilogue, which disables user access on each node the job ran on. The number of nodes for MPI jobs is specified in the PBS run script used to submit the job to the parallel queue. The above prologue/epilogue scripts are run automatically on every node allocated to a user's job.

The following queues are set up on hydra: serial (20 nodes) and parallel (46 nodes). The distribution of nodes between these two queues will change in the near future in favor of the parallel queue. Jobs using threads (High-Performance Fortran) or OpenMP (f90 with OpenMP directives) are limited to 2 threads and must be placed in the serial queue.

The following scripts can be cut and pasted into your own text file (a sketch of what they contain appears after this list):
Example PBS Script for submitting an MPI Job
Example PBS Script for submitting a Serial Job
Example PBS Script for submitting an OMP Job
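For reference, here is a minimal sketch of such a run script. The job name, resource requests, and executable names are placeholders, and the exact mpirun invocation may differ for hydra's Myrinet build of mpich; the linked examples above are authoritative. An MPI script looks roughly like:

#!/bin/csh
# request the parallel queue, 4 nodes with 2 processors each,
# 1 hour of wall-clock time, and e-mail on abort/begin/end
#PBS -N mpi_example
#PBS -q parallel
#PBS -l nodes=4:ppn=2
#PBS -l walltime=01:00:00
#PBS -m abe
# run from the directory the job was submitted from
cd $PBS_O_WORKDIR
# -np 8 = 4 nodes x 2 processors per node
mpirun -machinefile $PBS_NODEFILE -np 8 ./mpi_prog

An OMP job goes to the serial queue instead and, per the 2-thread limit noted above, sets OMP_NUM_THREADS to 2:

#!/bin/csh
#PBS -N omp_example
#PBS -q serial
#PBS -l nodes=1:ppn=2
setenv OMP_NUM_THREADS 2
cd $PBS_O_WORKDIR
./omp_prog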

A PBS job is submitted to the queue using the qsub command, with the PBS script file given as an argument. When the job is submitted, the job-id is printed at the prompt upon return of the qsub command. After the job is submitted to the queue, its status can be checked with the qstat command or the xpbs utility (see the "Job Monitoring" section below). A job can be deleted from the queue by giving the job-id as an argument to the qdel command, or by using xpbs. For a detailed listing of PBS commands and xpbs usage, see the pbs and/or xpbs man/info pages by doing "man pbs" and/or "man xpbs" (you can replace man with info for a more navigable presentation).
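A typical sequence, with an illustrative script name and job-id:

qsub mpi_example.pbs    # prints the job-id, e.g. 1234.hydra
qstat -a                # list all queued and running jobs
qdel 1234               # delete job 1234 from the queue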


Back to Top of Page

Compilers & Libraries


Back to Top of Page

Documents


Back to Top of Page

Other Software


Back to Top of Page

Printing from Hydra

There are three printers defined in hydra's print queue.

- math615bw - HP Laserjet 4050 printer located in Cullimore 615 (default)
- math615co - HP Color Laser 4500 printer located in Cullimore 615
- math511bw - HP Laserjet 2300 printer located in Cullimore 512

A user may print directly to the printer locally connected to their office workstation, provided that their system has been set up to accept print requests from remote systems. A user may request that their system be configured to accept remote print jobs by sending mail to sys@oak.njit.edu. Once a user's workstation is properly configured to accept remote print requests, jobs may be printed to the user's locally attached printer using the lprw script. This script requires that two environment variables be set:
RPRINTER - set this to the remote print queue
RHOST - set this to the remote host on which the print queue defined in RPRINTER is located.
Example usage of lprw:

setenv RPRINTER office
setenv RHOST remote
lprw file.ps

The above example prints the file "file.ps" on the "office" print queue, which is located on the machine "remote."


Back to Top of Page

Results of Some Benchmarks

FFTW serial benchmarks

NAS MPI & OMP Benchmarks

More coming soon.


The stommel code timings

I also wanted to get an idea of how computation time scales with problem size in an MPI solution of a typical two-dimensional elliptic problem. To that end, the stommel code (used in MPI/OpenMP courses at the San Diego Supercomputing Center) was run for 75000 Jacobi iterations (no stopping criterion) on grids of size 200x200, 400x400, 800x800, and 1600x1600. For the serial runs, CPU time is reported. For the parallel runs, both CPU time and wall time are reported for 1d and 2d domain decompositions on 4, 8, and 16 processors (nodes=X:ppn=2, with X=2, 4, and 8); note that the parallel CPU times appear to be totals summed over all processes, so wall time is the figure to compare against the serial CPU time. We also report timings with 32 processors for the 1600x1600 and 3200x3200 domains.

serial run timings

size        CPUt
200x200     00:02:33
400x400     00:15:01
800x800     01:09:56
1600x1600   04:41:36
3200x3200   19:19:19

parallel run timings

200x200, 1d-decomposition

np   CPUt       Wallt
4    00:02:59   00:00:46
8    00:03:20   00:00:27
16   00:04:32   00:00:17

200x200, 2d-decomposition

np   CPUt       Wallt
4    00:03:00   00:00:46
8    00:03:28   00:00:27
16   00:04:32   00:00:18

400x400, 1d-decomposition

np   CPUt       Wallt
4    00:11:29   00:02:57
8    00:12:08   00:01:36
16   00:13:12   00:00:50

400x400, 2d-decomposition

np   CPUt       Wallt
4    00:11:32   00:02:54
8    00:12:00   00:01:33
16   00:13:34   00:00:51

800x800, 1d-decomposition

np   CPUt       Wallt
4    01:06:06   00:16:33
8    00:57:07   00:07:10
16   00:50:01   00:03:08

800x800, 2d-decomposition

np   CPUt       Wallt
4    01:05:13   00:16:21
8    00:56:57   00:07:09
16   00:48:28   00:03:02

1600x1600, 1d-decomposition

np   CPUt       Wallt
4    04:44:05   01:11:08
8    04:36:55   00:34:41
16   04:27:51   00:16:47
32   04:20:25   00:08:11

1600x1600, 2d-decomposition

np   CPUt       Wallt
4    04:55:50   01:14:03
8    04:50:15   00:36:21
16   04:29:29   00:16:58
32   04:43:12   00:08:56

"large" parallel run timings

3200x3200, 1d-decomposition

np   CPUt       Wallt
32   18:25:21   00:34:38

3200x3200, 2d-decomposition

np   CPUt       Wallt
32   20:34:53   00:38:40


Back to Top of Page

Job Monitoring

The PBS run scripts, described above, provide for sending e-mail to users, alerting them to job start and job completion. While a job is running, it can be monitored with either qstat or xpbs. Information on how to use these two utilities can be obtained by issuing "info qstat" or "info xpbs" (without the quotation marks) at the hydra command line.
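For example, running qstat produces output along the following lines (the job name, user, and times here are illustrative):

Job id           Name             User             Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
1234.hydra       mpi_example      ucid             00:01:02 R parallel

The S column gives the job state (Q = queued, R = running, E = exiting).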


Back to Top of Page

Informal Hydra Users Workshop


Back to Top of Page

Useful Links & Code Examples


Back to Top of Page