University Computing Systems


HPC linux cluster - cappl.njit.edu


The goal of this page is to provide general information about the
cappl Linux cluster to its users.


General Information about cappl
AFS on cappl
Software Available on cappl
User Logins
Printing from cappl
Getting Help



General Information

The cappl cluster consists of 16 Dell PowerEdge 1750s, which make up the slave nodes, and a single Dell PowerEdge 1750 master node. All of the nodes, including the master, have 1GB of RAM and two Intel(R) Xeon(TM) processors operating at 2.80GHz. The operating system installed on the master node is Red Hat Enterprise Linux Version 4.0 (Nahant Update 3), running a Linux 2.6 kernel. The slave nodes are connected via Gigabit Ethernet (GigE) to a cisco catalyst XXX switch. There are several Message Passing Interface (MPI) implementations available for use on cappl; see the Software section below for additional details.

The cluster management software in use is Warewulf, which provides a framework for managing clusters. A single image is created with only the essential components needed for operation. This image is stored on the master node, and each slave node pulls a copy of it at boot time, making each slave node essentially diskless. After each node boots, the OS image, only about 45MB in size, is held on a RAM disk. The disks on the master node are set up in a RAID 1 (mirrored) configuration, so if one of the disks should fail, the system will continue to be operational.
Master node disk layout:

FileSystem   Size     Purpose
/            12.0GB   Root file system
/boot        2.0GB    OS boot loader files (GRUB)
/home        37.0GB   User home directories
/opt         11.0GB   Locally installed software
/usr/vice    3.4GB    AFS client files and disk cache
Disk layout on each slave node:

FileSystem        Size     Purpose
/scratch          76.0GB   Scratch space available for temporary storage
/usr/vice/cache   3.5GB    The AFS disk cache used by afsd
[ swap ]          2.0GB    Local swap space

The following NFS file systems are mounted from the master: /opt and /home.

AFS on cappl

As previously mentioned, the cappl cluster is an AFS client. Version 1.4.0 of the OpenAFS software is running on the master and all of the slave nodes. A local disk cache is used to cache AFS files on local disk, which speeds up access to frequently accessed files.

When you log in to cappl.njit.edu using ssh, you will obtain your AFS token, provided that ssh keys have not been set up. See http://web.njit.edu/all_topics/SSH for additional information on SSH keys. To check the status of your AFS token, run /usr/bin/tokens; the output should be similar to:

Tokens held by the Cache Manager:
User's (AFS ID 22966) tokens for afs@cad.njit.edu [Expires Apr 27 14:47]
--End of list--

In order to read from and write to your AFS home directory, you must hold an AFS token. At the present time there is no way to securely obtain one's AFS Kerberos token on each of the slave nodes when running MPI jobs. Due to this limitation, a local home directory on cappl (/home) must be used instead of your AFS home directory. However, one will still be able to use software that is installed in AFS without first obtaining a token. This limitation only affects the use of one's AFS home directory during MPI jobs.
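Because your AFS home directory is unavailable to MPI jobs, a common pattern is to stage input files into your local cappl home directory before submitting work. A minimal sketch, using the standard OpenAFS tokens and klog utilities; the AFS path and file names below are hypothetical examples, not actual cappl paths:

```shell
# Verify that you currently hold an AFS token
/usr/bin/tokens

# If no token is listed in the output, obtain one manually
klog

# Stage input data from AFS (hypothetical path and file name)
# into your local cappl home directory before submitting MPI jobs
cp /afs/cad/somewhere/input.dat ~/input.dat
```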

Software

Software that is specific to cappl includes the MPI libraries and the Sun Grid Engine software. The cappl cluster, including all of the slave nodes, consists of AFS clients, so the AFS file space is available on all nodes. Since cappl is an AFS client, all software available on other Linux AFS clients is also available on cappl.

There are a number of MPI implementations available on cappl. The available libraries include:

MPICH
MPICH2
LAM
OpenMPI

The above libraries are installed in /opt/mpi on cappl. The list of available compilers on cappl currently includes gcc version 3 (3.4.5). At this time no commercially available compilers, such as Portland or Pathscale, are available on cappl.

It is important to mention that the above MPI implementations are not loaded by default. The module command is a utility that assists the user in setting up the needed environment, i.e., the execution and library paths. To see the available modules, issue the following command:

module avail

--------------------------- /usr/share/modules/modulefiles --------------------
dot           mpi/lam-gnu4       mpi/mpich-pgi     null      sge
module-cvs    mpi/lam-pgi        mpi/mpich2-gnu3   pathsc    use.own
module-info   mpi/mpich-gnu3     mpi/mpich2-gnu4   pgi52
modules       mpi/mpich-gnu4     mpi/ompi-gnu3     pgi60
mpi/lam-gnu3  mpi/mpich-pathsc   mpi/ompi-gnu4     pgi61

The above shows all modules which are available to be loaded. The sge module is used for Sun Grid Engine. In order to run jobs on cappl, all users must have the sge module loaded. By default, users who source /afs/cad/solaris/local/etc/std-cshrc in their ~/.cshrc files will have the sge module loaded. A user may see which modules are loaded by executing the following command:

module list

Currently Loaded Modulefiles:
  1) sge

The above output shows that the sge module is currently loaded.
If the sge module is not loaded, e.g. because the user does not source /afs/cad/solaris/local/etc/std-cshrc in their ~/.cshrc file, it can be loaded with the following command:

module load sge

Additional information on the module command can be found on the module man page.
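Putting the above together, a typical session might load one of the MPI modules from the listing, compile a program, and submit it through Sun Grid Engine. This is a sketch only: the module name is taken from the "module avail" output above, but the program name, script name, and the SGE parallel environment name are hypothetical and may differ on cappl:

```shell
# Load one of the MPI modules shown by "module avail"
module load mpi/mpich-gnu4

# Compile a hypothetical MPI program with the MPI wrapper compiler
mpicc -o hello hello.c

# A minimal SGE submission script (hello.sh) might look like:
#   #!/bin/csh
#   #$ -cwd               run the job from the current working directory
#   #$ -pe mpich 4        request 4 slots (parallel environment name is site-specific)
#   mpirun -np 4 ./hello
#
# Submit the script to Sun Grid Engine:
qsub hello.sh
```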

User Logins

Your login information, i.e., username and password, is the same on cappl as on all other AFS clients.

Printing

There are currently no printers configured on cappl.

Support

The cappl cluster is managed by University Computing Systems, IST. Any questions regarding usage or operation of the cluster should be directed to sys@oak.njit.edu. Before contacting UCS, users should consult the Sun Grid Engine usage page to see whether the answer to a particular question can be found there.