University Computing Systems


System Documentation - Operator-Initiated Shutdown of Servers in ITC 5302/4320


General Information

Due to continued air conditioning problems in both ITC machine rooms, the need has arisen for operator-initiated shutdowns. In the event that an AC unit fails and the ambient temperature climbs, an IST Operator would begin an orderly shutdown of systems via SCRAM.

SCRAM is a captive menu program which allows the operator to shut systems down in an orderly fashion. SCRAM invokes a backend script via restricted SSH keys to initiate the shutdown, so there is no need for an operator to have direct access to a host system. The SCRAM backend script is housed on each host in /root/.ssh (for Linux) and /.ssh (for Solaris).

Currently, there are several groups to which systems can belong. The group a system is in determines the order in which it is powered down. At present, systems are to be powered down in the following order:

1. HPC: all High Performance Compute clusters (hydra, cappl, kong, etc.).
2. CORE: systems which provide core services, such as AFS file and DB servers.

Additional groups are expected to be added; these would be powered down after HPC but before CORE. These additional groups and the systems which will belong to them have yet to be defined.
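The group ordering described above can be illustrated with a short sketch. This is not the SCRAM implementation, only a minimal model of the ordering rule. The group names HPC and CORE and the hosts hydra and kong come from this document; the function name, the host-to-group mapping, and the hosts afs1 and db1 are hypothetical.

```python
# Illustrative sketch only; not the actual SCRAM code.
# Groups are listed in power-down order. Future groups would be
# inserted between "HPC" and "CORE".
SHUTDOWN_ORDER = ["HPC", "CORE"]


def shutdown_sequence(host_groups):
    """Return host names ordered so that earlier groups come first.

    host_groups maps a host name to its group name (hypothetical format).
    Hosts within the same group are ordered alphabetically.
    """
    rank = {group: i for i, group in enumerate(SHUTDOWN_ORDER)}
    unknown = sorted({g for g in host_groups.values() if g not in rank})
    if unknown:
        raise ValueError(f"unknown shutdown group(s): {unknown}")
    return sorted(host_groups, key=lambda h: (rank[host_groups[h]], h))


if __name__ == "__main__":
    # afs1 and db1 are made-up host names used for illustration.
    hosts = {"afs1": "CORE", "hydra": "HPC", "kong": "HPC", "db1": "CORE"}
    print(shutdown_sequence(hosts))  # ['hydra', 'kong', 'afs1', 'db1']
```

Keeping the order in a single list means that a new group can be added by inserting one entry between HPC and CORE, without changing the sorting logic.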

Shutdown of HPC Clusters

All systems with the exception of HPC clusters can be shut down via their standard shutdown commands, e.g., '/usr/sbin/shutdown' on Solaris and '/sbin/poweroff' on Linux. HPC clusters should be shut down via '/admin/cluster.shutdown.sh', which is in place on all HPC clusters. This script initiates a shutdown of the cluster's compute nodes and then shuts down the master, which ensures the cluster is brought down cleanly. The cluster shutdown script is called by the SCRAM backend script when a SCRAM shutdown is invoked by the operator.
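The choice of shutdown command described above can be summarized in a small sketch. It is a hedged illustration of the selection rule, not the SCRAM backend script itself. The three paths are taken from this document; the function name and the "group" and "platform" parameters are hypothetical, and command-line arguments are omitted because this document does not specify them.

```python
# Illustrative sketch only; not the actual SCRAM backend script.
def shutdown_command(group, platform):
    """Return the shutdown command (as an argument list) for a host.

    group    -- shutdown group of the host, e.g. "HPC" or "CORE"
    platform -- operating system of the host, "linux" or "solaris"
    Both parameters are assumptions made for this example.
    """
    if group == "HPC":
        # HPC clusters use the cluster script so that compute nodes
        # go down before the master.
        return ["/admin/cluster.shutdown.sh"]
    platform = platform.lower()
    if platform == "solaris":
        return ["/usr/sbin/shutdown"]
    if platform == "linux":
        return ["/sbin/poweroff"]
    raise ValueError(f"unsupported platform: {platform!r}")


if __name__ == "__main__":
    print(shutdown_command("HPC", "linux"))
    print(shutdown_command("CORE", "solaris"))
```

The group check comes first because an HPC master should never be powered off directly, regardless of its operating system.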

Last updated: 5/23/2007