Operator Initiated Shutdown

University Computing Systems

System Documentation - Operator Initiated Shutdown of servers in ITC 5302/4320

General Information

Due to continued air conditioning problems in both ITC machine rooms, the need
has arisen for operator-initiated shutdowns. In the event that an AC unit fails
and the ambient temperature climbs, an IST Operator begins an orderly shutdown
of systems via SCRAM.

SCRAM is a captive menu program which allows the operator to initiate a shutdown
of systems in an orderly fashion. SCRAM invokes a backend script via restricted
SSH keys to initiate the shutdown, so there is no need for an operator to have
direct access to a host system. The SCRAM backend script is housed on each host
in /root/.ssh (for Linux) and /.ssh (for Solaris).
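
The restricted-key arrangement described above can be illustrated with a
minimal sketch. It only builds the SSH command line an operator-side program
might run; it is not the actual SCRAM code. The key path and the login user
are assumptions (the source names neither), and the sketch assumes OpenSSH,
where a restricted key is typically tied to a single command on the server
side, so the client passes no remote command at all.

```python
from typing import List

# Hypothetical values: the source gives neither the key file name nor the login user.
SCRAM_KEY = "/path/to/scram_key"  # assumed private key on the operator host
REMOTE_USER = "root"              # assumed, since the backend lives in root's .ssh directory


def build_ssh_argv(host: str, key_path: str = SCRAM_KEY,
                   user: str = REMOTE_USER) -> List[str]:
    """Return the argv for an SSH call that triggers the backend on `host`.

    No remote command is appended: with a restricted key, the server side
    decides what runs, so the operator never gets a shell on the host.
    """
    return [
        "ssh",
        "-i", key_path,              # authenticate with the dedicated key
        "-o", "BatchMode=yes",       # never prompt; fail instead of hanging
        "-o", "IdentitiesOnly=yes",  # do not offer any other keys
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    print(" ".join(build_ssh_argv("hydra")))
```

Returning a list rather than a single string means the command can be handed
to subprocess.run() without shell quoting concerns.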

Currently, each system belongs to one of several groups. The group a system is
in determines when it is powered down. At present, groups are powered down in
the following order:

HPC: all High Performance Compute clusters (hydra, cappl, kong, etc.).

CORE: systems which provide core services, such as AFS file and DB servers.

Additional groups are expected to be added which would be powered down after HPC, but
before CORE. These additional groups and the systems which will belong to them have
yet to be defined.
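
The group ordering can be sketched as a small lookup, shown here only to make
the rule concrete. The group names HPC and CORE and the hosts hydra, cappl and
kong come from this document; the two CORE host names are placeholders, and
the function itself is hypothetical rather than part of SCRAM.

```python
from typing import Dict, List

# Groups in the order they are powered down (first entry goes down first).
GROUP_ORDER = ["HPC", "CORE"]

# Host -> group. The CORE host names are placeholders, not real systems.
INVENTORY: Dict[str, str] = {
    "hydra": "HPC",
    "cappl": "HPC",
    "kong": "HPC",
    "afs-file-example": "CORE",
    "afs-db-example": "CORE",
}


def shutdown_order(inventory: Dict[str, str],
                   group_order: List[str] = GROUP_ORDER) -> List[str]:
    """Return hosts sorted by group position, then by name within a group."""
    rank = {group: i for i, group in enumerate(group_order)}
    unknown = sorted(h for h, g in inventory.items() if g not in rank)
    if unknown:
        # Refuse to guess where an unclassified host belongs.
        raise ValueError(f"hosts with unknown group: {unknown}")
    return sorted(inventory, key=lambda h: (rank[inventory[h]], h))


if __name__ == "__main__":
    for host in shutdown_order(INVENTORY):
        print(host)
```

With this layout, a future group that falls between HPC and CORE would only
need a new entry in the middle of the group order list.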

Shutdown of HPC Clusters

All systems, with the exception of HPC clusters, can be shut down via their
standard shutdown commands, e.g., '/usr/sbin/shutdown' on Solaris
and '/sbin/poweroff' on Linux.

HPC clusters should be shut down via '/admin/cluster.shutdown.sh',
which is in place on all HPC clusters. This script first shuts down the
cluster's compute nodes and then shuts down the master, which ensures the
cluster is brought down cleanly. The SCRAM backend script calls the cluster
shutdown script when an operator invokes a SCRAM shutdown.
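
As an illustration of the choice the backend has to make, the following
sketch maps a system to its shutdown command. The three paths are the ones
named in this section; the function, its parameters, and the platform labels
are hypothetical. Command-line arguments are deliberately left out because
the source does not specify them.

```python
# Paths as given in this document; any arguments they need are not specified.
CLUSTER_SHUTDOWN = "/admin/cluster.shutdown.sh"
STANDARD_SHUTDOWN = {
    "solaris": "/usr/sbin/shutdown",
    "linux": "/sbin/poweroff",
}


def shutdown_command(platform: str, is_hpc_cluster: bool) -> str:
    """Pick the shutdown command for one system.

    HPC cluster masters always use the cluster script, so that the compute
    nodes go down before the master does.
    """
    if is_hpc_cluster:
        return CLUSTER_SHUTDOWN
    try:
        return STANDARD_SHUTDOWN[platform.lower()]
    except KeyError:
        raise ValueError(f"no shutdown command known for platform {platform!r}")


if __name__ == "__main__":
    print(shutdown_command("Linux", is_hpc_cluster=True))
    print(shutdown_command("Solaris", is_hpc_cluster=False))
```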

Last updated: 5/23/2007