Procedure for bringing up a (Solaris) system after a system disk failure
------------------------------------------------------------------------

written:   12/19/06
scheduled: <when system disk/mirror fails>

This document is available as :

    /afs/cad/admin/sys/SOLARIS.DISK.RECOVERY/notes

If a system disk fails and it is not mirrored, or both the system disk and
its mirror have failed, please see :

    Recovering from a complete disk failure

If a system is mirrored and a single disk has failed, please see :

    Recovering from a failed mirror disk

---

o Recovering from a complete disk failure
  =======================================

In the event that the system disk has been trashed, either by a hardware
failure or an administrative error, a complete system restore will need to
be performed to get the system back to a working state.

If the system was mirrored using Solaris Volume Manager (SVM) then it may
be possible to boot from the mirror disk. Please try to boot from the
mirror disk before following this procedure. When a hardware failure
occurs it is likely that the mirror disk will still be usable. If, however,
the cause of the crash was a software or administrator error, then the
problems will most likely have manifested themselves on the mirror disk(s)
as well and the system will need to be restored from backup.

A complete system restore will need to be done from an authorized
Netbackup client. Currently, ucsrestore1 is the only Netbackup client
authorized to perform restores of UCS systems, with the exception of the
AFS fileservers, which have the ability to restore one another, i.e., one
AFS fileserver has the authority or "trust" to restore another AFS
fileserver.

For the purposes of this document the system running the authorized
Netbackup client will be known from here on as the "rescue" system. The
system that is being restored will be known from here on as the "STBR"
(System To Be Restored).

The system that is currently installed as ucsrestore1 is an old Dell
Precision 410 workstation running Linux. The Precision 410 only supports
IDE disks and an external USB disk, which makes this system practical only
for restoring other Linux systems that use the IDE interface for their
system disk.

Due to the complexity and variations of disk partitioning in Linux vs
Solaris, the rescue system needs to be of the same type as the STBR, i.e.,
a system running Solaris is needed to restore a Solaris system and a Linux
rescue system is needed if the STBR is a Linux machine. The version of the
OS does not matter, though, i.e., a Solaris 9 system can easily be used as
a rescue system for a Solaris 10 system, and a Linux system running RHEL 3
can be used to restore a system running RHEL 4. In theory it may be
possible to use a Linux system to restore a Solaris system, but this has
not been tried or tested.

Some SPARC-based systems use IDE interfaces for their system disks while
others use SCSI, Fiber Channel or SAS. The rescue system needs to support
the same disk type as the STBR.
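A quick way to confirm which disks a candidate rescue system actually has,
and what interface they use, is to list them from the running system. This
is generic Solaris usage rather than part of the original procedure; the
device path printed for each disk (e.g. an ide@, scsi@ or fp@/ssd@
component) indicates the interface type :

    # echo | format

(Piping echo into format lists the available disks and exits without
entering the interactive menu.)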
Systems that may be used to restore IDE based systems include, but are not
limited to :

    Sun Ultra 5/10
    Sun Blade 100
    Sun Fire V100 (but not the V120, as it uses SCSI)

Systems that may be used to restore SCSI based systems include, but are
not limited to :

    Sun Ultra 60
    Sun Ultra 80

Systems that may be used to restore Fiber Channel based systems include,
but are not limited to :

    Sun Blade 1000
    Sun Fire 280R

If the failed system uses neither IDE, SCSI nor Fiber Channel disks, but
SAS, which is found on the new Sun Fire T2000 systems, then a rescue
system that has a SAS controller will be needed.

Systems that may be used to restore systems with SAS disks include :

    Sun Fire T2000

Once the rescue system has been identified, it will need to be renamed to
ucsrestore1 and brought up with the following network parameters :

    IP:   128.235.209.136
    Mask: 255.255.255.128
    GW:   128.235.209.129

========================================================================
If an AFS fileserver is being used to restore another AFS fileserver,
e.g., rodan is being used to restore varan, then it DOES NOT NEED to be
renamed and SHOULDN'T BE.
========================================================================

Use a new disk (if the existing system disk is destroyed) or the existing
system disk if the failure was determined to be caused by software or
human error. The disk should be installed into the rescue system in an
available disk slot.

====================== note ======================
For the purposes of this document, the following
device will be used :

    c0t2d0

The above device name is used ONLY as an example.
The device that corresponds to the restore disk
in the rescue system should be used.
====================== note ======================

- Partition disk

  The disk can easily be partitioned by importing the VTOC that was saved
  from the original disk. Currently, VTOC information for all fileservers
  (rodan, varan, kumonga and megalon) is saved in :

      /afs/cad/afs.server.configs/<fileserver>/disk0.VTOC

  To import the previously saved VTOC use the following command :

      cat /afs/cad/afs.server.configs/<fileserver>/disk0.VTOC | \
          fmthard -s - /dev/rdsk/c0t2d0s0

  If a previously saved VTOC is not available, the format utility will
  need to be used to "slice" up the disk accordingly. It may be possible
  to see what the partition layout was by looking at :

      viridian:/jumpstart/profiles/<STBR>.njit.edu

- Create file systems on each of the newly created partitions

      # bash
      # for p in 0 3 4 5 6 7; do echo y | newfs /dev/rdsk/c0t2d0s${p}; done

- Create mount points and mount partitions

  The mount points below are used ONLY as an example; the actual mount
  points from the STBR should be used. This information can be ascertained
  by restoring STBR:/etc/vfstab.

      # mkdir /mnt/d0
      # mount /dev/dsk/c0t2d0s0 /mnt/d0
      # mkdir /mnt/d0/usr
      # mount /dev/dsk/c0t2d0s3 /mnt/d0/usr
      # mkdir /mnt/d0/usr/vice
      # mount /dev/dsk/c0t2d0s4 /mnt/d0/usr/vice
      # mkdir /mnt/d0/var
      # mount /dev/dsk/c0t2d0s5 /mnt/d0/var
      # mkdir /mnt/d0/tmp
      # mount /dev/dsk/c0t2d0s6 /mnt/d0/tmp
      # mkdir /mnt/d0/export
      # mount /dev/dsk/c0t2d0s7 /mnt/d0/export

  Create the following directories on the restore disk (they are not
  restored, but will be needed by the system once it is brought back
  online) :

      # mkdir -p /mnt/d0/afs
      # mkdir -p /mnt/d0/usr/vice/cache
      # mkdir -p /mnt/d0/proc
      # mkdir -p /mnt/d0/dev/fs

  If this is a fileserver, vice partitions will also need to be created.

- Install Netbackup client and prepare restore scripts

  The /afs/cad/admin/NetBackup/scripts/install.nbcl.ksh script can be used
  to install the Netbackup client on the rescue system.
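Before kicking off the restore it is worth a quick sanity check that the
slices and filesystems on the restore disk look as expected. A minimal
check, assuming the example device c0t2d0 and the mount points shown
above :

    # prtvtoc /dev/rdsk/c0t2d0s0
    # df -k | grep /mnt/d0

The prtvtoc output can be compared against the saved disk0.VTOC file (or
the jumpstart profile), and df should show every slice mounted at its
intended place under /mnt/d0 before the Netbackup restore is started.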
- Create a working area to hold the Netbackup scripts used for the restore.

  Personally, I prefer /export/restore/<STBR>

- Next, create the bprestore.fileplan and bprestore.rename files

  These files are used to tell Netbackup what files and directories to
  restore and where to restore them to. Since this is a full system
  restore, specify /

  bprestore.fileplan should contain only one line :

      /

  The bprestore.rename file should contain the following :

      change / to /mnt/d0

- Identify the date range for the backup image to be used for the restore.

      /usr/openv/netbackup/bin/bpclimagelist -client <client> -s 1/1/1976

  where <client> is the fully qualified host name of the STBR.

  The above produces output such as the following :

      12/17/2006 20:11  12/31/2006     627    2272  N  Incr Backup  NJIT-UCS-UNIX-PROD-AFS-FILESERVER
      12/16/2006 20:06  12/30/2006     631    3264  N  Incr Backup  NJIT-UCS-UNIX-PROD-AFS-FILESERVER
      12/15/2006 20:00  12/29/2006     632    3200  N  Incr Backup  NJIT-UCS-UNIX-PROD-AFS-FILESERVER
      12/14/2006 20:05  01/14/2007  157212 6021632  N  Full Backup  NJIT-UCS-UNIX-PROD-AFS-FILESERVER

  The above output is for server kumonga. A full backup was taken on 12/14
  and an incremental each night afterward.

- Prepare the doit.sh script, which will start bprestore

  A template of the following script can be found in :

      /afs/cad/admin/Netbackup/scripts/doit.sh

  #---------------------------------------------------------------------#
  # #!/bin/bash
  #
  # set -x
  # cd `dirname $0`
  #
  # HERE=`pwd`
  # FROMCLIENT="kumonga.njit.edu"              # Qualified name of STBR
  # TOCLIENT="ucsrestore1.njit.edu"            # Qualified name of rescue system
  # OTHEROPTS=" -s 12/14/2006 -e 12/17/2006"   # DATE range for restores
  # export PATH=$PATH:/usr/openv/netbackup/bin
  # echo "HERE=$HERE"
  #
  # # USAGE: bprestore [-A | -B] [-K] [-l | -H | -y] [-r] [-T]
  # #        [-L progress_log] [-R rename_file] [-C client]
  # #        [-D client] [-S master_server] [-t class_type]
  # #        [-c class] [-k "keyword phrase"]
  # #        [-s mm/dd/yyyy [hh:mm:ss]] [-e mm/dd/yyyy [hh:mm:ss]]
  # #        [-w [hh:mm:ss]] -f listfile | filenames
  #
  # bprestore $OTHEROPTS -H -R $HERE/bprestore.rename \
  #     -L $HERE/bprestore.`/afs/cad/solaris/ucs/bin/datestamp -Y`.kumonga.log \
  #     -C "$FROMCLIENT" -D "$TOCLIENT" \
  #     -f bprestore.fileplan
  #---------------------------------------------------------------------#

  The three most important variables in the above script are FROMCLIENT,
  TOCLIENT and OTHEROPTS.

  FROMCLIENT should contain the fully qualified host name of the STBR.
  TOCLIENT should contain the machine that the data is being restored to;
  in most cases this will be ucsrestore1.njit.edu. The OTHEROPTS variable
  contains -s <DATE> -e <DATE>, where -s specifies the start date for the
  backups and -e specifies the end date. If the most recent backup taken
  was a full, then the date of that last full would be specified for both
  the start and end dates.

- Launch doit.sh to begin the Netbackup restore process.

  A notification of the restore should be sent to prodctl@adm.njit.edu
  with a cc to hoskins@njit.edu to alert the operators that a restore
  request has been queued. This notification will inform them that a tape
  mount request will soon be pending.

  To watch the progress of the restore :

      tail -f bprestore*.log

- After a successful restore, install the restore disk back into the STBR,
  and install the boot block.

  Once the restore has completed successfully, as indicated in the restore
  log, shut down the rescue system and remove the restore disk. At this
  point the restore disk should be installed back into the STBR as the
  primary disk.
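  The write-up above does not spell out how to take the restore disk out
  of service on the rescue system cleanly. A minimal sketch, assuming the
  example mount points used earlier (unmount the deepest filesystems
  first, then halt the rescue system before pulling the disk) :

      # cd /
      # umount /mnt/d0/export
      # umount /mnt/d0/tmp
      # umount /mnt/d0/var
      # umount /mnt/d0/usr/vice
      # umount /mnt/d0/usr
      # umount /mnt/d0
      # halt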
If the restore disk was the original system disk used in the STBR then it
will not be necessary to install the boot block.

- Removing Solaris Volume Manager (SVM) configuration

  If the STBR had its disks mirrored using SVM then this configuration
  needs to be undone, since the mirrors no longer exist on the newly
  restored disk. If the system was not mirrored, proceed to
  "Installing a boot block".

  Comment out the following line in /etc/system :

      rootdev:/pseudo/md@0:0,10,blk

  Note that the commenting style used in /etc/system is C-style, where a
  "*" is used to indicate a comment line. Do not use a shell style comment
  "#", as this will produce undesired results.

  Modify /etc/vfstab to remove all references to the metadevices :
  all /dev/md/dsk and /dev/md/rdsk entries should become the corresponding
  /dev/dsk and /dev/rdsk devices.

- Installing a boot block

  If a new disk was used, install the boot block; otherwise proceed to
  "Booting from the restored disk".

  Boot the system using an install CD that corresponds to the version of
  the operating system that is installed on the STBR, i.e., if the STBR
  runs Solaris 10 then use a Solaris 10 install CD. Installation CDs for
  the versions of Solaris that we run are located on the bookshelf in
  ITC 5301 (key located in CAB_G of 2200).

  Boot from CD by issuing the following at the OK prompt :

      boot cdrom -s

  After the system comes up, issue the following to install the boot
  block :

      installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
          /dev/rdsk/cNtNdNs0
                    ^^^^^^^^
      This should be whatever the restore disk shows up as on the STBR
      (usually c0t0d0s0 for the primary disk)

- Booting from the restored disk

  At this point try to boot from the newly restored disk. The system
  should just come up. If, however, you see messages similar to the
  following :

      "/dev/dsk/cNtNdNsN no such device ..."
      "/sbin/rcS: no such device ... "

  then this indicates that /dev and /devices need to be manually rebuilt,
  as well as /etc/path_to_inst. The above device paths can normally be
  rebuilt using boot -r, but doing so requires that /usr be accessible.
  The fix is to boot from CD again and manually copy the /dev and /devices
  directories, including /etc/path_to_inst, over to the system disk. To do
  so, run the following :

      ok boot cdrom -s

      # mkdir /tmp/a
      # mount /dev/dsk/cNtNdNs0 /tmp/a
      # cd /tmp/dev
      # tar cpvf - . | (cd /tmp/a/dev && tar xpf - )
      # cd /tmp/devices
      # tar cpvf - . | (cd /tmp/a/devices && tar xpf - )

      backup the existing path_to_inst file

      # cp /tmp/a/etc/path_to_inst /tmp/a/etc/path_to_inst.org
      # cd /tmp/root/etc
      # cp path_to_inst /tmp/a/etc/path_to_inst

      halt and reboot

      # halt
      ok boot -r

- Final cleanup

  At this point, if the system comes up, things should be fine. However,
  if the system does not boot correctly, go back over the above steps to
  make sure that you have not missed one.

  If the system was originally mirrored, you will need to redo the mirror.
  You may also find some old metadevices hanging around, especially ones
  associated with the secondary mirror disk, since that disk still has its
  meta databases. It is a good idea to remove all metadb replicas :

      metadb -d <device>

  After removal of the old metadbs the system can be mirrored by following
  the normal procedure used for mirroring disks using SVM.

o Recovering from a failed mirror disk
  ======================================

If a system which is mirrored has lost one of the disks that is part of
the mirror, then the following can be followed to bring that mirror disk
back online. The documentation below will use c0t0d0 as the first disk
and c0t1d0 as the second disk.
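Before digging into the metastat output, it can help to confirm at the
hardware level which physical disk is reporting errors, and to get a
compact view of the current SVM state. These are standard Solaris/SVM
commands, offered here as a quick check rather than part of the original
write-up :

    # iostat -En
    # metadb -i
    # metastat -p

iostat -En lists per-disk soft/hard/transport error counts, metadb -i
shows the state database replicas along with a legend for the status
flags, and metastat -p prints a compact md.tab-style summary of all
metadevices.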
First identify the defective disk by running metastat. If there are
failures, metastat will show the metadevice in a "Maintenance" state.

    # metastat d10
    d10: Mirror
        Submirror 0: d11
          State: Needs maintenance
        Submirror 1: d21
          State: Okay
        Pass: 1
        Read option: roundrobin (default)
        Write option: parallel (default)
        Size: 13423200 blocks

    d11: Submirror of d10
        State: Needs maintenance
        Invoke: metareplace d10 c0t0d0s0 <new device>
        Size: 13423200 blocks
        Stripe 0:
            Device      Start Block  Dbase  State        Hot Spare
            c0t0d0s0              0  No     Maintenance

    d21: Submirror of d10
        State: Okay
        Size: 13423200 blocks
        Stripe 0:
            Device      Start Block  Dbase  State        Hot Spare
            c0t1d0s0              0  No     Okay

The above output from metastat shows that submirror d11 is in an errored
state and needs replacing. The disk associated with this device is
c0t0d0s0.

Before replacing the problematic disk, remove the state database replicas
from the old disk. You will most likely have to force the removal if the
disk is not readable.

- Remove the metadbs from the failed disk

  Running metadb without any arguments will show the current metadbs.

      # metadb -f -d <device>

- Install and partition new disk

  After physically replacing the failed disk, copy the VTOC from the
  working disk over to the newly replaced disk :

      prtvtoc /dev/rdsk/c0t1d0s0 | grep -v '2 *5' | \
          fmthard -s - /dev/rdsk/c0t0d0s0

- Creation of new metadbs

      # metadb -a -c <N> <device>

  where <N> is the number of replicas. The standard in use on all UCS
  systems is 3 copies.

  If the above command fails with a message similar to the following :

      metadb: <hostname>: has bad master block on device

  then the existing submirrors associated with the bad disk need to be
  removed and recreated.

  Remove the submirror :

      metaclear d11

  Recreate the submirror :

      metainit -f d11 1 1 c0t0d0s0

- Replace the failed components

  ## begin IF BLOCK ##

  IF adding back the metadbs did not produce an error, then run :

      # metareplace -e d10 c0t0d0s0

  Although metastat says :

      Invoke: metareplace d10 c0t0d0s0 <new device>

  we do not specify a new device but rather use -e, since we are not
  replacing the failed component with a different device; we are reusing
  the same device name, now on a new disk. See the metareplace man page
  for additional information.

  ELSE IF adding the metadbs failed, then run :

      # metattach d10 d11

  ## end IF BLOCK ##

  Repeat the above steps for each failed component.

- Wait for metadevices to sync.

  Once the metadevices have been replaced they will take some time to
  sync. This can take several hours, especially on systems with large
  partitions. It is a good idea to periodically check the status of the
  sync with the metastat command. I usually run something like :

      # while true; do metastat | grep % ; sleep 60; done
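  Once metastat no longer shows a resync in progress, a short final check
  is worthwhile :

      # metastat | grep -i maintenance
      # metadb

  The grep should produce no output once everything is healthy, and metadb
  should show replicas on both disks with no error flags. Although not
  spelled out above, if the replaced disk is half of the root mirror it is
  also standard practice to install a boot block on it (same installboot
  command used earlier in this document) so the system can boot from
  either disk; assuming c0t0d0 is the replaced disk :

      # installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
            /dev/rdsk/c0t0d0s0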