Procedure for bringing up a (solaris) system after a system disk failure
------------------------------------------------------------------------
written: 12/19/06
scheduled: <when system disk/mirror fails>
This document is available as :

        /afs/cad/admin/sys/SOLARIS.DISK.RECOVERY/notes

If a system disk has failed and is not mirrored, or if both the system
disk and its mirror have failed, please see :

        Recovering from a complete disk failure

If a system is mirrored and a single disk has failed, please see :

        Recovering from a failed mirror disk 


---


 o Recovering from a complete disk failure 
   =======================================

   In the event that the system disk has been trashed either by a hardware
   failure or administrative error, a complete system restore will need 
   to be performed to get the system back to a working state.

   If the system was mirrored using Solaris Volume Manager (SVM) then it 
   may be possible to boot from the mirror disk.  Please try to boot from 
   the mirror disk before following this procedure.  
   
   When a hardware failure occurs it is likely that the mirror disk will 
   still be usable.  If, however, the cause of the crash was a software or 
   administrator error, then the problems will most likely have manifested 
   themselves on the mirror disk(s) as well, and the system will need to be 
   restored from backup.

   A complete system restore will need to be done from an authorized
   Netbackup client.  Currently, ucsrestore1 is the only Netbackup client 
   authorized to perform restores of UCS systems, with the exception of the
   AFS fileservers, which have the ability to restore one another, i.e.,
   one AFS fileserver has the authority or "trust" to restore another AFS
   fileserver.  

   For the purposes of this document, the system running the authorized 
   Netbackup client will be known from here on as the "rescue" system.  
   The system that is being restored will be known from here on as the
   "STBR" or System To Be Restored.

   The system that is currently installed as ucsrestore1 is an old Dell
   Precision 410 workstation running Linux.  The Precision 410 only
   supports IDE disks and an external USB disk, which makes this system
   practical only for restoring other Linux systems which utilize the 
   IDE interface for their system disk. 

   Due to the complexity and variations of disk partitions in Linux vs
   Solaris, the rescue system needs to be of the same type as the 
   STBR, i.e., a system running Solaris is needed to restore a
   Solaris system and a Linux rescue system is needed if the STBR is
   a Linux machine.  The version of the OS does not matter, though; e.g., 
   a Solaris 9 system can easily be used as a rescue system for
   a Solaris 10 system, and a Linux system running RHEL 3 can be used to
   restore a system running RHEL 4.

   In theory it may be possible to use a Linux system to restore a Solaris
   system, but this has not been tried or tested.

   Some sparc based systems use IDE interfaces for their system disks while
   others use SCSI, fiber channel or SAS.  The rescue system needs to 
   support the same disk type as the STBR. 
   
   Systems that may be used to restore IDE-based systems include, but are
   not limited to :

        Ultra 5/10, Sun Blade 100
        SunFire V100s, but not V120s as they use SCSI

   Systems that may be used to restore SCSI-based systems include, but are
   not limited to :

        Sun Ultra 60
        Sun Ultra 80

   Systems that may be used to restore Fiber Channel-based systems include,
   but are not limited to :

        Sun Blade 1000
        Sun Fire 280r
   
   If the failed system uses neither IDE, SCSI, nor Fiber Channel disks, 
   but rather SAS, which is found on the newer SunFire T2000 systems, then 
   a system that has a SAS controller will be needed.  

   Systems that may be used to restore systems with SAS disks include :

        Sun Fire T2000

   Once the rescue system has been identified, it will need to be renamed
   to ucsrestore1 and brought up with the following network parameters :

                IP:      128.235.209.136
                Mask:    255.255.255.128
                GW:      128.235.209.129

   ========================================================================
   If an AFS fileserver is being used to restore another AFS fileserver,
   e.g., rodan is being used to restore varan, then it DOES NOT NEED to be 
   renamed and SHOULDN'T BE.  
   ========================================================================
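   As a sketch, on a Solaris rescue system the rename and re-addressing
   could be done by updating the usual network files and rebooting (the
   interface name hme0 below is only an example; use the actual interface) :

        # echo "ucsrestore1" > /etc/nodename
        # echo "ucsrestore1" > /etc/hostname.hme0
        # echo "128.235.209.129" > /etc/defaultrouter
        (add "128.235.209.136  ucsrestore1 ucsrestore1.njit.edu" to /etc/hosts)
        (add "128.235.209.128  255.255.255.128" to /etc/netmasks)
        # reboot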

   Use a new disk (if the existing system disk is destroyed), or utilize 
   the existing system disk if the failure was determined to be caused by
   software or human error.

   The disk should be installed into the rescue system in an available
   disk slot.
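
   Depending on the hardware, the rescue system may need to rescan its
   devices before the newly installed disk is visible; on Solaris 8 and
   later something like the following should work :

        # devfsadm -c disk
        # format                <- verify that the new disk shows up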


        ====================== note ======================

   For the purposes of this document, the following device will be used:

        c0t2d0

   The above device name is used ONLY as an example. The device that
   corresponds to the restore disk in the rescue system should be used.

        ====================== note ======================

   - Partition disk 

     The disk can be easily partitioned by importing the VTOC that was
     created from the original disk.  Currently, VTOC information for all
     fileservers: rodan, varan, kumonga and megalon is saved in :

                /afs/cad/afs.server.configs/<fileserver>/disk0.VTOC

     To import the previously saved VTOC
     use the following command :

                cat /afs/cad/afs.server.configs/<fileserver>/disk0.VTOC | \
                fmthard -s - /dev/rdsk/c0t2d0s0



     If a previously saved VTOC is not available, the format utility will
     need to be used to "slice" up the disk accordingly.  It may be
     possible to see what the partition layout was, by looking at :

                viridian:/jumpstart/profiles/<STBR>.njit.edu
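
     For reference, the saved VTOC files above are (presumably) captured
     with prtvtoc; a fresh copy can be taken from a healthy disk with
     something like :

                prtvtoc /dev/rdsk/c0t0d0s2 > \
                /afs/cad/afs.server.configs/<fileserver>/disk0.VTOC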
                
   - Create file systems on each of the newly created partitions

     # bash 
     # for p in 0 3 4 5 6 7; do \
                echo y |newfs /dev/rdsk/c0t2d0s${p} ; done

   - Create mount points and mount partitions 

     The below mount points are used ONLY as an example; the actual mount
     points from the STBR should be used.  This information can be ascertained
     by restoring STBR:/etc/vfstab.

     # mkdir /mnt/d0
     # mount /dev/dsk/c0t2d0s0 /mnt/d0
     # mkdir /mnt/d0/usr
     # mount /dev/dsk/c0t2d0s3 /mnt/d0/usr
     # mkdir /mnt/d0/usr/vice
     # mount /dev/dsk/c0t2d0s4 /mnt/d0/usr/vice
     # mkdir /mnt/d0/var
     # mount /dev/dsk/c0t2d0s5 /mnt/d0/var
     # mkdir /mnt/d0/tmp
     # mount /dev/dsk/c0t2d0s6 /mnt/d0/tmp
     # mkdir /mnt/d0/export
     # mount /dev/dsk/c0t2d0s7 /mnt/d0/export

     Create the following directories, which are not restored, but will be
     needed by the system once it is brought back online :

     # mkdir /mnt/d0/afs
     # mkdir /mnt/d0/usr/vice/cache
     # mkdir /mnt/d0/proc
     # mkdir /mnt/d0/dev/fs

     If this is a fileserver, the /vicepX mount points will also need to be
     created (see the example below).
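
     For example, assuming the STBR's vfstab shows AFS vice partitions
     mounted at /vicepa and /vicepb (the actual names come from the STBR's
     /etc/vfstab) :

     # mkdir /mnt/d0/vicepa
     # mkdir /mnt/d0/vicepb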

   - Install Netbackup client and prepare restore scripts
     
     The /afs/cad/admin/NetBackup/scripts/install.nbcl.ksh script can be
     used to install the Netbackup client on the rescue system.

   - Create a working area to hold the Netbackup scripts used for the
     restore.  Personally, I prefer /export/restore/<STBR>
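
     For example (it is assumed here that the install script needs no
     arguments; check the script itself before running it) :

        # /afs/cad/admin/NetBackup/scripts/install.nbcl.ksh
        # mkdir -p /export/restore/<STBR>
        # cd /export/restore/<STBR>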

   - Next, create the bprestore.fileplan and bprestore.rename files

     These files are used to tell Netbackup what files and directories to
     restore and where to restore them to.  Since this is a full system 
     restore, specify / 

     bprestore.fileplan should only contain one line :

     /
 
     The bprestore.rename file should contain the following :

     change / to /mnt/d0
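
     As a quick sketch, both files can be created from the working area :

        # cd /export/restore/<STBR>
        # echo '/' > bprestore.fileplan
        # echo 'change / to /mnt/d0' > bprestore.rename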

   - Identify the date range for the backup image to be used for the restore.

        /usr/openv/netbackup/bin/bpclimagelist -client <client> -s 1/1/1976

     where <client> is the fully qualified host name of the STBR.

     The above produces output such as the following :

 12/17/2006 20:11  12/31/2006      627        2272 N Incr Backup NJIT-UCS-UNIX-PROD-AFS-FILESERVER
 12/16/2006 20:06  12/30/2006      631        3264 N Incr Backup NJIT-UCS-UNIX-PROD-AFS-FILESERVER
 12/15/2006 20:00  12/29/2006      632        3200 N Incr Backup NJIT-UCS-UNIX-PROD-AFS-FILESERVER
 12/14/2006 20:05  01/14/2007   157212     6021632 N Full Backup NJIT-UCS-UNIX-PROD-AFS-FILESERVER
 
 
     The above output is for server kumonga.  A Full backup was
     taken on 12/14 and an incremental each night afterward.  

   - Prepare the doit.sh script, which will start bprestore

     A template of the following script can be found in :
        
                /afs/cad/admin/Netbackup/scripts/doit.sh
     #---------------------------------------------------------------------#
     #  #!/bin/bash
     #  
     #  set -x
     #  cd `dirname $0`
     #  
     #  HERE=`pwd`
     #  FROMCLIENT="kumonga.njit.edu"       # Qualified name of STBR
     #  TOCLIENT="ucsrestore1.njit.edu"     # Qualified name of rescue system
     #  OTHEROPTS=" -s 12/14/2006 -e 12/17/2006" # DATE range for restores
     #  export PATH=$PATH:/usr/openv/netbackup/bin
     #  echo "HERE=$HERE"
     #  
     # # USAGE: bprestore [-A | -B] [-K] [-l | -H | -y] [-r] [-T]
     # #       [-L progress_log] [-R rename_file] [-C client]
     # #       [-D client] [-S master_server] [-t class_type]
     # #       [-c class] [-k "keyword phrase"]
     # #       [-s mm/dd/yyyy [hh:mm:ss]] [-e mm/dd/yyyy [hh:mm:ss]]
     # #       [-w [hh:mm:ss]] -f listfile | filenames
     # # 
     #
     # bprestore $OTHEROPTS  -H -R $HERE/bprestore.rename \
     # -L $HERE/bprestore.`/afs/cad/solaris/ucs/bin/datestamp \
     # -Y`.kumonga.log   -C "$FROMCLIENT"  -D "$TOCLIENT" \
     # -f bprestore.fileplan
     #
     #---------------------------------------------------------------------#

     The three most important variables in the above script are 
     FROMCLIENT, TOCLIENT, and OTHEROPTS.

     The FROMCLIENT should contain the fully qualified host name of the
     STBR.

     The TOCLIENT should contain the fully qualified host name of the machine
     that the data is being restored to; in most cases this will be
     ucsrestore1.njit.edu.

     The OTHEROPTS variable contains -s <DATE> -e <DATE> where -s is a
     start date for the backups and -e specifies the end date.  If the
     most recent backup taken was a full, then the date of that last full
     would be specified for both the start and end dates.
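
     For example, using the kumonga image list shown above, restoring the
     12/14 full plus the incrementals through 12/17 would use :

        OTHEROPTS=" -s 12/14/2006 -e 12/17/2006"

     while restoring only the 12/14 full would use :

        OTHEROPTS=" -s 12/14/2006 -e 12/14/2006"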

   - Launch doit.sh to begin the netbackup restore process.
     
     A notification of the restore should be sent to prodctl@adm.njit.edu
     and cc to hoskins@njit.edu to alert the operators that a restore
     request has been queued.  This notification will inform them that a
     tape mount request will soon be pending.  

     To watch the progress of the restore :

        tail -f bprestore*.log 

   - After a successful restore, install the restore disk back into the STBR,
     and install the boot block.
  
     Once the restore has completed successfully, as indicated in the restore
     log, shut down the rescue system and remove the restore disk.  

     At this point the restore disk should be installed back into the STBR
     as the primary disk.  

     If the restore disk was the original system disk used in the STBR then
     it will not be necessary to install the boot block.  

   - Removing Solaris Volume Manager (SVM) configuration

     If the STBR had its disks mirrored using SVM then this configuration
     needs to be undone since the mirrors no longer exist on the newly
     restored disk.  If the system was not mirrored, proceed to
     "Installing a boot block"

     Comment out the following line in /etc/system :

        rootdev:/pseudo/md@0:0,10,blk

     Note that /etc/system uses an asterisk "*" at the start of a line to
     indicate a comment; do not use a shell-style comment "#", as this will
     produce undesired results.
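
     After the edit, the line should read :

        * rootdev:/pseudo/md@0:0,10,blk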

     Modify /etc/vfstab to remove all references to the metadevices:
     every /dev/md/dsk and /dev/md/rdsk entry should be changed to the
     corresponding physical /dev/dsk and /dev/rdsk device, as in the
     example below.
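
     For example, a root entry such as the following (device names are
     illustrative only; use the devices from the STBR) :

        /dev/md/dsk/d10    /dev/md/rdsk/d10    /    ufs    1    no    -

     would become :

        /dev/dsk/c0t0d0s0  /dev/rdsk/c0t0d0s0  /    ufs    1    no    -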
     

   - Installing a boot block 

     If a new disk was used, install the boot block; otherwise proceed to
     "Booting from the restored disk"

     Boot the system using an install CD that corresponds to the version
     of the operating system that is installed on the STBR, i.e., if the
     STBR runs Solaris 10 then use a Solaris 10 install CD.  
     
     Installation CDs for the versions of Solaris that we run are located
     on the bookshelf in ITC 5301 (key located in CAB_G of 2200). 

     
     Boot from CD by issuing the following at the OK prompt :

        boot cdrom -s

     After the system comes up, issue the following to install the
     boot block :

        installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk \
                /dev/rdsk/cNtNdNs0 
                          ^^^^^^^^
                          This should be whatever the restore disk
                          shows up as on the STBR (usually, c0t0d0s0 for 
                                                      the primary disk)
                
   - Booting from the restored disk

     At this point try to boot from the newly restored disk.  The system
     should just come up.  If however, you see messages similar to the
     following :

        "/dev/dsk/cNtNdNsN no such device ..."
        "/sbin/rcS: no such device ... "

     Then this indicates that /dev and /devices need to be manually
     rebuilt, as well as /etc/path_to_inst.

     The above device paths can normally be rebuilt using boot -r, but
     doing so requires that /usr be accessible.

     The fix is to boot from CD again and manually copy the /dev and
     /devices directories including /etc/path_to_inst over to the system
     disk.

     To do so, run the following :

        ok boot cdrom -s

        # mkdir /tmp/a 
        # mount /dev/dsk/cNtNdNs0 /tmp/a
        
        # cd /tmp/dev
        # tar cpvf - . | (cd /tmp/a/dev && tar xpf - )

        # cd /tmp/devices
        # tar cpvf - . | (cd /tmp/a/devices && tar xpf - )

     Back up the existing path_to_inst file :

        # cp /tmp/a/etc/path_to_inst /tmp/a/etc/path_to_inst.org
        # cd /tmp/root/etc
        # cp path_to_inst /tmp/a/etc/path_to_inst

     Halt and perform a reconfiguration reboot :

        # halt
        ok boot -r

     
   - Final cleanup 

     At this point, if the system comes up, things should be fine.
     However, if the system does not boot correctly, go back over the
     above steps to make sure that you have not missed one.  

     If the system was originally mirrored, you will need to redo the
     mirror.   

     You may also find some old metadevices hanging around, especially ones
     associated with the secondary mirror disk, since that disk still has
     its metadevice state databases.  

     It is a good idea to remove all metadb replicas :

        metadb -d <device>

     
     After removal of the old metadbs the system can be mirrored by
     following the normal procedure used for mirroring disks using SVM.
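
     For reference, a minimal sketch of re-mirroring just the root slice,
     assuming the d10/d11/d21 naming used in the next section and assuming
     slice 7 holds the state database replicas (adjust names and slices to
     match the system) :

        # metadb -a -f -c 3 c0t0d0s7 c0t1d0s7
        # metainit -f d11 1 1 c0t0d0s0
        # metainit d21 1 1 c0t1d0s0
        # metainit d10 -m d11
        # metaroot d10          <- updates /etc/vfstab and /etc/system
        # lockfs -fa
        # reboot

     and, after the reboot :

        # metattach d10 d21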

 o Recovering from a failed mirror disk
   =======================================

   If a system which is mirrored has lost one of the disks that is part of
   the mirror, then the following procedure can be followed to bring that
   mirror disk back online.  

   The below documentation will use c0t0d0 as the first disk and c0t1d0 as
   the second disk.

   First identify the defective disk by running a metastat.  If there are
   failures, metastat will show the metadevice in a "Maintenance" state.

   # metastat d10
        
       d10: Mirror
            Submirror 0: d11
             State: Needs maintenance 
            Submirror 1: d21
             State: Okay         
           Pass: 1
           Read option: roundrobin (default)
           Write option: parallel (default)
           Size: 13423200 blocks
       
       d11: Submirror of d10
           State: Needs maintenance 
           Invoke: metareplace d10 c0t0d0s0 <new device>
           Size: 13423200 blocks
           Stripe 0:
               Device              Start Block  Dbase State        Hot Spare
               c0t0d0s0                   0     No    Maintenance  
       
       
        d21: Submirror of d10
           State: Okay         
           Size: 13423200 blocks
           Stripe 0:
               Device              Start Block  Dbase State        Hot Spare
               c0t1d0s0                   0     No    Okay         


   The above output from metastat shows that submirror d11 is in an
   errored state and needs replacing.  The disk associated with this
   submirror is c0t0d0s0.

   Before replacing the problematic disk, remove the state
   database replicas from the old disk.    You will most likely have to
   force the removal if the disk is not readable.


   - Remove the metadbs from the failed disk.  Running metadb without any
     arguments will show the current metadbs :

        # metadb -f -d <device>

   - Install and partition new disk

     After physically replacing the failed disk, copy the VTOC from the
     working disk over to the newly replaced disk :

        prtvtoc /dev/rdsk/c0t1d0s0 | grep -v '2  *5' | \
                fmthard -s - /dev/rdsk/c0t0d0s0

   - Creation of new metadbs

        # metadb -a -c <N> <device>

     where <N> is the number of replicas.  The standard in use on all UCS
     systems is 3 copies.
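
     For example, assuming slice 7 of the replaced disk holds the replicas :

        # metadb -a -c 3 c0t0d0s7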

     If the above command fails with a message similar to the following :

        metadb: <hostname>: has bad master block on device

     Then the existing submirrors associated with the bad disk need to be
     removed and recreated.

     Remove the submirror :

        metaclear d11

     Recreate the submirror :

        metainit -f d11 1 1 c0t0d0s0

   - Replace the failed components

     ## begin IF BLOCK ##

     IF adding back the metadbs did not produce an error, then 
     run :

        # metareplace -e d10 c0t0d0s0

     Although metastat says :

        Invoke: metareplace d10 c0t0d0s0 <new device>
   
     we do not specify a new device; instead we use -e, because we are not
     replacing the failed component with a different device name but reusing
     the same device name on the new physical disk.  See the metareplace man
     page for additional information.


     ELSE
     IF adding the metadbs failed and the submirror had to be recreated as
     above, then run :

        # metattach d10 d11

     ## end IF BLOCK ##

     Repeat the above steps for each failed component.


  - Wait for metadevices to sync.  
    
    Once the metadevices have been replaced they will take some time to
    sync.  This can take several hours, especially on systems with large
    partitions.

    It is a good idea to periodically check the status of the sync with
    the metastat command.

    I usually run something like :

        while true
        do
          metastat | grep %
          sleep 60
        done