Note: if the CPU in your system is a 'PC' type (ie. it has no secondary cache, such as R4600PC 133MHz), please read the extra note at the end of these instructions about RxxxxPC systems.
If you already have an OS installed, such as IRIX 5.3, I recommend against 'upgrading' in the normal sense, eg. running swmgr on a 5.3 Indy and upgrading using the 6.2 CDs. It is far better to carry out a clean installation from scratch; even the file system might be different (XFS instead of EFS in the case of changing from 5.3 to 6.2, although there is a version of 5.3 with XFS). Upgrading on top of an older OS means one is never really sure that the system is configured the way it should be.
Thus, the methodology for installing an OS which I use is as follows:
When I ran an Indy lab at the main University in Preston (where I live), most of the files I backed up were important ones in /etc, others in /var, some local home-made web pages, etc.
Here is a list of the files I backed up (depending on your configuration, other files may also be important to you, such as /etc/sendmail.cf):
/.cshrc
/.tcshrc
/.Xresources
/.jotrc
/etc/TIMEZONE
/etc/fstab
/etc/hosts.equiv
/etc/resolv.conf
/etc/group
/etc/hosts
/etc/passwd
/etc/sys_id
/etc/config/timed.options
/etc/init.d/network.local
/var/yp/ypdomain
It's worth noting that, in some cases, I did not intend simply to move the backed-up files above into place once the new installation was complete. Rather, they were useful for working out what changes the new setup required. When it actually came to configuring the new system, I used a more efficient method of setting up the individual machines; this is described later as an example of automation.
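If you want to gather these files in one go before wiping the disk, a small shell function along these lines could do it. This is a minimal sketch, not the procedure I actually used: backup_files and the /disk2/oldconfig destination are hypothetical names, and it assumes a POSIX sh with mkdir -p and cp -p. The destination must be somewhere that survives the reinstall (an option disk, another machine, etc.). ROOT is normally empty, meaning the live system, but can point at a scratch tree for testing.

```sh
#!/bin/sh
# Hypothetical helper: copy each listed file from ROOT into BACKUP,
# recreating its directory path, and warn about files that don't exist.
backup_files() {
    root=$1 backup=$2
    shift 2
    for f in "$@"; do
        if [ -f "$root$f" ]; then
            mkdir -p "$backup$(dirname "$f")"
            cp -p "$root$f" "$backup$f"
        else
            echo "missing: $f" >&2
        fi
    done
}

# Example on the live system (ROOT empty, destination hypothetical):
#   backup_files "" /disk2/oldconfig /etc/hosts /etc/passwd /etc/sys_id
```

Missing files are reported rather than treated as fatal, since which of the files above exist depends on your configuration.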
When reinstalling the server, extra files to be backed up included all necessary DNS/NIS data (/var/yp, /var/named, etc.) and of course user data.
Naturally, I also made a complete backup of the system to DAT (see below for details).
Using the two IRIX 6.2 CDs, I made a clean installation like this (my description assumes the presence of a locally-connected CDROM drive):
boot -f dksc(0,<ID>,8)sashARCS dksc(0,<ID>,7)stand/fx.ARCS --x
where <ID> is the SCSI ID of the CDROM drive. Note that the CDROM may be on a different SCSI channel, eg. an external CDROM on Indigo2, in which case use the appropriate SCSI controller number instead of 0. For example, the command for an Indigo2 with an external CDROM on ID 4 would be:
boot -f dksc(1,4,8)sashARCS dksc(1,4,7)stand/fx.ARCS --x
Use the hinv command in the Command Monitor to identify the correct SCSI controller number and SCSI ID of your CDROM.
According to the fx man page, the above command sequences apply to systems with the 32bit ARCS PROM, namely R4K Indigo, R4K Indigo2, Indy, R4K Onyx, R4K Challenge and O2. For systems with the 64bit ARCS PROM (ie. Power Challenge, Power Onyx, Power Indigo2, Indigo2 IMPACT 10000 or R8000 Indigo2, Origin, Onyx2, OCTANE, etc.) use this command:
boot -f dksc(0,<ID>,8)sash64 dksc(0,<ID>,7)stand/fx.64 --x
Older systems such as R3K Indigo are slightly different. In these cases, the sash file and fx file are named to correspond with the system's CPU IP number, eg. R3000 Indigo is an IP12 system. Thus, the correct command for R3K Indigo is:
boot -f dksc(0,<ID>,8)sashIP12 dksc(0,<ID>,7)stand/fx.IP12 --x
Note that I have sometimes seen this command fail with an error about the CDROM not being ready, or R4K Indigo might give an error about the wrong architecture. This happens more often on older systems such as Indigo when the CDROM being used is a more modern model. Usually the problem can be solved just by repeating the command, ie. entering '!!' (without the quotes); for some reason, simply entering the command a second time makes it work. Alternatively, when an error like this occurs, I've often found that doing the command sequence in two stages instead of all at once fixes the problem too, ie. first boot into the sash (I'm using the R3K Indigo example here, assuming the CDROM is on SCSI ID 4):
boot -f dksc(0,4,8)sashIP12
Then, once at the sash prompt, boot into fx:
boot -f dksc(0,4,7)stand/fx.IP12 --x
I think R3K Indigo, and probably earlier systems, are more fussy about how the CDROM behaves. I certainly found that reliably installing 5.3 on an R3K Indigo was only possible with very early models of CDROM (1X or 2X Toshiba), though I expect 5.3 could handle later models of CDROM if one applied all relevant 5.3 updates and patches.
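Since the only things that change between these commands are the PROM type, the controller number and the SCSI ID, it's easy to mistype them. As a reminder aid, one could generate the line on another UNIX machine with something like the function below. It's a hypothetical helper (fx_boot_line is an invented name, not an SGI tool) that simply encodes the naming rules described above: ARCS for 32bit ARCS PROMs, 64 for 64bit ARCS PROMs, and the IP number for older systems.

```sh
#!/bin/sh
# Hypothetical helper: print the Command Monitor line that boots sash and
# then fx from a CDROM. PROM is 32, 64 or an IP number such as IP12.
fx_boot_line() {
    prom=$1 ctlr=$2 id=$3
    case $prom in
        32)  suffix=ARCS ;;
        64)  suffix=64 ;;
        IP*) suffix=$prom ;;
        *)   echo "unknown PROM type: $prom" >&2; return 1 ;;
    esac
    # sash and fx share the same suffix: sashARCS/fx.ARCS, sash64/fx.64, ...
    echo "boot -f dksc($ctlr,$id,8)sash$suffix dksc($ctlr,$id,7)stand/fx.$suffix --x"
}

fx_boot_line 32 1 4     # Indigo2 with an external CDROM on ID 4
fx_boot_line IP12 0 4   # R3K Indigo, CDROM on SCSI ID 4
```

The first call prints exactly the Indigo2 command given earlier; the line still has to be typed at the Command Monitor by hand.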
fx: "device-name" = (dksc)
Assuming your system disk is on SCSI ID 1, the default settings will be correct, so just press Return in answer to the initial questions (dksc, ctlr, drive and lun). If this isn't the case (use hinv to check), enter the correct values when asked, eg. the system disk in an R10000 O2 is always on SCSI ID 2.
From the shell prompt, enter the following:
umount /root
mkfs /dev/dsk/dks0d1s0
Compared to IRIX 5.3, this operation is executed very quickly under 6.2 - this is because the XFS file system works in a completely different way to the old EFS file system.
Note that if you are using a disk which is smaller than 4GB (eg. 549MB, 1GB, 2GB, etc.) then it is better to have a disk block size of 512 instead of the default 4096. So, for small-size disks, use this command instead of the mkfs sequence shown above:
mkfs -b size=512 /dev/dsk/dks0d1s0
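If you script this step, the block-size rule reduces to a simple decision. A sketch only: mkfs_line is a hypothetical name, it prints the command rather than running it, and it takes the disk size in MB, treating "4GB" as 4000MB because drive makers quote decimal sizes (adjust the threshold if you count differently).

```sh
#!/bin/sh
# Hypothetical helper: print the mkfs command for a disk of SIZE_MB
# megabytes, using 512-byte blocks for disks under about 4GB.
mkfs_line() {
    size_mb=$1 dev=$2
    if [ "$size_mb" -lt 4000 ]; then
        echo "mkfs -b size=512 $dev"
    else
        echo "mkfs $dev"
    fi
}

mkfs_line 549 /dev/dsk/dks0d1s0    # small disk: 512-byte blocks
mkfs_line 4300 /dev/dsk/dks0d1s0   # 4GB-class disk: default block size
```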
Now remount the root disk on /root:
mount /dev/dsk/dks0d1s0 /root
and exit the shell (enter 'exit', or press CTRL+D) to return to the inst program.
Enter '1' to select the 'From' option. The default is /CDROM/dist, so press Return in answer to the question if this is correct. The 6.2 startup script README will appear again. Quit the script and exit without running it by pressing 'Q' and then entering '2'. The product descriptions will be read from the CD, and then the inst prompt will appear again after some extra messages concerning product dependencies, file sizes, etc.
This next step is very important; enter the following:
set delay_conflicts on
Because 6.2 comes on two CDs, it obviously isn't possible to install everything with a single run of inst (6.5 handles this differently), unless you happen to have two CDROMs attached, or the data has been placed together on a single disk. Thus, one must tell inst to ignore any conflicts it finds while installing the first CD; any conflicts are automatically dealt with when installing the 2nd CD.
Now enter:
install default
One could use option 7 ('Step') to manually decide which items to install, but it's much easier to do a default installation and then deal with the specifics later when one can use the GUI tools, perhaps even from a different SGI if the target system is a server. Note that a default installation takes up about twice as much space as a minimum installation. If your disk is small and you want to install as little as possible, then don't enter 'install default' after setting delay_conflicts to on.
When the installation of the first CD has finished, eject the CD and insert the 'IRIX 6.2 Part 2 of 2' CD. Select 'From' in the same way (enter '1'), quit out of any script README, enter 'install default' as before (if you did for the first CD) and then enter 'go'.
At this point, it's quite common for a conflict message to be displayed concerning the installation of xlators_3d.doc.web_page (it can't be installed because it relies on part of Netscape that's not present on the 2nd CD). If this happens, just enter 'conflicts 1a' to resolve the conflict, and then commence the installation with 'go'.
If you're installing onto a server system such as Challenge S, then after the initial installation has finished, the most effective thing to do is to alter just enough of the system files so that one can login to the server remotely from another SGI (all you need to change is /etc/sys_id and /etc/hosts, then reboot). This allows one to use the GUI tools (Software Manager) to begin the task of installing further software, configuring the DNS, NFS, NIS and so on.
Tip: install any desired patches last, usually the current Required/Recommended patch set CD.
I mentioned earlier that I had an easy way of configuring the individual machines in the Indy lab I ran at Preston Uni. Basically, I used a script file which, given a 'target' system name, would automatically install the necessary files, with the appropriate changes, in the right places. I can't claim the script was super-efficient (it contained no error-checking) but it did the job and saved me a lot of time and effort. Here's how I did it...
The server was already configured and ready with NFS, NIS, DNS, etc. all successfully running. All other machines except the 5.3 admin Indy were shut down, cleaned, and the disks ready to clone (I installed 6.2 on the admin Indy last).
Because I knew that many of the files which make a system what it is will be common to all machines, I made a directory called 'CLONE' which was copied to every machine's root disk during the main disk cloning process. The CLONE directory resided in /var/tmp. It contained the following files:
.Xresources  .cshrc  .jotrc  .rhosts  go*  etc/
etc/:
bootptab      fstab   hosts.equiv  resolv.conf
bootptab.msk  group   socks.conf   TIMEZONE
bootptab.tmp  hosts   passwd       sys_id.msk
etc/config/timed.options
etc/init.d/network.local*
var/yp/ypdomain
var/netls/nodelock
The main thing which makes a machine an individual entity is /etc/sys_id. The sys_id.msk file contains all the different host names for the 19 machines I had to configure. The script 'go' is given the target machine name as a single parameter. grep uses this name to select the appropriate line from sys_id.msk and the result is redirected into the new /etc/sys_id file. In a similar way, the /etc/bootptab file is created by grepping the appropriate line from bootptab.msk with the target name, combining it with the initial text that /etc/bootptab always has, and dumping the result into a new /etc/bootptab file.
The other actions taken by the script file include copying the various configuration files to their appropriate locations, setting up the necessary K39 and S31 network links for the network.local file (you may or may not be using a static route), chkconfigging on particular flags and erasing certain portions of /var/www (this is because the system had /var/www NFS-mounted; everything in there was removed, except the /var/www/server directory to allow the system to be booted in standalone mode should it ever be necessary).
Here is the 'go' script in detail. It contains many 'echo' statements so that I could see the script's progress when executed, and also for debugging purposes:
#!/bin/sh
# Usage (as root, from /var/tmp/CLONE): ./go HOSTNAME
echo Target system: $1
echo Copying hidden files...
cd /var/tmp/CLONE
/bin/cp .Xresources .cshrc .jotrc .rhosts /
echo Copying etc files...
cd etc
# Per-host files: pick this machine's line out of the .msk files
echo sys_id...
grep $1 sys_id.msk > /etc/sys_id
echo bootptab...
grep $1 bootptab.msk > bootptab.tmp
cat bootptab bootptab.tmp > /etc/bootptab
echo fstab, group, hosts, hosts.equiv, passwd, resolv.conf, TIMEZONE...
/bin/cp fstab group hosts hosts.equiv passwd resolv.conf TIMEZONE /etc
echo timed.options...
cd config
/bin/cp timed.options /etc/config
echo network.local...
cd ../init.d
/bin/cp network.local /etc/init.d
# Stop/start links so network.local runs at shutdown and boot
echo K39/S31 network links...
ln -s /etc/init.d/network.local /etc/rc0.d/K39network
ln -s /etc/init.d/network.local /etc/rc2.d/S31network
echo Copying var files...
echo 'nodelock (but not overwritten old nodelock)...'
cd ../../var/netls
/bin/cp nodelock /var/netls/nodelock.new
echo ypdomain...
cd ../yp
/bin/cp ypdomain /var/yp
echo Creating mounting directories...
cd /
mkdir mapleson
mkdir home
# Remove local copies of the NFS-mounted /var/www content (server/ is kept)
echo 'Erasing /var/www stuff (do this only after cloning next disk)...'
cd /var/www
echo Erasing cgi-bin...
/bin/rm -rf cgi-bin
echo Erasing conf...
/bin/rm -rf conf
echo Erasing htdocs...
/bin/rm -rf htdocs
echo Changing chkconfig flags...
chkconfig directoryserver on
chkconfig network on
chkconfig nfs on
chkconfig verbose on
chkconfig yp on
chkconfig videod on
echo Done.
echo 'You are now ready to reboot, erase /var/tmp/CLONE if no more'
echo 'cloning is required, and reboot again.'
The script was run on each machine only after all disks had been successfully cloned. A typical command sequence, after turning on the system and logging in as root, would look like this:
cd /var/tmp/CLONE
./go AKIRA
where 'AKIRA' is the name of one of the Indys I ran.
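As mentioned, the script has no error checking, and one weakness is that grep $1 matches substrings: with a hypothetical second machine called AKIRA2 in sys_id.msk, './go AKIRA' would write both names into /etc/sys_id. A safer lookup, sketched below, selects only the line whose first field is exactly the target name and fails unless there is exactly one. pick_host_line is a hypothetical name, and the sketch assumes the host name is the first whitespace-separated field on each line of the .msk file. That holds for a one-name-per-line sys_id.msk and should also hold for a bootptab.msk in the usual bootptab layout, but check your own files before relying on it.

```sh
#!/bin/sh
# Hypothetical helper: print the one line of FILE whose first field is
# exactly HOST; fail (printing nothing on stdout) if there are 0 or >1.
pick_host_line() {
    file=$1 host=$2
    count=$(awk -v h="$host" '$1 == h { n++ } END { print n + 0 }' "$file")
    if [ "$count" -ne 1 ]; then
        echo "expected 1 line for $host in $file, found $count" >&2
        return 1
    fi
    awk -v h="$host" '$1 == h' "$file"
}

# In the 'go' script, the sys_id step could then become:
#   line=$(pick_host_line sys_id.msk "$1") || exit 1
#   echo "$line" > /etc/sys_id
```

Capturing the line first, rather than redirecting straight into /etc/sys_id, means a failed lookup leaves the existing file untouched.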
Installing IRIX 6.2 on RxxxxPC Systems
There is a problem with the later kernel rollup patches that affects systems which have CPUs with no secondary cache (R4600PC 100MHz, R4600PC 133MHz and R5000PC 150MHz). A system with one of these CPUs which has a kernel rollup patch later than 2777 installed (eg. 3110 or 3156) will experience kernel faults (see my Miscellaneous System Problems page for full details).
Tip: during a 6.2 OS installation, when it comes to installing the 'Required/Recommended' patches CD, install patch 2777 first from a different source (eg. the 'IRIX 6.2 Development Foundation 1.1' CD, or the 'Varsity Update 1 of 1, August 1998' CD) and then install the automatically selected patches from the latest patch set CD except the kernel rollup patch.
SGI will likely release a newer kernel rollup at some point which solves this problem.
setenv OSLoadFilename /unix
Exit the Command Monitor and press 1 to boot the system.
Sometimes, it's necessary to make an exact copy of a disk, perhaps
for backup purposes or perhaps during an OS upgrade, eg. a single
client machine has a new OS installed and its disk is then cloned to
all other client disks before individual changes are made. This is a
very good way of ensuring that all client systems have identical
software setups and is a lot faster than installing products manually
on each machine. The procedure itself is easy, so if you're someone
who has dozens of systems to configure, then don't panic! The info
here should help. Note that SGI's TechPubs site has further
information.
There are two main ways to clone a disk: using xfsdump in conjunction with xfsrestore, or by using the tar command. The xfsdump method is listed first as it's faster and has other advantages. Sometimes though, the xfsdump method is not appropriate, in which case tar is used - example scenarios are explained later. xfsdump is just better at handling device files, etc. whereas tar can cause problems if not used properly.
In the description given here, I'm assuming certain things:
Boot up the system and log in as root. Obtain a UNIX shell.
Create a mount point:
mkdir /0
Use fx to repartition the extra disk (don't include my comments):
cd
fx -x        # Run fx
<Enter>      # Select dksc
<Enter>      # Select controller 0
2            # Select drive 2
<Enter>      # Select lun 0
r            # Select repartition option
ro           # Select root drive option
<Enter>      # Select XFS
yes          # Yes, continue with the operation
..           # Return to the main menu
l            # Create a new label
sy           # Write out the new label
/exit        # Exit fx
Use mkfs to create a new file system:
mkfs -b size=512 /dev/dsk/dks0d2s0
Note that if the disk is 4GB or larger, then exclude the block size definition, ie. just enter:
mkfs /dev/dsk/dks0d2s0
Mount the destination disk:
mount /dev/dsk/dks0d2s0 /0
Confirm the amount of space available with 'df -k'.
Now begin the copy process:
cd /0
xfsdump -l 0 -p 5 - / | xfsrestore - .
This specifies a Level 0 dump (all files), progress report every 5 seconds, acting on the root file system. xfsdump sends the data to the standard output (by the use of the '-' character); this is piped to xfsrestore which is getting its data from the standard input (again by the use of the '-' character).
NB: I often find it useful to know how long these copy procedures take, eg. when planning whether or not one has enough time to do multiple systems. Thus, I always use the timex command to report how long the copy process lasted. Just put timex as the first command, ie. instead of the above, enter:
cd /0
timex xfsdump -l 0 -p 5 - / | xfsrestore - .
It doesn't make any difference to the copy process, but it can be useful to have an appreciation of how long these tasks take.
Tip 1: if you're doing all this in a standard xterm, make it wider so that the progress messages don't get wrapped onto the next line. It's easier to read.
Tip 2: if you're using a multi-CPU system, remember you can use the runon command to force the copy process to run on a particular CPU. It's best to choose a CPU that's closest to the SCSI controller(s) involved in the copy process as this minimises system traffic. This is more relevant to newer systems such as Origin, Onyx2, etc. On older systems like Onyx and Challenge, it's more useful simply as a way to prevent the default CPU 0 being used to do everything, eg. for a 4-CPU deskside one might run the task on CPU 3 thus:
cd /0
runon 3 timex xfsdump -l 0 -p 5 - / | xfsrestore - .
Finally, the volume header information from the root disk must be copied onto the target disk, though one could do this while the copying is going on. Enter the following:
cd /stand
dvhtool -v get sash sash /dev/rdsk/dks0d1vh
dvhtool -v get ide ide /dev/rdsk/dks0d1vh
dvhtool -v creat sash sash /dev/rdsk/dks0d2vh
dvhtool -v creat ide ide /dev/rdsk/dks0d2vh
There may be a symmon entry in the volume header too, in which case enter these extra commands:
dvhtool -v get symmon symmon /dev/rdsk/dks0d1vh
dvhtool -v creat symmon symmon /dev/rdsk/dks0d2vh
Try the 'get symmon' command above; if it gives a not-found error, then there isn't any symmon entry present, so don't bother with the creat command.
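If you clone disks often, these non-interactive commands can be generated rather than typed. The sketch below is a hypothetical helper (vh_copy_cmds is an invented name) that only prints dvhtool commands, in exactly the form used above, for whatever entry names you give it. It doesn't run anything, so you can inspect the output, and leave out entries such as symmon that your disk lacks, before piping it to sh from /stand.

```sh
#!/bin/sh
# Hypothetical helper: print the dvhtool commands that copy the named
# volume header entries from disk SRC (eg. dks0d1) to disk DST (eg. dks0d2).
# Entries are fetched into the current directory first, then written out.
vh_copy_cmds() {
    src=$1 dst=$2
    shift 2
    for entry in "$@"; do
        echo "dvhtool -v get $entry $entry /dev/rdsk/${src}vh"
    done
    for entry in "$@"; do
        echo "dvhtool -v creat $entry $entry /dev/rdsk/${dst}vh"
    done
}

# Review the output first; to run it:
#   cd /stand; vh_copy_cmds dks0d1 dks0d2 sash ide | sh
vh_copy_cmds dks0d1 dks0d2 sash ide
```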
Alternatively, one can copy the volume header interactively, which does have the advantage of being able to see exactly what is present in the volume header. Also, some systems will have other entries besides sash, ide and symmon, eg. Octane will often have a file called IP30prom. Thus, the interactive method is what I usually use. Here is what to enter (exclude my comments of course):
cd /stand
dvhtool /dev/rdsk/dks0d1vh   # Access the system disk volume header
vd                           # Switch to a different menu
l                            # List contents of volume header
g sash sash                  # Copy volume header entries to disk;
g ide ide                    # if the 'l' command shows other entries
g symmon symmon              # besides these, then copy them too.
quit                         # Exit from this session...
quit
dvhtool /dev/rdsk/dks0d2vh   # Access the destination disk
vd
l
d sash                       # Delete old entries (if the l command shows
d ide                        # any to be present), including any
d symmon                     # besides sash, ide and symmon.
a sash sash                  # Copy new entries to destination volume header...
a ide ide
a symmon symmon
quit
write                        # Write out the changes
quit
In fact, the amount of typing required for the interactive method is less, so that's another advantage.
Note that 5.3 handles the sash in a slightly different way from 6.2/6.5, so if the disk to be copied is a 5.3 installation, then the volume header copy operation can be compacted to:
cd /0
dvhtool -v creat /stand/sash sash /dev/rdsk/dks0d2vh
And that's it! The machine can now be powered down and the cloned disk removed. Don't forget to change the cloned disk's SCSI ID to 1, though on many systems this is done automatically via the disk sled.
Most of the time, using xfsdump is the best, fastest and most efficient way to clone a disk. However, sometimes it may not be appropriate, eg. if the file system spans several disks but the destination is just a single disk (xfsdump only dumps a single named file system). In such circumstances, using tar is the main alternative.
However, using tar requires some special measures: all NFS mounts should be unmounted beforehand, as should the /proc file system. Also, any CDROMs and other media should be ejected from their respective devices.
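A quick way to check for leftover mounts before starting the tar copy is to scan the output of mount. This is a sketch under an assumption: that mount prints lines in the common "device on directory type fstype (options)" form. mounts_to_clear is a hypothetical name, and it only looks for NFS and proc mounts, so ejecting CDROMs and other media is still up to you.

```sh
#!/bin/sh
# Hypothetical helper: read 'mount' output on stdin and print the mount
# points of NFS and /proc file systems, which should be unmounted first.
# Assumes lines of the form: DEVICE on DIR type FSTYPE (OPTIONS)
mounts_to_clear() {
    awk '$2 == "on" && $4 == "type" && ($5 ~ /^nfs/ || $5 == "proc") { print $3 }'
}

# Usage: mount | mounts_to_clear
```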
Here is what to enter after the fx/mkfs procedure, making the /0 mount point and mounting the target disk:
cd /
umount /proc
tar cvBpf - . | (cd /0; tar xBpf -)
This command recursively copies the root disk or file system onto the extra disk. By recursive I mean that it also copies /0 into /0; however, at the time this happens, the only items in /0 are some hidden files (because the copy process hasn't yet alphabetically reached anything else), so not much extraneous data is copied. This is why I use /0 as a mount point: if the extra disk were mounted on /disk2 and there was a directory such as /Data or /Alias containing a lot of data, then a lot of unnecessary copying would occur, and the copy procedure might even fail due to running out of disk space. The character '0' comes before just about everything else in the ASCII character set, so these problems are prevented.
Note that one definitely does not want to try and tar over /proc since /proc does not contain 'real' files - the entries in /proc relate to process information, used, for example, by the 'ps' command and 'killall'. The entries appear as very large files even though they're not; they are effectively images of running processes; tar cannot understand this and chokes on them, so one should unmount /proc before beginning the tar procedure.
Anyway, after the tar process has finished, enter the following to remount /proc and remove the unwanted '0' directory that's inside /0:
/etc/mntproc
cd /0
/bin/rm -rf 0
Using tar does have the advantage that one can see the files being copied, which is good feedback on the copying process. However, as the various necessary commands demonstrate, tar is sensitive to issues such as NFS mounts, /proc, mounted removable media, etc. After the cloning has finished, copy over the volume header information just as for the xfsdump method.
Option Disks
The procedure for copying option disks is similar, except that the partition number will be 7 instead of 0 (remember to select 'Option Drive' from within fx) and one does not need to worry about any volume header since there isn't one.
How to clone lots of disks the easy way!
The following is useful if one has many disks to clone, eg. every client in the lab I ran is being upgraded from 5.3 to 6.2. This is the example I used at the time using tar; these days I would use xfsdump instead.
The answer? Use a script file! Here is the script I used, stored in an executable file called 'diskcopy' which is placed in the root directory:
#!/bin/sh
echo
echo WARNING: this script assumes the 'fx -x' procedure has
echo already been performed on the target disk!
echo
echo Making file system on /dev/dsk/dks0d2s0...
mkfs -b size=512 /dev/dsk/dks0d2s0
echo Mounting /dev/dsk/dks0d2s0 on /0...
mount /dev/dsk/dks0d2s0 /0
echo Unmounting /proc...
umount /proc
echo Changing to root dir...
cd /
echo Copying...
tar cvBpf - . | (cd /0; tar xBpf -)
echo
echo Changing to /0...
cd /0
echo Removing unwanted contents of /0/0...
/bin/rm -rf 0
echo Recreating /0/0...
mkdir 0
echo Changing to root dir...
cd /
echo getting the sash from the root disk...
dvhtool -v get sash /stand/sash /dev/rdsk/dks0d1vh
echo Changing to /0...
cd /0
echo Writing the sash to the target disk...
dvhtool -v creat /stand/sash sash /dev/rdsk/dks0d2vh
echo Remount /proc...
/etc/mntproc
echo Now power down the system and remove the cloned disk.
# Quoted: an unpaired apostrophe would otherwise start an unterminated string
echo "Remember to switch the cloned disk's SCSI ID back to 1."
echo NB: after installing the cloned disk into the next machine
echo and powering on the system, remember to login as root and
echo do an immediate reboot before running this script again -
echo this will install the new unix.install file.
echo
(I have a lot of echo comments in my scripts so I can see what is going on)
So how is the above script used? Here is an example of cloning four disks (W, X, Y and Z), using 2 Indys (A and B):
./diskcopy
And that's it! The script makes the file system on the target disk, mounts the disk, unmounts /proc, copies the data, removes the recursive garbage, copies over the sash, etc. The copying process takes about 15 to 20 minutes for an almost-full 549MB 4500rpm disk.
Firstly, installing patch files can sometimes use a lot of RAM. I once installed patch 2262 on a 6.2 Indy with 64MB; rqsall was swapping out to disk repeatedly, at times grabbing as much as 40MB. So, if you can, and assuming you've more than one patch file to do (not worth the hassle of opening up machines if one only has to install a small number of patches), temporarily increase the memory in the target system. I increased it from 32MB to 64MB and then again to 96MB after seeing what patch 2262 was doing.
Ah yes, patch files, the sysadmin's nightmare. Which to use? What must be installed before a particular patch file can be used? What are the incompatibilities?
The instructions that come with patch subsystems always say that one usually only wants to install patches for problems that one has encountered, but if one is installing a new OS it makes sense to me to install the entire patch set as a preventative measure before the system starts really being used. Besides, SGI themselves recommend installing a complete patch set.
Install an entire patch set you say? Well, not all is bad news. For a start, many patches won't be relevant to your system because of hardware options you don't have, systems you're not using (eg. I2 IMPACT) or software that isn't present (eg. 64bit libs). In my case, I installed around two-dozen patches after initially installing 6.2.
The first time I dealt with the patches for 6.2, I wasn't really bothered with the order in which I installed things and basically attempted to install the whole lot at once. Something went wrong; patches can be finicky things and I had problems - after installing a block of ten or so patches, the Indy would boot up with a network memory error and a tiny core dump (200K) was created. Well, I strive for perfection in my system so this wasn't acceptable. I did everything again from scratch and all was ok.
In fact, since that time, I've had occasion to install complete patch sets many, many times using the install script, and I've had no problems since. I think I was just unlucky the first time round.
Even so, some things do need to be said about patch sets. When one activates a patch CD from swmgr (or inst), I often find that some subsystems are selected for installation which cause conflicts, or patches are selected which just aren't needed at all. These conflicts are almost always due to the absence of software that some part of a particular patch expects to be present. So, if you get conflicts, don't panic; just check through the selected patches and see if there are any subsystems which don't need to be selected (64bit versions of things are the most common case), or just select the appropriate options in the conflicts window.
Sometimes, parts of patches are selected which are actually older versions of installed software. I think this occurs because the scripting process occasionally selects all of the contents of a patch, rather than just the relevant parts of it. I've also seen patches selected which just aren't needed at all; why this happens I don't know. When I did a complete 6.2 installation on an Indy, ie. including the IDO and Varsity set, I observed these patches being selected by the auto-script (from the 'September 1998 Required/Recommended CD') that shouldn't have been selected:
Some admins may feel that they only want to install particular patches as opposed to a patch set, eg. security patches. This is fine, but do take care to ensure there aren't any conflicts, and that you don't overwrite software with older versions.
Usually, people will have patch CDs from which to obtain patch files, but if you don't then you can grab many patches from SGI's ftp site(s):
ftp://patches.sgi.com/support/patchset/
ftp://patches.sgi.com/support/patchset/.allrecpatch/
On several occasions, I've pointed people there who wanted patches and had no other means of obtaining them.
Notes:
What this actually means is that the patch is for all platforms, but the bug fixes only affect certain Challenge systems; the patch as a whole is fine for Challenge S and Power Challenge M.
Some patches affect several different subsystems and so the history images required can be large. After installing all relevant 6.2 patches I found that 33MB of disk space had been used up. Since I had no intention of ever removing the patches (I expect the next big change to be a move to IRIX 6.5), I decided to remove the patches' installation histories. The first time I did this, I used individual commands such as:
versions removehist patchSG0001537
and I removed the patches in reverse order to their installation (ie. last first). But if one intends to remove all patch histories, then there is a short-cut:
versions removehist "*"
Simple Backup
Assuming one has a DAT drive attached and one is logged in as root, the easiest way to back up the system is to use the following sequence of commands and actions (an action represents one or more commands, the exact nature of which only you will know):
cd /
umount /proc
<unmount any NFS-mounted file systems and option disks>
tar cv .
As stated elsewhere, the /proc directory doesn't contain real files. They are 'images' of running processes which are used by various programs, most importantly the 'killall' command when one shuts down the system.
Backup Duration
Absolutely ages if one is unfortunate enough to be using a DDS1 DAT (2-4GB capacity, 150K/sec peak transfer without compression). DDS2 is 4-8GB capacity, and DDS3 is 12-24GB capacity (1.2MB/sec without compression). Thus, the uncompressed transfer rate of DDS3 is roughly 8X that of DDS1, so definitely try to use a DDS3 DAT if you can. If you don't have a DAT and are thinking of buying one, definitely get a DDS3! Trust me, it'll be well worth it in the long term. A DDS3 model which is definitely supported by SGIs is the Sony SDT9000 (make sure you have the latest Tape Driver patch installed, and that the DIP switches on the DAT unit are correctly set for SGIs, ie. switches 1 and 2 turned on, 3 and 4 off).
In case you're thinking that the above isn't important, just listen to this tale of woe: I had to reinstall the OS on the Challenge S I run due to total network failure (as it turned out, it was the hub unit that had gone wrong, but I didn't discover that until later). So, out came the DAT tape from the last required backup and into our DDS1 DAT drive it went. This is the procedure I used to restore the system:
cd /disk2
tar xv
And that's it! Afterwards, I powered down the machine, swapped the disks over and used the disk cloning procedure above to clone the disk's contents back onto the main 4GB disk.
That works out to be an average transfer rate of about 102K/sec (it's nowhere near the peak rate, probably because many files are very small and cause overhead with respect to creating inodes, etc.)
I had to stay at work overnight to get it all done. The next day, I moaned to my HoD, saying I desperately needed a DDS3, or else I would go crazy. The HoD said yes, go order one. The fact that I looked like 40 miles of rough road after such a night probably helped. :D
The moral of the story is that it's easy to forget or ignore how long backup procedures take because normally they're done overnight by cron when one isn't around. Why not do a test and see how long it takes to backup your system? The elapsed time will be a good estimate of how long it'll take to restore your system should the need ever arise. If you're in a company or business where downtime equals lost revenue, then the time it would take to restore your system is something you definitely ought to know. Enter this command sequence before leaving work one evening:
cd /
umount /proc
<the usual unmounting of NFS directories, option disks, etc.>
timex tar c .
The last line is the important one. The tar operation is done without displaying any messages; once it has finished, timex shows how long the tar command took to complete.
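To get a rough figure before doing the real test, divide the amount of data by the drive's transfer rate. The helper below is just that arithmetic (backup_minutes is a hypothetical name); it ignores compression and the per-file overhead that kept my restore well below the peak rate, so treat the result as a best case.

```sh
#!/bin/sh
# Hypothetical helper: estimated minutes to move SIZE_MB megabytes at
# RATE_KB kilobytes/sec (1MB = 1024K), rounded to the nearest minute.
backup_minutes() {
    awk -v mb="$1" -v kps="$2" 'BEGIN { printf "%.0f\n", mb * 1024 / kps / 60 }'
}

# Example: 1.7GB of data at the uncompressed rates quoted above
backup_minutes 1700 150      # DDS1, 150K/sec: 193 minutes
backup_minutes 1700 1228.8   # DDS3, 1.2MB/sec: 24 minutes
```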
Fast Backup
As I type this section, I'm in the middle of upgrading the Challenge S server I run with the latest Varsity set (Aug98). I can't be bothered waiting hours for a DAT to finish (DDS3 not arrived yet), so I'm using a slightly different method...
The user files on my system reside in /home, which is a 4.5GB Fast-Wide external SCSI disk. At present, this disk has 2.1GB of space free. The 4.5GB UltraSCSI system root disk only has 1.7GB used, so I figured, what the hell, why not just back up the root disk onto the option disk?
However, one cannot use the normal disk cloning procedure to do this because the /home contents would be copied too (all 2GB of it). Thus, the items to be backed up must be specified precisely instead of using any kind of catch-all wildcard (usually the '.' character in the tar command).
After unmounting all non-/home NFS directories and /proc, this is the command sequence I entered ('yoda' is the name of the server):
cd /home
mkdir yodabackup
cd /
tar cvBpf - .A* .S* .X* .a* .c* .d* .e* .g* .i* .j* .l* .n* .p* .r* .s* .v* .w* .z*
CDROM bin debug dev dumpster etc floppy lib lib32 mapleson opt proc sbin stand tmp unix
usr var | (cd /home/yodabackup; tar xBpf -)
The tar command is one long line - it's split onto three lines here in order to be more readable (for the curious, my own account exists in /mapleson on my office admin Indy). This command archived everything into /home/yodabackup, without archiving any of /home itself at all. And since this was a disk-to-disk transfer, it was much faster than backing up to DAT.
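Typing that list of names by hand is error-prone. One possible alternative, sketched below for a POSIX sh, builds the list of top-level entries automatically (dot-files included), drops the ones named as exclusions, and then uses the same tar pipe. The function name and arguments are hypothetical; the B flag is left out to keep the sketch portable, and names containing spaces or wildcard characters are not handled.

```shell
#!/bin/sh
# copy_except SRC DEST "NAME ..."
# Copy every top-level entry of SRC into DEST via a tar pipe, skipping the
# space-separated top-level NAMEs. DEST must not lie inside anything being
# copied (e.g. /home/yodabackup is safe only because home is excluded).
copy_except() {
    src=$1; dest=$2; exclude=$3
    mkdir -p "$dest" || return 1
    list=""
    for entry in "$src"/.[!.]* "$src"/*; do
        # Skip glob patterns that matched nothing.
        if [ ! -e "$entry" ] && [ ! -L "$entry" ]; then
            continue
        fi
        name=${entry##*/}
        skip=no
        for ex in $exclude; do
            if [ "$name" = "$ex" ]; then
                skip=yes
            fi
        done
        if [ "$skip" = no ]; then
            list="$list $name"
        fi
    done
    [ -n "$list" ] || return 1
    # $list is deliberately unquoted so it splits into separate names.
    (cd "$src" && tar cf - $list) | (cd "$dest" && tar xpf -)
}
```

On a server set up like the one described here, the equivalent call would be copy_except / /home/yodabackup home, run after unmounting /proc and the non-/home NFS directories as before.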
If your system's software isn't particularly customised in any way, ie. it's just the system setup that's important (NIS, NFS, DNS, /etc/hosts, etc.), then a very quick way to 'back up' one's system is to back up only the key files which make the system unique, ie. copy the /etc, /var/netls, /var/flexlm, /var/www and /var/yp directories somewhere safe, or put them onto DAT (your own system might have other important directories, but these five are the main ones).
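A minimal sketch of that quick 'essentials only' backup, assuming a POSIX sh and tar: it collects whichever of the five directories exist into a single archive with relative paths, so the archive can later be unpacked into a scratch area for comparison rather than straight over the new system. The function name is hypothetical, and ROOT is a parameter only so the sketch can be tried on a scratch tree; on a real system it would be /.

```shell
#!/bin/sh
# backup_essentials ROOT OUTFILE
# Archive the key configuration directories under ROOT into OUTFILE,
# skipping any that don't exist on this system.
backup_essentials() {
    root=$1; out=$2
    list=""
    for d in etc var/netls var/flexlm var/www var/yp; do
        if [ -d "$root/$d" ]; then
            list="$list $d"
        fi
    done
    [ -n "$list" ] || return 1
    # Relative paths, so the archive can be unpacked anywhere. The output
    # redirection is opened before the cd, so a relative OUTFILE still works.
    (cd "$root" && tar cf - $list) > "$out"
}
```

Other directories that matter on a particular system can simply be added to the for list.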
If anything goes wrong and a reinstall is required, one would reinstall the OS from the base CDs and then use the contents of the backed-up /etc, /var/yp, etc. directories to change the system setup back to what it should be. Key files in /etc are hosts, sys_id, bootptab and a few others, but this technique grabs them all just to be sure. It's a very quick way of backing up the essential data, but a reinstall would require one to sit through the whole CD installation process (not so bad if you have a fast CDROM and a good CPU).
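When it comes to putting the setup back, one way to see exactly what needs changing is to unpack the saved copy somewhere out of the way and compare it with the freshly installed files. The sketch below assumes a diff that supports -r and -q (GNU and BSD diff do; check the local diff man page otherwise), and the function name is hypothetical.

```shell
#!/bin/sh
# changed_files OLD NEW
# Report files that differ between a backed-up tree (OLD) and the freshly
# installed one (NEW), plus files present in only one of them.
changed_files() {
    diff -rq "$1" "$2"
    # diff exits 0 (same), 1 (differences) or 2+ (trouble); only 2+ is an error.
    [ $? -le 1 ]
}
```

The output then serves as a checklist of edits to make by hand, rather than blindly copying old files over new ones.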
For example, my office admin Indy doesn't have anything special on its root disk (all the really important sysadmin information is on my own external /mapleson-mounted 2GB disk), so this is the method I use to 'back up' the system. In my case, if anything goes wrong, I can clone the disk from one of the other lab machines that has the same basic installation, and then use the key backed-up files to restore the system to its correct state. A complete reinstall would thus take less than an hour, instead of 5 hours from a DDS1 DAT.
Conclusions
If you have a spare disk, or enough room on a disk already in use, consider exploiting it for temporary or fast backups (but not permanent backups).
If your system doesn't have anything special installed, and there's another machine from which a similar setup can be cloned, consider backing up just the essential files which show how the system was configured (/etc, /var/yp, etc.). However, make sure you've written down the procedures you'd have to go through in order to make the changes (configuring NIS, etc.). I have a script that does much of the work for me, an example of which is shown above in the section on 'Installing an Operating System'.
Always have a fast CDROM! It'll make a difference if/when you have to install many items from multiple CDs. I've personally bought a Toshiba 32X CDROM for my own Indigo2 at home (model no. CD-XM-6201B). Complete information on this model of Toshiba CDROM is available on their web site.
I'm lucky of course since I have spare Indys from which to temporarily shunt memory around. Incidentally, installing patch files seems to grab even more RAM on occasion (use gmemusage to watch what happens during the installation of patch 2262). Even with 64MB, rqsall was grabbing over 40MB at one point and kept swapping out - time to up it to 96MB while I get these patches done...
Note that /usr/lib32 should not be NFS-mounted with 6.5. Thanks to Pim van Riezen (pim@webcity.nl) for this information.
Of course, plenty of other things could be NFS-mounted as well, but many products have their software elements broadly distributed across several directories and so are harder to server-mount with a single NFS mount point. The above items just happen to capture entire software products in one go, especially subsystems like ProDev.
Annoyingly, one of the largest directories one might want to NFS-mount, /usr/lib/nonshared, is not worth attempting. The reason is that many software products put their non-shared libraries in /usr/lib instead of /usr/lib/nonshared. Typical! I suppose one could NFS-mount it anyway, but not all non-shared libraries would be 'captured' this way, so watch out: some non-shared libraries are huge (eg. ImageVision).
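To get a feel for how much would escape such a mount, one could list the archive libraries sitting outside /usr/lib/nonshared. This is only a sketch: it assumes non-shared libraries can be recognised by their .a suffix (a simplification) and a find that supports -path, and the function name is hypothetical.

```shell
#!/bin/sh
# stray_archives LIBDIR
# List *.a files under LIBDIR that are NOT inside LIBDIR/nonshared,
# largest first, with sizes in KB.
stray_archives() {
    libdir=$1
    find "$libdir" -path "$libdir/nonshared" -prune -o -type f -name '*.a' -print |
        while IFS= read -r f; do
            du -k "$f"
        done | sort -rn
}
```

For example, stray_archives /usr/lib would show which large libraries an NFS-mounted /usr/lib/nonshared would miss.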