[Future Technology Research Index] [SGI Tech/Advice Index] [Nintendo64 Tech Info Index]


[WhatsNew] [P.I.] [Indigo] [Indy] [O2] [Indigo2] [Crimson] [Challenge] [Onyx] [Octane] [Origin] [Onyx2]

Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

(check my current auctions!)

Miscellaneous System Problems

Last Change: 04/Oct/2007

There is a vast amount of information out there on solving system problems, eg. search the DejaNews archives, browse the FAQ files, your own online help sources, etc. This page just covers the problems I have encountered, or those which people have often emailed me about.


Jot is corrupting my files!

This problem only occurs under IRIX 6.2 when jot is accessing a file across an NFS-mounted directory. If the user's files reside on a remote server disk, the symptoms can look suspiciously like a physical disk problem. A user will save a file, but the file's contents get completely erased, or a user loads a file into jot only to see the contents corrupted, or parts of it missing.

The solution is to install patch 2051 (Jot fix for mmapping), which changes the way jot behaves when accessing files over NFS-mounts. The patch is available from SGI's ftp site(s) at:

  ftp://patches.sgi.com/support/patchset/
  ftp://patches.sgi.com/support/patchset/.allrecpatch/

If you have patch CD sets (eg. 'IRIX 6.2 Required/Recommended Patches' and the two 'Fix on Fail' CDs that go with it), you'll find patch 2051 on the first of the 'Fix on Fail' CDs.

As far as I know, there is no need to install patch 2051 on a stand-alone system.

NOTE: an alternative solution is to upgrade to 6.5.
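
Before hunting for the patch, it's worth checking whether it is already installed. The following is only a minimal sketch: it assumes that patch products appear in the installed-software listing with names of the form patchSG followed by a zero-padded seven-digit number (eg. patchSG0002051), and the function name has_patch is made up for this example.

```shell
#!/bin/sh
# has_patch NUMBER
# Reads an installed-software listing on stdin and succeeds (exit 0) if
# the given patch appears in it.
# Assumption: patch products are named patchSG<7-digit number>.
has_patch() {
    name=`printf 'patchSG%07d' "$1"`
    grep "$name" >/dev/null
}

# Possible use on the IRIX system itself (not run here):
#   versions -b | has_patch 2051 && echo "patch 2051 is installed"
```

The function only reads text, so it can be tried out on a saved copy of the listing first.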


I keep getting a PANIC: KERNEL FAULT! Software detected SEGV

There are various different types of kernel fault. The solution described here only applies if the kernel fault message you're getting looks like this:

    <0>PANIC: KERNEL FAULT
    PC: 0x8801ad5c ep: 0xffffc710
    EXC code:128, `Software detected SEGV '
    Bad addr: 0x0, cause: 0x8<CE=0,EXC=RMISS>
    sr: 0xff03<IM8,IM7,IM6,IM5,IM4,IM3,IM2,IM1,IPL=0,MODE=KERNEL,EXL,IE>

and the operating system you're using is IRIX 6.2. I don't know if the problem described here occurs with other OS versions.

The problem lies with the kernel rollup patch (the latest version of which is patch 3156 from the September 1998 'Required/Recommended' CD). Apparently, all versions of this patch that are later than patch 2777 will cause the above kernel fault to occasionally occur on any IRIX 6.2 system which has a CPU that does not have secondary cache. This means Indy will be affected more than any other system because several zero-L2 CPUs were used in Indy:

All of the lab Indys I run have R4600PC 133MHz CPUs (a very common choice made by academic institutions that purchased Indys). The kernel fault was occurring two or three times a day on different machines; the error report in /var/adm/SYSLOG (and in the crash analysis report in /var/adm/crash) was always the same, ie. the 5 lines of text shown above (plus other info), even though the program which was running at the time the fault occurred was never the same.
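
To get a rough idea of how often a machine is hitting this particular fault, the log can be searched for the signature text. This is a minimal sketch; the function name count_segv_panics is hypothetical, and it simply counts matching lines rather than distinct crashes.

```shell
#!/bin/sh
# count_segv_panics [FILE]
# Prints how many lines of FILE contain the "Software detected SEGV"
# signature. FILE defaults to /var/adm/SYSLOG, the log named in the text.
count_segv_panics() {
    grep -c 'Software detected SEGV' "${1:-/var/adm/SYSLOG}"
}

# Possible use (not run here):
#   count_segv_panics
#   count_segv_panics /var/adm/oSYSLOG    # hypothetical older log file
```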

RxxxxSC systems shouldn't be affected by the problem described here, so the very latest kernel rollup patch (3156) should be ok if you're using any of the following CPU types with IRIX 6.2:

If you've not yet installed any patches and are about to (eg. you've just reinstalled a system and haven't yet used the 'Required/Recommended' CD), then follow this procedure:

  1. If the Required/Recommended CD you're going to use is dated June 1998, then you've nothing to worry about. The kernel patch present on that CD is 2777, which apparently does not cause a problem. Carry out the patch set installation as you normally would and proceed to step 3.

  2. You have two choices at this point. You can either install patch 2777 first from somewhere else and then install all patches except the kernel rollup, or you can install all patches except the kernel rollup and then install patch 2777 from somewhere else. In detail, the two possibilities are:

    I personally had problems with the second method: rebooting after the installation caused a kernel configuration error.

    However, the first method worked ok.

  3. Exit Software Manager.

Of course, you may wish to install some patches from the 'Fix on Fail' CDs before exiting swmgr. The only such patch I always install is patch 2051 (Jot fix for mmapping); other patches may be important to you, but only you will know which ones.
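
The rule behind the procedure above can be restated as a small check. This is just a sketch of the reasoning for IRIX 6.2 as I've described it, not anything official: the function name rollup_risk is made up, and it assumes that a later version of the kernel rollup always has a larger patch number.

```shell
#!/bin/sh
# rollup_risk PATCH_NUMBER HAS_L2
#   PATCH_NUMBER: number of the kernel rollup patch (eg. 2777 or 3156)
#   HAS_L2:       "yes" if the CPU has secondary cache, otherwise "no"
# Prints "ok" or "risky".
# Assumption: a later rollup patch always has a larger number.
rollup_risk() {
    if [ "$2" = "yes" ] || [ "$1" -le 2777 ]; then
        echo ok
    else
        echo risky
    fi
}
```

For example, rollup_risk 3156 no prints risky (the R4600PC case), while rollup_risk 2777 no and rollup_risk 3156 yes both print ok.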


What if I've already installed a post-2777 patch, eg. 3156?

My experience with this wasn't good.

13 of the 16 Indys in the lab I run have 549MB disks. To save space, I'd removed the patch histories, so patch 3156 couldn't be removed. Thus, for these Indys I had to reinstall them all. This was a trivial procedure: binary disk cloning allowed me to complete the entire reinstallation of all 13 machines in not much more than 2 hours, ie. a single reinstalled disk clones to 2, 2 to 4, 4 to 8, then the final 5 are done (40 mins to create the first disk and about 20 mins for each cloning step).
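
The timing works out as follows. This sketch just does the arithmetic for the doubling scheme described above; the function names are made up, and it assumes every cloning step takes the same time no matter how many disks are being copied in parallel.

```shell
#!/bin/sh
# clone_steps N
# Prints the number of doubling steps needed to go from 1 disk to N disks.
clone_steps() {
    have=1
    steps=0
    while [ "$have" -lt "$1" ]; do
        have=`expr $have \* 2`
        steps=`expr $steps + 1`
    done
    echo "$steps"
}

# clone_minutes N FIRST STEP
# Prints the total time in minutes, given the time to build the first
# disk (FIRST) and the time for each cloning step (STEP).
clone_minutes() {
    n=`clone_steps "$1"`
    expr "$2" + "$n" \* "$3"
}
```

For the 13 Indys, clone_steps 13 prints 4 and clone_minutes 13 40 20 prints 120, ie. the two hours mentioned above.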

The other 3 Indys have 2GB disks with a lot more software installed (full IDO and Varsity). These were the machines that I had problems with. I ran swmgr, chose 'Manage Installed Software', selected patch 3156 and executed the removal, which was successful. Afterwards, I decided to reboot the system just to make sure everything was ok. Maybe this was the wrong thing to do. Maybe I should have installed patch 2777 at that point, ie. as soon as patch 3156 had been removed. Either way, the system produced a kernel configuration error during the reboot and was unable to successfully reconfigure the operating system. It looked as if the patch installation history had not changed the system back to the state it was in before the patch was installed. Perhaps this happened because the patch was installed as part of a patch set - I don't know.

Anyway, the only choice was to reinstall one of the 2GB disks and clone it to the remaining two. The procedure isn't complex, it just takes much longer because three times more data needs to be installed (at the time, I had to use an old 2X CDROM). Unfortunately, I had executed the above patch removal procedure on all 3 machines with 2GB disks at the same time, so I've no idea whether installing patch 2777 immediately after removing patch 3156 would indeed have been the correct thing to do.

If you find yourself in a position where you have to remove patch 3156, and discover that installing patch 2777 immediately afterwards does work ok, then please let me know because you could save a lot of Indy users the hassle of a reinstallation if they encounter this problem.


WARNING: NFS server: ec0 output queue full

If you are receiving the above warning message in your SYSLOG file, then you need to alter one of the kernel parameters that determines the network packet size. Usually, the relevant parameter (called nfs3_default_xfer) is set to 32K - this is often too high. It needs to be set to 16K or 8K (always try 16K first). Follow this procedure to make the necessary changes:

And that's it!

Note: the above procedure must be carried out on all the systems on the network.


How can I stop my monitor from going into power-saving?

Enter this command:

   jot /var/X11/xdm/Xlogin

Search for three lines that look like this:

   #if [ -x /usr/bin/X11/xset ] ; then
   #    /usr/bin/X11/xset s 600 3600
   #fi

The lines may or may not be uncommented, and note that the numerical values 600 and 3600 might vary between systems, eg. IRIX 6.5 on O2 has 1200 instead of 3600.

Make sure the lines are uncommented, and change the middle line so that the three lines look like this:

   if [ -x /usr/bin/X11/xset ] ; then
       /usr/bin/X11/xset s 0 0
   fi

The xset man page says that supplying xset with zero values makes the monitor deactivate its power saving feature. It works! I made these changes because I want passing new students to see the SGI monitors in an activated state and hopefully become interested. With power-saving turned on, students usually just see monitors with black screens and assume the SGIs are turned off and not for their use, which isn't true.
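
If the same change has to be made on many machines, the edit can be scripted instead of done by hand in jot. The following is a minimal sketch, not a tested site procedure: the function name disable_blanking is made up, and it assumes the three lines start at the beginning of the line (optionally with a leading '#') exactly as shown above.

```shell
#!/bin/sh
# disable_blanking
# Filter: reads Xlogin text on stdin, writes the edited text to stdout.
# Within the xset "if ... fi" block it strips one leading '#' from each
# line and replaces the two timeout values with zeros.
# All other lines pass through unchanged.
disable_blanking() {
    sed '/^#*if \[ -x \/usr\/bin\/X11\/xset ]/,/^#*fi/{
s/^#//
s/xset s [0-9][0-9]* [0-9][0-9]*/xset s 0 0/
}'
}

# Possible use, keeping a backup first (not run here):
#   cp /var/X11/xdm/Xlogin /var/X11/xdm/Xlogin.orig
#   disable_blanking < /var/X11/xdm/Xlogin.orig > /var/X11/xdm/Xlogin
```

Because it works as a filter, the result can be inspected before anything is overwritten.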


I don't have the root password for my Indigo system, but there's a PROM password too - how do I reset it?

I've often been asked this and usually could not offer an answer. However, thanks to the persistence of one person, I now have a solution. Thanks go to Alexandra L. Carter (carter@goodnet.com) who needed to reset the PROM password on her Indigo and didn't stop hunting until she found the answer. Alexandra says:

"I got the correct information from someone, and got it fixed! What you do is look at the BACKPLANE, and down there in the lower left corner is an 8-pin IC in a socket. You yank that, power up the unit, and get into the PROM monitor, which won't have the PW standing in your way because you physically removed it. Then, this is tricky, you stick that little NVRAM chip back in there with the machine powered up and in PROM Monitor, and when the NVRAM's back in its socket then you issue the resetpw command and it's out of the chip for good! You usually have to pull at least the CPU board to get that little chip out, then stick it on the end of a ruler or something with some doublesided tape and have its little legs all lined up to go back into the socket smoothly. It's tricky and borders on computer abuse but it works, my opinion is these IRIS Indigos are built like tanks and this one has taken any amount of abuse not only at my hands but also in filtering through the surplus electronics market. Once I was able to get into PROM Monitor I reinstalled IRIX and that was a breeze. This is a neat little machine. Only thing is, I wish I had c/c++ compilers for it. Alex."

However, note that using the above method can trash the Indigo's MAC address. So, unless someone can come up with a better way of removing the PROM password, Alex gives these further words of advice:

I haven't been able to find another way to get around this w/o trashing the MAC address and in fact I did this operation assuming that it could get trashed. The only two solutions I can think of are: Get a new backplane and solve the problem that way, or get an EEProm editor/programmer and using a good NVRAM chip as a master, reburn the chip with the locked-in PW. ... Alex


How do I change the window focusing on IRIX 6.2?

IRIX 6.5 allows one to change window focusing using the Toolchest's Desktop->Customise->Windows option, but IRIX 6.2 doesn't have this menu option, so here's how to do it.

Load the following file into an editor:

      /usr/lib/X11/app-defaults/4DWm

In the section entitled, "4Dwm Specific Appearance and Behavior Resources", look for the line which begins with:

      *keyboardFocusPolicy:

By default it says 'pointer'. Change it to 'click'. Save, then log out and back in again.


How can I install IRIX 5.3 on an R4K/250 Indigo2?
It keeps giving me a bus error after booting from the boot CD

The version of IRIX 5.3 on most 5.3 boot CDs has a bug concerning the support of CPUs which have 2MB L2 cache. Trying to boot on an R4K/200 (2MB) will also cause a bus error.

There are two solutions to this:

1. Install the system using a lesser CPU, such as an R4K/200 (1MB), put on all the software you need, and then after the latest 5.3 patch set has been applied, put back the original R4K/250 CPU. The patch update fixes the 2MB L2 bug.

2. Do the installation using the release of 5.3 which is specifically called, "IRIX 5.3 with 2MB Cache Support". As the name suggests, this release does not have the relevant bug.


How can I set up my system to dual-boot between 5.3 and 6.5? (or other combination)

My thanks to Mark Mitchell (mark@k-par.co.uk) for the very useful information contained in his past post on comp.sys.sgi.admin.

Set the 6.5 disk to be on SCSI ID 1 and install normally (it's a little less complicated if the 6.5 disk is on ID 1; if your OS pair doesn't include 6.5, then it doesn't matter which way round you set it up). Disconnect the drive, connect the other disk on ID 2 and install 5.3 (or 6.2, etc.) Once done, create a script to allow the system to change the relevant nvram variables, namely SystemPartition and OSLoadPartition, ie. the script should contain these two lines at some point:

  nvram SystemPartition 'scsi(0)disk(1)rdisk(0)partition(8)'
  nvram OSLoadPartition 'scsi(0)disk(1)rdisk(0)partition(0)'

or if you're using something like an Octane and are trying to set up dual-boot 6.4/6.5, the lines for the disk on ID 1 would be:

  nvram SystemPartition 'xio(0)pci(15)scsi(0)disk(1)rdisk(0)partition(8)'
  nvram OSLoadPartition 'xio(0)pci(15)scsi(0)disk(1)rdisk(0)partition(0)'

Enter 'nvram' on its own to see the exact format you should use, eg. an O2 will use slightly different definitions.
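
Since the two values differ only in the SCSI ID and the partition number, they can be generated rather than typed out each time. This is a small sketch assuming the Indigo2-style format shown above on controller 0; the function names scsi_part and show_switch are made up, and show_switch only prints the commands so they can be checked against the output of nvram before being used.

```shell
#!/bin/sh
# scsi_part ID PARTITION
# Prints the PROM path for a partition on the disk with the given SCSI ID
# on controller 0, in the Indigo2-style format used in the text.
scsi_part() {
    echo "scsi(0)disk($1)rdisk(0)partition($2)"
}

# show_switch ID
# Prints (does not run) the two nvram commands that select the disk:
# partition 8 for SystemPartition, partition 0 for OSLoadPartition.
show_switch() {
    sys=`scsi_part "$1" 8`
    os=`scsi_part "$1" 0`
    echo "nvram SystemPartition '$sys'"
    echo "nvram OSLoadPartition '$os'"
}
```

For example, show_switch 2 prints the two lines needed to select the disk on ID 2. Systems such as Octane or O2 use a different prefix, so the format string would need changing there.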

After the script changes the nvram settings, just include the reboot command to reboot the system, and that's it!

Note that it's a good idea to include a confirmation question in the script, and some kind of time delay so that one can close any applications and logout if required, though this isn't essential. If you're creating the script for someone else to use then it's essential to make it clear what is happening, and offer the user the chance to cancel the operation.


But one of my disks is already installed and using the wrong ID. How do I change it? I tried and it doesn't boot!

I initially ran into this problem, thinking that after installing an OS on a disk using SCSI ID 1, all I had to do was change the nvram settings, change the SCSI ID and the disk would boot on ID 2 quite happily, but this isn't the case. The reason is that the /dev directory contains three device files which point to specific disk devices, namely root, rroot and swap. After changing the nvram settings and the SCSI ID, the disk can't boot because these device files will still point to devices on SCSI ID 1. In my case, I had the weird situation of the 5.3 disk on ID 2 trying to boot partly from the 6.5 disk which was now on ID 1.

The solution is to erase and recreate the device files using mknod, based on the major and minor numbers that are shown under /dev/rdsk.

Assuming the situation is as I've described, ie. a disk installed using ID 1 (in this case containing IRIX 5.3) is to be changed so that it boots from ID 2, then here's how to do it...

With the disk still on the original SCSI ID 1, boot up and login as root.

Examine the details of the relevant files under /dev/rdsk thus:

  # ls -l /dev/rdsk/dks0d2s0
      brw-------    2  root   sys    128,  32  Jul 17 21:26  dks0d2s0
  # ls -l /dev/rdsk/dks0d2s1
      brw-------    2  root   sys    128,  33  Jul 17 21:26  dks0d2s1

Here, the major/minor numbers for the system partition are 128 and 32, while the numbers for the swap partition are 128 and 33.
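
To avoid copying the numbers by hand, they can be extracted from the listing. This is a minimal sketch which assumes the column layout shown above (field 5 is the major number followed by a comma, field 6 is the minor number); the function name major_minor is made up for this example.

```shell
#!/bin/sh
# major_minor
# Reads one "ls -l" line for a device file on stdin and prints
# "MAJOR MINOR".
# Assumption: field 5 is "MAJOR," and field 6 is MINOR, as in the
# listing in the text.
major_minor() {
    awk '{ print $5, $6 }' | tr -d ','
}

# Possible use (not run here):
#   ls -l /dev/rdsk/dks0d2s0 | major_minor
```

With the first listing line above as input, this prints 128 32.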

Now delete the unwanted old device files and create the new ones:

  cd /dev
  rm root rroot swap
  mknod root b 128 32
  mknod rroot c 128 32
  mknod swap b 128 33

By default, this creates the files with slightly incorrect permissions, so do the following to make the necessary changes:

  chmod go-r root rroot
  chmod o-r swap

And that's it! Shut down (I did a hard power off to make sure the system didn't have a chance to alter anything), change the SCSI ID of the disk to 2 and then power up.

At least that's the theory anyway. Some people reckon that the kernel itself includes some values which refer to the SCSI ID the system should be using, so the advice is to rebuild the kernel too. I found that a straight rebuild attempt with autoconfig did nothing - the kernel was said to be current. Thus, just in case and to be absolutely sure, I forced it to create a new kernel:

  /etc/autoconfig -v -f

and then powered off, etc. It worked great! In my case, using an Indigo2, all I had to do was move the disk from the bottom slot to the top slot which changes the ID automatically to 2.


Example 5.3/6.5 Dual-Boot Scripts

For those who aren't familiar with writing scripts, here are the examples I used for a 5.3/6.5 dual-boot setup on an Indigo2. All scripts were stored in /usr/local/bin and all related text files were in /usr/local/doc.

For the 6.5 disk on SCSI ID 1, the script is called 'switch5.3' and contains:

  #!/bin/sh
  case `xconfirm -c -header "Switch OS to IRIX 5.3..." -icon question -b No -B Yes -file /usr/local/doc/switch5.3.txt`
  in
  Yes)
    echo "Ok! Rebooting the system to run IRIX 5.3 in 30 seconds..."
    echo "Close all applications and logout NOW!"
    nvram -v SystemPartition 'scsi(0)disk(2)rdisk(0)partition(8)'
    nvram -v OSLoadPartition 'scsi(0)disk(2)rdisk(0)partition(0)'
    sleep 10 && echo "T minus 20 seconds..." && sleep 10 && echo "T minus 10 seconds..." && sleep 10 && echo "Rebooting!..." && reboot&
  ;;

  No)
    echo "Switch to IRIX 5.3 cancelled."
  ;;
  esac

and the text file 'switch5.3.txt' contains:


  Are you sure you want to reboot
    the system to use IRIX 5.3?


For the 5.3 disk on SCSI ID 2, the script is called 'switch6.5' and contains:

  #!/bin/sh
  case `xconfirm -c -header "Switch OS to IRIX 6.5..." -icon question -b No -B Yes -file /usr/local/doc/switch6.5.txt`
  in
  Yes)
    echo "Ok! Rebooting the system to run IRIX 6.5 in 30 seconds..."
    echo "Close all applications and logout NOW!"
    nvram -v SystemPartition 'scsi(0)disk(1)rdisk(0)partition(8)'
    nvram -v OSLoadPartition 'scsi(0)disk(1)rdisk(0)partition(0)'
    sleep 10 && echo "T minus 20 seconds..." && sleep 10 && echo "T minus 10 seconds..." && sleep 10 && echo "Rebooting!..." && reboot&
  ;;

  No)
    echo "Switch to IRIX 6.5 cancelled."
  ;;
  esac

while the text file 'switch6.5.txt' contains:


  Are you sure you want to reboot
    the system to use IRIX 6.5?


I also included two extra scripts, one on each disk, to allow for an immediate change and reboot from one disk to another. These show what must be done at a minimum in order for the mechanism to work.

On the 6.5 disk, a script called switch5.3fast:

  #!/bin/sh
  echo "Rebooting the system to run IRIX 5.3 NOW!..."
  nvram -v SystemPartition 'scsi(0)disk(2)rdisk(0)partition(8)'
  nvram -v OSLoadPartition 'scsi(0)disk(2)rdisk(0)partition(0)'
  reboot&

and on the 5.3 disk, a script called switch6.5fast:

  #!/bin/sh
  echo "Rebooting the system to run IRIX 6.5 NOW..."
  nvram -v SystemPartition 'scsi(0)disk(1)rdisk(0)partition(8)'
  nvram -v OSLoadPartition 'scsi(0)disk(1)rdisk(0)partition(0)'
  reboot&

Note that the script names are quite long compared to most UNIX commands because it is important to make 'dangerous' operations of this kind difficult to do by accident.


My screen is frozen. How can I unfreeze it?

If you are unable to remote login from another system and kill off the offending process, then do this: hold down Left-CTRL, Left-Shift, F12 and the '/' symbol key on the numeric keypad all at the same time. This will reset the X server, putting you back at the login prompt. This method is sometimes referred to as the Vulcan Death Grip.


I've upgraded my Octane with a known ok V6, but I get red lights on poweron and no display. What's wrong?

Note that you must have the later Cherokee PSU to use VPro graphics. I'm assuming here that your system's PSU has already been upgraded to the later type.

This can be a frustrating problem to solve. Swapping out frontplanes, using other V6s, etc., all have no effect. All the parts are known to be ok, but it still doesn't work.

The solution is to make sure the OS is reasonably up to date (eg. 6.5.26) and then reflash the system PROM before swapping in the V6. Do this by logging in normally as root and entering:

  flash -v

This situation will most often occur when upgrading an older non-VPro Octane, eg. an R10K/250 SI. If your system is more up to date, eg. an R12K/300 SE, then it's more likely the PROM will already be at a revision that understands the V6 ok.

Thanks to Emery Davis for this information.


How do I mkfs a disk under IRIX 5.3-with-XFS? The normal mkfs <device name> does not work.

IRIX 5.3 with XFS has a different syntax for using mkfs compared to later OS releases such as 6.2 and 6.5.x.

The use of fx is the same, eg. to repartition a disk as a system disk with an XFS file system, but the syntax for doing a mkfs on the disk is more verbose. Thus, for example, if you're used to the mkfs command being used like this under IRIX 6.2/6.5:

  mkfs /dev/dsk/dks0d2s7

then here is how you would do the same operation under IRIX 5.3 with XFS:

  mkfs -b size=4096 -d name=/dev/dsk/dks0d2s7 -l internal,size=1000b

or if the disk is smaller than 4GB:

  mkfs -b size=512 -d name=/dev/dsk/dks0d2s7 -l internal,size=1000b

The best thing to do is to set up a couple of simple scripts in /usr/local/bin to do all of the hard work, eg. a file called mkf containing:

  #!/bin/sh
  mkfs -b size=512 -d name=$1 -l internal,size=1000b

and another called mkfl (mkf large) for use with 4096 block size:

  #!/bin/sh
  mkfs -b size=4096 -d name=$1 -l internal,size=1000b

then you can use the scripts much as you would use mkfs under 6.2 or 6.5:

  mkf /dev/dsk/dks0d2s7
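
The two scripts could also be folded into one which picks the block size from the disk size. The following is only a sketch of that idea: the function name xfs_mkfs_cmd is made up, taking 4GB to mean 4096MB is my assumption, and it prints the command instead of running it so the result can be checked first.

```shell
#!/bin/sh
# xfs_mkfs_cmd DEVICE SIZE_MB
# Prints (does not run) the IRIX 5.3-with-XFS mkfs command for DEVICE,
# choosing 512-byte blocks below 4GB and 4096-byte blocks otherwise.
# Assumption: 4GB is taken as 4096MB.
xfs_mkfs_cmd() {
    if [ "$2" -lt 4096 ]; then
        bsize=512
    else
        bsize=4096
    fi
    echo "mkfs -b size=$bsize -d name=$1 -l internal,size=1000b"
}
```

For example, xfs_mkfs_cmd /dev/dsk/dks0d2s7 2048 prints the 512 block size form, while a size of 9000 gives the 4096 form.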

Presumably SGI changed how mkfs works with the release of IRIX 6 so that the extra detail is not needed. Indeed, the detailed nature of XFS has changed several times, so that, for example, a disk mkfs'd under IRIX 6.5 will not be mountable under IRIX 6.2. If you want to have an XFS file system which can be mounted by any XFS-capable OS version, then you should use IRIX 6.2 to fx and mkfs the drive.

NOTE: none of the above applies to dealing with EFS file systems, and remember that the standard November 1994 release of 5.3 does not support XFS.


Why isn't my O2Cam working? It should be ok.

Sometimes the metal foil around the connector on the back of the AV board can get tangled with the O2Cam's plug. Trim the foil away, then try reseating the camera cable plug firmly.


Why does it take ages for my desktop icons to appear?

The delay is most often caused by having 'nis' included in the host resolution order when it's not necessary, ie. when no NIS system is being used. The timeout can cause a delay of typically 30 to 60 seconds. To correct this, remove nis from the host resolution order in /etc/nsswitch.conf, ie. as root, edit the line so that it looks like this:

  hosts:                     files dns

If you don't make use of any DNS servers, then 'dns' can be removed as well.
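
For several machines, the hosts line can be edited with a filter rather than by hand. This is a minimal sketch: the function name drop_nis is made up, it assumes the line begins with hosts: at the start of the line, and it rebuilds that line with single spaces between the entries.

```shell
#!/bin/sh
# drop_nis
# Filter: reads nsswitch.conf text on stdin and writes it to stdout with
# the word "nis" removed from the hosts: line.
# Other lines are passed through unchanged.
drop_nis() {
    awk '/^hosts:/ {
        out = $1
        for (i = 2; i <= NF; i++)
            if ($i != "nis") out = out " " $i
        print out
        next
    }
    { print }'
}

# Possible use, keeping a backup first (not run here):
#   cp /etc/nsswitch.conf /etc/nsswitch.conf.orig
#   drop_nis < /etc/nsswitch.conf.orig > /etc/nsswitch.conf
```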

It is also a good idea to make sure the icon caches are up-to-date and cleaned, so enter as root:

  /usr/lib/desktop/cleanCache types
  /usr/lib/desktop/cleanCache layouts
  /usr/lib/desktop/flushCache types
  /usr/lib/desktop/flushCache layouts

Lastly, I find it helps a little if a file manager window is always open and minimised, so either use the Toolchest or enter:

  fm

