Octane Single-CPU SPEC95 Performance
Comparison Using Different R10000s

Last Change: 26/May/1998

SPEC's Introduction to SPEC95
SPECfp95 Analysis
SPECint95 Analysis

(Note: the 2D bar graphs shown here for the various SPEC95 tests have been drawn to the same scale)
(the graphs are also to the same scale as those given on other single-CPU comparison pages)

Octane Single-CPU SPECfp95 Performance Comparison

Objectives

This analysis examines how different R10000 CPUs perform in the same system, in this case Octane (I have separate pages dealing with how the same CPU performs in different systems).

Note that I do not have any SPEC95 data for the 175MHz R10000. Since many systems will be using this CPU, please contact me if you have any detailed SPEC95 data for R10K/175 (final base and peak averages are of little use; it's the detailed results I'm looking for).

As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that'll give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.

All source data for this analysis came from www.specbench.org.

Given below is a comparison table of available single-CPU R10000 SPECfp95 test results for Octane, covering 195MHz and 250MHz versions. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

          R10000   R10000     % Increase
          250MHz   195MHz    (195 -> 250)

tomcatv    29.4     25.3         16.2%
swim       46.3     40.6         14.0%
su2cor     11.2     9.64         16.2%
hydro2d    11.4     9.97         14.3%
mgrid      18.5     15.9         16.4%
applu      13.2     11.2         17.9%
turb3d     16.9     13.8         22.5%
apsi       16.0     12.8         25.0%
fpppp      37.1     29.7         24.9%
wave5      27.4     22.4         22.3%

[Figure: Octane SPECfp95 Comparison (views of the 3D Inventor graph)]

[Test Suite Description | 250MHz | 195MHz]

Next, a separate comparison graph for each of the ten SPECfp95 tests:

[Figures: 2D bar graphs comparing 250MHz and 195MHz for tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp and wave5]

Observations

Remember that the increase in clock speed from 195MHz to 250MHz is 28.2% (250/195 = 1.282). No one would expect a perfect scaling of speed, so a good result would be a 25% increase. Hence, one must examine whether each test achieves an increase as large as this or not. Since the R10K/250 has its L2 cache running at 2/3rds core speed, one must also examine whether any test benefits from the faster L2 cache.
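As a minimal sketch of that check, the following Python recomputes the percentage increase for each fp test from the scores in the table above and compares it with the clock increase. The 25% target is the 'good result' figure from the text; the names used (SCORES, TARGET, pct_increase) are just illustrative choices of mine, not anything from SPEC.

```python
# Octane SPECfp95 scores copied from the table above: (250MHz, 195MHz).
SCORES = {
    "tomcatv": (29.4, 25.3),
    "swim":    (46.3, 40.6),
    "su2cor":  (11.2, 9.64),
    "hydro2d": (11.4, 9.97),
    "mgrid":   (18.5, 15.9),
    "applu":   (13.2, 11.2),
    "turb3d":  (16.9, 13.8),
    "apsi":    (16.0, 12.8),
    "fpppp":   (37.1, 29.7),
    "wave5":   (27.4, 22.4),
}

TARGET = 25.0  # the "good result" figure from the text, in percent


def pct_increase(new, old):
    """Percentage increase going from old to new."""
    return (new / old - 1.0) * 100.0


# Increase in clock speed, 195MHz -> 250MHz.
clock_gain = pct_increase(250.0, 195.0)

# Increase in SPECfp95 score for each test.
gains = {name: pct_increase(fast, slow) for name, (fast, slow) in SCORES.items()}

# Round to one decimal place, as in the table, before comparing with the target.
reaching_target = [name for name, g in gains.items() if round(g, 1) >= TARGET]

if __name__ == "__main__":
    print(f"clock increase: {clock_gain:.1f}%")
    for name, g in gains.items():
        print(f"{name:8s} {g:5.1f}%  ({g / clock_gain:.0%} of the clock increase)")
    print("tests reaching the target:", reaching_target)
```

Expressing each gain as a fraction of the clock increase makes it easier to see at a glance which tests scale with the clock and which are being held back by something else.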

Going from 195MHz to 250MHz, six of the ten tests improve by only around 15% or so (14.0% to 17.9%); the other four manage 22% to 25%. Why do most of the tests fall so far short of 25%? From studying how the results for Origin change when moving from 195MHz to 250MHz, my explanation for this is as follows:

For any particular test, there will be an 'optimum' amount of L2 cache which that test ought to have in order for it to be executed at the full potential of the CPU. Since SPEC95 is a test that is now three years old, it would not be surprising if some of the tests find their optimum in the larger cache sizes which exist today. However, not all of the tests are like this. What one can say is that, out of the ten fp tests, some of them do not benefit much from a larger L2, ie. 1MB is enough, while others behave in a manner which suggests that, despite the increase in clock speed, the presence of 1MB L2 is not enough; my rationale for this is that these are the tests which go on to show much greater performance improvements when comparing 195MHz to 250MHz on 4MB L2 Origin. There is a good correlation between the results and whether or not a test shows an improvement which is similar to the clock increase, given the amount of L2 cache present.

Why is this relevant? The important point here is identifying what it is about a test which will improve it the most: higher clock speed, faster/larger L2 cache, or both. In other words, there is little point in upgrading a CPU from 195MHz to 250MHz if the task concerned is one which is limited more by L2 cache size & speed, rather than clock speed.

Or, put another way, I contend that it is perfectly possible to have the following scenario without even realising it: imagine two tasks, A & B; A is run on a 4MB L2 Origin and is clock speed limited, not L2 limited; B is run on a 1MB L2 Octane and is L2 limited. Swapping the tasks over, ie. running A on Octane and B on Origin, would dramatically improve the performance of task B by 15% to 20%, whilst task A would carry on as normal, or at worst would slow down by a mere 5% or so. Thus, if you are an administrator with a variety of systems, try moving your tasks around from system to system: you may find a better way of allocating them to take best advantage of the available hardware.

Thus, some of the fp tests do not need a larger L2 cache (ie. 1MB is enough). These tests will show above-average performance improvements when upgrading from 195MHz to 250MHz on both Octane and Origin (ie. the test will take full advantage of the higher clock speed and the faster L2 cache).

Other tests need more than 1MB L2 in order to execute in an optimum manner. When upgrading from 195MHz to 250MHz, these tests will show a larger performance improvement when the upgrade concerns Origin compared to Octane (because on Octane the less-than-optimum L2 size will still be a limiting factor).

The following table, comparing the percentage improvements between 195MHz to 250MHz R10000s for Octane and Origin, shows this effect quite clearly:

                   Octane        Origin
                 % Increase    % Increase

      tomcatv       16.2%         28.6%
      swim          14.0%         21.4%
      su2cor        16.2%         35.7%
      hydro2d       14.3%         31.8%
      mgrid         16.4%         25.0%
      applu         17.9%         23.1%
      turb3d        22.5%         26.8%
      apsi          25.0%         35.3%
      fpppp         24.9%         27.7%
      wave5         22.3%         32.2%

Of most note are tomcatv, su2cor, hydro2d, mgrid, apsi and wave5. These show large jumps for Origin compared to Octane. Sure enough, if you examine the R10K/195 and R10K/250 performance comparison pages, you'll see that these are precisely the tests which benefit from a larger L2 when the CPU in different systems is actually at the same clock speed. On Origin, with its large L2, these tests can take full advantage of the faster L2 access speed as well as the higher clock speed.
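One rough way to pick these tests out of the table is to rank the fp tests by the gap between the Origin and Octane increases. The sketch below does this in Python; note that the 8 percentage point cut-off (GAP_THRESHOLD) is purely my own assumption for illustration, not a figure with any special meaning.

```python
# Percentage increases (195MHz -> 250MHz) copied from the table above:
# (Octane, Origin).
INCREASES = {
    "tomcatv": (16.2, 28.6),
    "swim":    (14.0, 21.4),
    "su2cor":  (16.2, 35.7),
    "hydro2d": (14.3, 31.8),
    "mgrid":   (16.4, 25.0),
    "applu":   (17.9, 23.1),
    "turb3d":  (22.5, 26.8),
    "apsi":    (25.0, 35.3),
    "fpppp":   (24.9, 27.7),
    "wave5":   (22.3, 32.2),
}

# Assumed cut-off, in percentage points, for calling a gap "large".
GAP_THRESHOLD = 8.0

# Origin increase minus Octane increase, in percentage points.
gaps = {name: origin - octane for name, (octane, origin) in INCREASES.items()}

# Tests ordered from largest gap to smallest.
ranked = sorted(gaps, key=gaps.get, reverse=True)

# Tests whose gap exceeds the assumed cut-off, in table order.
large_gap = [name for name in INCREASES if gaps[name] > GAP_THRESHOLD]

if __name__ == "__main__":
    for name in ranked:
        print(f"{name:8s} {gaps[name]:5.1f} points")
    print("large gap:", large_gap)
```

A different cut-off would move the borderline tests (swim and mgrid are the closest to it) in or out of the list, so treat the threshold as a knob to play with rather than a hard rule.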

The lesson to be learned here is that an upgrade decision shouldn't be an automatic affair. R10K/250 may be a 28% increase in clock speed over R10K/195, but for Octane users it's entirely possible that they'd only see a performance improvement of around 15% for such an upgrade because of the continuing use of a smaller L2 on Octane. If an Octane user's task comes under this category, then they'd be better off waiting for a newer CPU release from SGI which offered a larger L2 as well as a higher clock speed, either a future R10000 or even R12000.

In other words: if you're an Origin user considering upgrading from 195MHz to 250MHz, it is highly likely that your task would show a greater performance improvement than that implied by the increase in clock speed. But if you're an Octane user, then you should definitely have some proper tests done first before deciding whether an upgrade is worth it.

Test first, decide later!

Octane Single-CPU SPECint95 Performance Comparison

Just as for the SPECfp95 analysis given above, you can download a 3D performance graph (gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective), etc.

The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages.

          R10000   R10000     % Increase
          250MHz   195MHz    (195 -> 250)

go         14.1     11.4         23.7%
m88ksim    14.1     11.3         24.8%
gcc        12.5     10.1         23.8%
compress   13.9     11.3         23.0%
li         11.9     9.59         24.1%
ijpeg      12.6     10.1         24.8%
perl       16.4     13.0         26.2%
vortex     13.8     11.2         23.2%

[Figure: Octane SPECint95 Comparison (views of the 3D Inventor graph)]

[Test Suite Description | 250MHz | 195MHz]

Next, a separate comparison graph for each of the eight SPECint95 tests:

[Figures: 2D bar graphs comparing 250MHz and 195MHz for go, m88ksim, gcc, compress, li, ijpeg, perl and vortex]

Compared to the fp tests, the results show a much lower variance and a surprisingly good uniform performance increase from R10K/195 to R10K/250 - even vortex behaves itself. However, none of the tests shows an increase that is greater than the simple ratio of clock speeds (250/195 = 1.28, ie. a 28% increase), which suggests to me - and this is just speculation on my part - that Octane's small 1MB L2 cache size could be a limiting factor, perhaps preventing the tests from taking full advantage of the faster L2 speed offered by R10K/250. If this is true, Origin should show larger increases for these tests. Here is a comparison:

           Octane        Origin
         % Increase    % Increase

go         23.7%         30.1%
m88ksim    24.8%         25.7%
gcc        23.8%         29.8%
compress   23.0%         32.7%
li         24.1%         28.5%
ijpeg      24.8%         26.5%
perl       26.2%         25.6%
vortex     23.2%         35.4%

In my opinion, there is a nice correlation here: the smaller the percentage increase is for Octane, the larger the percentage increase is for Origin. Put another way, the higher the increase is for Octane, the more likely it is that Origin will show much the same increase, one that is also less than the clock ratio (m88ksim, ijpeg and perl are the examples here).
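For anyone who wants to put a number on that correlation, here is a minimal Python sketch which computes the ordinary Pearson correlation coefficient between the two columns of the table above. A value close to -1 would mean a strong inverse relationship and a value near 0 would mean none. Bear in mind that eight data points is a very small sample, so the result is suggestive at best; the function and variable names are my own.

```python
from math import sqrt

# Percentage increases (195MHz -> 250MHz) copied from the table above, in
# the order: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex.
OCTANE = [23.7, 24.8, 23.8, 23.0, 24.1, 24.8, 26.2, 23.2]
ORIGIN = [30.1, 25.7, 29.8, 32.7, 28.5, 26.5, 25.6, 35.4]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(OCTANE, ORIGIN)

if __name__ == "__main__":
    print(f"Pearson r between Octane and Origin increases: {r:.2f}")
```

A negative r is consistent with the pattern described above, but it says nothing about the cause; the L2 explanation remains my speculation.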

This leads to the ironic conclusion that if an Octane upgrade shows a small performance improvement for an integer task, then a similar upgrade for Origin should show a much larger performance increase.

I've no way of testing this, and the above speculation could just be a phenomenon that's specific to the tests used in SPECint95. Still, as with other comparisons, it all goes to show that the more L2 one has, the better.
