This analysis examines how different R10000 CPUs perform in Octane, ie. how different R10000s perform within the same system (I have separate pages dealing with how the same CPU performs in different systems).
Note that I do not have any SPEC95 data for 175MHz R10000. Since many systems will be using this CPU, please contact me if you have any detailed SPEC95 data for R10K/175 (final base and peak averages are of little use; it's the detailed results I'm looking for).
As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that'll give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.
All source data for this analysis came from www.specbench.org.
Given below is a comparison table of available single-CPU R10000 SPECfp95 test results for Octane, covering 195MHz and 250MHz versions. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.
Octane SPECfp95 Comparison

Test       R10000 250MHz   R10000 195MHz   % Increase (195 -> 250)
tomcatv    29.4            25.3            16.2%
swim       46.3            40.6            14.0%
su2cor     11.2            9.64            16.2%
hydro2d    11.4            9.97            14.3%
mgrid      18.5            15.9            16.4%
applu      13.2            11.2            17.9%
turb3d     16.9            13.8            22.5%
apsi       16.0            12.8            25.0%
fpppp      37.1            29.7            24.9%
wave5      27.4            22.4            22.3%
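The % Increase column is simply the 250MHz result divided by the 195MHz result, minus one. As a quick check, here is a minimal Python sketch that recomputes it from the SPECfp95 figures in the table (the names `SPECFP95_OCTANE` and `pct_increase` are illustrative only, not part of any SPEC tool):

```python
# SPECfp95 results for Octane, copied from the table above:
# test -> (R10K/250 result, R10K/195 result)
SPECFP95_OCTANE = {
    "tomcatv": (29.4, 25.3),
    "swim": (46.3, 40.6),
    "su2cor": (11.2, 9.64),
    "hydro2d": (11.4, 9.97),
    "mgrid": (18.5, 15.9),
    "applu": (13.2, 11.2),
    "turb3d": (16.9, 13.8),
    "apsi": (16.0, 12.8),
    "fpppp": (37.1, 29.7),
    "wave5": (27.4, 22.4),
}


def pct_increase(new: float, old: float) -> float:
    """Percentage improvement going from the old result to the new one."""
    return (new / old - 1.0) * 100.0


if __name__ == "__main__":
    for test, (r250, r195) in SPECFP95_OCTANE.items():
        print(f"{test:8s} {pct_increase(r250, r195):5.1f}%")
```

Each printed value, rounded to one decimal place, matches the corresponding entry in the table.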
Next, a separate comparison graph for each of the ten SPECfp95 tests:
[Per-test graphs: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5]
Observations
Remember that the increase in clock speed from 195MHz to 250MHz is 28.2% (250/195 = 1.282). No one would expect speed to scale perfectly with clock, so a good result would be a 25% increase. Hence, one must examine whether each test achieves an increase at least this large. Since the R10K/250 has its L2 cache running at 2/3rds core speed, one must also examine whether any test benefits from the faster L2 cache.
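The yardsticks above are simple arithmetic. A small Python sketch, assuming we describe each test by a "scaling efficiency", ie. its percentage gain divided by the percentage clock gain (my own shorthand, not a SPEC metric), and using the Octane increases from the table:

```python
CLOCK_OLD_MHZ = 195
CLOCK_NEW_MHZ = 250

# Clock-speed increase from R10K/195 to R10K/250, in percent (about 28.2%).
clock_gain = (CLOCK_NEW_MHZ / CLOCK_OLD_MHZ - 1.0) * 100.0

# Octane percentage increases, from the SPECfp95 table above.
OCTANE_FP_GAIN = {
    "tomcatv": 16.2, "swim": 14.0, "su2cor": 16.2, "hydro2d": 14.3,
    "mgrid": 16.4, "applu": 17.9, "turb3d": 22.5, "apsi": 25.0,
    "fpppp": 24.9, "wave5": 22.3,
}

GOOD_RESULT = 25.0  # the "good result" threshold suggested in the text


def scaling_efficiency(gain_pct: float) -> float:
    """Fraction of the clock-speed increase that a test actually realises."""
    return gain_pct / clock_gain


if __name__ == "__main__":
    print(f"clock increase: {clock_gain:.1f}%")
    for test, gain in OCTANE_FP_GAIN.items():
        verdict = "good" if gain >= GOOD_RESULT else "below 25%"
        print(f"{test:8s} {gain:5.1f}%  efficiency {scaling_efficiency(gain):.2f}  {verdict}")
```

Against the 25% yardstick only apsi qualifies (fpppp just misses at 24.9%), and most tests realise only about half to two-thirds of the clock increase.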
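Going from 195MHz to 250MHz, most of the tests improve by only around 15%. Why not 25% or more? From studying how the results for Origin change when moving from 195MHz to 250MHz, my explanation is that Octane's 1MB L2 cache, compared with the 4MB L2 on Origin, leaves some tests limited by L2 size rather than by clock speed, so the faster clock does them comparatively little good.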
Or, put another way, I contend that it is perfectly possible to have the following scenario without even realising it: imagine two tasks, A & B; A is run on a 4MB L2 Origin and is clock speed limited, not L2 limited; B is run on a 1MB L2 Octane and is L2 limited. Swapping the tasks over, ie. running A on Octane and B on Origin, would dramatically improve the performance of task B by 15% to 20%, whilst task A would carry on as normal, or at worst would slow down by a mere 5% or so. Thus, if you are an administrator with a variety of systems, try moving your tasks around from system to system: you may find a better way of allocating them to take best advantage of the available hardware.
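With more than a couple of tasks, the same idea can be applied mechanically: measure how fast each task runs on each system, then pick the allocation that gives the best overall result. Below is a brute-force Python sketch. The relative speeds are hypothetical, taken from the conservative ends of the scenario above (B gains 15% on Origin, A loses 5% on Octane), and the objective (one task per system, maximise the sum of relative speeds) is my assumption:

```python
from itertools import permutations

# Hypothetical relative speeds (1.0 = how the task runs today), based on the
# scenario above: task B gains 15% on the 4MB-L2 Origin, task A loses 5% on
# the 1MB-L2 Octane.
SPEED = {
    ("A", "Origin"): 1.00, ("A", "Octane"): 0.95,
    ("B", "Origin"): 1.15, ("B", "Octane"): 1.00,
}
TASKS = ["A", "B"]
SYSTEMS = ["Origin", "Octane"]


def best_allocation(tasks, systems, speed):
    """Try every one-task-per-system assignment; keep the highest total speed."""
    best, best_score = None, float("-inf")
    for order in permutations(systems, len(tasks)):
        plan = dict(zip(tasks, order))
        score = sum(speed[(task, system)] for task, system in plan.items())
        if score > best_score:
            best, best_score = plan, score
    return best, best_score


if __name__ == "__main__":
    plan, score = best_allocation(TASKS, SYSTEMS, SPEED)
    print(plan, round(score, 2))
```

Swapping wins (a total of 2.10 versus 2.00 for the current allocation). Trying every permutation is only practical for a handful of tasks; beyond that, a proper assignment algorithm such as the Hungarian method would be the better choice.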
Thus, some of the fp tests do not need a larger L2 cache (ie. 1MB is enough). These tests will show above-average performance improvements when upgrading from 195MHz to 250MHz on both Octane and Origin (ie. they take full advantage of both the higher clock speed and the faster L2 cache).
Other tests need more than 1MB L2 in order to execute in an optimum manner. When upgrading from 195MHz to 250MHz, these tests will show a larger performance improvement when the upgrade concerns Origin compared to Octane (because on Octane the less-than-optimum L2 size will still be a limiting factor).
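One way to put rough numbers on this explanation is an Amdahl's-law-style model. This is my own simplification, not something measured: assume a fraction p of a test's run time is bound by something that does not speed up with the core clock (for example L2 capacity misses going to main memory), while the remaining 1 - p scales with the clock ratio k = 250/195. The speedup is then S = 1 / ((1 - p)/k + p), and inverting it gives the p implied by each measured Octane result:

```python
CLOCK_RATIO = 250 / 195  # k: core clock ratio, R10K/250 vs R10K/195

# Octane SPECfp95 results from the table above: test -> (250MHz, 195MHz)
OCTANE_FP = {
    "tomcatv": (29.4, 25.3), "swim": (46.3, 40.6), "su2cor": (11.2, 9.64),
    "hydro2d": (11.4, 9.97), "mgrid": (18.5, 15.9), "applu": (13.2, 11.2),
    "turb3d": (16.9, 13.8), "apsi": (16.0, 12.8), "fpppp": (37.1, 29.7),
    "wave5": (27.4, 22.4),
}


def predicted_speedup(p: float, k: float = CLOCK_RATIO) -> float:
    """Speedup if fraction p of run time ignores the clock and 1 - p scales with it."""
    return 1.0 / ((1.0 - p) / k + p)


def implied_fixed_fraction(speedup: float, k: float = CLOCK_RATIO) -> float:
    """Invert predicted_speedup: the p that would explain a measured speedup."""
    return (1.0 / speedup - 1.0 / k) / (1.0 - 1.0 / k)


if __name__ == "__main__":
    for test, (r250, r195) in OCTANE_FP.items():
        s = r250 / r195
        print(f"{test:8s} speedup {s:.3f}  implied non-scaling fraction {implied_fixed_fraction(s):.2f}")
```

Under this crude model swim comes out at roughly 0.44 and apsi at roughly 0.09. The model ignores the faster L2 on the R10K/250, so treat the figures as a way of visualising the argument rather than as measurements.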
The following table, comparing the percentage improvements from 195MHz to 250MHz R10000s for Octane and Origin, shows this effect quite clearly:
Test       Octane % Increase   Origin % Increase
tomcatv    16.2%               28.6%
swim       14.0%               21.4%
su2cor     16.2%               35.7%
hydro2d    14.3%               31.8%
mgrid      16.4%               25.0%
applu      17.9%               23.1%
turb3d     22.5%               26.8%
apsi       25.0%               35.3%
fpppp      24.9%               27.7%
wave5      22.3%               32.2%
Of most note are tomcatv, su2cor, hydro2d, mgrid, apsi and wave5. These show large jumps for Origin compared to Octane. Sure enough, the R10K/195 and R10K/250 performance comparison pages show that these are precisely the tests which benefit from a larger L2 when the same CPU, at the same clock speed, is compared across different systems. On Origin, with its large L2, these tests can take full advantage of the faster L2 access speed as well as the higher clock speed.
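Ranking the tests by the gap between the two columns (Origin's increase minus Octane's, in percentage points) picks out exactly these six tests. A short Python sketch using only the figures in the table (`rank_by_gap` is an illustrative name):

```python
# Percentage increases from 195MHz to 250MHz, from the table above:
# test -> (Octane % increase, Origin % increase)
FP_GAINS = {
    "tomcatv": (16.2, 28.6), "swim": (14.0, 21.4), "su2cor": (16.2, 35.7),
    "hydro2d": (14.3, 31.8), "mgrid": (16.4, 25.0), "applu": (17.9, 23.1),
    "turb3d": (22.5, 26.8), "apsi": (25.0, 35.3), "fpppp": (24.9, 27.7),
    "wave5": (22.3, 32.2),
}


def rank_by_gap(gains):
    """Sort tests by how many points more Origin gained than Octane (largest first)."""
    gaps = {test: origin - octane for test, (octane, origin) in gains.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for test, gap in rank_by_gap(FP_GAINS):
        print(f"{test:8s} Origin ahead by {gap:4.1f} points")
```

su2cor heads the list (19.5 points) and mgrid is sixth (8.6 points), just ahead of swim (7.4 points).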
The lesson to be learned here is that an upgrade decision shouldn't be an automatic affair. R10K/250 may be a 28% increase in clock speed over R10K/195, but Octane users may well see a performance improvement of only around 15% from such an upgrade because Octane continues to use a smaller L2. If an Octane user's task falls into this category, then they'd be better off waiting for a newer CPU release from SGI offering a larger L2 as well as a higher clock speed, either a future R10000 or even R12000.
In other words: if you're an Origin user considering upgrading from 195MHz to 250MHz, it is highly likely that your task would show a greater performance improvement than that implied by the increase in clock speed. But if you're an Octane user, then you should definitely have some proper tests done first before deciding whether an upgrade is worth it.
Test first, decide later!
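In that spirit, here is a sketch of how one might summarise such a test: time one of your own jobs on each CPU and compare the measured speedup with the clock-speed increase. The function names, the example timings and the default 25% threshold are all hypothetical; pick whatever threshold makes an upgrade worthwhile for you:

```python
def measured_gain_pct(seconds_old: float, seconds_new: float) -> float:
    """Speedup of a job in percent, from its run times on the old and new CPU."""
    return (seconds_old / seconds_new - 1.0) * 100.0


def upgrade_report(seconds_old: float, seconds_new: float,
                   old_mhz: float = 195.0, new_mhz: float = 250.0,
                   min_gain_pct: float = 25.0) -> dict:
    """Compare a measured speedup with the raw clock-speed increase.

    min_gain_pct is whatever improvement you decide makes the upgrade worth it;
    25% here simply mirrors the "good result" figure used earlier.
    """
    gain = measured_gain_pct(seconds_old, seconds_new)
    clock_gain = (new_mhz / old_mhz - 1.0) * 100.0
    return {
        "measured_gain_pct": gain,
        "clock_gain_pct": clock_gain,
        "share_of_clock_gain": gain / clock_gain,
        "worth_it": gain >= min_gain_pct,
    }


if __name__ == "__main__":
    # Hypothetical timings for one of your own jobs, in seconds.
    print(upgrade_report(seconds_old=100.0, seconds_new=86.0))
```

Run times are lower-is-better, so the speedup is old time divided by new time; SPEC ratios, by contrast, are higher-is-better and are divided the other way round.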
The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages.
Octane SPECint95 Comparison

Test       R10000 250MHz   R10000 195MHz   % Increase (195 -> 250)
go         14.1            11.4            23.7%
m88ksim    14.1            11.3            24.8%
gcc        12.5            10.1            23.8%
compress   13.9            11.3            23.0%
li         11.9            9.59            24.1%
ijpeg      12.6            10.1            24.8%
perl       16.4            13.0            26.2%
vortex     13.8            11.2            23.2%
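A quick Python check of the SPECint95 figures, using nothing beyond the table: every increase lies between roughly 23% and 26%, a spread of about 3 points (against about 11 points for SPECfp95), and all of them fall short of the roughly 28.2% clock increase. The names are illustrative:

```python
# SPECint95 results for Octane, from the table above: test -> (R10K/250, R10K/195)
SPECINT95_OCTANE = {
    "go": (14.1, 11.4), "m88ksim": (14.1, 11.3), "gcc": (12.5, 10.1),
    "compress": (13.9, 11.3), "li": (11.9, 9.59), "ijpeg": (12.6, 10.1),
    "perl": (16.4, 13.0), "vortex": (13.8, 11.2),
}

CLOCK_GAIN_PCT = (250 / 195 - 1.0) * 100.0  # about 28.2%


def gains(results):
    """Percentage increase from the 195MHz result to the 250MHz result, per test."""
    return {test: (new / old - 1.0) * 100.0 for test, (new, old) in results.items()}


if __name__ == "__main__":
    g = gains(SPECINT95_OCTANE)
    spread = max(g.values()) - min(g.values())
    print(f"range {min(g.values()):.1f}% to {max(g.values()):.1f}% (spread {spread:.1f} points)")
    print("all below clock gain:", all(v < CLOCK_GAIN_PCT for v in g.values()))
```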
Next, a separate comparison graph for each of the eight SPECint95 tests:
[Per-test graphs: go, m88ksim, gcc, compress, li, ijpeg, perl, vortex]
Compared to the fp tests, the results show a much lower variance and a surprisingly uniform, good performance increase from R10K/195 to R10K/250; even vortex behaves itself. However, none of the tests show an increase greater than the simple ratio of clock speeds (250/195 = 28%), which suggests to me (and this is just speculation on my part) that Octane's small 1MB L2 cache could be a limiting factor, perhaps preventing the tests from taking full advantage of the faster L2 speed offered by R10K/250. If this is true, Origin should show larger increases for these tests. Here is a comparison:
Test       Octane % Increase   Origin % Increase
go         23.7%               30.1%
m88ksim    24.8%               25.7%
gcc        23.8%               29.8%
compress   23.0%               32.7%
li         24.1%               28.5%
ijpeg      24.8%               26.5%
perl       26.2%               25.6%
vortex     23.2%               35.4%
In my opinion, there is a nice correlation here: the smaller the percentage increase is for Octane, the larger the percentage increase is for Origin. Put another way, the higher the increase is for Octane, the more likely it is that Origin will show a similar increase that is less than the clock ratio.
This leads to the ironic conclusion that if an Octane upgrade shows a small performance improvement for an integer task, then a similar upgrade for Origin should show a much larger performance increase.
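The correlation described above can be summarised in a single number. A minimal Python sketch computing the Pearson correlation coefficient between the Octane and Origin columns of the SPECint95 comparison table; with only eight data points this is suggestive rather than conclusive, and correlation alone says nothing about the cause:

```python
from math import sqrt

# test -> (Octane % increase, Origin % increase), from the SPECint95 table above
INT_GAINS = {
    "go": (23.7, 30.1), "m88ksim": (24.8, 25.7), "gcc": (23.8, 29.8),
    "compress": (23.0, 32.7), "li": (24.1, 28.5), "ijpeg": (24.8, 26.5),
    "perl": (26.2, 25.6), "vortex": (23.2, 35.4),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    octane = [pair[0] for pair in INT_GAINS.values()]
    origin = [pair[1] for pair in INT_GAINS.values()]
    print(f"r = {pearson(octane, origin):.2f}")
```

For these figures r comes out at roughly -0.87, a strong negative correlation; perl is the only test where Origin's increase is below Octane's.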
I've no way of testing this, and the above speculation could just be a phenomenon that's specific to the tests used in SPECint95. Still, as with other comparisons, it all goes to show that the more L2 one has, the better.