This analysis examines how different CPUs perform in O2, ie. the focus is on how different R5000s and R10000s perform in the same system, in this case O2 (I have separate pages dealing with how the same CPU performs in different systems).
Note that I do not have any SPEC95 data for the following CPUs when used in O2:
As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch to Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (using the Roty and Rotx thumbwheels); that gives the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and display the object directly, although such browsers may not offer Orthographic viewing.
All source data for this analysis came from www.specbench.org.
Given below is a comparison table of available SPECfp95 test results for O2. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.
            R10000   R10000   R5000SC  R5000PC
            250MHz   195MHz   180MHz   180MHz
tomcatv     10.2     9.78     7.35     6.77
swim        14.4     13.9     10.6     10.6
su2cor      5.40     4.72     2.42     1.94
hydro2d     3.26     3.17     2.48     2.48
mgrid       7.26     6.95     5.02     4.39
applu       6.49     5.92     4.23     4.05
turb3d      11.1     9.57     5.60     4.25
apsi        11.6     9.77     5.47     4.39
fpppp       37.2     29.3     10.3     7.96
wave5       12.8     11.8     6.92     4.20

O2 SPECfp95 Comparison
Next, a separate 2D comparison graph for each of the ten SPECfp95 tests:
tomcatv:
swim:
su2cor:
hydro2d:
mgrid:
applu:
turb3d:
apsi:
fpppp:
wave5:
Observations
Obviously, the R10000 performs better than the R5000 in O2, though not by much in some cases (eg. hydro2d and swim). Remember, though, that the above data does not include the 200MHz R5000, which has a larger L2 cache than the 180MHz R5000 (1MB compared to 512K), so don't form any firm judgements just yet. I need to obtain detailed R5K/200 SPEC results to offer a complete picture, especially since the R5K/200 is probably the most popular CPU in O2s today (ie. in terms of the O2 configurations currently being sold).
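To put a number on "not by much in some cases", the per-test ratios can be worked out directly from the SPECfp95 table. The following is a minimal Python sketch; the dictionary layout, the column constants and the `speedups` helper are my own illustrative names, not part of any SPEC tool, and the figures are simply copied from the table.

```python
# SPECfp95 results for O2, copied from the table on this page.
# Column order: R10000/250, R10000/195, R5000SC/180, R5000PC/180
R10K_250, R10K_195, R5K_SC, R5K_PC = range(4)

SPECFP95_O2 = {
    "tomcatv": (10.2, 9.78, 7.35, 6.77),
    "swim":    (14.4, 13.9, 10.6, 10.6),
    "su2cor":  (5.40, 4.72, 2.42, 1.94),
    "hydro2d": (3.26, 3.17, 2.48, 2.48),
    "mgrid":   (7.26, 6.95, 5.02, 4.39),
    "applu":   (6.49, 5.92, 4.23, 4.05),
    "turb3d":  (11.1, 9.57, 5.60, 4.25),
    "apsi":    (11.6, 9.77, 5.47, 4.39),
    "fpppp":   (37.2, 29.3, 10.3, 7.96),
    "wave5":   (12.8, 11.8, 6.92, 4.20),
}


def speedups(results, fast, slow):
    """Return {test: result[fast] / result[slow]} for every test."""
    return {test: row[fast] / row[slow] for test, row in results.items()}


if __name__ == "__main__":
    # R10000/195 against R5000SC/180, smallest gain first.
    ratios = speedups(SPECFP95_O2, R10K_195, R5K_SC)
    for test, ratio in sorted(ratios.items(), key=lambda item: item[1]):
        print(f"{test:8s} {ratio:5.2f}x")
```

On these figures, hydro2d shows the smallest gain (about 1.28x) and fpppp the largest (about 2.84x).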
There are a variety of reasons why R10000 doesn't perform as well in O2 compared to Octane or Origin. These are discussed in depth on the O2 architecture page, and further on the R10000/195 comparison page, so I won't repeat the details here.
One important point, though: ignore fpppp. That particular test involves a tiny data set, somewhere between 8K and 32K (small enough to fit into the R10000's L1 cache, never mind L2 cache!); it's highly unlikely that a typical task in today's computing environment will involve a data set anywhere near that small. Most modern applications process far larger and more complex data sets, sometimes into the gigabyte range (eg. seismic modeling). This isn't to say your task can't be like fpppp, just that it's very unlikely. In the context of the above results, fpppp is annoying because it skews the final averages; this is why none of my SPEC95 analysis pages ever use SPEC averages as a basis for drawing conclusions.
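The skew is easy to see with a little arithmetic. SPEC95 composite figures are geometric means of the individual test ratios, so the sketch below (Python; `geomean` and `composite` are illustrative helper names of my own) compares the R10000/250 and R5000PC/180 columns of the SPECfp95 table with and without fpppp. It is only an illustration computed from the table on this page, not a reproduction of SPEC's official composite scores.

```python
import math

# R10000/250 and R5000PC/180 columns from the O2 SPECfp95 table on this page.
R10K_250 = {"tomcatv": 10.2, "swim": 14.4, "su2cor": 5.40, "hydro2d": 3.26,
            "mgrid": 7.26, "applu": 6.49, "turb3d": 11.1, "apsi": 11.6,
            "fpppp": 37.2, "wave5": 12.8}
R5K_PC = {"tomcatv": 6.77, "swim": 10.6, "su2cor": 1.94, "hydro2d": 2.48,
          "mgrid": 4.39, "applu": 4.05, "turb3d": 4.25, "apsi": 4.39,
          "fpppp": 7.96, "wave5": 4.20}


def geomean(values):
    """Geometric mean of positive values, computed as exp(mean(log(x)))."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def composite(results, exclude=()):
    """Geometric mean of all tests except those listed in `exclude`."""
    return geomean(v for test, v in results.items() if test not in exclude)


if __name__ == "__main__":
    for label, excluded in (("all ten tests", ()), ("without fpppp", ("fpppp",))):
        ratio = composite(R10K_250, excluded) / composite(R5K_PC, excluded)
        print(f"R10K/250 vs R5000PC, {label}: {ratio:.2f}x")
```

With fpppp included the R10000/250 comes out at roughly 2.1x the R5000PC; dropping fpppp alone pulls that to just under 2.0x, which is the kind of distortion described above.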
Note that fpppp will not be included in SPEC98.
The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results. After the table and 3D graphs is a short-cut index to the original results pages.
            R10000   R10000   R5000SC  R5000PC
            250MHz   195MHz   180MHz   180MHz
go          13.9     11.0     5.19     3.51
m88ksim     14.5     11.1     5.25     5.04
gcc         10.7     9.02     4.57     3.16
compress    12.0     10.6     3.64     2.45
li          11.9     9.42     5.25     4.36
ijpeg       11.5     9.35     4.40     4.04
perl        15.7     13.0     6.52     5.04
vortex      9.74     8.20     4.27     2.89

O2 SPECint95 Comparison
Next, a separate 2D comparison graph for each of the eight SPECint95 tests:
go:
m88ksim:
gcc:
compress:
li:
ijpeg:
perl:
vortex:
Observations
Obviously, R10000 is much better than R5000 in O2 for integer tasks, but remember that the 200MHz R5000 (with 1MB L2) is not included in this study (no data available yet).
Other SPEC95 analysis pages I've written have described how most SPECint95 tests involve small data sets, resulting in few cache misses. Only vortex (and, to a lesser extent, gcc) involves a varied memory access pattern, benefiting from a large L2 cache, as shown by the 250MHz R10000 4MB L2 Origin results. Thus, bearing in mind the factors affecting cache miss behaviour in R10K O2, it isn't surprising to see vortex showing the lowest value of the eight R10K O2 SPECint95 results. If one looks at the tests which do not seem to involve heavy memory traffic, eg. perl and m88ksim, O2 shows good results, matching Octane and Origin fairly well (R10K/250 O2 actually beats R10K/250 Origin2000 for m88ksim, though the difference is within the margin of error that results from compiler optimisation).
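Picking out the lowest score per configuration from the SPECint95 table takes only a few lines. The minimal Python sketch below (data copied from the table on this page; `lowest_test` is a hypothetical helper name) confirms that vortex is the weakest of the eight results for both R10000 O2 configurations.

```python
# O2 SPECint95 results, copied from the table on this page.
# Column order: R10000/250, R10000/195, R5000SC/180, R5000PC/180
R10K_250, R10K_195, R5K_SC, R5K_PC = range(4)

SPECINT95_O2 = {
    "go":       (13.9, 11.0, 5.19, 3.51),
    "m88ksim":  (14.5, 11.1, 5.25, 5.04),
    "gcc":      (10.7, 9.02, 4.57, 3.16),
    "compress": (12.0, 10.6, 3.64, 2.45),
    "li":       (11.9, 9.42, 5.25, 4.36),
    "ijpeg":    (11.5, 9.35, 4.40, 4.04),
    "perl":     (15.7, 13.0, 6.52, 5.04),
    "vortex":   (9.74, 8.20, 4.27, 2.89),
}


def lowest_test(results, column):
    """Name of the test with the smallest result in the given column."""
    return min(results, key=lambda test: results[test][column])


if __name__ == "__main__":
    for name, column in (("R10K/250", R10K_250), ("R10K/195", R10K_195)):
        print(name, lowest_test(SPECINT95_O2, column))
```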
What this means is that if you have an int task which doesn't cause heavy memory traffic and doesn't benefit much from a large (>1MB) L2 cache, then you'll get good performance with an R10K O2, ie. there's no need to spend a fortune on an Origin2000. This also means that, if you have a variety of int tasks and systems, it's well worth experimenting to see which system offers the best performance for each task; it's entirely possible that swapping two tasks over between two different machines may result in better performance for one task but no loss of performance for the other (an extreme example would be vortex vs. m88ksim for O2 vs. Origin2000).
How can one tell if one's task is like m88ksim, perl, etc.? There are several approaches. One can use tools like gr_osview to study what's happening whilst the code is running (degree of memory traffic, etc.). One can run comparison tests on systems with identical CPUs but different L2 sizes to test the effect of L2 size (eg. 195MHz R10000 4MB L2 Origin vs. 195MHz R10000 1MB L2 Octane, or 195MHz 1MB L2 Power Challenge vs. 195MHz 2MB L2 Power Challenge). And one can run tests on old vs. new systems with the same CPUs and L2 sizes to see whether one's task benefits from lower memory latency, higher memory bandwidth and better support for outstanding cache misses (eg. 195MHz 1MB L2 Octane vs. 195MHz 1MB L2 Indigo2).
By way of typical evidence, note that m88ksim shows no improvement when moving from a 195MHz 1MB L2 Power Challenge to a 195MHz 2MB L2 Power Challenge (the same applies to 1MB L2 Octane vs. 4MB L2 Origin). Performance differences only really show up with R10K/250, and even then the margins are not great (vortex being the exception).
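The comparison tests described above boil down to timing the same job on two systems and deciding whether the difference is real. Here is a minimal Python sketch, assuming you have collected several wall-clock timings per system yourself; the 5% noise threshold and the helper names are arbitrary assumptions of mine, not an SGI or SPEC convention.

```python
import statistics

# Assumption: differences under 5% are treated as run-to-run noise.
NOISE_THRESHOLD = 0.05


def relative_gain(baseline_secs, candidate_secs):
    """Fractional speed-up of the candidate system over the baseline.

    Uses the median of several runs to damp outliers; a positive value
    means the candidate ran the task faster.
    """
    return statistics.median(baseline_secs) / statistics.median(candidate_secs) - 1.0


def classify(baseline_secs, candidate_secs, threshold=NOISE_THRESHOLD):
    """Label the candidate as faster, slower, or not meaningfully different."""
    gain = relative_gain(baseline_secs, candidate_secs)
    if gain > threshold:
        return "candidate faster"
    if gain < -threshold:
        return "candidate slower"
    return "no significant difference"


if __name__ == "__main__":
    # Hypothetical timings (seconds) for one task on a 1MB L2 system
    # and on a 4MB L2 system with the same CPU and clock speed.
    small_l2 = [102.0, 100.5, 101.2]
    large_l2 = [100.9, 101.4, 100.2]
    print(classify(small_l2, large_l2))
```

A task that keeps coming back as "no significant difference" across L2 sizes behaves like m88ksim here; one that shows a clear gain is more like vortex.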
Of course, you'll need access to different systems to run comparison tests (though you can use gr_osview on any system to gain some insight), but if SGI values your custom then they should be willing to help out with tests in the event that you do not have access to the necessary systems.
Remember that many typical daily tasks involve small data sets, eg. processing typical Internet movie frames (half-size PAL, half-size NTSC). But sometimes careful thought is required; eg. a full-size NTSC frame will fit into a 1MB L2 cache (0.9MB), but a full-size PAL frame will not (1.27MB). Thus, a system with more than 1MB L2 will be better for processing PAL data (eg. 4MB L2 versions of R10000 in Origin). That said, depending on the system, it may be possible to hardware-accelerate a movie processing task (eg. hardware JPEG support on O2 with ICE, or video accelerator boards on other systems, such as Octane Compression).
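The frame-size figures above are simple arithmetic. The sketch below assumes 24-bit RGB (3 bytes per pixel) and square-pixel frame dimensions of 640x480 for NTSC and 768x576 for PAL; those dimensions and the helper names are my assumptions, chosen because they reproduce the 0.9MB and 1.27MB figures. Real capture formats (eg. 720-pixel-wide video, or YUV 4:2:2 at 2 bytes per pixel) will give different sizes.

```python
BYTES_PER_PIXEL = 3          # assumption: 24-bit RGB
MB = 1024 * 1024             # cache sizes quoted in binary megabytes

# Assumed square-pixel frame dimensions (width, height).
FRAMES = {
    "NTSC full": (640, 480),
    "NTSC half": (320, 240),
    "PAL full":  (768, 576),
    "PAL half":  (384, 288),
}


def frame_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


def fits_in_l2(width, height, l2_mb):
    """True if one uncompressed frame is no larger than the L2 cache."""
    return frame_bytes(width, height) <= l2_mb * MB


if __name__ == "__main__":
    for name, (w, h) in FRAMES.items():
        size_mb = frame_bytes(w, h) / MB
        print(f"{name:9s} {size_mb:5.2f}MB  fits 1MB L2: {fits_in_l2(w, h, 1)}")
```

This ignores everything else competing for the cache (code, other buffers, the output frame), so a frame that only just fits, like full-size NTSC, may still see some misses in practice.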
Judging exactly which system is best for a particular task, or which processor is best for a system already decided upon (perhaps because of budget constraints), is not always easy. So always have proper tests done, and investigate thoroughly, before making any final purchasing decision.