This analysis examines how different R10000/R12000 CPUs perform in the same system, in this case Origin200 (I have separate pages dealing with how the same CPU performs in different systems).
Note that I do not have any SPEC95 data for the R10000 180MHz Origin200QC (this is the later 180MHz version of Origin200/180, with a larger 2MB L2 cache running at a higher clock speed). Since some systems use this CPU, please contact me if you have any detailed SPEC95 data for Origin200QC/180 (final base and peak averages are of little use; it's the detailed results I'm looking for).
As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that'll give an approximately isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and display the object directly, although such browsers may not offer Orthographic viewing.
For this analysis, the 180MHz R10000 Origin200 data was obtained from www.specbench.org. SPEC does not yet have 225MHz R10000 Origin200QC data posted, but in the meantime John McCalpin, a former Server System Architect for SGI, has supplied me with some fp data. John told me:
Given below is a comparison table of single-CPU R10000/R12000 SPECfp95 results for Origin200, covering the 180MHz and 225MHz R10000 (remember that the 225MHz version has a faster, larger L2 cache) and the 270MHz R12000. Faster CPUs are leftmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a shortcut index to the original results pages for the various systems.
            R12000    R10000    R10000
            270MHz    225MHz    180MHz
tomcatv      33.3      28.0      22.0
swim         44.0      40.3      34.5
su2cor       14.6      13.0       8.47
hydro2d      16.7      11.7       7.99
mgrid        24.9      18.7      14.8
applu        14.9      12.5      11.0
turb3d       20.4      17.3      14.3
apsi         25.3      18.6      11.9
fpppp        42.2      34.0      28.3
wave5        35.6      29.1      20.8

            %Increase   %Increase   %Increase
FROM:       R10K/180    R10K/180    R10K/225
TO:         R10K/225    R12K/270    R12K/270
tomcatv      27.3%       51.4%       18.9%
swim         16.8%       27.5%        9.2%
su2cor       53.5%       72.4%       12.3%
hydro2d      46.4%      109.0%       42.7%
mgrid        26.4%       68.2%       33.2%
applu        13.6%       35.5%       19.2%
turb3d       21.0%       42.7%       17.9%
apsi         56.3%      112.6%       36.0%
fpppp        20.1%       49.1%       24.1%
wave5        40.0%       71.2%       22.3%

[Graph: Origin200 SPECfp95 Comparison]
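Each %Increase figure is simply (new ratio / old ratio - 1) x 100. As a hedged sketch (Python, with the ratios typed in by hand from the table above; the function names are hypothetical), the second table can be regenerated from the first. Because the published ratios are themselves rounded, a recomputed entry can differ from the table by 0.1 in the last digit: wave5, R10K/180 to R10K/225, comes out at 39.9% rather than 40.0%.

```python
"""Recompute the Origin200 %Increase table from the SPECfp95 ratios."""

# SPECfp95 ratios from the table: test -> (R10K/180, R10K/225, R12K/270).
RATIOS = {
    "tomcatv": (22.0, 28.0, 33.3),
    "swim": (34.5, 40.3, 44.0),
    "su2cor": (8.47, 13.0, 14.6),
    "hydro2d": (7.99, 11.7, 16.7),
    "mgrid": (14.8, 18.7, 24.9),
    "applu": (11.0, 12.5, 14.9),
    "turb3d": (14.3, 17.3, 20.4),
    "apsi": (11.9, 18.6, 25.3),
    "fpppp": (28.3, 34.0, 42.2),
    "wave5": (20.8, 29.1, 35.6),
}


def pct_increase(old: float, new: float) -> float:
    """Percentage increase when a SPEC ratio goes from old to new."""
    return (new / old - 1.0) * 100.0


def increase_table(ratios: dict) -> dict:
    """Return test -> (180->225, 180->270, 225->270) increases, rounded to 0.1%."""
    table = {}
    for test, (r180, r225, r270) in ratios.items():
        table[test] = (
            round(pct_increase(r180, r225), 1),
            round(pct_increase(r180, r270), 1),
            round(pct_increase(r225, r270), 1),
        )
    return table


if __name__ == "__main__":
    print(f"{'test':8s}{'180->225':>9s}{'180->270':>9s}{'225->270':>9s}")
    for test, row in increase_table(RATIOS).items():
        # Each value is printed 8 wide plus a '%' sign, matching the header.
        print(f"{test:8s}" + "".join(f"{v:8.1f}%" for v in row))
```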
Next, a separate comparison graph for each of the ten SPECfp95 tests:

[Graphs: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp, wave5]
Observations
It's important to remember that the 225MHz CPU is using a 2MB 225MHz L2 cache, ie. faster and larger than the 1MB cache used with the 180MHz CPU.
Note that the clock speed increase itself is 25% (225/180 = 1.25). However, many individual tests show much larger increases than this, and even the final peak averages (which I normally never deal with because I don't think they're particularly useful) show an increase of roughly 31%. This is compelling evidence that the faster, larger L2 cache on the QC model is indeed doing its job. Two tests, su2cor and apsi, improve by over 50%, a very significant performance improvement.
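The comparison against clock speed can be made explicit. The sketch below (Python; the names are hypothetical, the ratios are copied from the table above) divides each R10K/225 ratio by the R10K/180 ratio, lists the tests whose speedup beats the 1.25 clock ratio, and takes the geometric mean of the ten speedups. SPECfp95's overall figure is a geometric mean of the ten test ratios, so this geometric mean equals the speedup of that overall figure; it is computed from the per-test numbers in the table, though, so it need not match the peak averages quoted above exactly. With these figures, six of the ten tests beat clock scaling and the geometric mean comes out at roughly 1.31.

```python
"""Compare Origin200 R10K/180 -> R10K/225 speedups with the clock-speed ratio."""
import math

CLOCK_OLD_MHZ = 180
CLOCK_NEW_MHZ = 225

# SPECfp95 ratios from the table: test -> (R10K/180, R10K/225).
RATIOS = {
    "tomcatv": (22.0, 28.0),
    "swim": (34.5, 40.3),
    "su2cor": (8.47, 13.0),
    "hydro2d": (7.99, 11.7),
    "mgrid": (14.8, 18.7),
    "applu": (11.0, 12.5),
    "turb3d": (14.3, 17.3),
    "apsi": (11.9, 18.6),
    "fpppp": (28.3, 34.0),
    "wave5": (20.8, 29.1),
}


def geometric_mean(values) -> float:
    """Geometric mean, the averaging SPEC95 uses for its overall figures."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def summarize(ratios: dict) -> dict:
    clock_gain = CLOCK_NEW_MHZ / CLOCK_OLD_MHZ  # 1.25, i.e. +25%
    speedups = {test: new / old for test, (old, new) in ratios.items()}
    return {
        "clock_gain": clock_gain,
        # Tests that improved by more than clock speed alone would suggest.
        "beats_clock": sorted(t for t, s in speedups.items() if s > clock_gain),
        # Tests that improved by more than 50%.
        "over_50pct": sorted(t for t, s in speedups.items() if s > 1.5),
        # Geometric mean of the ten per-test speedups.
        "overall": geometric_mean(speedups.values()),
    }


if __name__ == "__main__":
    s = summarize(RATIOS)
    print(f"clock increase:   {(s['clock_gain'] - 1) * 100:.0f}%")
    print(f"beats clock:      {', '.join(s['beats_clock'])}")
    print(f"over 50%:         {', '.join(s['over_50pct'])}")
    print(f"geo-mean speedup: {(s['overall'] - 1) * 100:.1f}%")
```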
What is most interesting is that these improvements are quite different from those seen when upgrading Octane or Origin2000 from 195MHz to 250MHz (a clock speed increase of 28%). Compare:
            Octane       Origin2000   Origin200
            % Increase   % Increase   % Increase
            (195->250)   (195->250)   (180->225)
tomcatv      16.2%        28.6%        27.3%
swim         14.0%        21.4%        16.8%
su2cor       16.2%        35.7%        53.5%
hydro2d      14.3%        31.8%        46.4%
mgrid        16.4%        25.0%        26.4%
applu        17.9%        23.1%        13.6%
turb3d       22.5%        26.8%        21.0%
apsi         25.0%        35.3%        56.3%
fpppp        24.9%        27.7%        20.1%
wave5        22.3%        32.2%        40.0%
Further, the tests which show the best improvements for the Origin2000 upgrade show even better improvements for the Origin200 upgrade (examine su2cor, hydro2d, apsi and wave5). The larger the improvement for Origin2000, the more likely it is that Origin200 will show an even larger one. Again, I expect this is because Origin200's L2 cache runs at a higher clock speed, despite Origin2000 having a larger L2.
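This relationship can be checked numerically. A minimal sketch (Python; the helper names are hypothetical) computes the Pearson correlation between columns of the comparison table above, and the four tests with the largest gains on each system. With these figures the Origin2000 and Origin200 columns correlate strongly (r of roughly 0.94) and share the same top four tests, whereas the Octane column shows almost no correlation with Origin200 (r of roughly 0.1). Ten data points are a small sample, so treat the coefficients as indicative only.

```python
"""Correlate per-test SPECfp95 gains across the three upgrades."""
import math

# % increase per test: (Octane 195->250, Origin2000 195->250, Origin200 180->225).
GAINS = {
    "tomcatv": (16.2, 28.6, 27.3),
    "swim": (14.0, 21.4, 16.8),
    "su2cor": (16.2, 35.7, 53.5),
    "hydro2d": (14.3, 31.8, 46.4),
    "mgrid": (16.4, 25.0, 26.4),
    "applu": (17.9, 23.1, 13.6),
    "turb3d": (22.5, 26.8, 21.0),
    "apsi": (25.0, 35.3, 56.3),
    "fpppp": (24.9, 27.7, 20.1),
    "wave5": (22.3, 32.2, 40.0),
}
OCTANE, ORIGIN2000, ORIGIN200 = 0, 1, 2  # column indices into GAINS


def column(idx: int) -> list:
    """All ten gains for one system, in table order."""
    return [row[idx] for row in GAINS.values()]


def pearson(xs: list, ys: list) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def top_tests(idx: int, k: int = 4) -> set:
    """Names of the k tests with the largest gain in column idx."""
    ranked = sorted(GAINS, key=lambda t: GAINS[t][idx], reverse=True)
    return set(ranked[:k])


if __name__ == "__main__":
    r_o2k = pearson(column(ORIGIN2000), column(ORIGIN200))
    r_oct = pearson(column(OCTANE), column(ORIGIN200))
    print(f"r(Origin2000, Origin200) = {r_o2k:.2f}")
    print(f"r(Octane, Origin200)     = {r_oct:.2f}")
    print("top 4, Origin2000:", sorted(top_tests(ORIGIN2000)))
    print("top 4, Origin200: ", sorted(top_tests(ORIGIN200)))
```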
An obvious exception is fpppp (27.7% on Origin2000 but only 20.1% on Origin200), but this is because fpppp uses a tiny data set which actually fits into the R10K's L1 data cache, so L2 cache issues are not important.
One might think applu shows an odd result: its Origin200 improvement (13.6%) is the lowest of all ten tests. However, Origin200/180 and Octane/195 actually give very similar SPEC ratios to begin with (11.0 and 11.2 respectively), whilst Origin2000/195 had a slightly better result (11.7), so don't read too much into the lower percentage increase for Origin200. The final SPEC ratios for applu aren't that different either (O200QC/225 gets 12.5, Octane/250 gets 13.2, O2000/250 gets 14.4). What is far more interesting is that, for six out of the ten tests, Origin200QC/225 is faster than Octane/250. This clearly shows that a larger, quicker L2 cache can benefit some tasks to a significant degree. Given this, one can look forward to future CPUs at higher clock speeds which have the L2 cache running at full core speed.
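The applu percentages follow directly from the SPEC ratios quoted in the paragraph. As a small check (Python; the dictionary layout and function name are illustrative, not anything from SPEC), recomputing each increase from the before/after ratios reproduces the comparison table's 13.6%, 17.9% and 23.1%, while the absolute gains in SPEC ratio span only 1.5 to 2.7 points.

```python
"""Recompute the applu upgrade increases from the quoted SPEC ratios."""

# applu SPECfp95 ratios quoted in the text: system -> (before, after).
APPLU = {
    "Origin200 180->225MHz": (11.0, 12.5),
    "Octane 195->250MHz": (11.2, 13.2),
    "Origin2000 195->250MHz": (11.7, 14.4),
}


def applu_increases(ratios: dict) -> dict:
    """Return system -> (% increase rounded to 0.1, absolute ratio gain)."""
    out = {}
    for system, (before, after) in ratios.items():
        pct = round((after / before - 1.0) * 100.0, 1)
        out[system] = (pct, round(after - before, 1))
    return out


if __name__ == "__main__":
    for system, (pct, gain) in applu_increases(APPLU).items():
        print(f"{system:24s} +{pct:4.1f}%  (+{gain:.1f} in SPEC ratio)")
```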
The lesson to be learned here is that an upgrade decision shouldn't be an automatic affair. R10K/225 may be a 25% increase in clock speed over R10K/180, but for Origin200 users it's entirely possible that they'd see a performance improvement of twice that or more (>50%), because of the larger, faster L2 cache. But this will not always be the case and will depend on the application in question. So, test first, decide later!