This analysis examines the single-CPU performance of different R10000 CPUs in Origin2000, ie. the focus is on how different R10000s perform in the same system (I have separate pages dealing with how the same CPU performs in different systems).
As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels); that will give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.
All source data for this analysis came from www.specbench.org.
Given below is a comparison table of the available single-CPU SPECfp95 test results for Origin2000, covering the 195MHz and 250MHz R10000 versions along with the 300MHz R12000; for reference, an equivalent percentage increase between each pair of CPUs is also included for each test. CPUs run from slowest (left) to fastest (right) in the table; in the Inventor graph, faster CPUs are placed at the back. After the table and 3D graphs is a short-cut index to the original results pages for the various systems.
             R10000    R10000    R12000
             195MHz    250MHz    300MHz
             4MB L2    4MB L2    8MB L2

tomcatv       26.9      34.6      47.4
swim          41.2      50.0      71.3
su2cor        11.5      15.6      20.9
hydro2d       12.6      16.6      26.3
mgrid         18.8      23.5      37.2
applu         11.7      14.4      17.6
turb3d        15.3      19.4      26.7
apsi          15.6      21.1      30.3
fpppp         29.6      37.8      47.2
wave5         25.5      33.7      41.0

           % Increase  % Increase  % Increase
FROM:       R10K/195    R10K/195    R10K/250
TO:         R10K/250    R12K/300    R12K/300

tomcatv       28.6        76.2        37.0
swim          21.4        73.1        42.6
su2cor        35.7        81.7        34.0
hydro2d       31.7       108.7        58.4
mgrid         25.0        97.9        58.3
applu         23.1        50.4        22.2
turb3d        26.8        74.5        37.6
apsi          35.3        94.2        43.6
fpppp         27.7        59.5        24.9
wave5         32.2        60.8        21.7

Origin2000 SPECfp95 Comparison
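The percentage columns in the table above follow directly from the raw SPECfp95 figures. As a minimal sketch (the names `SPECFP95`, `pct_increase` and `increase_table` are my own, chosen for illustration), the calculation can be reproduced like this:

```python
# SPECfp95 figures copied from the table above, in the order
# (R10K/195, R10K/250, R12K/300). The dictionary name is illustrative.
SPECFP95 = {
    "tomcatv": (26.9, 34.6, 47.4),
    "swim":    (41.2, 50.0, 71.3),
    "su2cor":  (11.5, 15.6, 20.9),
    "hydro2d": (12.6, 16.6, 26.3),
    "mgrid":   (18.8, 23.5, 37.2),
    "applu":   (11.7, 14.4, 17.6),
    "turb3d":  (15.3, 19.4, 26.7),
    "apsi":    (15.6, 21.1, 30.3),
    "fpppp":   (29.6, 37.8, 47.2),
    "wave5":   (25.5, 33.7, 41.0),
}


def pct_increase(old, new):
    """Percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def increase_table(results):
    """Return {test: (195->250, 195->300, 250->300)}, rounded to 0.1%."""
    rows = {}
    for test, (r10k_195, r10k_250, r12k_300) in results.items():
        rows[test] = (
            round(pct_increase(r10k_195, r10k_250), 1),
            round(pct_increase(r10k_195, r12k_300), 1),
            round(pct_increase(r10k_250, r12k_300), 1),
        )
    return rows


if __name__ == "__main__":
    for test, row in increase_table(SPECFP95).items():
        print(f"{test:8s} {row[0]:7.1f} {row[1]:7.1f} {row[2]:7.1f}")
```

Keeping the raw figures in one place makes it easy to add a further CPU column later without recalculating anything by hand.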
Next, a separate comparison graph for each of the ten SPECfp95 tests: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp and wave5.
Observations
Remember that the increase in clock speed from 195MHz to 250MHz is 28.2%. No one would expect a perfect scaling of speed, so a good result would be a 25% increase. Hence, one must examine whether each test achieves an increase as large as this or not.
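The yardstick described above can be applied mechanically. The following is only a sketch: the percentage figures are the R10K/195 to R10K/250 increases from the SPECfp95 table, while the names `FP_INCREASE`, `GOOD_RESULT_PCT` and `classify` are assumptions made for the example.

```python
# R10K/195 -> R10K/250 percentage increases, copied from the SPECfp95 table.
FP_INCREASE = {
    "tomcatv": 28.6, "swim": 21.4, "su2cor": 35.7, "hydro2d": 31.7,
    "mgrid": 25.0, "applu": 23.1, "turb3d": 26.8, "apsi": 35.3,
    "fpppp": 27.7, "wave5": 32.2,
}

# The "good result" yardstick suggested in the text.
GOOD_RESULT_PCT = 25.0


def clock_increase_pct(old_mhz, new_mhz):
    """Percentage increase in clock speed from old_mhz to new_mhz."""
    return (new_mhz / old_mhz - 1.0) * 100.0


def classify(increases, threshold):
    """Split tests into those that meet the threshold and those below it."""
    meets = [test for test, pct in increases.items() if pct >= threshold]
    below = [test for test, pct in increases.items() if pct < threshold]
    return meets, below


if __name__ == "__main__":
    print(f"Clock increase: {clock_increase_pct(195.0, 250.0):.1f}%")
    meets, below = classify(FP_INCREASE, GOOD_RESULT_PCT)
    print("Meets 25% yardstick:", ", ".join(meets))
    print("Below 25% yardstick:", ", ".join(below))
```

With the table's figures, swim and applu come out below the 25% yardstick and mgrid sits exactly on it; the remaining seven tests exceed it.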
Since the R10K/250 has its L2 cache running at 2/3rds core speed, one must also bear in mind that some tests may benefit from this faster L2 cache speed; however, this may be difficult to judge because of the small number of tests under discussion, so I will not cover this aspect here in detail. One could perhaps form some conclusions by carefully comparing Origin200, Origin2000 and Octane figures, but any statements could easily be misleading because SPECfp95 only consists of ten tests.
Anyway, the main points which arise from the above graphs are as follows:
             Octane       Origin
           % Increase   % Increase

tomcatv      16.2%        28.6%
swim         14.0%        21.4%
su2cor       16.2%        35.7%
hydro2d      14.3%        31.7%
mgrid        16.4%        25.0%
applu        17.9%        23.1%
turb3d       22.5%        26.8%
apsi         25.0%        35.3%
fpppp        24.9%        27.7%
wave5        22.3%        32.2%
By examining systems such as PowerChallenge, it can be established that these lower increases are due to the smaller amount of L2 cache present in the system (1MB for R10000s in Octane). However, for Origin2000, these differences do not occur, ie. every SPECfp95 test improves by the best possible amount based on the clock speed increase and faster L2 cache speed - the tests are not affected by L2 cache size issues. See my Octane Single-CPU Performance Comparison page for a more detailed discussion of these issues with respect to Octane.
This is important because, on other analysis pages, I have shown how - for other systems like Octane - one must consider the possibility that an upgrade is not worth the cost because one's task is more limited by the amount of L2 cache present rather than raw clock speed. But for Origin2000, this is not an issue to be concerned with: 4MB L2 seems to be enough to satisfy the kinds of tasks represented by SPECfp95.
The results also mean that, when CPUs are released which have their L2 cache running at full core speed, Origin2000 will definitely be able to take full advantage of any such CPU. My analysis of single-CPU performance in Origin200 has shown that some tasks increase in performance by a very large amount when the L2 cache runs at full core speed, and the amount of L2 cache is not as small as 1MB. Hence, if SGI release a future faster-clocked R10000 for Origin2000 which has its L2 running at core speed (and the L2 is bound to be at least 4MB), I predict that fp tasks such as those represented by SPECfp95 will show enormous performance increases compared to R10K/195, probably over 100% in some cases. Therefore, one may conclude that the future R12000 CPU, with its improved internal structure, is definitely something to look forward to.
I've talked a lot on my analysis pages about L2 cache issues, but there is one area I have not discussed, namely compiler optimisation. This isn't an area I am greatly experienced with, but having read chapter 9 of the Indigo2 technical report, entitled "MIPSpro Compiler Technology", it is very obvious that some careful coding modifications can give significant performance improvements, in some cases far greater improvements than any CPU upgrade would give. I also read a technical document on Cray's web site which detailed some typical coding modifications that can be made for vector systems; the document showed how a little attention paid to hardware issues, such as the size and frequency of memory load requests, could often offer enormous speed improvements simply by changing the code to take account of these hardware-level factors.
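To give a feel for why memory access patterns matter so much, here is a toy model of my own; it is not taken from the MIPSpro or Cray documents, and the cache parameters (128-byte lines, 256 lines, direct-mapped) are hypothetical values picked for illustration rather than a description of any real R10000 system. It counts cache misses when the same row-major array is walked row by row and then column by column.

```python
def count_misses(addresses, line_bytes=128, num_lines=256):
    """Count misses in a toy direct-mapped cache for a stream of byte addresses."""
    tags = [None] * num_lines          # which memory line each slot holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes      # memory line containing this address
        slot = line % num_lines        # direct-mapped: one possible slot
        if tags[slot] != line:
            tags[slot] = line          # fetch the line, evicting the old one
            misses += 1
    return misses


def row_order(rows, cols, elem_bytes=8):
    """Addresses touched when the inner loop runs along a row (unit stride)."""
    for i in range(rows):
        for j in range(cols):
            yield (i * cols + j) * elem_bytes


def column_order(rows, cols, elem_bytes=8):
    """Addresses touched when the inner loop runs down a column (large stride)."""
    for j in range(cols):
        for i in range(rows):
            yield (i * cols + j) * elem_bytes


if __name__ == "__main__":
    rows = cols = 256                  # 256 x 256 array of 8-byte values
    print("row order misses:   ", count_misses(row_order(rows, cols)))
    print("column order misses:", count_misses(column_order(rows, cols)))
```

In this model the row-order walk misses once per cache line, whereas the column-order walk misses on every single access, even though both loops do exactly the same arithmetic. Real caches are more forgiving than this toy, but swapping loop order to get unit-stride access is the sort of small source change the paragraph above has in mind.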
Upgrading a CPU may give a performance increase in the order of a few tens of percent, as is the case for R10K/250 vs. R10K/195, but some careful code optimisation can easily give far greater performance increases. So, if you're thinking about an upgrade, don't go spending a fortune if you haven't yet looked at optimising your code. Some careful thought and hard reading might cut those computation times down from several days to just a few hours. Obviously, combining code optimisation with a CPU upgrade would give the best improvement; what I'm suggesting is that one shouldn't spend money on upgrades until one has fully investigated optimisation issues.
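The arithmetic behind this advice is simple enough to sketch. In the example below, the 72-hour job and the 12x gain from optimisation are purely hypothetical numbers chosen to match the "several days to a few hours" scenario; the upgrade factor is just the 250/195 clock ratio, which is a best case that assumes perfect scaling.

```python
def runtime_after(hours, speedup):
    """Run time once a job has been sped up by the given factor."""
    return hours / speedup


def combined_speedup(*factors):
    """Independent speedups multiply together."""
    total = 1.0
    for factor in factors:
        total *= factor
    return total


if __name__ == "__main__":
    job_hours = 72.0                   # hypothetical three-day job
    upgrade = 250.0 / 195.0            # best case: perfect clock scaling
    optimisation = 12.0                # hypothetical gain from tuned code

    print(f"Upgrade only:      {runtime_after(job_hours, upgrade):5.1f} hours")
    print(f"Optimisation only: {runtime_after(job_hours, optimisation):5.1f} hours")
    both = combined_speedup(upgrade, optimisation)
    print(f"Both together:     {runtime_after(job_hours, both):5.1f} hours")
```

Because the factors multiply, the upgrade still helps after optimisation, but it contributes the same modest ratio either way; the large saving comes from the code changes.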
Note that although there are online documents about code optimisation for various systems and compilers, there is also a wealth of printed books available on the subject. Consult your local library for some background reading; delving straight into an online guide that's specific to your system or task may make it hard to understand the general concepts involved. Besides, understanding the general principles will allow you to apply them to many systems and code types, not just the one task you happen to be concerned with at the time.
The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results and an equivalent percentage increase. After the table and 3D graphs is a short-cut index to the original results pages.
             R10000    R10000    % Increase
             250MHz    195MHz    (195 -> 250)

go            14.9      11.4       30.7%
m88ksim       14.2      11.3       25.7%
gcc           13.5      10.4       29.8%
compress      15.0      11.3       32.7%
li            12.3      9.57       28.5%
ijpeg         12.9      10.2       26.5%
perl          16.7      13.3       25.6%
vortex        19.5      14.4       35.4%

Average (NB: 250/195 = +28.2%):    29.4%

Origin2000 SPECint95 Comparison
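The 29.4% average in the table is consistent with a plain arithmetic mean of the eight per-test increases. SPEC's own composite figures are formed as geometric means, so it is worth checking whether that choice would change the picture. The sketch below does both; the function and variable names are my own.

```python
import math

# SPECint95 figures copied from the table above: (R10K/250, R10K/195).
SPECINT95 = {
    "go":       (14.9, 11.4),
    "m88ksim":  (14.2, 11.3),
    "gcc":      (13.5, 10.4),
    "compress": (15.0, 11.3),
    "li":       (12.3, 9.57),
    "ijpeg":    (12.9, 10.2),
    "perl":     (16.7, 13.3),
    "vortex":   (19.5, 14.4),
}


def speed_ratios(results):
    """Per-test ratio of the faster CPU's figure to the slower CPU's."""
    return [fast / slow for fast, slow in results.values()]


def arithmetic_mean_increase(results):
    """Arithmetic mean of the per-test percentage increases."""
    ratios = speed_ratios(results)
    return math.fsum((r - 1.0) * 100.0 for r in ratios) / len(ratios)


def geometric_mean_increase(results):
    """Percentage increase implied by the geometric mean of the ratios."""
    ratios = speed_ratios(results)
    mean_log = math.fsum(math.log(r) for r in ratios) / len(ratios)
    return (math.exp(mean_log) - 1.0) * 100.0


if __name__ == "__main__":
    print(f"Arithmetic mean increase: {arithmetic_mean_increase(SPECINT95):.1f}%")
    print(f"Geometric mean increase:  {geometric_mean_increase(SPECINT95):.1f}%")
```

For these figures the two averages differ by well under a percentage point, so the comparison with the 28.2% clock increase holds whichever one is used.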
Next, a separate comparison graph for each of the eight SPECint95 tests: go, m88ksim, gcc, compress, li, ijpeg, perl and vortex.
Observations
Given that all the tests except vortex use small data sets, it is clear by comparing to the equivalent increases for Octane that Origin2000 shields the tests from L2 cache size issues, and so the system is fully able to take advantage of the faster L2 cache and clock speeds.
Note that ijpeg (JPEG compression) may be a computational area that is hardware accelerated on some systems, depending on the presence or absence of video board options.
Finally, when dealing with high-end systems like Origin2000, it is highly advisable to explore all possible avenues of compiler and code optimisation before contemplating an upgrade. Sometimes, careful changes to code design can give rise to large performance increases, especially by tuning one's code to match specific hardware parameters. Please see the last three paragraphs of the SPECfp95 discussion above for further comments on this subject.