Ian's SGI Depot: FOR SALE! SGI Systems, Parts, Spares and Upgrades

Origin2000 Single-CPU SPEC95 Performance
Comparison Using Different R10000s

Last Change: 20/Aug/1998

SPEC's Introduction to SPEC95
SPECfp95 Analysis
SPECint95 Analysis

(Note: the 2D bar graphs shown here for the various SPEC95 tests have been drawn to the same scale)
(the graphs are also to the same scale as those given on other single-CPU comparison pages)

Origin2000 Single-CPU SPECfp95 Performance
Comparison Using Different R10000s

Objectives

This analysis compares the single-CPU performance of different R10000 CPUs in Origin2000, ie. the focus is on how different CPUs perform in the same system (I have separate pages dealing with how the same CPU performs in different systems).

As with all these studies, a 3D Inventor model of the data is available (screenshots of this are included below). Load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective). Rotate the object 30 degrees horizontally and then 30 degrees vertically (use the Roty and Rotx thumbwheels) - that'll give the standard isometric view. I actually found that slightly smaller angles (15 or 20 degrees) make things a little clearer, so feel free to experiment. Note that newer versions of popular browsers may be able to load and show the object directly, although such browsers may not offer Orthographic viewing.

All source data for this analysis came from www.specbench.org.

Given below is a comparison table of available single-CPU SPECfp95 test results for Origin2000, covering the R10000 at 195MHz and 250MHz and the R12000 at 300MHz; for reference, the percentage increase between each pair of CPUs is also given for each test. Faster CPUs are rightmost in the table (in the Inventor graph, they're placed at the back). After the table and 3D graphs is a short-cut index to the original results pages for the various systems.

          R10000   R10000   R12000
          195MHz   250MHz   300MHz
          4MB L2   4MB L2   8MB L2

tomcatv    26.9     34.6     47.4
swim       41.2     50.0     71.3
su2cor     11.5     15.6     20.9
hydro2d    12.6     16.6     26.3
mgrid      18.8     23.5     37.2
applu      11.7     14.4     17.6
turb3d     15.3     19.4     26.7
apsi       15.6     21.1     30.3
fpppp      29.6     37.8     47.2
wave5      25.5     33.7     41.0


          % Increase     % Increase     % Increase
FROM:      R10K/195       R10K/195       R10K/250
TO:        R10K/250       R12K/300       R12K/300

tomcatv      28.6           76.2           37.0
swim         21.4           73.1           42.6
su2cor       35.7           81.7           34.0
hydro2d      31.7          108.7           58.4
mgrid        25.0           97.9           58.3
applu        23.1           50.4           22.2
turb3d       26.8           74.5           37.6
apsi         35.3           94.2           43.6
fpppp        27.7           59.5           24.9
wave5        32.2           60.8           21.7
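
The percentage columns can be reproduced directly from the SPEC ratios. Here is a minimal Python sketch of that arithmetic; the names SPECFP95, pct_increase and increases are purely illustrative, and the figures are simply copied from the table above.

```python
# SPECfp95 ratios copied from the table above, in the order
# (R10K/195, R10K/250, R12K/300). All names here are illustrative.
SPECFP95 = {
    "tomcatv": (26.9, 34.6, 47.4),
    "swim":    (41.2, 50.0, 71.3),
    "su2cor":  (11.5, 15.6, 20.9),
    "hydro2d": (12.6, 16.6, 26.3),
    "mgrid":   (18.8, 23.5, 37.2),
    "applu":   (11.7, 14.4, 17.6),
    "turb3d":  (15.3, 19.4, 26.7),
    "apsi":    (15.6, 21.1, 30.3),
    "fpppp":   (29.6, 37.8, 47.2),
    "wave5":   (25.5, 33.7, 41.0),
}


def pct_increase(old, new):
    """Return the percentage by which `new` exceeds `old`."""
    return (new / old - 1.0) * 100.0


def increases(results, src, dst):
    """Percentage increase per test between column `src` and column `dst`."""
    return {test: pct_increase(r[src], r[dst]) for test, r in results.items()}


fp_195_to_250 = increases(SPECFP95, 0, 1)
fp_195_to_300 = increases(SPECFP95, 0, 2)
fp_250_to_300 = increases(SPECFP95, 1, 2)

# One row per test: 195->250, 195->300, 250->300, to one decimal place.
for test in SPECFP95:
    print(f"{test:<10}{fp_195_to_250[test]:8.1f}"
          f"{fp_195_to_300[test]:8.1f}{fp_250_to_300[test]:8.1f}")
```

Running this prints one row per test in the same column order as the percentage table, so any new result can be added to SPECFP95 and checked the same way.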

    Origin2000 SPECfp95 Comparison

[Test Suite Description | 250MHz | 195MHz]

Next, a separate comparison graph for each of the ten SPECfp95 tests: tomcatv, swim, su2cor, hydro2d, mgrid, applu, turb3d, apsi, fpppp and wave5.

Observations

Remember that the increase in clock speed from 195MHz to 250MHz is 28.2% (250/195 = 1.282). No one would expect perfect scaling, so a 25% increase would be a good result. Hence, one must examine whether each test achieves an increase at least as large as this.
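
As a quick check of these thresholds, the sketch below works out the clock speed increase and compares each R10K/195 to R10K/250 figure from the table above against both the 25% mark and the full clock speed increase. The variable names are illustrative only.

```python
# Clock speed increase when moving from 195MHz to 250MHz.
clock_gain = (250.0 / 195.0 - 1.0) * 100.0    # roughly 28.2%

GOOD_RESULT = 25.0   # the "good result" threshold suggested in the text

# R10K/195 -> R10K/250 percentage increases, copied from the table above.
fp_gain = {
    "tomcatv": 28.6, "swim": 21.4, "su2cor": 35.7, "hydro2d": 31.7,
    "mgrid": 25.0, "applu": 23.1, "turb3d": 26.8, "apsi": 35.3,
    "fpppp": 27.7, "wave5": 32.2,
}

meets_good = [t for t, g in fp_gain.items() if g >= GOOD_RESULT]
beats_clock = [t for t, g in fp_gain.items() if g > clock_gain]
mean_gain = sum(fp_gain.values()) / len(fp_gain)

print(f"clock speed increase: {clock_gain:.1f}%")
print(f"tests at or above {GOOD_RESULT:.0f}%: {', '.join(meets_good)}")
print(f"tests above the clock speed increase: {', '.join(beats_clock)}")
print(f"mean increase: {mean_gain:.2f}%")
```

With the table's figures, eight of the ten tests reach the 25% mark (swim and applu fall just short) and five exceed the clock speed increase itself.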

Since the R10K/250 has its L2 cache running at 2/3rds core speed, one must also bear in mind that some tests may benefit from this faster L2 cache speed; however, this is difficult to judge with so few tests under discussion, so I will not cover this aspect here in detail. One could perhaps form some conclusions by carefully comparing Origin200, Origin2000 and Octane figures, but any statements could easily be misleading because SPECfp95 consists of only ten tests.

Anyway, the main points which arise from the above graphs are as follows:

The percentage increases follow the increase in clock speed very well. Not only is the average increase very similar to the clock speed increase (about 28.7% vs. 28.2%), but exactly half the tests improve by more than the clock speed increase (tomcatv, su2cor, hydro2d, apsi and wave5).
I have shown on other comparison pages how some of the SPECfp95 tests are affected by the amount of available L2 cache. When one examines the performance differences between R10K/250 and R10K/195 on other systems such as Octane, it is very noticeable that some tests do not improve as much as one might expect, ie. the improvement is well below 28.2%. Here, for example, is a table showing the percentage performance increases one sees when moving from R10K/195 to R10K/250, for Origin2000 and Octane (notice the lower figures for some of the tests on Octane):
```
                   Octane        Origin
                 % Increase    % Increase

      tomcatv       16.2%         28.6%
      swim          14.0%         21.4%
      su2cor        16.2%         35.7%
      hydro2d       14.3%         31.7%
      mgrid         16.4%         25.0%
      applu         17.9%         23.1%
      turb3d        22.5%         26.8%
      apsi          25.0%         35.3%
      fpppp         24.9%         27.7%
      wave5         22.3%         32.2%
```
By examining systems such as PowerChallenge, it can be established that these lower increases are due to the smaller amount of L2 cache present in the system (1MB for R10000s in Octane). For Origin2000, however, these shortfalls do not occur, ie. every SPECfp95 test improves by about as much as the clock speed increase and faster L2 cache speed allow - the tests are not affected by L2 cache size issues. See my Octane Single-CPU Performance Comparison page for a more detailed discussion of these issues with respect to Octane.
This is important because, on other analysis pages, I have shown how - for other systems like Octane - one must consider the possibility that an upgrade is not worth the cost because one's task is more limited by the amount of L2 cache present rather than raw clock speed. But for Origin2000, this is not an issue to be concerned with: 4MB L2 seems to be enough to satisfy the kinds of tasks represented by SPECfp95.
The results also mean that, when CPUs are released which have their L2 cache running at full core speed, Origin2000 will definitely be able to take full advantage of any such CPU. My analysis of single-CPU performance in Origin200 has shown that some tasks increase in performance by a very large amount when the L2 cache runs at full core speed, and the amount of L2 cache is not as small as 1MB. Hence, if SGI release a future faster-clocked R10000 for Origin2000 which has its L2 running at core speed (and the L2 is bound to be at least 4MB), I predict that fp tasks such as those represented by SPECfp95 will show enormous performance increases compared to R10K/195, probably over 100% in some cases. Therefore, one may conclude that the future R12000 CPU, with its improved internal structure, is definitely something to look forward to.
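
To see which tests are held back most on Octane, one can rank the tests by the gap between the two systems' increases. A minimal sketch follows, using the Octane figures from the comparison table above and the Origin2000 figures from the main SPECfp95 table; the variable names are illustrative.

```python
# 195MHz -> 250MHz percentage increases per SPECfp95 test.
# Octane figures are from the comparison table above; Origin2000 figures
# are from the main SPECfp95 table. Variable names are illustrative.
octane = {"tomcatv": 16.2, "swim": 14.0, "su2cor": 16.2, "hydro2d": 14.3,
          "mgrid": 16.4, "applu": 17.9, "turb3d": 22.5, "apsi": 25.0,
          "fpppp": 24.9, "wave5": 22.3}
origin = {"tomcatv": 28.6, "swim": 21.4, "su2cor": 35.7, "hydro2d": 31.7,
          "mgrid": 25.0, "applu": 23.1, "turb3d": 26.8, "apsi": 35.3,
          "fpppp": 27.7, "wave5": 32.2}

# Gap in percentage points between the two systems, largest first.
gap = {t: origin[t] - octane[t] for t in octane}
ranked = sorted(gap, key=gap.get, reverse=True)

for t in ranked:
    print(f"{t:<10}{octane[t]:7.1f}{origin[t]:7.1f}{gap[t]:7.1f}")
```

The largest gaps belong to su2cor and hydro2d and the smallest to fpppp, which gives a rough ordering of how strongly each test's upgrade benefit differs between the two systems.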

Note: given that Origin2000 CPUs have so much L2 cache, it is possible that one may be wasting resources by running a fp task on Origin2000 when that task does not need as much as 4MB L2. On my Octane single-CPU page, I suggest that if one has multiple systems available such as Origin2000 and Octane, then one should experiment to see which task is best suited to which system. Given two tasks A and B, running on Octane and Origin2000 respectively, swapping them over between the systems may significantly improve the performance of task A without harming the performance of task B.
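
The experiment suggested above amounts to timing each task on each system and then picking the placement with the lowest overall run time. The sketch below shows the idea; the run times are entirely hypothetical (invented for illustration, not measured), and the function name best_assignment is my own.

```python
from itertools import permutations

# HYPOTHETICAL run times in hours; real figures must be measured by running
# each task on each system. Task and system names are placeholders.
runtime = {
    "A": {"Octane": 10.0, "Origin2000": 6.0},   # task A benefits from 4MB L2
    "B": {"Octane": 4.2, "Origin2000": 4.0},    # task B fits in a small L2
}


def best_assignment(runtime):
    """Try every one-task-per-system placement; return the lowest total time."""
    tasks = sorted(runtime)
    systems = sorted(next(iter(runtime.values())))
    best = None
    for perm in permutations(systems, len(tasks)):
        total = sum(runtime[t][s] for t, s in zip(tasks, perm))
        if best is None or total < best[0]:
            best = (total, dict(zip(tasks, perm)))
    return best


total, assignment = best_assignment(runtime)
print(assignment, f"total {total:.1f} hours")
```

With these made-up numbers, moving task A to Origin2000 and task B to Octane cuts the total from 14.0 to 10.2 hours, while task B barely slows down.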

I've talked a lot on my analysis pages about L2 cache issues, but there is one area I have not discussed, namely compiler optimisation. This isn't an area I am greatly experienced with, but having read chapter 9 of the Indigo2 technical report, entitled "MIPSpro Compiler Technology", it is very obvious that some careful coding modifications can give significant performance improvements, in some cases far greater improvements than any CPU upgrade would give. I also read a technical document on Cray's web site which detailed some typical coding modifications that can be made for vector systems; the document showed how a little attention paid to hardware issues, such as the size and frequency of memory load requests, could often offer enormous speed improvements simply by changing the code to take account of these hardware-level factors.

Upgrading a CPU may give a performance increase in the order of a few tens of percent, as is the case for R10K/250 vs. R10K/195, but some careful code optimisation can easily give far greater performance increases. So, if you're thinking about an upgrade, don't go spending a fortune if you haven't yet looked at optimising your code. Some careful thought and hard reading might cut those computation times down from several days to just a few hours. Obviously, combining code optimisation with a CPU upgrade would give the best improvement; what I'm suggesting is that one shouldn't spend money on upgrades until one has fully investigated optimisation issues.
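
To put rough numbers on this, here is a small sketch comparing the two routes. Everything in it is an assumption made for illustration: the 72-hour job, the tenfold gain from code optimisation, and the idea that the two speedups simply multiply when combined.

```python
# HYPOTHETICAL figures for illustration only: a 72-hour job and a 10x gain
# from code optimisation are assumptions, not measurements.
baseline_hours = 72.0               # "several days"
upgrade_speedup = 250.0 / 195.0     # best case: scales with clock speed
optimisation_speedup = 10.0         # assumed gain from tuning the code

after_upgrade = baseline_hours / upgrade_speedup
after_optimisation = baseline_hours / optimisation_speedup
# Assumes the two gains are independent and therefore multiply.
after_both = baseline_hours / (upgrade_speedup * optimisation_speedup)

print(f"upgrade only:      {after_upgrade:5.1f} hours")
print(f"optimisation only: {after_optimisation:5.1f} hours")
print(f"both:              {after_both:5.1f} hours")
```

Under these assumptions the upgrade alone still leaves a run of more than two days, whereas the optimised code finishes in a working day on the old CPU.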

Note that although there are online documents about code optimisation for various systems and compilers, there is also a wealth of hard printed books available on the subject. Consult your local library for some background reading; delving straight into an online guide that's specific to your system or task may make it hard to understand the general concepts involved. Besides, understanding the general principles will allow you to apply them to many systems and code types, not just the one task you happen to be concerned with at the time.

Origin2000 Single-CPU SPECint95 Performance
Comparison Using Different R10000s

Just as for the SPECfp95 analysis given above, you can download a 3D performance graph (gzipped) if you wish: load the file into SceneViewer or ivview and switch into Orthographic mode (ie. no perspective), etc.

The rationale and method for this examination were the same as for SPECfp95. Thus, given below is a comparison table of the various SPECint95 test results and an equivalent percentage increase. After the table and 3D graphs is a short-cut index to the original results pages.

          R10000   R10000     % Increase
          250MHz   195MHz    (195 -> 250)

go         14.9     11.4         30.7%
m88ksim    14.2     11.3         25.7%
gcc        13.5     10.4         29.8%
compress   15.0     11.3         32.7%
li         12.3     9.57         28.5%
ijpeg      12.9     10.2         26.5%
perl       16.7     13.3         25.6%
vortex     19.5     14.4         35.4%

Average (NB: 250/195 = +28.2%):  29.4%
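
The same arithmetic can be checked for SPECint95. This is a minimal sketch with the ratios copied from the table above; the variable names are illustrative.

```python
# SPECint95 ratios copied from the table above: (R10K/250, R10K/195).
SPECINT95 = {
    "go": (14.9, 11.4), "m88ksim": (14.2, 11.3), "gcc": (13.5, 10.4),
    "compress": (15.0, 11.3), "li": (12.3, 9.57), "ijpeg": (12.9, 10.2),
    "perl": (16.7, 13.3), "vortex": (19.5, 14.4),
}

# Clock speed increase when moving from 195MHz to 250MHz (about 28.2%).
clock_gain = (250.0 / 195.0 - 1.0) * 100.0

# Percentage increase per test, then the mean across all eight tests.
int_gain = {t: (fast / slow - 1.0) * 100.0
            for t, (fast, slow) in SPECINT95.items()}
mean_gain = sum(int_gain.values()) / len(int_gain)
beats_clock = [t for t, g in int_gain.items() if g > clock_gain]

for t, g in int_gain.items():
    print(f"{t:<10}{g:6.1f}%")
print(f"average: {mean_gain:.1f}%  (clock speed increase: {clock_gain:.1f}%)")
print(f"above the clock speed increase: {', '.join(beats_clock)}")
```

The tests listed on the last line are the ones whose improvement exceeds the 195MHz to 250MHz clock speed increase.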

     Origin2000 SPECint95 Comparison

[Test Suite Description | 250MHz | 195MHz]

Next, a separate comparison graph for each of the eight SPECint95 tests: go, m88ksim, gcc, compress, li, ijpeg, perl and vortex.

Observations

It is immediately obvious that the improvements for Origin2000 are very good, often giving increases that are better than the clock speed increase of 28.2% (250/195) - indeed, 5 out of the 8 tests (go, gcc, compress, li and vortex) show performance improvements that are greater than the clock speed increase. Plus, the average improvement itself (29.4%) is greater than the clock speed increase.
Given that all the tests except vortex use small data sets, it is clear by comparing to the equivalent increases for Octane that Origin2000 shields the tests from L2 cache size issues, and so the system is fully able to take advantage of the faster L2 cache and clock speeds.
The one test which does have a variable memory access pattern and causes L2 cache misses, namely vortex, also shows a good improvement. Elsewhere I've quoted comments from John McCalpin of SGI, where he says that some modern applications do not use data sets that are as cache-friendly as most of the tests in SPECint95; his examples included database processing (eg. searching and sorting), CPU simulation, airline scheduling, etc. Since vortex shows a good improvement, one can be confident that other tasks which have complex memory access patterns should also show a good performance improvement from a 195-to-250 upgrade.

Although the above results are good, it is still wise to have proper tests done before making an upgrade decision. Because the test results show so little variance in the percentage improvements and individual SPEC ratios, it could be hard deciding which test is most like one's task.

Note that ijpeg (JPEG compression) may be a computational area that is hardware accelerated on some systems, depending on the presence or absence of video board options.

Finally, when dealing with high-end systems like Origin2000, it is highly advisable to explore all possible avenues of compiler and code optimisation before contemplating an upgrade. Sometimes, careful changes to code design can give rise to large performance increases, especially by tuning one's code to match specific hardware parameters. Please see the last three paragraphs of the SPECfp95 discussion above for further comments on this subject.
