The Maya binary file for this scene is available for download if required. Note that the ZooRender website has a table of results, but they're pretty useless for comparison purposes since the table is full of bogus/joke numbers. However, I can at least use the test for comparing different SGIs.
The commands used for the tests are as follows:
Single-CPU Command:
timex Render -r mr -rd $PWD -im test -of tif Benchmark_Maya.mb
Multi-CPU Command:
timex Render -r mr -rt 4 -rd $PWD -im test -of tif Benchmark_Maya.mb
Here are the results, in hours, minutes, seconds and hundredths of a second (ie. the time format is 'hours:minutes:seconds.hundredths-of-a-second').
NOTE 1: any result shown in bold type is an overall 'throughput' result, ie. the effective speed per render when running multiple 4-thread renders at the same time.
NOTE 2: in order to demonstrate CPU scalability, any system with N CPUs that is tested with a number of threads K that is less than N is shown by having its entry in italics, ie. only K CPUs in that system are being used.
           Num   ------ CPU ------     Time
System     CPUs  Type    MHz  L2       h:mm:ss.ss  Notes

Onyx         24  R10000  195  1MB      0:01:57.00  4 threads x 6, throughput test (#1)
Tezro         4  R16000 1000  16MB     0:02:24.39  4 threads
Onyx         16  R10000  195  2MB      0:02:43.85  4 threads x 4, throughput test (#1)
Origin350     4  R16000  700  4MB      0:03:11.23  4 threads (32-CPU system, only 4 CPUs used)
Origin300     4  R14000  600  4MB      0:03:37.10  4 threads
Tezro         4  R16000 1000  16MB     0:04:07.54  2 threads
Origin300     4  R14000  500  2MB      0:04:13.26  4 threads
Onyx2         4  R14000  500  8MB      0:04:29.24  4 threads
Onyx2         4  R12000  400  8MB      0:05:16.42  4 threads
Tezro         2  R16000  700  4MB      0:05:54.37  4 threads
Origin2000    4  R12000  350  4MB      0:06:17.90  4 threads (system has 4GB RAM, SSE+TRAM gfx with IO6G)
Origin300     2  R14000  600  4MB      0:06:42.16  4 threads
Octane        2  R14000  600  2MB      0:06:50.02  4 threads
Tezro         4  R16000 1000  16MB     0:08:20.49  1 thread
Fuel          1  R16000  900  8MB      0:08:39.93
Origin300     1  R14000  600  4MB      0:09:33.12
Octane        2  R12000  400  2MB      0:09:36.57  4 threads
Fuel          1  R14000  800  4MB      0:09:45.99
Onyx          4  R10000  195  2MB      0:10:55.67  4 threads
Fuel          1  R16000  700  4MB      0:11:29.96
Onyx          4  R10000  195  1MB      0:11:33.87  4 threads
Octane        2  R12000  350  1MB      0:12:39.55  4 threads
Origin300     1  R14000  600  4MB      0:13:04.09
Fuel          1  R14000  600  4MB      0:13:17.18
Octane        1  R14000  600  2MB      0:13:24.35
Octane        2  R12000  300  2MB      0:13:30.56  4 threads
Octane        1  R14000  550  2MB      0:15:25.44
Octane        2  R10000  250  1MB      0:16:35.61  4 threads
Octane        2  R12000  250  1MB      0:17:50.52  4 threads (CPU mod, stage 1. No benefit until overclocked!)
Onyx2         1  R12000  400  8MB      0:18:38.65
Fuel          1  R14000  500  2MB      0:19:11.33
Onyx2         2  R10000  195  4MB      0:19:46.46  2 threads
Octane        1  R12000  400  2MB      0:20:04.25
Onyx          2  R10000  195  1MB      0:20:11.07  2 threads
Octane        2  R10000  195  1MB      0:20:22.90
Octane        1  R12000  360  2MB      0:21:16.89
Onyx          4  R4400   250  4MB      0:22:39.33  4 threads [hinv]
Octane        2  R10000  175  1MB      0:23:43.14  2 threads
Octane        1  R12000  300  2MB      0:25:42.36
O2            1  R7000   600  256K/1MB 0:26:01.79  [hinv]
O2            1  R12000  400  2MB      0:30:04.04
Octane        1  R10000  250  2MB      0:32:16.08
Octane        1  R10000  250  1MB      0:33:00.92
O2            1  R12000  300  1MB      0:36:07.57
Onyx2         1  R10000  195  4MB      0:38:37.14
O2            1  R7000   350  1MB      0:40:33.61
O2            1  R12000  270  1MB      0:41:03.35
O2            1  R10000  250  1MB      0:42:04.01
Indigo2       1  R10000  195  1MB      0:43:19.61
Octane        1  R10000  175  1MB      0:45:22.45
Octane        1  R10000  195  1MB      0:45:50.43
O2            1  R10000  225  1MB      0:47:21.28
O2            1  R10000  195  1MB      0:51:46.01
O2            1  R10000  175  1MB      1:00:54.02
O2            1  R5200   300  1MB      1:01:22.76
O2            1  R10000  150  1MB      1:13:20.93
O2            1  R5000   200  1MB      1:23:11.92
Indigo2       1  R4400   250  2MB      1:39:08.93
O2            1  R5000   180  512K     1:40:08.31
Indy          1  R5000   180  512K     1:52:18.56
Indigo2       1  R4400   200  2MB      1:55:03.81
Indy          1  R4400   200  1MB      2:02:30.20
Indy          1  R5000   150  512K     2:03:14.05
Indigo2       1  R4400   200  1MB      2:04:27.66
O2            1  R5000   180  -        2:31:02.95
Indy          1  R4400   150  1MB      2:34:36.26
Indy          1  R5000   150  -        2:53:37.49
Indy          1  R4600   133  512K     3:13:43.05
Indy          1  R4000   100  1MB      4:10:05.47
Indy          1  R4600   133  -        4:39:16.63
Indy          1  R4600   100  -        5:05:16.15
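For anyone wanting to compare entries numerically, the 'h:mm:ss.ss' times in the results table are easier to work with as plain seconds. The following is only a minimal Python sketch of that conversion; the function names parse_time and format_time are my own invention for illustration and are not part of Maya, timex or any SGI tool.

```python
def parse_time(text):
    """Convert 'h:mm:ss.ss' (or 'mm:ss.ss') into seconds as a float."""
    seconds = 0.0
    # Each colon-separated field is worth 60 times the one after it.
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def format_time(seconds):
    """Convert a number of seconds back into 'h:mm:ss.ss'."""
    # Work in whole hundredths to avoid floating-point rounding surprises.
    hundredths_total = round(seconds * 100)
    hours, rest = divmod(hundredths_total, 360000)
    minutes, rest = divmod(rest, 6000)
    secs, hundredths = divmod(rest, 100)
    return f"{hours}:{minutes:02d}:{secs:02d}.{hundredths:02d}"


if __name__ == "__main__":
    # Example: the 4-thread Tezro result from the table.
    print(parse_time("0:02:24.39"))
    print(format_time(parse_time("0:02:24.39")))
```

Converting to seconds first means two entries can simply be divided to get a speed ratio.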
Unlike the Alias render test, render time for this scene tracks raw clock speed fairly closely (except on older systems that have no L2 cache or a small L1 cache). It therefore scales very well with multiple CPUs on older Origin2000-based systems (ie. linear speed increase), which suggests the render does not involve particularly complex memory access and/or does not benefit that much from a larger L2. To put it another way, an Origin2000 or Onyx2 would scale nicely with more CPUs. Testing my older Onyx system confirms this idea: it scales nicely from 1 to 4 CPUs, and it also shows an almost linear speed increase when running multiple instances of the same render on a 24-CPU system, giving excellent overall throughput for rendering multiple frames.
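To put numbers on the scaling for any particular system, one can divide its 1-CPU time by its K-thread time to get the speedup, and then divide that by K to get the efficiency (1.0 would be perfectly linear). Here is a rough Python sketch using a few rows from the results table; the helper names are hypothetical, and which rows count as a fair like-for-like pairing is my own assumption (same system type, CPU, clock and L2).

```python
def parse_time(text):
    """Convert 'h:mm:ss.ss' into seconds as a float."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def speedup(single_time, multi_time):
    """How many times faster the multi-thread run is than the 1-thread run."""
    return parse_time(single_time) / parse_time(multi_time)


def efficiency(single_time, multi_time, threads):
    """Speedup per thread; 1.0 means perfectly linear scaling."""
    return speedup(single_time, multi_time) / threads


# (label, 1-thread time, K-thread time, K), all copied from the table.
# ASSUMPTION: these rows are directly comparable configurations.
PAIRS = [
    ("Tezro 4 x R16000/1000, 2 threads", "0:08:20.49", "0:04:07.54", 2),
    ("Tezro 4 x R16000/1000, 4 threads", "0:08:20.49", "0:02:24.39", 4),
    ("Onyx2 R12000/400 8MB, 4 threads", "0:18:38.65", "0:05:16.42", 4),
    ("Onyx2 R10000/195 4MB, 2 threads", "0:38:37.14", "0:19:46.46", 2),
]

if __name__ == "__main__":
    for label, single, multi, threads in PAIRS:
        s = speedup(single, multi)
        e = efficiency(single, multi, threads)
        print(f"{label}: speedup {s:.2f}, efficiency {e:.2f}")
```

Bear in mind that application startup time is included in every figure, so short runs will look slightly less efficient than the render itself really is.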
By contrast, the Alias test scene scales better if run on a later Origin3000-series system (which includes Origin300, Onyx300, Onyx350, Fuel, Tezro, etc.). This is also why it is sometimes more efficient not to parallelise frame rendering too much for a complex scene, ie. simply render 1 frame per CPU/core instead. It depends on the scene. Smart render management software systems will adjust how they use CPU resources based on the scene being processed, eg. I know of one movie company (MPC) which uses a system that does not use the 4th core in each quad-core Xeon for very complex renders (their renderfarm has 7000 cores).
If data can be reused between frames, though, then good speedups can be obtained on shared-memory systems. These days not many companies bother doing this, because it means employing people to write the custom software (ILM used to do this with 16-CPU Origin2K racks, giving much better results than would otherwise be the case).
The results show that older systems and multi-CPU systems are very effective for this kind of task, ie. a non-complex scene that can benefit from multiple CPUs. Usually the speedup from using more than one CPU is pretty much linear, no matter what CPUs a system has. Given Maya V6.5's thread limit of 4, this bodes well for rendering multiple frames for animations, ie. overall throughput as opposed to the speed of doing just one frame. Thus, for example, a 24-CPU Onyx is almost as fast as an 8 x 600MHz Origin300!
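The Onyx vs. Origin300 comparison can be reproduced roughly from the table. No 8-CPU Origin300 result is listed, so the sketch below ASSUMES that an 8 x 600MHz Origin300 would run two 4-thread renders side by side, each taking the same time as the 4-CPU 600MHz entry (0:03:37.10). That is an estimate for illustration, not a measured result, and the function names are hypothetical.

```python
def parse_time(text):
    """Convert 'h:mm:ss.ss' into seconds as a float."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


def throughput_time(per_render_seconds, concurrent_renders):
    """Effective time per render when several renders finish together."""
    return per_render_seconds / concurrent_renders


# Measured: 24-CPU Onyx, already a throughput figure in the table.
onyx24 = parse_time("0:01:57.00")

# Measured: 4-CPU Origin300 at 600MHz, one 4-thread render.
origin300_4cpu = parse_time("0:03:37.10")

# ASSUMPTION: an 8-CPU Origin300 runs two such renders at once with no slowdown.
origin300_8cpu = throughput_time(origin300_4cpu, 2)

# Fraction of the estimated Origin300 throughput that the Onyx achieves.
relative_speed = origin300_8cpu / onyx24

if __name__ == "__main__":
    print(f"Onyx 24-CPU:        {onyx24:.2f} s per render")
    print(f"Origin300 8-CPU:    {origin300_8cpu:.2f} s per render (estimated)")
    print(f"Onyx relative speed: {relative_speed:.2f}")
```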
NOTES:
The application startup time can be significant on older multi-CPU systems, partly masking the benefit of having extra CPUs. It's only a few seconds, but this can account for some variance between results. One could alleviate this by using a striped XLV to hold key application directories and/or data.
#1: These results refer to running n instances of the Render test at the same time, done by executing the n commands in n different shell windows (rlogged into from another system), acting upon copies of the scene file in different directories. I set up the command in each shell, then used the mouse middle button to paste a newline into each shell as fast as possible, so the n commands are activated all within the space of about 2 seconds at most. Since this means n copies of Maya have to be loaded at the same time, there is some variance in how long each instance takes to run, but the results are impressive; here is an example for six renders:
            Time mm:ss.ss

Render 1:   11:41.09
Render 2:   11:34.50
Render 3:   11:32.88
Render 4:   11:54.06
Render 5:   11:48.55
Render 6:   11:51.45
This is an average time of 11 min 44 sec, ie. an overall throughput time for 24 CPUs of 1 min 57 secs per render. Very cool! 8)
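The arithmetic behind that throughput figure is simply the mean of the six wall-clock times divided by the number of renders running at once. A small Python sketch, using the six times quoted above (the variable and function names are just for illustration):

```python
def parse_time(text):
    """Convert 'mm:ss.ss' (or 'h:mm:ss.ss') into seconds as a float."""
    seconds = 0.0
    for field in text.split(":"):
        seconds = seconds * 60 + float(field)
    return seconds


# The six simultaneous 4-thread renders on the 24-CPU Onyx.
RENDER_TIMES = ["11:41.09", "11:34.50", "11:32.88",
                "11:54.06", "11:48.55", "11:51.45"]

times_in_seconds = [parse_time(t) for t in RENDER_TIMES]

# Mean wall-clock time of one render while all six are running.
average = sum(times_in_seconds) / len(times_in_seconds)

# Six frames are finished in that time, so the effective time per frame is:
throughput = average / len(times_in_seconds)

if __name__ == "__main__":
    print(f"average:    {average:.2f} s")
    print(f"throughput: {throughput:.2f} s per render")
```

This gives an average of about 703.8 seconds (11 min 44 sec) and about 117.3 seconds (1 min 57 secs) per render, matching the figures quoted.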
Unanswered questions:
Dual-R12K/360, Dual-R10K/225, Single-R12K/270, Single-R10K/225
Fuels not yet tested:
R16K/800 (4MB)
O2s not yet tested:
R7K/600
Indigo2s not yet tested:
R10K/175, R4K/175 (1MB), R4K/150 (1MB), R4K/100 (1MB), R4K/100, R4600SC/133 (512K), R8000/75 (2MB)
Indys not yet tested:
R4400SC/175 (1MB), R4000PC/100