The O2 workstation, first released on October 10th 1996, uses a
system design entitled Unified Memory Architecture, or UMA for short
(not to be confused with NUMA which refers to the Origin server
architecture). SGI has their own White
Paper [47K PDF] on the subject of UMA design, which I strongly
recommend you read. This page is the result of my own extensive
investigative work.
Most existing (and older) workstations and computers (pre-1998) are based on 'bus' technologies, where data is moved around the system via a shared bus from one subsystem to another. Subsystems include main memory, CPU, graphics, texture, image, video, I/O connections, networking ports, etc. Sometimes, there is a fast link between CPU and main RAM, but even then the link that connects these two elements to the rest of the system (eg. the bridge + PCI in many PCs) is slow when judged by the demands of today's applications - painfully slow for some tasks.
The problem with the shared-bus design is that, as data bandwidth demands become greater and tasks increase in complexity, many tasks become difficult to accomplish since they require vast amounts of data to be moved around the system, eg. the use of an incoming video stream as a texture in a 3D model. The normal response to such problems has been to increase the clock speed of the bus or to make it wider, but there comes a point when the bandwidth gained is small and not worth the extra cost.
UMA solves this problem by having just one 'unified' high speed memory block. The heart of the system is no longer the main CPU; instead, the memory/graphics controller becomes the focus of attention. Understanding how O2's UMA system works, and what the memory/graphics controller can do, is the key to comprehending the things O2 can do which other systems such as traditional PCs cannot. Note, however, that as time moves on, traditional designs solve these problems using other methods, eg. the Intel i760 graphics ASIC can take a video stream directly into itself to use as a texture on a 3D model; however, if the main CPU wanted to carry out operations on that video stream, it would need to be copied to main memory first - this is not the case with O2.
UMA doesn't solve every problem, but when it comes to satisfying the demands placed on workstations by users, it is an ideal low-cost solution. However, there comes a point when a task's complexity is so great, eg. rendering 750MByte seismic data sets, that UMA is not an appropriate solution; thus, systems like Octane use a crossbar-based approach which offers massive system and memory bandwidth (Octane can render a 750MByte volumetric data set in two or three seconds). At the high-end, the crossbar concept is combined with advanced interconnection technologies to offer the massively scalable systems known as Origin (file/data/web/media serving, number crunching), Onyx2 (all the power of Origin combined with the fastest graphics designs in the world), and newer equivalent models (Origin3000, Onyx3000, Onyx4, Altix, etc.)
All these concepts share a common approach: the focus is on moving data around in the most efficient way possible, removing the bandwidth bottleneck which has existed for many years in the computing world, and allowing each computer subsystem to operate at its maximum potential.
Here is SGI's simple O2 diagram (note that the small annotated numbers do not refer to bus speeds or bandwidths; I believe they denote ASIC pin counts):
So how does UMA work? An example: suppose a video stream is brought in from the O2 digital camera and the data is stored in an area in the main RAM block (termed a 'Digital Media Buffer', or DMbuffer). If one then wished to use the data as a texture for a 3D model, all that needs to be done is to pass a pointer for the data area to the CRM chip, thus saving the need to copy the data as a whole to another part of the system, such as a separate graphics card. Hence, video-as-texture (2.6MB MPEG) is trivial with O2.
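The difference between copying a frame and handing over a pointer can be mimicked in a few lines of Python. This is only an analogy: the bytearray stands in for main RAM, the memoryview for the pointer passed to CRM, and none of the names below are DMbuffer or IRIX calls.

```python
# Illustration only: Python buffers standing in for the O2's shared RAM.
# None of these names are IRIX or DMbuffer API calls.

FRAME_BYTES = 640 * 480 * 4  # one hypothetical 32-bit video frame

# "Main RAM": the captured frame is stored here once.
frame = bytearray(FRAME_BYTES)

# Bus-style design: the graphics subsystem needs its own copy of the data.
texture_copy = bytes(frame)

# UMA-style design: the graphics side just receives a reference to the data.
texture_view = memoryview(frame)

# New video data arrives in the buffer...
frame[0] = 255

# ...and is visible through the view at once, but not in the stale copy.
copy_sees_update = texture_copy[0] == 255
view_sees_update = texture_view[0] == 255
print(copy_sees_update, view_sees_update)  # False True
```

The copy costs a full frame of memory traffic every time the video changes; the view costs nothing extra, which is the point of the single memory block.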
Another example is volume rendering. Since there is only a single memory block, one has access to effectively unlimited texture memory (ie. limited only by main RAM size). Thus, one can easily manipulate large texture sets, eg. 256MB of CAT scan data.
Of course, having virtually no limit to texture memory also benefits other areas such as visual simulation (many different textures are needed for landscapes, buildings, trees, etc.) The UMA design ensures fast and reliable access to the data, with 2.1GB/sec peak transfer rate between main RAM and the memory/graphics controller (CRM).
By way of a summary, here is an edited version of comments made by Tom Furlong (SGI's vice-president and general manager for desk-side systems) in an interview about the O2:
Although the CRM ASIC handles most graphics functions in hardware, all geometry and lighting calculations are handled by the main CPU. At first one might think this is a disadvantage, but it's cheaper and also means there is an easy upgrade path to increased performance: just get a faster/newer CPU (there are other unexpected benefits too which are discussed later). To this end, the R5000 is a good solution: it has been specifically designed to handle operations that are typically found in 3D graphics tasks, eg. MADD instructions. Do not dismiss the R5000 just because it isn't an R10000.
Please examine the detailed test results for the use of these CPUs in O2 before forming any conclusions. Also read Byte's article on the R5000, examine the R10000 performance comparison pages for R10K/195 and R10K/250, the SGI Performance Comparisons page, etc.
Here is a SPEC95 performance summary table (weighted means only) for R5000, R10000 and R12000 in O2, ordered by integer performance:
                          SPECint95   SPECfp95
R7000SC 600MHz 1MB L3:            ?          ?
R12000SC 400MHz 2MB L2:       19.30      13.60
R12000SC 300MHz 1MB L2:       14.49      10.42
R12000SC 270MHz 1MB L2:       13.10       9.80
R10000SC 250MHz 1MB L2:       12.40       9.71
R10000SC 195MHz 1MB L2:       10.10       8.77
R7000SC 350MHz 1MB L3:            ?          ?
R7000SC 300MHz 1MB L3:            ?       7.50
R10000SC 175MHz 1MB L2:        9.10       6.60
R5200SC 300MHz 1MB L2:         8.04       6.86
R10000SC 150MHz 1MB L2:        7.40       6.20
R5000SC 200MHz 1MB L2:         5.40       5.70
R5000SC 180MHz 512K L2:        4.82       5.42
R5000PC 180MHz (no L2):        3.70       4.55
Some people react with disappointment when first examining the R10000's/R12000's floating point (fp) performance in O2, compared to other SGI systems which use R10000 such as Octane and Origin (the discussion here will refer to R10000, but the same issues apply to R12000 in O2 as well). There are several things to be said on this subject:
Also, as far as I know, none of the SPECfp95 tests use the kind of single precision fp calculations that R5K was specifically designed for, namely MADD-style computation (the matrix math found in 3D graphics). See the Byte article for more information.
One must decide carefully whether the improved performance offered by an R10K is worth the extra cost, though R10K O2 systems have become considerably cheaper in recent years, except perhaps for R12K/400 systems which seem to retain their value quite well. Either way, always have your application tested before making any purchasing decision. It must be said though, R10K/R12K systems are definitely good for integer tasks and 2D work; my main O2 system used to be an R5200/300, but I replaced it with an R12K/400 after tests showed the R12K to be about 100% faster.
There are several additional aspects of the main CPU in O2 that are
worthy of discussion.
The Impact of Screen Resolution on CPU Performance
Lower screen resolutions and shallower colour visuals will allow O2 to run some kinds of application faster. For example, on an R5000SC/200MHz O2, changing the screen resolution from 1280x1024 32+32 down to VGA16 improves the STREAM memory benchmark by 13 percent!
This effect may sound bizarre, but the explanation is quite simple and correlates correctly with the O2's UMA design (refer back to the architecture diagrams given earlier for clarification).
The CRM ASIC handles data transfers between itself and the:
For most users, the vast majority of the available bandwidth from CRM will be used by the DE. A typical 32bit 1280x1024 72Hz display requires a bandwidth of 360MB/sec. While this data transfer is going on, the main CPU and other system components must utilise the remaining bandwidth. Thus, if one decreases the display complexity, there will be less data moving from CRM to DE and hence more opportunities for CRM to service the rest of the system. As a result, memory-intensive applications will speed up, and tasks such as video I/O won't have to compete to the same degree for bandwidth resources. There is always enough bandwidth to handle video I/O at real-time rates; the difference is that the IOE will be more likely to be able to transfer data when first requested (quicker response, better reliability, fewer conflicts with other data being moved around, lower possibility of error, etc.)
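The 360MB/sec figure can be reproduced with a little arithmetic. The following Python sketch assumes that only visible pixels are counted (no blanking intervals, overlays or Z data) and that 1MB means 2^20 bytes; the function name is made up purely for illustration.

```python
def scanout_mb_per_sec(width, height, bits_per_pixel, refresh_hz):
    """Bandwidth needed to refresh the display, in MB/sec (1MB = 2**20 bytes).

    Assumption: only the visible pixels are fetched, once per refresh.
    """
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * refresh_hz / 2**20


full = scanout_mb_per_sec(1280, 1024, 32, 72)
low = scanout_mb_per_sec(640, 480, 16, 60)

print(full)  # 360.0
print(low)   # 35.15625
```

Under these assumptions a 16bit 640x480 60Hz display needs roughly a tenth of the bandwidth of the 1280x1024 case, which is why lowering the display complexity frees CRM to service the rest of the system.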
In fact, for someone whose main task is video I/O processing where the actual on-screen display isn't important (ie. only the video I/O signals matter, and perhaps parallel, serial and SCSI transfers too), it would be advantageous to be able to shut down the CRM/DE data transfer completely, allowing other system components to make full use of the available bandwidth, especially the main CPU (eg. offscreen rendering would be quicker). I am currently investigating how this can be achieved. It is likely that using a VT terminal connected to the serial port, instead of using the main monitor, would be one way of achieving this, but at the moment I have no practical data to prove this. When I work out how to use an O2 via a VT terminal and can obtain a suitable VT, I'll run some tests.
Who cares? Why does it matter? Well, consider that a 13% performance increase is, on average, better than the performance increase obtained by upgrading from a 180MHz R5000SC CPU to a 200MHz R5000SC CPU. For a task that scales with memory bandwidth, it could mean the difference between a task taking about 53 minutes instead of 60 minutes (60/1.13). If one has many such tasks to run, that extra saving could be very useful to those with time constraints; for medical people, it could be a life-saver.
Observe how the STREAM bandwidth figures in the following diagram (MB/sec), for a 200MHz R5000SC O2 running IRIX 6.3, gradually improve (ie. increase) as the display complexity decreases:
Display Complexity    Copy   Scale     Add   Triad
1280-1024-32-32-75    69.2    69.1    69.6    70.7
1280-1024-32-32-60    72.1    71.5    72.8    73.8
1280-1024-32-32-50    74.5    73.5    72.7    73.7
1280-1024-32-32-48    75.4    74.5    74.1    75.1
1280-1024-16-16-75    74.1    73.1    73.4    74.9
1280-1024-16-0-75     76.6    75.5    74.3    76.2
1024-768-32-32-75     75.6    75.1    75.1    75.9
1024-768-32-32-60     76.8    76.2    75.8    77.0
1024-768-32-0-60      77.1    76.1    76.1    77.0
1024-768-16-0-60      80.8    78.9    79.3    80.3
800-600-32-32-72      79.5    78.1    76.5    77.9
800-600-32-32-60      80.5    78.6    78.7    80.0
640-480-32-32-60      82.5    80.2    80.8    82.0
640-480-16-16-60      82.7    81.3    82.1    83.0
640-480-16-0-60       83.2    81.2    82.2    83.2
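To put a number on the spread, here is a small Python sketch that compares the first and last rows of the STREAM figures above. The values are copied from those two rows; the helper name is my own.

```python
# (Copy, Scale, Add, Triad) in MB/sec, copied from the STREAM figures above.
most_complex = (69.2, 69.1, 69.6, 70.7)   # 1280-1024-32-32-75
least_complex = (83.2, 81.2, 82.2, 83.2)  # 640-480-16-0-60


def percent_gain(before, after):
    """Relative improvement of 'after' over 'before', in percent."""
    return (after - before) / before * 100.0


gains = [percent_gain(b, a) for b, a in zip(most_complex, least_complex)]

for name, gain in zip(("Copy", "Scale", "Add", "Triad"), gains):
    print(f"{name:5s} {gain:5.1f}%")
```

Between these two extremes every STREAM kernel gains somewhere between 17 and 21 percent.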
I don't yet have any data for STREAM running on O2 when the display is a VT terminal. I would be most interested to hear from anyone who has run such a test, or from someone who has any idea how the CRM/DE data transfer could be shut down under software control.
It is possible that these effects apply to everyday tasks such as 3D modeling, movie conversion, etc. Some tasks may be I/O-disk bound, in which case the display complexity will be irrelevant; other tasks may be compute bound - lowering the display to VGA16 could give a good speed increase.
Note: forcing the monitor to go into power saving mode does not shut down the CRM/DE data transfer.
Geometry/Lighting and Comparing to Hardware Accelerated Systems
How can an R4600PC 100MHz Indy XL outperform an R4400SC 250MHz Indigo2 Elan for a 3D graphics task? Answer: when the 3D scene includes complex geometry and lighting calculations.
O2 does all geometry and lighting calculations in the main CPU. The same is true for Indy XL, Indigo2 XL, or any similar system such as Crimson Entry. At first this may sound like a disadvantage, but as main CPUs have improved in power, we have now entered an era where older systems with good main CPUs and no hardware graphics acceleration can easily outperform older systems with old types of hardware accelerator board (XS24, XZ, Elan and Extreme). When this situation occurs, it doesn't really matter what type of CPU is present in the system that has the hardware acceleration. The key point is that the main CPU in the former system has an effective fp performance that is better than the Geometry Engines (GEs) on the latter system's accelerator board.
The original XZ graphics offered 64MFLOPS of GE power; later revisions (seen as Elan by hinv on Indigo2, and XZ on Indy) offered 128MFLOPS, and Extreme offered 256MFLOPS. The R5000 CPU in Indy offered between 300MFLOPS and 360MFLOPS peak single-precision MADD performance, while the best currently known CPU for O2 (custom fit R7000C/600MHz) offers 1.2GFLOPS peak.
Complex lighting calculations can hit these older accelerator boards (XZ/Elan/Extreme) hard. All older SGIs with hardware acceleration only support one hardware light (compare to InfiniteReality which supports four), so when multiple lights are present, the calculations become too complex, context switches occur because temporary data must be stored somewhere, the graphics board FIFOs fill up because the main CPU is sending in data faster than the board can process it, the CPU has to pause constantly to wait for the FIFOs to drain, and thus the GEs become the main bottleneck. In such situations, the main CPU may be little used - I saw only 2% CPU usage when running such a scenario on my Indigo2 Elan.
On the other hand, systems like XL offload all such calculations onto the main CPU. When things get tough, the main CPU runs as fast as it can just as always, hence the situation with FIFOs filling up, context switches occurring, etc. never happens and the system is ironically able to give a fair performance. That is how an Indy XL can outperform an Indigo2 Elan. It is also the reason why an R5000 Indy XZ can be slower than an R5000 Indy XL (the former must do its geometry/lighting calculations on the XZ board, completely wasting the much higher fp power of the main CPU), although this will often not be the case for any scene that only involves one light source because the presence of the hardware Z buffer in an XZ Indy can be more important than the higher fp speed of the main CPU.
What relevance is this to O2? It means as the CPU speed increases, it's possible that O2 can outperform even a MaxIMPACT for certain types of task - it'll definitely be able to outperform a HighIMPACT or SolidIMPACT anyway. The reason is geometry/lighting: as the main CPUs for O2 improve, the single-precision fp performance will eventually exceed that offered by the GEs of systems like SolidIMPACT (480MFLOPS), High IMPACT (480MFLOPS) and Max IMPACT (960MFLOPS). A 300MHz R12000 would offer 600MFLOPS, so I would expect an R12K/300 O2 to outperform a SolidIMPACT Indigo2 for tasks involving complex geometry and lighting (especially something like multiple spotlights). In theory, the R7K/600 should allow O2 to outperform an Octane/SI (at least 1GFLOP for the O2 compared to half that for the Octane/SI's GEs).
These effects will be important unless the bottleneck becomes something else such as:
These could be important if, for example, the 3D scene contained a very large number of polygons, or a complex dynamic scene. My comments above mainly refer to scenes that involve multiple lights and low polygon counts, eg. VRML worlds, though O2 does have the extra advantage of texture memory capacity being limited only by main RAM size (compared to the small 4MB limit in IMPACT).
For a more thorough investigation and discussion of these issues, please see my HolliDance Benchmark page, which includes a table of example performance results for a typical dynamic 3D real-time scene that contains complex lighting. If you own or have access to an SGI, please consider submitting a set of results as I am convinced that, for relevant tasks, the HolliDance Benchmark results table will be a very useful resource to 2nd-hand buyers and those considering upgrades from older systems. It should also be useful to users of faster systems who may be interested in possible performance degradation when the number of lights reaches a certain threshold (see the benchmark page for more details).
When thinking about O2, these issues may be important if your task involves real-time 3D animation, VRML, low-end visual-simulation, etc. It could be especially relevant if you have an older system, are considering an upgrade, and aren't sure whether to go for something like an Indigo2 Extreme/IMPACT, an O2, or an entry-level Octane. It's quite surprising to think that O2 could gradually be seen to outperform many existing SGIs for tasks that involve complex lighting. However, I doubt this will occur with Onyx2 since IR supports four hardware lights and the GEs offer 2.56GFLOPS of processing power - much greater than even a theoretical 800MHz R14K (unless such a future CPU was able to do 4 fp operations per clock instead of 2 fp operations per clock).
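The peak figures quoted in this section follow from clock speed multiplied by fp operations per clock. Here is a minimal Python sketch, assuming 2 single-precision operations per clock (one MADD per cycle) unless stated otherwise, and using the GE ratings quoted above. The function and dictionary names are illustrative only, and these are theoretical peaks, not measured results.

```python
def peak_mflops(clock_mhz, fp_ops_per_clock=2):
    """Theoretical peak MFLOPS: clock (MHz) x fp operations per clock."""
    return clock_mhz * fp_ops_per_clock


# Geometry Engine ratings in MFLOPS, as quoted in the text.
GE_MFLOPS = {
    "Extreme": 256,
    "SolidIMPACT": 480,
    "MaxIMPACT": 960,
    "InfiniteReality": 2560,
}


def outrun_by(cpu_mflops):
    """Names of the GE sets whose rating is below the given CPU peak."""
    return [name for name, rating in GE_MFLOPS.items() if rating < cpu_mflops]


cpus = [
    ("R12K/300", 300, 2),
    ("R7K/600", 600, 2),
    ("R14K/800 (theoretical)", 800, 2),
    ("R14K/800 at 4 ops/clock", 800, 4),
]

for label, mhz, ops in cpus:
    mflops = peak_mflops(mhz, ops)
    print(f"{label}: {mflops} MFLOPS peak, exceeds {outrun_by(mflops)}")
```

Only the hypothetical 4 operations per clock case (3200MFLOPS) gets past the 2.56GFLOPS of InfiniteReality's GEs; at 2 operations per clock an 800MHz part peaks at 1600MFLOPS.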
With hindsight, and certainly for particular types of O2 user (eg. anyone doing VRML), the fact that O2 does all geometry/lighting calculations in software could prove very advantageous in terms of much greater performance in the future. Note that this kind of task is very different from the typical 'primitive' level benchmarks shown on technical reports and PR web sites. Such simplistic performance figures (eg. flat tris/sec, or lit, shaded, textured triangles/sec) almost always involve either no lighting whatsoever, or just a single directional light, thus hardware acceleration boards never experience the problem of having to deal with more light sources than can be handled by the hardware at one time. A good example of this is that although an R4600PC 100MHz Indy XL outperforms an R4400SC 250MHz Indigo2 Elan for the HolliDance 3D animation program by 8 percent (large window, no texture), if one turns all the lights off then the Indigo2 immediately becomes 158% quicker than the Indy.
What I've tried to highlight here is that you should be very wary of assuming O2 must be better or worse than older systems simply because it's newer, has a better main CPU, etc. The reality may be much more complex because of the way graphics hardware works and how the different components of a system interact, combined with the fact that different systems often work in very different ways.
For example, one might assume that an O2 should outperform an Indigo2 Extreme for Gouraud shaded tasks, and indeed it does on the primitive level benchmarks by a moderate to reasonable margin (between 7 and 65 percent for various CPUs); but what might be a surprise to many is that O2 can completely stomp over an Indigo2 Extreme for a 3D task that involves multiple lights. For the HolliDance benchmark, compared to R4400SC/250MHz Indigo2 Elan, the O2 was 510 percent faster! The primitive level benchmarks would have suggested a difference of around 170%.
But turn off the lights and the difference changes drastically: O2 is now 144% faster than Indigo2 Elan, a figure which correlates much better with the primitives tests. In other words, when the complex lighting is turned off, both systems speed up, but Indigo2 Elan speeds up by a much greater degree (300% compared to 60%) because all the horrible bottlenecks concerning the GEs are removed, though it's still slower overall. Obviously, I would expect the differences between O2 and Indigo2 Extreme for HolliDance to be less, but I reckon O2 would still be at least 200% quicker when the lights are turned on (as opposed to the 20% difference one might expect from the primitives tests).
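The percentages above are consistent with each other, which is easy to check once "N% faster" is converted to a speed ratio. A short Python sketch, using only the (rounded) figures quoted in the text; the helper names are mine:

```python
def faster_to_ratio(percent_faster):
    """'510% faster' means 6.1 times the speed."""
    return 1.0 + percent_faster / 100.0


def ratio_to_faster(ratio):
    """Inverse of faster_to_ratio."""
    return (ratio - 1.0) * 100.0


lit_ratio = faster_to_ratio(510)   # O2 vs Indigo2 Elan, lights on
o2_gain = faster_to_ratio(60)      # O2 speed-up when the lights go off
elan_gain = faster_to_ratio(300)   # Elan speed-up when the lights go off

# Both systems speed up, so the lights-off ratio is the lights-on ratio
# scaled by O2's gain and divided by Elan's gain.
unlit_ratio = lit_ratio * o2_gain / elan_gain

print(round(ratio_to_faster(unlit_ratio)))  # 144
```

In other words 6.1 x 1.6 / 4.0 = 2.44, ie. O2 ends up 144% faster with the lights off, matching the measured figure.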
3D graphics is a strange thing. Yet again, this is more proof, if any were needed, that the only benchmark test one should really trust when making a purchasing decision is one's own application.
The controller element is programmable, to allow for future video and image formats - this means it's likely that the unit is perfectly capable of doing four 32bit ops or two 64bit ops per clock, but I don't think the current libraries support such operations since today's video/image tasks don't need them.
ICE allows one to do some impressive real-time image and video operations, some of which are shown in the various O2 demo programs. Real-time examples include: edge detection, colour space conversion, luma and chroma keying, etc. For a more thorough description of ICE, please see my main ICE page.
Incidentally, because of the many questions about ICE that I've thrown at people in SGI, a member of SGI's Global Technical Support has begun the process of writing a proper report on ICE for a future issue of Pipeline (a few months' time probably). I will be helping in the creation of the report to a limited degree.
Finally, here is SGI's own description of the ICE system, including comparisons with Indy (note that IRIX 6.5 has a newer API for dealing with O2's digital media features):
The following table lists some key digital media hardware differences between O2 and Indy:

Table: O2 vs. Indy Hardware

O2:   Image and Compression Engine (ICE):
      * Built-in motion JPEG video compression/decompression
      * Built-in imaging acceleration
Indy: * Motion JPEG video compression/decompression requires optional Cosmo Compress board
      * Imaging acceleration not available on Indy

O2:   Video input and output
Indy: Video input (video output requires IndyVideo or IndyVideo 601 option)

O2:   Screen-capture video source (graphics screen available as video input device)
Indy: Requires optional Indy Video™ card

O2:   Improved digital video camera with built-in microphone and shutter button
Indy: IndyCam™ and external microphone

Silicon Graphics is also releasing IRIX™ 6.3 for O2. This updated OS version has the following new elements:

* New digital media buffer (DMbuffer) programming interface for sharing unified memory among the application, video I/O devices, compression, graphics rendering, and graphics display
* New Video Library (VL) programming interface to DMbuffers
* New digital media image conversion (dmIC) programming interface based on DMbuffer for direct data transfer among image-conversion algorithms/devices, video I/O, and graphics
* Hardware-accelerated OpenGL imaging extensions

Audio and Video I/O Ports

The following I/O devices transfer audio samples and video pixels into and out of main system memory:

* Camera and camera microphone
* Two line-level analog stereo outputs and one line-level analog stereo input
* S-video and composite video in/out
* Headphones out
* Microphone in (mono)
* Speaker output
* Optional CCIR 601 digital video adapter in/out

Digital Media Buffer Architecture

The DMbuffer is a new API for programmatic access to a new IRIX operating system feature that unifies the memory buffering systems of live video devices, such as video input and output and image compression and decompression.
Also, OpenGL can both read from and render to the DMbuffer system, thus enabling completely programmable video effects: anything that you can render to a window you can also render offscreen and send directly to video output or compression. Furthermore, video input and decompression output are available for graphics display.

The software architecture consists of the following elements:

* DMbuffer
* Ability to treat DMbuffer data as pbuffer or texture map data in OpenGL
* VL receive/send DMbuffers to/from video I/O hardware
* ICE (Image and Compression Engine) uses DMbuffers for input and output
* New Digital Media Library (libdmedia) image conversion API (dmIC)

Image Processing Engine

ICE is a chip, and digital media image conversion (dmIC) is a software interface. Together, these two components enable video compression/decompression functions; they also allow applications to display multiple image streams. The ICE chip contains the following components:

* MIPS RISC core for program control
* Integer vector unit capable of 8 multiply-accumulates per clock
* Bit stream encoder and decoder
* Intelligent DMA controller

These features are tied together with highly optimized code for applications such as JPEG encode and decode, general and separable convolutions, color matrix multiplies, and histogram generation.

Providing the functionality of the Cosmo Compress™ option card for Indy, ICE is even more flexible than its predecessor. In addition to handling single streams of live video, ICE is easily shared between multiple smaller streams (of any size and rate); for example: 4 quarter-size, full-rate streams are supported as easily as 1 full-size, full-rate (or 2 half-size, or 3 third-size, or 2 full-size, half-rate, and so on). Since there is no built-in video clock or video dimensionality on the ICE chip, you can also use it for non-standard sizes and rates; for example, film aspect ratio at film rate for film animation preview to the graphics monitor.
With the Indy, all imaging and compression calculations were done by the main CPU. ICE, which functions as a separate CPU, now handles these calculations, which frees the main CPU to handle other processes. Also with the Indy, you had to purchase dedicated cards, such as a JPEG card, to handle jobs such as compression. Silicon Graphics designed O2 with flexibility as a key objective. Consequently, the system can handle JPEG compression as well as image-processing functions, without having to purchase dedicated cards for each process.

The IO Engine (IOE) is a chip that brings video and audio into and out of the system. Both IOE and ICE feature direct memory access (DMA) controllers, which enable them to read compressed images and output the information to a video out channel. Not only do IOE, ICE, and UMA simplify the sharing of digital media data between subsystems, their interaction is many times faster than more common methods of transferring data between subsystems over a system bus.

New Image Conversion API

The Digital Media Library (libdmedia) that's included with IRIX 6.3 features a new digital media image conversion library (dmIC). You use this low-level API for memory-to-memory image compression/decompression and conversion. dmIC supports the standard software image codecs supported by the older Compression Library (libcl) interface in IRIX 6.2 and earlier releases. dmIC also supports the real-time motion JPEG encode/decode capability of the O2 ICE processor:

* The dmIC interface makes software image codecs and hardware-accelerated memory-to-memory codecs look the same to application developers.
* dmIC operates on image data stored in DMbuffers. This makes it possible to share image data between hardware or software codecs and OpenGL or the Video Library, without copying data.
* dmIC does not support in-line compression devices that are integrated into video capture or playback hardware paths; for example, Cosmo Compress or Impact Compress. These kinds of devices require a slightly different programming model from the model used to send data to and receive data from an asynchronous memory-to-memory processor. The older libcl continues to provide the applications programming interface to these kinds of devices.
* An application can query dmIC to determine whether the current system offers a real-time implementation of a particular memory-to-memory codec; for example, JPEG. The real-time JPEG codec on O2 supports full-rate encode/decode at NTSC/PAL square pixel, CCIR 601/525, and CCIR 601/625 video timings. On systems that are not equipped with a real-time memory-to-memory codec, an application can also use the non-real-time software implementation.
* The Compression Library functionality offered in IRIX 6.2 will continue to be supported in IRIX 6.3 and future releases in order to ensure backward compatibility for applications.
* Starting with IRIX 6.3, MPEG audio/video encode and Cinepak encode capabilities are bundled with every Silicon Graphics system. These software encoders no longer require a Silicon Graphics run-time license.

The new dmIC routines are declared in the public header. The new DMbuffer routines for creating and manipulating DMbuffers are declared in .

OpenGL Extensions for Image Data

Silicon Graphics created OpenGL extensions for O2, which allow you to use DMbuffers as either pbuffers or texture maps. The company also designed an OpenGL extension for rendering YCrCb (4:2:2) interlaced data, which lets you save video display pixels in a pixel format, rather than converting them to bits. Using these extensions, you can also perform hardware color space conversions from YCrCb to RGB.
In addition to the new OpenGL extensions, O2 provides hardware acceleration for the following existing extensions:

* Color scale and bias
* Color table look-ups
* Convolutions: 3x3, 5x5, and 7x7 (separable and general)
* Color matrix multiply
* Histogram and MinMax

The support of these operations should promote interesting applications, with real-time feedback (attributable to the performance increase), in the fields of medical imaging, GIS, and post production. Moreover, the support of a common API (OpenGL) enables applications to run across the product line, with performance gains associated with the platform on which the applications are running.

DMcolor and OpenGL Color Matrix Extensions

With O2, you can use OpenGL hardware to perform transforms. In addition, DMcolor can set up transform matrices that the application can pass to OpenGL. The system also has a software image color space conversion engine in libdmedia. The system also has a DMcolor API.

Video Library and DMbuffers

The system has new Video Library (VL) calls for receiving video data (fields or pairs of fields interleaved to form frames) into DMbuffers, and for sending video data using DMbuffers. In addition, the video I/O path can handle mipmap generation for live video. The older VLbuffer interface is still supported as well.

Audio Library Enhancements

Starting with IRIX 6.3, the Audio Library (AL) is packaged as a DSO rather than as a static library. The Audio Library adds a number of new functions and features, however the 6.3 version of the library is backward-compatible with previous releases. New features in 6.3 include:

* The ability to support multiple audio I/O devices in a single system.
* Support for the O2 workstation's ability to lock audio and video sample rates together in hardware to prevent drift during synchronized audio/video recording or playback.
In addition, IRIX 6.3 introduces a new, generalized version of the Audio Control Panel, which can automatically configure itself when you add audio I/O devices to the system.

High-Resolution Timer for Synchronizing Audio and Video Streams

The O2 workstation includes audio/video hardware support for Silicon Graphics' high-resolution digital media timer, the unadjusted system time (UST) clock. UST provides a common time base for timestamping audio samples and video fields as they enter or leave the system through the audio/video I/O ports. AL and VL each support timestamps based on the UST clock. Applications can use this common timebase to correlate and synchronize outgoing audio and video input/output streams. Refer to the man pages alGetFrameTime(3dm) and vlGetUSTMSCPair(3dm) for more information. The O2 architecture makes the high-resolution UST clock visible to PCI option cards as well as to the audio/video subsystems that are standard on the system.

Movie Library Enhancements

Starting with IRIX 6.3, the Movie Library is packaged as a pair of DSOs rather than as a single static library. The Movie Library API is backward-compatible with previous IRIX releases:

* Movie file library (libmoviefile.so) deals with movie file reading, writing and editing. This DSO includes the functions defined in the public header .
* Movie playback library (libmovieplay.so) provides high-level functions for movie playback with synchronized sound and images. This DSO includes the functions defined in the public header .
The IRIX 6.3 version of the Movie Library offers the following new features:

* Support for Indeo encoding and writing AVI files
* Support for creating MPEG-1 video and systems bitstreams through the movie file library interface
* Support for full-rate, full-resolution motion JPEG playback with synchronized audio by using the real-time JPEG decode capabilities of the O2 ICE processor
* Ability to take advantage of the OpenGL extensions for rendering interlaced image data and YCrCb image data on O2

New Audio Conversion API

The Digital Media Library (libdmedia) that's included with IRIX 6.3 features a new digital media audio conversion library (dmAC). You use this low-level API for memory-to-memory audio sample format conversion, sample rate conversion, and compression/decompression. dmAC supports these audio conversion operations:

* Sample data format conversion (signed, unsigned, float, double, scaling)
* Sample rate conversion (several algorithms)
* Channel conversion (mono, stereo, 4-channel, and so on)
* Compression/decompression

IRIX 6.3 supports the following audio compression algorithms:

* CCITT G.711 mu-law and A-law
* CCITT G.722
* CCITT G.726 16, 24, 32, and 40 Kb/sec
* CCITT G.728
* GSM
* Intel DVI ADPCM
* MPEG audio

All of the audio compression/decompression and conversion algorithms are implemented in software. No special option hardware is required to perform these conversions. The new dmAC routines are declared in the public header . Starting with IRIX 6.3, MPEG audio encoding is bundled with all systems and no longer requires a license from Silicon Graphics.
Audio File Library Enhancements

The new version of the Audio File Library (libaudiofile) included in IRIX 6.3 offers support for several additional sound file formats:

* Amiga IFF/8SVX
* SampleVision
* Audio Visual Research
* Creative Labs VOC
* Creative Labs SoundFont2

The library now offers transparent sample rate conversion in addition to transparent sample format conversion and compression/decompression. You can specify a virtual sample rate from within your application; for example, 48 kHz. The application can open sound files that contain data sampled at a variety of rates, and the library automatically converts between the sample rates used in the sound files (such as 44.1 kHz, 32 kHz, or 16 kHz) and the virtual sample rate that the application requests.
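To give a feel for what the virtual sample rate implies, the following Python sketch works out the conversion ratios for the example rates mentioned above. It is plain arithmetic, not the libaudiofile API; the function names and the 48 kHz constant are assumptions made for illustration.

```python
from fractions import Fraction

VIRTUAL_RATE_HZ = 48000  # the rate the application asks for (example value)


def conversion_ratio(file_rate_hz, virtual_rate_hz=VIRTUAL_RATE_HZ):
    """Output frames produced per input frame, as an exact fraction."""
    return Fraction(virtual_rate_hz, file_rate_hz)


def frames_after_conversion(file_frames, file_rate_hz,
                            virtual_rate_hz=VIRTUAL_RATE_HZ):
    """Approximate number of frames the application would see."""
    return round(file_frames * conversion_ratio(file_rate_hz, virtual_rate_hz))


for rate in (44100, 32000, 16000):
    # One second of audio at the file's rate, seen at the virtual rate.
    print(rate, conversion_ratio(rate), frames_after_conversion(rate, rate))
```

One second of 44.1 kHz audio (44100 frames) becomes 48000 frames at a ratio of 160/147, whereas 32 kHz and 16 kHz files convert at the simpler ratios 3/2 and 3.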