According to IBM, the bulk of the PS3's processing power lies in the Cell's SPEs (SPUs). Without using them, game development is very similar to developing for a decently specced single-processor PC or Mac. So it's not really as difficult as some developers claim; funnily enough, the ones claiming that are often those who don't even use the SPEs in their ports! Such a port should actually be pretty straightforward, with the added advantage of being able to optimize for a single, uniform configuration. PS3-exclusive devs who do use the SPEs say the console isn't that difficult to develop for, and that it's much easier than the PS2. Since more and more games are starting to use the SPEs, I thought it would be interesting to list them.
SPE usage doesn't by itself make a good game, and not using them doesn't by itself make a bad one. Simple games, such as those distributed on the PSN, don't really need that much performance; Super Stardust HD is an exception, and it's quite impressive with so many effects and things going on at once. However, IMO, performance-heavy games that don't use the SPEs [enough] aren't genuine PS3 games on which to judge the power of the platform; they should rather be viewed as far-from-optimal ports.
Beyond3D Motorstorm interview: "Scott Kirkland: Cell’s SPUs provide a huge amount of processing power. Early adopters tended to bias usage towards either RSX or PPU support (we fall into the latter category). I’m confident that over the coming months, exploitation of this resource will become far more balanced."
Insomniac Q&As on how they try to help 3rd-party developers, and on why some developers are still struggling with the Cell architecture:
Q&A: Insomniac's Mike Acton - Part 1
Q&A: Insomniac's Mike Acton - Part 2
"What I've always said is that bad code, and bad data design in particular, is bad on any architecture, but it's particularly bad on the PS3 because the Cell is a much more modern, much more heterogeneous design. It's much more parallel, and so requires good data design and good code. So if you're poorly designing your data and your code, then yeah, I can see why it'd be difficult to take something like that and try and manipulate it to work on the PS3, especially when people have invested a huge amount of money and time on something that basically doesn't fit a modern methodology. Yeah, it's going to be time-consuming to get that to work - if it's at all possible."
"It's interesting, because I think that probably the oldest programming methods are the most relevant today. It's the habits over the last five or eight years that are struggling, and it's interestingly the people that are more recently out of school that are going to have the most trouble, because the education system really hasn't caught up to how the real world is, how hardware is changing and how development is changing."
Some early PS3 games that are known to use the Cell's SPEs, some more recent highlights and how these games use the SPEs:
1) Resistance: Fall of Man
"Animation and calculating collisions between objects are perfect fits, says Hastings. So those are the primary jobs Resistance doles out to the SPEs."
Source: Spectrum online
"Audio (NextSynth and LR1)
Collision (separate broad and narrow)
Geom Cull Clip (for shadows and decals)
Particle (weather fx)
10-20% total SPU utilization" (uses 5 SPEs, Resistance 2 uses 6 SPEs)
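The "separate broad and narrow" collision split listed above is a common pattern: a cheap broad phase prunes the candidate pairs, and an exact narrow phase tests only those. Below is a minimal Python sketch of that split for 2D circles, assuming sweep-and-prune on the x axis as the broad phase. Insomniac's actual algorithm isn't described in the source, and the names `broad_phase` and `narrow_phase` are hypothetical.

```python
# Sketch only: sweep-and-prune is an assumed broad phase, not Insomniac's documented method.
# Each circle is a tuple (x, y, radius).

def broad_phase(circles):
    """Return index pairs whose x-extents overlap (candidate collisions)."""
    # Sort by the left edge of each circle's x-extent.
    order = sorted(range(len(circles)), key=lambda i: circles[i][0] - circles[i][2])
    active = []  # circles whose x-extent may still overlap later ones
    pairs = []
    for i in order:
        x, _, r = circles[i]
        lo = x - r
        # Drop circles whose right edge ends before this one's left edge.
        active = [j for j in active if circles[j][0] + circles[j][2] >= lo]
        for j in active:
            pairs.append((min(i, j), max(i, j)))
        active.append(i)
    return pairs


def narrow_phase(circles, pairs):
    """Exact test on candidates: circles overlap if centre distance <= r1 + r2."""
    hits = []
    for i, j in pairs:
        xi, yi, ri = circles[i]
        xj, yj, rj = circles[j]
        if (xi - xj) ** 2 + (yi - yj) ** 2 <= (ri + rj) ** 2:
            hits.append((i, j))
    return hits
```

Because the two phases only share a list of index pairs, each can run as its own job, which fits the way the list above treats them as separate SPU tasks.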
2) Resistance 2
"Proprietary game systems are now being heavily farmed out to the PS3's SPUs, keeping the central PPU as a sort of traffic cop that organizes what gets attention at any given moment. In simple terms, the game is taking much better advantage of the untapped potential of the console. In regard to visuals, the expanded use of the SPUs means more enemies on screen, significantly more complex AI from all of those foes and dramatically expanded options for special effects."
Source: Game Informer
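The "PPU as traffic cop" model in the Game Informer quote can be illustrated with a small sketch: a main thread hands independent jobs to a pool of workers and then gathers the results. This is a hypothetical Python illustration, not Insomniac's job system; the pool size of six mirrors the six developer-available SPEs, and `run_frame` is an invented name. Python threads don't run Python code in parallel, and real SPU jobs also DMA their data into each SPU's local store, so this only models the dispatch structure.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumption: one worker per developer-available SPE.
SPU_COUNT = 6


def run_frame(jobs):
    """Hypothetical 'traffic cop': submit each (name, fn, data) job to a
    worker pool, then wait for every result before the frame continues.
    Job names must be unique, since results are keyed by name."""
    with ThreadPoolExecutor(max_workers=SPU_COUNT) as pool:
        futures = {name: pool.submit(fn, data) for name, fn, data in jobs}
        return {name: f.result() for name, f in futures.items()}
```

For example, `run_frame([("sum", sum, [1, 2, 3]), ("max", max, [4, 9, 2])])` returns `{"sum": 6, "max": 9}`.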
GDC 2008 - Insomniac SPU Programming (Powerpoint)
3) Ratchet and Clank Future: Tools of Destruction
"We were using the SPUs and that's a key to having a game that runs fast on the PS3, but there were a lot of things that we knew we could improve, and we have been improving them on Ratchet. And even with Ratchet, we're still seeing more and more things we can do; it's kind of like peeling off the layers of an onion."
"We are continuing to build our Insomniac Engine and have made many improvements to it since Resistance: Fall of Man. The one huge focus for us has been moving more of our processes over to the SPUs on the CELL processor. This has allowed us to get our physics and effects systems running roughly four times faster than it did in Resistance at nearly double the framerate, which is something you can see in weapons like the Tornado Launcher."
Source: The New Zealand Herald
On the NeoGAF forum, Mike Acton of Insomniac explained a comment about "experimenting with pre-vertex shaders on the SPUs":
"1. Transfer some of the load from the GPU to the SPUs.
2. Minimize complexity of the GPU Shaders. i.e. Rather than making more GPU vertex shaders, or more complex GPU vertex shaders, we'll just edit the data directly from the SPU before the GPU gets it. This allows us to "disguise" complex vertex shader code as a simple shader from the GPU's perspective.
3. Run parts of the complete vertex shader code at different rates. An (SPU Vertex Shader, or Pre-Vertex Shader) does not necessarily have to run in lock-step with the (GPU Vertex Shader). It could run at half-rate or lower, depending on the data and the need.
We're still experimenting with different approaches and places to do this, but we've had good success so far. For example, we used this idea in RCF to handle UV animations - textures weren't animated on the GPU, the UVs were animated before the stream got to the (GPU Vertex Shader) so it could use the same GPU shaders as any stream that did not have UV animation."
Source: NeoGAF forum
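Acton's UV-animation example can be sketched as follows. This is a hypothetical Python illustration, not Insomniac's code: `animate_uvs` scrolls a stream's UVs before the stream is submitted, and `run_frames` refreshes them only every `rate_divisor` frames (half rate by default) to mirror point 3. The GPU then receives an ordinary stream and can use the same shader as geometry without UV animation.

```python
from dataclasses import dataclass


@dataclass
class VertexStream:
    positions: list[tuple[float, float, float]]
    uvs: list[tuple[float, float]]


def animate_uvs(stream: VertexStream, scroll: tuple[float, float], t: float) -> VertexStream:
    """Hypothetical pre-vertex-shader pass: scroll UVs by scroll * t,
    wrapped into [0, 1). Positions pass through untouched, so the GPU
    vertex shader consuming the result needs no UV-animation code."""
    du, dv = scroll[0] * t, scroll[1] * t
    new_uvs = [((u + du) % 1.0, (v + dv) % 1.0) for u, v in stream.uvs]
    return VertexStream(stream.positions, new_uvs)


def run_frames(stream: VertexStream, scroll: tuple[float, float],
               frame_count: int, dt: float, rate_divisor: int = 2) -> list[VertexStream]:
    """Re-animate only every `rate_divisor` frames (half rate by default)
    and resubmit the last animated stream on the frames in between."""
    current = stream
    submitted = []
    for frame in range(frame_count):
        if frame % rate_divisor == 0:
            current = animate_uvs(stream, scroll, frame * dt)
        submitted.append(current)  # what the GPU would receive this frame
    return submitted
```

Running the UV pass at a lower rate than the GPU shader trades a little animation smoothness for less per-frame work, which is the flexibility Acton describes.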
Insomniac Games SPU Shader Presentation (PDF)
"SPU usage is a good example. The progressive development of corresponding debugging and profiling tools made thorough exploitation of this powerful resource quite challenging for the less technically biased members of the team. In the aftermath of MotorStorm, with mature tools at our disposal, we’ve been developing mechanisms to make the PPU and SPU’s power and parallelism far more accessible to our entire team, re-thinking data organization and algorithms in the process. MotorStorm only uses between 15 and 20 percent of available SPU resource, so we’re aiming to achieve a 5 fold increase in SPU performance, which should allow us to do some awesome stuff!"
"Our SPU exploiting systems consist of:
i) Havok physics.
ii) Determination of object visibility.
iii) Concatenation of hierarchies.
iv) Billboard object culling and vertex buffer creation.
v) Updating of particles and vertex buffer creation.
vi) Updating of vehicle dynamics.
vii) Updating of vehicle suspension constraints.
viii) Audio (MultiStream).
ix) Video decoding."
"If by cooperative rendering you're referring to SPUs supporting the RSX, I strongly believe that this approach will become far more widespread. In addition to reducing the vertex load on the RSX through the use of culling and vertex pre-processing, this approach also provides an efficient mechanism to introduce procedural geometry.
Historically, CPUs have provided coarse-grain scene culling using view frustums, occlusion planes, portal visibility and BSP-trees, with GPUs left to perform fine-grain rejection using guard band clipping, occlusion and backface culling. While such features improve fragment performance, they don't reduce vertex processing overhead.
The leap in performance provided by Cell gives us the bandwidth to significantly reduce RSX time spent processing vertices that don't contribute to the final scene. The favoured approach is to use SPUs to generate minimal scene/instance specific index and vertex buffers from compressed data."
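Generating minimal index and vertex buffers on the SPUs can be sketched as below. This Python illustration rests on several assumptions: it takes already-projected 2D screen positions, treats counter-clockwise triangles (with y pointing up) as front-facing, skips the decompression step the quote mentions, and uses a trivial "all three vertices beyond one viewport edge" test instead of real frustum or occlusion culling. Surviving triangles are compacted into a new vertex buffer with remapped indices, so the GPU never touches the rejected vertices.

```python
# Sketch only: culling rules and buffer layout are illustrative assumptions.

def signed_area(a, b, c):
    """Twice the signed area of triangle abc (> 0 means counter-clockwise)."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])


def outside_viewport(a, b, c, width, height):
    """True if all three vertices lie beyond the same viewport edge."""
    xs = (a[0], b[0], c[0])
    ys = (a[1], b[1], c[1])
    return max(xs) < 0 or min(xs) > width or max(ys) < 0 or min(ys) > height


def build_minimal_buffers(screen_verts, indices, width, height):
    """Keep only front-facing, on-screen triangles and emit a compacted
    vertex buffer plus a remapped index buffer."""
    remap = {}  # old vertex index -> new vertex index
    out_verts = []
    out_indices = []
    for i in range(0, len(indices), 3):
        tri = indices[i:i + 3]
        a, b, c = (screen_verts[j] for j in tri)
        if signed_area(a, b, c) <= 0:  # back-facing or degenerate
            continue
        if outside_viewport(a, b, c, width, height):
            continue
        for j in tri:
            if j not in remap:
                remap[j] = len(out_verts)
                out_verts.append(screen_verts[j])
            out_indices.append(remap[j])
    return out_verts, out_indices
```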
5) Super Stardust HD
"We are able to get over 10,000 active objects with physics and collisions and over 75,000 particles simulated and drawn @60fps. That said, we were unable to use all the available processing power from Cell for this game, so for the next game there are still plenty of reserves left"
Solo Pack update
"We probably draw about twice the number of objects compared to the original game. We are pretty close to maxing out the RSX, but in our next game we will still push the chip more. Currently we do not use SPUs to pre-process the geometry for RSX — that will make a major difference. I estimate that we can further boost the graphics performance by 50%."
Source: MTV Multiplayer
6) Heavenly Sword
"In Heavenly Sword, the Cell enables incredible numbers of enemies to be on screen at one time. The trick is that Cell treats entire regiments as a single unit of artificial intelligence when they are at a distance; as they draw closer, Cell gradually divides the army into smaller and smaller groups, so they eventually become individual troops with unique fighting styles and tactics."
"Heavenly Sword is one of the first PS3 games to tap into Cell's true potential. Here are the highlights.
Artificial Intelligence: To keep up with the hundreds of on-screen enemies, Cell treats distant armies as a singular "hive mind." As they approach Nariko, Cell splits their intelligence across squads, and finally, individual troops.
Graphics: Wind gusts swirl Nariko's hair and clothes, and bazooka blasts send out showers of dust and rubble. 1080p support is still a question mark, though.
Physics: When firing a cannon, Nariko can influence the trajectory of the projectile using the Sixaxis. Ninja Theory claims it needs the Cell to handle these complex calculations."
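The distance-based "hive mind" described above is a form of AI level of detail. A minimal sketch, assuming made-up distance thresholds and squad sizes (Ninja Theory's actual values and splitting rules aren't given in these quotes): the farther away a regiment is, the more troops share a single AI update.

```python
# Sketch only: thresholds and divisors are hypothetical.

def group_size_for_distance(distance, regiment_size, far=200.0, mid=80.0, near=25.0):
    """How many troops share one AI 'brain' at this distance."""
    if distance >= far:
        return regiment_size                 # whole regiment is one hive mind
    if distance >= mid:
        return max(1, regiment_size // 4)    # a few large squads
    if distance >= near:
        return max(1, regiment_size // 16)   # many small squads
    return 1                                 # individual troops


def ai_groups(troop_ids, distance):
    """Partition a regiment into AI groups according to its distance."""
    if not troop_ids:
        return []
    size = group_size_for_distance(distance, len(troop_ids))
    return [troop_ids[i:i + size] for i in range(0, len(troop_ids), size)]
```

With 64 troops, this yields 1 group at long range, then 4, then 16, and finally 64 individual AIs up close, so the per-frame AI cost grows only as the army gets near.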
"Personally I really love the SPUs as they have exceeded our performance expectations and we've got a lot of them to play with."
"In terms of graphics, we use the SPU as a form of object processor. So essentially everything up to and including the production of RSX's command stream probably has a module on SPU to help.
1. A module that does a lot of object-level clipping and culling, both for the view frustum and the shadow maps. Its job, per frame, is to calculate how big each shadow map should be in world space and which objects need rendering in each map.
2. Animations, using ATG's (DeanA's team) animation library: every animation is blended and the bones updated.
3. Blend shapes, a custom module that handles facial animations.
4. Skin matrices: even after animation, there's some work required to get them into the format used by the GPU vertex shader.
5. Flags, a simple Verlet-based simulation used for the flags in the game.
6. Cloth & hair, a constrained physics solver used for simple chains that are then rendered as Nariko's cloth and hair.
7. Pushbuffer generation. This produces the commands RSX uses to actually render the scene, and has a number of optimisers to reduce redundant state changes.
There are probably a few I've missed. Essentially, a normal skinned or non-skinned character costs very little PPU time; virtually all processing is done on SPU and RSX. It's this that allows us to render the army scenes, for example.
We do no per triangle work on the SPU, we let RSX do that, we however do try and prepare things on the SPU for RSX."
Source: Beyond3D forum
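Item 5's "simple verlet based simulation" for flags can be sketched in Python as follows. This is a generic position-Verlet cloth, not Ninja Theory's module: the grid layout, pinning the left column to the pole, five constraint iterations, and gravity as the only force are all assumptions. Velocity is implicit in the difference between the current and previous positions, which keeps each step cheap.

```python
import math


class VerletFlag:
    """Sketch of a Verlet cloth: a cols x rows grid of particles joined by
    distance constraints, with the left column pinned to the flag pole."""

    def __init__(self, cols, rows, spacing):
        self.pos = [[(c * spacing, -r * spacing) for c in range(cols)] for r in range(rows)]
        self.prev = [row[:] for row in self.pos]  # starts at rest
        self.spacing = spacing
        self.cols, self.rows = cols, rows

    def pinned(self, c):
        return c == 0  # assumption: column 0 is attached to the pole

    def integrate(self, dt, accel):
        ax, ay = accel
        for r in range(self.rows):
            for c in range(self.cols):
                if self.pinned(c):
                    continue
                x, y = self.pos[r][c]
                px, py = self.prev[r][c]
                # Position Verlet: x_new = 2x - x_prev + a * dt^2
                self.prev[r][c] = (x, y)
                self.pos[r][c] = (2 * x - px + ax * dt * dt, 2 * y - py + ay * dt * dt)

    def _constrain(self, r1, c1, r2, c2):
        """Move two neighbours so their distance returns to `spacing`."""
        (x1, y1), (x2, y2) = self.pos[r1][c1], self.pos[r2][c2]
        dx, dy = x2 - x1, y2 - y1
        dist = math.hypot(dx, dy)
        w1 = 0.0 if self.pinned(c1) else 1.0
        w2 = 0.0 if self.pinned(c2) else 1.0
        if dist == 0 or w1 + w2 == 0:
            return
        diff = (dist - self.spacing) / dist
        s1, s2 = w1 / (w1 + w2), w2 / (w1 + w2)  # pinned points don't move
        self.pos[r1][c1] = (x1 + dx * diff * s1, y1 + dy * diff * s1)
        self.pos[r2][c2] = (x2 - dx * diff * s2, y2 - dy * diff * s2)

    def satisfy(self, iterations=5):
        for _ in range(iterations):
            for r in range(self.rows):
                for c in range(self.cols):
                    if c + 1 < self.cols:
                        self._constrain(r, c, r, c + 1)
                    if r + 1 < self.rows:
                        self._constrain(r, c, r + 1, c)

    def step(self, dt, accel=(0.0, -9.81)):
        self.integrate(dt, accel)
        self.satisfy()
```

Wind could be added by varying `accel` per frame; the constraint loop is what stops the grid from stretching under it.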
7) Killzone 2
"In this talk, we will discuss our approach to face this challenge and how we designed a deferred rendering engine that uses multi-sampled anti-aliasing (MSAA). We will give in-depth description of each individual stage of our real-time rendering pipeline and the main ingredients of our lighting, post-processing and data management. We’ll show how we utilize PS3’s SPUs for fast rendering of a large set of primitives, parallel processing of geometry and computation of indirect lighting. We will also describe our optimizations of the lighting and our parallel split (cascaded) shadow map algorithm for faster and stable MSAA output."
"We've created our own proprietary technology to drive the game, and this is using many of PS3's specific strengths. Large quantities of data can be streamed because we have a great deal of storage capacity. This allows for the level of detail you can see in the game.
It is not a luxury to have Blu-ray, but rather a necessity, as compression only gets you so far. I mean, the level that we showed at E3 and Leipzig topped out around 2GB! Also having the CELL and SPUs means we can offload all of our physics processing to an SPU, or process AI using the SPU's. All this processing power just means we can add more detail and create that Hollywood-type realism we're after."
I think the following article sheds more light on how the SPEs are currently used in Killzone 2, and addresses the advanced deferred rendering techniques Guerrilla Games has implemented for the game so far.
Deferred Rendering in Killzone 2 (PDF)
More recent comments:
"One of the main developments was that more processes that were initially handled by the main CPU were being moved to the SPUs. Physics, lighting set-up, particle set-up, animation and such are by now all running on the SPU, leaving the CPU to calculate the more tricky game systems that aren't easily made parallel. At some point we even found ways to start doing certain GPU calculation on the SPUs, so now a lot of our post-processing such as bloom, depth-of-field and motion blur are being rendered by the SPUs. This freed up performance from the GPU, which in turn allowed us to go even further with shader complexity and particle density."
"It's incredible to see huge levels and see the deferred rendering, and note that all the SPUs, even on the heaviest load, were coming up to about 60%," Haynes said. "They weren't coming close to maxing out. They had about 40% of space before they started tripping or saw slow down on some of the processes."
Killzone 2 tech interviews:
8) Final Fantasy XIII
"The White Engine reportedly uses four of the six developer-available synergistic processing elements (SPEs) of the Cell microprocessor to achieve near-pre-rendered CGI quality in realtime."
Source: Play UK through Wikipedia
9) Gran Turismo 5
"Although I would say it’s the sum-total of all of our natural phenomenon in the game. Our clouds, procedural water, atmospheric scattering, terrain, etc. All of this stuff runs in parallel on all 7 SPUs simultaneously every frame – I’m still not sure if the game community is giving enough credit to just how fast the SPUs really are."
11) Uncharted: Drake’s Fortune
"Like the PS2 the PS3 is a sophisticated and powerful piece of hardware. Our engineers are working very hard at making specific optimizations to take full advantage of the Cell and its SPU's. However, there is so much depth to this machine, that much like the PS2, you will continue to see developers squeeze more and more out of it over the course of what I am sure is going to be a lengthy life-cycle."
"We are utilizing all SPUs in Uncharted for AI, animation and lots of other systems. We are however just starting to tap into the power of the Cell. In future games I can promise even more utilization of the Cell which will result in more of everything, including game play."
Source: Ars Technica
"As far as the Cell processor is concerned, we're actually using about a third to half of that right now, so there's still a bit of untapped potential there."
"I would say the number one thing is animation, and the fact that the Cell processor has so much raw horse power that you could just throw more and more at it and it doesn't break a sweat. Our animation system is very complex, and we layer on dozens of frames of animation so you have that fluidity of movement where Nathan Drake can be running across a courtyard, stumbling over a rock as he's ducking under a hail of gunfire, reloading his weapon and rolling into cover, and all of these animations can happen simultaneously."
"The PlayStation 3 has a lot of power. When we started Uncharted we were really ambitious and had no idea what the PS3 would give us. Once we got the first devkits, we realized quickly that we could do everything we had planned to. The three main points for me are the Cell, Blu-Ray and the hard drive. We’ve been using the Cell for pretty much all our systems: rendering, particles, physics simulation, collision detection, animation, AI, decompression, water simulation, etc … and to give you an idea of the power of the PS3, we're using only 30 percent of the Cell processor.
In terms of Blu-Ray, we just couldn’t have made Uncharted without it; with Uncharted we have almost filled it (91 percent). We're also using the hard drive to pre-cache data from the Blu-Ray disc. That allows us to stream up to 12 streams for sound, load level data super fast and more importantly to stream textures constantly to guarantee high-res quality on the screen. "
Source: Ars Technica
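The layered animation described in these quotes boils down to blending many poses into one per frame. A minimal sketch, assuming each layer supplies one unit quaternion per bone and using weighted normalized linear interpolation (nlerp). Naughty Dog's actual blending, additive layers and animation compression aren't described here, so this is not their system.

```python
import math

Quat = tuple[float, float, float, float]  # (w, x, y, z), assumed unit length


def blend_pose(layers: list[tuple[float, list[Quat]]]) -> list[Quat]:
    """Weighted nlerp of several animation layers.

    `layers` is a list of (weight, pose) pairs; each pose holds one unit
    quaternion per bone. Weights are assumed positive. Returns one blended
    unit quaternion per bone."""
    bone_count = len(layers[0][1])
    result = []
    for bone in range(bone_count):
        ref = layers[0][1][bone]
        acc = [0.0, 0.0, 0.0, 0.0]
        for weight, pose in layers:
            q = pose[bone]
            # q and -q encode the same rotation: flip into ref's hemisphere
            # so opposite-signed inputs don't cancel in the weighted sum.
            dot = sum(a * b for a, b in zip(q, ref))
            sign = -1.0 if dot < 0 else 1.0
            for i in range(4):
                acc[i] += weight * sign * q[i]
        norm = math.sqrt(sum(c * c for c in acc))
        result.append(tuple(c / norm for c in acc))
    return result
```

Blending the identity with a 90° rotation at equal weights gives the 45° rotation, and the per-bone loop has no cross-bone dependencies, which is the kind of work that splits easily across SPUs.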
"Basically, in Jak I we had somewhere in the vicinity of 300-350 animations for Jak and everyone was really happy with the fluidity of his movement and the response. In Uncharted, Drake has got more than 3500 animations and the difference is we're now taking the cell processor and we're taking say two dozen of those animations, like we've got his running animations, flinching animations, reloading animations, rolling animations, just dozens of animations all at once being layered on top of each other and then the cell processor recreates on the fly the single frame of animation that you need to be able to play the game at that moment and the fact we can just dump more and more work on that processor and its SPUs just means we can free up our CPU to do more general purpose tasks. "
"We’ve solved most of our memory problems by relying on the SPEs to perform compression, both at load-time and at run-time, using techniques developed by ICE, SCEA Tools&Tech and the SCEE ATG group."
"One of our first goals when we started Uncharted: Drake's Fortune was to push what's been done in animation for video games. We developed a brand new animation system that took full advantage of the SPU's. Nathan Drake's final animation is made of different layers like running, breathing, reloading weapons, shooting, facial expression, etc; we end up decompressing and blending up to 30 animations every frame on the SPU's."
"The main thing about the PlayStation 3 is the Cell processor and more specifically the SPU's. We are only using 30 percent of the power of the SPU's in Uncharted. We've been architecting a lot of our systems around this and we were able to take full advantage of that power. A big part of our systems is running on SPU's: scene bucketing, particles, physics, collision, animation, water simulation, mesh processing, path finding, etc. For our engine, the cool thing about having the SPU's is the fact we can minimize what we send to the RSX (the graphic chip), it allows us to reject unnecessary information and get the RSX to be very efficient. "
"We are constantly streaming animations, level data, textures, music and sounds. It would have been impossible to get this amount of data at that speed to memory without the hard drive. And of course on top of that we use the SPU's to decompress all this data on the fly."
Source: Playstation Universe (PSU)
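Decompressing data on the fly as it streams in, rather than after a whole file has loaded, can be sketched with Python's `zlib` module. zlib is a stand-in: the quotes don't name the compression schemes used (only that techniques from ICE, SCEA Tools&Tech and SCEE ATG were involved), and on the PS3 this work would run on the SPUs rather than in a Python generator.

```python
import zlib
from typing import Iterable, Iterator


def stream_decompress(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Decompress a zlib stream chunk by chunk as it arrives, yielding
    output as soon as it's available instead of waiting for the whole file."""
    d = zlib.decompressobj()
    for chunk in chunks:
        out = d.decompress(chunk)
        if out:
            yield out
    tail = d.flush()  # emit anything still buffered at end of stream
    if tail:
        yield tail
```

In a streaming setup, `chunks` would come from disc or hard-drive reads, and each yielded block could be handed on (to a texture or audio buffer, say) while the next read is still in flight.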
Uncharted Tech GDC 2008
List continues in post 32