Just want to add this for those interested:
http://tu-dresden.de/die_tu_dresden...alyse_von_hochleistungsrechnern/cell//matmul/
Back in the day, programming was like on Cell all the way down, with all
the DMA stuff and custom chips. If one is used to only Intel's architecture,
then one's programming model is mainly based on multi-threading and
shared memory, which is easier to program with but not as efficient if
the problem scales, and esp. not if system resources are at a premium. Serious
parallel programming, the main issue today and for the foreseeable future,
can't be solved with Intel's (indoctrinated) multi-threading model. The
entire company is built around it: their microprocessors, compilers,
optimizers, etc. However, this model doesn't scale. Serious parallel
programming always starts with the data. One needs to be in control of the
data, i.e. of the movement of the data through the system. And the movement
of the data is heavily problem/algorithm dependent and can't be left to
the system guessing which data will be required next, as is the case in
the coherent shared-memory cache system that Intel relies on.
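
To make the "be in control of the data" point a bit more concrete, here is a
rough sketch in plain C (not actual Cell code). It streams a big array through
two small local buffers, so the next chunk is fetched while the current one is
being processed. Assumptions of mine: CHUNK, dma_get and sum_streamed are
made-up names, and dma_get is just a memcpy standing in for an asynchronous
DMA transfer. On real hardware the fetch would run in the background and you
would have to wait for its completion before touching the buffer.

```c
#include <stddef.h>
#include <string.h>

#define CHUNK 256  /* elements per local buffer; hypothetical size */

/* Stand-in for an asynchronous DMA "get" (assumption: a real DMA engine
   would return immediately and finish the copy in the background). */
static void dma_get(float *local, const float *remote, size_t n) {
    memcpy(local, remote, n * sizeof(float));
}

/* Sum n floats from main memory, staging them through two small local
   buffers: while buffer `cur` is processed, the other one is filled. */
float sum_streamed(const float *main_mem, size_t n) {
    float local[2][CHUNK];
    float total = 0.0f;
    size_t done = 0;
    int cur = 0;

    /* Prefetch the first chunk. */
    size_t len = n < CHUNK ? n : CHUNK;
    if (len > 0) dma_get(local[cur], main_mem, len);

    while (done < n) {
        /* Kick off the transfer of the next chunk, if there is one. */
        size_t next_off = done + len;
        size_t next_len = 0;
        if (next_off < n) {
            next_len = (n - next_off) < CHUNK ? (n - next_off) : CHUNK;
            dma_get(local[cur ^ 1], main_mem + next_off, next_len);
        }

        /* Work on the chunk that is already in the local buffer. */
        for (size_t i = 0; i < len; i++) total += local[cur][i];

        /* Swap buffers. */
        done = next_off;
        len = next_len;
        cur ^= 1;
    }
    return total;
}
```

The point is that the program, not a cache, decides what gets moved and when.
With a cache you just read main_mem[i] and hope the prefetcher guessed right.
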
The performance of many old computers/consoles, like for example the Amiga
500, PS2 etc., wouldn't have been possible if they had had a uniform system
architecture similar to today's PC. The uniformity eases the programming model
and that's about it. If system performance is of utmost importance, a
non-uniform model is used. If it's not, a uniform one is used. Consoles are
usually all non-uniform in their architecture for a bunch of good reasons. But
some of them will
converge to a uniform system if performance is not of primary concern. And
this is to be expected since many games (perhaps targeted at a specific
audience) simply don't need that much performance to run, to convey the
story, gameplay, or whatever.
This is just what I've described above. A hardware sprite system was
essentially built to relieve the CPU of blitting primitive graphics onto
the screen. It was the perfect solution to build a processor just for
this purpose: to blit sprites on the screen, i.e. to overlay the video
signal with sprite data, and do collision detection among them etc. But
since sprites are a very small problem, esp. from today's perspective, and
since bandwidth and system clocks have increased manyfold on state-of-the-art
microprocessors, they can be done entirely in software. Well, you
can use an entire SPE just as a sprite engine. Anyhow, as I've written above,
there are systems where performance is at a premium and as such they can't
waste system resources - even today. So for example, the Gameduino board has
a hardware sprite engine:
http://excamera.com/sphinx/gameduino/.
There you go.
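
For the "sprites entirely in software" case, a minimal sketch in C of what the
CPU ends up doing itself. Everything here is my own assumption for
illustration: the Sprite struct, the 320x200 8-bit screen, palette index 0 as
the transparent colour, and the function names. The collision test is a simple
bounding-box check, not the per-pixel collision that hardware sprite engines
typically report.

```c
#include <stdint.h>
#include <stdbool.h>

#define SCREEN_W 320          /* hypothetical screen size */
#define SCREEN_H 200
#define TRANSPARENT_INDEX 0   /* assumed transparent palette index */

typedef struct {
    int x, y, w, h;           /* position and size in pixels */
    const uint8_t *pixels;    /* w*h palette indices, row-major */
} Sprite;

/* Copy a sprite onto the screen, skipping transparent pixels and
   clipping against the screen edges. */
void blit_sprite(uint8_t *screen, const Sprite *s) {
    for (int row = 0; row < s->h; row++) {
        int sy = s->y + row;
        if (sy < 0 || sy >= SCREEN_H) continue;
        for (int col = 0; col < s->w; col++) {
            int sx = s->x + col;
            if (sx < 0 || sx >= SCREEN_W) continue;
            uint8_t p = s->pixels[row * s->w + col];
            if (p != TRANSPARENT_INDEX) screen[sy * SCREEN_W + sx] = p;
        }
    }
}

/* Bounding-box collision test between two sprites. */
bool sprites_overlap(const Sprite *a, const Sprite *b) {
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}
```

Every pixel here costs CPU cycles and memory bandwidth, which is exactly the
work a hardware sprite engine takes off the processor.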