POWER5 architecture (efficiency 2 DMIPS/MHz)
1 MB L2 cache
x.x GHz (no freaking idea)
POWER7 architecture (efficiency 2.15 DMIPS/MHz?)
Out-of-Order-Execution (pretty much confirmed?)
No SMT (?)
~17-Stage pipeline (?)
3 MB L2 cache
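For what those DMIPS/MHz figures translate to, here is a minimal sketch. It assumes the usual reading that total DMIPS is the per-MHz efficiency times the clock in MHz. The clock value is a made-up placeholder, since the real clocks are unknown, and the 2.15 figure is itself a guess from the list above. Only the per-clock ratio is independent of the placeholder.

```python
def dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Total Dhrystone MIPS = per-MHz efficiency times clock in MHz."""
    return dmips_per_mhz * clock_mhz


# Efficiency figures from the lists above (the second one carries a "?").
OLD_EFFICIENCY = 2.0
NEW_EFFICIENCY = 2.15

# Hypothetical placeholder, NOT a real spec: the actual clocks are unknown.
PLACEHOLDER_CLOCK_MHZ = 1000.0

old_score = dmips(OLD_EFFICIENCY, PLACEHOLDER_CLOCK_MHZ)
new_score = dmips(NEW_EFFICIENCY, PLACEHOLDER_CLOCK_MHZ)

# Clock-for-clock difference; this does not depend on the placeholder clock.
per_clock_gain = NEW_EFFICIENCY / OLD_EFFICIENCY - 1.0

print(f"per-clock gain: {per_clock_gain:.1%}")
```

At equal clocks that is only about a 7.5% difference, so the efficiency figure alone says little until the clock is known.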
Now, this generation had to lean on lots of SMT (simultaneous multithreading) because the cores couldn't do out-of-order execution, so work had to be spread across extra hardware threads to keep the pipeline busy. This CPU, if it's out-of-order but without multithreading, is another kind of beast altogether; flunking in SMT-optimized code will be quite normal.
As for reasons to leave it out:
ARM states that it is considerably better to double your silicon area and stick two cores on, than it is to go for a more complex single core with SMT support, their reasoning being that a well-designed multi-core system, while bigger, will actually use less power. They claim up to 46% savings in energy over an SMT solution with four threads.
Also, moving an application to two threads on a single SMT-enabled core will increase cache-thrashing by 42%, whereas it will decrease by 37% when moving to two cores.
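To see what the two cache-thrashing percentages mean side by side, here is a rough sketch. It takes the quoted +42% and -37% at face value and assumes both are measured against the same single-threaded baseline, which the quote doesn't spell out. The baseline of 1.0 is an arbitrary unit.

```python
def scaled(baseline: float, percent_change: float) -> float:
    """Apply a percentage change (positive or negative) to a baseline."""
    return baseline * (1.0 + percent_change / 100.0)


# Arbitrary unit: thrashing of the app running as a single thread.
BASELINE = 1.0

smt_thrash = scaled(BASELINE, 42.0)    # two threads on one SMT core
dual_thrash = scaled(BASELINE, -37.0)  # two threads on two cores

# How much worse the SMT case is relative to the two-core case.
ratio = smt_thrash / dual_thrash

print(f"SMT: {smt_thrash:.2f}, two cores: {dual_thrash:.2f}, ratio: {ratio:.2f}x")
```

Under that assumption the SMT setup thrashes roughly 2.25 times as much as the two-core one.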
Out-of-order execution means more power consumption than in-order execution, as does SMT as opposed to no SMT. Now, power consumption was certainly an issue with the current gen, as they went with 2-way SMT rather than 4-way and chose to leave out-of-order out. So it's all a matter of balancing. I suspect Nintendo's balancing will be very different from Sony's and Microsoft's, given how those two effectively hampered general-purpose performance on their CPUs with PS3/X360.
Going by the rumors we have about the Wii U, they're making the exact opposite choices. Some are clearly motivated by power (and heat) issues, since SMT can otherwise be worth a 15 to 30% gain in performance; others seem to put more value on what a regular CPU should be able to do (out-of-order being instrumental in that). On a side note: current-gen CPUs are a pain in the arse. Besides the cache thrashing issue mentioned in the quote above, their in-order nature means there's no dynamic branch prediction or cache miss prediction to speak of, so a miss (the rate reaches 5% on PS3/X360) stalls the whole pipeline and takes cycles to clean out.
In short, this gen's code hasn't been optimized with good CPUs in mind. It has been optimized for CPUs with crap general-purpose performance, in-order execution and lots of fillrate/FPU performance, and written in a multithreaded yet not branch-intensive way, while being plagued by inconsistent performance (cache misses and other issues).
Code meant for that kind of architecture will not miraculously run better when dropped in here; in fact the lack of SMT (if true) can certainly hurt it. Doesn't mean the CPU is less powerful just yet though.
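For a rough sense of how much the missing SMT could hurt, here is a back-of-the-envelope sketch using the 15 to 30% SMT gain quoted earlier. It assumes the ported code was getting that full gain and that the cores are otherwise equal, which they almost certainly aren't, so treat it as an upper bound on the SMT part alone.

```python
def loss_without_smt(smt_gain: float) -> float:
    """Throughput lost when code that gained `smt_gain` from SMT runs without it.

    With SMT the code runs at (1 + smt_gain) of the single-thread rate, so
    dropping back to 1.0 loses 1 - 1 / (1 + smt_gain) of that throughput.
    """
    return 1.0 - 1.0 / (1.0 + smt_gain)


# The 15% and 30% bounds come from the post; everything else is an assumption.
low = loss_without_smt(0.15)
high = loss_without_smt(0.30)

print(f"throughput lost without SMT: {low:.0%} to {high:.0%}")
```

So a 15 to 30% gain from SMT turns into roughly a 13 to 23% drop when it's taken away, before out-of-order execution gets a chance to win any of it back.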