My original question was: 'What do you think Xenon has over Espresso, SIMD notwithstanding?' If you say they tested with an actual game then of course the SIMD side of things cannot be isolated now, can it?
Then the answer to your question is simple (and again, it's right there in my response):
The thing that Xenon has over Espresso is that it is clocked at 3.2GHz instead of 1.25GHz.
You know the drill: no one gives a shit about how elegant or efficient the processor architecture is if it doesn't deliver the goods. I don't think there will be many gamers out there saying "I'm so happy that I am playing this game with fewer online opponents because the CPU in this system is more elegant."
Dunno about other compilers, but on gcc you have to jump through some hoops to get the autovectorizer to even acknowledge the existence of paired singles. The biggest issue is that if it's in the wrong 'mood' it will accept all paired-singles-related command line options and silently ignore them. Which is not fatal if you have the habit of always checking what cc did today, but it can be a source of issues, and it's straight-up annoying as fuck.
I have not used an Espresso-like processor since the GameCube (and honestly I was not writing to-the-metal kind of code back then). Did the compilers Nintendo provided in those days produce horribly suboptimal code? If so, why didn't they make it a priority to fix that in the last 10 years? True, they probably had some hiccups with the purchase of SN by Sony...I have no idea what people used in the Wii era and beyond. (CodeWarrior? Did Nintendo provide something else?)
The minimum effort I would have expected DICE to put in was to update their base math library to use whatever SIMD the Espresso offers. I'm pretty sure EA had something like that lying around given 10+ years of GameCube and Wii games, so it's not like they would have had to write it from scratch for their test.
No effort. But did they actually do that? What I mean is: did they run a game scenario, measure the actual sustained FP throughput on Xenon, and then compare that to the theoretical Espresso FP throughput?
Would they have to? Wouldn't the situation look like this?
"Wow. Job group A with a heavy FPU workload that takes 1ms of wall clock time across 6 Xenon hardware threads seems to take 7-8ms of wall clock time across three Espresso hardware threads. How the hell are we going to claw all that time back...and that's just job group A. What about groups B, C, D..."
(Numbers based on the ratio of the theoretical max GFLOPS of each CPU - 115 GFLOPS for Xenon, 15 or so GFLOPS for the Wii U, right? The ratio probably isn't that severe in reality, but I think you are looking at a significant amount of time to claw back...just look at all the blocks in the BFBC2 timing-view screenshot running on SPU, which is really Frostbite 1.)