The thing is, out-of-order pipelines are designed to do all the reordering automatically, in a way that is logically equivalent to an in-order processor (i.e. the exact same instructions run with the exact same outcomes, but the order in which they're executed is sometimes changed to speed things up). This means that an out-of-order processor will never be slower than an otherwise identical in-order processor at running the same code. It might be faster, it might take exactly the same amount of time, but it'll never be slower.
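To make that concrete, here's a minimal C sketch (my own illustration, with made-up function and variable names, not anything from an actual console codebase). It has two independent dependency chains in one loop: an in-order core stalls on the long-latency load before it can get to the arithmetic behind it, while an out-of-order core can keep the second chain running while the load is in flight. The final results are identical either way, which is the whole point.

```c
#include <stddef.h>

/* Two independent dependency chains in one loop body.
 * acc1 depends on a gather-style load that will often miss cache;
 * acc2 is pure arithmetic with no dependence on that load.
 * An out-of-order core can execute the acc2 work while the load for
 * acc1 is still pending; an in-order core must wait. The computed
 * values are the same on both. */
float sum_two_chains(const float *table, const int *idx, size_t n)
{
    float acc1 = 0.0f;
    float acc2 = 0.0f;

    for (size_t i = 0; i < n; i++) {
        acc1 += table[idx[i]];    /* long-latency, dependent load */
        acc2 += (float)i * 0.5f;  /* independent arithmetic chain */
    }
    return acc1 + acc2;
}
```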
After seeing how some applications, once optimized for the Wii U CPU, showed dramatic performance gains (several hundred percent), I tried to work out what in this CPU is so different from its current-gen counterparts that it could explain this. That's where my scenario came from: maybe its OoO nature is the problem. Normally this kind of architecture speeds up execution by reordering the instruction sequence here and there, but perhaps, in the case that interests us, the heavily customized processor had a problem (at some point in its design, with certain applications, with certain tools, with certain types of code), and the usual OoO reordering was, in this scenario, counterproductive and actually slowed things down. But clearly, if such a deficiency existed, IBM and Nintendo would have found it quickly and corrected it before putting the CPU in their dev kits. Anyway, thanks for the clarifications.
Of more relevance is the fact that the instruction set will probably be quite different from those of the Cell and Xenon. Console CPUs generally have highly customised instruction sets, with extra instructions to handle whatever tasks the designers deem important. Hence, code which relies on custom instructions on Xenon may either run poorly or not at all on the Wii U CPU. AltiVec (yep, like VMX) is one such area, as Xenon has a highly customised variant of the VMX AltiVec unit, and it's quite possible that the Wii U CPU is using a customised version of the more modern VSX AltiVec unit. They'll have broadly the same functionality, but different instruction sets and data-formatting rules mean a bit of a learning curve in getting code intended for the Xbox 360 to run well on the Wii U.
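To illustrate the portability issue (this is my own sketch, assuming a PowerPC compiler with AltiVec support, e.g. GCC with -maltivec; the function name is made up): baseline AltiVec/VMX intrinsics like vec_madd() are common to every AltiVec-capable chip, so code written against them has a fighting chance of building for both CPUs. Code that leans on Xenon's VMX128 extensions (its enlarged register file and custom instructions) has no such baseline guarantee, and is exactly the kind of thing that would need rewriting for the Wii U.

```c
#include <altivec.h>

/* Portable AltiVec/VMX: one fused multiply-add across four floats,
 * computing v * scale + offset in a single vector instruction.
 * vec_madd() is part of the baseline AltiVec instruction set, so it
 * should be available on both Xenon and an AltiVec-capable Wii U CPU.
 * VMX128-specific operations carry no such guarantee. */
vector float scale_and_offset(vector float v,
                              vector float scale,
                              vector float offset)
{
    return vec_madd(v, scale, offset);
}
```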
Correct me if I'm wrong, but to generalize and make this more accessible: AltiVec and VMX are like extended versions of MMX or SSE, an additional "package" of instructions and features on a CPU. But do you really think the differences between the instruction sets of the Wii U CPU and the Xenon are big enough to explain the phenomenal performance increase some middleware saw after being optimized? How likely is it that the same manufacturer, IBM, changed, added to, or modified the Xenon's VMX128 instruction set, which was already well adapted to gaming, to such an extent that the Wii U CPU performs this poorly with some middleware and requires so much optimization?