Now that we've got (some) hard numbers, I think it's time for a bit more speculation:
CPU
According to Anandtech's teardown, Wii U's CPU die is approximately 32.76mm². We know that it's manufactured at 45nm, and there's 3MB of cache on there. By my measurements of the BlueGene/Q die shot on page 4 of this pdf, 2MB of IBM eDRAM cache on a 45nm process takes up 6.77mm², meaning we're looking at about 10.16mm² for "Espresso"'s 3MB of eDRAM cache, leaving about 22.6mm² for the cores, etc.
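For anyone who wants to check the arithmetic, here's a rough sketch of the area budget in Python. The numbers are the ones quoted above; the function name and the assumption that eDRAM area scales linearly with capacity are mine, so treat it as a back-of-envelope estimate rather than anything measured.

```python
# Back-of-envelope area budget using the figures quoted in the post.
DIE_AREA_MM2 = 32.76    # CPU die size from the teardown
EDRAM_2MB_MM2 = 6.77    # 2MB of eDRAM at 45nm, measured off the BlueGene/Q die shot
CACHE_MB = 3.0          # total cache on the die


def edram_area(cache_mb, area_per_2mb=EDRAM_2MB_MM2):
    """Estimate eDRAM area, ASSUMING area scales linearly with capacity."""
    return cache_mb * area_per_2mb / 2.0


cache_area = edram_area(CACHE_MB)          # area taken by the 3MB of cache
logic_area = DIE_AREA_MM2 - cache_area     # what's left for cores, interfaces, etc.

print(f"cache: {cache_area:.3f} mm^2, everything else: {logic_area:.3f} mm^2")
```

Real cache arrays also carry tags and control logic that don't scale perfectly linearly, so the cache figure is if anything on the low side.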
Now, just for reference, from the same die shot I calculated the A2 core at 6.58mm² (again, 45nm). Of course Nintendo isn't using A2 cores, but you could fit three of them on Espresso, and still have just enough space left for inter-core communication, off-die interfaces, etc. Perhaps this says more about how small the A2 cores are than anything else, though.
Anyway, the Wii's CPU is apparently about 16mm² on a 90nm node (I couldn't find a better source, if anyone has one it'd be appreciated). An optimistic shrink to 45nm would put this at 4mm², probably closer to 5-6mm² in reality, but considering we're stripping off the SRAM cache, off-die interfaces, etc. from this number, I feel 4mm² is a rough guide to how big a Broadway core would be at 45nm. Hence, if we were to assume that the Wii U's CPU is just three Broadways bolted together with 3MB of cache, then the cores account for about 12mm² of the 22.6mm², and we have to assume that the inter-core communication and off-die interfaces take up the remaining 10.6mm², or about a third of the die, which I don't think is reasonable or necessary for a CPU with just three threads and an off-die memory controller. Then again, there isn't much room for three cores more than ~50% bigger than Broadways, either.
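Here's the same reasoning as a quick Python sketch, in case anyone wants to plug in different numbers. It assumes ideal scaling (area goes with the square of the feature size), which is the optimistic case mentioned above; the function and variable names are just mine for illustration.

```python
def ideal_shrink(area_mm2, from_nm, to_nm):
    """Ideal die shrink: area scales with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


DIE_AREA_MM2 = 32.76
CACHE_AREA_MM2 = 3 * 6.77 / 2                 # 3MB of eDRAM, from the earlier estimate
logic_area = DIE_AREA_MM2 - CACHE_AREA_MM2    # non-cache area, about 22.6

broadway_45nm = ideal_shrink(16.0, 90, 45)    # optimistic Broadway size at 45nm
three_cores = 3 * broadway_45nm               # "three Broadways bolted together"
leftover = logic_area - three_cores           # area for everything that isn't core or cache
leftover_share = leftover / DIE_AREA_MM2      # as a fraction of the whole die

# Same calculation with cores 50% bigger than a Broadway.
leftover_big_cores = logic_area - 3 * (1.5 * broadway_45nm)

print(f"Broadway at 45nm: {broadway_45nm:.1f} mm^2")
print(f"left over with 3 Broadways: {leftover:.3f} mm^2 ({leftover_share:.0%} of die)")
print(f"left over with 3 cores 50% bigger: {leftover_big_cores:.3f} mm^2")
```

With the 5-6mm² "realistic" shrink instead of 4mm², the leftover area drops accordingly, so the conclusion is sensitive to that guess.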
This brings me back to the asymmetric cache, and a theory I had a good few months ago. The only reason that one core would have a larger cache than the other cores is if it were simply chewing through more data than they are. Furthermore, if the cache is four times the size, you would have to assume that's because it's chewing through four times as much data. What runs through four times as much data as a core that's performing 32-bit scalar maths? One that's performing 128-bit vector maths. I think that what we're looking at is two cores which are basically Broadways with minor improvements, and one core with a SIMD unit (possibly a combination FPU/SIMD as in the A2). From the size of the A2 core, we know such a unit can fit in the space provided (the A2 QFPU is actually a 256-bit wide SIMD unit), and it explains the asymmetric cache better than any other explanation we've heard. Furthermore, it makes sense from Nintendo's perspective. We know from Iwata's GPGPU comments, and the sheer size discrepancy between the CPU and GPU dies, that the GPU is intended to handle most of the computational grunt work, including, I'd imagine, most of the vector calculations. Hence, it doesn't make sense for all CPU cores to have dedicated SIMD units. Nonetheless, the CPU is going to end up doing some amount of 3D maths, and therefore a SIMD unit on just one of the cores is the most practical approach.
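To put numbers on the "four times" argument, here's a tiny Python sketch. It assumes the two smaller caches are the same size as each other, which isn't stated outright above, and all the names are just for illustration.

```python
TOTAL_CACHE_KB = 3 * 1024   # 3MB of cache in total
SMALL_CORES = 2             # ASSUMPTION: two cores with equal-sized smaller caches
CACHE_RATIO = 4             # the big cache is four times the size of a small one

# total = SMALL_CORES * small + CACHE_RATIO * small
small_kb = TOTAL_CACHE_KB / (SMALL_CORES + CACHE_RATIO)
big_kb = CACHE_RATIO * small_kb

# Data touched per operation: 128-bit vector vs 32-bit scalar.
width_ratio = 128 // 32

print(f"small caches: {small_kb:.0f}KB each, big cache: {big_kb:.0f}KB")
print(f"vector/scalar width ratio: {width_ratio}x, cache ratio: {CACHE_RATIO}x")
```

The two ratios lining up is the whole basis of the theory; it's a coincidence-or-not argument, not proof.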
I'll have some speculation on the GPU a bit later.