A few calculations on the GPU:
From the Anandtech teardown, the Wii U's GPU die is 156.21mm². We can assume, although it must be stressed that it's only an assumption, that the GPU is manufactured on a 40nm process. The die likely contains a GPU derived from AMD's R700 series, 32MB of Renesas eDRAM, a memory controller, a pair of ARM cores and a DSP.
First, the eDRAM. Assuming a 40nm process, the eDRAM would most likely be Renesas UX8LD, which comes in three configurations:
64Mb/256bit -> 1024bit interface for 32MB -> 51.2GB/s to 102.4GB/s
8Mb/256bit -> 8192bit interface for 32MB -> 409.6GB/s to 819.2GB/s
8Mb/128bit -> 4096bit interface for 32MB -> 204.8GB/s to 409.6GB/s
(The bandwidth ranges are based on a clock range of 400MHz to 800MHz)
Bandwidth of 400-800GB/s would certainly be massive overkill (for comparison, the highest bandwidth on any currently available consumer GPU is the Radeon 7970's 288GB/s, and that card targets much higher resolutions than the Wii U's 720p standard). The 1024bit interface at a high clock or the 4096bit interface at a low clock are probably the most likely options, and the clock is likely to be either equal to, or a clean multiple of, the GPU's clock. For reference, the on-die interface between the ROPs and eDRAM on Xenos (the Xbox 360's GPU) is 4096bit at 500MHz, for 256GB/s of bandwidth.
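As a quick check of the bandwidth figures, here's a small Python sketch that derives each configuration's interface width from the macro size and then computes peak bandwidth. It assumes the 32MB is built from identical macros, one transfer per clock, and the same 400MHz to 800MHz clock range; the function names are purely illustrative. The last line reproduces the Xenos reference figure.

```python
# Sketch: interface width and peak bandwidth for the three UX8LD macro
# configurations, assuming 32MB built from identical macros, one transfer
# per clock, and a 400-800MHz clock range (assumptions, not confirmed specs).

TOTAL_MBIT = 32 * 8  # 32MB of eDRAM expressed in megabits

# (macro size in Mbit, per-macro interface width in bits)
CONFIGS = [(64, 256), (8, 256), (8, 128)]


def bus_width_bits(macro_mbit: int, macro_width: int) -> int:
    """Total interface width when 32MB is built from identical macros."""
    macros = TOTAL_MBIT // macro_mbit
    return macros * macro_width


def bandwidth_gb_s(width_bits: int, clock_mhz: float) -> float:
    """Peak bandwidth in GB/s (1GB = 10^9 bytes), one transfer per clock."""
    return width_bits / 8 * clock_mhz * 1e6 / 1e9


for macro_mbit, macro_width in CONFIGS:
    width = bus_width_bits(macro_mbit, macro_width)
    low, high = bandwidth_gb_s(width, 400), bandwidth_gb_s(width, 800)
    print(f"{macro_mbit}Mb/{macro_width}bit -> {width}bit: "
          f"{low:.1f} to {high:.1f} GB/s")

# Xenos reference point: 4096bit at 500MHz
print(f"Xenos eDRAM: {bandwidth_gb_s(4096, 500):.1f} GB/s")
```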
The cell size of UX8LD is 0.06µm² (square micrometres), which, if my maths hasn't failed me, means a total of 16.1mm² for 32MB. This leaves ~140.11mm² for the rest of the die.
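The cell-area figure can be checked the same way. This sketch only counts bit cells at the assumed 0.06µm² cell size; real eDRAM macros also carry sense amplifiers, decoders and other periphery, so the true footprint is likely somewhat larger than this number.

```python
# Sketch: bit-cell area of 32MB of UX8LD eDRAM at an assumed 0.06 square
# micrometre cell size. Macro periphery (sense amps, decoders) is ignored,
# so this is an estimate of the array only.

DIE_MM2 = 156.21              # Wii U GPU die area from the teardown
CELL_UM2 = 0.06               # assumed UX8LD cell area in square micrometres
BITS = 32 * 1024 * 1024 * 8   # 32MB (binary megabytes) in bits
UM2_PER_MM2 = 1_000_000       # 1 square millimetre = 10^6 square micrometres

edram_mm2 = BITS * CELL_UM2 / UM2_PER_MM2
remaining_mm2 = DIE_MM2 - edram_mm2

print(f"eDRAM cells: {edram_mm2:.1f} mm^2")       # 16.1
print(f"Remaining die: {remaining_mm2:.1f} mm^2")  # 140.1
```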
Onto the ARM core(s).
Renesas offers the following ARM cores on its 40nm process:
ARM9, ARM11, ARM11 MPCore, Cortex R4, A5, A8, A9
We can rule out ARM9 and ARM11, as they're single-core architectures. I also feel we can probably rule out the A8 and A9, as they're targeted at higher-performance applications than the security/IO co-processor role they're likely to serve in the Wii U. Of the remaining three (ARM11 MPCore, Cortex R4 and Cortex A5), I feel the Cortex A5 is the most likely bet. Why? It was revealed in 2009 (when development of the Wii U hardware was beginning) and is, apparently, "the smallest and lowest power ARM multicore processor". It's also used in a very similar role (as a security co-processor) in AMD's 2013 APUs, which indicates its suitability. Exactly how big are Cortex A5 cores?
"Produced using a technique of 40 nm each Cortex A5 occupies an area of only 0.9 mm² (including the 64 KB L1 cache)" (Source)
That comes to just 1.8mm² for the two cores, leaving ~138.31mm² for the rest of the die.
On the DSP front, it gets a bit trickier. The DSP used in the GameCube and Wii was designed by Macronix, who have since spun off their DSP design business to a company called Modiotek. Modiotek's current product line seems ill-suited to what Nintendo are looking for, though, as it targets digital answering machines and low-cost phones. Nintendo seem to agree: for the 3DS DSP they instead went to a company called CEVA, whose TeakLite line is more suitable for gaming devices. The 3DS's DSP is, according to wsippel*, a modified TeakLite I, so a modified TeakLite III or IV would seem a sensible choice for the Wii U. The TeakLite IV is probably the more interesting one, as CEVA specifically refers to it a number of times as being suitable for games consoles. It was only announced earlier this year, though it's possible Nintendo had early access to the design. Otherwise, the TeakLite III seems feasible. One issue with the TeakLite family is the DSP's reported clock speed of 121.5MHz. Both the TeakLite III and IV can hit 1GHz+, so it seems odd to have it running so low (much lower than the GPU it's embedded in, in fact).
Another possibility is that the DSP is ARM-based, as there are DSP extensions to the ARM architecture, including this particular one from NXP, which Fourth Storm posted about a while back*; it's based on the Cortex-M3 and happens to run at 120MHz. Nintendo had the opportunity of going with an ARM-based DSP in the 3DS, though, and decided against it in favour of a dedicated architecture, so it would seem odd for them to make the opposite decision with the Wii U. It's also possible that Nintendo went for one of Renesas's DSP cores, in particular the SH3-DSP, but once again this is a CPU architecture repurposed as a DSP, so it doesn't seem consistent with Nintendo's previous decisions. There are of course dedicated DSP designers other than CEVA, but trying to pick the one Nintendo would have gone with would be a stab in the dark.
Let's just assume, for the moment, that Nintendo have gone with a CEVA TeakLite III DSP, and have kept a redundant GC/Wii Macronix DSP on there for BC purposes.
The TeakLite III core is 0.47mm² on a 65nm process. I'd say it's fair to assume that, on 40nm, the TeakLite III and the Macronix DSP shouldn't be larger than 1mm² combined, and any other DSP they might have chosen should be in the same ballpark. This would bring our total remaining die area to ~137.31mm², which would account for the GPU and memory controller.
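One rough way to sanity-check the 1mm² DSP allowance is an ideal optical shrink, where area scales with the square of the feature size. Real node transitions usually fall short of ideal scaling, so the shrunk figure is a best case rather than an estimate; the helper name is hypothetical.

```python
# Sketch: ideal area scaling of the TeakLite III core from 65nm to 40nm.
# Assumes area scales with the square of the feature size; real node
# shrinks usually do worse, so treat the result as a best case.

def ideal_shrink(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Area after an ideal shrink from from_nm to to_nm (hypothetical helper)."""
    return area_mm2 * (to_nm / from_nm) ** 2


TEAKLITE3_65NM = 0.47  # mm², quoted TeakLite III size on 65nm
DSP_BUDGET = 1.0       # mm², assumed budget for TeakLite III + Macronix DSP

best_case = ideal_shrink(TEAKLITE3_65NM, 65, 40)
worst_case = TEAKLITE3_65NM  # no shrink at all

print(f"TeakLite III at 40nm, ideal shrink: {best_case:.2f} mm^2")
print(f"Budget left for the Macronix DSP: {DSP_BUDGET - worst_case:.2f} "
      f"to {DSP_BUDGET - best_case:.2f} mm^2")
```

Even with no shrink at all, the 0.47mm² TeakLite III would leave more than half of the assumed 1mm² for the legacy Macronix DSP.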
Here's the thing. From very early on in the speculation threads, when we heard that the GPU was based on the R700 line, the RV740 was identified as the most likely base for the chip. It's designed for a 40nm manufacturing process, it fits the reported 640 shader count, and (clocked down) it fits our performance expectations. It also happens that the RV740 die (which includes the memory controller) is exactly 137mm². Now, of course I've made a number of assumptions in my calculations, and any modifications Nintendo made to the RV740 would be quite unlikely to leave it at exactly the same size, but it's still astonishing how closely the GPU's die size matches what we'd expect from something based on the RV740, and at this point I'd be very surprised if it were anything but.
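Putting the whole chain together, this sketch subtracts each estimated block from the measured die area and compares what's left with the RV740's 137mm². Every component size here is one of the assumptions made above (40nm process, UX8LD eDRAM cell area only, two Cortex A5 cores, ~1mm² of DSP), not a measured value.

```python
# Sketch: die-area budget for the Wii U GPU. Every component size is an
# assumption carried over from the estimates in this post.

DIE_MM2 = 156.21   # measured die area (Anandtech teardown)
RV740_MM2 = 137.0  # RV740 die area, including its memory controller

# Estimated non-GPU blocks, in mm² (all assumptions)
components = {
    "32MB eDRAM (UX8LD cells)": 16.1,
    "2x Cortex A5": 2 * 0.9,
    "DSP(s)": 1.0,
}

remaining = DIE_MM2
for name, area in components.items():
    remaining -= area
    print(f"after {name}: {remaining:.2f} mm^2 left")

print(f"left for GPU + memory controller: {remaining:.2f} mm^2")
print(f"difference from RV740: {remaining - RV740_MM2:+.2f} mm^2")
```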
*It's always fun how often I come across GAF posts while researching these things.