Full-size photo here (very big)
A version of the die photo with each of the components outlined and labelled:
What's going on here?
A few days ago, wsippel noticed that Chipworks had Wii U die photos up for sale on their website at $200 apiece (one each for the CPU, GPU and NOR dies). Some of us in the Wii U Technical Discussion Thread decided to chip in a few dollars each to buy the GPU photo, with the aim of a few of us (Fourth Storm, wsippel, Blu, Durante and myself) deciphering it and posting our results on GAF. Chipworks, though, decided to be amazingly kind and helpful: they emailed back offering not only to do a higher quality polysilicon die photo for us at their expense (as they felt their existing shot didn't give us the detail we needed), but also to allow us to post the full-res photo up here on GAF for all you lovely folks to enjoy!
(We've learnt from Chipworks that this kind of photo would usually cost about $2500 to do, so it really is incredibly generous of them to do it for us for free.)
***What follows is speculation and deduction based on what we see in the die photo. Our analysis is ongoing, so please don't jump to any hasty conclusions***
What am I looking at?
The die is exactly 11.88 x 12.33mm (146.48mm²). Chipworks believe that it's "fabricated in a 40 nm advanced CMOS process at TSMC". It carries Renesas die markings, but no AMD die markings (although there is an AMD marking on the MCM heat-spreader). This is unexpected, as it was widely reported that the GPU was originally based on AMD's R700 line, and Nintendo publicly referred to it as a Radeon-based GPU. As the die appears to be very highly customised (it looks very different to other R700-based GPUs), the markings (or lack thereof) may indicate that the customisations were not done by AMD, but rather by Nintendo and Renesas.
In addition to the usual GPU components, the die includes a large eDRAM pool (accessible to both CPU and GPU), and it is understood that one or more ARM cores are also on-die, as well as a DSP.
As can be seen in the labelled version of the die photo above, there are three main sections of the chip, which I will deal with separately. The first of these comprises the memory pools (the two eDRAM pools and one SRAM pool), the second the off-chip interfaces (around the edges of the chip), and the third the GPU logic (the sections labelled A-Y).
The large orange block on the left is 32MB of eDRAM, known as MEM1. It's 40.72mm², and takes up 27.8% of the die. As this appears to be a non-standard eDRAM configuration from Renesas, the interface (and hence the bandwidth) is not immediately obvious. This pool of memory is accessible from both the CPU and GPU, and would be expected to have a high-bandwidth, low-latency interface to each. The MEM1 pool also serves a purpose in Wii mode, by replacing the 24MB of 1T-SRAM.
The smaller orange block above it is also eDRAM, and is referred to as MEM0. It's 4.25mm², and appears to be 2MB in size. In Wii mode it is used for the embedded framebuffer, and in Wii U mode it "is used as fast general purpose RAM".
To the left of the smaller eDRAM pool is a pool of SRAM, understood to be 1MB in size, which seems to be used as a texture cache in Wii mode. Its purpose in Wii U mode is unclear; it may also serve as a cache there. Its use as a texture cache in Wii U mode would be puzzling, though, as it is in the opposite corner of the die from the DDR3 interface, and seemingly far from the texture units.
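As a quick sanity check on the figures quoted for the two eDRAM pools, the share of the die and the storage density of each can be worked out directly. This is only a sketch: the function and variable names are made up for illustration, MEM0's 2MB capacity is itself an estimate, and the areas include whatever peripheral circuitry falls inside the outlined blocks.

```python
# Figures quoted in this post. MEM0's 2 MB capacity is an estimate, not confirmed.
DIE_W_MM, DIE_H_MM = 11.88, 12.33
POOLS = {
    "MEM1": {"area_mm2": 40.72, "size_mb": 32},
    "MEM0": {"area_mm2": 4.25, "size_mb": 2},
}


def die_area_mm2(width_mm=DIE_W_MM, height_mm=DIE_H_MM):
    """Total die area from the measured edge lengths."""
    return width_mm * height_mm


def pool_stats(name):
    """Return (percentage of die area, MB per mm^2) for a named pool."""
    pool = POOLS[name]
    share = 100.0 * pool["area_mm2"] / die_area_mm2()
    density = pool["size_mb"] / pool["area_mm2"]
    return share, density


if __name__ == "__main__":
    print(f"die area: {die_area_mm2():.2f} mm^2")
    for name in POOLS:
        share, density = pool_stats(name)
        print(f"{name}: {share:.1f}% of die, {density:.2f} MB/mm^2")
```

On these numbers MEM1 comes out at roughly 0.79 MB/mm² and MEM0 at roughly 0.47 MB/mm², a gap worth keeping in mind when reading the capacity estimates.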
The interface running around the lower right corner of the die is the DDR3 memory interface (the DDR3 is known as MEM2).
Running along the top and left sides of the die, along with a small section on the upper right side of the die, are general purpose I/O (GP I/O). The GP I/O is likely dedicated in large part to communication with the CPU, but may also be used for lower-bandwidth off-chip communication, such as the Blu-Ray drive or SD card slot.
On the bottom left of the die there are two high-speed I/O (HS I/O) interfaces, such as SERDES (serialiser/deserialiser), which are used to achieve very high bandwidth over relatively few wires. Proposed applications of these include:
- Communication with the hardware that handles video transmission to the gamepad
- Communication with the CPU (to provide high-bandwidth/low-latency eDRAM access)
- USB interfaces
- SATA interface
- Flash memory interface
There are also two blocks on the right side of the chip above the DDR3 interface that are currently unknown. These may be part of the DDR3 interface, or may be I/O elements in and of themselves.
(Calling this section "GPU logic" is somewhat of a misnomer, as there's an ARM CPU and a DSP in there, but we're not certain where either of them is.)
The GPU logic consists of 40 blocks, which are apparently of 25 different types. They are labelled A-Y, and repeated blocks are numbered. The small orange/black units on these blocks are SRAM cells, and the type, quantity and location of the SRAM cells are central clues when it comes to discovering which blocks contain which components.
The blocks labelled N1-N8 appear to contain the SPUs. Judging by their size relative to other 40nm VLIW5 GPUs, it seems that they each contain 40 SPUs, giving a total of 320. As well as the changes in their grouping (VLIW5 SPUs are usually grouped in 20s), there seem to be changes to their register files (the SRAM cells around them).
It is generally assumed that the blocks labelled J1-J4 are the texture unit bundles (four in each for a total of 16). They are adjacent to neither the DDR3 interface nor the SRAM "cache", which would be unusual, but certainly not impossible, placement for texture units. That they are located next to the SPUs, however, makes sense.
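The unit counts in the last two paragraphs are simple multiplication, sketched below. The block lists and per-block figures are the estimates given above, not confirmed specifications, and the names are purely illustrative. The 20-per-block case is included only because that is the usual VLIW5 grouping mentioned above.

```python
# Block labels as used on the annotated die photo.
SPU_BLOCKS = [f"N{i}" for i in range(1, 9)]  # N1-N8
TMU_BLOCKS = [f"J{i}" for i in range(1, 5)]  # J1-J4


def total_units(blocks, per_block):
    """Total unit count if every listed block holds `per_block` units."""
    return len(blocks) * per_block


if __name__ == "__main__":
    print("SPUs at 40 per block:", total_units(SPU_BLOCKS, 40))
    print("SPUs at 20 per block:", total_units(SPU_BLOCKS, 20))
    print("Texture units at 4 per block:", total_units(TMU_BLOCKS, 4))
```

The point of the comparison is that the total depends entirely on the per-block estimate, which in turn rests on the size comparison with other 40nm VLIW5 parts.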
The location of the ROP bundles is unknown, but blocks U1 and U2 have been proposed as possibilities, due to their size and location. It is also possible that the ROP bundles are separated out into their constituent components, so may reside in asymmetrical blocks (or sets of blocks).
The ARM core (referred to as "Starbuck") is believed to be very similar (or even identical) to the "Starlet" ARM core on the Wii's GPU die. As such, it should be very small, possibly <1mm². Marcan believes that Starbuck is block Y, which seems likely given the size and SRAM configuration.
There is almost certainly a DSP somewhere on the die, although we know little to nothing about it. Like the ARM core, it should be pretty small.
Other potential functions of blocks on the GPU logic:
- Command Processor and Thread Scheduler (not necessarily the same block)
- Trisetup and rasterizer (R800 dropped that and delegated the workload to SPs)
- Global Data Share (traditionally not very large, and likely encased nicely by some of the numerous embedded pools, in a much larger size)
- A bunch of caches (vertex, texture) which could be really tiny or not so much (again, memory pools ahoy)
- DMA engines
- Ring buses
- Tessellator (likely still sitting in fixed-function silicon)
It is likely that at least a couple of the blocks are used for Wii BC (see further discussion on this below).
It is worth noting at this stage that a large portion of the GPU logic is still unexplained. Even accounting for everything we know should be on there, there are a significant number of blocks left. There are a number of possibilities to consider. For one, it could simply be that there's obvious functionality we aren't considering. Otherwise there may be some customised units not usually present on GPUs. Alternatively, there's at least one crackpot theory that we're undercounting the SPUs, texture units and ROPs, chalking it up to an asymmetric shader design.
Wii Backwards Compatibility
The GPU is understood to provide full hardware-level BC with Wii's GPU. Some of the components for this (e.g. MEM1 and MEM0) have already been explained; however, the GPU logic itself needs to be accounted for. In considering this, the following comment from Ko Shiota, the Deputy General Manager of Nintendo's Product Development Department, is worth reading:
Shiota said: Yes. The designers were already incredibly familiar with the Wii, so without getting hung up on the two machines' completely different structures, they came up with ideas we would never have thought of. There were times when you would usually just incorporate both the Wii U and Wii circuits, like 1+1. But instead of just adding like that, they adjusted the new parts added to Wii U so they could be used for Wii as well.
This seems a clear indication that there is not a full 1:1 copy of the Wii's Hollywood GPU on the die, but that at least some parts of its functionality are being handled by Wii U components.
Hollywood was about 72mm² on a 90nm process (including embedded RAM and ARM), so even if there is a 1:1 copy, it would only be expected to take up around 10-20% of the space on a 146.48mm² 40nm die. Given Shiota's comments, the actual amount of GPU logic dedicated purely to Wii BC may be as low as 5-10%.
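The 10-20% range can be roughly reproduced with an ideal-shrink estimate, where area scales with the square of the feature size. Real shrinks rarely achieve this (I/O and embedded memory tend to scale worse than logic), so treat the result as a lower bound. The function name and the ideal-scaling assumption are for illustration only.

```python
HOLLYWOOD_AREA_MM2 = 72.0  # approximate, on 90nm
LATTE_AREA_MM2 = 146.48    # on 40nm


def ideal_shrink_area(area_mm2, from_nm, to_nm):
    """Area after a process shrink, assuming perfect (feature size)^2 scaling."""
    return area_mm2 * (to_nm / from_nm) ** 2


if __name__ == "__main__":
    shrunk = ideal_shrink_area(HOLLYWOOD_AREA_MM2, 90, 40)
    share = 100.0 * shrunk / LATTE_AREA_MM2
    print(f"Hollywood at 40nm: {shrunk:.1f} mm^2 ({share:.1f}% of the die)")
```

With perfect scaling this gives about 14.2mm², or just under 10% of the die, which matches the bottom of the range quoted above; less-than-ideal scaling pushes the figure up towards the top of it.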
It is worth considering what Wii U components may provide BC for Hollywood functions. A possible candidate for this is block J1. If the blocks J1-J4 are indeed texture unit bundles, then J1 would seem to have some difference to the other three, due to its slightly larger size. This would be explained if J1 had extra hardware to allow it to also function as the texture unit for Wii mode.
Comparison Die Photos:
RV770 - Radeon HD4870, R700 series, 55nm
Llano - APU with Evergreen graphics, 32nm
Flipper - Gamecube GPU, 180nm
Latte/Flipper comparison (assuming both at the same manufacturing node)
Info Directly From Chipworks:
Jim Morrison said: Been reading some of the comments on your thread and have a few of my own to use as you wish.
1. This GPU is custom.
2. If it was based on ATI/AMD or a Radeon-like design, the chip would carry die marks to reflect that. Everybody has to recognize the licensing. It has none. Only Renesas name which is a former unit of NEC.
3. This chip is fabricated in a 40 nm advanced CMOS process at TSMC and is not low tech
4. For reference sake, the Apple A6 is fabricated in a 32 nm CMOS process and is also designed from scratch. It’s manufacturing costs, in volumes of 100k or more, about $26 - $30 a pop. Over 16 months degrade to about $15 each
a. Wii U only represents like 30M units per annum vs iPhone which is more like 100M units per annum. Put things in perspective.
5. This Wii U GPU costs more than that by about $20-$40 bucks each making it a very expensive piece of kit. Combine that with the IBM CPU and the Flash chip all on the same package and this whole thing is closer to $100 a piece when you add it all up
6. The Wii U main processor package is a very impressive piece of hardware when its said and done.
Trust me on this. It may not have water cooling and heat sinks the size of a brownie, but its one slick piece of silicon. eDRAM is not cheap to make. That is why not everybody does it. Cause its so dam expensive
Randy from Chipworks said:
Annotated Die Photo From Marcan:
Thanks to Fourth Storm, wsippel, blu and Durante for helping organise and analyse, and all the folks who chipped in to buy the photo (who will be getting their money back from FS shortly!). And, of course, huge thanks to Chipworks for doing this for us!
Digital Foundry article
OP will be updated as we go