A combination of finally getting to play more PS3/360 games and evaluating certain comments.
Wii U isn't the 1 TFLOP console I hoped it would be (or even an HD 6570) and I'm kinda angry at how Nintendo is handling the system right now.
Apart from that, nothing else changed. I still think it's more powerful than PS3/360. But the potential Nintendo had with the specs... completely wasted.
I won't deny that. Antonz and I kinda caused a stink in either the first or second WUST when talking about how Wii U wasn't shaping up like we thought/hoped early on. I was wanting to see something much closer to what Xbox 3 seems headed toward, minus the 8GB of memory. I didn't like the hardware decisions with Wii.
Iwata talked about their "Jimae-shugi" policy and how they didn't use it for setting up their online system. I think they need to do the same for the next console. I'm not saying they need to go out and build a 6 or 7 TFLOP beast in 2017/18. But they need to do just like Sony when they are ready to start planning their next console. Ask 3rd parties what they want. Nintendo needs to adapt to them, not the other way around. And once they have an idea, build a console as close as possible to those requests within the budget they've set. And then in turn have their devs learn how to use the hardware. And if that means no BC for the next console, then so be it.
Bg, I indicated last night that I largely disagree with your labeling of components. Allow me to elaborate in order to continue our friendly debate. Basically, I find a problem in the methodology you employ. To say that a block on one chip appears visually similar to a block on another chip is problematic, because the layout of SRAM changes drastically from design to design. For example, if we look at the shader blocks in R700, Llano, and Brazos, the SRAM arrangement is drastically different in each. Therefore, I find it very dangerous to draw a conclusion based on this alone. What I have tried to do is take basic appearance into account, but focus more on the amount of SRAM in each block and their placement on the chip.
Friendly is always enjoyable.
And I agree. We both have taken the same approach in identification.
For example, what you have labelled U on Llano, I actually believe to be analogous to F on Latte. Both have 32 small blocks of SRAM and lie adjacent to what we know to be the display interface. Thus, I would bet F to be display related, although it's hard to pinpoint exact function.
That's how I labeled the block in Llano as F as well. It has 32 similar (to itself) blocks of SRAM, and in fact, jumping ahead to the link you gave, it specifically labels (a portion of) that block as the UVD, not the one I listed as U. That link would seem to confirm my view. And in fact it may explain Block E in Latte.
The T blocks I am quite confident in at this point, not only because of their relationship to S (L1 texture cache), their close proximity to the DDR3 interface, and their striking resemblance to the TMUs on RV770, but because of the amount of SRAM contained within. If you look at the TMUs in Llano, you will see that there is a large disparity between the amount of SRAM they hold and the amount of SRAM in the J blocks - too much of a disparity to be ignored, even taking the differing architectures into account.
This to me suggests that in the memory hierarchy MEM2 is getting higher priority over MEM0/1. This is where I also believe that Nintendo won't have the Wii's eMemory go idle in Wii U mode. I don't see Nintendo contributing that much die space to that memory just to go unused. I would believe that 1MB of SRAM replaces the need for L1 texture cache (and L2) in other conventional designs. For me I think it would be more logical for the J blocks to be the TMUs, giving us 16, and having to access that memory than for T to be the TMUs and separated from the SIMDs. The latter seems rather random compared to any GPU we've looked at.
I also see your point in being careful in giving die placement too much weight, since we see different arrangements. For example, ROPs may not always be right next to the memory interface. I do think it's worth keeping in mind, however, since RV770 utilized an approach which placed primary bandwidth consumers adjacent to the memory interface in place of a ring bus (something which does not seem to be present on Latte).
I label W as the ROPs, not only because they do resemble blocks around the outer edge of RV770, but also because there must be enough memory on them to account for the color cache and Z cache. Llano seems to be a strange configuration and different from RV770. This link seems to say that there are 2 blocks, each containing L2, ROPs, Z cache and color cache.
http://www.realworldtech.com/fusion-llano/
I cannot find any two identical blocks to fit the bill, so the jury's out on that one. I wouldn't be surprised if they got that detail wrong and what you labeled W is a block of 8 ROPs with the block above it being the L2.
I agree to an extent with what you are saying at the beginning, but that still doesn't explain why the blocks you are considering as ROPs are nowhere near the DDR3 I/O in the other GPUs. In Llano it's completely on the opposite side. In Llano I had identified two sets of blocks near the mem I/O as potential duplicates. First there are two smaller ones (one has the memory going horizontally and the other vertically) under what I labeled as F, but they don't seem to have enough memory. If we are looking at the large, vertical die shot, there are two blocks to the left of the bottom row of SIMD blocks that can work as well. If you notice, the block closest to mem I/O has its layout affected by that small red block at its bottom-left corner. It seems that is causing the SRAM in that block to be bunched up more than in the block to the right.
Like I mentioned before, Xenos shows us we don't have to find two blocks for 8 ROPs. And considering the memory hierarchy and BW needs, this is why I see B as the most likely candidate for the ROPs, due to its positioning with the eDRAM portions. Again, that makes sure the smaller eDRAM pool is utilized in Wii U mode and keeps them close to the larger pool if it is used as FB.
I'd hazard a guess that my view could explain why some ports suffered from issues with transparencies. For example, MEM0 may not have been properly accessible (if at all) early on, since in my view it would replace the L2 normally found in GPUs at that point in the pipeline, and devs were either using MEM1 for other needs or it doesn't have the same BW the ROPs have access to in Xenos.
Finally, it is tough to tell which components even get their own dedicated blocks. For example, you name Hierarchical Z and the tessellator as two blocks which might be candidates for the duplets, yet have a look at this presentation. It's of the HD2000 generation, but the front end in AMD's cards went largely unchanged between this series and HD4000.
https://graphics.stanford.edu/wikis...AttachFile&do=get&target=Eric_Demers_R6XX.pdf
This pdf states that tessellation is performed within the vertex engine. Meanwhile, HiZ is a function of the scan converter/rasterizer. Actually, looking at this, I might have to slightly amend my labeling, as the command processor seems to contain quite a few command queues and whatnot which are working with the CPU. I might have this as B now. Also, I is the ideal location for thread dispatch as it needs to draw data from two different caches - the instruction cache and constant cache, which would fit the D block quite nicely. Down in P, I would guess, might be a large stream out buffer. This has grown from 8k to 128k over the last few generations.
These changes in layout and the addition of extra memory (I also have the GDS more in line with recent chips) are the type of changes I would expect Nintendo to make to the R700 architecture, in contrast to a complete overhaul. This is why I do not understand when people say that Latte looks nothing like RV770. Many of the blocks look quite similar; they are just found in a different arrangement on the die.
The portion of the PDF you are referring to only seems to say that the Vertex block is capable of tessellation. Page 15 says there is a programmable tessellation unit. Also, in regard to the SC/RAS and HiZ, it says:
- Also interfaces to depth to perform HiZ / Early Z checks
That would suggest a second block is necessary.
I do agree with what you are saying about using the RV770 die shot. For example, I consider the block above the TMUs to be like Latte's Block I. Though the layout seems to more closely resemble AMD's IGP line, making that a "clearer" comparison.
In my interpretation I'm still not fully committed to D being video, with I and D most likely being the GDS and Shader Export, respectively. And P being the IC/CC/LDS.
I think I covered everything, haha.