
WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis

Status
Not open for further replies.

ozfunghi

Member
Excuse me?

I wouldn't know why, but... sure.

Because it's a different take on what we have, and when cross-posted here no one had responded to it. No other poster pixel-counted the areas. And which chip was that?

Brazos, 40nm, 40SP/block

I thought he meant the AMD architecture. OK, from now on, we call the GameCube GC. I know it's coded as GCN on the bottom of every GameCube (I have one, I can see it), but this is getting confusing. So the AMD architecture is GCN, and the GameCube is GC (or NGC).

Yeah, I was talking about the GameCube. Maybe you could have gotten it from me saying "the" GCN, but it wasn't intentional on my part. Sorry for the confusion. That said, maybe Blu does in fact have experience with GCN (AMD) as well, lol.
 

efyu_lemonardo

May I have a cookie?
Not sure what this means, but here's a tweet from that marcan chap:
He's referring to the ARM core's SRAM.
This is what wsippel was wondering about earlier. 96K + 16K + 16K = 128K, which is what Starlet had and therefore must be included in Starbuck.

edit: beaten with an image ;)
 

tipoo

Banned
Edit: Damn it. Yeah, Marcan annotated the die above. And by the way, the 4MB figure for the smaller, faster eDRAM is wrong; that was an initial guess and everyone ran with it. It's actually 2MB.

We also know which block is Starbuck now.
 

tkscz

Member
He's referring to the ARM core's SRAM.
This is what wsippel was wondering about earlier. 96K + 16K + 16K = 128K, which is what Starlet had and therefore must be included in Starbuck.

edit: beaten with an image ;)

Why name it that?

Tipoo the image is huge and was posted already.
 

McSpidey

Member
It's so odd to see USB and SATA I/Os on a GPU. The only rational reason I can come up with for why they're there is to embed them with the Starbuck chip.
 
Why name it that?

Tipoo the image is huge and was posted already.

Because if Starlet was the security processor in the Wii setup, and the Wii U chips are all codenamed with coffee references (Latte, Espresso, Cafe for the system...) then "Starbuck" makes sense :)
 

tipoo

Banned
Why name it that?

Tipoo the image is huge and was posted already.

Yeah I saw right after I posted, can't help it, this thread is quick :p

I believe Starbuck is just his nickname for it since the last one was Starlet, plus everything is coffee themed lol
 

tkscz

Member
Because if Starlet was the security processor in the Wii setup, and the Wii U chips are all codenamed with coffee references (Latte, Espresso, Cafe for the system...) then "Starbuck" makes sense :)

While that's funny considering how the Wii's was called Starlet, I hate Starbucks.
 

tipoo

Banned
It's so odd to see USB and SATA I/Os on a GPU. The only rational reason I can come up with for why they're there is to embed them with the Starbuck chip.

The GPU in this case is also performing northbridge and southbridge functions, if I'm not wrong? That's probably why the CPU is smaller than we would expect and the GPU is larger than the shader count would imply, I suspect. In the Wii U everything goes through the GPU first, CPU second.
 
And once again, if that's what a 40SP block looks like, there are 4 of them on the WiiU chip. Not 8.

So am I looking at that image right in that all of this:
AMD never, in any designs they have access to, used a register bank larger than 2KB. There's no need to. The register banks on the Wii U die look too large to be 2KB at 40nm. So either the fab is different or it's larger than 2KB.

In addition, the SIMD logic area is also smaller than in other units that have 40 SPs per core at 40nm. That points to either an odd number or a different fab.

And that's in addition to having only 32 register banks for over 32 SP units.
Still applies when using the Brazos 40nm comparison?
 

Donnie

Member
* most likely 352 GFlops (1.5x 360)
* heavily customized design, with about half the die spent on "special sauce", e.g. fast hardware implementations of "common" subroutines. What the special sauce is and how effective it will be for third party development is an enigma wrapped in a riddle shrouded by mystery. However, it will almost certainly boost graphical performance above the pure flops number, at least for developers who know how to use the hardware. (Nintendo first party titles)

and, here's some idle speculation:
* Nintendo might be really bad at documenting their stuff, since third party developers seem to be having trouble using the special sauce. Or maybe they consider it a trade secret? I don't know.

Good summary, but I just have to mention that, purely from a floating point performance perspective, 352 GFlops from an improved R700 design will be more than 1.5x the 360, which is a very early unified shader design.
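For anyone wanting to check where a figure like 352 GFlops comes from, here is a rough sketch of the usual peak-throughput arithmetic. The 550 MHz clock, the 2 flops per SP per cycle (one multiply-add), and the 240 GFlops figure for the 360's GPU are assumptions brought in for illustration, not numbers stated in this thread, and the function name is made up.

```python
# Peak-throughput sketch. All constants below are assumptions for illustration.
ASSUMED_CLOCK_MHZ = 550        # assumed Latte GPU clock, not from this thread
ASSUMED_XENOS_GFLOPS = 240     # commonly quoted 360 GPU figure, assumed here


def peak_gflops(shader_count, clock_mhz, flops_per_cycle=2):
    """Paper peak: SPs x flops per cycle x clock (MHz), converted to GFlops."""
    return shader_count * flops_per_cycle * clock_mhz / 1000.0


# The two SP counts being argued about in this thread.
for shader_count in (160, 320):
    gflops = peak_gflops(shader_count, ASSUMED_CLOCK_MHZ)
    print(shader_count, "SPs:", gflops, "GFlops")

ratio_vs_360 = peak_gflops(320, ASSUMED_CLOCK_MHZ) / ASSUMED_XENOS_GFLOPS
print("320 SPs vs 360 (raw flops): %.2fx" % ratio_vs_360)
```

Under those assumptions 320 SPs gives the 352 figure and a raw ratio just under 1.5x. That is only the paper number; it says nothing about how much more work a newer design gets out of each flop, which is the point made above.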
 
And once again, if that's what a 40SP block looks like, there are 4 of them on the WiiU chip. Not 8.
Do you really not see that the blocks of the WiiU are a "bit" bigger than the Brazos ones?
The same goes for the memory: each memory cell of the WiiU blocks is double the size (I would say more like triple) of the Brazos ones.
320SPs is the most plausible option no matter how you look at it, unless you argue that Nintendo has customized the SPs to make them much bigger than normal.
 

Earendil

Member
Do you really not see that the blocks of the WiiU are a "bit" bigger than the Brazos ones?
The same goes for the memory: each memory cell of the WiiU blocks is double the size (I would say more like triple) of the Brazos ones.
320SPs is the most plausible option no matter how you look at it, unless you argue that Nintendo has customized the SPs to make them much bigger than normal.

Hulk Mode SPs...

B E L I E V E
 
Ah, glad Marcan confirmed what I suggested! The DDR3 I/O makes much more sense next to the caches there!

Also that was the only block that was small enough to be an ARM9. On 90nm, they are already under 1mm^2.
 

Raist

Banned
Do you really not see that the blocks of the WiiU are a "bit" bigger than the Brazos ones?
The same goes for the memory: each memory cell of the WiiU blocks is double the size (I would say more like triple) of the Brazos ones.
320SPs is the most plausible option no matter how you look at it, unless you argue that Nintendo has customized the SPs to make them much bigger than normal.

The size difference of the whole isn't really interesting, given that there's more empty space as well.

I agree that the cells themselves are much bigger... but as pointed out numerous times, the number of register files doesn't add up...
 

Popstar

Member
How likely is Popstar's theory about not even needing traditional ROPs? He posted it here, then deleted it and put it in the "serious discussion" thread.
I figured the other thread was more appropriate since it wasn't speculation I was directly basing off the die photo. I wouldn't call it a "theory".
 
Earendil said:
Hulk Mode SPs...

B E L I E V E
Yeah, OK. So Nintendo's chip, the most recent one and presumably the one with the design most optimized for gaming, has the biggest blocks with only 20 ALUs each and not a single customization made to them. B E L I E V E :)

It's a minimum of 40 ALUs or, if there are only 20, some ultra-customized ones with, of course, much better performance in certain operations.

Could it be that some of the Wii fixed functions are on the ALUs?


The size difference of the whole isn't really interesting, given that there's more empty space as well.

I agree that the cells themselves are much bigger... but as pointed out numerous times, the number of register files doesn't add up...
But... empty space? There is no empty space! This "empty space" is where the SP are located. So the fact that there is more "empty" space, is proof that there has to be something there. No one wastes die area for nothing, and even less if you have a customized design!
 
I've just had a bizarre thought - what if all of this extra silicon that we don't have a clue about does absolutely nothing and Nintendo are just fucking with everyone lololol
 

wsippel

Banned
Wsippel, I told you, man ;p
Actually, I thought so as well (posted it, even), but changed my mind because the SRAM in the block at the top seemed to match. And if something is called "tightly coupled memory", one would expect it to be close, right? ;)
 

ozfunghi

Member
So... with the findings of Marcan... can somebody update, or do we wait for an update in the OP? I'm rather curious to see what the conclusion is on Marcan's theories and what the consequences are (if outlined).
 
So... with the findings of Marcan... can somebody update, or do we wait for an update in the OP? I'm rather curious to see what the conclusion is on Marcan's theories and what the consequences are (if outlined).

It just makes much better sense of things having the DDR3 and CPU I/Os switched from what we initially thought.

-I keep coming back to ROPs - they just gotta be blocks U1 and U2. It is strange to see them away from the DDR3 controller, but that might just be a consequence of the 32 MB eDRAM acting as a framebuffer in most cases.

-They are also fairly removed from the SPUs, however. The mysterious O and R blocks sit in the middle. Those two blocks also contain a large proportion of logic for their size. Quite curious.

-I have a strong feeling that blocks W1, W2, T1, T2, V, S1, S2, Q1, Q2, and maybe even P will all be accounted for as rather mundane front end GPU stuff - command processor, geometry engines (w/ tessellation units), instruction cache, vertex cache, constant cache, ultrathreaded processors, etc. Many of these things are doubled in newer AMD GPUs (usually with much more grunt, but Nintendo may have planned this to render two discrete scenes at once).
 
Now what BG was telling me over 6 months ago makes sense: that the GPU performance would be similar to an AMD e6760, without being based on it or built like it, apart from being very close in power consumption. In that sense the Wii U GPU does have some things in common with that GPU, but it's still completely custom and not like any PC GPU or the GPUs in the coming systems from MS and Sony.

However, Nintendo hopefully made the GPU "easy enough" to develop for, so that porting down-scaled modern engines would not be too much of a hassle and even its custom "not ordinary" parts can still be used.

Bravo BG

I'm pretty sure I remember reading that Nintendo were working with Crytek and/or Epic when developing the console so it shouldn't be a problem hopefully. Nintendo know that the nonstandard rendering pipeline the Wii had thanks to the TEV Unit caused problems with porting so hopefully they've evolved it or completely redesigned it to stop fixed functions being a problem.
 

tipoo

Banned
Ah, glad Marcan confirmed what I suggested! The DDR3 I/O makes much more sense next to the caches there!

Also that was the only block that was small enough to be an ARM9. On 90nm, they are already under 1mm^2.



Yeah that makes much more sense, I thought the edge towards the SRAM was the connect towards the CPU and the bottom row was the DDR3 connect, but the SRAM being towards the memory end of things makes more sense from a cache perspective. That may throw my theory of a CPU-GPU scratchpad out.
 

MDX

Member
Is it 32MB of SRAM or eDRAM?
Because I thought AMD didn't know how to put eDRAM in a GPU.
And the new Xbox is supposedly using embedded SRAM as well.
 
Yeah that makes much more sense, I thought the edge towards the SRAM was the connect towards the CPU and the bottom row was the DDR3 connect, but the SRAM being towards the memory end of things makes more sense from a cache perspective. That may throw my theory of a CPU-GPU scratchpad out.

Oh, I'm sure there's still a scratchpad. It's just within the 32 MB MEM1. ;)

Is it 32MB of SRAM or eDRAM?
Because I thought AMD didn't know how to put eDRAM in a GPU.
And the new Xbox is supposedly using embedded SRAM as well.

32 MB eDRAM - which is similar enough to 1t-SRAM that marcan can mix them up. 1t-SRAM is actually pseudo SRAM - not the real six transistor stuff you can see in the upper left hand corner of the Chipworks image. The thing is eDRAM/1t-SRAM are much denser, so you can add larger amounts.
 

wsippel

Banned
It just makes much better sense of things having the DDR3 and CPU I/Os switched from what we initially thought.

-I keep coming back to ROPs - they just gotta be blocks U1 and U2. It is strange to see them away from the DDR3 controller, but that might just be a consequence of the 32 MB eDRAM acting as a framebuffer in most cases.

-They are also fairly removed from the SPUs, however. The mysterious O and R blocks sit in the middle. Those two blocks also contain a large proportion of logic for their size. Quite curious.

-I have a strong feeling that blocks W1, W2, T1, T2, V, S1, S2, Q1, Q2, and maybe even P will all be accounted for as rather mundane front end GPU stuff - command processor, geometry engines (w/ tessellation units), instruction cache, vertex cache, constant cache, ultrathreaded processors, etc. Many of these things are doubled in newer AMD GPUs (usually with much more grunt, but Nintendo may have planned this to render two discrete scenes at once).
As Popstar speculated, there might be no (real) ROPs.
 
As Popstar speculated, there might be no (real) ROPs.

Ah, I hadn't read his post in the other thread. It's an interesting proposition. Admittedly though, my knee jerk reaction is, "Why argue for something seemingly nontraditional and unfamiliar to console devs when we are staring at quite a few blocks that could very well be ROPs?" It just seems less likely. Similarly, why stress the limited SPUs even more?
 

tipoo

Banned
Oh, I'm sure there's still a scratchpad. It's just within the 32 MB MEM1. ;)

Probably, I meant as an explanation for having three different cache levels though. Backwards compatibility with the Wii would likely explain it, but I wonder what that 1MB SRAM and 2MB faster eDRAM will be used for in Wii U mode.

And it would be odd for Wii backwards compatibility too, the Wii had 3MB eDRAM for the GPU, why have a 2+1 split for that? Maybe since they could only come close by going with 4, they decided to split it instead to get the exact needed number? But then, why is the 1MB SRAM instead of eDRAM?
 
Same poster (Esrever) comparing to Brazos
http://beyond3d.com/showpost.php?p=1703822&postcount=4546

mrWZj77.jpg

Scaled to a die size assumed to be 8.3mm x 9mm, for 75mm^2 (some sites say 80mm^2; if that is the case, my measurements for Brazos would be 7% lower than they should be)
gNkktjV.jpg

Wii U's SIMD blocks are larger but not 2x as large; it looks like the caches occupy more area. If both are believed to be made on TSMC's 40nm process, it looks like what is inside the Wii U has more cache.
0tak1ZS.jpg

The cache blocks are these sizes. If we use these sizes, the total cache area inside 1 Brazos SIMD is 2656px. For 2 Wii U blocks it is 3648px. Cache area on die is then 37% larger. In comparison, the whole SIMD block is 50% larger when comparing 1 Brazos SIMD to 1 Wii U.
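The percentages in this comparison are easy to re-derive from the numbers quoted. A quick sketch, using only figures from the post (8.3mm x 9mm, 80mm^2, 2656px, 3648px); the helper name is made up for illustration.

```python
def percent_larger(a, b):
    """How much larger a is than b, in percent."""
    return (a / b - 1.0) * 100.0


# Assumed Brazos die size from the post: 8.3mm x 9mm.
assumed_die_area_mm2 = 8.3 * 9.0
print("assumed die area: %.1f mm^2" % assumed_die_area_mm2)

# If the die is really 80mm^2, areas measured on the 8.3 x 9 scale are too small by:
print("80mm^2 vs assumed: %.1f%%" % percent_larger(80.0, assumed_die_area_mm2))

# Cache area: 2 Wii U blocks (3648px) vs 1 Brazos SIMD (2656px).
print("cache area: %.1f%% larger" % percent_larger(3648, 2656))
```

The 8.3 x 9 assumption works out to 74.7mm^2, the 80mm^2 case to roughly a 7% gap in area, and the cache comparison to about 37%, which matches the figures in the post.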
 

Raist

Banned
But... empty space? There is no empty space! This "empty space" is where the SP are located. So the fact that there is more "empty" space, is proof that there has to be something there. No one wastes die area for nothing, and even less if you have a customized design!

I know, I meant visually. Should have put quotation marks.
Thing is, the area is nowhere near twice the Brazos', so I don't see how there could be a total of 320SPs crammed in there.
 

tipoo

Banned
I know, I meant visually. Should have put quotation marks.
Thing is, the area is nowhere near twice the Brazos', so I don't see how there could be a total of 320SPs crammed in there.

In addition to each unit being larger, there also appears to be less space for SRAM around each cluster in the Wii U GPU, right? Maybe that also explains the three level memory configuration of the GPU? Each cluster can get away with less SRAM because there are three pools of fast memory on the die anyways, and in using less space for SRAM they can cram more shaders in? Just an idea.
 
Well, about the caches, I assume that the part colored in blue/black is the memory connectors, and the part in orange is the memory transistors that contain the data.
If this is correct and I'm not mistaken, then in terms of cache sizes we would have at least the same amount as the Brazos SPUs, or even more.

Raist said:
I know, I meant visually. Should have put quotation marks.
Thing is, the area is nowhere near twice the Brazos', so I don't see how there could be a total of 320SPs crammed in there.
The area for SPUs in Brazos is 4834px, and on the WiiU it's 3192px. There are three options here:
1. WiiU's SPUs have been customized in some way that makes them smaller. For example, stripping some functions that are less important on consoles.
2. WiiU's SPUs have been beefed up with added functions, and there are only 20 of them per block.
3. There are 30 SPUs in each block. It's a custom design, but I don't know if this makes much sense or if it's impossible. Someone with real knowledge on the subject could tell whether it's possible or not.
 

Raist

Banned
The area for SPUs in Brazos is 4834px, and on the WiiU it's 3192px. There are three options here:
1. WiiU's SPUs have been customized in some way that makes them smaller. For example, stripping some functions that are less important on consoles.
2. WiiU's SPUs have been beefed up with added functions, and there are only 20 of them per block.
3. There are 30 SPUs in each block. It's a custom design, but I don't know if this makes much sense or if it's impossible. Someone with real knowledge on the subject could tell whether it's possible or not.

Well that's almost exactly a 50% increase, so it might just be 240SPUs in total...
Wasn't there a rumor at some point that Latte was based on a E6xxx GPU with 480SPUs?

As for the cache size, the ratio between WiiU's and Brazos' is almost exactly the same as 55/40...
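A quick sketch of the ratios being compared here, using the pixel counts quoted above. The SP estimate at the end is purely illustrative: it assumes 8 SIMD blocks on the Wii U die, that a pair of Wii U blocks is being compared against one 40SP Brazos SIMD, and that SP count scales linearly with area. None of that is confirmed in the thread, and the names are made up.

```python
BRAZOS_SIMD_SP = 40          # SPs per Brazos SIMD, as quoted in the thread
ASSUMED_WIIU_BLOCKS = 8      # assumption: number of SIMD blocks on the Wii U die

simd_ratio = 4834 / 3192     # ratio of the two quoted SPU areas
cache_ratio = 3648 / 2656    # 2 Wii U blocks vs 1 Brazos SIMD, cache area
node_ratio = 55 / 40         # linear ratio of a 55nm node to a 40nm node


def estimate_total_sps(area_ratio, blocks=ASSUMED_WIIU_BLOCKS):
    """Hypothetical estimate: a pair of blocks holds 40 SPs scaled by area."""
    sp_per_pair = BRAZOS_SIMD_SP * area_ratio
    return sp_per_pair / 2 * blocks


print("SIMD area ratio:  %.3f" % simd_ratio)
print("cache area ratio: %.3f" % cache_ratio)
print("55/40:            %.3f" % node_ratio)
print("total SPs at a 1.5x ratio:", estimate_total_sps(1.5))
```

The cache ratio does land very close to 55/40, and a flat 1.5x area ratio under these assumptions gives 240. Keep in mind 55/40 is a ratio of linear feature sizes; an ideal shrink would scale area by roughly the square of that, so the match may just be a coincidence.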
 

wsippel

Banned
Ah, I hadn't read his post in the other thread. It's an interesting proposition. Admittedly though, my knee jerk reaction is, "Why argue for something seemingly nontraditional and unfamiliar to console devs when we are staring at quite a few blocks that could very well be ROPs?" It just seems less likely. Similarly, why stress the limited SPUs even more?
I'm still looking for an explanation for the complete lack of tearing, and something like a tile based renderer, maybe even a scanline renderer, could potentially explain that curious detail. Also, scanline renderers are apparently extremely efficient with a limited number of polygons or if combined with a Z buffer, but the last hardware based scanline renderer ever built was the DS GPU as far as I know. Though that thing is a mystery in itself...
 

Popstar

Member
Ah, I hadn't read his post in the other thread. It's an interesting proposition. Admittedly though, my knee jerk reaction is, "Why argue for something seemingly nontraditional and unfamiliar to console devs when we are staring at quite a few blocks that could very well be ROPs?" It just seems less likely. Similarly, why stress the limited SPUs even more?
I'm not really arguing for anything. I was just speculating out loud which is why I put it in the other thread. It would explain the problems some games are seemingly having with transparencies.

But I wouldn't bet money on it. Well, maybe if you gave me really good odds I might throw a couple dollars at it just in case.

I'm still looking for an explanation for the complete lack of tearing, and something like a tile based renderer, maybe even a scanline renderer, could potentially explain that curious detail. Also, scanline renderers are apparently extremely efficient with a limited number of polygons or if combined with a Z buffer, but the last hardware based scanline renderer ever built was the DS GPU as far as I know. Though that thing is a mystery in itself...
Nintendo might just have "vsync must be enabled" as part of their lot check requirements. Besides the DS, I think Rendition chips back near the dawn of time were the only hardware using a span buffer.
 