
WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis


Raist

Banned
Haven't checked this thread in weeks and the OP hasn't been updated for ages. Any recent significant advances in the die's analysis?
 
Seems like we will never get the answers we want with this GPU.

All we can do at this point to see if it's above PS360 graphics is to wait until E3.
 
Seems like we will never get the answers we want with this GPU.

All we can do at this point to see if it's above PS360 graphics is to wait until E3.

Yeah, the chip is pretty damn enigmatic. My latest (perhaps final) take on it is that it seems like there may indeed be only 160 shaders on there, but those shaders have been modified to the point that you can't rightly compare them to any Radeon. It truly is a custom beast.
 

krizzx

Junior Member
The max is likely 550 million polygons/sec (1 poly/cycle @ 550MHz). The 360 is 500 million, the PS3 is 250 million (Cell had to help to keep up with the 360's), and the PS4/Durango clocks at 1.6 billion. The real world numbers are a bit more complicated than that, considering that no current-gen game was able to reach close to 500 million poly/sec.

That's how you count polygons? 1 polygon per hertz?

I remember seeing the PS2 hardware limit being 50 million for 147 MHz and the GC hardware limit being like 110 million for 162 MHz.

I always thought polygon performance was calculated by some other factor.
 
That's how you count polygons? 1 polygon per hertz?

I remember seeing the PS2 hardware limit being 50 million for 147 MHz and the GC hardware limit being like 110 million for 162 MHz.

I always thought polygon performance was calculated by some other factor.

GameCube's peak poly count was actually about 20 million (Nintendo gave out some realistic performance figures of about 675,000 triangles per frame @30Hz and 337,500 @60Hz). PS2 had an infamously inflated theoretical peak poly-count (something like 60 million raw *read, no textures or effects* triangles and a realistic count of 500,000 triangles per frame @ 30Hz and 250,000 @ 60Hz).
 
GameCube's peak poly count was actually about 20 million (Nintendo gave out some realistic performance figures of about 675,000 triangles per frame @30Hz and 337,500 @60Hz). PS2 had an infamously inflated theoretical peak poly-count (something like 60 million raw *read, no textures or effects* triangles and a realistic count of 500,000 triangles per frame @ 30Hz and 250,000 @ 60Hz).

Sony's infamously inflated figure was more like 100 million.
 
That's how you count polygons? 1 polygon per hertz?

I remember seeing the PS2 hardware limit being 50 million for 147 MHz and the GC hardware limit being like 110 million for 162 MHz.

I always thought polygon performance was calculated by some other factor.

That's how it's calculated on many modern AMD GPUs, at least. Although I believe the newest batch have doubled that, so 2 polygons per hertz. I believe PS4/Durango fall into that category.
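For anyone who wants to sanity-check those figures, here's a minimal back-of-the-envelope sketch. The one-primitive-per-clock assumption for the Latte/Xenos-era parts, two per clock for the newer GCN-based consoles, and the clock speeds are the commonly cited ones from this thread, treated here as assumptions rather than confirmed specs.

```python
# Peak triangle-setup throughput = primitives per clock * GPU clock.
# All figures are the commonly cited ones, treated here as assumptions.
def peak_tris_per_sec(prims_per_clock, clock_hz):
    return prims_per_clock * clock_hz

print(peak_tris_per_sec(1, 550e6))  # Wii U "Latte" @ 550 MHz     -> 550 million/s
print(peak_tris_per_sec(1, 500e6))  # Xbox 360 Xenos @ 500 MHz    -> 500 million/s
print(peak_tris_per_sec(2, 800e6))  # PS4/Durango-class @ 800 MHz -> 1.6 billion/s
```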
 

AmyS

Member
For comparison, the 360's eDRAM is usually listed as 256Gbit/s, which is 32GB/s. So if the WiiU eDRAM is 70GB/s, that's a healthy doubling over the 360's eDRAM. And as on-die eDRAM, it doubtless has respectable latency performance.




Well from what I remember (and this might be somewhat confusing) 360's eDRAM actually had 256 GigaBytes/sec bandwidth, internally, between the eDRAM itself and the ROPs/logic, within the 'daughter' die. The daughter die had 32 GB/sec bandwidth to the 'parent' shader core.


70GB/s isn't crazy high, but for a console that seems to be targeting a graphical performance ballpark not all that far above PS360, it's probably more than adequate. If 70GB/s is what the eDRAM actually is, that's probably something to be celebrated, not a horrifying bottleneck of doom.

I would tend to agree with you about Wii U, especially since 360's eDRAM was rather limited in what it could and could not do, in many respects.
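Just to keep the units and links straight, a quick sketch of the figures quoted above; the 70 GB/s number for Latte's eDRAM is the rumoured one being discussed, not a confirmed spec.

```python
# Units and links for the figures quoted above (GB/s unless noted).
xenos_edram_internal = 256        # eDRAM <-> ROPs, inside the 360's daughter die
xenos_daughter_link  = 256 / 8    # the "256 Gbit/s" listing is the same 32 GB/s daughter<->parent link
latte_edram_rumour   = 70         # rumoured bandwidth of the Wii U's on-die eDRAM

print(xenos_daughter_link)                        # 32.0
print(latte_edram_rumour / xenos_daughter_link)   # ~2.2x the 360's GPU<->eDRAM link
```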
 
Cerny, in speaking about PS4 development, mentioned eDRAM and the amount of bandwidth that's possible. I get the feeling Nintendo went for pretty high bandwidth and low latency. I'm guessing 256 GB/s or maybe even higher, and devs are testing it. Criterion just upped texture resolution, others anti-aliasing.
 

jaz013

Banned
Don't lose your hope ForeverZero!

Maybe soon Nintendo will publish real specifications to developers that can then leak it to us!

Even in the *official* Wii documentation those details were a little scarce, so I wouldn't count on that.
 
I suspect we will find out more as kits flood into the hands of indies. While I know the system is rather light on documentation, they will start to get a gist of how stuff runs, and someone is bound to end up opening their mouth when they shouldn't.
 

krizzx

Junior Member
GameCube's peak poly count was actually about 20 million (Nintendo gave out some realistic performance figures of about 675,000 triangles per frame @30Hz and 337,500 @60Hz). PS2 had an infamously inflated theoretical peak poly-count (something like 60 million raw *read, no textures or effects* triangles and a realistic count of 500,000 triangles per frame @ 30Hz and 250,000 @ 60Hz).

I know the max achieved in an actual game on the GameCube was 20 million at 60 FPS. I was talking about the hardware maximum, as in with nothing else significant running, like what Sony gave for the PS2 and Microsoft for the Xbox 1.

http://z10.invisionfree.com/Nintendo_Allstar/ar/t173.htm
I was a little off with the GC. Seems it was 90 million RAW.

I wanted to know the peak and real world estimates of Expresso. I know that there are a lot of factors involved depending on the architecture and hardware features, as I recall the PS2 having to split its max fill rate/polygon count in half every time it passed a texture, or something of that nature. (I'm pulling this from memory and it's been a while.)

From what I have gleaned over the past few weeks, it seems that the Wii U lacks the bottlenecks and hangups of the PS3/360 on top of having more RAW graphical power under the hood. There was that explanation about the Wii U having full access to its RAM at all times, making its RAM performance higher even though the bandwidth seems to be only half of the PS3/360 RAM bandwidth, thanks to the eDRAM (the contradiction to the bandwidth-starved claim). Then I have to go to the Bayonetta comparison again. That is an increase in polygons of over 5 times for her basic model alone. Even if that was a cutscene model (which I doubt it was, going by the actions it was performing in the demonstration), it's an enormous increase from Bayonetta 1. We also can't forget the monster at the end that a lot of people were trying to write off as CG (a claim I've heard thrown at a lot of things shown on the Wii U).

Then I compare the environments of games like ZombiU to L4D2, which were both coded specifically for one console's strengths.

http://images.bit-tech.net/content_images/2009/11/left-4-dead-2-demo-impressions/left_4_dead_2_8.jpg
http://www.nintendo-nation.net/wp-content/uploads/2012/10/ZU_NYC__Screenshot__Supermarket_Melee.jpg

The number of independent objects and clutter is much higher in ZombiU and much more detailed, like the pallet in the back against the wall and the tables to the left that have each individual board rendered, as opposed to being drawn onto the textures like it is on the ground in front of the character in L4D2. The level of geometry I see is many times higher than what I see in similar last-gen games.

Would it be unreasonable to estimate about a 3X increase in real world polygon performance?
 
I think 320 is the most likely number but 400 could be possible I guess

Not unless the SPUs actually have less register space than other AMD cards. What we are seeing in terms of GPRs seems to indicate 160 shaders, although it's possible that the SRAM blocks are double the capacity they appear to be. Possible, but IMO not likely. :/
 
Could anyone with more technical knowledge than myself compare this to, say, a GeForce Titan? How does it compare?

I haven't been following NVidia cards for a while. IMO, though, it would be tough to compare Latte to any existing GPU. The theory that makes the most sense to me, given what we know, is that they have modified the architecture of the shader processors themselves to handle different instructions (such as ones for TEV). So even if it is 160 shaders, it wouldn't be fair to compare them to 160 shader Radeon parts.
 
That's how you count polygons? 1 polygon per hertz?

I remember seeing the PS2 hardware limit being 50 million for 147 MHz and the GC hardware limit being like 110 million for 162 MHz.

I always thought polygon performance was calculated by some other factor.
Fourth Storm's answer was correct.

That's how it's calculated on many modern AMD GPUs, at least. Although I believe the newest batch have doubled that, so 2 polygons per hertz. I believe PS4/Durango fall into that category.
Having said that, the actual polygon count of a game is more complicated to calculate. Heavy use of shaders, for example, will affect how many polygons you can render on screen.
 
Well if it is as custom as the guy from Chipworks suggested then it could be either one of the theorized numbers.

A memory-focused design, targeted at bottlenecks: reducing stalls, promoting efficiency and reaching theoretical peaks. Ground-up games will impress and surprise, IMO.

I still think the eDRAM bandwidth is crazy high, and why not, when shaders are bandwidth heavy.
 
Could anyone with more technical knowledge than myself compare this to, say, a GeForce Titan? How does it compare?


Even if Latte were a stock RV730, it would be difficult to compare it to the different architectures of modern AMD and Nvidia GPUs.


Still, some numbers for a very rough estimation:

AMD's current single-GPU flagship is the Radeon HD 7970 GHz Edition. It features 2048 shader cores @ 1 GHz and 3 GB GDDR5 @ 1500 MHz on a 384-bit bus. This results in 4096 GFLOP/s and a memory bandwidth of 288 GB/s.

Assuming Latte has 320 full-fledged shaders, reportedly clocked @ 550 MHz, it'd be at 352 GFLOP/s. So, a factor of roughly 11.6 (or about 23 assuming 160 shaders).
Comparing the memory bandwidth is even more difficult, because on the one hand Latte has eDRAM while PC graphics cards don't, and on the other hand it needs to share the bandwidth with the CPU. Again, if we play a bit dumb and blank out these factors, it would be 288 GB/s vs. 12.8 GB/s, or a factor of 22.5. Of course it would be interesting to know the eDRAM's bandwidth, though. But it's safe to say that because of the eDRAM the factor is not as huge in the real world.

Of course the power consumption of one 7970 card is also higher than the whole Wii U's by a factor of 7.
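For anyone wanting to redo that math, here's a minimal sketch of the comparison. The 320/160 shader counts and the 12.8 GB/s main-memory figure are the assumptions used in the post above, not confirmed specs.

```python
# FLOPS = shader count * 2 ops per clock (MADD) * clock.
def gflops(shaders, clock_ghz):
    return shaders * 2 * clock_ghz

hd7970_ghz = gflops(2048, 1.0)   # 4096 GFLOP/s
latte_320  = gflops(320, 0.55)   #  352 GFLOP/s
latte_160  = gflops(160, 0.55)   #  176 GFLOP/s

print(hd7970_ghz / latte_320)    # ~11.6x
print(hd7970_ghz / latte_160)    # ~23.3x
print(288 / 12.8)                # main-memory bandwidth ratio, 22.5x (ignoring eDRAM)
```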
 

krizzx

Junior Member
Fourth Storm's answer was correct.


Having said that, the actual polygon count of a game is more complicated to calculate. Heavy use of shaders, for example, will affect how many polygons you can render on screen.

I wasn't disregarding Fourth Storm's comment. It actually made me wonder even more. Since the Wii U GPU is more modern than the other ones, could it not have similar features that allow more polygons than the clock speed alone would suggest?
 

prag16

Banned
I haven't been following NVidia cards for a while. IMO, though, it would be tough to compare Latte to any existing GPU. The theory that makes the most sense to me, given what we know, is that they have modified the architecture of the shader processors themselves to handle different instructions (such as ones for TEV). So even if it is 160 shaders, it wouldn't be fair to compare them to 160 shader Radeon parts.

That would be interesting... build the TEV functionality right into all the shaders rather than having to bolt extra stuff to the chip to maintain BC. Fits in with the idea of having BC elements serve a purpose for Wii U games and vice versa.

But is that really a cost-effective route, all that customization? It almost seems like it would've been easier and/or cheaper to just throw Hollywood on the die alongside a more "traditional" GPU, and call it a day.
 
That would be interesting... build the TEV functionality right into all the shaders rather than having to bolt extra stuff to the chip to maintain BC. Fits in with the idea of having BC elements serve a purpose for Wii U games and vice versa.

But is that really a cost-effective route, all that customization? It almost seems like it would've been easier and/or cheaper to just throw Hollywood on the die alongside a more "traditional" GPU, and call it a day.

I can't say how cost effective it is, or why they chose to do it. It just seems to make the most sense of:

a)The Iwata Asks comments on not adding old and new parts 1:1
b)No identifiable Hollywood on die
c)No translation being run on CPU (only one core active in Wii mode so no emulation at work)
d)The larger size of the shader blocks, but register banks seemingly the same as standard RV770
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
That's how it's calculated on many modern AMD GPUs, at least. Although I believe the newest batch have doubled that, so 2 polygons per hertz. I believe PS4/Durango fall into that category.
Just to clarify: that's the triangle setup rate (aka trisetup) - the rate at which the rasterizer can process individual triangles and send them down to the interpolators. The significance of this rate is that no matter what the vertex/geometry/tessellation shaders do, they cannot produce more triangles than what the trisetup can handle. BUT that does not mean that those shading units always produce vertices and/or topology at this rate! IOW, the trisetup rate is merely a cap of the pipeline in its ability to handle triangles, not the rate in every given case - a particular case can be much lower than that.

Well from what I remember (and this might be somewhat confusing) 360's eDRAM actually had 256 GigaBytes/sec bandwidth, internally, between the eDRAM itself and the ROPs/logic, within the 'daughter' die. The daughter die had 32 GB/sec bandwidth to the 'parent' shader core.
The large BW between Xenos' ROPs and the eDRAM was accounting for the 'multiplicity' at MSAA and zexel rate during the read-modify-write cycle. The rate at which individual pixels (not MSAA or zexel) were coming from the GPU was capped at 4 GPix/s × 8 bytes/pixel max pixel weight = 32 GB/s. If UGPU has 70 GB/s, it could achieve the same read-modify-write rate (8 pixels @ each clock, but @ 550 MHz) sans the MSAA and zexel factor.
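Putting numbers on that read-modify-write point, a rough sketch; the 8-ROP count for Latte is an assumption from the die analysis, and 70 GB/s is the rumoured eDRAM figure, neither confirmed.

```python
# Pixel rate -> bandwidth, per the post above.
bytes_per_pixel = 8                                   # max pixel weight (colour + Z)

# Xenos: pixels leave the shader core for the daughter die at up to 4 GPix/s.
xenos_gpu_to_edram = 4e9 * bytes_per_pixel            # 32 GB/s

# Latte, assuming 8 ROPs @ 550 MHz:
latte_pixels_per_sec = 8 * 550e6                      # 4.4 GPix/s
latte_write = latte_pixels_per_sec * bytes_per_pixel  # ~35.2 GB/s write
latte_rmw   = latte_write * 2                         # ~70.4 GB/s read-modify-write,
                                                      # right around the rumoured 70 GB/s
print(xenos_gpu_to_edram / 1e9, latte_write / 1e9, latte_rmw / 1e9)
```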
 
Just to clarify: that's the triangle setup rate (aka trisetup) - the rate at which the rasterizer can process individual triangles and send them down to the interpolators. The significance of this rate is that no matter what the vertex/geometry/tessellation shaders do, they cannot produce more triangles than what the trisetup can handle. BUT that does not mean that those shading units always produce vertices and/or topology at this rate! IOW, the trisetup rate is merely a cap of the pipeline in its ability to handle triangles, not the rate in every given case - a particular case can be much lower than that.

Understood, kinda. So basically it's a theoretical max provided you're doing nothing but setting up triangles. Or can this rate actually be achieved realistically?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Understood, kinda. So basically it's a theoretical max provided you're doing nothing but setting up triangles. Or can this rate actually be achieved realistically?
It can under some scenarios which normally involve minimal per-vertex/per-primitive work. It really depends how capable the (unified) shading units and the thread schedulers are.
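A toy illustration of blu's point: the trisetup rate is only a ceiling, and the effective triangle rate is whichever is lower, the setup cap or what the shader array can feed it. The FLOPs-per-vertex numbers below are made up purely for illustration.

```python
# Effective triangle rate = min(trisetup cap, shader-limited vertex rate).
def effective_tris_per_sec(setup_cap, shader_flops, flops_per_vertex):
    shader_limited = shader_flops / flops_per_vertex   # vertices (~triangles) per second
    return min(setup_cap, shader_limited)

setup_cap    = 550e6    # 1 triangle per clock @ 550 MHz
shader_flops = 176e9    # the 160-shader estimate discussed above

# Trivial per-vertex work: the trisetup cap dominates (550 million/s).
print(effective_tris_per_sec(setup_cap, shader_flops, flops_per_vertex=100))
# Heavy per-vertex work: the shaders become the bottleneck (~88 million/s).
print(effective_tris_per_sec(setup_cap, shader_flops, flops_per_vertex=2000))
```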
 

Popstar

Member
Just to confuse things more, beginning with Evergreen (Radeon HD 5000) the dedicated interpolators have been removed. Interpolation is now done by the shader cores for extra control.

I also think a lot of the theoretical throughput figures were based on vertex transformation rate.
 

Hermii

Member
Yeah, the chip is pretty damn enigmatic. My latest (perhaps final) take on it is that it seems like there may indeed be only 160 shaders on there, but those shaders have been modified to the point that you can't rightly compare them to any Radeon. It truly is a custom beast.

If you are right and there are only 160 shaders, what would be the FLOP count? 176 GFLOPS? Sorry for the idiot question; it's probably hard to tell because of how custom it is.
 
I wasn't disregarding Fourth Storm's comment. It actually made me wonder even more. Since the Wii U GPU is more modern than the other ones, could it not have similar features that allow more polygons than the clock speed alone would suggest?
Well, the tri-setup is probably the same rate, but the modifications and enhancements could positively affect how much Latte can display in actual games. Perhaps blu or Pop3 could elaborate on that point. :)
 
If you are right and there are only 160 shaders, what would be the FLOP count? 176 GFLOPS? Sorry for the idiot question; it's probably hard to tell because of how custom it is.

Yup, that would be correct. But if my hypothesis is true, there's more going on there. Here's a link to a post I made on beyond3d recently. The write up is a bit of a mess, but it basically explains where I am coming from with this.
 

Hermii

Member
Yup, that would be correct. But if my hypothesis is true, there's more going on there. Here's a link to a post I made on beyond3d recently. The write up is a bit of a mess, but it basically explains where I am coming from with this.

It's incredible the sacrifices Nintendo have made for backwards compatibility, low power consumption and high power efficiency. If they weren't so conservative about these things, they could have made a much more powerful machine for the same $.
 

Popstar

Member
Yup, that would be correct. But if my hypothesis is true, there's more going on there. Here's a link to a post I made on beyond3d recently. The write up is a bit of a mess, but it basically explains where I am coming from with this.
TEV trivia. The ADD function of the TEV calculated a*(1 - c) + b*c + d which is five floating point ops. If that was directly brought over into the shader cores for compatibility instead of 176 GFLOPS (550000000 * 160 * 2 MADD) you could calculate 440 GFLOPS (550000000 * 160 * 5 TEVADD).

You could make the figure even higher if you took into account the scale and bias that could be applied.

I'm sceptical however.
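For reference, the arithmetic Popstar is doing above, spelled out; it assumes one such TEV-style op per shader per clock, which is exactly the bit to be sceptical about.

```python
clock   = 550e6
shaders = 160

def tev_add(a, b, c, d):
    # The TEV ADD op: a*(1 - c) + b*c + d -> 1 sub, 2 mul, 2 add = 5 FLOPs.
    return a * (1 - c) + b * c + d

madd_gflops = clock * shaders * 2 / 1e9   # 176, counting a MADD as 2 FLOPs
tev_gflops  = clock * shaders * 5 / 1e9   # 440, counting a TEV ADD as 5 FLOPs
print(madd_gflops, tev_gflops)
```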
 

Popstar

Member
Actually, thinking about it, if Hollywood has a clock speed of 243 MHz, that would mean you'd need to be able to do any single-clock TEV op within ~2.26 GPU7 ops... although you have more parallelism... I'll have to think about exactly what capabilities you need in GPU7 to have perfect emulation of Hollywood. It's late.
 
TEV trivia. The ADD function of the TEV calculated a*(1 - c) + b*c + d which is five floating point ops. If that was directly brought over into the shader cores for compatibility instead of 176 GFLOPS (550000000 * 160 * 2 MADD) you could calculate 440 GFLOPS (550000000 * 160 * 5 TEVADD).

You could make the figure even higher if you took into account the scale and bias that could be applied.

I'm sceptical however.
So Fourth Storm's Shader + TEV theory would push the GPU to a calculated higher FLOP count than it would have if it had 320 normal shaders. Very interesting, but I thought there were sources that heavily implied that there were no fixed TEV units in Latte outside of Wii BC.
 

krizzx

Junior Member
TEV trivia. The ADD function of the TEV calculated a*(1 - c) + b*c + d which is five floating point ops. If that was directly brought over into the shader cores for compatibility instead of 176 GFLOPS (550000000 * 160 * 2 MADD) you could calculate 440 GFLOPS (550000000 * 160 * 5 TEVADD).

You could make the figure even higher if you took into account the scale and bias that could be applied.

I'm sceptical however.

It was always kind of a dream of mine that they would go with a really huge TEV in the next GPU.

Looking at what the Wii's TEV did for Mario Galaxy, Overlord, RE Darkside Chronicles and Zangeki no Reginleiv made me wonder just what type of feats would be possible with a double TEV or more horsepower behind them. 160 shaders with the fixed function capabilities of the GC/Wii would allow shading beyond what any other modern GPU can do.

Would tessellation be possible on fixed function shaders? How exactly did Wind Waker achieve it on the GC?

I'm liking what I'm seeing so far. In my opinion, consoles should have custom hardware that is meant specifically for gaming and nothing else. Tweaked PC GPUs have seemed like a pointless addition. I want something that I can't get on my PC when I use a console, besides overhyped exclusive games that never fail to disappoint me.
 

HTupolev

Member
It was always kind of a dream of mine that they would go with a really huge TEV in the next GPU.
That really doesn't make much sense. What exactly would a "huge TEV" look like?

Would tessellation be possible on fixed function shaders? How exactly did Wind Waker achieve it on the GC?
Are you referring to this? I don't think it uses the word "tessellation" in the way you think it does.

Mathematically speaking, any use of polygons to represent a model as done in computer graphics involves "tessellation." The plane was represented in-engine by a substantial number of polygons, hence "tessellated plane."

Basically, Wind Waker probably doesn't use tessellation in the sense of running geometry through a tessellator to break it into more geometry.

160 shaders with the fixed function capabilities of the GC/Wii would allow shading beyond what any other modern GPU can do.
I'm trying to imagine a 160-shader device that could keep up with something like a Titan. And I'm seeing either a comically huge piece of silicon that tries to route everything to absurdly massive functions, or a slightly more reasonably-sized piece of silicon that's clocked at 8GHz with a 20V power supply and a constant stream of liquid nitrogen pouring over it to keep the magic smoke in.
 
Anybody else think that the camera effects in ZombiU were inspired by 28 Days Later? (Parts shot on a DSLR with a dirty lens?)

Also, do you think that the game would've benefited stylistically from a 16mm film grain running over the top of it?
 

Popstar

Member
So Fourth Storm's Shader + TEV theory would push the GPU to a calculated higher FLOP count than it would have if it had 320 normal shaders. Very interesting, but I thought there were sources that heavily implied that there were no fixed TEV units in Latte outside of Wii BC.
Well that high figure is assuming the TEV ADD op can be executed in a single cycle.

Just to be clear, there is nothing that could be done in a TEV that couldn't be done in an unaltered R700 if you're just talking about the final result. If you need to match the latency and throughput of the TEVs for perfect emulation then things become trickier.

Does anyone know details on the Flipper/Hollywood as far as cycle counts and such for TEV instructions?
 

krizzx

Junior Member
That really doesn't make much sense. What exactly would a "huge TEV" look like?
Like any other, only with more pipelines and more capability.


Are you referring to this? I don't think it uses the word "tessellation" in the way you think it does.

Mathematically speaking, any use of polygons to represent a model as done in computer graphics involves "tessellation." The plane was represented in-engine by a substantial number of polygons, hence "tessellated plane."

Basically, Wind Waker probably doesn't use tessellation in the sense of running geometry through a tessellator to break it into more geometry.
I don't think you understand what I'm thinking about.

I said tessellation. I never said anything about tessellator.

I'm trying to imagine a 160-shader device that could keep up with something like a Titan. And I'm seeing either a comically huge piece of silicon that tries to route everything to absurdly massive functions, or a slightly more reasonably-sized piece of silicon that's clocked at 8GHz with a 20V power supply and a constant stream of liquid nitrogen pouring over it to keep the magic smoke in.
I do not understand what you are saying.
 

z0m3le

Banned
So, wouldn't 160 ALUs cause issues with the polygon count, since it should produce 225 million polygons, which is less than half of the 360's? Wouldn't ports become far more tedious thanks to the GPU, which we regularly hear isn't a problem for porting, and is supposed to be something like 1.5x Xenos (quote from Tekken?)? It's been a long time since I kept up to date with this thread, but obviously if we have moved on to 160 ALUs, these sorts of questions should be asked already, right? Also, if TEV is back on board with GPU7, it is probably doing stuff that is very efficient in TEV hardware, such as lighting.

I really do think there are some major problems with 160 ALUs for Wii U, the biggest being power efficiency. Since we know they used a very power-efficient node process, running 8-9 GFLOPs per watt is comparable to a 55nm process, which makes very little sense, especially without the power-saving node process used; the RV740 reached 12 GFLOPs per watt, and this is an embedded product, which should make it easier to hit this efficiency. The 352 GFLOPs would break down to ~16 GFLOPs per watt at, say, 21-22 watts for the GPU. Seems far more reasonable than the flat 8 GFLOPs per watt that is currently being reached as a conclusion. Remember it is also an MCM, which would reduce power requirements even further.

http://en.wikipedia.org/wiki/Compar...essing_units#Radeon_R700_.28HD_4xxx.29_Series

HD 4650 "Mario" (RV730 Pro, 55nm) with 320 ALUs @ 600 MHz = 384 GFLOPs: 8 GFLOPs per watt with a 48 W TDP. Using this core with 160 ALUs @ 550 MHz and a 40% reduction to move down to 40nm (estimation based on IBM) would make the Wii U's GPU use only 14 watts and give you ~12 GFLOPs per watt. However, Wii U's GPU is likely using between 18 and 22 watts; also, this doesn't take into account the efficiency changes since 2008 on the 40nm process, or the high-efficiency node designed to draw lower power still, meaning Wii U's GPU at 176 GFLOPs would draw even less than 14 watts.

Personally I think you guys have been staring at these pictures for too long and need to take a step back and look at all the questions you are trying to answer, not just shoehorn a theory into the biggest question and ignore the rest.
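The perf-per-watt arithmetic above, spelled out as a quick sketch; all the wattages here are the estimates from the post, not measurements.

```python
# GFLOPs per watt under the different assumptions discussed above.
def gflops(alus, clock_ghz):
    return alus * 2 * clock_ghz

hd4650 = gflops(320, 0.6)       # 384 GFLOPs (RV730, 55nm)
print(hd4650 / 48)              # ~8 GFLOPs/W at a 48 W TDP

latte_160 = gflops(160, 0.55)   # 176 GFLOPs
latte_320 = gflops(320, 0.55)   # 352 GFLOPs
print(latte_160 / 14)           # ~12.6 GFLOPs/W if the GPU drew ~14 W
print(latte_160 / 22)           # ~8 GFLOPs/W if it drew ~22 W
print(latte_320 / 22)           # ~16 GFLOPs/W for the 320-ALU case at ~22 W
```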
 

A More Normal Bird

Unconfirmed Member
I don't think you understand what I'm thinking about.

I said tessellation. I never said anything about tessellator.

So when you asked if tessellation would be possible on a fixed-function unit like the TEV, you weren't asking about the modern implementation of tessellation made available in DX11 that gamers and the industry are so excited about? What were you asking about?

You need to realise the TEV isn't anything special nowadays. Programmable shaders are the future and they aren't lacking any 'magic' or some such.

Popstar said:
Just to be clear, there is nothing that could be done in a TEV that couldn't be done in an unaltered R700 if you're just talking about the final result.
 

HTupolev

Member
I don't think you understand what I'm thinking about.

I said tessellation. I never said anything about tessellator.
Okay, I'll spell this out a bit differently.

Tessellation in the modern graphics sense (whether it's carried out on a tessellator or in software or by hand or whatever) refers to taking a polygon and breaking it up into more polygons.

The Legend of Zelda: The Wind Waker does not use tessellation in this sense of the word.

The language that led people to believe that Wind Waker had tessellation was a reference to a "tessellated water plane." However, this was misinterpreted. The reference to a "tessellated water plane" uses a mathematical definition of "tessellation" which basically refers to any representation of something with a mesh of polygons. The "tessellated water plane" is just a surface made up of a large number of polygons. There is no "tessellation" operation happening on the geometry. The water plane was not created by having a processor on the Gamecube take a plane and break it up in real-time. "Tessellated water plane" just refers to the fact that the water plane happens to be made up of a bunch of polygons.
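For what it's worth, a minimal sketch of what tessellation in the modern sense means: taking one triangle and splitting it into more triangles (here, four, via edge midpoints). Nothing like this runs on Flipper; Wind Waker's water is simply authored as lots of polygons.

```python
# One triangle subdivided into four via its edge midpoints.
def midpoint(p, q):
    return tuple((a + b) / 2 for a, b in zip(p, q))

def subdivide(tri):
    a, b, c = tri
    ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

print(len(subdivide(((0.0, 0.0), (1.0, 0.0), (0.0, 1.0)))))   # 4 triangles where there was 1
```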
 