
WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis


krizzx

Junior Member
You don't seem to understand the difference between clock speed and bandwidth. I've pointed that out before.

And no, neither the GC's (which never had GDDR3 btw) nor the Wii's RAM were faster.



Did you somehow miss the graph summarizing the architecture like 4 posts above what you quoted? The split is only in regards to the CPU<->GPU bandwidth (and the CPU accesses the RAM via the GPU). The GPU<->RAM access is not affected by this.

Oh, my mistake with the GC. I forgot and mistook the 1T-SRAM for GDDR3 because the 1T-SRAM was clocked the same as the GDDR3 in the Wii. I've seen so many people claim the Wii was just a GC with a higher clock that even I forgot it wasn't, once you actually look under the hood. That still goes back to my other point, though: there is much more to memory performance than memory type and clock. How do you determine the overall bandwidth of the Wii U's DDR3 RAM if not from clock speed? Wouldn't that require actual hardware stress testing?

Also, I thought the GPU and CPU being in the same package allowed equal connectivity to the rest of the hardware.
 

tipoo

Banned
Also, it's gDDR3 RAM for the Wii U, I thought. Have they found out what the lowercase g means yet? I don't understand why people keep omitting it when discussing the Wii U's RAM.

The lowercase g doesn't indicate increased bandwidth like the uppercase G in GDDR; it just means it's lower voltage/power draw RAM. Bandwidth is the same. It's oriented at mobile use, and the Wii U presumably used it for that reason (power consumption).
 

tipoo

Banned
There is one other thing I don't understand that seems to contradict most of the slow RAM bottleneck theory. All of the Wii U ports of 360 and PS3 games have much faster loading times on the Wii U. Is that related to the RAM?


The Wii U's drive reads at 22MB/s. The PS3's reads at 9MB/s, and the 360's DVD drive was between the two. Game loads will likely be limited by drive read speed, so there's your answer if that's true.
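To put those figures in perspective, here's a quick back-of-the-envelope sketch. The 512 MB payload is a made-up example, and seek times, decompression and any HDD install/caching are ignored:

```cpp
// Back-of-the-envelope load-time estimate from sequential read speed alone.
// The 512 MB payload is a made-up example; seek latency, decompression and
// any HDD install/caching are ignored.
#include <cstdio>

int main() {
    const double payload_mb = 512.0;  // hypothetical amount of level data
    const double wiiu_mb_s  = 22.0;   // Wii U optical drive, figure quoted above
    const double ps3_mb_s   = 9.0;    // PS3 Blu-ray drive, figure quoted above

    std::printf("Wii U: ~%.0f s\n", payload_mb / wiiu_mb_s);  // ~23 s
    std::printf("PS3:   ~%.0f s\n", payload_mb / ps3_mb_s);   // ~57 s
    return 0;
}
```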
 

Schnozberry

Member
Understood. The 360 definitely has a theoretical advantage then, though not an enormous one; certainly less than the nearly 2x the raw speeds would suggest. It would be interesting to see how that plays out in real-world conditions, both in a vacuum and considering the rest of the respective memory setups.

It seems based on what we've found out this week, that the Wii U memory subsystem was designed purposefully. It requires a different approach than the 360, but it's low latency and very high bandwidth if you're careful to avoid the penalties that come with changing bus direction. Based on the patent filing that was posted a few pages ago in this thread, it seems Nintendo has a custom memory controller that helps assist with managing this. How much developers were able to understand and leverage the differences in architecture with their launch ports is a question we don't have enough data to answer.

One thing is clear though, Nintendo made some mistakes in not having final hardware and the SDK ready to go well ahead of launch.
 
The split occurs with the GPU as well while it is reading and writing simultaneously. It can write or read full speed, but not at the same time. Also, based on developer commentary, real world bandwidth didn't reach theoretical peaks in either situation.
Well, of course if you are reading and writing at the same time then the speeds can't reach the theoretical maximum, but the same goes for the Wii U.

The fact is that the Xbox 360's GDDR3 had a maximum bandwidth of 22.2GB/s, and the Wii U's DDR3 maximum bandwidth is 12.8GB/s.
This and the CPU's limited SIMD capacity could perfectly explain why those games had the bottlenecks they had on engines developed around the PS3/360.
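For reference, those peak numbers fall straight out of bus width times transfer rate. A minimal sketch, assuming a 64-bit DDR3-1600 bus for the Wii U and a 128-bit GDDR3 bus at roughly 1400 MT/s for the 360 (the exact effective rate behind the 22.2GB/s figure quoted above may differ slightly):

```cpp
// Peak theoretical bandwidth = (bus width in bytes) x (transfer rate).
// Bus widths and transfer rates below are assumptions for illustration.
#include <cstdio>

double peak_gb_s(int bus_width_bits, double mega_transfers_per_s) {
    return (bus_width_bits / 8.0) * mega_transfers_per_s / 1000.0;
}

int main() {
    std::printf("Wii U DDR3-1600, 64-bit bus:       %.1f GB/s\n", peak_gb_s(64, 1600.0));   // 12.8
    std::printf("360 GDDR3 ~1400 MT/s, 128-bit bus: %.1f GB/s\n", peak_gb_s(128, 1400.0));  // ~22.4
    return 0;
}
```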
 

krizzx

Junior Member
It seems based on what we've found out this week, that the Wii U memory subsystem was designed purposefully. It requires a different approach than the 360, but it's low latency and very high bandwidth if you're careful to avoid the penalties that come with changing bus direction. Based on the patent filing that was posted a few pages ago in this thread, it seems Nintendo has a custom memory controller that helps assist with managing this. How much developers were able to understand and leverage the differences in architecture with their launch ports is a question we don't have enough data to answer.

One thing is clear though, Nintendo made some mistakes in not having final hardware and the SDK ready to go well ahead of launch.

Excellent. "It requires a different approach than the 360, but it's low latency and very high bandwidth." This is exactly what I thought was the case.
 

Schnozberry

Member
Well, of course if you are reading and writing at the same time then the speeds can't reach the theoretical maximum, but the same goes for the Wii U.

The fact is that the Xbox 360's GDDR3 had a maximum bandwidth of 22.2GB/s, and the Wii U's DDR3 maximum bandwidth is 12.8GB/s.
This and the CPU's limited SIMD capacity could perfectly explain why those games had the bottlenecks they had on engines developed around the PS3/360.

Yes, but that requires you to dismiss all the advantages of the Wii U's EDRAM and cache design. As Blu's statement above pointed out, there would need to be significant changes to the approach taken on PS3/360 titles to take advantage of the Wii U's EDRAM, as the 360 specifically cannot texture from EDRAM in the same manner. It had to write things out to main memory and read them back in order to display them. If this approach was used with the Wii U, you would be seriously crippling potential performance, due to the constant changes in bus direction.
 

OryoN

Member
From post in other (slower) thread:


Shiota:
The designers were already incredibly familiar with the Wii, so without getting hung up on the two machines' completely different structures, they came up with ideas we would never have thought of. There were times when you would usually just incorporate both the Wii U and Wii circuits, like 1+1. But instead of just adding like that, they adjusted the new parts added to Wii U so they could be used for Wii as well.

Could this be the reason why one of the shader cores (N4) is slightly larger than the rest? Maybe it can also perform TEV-like functions?
 

prag16

Banned
Yes, but that requires you to dismiss all the advantages of the Wii U's EDRAM and cache design. As Blu's statement above pointed out, there would need to be significant changes to the approach taken on PS3/360 titles to take advantage of the Wii U's EDRAM, as the 360 specifically cannot texture from EDRAM in the same manner. It had to write things out to main memory and read them back in order to display them. If this approach was used with the Wii U, you would be seriously crippling potential performance, due to the constant changes in bus direction.

Yeah, and who knows how much optimization was done by devs porting the launch games.

The haters would say "lots" and the fanboys would say "none". The real answer probably lies somewhere in the middle. I'm sure there will be improvements made going forward.
 

Schnozberry

Member
Yeah, and who knows how much optimization was done by devs porting the launch games.

The haters would say "lots" and the fanboys would say "none". The real answer probably lies somewhere in the middle. I'm sure there will be improvements made going forward.

We don't know. I don't think anybody can really call out developers. Launches are difficult deadlines to meet, let alone when you don't have full power hardware and a complete SDK until a few months from going gold.
 

Raist

Banned
The split occurs with the GPU as well while it is reading and writing simultaneously. It can write or read full speed, but not at the same time. Also, based on developer commentary, real world bandwidth didn't reach theoretical peaks in either situation.

And that's exactly the same in the Wii U's case. The RAM's max bandwidth is 12.8GB/s. It can't write AND read at that speed at the same time. Unless I missed something.
 

efyu_lemonardo

May I have a cookie?
I'm pretty much convinced now that my gut feeling was correct and D is Starbucks. Starlet had 128kB SRAM as TCM, so Starbucks needs to have those as well. And that seems to be exactly the amount located in this block. It might seem a bit too large for an ARM926, but there should be other stuff like the crypto engine in there.

That seems sensible. Would that make X the audio DSP?
 

Schnozberry

Member
It was supported on ATI DX9 hardware starting with the X800 series I believe. It was then incorporated as a part of DX10. So not hacked on.

Ok, thanks. Google was pretty much useless in that regard. Maybe the guys that created Toki Tori had never used it prior, or is there some other texture compression that might be available in GX2? ASTC maybe? I know it's pretty new, but it's already present in OpenGL, and since it doesn't require a license it sounds like something Nintendo would use.
 
The Wii U's drive reads at 22MB/s. The PS3's reads at 9MB/s, and the 360's DVD drive was between the two. Game loads will likely be limited by drive read speed, so there's your answer if that's true.

That's good for the Wii U, since it uses those 25GB optical discs, and I don't think any Wii U games as of now require an install. On the PS3 you had to install portions of a bunch of games up until recently. For example, God of War 3 was 35GB in size and had no install; it probably did the loading in the background by way of those in-engine pre-rendered videos. Nintendo should incorporate something similarly effective to reduce loading times on games.
 

wsippel

Banned
That seems sensible. Would that make X the audio DSP?
I think X is the display controller and UVD, as it's right next to what we believe is the video output (the two high speed interfaces). I think Y might be the DSP. Or maybe the DSP and display controller are merged, and Y is the encoder for the GamePad?
 

Schnozberry

Member
I think X is the display controller and UVD, as it's right next to what we believe is the video output (the two high speed interfaces). I think Y might be the DSP. Or maybe the DSP and display controller are merged, and Y is the encoder for the GamePad?

Do you mean X is the encoder? Also, is there any speculation on what V is? Tessellation unit maybe?
 

Popstar

Member
Ok, thanks. Google was pretty much useless in that regard. Maybe the guys that created Toki Tori had never used it prior, or is there some other texture compression that might be available in GX2? ASTC maybe? I know it's pretty new, but it's already present in OpenGL, and since it doesn't require a license it sounds like something Nintendo would use.
I don't think anything supports ASTC yet. There are some texture formats supported by the Wii that are unsupported on other current hardware, palette-based textures being the most obvious. But I think there's a funky tile mode that could be considered an odd form of vector quantization. They might be there for backwards compatibility.
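For anyone unfamiliar with palette-based (color-index) textures: each texel stores a small index into a shared table of full colors, which is why they're so compact. A minimal decode sketch, assuming an 8-bit index format and ignoring the tiled texel layout real GX formats use:

```cpp
// Minimal sketch of decoding a palettized (color-index) texture: each texel
// is a small index into a shared palette of full colors. An 8-bit index
// format is assumed; the tiled layout of real GX formats is omitted.
#include <cstdint>
#include <vector>

struct RGBA8 { uint8_t r, g, b, a; };

std::vector<RGBA8> decode_indexed(const std::vector<uint8_t>& indices,
                                  const std::vector<RGBA8>& palette) {
    std::vector<RGBA8> out;
    out.reserve(indices.size());
    for (uint8_t idx : indices)
        out.push_back(palette[idx]);  // one palette lookup per texel
    return out;                       // 1 byte/texel stored vs 4 bytes decoded
}
```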
 

Schnozberry

Member
I don't think anything supports ASTC yet. There are some texture formats supported by the Wii that are unsupported on other current hardware, palette-based textures being the most obvious. But I think there's a funky tile mode that could be considered an odd form of vector quantization. They might be there for backwards compatibility.

According to the dubiously reliable Wikipedia, ASTC is an official extension for OpenGL now. That's where the thought crossed my mind. A tile mode would be interesting. Are we talking PowerVR type tile rendering, or something entirely different?
 

Earendil

Member
When calculating the color value of a pixel on the screen, does the entire texture get stored in the GPU's memory? Or can it simply request the part of the texture that it needs?

Here's an idea that could be a load of crap, and perhaps it's already done this way, but I'll put it forward anyway...

Start with a coordinate map that divides the texture into 64x64 pixel blocks. When the GPU needs that texture, use a hashtable to look up which part or parts of the texture it needs and then it only has to pull a small amount of the texture over the bus. That way you aren't wasting bandwidth pulling a texture and storing it, when you only need a tiny portion of it. This obviously wouldn't help with landscape textures, or character model textures. But it could be used in conjunction with occlusion culling to limit the memory use of non-essential textures.

I know this is done with sprites (hell I do it in web development all the time), so why not with textures in general?

I know absolutely nothing about engine design so this may be a bad idea, or a common practice that everyone and their brother already does. But I wanted to throw it out there.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
When calculating the color value of a pixel on the screen, does the entire texture get stored in the GPU's memory? Or can it simply request the part of the texture that it needs?
Only portions of the texture that are used are fetched. That usually involves some level of granularity above the individual texel; say, fetching is done in 32x32 tiles which then sit in GPU caches.

Here's an idea that could be a load of crap, and perhaps it's already done this way, but I'll put it forward anyway...

Start with a coordinate map that divides the texture into 64x64 pixel blocks. When the GPU needs that texture, use a hashtable to look up which part or parts of the texture it needs and then it only has to pull a small amount of the texture over the bus. That way you aren't wasting bandwidth pulling a texture and storing it, when you only need a tiny portion of it. This obviously wouldn't help with landscape textures, or character model textures. But it could be used in conjunction with occlusion culling to limit the memory use of non-essential textures.

I know this is done with sprites (hell I do it in web development all the time), so why not with textures in general?

I know absolutely nothing about engine design so this may be a bad idea, or a common practice that everyone and their brother already does. But I wanted to throw it out there.
That's not a crap idea - that's sort of how Megatexture works, only it works on a different level of the memory hierarchy and involves some interactions with the host.
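A minimal sketch of the tile-lookup idea described above, assuming 64x64 tiles and a hash map keyed by tile coordinates; it mirrors the page-table lookup of virtual-texturing schemes rather than any particular console's hardware texture path:

```cpp
// Sketch of the tile-lookup idea: texels are requested by (u, v), the
// containing 64x64 tile is looked up in a cache keyed by tile coordinates,
// and only missing tiles are pulled over the bus. Illustrative only.
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr int kTileSize = 64;

struct TileKey {
    int tx, ty;
    bool operator==(const TileKey& o) const { return tx == o.tx && ty == o.ty; }
};

struct TileKeyHash {
    std::size_t operator()(const TileKey& k) const {
        return std::hash<int>()(k.tx) ^ (std::hash<int>()(k.ty) << 16);
    }
};

struct Tile { std::vector<uint32_t> texels; };  // kTileSize*kTileSize RGBA8 texels

class TileCache {
public:
    // Returns the tile containing texel (u, v), fetching it only on a miss.
    const Tile& fetch(int u, int v) {
        TileKey key{u / kTileSize, v / kTileSize};
        auto it = cache_.find(key);
        if (it == cache_.end())
            it = cache_.emplace(key, load_tile(key)).first;  // bus traffic only here
        return it->second;
    }

private:
    Tile load_tile(const TileKey&) {
        // Stand-in for pulling one tile over the bus from main memory.
        return Tile{std::vector<uint32_t>(kTileSize * kTileSize)};
    }
    std::unordered_map<TileKey, Tile, TileKeyHash> cache_;
};
```

A real scheme also has to deal with residency limits, eviction, and mip levels, which is where the host-side interaction blu mentions comes in.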
 

MDX

Member
Is the following patent:

Graphics Processing System With Enhanced Memory Controller - Patent 8098255

related to the WiiU design?
In the drawings, the GPU has a:

Command processor
Transform unit
Setup/rasterizer
Texture unit
Texture environment unit
Pixel engine

http://www.docstoc.com/docs/11866235...Patent-8098255
http://www.google.com/patents?id=MS0...page&q&f=false

Nice post. I've no idea if it's correct but good detective skills nonetheless. Certainly chimes with a lot of what's being said by the smart people?

Well, I haven't seen anyone break down the patent yet. I see a lot of comments about the RAM speeds, etc., but nobody is talking about the solutions Nintendo has probably found, like this patent for example, that would allow them to go with just DDR3.

Why would Nintendo go with DDR3 when the Wii had GDDR3 like the 360, unless they had a good reason?

It would also be good, if it hasn't been done yet, if somebody could place those units (texture, pixel engine, etc.) on the die.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Well, I haven't seen anyone break down the patent yet. I see a lot of comments about the RAM speeds, etc., but nobody is talking about the solutions Nintendo has probably found, like this patent for example, that would allow them to go with just DDR3.

Why would Nintendo go with DDR3 when the Wii had GDDR3 like the 360, unless they had a good reason?

It would also be good, if it hasn't been done yet, if somebody could place those units (texture, pixel engine, etc.) on the die.
GDDR3 being phased out by manufacturers seems like a fairly good reason to me.
 

Earendil

Member
Only portions of the texture that are used are fetched. That usually involves some level of granularity above the individual texel; say, fetching is done in 32x32 tiles which then sit in GPU caches.


That's not a crap idea - that's sort of how Megatexture works, only it works on a different level of the memory hierarchy and involves some interactions with the host.

Yay! I'm not an idiot!!

Thanks for the explanation.
 
Re the alpha-blending issue that Epic Mickey is supposed to demonstrate - that's just a hypothesis. The bottleneck, if it's in hardware, does not necessarily have to be in the ROPs - it could just as well be in triangle setup. Or the bottleneck might not be in the hardware at all. Claiming a game demonstrates it without actually having analyzed the situation (read: profiled it and/or run synthetic tests isolating all but the crucial aspects) is just not serious.

In case you are referring to me, I never claimed Epic Mickey demonstrated anything other than the fact that in many games the Wii U drops frames when doing alpha textures. Epic Mickey merely drops harder and on demand. I never claimed that the problem being caused by low bandwidth was anything more than a hypothesis.
 

Schnozberry

Member
Well, I haven't seen anyone break down the patent yet. I see a lot of comments about the RAM speeds, etc., but nobody is talking about the solutions Nintendo has probably found, like this patent for example, that would allow them to go with just DDR3.

Why would Nintendo go with DDR3 when the Wii had GDDR3 like the 360, unless they had a good reason?

It would also be good, if it hasn't been done yet, if somebody could place those units (texture, pixel engine, etc.) on the die.

I read the patent document. I'm no low-level hardware expert, but from what I could actually understand, the memory controller design in the Wii U is custom and unified in a way that tries to make it easier to minimize the changes in bus direction that come at a stiff performance penalty on this architecture. Maybe someone who can actually interpret the finer details could give it a look.
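The penalty such a controller would be trying to dodge is easy to illustrate with a toy model: charge a fixed cost every time the bus flips between reading and writing, then compare an interleaved access pattern with a batched one. The cycle costs below are invented for illustration, not Wii U figures:

```cpp
// Toy model of bus-turnaround cost: each read or write burst costs a fixed
// number of cycles, and every change of direction adds a turnaround penalty.
// All numbers are made up for illustration; they are not Wii U figures.
#include <cstddef>
#include <cstdio>
#include <vector>

enum class Op { Read, Write };

int total_cycles(const std::vector<Op>& ops, int burst_cost = 4, int turnaround_cost = 10) {
    int cycles = 0;
    for (std::size_t i = 0; i < ops.size(); ++i) {
        cycles += burst_cost;
        if (i > 0 && ops[i] != ops[i - 1]) cycles += turnaround_cost;  // direction flip
    }
    return cycles;
}

int main() {
    std::vector<Op> interleaved, batched;
    for (int i = 0; i < 8; ++i) {            // 8 reads and 8 writes in total per pattern
        interleaved.push_back(Op::Read);
        interleaved.push_back(Op::Write);
        batched.push_back(Op::Read);
    }
    for (int i = 0; i < 8; ++i) batched.push_back(Op::Write);

    std::printf("interleaved: %d cycles\n", total_cycles(interleaved));  // 16*4 + 15 flips*10 = 214
    std::printf("batched:     %d cycles\n", total_cycles(batched));      // 16*4 +  1 flip *10 =  74
    return 0;
}
```

Grouping reads and writes into longer runs is the whole trick; whether the Wii U's controller does exactly this is speculation based on the patent.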
 
In case you are referring to me, I never claimed Epic Mickey demonstrated anything other than the fact that in many games the Wii U drops frames when doing alpha textures. Epic Mickey merely drops harder and on demand. I never claimed that the problem being caused by low bandwidth was anything more than a hypothesis.

You keep throwing around the word hypothesis... a hypothesis is just an educated guess. It's still a claim that you're making, as an explanation for the issues in those games.
 
Maybe I missed it somewhere, but I'm looking at the brightened image Ideaman posted and that N4 block is... strange. Aside from the size difference from the other blocks, on the left side the SRAM seems to be packed closer together than in the other N-blocks, while on the right side of the block the SRAM has been spread farther apart.

Any theories on this, or is it just a result of how the computers decided to lay out the chip?
 

Schnozberry

Member
Maybe I missed it somewhere, but I'm looking at the brightened image Ideaman posted and that N4 block is... strange. Aside from the size difference from the other blocks, on the left side the SRAM seems to be packed closer together than in the other N-blocks, while on the right side of the block the SRAM has been spread farther apart.

Any theories on this, or is it just a result of how the computers decided to lay out the chip?

Could be extra logic for backwards compatibility. Otherwise I got nothin.
 
You keep throwing around the word hypothesis... a hypothesis is just an educated guess. It's still a claim that you're making, as an explanation for the issues in those games.

I keep throwing around hypothesis because people are treating it as if I am saying THIS IS DEFINITELY IT!!!! when I am clearly not.

If you go back to the very beginning of this stupid argument it goes something like this.
"B3D thinks the WiiU has problems with alpha textures due to low bandwidth"
Me->"Yea I agree with B3D's hypothesis. Loads of WiiU games has problems with alpha textures. Just look how bad it gets in Epic Mickey 2"

Response:
"OMG how dare you say that about WiiU! It's lazy devs"
"Jumping to conclusions"
"Your hypothesis is stupid"
"LOL when e3 comes you'll be sorry"
 
Could be extra logic for backwards compatibility. Otherwise I got nothin.
I saw that someone (maybe you) had the idea earlier that the block houses the BC components, but the layout of the entire chip itself is strange. I usually expect to see more symmetry in chip design, but honestly I've never spent much time looking at GPUs before, so this is probably normal.
 

MDX

Member
GDDR3 being phased out by manufacturers seems like a fairly good reason to me.

I don't see how that would be a problem. The Wii and 360 are still being made, and probably will be into 2014-15.

Nintendo could have gone with GDDR5, but as some have pointed out, there is no need for GDDR5 if you are going to use eDRAM.
 
I don't see how that would be a problem. The Wii and 360 are still being made, and probably will be into 2014-15.

Nintendo could have gone with GDDR5, but as some have pointed out, there is no need for GDDR5 if you are going to use eDRAM.
Given the amount of eDRAM in the chip (I guess we still don't know what speed it is) would it still have been a better idea for Nintendo to use GDDR3 for the main pool?
 

AlStrong

Member
GDDR3 being phased out by manufacturers seems like a fairly good reason to me.

GDDR3 also never got past 1Gbit density. :p

But yeah, might as well take advantage of economies of scale. I imagine DDR3 production far outweighs video card memory production.
 

Donnie

Member
Just wondering about Marcan's claims about both pools of eDRAM on the Latte die being 1T-SRAM and being 32MB and 2MB in size.

The smaller pool of eDRAM takes up just over 10% as much space as the 32MB eDRAM. So if they're both 1T-SRAM then surely the size he claims for the smaller pool doesn't add up (10% of 32MB is 3.2MB obviously). Or am I missing something here?
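Spelling out that area argument, under the assumption that both pools use identical cells at identical density (which is exactly the assumption in question):

```cpp
// The area argument above, assuming both pools use identical cells at
// identical density (the very assumption under question).
#include <cstdio>

int main() {
    const double large_mb = 32.0, small_mb = 2.0;
    const double observed_area_ratio = 0.10;  // "just over 10%" from the die shot

    std::printf("expected area ratio for 2MB: %.2f%%\n", 100.0 * small_mb / large_mb);    // 6.25%
    std::printf("capacity implied by area:    %.1f MB\n", observed_area_ratio * large_mb); // 3.2 MB
    return 0;
}
```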
 

Thraktor

Member
Okay folks, I seem to have spent most of my evening completely rewriting the OP, as I felt it was becoming a bit of a mess. As such I don't have time to go over my thoughts on some of the things brought up over the last few pages, which will have to wait until tomorrow.

I won't post the entire update here (go back to the first page to see it), but I will post this bit, which only really occurred to me while writing it:

It is worth considering what Wii U components may provide BC for Hollywood functions. A possible candidate for this is block J1. If the blocks J1-J4 are indeed texture unit bundles, then J1 would seem to have some difference to the other three, due to its slightly larger size. This would be explained if J1 had extra hardware to allow it to also function as the texture unit for Wii mode.

Does this seem to be a plausible hypothesis for people?

Anyway, I'm going to get some sleep. Let me know if there are any corrections I should make to the new OP.

Just wondering about Marcan's claims about both pools of eDRAM on the Latte die being 1T-SRAM and being 32MB and 2MB in size.

The smaller pool of eDRAM takes up just over 10% as much space as the 32MB eDRAM. So if they're both 1T-SRAM then surely the size he claims for the smaller pool doesn't add up (10% of 32MB is 3.2MB obviously). Or am I missing something here?

For a variety of reasons, I'm confident they're both eDRAM. I'll try to go into some detail why tomorrow.
 
*Comes out of his hole (again)*

I tend to forget some people take this stuff a lot more seriously than I do.

First a thanks to Chipworks for going above and beyond for the picture and to blu, Durante, Fourth Storm, Thraktor, and wsippel for the work they did. Shinjohn let me know that the picture had been obtained and sent me a link, but I also checked out the thread. I wanted to come back and help with the confusion and what not.

As some of you know, getting info about the hardware was a pain because what Nintendo released essentially boiled down to a feature list. And by that I mean general features of a modern GPU that could easily be looked up. Info that dealt with performance apparently was not given out, leaving devs to have to figure it out on their own. I had two working ideas of the GPU, based on a more traditional design (which I was hoping for) and a non-traditional design. I see that some of you actually remembered the non-traditional idea. Wsippel and I would compare notes on whatever info we could come up with. Some of those notes led us to come up with how it might look if Nintendo took the non-traditional route.

http://www.neogaf.com/forum/showpost.php?p=36485259&postcount=12053

In this post you’ll see both wsippel’s take and my take. I’m going to address some things in that post because I know some of you will try to take them out of context. First, you’ll see wsippel’s baseline ended up being more accurate than mine. When I talked about potential performance of 1TF or more, that was in comparison to the R700 series, because newer GPUs are more efficient than that line (a higher baseline), and my idea focused on the dedicated silicon handling other performance tasks.

So what was the basis for the non-traditional view? I shared two of those bits of info before.

http://www.neogaf.com/forum/showpost.php?p=41883633&postcount=6136

Well, I can't reveal too much. The performance target is still more or less the same as the last review from around E3. Now it's more balanced and "2012" now that it's nearer to complete and now AMD is providing proper stuff. As far as specs, I don't see any big change for better or worse, other than said cost/performance balance tweaks... It won't make a significant difference to the end user. As far as the kit goes, it's almost like what MS went through. Except more Japanese-ish... If you know what I mean.

http://www.neogaf.com/forum/showpost.php?p=41901585&postcount=6305

Anyway, things are shaping up now with the new year. There was some anxiety with some less close third parties about what they were doing with GPU side, whether things were going to be left in the past... but it looks more modern now. You know, there simply wasn't actual U GPU data in third party hands this time last year, just the target range and R700 reference GPU for porting 360 titles to the new cafe control. Maybe now they finally can get to start debugging of the specifics and start showing a difference...

Here is one more specific piece that I didn’t fully share.

I can't confirm or deny, sorry. The cat is very confidential and I repeat non-final. The target, last checked, is triple core with XX eDram and exclusive Nintendo instructions. 1080/30 capable Radeon HD w/tess. and exclusive Nintendo patented features. On a nice, tight bus that MS wishes they had on 360. ;)

I appreciate the individual for sharing as much as he did. He was a little paranoid though (I can understand) and at one point thought I was leaking info on a messageboard under a different name, but wouldn’t tell me the board or the username, lol.

I’m sure some of you remember me talking about games being 720p. It’s because with this I knew devs would use those resources for 720p development. I’m sure some of you also remember me mentioning the bus. The key thing in this is the “Nintendo patented features”. In the context of things we talked about, it seemed to me these were going to be hardwired features. What is certain for now is that the die shot shows a design that is not traditional, fewer ALUs (in number) from where things supposedly started with the first kit, and GPU logic that is unaccounted for. I’ve seen some saying fixed functions, but that’s too specific to be accurate right now. Dedicated silicon would be a better alternative to use, though I say that as a suggestion. In my opinion I think lighting is a part of this. The Zelda and Bird demos emphasized this. Also in the past it was discussed how Nintendo likes predictability of performance. It would also suggest Nintendo wasn’t ready to embrace a “fully” programmable GPU and kept on the water wings when jumping in the pool.

I did what I could to get as much info on the hardware as possible since Nintendo was giving out so little. From there I gave the best speculation I could based on that info. As of today, I still stand by the evaluations I made about the Wii U’s potential performance from all the info I could gather. And until Nintendo’s games show otherwise I’ll continue to stand by them, because in the end it’s on Nintendo to show what the Wii U is capable of.

And if you think I deserve flak for what I’ve said in the past then I’m here, but you’re wasting your time trying because my view hasn’t changed yet.

I made the farewell post to hold myself accountable to avoid posting, but I haven’t done well sticking to that, haha. I wasn’t going to make this post, but since I was one of the primary ones gathering info it’s unfair to you guys to leave things as they were.
 

Darryl

Banned
The Zelda demo in particular had a large emphasis on lighting. The 'day/night' switch seemed designed entirely to highlight the lighting change, and that was a big part of the demo. I can't recall the bird demo right now. From what I've noticed lately, lighting is one of the easiest-to-utilize effects that truly makes a game feel modern. I can see them pushing good lighting in just about every mainline Nintendo game.
 

Donnie

Member
Confident because 1T-SRAM doesn't have NEARLY enough bandwidth to either compensate for the low bandwidth of the MEM 2 pool OR work as a framebuffer?

1T-SRAM has no particular bandwidth limitations in and of itself. Its bandwidth would only be limited by the bus used and the GPU clock speed, like any other embedded memory.
 