
WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis


krizzx

Junior Member
Everyday.
That salt must taste delicious.

I was pondering that as well, surely they don't mean the whole custom OpenGL library is just dumped on there? Where does the library sit on other consoles?

I took that as being a reserved spot for the libraries to be loaded directly into RAM every time without need for other management. I didn't think it was odd. That would be one less burden on the programmer. Automated memory management.


MEM1 is reserved for graphics libraries (although over the time the size of MEM1 will be decreased for graphics libraries). Therefore, applications cannot use MEM1 directly.
 

OryoN

Member
Could it be that inaccessible MEM1 was only the case in older dev kits? Perhaps that's why Criterion talked so much about "the hardware was always capable, but the tools to exploit it were not initially available" (paraphrasing)?
 
I was pondering that as well, surely they don't mean the whole custom OpenGL library is just dumped on there? Where does the library sit on other consoles?

In my limited programming experience, the libraries are exactly what the name implies - a reference for all the various objects and whatnot to call on so that you don't have to reinvent the wheel every time you issue a command. Basically prewritten routines for the most common functions. (I'm probably mixing up terms and know someone can explain that much better :p). I'm wondering how large they actually are and why they need to be in MEM1. Perhaps there is actually a large advantage in having these libraries in a low latency/high bandwidth position, since they are being referenced so often. Still seems like a hell of a way to blow through that precious 32MB, though.
 
I didn't say "bandwidth". I said performance. That means all things considered.

Other aspects people are not factoring into the bandwidth are the possibility of it being accessed in a dual channel fashion (it is 2X512 after all) and the latency.

I believe blu replied to one of your posts and explained quite succinctly why the bandwidth figures we have for MEM2 are accurate. I'm beginning to wonder if you are, in fact, trolling us.
 

tipoo

Banned
In my limited programming experience, the libraries are exactly what the name implies - a reference for all the various objects and whatnot to call on so that you don't have to reinvent the wheel every time you issue a command. Basically prewritten routines for the most common functions. (I'm probably mixing up terms and know someone can explain that much better :p). I'm wondering how large they actually are and why they need to be in MEM1. Perhaps there is actually a large advantage in having these libraries in a low latency/high bandwidth position, since they are being referenced so often. Still seems like a hell of a way to blow through that precious 32MB, though.

I take from this that all the OpenGL function calls live in the main memory of PCs, so it would seem the library has to be fairly bandwidth- and latency-insensitive for GPU interaction not to be slowed down by it.






This question is almost impossible to answer because OpenGL by itself is just a front end API, and as long as an implementation adheres to the specification and the outcome conforms to it, it can be done any way you like.

The question may have been: How does an OpenGL driver work on the lowest level. Now this is again impossible to answer in general, as a driver is closely tied to some piece of hardware, which may again do things however the developer designed it.

So the question should have been: "How does it look on average behind the scenes of OpenGL and the graphics system?". Let's look at this from the bottom up:

1. At the lowest level there's some graphics device. Nowadays these are GPUs which provide a set of registers controlling their operation (exactly which registers is device dependent), have some program memory for shaders, bulk memory for input data (vertices, textures, etc.) and an I/O channel to the rest of the system over which they receive/send data and command streams.

2. The graphics driver keeps track of the GPU's state and all the resources of application programs that make use of the GPU. It is also responsible for conversion or any other processing of the data sent by applications (converting textures into the pixel format supported by the GPU, compiling shaders into the machine code of the GPU). Furthermore, it provides some abstract, driver-dependent interface to application programs.

3. Then there's the driver-dependent OpenGL client library/driver. On Windows this gets loaded by proxy through opengl32.dll; on Unix systems it resides in two places:
X11 GLX module and driver dependent GLX driver
and /usr/lib/libGL.so may contain some driver dependent stuff for direct rendering

On MacOS X this happens to be the "OpenGL Framework".

It is this part that translates the OpenGL calls as you make them into calls to the driver-specific functions in the part of the driver described in (2).

4. Finally, the actual OpenGL API library: opengl32.dll on Windows, and /usr/lib/libGL.so on Unix; this mostly just passes the commands down to the OpenGL implementation proper.

How the actual communication happens can not be generalized:

In Unix the 3<->4 connection may happen either over sockets (yes, it may, and does, go over the network if you want it to) or through shared memory. In Windows the interface library and the driver client are both loaded into the process address space, so that's not so much communication as simple function calls and variable/pointer passing. In MacOS X this is similar to Windows, only that there's no separation between OpenGL interface and driver client (that's the reason why MacOS X is so slow to keep up with new OpenGL versions; it always requires a full operating system upgrade to deliver the new framework).

Communication between 3<->2 may go through ioctl, read/write, or through mapping some memory into process address space and configuring the MMU to trigger some driver code whenever changes to that memory are made. This is quite similar on any operating system, since you always have to cross the kernel/userland boundary: ultimately you go through some syscall.

Communication between the system and the GPU happens through the peripheral bus and the access methods it defines, so PCI, AGP, PCI-E, etc., which work through port I/O, memory-mapped I/O, DMA, and IRQs.

http://stackoverflow.com/questions/6399676/how-does-opengl-work-at-the-lowest-level
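To make that layering a bit more concrete, here's a minimal sketch of my own (assuming a Linux box with libGL.so.1 installed; build with gcc example.c -ldl). All it shows is that the top-level GL entry points live in a userspace shared library mapped into the process, as step 4 describes, with the actual work happening in the driver it forwards to.

```c
/* Minimal sketch, not Wii U specific: the OpenGL "API library" is just a
 * shared object loaded into the application's address space. */
#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    /* Step 4: the interface library (libGL.so on Unix, opengl32.dll on
     * Windows) gets mapped into the process. */
    void *libgl = dlopen("libGL.so.1", RTLD_LAZY);
    if (!libgl) {
        fprintf(stderr, "could not load libGL: %s\n", dlerror());
        return 1;
    }

    /* Resolving a core entry point just hands back a function pointer inside
     * that library; the real work happens in the driver it dispatches to. */
    void *fn = dlsym(libgl, "glDrawArrays");
    printf("glDrawArrays resolved at %p (inside the userspace GL library)\n", fn);

    dlclose(libgl);
    return 0;
}
```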
 

krizzx

Junior Member
I believe blu replied to one of your posts and explained quite succinctly why the bandwidth figures we have for MEM2 are accurate. I'm beginning to wonder if you are, in fact, trolling us.


I never disagreed with blu or made any claims that the bandwidth figures for MEM2 were inaccurate. I was just saying that the possibility of dual-channel access should not be entirely ruled out this early on.

We have 2 sets of RAM. 2x512MB DDR3 for games and 2x512MB of gDDR3 for the OS. I was suggesting that 2 identical chips for the unreserved ram could be accessed simultaneously, "effectively" doubling the bandwidth, not actually changing the bandwidth rating.
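For what it's worth, here's a rough back-of-the-envelope sketch of what that "effective doubling" would mean numerically. The DDR3-1600-on-a-64-bit-bus assumption is mine, chosen because it lines up with the ~12 GB/s figure quoted in this thread; none of this says anything about how Latte's memory controller is actually wired.

```c
/* Rough sketch only; assumes DDR3-1600 on a 64-bit bus per channel. */
#include <stdio.h>

int main(void)
{
    const double transfers_per_sec = 1600e6; /* DDR3-1600: 1600 MT/s      */
    const double bytes_per_transfer = 8.0;   /* 64-bit bus = 8 bytes/xfer */

    double one_channel = transfers_per_sec * bytes_per_transfer / 1e9;
    double two_channels = 2.0 * one_channel; /* the hypothetical "doubling" */

    printf("single channel: %.1f GB/s\n", one_channel);  /* ~12.8 GB/s */
    printf("dual channel:   %.1f GB/s\n", two_channels); /* ~25.6 GB/s */
    return 0;
}
```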

tipoo cleared that up before you even made this post, though. It's moot at this point.
 

tipoo

Banned
I never disagreed with blu or made any claims that the bandwidth figures for MEM2 were inaccurate. I was just saying that the possibility of dual channel should not be entirely ruled out this early on.

Yes, it should. I said why above.
Edit: Nevermind, I see your edit.
 

ozfunghi

Member
Yes, I've accounted for that - 1GB is set aside for the app. From there on, the system areas mapped into the app address space total to ~700MB. Which means that the system-reserved portions not mapped into app space are ~300MB. Basically naively put, OS + system buffers = 300MB.

Forgive my ignorance... how does this relate to the 2GB of total memory? OS only uses 300MB?
 
That salt must taste delicious.
I simply responded to a question.

In my limited programming experience, the libraries are exactly what the name implies - a reference for all the various objects and whatnot to call on so that you don't have to reinvent the wheel every time you issue a command. Basically prewritten routines for the most common functions. (I'm probably mixing up terms and know someone can explain that much better :p). I'm wondering how large they actually are and why they need to be in MEM1. Perhaps there is actually a large advantage in having these libraries in a low latency/high bandwidth position, since they are being referenced so often. Still seems like a hell of a way to blow through that precious 32MB, though.

I feel like that isn't right. As you said, it is blowing through very valuable eSRAM. Do we know what the other embedded ram pools do? Certainly they aren't just leaving 1-3mb for rendering... seems silly.
 

krizzx

Junior Member
Yes, it should. I said why above.

I know, I responded to it. He is requoting the exact statement you replied to. It's redundant. Also, I've never at any point in this thread tried to dispute the bandwidth being 12 GB/s. I was suggesting ways in which "performance" may be higher than the bandwidth suggests.

It was just a logical suggestion. Nothing more. I don't know why Fourth Storm is trying to blow it up into some trolling attempt. Recently, he's seemed hell-bent on taking issue with what I say in a very passive-aggressive manner. He seems to have some kind of chip on his shoulder regarding me.

I have no interest in fighting with forum members. I'm here for the tech.
 
I feel like that isn't right. As you said, it is blowing through very valuable eSRAM. Do we know what the other embedded ram pools do? Certainly they aren't just leaving 1-3mb for rendering... seems silly.

It is certainly bewildering. Who knows how old this documentation actually is. Perhaps some of MEM1 has already opened up. But it also says that Wii U does not allow uncached access, so my wild guess is that there is some automatic caching of data into MEM0 for all calls to the DDR3. This would definitely help speed things up. A very complex memory subsystem indeed, but seemingly rigidly controlled...

Edit: Also interesting that MEM0 is the same size as the L2 on Espresso. Since the GPU functions as the NB, perhaps MEM0 is there to help that data along its way, so to speak.
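Since the pools keep coming up, here's a quick reference sketch of the figures quoted in this thread. MEM0's size is my own assumption (matching the ~3MB of total L2 on Espresso, per the speculation above); nothing here is from official documentation.

```c
/* Reference sketch of the memory pools discussed in this thread.
 * Only the thread's own figures are used; MEM0's size is an assumption. */
#include <stdio.h>

struct pool { const char *name; unsigned size_mb; const char *notes; };

int main(void)
{
    const struct pool pools[] = {
        { "MEM0", 3,    "assumed ~3 MB; speculated cache/northbridge buffer"   },
        { "MEM1", 32,   "on-die, currently reserved for graphics libraries"    },
        { "MEM2", 2048, "DDR3: ~1 GB app, ~1 GB system (~300 MB not mapped)"   },
    };

    for (unsigned i = 0; i < sizeof pools / sizeof pools[0]; ++i)
        printf("%-5s %5u MB  %s\n", pools[i].name, pools[i].size_mb, pools[i].notes);
    return 0;
}
```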
 
It is certainly bewildering. Who knows how old this documentation actually is. Perhaps some of MEM1 has already opened up. But it also says that Wii U does not allow uncached access, so my wild guess is that there is some automatic caching of data into MEM0 for all calls to the DDR3. This would definitely help speed things up. A very complex memory subsystem indeed, but seemingly rigidly controlled...

Edit: Also interesting that MEM0 is the same size as the L2 on Espresso. Since the GPU functions as the NB, perhaps MEM0 is there to help that data along its way, so to speak.

It just seems so crazy that they put that much embedded ram on the chip, only to be used for libraries.
 
It just seems so crazy that they put that much embedded ram on the chip, only to be used for libraries.

Seems like a gigantic waste, I agree. Kind of like the 1 GB of RAM used for system applications that run like molasses.

Unless Nintendo are intentionally gimping the system so that there is room for improvement in each wave of titles. Insane, but perhaps true. And as we can see, even without access to MEM1, Wii U is holding its own against current gen, so you could argue that the 32MB really isn't necessary this early in the console's life. Iwata did claim that only 50% of Wii U's power was being utilized - a vague general statement, but this could in part explain his reasoning for saying that.
 
Seems like a gigantic waste, I agree. Kind of like the 1 GB of RAM used for system applications that run like molasses.

Unless Nintendo are intentionally gimping the system so that there is room for improvement in each wave of titles. Insane, but perhaps true. And as we can see, even without access to MEM1, Wii U is holding its own against current gen, so you could argue that the 32MB really isn't necessary this early in the console's life. Iwata did claim that only 50% of Wii U's power was being utilized - a vague general statement, but this could in part explain his reasoning for saying that.

They should have gone x86 like the other two. Going forward with PPC and highly custom expensive hardware is only going to give them trouble.
 

prag16

Banned
My two cents: it doesn't look better than Mario Galaxy, except maybe for the higher-res ground textures. It looks unfinished.

Eh, it's tough. With all the fast-moving action, the compression artifacts are fucking awful in both the IGN and youtube feeds. Hard to judge off of that.

Parts of it did kind of look like CG; if the entire video was 100% real time and representative of what gameplay will look like, it's fairly impressive. During what was obviously gameplay, for the most part it moved so fast all the detail was lost due to compression. Jury's completely out right now, imo.
 
Eh, it's tough. With all the fast-moving action, the compression artifacts are fucking awful in both the IGN and youtube feeds. Hard to judge off of that.

Parts of it did kind of look like CG; if the entire video was 100% real time and representative of what gameplay will look like, it's fairly impressive. During what was obviously gameplay, for the most part it moved so fast all the detail was lost due to compression. Jury's completely out right now, imo.

The parts where it showed those... monster things, those were clearly pre-rendered, but the actual gameplay is easily above Galaxy, graphically.

The amount of sprites just for the grass alone puts it above Galaxy. I don't think they were capable of doing all that on the Wii. Then again, Xenoblade used a lot of it. Anyway, the models seem high-poly too.
 

krizzx

Junior Member
I like where this is going. It's using the old aesthetic, like it used to, with the orange and brown blocks. I hope it's mostly 3D movement. I was never fond of the Hedgehog Engine's stiffness in the 2D plane.

 
The PS4 is a generational leap over the Wii U; the X1 is like 75% of a leap over the Wii U. Your comparison isn't accurate.
M2 was always the better comparison anyway.

The M2 could crunch marginally higher poly counts than the PS1 and N64, but generally improved on that era in texturing and featureset. WiiU crunches polygons within PS3/360 levels while featuring the potential for better texturing and a more modern featureset.

Neither are close to the potential of later releasing systems. But that does mean vastly different things now than it did then.
 

krizzx

Junior Member
M2 was always the better comparison anyway.

The M2 could crunch marginally higher poly counts than the PS1 and N64, but generally improved on that era in texturing and featureset. WiiU crunches polygons within PS3/360 levels while featuring the potential for better texturing and a more modern featureset.

Neither are close to the potential of later releasing systems. But that does mean vastly different things now than it did then.

How do you know this? From what I've been seeing, Latte is likely using a dual graphics engine setup as there are five duplicate components on the chip.

This would make its peak polygon performance around double the last gen consoles and half the other 2 next gen.
 
The parts where it showed those.. monster things, that was clearly pre-rendered, but the actual gameplay is easily above Galaxy, graphically.

The amount of sprites just for grass alone puts it above Galaxy. I don't think they were capable of doing all that on the Wii. Then again, xenoblade used a lot of it. Anyway, they models seem high poly too.

Yes, the grass is what stood out for me, but besides that I don't see how it looks better than an up-ressed Mario Galaxy.

Edit: there was a pic of SMG2 on Dolphin here, but now it has disappeared. Weird.

But maybe I'm just being blind. I look forward to seeing what we can extrapolate from that video.
 
How do you know this? From what I've been seeing, the Latte is likely using a dual graphics engine setup as there are five duplicate components on the chip. This would make its polygon performance around double the last gen consoles and half the other 2 next gen.
I don't.

I'm just going by conjecture we've heard. "Modern featureset and PS3/360 grunt." And we know it has more available memory.
 

joesiv

Member
A trailer has been released for Sonic Lost World. Those graphics look nice. The gameplay reminds me of the old unreleased Sonic X footage.
http://www.ign.com/videos/2013/05/28/sonic-lost-worlds-debut-trailer

Let the analysis begin.

Are those intermediate sequences CG or in-game?

Definitely CG. There is a particular part near the end where it transitions out of in-game footage, where the grass is made up of large flat planes, and the CG immediately after shows individual blades of grass.
 
How do you know this? From what I've been seeing, Latte is likely using a dual graphics engine setup as there are five duplicate components on the chip.

This would make its peak polygon performance around double the last gen consoles and half the other 2 next gen.
It was an analogy. This time, the newer featureset will include tessellation and that will bring much higher polygon counts, but the analogy is still valid:
Dreamcast was much better than both the N64 and PS not because it could draw many more polygons per second or push much higher numbers than those systems, but because it had a much more modern featureset.

Featureset -> Numbers, that's what is important.
 

prag16

Banned
That was stated towards the shader efficiency, though, not the peak polygon performance.

The GPU apparently has two Rasterizers.

I'm not sure I'd characterize that as "apparent" at this juncture. "Possible" is a better starting point, and may even be too optimistic. But either way it's all still speculation.
 

krizzx

Junior Member
It was an analogy. This time, the newer featureset will include tessellation and that will bring much higher polygon counts, but the analogy is still valid:
Dreamcast was much better than both N64 and PS not because it could draw much more polygons per second or push much higher numbers than those systems but because it had a much more modern featureset.

Featureset -> Numbers, that's what is important.

I was speaking purely in regard to polygon performance, not tessellation. 2 Rasterizers, 2 Geometry Assemblers, 2 Vertex Assemblers = double the polygon output.

I'm not sure I'd characterize that as "apparent" at this juncture. "Possible" is a better starting point, and may even be too optimistic. But either way it's all still speculation.

True, it is not absolutely confirmed. That is why I said "likely" and "apparently". It's seeming more likely than any other suggestion, so I am leaning toward it.

We do know that there are 5 duplicate components. We also know that AMD GPUs with dual graphics engines have exactly 5 duplicate components.
 
How do you know this? From what I've been seeing, Latte is likely using a dual graphics engine setup as there are five duplicate components on the chip.

This would make its peak polygon performance around double the last gen consoles and half the other 2 next gen.

There is no dual graphics engine. It's fiction. I realize nothing can convince you, though. But for the sake of others who read this thread and haven't followed the entire discussion, I feel the need to state this.
 

tipoo

Banned
Seems like a gigantic waste, I agree. Kind of like the 1 GB of RAM used for system applications that run like molasses.

Unless Nintendo are intentionally gimping the system so that there is room for improvement in each wave of titles. Insane, but perhaps true. And as we can see, even without access to MEM1, Wii U is holding its own against current gen, so you could argue that the 32MB really isn't necessary this early in the console's life. Iwata did claim that only 50% of Wii U's power was being utilized - a vague general statement, but this could in part explain his reasoning for saying that.

What would they be holding out on though? I can't understand that strategy at all, they'll want a strong install base and momentum by the time the PS4 and One are out.


Plus, I don't recall stationary hardware ever being deliberately gimped and then freed up like that. Sure, there are speed enhancements and OS shrinks, but I mean deliberately locking away a hardware feature?

I guess there's one way I could mentally justify that, if they started showing some incrediballs games just as the PS4/One were gearing up to go, but that would be such a kick to the pants of third party devs.
 

krizzx

Junior Member
There is no dual graphics engine. It's fiction. I realize nothing can convince you, though. But for the sake of others who read this thread and haven't followed the entire discussion, I feel the need to state this.

Dude, will you stop harassing me and trying to vilify my statements. Unlike you, I am not beyond reason.

I stated why I have drawn this conclusion and provided facts to support it. If someone provides data that explains why this is not likely, and it seems more plausible then I will accept that instead. As it stands, the facts are just as I stated above. There are 5 duplicate components on Latte and the AMD GPUs with dual graphics engine sets also have 5 duplicate components. If that is not an indicator of a dual graphics engine setup then please, enlighten me to what it is then, since you claim to know far better than I.

My conclusions are based on logic, reasoning, and facts.

I am not here to fight with you. I am not here for fanboyish arguments. I am not here to debate the business practices of Nintendo.
 
I was speaking purely in regard to polygon performance, not tessellation. 2 Rasterizers, 2 Geometry Assemblers, 2 Vertex Assemblers = double the polygon output.
Yes, and this is why I talked about tessellation. The WiiU is a 352 GFLOP card at best; it will never reach the 550 million polygons/second limit. Even my HD5870, which has 1+ teraflops, has only a single rasterizer and can draw up to 800 million polygons at max.

This is why tessellation was invented. Having a huge number of polygons from the get-go meant that you needed tons of flops to transform and manipulate those polygons.
If you have a competent tessellation unit, on the other hand, you can perform all the vertex operations at a low cost and then increase the polycount through the tessellator.

If the WiiU is confirmed to have such an architecture (and I think it's quite plausible given the amount of repeated blocks and what you said about them) then tessellation is the only thing that makes sense on a design like that.
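A rough sketch of that argument, with purely illustrative numbers of my own (nothing here reflects Latte's actual capabilities): the vertex stage only pays for the coarse mesh, and the tessellator then multiplies the triangle count before rasterization.

```c
/* Illustrative only: uniform tessellation of a triangle patch at factor n
 * yields roughly n*n sub-triangles, so a coarse mesh can be amplified
 * without adding vertex-shader work for every final triangle. */
#include <stdio.h>

int main(void)
{
    const long coarse_triangles = 100000; /* what the vertex stage processes */
    const int  tess_factor      = 16;     /* hypothetical amplification      */

    long amplified = coarse_triangles * (long)tess_factor * tess_factor;

    printf("vertex stage sees:  %ld triangles\n", coarse_triangles);
    printf("rasterizer sees:  ~ %ld triangles\n", amplified);
    return 0;
}
```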
 

krizzx

Junior Member
Yes, and this is why I talked about tessellation. The WiiU is a 352 GFLOP card at best; it will never reach the 550 million polygons/second limit. Even my HD5870, which has 1+ teraflops, has only a single rasterizer and can draw up to 800 million polygons at max.

This is why tessellation was invented. Having a huge number of polygons from the get-go meant that you needed tons of flops to transform and manipulate those polygons.
If you have a competent tessellation unit, on the other hand, you can perform all the vertex operations at a low cost and then increase the polycount through the tessellator.

If the WiiU is confirmed to have such an architecture (and I think it's very plausible given the amount of repeated blocks) then tessellation is the only thing that makes sense on a design like that.

Okay, I see where you're coming from.

Though now I am confused about how you are calculating polygon output again. Zomie said it was equal to the MHz.

The reason this confuses me is the 360/PS3.
The 360 GPU is clocked at 500 MHz and has 240 ALUs, but its polygon performance is rated at 500 million polygons per second.
The PS3 GPU is clocked at 550 MHz and its polygon performance is listed as 333 million polygons per second. I can't find its shader count, though.

The Wii U GPU is at 550 MHz and has at most 320 ALUs, but apparently with duplicate components on the GPU.

Can someone clear this up for me?
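For anyone else puzzling over the same thing, here's the back-of-the-envelope version of the "polygons equal to the MHz" rule of thumb: a setup/rasterizer unit that accepts one triangle per clock tops out at the clock rate in triangles per second. These are theoretical peaks under that one-per-clock assumption; the rated figures quoted above differ because real per-clock rates and counting methods differ between GPUs.

```c
/* Theoretical peak triangle setup rate under an assumed triangles-per-clock
 * figure. Illustration only; not a statement about any particular GPU. */
#include <stdio.h>

static double peak_tris_per_sec(double clock_hz, double tris_per_clock, int setup_units)
{
    return clock_hz * tris_per_clock * setup_units;
}

int main(void)
{
    /* One triangle per clock, one setup engine: peak equals the clock rate. */
    printf("500 MHz, 1 unit : %.0f Mtri/s\n", peak_tris_per_sec(500e6, 1.0, 1) / 1e6);
    printf("550 MHz, 1 unit : %.0f Mtri/s\n", peak_tris_per_sec(550e6, 1.0, 1) / 1e6);

    /* The hypothetical dual-setup case being debated in this thread. */
    printf("550 MHz, 2 units: %.0f Mtri/s\n", peak_tris_per_sec(550e6, 1.0, 2) / 1e6);
    return 0;
}
```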
 
Dude, will you stop harassing me and trying to vilify my statements. Unlike you, I am not beyond reason.

I stated why I have drawn this conclusion and provided facts to support it. If someone provides data that explains why this is not likely, and it seems more plausible then I will accept that instead. As it stands, the facts are just as I stated above. There are 5 duplicate components on Latte and the AMD GPUs with dual graphics engine sets also have 5 duplicate components. If that is not an indicator of a dual graphics engine setup then please, enlighten me to what it is then.

My conclusions are based on logic, reasoning, and facts.

I am not here to fight with you. I am not here for fanboyish arguments.

Who is making fanboyish arguments? I realize we are all speculating here, but your assessments are just way off base. Sorry if some of my replies have been blunt, but it's the truth. I have tried to point out some reasons why your conclusions are extremely unlikely in the past, yet you seem to ignore any facts which aren't aligned with your very generous portrait of the system components. I know I am not the only poster who sees this and have tried to just let it be, but I feel it necessary to chime in from time to time when you appear to be misleading people who just stop in for some quick info.
 

Absinthe

Member
Who is making fanboyish arguments? I realize we are all speculating here, but your assessments are just way off base. Sorry if some of my replies have been blunt, but it's the truth. I have tried to point out some reasons why your conclusions are extremely unlikely in the past, yet you seem to ignore any facts which aren't aligned with your very generous portrait of the system components. I know I am not the only poster who sees this and have tried to just let it be, but I feel it necessary to chime in from time to time when you appear to be misleading people who just stop in for some quick info.

For those who are just stopping in, could you quickly clarify once more why you believe "5 duplicate components" do not point to a dual graphics engine?

Edit: Is this where the dual graphics engine idea gained steam? http://m.neogaf.com/showpost.php?p=57899636
 

Blades64

Banned
From an outsider looking in, krizz, I think you need to take a break or something and relax a little. It seems you're getting worked up bro.
 

krizzx

Junior Member
Who is making fanboyish arguments? I realize we are all speculating here, but your assessments are just way off base. Sorry if some of my replies have been blunt, but it's the truth. I have tried to point out some reasons why your conclusions are extremely unlikely in the past, yet you seem to ignore any facts which aren't aligned with your very generous portrait of the system components. I know I am not the only poster who sees this and have tried to just let it be, but I feel it necessary to chime in from time to time when you appear to be misleading people who just stop in for some quick info.

I do not ignore facts. I have taken every reasonable argument brought about into consideration.

What I am saying is what I have concluded at the end of the day after analyzing all of the facts presented by everyone.

Your explanations have been duly noted. I have taken what you, Zomie, blu and BGAssassin have said and analyzed it myself.

Bg correlated direct design similarities between Brazos and Latte and provided direct visual proof to support it. Logically, it would follow that manufacturers will reuse components in chip design. I see no reason to discount this at this point. I have seen arguments brought against it, but none that outweigh it.

Then there are still the two facts that we can confirm.

Latte has 5 duplicate components near the base of the GPU, and it is a custom design. From there, I did research in order to look for an explanation.

I came across this.

It was 2011 tech, so it was in line with the Wii U announcement. We know the chip is custom made and that its final form is not the same as the form it had at announcement.

I'm suggesting that the Wii U may be using a custom dual graphics engine design in conjunction with the RV700 tech. A hybrid chip of sorts. I have seen no better explanation provided for the 5 duplicate components.

It was the same before, when I suggested it could be an HD5550-based chip. That's what the facts were leaning toward at that point. Evidence came out that greatly discounted that, so I dropped the claim as it no longer seemed plausible.
 

tipoo

Banned
I think I need a refresher on how the 6900 dual graphics engine worked; all I can find on it is AMD saying they "help keep the GPU well-fed with data". Was this carried forward with GCN? The 6900s had the same essential makeup as the Cypress-powered Radeon HD 5800s, but the feature did add a bit of performance.

http://anandtech.com/bench/Product/587?vs=509

EDIT: Actually, the performance may have come from a few extra shaders; the dual graphics engine seemed to have more to do with not killing visual performance while doing tessellation:
http://www.techradar.com/reviews/pc...hics-cards/amd-radeon-hd-6970-915716/review/2

Not endorsing this theory, just wondering what it would do.
 
For those who are just stopping in, could you quickly clarify once more why you believe "5 duplicate components" do not point to a dual graphics engine?

Sure, for one, the components that make up a "setup engine" needn't each occupy an entire block. In Brazos, we have vertex setup, geometry setup, and the tessellator on one block. The rasterizer is a separate block. In all documentation I've read, the tessellator is listed as being within the vertex setup block. Meanwhile, HiZ is listed as a function performed by the Rasterizer and depth buffer (one of the ROPs).

Common sense also dictates that a dual setup engine would not make sense in a GPU the size of Latte. The only GPUs they are found in are ~2TFLOP behemoths. In short, they are necessary to keep that many ALUs fed.

Finally, after doing an extensive comparison, I am near certain that 3 of the duplicated blocks may be identified as TMUs, L1 texture cache, and ROPs. The other two may be memory controllers and L2 texture cache, but I am not quite as sure on those two. Basically, these are things which are found in various multiples in all Radeon cards and should have been the first thing we turned to in order to explain the duplicate blocks. But since the floor plan of Latte is so different from what we are used to seeing, it was not as immediately apparent.
 

MDX

Member
I'm suggesting that the Wii U may be using a custom dual graphics engine design in conjunction with the RV700 tech. A hybrid chip of sorts. I have seen no better explanation provided for the 5 duplicate components.


And developers have only access to half of it???
 

krizzx

Junior Member
Sure, for one, the components that make up a "setup engine" needn't each occupy an entire block. In Brazos, we have vertex setup, geometry setup, and the tessellator on one block. The rasterizer is a separate block. In all documentation I've read, the tessellator is listed as being within the vertex setup block. Meanwhile, HiZ is listed as a function performed by the Rasterizer and depth buffer (one of the ROPs).

Common sense also dictates that a dual setup engine would not make sense in a GPU the size of Latte. The only GPUs they are found in are ~2TFLOP behemoths. In short, they are necessary to keep that many ALUs fed.

Finally, after doing an extensive comparison, I am near certain that 3 of the duplicated blocks may be identified as TMUs, L1 texture cache, and ROPs. The other two may be memory controllers and L2 texture cache, but I am not quite as sure on those two. Basically, these are things which are duplicated in all Radeon cards and should have been the first thing we turned to in order to explain the duplicate blocks. But since the floor plan of Latte is so different from what we are used to seeing, it was not as immediately apparent.

If 2 of the duplicate components are TMUs, then what are blocks J1-4/N1-4?

I'm not saying it is actually one of those chips. I'm simply suggesting that it could be borrowing the setup and using it on a smaller scale. Though if what you are saying is true, then I need to do some more research into this. I will shoot BG a message as well, because I'm mostly following his analysis.

And developers have only access to half of it???
I wouldn't say that. I would say that they are just not fully utilizing it, with most games being ports. I was also presenting it to explain the huge geometry increase observed in some of the first party games when compared to last gen games.
 

Absinthe

Member
Sure, for one, the components that make up a "setup engine" needn't each occupy an entire block. In Brazos, we have vertex setup, geometry setup, and the tessellator on one block. The rasterizer is a separate block. In all documentation I've read, the tessellator is listed as being within the vertex setup block. Meanwhile, HiZ is listed as a function performed by the Rasterizer and depth buffer (one of the ROPs).

Common sense also dictates that a dual setup engine would not make sense in a GPU the size of Latte. The only GPUs they are found in are ~2TFLOP behemoths. In short, they are necessary to keep that many ALUs fed.

Finally, after doing an extensive comparison, I am near certain that 3 of the duplicated blocks may be identified as TMUs, L1 texture cache, and ROPs. The other two may be memory controllers and L2 texture cache, but I am not quite as sure on those two. Basically, these are things which are found in various multiples in all Radeon cards and should have been the first thing we turned to in order to explain the duplicate blocks. But since the floor plan of Latte is so different from what we are used to seeing, it was not as immediately apparent.

Thank you.
 
If 2 of the duplicate components are TMUs, then what are blocks J1-4/N1-4?

N1-N8 are definitely the shaders. J1-J4 are in all likelihood fixed function interpolation units. Admittedly, I have not found much support for that notion, but the only counter seems to be that, starting with DirectX11 cards, AMD made interpolation a programmable function of the SPUs. That needn't be the case in Latte, and we must not fall into the trap of saying, "This would be better, so Nintendo must have included it." That logic has come back to bite us again and again.
 

krizzx

Junior Member
N1-N4 are definitely the shaders. J1-J4 are in all likelihood fixed function interpolation units. Admittedly, I have not found much support for that notion, but the only counter seems to be that, starting with DirectX11 cards, AMD made interpolation a programmable function of the SPUs. That needn't be the case in Latte, and we must not fall into the trap of saying, "This would be better, so Nintendo must have included it." That logic has come back to bite us again and again.

That is where I run into a problem with what you are suggesting. Didn't Marcan and even Iwata confirm something along the lines of there being no fixed function hardware?

I can see where you are getting the 160 claim from now, if you are only counting 4 of the chips as shaders. That makes more sense than 8 chips that were 90% smaller than the company's 40-SP blocks but 55% larger than the 20-SP blocks being 160 altogether.

Though, all of that seems to ride on the assumption that those 4 blocks are fixed function. I'm still finding it hard to swallow.
 