
WiiU technical discussion (serious discussions welcome)

schuelma

Wastes hours checking old Famitsu software data, but that's why we love him.
No doubt the reason why Nintendo doesn't give detailed specs. On paper, it looks underwhelming but in real world scenarios it's surprisingly competent. At this point, I couldn't care less how many FLOPs the GPU has. I have no doubt the Wii U is a highly specialized machine like the GameCube was and the games will look and perform amazing.

That could very well be true, but if I'm reading things correctly this is going to make 3rd party porting very difficult and probably unlikely.
 

z0m3le

Banned
the 4870 has 40 SIMD cores not 10

die-shot-colored.jpg

Superficially, the Radeon HD 4770’s specs look fairly similar to ATI’s Radeon HD 4830. But they’re completely unique GPUs. For example, the 4830 centers on the familiar 956 million transistor RV770 with two of its 10 SIMD units disabled, yielding 640 total stream processors and the ability to filter 32 textured pixels per clock (down from 800 and 40, respectively).

No, I was right, it is 80 ALUs per SIMD, but it is 4 of those structures per SIMD. 2 SIMDs x 80 ALUs = 160 ALUs @ 550MHz = 176GFLOPs
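
The arithmetic behind that figure can be sketched roughly like this. It is only an illustration: it assumes the usual convention of counting a multiply-add as 2 FLOPs per ALU per clock, and the name `peak_gflops` is made up for the example. The 160 ALUs and 550MHz are the numbers from the post.

```python
def peak_gflops(alus: int, clock_mhz: int, flops_per_clock: int = 2) -> float:
    """Theoretical peak throughput in GFLOPS.

    flops_per_clock=2 is an assumption: one multiply-add per ALU per clock.
    """
    # ALUs x FLOPs per clock x MHz gives MFLOPS; divide by 1000 for GFLOPS.
    return alus * flops_per_clock * clock_mhz / 1000


# 2 SIMDs x 80 ALUs at 550MHz, as in the post
print(peak_gflops(2 * 80, 550))  # 176.0
```

This is a paper peak only; it says nothing about how much of it a real workload can use.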
 

Doc Holliday

SPOILER: Columbus finds America
Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.
 

LCGeek

formerly sane
Wii U is a test case of BC just ruining the hardware of the system. It was more important to run Wii games than it was to push performance at all.

Which you and others were warned about all along. Dead serious: my post and another person's specifically highlighted this, yet you and most others ignored solid info about what Nintendo was doing with the system. Nintendo is, and for the better part of the future will be, concerned with shrinking down their GPU architecture and now merging fixed function systems with the more modern aspects of a PC-style GPU. With BC and power concerns on top of that, people shouldn't expect more, whether it's the Wii U or their next system. Until they achieve this, expect nothing traditional; expect Nintendo to be Nintendo.

Besides this factor, the only thing killing them with devs is not making it clear how their systems are unique and how people should develop on them to get the most out of them, versus what MS/Sony do. If they never do this, and they haven't now for two gens, the third party situation will remain as shitty as it is.

Mainstream never wants a real tech argument. They want headlines, and most of the time to take a cheap shot at Nintendo.
 

FLAguy954

Junior Member
Might be more a case of a ridiculously low TDP target (30-35W while running) ruining performance. Considering how much PPW the actual hardware configuration gets, I'd think that even an additional 10-15W would have allowed them to eliminate most or all of the current-gen porting issues, at the very least.

This argument doesn't make any sense when I have a laptop with a TDP of 35W with an INTEGRATED GPU capable of 384 GFLOPS (A10-4600M), with a bandwidth of 25.6 GB/s on a 128-bit bus (DDR3-1600) and 8 ROPs. All this with DirectX 11 overhead. Nintendo had no reason to underperform, especially considering developers are coding to the metal.
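
For reference, the 25.6 GB/s figure falls out of the bus width and transfer rate quoted in the post. A rough sketch, assuming DDR3-1600 means 1600 mega-transfers per second and using decimal GB; `bandwidth_gb_s` is just an illustrative name.

```python
def bandwidth_gb_s(transfers_mt_s: int, bus_bits: int) -> float:
    """Peak memory bandwidth in GB/s (decimal) for a given bus.

    transfers_mt_s: mega-transfers per second (assumed 1600 for DDR3-1600)
    bus_bits: total bus width in bits
    """
    bytes_per_transfer = bus_bits / 8
    # MT/s x bytes per transfer gives MB/s; divide by 1000 for GB/s.
    return transfers_mt_s * bytes_per_transfer / 1000


print(bandwidth_gb_s(1600, 128))  # 25.6
```

Like any peak number, this is the ceiling for the bus, not what a game actually sustains.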
 

Gahiggidy

My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
Whatever it is we are talking about.. I count 80 of them.
 

tipoo

Banned
So everyone falling back to "fixed function logic makes it incomparable", it seems to me that the uncore (not sure if that term applies in GPUs, but everything that isn't a shader, eDRAM, TMU etc) parts are about as large as other GPUs. Where is all this fixed logic that will save it?

If someone can show me why that's not true I'd be happy to reconsider, but "fixed function" appears to be the newest fallback. The uncore looks the same as the uncore of any GPU to me. And some of that would go to backwards compatibility too.
 
Lots of custom GPU logic would explain some of the problems in multiplat ports, like shadow resolution and DOF issues in Ass Creed 3. The Wii U port is not running the same shader effects as other ports, and instead uses custom effects. And the devs didn't spend enough time to get the custom effects working together with the "multiplat" part of the graphics engine. Maybe.
 

LCGeek

formerly sane
So everyone falling back to "fixed function logic makes it incomparable", it seems to me that the uncore (not sure if that term applies in GPUs, but everything that isn't a shader, eDRAM, TMU etc) parts are about as large as other GPUs. Where is all this fixed logic that will save it?

If someone can show me why that's not true I'd be happy to reconsider, but "fixed function" appears to be the newest fallback. The uncore looks the same as the uncore of any GPU to me. And some of that would go to backwards compatibility too.

Why should anyone argue with that logic? You're dividing the whole of what the GPU does to knock down the whole. To top it off, you can't even say what it's all doing before you come to that conclusion. How about just saying we don't really know after a certain point?
 
Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.


Gaming journalism is about getting freebies and attending trade shows.

Why do you think so many got pissed at the latest Nintendo direct?
 

tipoo

Banned
Why should anyone argue with that logic? You're dividing the whole of what the GPU does to knock down the whole. To top it off, you can't even say what it's all doing before you come to that conclusion. How about just saying we don't really know after a certain point?

I'm having difficulty understanding what you just said, but I'm not making any assertions. I'm asking someone to show me where the fixed function logic is, as all I can see is standard uncore parts around the shaders, TMUs, etc.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
ATI/AMD GPUs have featured lossless Z compression for a long time. It's marketed as "HyperZ".

EDIT: This means that the Wii U GPU will have it also for those wondering.
HyperZ is quoted in its own segment in that spec (as 'Pixel hierarchical Z Cull rate'). Also, HyperZ is a best-case optimisation which does absolutely nothing for higher-granularity occlusion cases.

There's a good reason Xenos does 256GB/s for 32GSamples/s of Z culling.
 
Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.
Well said, I was thinking the same thing. Nice initiative from some of the users, and from Chipworks of course for being so generous.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Er, so? That whole spec sheet is typical spec sheet bs. Otherwise the triangle rate and vertex rate wouldn't be equal. They'd have to at least subtract 2 for the start of the incredibly long triangle strip they imply people render with. Not that it really matters, as the triangle rate probably also assumes you're actually not rendering any pixels.
You missed the part where the HyperZ figure is quoted separately. The 51.2Gsamples/s is right below that, and it's not HyperZ apparently.
 
Lots of custom GPU logic would explain some of the problems in multiplat ports, like shadow resolution and DOF issues in Ass Creed 3. The Wii U port is not running the same shader effects as other ports, and instead uses custom effects. And the devs didn't spend enough time to get the custom effects working together with the "multiplat" part of the graphics engine. Maybe.

Or they were quick ports that try to use the shader units to produce those effects (as they would have done on the PS3/360), not taking the time to use the fixed functions that do the same things

I'm pretty sure I read a problem with the GC was that many devs didn't take full advantage of the fixed functions
 
Or they were quick ports that try to use the shader units to produce those effects (as they would have done on the PS3/360), not taking the time to use the fixed functions that do the same things

I'm pretty sure I read a problem with the GC was that many devs didn't take full advantage of the fixed functions

Yeah, could be that way too. Nintendo Land certainly didn't have any problems with similar effects...
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
What I'm getting at is that all those numbers are for systems in isolation. It doesn't matter if there isn't enough bandwidth to feed it. That figure is for if there somehow magically was. The same way the triangle rate is for if you were somehow magically drawing triangles without pixels. All the figures fall apart if you think about them.

The HyperZ branding includes all of their Z optimizations btw. Hierarchical Z, fast Z clear, Z compression, etc...
I still disagree with your interpretation. Z compression a la Xenos requires that you have ROPs (well, ZOPs) at the buffer end, and moreover, that the buffer is still capable of handling the decompressed Z BW. Again, that's why Xenos could afford to have a 32GB/s bus to the ROPs, but the ROPs themselves had 256GB/s for 32 GZexel/s (among other purposes).

Durango's spec sheet lists 102.4GB/s and 51.2 GZexel/s. Even if it had the ultimate z-compression at ZERO bytes/sample for one of the directions of the read-modify-write cycle, it still remains with just 2 bytes/sample for the other direction - that makes no sense, no matter how you look at it.
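
The bytes-per-sample comparison above is just bandwidth divided by Z rate. A quick sketch using only the figures quoted in these posts; the function name is hypothetical and nothing here comes from an actual spec sheet beyond those numbers.

```python
def bytes_per_sample(bandwidth_gb_s: float, rate_gsamples_s: float) -> float:
    """Bandwidth available per Z sample, in bytes.

    GB/s divided by Gsamples/s leaves bytes per sample.
    """
    return bandwidth_gb_s / rate_gsamples_s


# Figures as quoted in the thread, not independently verified.
xenos = bytes_per_sample(256, 32)        # eDRAM-side ROP bandwidth vs Z rate
durango = bytes_per_sample(102.4, 51.2)  # listed bandwidth vs listed Z rate

print(f"Xenos ROPs: {xenos:.1f} bytes/sample")
print(f"Durango:    {durango:.1f} bytes/sample")
```

With a read-modify-write cycle, whatever comes out of this has to cover both directions, which is why 2 bytes/sample looks so tight next to 8.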
 
Why are you conditioning your answer? It's about putting fewer and fewer restraints on whatever a creative mind wants to do. It doesn't have to do exclusively with visuals; we could be talking about more complex simulations, for example.

The more capable the hardware, the closer you get to realising that vision (ambitious or not). You can't replicate Orcs Must Die on a NES; a Wii U, however, has no problem running something of the complexity of Solomon's Key.

That was a hyperbolic joke.

I've been kind of in a mood today. I'll always love ya Refresh after douching.
 

Earendil

Member
So, what part of that image was the eDRAM and is it REALLY 32MB?

The big orange square on the left. And yes, it's 32MB. However, there is also an additional 4MB pool of "faster" eDRAM above it, as well as what appears to be a 1MB pool of SRAM, though this is still under debate.
 
I'm also confused as to how people are calculating the amount of shaders. I've read that they are grouped in the "I-shaped clusters", but how are you counting them?
 
I'm also confused as to how people are calculating the amount of shaders. I've read that they are grouped in the "I-shaped clusters", but how are you counting them?
8 blocks of 40 shaders each (20 was speculated earlier but dismissed because the area used is too big for that)
 

japtor

Member
So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS?
That's what it sounds like, of what's visibly recognizable at least.

...which I'm guessing is an issue considering a large chunk of the die appears to be unknown mystery logic.
 

Datschge

Member
Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.
 

AzaK

Member
Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.

We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.
 

Datschge

Member
We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.

Geeze, just went through all the noise in the other thread, but I think except for the late post by Thraktor everyone was missing the wood for the trees. The 8 SPs are on the low end of the expectation, but a really large area is completely unaccounted for (>30%). To me it looks like Nintendo went a "best of both worlds" route, not simply taking the TEV stuff from Broadway/Hollywood but significantly extending it, if not in capability, so at least in pure chip area/horsepower. Does anyone know how big Hollywood would be theoretically on a 40nm node?
 
We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.

Indeed, I imagine the fixed function aspects could be replicated perfectly well by the programmable shaders. It's only a case where the fixed function hardware provides a notable increase in speed vs. the programmable hardware for the few specific aspects it was designed for.
 
Indeed, I imagine the fixed function aspects could be replicated perfectly well by the programmable shaders. It's only a case where the fixed function hardware provides a notable increase in speed vs. the programmable hardware for the few specific aspects it was designed for.

Well why would they include a fixed function that didn't perform as well as a good programmed version?
 

Ryoku

Member
So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS? Do we know the TEVs and ROPs yet?

We don't know. Check out the OP in the Wii U "Latte" GPU Die Photo thread.

The eight squarish groupings on the right hand side are possibly the VLIW5 shader clusters. However, due to the apparent increased amount of SRAM there, along with the unusual layout and the general changes to the die, it's entirely possible that the number of shaders in each cluster has changed from R700 dies. In fact, at this point it's even possible that the microarchitecture itself has changed, a la the VLIW4 used in some of AMD's 6000 series GPUs. Therefore, we could have a fairly unusual number of shader cores.

It could be 20, 40, 80, or something totally unexpected. At the moment, 40 shaders per block seems realistic, which equates to 352 GFLOPS.
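
To put numbers on those scenarios, here is a small sketch. It assumes the 8 visible blocks and the 550MHz clock mentioned earlier in the thread, plus the usual 2 FLOPs per shader per clock (one multiply-add); all names are illustrative.

```python
BLOCKS = 8           # squarish groupings visible on the die shot
CLOCK_MHZ = 550      # GPU clock quoted earlier in the thread
FLOPS_PER_CLOCK = 2  # assumption: one multiply-add per shader per clock


def peak_gflops(shaders_per_block: int) -> float:
    """Theoretical peak in GFLOPS for a given shader count per block."""
    shaders = BLOCKS * shaders_per_block
    return shaders * FLOPS_PER_CLOCK * CLOCK_MHZ / 1000


for per_block in (20, 40, 80):
    total = BLOCKS * per_block
    print(f"{per_block:>2} per block -> {total:>3} shaders -> "
          f"{peak_gflops(per_block):.0f} GFLOPS")
```

So the three candidates work out to 176, 352 and 704 GFLOPS on paper, with 40 per block being the one that matches the 352 figure.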
 

tkscz

Member
Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.
 

japtor

Member
Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.
I posted this back on the 28th:
Watch it end up being completely foreign looking without any recognizable components.

(Not that I want anyone's money to go to waste...but part of the fun is the mystery, it'd be kind of funny if we're still clueless after this)
Well I was partially right!
 

ahm998

Member
Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.

http://en.wikipedia.org/wiki/EDRAM

About eDRAM, by the way: embedding memory on the ASIC or processor allows for much wider buses and higher operation speeds.

And the GPU is still unknown :(
 

tipoo

Banned
Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.

Where? All the uncore parts look the same as any modern GPU to me (apart from the DSP/ARM core/whatever that is), where is all this fixed function logic?


Also, what do we have to go on that says the 4MB pool of eDRAM above is "faster"? It is also being speculated that it is there to directly emulate the 3 MB embedded GPU texture memory and framebuffer of the Wii.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Where? All the uncore parts look the same as any modern GPU to me (apart from the DSP/ARM core/whatever that is), where is all this fixed function logic?
The 'fixed function' logic is just all the almost-uniformly-brown logic that people cannot discern as shader clusters or TMUs. How 'fixed function' it is is anybody's guess. Yes, we are all aware a GPU has a fair share of other functional blocks than those mentioned above, but the surface area percentage of all those currently-unknown blocks does not add up in comparison to other designs we are more familiar with. But you are claiming to have a good understanding of the picture, so why don't you help everybody out and provide a color-coded diagram of the chip areas?

Also what do we have to go on that says the 4MB pool of eDRAM above is "faster"?
Higher density = lower latency. Otherwise eDRAM, like most other types of RAM, has BW entirely determined by the bus width and clock, so it's unlikely that the smaller macros have as fat a bus as the larger ones.

It is also being speculated that it is there to directly emulate the 3 MB embedded GPU texture memory and framebuffer of the Wii.
While sitting unused in all other scenarios? That would be really something.
 

tipoo

Banned
While sitting unused in all other scenarios? That would be really something.


You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.

But you are claiming to have a good understanding of the picture, so why don't you help everybody out and provide a color-coded diagram of the chip areas?

I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.
 

ozfunghi

Member
You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.



I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.

I rearranged the orange parts of the 4870 picture and they take up exactly 1/3rd of the entire space. Maybe you can compare to the WiiU GPU.
 

tkscz

Member
I posted this back on the 28th:

Well I was partially right!

Good job then, as I was expecting the chip to be in some sort of order, but this is Nintendo.

You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.



I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.

Is it possible for fixed function pixel shaders to not be in repeated blocks?
 

tipoo

Banned
Is it possible that Hollywood is also smashed in there somewhere?

That's what's being discussed, some think it's in there as a 1:1 copy, some think its functions are just mixed into the rest of the Wii U GPU. I would think a 1:1 copy would be small enough now to put in there to ensure perfect compatibility.
 