
User Tron
Member
(02-04-2013, 09:12 PM)

Originally Posted by DragonSworne

What's the 2 in the formula, the number of operations per hertz? I'm assuming .55 is the hertz right?

mul+add?
Father_Brain
Samus made me a Widower :(
(02-04-2013, 09:12 PM)

Originally Posted by USC-fan

Wii U is a test case of BC just ruining the hardware of the system. It was more important to run Wii games than it was to push performance at all.

Might be more a case of a ridiculously low TDP target (30-35W while running) ruining performance. Considering how much PPW the actual hardware configuration gets, I'd think that even an additional 10-15W would have allowed them to eliminate most or all of the current-gen porting issues, at the very least.
Last edited by Father_Brain; 02-04-2013 at 09:17 PM.
schuelma
Wastes hours checking old Famitsu software data, but that's why we love him.
(02-04-2013, 09:13 PM)

Originally Posted by TAS

No doubt the reason why Nintendo doesn't give detailed specs. On paper, it looks underwhelming but in real world scenarios it's surprisingly competent. At this point, I couldn't care less how many FLOPs the GPU has. I have no doubt the Wii U is a highly specialized machine like the GameCube was and the games will look and perform amazing.

That could very well be true, but if I'm reading things correctly this is going to make 3rd party porting very difficult and probably unlikely.
z0m3le
Junior Member
(02-04-2013, 09:13 PM)

Originally Posted by LeleSocho

the 4870 has 40 SIMD cores not 10

Superficially, the Radeon HD 4770's specs look fairly similar to ATI's Radeon HD 4830. But they're completely unique GPUs. For example, the 4830 centers on the familiar 956 million transistor RV770 with two of its 10 SIMD units disabled, yielding 640 total stream processors and the ability to filter 32 textured pixels per clock (down from 800 and 40, respectively).

No, I was right, it is 80 ALUs per SIMD, but it is 4 of those structures per SIMD. 2 SIMDs x 80 ALUs = 160 ALUs @ 550MHz = 176GFLOPs
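
For anyone following the arithmetic, here is a minimal sketch of the formula being used, assuming each ALU retires one multiply-add (counted as 2 floating-point operations) per clock. The function name `peak_gflops` is made up for illustration, and the result is a theoretical peak, not a measured figure.

```python
def peak_gflops(alus: int, clock_mhz: int, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS: ALUs x FLOPs per ALU per clock x clock.

    The default of 2 assumes one multiply-add per ALU per clock,
    which is the "2" in the formula discussed above.
    """
    return alus * flops_per_alu_per_clock * clock_mhz / 1000


# Figures quoted in the post above: 160 ALUs at 550 MHz.
print(peak_gflops(160, 550))  # 176.0
```

The clock is passed in MHz as an integer so the 0.55 GHz figure does not pick up floating-point rounding noise.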
Doc Holliday
Member
(02-04-2013, 09:24 PM)
Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.
LCGeek
formerly sane
(02-04-2013, 09:24 PM)

Originally Posted by USC-fan

Wii U is a test case of BC just ruining the hardware of the system. It was more important to run Wii games than it was to push performance at all.

Which you and others were warned about all along. Dead serious: my post and another person's specifically highlighted this, yet you and most others ignored solid info about what Nintendo was doing with the system. Nintendo is, and for the better part of the future will be, concerned with shrinking down their GPU architecture and now merging fixed-function systems with the more modern aspects of a PC-based GPU. Next to BC or power concerns, people shouldn't expect more, whether it's the Wii U or their next system. Until they achieve this, expect nothing traditional; expect Nintendo to be Nintendo.

Besides this factor, the only thing killing them with devs is not making it clear how their systems are unique and how people should develop on them to get the most out of them, versus what MS/Sony do. If they never do this, and they haven't for two gens now, the third-party situation will remain as shitty as it is.

Mainstream never wants a real tech argument; they want headlines and, most of the time, to take a cheap shot at Nintendo.
FLAguy954
Junior Member
(02-04-2013, 09:25 PM)

Originally Posted by Father_Brain

Might be more a case of a ridiculously low TDP target (30-35W while running) ruining performance. Considering how much PPW the actual hardware configuration gets, I'd think that even an additional 10-15W would have allowed them to eliminate most or all of the current-gen porting issues, at the very least.

This argument doesn't make any sense when I have a laptop with a TDP of 35W with an INTEGRATED GPU capable of 384 GFLOPS (A10-4600M) with a bandwidth of 25.6 GB/s on a 128-bit bus (DDR3-1600) and 8 ROPs. All this with DirectX 11 overhead. Nintendo had no reason to underperform, especially considering developers are coding to the metal.
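
The 25.6 GB/s figure can be checked with a quick sketch, assuming peak bandwidth is simply transfers per second times bytes per transfer. The helper name `peak_bandwidth_gb_s` is hypothetical, and real-world throughput will be lower than this theoretical ceiling.

```python
def peak_bandwidth_gb_s(transfers_mt_s: int, bus_width_bits: int) -> float:
    """Theoretical peak memory bandwidth in GB/s.

    transfers_mt_s: data rate in mega-transfers per second (1600 for DDR3-1600).
    bus_width_bits: total bus width in bits.
    """
    bytes_per_transfer = bus_width_bits // 8
    return transfers_mt_s * bytes_per_transfer / 1000


# DDR3-1600 on a 128-bit bus, as quoted in the post above.
print(peak_bandwidth_gb_s(1600, 128))  # 25.6
```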
Gahiggidy
My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
(02-04-2013, 09:28 PM)
Whatever it is we are talking about.. I count 80 of them.
Last edited by Gahiggidy; 02-04-2013 at 09:38 PM.
tipoo
Banned
(02-04-2013, 09:36 PM)
So everyone falling back to "fixed function logic makes it incomparable", it seems to me that the uncore (not sure if that term applies in GPUs, but everything that isn't a shader, eDRAM, TMU etc) parts are about as large as other GPUs. Where is all this fixed logic that will save it?

If someone can show me why that's not true I'd be happy to reconsider, but "fixed function" appears to be the newest fallback. The uncore looks the same as the uncore of any GPU to me. And some of that would go to backwards compatibility too.
Last edited by tipoo; 02-04-2013 at 09:44 PM.
SpoonyBard
Member
(02-04-2013, 09:36 PM)
Lots of custom GPU logic would explain some of the problems in multiplat ports, like shadow resolution and DOF issues in Ass Creed 3. The Wii U port is not running the same shader effects as other ports, and instead uses custom effects. And the devs didn't spend enough time to get the custom effects working together with the "multiplat" part of the graphics engine. Maybe.
TheGreatMightyPoo
(02-04-2013, 09:41 PM)

Originally Posted by Doc Holliday

Also shame on Nintendo for forcing people to do this shit.

Seriously???
LCGeek
formerly sane
(02-04-2013, 09:58 PM)

Originally Posted by tipoo

So everyone falling back to "fixed function logic makes it incomparable", it seems to me that the uncore (not sure if that term applies in GPUs, but everything that isn't a shader, eDRAM, TMU etc) parts are about as large as other GPUs. Where is all this fixed logic that will save it?

If someone can show me why that's not true I'd be happy to reconsider, but "fixed function" appears to be the newest fallback. The uncore looks the same as the uncore of any GPU to me. And some of that would go to backwards compatibility too.

Why should anyone argue with that logic. You're dividing the whole of what the gpu does to knock down the whole. To top it off you can't even say what it's all doing before you come to that conclusion. How about just saying we don't really know after a certain point.
DragonSworne
Satoru Iwata and his Trilateral Commission cronies are suppressing the truth about Retro. Wake up, sheeple!
(02-04-2013, 10:01 PM)

Originally Posted by Doc Holliday

Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.


Gaming journalism is about getting freebies and attending trade shows.

Why do you think so many got pissed at the latest Nintendo direct?
tipoo
Banned
(02-04-2013, 10:04 PM)

Originally Posted by LCGeek

Why should anyone argue with that logic. You're dividing the whole of what the gpu does to knock down the whole. To top it off you can't even say what it's all doing before you come to that conclusion. How about just saying we don't really know after a certain point.

I'm having difficulty understanding what you just said, but I'm not making any assertions. I'm asking someone to show me where the fixed function logic is, as all I can see is standard uncore parts around the shaders, TMUs, etc.
blu
Member
(02-04-2013, 10:04 PM)

Originally Posted by Popstar

ATI/AMD GPUs have featured lossless Z compression for a long time. It's marketed as "HyperZ".

EDIT: This means that the Wii U GPU will have it also for those wondering.

HyperZ is quoted in its own segment in that spec (as 'Pixel hierarchical Z Cull rate'). Also, HyperZ is a best-case optimisation which does absolutely nothing for higher-granularity occlusion cases.

There's a good reason Xenos does 256GB/s for 32GSamples/s of Z culling.
Popstar
Member
(02-04-2013, 10:17 PM)

Originally Posted by blu

HyperZ is quoted in its own segment in that spec (as 'Pixel hierarchical Z Cull rate'). Also, HyperZ is a best-case optimisation which does absolutely nothing for higher-granularity occlusion cases.

There's a good reason Xenos does 256GB/s for 32GSamples/s of Z culling.

Er, so? That whole spec sheet is typical spec sheet bs. Otherwise the triangle rate and vertex rate wouldn't be equal. They'd have to at least subtract 2 for the start of the incredibly long triangle strip they imply people render with. Not that it really matters, as the triangle rate probably also assumes you're actually not rendering any pixels.
Last edited by Popstar; 02-04-2013 at 10:19 PM.
Refreshment.01
Member
(02-04-2013, 10:20 PM)

Originally Posted by Doc Holliday

Kind of sad that it took GAF to get this info. You figured mainstream gaming sites would've done this sooner. Nice job guys!

Also shame on Nintendo for forcing people to do this shit.

Well said, I was thinking the same thing. Nice initiative from some of the users, and from Chipworks of course for being so generous.
blu
Member
(02-04-2013, 10:37 PM)

Originally Posted by Popstar

Er, so? That whole spec sheet is typical spec sheet bs. Otherwise the triangle rate and vertex rate wouldn't be equal. They'd have to at least subtract 2 for the start of the incredibly long triangle strip they imply people render with. Not that it really matters, as the triangle rate probably also assumes you're actually not rendering any pixels.

You missed the part where the HyperZ figure is quoted separately. The 51.2Gsamples/s is right below that, and it's not HyperZ apparently.
Zoramon089
Banned
(02-04-2013, 10:46 PM)

Originally Posted by SpoonyBard

Lots of custom GPU logic would explain some of the problems in multiplat ports, like shadow resolution and DOF issues in Ass Creed 3. The Wii U port is not running the same shader effects as other ports, and instead uses custom effects. And the devs didn't spend enough time to get the custom effects working together with the "multiplat" part of the graphics engine. Maybe.

Or they were quick ports that try to use the shader units to produce those effects (as they would have done on the PS3/360), not taking the time to use the fixed functions that do the same things

I'm pretty sure I read a problem with the GC was that many devs didn't take full advantage of the fixed functions
SpoonyBard
Member
(02-04-2013, 10:51 PM)

Originally Posted by Zoramon089

Or they were quick ports that try to use the shader units to produce those effects (as they would have done on the PS3/360), not taking the time to use the fixed functions that do the same things

I'm pretty sure I read a problem with the GC was that many devs didn't take full advantage of the fixed functions

Yeah, could be that way too. Nintendo Land certainly didn't have any problems with similar effects...
Popstar
Member
(02-04-2013, 11:27 PM)

Originally Posted by blu

You missed the part where the HyperZ figure is quoted separately. The 51.2Gsamples/s is right below that, and it's not HyperZ apparently.

What I'm getting at is that all those numbers are for systems in isolation. It doesn't matter if there isn't enough bandwidth to feed it. That figure is for if there somehow magically was. The same way the triangle rate is for if you were somehow magically drawing triangles without pixels. All the figures fall apart if you think about them.

The HyperZ branding includes all of their Z optimizations btw. Hierarchical Z, fast Z clear, Z compression, etc...
Last edited by Popstar; 02-04-2013 at 11:32 PM.
blu
Member
(02-04-2013, 11:54 PM)

Originally Posted by Popstar

What I'm getting at is that all those numbers are for systems in isolation. It doesn't matter if there isn't enough bandwidth to feed it. That figure is for if there somehow magically was. The same way the triangle rate is for if you were somehow magically drawing triangles without pixels. All the figures fall apart if you think about them.

The HyperZ branding includes all of their Z optimizations btw. Hierarchical Z, fast Z clear, Z compression, etc...

I still disagree with your interpretation. Z compression à la Xenos requires that you have ROPs (well, ZOPs) at the buffer end and, moreover, that the buffer is still capable of handling the decompressed Z BW. Again, that's why Xenos could afford to have a 32GB/s bus to the ROPs, but the ROPs themselves had 256GB/s for 32 GZexel/s (among other purposes).

Durango's spec sheet lists 102.4GB/s and 51.2 GZexel/s. Even if it had the ultimate z-compression at ZERO bytes/sample for one of the directions of the read-modify-write cycle, it still remains with just 2 bytes/sample for the other direction - that makes no sense, no matter how you look at it.
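
The bytes-per-sample argument above boils down to dividing the quoted bandwidth by the quoted Z rate. A rough sketch, using only the figures given in the posts; the function name `bytes_per_sample` is made up for illustration, and both inputs are spec-sheet peaks rather than measurements.

```python
def bytes_per_sample(bandwidth_gb_s: float, rate_gsamples_s: float) -> float:
    """Average bytes of bandwidth available per Z sample at the quoted peak rates."""
    return bandwidth_gb_s / rate_gsamples_s


# Xenos eDRAM-side figures quoted above: 256 GB/s for 32 GZexel/s.
print(bytes_per_sample(256.0, 32.0))   # 8.0

# Durango spec-sheet figures quoted above: 102.4 GB/s and 51.2 GZexel/s.
print(bytes_per_sample(102.4, 51.2))   # 2.0
```

A read-modify-write cycle has to move each sample in both directions, so 2 bytes per sample in total leaves very little for either direction, which is the point being made about the Durango figures.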
Thunder Monkey
(02-05-2013, 12:30 AM)

Originally Posted by Refreshment.01

Why are you conditioning your answer? It's about putting fewer and fewer restraints on whatever a creative mind wants to do. It doesn't have to do exclusively with visuals; we could be talking about more complex simulations, for example.

The more capable the hardware, the closer you get to realising that vision (ambitious or not). You can't replicate Orcs Must Die on a NES, however a Wii U has no problem running something of the complexity of Solomon's Key.

That was a hyperbolic joke.

I've been kind of in a mood today. I'll always love ya Refresh after douching.
Smurfman256
Member
(02-05-2013, 12:58 AM)
So, what part of that image was the eDRAM and is it REALLY 32MB?
Earendil
Member
(02-05-2013, 01:00 AM)

Originally Posted by Smurfman256

So, what part of that image was the eDRAM and is it REALLY 32MB?

The big orange square on the left. And yes, it's 32MB. However, there is also an additional 4MB pool of "faster" eDRAM above it, as well as what appears to be a 1MB pool of SRAM, though this is still under debate.
ScepticMatt
Member
(02-05-2013, 01:09 AM)
32+4 MB
AzaK
Member
(02-05-2013, 01:12 AM)

Originally Posted by Smurfman256

So, what part of that image was the eDRAM and is it REALLY 32MB?

This bit
Smurfman256
Member
(02-05-2013, 01:13 AM)

Originally Posted by AzaK

This bit

Thanks. What is that silver bit in the bottom left? The DSP?
AzaK
Member
(02-05-2013, 01:16 AM)

Originally Posted by Smurfman256

Thanks. What is that silver bit in the bottom left? The DSP?

The OP has details on that I think. There's a SerDes and a Tank Oscillator.
Smurfman256
Member
(02-05-2013, 01:30 AM)
I'm also confused as to how people are calculating the amount of shaders. I've read that they are grouped in the "I-shaped clusters", but how are you counting them?
ScepticMatt
Member
(02-05-2013, 01:43 AM)

Originally Posted by Smurfman256

I'm also confused as to how people are calculating the amount of shaders. I've read that they are grouped in the "I-shaped clusters", but how are you counting them?

8 blocks of 40 shaders each (20 was speculated earlier but dismissed because the area used is too big for that)
Smurfman256
Member
(02-05-2013, 01:51 AM)

Originally Posted by ScepticMatt

8 blocks of 40 shaders each (20 was speculated earlier but dismissed because the area used is too big for that)

So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS? Do we know the TEVs and ROPs yet?
Last edited by Smurfman256; 02-05-2013 at 01:56 AM.
ambientmystic
Member
(02-05-2013, 01:53 AM)

Originally Posted by Smurfman256

So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS?

Yes sir.
japtor
Member
(02-05-2013, 01:53 AM)

Originally Posted by Smurfman256

So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS?

That's what it sounds like, of what's visibly recognizable at least.

...which I'm guessing is an issue considering a large chunk of the die appears to be unknown mystery logic.
ozfunghi
Member
(02-05-2013, 01:54 AM)

Originally Posted by Smurfman256

So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS?

Likely.
Datschge
Member
(02-05-2013, 02:15 AM)
Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.
AzaK
Member
(02-05-2013, 03:10 AM)

Originally Posted by Datschge

Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.

We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.
Smurfman256
Member
(02-05-2013, 03:20 AM)
I just thought of two things: could the 4MB of ultra-fast eDRAM be used as the CPU cache? Or maybe it's the frame buffer?
Datschge
Member
(02-05-2013, 03:22 AM)

Originally Posted by AzaK

We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.

Geeze, just went through all the noise in the other thread, but I think except for the late post by Thraktor everyone was missing the wood for the trees. The 8 SPs are on the low end of the expectation, but a really large area is completely unaccounted for (>30%). To me it looks like Nintendo went a "best of both worlds" route, not simply taking the TEV stuff from Broadway/Hollywood but significantly extending it, if not in capability, so at least in pure chip area/horsepower. Does anyone know how big Hollywood would be theoretically on a 40nm node?
VIDEOGAMESYAY
Banned
(02-05-2013, 07:26 AM)

Originally Posted by AzaK

We're not sure what, if any, "fixed function" stuff is in there. If there is some it could very well only be a few features and I imagine you can't abandon the entire programmable shader pipeline.

Indeed, I imagine the fixed function aspects could be replicated perfectly well by the programmable shaders; it's only that the fixed function hardware provides a notable increase in speed over the programmable hardware for the few specific aspects it was designed for.
Zoramon089
Banned
(02-05-2013, 07:37 AM)

Originally Posted by VIDEOGAMESYAY

Indeed, I imagine the fixed function aspects could be replicated perfectly well by the programmable shaders; it's only that the fixed function hardware provides a notable increase in speed over the programmable hardware for the few specific aspects it was designed for.

Well why would they include a fixed function that didn't perform as well as a good programmed version?
Ryoku
Member
(02-05-2013, 07:46 AM)

Originally Posted by Smurfman256

So it IS 320 shaders. Meaning that it has a theoretical output of 352GFLOPS? Do we know the TEVs and ROPs yet?

We don't know. Check out the OP in the Wii U "Latte" GPU Die Photo thread.

Originally Posted by Thraktor

The eight squarish groupings on the right hand side are possibly the VLIW5 shader clusters. However, due to the apparent increased amount of SRAM there, along with the unusual layout and the general changes to the die, it's entirely possible that the number of shaders in each cluster has changed from R700 dies. In fact, at this point it's even possible that the microarchitecture itself has changed, a la the VLIW4 used in some of AMD's 6000 series GPUs. Therefore, we could have a fairly unusual number of shader cores.

It could be 20, 40, 80, or something totally unexpected. At the moment, 40 shaders per block seems realistic, which equates to 352 GFLOPS.
Last edited by Ryoku; 02-05-2013 at 07:52 AM.
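
To put the candidate shader counts side by side, here is a small sketch that tabulates the theoretical peak for 20, 40 and 80 shaders per block. It assumes 8 blocks, a 550 MHz clock and 2 FLOPs per shader per clock (one multiply-add), as discussed earlier in the thread; the constant and function names are hypothetical, and none of this confirms which count is actually on the die.

```python
BLOCKS = 8                 # shader blocks counted on the die photo (per the thread)
CLOCK_MHZ = 550            # GPU clock assumed in the thread
FLOPS_PER_CLOCK = 2        # assumption: one multiply-add per shader per clock


def candidate_peaks(per_block_options=(20, 40, 80)) -> dict:
    """Map shaders-per-block -> (total shaders, theoretical peak GFLOPS)."""
    table = {}
    for per_block in per_block_options:
        shaders = BLOCKS * per_block
        gflops = shaders * FLOPS_PER_CLOCK * CLOCK_MHZ / 1000
        table[per_block] = (shaders, gflops)
    return table


for per_block, (shaders, gflops) in candidate_peaks().items():
    print(f"{per_block} per block -> {shaders} shaders -> {gflops} GFLOPS")
# 20 per block -> 160 shaders -> 176.0 GFLOPS
# 40 per block -> 320 shaders -> 352.0 GFLOPS
# 80 per block -> 640 shaders -> 704.0 GFLOPS
```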
tkscz
Member
(02-05-2013, 12:54 PM)
Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.
japtor
Member
(02-05-2013, 01:09 PM)

Originally Posted by tkscz

Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.

I posted this back on the 28th:

Watch it end up being completely foreign looking without any recognizable components.

(Not that I want anyone's money to go to waste...but part of the fun is the mystery, it'd be kind of funny if we're still clueless after this)

Well I was partially right!
ahm998
Member
(02-05-2013, 01:12 PM)

Originally Posted by tkscz

Does anyone find it ironic that this picture was supposed to clear things up, but instead made things more confusing? Hell, no one was expecting the mystery space, or extra super fast eDRAM. It's almost as if Nintendo predicted this.

http://en.wikipedia.org/wiki/EDRAM

About eDRAM, by the way: embedding memory on the ASIC or processor allows for much wider buses and higher operation speeds.

And the GPU is still unknown :(
tipoo
Banned
(02-05-2013, 01:31 PM)

Originally Posted by Datschge

Depending on the capability and balance of the fixed function units, in the ideal case allowing for cheaper usage in the most common shader cases, the whole talk about GPGPU makes sense again as developers could use the former for graphics and the latter for computing. Doesn't sound like such a bad idea anymore.

Where? All the uncore parts look the same as any modern GPU to me (apart from the DSP/ARM core/whatever that is), where is all this fixed function logic?


Also, what do we have to go on that says the 4MB pool of eDRAM above is "faster"? It is also being speculated that it is there to directly emulate the 3 MB embedded GPU texture memory and framebuffer of the Wii.
blu
Member
(02-05-2013, 02:27 PM)

Originally Posted by tipoo

Where? All the uncore parts look the same as any modern GPU to me (apart from the DSP/ARM core/whatever that is), where is all this fixed function logic?

The 'fixed function' logic is just all the almost-uniformly-brown logic that people cannot discern as shader clusters or TMUs. How 'fixed function' it is is anybody's guess. Yes, we are all aware a GPU has a fair share of other functional blocks than those mentioned above, but the surface area percentage of all those currently-unknown blocks does not add up in comparison to other designs we are more familiar with. But you are claiming to have good understanding of the picture, why don't you help everybody out and provide a color-coded diagram of the chip areas?

Also what do we have to go on that says the 4MB pool of eDRAM above is "faster"?

Higher density = lower latency. Otherwise eDRAM, like most other types of RAM, has BW entirely determined by the bus width and clock, so it's unlikely that the smaller macros have as fat a bus as the larger ones.

It is also being speculated that it is there to directly emulate the 3 MB embedded GPU texture memory and framebuffer of the Wii.

While sitting unused in all other scenarios? That would be really something.
Last edited by blu; 02-05-2013 at 03:48 PM.
tipoo
Banned
(02-05-2013, 02:51 PM)

Originally Posted by blu


While sitting unused in all other scenarios? That would be really something.


You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.

Originally Posted by blu

But you are claiming to have good understanding of the picture, why don't you help everybody out and provide a color-coded diagram of the chip areas?

I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.
Last edited by tipoo; 02-05-2013 at 02:57 PM.
ozfunghi
Member
(02-05-2013, 04:06 PM)

Originally Posted by tipoo

You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.



I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.

I rearranged the orange parts of the 4870 picture and they take up exactly 1/3rd of the entire space. Maybe you can compare to the WiiU GPU.
tkscz
Member
(02-05-2013, 04:17 PM)
tkscz's Avatar

Originally Posted by japtor

I posted this back on the 28th:

Well I was partially right!

Good job then, as I was expecting the chip to be in some sort of order, but this is Nintendo.

Originally Posted by tipoo

You're probably right, just a crackpot theory. Marcan said on twitter that he could have told us that extra eDRAM was there, so I guess it's visible and not wired off to the Hollywood only or anything like that.



I was genuinely curious, maybe my intent was lost but I just wanted it pointed out where all that was. I thought if there were older fixed function pixel shaders for instance they may be discernible repeated blocks, I wanted to know where we thought those were.

Is it possible for fixed function pixel shaders to not be in repeated blocks?
