
Is the Wii U GPU 176 GFLOPS or 352 GFLOPS?

10k

Banned
it only flopped once afaik
giphy.gif
 

AlStrong

Member
Just looked back and the number I saw was 31.7 GB/s. Weird indeed, but it fit the other info I had. I guess the eDRAM possibly has its own clock discrete from the GPU.

I thought it was basically something along the lines of

550MHz*8ROPs*4Bpp*(read+write) = 34.375GB/s

Stop raising the WUST signal, dood! /returns to DOA cave.
 
Welcome to the 238 pages of figuring this out that we did way back when.

http://www.neogaf.com/forum/showthread.php?t=511628

99% sure it's 176; Wikipedia was just edited by a hopeful. The die was slightly fatter than it should be, which threw some people, but the fabrication plant differences account for it, as well as any other per-shader changes they made.

8 x 20 shader units = 160 ALUs; at 550MHz that's 176 GFLOPS. The chance of it packing double the shader units through magic is negligible.


That's why I'm not letting myself expect anything for NX. Nintendo can always be Nintendo Special.

Weaker than the 360 at 240 GFLOPS? Maybe they are closer in the real world because of the efficiency of the newer architecture, but damn. The 360 was a great design.
 

LordOfChaos

Member
You're comparing a CPU to a GPU now. ~38GB/s with those GFLOPS is too slow to run games like Fast Racing Neo even at the resolution it runs at. Hell, I doubt it would even run the main menu at 30fps. No programmer from Nintendo or Shin'en would be able to pull off the visuals we have seen on the system, because it would have been a huge bottleneck for the system.

I would take blu's word and Shin'en's word.


I am? Iris Pro, which is what I was talking about there, sure sounds like a GPU to me. I said "Crystalwell has about 50GB/s bandwidth serving an 800Gflop GPU". The eDRAM serves as a victim cache for things evicted from the CPU L3, sure, but its main use is to provide the GPU the bandwidth it needs to scale up to the performance of an 800Gflop chip.

That's the Iris Pro, with 128MB of eDRAM. But there's no possible way the 32MB eDRAM on the Wii U has 64% of that bandwidth? For a GPU about 20% as powerful (830 Gflops vs 176, even if not directly comparable)?

Again with the "it just can't pull those visuals with that bandwidth". Is this just your gut feeling? Guts are often wrong. If the Iris Pro 5200, with a 50GB/s eDRAM and a DDR3 connection, can run most titles the 8GB twins hit 900p-1080p at, at 720p itself, I really don't see how you can say it's just not possible for the Wii U to pull Mario or Zelda or Fast Racing out of a 38GB/s eDRAM plus DDR3. I've run SoM, Crysis 3, Advanced Warfare, Legacy of the Void... nearly every AAA game I've tried has run acceptably (read: above 30fps, minimum 720p to 900p, mid settings most of the time) on the Iris Pro. Yet the Wii U can't possibly pull off Mario with 38GB/s plus 12-odd from the DDR3? Because feelings?

Anyways, this is all rather moot. With 1TB/s of eDRAM bandwidth it would still be a 160-shader, 8-ROP, 8-TMU part at 550MHz. Bandwidth doesn't seem to be a developer complaint at any rate; the rest of the system limits it first.


I would take blu's word and Shin'en's word.

Where did Shin'en specifically say it was definitely more than 38GB/s?
And blu said 35 was the minimum last page; I'm not really contesting that, given the dev comment about being able to get ~38 out. Given the rest of the system... the minimum sounds about right.
 
It's 176 GFLOPS confirmed, and the games don't look anything beyond last gen. The Wii U still hasn't even matched last gen's best-looking games, which falls right in line with its specs.
 
I thought it was basically something along the lines of

550MHz*8ROPs*4Bpp*(read+write) = 34.375GB/s

Stop raising the WUST signal, dood! /returns to DOA cave.

Hey, are they gonna give us some detailed specs this time with NX or are we gonna have to do it all over again?

So what's the point of the eDRAM? That's not much better than the slow MEM2 DDR3.

I'm not the person to ask for the nitty gritty details, but it is over twice as much bandwidth, it saves cost on motherboard complexity, and reduces latency (CPU also has access, remember). There are also 8 separate channels as opposed to 2 for the DDR3, which should make for less wasted cycles.
 
Weaker than the 360 at 240 GFLOPS? Maybe they are closer in the real world because of the efficiency of the newer architecture, but damn. The 360 was a great design.

Yeah. Architecturally it is better, but that is quite a large deficit to fill. The Wii U, beyond having more RAM, is kinda worse than last gen if visual output is anything to go by...

So awesome that GAF did all the research on this in that epic thread.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
I thought it was basically something along the lines of

550MHz*8ROPs*4Bpp*(read+write) = 34.375GB/s

Stop raising the WUST signal, dood! /returns to DOA cave.
You're forgetting the z.

550MHz * 8 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 70GB/s
550MHz * 4 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 35GB/s

Now, I too seem to recall Latte having 8 ROPs, but I could be wrong.
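For reference, here's blu's worst-case arithmetic as a throwaway sketch, assuming 4 bytes of colour and 4 bytes of Z per pixel, one read and one write each, no compression and full ROP utilisation (the function name and decimal GB/s are my own):

```python
def rop_bandwidth_gbps(clock_hz, rops, color_bytes=4, z_bytes=4):
    """Peak ROP bandwidth demand in GB/s (decimal): clock * ROPs * (color + z) * (read + write)."""
    return clock_hz * rops * (color_bytes + z_bytes) * 2 / 1e9

print(rop_bandwidth_gbps(550e6, 8))  # ~70.4 GB/s with 8 ROPs
print(rop_bandwidth_gbps(550e6, 4))  # ~35.2 GB/s with 4 ROPs
```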
 

AlStrong

Member
Hey, are they gonna give us some detailed specs this time with NX or are we gonna have to do it all over again?

It's gonna be a bunch of Amiibos that combine like Voltron. :p

I'm not the person to ask for the nitty gritty details, but it is over twice as much bandwidth, it saves cost on motherboard complexity, and reduces latency (CPU also has access, remember). There are also 8 separate channels as opposed to 2 for the DDR3, which should make for less wasted cycles.

Basically that. It lets them get away with less external I/O, which also has power consumption implications. I wouldn't worry too much about latency. The highest bandwidth consumer will be the ROPs, and the GPU will be hiding latency as much as possible while the CPU... does stuff. I'm not particularly convinced the CPU has a significant role here considering the upgrades they already did to Gekko. ;) (cue blu shooting me down)

That said, eDRAM isn't necessarily cheap, but they must have had a good contract with Renesas at least to deem it a worthwhile design. With the CPU's SOI eDRAM, IBM would probably have been jumping at the chance to make use of their fabs (before they sold them to GF), even for such an elfin chip size.

You're forgetting the z.

550MHz * 8 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 70GB/s
550MHz * 4 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 35GB/s

Now, I too seem to recall Latte having 8 ROPs, but I could be wrong.

Ah yeah. Guess that really depends on how bandwidth-concerned they were. Mind you, the RBEs/ROPs *should* have their own pixel caches compared to the Xenos days, so it might not be as problematic. The Xenos ROPs were effectively without any compression HW, which is especially relevant to depth, which ought to be fairly compressible in the right hands (early Z etc). Xenos was rather brute-force in that sense.
 
Hey, are they gonna give us some detailed specs this time with NX or are we gonna have to do it all over again?



I'm not the person to ask for the nitty gritty details, but it is over twice as much bandwidth, it saves cost on motherboard complexity, and reduces latency (CPU also has access, remember). There are also 8 separate channels as opposed to 2 for the DDR3, which should make for less wasted cycles.

So what's the cost difference? I'm assuming the DDR3 costs more at higher clocks than the eDRAM.
 

Rodin

Member
Yeah. Architecturally it is better, but that is quite a large deficit to fill.
No it isn't. 176 VLIW5 gigaflops (plus whatever enhancements they made to that architecture) are better than 240 R600 gigaflops, period. The Wii U also has a DX10.1-equivalent API, which is far closer to DX11 than to DX9, and Shader Model 4.1. We had more than one source say that the Wii U GPU is better than what's in the last gen consoles.

The Wii U, beyond having more RAM, is kinda worse than last gen if visual output is anything to go by...

6jdJl8P.gif


You're forgetting the z.

550MHz * 8 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 70GB/s
550MHz * 4 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 35GB/s

Now, I too seem to recall Latte having 8 ROPs, but I could be wrong.

You're not
 

LordOfChaos

Member
You're forgetting the z.

550MHz * 8 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 70GB/s
550MHz * 4 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 35GB/s

Now, I too seem to recall Latte having 8 ROPs, but I could be wrong.


So this would be bandwidth available directly to the ROPs, assuming they're tightly linked a la the 360, correct? Such as the diagram claiming a 256GB/s connection between the eDRAM and the ROPs, but a more modest 32GB/s to everything else on-GPU.

edrambandwidth.gif
 
No it isn't. 176 VLIW5 gigaflops (plus whatever enhancements they made to that architecture) are better than 240 R600 gigaflops, period. And we had more than one source say that the Wii U GPU is better than what's in the last gen consoles.
Any link to that source? And did they specify "what" is better in it?
I am being pretty serious about this. Rendering-wise (the quality of rendering), no Wii U game is reaching the heights of the Crysis series, GTA V, or the Naughty Dog games. If that is because a lot of their games focus on 60fps rendering, then so be it. But the 30fps Wii U games have been completely visually underwhelming from my perspective. Even the 60fps games have me scratching my head at certain junctures.
 

my6765490

Member
Looking back, it's still sort of astounding that Nintendo launched with such anemic HW, given that smartphone SoCs were coming out with GPUs within the same power range (~S800) in mid-2013, with more features and (likely) better CPU performance. Wouldn't it have been easier to just contact QCOM for some semi-custom stuff?
 

AlStrong

Member
So what's the cost difference? I'm assuming the DDR3 costs more at higher clocks than the eDRAM.
You have to do a cost analysis.

Let us say... $3000 for a 45nm wafer. At 147mm^2 (11.88 x 12.33mm) you get ~400 dies on a 300mm wafer. (We pretend that 45nm is mature as hell despite the eDRAM complexities.) On an optimistic day, 80% yield. Hell, maybe Nintendo was ultra conservative with 550MHz, but never mind.

320 usable dies -> $9.38 per chip -> round it up to $9.50 for good measure.

Each DDR3 chip costs say $3.

So you could shave off the 32MB eDRAM for a ~100mm^2 chip while compensating with at least double the DDR3 chips.

-------------

Fuzzy math, but I don't think DDR chips go much below a few dollars per chip at the end of the day. It's just to put it into perspective. Keep in mind that DDR3-1600 at 2x what is in the Wii U would still be pitiful.
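Just to make that fuzzy math concrete, a back-of-envelope sketch using the post's illustrative numbers; the ~600 gross dies for a 100mm^2 chip and the 4 extra DDR3 chips are my own guesses, not data:

```python
# Back-of-envelope version of the cost comparison above. All inputs are rough guesses, not real pricing.

wafer_cost = 3000.0        # $ per 45nm 300mm wafer (post's assumption)
gross_dies = 400           # ~147mm^2 dies per wafer (post's assumption)
yield_rate = 0.80          # optimistic yield (post's assumption)
ddr3_chip_cost = 3.0       # $ per DDR3 chip (post's assumption)

good_dies = gross_dies * yield_rate                # 320 usable dies
gpu_with_edram = wafer_cost / good_dies            # ~$9.38 per chip

# Drop the 32MB eDRAM: smaller ~100mm^2 die, but at least double the DDR3 chips.
gross_dies_small = 600                             # my rough guess for ~100mm^2 dies per wafer
gpu_without_edram = wafer_cost / (gross_dies_small * yield_rate)
extra_ddr3 = 4 * ddr3_chip_cost                    # e.g. 4 extra chips to double the count (my guess)

print(f"with eDRAM:    ${gpu_with_edram:.2f}")
print(f"without eDRAM: ${gpu_without_edram:.2f} + ${extra_ddr3:.2f} in extra DDR3")
```

Under those guesses the eDRAM die comes out cheaper than the shrunken die plus the extra DRAM, which is the point of the exercise.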
 
I think it has to be more than four or five GameCubes. I'd say at least eight.

I mean, the GameCube was one GameCube, the Wii was two, but the Wii U is way more than twice as powerful as the Wii.
 
You're forgetting the z.

550MHz * 8 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 70GB/s
550MHz * 4 ROPs * (4Bpp color + 4Bpp z) * (read + write) = 35GB/s

Now, I too seem to recall Latte having 8 ROPs, but I could be wrong.
Hmmm. Does it have to be read+write? And just to play devil's advocate a little, what about the AMD GPUs which have 8 ROPs and less bandwidth? Looking at Wikipedia, the RV730 XT, for instance, features a bus to GDDR4 @ 32 GB/s.

It's gonna be a bunch of Amiibos that combine like Voltron. :p
If it's amiibo powered, I'm (somewhat sadly) in pretty good shape! God help me.
Basically that. It lets them get away with less external I/O, which also has power consumption implications. I wouldn't worry too much about latency. The highest bandwidth consumer will be the ROPs, and the GPU will be hiding latency as much as possible while the CPU... does stuff. I'm not particularly convinced the CPU has a significant role here considering the upgrades they already did to Gekko. ;) (cue blu shooting me down)

That said, eDRAM isn't necessarily cheap, but they must have had a good contract with Renesas at least to deem it a worthwhile design. With the CPU's SOI eDRAM, IBM would probably have been jumping at the chance to make use of their fabs (before they sold them to GF), even for such an elfin chip size.

All duly noted. MEM1 is probably mostly for framebuffer (triple buffering for Vsync?) since the CPU has its own eDRAM on chip plus access to the 3 MB MEM0 on the GPU.

Renesas could have given them a nice deal or maybe not. Nintendo did require that pool for Wii backwards compatibility. It didn't have to be on-chip necessarily, but then we'd be looking at a 3 part MCM. Well, 4 if we count that tiny NOR FLASH chip next to the CPU.
 

Rodin

Member
Any link to that source? And did they specify "what" is better in it?

I am being pretty serious about this. Rendering-wise (the quality of rendering), no Wii U game is reaching the heights of the Crysis series, GTA V, or the Naughty Dog games. If that is because a lot of their games focus on 60fps rendering, then so be it. But the 30fps Wii U games have been completely visually underwhelming from my perspective. Even the 60fps games have me scratching my head at certain junctures.

Crysis 3 runs like ass on last gen consoles, and GTA V, while being quite an accomplishment for such outdated hardware, runs possibly even worse. Nintendo games are usually locked at 60fps, use advanced shaders, and we've seen some display of radiosity global illumination (which I don't recall ever being used in any last gen game) in combination with HD textures and a fair amount of polygons (nothing super impressive, but not underwhelming either, all things considered). The only two 30fps first party games are Xenoblade X, which features maybe the biggest non-procedurally generated world ever, with zero loading screens and a locked 30fps despite its giant areas full of elements on screen and despite being made on a mid-tier budget, and Zelda U, which isn't out yet but has shown things never seen on last gen hardware, especially in an open-world game.

All these games are not without problems: image quality is usually underwhelming, and the lack of AA in so many games is quite baffling, not to mention the generally poor (or absent) anisotropic filtering. I think that for a console that came out in 2012 they should've aimed at least for hardware that allowed this level of fidelity at 900p-1080p, but that doesn't make Nintendo games worse than last gen's best accomplishments by any means. They just use resources differently.

Clock * shaders * 2 = flops.

550MHz * 320 * 2 = 352 GFLOPS

Or

550MHz * 160 * 2 = 176 GFLOPS
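To make the two candidate numbers explicit, here's that formula as a throwaway sketch (decimal GFLOPS, assuming one multiply-add, i.e. 2 flops, per ALU per cycle):

```python
def gflops(clock_hz, alus):
    # Peak shader throughput: clock * ALUs * 2 flops per cycle
    return clock_hz * alus * 2 / 1e9

print(gflops(550e6, 160))  # 176.0 -> the 160-ALU (8 x 20) reading of the die shot
print(gflops(550e6, 320))  # 352.0 -> the 320-ALU reading
```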
 

AmyS

Member
We know the Xbox 360 GPU was 240 GFLOPS and the PS3's GPU was 192 GFLOPS. It's really hard to believe the Wii U GPU is that much higher, at 352 GFLOPS.

176 GFLOPS seems a lot more believable.

While not specific on specs, the GamesIndustry.biz and Eurogamer articles from early 2012 citing developer sources would be right in line with the lower number.

Developers have indicated that the Wii U isn't as powerful in graphics terms as the PlayStation 3 and Xbox 360.

Separate development sources, speaking under condition of anonymity, told GamesIndustry International that when it comes to visuals, the Wii U is "not as capable" as Sony and Microsoft's current generation of home consoles.

"No, it's not up to the same level as the PS3 or the 360," one developer said of Nintendo's first high definition home console. "The graphics are just not as powerful."

Another source stated: "Yeah, that's true. It doesn't produce graphics as well as the PS3 or the 360. There aren't as many shaders, it's not as capable. Sure, some things are better, mostly as a result of it being a more modern design. But overall the Wii U just can't quite keep up."

http://www.eurogamer.net/articles/2012-04-03-wii-u-not-as-capable-as-ps3-xbox-360-report
http://www.gamesindustry.biz/articl...ess-powerful-than-ps3-xbox-360-developers-say
 

Rodin

Member
We know the Xbox 360 GPU was 240 GFLOPS and the PS3's GPU was 192 GFLOPS. It's really hard to believe the Wii U GPU is that much higher, at 352 GFLOPS.

176 GFLOPS seems a lot more believable.

While not specific on specs, the GamesIndustry.biz and Eurogamer articles from early 2012 citing developer sources would be right in line with the lower number.



http://www.eurogamer.net/articles/2012-04-03-wii-u-not-as-capable-as-ps3-xbox-360-report
http://www.gamesindustry.biz/articl...ess-powerful-than-ps3-xbox-360-developers-say

That quote is very old and was debunked many times; there are a few developers who said the opposite and didn't hide behind anonymity.
 

btrboyev

Member
Looking back, it's still sort of astounding that Nintendo launched with such anemic HW, given that smartphone SoCs were coming out with GPUs within the same power range (~S800) in mid-2013, with more features and (likely) better CPU performance. Wouldn't it have been easier to just contact QCOM for some semi-custom stuff?

Show me any smartphone game that looks as good and runs as good as Mario Kart. Or Xenoblade.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
So this would be bandwidth available directly to the ROPs, assuming they're tightly linked a la the 360, correct? Such as the diagram claiming a 256GB/s connection between the eDRAM and the ROPs, but a more modest 32GB/s to everything else on-GPU.

edrambandwidth.gif
Yes. Keep in mind that if the Z/ROPs are not integrated into the eDRAM block (the way they were on Xenos' daughter die), but sit with the GPU so that zexels have to travel from the zbuffer to the GPU and back, then the efficiency of your lossless z compression scheme across the bus could drop arbitrarily - from a guaranteed rate for Xenos, to no compression at all for pathological cases. That's because in contrast to a primitive's z's, the z's from the footprint of the primitive in the buffer don't need to exhibit any coherency whatsoever.

Hmmm. Does it have to be read+write? And just to play devil's advocate a little, what about the AMD GPUs which have 8 ROPs and less bandwidth? Looking at Wikipedia, the RV730 XT, for instance, features a bus to GDDR4 @ 32 GB/s.
I've been discussing the worst-case scenario for the console part - read+write of uncached, uncompressible data. Clearly, on average, your ROPs don't work at 100% utilisation unless you have ultra-simplistic shading or just zexels (which, btw, is a valid use-case). As for that 730XT - it's just an unbalanced part; it will be doing much worse with its 6GPix/s as it will also need to feed its TMUs from the same 32GB/s pool.
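To put a number on "unbalanced", a quick sketch under the same worst-case assumptions as earlier (4B colour + 4B Z, read + write, no compression); the clock implied by the 6 GPix/s figure is my assumption, not something stated here:

```python
# Compare the RV730 XT's worst-case ROP bandwidth demand against the 32 GB/s bus cited above.

fill_rate = 6e9                  # blu's 6 GPix/s figure (8 ROPs * ~750MHz, clock assumed)
bytes_per_pixel = (4 + 4) * 2    # 4B colour + 4B Z, read + write
demand = fill_rate * bytes_per_pixel / 1e9   # GB/s the ROPs could ask for at full tilt
bus = 32.0                       # GB/s of GDDR4 bandwidth cited from Wikipedia

print(f"worst-case ROP demand: {demand:.0f} GB/s vs {bus:.0f} GB/s available (shared with the TMUs)")
# ~96 GB/s demanded vs 32 GB/s available - hence "unbalanced".
```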
 

AlStrong

Member
We know the Xbox 360 GPU was 240 GFLOPS and the PS3's GPU was 192 GFLOPS. It's really hard to believe the Wii U GPU is that much higher, at 352 GFLOPS.

It's probably more accurate to label Xenos as 216GFLOPs given the vec4+1 nature of the ALUs, but anyways. There will be some other difficulties in comparing to the vec5 of the r7xx generation.

Similarly, there are some quirky... quirks of G7x that would make the theoretical flops laughable.
 
Just for reference, since we are flopping around here, Xenos pushes a "mere" ~216 GFLOPS. Its architecture is a bit unique, and 1 out of every 5 shaders does a single floating point operation per cycle rather than 2.
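That bookkeeping as a quick sketch; the 48-unit, 500MHz figures are the commonly cited Xenos numbers, and the 1-flop scalar counting follows the post above:

```python
# Xenos: 48 vec4+1 shader units at 500MHz.
# The vec4 lanes each do a multiply-add (2 flops/cycle); count the companion scalar lane as 1 flop/cycle.

units, clock = 48, 500e6
per_cycle_216 = units * (4 * 2 + 1 * 1)   # 432 flops per cycle
per_cycle_240 = units * (5 * 2)           # 480 flops per cycle if the scalar counted as a MADD

print(per_cycle_216 * clock / 1e9)  # 216.0 GFLOPS
print(per_cycle_240 * clock / 1e9)  # 240.0 GFLOPS, the usual headline number
```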

Edit: Damn you, Al!
 

AlStrong

Member
So this would be bandwidth available directly to the ROPs, assuming they're tightly linked a la the 360, correct? Such as the diagram claiming a 256GB/s connection between the eDRAM and the ROPs, but a more modest 32GB/s to everything else on-GPU.

edrambandwidth.gif

The 256GB/s is actually the full 4xMSAA bandwidth connections that they built into the eDRAM. It is.... nutty if you think about it.

Basically, if MSAA was not utilized, you'd just see a quarter of that for read & write, but then they designed it to feed the "8 ROPs" exactly for read and write. It's just a number that results from what the ROPs would consume.

The 32GB/s is somewhat reflective of the fact that any MSAA target would be resolved to a single sample per pixel in GDDR3. In the simplest case, you're looking at 8 ROPs, Read + Write, 32 bit per pixel @ 500MHz, which just so happens to need 32GB/s.
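Both figures are easy to reproduce; a minimal sketch, assuming 4 bytes of colour and 4 bytes of Z per sample:

```python
# Reconstructing the Xbox 360 eDRAM figures from the description above.
# 256 GB/s: 8 ROPs at 500MHz with 4xMSAA, (4B colour + 4B Z) per sample, read + write.
# 32 GB/s: the resolved single-sample path, 8 ROPs * 4B * (read + write) at 500MHz.

clock, rops = 500e6, 8

internal_gbps = clock * rops * 4 * (4 + 4) * 2 / 1e9   # 256.0 - inside the daughter die
resolve_gbps  = clock * rops * 4 * 2 / 1e9             # 32.0  - out to GDDR3

print(internal_gbps, resolve_gbps)
```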

-------

Hope that makes sense
------

/returning to DOA cave.
 

AmyS

Member
It's probably more accurate to label Xenos as 216GFLOPs given the vec4+1 nature of the ALUs, but anyways. There will be some other difficulties in comparing to the vec5 of the r7xx generation.

Similarly, there are some quirky... quirks of G7x that would make the theoretical flops laughable.

I'm not sure if the following is the same as what you mean by laughable (I would agree tho).

Sony claimed a massive number for the PS3's Nvidia RSX GPU at E3 2005 - 1.8 TFLOPS (that's the "same" number as the PS4's GPU today) - which I think we can all agree came from adding up every single function on the GPU, programmable and fixed-function, better known as "NvFlops". Totally laughable. No different from Microsoft at GDC 2000 claiming 140 GFLOPS for the original Xbox.
 
I am being pretty serious about this. Rendering-wise (the quality of rendering), no Wii U game is reaching the heights of the Crysis series, GTA V, or the Naughty Dog games. If that is because a lot of their games focus on 60fps rendering, then so be it. But the 30fps Wii U games have been completely visually underwhelming from my perspective. Even the 60fps games have me scratching my head at certain junctures.

Yup, I agree with you here. If you post direct-feed pics of 360/PS3 vs Wii U games, the 360/PS3 games look better by a large margin. People will mention Xenoblade X, but it's not doing anything impressive technically aside from running at a stable 30fps with good data streaming, and the graphics have so many technical flaws and shortcomings that nobody should use it as the most technically impressive game on Wii U/360/PS3.
 

AlStrong

Member
I'm not sure if the following is the same as what you mean by laughable (I would agree tho).

Sony claimed a massive number for the PS3's Nvidia RSX GPU at E3 2005 - 1.8 TFLOPS (that's the "same" number as the PS4's GPU today) - which I think we can all agree came from adding up every single function on the GPU, programmable and fixed-function, better known as "NvFlops". Totally laughable. No different from Microsoft at GDC 2000 claiming 140 GFLOPS for the original Xbox.

Haha yeah, there was that 1.8 number... but RSX did indeed have some low-level shit to deal with. The first thing that comes to mind is certain vertex formats cutting speed in two.

Mind you, I didn't develop on it directly, so I'm just recalling some really old info. Don't have to believe me, that's ok. :)

---------

Perhaps one of the dumbest things was simply how broken the HW scaler was on RSX. There's a reason why even first party games had weird stuff like 1080i-specific modes. And perhaps some of DF's conclusions regarding minor resolution discrepancies hit harder back then than they would these days, if you consider that the RSX scaling was just so shit.

Things have improved with scaling!
 