ROPless GPUs and Transactional Memory
For those who haven't read it, Popstar posted this over in the technical discussion thread:
*Random thinking out loud probably not related to the actual Wii U GPU*
If you have all that memory embedded right on the GPU and accessible to the shader units with low latency, do you need conventional ROP hardware at all? Or can you just do blending in the shader like a PowerVR / Tegra chip? Perhaps with mini-rops for Z / stencil test?
Someone later posted a link to this blog post, which explains why blending in shaders is usually a disaster. As far as my (limited) understanding goes, there are essentially two things a GPU needs for blending in shaders to work.
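The core hazard, as I understand it, is that blending is a read-modify-write operation on the framebuffer. A toy sketch (my own illustration, not real shader code) of standard alpha blending makes the problem visible:

```python
# Why in-shader blending is hard: alpha blending is a three-step
# read-modify-write sequence on the framebuffer (hypothetical model).
def blend_pixel(framebuffer, xy, src_rgb, alpha):
    dst = framebuffer[xy]                       # 1. read the current pixel
    out = tuple(alpha * s + (1.0 - alpha) * d   # 2. blend src over dst
                for s, d in zip(src_rgb, dst))
    framebuffer[xy] = out                       # 3. write the result back
    return out

# If two shader threads interleave these three steps on the same pixel,
# one thread's read goes stale and its result overwrites the other's:
# the classic lost-update problem that fixed-function ROPs avoid by
# serialising per-pixel access.
```

Fixed-function ROP hardware dodges this by owning the write port and ordering the blends itself; a shader-based scheme has to solve it some other way.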
The first is a low-latency memory pool large enough to hold both the framebuffer and Z-buffer. Wii U has this in the 32MB MEM1 pool of eDRAM.
The second, and I believe more important, aspect is that the GPU needs a fully transactional interface to this memory pool. (For a description of what I mean by transactional, have a read through this Ars article.) A GPU with blend shaders is an almost perfect example of the problem transactional memory is designed to solve: a large number of units performing operations on a common memory pool, where the small granularity of the data access makes a conventional locking scheme almost completely infeasible. Even more so on the Wii U, where three CPU cores would also be contending for access.
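To put the granularity problem in rough numbers (assuming a 1280x720 render target, which I'm using purely as an illustration): a per-pixel lock table would itself eat megabytes of fast memory, before you even count the cost of taking and releasing a lock around every blend.

```python
# Back-of-the-envelope cost of per-pixel locking (1280x720 is an
# assumed resolution for illustration, not a measured figure).
pixels = 1280 * 720            # 921,600 lockable locations
lock_table_bytes = pixels * 4  # one 32-bit lock word per pixel
# ~3.5 MB of fast memory spent on locks alone, and every blend
# still serialises on its lock word.
```

Coarser locks (per-tile, say) shrink the table but make unrelated pixels contend with each other, which is exactly the trade-off transactional memory sidesteps.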
So, how would a transactional memory interface for the eDRAM be implemented? In the BlueGene/Q chip it's implemented in a shared cache, but that isn't strictly necessary; all we actually need is a buffer. This buffer would operate in a (relatively) simple manner. Every time a thread starts an atomic op on the eDRAM, all reads and writes within that op are kept in the buffer. When the atomic op finishes, the buffer logic checks whether the data it read have since been changed by another unit: if not, it commits the buffered writes; if they have, it cancels the op and tells the thread to retry.
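That commit-or-retry cycle can be sketched in software. This is a toy model of the buffer logic described above, nothing more; the class name and the dict standing in for the eDRAM are my own inventions:

```python
class TransactionalBuffer:
    """Toy model of an optimistic transactional buffer sitting in
    front of a memory pool (a dict stands in for the eDRAM here)."""

    def __init__(self, memory):
        self.memory = memory

    def run_atomic(self, op, max_retries=10):
        for _ in range(max_retries):
            reads, writes = {}, {}             # the per-op buffer

            def load(addr):
                if addr in writes:             # see our own buffered writes
                    return writes[addr]
                reads.setdefault(addr, self.memory[addr])
                return reads[addr]

            def store(addr, value):            # buffered, not yet visible
                writes[addr] = value

            op(load, store)
            # Validate: has anything we read changed underneath us?
            if all(self.memory[a] == v for a, v in reads.items()):
                self.memory.update(writes)     # commit the buffered writes
                return True
            # Conflict: discard the buffer and retry the op.
        return False
```

A conflicting write landing between the read and the commit fails validation, and the op transparently retries against the new value, which is exactly the behaviour you'd want from hardware arbitrating blend shaders and CPU cores.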
Because the eDRAM is just 32MB, and framebuffer shader operations would touch very small pieces of data at a time, a transactional buffer on Latte wouldn't actually need to be very big, but it would need to be very fast. Latte has a perfect candidate for this in the 1MB of SRAM up in the left corner of the die. Block A (and possibly B) would house the necessary logic. The transactional buffer could handle MEM1 access for the texture units, pixel shaders, blend shaders, CPU cores, and possibly even the ARM core and DSP, and should be able to do so with a near-negligible increase in latency. In fact, since the SRAM is already there for BC, such a transactional interface would be useful for Latte even without blend shaders, given the potential difficulty of managing MEM1 between so many components.
Now, there is one issue with the notion of blending via pixel shaders, and it's this: if someone had solved the blending-in-shaders problem, wouldn't you expect the resulting GPU to devote a larger proportion of its die area to shader bundles? We, however, seem to have a lower shader-to-die-area ratio than we would have expected. In that case, it seems we'd be looking at dedicated blend shader units distinct from the pixel shaders.