
WiiU technical discussion (serious discussions welcome)

mrklaw

MrArseFace
Meltdowns, meltdowns. Get your meltdowns.

-----

Anyway, can someone elaborate on how exactly "GPGPU" is going to be the Wii U's panacea, as has been made out on the chalkboard?

From reading B3D, the impression I'm getting is that it likely isn't going to be that much use.

There was also some interesting speculation that seemed to imply the eDRAM may not be any sort of magic bullet either... that its bandwidth could be as poor as the system RAM's.

If rumours have PS4/720 using an APU + discrete GPU, then the GPU on the APU could be used for GPGPU processing, allowing the main CPU to be 'relatively' weak and performing CELL SPE-type tasks on the GPGPU part, and using eDRAM to mitigate large amounts of slower main memory.

That could be a mirror of what the Wii U has - relatively weak CPU, GPGPU elements and eDRAM. Just a lot faster.

So if this happens, developers will need to adapt to the new architectures, and will therefore by default be better set up to code for the Wii U.

The big question is - where does the Wii U sit compared to PS3/360 and PS4/720? If it's closer to the new guys, then sharing a similar architecture may help, with some trimming of assets here and there. But if it's closer to PS3/360, then it may be an active hindrance. Developers will be producing ports from PS3/360 engines but those port teams may not be familiar with the new architecture and so they may never leverage the strengths of the platform.
 

The_Lump

Banned
What's happened here?? blu, tell them off again, quick!


Back on topic: on the VGLeaks spec sheet, it mentioned something about 512MB NAND for the system - is this likely to be the max size of the installed OS? Or is this just for storing basic compulsory system software?
 

AlStrong

Member
Stream out: Allows you to write directly to memory from a geometry or vertex shader bypassing the pixel shader (and ROPs).

This is basically memexport on Xenos.

Texture Arrays (including cube maps): This allows an array of textures to be accessed as a single texture. Can be used like a texture atlas without the filtering problems.

Xenos had limited support for texture arrays up to 64 surfaces. IIRC, D3D10 went up to 512 (along with the cubemap stuff).

BC4/BC5: New texture compression formats. The most common use is probably for compressing normal maps.

Xenos had BC5 (DXN/3Dc).


MSAA textures / Shader Model 4.1 extended MSAA: Extensions to allow MSAA surfaces to be more flexibly used. In particular the ability to use MSAA with deferred shading techniques is improved.

Support is a non-issue for PS360; the bigger problem is the increased memory and performance cost (aliasing larger RTs to store the resultant samples + shading/lighting said samples). Its inclusion in DX10.1 just highlights the API issues that were on PC.
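To put a rough number on that memory cost, here's a back-of-the-envelope sketch in Python; the resolution, sample count, format and G-buffer layout below are illustrative assumptions, not figures for any particular game:

# Rough memory cost of MSAA'd render targets for deferred shading (illustrative only).
width, height = 1280, 720      # assumed render resolution
samples = 4                    # assumed 4x MSAA
bytes_per_pixel = 4            # e.g. an RGBA8 target
num_targets = 4                # hypothetical G-buffer layout

one_target = width * height * bytes_per_pixel * samples
gbuffer = one_target * num_targets

print(one_target / 2**20)      # ~14 MiB per MSAA target
print(gbuffer / 2**20)         # ~56 MiB for the whole G-buffer, before depth

Which is why fitting MSAA targets into something like the 360's 10MB of eDRAM means tiling, and why the cost is more about memory and bandwidth than about whether the API exposes the feature.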
 

AlStrong

Member
Yeah, Xenos is pretty much a prototype "Direct3D 10" part.

That's one of the reasons I'm doing this exercise, actually. People tend to think of hardware as being in more discrete generations than it actually is. And I'm trying to get a sense of where along the line the R700 actually sits.

:)

You probably should add Fetch4 support, which was introduced with R580/530/515, though it was only officially added much later to the DX11 spec.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Yeah, Xenos is pretty much a prototype "Direct3D 10" part.
Indeed.

I'm also curious if the GPU7 is pretty much an R700 part with some modifications for backwards compatibility and the changed memory configuration or if any Evergreen sneaked in there.
U-GPU has its own ISA, neither Wekiva nor Evergreen.
 

japtor

Member
If rumours have PS4/720 using an APU + discrete GPU, then the GPU on the APU could be used for GPGPU processing, allowing the main CPU to be 'relatively' weak and performing CELL SPE-type tasks on the GPGPU part, and using eDRAM to mitigate large amounts of slower main memory.

That could be a mirror of what the Wii U has - relatively weak CPU, GPGPU elements and eDRAM. Just a lot faster.

So if this happens, developers will need to adapt to the new architectures, and will therefore by default be better set up to code for the Wii U.

The big question is - where does the Wii U sit compared to PS3/360 and PS4/720? If it's closer to the new guys, then sharing a similar architecture may help, with some trimming of assets here and there. But if it's closer to PS3/360, then it may be an active hindrance. Developers will be producing ports from PS3/360 engines but those port teams may not be familiar with the new architecture and so they may never leverage the strengths of the platform.
My main concern would be SIMD support in the other CPUs. Whether that's implemented through the APU/GPU is a moot point since it sounds like Jaguar supports a bunch of SIMD instruction sets (barring MS/Sony sticking in crippled cores?). Something might be implemented one way and be difficult to implement in OpenCL or whatever is usually used on GPUs. Of course we don't really know what the hell the Wii U's GPU will support and/or if the SDK is limited/incomplete with regards to it; I'm just assuming the minimum for this example since... well, again, it's not like we know anything else about it.
Back on topic: on the VGLeaks spec sheet, it mentioned something about 512MB NAND for the system - is this likely to be the max size of the installed OS? Or is this just for storing basic compulsory system software?
The rumor was SLC NAND I thought, which made me think it'd be pointless for OS storage. SLC is used for the write endurance (and speed but advances have made that mostly moot), so I figured it'd be a big ass write cache/buffer of some sort rather than just a place for the OS to sit. My crazy theory was to use that as a sort of secondary loading/streaming buffer, like instead of being limited by disc speed and/or using some RAM as a buffer to compensate, the NAND could be used in some way to help out.
 

M3d10n

Member
This is basically memexport on Xenos.

Xenos had limited support for texture arrays up to 64 surfaces. IIRC, D3D10 went up to 512 (along with the cubemap stuff).

Xenos had BC5 (DXN/3Dc).

Support is a non-issue for PS360; the bigger problem is the increased memory and performance cost (aliasing larger RTs to store the resultant samples + shading/lighting said samples). Its inclusion in DX10.1 just highlights the API issues that were on PC.

If I remember correctly, DX10.1 has a couple of hardware features; otherwise the GeForce 8000 and 9000 series could have been updated via new drivers. For example, DX10.1 allows the fragment shader to be executed for each MSAA sample and allows the application to control sample positions. There's also cubemap array support.
 

Oblivion

Fetishing muscular manly men in skintight hosery
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?
 

mrklaw

MrArseFace
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?

It's a good question. Perhaps the eDRAM and unified memory? And a lack of understanding of how developers are meant to leverage any GPGPU elements, which might otherwise invite comparisons to the PS3's SPEs.
 

Log4Girlz

Member
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?

They is twins.
 

BlackJace

Member
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?

Blah Blah XBOX 360 1.5
 

Thraktor

Member
Which means?

Wasn't FX older than CL?

The 750CL is effectively just Broadway/Gekko, although it was only sold as a stand-alone processor much later. The 750FX would be a year or two newer.

Edit: On the 512MB NAND chip, I'm guessing it's for Wii BC. The main Wii U flash is eMMC, which is kind of like a mini SSD (has controllers, etc. onboard), and it's possible that the console can't access it in Wii mode, so they just threw in some NAND for Wii mode to use.
 

ozfunghi

Member
The 750CL is effectively just Broadway/Gekko, although it was only sold as a stand-alone processor much later. The 750FX would be a year or two newer.

Edit: On the 512MB NAND chip, I'm guessing it's for Wii BC. The main Wii U flash is eMMC, which is kind of like a mini SSD (has controllers, etc. onboard), and it's possible that the console can't access it in Wii mode, so they just threw in some NAND for Wii mode to use.

Thanks. So is it a newer SMP, or is it an SMP at all?

Also, thanks for the NAND info; I asked the same question last week in the other topic but didn't get an answer. I also figured it was either OS or BC.
 

Thraktor

Member
Thanks. So is it a newer SMP, or is it an SMP at all?

Also, thanks for the NAND info; I asked the same question last week in the other topic but didn't get an answer. I also figured it was either OS or BC.

None of the 750 series CPUs support SMP, so that's going to be an entirely new implementation.
 
None of the 750 series CPUs support SMP, so that's going to be an entirely new implementation.

Of course none of the 750 line supports SMP. It was a single-core architecture. My question is: why does Nintendo have such a fetish for the G3 architecture?
 

Osiris

I permanently banned my 6 year old daughter from using the PS4 for mistakenly sending grief reports as it's too hard to watch or talk to her
Of course none of the 750 line supports SMP. It was a single-core architecture. My question is: why does Nintendo have such a fetish for the G3 architecture?

BC
 

Durante

Member
I think the speculation about the eDRAM is interesting. I think that it should easily have a 50 GB/s+ bandwidth to the GPU. But then, why do we see those slowdowns in some games with alpha blending? I guess we don't have enough data yet.

I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?
There's lots of reasons why they are much easier to compare:
- CPU: both have a tri-core architecture of (somewhat, see Wii U cache layout) homogeneous cores
- GPU: both use an AMD GPU with unified shaders
- Memory subsystem: both use a unified main memory pool and an additional eDRAM pool (though the latter seems more capable on Wii U)

PS3 is just completely different in all those aspects, which makes comparisons much harder.
 

wsippel

Banned
That's it? Couldn't they just have a Wii SoC on the MCM?
That would be a waste of silicon, the 750 line isn't as bad as some might believe. The 476FP has a Dhrystone of 2.7. That was supposedly the highest Dhrystone of any embedded processor back in 2010. The 750CL (Broadway) had a Dhrystone of 2.1 if I remember correctly. Not exactly a huge difference. Depending on the changes IBM made, Espresso is possibly faster than 476FP by now. It's certainly faster when it comes to floating point math, as neither chip has a VMX unit, but Espresso supports paired singles.
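Assuming those Dhrystone figures are the usual per-MHz (DMIPS/MHz) ratings, the per-clock gap is only about 30%, and actual throughput then just scales with clock. A quick sketch in Python; the clock speeds in the loop are purely illustrative placeholders, not claimed 750CL/Espresso/476FP clocks:

# Per-clock Dhrystone comparison using the figures from the post (assumed to be DMIPS/MHz).
dmips_per_mhz_476fp = 2.7
dmips_per_mhz_750cl = 2.1

print(dmips_per_mhz_476fp / dmips_per_mhz_750cl)            # ~1.29x per clock

# Throughput = per-clock rating * clock; the clocks below are made up for illustration.
for clock_mhz in (750, 1000, 1250):
    print(clock_mhz, round(dmips_per_mhz_750cl * clock_mhz))  # total DMIPS at that clock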
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
A 750-class CPU has an ultra-short (by today's standards) pipeline, which grants certain 'general purpose' advantages over many current CPUs (read: higher per-clock performance and/or better determinism at general-purpose code), while at the same time making it impossible to hit today's typical clocks (2+GHz). Short pipeline giveth, short pipeline taketh.
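A toy model of why a short pipeline helps per-clock work: a branch mispredict costs roughly a pipeline's worth of cycles, so the shorter the pipeline, the fewer cycles are lost per mispredict. Everything below is an illustrative assumption (the branch and mispredict rates especially), not measured data for any of these CPUs:

# Toy CPI model: base CPI plus cycles lost to branch mispredicts (illustrative only).
def effective_cpi(base_cpi, branch_fraction, mispredict_rate, pipeline_depth):
    # mispredict penalty approximated as the pipeline depth, in cycles
    return base_cpi + branch_fraction * mispredict_rate * pipeline_depth

short = effective_cpi(1.0, 0.20, 0.10, 4)    # short 750-style pipeline -> 1.08
deep = effective_cpi(1.0, 0.20, 0.10, 20)    # hypothetical deep pipeline -> 1.40
print(short, deep, deep / short)             # the deep pipeline needs ~30% more cycles per instruction here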
 

Thraktor

Member
I think the speculation about the eDRAM is interesting. I think that it should easily have a 50 GB/s+ bandwidth to the GPU. But then, why do we see those slowdowns in some games with alpha blending? I guess we don't have enough data yet.

Looking at Renesas's eDRAM specs, the minimum bandwidth we'd be looking at is 65.57GB/s (edited; originally miscalculated as 68.75GB/s), although that's likely to be divided between different components.

Actually, Durante, as you're a hardware guy, do you have any thoughts on what I posted here, both as a potential configuration of the eDRAM with respect to the GPU, and as an explanation of the alpha-blending slowdown?

A 750-class CPU has an ultra-short (by today's standards) pipeline, which grants certain 'general purpose' advantages over many current CPUs (read: higher per-clock performance and/or better determinism at general-purpose code), while at the same time making it impossible to hit today's typical clocks (2+GHz). Short pipeline giveth, short pipeline taketh.

What with the OoO execution, short pipeline and large cache, it really looks like the kind of CPU you'd end up with if you wanted the best pathfinding performance possible within very small die size and thermal limits.
 

JordanN

Banned
Looking at Renesas's eDRAM specs, the minimum bandwidth we'd be looking at is 68.75GB/s, although that's likely to be divided between different components.
Is it a coincidence that it comes up close to the HD 4850's bandwidth (63.55 GB/s)? That was one of the first cards speculated for the Wii U, and I can't remember who said it, but Nintendo could have been trying to emulate those speeds.
 

Thraktor

Member
Is it a coincidence that it comes up close to the HD 4850's bandwidth (63.55 GB/s)? That was one of the first cards speculated for the Wii U, and I can't remember who said it, but Nintendo could have been trying to emulate those speeds.

Entirely coincidental. The eDRAM speed is based on a 1024-bit interface operating at 550MHz, which is simply the narrowest interface available for 32MB of Renesas 40nm eDRAM operating at the GPU's clock. Other possibilities are 275GB/s and 550GB/s, based on 4096-bit and 8192-bit interfaces, again operating at 550MHz.

Actually, I've just realised that my numbers were a bit off, due to my not taking into account the difference between 1000-base Hz and 1024-base bytes. The correct numbers should be:

1024-bit interface - 65.57GB/s
4096-bit interface - 262.26GB/s
8192-bit interface - 524.52GB/s
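For anyone who wants to check the arithmetic: bandwidth is just interface width times clock, and the earlier 68.75GB/s figure falls out of mixing a decimal megahertz clock with binary gigabytes. A quick sketch, assuming (as above) a 550MHz clock:

# eDRAM bandwidth = interface width (bits) / 8 * clock (Hz), expressed in binary GB (GiB).
clock_hz = 550e6
for bits in (1024, 4096, 8192):
    bytes_per_sec = bits / 8 * clock_hz
    print(bits, round(bytes_per_sec / 2**30, 2))   # 65.57, 262.26, 524.52

# The earlier 68.75 figure: 128 bytes * 550 (decimal) MHz = 70,400 "MB/s",
# then divided by 1024 as if those megabytes were binary -> 68.75.
print(128 * 550 / 1024)                            # 68.75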
 

japtor

Member
Edit: On the 512MB NAND chip, I'm guessing it's for Wii BC. The main Wii U flash is eMMC, which is kind of like a mini SSD (has controllers, etc. onboard), and it's possible that the console can't access it in Wii mode, so they just threw in some NAND for Wii mode to use.
Completely forgot about that. Does the Wii transfer take up any of the Wii U's storage, or does it just fill up some invisible (to the user) 512MB block?
Of course none of the 750 line supports SMP. It was a single-core architecture. My question is: why does Nintendo have such a fetish for the G3 architecture?
Other than BC, probably cost and power draw, and being good enough as a performance target for however many years in their eyes. Ideally the GPU could act like an APU as far as SIMD instruction set flexibility goes, but I'm guessing that's not the case, or there'd probably be fewer complaints about computing power (barring difficulties of implementation).
So basically... they're already bleeding money, so why not bleed some more?
Because they're not horrible businessmen... perhaps a bit short-sighted and idealistic when it comes to third parties, but that's their gambling side, I guess. Hell, it's a gamble either way if you look back to the GameCube; since the bet on third parties is unreliable, they're controlling costs where they can.
That would be a waste of silicon, the 750 line isn't as bad as some might believe. The 476FP has a Dhrystone of 2.7. That was supposedly the highest Dhrystone of any embedded processor back in 2010. The 750CL (Broadway) had a Dhrystone of 2.1 if I remember correctly. Not exactly a huge difference. Depending on the changes IBM made, Espresso is possibly faster than 476FP by now. It's certainly faster when it comes to floating point math, as neither chip has a VMX unit, but Espresso supports paired singles.
Would this be one of those changes that'd effectively make it a new chip by most conventions or just a sort of bolt on feature?
What with the OoO execution, short pipeline and large cache, it really looks like the kind of CPU you'd end up with if you wanted the best pathfinding performance possible within very small die size and thermal limits.
...or probably one of the newer ARM cores at this point. Or Intel's mobilized Atom coming from the other end and presumably AMD's low end cores. But I'm guessing other competitive solutions would cost more and not have the benefit of BC (without another added cost) and familiarity, which may have avoided some additional development transition costs.
 

wsippel

Banned
Would this be one of those changes that'd effectively make it a new chip by most conventions or just a sort of bolt on feature?
I don't understand your question? It's a feature found in all Nintendo consoles since the GameCube, but it seems to be exclusive to (certain?) PPC750 cores. The more modern 476FP, which some people considered a good candidate for Wii U, doesn't support paired singles, which essentially means that it only provides half the peak floating point performance of Gekko, Broadway or Espresso.
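The "half the peak floating point performance" bit comes down to flops per cycle: paired singles let one instruction operate on two single-precision values, and combined with a multiply-add that's 4 flops per core per cycle versus 2 without them. A rough sketch; the triple-core count comes from earlier in the thread, the 1.0GHz clock is purely a placeholder, and this ignores everything except peak math:

# Peak single-precision GFLOPS under a simple flops-per-cycle model (illustrative only).
def peak_gflops(cores, clock_ghz, paired_singles, fma=True):
    flops_per_cycle = (2 if paired_singles else 1) * (2 if fma else 1)
    return cores * clock_ghz * flops_per_cycle

print(peak_gflops(3, 1.0, paired_singles=True))    # 12.0 with paired singles
print(peak_gflops(3, 1.0, paired_singles=False))   # 6.0 without - the "half" wsippel means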
 

ozfunghi

Member
I don't understand your question? It's a feature found in all Nintendo consoles since the GameCube, but it seems to be exclusive to (certain?) PPC750 cores. The more modern 476FP, which some people considered a good candidate for Wii U, doesn't support paired singles, which essentially means that it only provides half the peak floating point performance of Gekko, Broadway or Espresso.

So "best of both words"...? Relatively speaking :)
 

Durante

Member
Thinking about the whole CPU floating point performance / GPGPU issue: if we ever get a GFlop number for the Wii U GPU, should we assume that ~100 of them are "used up" simply making up for the CPU deficit relative to Xenon/Cell? In comparisons with PS3/360, that is.

Looking at Renesas's eDRAM specs, the minimum bandwidth we'd be looking at is 65.57GB/s (edited; originally miscalculated as 68.75GB/s), although that's likely to be divided between different components.

Actually, Durante, as you're a hardware guy, do you have any thoughts on what I posted here, both as a potential configuration of the eDRAM with respect to the GPU, and as an explanation of the alpha-blending slowdown?
I'm not really a hardware guy, a computer architecture guy maybe. Your explanation seems as reasonable as any I've seen yet (I agree that it's unlikely you'd want the DDR3 accesses to go through the same data path as the eDRAM accesses), but -- as I'm not a hardware guy ;) -- I don't know how feasible it is to give the SIMD arrays direct access to the eDRAM as in your proposed architecture.
 

ikioi

Banned
Thinking about the whole CPU floating point performance / GPGPU issue: if we ever get a GFlop number for the Wii U GPU, should we assume that ~100 of them are "used up" simply making up for the CPU deficit relative to Xenon/Cell? In comparisons with PS3/360, that is.

No way would it be anywhere near 100 GFLOPS; 50 at most, IMHO.
 

ikioi

Banned
...Wasn't the GFLOP count on the G3 series really damn low? And besides, the SIMD is apparently pretty bleh.

There is no way the Cell or Xenon processors do 100 gigaflops in games; I doubt they could crack anywhere near that even in simulated environments. The ATi and Nvidia GPUs in the Xbox 360 and PS3 were rated around 250 gigaflops; the CPUs at most would have been 50.
 

TunaLover

Member
Not sure if posted yet...

About Wii U CPU

Hector Martin @marcan42
@AminKhajehnassi @eubank_josh we suspect a cross between the 750CL and the 750FX but it's unclear. The SMP is new anyway.
 

TheD

The Detective
There is no way the Cell or Xenon processors do 100 gigaflops in games; I doubt they could crack anywhere near that even in simulated environments. The ATi and Nvidia GPUs in the Xbox 360 and PS3 were rated around 250 gigaflops; the CPUs at most would have been 50.

No.

All of that is theoretical peak only, including the GPUs.

Even new GPUs have problems reaching it vs CPUs.
 

ikioi

Banned
No.

All of that is theoretical peak only, including the GPUs.

Even new GPUs have problems reaching it vs CPUs.

Which is exactly what I'm saying.

The Xbox 360 and PS3's quoted GFLOPS are basically bullshit theoretical best-case maximums. In the real world, neither console's GPU or CPU could ever output anywhere near the claimed figures.

So there's no way on earth 100 GFLOPS of the Wii U's GPU would be taken up by tasks offloaded to it from the CPU. Cell and Xenon would never have done anywhere near those sorts of numbers irrespective of task, IPC, SIMD, etc.
 
But GPGPU is almost by definition a brute-force approach. It's all well and good to say that you can't achieve the peak efficiency quoted for the other consoles' CPUs, but the fact is that trying to do the same work on a GPU is way more wasteful. People only do it because the sheer number of calculations you can bring to bear on the problem is so much higher with so many ALUs on a GPU. But all the people out there using GPGPU for scientific research, supercomputers and bitcoin mining aren't stealing time from graphics work to do non-graphics calculations. Moving work from the CPU to the GPU may in some cases get the work done faster, but it is absurd to suggest it is a more efficient use of resources. This is especially true on older GPU architectures like the one the Wii U employs, compared to recent architectures like GCN and Fermi that explicitly target higher GPGPU performance.
 

pottuvoi

Banned
Which is exactly what I'm saying.

The Xbox 360 and PS3's quoted GFLOPS are basically bullshit theoretical best-case maximums. In the real world, neither console's GPU or CPU could ever output anywhere near the claimed figures.

So there's no way on earth 100 GFLOPS of the Wii U's GPU would be taken up by tasks offloaded to it from the CPU. Cell and Xenon would never have done anywhere near those sorts of numbers irrespective of task, IPC, SIMD, etc.
I'm pretty sure that on simple tasks Cell can get near its maximum performance.

Overall, all these systems have allowed us to hit a mean 65% of active SPU usage, while reducing PPU latencies and wait times to zero. (source)
Certainly doesn't sound that bad and would indicate quite a decent throughput. (I would bet that it gets over 50 GFLOPS, at least in some cases... ;)
 

Durante

Member
There is no way the Cell or Xenon processors do 100 gigaflops in games, i doubt they could crack anywhere near that in simulated environments. The ATi and Nvidia GPUs in the Xbox 360 and PS3 were rated around 250gigaflops, the CPUs at most would have been 50.
Three points:
(1) Cell in its configuration in PS3 is rated around 200 GFlops.
(2) I don't know about "simulated environments", but people have reached 90%+ utilization on Cell.
(3) I don't think that GPGPU effectiveness would necessarily be higher (or easier to achieve) than SPE effectiveness.

In light of that, at least vis-a-vis Cell, I think my number isn't nearly as ridiculous as you make it appear.
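For reference, the ~200 GFlops figure in point (1) is easy to reconstruct from the commonly cited PS3 Cell configuration. These are peak theoretical numbers only, and the ~65% figure pottuvoi quoted is SPU occupancy rather than flops efficiency, so the last line is just a rough scaling, not a measurement:

# Commonly cited peak single-precision numbers for Cell as configured in the PS3.
spe_count = 7                  # 8 SPEs on the die, 7 enabled in the PS3
clock_ghz = 3.2
flops_per_spe_cycle = 8        # 4-wide single-precision SIMD with multiply-add

spe_gflops = spe_count * clock_ghz * flops_per_spe_cycle     # 179.2
ppe_gflops = clock_ghz * 8                                   # ~25.6 for the PPE's VMX unit
print(spe_gflops, spe_gflops + ppe_gflops)                   # ~179 SPE-only, ~205 total peak

# Scaling the SPE peak by the ~65% SPU usage quoted above (a rough proxy only):
print(0.65 * spe_gflops)                                     # ~116 GFLOPS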
 