shinra-bansho
Member
http://forum.beyond3d.com/showpost.php?p=1683404&postcount=3737

Where does this idea come from?
It may be "baseless speculation" as suggested above though.
Meltdowns, meltdowns. Get your meltdowns.
-----
Anyway, can someone elaborate on how exactly "GPGPU" is going to be the Wii U's panacea, as it has been made out to be?
From reading B3D, the impression I'm getting is that it likely isn't going to be that much use.
There was also some interesting speculation that seemed to imply that the eDRAM may not be any sort of magic bullet either... that its bandwidth could be as poor as the system RAM's.
Stream out: Allows you to write directly to memory from a geometry or vertex shader bypassing the pixel shader (and ROPs).
Texture Arrays (including cube maps): This allows an array of textures to be accessed as a single texture. Can be used like a texture atlas without the filtering problems.
BC4/BC5: New texture compression formats. The most common use is probably for compressing normal maps.
MSAA textures / Shader Model 4.1 extended MSAA: Extensions to allow MSAA surfaces to be more flexibly used. In particular the ability to use MSAA with deferred shading techniques is improved.
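As an aside on the BC4/BC5 point above: BC5 keeps only the X and Y channels of a normal map, and the shader rebuilds Z from the unit-length constraint. A minimal Python sketch of that reconstruction (the function name is mine; real code would do this in the pixel shader):

```python
import math

def reconstruct_normal(x, y):
    """Rebuild a unit normal from the two BC5 channels.

    BC5 stores X and Y in [0, 1]; expand them to [-1, 1] and recover Z
    from the unit-length constraint z = sqrt(1 - x^2 - y^2).
    """
    nx = x * 2.0 - 1.0
    ny = y * 2.0 - 1.0
    # clamp guards against compression error pushing x^2 + y^2 past 1
    nz = math.sqrt(max(0.0, 1.0 - nx * nx - ny * ny))
    return (nx, ny, nz)
```

This is why dropping the Z channel is "free" for normal maps: two well-compressed channels beat three poorly-compressed ones.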
Yeah, Xenos is pretty much a prototype "Direct3D 10" part.
That's one of the reasons I'm doing this exercise, actually. People tend to think of hardware as being in more discrete generations than it actually is. And I'm trying to get a sense of where along the line the R700 actually sits.
Yeah, Xenos is pretty much a prototype "Direct3D 10" part.
Indeed.
I'm also curious if the GPU7 is pretty much an R700 part with some modifications for backwards compatibility and the changed memory configuration, or if any Evergreen sneaked in there.
U-GPU has its own ISA, neither Wekiva nor Evergreen.
Is Fetch4 the same as Gather4? I believe it was added in 10.1, and then made more general in 11.
Oh right, I think they added jitter support for DX11.
If rumours have PS4/720 using an APU + Discrete GPU, then the GPU on the APU could be used for GPGPU processing, allowing the main CPU to be 'relatively' weak, performing CELL SPE type tasks on the GPGPU part, and using eDRAM to mitigate large amounts of slower main memory.

That could be a mirror of what WiiU has - relatively weak CPU, GPGPU elements and eDRAM. Just a lot faster.

So if this happens, developers will need to adapt to the new architectures, and will by default therefore be more set up to code for WiiU.

The big question is - where does WiiU sit compared to PS3/360 and PS4/720? If it's closer to the new guys, then sharing a similar architecture may help, with some trimming of assets here and there. But if it's closer to PS3/360, then it may be an active hindrance. Developers will be producing ports from PS3/360 engines, but those port teams may not be familiar with the new architecture, and so they may never leverage the strengths of the platform.

My main concern would be SIMD support in the other CPUs. Whether that's implemented through the APU/GPU is a moot point, since it sounds like Jaguar supports a bunch of SIMD instruction sets (barring MS/Sony sticking in crippled cores?). Something might be implemented one way and be difficult to implement in OpenCL or whatever is usually used on GPUs. Of course, we don't really know what the hell the Wii U's GPU will support and/or if the SDK is limited/incomplete with regards to it. I'm just assuming the minimum for this example since... well again, it's not like we know anything else about it.
Back on topic: on the VGLeaks spec sheet, it mentioned something about 512MB NAND for the system - is this likely to be the max size of the installed OS? Or is this just for storing basic compulsory system software?

The rumor was SLC NAND I thought, which made me think it'd be pointless for OS storage. SLC is used for its write endurance (and speed, but advances have made that mostly moot), so I figured it'd be a big ass write cache/buffer of some sort rather than just a place for the OS to sit. My crazy theory was to use it as a sort of secondary loading/streaming buffer: instead of being limited by disc speed and/or using some RAM as a buffer to compensate, the NAND could be used in some way to help out.
This is basically memexport on Xenos.
Xenos had limited support for texture arrays up to 64 surfaces. IIRC, D3D10 went up to 512 (along with the cubemap stuff).
Xenos had BC5 (DXN/3Dc).
Support is a non-issue for PS360; the bigger problem is the increased memory requirements & performance (aliasing larger RTs to store the resultant samples + shading/lighting said samples). Its inclusion in DX10.1 just highlights the API issues that were on PC.
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?
Which means?
Wasn't FX older than CL?
The current discussion on B3D is just a few people who have absolutely no idea how the Gamecube and Wii memory architecture worked making wild and completely baseless assumptions.
The 750CL is effectively just Broadway/Gekko, although it was only sold as a stand-alone processor much later. The 750FX would be a year or two newer.
Edit: On the 512MB NAND chip, I'm guessing it's for Wii BC. The main Wii U flash is eMMC, which is kind of like a mini SSD (has controllers, etc. onboard), and it's possible that the console can't access it in Wii mode, so they just threw in some NAND for Wii mode to use.
Thanks. So is it a newer SMP implementation, or is it SMP at all?
Also thanks for the NAND info; I asked the same question last week in the other topic but didn't get an answer. I also figured it was either OS or BC.
None of the 750 series CPUs support SMP, so that's going to be an entirely new implementation.
Of course none of the 750 line supports SMP. It was a single-core architecture. My question is why does Nintendo have such a fetish for the G3 architecture?
That's it? Couldn't they just have a Wii SoC on the MCM?
Budget?
I never asked this, but I am curious, why does pretty much every tech comparison with the Wii-U go with the 360? I don't think I've seen literally any article directly comparing the Wii-U to the PS3. Is it cause both the 360 and Wii-U have triple core architectures and GPUs made by Ati (or AMD, as they're called nowadays I guess)?
There are lots of reasons why they are much easier to compare:
They were already taking a loss on it. Would it really be that much more expensive to add a Wii SoC onto it? What does a Wii cost to build now? $20?
That's it? Couldn't they just have a Wii SoC on the MCM?

That would be a waste of silicon; the 750 line isn't as bad as some might believe. The 476FP has a Dhrystone score of 2.7 DMIPS/MHz. That was supposedly the highest of any embedded processor back in 2010. The 750CL (Broadway) scored 2.1 DMIPS/MHz if I remember correctly. Not exactly a huge difference. Depending on the changes IBM made, Espresso is possibly faster than the 476FP by now. It's certainly faster when it comes to floating point math: neither chip has a VMX unit, but Espresso supports paired singles.
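For what it's worth, those Dhrystone numbers are per-MHz ratings (DMIPS/MHz), so here's a rough sketch of what they'd imply at a given clock. The ~1243 MHz Espresso clock and the naive linear scaling are both assumptions:

```python
def dmips_total(dmips_per_mhz, clock_mhz, cores=1):
    """Naive total DMIPS: per-MHz rating x clock, assuming perfectly
    linear scaling with clock and core count (an idealisation)."""
    return dmips_per_mhz * clock_mhz * cores

# 750CL-class rating at the rumoured Espresso clock (both assumed)
per_core = dmips_total(2.1, 1243.125)           # ~2611 DMIPS per core
all_cores = dmips_total(2.1, 1243.125, cores=3)
```

Real throughput would depend on cache and memory behaviour, so treat these as upper bounds, not predictions.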
A 750-class CPU has ultra-short (by today's standards) pipeline, which grants certain 'general purpose' advantages over many current CPUs (read: higher per-clock performance and/or better determinism at general-purpose code), while at the same time making hitting the typical clocks today (2+GHz) impossible. Short pipeline giveth, short pipeline taketh.
I think the speculation about the eDRAM is interesting. I think that it should easily have a 50 GB/s+ bandwidth to the GPU. But then, why do we see those slowdowns in some games with alpha blending? I guess we don't have enough data yet.
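On the alpha-blending point: blending has to read the destination pixel as well as write it, so blended overdraw roughly doubles framebuffer traffic. A back-of-envelope sketch - every number plugged in below is an assumption for illustration, not a measurement:

```python
def blend_bandwidth_gbs(width, height, bytes_per_pixel, overdraw, fps):
    """Rough framebuffer bandwidth for alpha-blended overdraw:
    each blended pixel costs one read plus one write."""
    pixels = width * height * overdraw
    bytes_per_frame = pixels * bytes_per_pixel * 2  # read + write
    return bytes_per_frame * fps / 1e9

# e.g. 720p, 32bpp, 4x blended overdraw at 60 fps (all assumed)
cost = blend_bandwidth_gbs(1280, 720, 4, 4, 60)  # ~1.77 GB/s
```

That looks small next to a 50 GB/s+ figure, which is why sustained eDRAM bandwidth (or the path to it) being lower than the headline number is the more interesting explanation for the slowdowns.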
Looking at Renesas's eDRAM specs, the minimum bandwidth we'd be looking at is 68.75GB/s, although that's likely to be divided between different components.

Is it coincidence that that comes up close to the HD 4850's bandwidth (63.55 GB/s)? One of the first cards speculated for the Wii U, and I can't remember who said it, but Nintendo could have been trying to emulate those speeds.
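Peak bandwidth figures like these fall straight out of bus width times clock. A quick sketch - the 1024-bit interface width is an assumption on my part, and the 550 MHz GPU clock is the rumoured one; the slightly different quoted figures presumably assume a different width or clock:

```python
def bandwidth_gbs(bus_width_bits, clock_mhz):
    """Peak bandwidth = bytes per transfer x transfers per second."""
    return (bus_width_bits / 8) * clock_mhz * 1e6 / 1e9

# e.g. an assumed 1024-bit eDRAM interface at the rumoured 550 MHz
peak = bandwidth_gbs(1024, 550)  # 70.4 GB/s
```

Note this is the theoretical ceiling for the interface; anything shared between the GPU, ROPs and CPU would see less in practice.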
Edit: On the 512MB NAND chip, I'm guessing it's for Wii BC. The main Wii U flash is eMMC, which is kind of like a mini SSD (has controllers, etc. onboard), and it's possible that the console can't access it in Wii mode, so they just threw in some NAND for Wii mode to use.

Completely forgot about that. Does the Wii transfer take up any of the Wii U's storage, or does it just fill up some invisible (to the user) 512MB block?
Of course none of the 750 line supports SMP. It was a single-core architecture. My question is why does Nintendo have such a fetish for the G3 architecture?

Other than BC, probably cost and power draw, and being good enough as a performance target for however many years in their eyes. Ideally the GPU could act like an APU as far as SIMD instruction set flexibility goes, but I'm guessing that's not the case, or there'd probably be fewer complaints about computing power (barring difficulties of implementation).
So basically... they're already bleeding money, so why not bleed some more?

Cause they are not horrible businessmen... perhaps a bit short-sighted and idealistic when it comes to third parties, but that's their gambling side, I guess. Hell, it's a gamble either way if you look back to the GameCube; since the bet on third parties is unreliable, they're controlling costs where they can.
That would be a waste of silicon, the 750 line isn't as bad as some might believe. The 476FP has a Dhrystone of 2.7. That was supposedly the highest Dhrystone of any embedded processor back in 2010. The 750CL (Broadway) had a Dhrystone of 2.1 if I remember correctly. Not exactly a huge difference. Depending on the changes IBM made, Espresso is possibly faster than 476FP by now. It's certainly faster when it comes to floating point math, as neither chip has a VMX unit, but Espresso supports paired singles.

Would this be one of those changes that'd effectively make it a new chip by most conventions, or just a sort of bolt-on feature?
What with the OoO execution, short pipeline and large cache, it really looks like the kind of CPU you'd end up with if you wanted the best pathfinding performance possible within very small die size and thermal limits.

...or probably one of the newer ARM cores at this point. Or Intel's mobilized Atom coming from the other end, and presumably AMD's low-end cores. But I'm guessing other competitive solutions would cost more and not have the benefit of BC (without another added cost) and familiarity, which may have avoided some additional development transition costs.
Would this be one of those changes that'd effectively make it a new chip by most conventions, or just a sort of bolt-on feature?

I don't understand your question? It's a feature found in all Nintendo consoles since the Gamecube, but seems to be exclusive to (certain?) PPC750 cores. The more modern 476FP, which some people considered a good candidate for Wii U, doesn't support paired singles, which essentially means that it only provides half the peak floating point performance of Gekko, Broadway or Espresso.
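The "half the peak floating point performance" bit is just arithmetic: paired singles issue two single-precision operations per FPU instruction, and a fused multiply-add counts as two flops. A sketch with the rumoured clock (every figure here is an assumption):

```python
# Rumoured Espresso clock and core count (assumptions)
CLOCK_GHZ = 1.24
CORES = 3
FLOPS_PER_FMA = 2  # a fused multiply-add counts as two flops

# Paired singles: 2 single-precision lanes per FPU instruction
with_ps = CLOCK_GHZ * CORES * 2 * FLOPS_PER_FMA      # ~14.9 GFLOPS peak
# The same cores without paired singles (1 lane): exactly half
without_ps = CLOCK_GHZ * CORES * 1 * FLOPS_PER_FMA
```

So a 476FP-based design at the same clock would halve the theoretical peak, which is presumably why Nintendo kept the feature for BC and performance alike.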
Looking at Renesas's eDRAM specs, the minimum bandwidth we'd be looking at is 65.57GB/s (edited from 68.75GB/s), although that's likely to be divided between different components.

I'm not really a hardware guy, a computer architecture guy maybe. Your explanation seems as reasonable as any I've seen yet (I agree that it's unlikely you'd want the DDR3 accesses to go through the same data path as the eDRAM accesses), but -- as I'm not a hardware guy -- I don't know how feasible it is to give the SIMD arrays direct access to the eDRAM as in your proposed architecture.
Actually, Durante, as you're a hardware guy, do you have any thoughts on what I posted here, both as a potential configuration of the eDRAM with respect to the GPU, and as an explanation of the alpha-blending slowdown?
So basically... they're already bleeding money, so why not bleed some more?
Thinking about the whole CPU floating point performance / GPGPU issue: if we ever get a GFlop number for the Wii U GPU, should we assume that ~100 of them are "used up" simply to keep up with Xenon/Cell? In comparisons with PS3/360, that is.
No way would it be anywhere near 100gflops, 50 at most imho.
...Wasn't the GFLOP count on the G3 series really damn low? And besides, the SIMD is apparently pretty bleh.
Not sure if posted yet...
About Wii U CPU
Hector Martin @marcan42
@AminKhajehnassi @eubank_josh we suspect a cross between the 750CL and the 750FX but it's unclear. The SMP is new anyway.
There is no way the Cell or Xenon processors do 100 gigaflops in games; I doubt they could get anywhere near that even in simulated environments. The ATi and Nvidia GPUs in the Xbox 360 and PS3 were rated around 250 gigaflops; the CPUs at most would have been 50.
No.
All of that is theoretical peak only, including the GPUs.
Even new GPUs have problems reaching it vs CPUs.
I'm pretty sure that on simple tasks Cell can get near its maximum performance.
Which is exactly what I'm saying.
The Xbox 360 and PS3's quoted gflops are basically bullshit theoretical best maximums. In the real world neither console's GPU or CPUs could ever output anywhere near the claimed figures.
So no way on earth would 100gflops of the Wii U's GPU be taken up by tasks offloaded to it from the CPU. Cell and Xenon never would have done anywhere near those sorts of numbers irrespective of task, IPC, SIMD, etc.
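For reference, those "theoretical best maximums" are typically derived as clock x units x SIMD width x 2 (for a fused multiply-add). A sketch using the usual published configurations - these are assumed configs, and the real marketing figures were sometimes higher still because they counted extra units:

```python
def peak_gflops(clock_ghz, units, simd_width, flops_per_lane=2):
    """Theoretical peak, assuming every unit retires a full-width
    fused multiply-add every single cycle (never true in practice)."""
    return clock_ghz * units * simd_width * flops_per_lane

# Xenon: 3 cores, 4-wide VMX128, FMA (assumed config)
xenon = peak_gflops(3.2, 3, 4)   # 76.8 GFLOPS peak
# Cell: 6 game-usable SPEs, 4-wide single precision, FMA (assumed config)
cell = peak_gflops(3.2, 6, 4)    # 153.6 GFLOPS peak
```

Sustained throughput is a fraction of these, since nothing keeps every unit busy with FMAs every cycle - which is the point being made above.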
Overall, all these systems have allowed us to hit a mean 65% of active SPU usage, while reducing PPU latencies and wait times to zero. (source)

Certainly doesn't sound that bad, and would indicate quite decent throughput. (I would bet that it gets over 50gflops, at least in some cases.)
There is no way the Cell or Xenon processors do 100 gigaflops in games, i doubt they could crack anywhere near that in simulated environments. The ATi and Nvidia GPUs in the Xbox 360 and PS3 were rated around 250gigaflops, the CPUs at most would have been 50.

Three points: