It would have taken you much less time to look it up than to type two big paragraphs. Again, unless you back up your claims, it's nothing but hot air.
What makes you think that I'll find it faster than you would? I've tried, but most of the results lead to guys trying to add a dedicated PhysX GeForce alongside their Radeons. If you don't trust me -- that's your choice.
Again, it's probably a reference to the PPU. They might have wanted to differentiate between PhysX (the SDK) and the PPU. Charlie uses the AGEIA reference too, and for good reason.
That PPU was called PhysX P1. It's pretty easy to differentiate the P1 from the SDK, since the first is a chip/card and the second is software (which is kind of hard to build into hardware).
Charlie's using a lot of stuff in that piece and most of it doesn't even make sense -- although that's normal for his write-ups.
Charlie said:
Kepler is said to have a very different shader architecture from Fermi, going to much more AMD-like units, caches optimised for physics/computation, and clocks said to be close to the Cayman/Tahiti chips.
What the hell are "AMD-like units"? If that means "GCN-like units", then those units are in turn a lot like NV's CUDA cores, which have been used since G80. So if Kepler is using them, it's basically using the same units as every NV GPU since G80 -- why call them "AMD-like"? And I don't think NV has gone VLIW (which was AMD-like) in Kepler, since Kepler is supposed to be a continuation of the Tesla->Fermi architecture, and it just doesn't make any sense for them to do it now, when even AMD has switched to G80-like scalar execution.
Charlie said:
Performance is likewise said to be a tiny bit under 3TF from a much larger shader count than previous architectures. This is comparable to the 3.79TF and 2048 shaders on AMD’s Tahiti, GK104 isn’t far off either number. With the loss of the so called “Hot Clocked” shaders, this leaves two main paths to go down, two CUs plus hardware PhysX unit or three. Since there is no dedicated hardware physics block, the math says each shader unit will probably do two SP FLOPs per clock or one DP FLOP.
There are no CUs in NVIDIA's GPUs. What is he talking about? Why not use NV's terms for NV's GPUs, especially since they've been around much longer than GCN's "CU"? And what does having (or not having) a "dedicated hardware physics block" have to do with how many FLOPs a "shader" can do? This whole paragraph just doesn't make sense.
Charlie said:
but also leads to questions of how those shaders will be fed with only a 256-bit memory path
Yeah, that's a brand-new problem. It's not like we've had it for ages already. Kepler is surely unique here. Next time Charlie will discover that compute capabilities are improving much faster than off-chip bandwidth, and after that he'll discover that on-chip caches are there for a reason and that complex GPU effects/features like POM, DoF and tessellation basically exist to create something on-chip while waiting for external data. So much to learn.
Charlie said:
The net result is that shader utilisation is likely to fall dramatically, with a commensurate loss of real world performance compared to theoretical peak.
On what code? "Shader" utilisation (I think it's time to start calling them compute cores, CUDA cores or shader processors, because "shaders" are programs, and programs don't have "utilisation") is always relative to whatever is running on them. Without software, SP utilisation is always zero. So if we have a game like Crysis 2 DX11, which uses a lot of shader calculations, then SP utilisation will be high. And if we're running GLQuake, utilisation will be low. SP utilisation isn't something that can be judged outside of what's running on the SPs.
Charlie said:
In the same way that AMD’s Fusion chips count GPU FLOPS the same way they do CPU FLOPS in some marketing materials, Kepler’s 3TF won’t measure up close to AMD’s 3TF parts.
It is completely the other way around right now. The newly launched HD7950 has 2.9TFlops of peak compute performance, while the GTX580, which is more or less on par with it in real-world performance, has only 1.58TFlops peak. Why would that change so suddenly with Kepler? Charlie's reasoning about it being severely limited by the 256-bit bus makes zero sense here: a 256-bit bus with fast GDDR5 will give GK104 more bandwidth than the GTX580 had. Also...
Charlie said:
Benchmarks for GK104 shown to SemiAccurate have the card running about 10-20% slower than Tahiti. On games that both heavily use physics related number crunching and have the code paths to do so on Kepler hardware, performance should seem to be well above what is expected from a generic 3TF card. That brings up the fundamental question of whether the card is really performing to that level?
What level? 10-20% slower than Tahiti, which has a 3.79TFlops peak, is 3.0-3.4TFlops -- exactly what he's saying GK104 will have. If it beats Tahiti with such specs in some physics- or compute-optimised benchmarks, that'd be great, because by Charlie's own raw numbers it looks like it shouldn't. (By the way, what benchmarks are these? There are zero new PC game releases between now and Metro LL in Q4, so all the benchmarks are already here -- which are the "games that both heavily use physics related number crunching and have the code paths to do so on Kepler hardware"? Batman AC? BF3? Crysis 2?)
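For anyone who wants to check the arithmetic: peak single-precision throughput is simply ALU count × 2 FLOPs (one multiply-add) × clock, using the launch specs of these cards:

HD7950: 1792 SPs × 2 × 0.80 GHz ≈ 2.87 TFlops
GTX580: 512 SPs × 2 × 1.544 GHz (hot clock) ≈ 1.58 TFlops
Tahiti/HD7970: 2048 SPs × 2 × 0.925 GHz ≈ 3.79 TFlops

And taking Charlie's 10-20% at face value as a pure FLOPs gap: 3.79 × 0.8...0.9 ≈ 3.0-3.4 TFlops, which is right where his own GK104 figure sits.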
Charlie said:
If physics code is the bottleneck in your app, A goal Nvidia appears to actively code for, then uncorking that artificial impediment should make an app positively fly. On applications that are written correctly without artificial performance limits, Kepler’s performance should be much more marginal.
If PhysX code is the bottleneck in your app, then chances are that this app, with GPU PhysX on, is already faster on some GTX560 than on an HD7970 right now. No need for Kepler/GK104. Problem solved. facepalm.jpg
Charlie said:
Since Nvidia is pricing GK104 against AMD’s mid-range Pitcairn ASIC, you can reasonably conclude that the performance will line up against that card, possibly a bit higher. If it could reasonably defeat everything on the market in a non-stacked deck comparison, it would be priced accordingly, at least until the high end part is released.
Is this basically all the reasoning behind everything written above? Sure, Charlie, it's not like any company has ever started a price war by putting a product on the market at a much lower price than its competitors. Pitcairn is supposed to be at GF114/GTX560 levels of performance. I don't know how badly NV would need to screw up to end up with a GK104 that has the same performance as GF114. It's simply impossible, because a straight shrink of GF114 to 28nm would give them better performance while being a much smaller chip than GK104 is supposed to be.
Charlie said:
All of the benchmark numbers shown by Nvidia, and later to SemiAccurate, were overwhelmingly positive. How overwhelmingly positive? Far faster than an AMD HD7970/Tahiti for a chip with far less die area and power use, and it blew an overclocked 580GTX out of the water by unbelievable margins. That is why we wrote this article. Before you take that as a backpedal, we still think those numbers are real, the card will achieve that level of performance in the real world on some programs.
The problem for Nvidia is that once you venture outside of that narrow list of tailored programs, performance is likely to fall off a cliff, with peaky performance the likes of which haven’t been seen in a long time. On some games, GK104 will handily trounce a 7970, on others, it will probably lose to a Pitcairn.
A smart man would assume that he's been shown cherry-picked benchmarks in which GK104 is much faster than the 7970 (which is very impressive by itself), and that in all the other, not-so-cherry-picked benchmarks they'll end up more or less close to each other, with GK104 losing 10-20% on average (which would be impressive as well, considering that GK104 supposedly has fewer FLOPs, at best 67% of the bandwidth, and is a less complex design). But Charlie somehow arrives at the conclusion that it'll lose even to Pitcairn in other programs, which is baffling to say the least. Why not go straight to Cape Verde while we're at it?
Charlie said:
Nvidia is going out of their way to have patches coded for games that tend to be used as benchmarks by popular sites.
Noted -- I'll see how many benchmarks actually get Kepler-specific patches. (I'll be amused if it's more than two or three of them in total. It's generally close to impossible to push an ISV to patch even his own bugs out of his game.)
Charlie said:
Since Nvidia’s Fermi generation GPUs are very good at handling stencil buffers, they perform very well on this code.
As far as I remember, Fermi is identical to Evergreen and NI in its handling of stencil buffers (I suppose he's talking about depth buffer fillrate here).
Charlie said:
Since most modern GPUs can compute multiple triangles per displayable pixel
They can't. That would simply kill performance, even on Fermi.
Charlie said:
Since most modern GPUs can compute multiple triangles per displayable pixel on any currently available monitor, usually multiple monitors, doubling that performance is a rather dubious win. Doubling it again makes you wonder why so much die area was wasted.
Sure, because every PC game now has a lot of triangles in every screen pixel. And tessellation is everywhere, and it doesn't kill performance on Radeons at all. Clearly we don't need more tessellation performance, since it's all triangles everywhere now. doublefacepalm.jpg
Charlie said:
If the purported patch does change performance radically on specific cards, is this legitimate GPU performance? Yes. How about if it raises performance on Kepler cards while decreasing performance on non-Kepler cards to a point lower than pre-patch levels? How about if it raises performance on Kepler cards while decreasing performance only on non-Nvidia cards? Which scenario will it be? Time will tell.
How about you shut up until time tells you something, then? This is the reason why I don't like his write-ups: he usually has some solid info, but it's almost lost in between loads of such bullshit coming from him and him only.
Charlie said:
This is important because it strongly suggests that Nvidia is accelerating their own software APIs on Kepler without pointing it out explicitly. Since Kepler is a new card with new drivers, there is no foul play here, and it is a quite legitimate use of the available hardware.
What software APIs are those? PhysX? So they accelerate only two games from 2011 while basically ignoring all the others, like Crysis 2, The Witcher 2 and BF3? That's a smart move. /sarcasm
Charlie said:
Then again, they have been proven to degrade the performance of the competition through either passive or active methods.
And the competition has been proven to do the same to them. News at eleven.
Charlie said:
Since Nvidia controls the APIs and middleware used, the competition can not ‘fix’ these ‘problems with the performance of their hardware’.
Again, what APIs is he talking about? Crysis 2 doesn't use any NVIDIA APIs. Battlefield 3 doesn't use any NVIDIA APIs.
Charlie said:
Is the performance of Kepler cards legitimate? Yes. Is it the general case? No. If you look at the most comprehensive list of supported titles we can find, it is long, but the number of titles released per year isn’t all that impressive, and anecdotally speaking, appears to be slowing.
When Kepler is released, you can reasonably expect extremely peaky performance. For some games, specifically those running Nvidia middleware, it should fly. For the rest, performance is likely to fall off the proverbial cliff. Hard. So hard that it will likely be hard pressed to beat AMD’s mid-range card.
And for the third time: what NVIDIA middleware? Why would the performance of a 3TFlops part "fall off the proverbial cliff" "so hard that it will likely be hard pressed to beat AMD’s mid-range card" -- a card rumoured to have only 1408 SPs, which would give it ~2.68TFlops at 950 MHz on exactly the same 256-bit GDDR5 memory bus? And what is he smoking, and where can I get some too?
Charlie said:
What does this mean in the end?
I wonder.
This, and especially when NVIDIA is trying to get devs to use their special stuff.
No thanks, man.
Yes, because better graphics and interaction are bad. Oh, wait.
Considering PhysX is not allowed to offload onto the CPU, and detects when ATI/foreign GPUs are present and cripples performance, I see nothing other than shady tactics on NVIDIA's part. I never liked PhysX because of that alone.
PhysX runs fine on the CPU (the PhysX 3.x SDK uses all CPU cores automatically now), and nothing gets crippled on non-NVIDIA GPUs.
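For what it's worth, here's roughly what the multi-core CPU path looks like in the 3.x SDK -- a minimal sketch using the stock extensions helpers (foundation/PxPhysics setup omitted, function name and thread count are illustrative):

#include <PxPhysicsAPI.h>
#include <thread>

// Give the scene a CPU dispatcher with one worker per hardware thread,
// so simulation tasks get spread across all available cores.
physx::PxScene* createCpuScene(physx::PxPhysics& physics)
{
    using namespace physx;

    PxSceneDesc desc(physics.getTolerancesScale());
    desc.gravity = PxVec3(0.0f, -9.81f, 0.0f);
    desc.filterShader = PxDefaultSimulationFilterShader;

    unsigned workers = std::thread::hardware_concurrency();
    desc.cpuDispatcher = PxDefaultCpuDispatcherCreate(workers ? workers : 4);

    return physics.createScene(desc);
}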