And do you understand?
People trying to recalculate FLOPS between different architectures and things like "Ampere FLOPS != Turing FLOPS" just show repeatedly that they inherently do not understand what FLOPS mean.
I'm a big MS console fan and am really excited about the Series X. I know XSX is going to have its own scaling solution, but did you see Nvidia throwing that 9x scaling out there? If that turns out to be legit, it's gonna be a more powerful platform in actual practice from day one, and even more so into the future. I'm not saying MS is competing against them here; you still have to buy a whole PC if you don't have one to use this. We will have to wait and see what MS's scaling solution is. I really do hope it's powerful. I would love to play my entire Xbox library in 4K with auto HDR added!!
So basically, in a nutshell, the Xbox Series X is still more powerful than a 3070, considering the X's closed architecture. Perhaps the consoles aren't so "weak" after all.
No matter how you look at it, the RTX 3080 still has 30TF, because the TF calculation method hasn't changed. But that's just a theoretical number, and we aren't seeing the 3x performance jump the numbers would suggest, but up to 2x, as Nvidia has said ("up to" means it's lower than 2x on average).
Psorcerer is probably right that Ampere CUDA cores are less efficient now in real games, but the thing is each SM is way faster than a Turing SM, so the Ampere architecture is way superior in the end. That's why even the RTX 3080 destroys the 2080 Ti, not to mention the RTX 2080 (the PS5 / XSX performance equivalent). People were expecting the RTX 3090 to offer something like a 30% performance jump over the 2080 Ti, yet even the RTX 3080 is around 50% faster (up to 70% in a few scenes) in Doom Eternal gameplay. I can't wait to see 3090 benchmarks; it should be nearly as fast as 2x 2080 Ti in raster and RT.
His premise is correct though.
I can't wait until the damn consoles are finally released and you can just shut up trolls like the OP with cold hard benchmarks.....
25% was the minimum. There were games showing up to 70% better raster performance, and the average was probably somewhere around 40%.
The jump from the 10XX series to the 20XX series was small ( only about 25-30%, and that's WITH a 70% increase in price ). This is ONE of the reasons why so many people felt that Turing ( the 20XX series ) was such a joke and a ripoff.
for a use case he personally defined.......
the same guy also claimed the 3080 is just a 2080Ti in disguise while we already have benchmarks factually refuting this.....
His premise is correct though.
The arguments used are incorrect in some cases, but the general point that TFLOPs aren't comparable and that 30 TFLOPs is just a nice marketing number is right.
uh-huh...the same guy also claimed the 3080 is just a 2080Ti in disguise while we already have benchmarks factually refuting this.....
That guy is ignoring every fact that doesn't fit his narrative. He's a picture-book troll poster.
25% was the minimum. There were games showing up to 70% better raster performance, and the average was probably somewhere around 40%.
Also, if you add the RT performance gains (especially in Quake 2 RTX) and the DLSS resolution boost, then you would need something like 6x 1080 Ti in order to play with similar graphics quality. The difference between Pascal and Turing was just GIGANTIC, but the problem was (and still is) that not every game supports RT and DLSS. That will change in the near future, however, because many PS5 / XSX games will be built with RT in mind and PC ports will use it as well.
BTW. Unlike Turing, Pascal will not run next-gen ports for long because it does not support HW data decompression. UE5 will likely run poorly on Pascal GPUs.
Well, they are mathematically correct for certain use cases, so you can't nail Nvidia to the cross here. That TF numbers do not directly translate to performance is nothing new.....but that's what we have benchmarks for...which we already have in this case, too...but those real-life comparisons are getting ignored and the "it's only marketing" narrative is all that's put to the foreground.
The arguments used are incorrect in some cases, but the general point that TFLOPs aren't comparable and that 30 TFLOPs is just a nice marketing number is right.
They are 100% mathematically accurate, but the context is that this is a gaming card. They're selling 30 TFLOPs to their audience, but to be fair, consoles started this TFLOPs war, so NVIDIA just used their own weapon against them.
Well, they are mathematically correct for certain use cases, so you can't nail Nvidia to the cross here. That TF numbers do not directly translate to performance is nothing new.....but that's what we have benchmarks for...which we already have in this case, too.
If it's that simple, why does nobody sue Nvidia for false marketing? You Americans sue over every fucking thing; this sounds like an opportunity to make easy money...
We already know it!
2080 = 10TF
3080 = 30TF = 3x 2080
Speed increase according to NV themselves: 2x
Because it's not factually false, and they themselves already gave a pretty realistic real-world performance-increase number.
If it's that simple, why does nobody sue Nvidia for false marketing?
Dude, Horizon is a shitty port; that's why it runs like crap.
It's a lot less than that, because it's not the same architecture and consoles have tons of hardware optimizations PCs don't have, like all the custom I/O hardware on PS5. That and optimized low-level APIs vs abstractions like DirectX on PC.
That's why you can get God of War or HZD on a 1.84 TFLOP machine, while the PC port of HZD looks bad even years later, and with a lot more "Teraflooops" (TM) on PC.
Apples and oranges. Like Carmack explained. But surely GAF PC fans know better than him.
That's the wrong Ampere.
You can read all about the architecture here:
NVIDIA Ampere Architecture In-Depth | NVIDIA Technical Blog (developer.nvidia.com)
You are accusing Nvidia of lying and claiming that the 3080 is something it is not. That is a pretty heavy accusation.
There are no rumored IPC gains for RDNA2 GPUs; we only know the performance improvement per watt.
Nice writeup, I was going to get on this myself, but you nailed it.....Also, RDNA 1 was super fast, and with the complete transition to Radeon DNA, removing all GCN constraints, we are in for a treat. The Navi TFLOP should be insanely fast and unrestrained with the rumored IPC gains, a pure screamer with insanely high clocks.
So in essence, a 20-25 TF AMD card should perform just as fast as these NV cards. Hopefully they have a Geometry Engine in place and an enhanced HBCC feature with cache scrubbing there for launch to boost perf even more...
There are two separate issues here:
This is the crux of this argument.
It IS "3x as fast" but that sentence by itself is meaningless. 3X faster at what? At maximum theoretical floating point operations per second - it IS. Just because you have 3X the TFLOPS doesn't mean 3X the framerate is some kind of guarantee. Far from it.
Look at it from the perspective of PS4 ( 1.8TFLOPS) to PS5 (10TFLOPS).
That's 5.55X the FLOPS.
So now think of a game that runs at 1080p 60FPS on PS4. Are you really expecting the PS5 to be able to run the same game at 333FPS? Cuz that shit ain't happening. ( maybe in some rare, contrived example )
Does this mean that the PS5's TFLOP count is "bullshit"? Using the "logic" of some people ITT, it would.
Very rarely does double the power equate to double the framerate. Some engines are just inefficient and doubling the framerate will require far more than double the power.
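To put rough numbers on that, here's a quick back-of-envelope in Python (an illustration only, using the same rounded TF figures as above):

```python
# Back-of-envelope: naive FLOPS scaling from PS4 to PS5, using the
# rounded figures from the post above.
ps4_tf, ps5_tf = 1.8, 10.0
scale = ps5_tf / ps4_tf                           # ~5.55x the raw FLOPS
print(f"{scale:.2f}x FLOPS")
print(f"naive projection: {60 * scale:.0f} FPS")  # ~333 FPS from a 60 FPS game
```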
Fuck me, video cards really bring the spergs out of the woodwork.
Is this enough evidence to demonstrate that floating point performance does not directly translate to better gaming performance?
3080 has 3x the TF of the 2080, but is around 70% faster.
3080 has 2x the TF of the 2080 Ti, but is around 30% faster.
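A quick sanity check of those ratios in Python (illustrative only; the TF figures are the rated numbers used in this thread, and the speedups are the rough benchmark figures above):

```python
# Perf-per-TF implied by the figures above: if TF scaled 1:1 with frame
# rate, each ratio would come out to 1.0.
cards = {
    "3080 vs 2080":    (29.8 / 10.1, 1.70),  # ~3x the TF, ~70% faster
    "3080 vs 2080 Ti": (29.8 / 13.5, 1.30),  # ~2x the TF, ~30% faster
}
for name, (tf_ratio, speedup) in cards.items():
    print(f"{name}: {speedup / tf_ratio:.2f} of the naive TF scaling")
# ~0.58 and ~0.59 -- far from 1.0, so raw TF clearly over-promises here.
```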
Welp, at least you aren't claiming it's .5 Turing TF like you did in the other threads...but you're still wrong. Here's why.
Reddit Q&A
Read that bolded part carefully.
Good. Addition, yay! So let's see where you went wrong...
And there it is. You can't only account for raw numerical throughput; process-node efficiency gains also have to be taken into account. So even if the raw numbers average out to be the same across both architectures, Ampere still sees IPC gains simply by being on a newer process (not to mention having other hardware present to offload certain taskwork more efficiently than in Turing, such as RT, DLSS and AI through the Tensor cores; equivalent performance in those areas on Turing would've required more raw GPU resources to cover the gap).
I don't see why you're doing the math this way. Nvidia says they see 36 ADDITIONAL INT32 OPs for each 100 FP32 OPs. Just previously you listed Turing as 64 INT32 + 64 FP32, and one of Ampere's modes as the same. So wouldn't this division be worthless at that point? In both cases you get 128 OPs per SM per cycle; the two Ampere numbers are clearly an either/or, since the SMs can operate either in full FP32 mode or in mixed FP32/INT32 mode on a given cycle.
So reading this again, it really does look like you got wonky with your calculations, because it's Turing that would be hindered by running INT32 instructions on a clock cycle, not Ampere, since FP32 instructions would have to wait their turn until the INT32 instructions complete.
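For what it's worth, the per-cycle issue widths being argued about look like this (simple Python, per the figures both of you quoted):

```python
# Per-SM, per-cycle issue capability under each mode (ops/cycle).
turing       = {"FP32": 64,  "INT32": 64}  # fixed: separate pipes
ampere_mixed = {"FP32": 64,  "INT32": 64}  # Ampere mode 1: same split as Turing
ampere_full  = {"FP32": 128, "INT32": 0}   # Ampere mode 2: both datapaths on FP32
for name, mode in [("Turing", turing), ("Ampere mixed", ampere_mixed),
                   ("Ampere full-FP32", ampere_full)]:
    print(f"{name}: {sum(mode.values())} ops/cycle ({mode})")
# All three total 128 ops/cycle; the difference is only in the FP32 share.
```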
If the conditions and context for the calculations look suspect, I think that's worth questioning, as long as it's respectful. FWIW, there's been a rather strong push by some to downplay Nvidia's stuff, especially on the I/O front, following their presentation. If you look a little deeper you can infer why some people are doing it, too, but I'll leave that for another time; in fact, I don't think it's really necessary to say why at this point.
And in Ampere's case its shader occupancy could be even worse because other areas of the GPU's architecture might not have also been doubled up.
3080 is 2080Ti in disguise.
What a wonderful shitshow of a thread. The amount of salt the NVIDIA announcement has provided among Sony fanboys is so ridiculously delicious. If only MS still made their consoles with NVIDIA... Oh, can you IMAGINE the amounts of SALT then, my friend?
Fun and super relevant fact: Nvidia sounds just like envy in Spanish (envidia).
Most appropriate name ever.
He was right about the strength of the PS5 and XSX
I wouldn't put too much stock in MLID; I won't get into specifics here, but he's made some rather inflammatory (and wrong) claims about other products in the recent past.
The NVIDIA Ampere GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures such as Turing and Volta
The maximum number of concurrent warps per SM remains the same as in Volta (i.e., 64), and other factors influencing warp occupancy are:
- The register file size is 64K 32-bit registers per SM.
- The maximum number of registers per thread is 255.
- The maximum number of thread blocks per SM is 32.
- Shared memory capacity per SM is 164 KB, a 71% increase compared to GV100's capacity of 96 KB.
- Maximum shared memory per thread block is 160 KB.
Overall, developers can expect similar occupancy as on Volta without changes to their application.
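If it helps to make those limits concrete, here's a rough occupancy estimator in Python (my own sketch against the GA100 numbers quoted above, not anything from NVIDIA; a real occupancy calculator also rounds register and shared-memory allocations to hardware granularity, which this skips):

```python
# Estimate concurrent warps per SM from per-kernel resource usage,
# using the GA100 limits quoted above.
def max_warps_per_sm(regs_per_thread, smem_per_block, threads_per_block,
                     warp_size=32,
                     reg_file=64 * 1024,      # 64K 32-bit registers per SM
                     warp_limit=64,           # max concurrent warps per SM
                     block_limit=32,          # max thread blocks per SM
                     smem_per_sm=164 * 1024): # 164 KB shared memory per SM
    warps_per_block = threads_per_block // warp_size
    by_regs = reg_file // (regs_per_thread * warp_size)  # register-file limit
    blocks_by_smem = (smem_per_sm // smem_per_block) if smem_per_block else block_limit
    by_smem = min(blocks_by_smem, block_limit) * warps_per_block  # smem limit
    return min(warp_limit, by_regs, by_smem)

# e.g. 256-thread blocks, 64 registers per thread, 48 KB shared memory per block:
print(max_warps_per_sm(64, 48 * 1024, 256))  # -> 24 warps (shared-memory bound)
```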
And do you understand?
If so, please explain
This.
Sure.
FLoating point
Operations
Per
Second
If you hadn't just quoted my first post and had instead gone out of your way to actually read my other posts, you'd maybe understand too.
FLOPS is a measurement used to describe the FP32 power of a piece of hardware. The 3080 will deliver 30TF. The 2080ti delivered 13.5TF.
So if you look for a GPU that you only use in scientific computing, where all you need is FP32 performance, then yes, a 3080 is more than 2x as good as a 2080ti.
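For reference, those TF numbers are just simple core-count math. A quick illustrative check in Python (core counts and reference boost clocks are NVIDIA's public spec numbers, so treat the last digit loosely):

```python
# Peak FP32 TFLOPS = CUDA cores x 2 FLOPs per clock (an FMA counts as 2) x GHz / 1000.
def peak_tflops(cuda_cores, boost_ghz):
    return cuda_cores * 2 * boost_ghz / 1000

print(f"RTX 3080:    {peak_tflops(8704, 1.710):.1f} TF")  # ~29.8
print(f"RTX 2080 Ti: {peak_tflops(4352, 1.545):.1f} TF")  # ~13.4, the "13.5" above
```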
Do all games only use FP32? No!
The inherent problem that I have with people "recalculating" TF from one arch to another is that they do so in an attempt to average out game performance.
So you take a scientific metric that has nothing to do with gaming at all and try to bend and fit it to match the gaming results that benchmarks provide.
That's like taking a truck engine with 400hp, comparing it to a 250hp car engine, and being like "that truck engine may have more hp, but it accelerates way slower, so 1 truck hp is like 0.3 car hp!"
That's not how any of this works, and that's exactly why max TF numbers don't translate 1:1 into gaming performance: gaming is much more than just simple FP32 calculations.
If people were to start to understand that, they could maybe slowly move on from comparing TF numbers, as the more tech we introduce, like ML, RT, etc., the less meaningful TF will become on a standalone basis and the less impact it will have on gaming performance overall.
This.
The TFLOPs war is silly but it started way back in 2013 and we can’t backpedal from it anymore. Might as well make the best of it instead of letting people run around with misinformation.
I don't think you understand what "at best" means, as the video shows an increase of more than 80% on multiple occasions.
The I/O stuff is what it is.
I think his numbers are fine to be honest. Let me try and pitch it from another perspective, and maybe you'll understand...
RTX 2080: 10.7TF
RTX 3080: 29.8TF
RTX 3080 = 179% faster
DF benchmarks suggest RTX 3080 = ~80% faster at best
You don't see this as a problem? There is practically a 10TF mark-up on the RTX 3080.
TL;DR 1 Ampere TF = 0.72 Turing TF, or 30TF (Ampere) = 21.6TF (Turing)
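That "10TF mark-up" is easy to reproduce (back-of-envelope Python with the same figures; the ~0.65 it spits out is the benchmark-side cousin of the theoretical 0.72 above):

```python
# If the 3080 benches ~80% above the 2080 at best, price its performance
# in "2080-style TF" and compare with the 29.8 TF on the box.
tf_2080, tf_3080 = 10.7, 29.8
best_speedup = 1.80                        # the ~80% "at best" from the DF video
effective_tf = tf_2080 * best_speedup      # ~19.3 TF of 2080-like throughput
print(f"mark-up: {tf_3080 - effective_tf:.1f} TF")    # ~10.5 TF
print(f"per-TF ratio: {effective_tf / tf_3080:.2f}")  # ~0.65
```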
Reddit Q&A
A reminder from the Turing whitepaper:
So, a Turing GPU can execute 64 INT32 + 64 FP32 ops per clock per SM.
An Ampere GPU can execute either 64 INT32 + 64 FP32 or 128 FP32 ops per clock per SM.
Which means if a game executes 0 (zero) INT32 instructions, then Ampere = 2x Turing.
And if a game executes 50/50 INT32 and FP32, then Ampere = Turing exactly.
So how many INT32 are there on average?
According to Nvidia:
Some math: 36 / (100+36) = 26%, i.e. in an average game instruction stream 26% are INT32
So we can now calculate what will happen to both Ampere and Turing when a 26% INT32 + 74% FP32 instruction stream is used.
I have written a simple piece of software to do that. But you can calculate an analytical upper bound easily: 74%/50% = 1.48, or +48%.
My software shows a slightly smaller number, +44% (and that's because of edge cases where you cannot distribute the last INT32 ops in a batch equally, as only one pipeline can issue INT32 per each block of 16 cores).
So the theoretical absolute max is +48%, in practice the absolute achievable max is +44%
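For anyone who wants to check the bound, here's a rough sketch of that model in Python (a reconstruction, not the actual software mentioned above; it models the SMs at full datapath width and skips the per-16-core issue restriction, so it lands on the +48% figure rather than +44%):

```python
# Toy throughput model: drain a 26% INT32 / 74% FP32 instruction stream
# through one SM of each architecture and count the cycles needed.

def cycles_turing(n_int, n_fp):
    # Turing SM: 64 INT32 lanes + 64 FP32 lanes, usable in parallel each cycle.
    return max(-(-n_int // 64), -(-n_fp // 64))  # ceil division

def cycles_ampere(n_int, n_fp):
    # Ampere SM: per cycle, either 64 INT32 + 64 FP32 (mixed) or 128 FP32.
    cycles = 0
    while n_int > 0 or n_fp > 0:
        if n_int > 0:                  # mixed mode while INT32 work remains
            n_int -= min(n_int, 64)
            n_fp -= min(n_fp, 64)
        else:                          # pure FP32 mode: both datapaths on FP32
            n_fp -= min(n_fp, 128)
        cycles += 1
    return cycles

total = 1_000_000
n_int = int(total * 0.26)              # the 36-per-100 figure, i.e. ~26% INT32
n_fp = total - n_int
t, a = cycles_turing(n_int, n_fp), cycles_ampere(n_int, n_fp)
print(f"Ampere speedup per SM per clock: {t / a:.2f}x")  # ~1.48x
```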
Thus each 2TF of Ampere has only 1.44TF of Turing performance.
Let's check the actual data Nvidia gave us:
3080 = 30TF (Ampere) = 21.6TF (Turing) = 2.14x 2080 (10.07TF Turing)
Nvidia is even more conservative than that and gives us: 3080 = 2x 2080
3070 = 20.4TF (Ampere) = 14.7TF (Turing) = 1.86x 2070 (7.88TF Turing)
Nvidia is massively more conservative here, giving us: 3070 = 1.6x 2070
Actually, if we average the two max numbers that Nvidia gives us (they explicitly say "up to"), we get an even lower theoretical max of 1 Ampere TF = 0.65 Turing TF.
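Spelled out (a quick Python check of that averaging, using NVIDIA's "up to" multipliers and the Turing TF figures above):

```python
# Convert NVIDIA's own "up to" multipliers into Turing TF per Ampere TF.
pairs = [
    (30.0, 2.0 * 10.07),  # 3080: 30 Ampere TF, "up to 2x" a 2080 (10.07 Turing TF)
    (20.4, 1.6 * 7.88),   # 3070: 20.4 Ampere TF, "up to 1.6x" a 2070 (7.88 Turing TF)
]
ratios = [turing / ampere for ampere, turing in pairs]
print([round(r, 2) for r in ratios])        # [0.67, 0.62]
print(round(sum(ratios) / len(ratios), 2))  # ~0.64, i.e. roughly the 0.65 above
```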
Which suggests that maybe these new FP32/INT32 mixed pipelines cannot execute FP32 at full speed (or cannot execute all the instructions).
We do know that Turing had reduced register-file access for INT32 (64 vs 256 for FP32); if it's the same here (and everything suggests that Ampere is just a Turing facelift), then obviously not all FP32 instruction sequences can run on these pipelines.
Anyway, a TF table:

                 Ampere TF                Turing TF (me)   Turing TF (NV)
3080 (Ampere)    30                       21.6             19.5
3070 (Ampere)    20.4                     14.7             13.3
2080Ti (Turing)  18.75 (me) or 20.7 (NV)  13.5             13.5
2080 (Turing)    14 (me) or 15.5 (NV)     10.1             10.1
2070 (Turing)    10.4 (me) or 11.5 (NV)   7.5              7.5
Bonus round: RDNA1 TF
RDNA1 has no separate INT32 pipeline; all the INT32 instructions are handled in the main stream. Thus it's essentially almost exactly the same as Ampere, but it has no skew in the last instruction, so the +48% theoretical max applies here (+2.3% over Ampere).
                 Ampere TF   Turing TF (me)   Turing TF (NV)
5700XT (RDNA1)   10.01       7.2              ?
Amusingly enough, the 5700XT's actual performance is pretty similar to the 2070's, and these adjusted TF numbers show exactly that (10TF vs 10-11TF).
They are 100% mathematically accurate but the context is this is a gaming card. They’re selling 30 TFLOPs to their audience but to be fair, consoles started this TFLOPs war so NVIDIA just used their own weapon against them.
He was right about the strength of the PS5 and XSX
He was right about what nVidia would be showing, including prices and performance.
Most importantly: the 33% weaker IPC is EXACTLY what we are seeing in the DF benchmarks. Even if you distrust the source, the numbers don't lie.
It's a carefully curated set of games that Nvidia chose. Undoubtedly, these games represent a best case in terms of performance uplift over Turing.
I don't think you understand what "at best" means, as the video shows an increase of more than 80% on multiple occasions.
Where do the consoles fit in all this?
AquaticSquirrel: Since we're allowed to have a level of constructive speculation here regarding Nvidia's claimed performance numbers, we should be open to doing the same with AMD, and that extends to the next-gen consoles as well.
Other than that, yeah. We should always be critical of claimed performance numbers from these companies. NV, AMD, Microsoft, Sony, you name it. Probably the only one excused from this is Nintendo because they haven't chased the power/performance crown for decades.
Now translate that to AMD FLOPS and you see how all the PCMR flexing is meaningless.
Ampere still sees IPC gains simply by being on a newer process.