
Nvidia Ampere teraflops and how you cannot compare them to Turing

psorcerer

Banned
It's actual gameplay with an FPS counter.

Mwahahaha!
No, it doesn't work like that.
Correctly measured FPS pls (frame times with 95/99 percentiles).
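
(For illustration, a minimal sketch of what that kind of measurement looks like, assuming you have already logged per-frame times in milliseconds; the numbers below are made up.)

```python
import numpy as np

# Hypothetical per-frame render times in milliseconds from a capture run.
frame_times_ms = np.array([8.3, 8.5, 9.1, 8.4, 16.7, 8.6, 8.4, 12.2, 8.5, 8.7])

avg_fps = 1000.0 / frame_times_ms.mean()
# The 95th/99th percentile frame times expose the stutter an average FPS counter hides.
p95, p99 = np.percentile(frame_times_ms, [95, 99])

print(f"average FPS: {avg_fps:.1f}")
print(f"95th percentile frame time: {p95:.1f} ms (~{1000.0 / p95:.1f} FPS)")
print(f"99th percentile frame time: {p99:.1f} ms (~{1000.0 / p99:.1f} FPS)")
```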

We already have benchmarks from DF showing 70-90% uplift in 6 different games all with different engines. Where did you find 40% from?

3080 vs 2080 (not vs 2080Ti, for fuck's sake)
 

diffusionx

Gold Member
Yup.
In Turing, a CUDA core has 1x FP32 + 1x INT32 ALU = 2 ALUs, but only one is FP-capable.
In Ampere, a CUDA core has 1x FP32 + 1x INT32/FP32 ALU = 2 ALUs, and both are FP-capable.
NV counts the Turing ALU as 1 and the Ampere ALU as 2, even though the die size between these will be roughly the same.
Purely semantically inflating the number of cores that were already present in Turing!

I'll say it again. Wait for benchmarks. This stuff that reads like it came from the console speculation thread is a waste of time.

If Nvidia oversold it, we will know, and quickly.
 

Ascend

Member
People trying to recalculate FLOPS between different architectures and things like "Ampere FLOPS != Turing FLOPS" just show repeatedly that they inherently do not understand what FLOPS mean.
But it's still important to point this out, because normally when a new architecture is released, you expect either similar or slightly increased efficiency in FLOPS. In this case, there is a significant decrease in efficiency. They are doing this for marketing purposes.
 
Console fanboys trying to attack Ampere. It's cute :messenger_savoring:
Mwahahaha!
No, it doesn't work like that.
Correctly measured FPS pls (frame times with 95/99 percentiles).

I'm not the clown here who claimed that a 3080 is really just a 2080ti in disguise.

Your post will not age well.

We already have a direct comparison between the 3080 and the 2080ti WITH A FRAMERATE COUNTER.

If you don't like that you can wait a couple weeks and there will be endless benchmarks. None of them will show the 3080 as being a 2080ti in disguise.

So are you going to admit that you were wrong or continue moving goalposts?
 
As impressive as the 3000 series is, I'm still waiting. 3D stacking on the 4000 series incoming? We gonna see some big gains in the future, my boys!
 

3liteDragon

Member
So basically, in a nutshell, the Xbox Series X is still more powerful than a 3070, considering the X's closed architecture. Perhaps the consoles aren't so "weak" after all.
FFS, stop comparing PC video cards to consoles.
 

Chiggs

Member
We already have benchmarks from DF showing 70-90% uplift in 6 different games all with different engines. Where did you find 40% from?



I don't really give much credence to Digital Foundry, as they don't really have the experience I look for when it comes to reputable testing. Quite frankly, that's hardly a compelling or exhaustive benchmark video, given that Nvidia has told them not to share much.

My opinion was based on the specs and how I think it will fare in a wide suite of tests. Admittedly, there's some guessing involved.

More than happy to come back here and eat crow if I'm wrong.
 

psorcerer

Banned
If you don't like that you can wait a couple weeks and there will be endless benchmarks. None of them will show the 3080 as being a 2080ti in disguise.

I'm not sure I get it.
Nvidia says that 3080 is 2x2080 (not fucking Ti, vanilla 2080) and you are saying that benchmarks will show that NV is wrong?
 

CuNi

Member
But it's still important to point this out, because normally when a new architecture is released, you expect either similar or slightly increased efficiency in FLOPS. In this case, there is a significant decrease in efficiency. They are doing this for marketing purposes.

And we ARE getting more FLOPS.
People fail to understand that it's not that FLOPS differ between architectures, it's that FLOPS are not 1:1 translatable to game performance. And game performance is what we actually care about in a gaming GPU.
If a "1 TFLOP" GPU would deliver 3x the performance in all games, we wouldn't care that it's only 1 TFLOP.
 

Ascend

Member
I'll say it again. Wait for benchmarks. This stuff that reads like it came from the console speculation thread is a waste of time.

If Nvidia oversold it, we will know, and quickly.
We have benchmarks from DF, albeit percentage only. They have a clear collaboration with nVidia here, so that means their numbers are likely best case scenarios. Specific games, specific settings. Additionally, they are generally comparing to an RTX2080, rather than their previous flagship, the 2080Ti. 80% faster than a 2080 sounds a lot better than 30% faster than the 2080Ti, for example.
And then there's the FLOPS, which for example show a 180% increase, but the best-case-scenario benchmarks are showing 90% at best.

There are many red flags, but people gobble them up.
 

GHG

Member
I don't get it...

The 3080 does 4k doom eternal at max settings at 120+ fps. Where's the problem here? That's monstrous performance.

I'm not sure I get it.
Nvidia says that 3080 is 2x2080 (not fucking Ti, vanilla 2080) and you are saying that benchmarks will show that NV is wrong?

Honestly, what's your point here?

What exactly is wrong with 2x the performance of the 2080?

Why are you having a multi-thread tantrum about this?
 

diffusionx

Gold Member
We have benchmarks from DF, albeit percentage only. They have a clear collaboration with nVidia here, so that means their numbers are likely best case scenarios. Specific games, specific settings. Additionally, they are generally comparing to an RTX2080, rather than their previous flagship, the 2080Ti. 80% faster than a 2080 sounds a lot better than 30% faster than the 2080Ti, for example.
And then there's the FLOPS, which for example show a 180% increase, but the best-case-scenario benchmarks are showing 90% at best.

There are many red flags, but people gobble them up.

My point is, the "red flag" is irrelevant. If I buy a 3080, I will buy it knowing what I am getting into, based on actual reproducible benchmarks. Whatever Nvidia claims a FLOP is, or whatever performance they claim, does not matter at all.

I would say that many if not all PC gamers are in the same boat; who is spending $700 based on Nvidia's FLOP claims?
 

nochance

Banned
The consoles have not been this outmatched in a long time. Nvidia is also sitting on the 3060, which will likely outdo the next gen consoles at $349.
 

Ascend

Member
And we ARE getting more FLOPS.
People fail to understand that it's not that FLOPS differ between architectures, it's that FLOPS are not 1:1 translatable to game performance. And game performance is what we actually care about in a gaming GPU.
If a "1 TFLOP" GPU would deliver 3x the performance in all games, we wouldn't care that it's only 1 TFLOP.
But you should care about the opposite. If they are advertising 30TF, but it's really 'only' equivalent to 20TF of the previous generation, it is deceiving, even if it is the fastest card available.
My point is, the "red flag" is irrelevant. If I buy a 3080, I will buy it knowing what I am getting into, based on actual reproducible benchmarks. Whatever Nvidia claims a FLOP is, or whatever performance they claim, does not matter at all.

I would say that many if not all PC gamers are in the same boat; who is spending $700 based on Nvidia's FLOP claims?
You'd be surprised.
 
We already know it!
2080 = 10TF
3080 = 30TF = 3x 2080
Speed increase according to NV themselves: 2x

Changing an architecture is not lying. It happens all the time both with Nvidia and with AMD.

TFLOPS is (and has always been) a theoretical maximum based on nothing more than SHADERS x CLOCK x 2 = TFLOPS.

There's literally nothing more to it than that, and the ONLY time you can use it to gauge relative performance is when comparing 2 GPUs (or consoles) that are using THE SAME architecture.
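
(As a rough sketch of that formula, using the commonly cited shader counts and boost clocks for these cards; treat the exact figures as approximate.)

```python
# Theoretical TFLOPS = SHADERS x CLOCK x 2 (an FMA counts as 2 floating-point ops).
# Shader counts and boost clocks are the commonly quoted reference figures (approximate).
cards = {
    "RTX 2080":    (2944, 1.710),  # (shader count, boost clock in GHz)
    "RTX 2080 Ti": (4352, 1.545),
    "RTX 3070":    (5888, 1.730),
    "RTX 3080":    (8704, 1.710),
}

for name, (shaders, clock_ghz) in cards.items():
    tflops = shaders * clock_ghz * 2 / 1000.0  # GFLOPS -> TFLOPS
    print(f"{name}: {tflops:.2f} TFLOPS")
```

Run that and the 3080 comes out at roughly 30 TF and the 2080 at roughly 10 TF, which is exactly where the marketing numbers come from.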
 

trikster40

Member
I’m not even sure I understood the TL DR!
 

CuNi

Member
But you should care about the opposite. If they are advertising 30TF, but it's really 'only' equivalent to 20TF of the previous generation, it is deceiving, even if it is the fastest card available.

You'd be surprised.

It's only deceiving people who focus on FLOPS and, like I said, don't understand that you cannot use those numbers to compare real life gaming performance.
They are not deceiving in any way. Those cards can and will deliver 30 TFLOPS. TFLOPS don't care about your feelings or about the fact that games use XYZ% of INT32 ALUs.
 

Chiggs

Member
I don't believe it is. I think it's more like 40-75% in certain situations. Still quite good, though.

EDIT:

OH, MIGHTYSQUIRREL! I AM SO TRULY SORRY FOR NOT CATCHING THE NUANCE OF A POST--IN THIS CASE, 2080 VS 2080TI. I JUST GOT DONE SPENDING 12 HOURS WITH RIDICULOUSLY DEMANDING CLIENTS ON A PROJECT THAT IS SHAPING UP TO BE EVEN WORSE THAN HALO INFINITE AND, IN A MOMENT OF WEAKNESS, I MISREAD THE PERSON I WAS RESPONDING TO.

PLEASE FORGIVE ME, KEEPER OF INTERNET TRUTH, LIGHT AND JUSTICE.
 

GHG

Member
I'm trying to explain why there is a relatively small improvement in perf over the last arch although the FLOP numbers suggest a much larger one.

But the benchmarks are not showing it to be "relatively small". It's actually on the high side when jumping from one GPU generation to the next compared to what we typically see.

Why are you crying about this? Will you also behave this way when AMD unveil big navi and the performance is between the 3070 and 3080?
 
EDIT:

OH, MIGHTYSQUIRREL! I AM SO TRULY SORRY FOR NOT CATCHING THE NUANCE OF A POST--IN THIS CASE, 2080 VS 2080TI. I JUST GOT DONE SPENDING 12 HOURS WITH RIDICULOUSLY DEMANDING CLIENTS ON A PROJECT THAT IS SHAPING UP TO BE EVEN WORSE THAN HALO INFINITE AND, IN A MOMENT OF WEAKNESS, I MISREAD THE PERSON I WAS RESPONDING TO.

PLEASE FORGIVE ME, KEEPER OF INTERNET TRUTH, LIGHT AND JUSTICE.

No biggie !
 

Ascend

Member
I don't get why everyone is jumping on psorcerer. You should be happy that he's pointing out that the TFLOPS advertised by nVidia require some additional awareness.

It's only deceiving people who focus on FLOPS and, like I said, don't understand that you cannot use those numbers to compare real life gaming performance.
They are not deceiving in any way.
Read those two sentences again. You're basically saying they are preying on the ignorant, and then you say they are not deceiving in any way...

Mark my words. AMD will release a card that will perform better than at least one of these cards in the line-up, but will have a significantly lower amount of TFLOPS compared to that card, and many people will be buying nVidia, because of the higher TFLOPS.
The XSX GPU shouldn't be too far behind the RTX 3070, and yet, the RTX 3070 looks as if it is 75% faster if you look at the specs.

They are not deceiving in any way. Those cards can and will deliver 30 TFLOPS. TFLOPS don't care about your feelings or about the fact that games use XYZ% of INT32 ALUs.
They will only deliver the amount of advertised TFLOPS when there's zero INT32 instructions, which is unrealistic.
 
I'm not sure I get it.
Nvidia says that 3080 is 2x2080 (not fucking Ti, vanilla 2080) and you are saying that benchmarks will show that NV is wrong?

Just stop.

I never claimed a 3080 is double a 2080ti. I never said anything of the kind.

YOU are the one who claimed that "a 3080 is really just a 2080ti in disguise." I then provided proof you were wrong and you've been squirming and playing dumb ever since.
 

Krisprolls

Banned
it's like two PS5s duct-taped together, not 3.

It's a lot less than that because it's not the same architecture and consoles have tons of hardware optimizations PCs don't have, like all the custom I/O hardware on PS5. That and optimized low level APIs vs abstractions like Direct X on PC.

That's why you can get God of War or HZD on a 1.84 TFLOP machine, while the PC port of HZD still looks bad years later with a lot more "Teraflooops" (TM) on PC.

Apples and oranges. Like Carmack explained. But surely GAF PC fans know better than him.
 

psorcerer

Banned
I don't get why everyone is jumping on psorcerer. You should be happy that he's pointing out that the TFLOPS advertised by nVidia require some additional awareness.

That's because deep down they feel that I'm trying to attack their "precioussss"...
Although I will probably buy Ampere too.
Go figure...
 
TL;DR 1 Ampere TF = 0.72 Turing TF, or 30TF (Ampere) = 21.6TF (Turing)

Welp, at least you aren't claiming it's .5 Turing TF like you did in the other threads...but you're still wrong. Here's why.

Reddit Q&A

To accomplish this goal, the Ampere SM includes new datapath designs for FP32 and INT32 operations. One datapath in each partition consists of 16 FP32 CUDA Cores capable of executing 16 FP32 operations per clock. Another datapath consists of both 16 FP32 CUDA Cores and 16 INT32 Cores. As a result of this new design, each Ampere SM partition is capable of executing either 32 FP32 operations per clock, or 16 FP32 and 16 INT32 operations per clock. All four SM partitions combined can execute 128 FP32 operations per clock, which is double the FP32 rate of the Turing SM, or 64 FP32 and 64 INT32 operations per clock.

Read that bolded part carefully.

So, a Turing GPU can execute 64 INT32 + 64 FP32 ops per clock per SM.
An Ampere GPU can execute either 64 INT32 + 64 FP32 or 128 FP32 ops per clock per SM.

Good. Addition, yay! So let's see where you went wrong...

Which means if a game executes 0 (zero) INT32 instructions, then Ampere = 2x Turing.
And if a game executes 50/50 INT32 and FP32, then Ampere = Turing exactly.

And there it is. You can't only account for raw numerical throughput; node and process efficiency gains also have to be taken into account. So even if the raw numbers average out to the same across both architectures, Ampere still sees IPC gains simply by being on a newer process (not to mention having other hardware present to offload certain work more efficiently than Turing can, such as RT, DLSS and AI through the Tensor cores; equivalent performance in those areas on Turing would've required more raw GPU resources to cover the gap).

Some math: 36 / (100+36) = 26%, i.e. in an average game instruction stream 26% are INT32

I don't see why you're doing the math this way. Nvidia says they see 36 ADDITIONAL INT32 OPs for each 100 FP32 OPs. Just previously you listed Turing as 64 INT32 + 64 FP32, and one of Ampere's modes as the same. So wouldn't this division be worthless at that point? In both cases you get 128 OPs per SM per cycle; the two Ampere numbers are clearly an either/or, since the SMs can operate either in full FP32 or in mixed FP32/INT32 mode on a given cycle.

So we can now calculate what will happen to both Ampere and Turing when 26% INT32 + 74% FP32 instruction streams are used.
I have written a simple piece of software to do that. But you can calculate an analytical upper bound easily: 74%/50% = 1.48, or +48%.
My software shows a slightly smaller number, +44% (and that's because of the edge cases where you cannot distribute the last INT32 ops in a batch equally, as only one pipeline can issue INT32 per each block of 16 cores).
So the theoretical absolute max is +48%; in practice the achievable max is +44%.

Thus every 2TF of Ampere has only 1.44TF of Turing performance.
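
(For illustration only, a toy version of the kind of issue-rate model being described, under my own simplified assumptions: a Turing SM issues up to 64 FP32 + 64 INT32 ops per clock, an Ampere SM issues up to 64 FP32 ops plus 64 more ops that can be either FP32 or INT32, and the instruction stream is 26% INT32. This reproduces the ~+48% analytical upper bound, not the +44% figure that accounts for per-16-core skew, and it is not psorcerer's actual program.)

```python
# Toy per-SM issue model: how much faster is Ampere than Turing at the same
# SM count and clock, for a stream that is 26% INT32 / 74% FP32?

def turing_clocks(fp_ops, int_ops):
    clocks = 0
    while fp_ops > 0 or int_ops > 0:
        fp_ops = max(0, fp_ops - 64)    # 64 dedicated FP32 lanes per SM
        int_ops = max(0, int_ops - 64)  # 64 dedicated INT32 lanes per SM
        clocks += 1
    return clocks

def ampere_clocks(fp_ops, int_ops):
    clocks = 0
    while fp_ops > 0 or int_ops > 0:
        issued_int = min(int_ops, 64)                    # flexible lanes take INT32 first
        issued_fp = min(fp_ops, 64 + (64 - issued_int))  # dedicated FP32 lanes + leftover flexible lanes
        int_ops -= issued_int
        fp_ops -= issued_fp
        clocks += 1
    return clocks

total_ops = 1_000_000
int_ops = int(total_ops * 0.26)
fp_ops = total_ops - int_ops

speedup = turing_clocks(fp_ops, int_ops) / ampere_clocks(fp_ops, int_ops)
print(f"Ampere vs Turing, same SMs and clock: {speedup:.2f}x")  # ~1.48x
```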

None of your calculations make sense in terms of context here. By the description at the top of your post, both Turing and Ampere are capable of the same number of INT32 OPs per clock cycle. That means your numbers here should be reflected in both Ampere AND Turing, which ultimately means the performance delta between an Ampere TF and a Turing TF stays the same, i.e. 2TF Ampere would = 2TF Turing (before factoring in node gains and improvements on earlier tech, API algorithms etc. present on Turing and continued in Ampere, which would actually increase Ampere's performance over Turing, not decrease it).

Let's check the actual data Nvidia gave us:
3080 = 30TF (ampere) = 21.6TF (turing) = 2.14x 2080 (10.07TF turing)

Wrong; you seem to have forgotten that even with Ampere's new pipeline architecture it is capable of the same INT32 OPs per clock cycle as Turing, and your numbers only applied the conditional to Ampere while ignoring doing the same for Turing (even though you claimed you were going to do so in the sentence before these calculations).

Nvidia is even more conservative than that and gives us: 3080 = 2x2080

Do you have a source where they specifically phrased 3080 performance in this manner?

3070 = 20.4TF (ampere) = 14.7TF (turing) = 1.86x 2070 (7.88TF turing)

Same as above two.

Nvidia is massively more conservative here giving us: 3070 = 1.6x2070

Again, where is a source that quotes official reps from Nvidia claiming this exact figurative comparison metric? You can't claim they said this or that if not able to source it yourself.

Actually, if we average the two max numbers that Nvidia gives us (they explicitly say "up to"), we get to an even lower theoretical max of 1 Ampere TF = 0.65 Turing TF

Yes, "up to", as in, depending on what the game itself requires to be performed for calculations. Actually let's go back for a bit because I think you misread this following quote:

First, the Turing SM adds a new independent integer datapath that can execute instructions concurrently with the floating-point math datapath. In previous generations, executing these instructions would have blocked floating-point instructions from issuing.

So reading this again, it really does look like you got wonky with your calculations because it's Turing that would be hindered by running INT32 instructions on a clock cycle, not Ampere, since FP32 instructions would have to wait their turn until INT32 instructions are completed.

Which suggests that maybe these new FP32/INT32 mixed pipelines cannot execute FP32 at full speed (or cannot execute all the instructions).

Don't see where you're getting this from, especially considering I looked at your calculations and they seem dubious at best IMHO.

We do know that Turing had reduced register file access for INT32 (64 vs 256 for FP32). If it's the same (and everything suggests that Ampere is just a Turing facelift), then obviously not all FP32 instruction sequences can run on these pipelines.

Interesting speculation, but in light of what you've posted before, I don't know if the foundation of this speculation is necessarily sound.


Anyway a TF table:

Card | Ampere TF | Turing TF (me) | Turing TF (NV)
3080 (Ampere) | 30 | 21.6 | 19.5
3070 (Ampere) | 20.4 | 14.7 | 13.3
2080Ti (Turing) | 18.75 (me) or 20.7 (NV) | 13.5 | 13.5
2080 (Turing) | 14 (me) or 15.5 (NV) | 10.1 | 10.1
2070 (Turing) | 10.4 (me) or 11.5 (NV) | 7.5 | 7.5

So this is basically a recap of your calculations that I already touched on above, no need to repeat myself.

Needless to say, I think the context and conclusions of your calculations are inaccurate, because I don't think you initialized conditions for those calculations correctly.

Bonus round: RDNA1 TF
RDNA1 has no INT32 pipeline; all the INT32 instructions are handled in the main stream. Thus it's essentially almost exactly the same as Ampere, but it has no skew in the last instruction, so the +48% theoretical max applies here (Ampere +2.3%).

Card | Ampere TF | Turing TF (me) | Turing TF (NV)
5700XT (RDNA1) | 10.01 | 7.2 | ?

Amusingly enough, the 5700XT's actual performance is pretty similar to the 2070's, and these adjusted TF numbers show exactly that (10TF vs 10-11TF).

I'm not interested in discussing RDNA1 here as the crux of the discussion is on your (IMHO) flawed/inaccurate Ampere/Turing calculations, but needless to say I wouldn't be completely confident in these stated numbers either 🤷‍♂️

I don't get why everyone is jumping on psorcerer. You should be happy that he's pointing out that the TFLOPS advertised by nVidia require some additional awareness.

Personally, I've no problem with anyone who wants to deep-dive into the numbers these companies provide us. However, at least after looking over the OP's conditions for their calculations, I don't think they're accurate. Not the calculations themselves, but the foundation and context for initiating them, because there are parts of the details NV provided that he either ignored or didn't catch, and he then applied conditionals to only one side of the calculation rather than both, as he seemingly said he would at the outset.

If the conditions and context for the calculations look suspect, I think that is worth questioning, as long as it's respectful. FWIW, there's been a rather strong push by some to downplay Nvidia's stuff, especially on the I/O front, following their presentation. If you look a little deeper you can infer why some people are doing it, too, but I'll leave that for another time; in fact I don't think it's really necessary to say why at this point :LOL:
 

CuNi

Member
Read those two sentences again. You're basically saying they are preying on the ignorant, and then you say they are not deceiving in any way...


This is my whole point all along.
TFLOPS were never meant to be used to compare gaming performance.
They are a metric for scientific applications that run only on FP Operations and compare devices based on this.
An Ampere GPU with 10TF is just as good as a Pascal 10TF GPU. Obviously efficiency will be better, but the idea of FLOPS does not care about efficiency.
It is a raw performance metric, period. Nvidia says their cards will deliver up to 30TF. They are not lying.
It's people thinking they can derive gaming performance from this, so it's them fooling themselves.

They will only deliver the amount of advertised TFLOPS when there's zero INT32 instructions, which is unrealistic.

And I never said they will deliver those TF all the time!
Obviously they won't, as it's unrealistic, as you said yourself!

That's what I meant by the fact that people don't understand the FLOPS metric at all.
If there were a metric by which we could exactly compare cards, we'd have no need for benchmarks. But that's the point.
We do not have a real metric on which to compare them scientifically accurately besides running multiple benchmarks and averaging out the results.
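
(If it helps, the usual way reviewers boil "multiple benchmarks" down to one number is a geometric mean of per-game ratios; a minimal sketch with made-up figures.)

```python
from math import prod

# Hypothetical per-game performance ratios of card A vs card B (values made up).
per_game_uplift = [1.72, 1.85, 1.68, 1.90, 1.77, 1.81]

# A geometric mean avoids letting one outlier game dominate the average.
geomean = prod(per_game_uplift) ** (1 / len(per_game_uplift))
print(f"Geometric mean uplift: {geomean:.2f}x")
```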
 
Welp, at least you aren't claiming it's .5 Turing TF like you did in the other threads...but you're still wrong. Here's why.
You are amazingly elegant with your posts. Great way to shut down trolls.
 

Kerlurk

Banned
 
You are amazingly elegant with your posts. Great way to shut down trolls.

Well, I have psorcerer on record saying he "posts to troll", and I'm constantly left guessing which direction he's actually trying to take it.

If he and others weren't so hellbent on insisting some companies are embellishing their figures while conveniently pretending certain other companies don't do any of that themselves, I think everyone and everything would be better off for it. But alas that isn't so xD
 

psorcerer

Banned
Do you have a source where they specifically phrased 3080 performance in this manner?


Then there's the RTX 3080, which offers double the level of performance of the RTX 2080 for the same price.
The RTX 3070, meanwhile, is 60% more powerful than the RTX 2070.

The rest of your post is kinda irrelevant.
Please read the OP carefully. There are no errors there. Sorry (at least not the ones that you claim)


You're combative with everyone.

I'm combative? :messenger_grinning:
 