
Nvidia Ampere teraflops and how you cannot compare them to Turing

longdi

Banned
Nvidia is saying 3080 is almost 2x the 2080. What's wrong with that?

Pretty good for the same msrp.

Why did the OP claim the 3080 is a 2080 Ti in disguise?

I'm convincing myself that the 10GB 3080 is good for 2-3 years before I jump to the next 5080 Ti. :messenger_bicep:
 

mhirano

Member
People trying to recalculate FLOPS between different architectures and things like "Ampere FLOPS != Turing FLOPS" just show repeatedly that they inherently do not understand what FLOPS mean.
And do you understand?
If so, please explain
 

MarkMe2525

Member
So basically, in a nutshell, the Xbox Series X is still more powerful than a 3070, considering the X's closed architecture. Perhaps the consoles aren't so "weak" after all.
I'm a big MS console fan and am really excited about the Series X. I know XSX is going to have its own scaling solution, but did you see Nvidia throwing that 9x scaling out there? If that turns out to be legit, it's gonna be a more powerful platform in actual practice from day one, and even more so into the future. I'm not saying MS is competing against them here. You still have to buy a whole PC if you don't have one to use this. We will have to wait and see what MS's scaling solution is. I really do hope it's powerful. I would love to play my entire Xbox library in 4K with auto HDR added!!

Edit: I'm speculating on how the MS scaling solution is going to work here. I have no proof or reason to believe the XSX will be able to scale my old games that aren't programmed to do so. A man can have dreams though, right?
 
Last edited:

pawel86ck

Banned
No matter how you look at it the RTX 3080 still has 30TF, because the TF calculation method hasn't changed. But that's just a theoretical number and we aren't seeing the 3x performance jump the numbers would suggest, but up to 2x, exactly as Nvidia has said ("up to" means it's lower than 2x on average).

Psorcerer is probably right saying Ampere CUDA cores are less efficient in real games compared to Turing, but the thing is each SM now has wayyy more CUDA cores, so the Ampere SM architecture is way superior in the end. That's why even the RTX 3080 destroys the 2080 Ti, not to mention the RTX 2080 (the PS5 / XSX performance equivalent).
 
Last edited:

longdi

Banned
No matter how you look at it the RTX 3080 still has 30TF, because the TF calculation method hasn't changed. But that's just a theoretical number and we aren't seeing the 3x performance jump the numbers would suggest, but up to 2x, as Nvidia has said ("up to" means it's lower than 2x on average).

Psorcerer is probably right saying Ampere CUDA cores are less efficient now in real games, but the thing is each SM is wayyy faster than a Turing SM, so the Ampere architecture is way superior in the end. That's why even the RTX 3080 destroys the 2080 Ti, not to mention the RTX 2080 (the PS5 / XSX performance equivalent). People were expecting the RTX 3090 to offer something like a 30% performance jump over the 2080 Ti, yet even the RTX 3080 is around 50% faster (up to 70% in a few scenes) in Doom Eternal gameplay. I can't wait to see 3090 benchmarks; it should be nearly as fast as 2x 2080 Ti in raster and RT.

But the 2080's base clock was 1.5GHz IIRC.
The 3080 is now 1.7GHz.

The 2080 can hit 2GHz most of the time with good cooling.

Does this mean the 3080 can do 2.23GHz most of the time with good cooling?

Or if the 3080 is still stuck at 2GHz, does that mean the 2080 was more conservatively rated, with more headroom?
 

pawel86ck

Banned
The jump from the 10XX series to the 20XX series was small (only about 25-30%, and that's WITH a 70% increase in price). This is ONE of the reasons why so many people felt that Turing (the 20XX series) was such a joke and a ripoff.
25% was the minimum. There were games showing even up to 70% better raster performance, and the average was probably somewhere around 40%.

Also if you add the RT performance gains (especially in Quake 2 RTX) and the DLSS resolution boost, then you would need something like 6x 1080 Ti :p in order to play with similar graphics quality. The difference between Pascal and Turing was just GIGANTIC, but the problem was (and still is) that not every game supports RT and DLSS. It will however change in the near future, because many PS5 / XSX games will be built with RT in mind and PC ports will use it as well.

BTW, unlike Turing, Pascal will not run next-gen ports for long because it will not support HW data decompression. UE5 will run like 💩 on Pascal GPUs.
 
Uh-huh... the same guy also claimed the 3080 is just a 2080 Ti in disguise while we already have benchmarks factually refuting this...
That guy is ignoring every fact that doesn't fit his narrative. He's a picture-book troll poster.
The arguments used are incorrect in some cases but the general point that TFLOPs aren’t comparable and that 30 TFLOPs is just a nice marketing number is right.
 

martino

Member
25% was the minimum. There were games showing even up to 70% better raster performance, and the average was probably somewhere around 40%.

Also if you add the RT performance gains (especially in Quake 2 RTX) and the DLSS resolution boost, then you would need something like 6x 1080 Ti :p in order to play with similar graphics quality. The difference between Pascal and Turing was just GIGANTIC, but the problem was (and still is) that not every game supports RT and DLSS. It will however change in the near future, because many PS5 / XSX games will be built with RT in mind and PC ports will use it as well.

BTW, unlike Turing, Pascal will not run next-gen ports for long because it will not support HW data decompression. UE5 will run like 💩 on Pascal GPUs.

I also still expect old cards to take a good performance hit when games start using mesh shaders.
 
The arguments used are incorrect in some cases but the general point that TFLOPs aren’t comparable and that 30 TFLOPs is just a nice marketing number is right.
Well, they are mathematically correct for certain use cases, so you can't nail Nvidia to the cross here. That TF numbers do not directly translate to performance is nothing new... but that's what we have benchmarks for... which we already have in this case, too... but those real-life comparisons are getting ignored and the "it's only marketing" narrative is all that's put to the foreground.
 
Last edited:
Well, they are mathematically correct for certain use cases, so you can't nail Nvidia to the cross here. That TF numbers do not directly translate to performance is nothing new... but that's what we have benchmarks for... which we already have in this case, too.
They are 100% mathematically accurate, but the context is that this is a gaming card. They're selling 30 TFLOPs to their audience, but to be fair, consoles started this TFLOPs war, so NVIDIA just used their own weapon against them.
 

GymWolf

Member
We already know it!
2080 = 10TF
3080 = 30TF = 3x 2080
Speed increase according to NV themselves: 2x
If it's that simple, why has nobody sued Nvidia for false marketing? You Americans sue over every fucking thing; this sounds like an opportunity to make easy money...
 
If it's that simple, why has nobody sued Nvidia for false marketing?
Because it's not factually false, and they themselves already gave a pretty realistic real-world performance-increase number.
As crazy as US law might be sometimes... that's one case you'd lose.
 
Last edited:

Xyphie

Member
Them doing 2x FP32 paths per SM will benefit compute applications more than gaming performance. There are probably use cases like coin mining or whatever which will have close to full occupancy. There's nothing "marketing" or "false advertising" about it.
 

Stuart360

Member
All I know is that the Doom Eternal presentation was impressive, to say the least. Not only was it comparing a 3080 to a 2080 Ti, and not a 2080, but a 40fps higher average framerate at 4K is incredibly large for a new family of cards vs the previous family.
You would sometimes be lucky to get a 40fps difference at 1080p between GPU families, but this was a 40fps difference at 4K.
Very impressive indeed.
 

GymWolf

Member
It's a lot less than that because it's not the same architecture and consoles have tons of hardware optimizations PCs don't have, like all the custom I/O hardware on PS5. That and optimized low level APIs vs abstractions like Direct X on PC.

That's why you can get God of War or HZD on a 1.84 Tflop machine while PC port of HZD looks bad even years after and with a lot more "Teraflooops" (TM) on PC.

Apples and oranges. Like Carmack explained. But surely GAF PC fans know better than him.
Dude, Horizon is a shitty port, that's why it runs like crap.
Consoles may have some optimization, but most of the time that optimization means low settings and an unstable 30fps with a FOV tighter than a virgin's pussy.
 
Last edited:
You can read all about the architecture here:


You are accusing Nvidia of lying and claiming that the 3080 is something it is not. That is a pretty heavy accusation.
That's the wrong Ampere.

GA100 is almost a completely different architecture to GA102 and below.
 

thelastword

Banned
Nice writeup, I was going to get on this myself, but you nailed it.....Also RDNA 1 was super fast, with the complete transition to Radeon DNA, removing all GCN constraints, we are in for a treat. The Navi TFLOP should be insanely fast and unrestrained with the rumored IPC gains, a pure screamer with insanely high clocks.

So in essence, a 20-25 TF AMD card should perform just as fast as these NV cards. Hopefully they have a Geometry Engine in place and have an enhanced HBCC feature with cache scrubbing there for launch to enhance perf even more...
 

pawel86ck

Banned
Nice writeup, I was going to get on this myself, but you nailed it.....Also RDNA 1 was super fast, with the complete transition to Radeon DNA, removing all GCN constraints, we are in for a treat. The Navi TFLOP should be insanely fast and unrestrained with the rumored IPC gains, a pure screamer with insanely high clocks.

So in essence, a 20-25 TF AMD card should perform just as fast as these NV cards. Hopefully they have a Geometry Engine in place and have an enhanced HBCC feature with cache scrubbing there for launch to enhance perf even more...
There are no rumored IPC gains for RDNA2 GPUs; we only know the performance-per-Watt improvement.
 
Last edited:

FireFly

Member
This is the crux of this argument.

It IS "3x as fast" but that sentence by itself is meaningless. 3X faster at what? At maximum theoretical floating point operations per second - it IS. Just because you have 3X the TFLOPS doesn't mean 3X the framerate is some kind of guarantee. Far from it.

Look at it from the perspective of PS4 ( 1.8TFLOPS) to PS5 (10TFLOPS).

That's 5.55X the FLOPS.

So now think of a game that runs at 1080p 60FPS on PS4. Are you really expecting the PS5 to be able to run the same game at 333FPS? Cuz that shit ain't happening. ( maybe in some rare, contrived example )

Does this mean that the PS5's TFLOP count is "bullshit"? Using the "logic" of some people ITT, it would.

Very rarely does double the power equate to double the framerate. Some engines are just inefficient and doubling the framerate will require far more than double the power.
There are two separate issues here:

1.) How well games scale when increasing the texture rate, fill rate, compute and bandwidth in tandem, i.e. scaling up the number of compute units.
2.) How well games scale when increasing one component (e.g. FP32) alone, in certain workloads.

Scaling for 1.) should be pretty linear when you are not CPU limited. For example, a 2060 is basically half a 2080 Ti, and delivers around half of the aggregate performance. Since RDNA 2 has 25% better IPC than Polaris (the architecture in the Xbox One X/PS4 Pro), which itself had better IPC than Tahiti (PS4) by maybe 10%, we should expect better than linear scaling with the PS4 to PS5 transition. You can also see this with the Xbox One to Xbox One X, where we sometimes saw bigger increases in performance than the difference in raw figures would suggest.

Scaling for 2.) depends entirely on the application and its mix of instructions, as indicated in the OP, and even by Nvidia themselves when they talk about *up to* a 2X improvement.
 
Last edited:
The simple fact is that Nvidia are amazing at marketing. They really do have world class marketing and sales strategies.

So the question: "Are Nvidia lying about their TFlops numbers?"

Well, no. I believe the card (the 3080) can technically reach 30TF; it is mathematically correct.

"So what's the problem then?"

Well humans are pattern recognition and comparison machines. Our brains need to have some kind of metric to compare performance of things.

Cars have MPH/KPH and also horsepower. Consoles used to use "bits", if anyone remembers, as a rough relative performance metric, and this was used in advertising until it essentially became meaningless for performance comparisons.

GPUs don't really have a good performance metric to compare them (outside of average FPS, which is probably the most accurate but depends on specific titles, engines and optimizations, and will also change over time as more advanced games are released, so it's not something you can print on a box or a marketing slide as a single figure to sum up the overall performance of the GPU). At one point I think companies used to use polygons per second or triangles per second.

If anyone remembers the initial PS2 reveals with their nonsense marketing slides, I believe they claimed 70,000,000 polygons per second compared to the Dreamcast's 3.5 million (roughly, I can't remember the exact figures). Of course these were unlit and unshaded polygons and a theoretical maximum that did not reflect real-world scenarios. But people not aware of that would believe the PS2 was many, many times more powerful than the Dreamcast when, although obviously more powerful, in reality it was nowhere near the level that figure would suggest.

That eventually went by the wayside. Then came TFlops. These in and of themselves are not, and should not be used as, a gaming performance metric; they should be used for what they are: specific calculations. As we all know though, the heart wants what the heart wants. People needed something to be used as a performance metric and it looks like TFlops are the current trend. That means to the average person a higher TFlops number means more power than a lower TFlops number.

If anyone remembers, Sony's marketing slides claimed the PS3 was a 2TF machine, while the PS4 is only a 1.8TF machine, and yet the PS4 is many times more powerful and performant than the PS3. Similarly, AMD had very strong compute capabilities with Vega, which had higher TF numbers than a 1080 Ti and yet lower actual game performance.

Even so, people needed a figure and TFlops are not exactly terrible when trying to compare relative performance of cards on the same architecture. Across architectures? It gets messy and turns into a meaningless figure.

So again, "what's the problem then?"

Knowing that the average gamer will only see TF figures as representative of overall power or performance, and will try to compare them directly, Nvidia were very smart here but somewhat deceptive, in that the TF figure is likely indeed correct (30TF) but the actual performance of the card in games does not reflect that.

The idea from a marketing point of view is that average Joe gamer will look at, let's say, the new consoles at 10.3TF and 12TF, then compare them to the 3080 at 30TF and assume there is a linear performance gain. "My GOD! 3x the power of a PS5!" for example.

Similarly with Big Navi being unveiled soon: according to rumours it could be anywhere from 17 to 20+ TFs, so even if the card matches, or hell, even exceeds a 3080, the average person will see 20 vs 30 and assume that the higher-numbered card will be significantly more powerful.

The crux of it is that while 30TFs is technically true, it is somewhat deceitful in a way: knowing that people wrongly use TFs for performance/power, Nvidia have made a really smart marketing move, but it won't be reflected in real-world performance to that level. (The card will obviously still be very powerful.) I think people take issue with being "taken for a ride" so to speak and are rightfully calling it out, which I don't have an issue with.
 
Oh fucking hell.

The 3080 most certainly can do 30TF (assuming that there are no INT instructions in the pipeline). However, 30TF doesn't fucking mean anything by way of gaming performance, for the love of all that is holy.



Is this enough evidence to demonstrate that floating point performance does not directly translate to better gaming performance?

3080 has 3x the TF of the 2080, but is around 70% faster.
3080 has 2x the TF of the 2080 Ti, but is around 30% faster.

I've explained this before. The 3080 has 3 times as many (or 200% more) TeraFLOPS than the 2080, but is at best 80% faster in a handful of carefully selected best-case scenarios. It's more likely to be around 60-70% faster on average.
So what happened to all those extra FLOPS?


Welp, at least you aren't claiming it's .5 Turing TF like you did in the other threads...but you're still wrong. Here's why.

Reddit Q&A

Read that bolded part carefully.

Good. Addition, yay! So let's see where you went wrong...


And there it is. You can't only account for raw numerical performance; node process efficiency gains have to also be taken into account. So even if the raw numbers between the two average out to being the same across both architectures, Ampere still sees IPC gains simply by being on a newer process (not to mention having other hardware present to offload certain taskwork more efficiently than present in Turing, such as RT, DLSS and AI through the Tensor cores. Equivalent performance in those areas on Turing would've required more raw GPU resources expended to cover the gap).


No. That's not how this works. Moving to a new process node doesn't automatically grant you additional "IPC". A process shrink usually grants you additional frequency (I say usually for a reason that will be clarified later). IPC comes from architectural improvements, not from a node transition.
A good example of this is Ryzen 1000 to Ryzen 2000. The only difference between the two is that it moves from GloFo 14nm to GloFo 12nm, which allows for slightly higher clocks, which is why it performs better. The actual IPC is exactly the same.

In this particular instance he appears to be exactly correct. In a given instance of a game where there is a mixture of FP/INT ops, Ampere = Turing. The only differentiating factor will be clock frequencies. If Ampere clocks higher than Turing, then it will be faster.
Unfortunately, Samsung 8nm is a trash node compared to TSMC 7nm, and while it does allow for higher density than Turing's 12nm process, there is no uplift in clock frequencies at all as evidenced by Nvidia's rated boost clock, which are maybe 30MHz higher on average?

Ampere's IPC uplift comes from it being able to do more floating point operations, or mixed floats and ints, depending on the circumstances, not from the node change. That's why it's faster than Turing.
But again: the 30TF 3080 is not 3 times faster than the 10TF 2080, nor is it twice as fast as the 15TF 2080 Ti. The actual performance uplift is much, much lower than that. Nvidia's own marketing and benchmarks are proof enough of that fact.


I don't see why you're doing the math this way. Nvidia says they see 36 ADDITIONAL INT32 OPs for each 100 FP32 OPs. Just previously you listed Turing as 64 INT32 + 64 FP32, and one of Ampere's as the same. So would this above division not be worthless at that point? In both cases you get 128 OPs per SM per cycle; the two Ampere numbers are clearly an either/or, the SMs can operate either in full FP32 or mixed FP32/INT32 OP modes on a cycle.

Yes.
Turing operated as follows:
[FP32 + INT32] concurrently

Ampere operates as follows:
[FP32 + FP32 | Int32] concurrently. Which as Nvidia stated, means Ampere can either operate in the same way as Turing; FP32 + INT32, OR it can operate as FP32 + FP32, depending on the needs of the pipeline.

All this means is that in workloads where there is little need for INT32 operations, or otherwise has a higher demand on floating point compute, Ampere should be dramatically faster.
Of course, that assumes that all of the Vector ALU's that Ampere now has can be fully occupied. As we all know, shader occupancy is never 100%. And in Ampere's case its shader occupancy could be even worse because other areas of the GPU's architecture might not have also been doubled up. We'll have to wait until the whitepapers are released.

So reading this again, it really does look like you got wonky with your calculations because it's Turing that would be hindered by running INT32 instructions on a clock cycle, not Ampere, since FP32 instructions would have to wait their turn until INT32 instructions are completed.

No. Turing can run FP32 and INT32 instructions at the same time. That's part of what contributes to its higher "IPC" than Pascal. It's why the 2080 with 2944 shaders can match a 1080 Ti with 3584 shaders at similar frequencies.
Ampere can do the same. If there are no INTs in the pipeline, then it can massively speed up FP. But it's either/or in Ampere's case. It can either be twice as fast at FP or run FP & INT at the same time. It can't do both.
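
A quick back-of-the-envelope sketch of that 2080-vs-1080 Ti point, assuming Nvidia's quoted ~36 INT32 per 100 FP32 instruction mix and similar clocks, and ignoring everything else (bandwidth, caches, clock differences):

```python
# Rough sketch: effective FP32 lanes per clock under a ~36:100 INT:FP mix.
# Pascal runs INT32 on the same lanes as FP32, so INT work steals FP issue slots;
# Turing has a separate INT32 datapath, so its FP32 lanes stay fully on FP work.
fp_share = 100 / 136                        # ~74% of issued instructions are FP32
pascal_1080ti_fp = 3584 * fp_share          # 1080 Ti: FP throughput diluted by INT
turing_2080_fp = 2944                       # 2080: INT runs beside the FP32 lanes
print(round(pascal_1080ti_fp), turing_2080_fp)   # ~2635 vs 2944 effective FP lanes/clock
```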

If the conditions and context for calculations look suspect, I think that is worth questioning, as long as it's respectful. FWIW there's been a rather strong push by some to downplay Nvidia's stuff, especially on the I/O front, following their presentation. If you look a little deeper you can infer why some people are doing it, too, but I'll leave that for another time and in fact I don't think it's really necessary to say why at this point :LOL:

Of course you should downplay what Nvidia says. Why wouldn't you? Marketing is marketing. They can make all the fantastical claims they like. They can pay DF to benchmark a small selection of carefully selected games that offer the greatest performance increase for Ampere vs. Turing.
Never ever take marketing as gospel.
Wait for benchmarks before making grandiose claims on the basis of Floating point performance.

If you want a GPU for pure FP compute, sure, Ampere is going to be lightning fast. The new Quadros based on Ampere will absolutely murder any and all competition at compute. But that doesn't mean anything for gaming.
Vega was miles ahead of Pascal (and in some instances even Turing) at raw compute, but it gets absolutely shat on in gaming workloads. I'm not saying Ampere is anything like Vega, because it's got a much better architectural foundation, but the compute uplift doesn't directly translate to performance in gaming.
 

psorcerer

Banned
And in Ampere's case its shader occupancy could be even worse because other areas of the GPU's architecture might not have also been doubled up.

Turing had 1/4 of registers in the "uniform" pipeline. The one that's used for FP/INT32 in Ampere. Unless they fixed that it will be 4x harder to "occupy" it.
 
Last edited:

Jon Neu

Banned
3080 is 2080Ti in disguise.



What a wonderful shitshow of a thread. The amount of salt the NVIDIA announcement has provided among Sony fanboys is so ridiculously delicious. If only MS still made their consoles with NVIDIA... Oh, can you IMAGINE the amounts of SALT then, my friend?



Fun and super relevant fact: Nvidia sounds just like envy in Spanish (envidia).

Most appropriate name ever.
 


What a wonderful shitshow of a thread. The amount of salt the NVIDIA announcement has provided among Sony fanboys is so ridiculously delicious. If only MS still made their consoles with NVIDIA... Oh, can you IMAGINE the amounts of SALT then, my friend?



Fun and super relevant fact: Nvidia sounds just like envy in Spanish (envidia).

Most appropriate name ever.

While I agree that the Sony fans are certainly defensive vs the 3000 series hype I think the OP brings up a valid enough point.

We shouldn't be buying hook, line and sinker whatever theoretical figures we are given from anyone (be it Sony, MS, Nvidia, AMD, Intel, etc.). Most of the figures designed for marketing tend to be artificial benchmarks or scenarios that don't represent real-world usage, or some kind of funky mathematics used to derive the desired number.

While the 3000 series seems impressive so far, the 30TF is not a good gaming performance metric and doesn't scale linearly compared to AMD TFs or previous Nvidia architectures' TF figures. I know everyone is on the Nvidia hype train right now because they had a great reveal showing and they are masters at marketing/mindshare, but I don't think it is a bad thing to be sceptical and contextualize the figures given to us so far as they pertain to actual gaming performance.

Unless you also bought into all of the Sony marketing slides from the PS2/PS3 era, "target renders" and whatnot, which I somehow doubt. Calling out BS is healthy, I think, and something that will help educate everyone to be less likely to fall for these things in the future. I learned my lesson all the way back at the PS2 reveal/hype train/marketing.

Like with anything it is best to wait for real benchmarks before purchasing/jumping on the hype train. I'm looking forward to them as the 3000 series so far seem to be pretty great cards for solid prices (up to the 3080 anyway, 3090 is a different story) but I want to wait to see real benchmarks and also what AMD reveal and how it compares in performance before making a decision.
 
Last edited:

Ascend

Member
I wouldn't listen too deeply into MLID; won't get into specifics here, but he's made some rather inflammatory (and wrong) claims on other products in recent past.
He was right about the strength of the PS5 and XSX
He was right about what nVidia would be showing, including prices and performance.

Most importantly; the 33% weaker IPC is EXACTLY what we are seeing with the DF benchmarks. Even if you distrust the source, the numbers don't lie.
 

psorcerer

Banned
Some PCMR types just love wasting my time it seems.
Okay, let's find out what's different between Turing and Ampere.
NVIDIA Ampere GPU Architecture Tuning Guide

We have 5 new OPs

1. DMMA and HMNMX2, which operate on double and half precision within the new Tensor cores. So far so good: Ampere is Turing with newer Tensor cores.
2. LDGDEPBAR and LDGSTS - asynchronous barrier-aware loads from global memory. Yup, we know already: Ampere is Turing with a newer memory controller.
3. REDUX - uniform datapath sync inter-thread reduction. Yup again: Ampere is Turing with a newer memory controller and a slightly different uniform (INT32) datapath.

That's it.
More than that they state:
The NVIDIA Ampere GPU architecture retains and extends the same CUDA programming model provided by previous NVIDIA GPU architectures such as Turing and Volta
The maximum number of concurrent warps per SM remains the same as in Volta (i.e., 64), and other factors influencing warp occupancy are:
  • The register file size is 64K 32-bit registers per SM.
  • The maximum number of registers per thread is 255.
  • The maximum number of thread blocks per SM is 32.
  • Shared memory capacity per SM is 164 KB, a 71% increase compared to GV100's capacity of 96 KB.
  • Maximum shared memory per thread block is 160 KB.
Overall, developers can expect similar occupancy as on Volta without changes to their application.

I.e. the arch is the same as Volta/Turing and the only ALU-related difference is "Shared memory capacity per SM is 164 KB, a 71% increase", i.e. bigger LDS cache.
The register file is unchanged, the unified datapath is unchanged (4x fewer registers), so max FP32 occupancy is increased by 25% (1 + 1/4 = 125% of Turing) but not 2x.
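
For illustration only, a minimal sketch of how the occupancy limits quoted above combine into resident warps per SM; the kernel parameters are hypothetical and allocation granularity is ignored, so real numbers come out somewhat lower:

```python
def max_warps_per_sm(regs_per_thread, smem_per_block, warps_per_block,
                     regfile=65536, smem_capacity=164 * 1024,
                     max_warps=64, max_blocks=32):
    """Rough occupancy estimate from the tuning-guide limits quoted above
    (ignores register/shared-memory allocation granularity)."""
    by_regs = regfile // (regs_per_thread * 32)                   # 32 threads per warp
    by_smem = (smem_capacity // max(smem_per_block, 1)) * warps_per_block
    by_blocks = max_blocks * warps_per_block
    return min(max_warps, by_regs, by_smem, by_blocks)

# Hypothetical kernel: 64 registers/thread, 32 KB shared memory per 8-warp block
print(max_warps_per_sm(regs_per_thread=64, smem_per_block=32 * 1024, warps_per_block=8))  # 32
```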

Really stop with the bullshit. I'm not devaluing your precious NV architecture. I'm just stating the facts.
 
Last edited:

CuNi

Member
And do you understand?
If so, please explain

Sure.
FLoating point
Operations
Per
Second

If you hadn't just quoted my first post and had instead gone out of your way to actually read my other posts, you'd maybe understand too.
FLOPS is a measurement used to describe the FP32 power of a piece of hardware. The 3080 will deliver 30TF. The 2080 Ti delivered 13.5TF.
So if you're looking for a GPU that you'll only use in scientific computing, where all you need is FP32 performance, then yes, a 3080 is more than 2x as good as a 2080 Ti.
Do all games only use FP32? No!
The inherent problem I have with people "recalculating" TF from one arch to another is that they do so in an attempt to average out game performance.
So you take a scientific metric that has nothing to do with gaming at all and try to bend and fit it to match the gaming results that benchmarks provide.
That's like taking a 400hp truck engine, comparing it to a 250hp car engine and saying "that truck engine may have more hp, but it accelerates way slower, so 1 truck hp is like 0.3 car hp!"
That's not how any of this works, and that's exactly why max TF numbers don't translate 1:1 into gaming performance, because gaming is much more than just simple FP32 calculations.

If people were to start to understand this, they could maybe slowly move on from comparing TF numbers, as the more tech we introduce, like ML, RT, etc., the less meaningful TF will become on a standalone basis and the less impact it will have on gaming performance overall.
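
For reference, a minimal sketch of where those headline TF figures come from, using the public shader counts and reference boost clocks (the factor of 2 is the FMA counting as two floating point ops; for Ampere it also counts both FP32 datapaths):

```python
def peak_fp32_tflops(shader_cores: int, boost_ghz: float) -> float:
    """Peak FP32 TFLOPS = cores x 2 (an FMA counts as two FLOPs) x clock."""
    return shader_cores * 2 * boost_ghz / 1000

print(round(peak_fp32_tflops(8704, 1.71), 1))    # RTX 3080: ~29.8 (both FP32 datapaths counted)
print(round(peak_fp32_tflops(4352, 1.545), 1))   # RTX 2080 Ti: ~13.4
```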
 
Sure.
FLoating point
Operations
Per
Second

If you hadn't just quoted my first post and had instead gone out of your way to actually read my other posts, you'd maybe understand too.
FLOPS is a measurement used to describe the FP32 power of a piece of hardware. The 3080 will deliver 30TF. The 2080 Ti delivered 13.5TF.
So if you're looking for a GPU that you'll only use in scientific computing, where all you need is FP32 performance, then yes, a 3080 is more than 2x as good as a 2080 Ti.
Do all games only use FP32? No!
The inherent problem I have with people "recalculating" TF from one arch to another is that they do so in an attempt to average out game performance.
So you take a scientific metric that has nothing to do with gaming at all and try to bend and fit it to match the gaming results that benchmarks provide.
That's like taking a 400hp truck engine, comparing it to a 250hp car engine and saying "that truck engine may have more hp, but it accelerates way slower, so 1 truck hp is like 0.3 car hp!"
That's not how any of this works, and that's exactly why max TF numbers don't translate 1:1 into gaming performance, because gaming is much more than just simple FP32 calculations.

If people were to start to understand this, they could maybe slowly move on from comparing TF numbers, as the more tech we introduce, like ML, RT, etc., the less meaningful TF will become on a standalone basis and the less impact it will have on gaming performance overall.
This.

The TFLOPs war is silly but it started way back in 2013 and we can’t backpedal from it anymore. Might as well make the best of it instead of letting people run around with misinformation.
 

nochance

Banned
The I/O stuff is what it is.

I think his numbers are fine to be honest. Let me try and pitch it from another perspective, and maybe you'll understand...

RTX 2080: 10.7TF
RTX 3080: 29.8TF
RTX 3080 = 179% faster
DF benchmarks suggest RTX 3080 = ~80% faster at best

You don't see this as a problem? There is practically a 10TF mark-up on the RTX 3080.
I don't think you understand what "at best" means, as the video shows an increase of more than 80% on multiple occasions.
 
TL;DR 1 Ampere TF = 0.72 Turing TF, or 30TF (Ampere) = 21.6TF (Turing)

Reddit Q&A



A reminder from the Turing whitepaper:


So, a Turing GPU can execute 64 INT32 + 64 FP32 ops per clock per SM.
An Ampere GPU can either execute 64 INT32 + 64 FP32 or 128 FP32 ops per clock per SM.

Which means if a game executes 0 (zero) INT32 instructions, then Ampere = 2x Turing.
And if a game executes 50/50 INT32 and FP32, then Ampere = Turing exactly.

So how many INT32 are there on average?
According to Nvidia:



Some math: 36 / (100+36) = 26%, i.e. in an average game instruction stream 26% are INT32

So we can now calculate what will happen to both Ampere and Turing when 26% INT32 + 74% FP32 instruction streams are used.
I have written a simple piece of software to do that. But you can calculate an analytical upper bound easily: 74%/50% = 1.48, or +48%.
My software shows a slightly smaller number, +44% (and that's because of edge cases where you cannot distribute the last INT32 ops in a batch equally, as only one pipeline can issue INT32 per block of 16 cores).
So the theoretical absolute max is +48%; in practice the achievable max is +44%.

Thus each 2TF of Ampere has only 1.44TF of Turing performance.
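
(Not the software mentioned above, but a minimal sketch of the analytical bound it approximates, assuming perfect scheduling and full occupancy; it reproduces the ~+48% ceiling rather than the simulated +44%.)

```python
def ampere_speedup_over_turing(int_frac: float) -> float:
    """Analytic upper bound on an Ampere SM vs a Turing SM at equal clocks,
    assuming perfect scheduling and full occupancy (ignores the per-16-core
    issue restriction, so this gives the +48% ceiling, not the simulated +44%)."""
    fp_frac = 1.0 - int_frac
    # Turing: 64 FP32 + 64 INT32 lanes; the busier pipe sets the cycle count.
    turing_cycles = max(fp_frac, int_frac) / 64
    # Ampere: each cycle issues either 64 FP32 + 64 INT32 or 128 FP32.
    mixed_cycles = int_frac / 64                       # clear the INT32 work, co-issuing FP32
    pure_fp_cycles = max(fp_frac - int_frac, 0.0) / 128
    return turing_cycles / (mixed_cycles + pure_fp_cycles)

int_frac = 36 / (100 + 36)                             # Nvidia's 36 INT32 per 100 FP32 -> ~26%
speedup = ampere_speedup_over_turing(int_frac)
print(f"+{(speedup - 1) * 100:.0f}%")                  # ~ +47-48%
print(f"1 Ampere TF ~= {speedup / 2:.2f} Turing TF")   # ~ 0.74 (0.72 once the edge cases bite)
```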

Let's check the actual data Nvidia gave us:
3080 = 30TF (Ampere) = 21.6TF (Turing) = 2.14x 2080 (10.07TF Turing)
Nvidia is even more conservative than that and gives us: 3080 = 2x 2080
3070 = 20.4TF (Ampere) = 14.7TF (Turing) = 1.86x 2070 (7.88TF Turing)
Nvidia is massively more conservative here, giving us: 3070 = 1.6x 2070
Actually, if we average the two max numbers that Nvidia gives us (they explicitly say "up to") we get an even lower effective ratio of 1 Ampere TF = 0.65 Turing TF.
Which suggests that maybe these new FP32/INT32 mixed pipelines cannot execute FP32 at full speed (or cannot execute all the instructions).
We do know that Turing had reduced register file access in the INT32 path (64 vs 256 for FP32); if it's the same (and everything suggests that Ampere is just a Turing facelift), then obviously not all FP32 instruction sequences can run on these pipelines.

Anyway a TF table:

Card             | Ampere TF               | Turing TF (me) | Turing TF (NV)
3080 (Ampere)    | 30                      | 21.6           | 19.5
3070 (Ampere)    | 20.4                    | 14.7           | 13.3
2080 Ti (Turing) | 18.75 (me) or 20.7 (NV) | 13.5           | 13.5
2080 (Turing)    | 14 (me) or 15.5 (NV)    | 10.1           | 10.1
2070 (Turing)    | 10.4 (me) or 11.5 (NV)  | 7.5            | 7.5

Bonus round: RDNA1 TF
RDNA1 has no separate INT32 pipeline; all the INT32 instructions are handled in the main stream. Thus it's essentially almost exactly the same as Ampere, but it has no skew in the last instruction, so the +48% theoretical max applies here (Ampere +2.3%).

Card           | Ampere TF | Turing TF (me) | Turing TF (NV)
5700XT (RDNA1) | 10.0      | 7.2            | ?

Amusingly enough, the 5700XT's actual performance is pretty similar to the 2070, and these adjusted TF numbers show exactly that (10TF vs 10-11TF).

So the conclusion is that it's not actually an "OMG IT CAN SIMULATE LIFE WITH 30TF" type of reaction?
 
They are 100% mathematically accurate, but the context is that this is a gaming card. They're selling 30 TFLOPs to their audience, but to be fair, consoles started this TFLOPs war, so NVIDIA just used their own weapon against them.

The conditions and contextualization of his calculations are fraudulent; that's the big issue. He applied a scaling factor to Ampere based on the percentage of INT32 OPs in a clock cycle, but seemingly forgot to apply the same to his Turing numbers, and also seemed to forget that both Ampere and Turing can perform 64 INT32 OPs per clock cycle, so essentially they are even on that front at worst.

He was right about the strength of the PS5 and XSX
He was right about what nVidia would be showing, including prices and performance.

Most importantly; the 33% weaker IPC is EXACTLY what we are seeing with the DF benchmarks. Even if you distrust the source, the numbers don't lie.

Like I said I'm not gonna get into MLID ITT, but he's mis-stated things regarding Series X more than a few times over the past few months. Not necessarily on power-related stuff, mind you.

Stopped paying attention to his channel a while ago TBH so I wouldn't know what he said about Nvidia's stuff. I don't know who the "source" would be here; if you mean psorcerer, yes there are good reasons to distrust him. Somewhat similar with MLID. I don't distrust DF tho, just haven't gotten a chance to watch the video yet. But I will, and see what they have to say and go from there.

AquaticSquirrel Look, I get what you're saying and your main point is a completely valid one. However, can't we also put AMD's performance claims under similar bounds of skeptical speculation? What's to say PS5 and Series X will be able to hit their theoretical peak performance numbers when things like RT are being implemented? What's to say Nvidia can't hit closer to their theoretical peak when technologies like DLSS (which they're way ahead of AMD on) are utilized?

Since we're allowed to have a level of constructive speculation here regarding Nvidia's claimed performance numbers, we should be open to doing the same with AMD, and that extends to the next-gen consoles as well. Granted, we don't know too much about "Big Navi" yet, which prevents that from being the case; the most we have on it from a technical perspective is the Series X deep dive at Hot Chips. The question there, though, has been and will continue to be how much of it is standard RDNA2 and how much has been customized for that particular system. Since we already know it's a mix of both, not all of what was detailed at that presentation can be taken as wholly representative of RDNA2 PC GPU performance metrics.

Other than that, yeah. We should always be critical of claimed performance numbers from these companies. NV, AMD, Microsoft, Sony, you name it. Probably the only one excused from this is Nintendo because they haven't chased the power/performance crown for decades.
 
Last edited:
I don't think you understand what "at best" means, as the video shows an increase of more than 80% on multiple occasions.
It's a carefully curated set of games that Nvidia chose. Undoubtedly, these games represent a best case in terms of performance uplift over Turing.
 

Denton

Member
Great OP, interesting.

So if an RTX 2080 Ti is running at 1900MHz, it has some 16.5 TFLOPS, compared to 21 or thereabouts for the 3080. That's not too bad, although I suspect that in RT-powered games the 3080 will pull ahead more.
 

Kenpachii

Member
You compare TFLOPs on the same architecture. Comparing 3000-series TFLOPs vs 2000-series TFLOPs is useless when the architecture changes.
 
AquaticSquirrel Since we're allowed to have a level of constructive speculation here regarding Nvidia's claimed performance numbers, we should be open to doing the same with AMD, and that extends to the next-gen consoles as well.

Other than that, yeah. We should always be critical of claimed performance numbers from these companies. NV, AMD, Microsoft, Sony, you name it. Probably the only one excused from this is Nintendo because they haven't chased the power/performance crown for decades.

I 100% agree. I wasn't necessarily saying to take everything AMD/MS/Sony have claimed/shown/will show at face value and only be critical of Nvidia's claims/figures.

I think we are both pretty much in agreement especially your second point above. You also make some great points about potential differences with "official RDNA2" PC GPUs and what customisations/added/missing features are there compared to the consoles. Right now we don't know much as AMD have been incredibly tight lipped about Big Navi.

As with anything, let's wait for proper independent performance benchmarks of the 3000 series and whatever RDNA2 cards AMD release, and then we can see how they measure up against each other.
 
Last edited:

RoboFu

One of the green rats
Also wanted to add that the 3090 is the new Titan card. It is expensive, but compared to a Titan X it is much cheaper. The 3070 is faster than a 2080 Ti at $499. The performance per dollar has gone up a lot. It all screams a preemptive strike at whatever RDNA 2 is. Radeon was the price-per-performance king, but unless they have something crazy under wraps I only see a hard time for AMD's GPUs this year.
 