
Wolfenstein II and Far Cry 5 will support FP16 Rapid Packed Math

onQ123

Member
This only started when the new consoles launched and people needed an easy to use number that indicated performance. Before this generation no one used flops for gaming performance comparisons in the GPU space because they are quite useless for these comparisons, especially comparisons between different architectures. Even within a single architecture performance might not scale with flops.

I mean just look at the Pro. Even in GPU limited scenarios it doesn't achieve a 2.3 increase in performance like you would expect based on flops.


At the same time you have also seen games with 4X the resolution on PS4 Pro as you get on PS4

It's the theoretical peak performance. That doesn't mean it's going to be 2.3X faster at everything; it just means that you can get up to 2.3X the 32-bit floating point performance out of PS4 Pro as you could out of PS4.
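For anyone wondering where these peak numbers come from, here is a rough sketch of the usual calculation. It assumes the publicly reported specs (18 CUs at 800 MHz for PS4, 36 CUs at 911 MHz for PS4 Pro), which are not quoted in this thread, and the standard GCN assumption of 64 lanes per CU each doing one fused multiply-add (2 flops) per clock. `peak_tflops` and `packed_factor` are names made up for this illustration.

```python
def peak_tflops(compute_units, clock_mhz, lanes_per_cu=64,
                flops_per_lane=2, packed_factor=1):
    """Theoretical peak: every lane retires one FMA (2 flops) per clock.

    packed_factor=2 models double-rate fp16, where each lane works on
    two fp16 values per instruction.
    """
    flops = (compute_units * lanes_per_cu * flops_per_lane
             * packed_factor * clock_mhz * 1e6)
    return flops / 1e12

ps4_fp32 = peak_tflops(18, 800)                    # assumed PS4 specs
pro_fp32 = peak_tflops(36, 911)                    # assumed PS4 Pro specs
pro_fp16 = peak_tflops(36, 911, packed_factor=2)   # double-rate fp16

print(round(ps4_fp32, 2), round(pro_fp32, 2), round(pro_fp16, 2))
print("Pro / PS4 fp32 ratio:", round(pro_fp32 / ps4_fp32, 2))
```

The ratio is a ceiling on ALU work only. Nothing in this formula knows about memory bandwidth, ROPs, or texture units, which is why real games rarely scale by the same factor.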
 

dogen

Member
Whenever we have gotten specs for GPUs they have been the theoretical peak floating point performance number & that doubles for fp16 when the GPU has RPM/double rate fp16.

You said "You will get 2X the performance out of your fp16 code as you would have without RPM".
 

onQ123

Member
You said "You will get 2X the performance out of your fp16 code as you would have without RPM".

LyxjuZg.jpg


and like I asked you the 1st time why are you acting like things are different now when we talk about GPU performance?
 

dogen

Member
LyxjuZg.jpg


and like I asked you the 1st time why are you acting like things are different now when we talk about GPU performance?

I never said anything is different. Doubling alu throughput alone is never gonna give you a consistent 2x speedup for everything. For some things, yeah maybe.
 

onQ123

Member
I never said anything is different. Doubling alu throughput alone is never gonna give you a consistent 2x speedup for everything. For some things, yeah maybe.

Which is why I clarified that it would be the theoretical peak floating point performance that would be 2X, and asked why you were acting as if this isn't always the case.
 

dogen

Member
Which is why I clarified that it would be the theoretical peak floating point performance that would be 2X, and asked why you were acting as if this isn't always the case.

Because you didn't at first. And I didn't act like anything.
 

dr_rus

Member
You're really overthinking things.


You will get 2X the performance out of your fp16 code as you would have without RPM I'm not talking about 2X the performance of the full game.

No, you won't. Not with RPM at least as this isn't a "true" doubling of math pipeline either, you're just doubling the data IO but not the number of executed instructions. It's called "packed math" for a reason. So even the argument of doubling the peak throughput is flawed here.

Can anybody give real world example of how this helps in games? Like which games supported this feature and what were the benefits?

Pretty much all mobile games make extensive use of FP16 calculations because a) it helps save power and b) on a 5-7" screen resulting shading artifacts are usually hard to notice so the performance gain can be pretty substantial. Switch is a good example of a platform where I'd expect most games to use FP16 in one way or another.

Hoping for PS4 Pro support implementing this technique for both games.

I'm 99% sure that it's the other way around, with PC getting FP16 code ported from Pro versions of said games.
 
You're really overthinking things.


You will get 2X the performance out of your fp16 code as you would have without RPM I'm not talking about 2X the performance of the full game.

Only if that shader is purely ALU limited, which is far from always being the case. Shaders can be geometry, ROP, bandwidth, texture, fetch, or memory latency limited, among others, in which case fp16 won't help.

Without full context of those numbers, that reads as percentage speed ups of singular parts from whole effects (say, a single step in generating volumetric lighting, which may have 5-8 steps).

Or does that mean the Bloom in its entirety is now XX% faster than previous?

Those are just single steps from a given shader. The DICE presentation that mentioned 30% was also just the resolve step of the checkerboard shader. From AMD's PR slides, overall performance of that 3DMark demo increased 13% by use of fp16. Actual games will probably get less.
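A crude way to picture the "only if the shader is ALU limited" point: treat a shader pass as limited by whichever unit takes longest. This is a toy model with made-up timings, assuming the units overlap perfectly. It is not how any real profiler reports things, and `pass_time_ms` and `fp16_speedup` are hypothetical helper names.

```python
def pass_time_ms(alu_ms, texture_ms, bandwidth_ms):
    """Toy model: units run in parallel, the slowest one sets the pass time."""
    return max(alu_ms, texture_ms, bandwidth_ms)

def fp16_speedup(alu_ms, texture_ms, bandwidth_ms, alu_gain=2.0):
    """Speedup of the pass if only the ALU portion gets alu_gain times faster."""
    before = pass_time_ms(alu_ms, texture_ms, bandwidth_ms)
    after = pass_time_ms(alu_ms / alu_gain, texture_ms, bandwidth_ms)
    return before / after

# Illustrative (invented) per-pass timings in milliseconds.
alu_bound = fp16_speedup(alu_ms=1.0, texture_ms=0.3, bandwidth_ms=0.4)
mixed = fp16_speedup(alu_ms=1.0, texture_ms=0.8, bandwidth_ms=0.4)
bandwidth_bound = fp16_speedup(alu_ms=0.5, texture_ms=0.3, bandwidth_ms=1.0)

print(alu_bound, mixed, bandwidth_bound)
```

In this model the full 2x only shows up in the purely ALU-bound case. As soon as texture or bandwidth time is close to the ALU time, the next bottleneck takes over and the gain shrinks, down to nothing for a bandwidth-bound pass.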
 

Izuna

Banned
OnQ didn't even say shit and people are making this thread about him.

As someone who laughed at some of the old things that he posted, those who are derailing this thread to repeatedly discredit Everything he speculates on or merely questions are being utter tossers.

Edit: didn't read the most recent page... Nevermind
 

onQ123

Member
Only if that shader is purely ALU limited, which is far from always being the case. Shaders can be geometry, ROP, bandwidth, texture, fetch, or memory latency limited, among others, in which case fp16 won't help.



Those are just single steps from a given shader. The DICE presentation that mentioned 30% was also just the resolve step of the checkerboard shader. From AMD's PR slides, overall performance of that 3DMark demo increased 13% by use of fp16. Actual games will probably get less.

Why are people saying this as if it's not always the case?

Also, compute rendering bypasses the ROPs.
 

Donnie

Member
I did say "for the games devs are making", although i probably should have clarified, my bad dude.

Switch isn't getting far cry 5 or Wolfenstein 2 or any games that will utilize this technology to a large extent to benefit Pro by having an extension of a userbase

Unreal Engine 4 already supports FP16, the more games and therefore engines that start supporting it the more things will overlap. I mean for PS4 Pro it could be very useful to get PS4 games to run well at full 4k, considering the performance just isn't there to do it all the time at FP32.
 

Donnie

Member
should be interesting and will finally silence all the people who think fp16 is magic(nintendo fans). <15% seems realistic. should be easily enough to put vega ahead of a 1080 in such titles tho.

What are you actually talking about? The only people who treated FP16 like it was magic were the trolls trying anything they could to play the system down, acting like it was some kind of mythical fantasy feature that was simply theoretical, when in fact it's only using lower precision to improve efficiency.
 
What are you actually talking about? The only people who treated FP16 like it was magic were the trolls trying anything they could to play the system down, acting like it was some kind of mythical fantasy feature that was simply theoretical, when in fact it's only using lower precision to improve efficiency.

you missed all the Nintendo switch tech threads. it was amazing. they legit believe switch will perform close to an xbone because of fp16
 

Donnie

Member
you missed all the Nintendo switch tech threads. it was amazing.

I read them all, I saw people theorising on the degree to which fp16 could be utilised. With about the highest theory being 50% of code running in fp16 (possible 25% GPU performance increase) in a best case scenario. This AMD test is gaining 13% (not 13% GPU increase but 13% framerate)..
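For reference, the "50% of code in fp16" estimate can be worked through with Amdahl's law. This is only a sketch under the assumption that the fp16 share is measured as a fraction of GPU time and that this share runs exactly twice as fast, which is the best case. `amdahl_speedup` is a name I'm introducing here.

```python
def amdahl_speedup(fraction_accelerated, gain):
    """Overall speedup when only a fraction of the work gets `gain` times faster."""
    return 1.0 / ((1.0 - fraction_accelerated) + fraction_accelerated / gain)

for share in (0.25, 0.5, 1.0):
    speedup = amdahl_speedup(share, 2.0)
    time_saved = 1.0 - 1.0 / speedup
    print(f"fp16 share {share:.0%}: {speedup:.2f}x faster, "
          f"{time_saved:.1%} less GPU time")

half = amdahl_speedup(0.5, 2.0)
half_time_saved = 1.0 - 1.0 / half
```

Under these assumptions a 50% share removes 25% of the GPU time, which matches the 25% figure above if it is read as time saved. Expressed as throughput that is about a 1.33x gain, and only the 100% case reaches 2x.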

they legit believe switch will perform close to an xbone because of fp16

Who thought that? I didn't see anyone claim that. Certainly not once Switch final specs were known. There was only one ridiculous theory regarding fp16 in that thread: that it was unachievable and nothing but a fantasy feature. Plenty of people tried to explain it was simply an efficiency improvement, relatively easily implemented, and would absolutely give improved performance, but unfortunately people just continued with the "Teh Nintendo fans think Switch has magical features lulz!!111".
 

ozfunghi

Member
you missed all the Nintendo switch tech threads. it was amazing. they legit believe switch will perform close to an xbone because of fp16

I think i remember you. But the argument made was that FP16 (and a newer architecture) would make Switch perform close to HALF an XBone. With 30% increase in performance as stated by devs from DICE and Ubisoft (concerning FP16).
 
I read them all, I saw people theorising on the degree to which fp16 could be utilised. With about the highest theory being 50% of code running in fp16 (possible 25% GPU performance increase) in a best case scenario. This AMD test is gaining 13% (not 13% GPU increase but 13% framerate)..



Who thought that? I didn't see anyone claim that. Certainly not once Switch final specs were known. There was only one ridiculous theory regarding fp16 in that thread: that it was unachievable and nothing but a fantasy feature. Plenty of people tried to explain it was simply an efficiency improvement, relatively easily implemented, and would absolutely give improved performance, but unfortunately people just continued with the "Teh Nintendo fans think Switch has magical features lulz!!111".

yeah this is not how it actually went down

I think i remember you. But the argument made was that FP16 (and a newer architecture) would make Switch perform close to HALF an XBone. With 30% increase in performance as stated by devs from DICE and Ubisoft (concerning FP16).

There is no 30% stated by anyone. The DICE paper said a 30% improvement of a single step of a single shader.
 
You mean in that topic? Or no developer mentioned 30%?

http://gamingbolt.com/mass-effect-a...kerboard-rendering-30-improvement-due-to-fp16

EDIT: anyway, i think i saw you in topics about the Switch and FP16, and on those topics, it was certainly talked about numbers in that order, and not about Switch=XBone.

So do some math. CB resolve + TAA is 2ms. Let's assume that it's the best case and the fp16 applies to the TAA as well. That's a whopping 0.6ms savings from a 33ms frame budget.
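The arithmetic in that post, spelled out as a quick sketch. The 2ms pass, the 30% figure, and the 33ms budget are the numbers from the post; the function name is made up, and it assumes nothing else in the frame changes.

```python
def frame_after_saving(frame_ms, pass_ms, pass_reduction):
    """Return (new frame time, time saved) when one pass gets cheaper."""
    saved = pass_ms * pass_reduction
    return frame_ms - saved, saved

new_frame_ms, saved_ms = frame_after_saving(frame_ms=33.0, pass_ms=2.0,
                                            pass_reduction=0.30)
fps_gain = 33.0 / new_frame_ms - 1.0

print(f"saved {saved_ms:.1f} ms, frame {new_frame_ms:.1f} ms, "
      f"framerate gain {fps_gain:.1%}")
```

So a 30% improvement on a 2ms pass works out to a framerate gain of under 2% for the whole frame, assuming everything else stays the same.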
 

Space_nut

Member
No, you won't. Not with RPM at least as this isn't a "true" doubling of math pipeline either, you're just doubling the data IO but not the number of executed instructions. It's called "packed math" for a reason. So even the argument of doubling the peak throughput is flawed here.



Pretty much all mobile games make extensive use of FP16 calculations because a) it helps save power and b) on a 5-7" screen resulting shading artifacts are usually hard to notice so the performance gain can be pretty substantial. Switch is a good example of a platform where I'd expect most games to use FP16 in one way or another.



I'm 99% sure that it's the other way around, with PC getting FP16 code ported from Pro versions of said games.

Very good info thanks!!
 

ozfunghi

Member
So do some math. CB resolve + TAA is 2ms. Let's assume that it's the best case and the fp16 applies to the TAA as well. That's a whopping 0.6ms savings from a 33ms frame budget.

I don't have to do any math. It's you who claimed that people said it would make the Switch perform close to XBO level, which was only theorized back when the rumor mill was still going on about the Switch getting the successor of the Tegra X1, when the hardware was still unknown. That turned out to be false. That's nearly a year ago.
 

onQ123

Member
No, you won't. Not with RPM at least as this isn't a "true" doubling of math pipeline either, you're just doubling the data IO but not the number of executed instructions. It's called "packed math" for a reason. So even the argument of doubling the peak throughput is flawed here.

It does double the peak theoretical throughput of fp16. I'm not sure how you can argue against that.
 
It does double the peak theoretical throughput of fp16 I'm not sure how you can argue against that.

Most of the users on this forum are not developers and many of them lack any knowledge on technology whatsoever. You are not helping anyone by quoting theoretical numbers if you don't put the time in to explain what these numbers mean and how they may be important in the real world. At best these posts are worthless and at worst they are misleading. Put some more effort into your posting and help the rest of this site's users understand what it is you are talking about.
 

onQ123

Member
Most of the users on this forum are not developers and many of them lack any knowledge on technology whatsoever. You are not helping anyone by quoting theoretical numbers if you don't put the time in to explain what these numbers mean and how they may be important in the real world. At best these posts are worthless and at worst they are misleading. Put some more effort into your posting and help the rest of this site's users understand what it is you are talking about.

You're the person who brought my name up in this thread and keeps making sly remarks. Why don't you explain things?
 

dr_rus

Member
It does double the peak theoretical throughput of fp16 I'm not sure how you can argue against that.

It doubles the peak theoretical throughput of FP16 _data_, but when it comes to actually doing math on this data, all it does is perform the same shader instruction on two FP16 registers. This is a minor detail, sure, but it means that "packed math" is not the same as 2X FP32 in performance, as you can't push twice as many _different_ instructions in FP16 as in FP32. So even saying that in pure theory FP16 math is double the FP32 performance isn't entirely correct, as it's actually a different pipeline.

Effectively, you're looking at twice the width of SIMD units when using RPM, and the wider they are - the more SIMD power is wasted on oversubscribing resources. Thus RPM's effective peak throughput is less than twice that of FP32, even before we come down to facts about most shaders actually requiring FP32 or not being shader core limited in the first place.

Should probably also note that even GCN3 and GCN4 chips will get some benefits from using INT16/FP16 registers as it lowers on-chip bandwidth and storage requirements. This can easily eat up about half of the stated 16 bit advantage for FM Serra demo, for example, so the actual gains from RPM are likely to be even lower than the shown +13%.
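To make the "same instruction on two FP16 values" point concrete, here is a small CPU-side emulation. It is purely illustrative: it uses Python's `struct` half-precision format to pack two fp16 values into one 32-bit word and applies a single add to both lanes. The helper names (`pack_half2`, `unpack_half2`, `packed_add`) and the low/high lane layout are assumptions for this sketch, not AMD's actual ISA or register layout.

```python
import struct

def pack_half2(lo, hi):
    """Store two fp16 values in one 32-bit word (lo in the low 16 bits)."""
    lo_bits, = struct.unpack('<H', struct.pack('<e', lo))
    hi_bits, = struct.unpack('<H', struct.pack('<e', hi))
    return (hi_bits << 16) | lo_bits

def unpack_half2(word):
    """Split a 32-bit word back into its two fp16 lanes."""
    lo, = struct.unpack('<e', struct.pack('<H', word & 0xFFFF))
    hi, = struct.unpack('<e', struct.pack('<H', word >> 16))
    return lo, hi

def packed_add(a, b):
    """One 'instruction': the same add is applied to both fp16 lanes."""
    a_lo, a_hi = unpack_half2(a)
    b_lo, b_hi = unpack_half2(b)
    return pack_half2(a_lo + b_lo, a_hi + b_hi)

x = pack_half2(1.5, -2.0)
y = pack_half2(0.25, 4.0)
result = unpack_half2(packed_add(x, y))
print(hex(x), result)
```

The two lanes always go through the same operation together. If a shader needs an add on one value and a multiply on the other, they cannot share a packed instruction, which is one reason the 2x figure is a best case.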
 

low-G

Member
It doubles the peak theoretical throughput of FP16 _data_, but when it comes to actually doing math on this data, all it does is perform the same shader instruction on two FP16 registers. This is a minor detail, sure, but it means that "packed math" is not the same as 2X FP32 in performance, as you can't push twice as many _different_ instructions in FP16 as in FP32. So even saying that in pure theory FP16 math is double the FP32 performance isn't entirely correct, as it's actually a different pipeline.

Effectively, you're looking at twice the width of SIMD units when using RPM, and the wider they are - the more SIMD power is wasted on oversubscribing resources. Thus RPM's effective peak throughput is less than twice that of FP32, even before we come down to facts about most shaders actually requiring FP32 or not being shader core limited in the first place.

Should probably also note that even GCN3 and GCN4 chips will get some benefits from using INT16/FP16 registers as it lowers on-chip bandwidth and storage requirements. This can easily eat up about half of the stated 16 bit advantage for FM Serra demo, for example, so the actual gains from RPM are likely to be even lower than the shown +13%.

This is a good clarification. I find even myself (and I've done coding projects in ASM) thinking of instructions as the work done on data because of discussions by laymen on video game websites, but it's not so at all.
 

onQ123

Member
It doubles the peak theoretical throughput of FP16 _data_, but when it comes to actually doing math on this data, all it does is perform the same shader instruction on two FP16 registers. This is a minor detail, sure, but it means that "packed math" is not the same as 2X FP32 in performance, as you can't push twice as many _different_ instructions in FP16 as in FP32. So even saying that in pure theory FP16 math is double the FP32 performance isn't entirely correct, as it's actually a different pipeline.

Effectively, you're looking at twice the width of SIMD units when using RPM, and the wider they are - the more SIMD power is wasted on oversubscribing resources. Thus RPM's effective peak throughput is less than twice that of FP32, even before we come down to facts about most shaders actually requiring FP32 or not being shader core limited in the first place.

Should probably also note that even GCN3 and GCN4 chips will get some benefits from using INT16/FP16 registers as it lowers on-chip bandwidth and storage requirements. This can easily eat up about half of the stated 16 bit advantage for FM Serra demo, for example, so the actual gains from RPM are likely to be even lower than the shown +13%.

That's where the problem comes in, because I was talking about having 2X the performance of fp16.

I'm not sure why you thought I was saying 2X the performance of fp32.


Before RPM, fp16 would be limited to the same rate as fp32, but now it's 2X the rate.
 
You're the person who brought my name up in this thread and keeps making sly remarks. Why don't you explain things?

I would be happy to. Rapid Packed Math is a hardware feature that the PS4 Pro and AMD's Vega line of graphics cards employ to speed up certain parts of the graphics pipeline, specifically those that can be calculated with less precision without a noticeable decrease in visual fidelity. Having the ability to run lower-precision calculations (FP16) at double the speed of higher-precision calculations (FP32) means that in those situations where developers take advantage of that feature and FP16 is used, the PS4 Pro and Vega will receive a performance boost. How much of a boost, we still don't know. There are tests and benchmarks that show modest to significant gains in certain situations, effects and processes, but it's too soon to tell how much of an impact Rapid Packed Math will have on the overall performance of a finished game.

So what do you think? Is my description accurate? Compare it to your posts stating that " PS4 Pro is 4.2TF FP32/ 8.4 TF FP16" and tell me which one is more likely to be helpful and informative to the average GAF user. That's what I mean when I say that you should put more effort into your posts.
 

onQ123

Member
I would be happy to. Rapid Packed Math is a hardware feature that the PS4 Pro and AMD's Vega line of graphics cards employ to speed up certain parts of the graphics pipeline, specifically those that can be calculated with less precision without a noticeable decrease in visual fidelity. Having the ability to run lower-precision calculations (FP16) at double the speed of higher-precision calculations (FP32) means that in those situations where developers take advantage of that feature and FP16 is used, the PS4 Pro and Vega will receive a performance boost. How much of a boost, we still don't know. There are tests and benchmarks that show modest to significant gains in certain situations, effects and processes, but it's too soon to tell how much of an impact Rapid Packed Math will have on the overall performance of a finished game.

So what do you think? Is my description accurate? Compare it to your posts stating that " PS4 Pro is 4.2TF FP32/ 8.4 TF FP16" and tell me which one is more likely to be helpful and informative to the average GAF user. That's what I mean when I say that you should put more effort into your posts.

Why would I reply to dr_rus, who was giving a deep dive tech explanation of RPM, with this? He knows what RPM is. The problem was him thinking that I was talking about 2X the performance of fp32 when using fp16, when I was actually explaining to Dreamwriter that the big deal is that it's 2X the fp16 performance now, versus fp16 being the same rate as fp32.

Maybe I'm missing something, but I just...don't get how this is a big feature. I just last week wrote a shader for pc using half-precision variables, it's a very old feature that every single shader programmer should be using, you don't use higher precision than you need. It's not difficult or anything either, you just replace floats with half or fixed. And all hardware has supported it for years. Heck, here's a bit from the Unity 3.55 documentation, from 2012:



Admittedly, I haven't been programming shaders very long, so I could very well be missing something.


The big deal is that instead of getting the same performance out of fp32 and fp16, you will get 2X the performance out of fp16 when it can be used.
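On the "when it can be used" part: the catch with half precision is range and accuracy, which is why not every float in a shader can simply be swapped for a half. A quick sketch using Python's `struct` half-precision format to round values to fp16 on the CPU; `to_half` is a helper name made up here, and the sample values are arbitrary.

```python
import struct

def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack('<e', struct.pack('<e', x))[0]

samples = [0.5, 0.1, 1000.3, 2049.0, 65504.0]
for value in samples:
    half = to_half(value)
    print(f"{value!r:>10} -> {half!r:<22} error {abs(half - value):.6g}")
```

Values like 0.5 survive exactly, while 0.1 picks up a small error and integers above 2048 can no longer all be represented (fp16 has an 11-bit significand and tops out at 65504). That is fine for colors and many lighting terms, but risky for things like world-space positions or large texture coordinates.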
 

Syrus

Banned
The Pro can NEVER be 8.4. Period. This fp16 and 8.4 is nonsense and is only misleading people that are not informed. It's inaccurate and wrong to spread this shit.

That said, I'm sure id Tech will make good use of it to keep a steady fps.
 

onQ123

Member
The Pro can NEVER be 8.4. Period. This fp16 and 8.4 is nonsense and is only misleading people that are not informed. Its inaccurate and wrong to spread this shit.

That said Im sure ID Tech will make good use of it to keep steady fps

You're completely wrong, and the fact that you use 8.4 without context tells me that you're only looking at the number.

PS4 Pro being 4.2tf fp32 / 8.4tf fp16 is a fact. I'm not even sure why that upsets you so much.
 

Syrus

Banned
You're completely wrong & the fact that you use 8.4 without context tell me that you're only looking at the number.

PS4 Pro being 4.2 fp32 / 8.4tf fp16 is a fact I'm not even sure why that upset you so much.

I've read up on it and read other posters here. There will never be an instance or a game on the Pro that will use pure fp16 and have the appearance of 8.4 tf. That is a fact.

All you do is point back to a quote by Cerny and it was 100% PR theory talk. It will never happen.

I'm bothered because you muck up these threads time and time again.

Fp16 is nothing more than an optimization tool, at least as far as consoles are concerned.
 

onQ123

Member
I've read up on it and read other posters here. There will never be an instance or a game on the Pro that will use pure fp16 and have the appearance of 8.4 tf. That is a fact.

All you do is point back to a quote by Cerny and it was 100% PR talk. It will never happen.

I'm bothered because you muck up these threads time and time again.

Fp16 is nothing more than an optimization tool

I said PS4 Pro was 8.4tf fp16 before Cerny said anything, so I'm not pointing at a quote from Cerny. The PS4 Pro's 8.4tf fp16 is the theoretical peak floating point performance. This is the same way they have always given you the numbers, so why jump through hoops trying to change how the numbers should be calculated now?

They give you the highest number that can be reached, and it's up to the devs how much they get out of it, but what the devs do with the specs doesn't change the specs.
 
Why would I reply to dr_rus who was giving a deep dive tech explanation of RPM with this?

Come on, you know these are not the posts I'm talking about. You had a series of posts in previous threads that had nothing in them other than "PS4 Pro is 4.2TF FP32/ 8.4TF FP16."
 

Syrus

Banned
I said PS4 Pro was 8.4tf fp16 before Cerny said anything, so I'm not pointing at a quote from Cerny. The PS4 Pro's 8.4tf fp16 is the theoretical peak floating point performance. This is the same way they have always given you the numbers, so why jump through hoops trying to change how the numbers should be calculated now?

they give you the highest number that can be reached & it's up to the devs how much they get out of it but what the devs do with the specs doesn't change the specs.


Okay, let me try and understand your posts better.

Are you saying that, in theory, if devs used fp16 in such a way, they could make a Pro game that would constantly appear to have 8.4 worth of GPU power?
 

onQ123

Member
Come on, you know these are not the posts I'm talking about. You had a series of posts in previous threads that had nothing in them other than "PS4 Pro is 4.2TF FP32/ 8.4TF FP16."

Why would you be quoting a post in this thread to talk about posts in previous threads, asking me to explain stuff that people already know?

Okay, let me try and understand your posts better.

Are you saying , in theory, if Devs used fp16 in such a way, they could make a Pro game that constantly would appear to have 8.4 worth of GPU power?

You're still posting 8.4 without context and I'm not sure why. PS4 Pro is capable of 8.4tf of fp16 and 4.2tf of fp32. This is the way you have always gotten your GPU specs; how much the devs get out of the GPU has never changed the specs of the GPU. And yes, if a dev made a game using nothing but fp16, they could get about as close to using the full 8.4tf fp16 as devs using fp32 can get to using the full 4.2tf fp32.

But like I said, they give you the theoretical peak floating point performance, and that doesn't change from game to game. Devs just make the best of the specs that they have.
 

martino

Member
Why would you be quoting a post in this thread to talk about posts in previous threads, asking me to explain stuff that people already know?



You're still posting 8.4 without context & I'm not sure why. PS4 Pro is capable of 8.4tf of fp16 & 4.2tf of fp32 this is the way you have always gotten your GPU specs how much the devs get out of the GPU has never changed the specs of the GPU & yes if a dev made a game using nothing but fp16 they could get about as close to using the full 8.4tf fp16 as devs using fp32 can get to using the full 4.2tf fp32.

But like I said they give you the theoretical peak floating point performance & that doesn't change from game to game devs just make the best of the specs that they have.
x-x-everywhere-dodges-dodges-everywhere.jpg


And without dodging: how much performance do you think fp16 could bring to games at best? And not for x step(s) in one shader.
 
I've read up on it and read other posters here. There will never be an instance or a game on the Pro that will use pure fp16 and have the appearance of 8.4 tf. That is a fact.

All you do is point back to a quote by Cerny and it was 100% PR theory talk. It will never happen.

I'm bothered because you muck up these threads time and time again.

Fp16 is nothing more than an optimization tool, at least as far as consoles are concerned.

onQ123 is merely talking about published max. theoretical numbers, in a thread on a very technical subject where there's already an inherent assumption that the folks this information is for already understand the difference.

Nowhere did he state in any way that the "2x fp16" metric was anything other than max. theoretical, and so you and a number of others dog-piling him and more or less accusing him of spreading misinformation is at best unfair and at worst shitting up the thread more than what you're accusing him of.

AMD themselves quote the same theoretical maximum doubling of fp16 throughput. See below:
Vega%20Final%20Presentation-27_575px.png


So shitting on onQ123 for merely stating published figures isn't really cool. onQ123 didn't make up these figures... AMD did. If you want to accuse anyone of being misleading, accuse AMD (and by extension Nvidia and every other computing IHV).

That the 2x performance metric, or 8TFLOPs fp16 numbers quoted by Sony, is a theoretical maximum and not an actual performance number is so self-evident to anyone who actually understands this stuff that it's not worth mentioning.

In fact, it's no more misleading than quoting 1.8TFLOPs as the fp32 GPU performance metric for PS4, as BOTH are theoretical max. figures. I don't see anyone jumping down the throat of people using those standard TFLOPs figures in discussion.

All published TFLOPs figures are bullshit marketing metrics published to promote a company's GPU product, and they don't give the full picture in terms of actual performance.

So why don't we all learn to play nice and stop projecting shit onto posters who are clearly using the right context and correctly qualifying statements in their discussion.
 

AlStrong

Member
You're getting 2X the theoretical performance.

why do you act like the same rules don't apply with the numbers we been getting for years?

You need to take it within context of bottlenecks. The ALU performance may be doubled, but shaders are often composed of other operations (e.g. texture), which are not doubled. Performance gains can also be obfuscated by register pressure for certain shaders, which has to do with better utilization of the CU instead of an ALU bottleneck per se (e.g. Tonga/Polaris or even DX9 era GPUs).
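A rough sketch of the register pressure point, for those curious. It assumes the commonly documented GCN figures of 256 vector registers (VGPRs) available per lane on each SIMD and at most 10 wavefronts in flight per SIMD, and it ignores allocation granularity, scalar registers, and LDS limits. The register counts for the example shader are invented.

```python
MAX_WAVES_PER_SIMD = 10   # assumed GCN limit
VGPRS_PER_SIMD = 256      # assumed per-lane VGPR budget on one SIMD

def waves_per_simd(vgprs_per_wave):
    """Simplified occupancy: how many wavefronts fit in the register file."""
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_SIMD // vgprs_per_wave)

fp32_shader_vgprs = 64    # hypothetical shader, all values in fp32
fp16_shader_vgprs = 48    # same shader after packing some values as fp16 pairs

print("fp32 version:", waves_per_simd(fp32_shader_vgprs), "waves per SIMD")
print("fp16 version:", waves_per_simd(fp16_shader_vgprs), "waves per SIMD")
```

More waves in flight means more work available to hide memory latency, so a shader can get faster from fp16 storage alone, even on hardware that runs fp16 math at the same rate as fp32.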
 
Okay, let me try and understand your posts better.

Are you saying , in theory, if Devs used fp16 in such a way, they could make a Pro game that constantly would appear to have 8.4 worth of GPU power?

x-x-everywhere-dodges-dodges-everywhere.jpg


and without dodging how much performance do you think fp16 could bring to games at best ? and not x step(s) in one shader.

Posts like these don't really help the discussion anymore than what you're essentially accusing onQ123 of.

Theoretical maximum numbers are just that. Theoretical. GPUs are complex processors whose make up comprises more than just ALUs and whose real life workloads are equally as varied and complex.

In order to provide a basis for comparison between updates to the micro-architecture (as well as a point of comparison between competing GPU architectures), GPU vendors like AMD and Nvidia came up with performance metrics which take some aspect of the GPU architecture, i.e. the ALU execution units, and calculate the maximum theoretical instruction throughput of that for each design. This is of course a max. theoretical number, because no real video game program consists of just endless FP (floating point) calculations. There are also aspects of the memory subsystem design to consider, i.e. how quickly data can be transported in and out of the execution units, as well as other potential bottlenecks caused by other areas of the GPU.

A max. theoretical FP throughput of a PS4 Pro GPU, in 32 bit precision, is 4.2 TFLOPs. This is a max. theoretical number. The max. theoretical throughput in 16 bit precision is 8.4 TFLOPS, i.e. also another max. theoretical figure.

You can argue, due to a number of factors that Dr-Suess explained already, that the 8.4 figure is a little more disingenuous than the 4.2 figure, but given that the 4.2 figure in the first place is disingenuous in terms of real performance in real code, you have to accept that both are technically correct in terms of being simplified theoretical performance numbers which are only really intended to provide no more than an indication of performance for specific workloads.
 

onQ123

Member
x-x-everywhere-dodges-dodges-everywhere.jpg


and without dodging how much performance do you think fp16 could bring to games at best ? and not x step(s) in one shader.


How did I dodge the question?

It's up to the devs how much they can get out of these specs, just like it has always been, but now they have more of a reason to use fp16 because they can get more performance out of it. I can't give you a made up number when I don't know how much some devs are willing to optimize.


I can tell you that if a dev moved 50% of the fp32 code to fp16, they would get what we are used to seeing from 6.3TF fp32, but I don't know what % of the fp32 code devs will be able to get away with using fp16 for.


(This is only talking about the computing & not accounting for other difference in the GPU hardware)
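How a 50/50 split turns into an "effective" TFLOPS number depends on what the 50% is a share of. A small sketch of the two readings, using the 4.2 and 8.4 peak figures from this thread; both function names are made up, and both readings ignore every non-ALU bottleneck.

```python
def effective_tflops_by_time(fp16_time_share, fp32_rate=4.2, fp16_rate=8.4):
    """Reading 1: the GPU spends this share of its *time* running fp16."""
    return (1.0 - fp16_time_share) * fp32_rate + fp16_time_share * fp16_rate

def effective_tflops_by_work(fp16_work_share, fp32_rate=4.2, fp16_rate=8.4):
    """Reading 2: this share of the *operations* is converted to fp16."""
    time_per_op = (1.0 - fp16_work_share) / fp32_rate + fp16_work_share / fp16_rate
    return 1.0 / time_per_op

by_time = effective_tflops_by_time(0.5)
by_work = effective_tflops_by_work(0.5)
print(f"50% of time in fp16: {by_time:.1f} TF equivalent")
print(f"50% of ops in fp16:  {by_work:.1f} TF equivalent")
```

The 6.3 figure matches the first reading. If "50% of the code" means half of the operations, the fp32 half still takes just as long as before and the result is closer to 5.6.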
 

dr_rus

Member
That's where the problem comes in, because I was talking about having 2X the performance of fp16.

I'm not sure why you thought I was saying 2X the performance of fp32.


Before RPM, fp16 would be limited to the same rate as fp32, but now it's 2X the rate.

FP16 performance will most certainly be less than 2X when compared to GCN3 or GCN4 chips as they gain performance from FP16 as well.
 

ethomaz

Banned
The Pro can NEVER be 8.4. Period. This fp16 and 8.4 is nonsense and is only misleading people that are not informed. Its inaccurate and wrong to spread this shit.

That said Im sure ID Tech will make good use of it to keep steady fps

I've read up on it and read other posters here. There will never be an instance or a game on the Pro that will use pure fp16 and have the appearance of 8.4 tf. That is a fact.

All you do is point back to a quote by Cerny and it was 100% PR theory talk. It will never happen.

I'm bothered because you muck up these threads time and time again.

Fp16 is nothing more than an optimization tool, at least as far as consoles are concerned.
Please.

AMD and Nvidia themselves use the FP16 raw power nomenclature, quoting it as 2x FP32 on hardware that supports double rate.

The two metrics are commonly used by hardware manufacturers and have nothing to do with your console wars bullshit...

Pro GPU: 4.2TFs FP32 and 8.4TFs FP16
Vega 64: 13.7TFs FP32 and 27.5TFs FP16

There is no nonsense when it is a fact of hardware specs... I can't understand what you are calling bullshit or fighting against lol

Edit - Some AMD slides from Vega just to end this discussion:


You can even look at AMD official site if you wish.
 

onQ123

Member
FP16 performance will most certainly be less than 2X when compared to GCN3 or GCN4 chips as they gain performance from FP16 as well.

Which goes back to you overthinking my post when I was explaining to someone what the big deal was with RPM.


I gave a simple answer for what was so different about using FP16 now.
 

onQ123

Member
Please.

AMD and Nvidia themselves use the FP16 raw power nomenclature, quoting it as 2x FP32 on hardware that supports double rate.

The two metrics are commonly used by hardware manufacturers and have nothing to do with your console wars bullshit...

Pro GPU: 4.2TFs FP32 and 8.4TFs FP16
Vega 64: 13.7TFs FP32 and 27.5TFs FP16

There is no nonsense when it is a fact of hardware specs... I can't understand what you are calling bullshit or fighting against lol

Edit - Some AMD slides from Vega just to end this discussion:



You can even look at AMD official site if you wish.

I've been banned twice for this same information because people don't want to understand and just yell "OMG he said PS4 Pro is 8.4TF!"
 