
Wolfenstein II and Far Cry 5 will support FP16 Rapid Packed Math

onQ123

Member
Your info is totally legit for the topic here, but people have criticized the other circumstances in which you brought it up, and the implications you drew, rather than refuting the individual piece of info.

It's like repeatedly insisting on bringing up the fact that "some children who received vaccinations were later diagnosed with autism" in a thread about vaccine safety. That can be a factual statement, but it's being used to mislead.

I didn't bring it up; I came into this thread and my name was in it.
 
Anything that helps boost performance and efficiency is welcome. Checkerboard rendering, this FP16 packed math stuff, whatever. It's all good, imho.
 

c0de

Member
There have always been games that implemented new technology, often with the help of manufacturers trying to demonstrate the new tech. It happened with 3DNow! many years ago.
On its own, though, this doesn't mean anything. It would be something if it were implemented in all major engines. But as with asynchronous compute, which is even mentioned in this article, we will have to see:
a) if it will be implemented
b) when it will be implemented
c) what the exact benefits will be
 
Do they use this for AO implementations? Since clarity isn't as much of an issue there, it seems like this would work perfectly, no?

I have no idea how ambient occlusion calculation works, but going off that lack of knowledge: 1) calculation-wise it would probably work, and 2) it would quickly get limited by memory bandwidth anyway.
 

Gestault

Member
Are lower-precision results from FP16 useful for "physically" distant elements? With real-world optics, every time a distance is doubled, you're quartering the visual info (assuming the inverse-square principle). Can these calls be used for low LOD elements, or is that basically hogwash?
 

Deleted member 325805

Unconfirmed Member
Anything that helps boost performance and efficiency is welcome. Checkerboard rendering, this FP16 packed math stuff, whatever. It's all good, imho.

Yup, nothing kills a game quicker for me than performance issues, be it low frame rate, inconsistent frame rate, frame-pacing errors, asset-load hitching or pop-in.
 
I didn't bring it up; I came into this thread and my name was in it.

The way people are attacking you without you even being in threads about this subject is really out of order imo. Isn't one of the core rules of this forum to attack the post, not the poster, when disagreeing?

People are acting like you said PS4 Pro games will outperform Xbox One X games just because you stated the fact that on paper, the PS4 Pro GPU is 8.4TF @ FP16.

Several developers from major companies have already said on forums that FP16 now makes up as much as 30% of their code in certain modern games.

Again, look at Snake Pass, a game that uses FP16 and has no business running on Switch in its handheld mode. This modern UE4 game only runs on Switch because of FP16.

If all FP16 does is keep PS4 Pro / Xbox One X games above 30fps then great!
 

daman824

Member
The way people are attacking you without you even being in threads about this subject is really out of order imo. Isn't one of the core rules of this forum to attack the post, not the poster, when disagreeing?

People are acting like you said PS4 Pro games will outperform Xbox One X games just because you stated the fact that on paper, the PS4 Pro GPU is 8.4TF @ FP16.

Several developers from major companies have already said on forums that FP16 now makes up as much as 30% of their code in certain modern games.

Again, look at Snake Pass, a game that uses FP16 and has no business running on Switch in its handheld mode. This modern UE4 game only runs on Switch because of FP16.

If all it does is keep PS4 Pro / Xbox One X games above 30fps then great.
OnQ gets piled on because he consistently and repeatedly spreads FUD.

Like your 30% claim. I've never seen any quote from a developer actually claiming that 30% of all their game code is fp16. I believe I have seen a developer use those numbers when specifically mentioning their checkerboard method.

The PS4 Pro will perform exactly like you'd expect a 4.2tf machine with a Jaguar CPU to perform.
 
OnQ gets piled on because he consistently and repeatedly spreads FUD.

Like your 30% claim. I've never seen any quote from a developer actually claiming that 30% of all their game code is fp16. I believe I have seen a developer use those numbers when specifically mentioning their checkerboard method.

The PS4 Pro will perform exactly like you'd expect a 4.2tf machine with a Jaguar CPU to perform.

I never said that. I said "in certain modern games". Again, people are changing what others have said to suit their argument.
 

daman824

Member
I never said that. I said "in certain modern games". Again, people are changing what others have said to suit their argument.
Show me the quotes from "several developers from major companies".

As far as I know, one developer/team has said that fp16 made their checkerboard rendering method 30% more efficient. No developers have claimed that fp16 has turned the Pro into a 5.5tf machine.
 

Unknown?

Member
Good news for Pro and PC. If TLOU2 uses this and other techniques, I can see it being the best-looking console game ever.
 

yurinka

Member
The PS4 Pro secret sauce is alive!
 

Thraktor

Member
I'm not too surprised to see idTech making use of FP16, as Bethesda seem to have a very close relationship with AMD these days, but FC5 is quite interesting. Of course the Far Cry series uses its own engine, rather than Anvil or Snowdrop, so I wouldn't necessarily assume any wider support from Ubisoft, but it's a start. If the big publisher engines like Anvil and Frostbite start using it, it could change the value proposition for Vega by quite a bit, but I'd hold off until we actually get announcements (and benchmarks, for that matter).

It references the DICE/Frostbite presentation, where FP16 RPM sped up one part of the checkerboarding pass by 30%. Not the whole renderer, just one part of one process. 'Tis FUD what he doth speak.

As far as I can recall the 30% figure came from sebbbi over on Beyond3D, but I think it was more a general estimate than relating to a specific game. He used to work on the Trials games and his current project is a semi-raytraced claymation game which looks amazing but isn't really representative of a typical game engine.

Regarding game engine support for FP16, the mobile branch of UE4 uses FP16 for all pixel shaders, although what proportion of the shader workload is pixel vs geometry vs compute I don't know (it would likely vary quite a lot from one game to the next).

Judging by the performance differential between Switch/PS4/PS4 Pro in Snake Pass (the only UE4 comparison we have across the three platforms), it would appear that they're making quite a bit of use of FP16 on compatible consoles as well (or at least allowing developers to use it), but obviously it's impossible to say how extensive it is without a direct quote from the devs.
 

Lonely1

Unconfirmed Member
Is it set in stone that XBX is incapable of double-rate FP16? I hope Nvidia will also include the feature in their consumer GPUs.
 

onQ123

Member
Is it set in stone that XBX is incapable of double-rate FP16? I hope Nvidia will also include the feature in their consumer GPUs.

According to Goossen, some performance optimisations from the upcoming AMD Vega architecture factor into the Scorpio Engine's design, but other features that made it into PS4 Pro - for example, double-rate FP16 processing - do not. However, customisation was extensive elsewhere. Microsoft's GPU command processor implementation of DX12 has provided big wins for Xbox One developers, and it's set for expansion in Scorpio.

http://www.eurogamer.net/articles/digitalfoundry-2017-the-scorpio-engine-in-depth
 

ethomaz

Banned
I'd like to know why that is. Isn't it a weird feature to drop?
They didn't drop anything, because Polaris doesn't have double-rate FP16.

What probably happened is that Sony asked (and paid) AMD to implement this Vega feature in the Pro's custom Polaris because they saw potential in it. MS probably didn't see that potential and chose not to pay to implement it in Scorpio's custom Polaris.
 

onQ123

Member
I'm not hardware proficient, at least not as much as a developer. But it looks like both platforms have taken somewhat different approaches in how they want their exclusive games rendered? Am I translating that right?

Exactly!

Sony basically made a 4K version of the PS4 out of what would have been a slim PS4: instead of making a slim, they used the die shrink to fit double the GPU into a PS4 that stayed about the same size, and clocked it a little higher. MS is basically making a ready-made UWP multimedia computer that's over 4X the power of the Xbox One, a straightforward jump for pushing Xbox One games to 4K or close to it, while the PS4 Pro was made for pushing PS4 games to 4K or close to it with cheaper rendering techniques. Both will use cheaper rendering techniques when needed, but Xbox One X is more of a brute-force approach to having 4X the resolution of the base console.
 
Half-precision stuff has been a thing in GPUs since DX9, but for one reason or another it didn't take off the way it has now; it's mostly been hidden from the public eye.
FP16 was used for pixel shaders, but vertex shaders were always FP32 (vertices really need the increased accuracy to prevent artifacts). Back then, we had separate pixel/vertex shader pipelines, so different precision made sense. ATi (AMD) had FP24 for pixel shaders, which was a good compromise between FP16 and FP32.

When the industry migrated from DX9 (discrete shader pipelines) to DX10 (unified shaders), engineers had to adopt FP32 universally because of the vertex shader requirements for high accuracy.

Nowadays we have a new type of shader: Compute Shaders. These types of shaders don't necessarily require 32-bit precision (e.g. AI pathfinding; see the sketch below the links). nVidia even supports INT8 (8-bit integer) precision in HPC for deep neural network applications. People who compare this with the GeForce FX fiasco are really ignorant, since Compute Shaders didn't exist back then. GPUs were merely toys for 3D graphics processing/video games. Not anymore, though:

https://www.theregister.co.uk/2016/09/13/nvidia_p4_p40_gpu_ai/
https://petewarden.com/2015/05/23/why-are-eight-bits-enough-for-deep-neural-networks/
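
To make that concrete, here's a minimal sketch of a compute shader using HLSL's min16float minimum-precision hint. All the names (CSMain, Input, Output) are made up, and whether the math actually runs at 16-bit precision (let alone double rate) is entirely up to the hardware and driver:

Code:
// Hypothetical HLSL compute shader; color math tolerates FP16.
RWTexture2D<float4> Output;  // made-up output image (UAV)
Texture2D<float4>   Input;   // made-up input image

[numthreads(8, 8, 1)]
void CSMain(uint3 id : SV_DispatchThreadID)
{
    // Hint to the driver that 16 bits of precision are enough here.
    min16float4 c = (min16float4)Input[id.xy];
    c.rgb = c.rgb * min16float(0.5);  // trivial darken; FP32 accuracy not needed
    Output[id.xy] = (float4)c;        // widen back to FP32 for storage
}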

Mixed FP16/FP32 coding will become the norm, since next-gen consoles (PS5/XB2) are going to support it as a baseline feature (most likely with a Navi GPU). RPM, CBR and Ryzen will allow us to have more 4K60 games in the future. Good times ahead.

One could argue that PS4 Pro is beta testing "experimental" technologies for the PS5, just like PS3 (Cell SPUs) was beta testing for Compute Shaders... people may mock these technologies all they want, until they become mainstream enough. ;)

Last but not least, don't forget that Moore's Law has slowed down quite a bit. Some people expected that FP64 would replace FP32 in consumer GPUs, but that would halve performance, or it would require double the amount of transistors for the same performance. That's why FP16 and even INT8 are becoming a thing. Mobile GPU design is another factor to consider.

~

In regards to Switch (Tegra X1), does anyone know if any Nintendo exclusives (Zelda BotW, MK8 Deluxe, Splatoon 2) utilize RPM/2xFP16? 3rd party devs are even less likely to use it, but who knows... it's an interesting feature nonetheless.

When it comes to PS4 exclusives, I haven't heard of ND or GG utilizing it in current games. Maybe in the future.

There have always been games that implemented new technology, often with the help of manufacturers trying to demonstrate the new tech. It happened with 3DNow! many years ago.
On its own, though, this doesn't mean anything. It would be something if it were implemented in all major engines. But as with asynchronous compute, which is even mentioned in this article, we will have to see:
a) if it will be implemented
b) when it will be implemented
c) what the exact benefits will be
Remember when nVidia released the GeForce 256 (very expensive back then) and everyone said that T&L was a "useless gimmick"? Most people had 3dfx graphics cards and thought that a fast enough CPU would suffice for vertex transformation. T&L/DX7-ready games took a while to become mainstream... and look where we are now.

People will always be skeptical when a paradigm shift disrupts the status quo.
 
Maybe I’m missing something, but I just...don’t get how this is a big feature. Just last week I wrote a shader for PC using half-precision variables; it’s a very old feature that every single shader programmer should be using, since you don’t use higher precision than you need. It’s not difficult or anything either: you just replace floats with half or fixed. And all hardware has supported it for years. Heck, here’s a bit from the Unity 3.55 documentation, from 2012:

Unity 3.55 said:
Precision of computations
When writing shaders in Cg/HLSL, there are three basic number types: float, half and fixed (as well as vector & matrix variants of them, e.g. half3 and float4x4):

float: high precision floating point. Generally 32 bits, just like float type in regular programming languages.
half: medium precision floating point. Generally 16 bits, with a range of -60000 to +60000 and 3.3 decimal digits of precision.
fixed: low precision fixed point. Generally 11 bits, with a range of -2.0 to +2.0 and 1/256th precision.
Use lowest precision that is possible; this is especially important on mobile platforms like iOS and Android. Good rules of thumb are:

For colors and unit length vectors, use fixed.
For others, use half if range and precision is fine; otherwise use float.
On mobile platforms, the key is to ensure as much as possible stays in low precision in the fragment shader. On most mobile GPUs, applying swizzles to low precision (fixed/lowp) types is costly; converting between fixed/lowp and higher precision types is quite costly as well.
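
Following those rules of thumb, a fragment shader ends up looking something like this (a made-up Unity-style Cg/HLSL example; names like _MainTex and _LightDir are illustrative, not from any real project):

Code:
sampler2D _MainTex;
half3 _LightDir;  // unit-length light direction: half is enough

struct v2f
{
    float4 pos    : SV_POSITION;  // positions always want full float
    float2 uv     : TEXCOORD0;    // UVs kept float to avoid texture swimming
    half3  normal : TEXCOORD1;    // unit-length vector: half, per the doc
};

fixed4 frag(v2f i) : SV_Target
{
    half3 n = normalize(i.normal);
    half ndotl = saturate(dot(n, _LightDir));
    fixed4 albedo = tex2D(_MainTex, i.uv);  // color data: fixed, per the doc
    return albedo * ndotl;                  // lighting math stays low precision
}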

Admittedly, I haven’t been programming shaders very long, so I could very well be missing something.
 

Gitaroo

Member
Maybe I’m missing something, but I just...don’t get how this is a big feature. Just last week I wrote a shader for PC using half-precision variables; it’s a very old feature that every single shader programmer should be using, since you don’t use higher precision than you need. It’s not difficult or anything either: you just replace floats with half or fixed. And all hardware has supported it for years. Heck, here’s a bit from the Unity 3.55 documentation, from 2012:



Admittedly, I haven’t been programming shaders very long, so I could very well be missing something.


I think the discussion here is about the performance gain from fp16 on hardware designed for it.
 

Ushay

Member
Exactly!

Sony basically made a 4K version of the PS4 out of what would have been a slim PS4: instead of making a slim, they used the die shrink to fit double the GPU into a PS4 that stayed about the same size, and clocked it a little higher. MS is basically making a ready-made UWP multimedia computer that's over 4X the power of the Xbox One, a straightforward jump for pushing Xbox One games to 4K or close to it, while the PS4 Pro was made for pushing PS4 games to 4K or close to it with cheaper rendering techniques. Both will use cheaper rendering techniques when needed, but Xbox One X is more of a brute-force approach to having 4X the resolution of the base console.

Brute force isn't how I would word it, given the DX12 customisations in the hardware. It feels more like they want to build in backwards compatibility and ensure the newer consoles can work with previous games.

Sony, likewise, have likely fed their first-party dev feedback straight into the design of the Pro.
 

onQ123

Member
Maybe I’m missing something, but I just...don’t get how this is a big feature. Just last week I wrote a shader for PC using half-precision variables; it’s a very old feature that every single shader programmer should be using, since you don’t use higher precision than you need. It’s not difficult or anything either: you just replace floats with half or fixed. And all hardware has supported it for years. Heck, here’s a bit from the Unity 3.55 documentation, from 2012:



Admittedly, I haven’t been programming shaders very long, so I could very well be missing something.

The big deal is that instead of getting the same performance out of fp32 & fp16, you will get 2X the performance out of fp16 where it can be used.
 
I think the discussion here is the performance gain from fp16 and hardware designed with it.
Yes, but my point is that, as far as I know, all hardware based on modern programmable shader systems has “fp16”, from the original Xbox onward, including cell phones, and all games should have been using it for years and getting better performance. We shouldn’t be seeing any new performance gains from new games, because it’s not some amazing new technique made for 4K gaming that a game would need to be designed around; it’s a simple pixel-shader optimization feature that lets the GPU process half as much data per pixel, thus increasing the speed at which the pixel is rendered. Saying Wolfenstein II and Far Cry 5 will support it is like saying those games are going to support baked lighting, or even polygons.
 

tuxfool

Banned
Yes, but my point is that, as far as I know, all hardware based on modern programmable shader systems has “fp16”, from the original Xbox onward, including cell phones, and all games should have been using it for years and getting better performance. We shouldn’t be seeing any new performance gains from new games, because it’s not some amazing new technique made for 4K gaming that a game would need to be designed around; it’s a simple pixel-shader optimization feature that lets the GPU process half as much data per pixel, thus increasing the speed at which the pixel is rendered. Saying Wolfenstein II and Far Cry 5 will support it is like saying those games are going to support baked lighting, or even polygons.

As others have explained, you can process fp16 at double rate and int8 at quad rate, whereas previously, if you used an fp16 variable, it would only execute at the same speed as fp32. There is also a halfway point where you could stuff two fp16 variables into the same fp32 register, but that only relieves register pressure and doesn't double execution speed.

So to be clear: currently, if you use fp16 variables, your GPU isn't executing them any faster than fp32.
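
A sketch of what "packed" buys you, assuming shader model 6.2-style native 16-bit types (float16_t) on RPM hardware; the function and weights here are made up:

Code:
float16_t blend2(float16_t2 samples, float16_t2 weights)
{
    // On double-rate FP16 hardware (PS4 Pro, Vega) this multiply can
    // issue as ONE packed instruction doing both 16-bit multiplies at
    // once, instead of two full-rate FP32 multiplies.
    float16_t2 weighted = samples * weights;
    return weighted.x + weighted.y;  // horizontal add to combine the pair
}

Without packed execution, the same code still works, but each 16-bit op just runs at the fp32 rate, which is the point above.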
 

dr_rus

Member
The big deal is that instead of getting the same performance out of fp32 & fp16, you will get 2X the performance out of fp16 where it can be used.

You won't get 2X performance out of FP16 even when looking only at the shader using FP16. The slide you've shown gives the typical performance gains from using 16-bit math (note that INT16 and FP16 are different things, and they are supported differently by the h/w base) in specific shaders in a demo made specifically to showcase this feature.

Their final impact on whole-frame rendering time will be way less than stated there. I expect games to get less than 10% of performance above FP32 from double-speed FP16 without loss of quality. Most graphics calculations pretty much require FP32, and some calculations which could live with FP16 are in fact not limited by math throughput. It's not a magic "2X" feature.
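
Back-of-the-envelope with a made-up share: if a fraction p of the frame time is FP16-friendly ALU work and RPM doubles its speed, the whole-frame speedup is

speedup = 1 / ((1 - p) + p/2)

so even a generous p = 0.2 gives 1 / (0.8 + 0.1) ≈ 1.11, i.e. roughly 11% off the frame time despite the FP16 portions themselves doubling in speed.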
 

onQ123

Member
You won't get 2X performance out of FP16 even when looking only at the shader using FP16. The slide you've shown gives the typical performance gains from using 16-bit math (note that INT16 and FP16 are different things, and they are supported differently by the h/w base) in specific shaders in a demo made specifically to showcase this feature.

Their final impact on whole-frame rendering time will be way less than stated there. I expect games to get less than 10% of performance above FP32 from double-speed FP16 without loss of quality. Most graphics calculations pretty much require FP32, and some calculations which could live with FP16 are in fact not limited by math throughput. It's not a magic "2X" feature.


You're really overthinking things.


You will get 2X the performance out of your fp16 code compared to what you would have without RPM; I'm not talking about 2X the performance of the full game.
 

dogen

Member
You're really overthinking things.


You will get 2X the performance out of your fp16 code compared to what you would have without RPM; I'm not talking about 2X the performance of the full game.

No you won't. Only if the shader in question is 100% ALU bound.
 

killatopak

Member
Can anybody give a real-world example of how this helps in games? Like, which games supported this feature, and what were the benefits?
 

LCGeek

formerly sane
Remember when nVidia released the GeForce 256 (very expensive back then) and everyone said that T&L was a "useless gimmick"? Most people had 3dfx graphics cards and thought that a fast enough CPU would suffice for vertex transformation. T&L/DX7-ready games took a while to become mainstream... and look where we are now.

People will always be skeptical when a paradigm shift disrupts the status quo.

I'm really starting to like the cut of your jib in certain topics. Expect a PM later on some of the networking stuff we discussed.

I also remember people thinking PowerVR and 3dfx would lead things as well.

Can anybody give a real-world example of how this helps in games? Like, which games supported this feature, and what were the benefits?

Polygonal sprite mentioned Snake Pass for Switch, which is a great example, because without it (or optimization in general) the game wouldn't run that well on the platform.
 

Without full context for those numbers, that reads as percentage speed-ups of individual parts of whole effects (say, a single step in generating volumetric lighting, which may have 5-8 steps).

Or does it mean the bloom in its entirety is now XX% faster than before?
 

napata

Member
Whenever we have gotten specs for GPUs, they have been the theoretical peak floating-point performance number, and that doubles for fp16 when the GPU has RPM/double-rate fp16.

This only started when the new consoles launched and people needed an easy-to-use number that indicated performance. Before this generation, no one used flops for gaming performance comparisons in the GPU space, because they are quite useless for such comparisons, especially between different architectures. Even within a single architecture, performance might not scale with flops.

I mean, just look at the Pro. Even in GPU-limited scenarios it doesn't achieve a 2.3x increase in performance like you would expect based on flops.
 