
PS4 Pro and Half Floats. 16bit at 8.4 TF vs. 32bit at 4.2 TF. Explanation on this?

Would you say it could reach 6TFlops? ;)

I kid, but would accept an answer.

Relative performance is the important keyword.

Tflop means

a unit of computing speed equal to one million million (10^12) floating-point operations per second.

So no, it wouldn't reach 6 TFLOPs; it might reach performance comparable to a standard 6 TFLOP machine, though.
 

gofreak

GAF's Bob Woodward
It is up to the game to make use of such half-precision floating-point variables, but I'd say for the sake of compatibility with the old PS4 most developers will not bother (the ISA should allow compiled shaders to work seamlessly across both consoles).

Halfs are already compiled to floats on platforms without this kind of support. You wouldn't need a specific shader codepath. That is, a shader code base that uses halfs where possible will work seamlessly on the regular PS4, just without the optimised performance.

I doubt very many devs will revise their shaders around this for games this year, but I can see mindfulness of it becoming pretty common. Pretty much all console hardware going forward will benefit, and more and more PC GPUs.
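To illustrate the idea (a Python/numpy analogy, not actual shader code): the same arithmetic runs at whichever precision the data carries, just as a shader written with halfs runs fine on hardware that only has full floats.

import numpy as np

def tonemap(c):
    # one "shader" body; the precision follows the input dtype
    return c / (c + 1.0)

hdr = np.array([0.5, 4.0, 60000.0])
print(tonemap(hdr.astype(np.float16)))  # the "half" path
print(tonemap(hdr.astype(np.float32)))  # same code, promoted to float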
 

True Fire

Member
I'm getting so sick of the phrase TFlop lol. It's just like RAM RAM RAM in 2013, no one knows what it actually means and it's just a game of "which number is bigger?"
 

Horp

Member
Worth mentioning is that in one of the most common types of shader, PBR shaders, 16 bits isn't enough for 90% (just a rough estimate) of the calculations. Calculating normals for reflection vectors needs 32-bit to avoid heavy banding, and with the frame buffer normally being HDR these days, operations toward the final pixel color have to be in 32-bit.

I'm getting so sick of the phrase TFlop lol. It's just like RAM RAM RAM in 2013, no one knows what it actually means and it's just a game of "which number is bigger?"

Wrong. It's not just a number.
Think of it like this, perhaps: RAM is how big the trunk of your car is, and TFLOPs are how fast your car can go. It used to be that we really struggled to get a big enough trunk to do any kind of valuable transportation. These days we are struggling to go fast enough. We are really trying to go faster, because that makes games prettier, but heat/power are limiting factors.
 

BroBot

Member
It's like the system is a Super Saiyan, but it can also become an Ascended Super Saiyan, like when Vegeta had that bulky form. It wasn't a perfect form like SSJ2, but it allowed for additional power. lol

Thanks. Easier to understand now. Excited for the Playsaiyan 4 Pro. Now the only question that remains is whether I should trade my OG PS4 in towards it.
 

Izuna

Banned
You're going to get boosts in performance in certain places, perhaps post-processing techniques. This isn't going to make the Pro twice as fast as some people think, but perhaps there are some effects here and there that will cause less of a performance hit.
 

lord pie

Member
16 bit computation is generally not very useful outside of a small subset of problems.
The precision loss is just too great, for example, the calculation (10.0 + 1.0 / 500.0) cannot be computed accurately with 16bit floating point precision - the result will be 10.0. Now imagine performing 100 such calculations one after the other, and the accumulated precision loss can become quite dramatic.
Precision loss can be a problem with 32bit calculations sometimes, and 16bit is *dramatically* worse (thousands of times worse).
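A quick numpy check of that exact example (a sketch, not shader code):

import numpy as np

# The gap between adjacent fp16 values around 10.0 is 0.0078125,
# so adding 1/500 = 0.002 is lost entirely to rounding.
x = np.float16(10.0) + np.float16(1.0) / np.float16(500.0)
print(float(x))                              # 10.0
print(float(np.spacing(np.float16(10.0))))   # 0.0078125, the gap at 10.0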

As such, the number of places in a game where 16bit computation can be used is generally quite limited, but it's still a useful tool in those places. You won't see it being used in things that require high precision (lighting, animation, etc) but it may be practical where you are dealing with already low-ish precision data that doesn't require heavy processing (e.g. post processing, antialiasing, etc). From personal experience, even in these obvious cases you have to be very careful.

However, even when you have a valid use case it won't mean you'll get a performance boost (sometimes it can be slower!) - almost all modern processors (including mobile parts) are limited by bandwidth and latency (either internal or external), as shunting data around draws a lot of power. So even using 16bit precision and doubling your theoretical FLOPs can mean you just end up bottlenecked elsewhere in the system.

But, this is ironically where 16bit can be most useful.

The primary advantage of hardware 16bit computation is that the registers are half the size of 32bit - not that the operations are twice as fast. So sometimes you can more easily reduce register pressure for a shader by using 16bit computation, which can sometimes allow for a greater number of wavefronts to be active, which can sometimes improve *latency* hiding (assuming that is your bottleneck), thus sometimes improving performance. Lots of assumptions there though.

Thus, generally, the benefit of 16bit computation is not the doubled theoretical FLOPs, but the smaller data size and the potential for improved latency hiding that results (Cerny actually mentioned this). But as with any modern, highly complex hardware, it's never black and white.
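A toy model of that register-pressure point (illustrative numbers only; real allocation granularity and limits vary by GPU):

VGPR_BUDGET = 256   # vector registers available per SIMD thread slot
MAX_WAVES = 10      # hardware cap on waves in flight

def waves_in_flight(vgprs_per_thread):
    return min(MAX_WAVES, VGPR_BUDGET // vgprs_per_thread)

print(waves_in_flight(84))  # 3 waves with 32-bit temporaries
print(waves_in_flight(48))  # 5 waves if halves shrink the footprint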


Where 16bit is really useful is as a storage format. 16bit render targets, vertex data, etc, have been supported for a very, very long time (Vita had full support for 16bit render targets with blending, for example). It's great here because you do calculations in 32bit, then encode to 16bit or less from the higher precision calculated value.

This reduces memory usage, cache thrashing, etc (which is what the Killzone presentation mentioned above was referencing - NOT 16bit calculations).
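For instance (a numpy sketch): do the math at full precision, store the result as halves.

import numpy as np

lit = np.random.rand(1_000_000, 3).astype(np.float32) * 0.5 + 0.5
stored = lit.astype(np.float16)   # half the bytes to move around
print(lit.nbytes, stored.nbytes)  # 12000000 6000000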

Basically:

16bit storage: awesome, and supported everywhere.

16bit computation: very limited use, can often be bottlenecked elsewhere, but in certain limited situations can be very useful.

In no way whatsoever does supporting FP16 double the performance of a machine, but it does provide a useful tool for micro-optimising certain parts of a game.

TFLOPs are still a fairly useful metric, because they are often balanced against other bottlenecks within the system (it's a waste of silicon to add more ALU if you are 95% latency bound). However, the catch is that this only applies to 32bit ops. The 16bit FLOP measurements they commonly use for mobile parts are best completely ignored (*cough* nvidia *cough*)
 

Rodelero

Member
It's also important to recognize that FP16 precision isn't just an issue for small fractional components of a number; it also has serious range limitations. The largest number you can represent is 65504. Period. Anything larger either rounds down to 65504 or up to infinity. Also consider that the next smallest number you can represent is 65472, and yes, I'm being serious. Not only can't you represent fractional components of numbers this "large", but the 31 integer values between these two values don't exist either.

Great posts and all entirely correct, but just to clarify for people who aren't familiar with floating point numbers who may look at this and wonder whether half precision floats are uselessly imprecise: Floating point numbers are, by design, far more precise at low magnitudes than at high. 65504 is the highest number FP16 can represent, and 65472 is the second highest. However, with smaller numbers, the gaps are far smaller. It is possible to represent 1 using an FP16, and the next value from there is 1.0009765625. The gaps between values continue to get smaller as you approach 0.
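You can poke at those gaps directly with numpy's half type (a quick sketch):

import numpy as np

f16 = np.float16
print(float(np.finfo(f16).max))                  # 65504.0, the ceiling
print(float(np.nextafter(f16(65504), f16(0))))   # 65472.0, next one down
print(float(np.nextafter(f16(1.0), f16(2.0))))   # 1.0009765625, gap near 1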

If you put your mind to it, a lot can be done with numbers that are that precise, even if they do have their limitations. A general rule of thumb I apply when writing shaders for mobile apps is to represent positions with high precision, but when working with colours, directions, and texture coordinates, you can often get away with lower precision. These rules aren't applicable everywhere (they'd be insufficient for elements of physically based rendering), but they're usually a good start.

Horp said:
Worth mentioning is that in one of the most common types of shader, PBR shaders, 16 bits isn't enough for 90% (just a rough estimate) of the calculations. Calculating normals for reflection vectors needs 32-bit to avoid heavy banding, and with the frame buffer normally being HDR these days, operations toward the final pixel color have to be in 32-bit.

You're right about PBR; a lot of the calculations related to reflections/specular highlights do require high precision. However, I don't really see why HDR would require FP32. It definitely requires more precision, but FP16 is more than sufficient to store HDR colour - in fact it's fairly common to use lower precision than FP16 for HDR framebuffers (e.g. R11G11B10, or R9G9B9E5).
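For the curious, here's roughly how the R9G9B9E5 trick works: three 9-bit mantissas share one 5-bit exponent, so a whole HDR colour fits in 32 bits. A Python sketch of the packing rules (illustrative, not production code):

import numpy as np

N, B, E_MAX = 9, 15, 31                        # mantissa bits, bias, max exponent
MAX_VAL = (2**N - 1) / 2**N * 2**(E_MAX - B)   # 65408.0, format ceiling

def pack_rgb9e5(r, g, b):
    r, g, b = (min(max(c, 0.0), MAX_VAL) for c in (r, g, b))
    max_c = max(r, g, b, 2.0 ** (-B - 1))      # avoid log2(0)
    exp = max(-B - 1, int(np.floor(np.log2(max_c)))) + 1 + B
    scale = 2.0 ** (exp - B - N)
    if int(max_c / scale + 0.5) == 2**N:       # mantissa overflowed, bump exponent
        exp += 1
        scale *= 2.0
    m = lambda c: int(c / scale + 0.5)
    return (exp << 27) | (m(b) << 18) | (m(g) << 9) | m(r)

print(hex(pack_rgb9e5(1.0, 0.5, 100.0)))       # one 32-bit word for an HDR colour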
 

Mindlog

Member
I really appreciate the effort some members put into these more technical posts. Undoubtedly some stuff can be wrong, but the general understanding one can glean from the information presented is very helpful.
 

Horp

Member
Great posts and all entirely correct, but just to clarify for people who aren't familiar with floating point numbers...

You're right about PBR; a lot of the calculations related to reflections/specular highlights do require high precision. However, I don't really see why HDR would require FP32. It definitely requires more precision, but FP16 is more than sufficient to store HDR colour - in fact it's fairly common to use lower precision than FP16 for HDR framebuffers (e.g. R11G11B10, or R9G9B9E5).
Huh, interesting. The ones I've worked with are higher than 16. I just know how hard it was to get my PBR renderers to be banding-free on older iOS hardware with the FP16 limit. Not only for the normal map, but for cubemaps and final color. Maybe the final color artifact was due to banding earlier in the shader.

Edit: you are right. HDR buffers are normally FP16. Also, FP16 supports blending, which FP32 normally doesn't.
 

Lady Gaia

Member
Great posts and all entirely correct, but just to clarify for people who aren't familiar with floating point numbers who may look at this and wonder whether half precision floats are uselessly imprecise...

It's hard to give a general impression of when they're useful and when they aren't to someone who doesn't actively develop software, but I think the back-and-forth nature of this conversation is a great guide for those wondering how useful FP16 is. The answer always comes down to some flavor of "that depends."

One fun fact to further refine people's understanding: by definition, there can be no more than 65,536 different values represented by a 16-bit value regardless of how it's interpreted. With FP16 in particular, the peculiar truth is that there are 63,490 different numbers you can represent, including whole values, fractional values, and negative versions of all of the above – and that includes positive and negative infinity as well as negative zero. ;-) The remaining 2,046 bit patterns are literally not numbers.
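You don't have to take my word for it; all 65,536 patterns can be checked by brute force:

import numpy as np

bits = np.arange(2**16, dtype=np.uint32).astype(np.uint16).view(np.float16)
nans = int(np.count_nonzero(np.isnan(bits)))
print(nans)          # 2046 patterns that are Not a Number
print(2**16 - nans)  # 63490 usable values (incl. ±inf and both zeros)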

If you put your mind to it, a lot can be done with numbers that are that precise, even if they do have their limitations. A general rule of thumb I apply when writing shaders for mobile apps is to represent positions with high precision, and colours, directions, and texture coordinates, with lower precision.

That's a reasonable starting point, though it's worth keeping in mind that intermediate representations may need to be computed using higher precisions to avoid unacceptable loss of precision. FP16 is more than enough to store HDR10 color components but it's not always going to be great for every step along the path to reaching that conclusion, and it's not precise enough to correctly represent every value for a Dolby Vision color component.
 

Durante

Member
Where 16bit is really useful is as a storage format. 16bit render targets, vertex data, etc, have been supported for a very, very long time (Vita had full support for 16bit render targets with blending, for example). It's great here because you do calculations in 32bit, then encode to 16bit or less from the higher precision calculated value.
Good post overall, I just wanted to mention one thing. I feel like using Vita (a system released in 2011) as an example here doesn't really drive home just how long 16 bit per component rendertargets have been a thing.

I personally remember using 16 bit float rendertargets on a Radeon 9700, a GPU released in 2002.
 
16 bit computation is generally not very useful outside of a small subset of problems...

Really interesting post, thanks.
 

LowSignal

Member
To me it's PR buzzwords. If most developers can't take advantage of it then to me it's worthless. Show me games with stable frame rates and great picture quality.
 

Tripolygon

Banned
Sorry no secret sauce in that console:(
Yep, Sony is a hardware company. No way for MS to be as forward thinking as Sony in adopting tech from other companies.
Bullshit aside, as far as I know, nobody ever said it was an exclusive PS4 Pro feature; it's a standard AMD Vega GPU feature going forward, so you're creating a false narrative that doesn't exist.

But here is a legitimate thread where the OP poses a question about what double-speed FP16 means in terms of actual use cases in games. There are some people with knowledge contributing to the understanding, but all you can contribute is bringing this console warrior shit from that other thread into here.
To me it's PR buzzwords. If most developers can't take advantage of it then to me it's worthless. Show me games with stable frame rates and great picture quality.
If you read just a few posts up you would know it's not just buzzwords, and neither is it a new thing.
 

Inuhanyou

Believes Dragon Quest is a franchise managed by Sony
To me it's PR buzzwords. If most developers can't take advantage of it then to me it's worthless. Show me games with stable frame rates and great picture quality.

It's not as simple as that. It's not "PR". But at the same time it's not secret-sauce magic.

That's like saying the 10MB of EDRAM in the 360, or the SPUs in the PS3, were PR buzzwords.

Yeah, they could not be taken advantage of uniformly, but they were useful in many scenarios for helping game development on their respective platforms. It's far from something to dismiss as PR, especially when talked about in non-PR technical documentation.

Just because you don't understand the merits in micro situations doesn't mean it means nothing.
 

AlStrong

Member
16-bit stuff



(some great contributions from others too, of course).
 

Nevyr

Banned
LOL! Isn't that the truth. I'm getting both, but there's no way I'm convinced that the Pro has magical fairy dust that makes it equivalent to the Scorpio or higher, as some other forum members would like people to believe.

Not sure; I thought we were still waiting for the extra GPU/CPU power of the Xbox One to be unlocked.

I remember reading about those and it went on for several years. We must be close now.
 

Carn82

Member
16 bit computation is generally not very useful outside of a small subset of problems.

Great post :) People saying that it theoretically is an 8.4TF (FP16) machine might as well be saying that the PS4 Pro is theoretically an inefficient, expensive hot air blower. Neither statement is wrong per se, but both have very little to do with how software actually runs on the machine.
 

c0de

Member
Bullshit aside, as far as I know, nobody ever said it was an exclusive PS4 Pro feature...

I posted in this thread before; keep your forum police attitude where it belongs.
 

Lady Gaia

Member
To me it's PR buzzwords. If most developers can't take advantage of it then to me it's worthless.

This thread is complicated because the answer is nuanced. It isn't that "nobody can use this" nor is it "2x the teraflopz!", but somewhere between the extremes.
 

Tripolygon

Banned
I posted in this thread before, keep your forum police attitude where it belongs.
Yeah, you posted something relevant, then you went ahead and quoted and pushed a narrative that nobody is putting forward. I didn't say you couldn't post your drivel; I'm just pointing out how silly that false narrative you're pushing is.
 

Rodelero

Member
If most developers can't take advantage of it then to me it's worthless. Show me games with stable frame rates and great picture quality.

It's not likely to be a binary thing. Every game could probably take some advantage of FP16, but I'm sure many developers won't consider it worthwhile. Those developers who are most able and willing to go the extra mile will probably find some way to improve performance with it. Put it this way: you want to see games with great framerates and great picture quality, and FP16 is another tool developers can use in pursuit of that goal.
 
Good post overall, I just wanted to mention one thing. I feel like using Vita (a system released in 2011) as an example here doesn't really drive home just how long 16 bit per component rendertargets have been a thing.

I personally remember using 16 bit float rendertargets on a Radeon 9700, a GPU released in 2002.

Didn't R300 do everything at 24-bit regardless? Was there even a benefit?
 
So let's say I'm describing all the colors of the things in my room.

Couch: mottled slate gray with lighter iron gray highlights
Chair: salt-and-pepper backs with natural walnut finish legs
etc.

OR, I could describe them like this:

Couch: gray
Chair: black

Because I'm being less precise in the second set of descriptions, I can tell you all of the colors much more quickly.

I like this.

The missing part is, "Then if someone drew your room from your description, some of it would look worse with the lower precision description".
 

gofreak

GAF's Bob Woodward
I wonder if devs might be more encouraged to use floating origin cameras. In that case 16-bit precision may go further in more kinds of calculation.

Or if you do all shading calculations in object space. If the objects aren't very large, 16 bit precision may suffice for certain parts of the calculation (more than if you do in worldspace, and with a non-floating-origin camera).

You're not going to want to do everything in 16-bit, but there may be some ways to get around some of the initial limits.

It'll be interesting, in that it may be another vector along which some devs can get creative with more unconventional techniques to squeeze out more performance.
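A tiny numpy illustration of why keeping coordinates near the origin helps (a sketch, obviously not real shader code):

import numpy as np

# A 0.25 offset survives in fp16 near the origin but vanishes 4km out,
# because the fp16 gap between adjacent values at 4096 is 4.0.
print(float(np.float16(0.25)))           # 0.25
print(float(np.float16(4096.0 + 0.25)))  # 4096.0 - the offset is gone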
 
Nice breakdown so far...

Can someone explain this to me using DBZ characters?
64 bit = Piccolo after merging with Nail and Kami

32 bit = Piccolo after merging with Nail

16 bit = standard Piccolo

Except that despite losing precision he'd get faster...
 

KOCMOHABT

Member
The return of half floats to the mainstream is quite relevant; it's not some PR bullshit.

It's ironic, since I think it was ATI, too, that basically changed its pipeline to "float all day, every day" a good while back.

But mobile gaming especially brought back the importance of half floats. What Cerny describes here happened with the custom Apple GPU (derived from PowerVR mobile GPUs), where they, too, went the "half floats can share a float register" route, and it was a big part of why iPhones were so much more performant in 3D applications than comparable phones.
EDIT: Here's an article http://www.realworldtech.com/apple-custom-gpu/

Contrary to some posts before, half floats can be used extensively (in my opinion) and, in fact, are used extensively in lots of engines anyway.

An obvious example would be Unity shaders, but you can also check out CryEngine's main deferred shader here and look into the main functions: https://github.com/CRYTEK-CRYENGINE...e/Shaders/HWScripts/CryFX/DeferredShading.cfx

Other shaders, however, can profit even more. For example, in this one halfs are used exclusively: https://github.com/CRYTEK-CRYENGINE.../Shaders/HWScripts/CryFX/AmbientOcclusion.cfx

Just a note: it's ironic that all the legacy CryEngine code and shaders use halfs wherever possible, since that was relevant at the time, but with the newer stuff they often forgo that step.
 

onQ123

Member
VooFoo Dev Was Initially Doubtful About PS4 PRO GPU’s Capabilities But Was Pleasantly Surprised Later


“I was actually very pleasantly surprised. Not initially – the specs on paper don’t sound great, as you are trying to fill four times as many pixels on screen with a GPU that is only just over twice as powerful, and without a particularly big increase in memory bandwidth,” he explained, echoing the sentiment that a lot of us seem to have, before adding, “But when you drill down into the detail, the PS4 Pro GPU has a lot of new features packed into it too, which means you can do far more per cycle than you can with the original GPU (twice as much in fact, in some cases). You’ve still got to work very hard to utilise the extra potential power, but we were very keen to make this happen in Mantis Burn Racing.

“In Mantis Burn Racing, much of the graphical complexity is in the pixel detail, which means most of our cycles are spent doing pixel shader work. Much of that is work that can be done at 16-bit rather than 32-bit precision, without any perceivable difference in the end result – and PS4 Pro can do 16 bit-floating point operations twice as fast as the 32-bit equivalent.”



I'm the Oracle

Back to this half-precision stuff:

Neo will perform almost as if it were 8.4TF when you use FP16 & fit two 16-bit instructions into FP32.

But because the FP16 is actually compressed from what would probably have been a 32-bit instruction, it will actually be performing what seems like close to 8.4TF of FP32.

I'm aware that y'all think I'm crazy, but watch this 4.2TF console do 4K when devs use FP16.
 
Of course they want to sell their game to PS4 Pro owners, and it is their right to present it in an optimistic way. But that doesn't mean it will perform close to an 8.4 TFLOP equivalent (nor did they actually say it will, even with all that enthusiasm).

Regardless, to get a full picture we need to take into account an average of many games (or of many developer statements if you will), not a single one.
 

onQ123

Member
Of course they want to sell their game to PS4 Pro owners, and it is their right to present it in an optimistic way. But that doesn't mean it will perform close to an 8.4 TFLOP equivalent (nor did they actually say it will, even with all that enthusiasm).

Regardless, to get a full picture we need to take into account an average of many games (or of many developer statements if you will), not a single one.

It's 8.4TF FP16. That's not up for debate; that is the spec. So why would you have to take the average of many games?
 

Lady Gaia

Member
It's 8.4TF FP16. That's not up for debate; that is the spec. So why would you have to take the average of many games?

To get a feel for average real-world impact of the availability of double rate FP16 support. If indeed it roughly doubles GPU performance in most real world scenarios beyond the already expected ~2.3x boost from clock speed and compute units alone, then most games that run at 1080p on PS4 should have no problem running at 4K native. If, as many people have pointed out, it's a good thing but not a determinative breakthrough, then we'll keep seeing most 1080p PS4 titles rely on some combination of CBR and upscaling.
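For reference, the paper math behind those numbers (CU counts and clocks are the published specs; peak = CUs x 64 lanes x ops-per-clock x clock):

def tflops(cus, ghz, ops_per_clock=2):
    return cus * 64 * ops_per_clock * ghz / 1000

print(round(tflops(18, 0.800), 2))     # 1.84 - PS4
print(round(tflops(36, 0.911), 2))     # 4.2  - PS4 Pro, FP32
print(round(tflops(36, 0.911, 4), 2))  # 8.4  - PS4 Pro, double-rate FP16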

I don't think there has ever been much of a debate about whether FP16 exists, is supported, or has uses. The debate is about expectations regarding how much of an impact it will have in practice.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
16 bit computation is generally not very useful outside of a small subset of problems.
The precision loss is just too great, for example, the calculation (10.0 + 1.0 / 500.0) cannot be computed accurately with 16bit floating point precision - the result will be 10.0.
10.0 + 1.0 / 500.0 cannot be computed exactly with any binary FP, for the simple fact that 1.0 / 500.0 is not a finite binary fraction. Or more precisely, it's not a sum or a multiple of finite binary fractions.

1/500 = (1/5)^3 * 1/4.

While 1/4 is an exact power of two, trying to compute 1/5 in binary yields:
1/5 ~= 1/8 + 1/16 = 3/16

Computing the error yields:
1/5 - 3/16 = 1/80 = (1/2)^4 * 1/5.

We got that 1/5 factor again. That is, trying to compute 1/5 exactly in binary gets us to compute 1/5 again - that's a recipe for an infinite periodical fraction, IOW an infinite recursion. Not representable in any finite positional notation.
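You can watch that recursion play out by doing the base-2 long division in Python:

from fractions import Fraction

# Binary long division of 1/5: the remainder cycles, so the digits repeat.
x, digits = Fraction(1, 5), ""
for _ in range(12):
    x *= 2
    digits += "1" if x >= 1 else "0"
    if x >= 1:
        x -= 1
print("0." + digits)  # 0.001100110011 - the 0011 pattern repeats forever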
 

BroBot

Member
64 bit = Piccolo after merging with Nail and Kami

32 bit = Piccolo after merging with Nail

16 bit = standard Piccolo

Except that despite losing precision he'd get faster...

Great explanation. I was thinking of something very similar to that.
 

Green Yoshi

Member
According to Goossen, some performance optimisations from the upcoming AMD Vega architecture factor into the Scorpio Engine's design, but other features that made it into PS4 Pro - for example, double-rate FP16 processing - do not. However, customisation was extensive elsewhere. Microsoft's GPU command processor implementation of DX12 has provided big wins for Xbox One developers, and it's set for expansion in Scorpio.

http://www.eurogamer.net/articles/digitalfoundry-2017-the-scorpio-engine-in-depth

Will that have an impact? In theory every multiplatform game should perform better on Scorpio than on PS4 Pro.
 