
PS4 Pro and Half Floats. 16bit at 8.4 TF vs. 32bit at 4.2 TF. Explanation on this?

Status
Not open for further replies.
Eurogamer put together an article detailing more specific features of the Pro's hardware; it can be read here.

But what I'm curious about is this:

"There's better support of variables such as half-floats. To date, with the AMD architectures, a half-float would take the same internal space as a full 32-bit float. With Polaris, it's possible to place two half-floats side by side in a register, which means if you're willing to mark which variables in a shader program are fine with 16-bits of storage, you can use twice as many. Annotate your shader program, say which variables are 16-bit, then you'll use fewer vector registers."

"One of the features appearing for the first time is the handling of 16-bit variables - it's possible to perform two 16-bit operations at a time instead of one 32-bit operation," he says, confirming what we learned during our visit to VooFoo Studios to check out Mantis Burn Racing. "In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

I thought we were done with bits? Man I don't even know what the fuck a bit is. Can anyone explain this a bit better? I also remember a GAFer making a prediction about the PS4 Pro doing this, but he was laughed out. Was he right?
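The register packing described in the Eurogamer quote can be sketched in a few lines. This is only an illustration, assuming Python's standard `struct` module as a stand-in for the hardware; the helper names `pack_two_halves` and `unpack_two_halves` are made up for this example and are not part of any console SDK.

```python
import struct

def pack_two_halves(a: float, b: float) -> int:
    """Round a and b to FP16 and place them side by side in one 32-bit word."""
    lo, hi = struct.unpack("<HH", struct.pack("<ee", a, b))
    return (hi << 16) | lo

def unpack_two_halves(word: int) -> tuple:
    """Split a 32-bit word back into its two FP16 values."""
    return struct.unpack("<ee", struct.pack("<HH", word & 0xFFFF, word >> 16))

word = pack_two_halves(1.5, -2.0)
print(hex(word))                # 0xc0003e00
print(unpack_two_halves(word))  # (1.5, -2.0)
```

One 32-bit slot now carries two values, which is why a shader that marks variables as 16-bit can get by with fewer vector registers.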
 

tuxfool

Banned
A bit is a single binary value, either a 1 or a 0. There are 8 bits in a byte etc. etc.

The reason it can handle half precision floats (16bits) is because the Polaris architecture has better support for them.

It is up to the game to make use of such half-precision floating point variables, but I'd say for the sake of compatibility with PS4old most developers will not bother (the ISA should allow compiled shaders to work seamlessly across both consoles). It should also be noted that these variables are only useful in certain situations and were initially more prevalent in mobile applications.
 
A bit is a single binary value, either a 1 or a 0. There are 8 bits in a byte etc. etc.

The reason it can handle half precision floats (16bits) is because the Polaris architecture has better support for them.

It is up to the game to make use of such half-precision floating point variables, but I'd say for the sake of compatibility with PS4old most developers will not bother (the ISA should allow compiled shaders to work seamlessly across both consoles). It should also be noted that these variables are only useful in certain situations and were initially more prevalent in mobile applications.

So this is something that is more relevant to future hardware releases (like PS5), not so much this?
 

tuxfool

Banned
So this is something that is more relevant to future hardware releases (like PS5), not so much this?

Maybe, but it seems unlikely to be massively important to standard console games. It certainly can be an improvement in frequently used variables that do not need 32bit precision. However, from what I've read, most graphics operations need 32bit precision.
 

Andodalf

Banned
This thread being this small on Gaf is shocking.

Can anyone share some light on this?

Are you kidding? There's a huge thread on it. It's probably not important. It might offer small benefits to certain parts of shaders and the rendering pipeline, since half-precision values are essentially half as much data to move around. It's been used on mobile for years, and hasn't made those devices ultra powerful. Generally speaking, it's a non-story.
 

Durante

Member
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).
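For anyone who wants real rather than made-up digits, here is a small sketch that rounds the same number to 32 and 16 bits. It assumes Python's standard `struct` module, whose `f` and `e` format codes correspond to IEEE single and half precision; `round_to` is a hypothetical helper name.

```python
import struct

def round_to(fmt: str, x: float) -> float:
    """Store x in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

x = 4.287502375023             # Python floats are 64-bit
as_fp32 = round_to("<f", x)    # 32-bit single precision
as_fp16 = round_to("<e", x)    # 16-bit half precision
print(x, as_fp32, as_fp16)     # the 16-bit value comes out as 4.2890625
```

The 16-bit result is already off in the third decimal place, because between 4 and 8 half precision can only step in increments of 1/256.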

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.
 

dano1

A Sheep
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.


So are you saying it's a true 8.4 T Flop machine but just takes extra work to make it happen?
 

Durante

Member
So are you saying it's a true 8.4 T Flop machine but just takes extra work to make it happen?
No, that's definitely not what I'm saying.

It's a 4.1 TF machine (since "TFlop" without additional qualifiers in graphics generally refers to FP32 in this day and age). Depending on the individual workload profile of a given game, and how much developers are willing to invest into optimizations which won't do anything for the larger (non-Pro) part of their audience, it could perform faster than that (but never close to twice as fast).
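A rough way to see why the gain depends on the workload is an Amdahl-style estimate. The sketch below assumes arithmetic throughput is the only bottleneck (it ignores memory bandwidth, texturing and everything else), uses the 4.2 figure from the quoted article, and treats the FP16 fraction as a made-up input; `effective_tflops` is a hypothetical helper.

```python
def effective_tflops(fp32_peak: float, fp16_fraction: float) -> float:
    """Estimate throughput when a fraction of the arithmetic runs at double rate."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    # Time per unit of work: the FP32 part at full cost, the FP16 part at half cost.
    relative_time = (1.0 - fp16_fraction) + fp16_fraction / 2.0
    return fp32_peak / relative_time

for fraction in (0.0, 0.25, 0.5, 1.0):
    print(fraction, round(effective_tflops(4.2, fraction), 2))
# 0.0 4.2
# 0.25 4.8
# 0.5 5.6
# 1.0 8.4
```

Only the unrealistic case where every single operation is FP16 reaches 8.4; anything a real game does lands somewhere below that.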
 

KKRT00

Member
So are you saying it's a true 8.4 T Flop machine but just takes extra work to make it happen?

Yes, if the only purpose of your code is to calculate floating point values in half precision.

Hmm, even that is not accurate, because FP16 is generally faster than FP32, so it's an 8.4 TFlop machine only in a situation where the only purpose of your existing code is to calculate FP32 values (and you don't care about output precision) and you then fully convert it to FP16.
 
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.

I've been following this pretty close and this is the first explanation I've been able to comprehend.

Thanks.
 
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.

I've been following this pretty close and this is the first explanation I've been able to comprehend.

Thanks.

Exactly. Thank you, Durante.
 
 

KageMaru

Member
No one wants to talk about it because they didn't believe it was the case & now that it has come out as the truth they would rather say that it's meaningless.

That's not the case at all.

No, that's definitely not what I'm saying.

It's a 4.1 TF machine (since "TFlop" without additional qualifiers in graphics generally refers to FP32 in this day and age). Depending on the individual workload profile of a given game, and how much developers are willing to invest into optimizations which won't do anything for the larger (non-Pro) part of their audience, it could perform faster than that (but never close to twice as fast).

Thank you.
 
lol

I'm still none the wiser even after Durante's post. I need someone to explain this to me like I'm a 5 year old :)

So let's say I'm describing all the colors of the things in my room.

Couch: mottled slate gray with lighter iron gray highlights
Chair: salt-and-pepper backs with natural walnut finish legs
etc.

OR, I could describe them like this:

Couch: gray
Chair: black

Because I'm being less precise in the second set of descriptions, I can tell you all of the colors much more quickly.
 

Lady Gaia

Member
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

It's also important to recognize that FP16 precision isn't just an issue for small fractional components of a number; it also has serious range limitations. The largest number you can represent is 65504. Period. Anything larger either rounds down to 65504 or up to infinity. Also consider that the next smaller number you can represent is 65472, and yes, I'm being serious. Not only can't you represent fractional components of numbers this "large", but the 31 integer values between these two values don't exist, either.
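These limits are easy to check. The sketch below assumes Python's standard `struct` module and its `e` (half precision) format; the helper names are hypothetical. One difference from GPU hardware is flagged in the comments: Python refuses to pack an out-of-range value, whereas hardware following IEEE rounding would typically produce infinity.

```python
import struct

def half_from_bits(bits: int) -> float:
    """Interpret a 16-bit pattern as an FP16 value."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]

def round_to_half(x: float) -> float:
    """Round x to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

largest = half_from_bits(0x7BFF)       # highest finite bit pattern
next_smaller = half_from_bits(0x7BFE)  # one step below it
print(largest, next_smaller, largest - next_smaller)  # 65504.0 65472.0 32.0

# Everything in between snaps to one of the two neighbours.
print(round_to_half(65480.0), round_to_half(65490.0))  # 65472.0 65504.0

try:
    struct.pack("<e", 70000.0)
except OverflowError:
    # Python raises here; GPU hardware would typically return infinity instead.
    print("70000 does not fit in FP16")
```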

There are absolutely times when they're appropriate. They work incredibly well for some computations and not at all for others. So, sure, a lot of threads devolved into people taking extreme positions but the reality is that they exist and are useful but are far from a panacea. Suggestions that existing PS4 shader code could run with FP16 were nonsense, as are insinuations that Scorpio might be a 6TF console only for FP16.
 

onQ123

Member
That's not the case at all.

Funny that someone replied & said it was meaningless.


This is basically what is going on.




But here is something to think about but I know people will ignore it because it's coming from me:

If artifacts & noise are a problem with FP16 wouldn't it be smart to have hardware for cleaning up artifacts in the PS4 Pro?
 

Caayn

Member
lol

I'm still none the wiser even after Durante's post. I need someone to explain this to me like I'm a 5 year old :)
Maybe it helps when you use Pi.

Pi in FP64 = 3.141592653589793
Pi in FP32 = 3.141592653
Pi in FP16 = 3.1415

In this case, calculating with Pi in FP32 (single precision) instead of FP16 will result in a more accurate representation of a circle, and thus a "rounder" and smoother circle. How smooth you need the circle to be depends on the use case.

The use case determines the minimum accuracy that you need and thus the minimum type of (half/single/double) precision you need. For example half precision isn't fit for every use case, whereas double precision can be overkill for a lot of use cases.
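The digits listed for Pi above are illustrative; the values a machine would actually store are slightly different. A quick sketch, assuming Python's standard `struct` module (`f` is single precision, `e` is half precision) and a hypothetical `round_to` helper:

```python
import math
import struct

def round_to(fmt: str, x: float) -> float:
    """Store x in the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, x))[0]

pi64 = math.pi                 # 3.141592653589793
pi32 = round_to("<f", math.pi)
pi16 = round_to("<e", math.pi)
print(pi64, pi32, pi16)        # the FP16 value is 3.140625
```

Half precision gets Pi right to about three significant digits, which is plenty for some jobs and useless for others.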
It's also important to recognize that FP16 precision isn't just an issue for small fractional components of a number; it also has serious range limitations. The largest number you can represent is 65504. Period. Anything larger either rounds down to 65504 or up to infinity. Also consider that the next smaller number you can represent is 65472, and yes, I'm being serious. Not only can't you represent fractional components of numbers this "large", but the 31 integer values between these two values don't exist, either.
You make an excellent point with this.
 
So let's say I'm describing all the colors of the things in my room.

Couch: mottled slate gray with lighter iron gray highlights
Chair: salt-and-pepper backs with natural walnut finish legs
etc.

OR, I could describe them like this:

Couch: gray
Chair: black

Because I'm being less precise in the second set of descriptions, I can tell you all of the colors much more quickly.

Haha, thanks! I understand what you mean :)

Hypothetically, could you use the couch and chair analogy at the same time? Couch: mottled slate gray with lighter iron gray highlights and Chair: black.

Can this FP16 and FP32 be used concurrently?
 
Haha, thanks! I understand what you mean :)

Hypothetically, could you use the couch and chair analogy at the same time? Couch: mottled slate gray with lighter iron gray highlights and Chair: black.

Can this FP16 and FP32 be used concurrently?

Yes. The two types are suitable for different use cases in the same program and usable in the same program. FP16 isn't practical for an entire game, but there are useful optimisations that can be made using it.
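A small sketch of why you would mix the two in one program: a running counter kept in FP16 gets stuck, while the same counter in a wider type keeps going. This assumes Python's standard `struct` module for the FP16 rounding, and the plain Python float (64-bit) only stands in for the wider FP32 variable.

```python
import struct

def round_to_half(x: float) -> float:
    """Round x to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

count16 = 0.0     # pretend this lives in a 16-bit variable
count_wide = 0.0  # stand-in for a wider (FP32) variable

for _ in range(3000):
    count16 = round_to_half(count16 + 1.0)  # re-rounded to FP16 after every add
    count_wide += 1.0

print(count16, count_wide)  # 2048.0 3000.0
```

Above 2048 the gap between neighbouring FP16 values is 2, so adding 1 rounds straight back to where it started. A value like a colour channel between 0 and 1 has no such problem, so it can stay in FP16 while the counter stays in the wider type.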
 
Yes. The two types are suitable for different use cases in the same program and usable in the same program. FP16 isn't practical for an entire game, but there are useful optimisations that can be made using it.

Again, thanks. You learn something new everyday or in this case, two things :)
 

Hjod

Banned
What does this mean for Knack?

Thanks are in order for those who take the time to explain to us dum dums.
 

Lonely1

Unconfirmed Member
Funny that someone replied & said it was meaningless.


This is basically what is going on.




But here is something to think about but I know people will ignore it because it's coming from me:

If artifacts & noise are a problem with FP16 wouldn't it be smart to have hardware for cleaning up artifacts in the PS4 Pro?
Magic artifact/noise cleaning hardware!?
 

Metfanant

Member
Others have already said it better than I ever could, but no, it's not meaningless. Yes, when dealing strictly with FP16 calculations it is an 8.4 TF GPU, but at the same time that doesn't mean the PS4 Pro just doubled its GPU power. FP16 is not appropriate for all situations, but it could be used in some cases to gain a good chunk of extra performance if the devs take the time to use it.
 

platocplx

Member
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.

Yep, so you probably won't see games that require more precision in every aspect utilize the feature fully. However, from how I understand it, there can be a balance that squeezes more power out of the console: someone who knows which calculations don't need that much precision could get more than 4.2 TFLOPS out of the system. It's pretty cool. The full 8.4 may not be possible, but with optimizations someone could probably land somewhere between 4.2 and 8.4. That gives the system enough power to be sturdy enough until the inevitable full redesign in the next 3-4 years, IMO.

Nice breakdown so far...

Can someone explain this to me using DBZ characters?

It's like the system is a Super Saiyan, but it can also become an Ascended Super Saiyan, like when Vegeta had that bulky form. It wasn't a perfect form like SSJ2, but it allowed for additional power. lol
 

Tripolygon

Banned
This is like saying Switch (supposed Tegra X2) is a 1.5 TFLOPs machine.
Actually, it is exactly the opposite of what you're saying. Mobile SOCs are usually advertised with FP16 benchmarks. While desktops are advertised with FP32 benchmarks, especially when it comes to graphics benchmarks.

Scientific research and simulation usually use FP64, and AI deep learning uses FP16.

It's all relative to what you're doing.

To answer OP's question: yes, FP16 can be used in many cases in games.

Reading an actual developer's comments on this topic, it seems the idea is that almost all current games output images at 8 bits per channel, so it would be a waste to calculate all the intermediate math at 32-bit when you can do it at 16-bit without any distinguishable loss in quality.
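The 8-bit output argument can be checked in a narrow form: every 8-bit channel value survives a trip through FP16 unchanged. The sketch assumes Python's standard `struct` module for the FP16 rounding and only tests storage; it says nothing about long chains of arithmetic, where rounding errors can pile up.

```python
import struct

def round_to_half(x: float) -> float:
    """Round x to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

# Normalise each 8-bit value to 0..1, store it as FP16, then quantise back to 8 bits.
mismatches = [
    v for v in range(256)
    if round(round_to_half(v / 255.0) * 255.0) != v
]
print(len(mismatches))  # 0
```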

For example Killzone Shadowfall
 
It's really quite simple.

Non-integer numbers are stored in floating point formats in a computer.
You can use a varying number of bits to store a single floating point number, usually either 16, 32 or 64.
Depending on how many bits you use, you will get a more or less accurate representation of a given number. E.g. instead of 4.287502375023 in 64 bit you might get 4.28750237 in 32 bit and 4.2875 in 16 bit (these are made up to illustrate the concept).

On some GPU architectures (e.g. actually way back in 2003 in the Geforce FX series), but also in some modern GPUs from Nvidia and now apparently AMD, you can perform some calculations at 16 bit accuracy twice as fast as at 32 bit.

That's the simple part.

What's more difficult to explain and requires far more background knowledge is estimating how many of the GPU calculations in a modern game can be reduced to 16 bit precision without generating artifacts or losing important information.


It's also important to recognize that FP16 precision isn't just an issue for small fractional components of a number; it also has serious range limitations. The largest number you can represent is 65504. Period. Anything larger either rounds down to 65504 or up to infinity. Also consider that the next smaller number you can represent is 65472, and yes, I'm being serious. Not only can't you represent fractional components of numbers this "large", but the 31 integer values between these two values don't exist, either.

There are absolutely times when they're appropriate. They work incredibly well for some computations and not at all for others. So, sure, a lot of threads devolved into people taking extreme positions but the reality is that they exist and are useful but are far from a panacea. Suggestions that existing PS4 shader code could run with FP16 were nonsense, as are insinuations that Scorpio might be a 6TF console only for FP16.

These posts together are great and provide the important info, at least from my relatively limited understanding of programming.

My question is, how is this different than using most of the other declared data types?
 
Fp16 is enough precision for most of the post processing done in games. No idea what perf improvement this translates to in the end tho
 

c0de

Member
lol

I'm still none the wiser even after Durante's post. I need someone to explain this to me like I'm a 5 year old :)

Many numbers have an infinite number of digits (like pi), but computers can only store values with a finite number of digits (though they can still calculate arbitrarily many digits if given enough time).
So computers, or rather algorithms, use a lot of approximations to try to work around that. Certain applications don't need much accuracy, others do (especially scientific applications).
FP16 is a way to trade accuracy for speed in that case.
 

ZoyosJD

Member
Funny that someone replied & said it was meaningless.

This is basically what is going on.

If artifacts & noise are a problem with FP16 wouldn't it be smart to have hardware for cleaning up artifacts in the PS4 Pro?

For all intents and purposes it is meaningless. The optimization will be applied in few instances, for a small but growing portion of the community, and it will mostly result in improvements in the number of shaders running simultaneously.

As for the potential of hardware to clean up artifacts and noise:

I like the Pi analogy:

Maybe it helps when you use Pi.

Pi in FP64 = 3.141592653589793
Pi in FP32 = 3.141592653
Pi in FP16 = 3.1415

Sure, you could implement hardware to guess that pi is 3.141572834, which is more accurate, but without an official rendering source that hardware might guess that pi is 3.146842749, which is less accurate.

Applying image analysis techniques, frequency domain analysis, temporal analysis, predictive branching, and stochastic modeling you may in some instances improve image quality and there is hardware that can implement these improvements at 4k quite well, but they tend to be very expensive; the cost of a PS4 Pro itself would likely not even be comparable, and it would add additional latency to the system upwards of 100ms. Neither of which is desirable for a console.

If you stopped to ask questions about how these things actually worked, we wouldn't have to constantly bag on you for assuming what isn't accurate.
 
Actually, it is exactly the opposite of what you're saying. Mobile SOCs are usually advertised with FP16 benchmarks. While desktops are advertised with FP32 benchmarks, especially when it comes to graphics benchmarks.

Scientific research and simulation usually use FP64, and AI deep learning uses FP16.

It's all relative to what you're doing.

To answer OP's question: yes, FP16 can be used in many cases in games.

Reading an actual developer's comments on this topic, it seems the idea is that almost all current games output images at 8 bits per channel, so it would be a waste to calculate all the intermediate math at 32-bit when you can do it at 16-bit without any distinguishable loss in quality.

For example Killzone Shadowfall

I love this presentation! It's really awesome, I've seen some from Naughty Dog and Sucker Punch as well!

I love when developers have presentations like this!
 
These posts together are great and provide the important info, at least from my relatively limited understanding of programming.

My question is, how is this different than most of the other declared data types?
It really isn't in principle. What changes is the execution environment and programming habits. Say, on the Atari 2600 you wouldn't hesitate to brainstorm whether you need the player character's vertical position to be an 8-bit or 16-bit integer, since you had only so much RAM, and doing something as simple as addition took more time on a longer variable. But now most CPUs will actually do both of these in an instant and we have loads of RAM, so unless you have loads of characters of that type it doesn't matter. Same with those half-floats, it's just that the equivalent of ranges is crazier here than with integers. They used to be smaller but apparently processed as slowly (not including RAM transfers), so why bother thinking about them unless your problems are RAM size and bandwidth? Doubles (64-bit floats) were always slower though.
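The storage side of that trade-off is simple arithmetic. A quick sketch, assuming Python's standard `struct` module to report the size of each format; the one-million count is just an example figure.

```python
import struct

# Bytes needed for one value of each floating point width.
SIZES = {
    name: struct.calcsize(fmt)
    for name, fmt in (("half", "<e"), ("single", "<f"), ("double", "<d"))
}

count = 1_000_000
for name, size in SIZES.items():
    print(f"{name}: {size} bytes each, {size * count / 1e6:.0f} MB per million values")
# half: 2 bytes each, 2 MB per million values
# single: 4 bytes each, 4 MB per million values
# double: 8 bytes each, 8 MB per million values
```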
 
It really isn't in principle. What changes is the execution environment and programming habits. Say, on the Atari 2600 you wouldn't hesitate to brainstorm whether you need the player character's vertical position to be an 8-bit or 16-bit integer, since you had only so much RAM, and doing something as simple as addition took more time on a longer variable. But now most CPUs will actually do both of these in an instant and we have loads of RAM, so unless you have loads of characters of that type it doesn't matter. Same with those half-floats, it's just that the equivalent of ranges is crazier here than with integers. They used to be smaller but apparently processed as slowly (not including RAM transfers), so why bother thinking about them unless your problems are RAM size and bandwidth? Doubles (64-bit floats) were always slower though.

Right, so that leads me to my next question: with the advent of new storage technologies, presumably we're going to hit a point soon where our main flash storage devices will be able to double as RAM when space is available. In practice I still use the most concise data type available to me, but as you said, that was a product of necessity and just good practice. Going forward, while it's still important, won't it have reduced importance, if anything? // Once again, just to be perfectly clear, good programmers will still use the correct data types.

Obviously the hope, eventually, is probably that powerful hardware features allow looser languages and streamlined coding for games, bringing down development costs. // I'd think, at least.
 

astraycat

Member
When you store a number on a computer, you're storing it as a string of bits, which you can think of as the digits in a number. This isn't wholly the case, but it's useful for illustration.

So let's look at the values you can store with 4 digits: 0 - 9999. You have 10000 values, all equally spaced. But what if you want to store values larger than 9999, or fractional values? Well, what if you changed the digits you stored to two separate values, like scientific notation? If we store the significand and the exponent separately, say 3 digits of significand and 1 digit of exponent then we can store a much wider range of values.

Now you can store values like 22200 as 2.22E4, or 0.000123 as 1.23E-4. But if you compare this to the original values, you'll notice that you can no longer fully represent a number like 1002. The closest number is 1.00E3, so you've lost precision in this storage.

That's the trade-off of double/single/half precision. Each format has some number of bits to represent the exponent, and some number of bits to represent the significand.
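The decimal analogy above and the real binary layout can both be sketched briefly. This assumes Python's `.2e` string formatting as a stand-in for "3 digits of significand plus an exponent", and the standard `struct` module to pull apart an FP16 value into its 1 sign bit, 5 exponent bits and 10 significand bits; both helper names are hypothetical.

```python
import struct

def to_three_digits(x: float) -> str:
    """Keep 3 significant decimal digits plus an exponent, like the analogy."""
    return f"{x:.2e}"

for value in (22200, 0.0001234, 1002):
    stored = to_three_digits(value)
    print(value, "->", stored, "->", float(stored))
# 22200 survives, but 1002 comes back as 1000.0

def half_fields(x: float) -> tuple:
    """Split an FP16 value into (sign, exponent bits, significand bits)."""
    (bits,) = struct.unpack("<H", struct.pack("<e", x))
    return bits >> 15, (bits >> 10) & 0x1F, bits & 0x3FF

print(half_fields(1.5))  # (0, 15, 512)
```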

When can you use half? I'd look first in things like tonemapping and color grading the final image -- the final numbers are going to be 12-bits a channel even in the future HDR case, but I'm willing to bet that the 10-bits of significand half has is already enough.

Another place to safely save some ops may be in normal mapping -- you're probably going from 8-bits a channel of your normal map up to maybe 16-bits for your final output to a gbuffer.

Dynamic AO will probably also be a good place to poke at. There's no reason for doing full precision for noisy output that's going to be blurred (another good place for half) anyway before use.
 

eso76

Member
If artifacts & noise are a problem with FP16 wouldn't it be smart to have hardware for cleaning up artifacts in the PS4 Pro?

Neh.
That's not the kind of artifact you can correct or clean up later.
It's just missing information you can't recover.

Think of it as shaders rendered at a lesser degree of precision.

In the colours example posted above, the chair may be black because brown isn't an option and black is the closest thing. The piece of information indicating that the chair is actually brown isn't stored anywhere else (even if it were, having to retrieve said information elsewhere would erase any advantage).
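The "missing information" point can be shown directly: two different inputs collapse onto the same FP16 value, and nothing in the result says which one it was. A minimal sketch, assuming Python's standard `struct` module for the rounding; the two input values are arbitrary examples.

```python
import struct

def round_to_half(x: float) -> float:
    """Round x to the nearest representable FP16 value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

a = round_to_half(1000.1)
b = round_to_half(1000.2)
print(a, b, a == b)  # 1000.0 1000.0 True
```

Around 1000 the FP16 step size is 0.5, so both inputs land on 1000.0 and any later clean-up pass has no way to tell them apart.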

I guess it could be useful for those operations that don't require a great degree of precision, but I don't know what those would be (bump map? Tone mapping?), not sure if modern titles use FP16 at all or whether they are in a significant amount to really make a difference.

While theoretically it may fly, I doubt devs will spend any significant amount of their time trying to take advantage of this, especially considering the BC requirement.
 