
blu · Apr 17, 2017

This is a strictly educational thread - leave your console-warrior armaments at the wardrobe. Thank you.

Floating point notations, in general, and in particular as defined by the IEEE-754-1985 and IEEE-754-2008 standards, are an exponential form of fractional numbers, i.e. their value is obtained as mantissa * 2^exp (when binary). You can read all your heart's desire about floats on the respective wiki pages.

Half-precision floating-point numbers, aka fp16, have both lower range and fewer significant digits (aka precision) than single-precision floating-point numbers, aka fp32. But of the two properties, fp16's range is worse by far - with a maximum exponent of only 15, the greatest number fp16 can represent before overflow is (2 - 2^-10) * 2^15 = 65504; in contrast, fp32 goes up to (2 - 2^-23) * 2^127 = 340282346638528859811704183484516925440. Now, as with all finite exponential notations, there's a catch that if you want absolute precision, you need to stay in the low end of the range - the higher you go in range, the worse your precision, as the same number of mantissa bits store each number, so when the number is big enough there are no bits left for any fractional part.
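As a rough check of the figures above, here is a minimal Python sketch. It assumes the standard library's `struct` module, whose `e` format code stores IEEE 754 half precision; the helper name `to_fp16` is made up for illustration.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest fp16 value and return it as a float."""
    return struct.unpack('<e', struct.pack('<e', value))[0]

# Largest finite values, computed from the formulas quoted in the post.
FP16_MAX = (2 - 2**-10) * 2.0**15
FP32_MAX = (2 - 2**-23) * 2.0**127

print(FP16_MAX)       # largest finite fp16 value
print(int(FP32_MAX))  # largest finite fp32 value, as an exact integer

# Precision shrinks as magnitude grows: neighbouring fp16 values are
# 2**-10 apart just above 1.0, but 4.0 apart just above 4096.0.
print(to_fp16(1.0 + 2**-10))  # representable, so it survives unchanged
print(to_fp16(4097.0))        # snaps back to the nearest representable value
```

The last two lines are the "no bits left for the fractional part" effect in miniature: the same 10 stored mantissa bits cover ever wider gaps as the exponent grows.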

Now that we know range is not fp16's forte, let's focus on precision. Below are four sample images devised to show precision artifacts. They show side-by-side a test subject (left side) and a reference (right side). The test does as follows:

There's a plane of x and y axes, color-coded as red and green, respectively. Axes have range of [0, 1) and granularity of 1/512. For each point on the xy plane, a power function is computed, raising the coordinate pair (x, y) to the 8th power. The result is stored in 4 types of temporary storage: fp32, fp16, int16 and int8; the reference image (i.e. the right side) stores everything in fp32.

The test subject ultimately tries to reconstruct the original (x, y) pair as the inverse power of the value obtained from the temporary storage - i.e. for each point on the plane, 1/8th power is computed from the temporary storage at that point. Basically, you can think of the left side of each image as the reconstruction of the right side of the same image, with precision-sensitive data passed via the aforementioned temporary storage.

temp storage fp32: [image]

temp storage fp16: [image]

temp storage int16: [image]

temp storage int8: [image]

There are several observations that could be made from the test, particularly with respect to fp16, but I'll leave that part for now to the inquisitive reader, in hope that a healthy discussion forms. Back in a while.

ps: I'm open to suggestions re particular functions gaffers would like to see passed through this pipeline.

edit: Ok, I could have been a tad more verbose about the test procedure, so here are some details:

This is all running in a GLSL shader on an NV Kepler.
Temporary storage is a texture, making sure the GPU actually uses the desired storage type - most desktop GPUs (Kepler included) don't have fp16 ALUs and cannot keep fp16 in registers either.
Integer-type storage keeps fractional values as fixed-point. Basically, all participating types for temporary storage keep fractions, but the integer ones cannot do range above 1.0, which is fine for our purposes.
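For readers without a GPU at hand, the round trip can be approximated on the CPU. This is only a sketch under assumptions, not the shader used for the images: it uses Python's `struct` module for the fp32 and fp16 rounding, it assumes the integer textures behave as unsigned normalized fixed-point with 2^bits - 1 steps, and it walks a single axis. All function names are made up for illustration.

```python
import struct

def via_fp32(v):
    """Store v as fp32 and read it back."""
    return struct.unpack('<f', struct.pack('<f', v))[0]

def via_fp16(v):
    """Store v as fp16 and read it back."""
    return struct.unpack('<e', struct.pack('<e', v))[0]

def via_unorm(bits):
    """Fixed-point storage, assumed unsigned normalized with 2**bits - 1 steps."""
    levels = (1 << bits) - 1
    return lambda v: round(v * levels) / levels

STEPS = 512  # axis granularity of 1/512, as in the test description

def roundtrip_stats(store):
    """Return (inputs reconstructed as zero, number of distinct reconstructions)."""
    restored = [store((k / STEPS) ** 8) ** 0.125 for k in range(STEPS)]
    zeros = sum(1 for r in restored if r == 0.0)
    return zeros, len(set(restored))

results = {
    'fp32': roundtrip_stats(via_fp32),
    'fp16': roundtrip_stats(via_fp16),
    'int16': roundtrip_stats(via_unorm(16)),
    'int8': roundtrip_stats(via_unorm(8)),
}
for name, (zeros, distinct) in results.items():
    print(f'{name:>5}: {zeros:3d} inputs collapse to zero, {distinct:3d} distinct outputs')
```

The count of inputs that come back as zero corresponds to the black corner in each image, and the count of distinct outputs is a crude measure of banding.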

Rodelero said:
"We can see that there is significant precision loss with FP16, especially with the low inputs. In the case of this specific test, FP16 isn't precise enough to cope with the extremely small results of x^8 when the input itself is low. In fact, the precision is so subpar that many of the results end up being the same as each other."

<snip>
The first test for example is the kind of thing you might do when calculating a (fairly wide) specular highlight.

And my own take on the results:

Test was deliberately chosen to do power computations - power computations are common for specular effects (where a light source's reflection on a surface is approximated via a power function), and power functions exacerbate precision issues. That said..
Test is more sensitive to precision than a typical specular function: whereas the latter might only suffer in its lobe shape, this test visually amplifies the precision effect by reconstructing back from exp to linear space.
Test is meant to explore slightly more than just fp arithmetics - with modern GPUs nothing stops devs from using all kinds of fp and fixed-point computations.
Fixed-point fractions are rather unfit for power functions - there's a very apparent banding on the exponents, even for large-ish types (e.g. int16), and the nether regions are, well, all gone.
The one issue with fp16 vs fp32 in this test is the apparent underflow - while fp16 exhibits a smooth gradient (as expected) it just runs out of bits in the darkest region, where small x's and y's are just squashed to zero.

To part two..

flkraven · Apr 17, 2017

I'm not a dumb guy, at least I don't think I am. I read the OP twice, and I still don't get it. I see the difference in the pictures, but I still don't understand.

ethomaz · Apr 17, 2017

I guess that can be posted here...

http://www.hwupgrade.it/articoli/skvideo/1013/radeon-x800-e-il-momento-di-r420_15.html

FarCry FP16 Zoom

FarCry FP32 Zoom

Other examples (without zoom):

FarCry FP16: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_16_1.jpg
FarCry FP32: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_32_1.jpg

FarCry FP16: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_16_2.jpg
FarCry FP32: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_32_2.jpg

There are other examples and benchmarks using old GPUs...

Credits to SappYoda.

RedlineRonin · Apr 17, 2017

ethomaz said:
I guess that can be posted here...

http://www.hwupgrade.it/articoli/skvideo/1013/radeon-x800-e-il-momento-di-r420_15.html

FarCry FP16: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_16_1.jpg
FarCry FP32: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_32_1.jpg

FarCry FP16: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_16_1.jpg
FarCry FP32: http://www.hwupgrade.it/articoli/skvideo/1013/shader_nv40_32_2.jpg

There are others examples and benchmarks using old GPUs that didn't run FP16 faster than FP32.

Credits to SappYoda.

I think the second set of pictures, the first one is the same as the first one in the first set.

(that's gotta be the worst sentence i've ever written but i think you get it)

ethomaz · Apr 17, 2017

RedlineRonin said:
I think the second set of pictures, the first one is the same as the first one in the first set.

(that's gotta be the worst sentence i've ever written but i think you get it)

Yeap... my mistake... fixed I guess.

benny_a · Apr 17, 2017

ethomaz said:
I guess that can be posted here...

The pictures don't make this look like a big thing visually, but how applicable is Far Cry and the tech that powers it in a 2017 context?
(Or 2012-2013 if we're talking GPU capability and engine capability of today.)

Edit: Now with the zoom I see the bigger difference.

Tyl3n0L85 · Apr 17, 2017

So from what I'm seeing and understanding, FP32 is way more precise and "accurate" than FP16, isn't it? So what would be the drawback of having FP32 over FP16? Would being more precise also mean longer/slower to process, since there's simply more numbers/data to calculate?

ethomaz · Apr 17, 2017

benny_a said:
The pictures don't make this look like a big thing visually, but how applicable is Far Cry and the tech that powers it in a 2017 context?
(Or 2012-2013 if we're talking GPU capability and engine capability of today.)

I added the zoom pictures to the posts... I guess it is better to see differences.

Hockeymac18 · Apr 17, 2017

Tyl3n0L85 said:
So from what I'm seeing and understanding, FP32 is way more precise and "accurate" than FP16, isn't it? So what would be the drawback of having FP32 over FP16? Would being more precise also mean longer/slower to process, since there's simply more numbers/data to calculate?

Yes, you just answered the question. That's exactly the trade off between the two.

Tyl3n0L85 · Apr 17, 2017

Hockeymac18 said:
Yes, you just answered the question. That's exactly the trade off between the two.

So next question is, what's the talk here on the forum about Double FP16? What are the advantages, if any?

iTehDroiD · Apr 17, 2017

Tyl3n0L85 said:
So from what I'm seeing and understanding, FP32 is way more precise and "accurate" than FP16, isn't it? So what would be the drawback of having FP32 over FP16? Would being more precise also mean longer/slower to process, since there's simply more numbers/data to calculate?

Most modern GPUs, including the PS4 Pro's, need only half as much time for FP16 calculations as for FP32 ones.

Gestault · Apr 17, 2017

Interestingly enough (or not interestingly, depending on if you go to parties), the reason a lot of non-specialized hardware will process FP16, but only at the same rate as FP32 operations, is that it's basically discarding the extra precision results.

Painguy · Apr 17, 2017

flkraven said:
I'm not a dumb guy, at least I don't think I am. I read the OP twice, and I still don't get it. I see the difference in the pictures, but I still don't understand.

Basically bigger numbers in fp16 have less decimals lol.

Also OP i think it would be useful if u took the difference of the two images to better display the effect

OttoSporteman · Apr 17, 2017

So... FP16 is a caveman checkerboard trick?

Seik · Apr 17, 2017

Thanks a lot for that, blu.

I heard a lot about FP16/32 lately without actually knowing what it was. This sheds a better light on the subject.

Hockeymac18 · Apr 17, 2017

Tyl3n0L85 said:
So next question is, what's the talk here on the forum about Double FP16? What's the advantages if there's any?

A feature of Vega GPUs (and the PS4 Pro's GPU) is the ability to run two FP16 calculations at the same time in place of a single FP32 calculation. This can effectively double the number of calculations that one can run - assuming, of course, that you are OK with the loss in precision going from FP32 to FP16.
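The "two for one" idea rests on two fp16 values fitting in the 32 bits that one fp32 value occupies. The sketch below only shows that storage layout on the CPU using Python's `struct` module; it says nothing about how any particular GPU schedules the math, and the helper names are made up for illustration.

```python
import struct

def pack_half2(a, b):
    """Pack two values as fp16 into one 32-bit unsigned integer (a in the low half)."""
    return struct.unpack('<I', struct.pack('<2e', a, b))[0]

def unpack_half2(word):
    """Split a 32-bit word back into its two fp16 values."""
    return struct.unpack('<2e', struct.pack('<I', word))

word = pack_half2(1.5, -0.25)
print(hex(word))           # one 32-bit word holding both numbers
print(unpack_half2(word))  # both values come back intact
print(struct.calcsize('<2e') == struct.calcsize('<f'))  # same footprint as one fp32
```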

This feature also existed in older Nvidia GPUs from the mid-2000's (hence the images of Far Cry as an example of the differences of using FP16 vs. FP32). It's also a thing in the mobile world where mobile GPUs have supported FP16 for a while I believe.

The "hard part" is determining what this really means in the context of games. It doesn't mean that the PS4 Pro can all of a sudden run games twice as fast. Real game code is more than just floating point calculations...Losing that precision will likely be only OK in certain parts of the code...how much of it? Probably depends on the game and what it's doing.

I don't know of any modern game examples that show the differences, so it's hard to understand how modern games would fare. As well, since XB1 and PS4 don't support double FP16 calculations, support for it in game code will probably be limited/non-existent for most games.

I won't pretend to know how much of a benefit this will end up being, and I won't give some random % figure...

I will say this - in my field of Bioinformatics, FP32 is usually not good enough. We usually need FP64 at a minimum as the precision is critical for our work. I have to imagine that game code requires FP32 for a lot of things...

Izuna · Apr 17, 2017

Modbot! he's doing it again~!

Also, this (FarCry images) is misleading because it's not always going to have these effects on a variety of shaders/techniques.

Sad Affleck · Apr 17, 2017

Very informative, thank you.

ethomaz · Apr 17, 2017

Tyl3n0L85 said:
So from what I'm seeing and understanding, FP32 is way more precise and "accurate" than FP16, isn't it? So what would be the drawback of having FP32 over FP16? Would being more precise also mean longer/slower to process, since there's simply more numbers/data to calculate?

That needs a bit of context...

Before 2006, when shaders got unified in GPUs, games used to run with FP16/FP32 on nVidia or FP24 on ATI... in nVidia's case the units could do 2x FP16 in the time of 1x FP32, and that gave a good boost in performance for games that let you switch between both modes (FarCry, Half-Life 2, etc).

After 2006 the standard of unified shaders was created with only FP32 in mind... so devs started to use FP32 everywhere and GPUs focused on FP32 performance, with FP16 being slow compared with FP32 (some GPUs run FP16 at 1/8 of the speed of FP32).

But today AMD/nVidia are starting to look at FP16 again because of the performance gain... nVidia focused its mobile GPUs like Tegra on FP16 running twice as fast as FP32, and mobile devs mainly develop with FP16 in mind. Now AMD is close to announcing the Vega GPU that will compete with nVidia's Pascal, and the new feature is that FP16 runs twice as fast as FP32.

I guess the AnandTech quote shows exactly what is happening with this move... today it will not be used, but in the future games will take advantage of FP16 to get better performance out of GPUs.

AnandTech said:
When PC GPUs made the jump to unified shaders in 2006/2007, the decision was made to do everything at FP32 since that's what vertex shaders typically required to begin with, and it's only recently that anyone has bothered to look back. So while there is some long-term potential here for Vega's fast FP16 math to become relevant for gaming, at the moment it wouldn't do anything. Vega will almost certainly live and die in the gaming space based on its FP32 performance.

wesleyshark · Apr 17, 2017

It's always a good feeling to finally understand something instead of only having a vague idea of what it means.

Thank you

RedlineRonin · Apr 17, 2017

The zoomed pictures help.

I can't hardly see a difference between the two in the two groups of shots though.

Is the idea that 32 and 16 can be hotswapped? Like can it be 16 when stuff is far away and then 32 when I'm closer (which is the only time i can see the diff in the zoomed in images).

Skinpop · Apr 17, 2017

the images are misleading. in a real project you would use fp16 where you don't need a larger range and the precision is sufficient.

Gotdatmoney · Apr 17, 2017

Izuna said:
Modbot! he's doing it again~!

Also, this (FarCry images) is misleading because it's not always going to have these effects on a variety of shaders/techniques.

I don't think he was arguing that. He was just showing an example of how it works in practice so people actually know wtf they are talking about.

Tyl3n0L85 · Apr 17, 2017

ethomaz said:
That needs a bit of context...

Before 2006, when shaders got unified in GPUs, games used to run with FP16/FP32 on nVidia or FP24 on ATI... in nVidia's case the units could do 2x FP16 in the time of 1x FP32, and that gave a good boost in performance for games that let you switch between both modes (FarCry, Half-Life 2, etc).

After 2006 the standard of unified shaders was created with only FP32 in mind... so devs started to use FP32 everywhere and GPUs focused on FP32 performance, with FP16 being slow compared with FP32 (some GPUs run FP16 at 1/8 of the speed of FP32).

But today AMD/nVidia are starting to look at FP16 again because of the performance gain... nVidia focused its mobile GPUs like Tegra on FP16 running twice as fast as FP32, and mobile devs mainly develop with FP16 in mind. Now AMD is close to announcing the Vega GPU that will compete with nVidia's Pascal, and the new feature is that FP16 runs twice as fast as FP32.

I guess the AnandTech quote shows exactly what is happening with this move... today it will not be used, but in the future games will take advantage of FP16 to get better performance out of GPUs.

Thanks for the info!

ethomaz · Apr 17, 2017

Skinpop said:
the images are misleading. in a real project you would use fp16 where you don't need a larger range and the precision is sufficient.

Yes this is the best case scenario but today in gaming development you use only FP32 because the performance is the same so you don't need to bother with that.

The FarCry images I posted show the difference in IQ and performance on old nVidia GPUs... that doesn't mean you'd get the same results and performance difference on actual modern GPUs.

With the Vega launch I'm sure there will be demos and benchmarks comparing both... that is supposed to happen in May.

Vintage · Apr 17, 2017

So.. does the API have an additional float type (fp16) that programmers can use, or is it hidden somewhere in the system or what?
I understand these images from a compression perspective - it lowers the precision but it may still look good enough in computer graphics. But where and how is it practically used?

ethomaz said:
That needs a bit of context...

Before 2006, when shaders got unified in GPUs, games used to run with FP16/FP32 on nVidia or FP24 on ATI... in nVidia's case the units could do 2x FP16 in the time of 1x FP32, and that gave a good boost in performance for games that let you switch between both modes (FarCry, Half-Life 2, etc).

After 2006 the standard of unified shaders was created with only FP32 in mind... so devs started to use FP32 everywhere and GPUs focused on FP32 performance, with FP16 being slow compared with FP32 (some GPUs run FP16 at 1/8 of the speed of FP32).

This is a much better explanation of what happens. FP16 in shaders can be processed up to 2 times as fast on supported hardware, right?

Rodelero · Apr 17, 2017

blu said:
snip

This is a nice idea, though it probably needs some work before it's going to be informative without leading to further confusion. I have no issue following it, but I'm not sure someone with no knowledge of graphics programming would have a clue. I'd suggest removing the integer based stuff, it's just extra noise given what you want to present. I'd also recommend that, for each test you do, you append a short summary explaining what is happening and what can be taken from this. As an example, for your first test:

"We can see that there is significant precision loss with FP16, especially with the low inputs. In the case of this specific test, FP16 isn't precise enough to cope with the extremely small results of x^8 when the input itself is low. In fact, the precision is so subpar that many of the results end up being the same as each other."

It also would be great if you could suggest when this kind of operation might be used too. The first test for example is the kind of thing you might do when calculating a (fairly wide) specular highlight. It would also be nice to have some difference images, as in, the output of abs(reference - fp32), abs(reference - fp16) and abs(fp32 - fp16). In the case of this particular test, you're using the two axes for the same thing, which is probably unnecessary. You may as well only use x and output a monochrome image.

ethomaz · Apr 17, 2017

Vintage said:
So.. does the API have an additional float type (fp16) that programmers can use, or is it hidden somewhere in the system or what?
I understand these images from a compression perspective - it lowers the precision but it may still look good enough in computer graphics. But where and how is it practically used?

The API actually has an additional float type... game engines too, and some even have converters for when you compile for mobile (cellphones, portables, etc).

It is not something new at all: APIs, engines, compilers, SDKs, etc all support FP16... it's just that PC game development mainly uses FP32 because there is no reason not to use it (at least until the Vega release).

Vintage said:
This is a much better explanation of what happens. FP16 in shaders can be processed up to 2 times as fast on supported hardware, right?

Yes, but the amount of actual hardware that does that is pretty low: mobile GPUs (PowerVR, Tegra, etc), the PS4 Pro's GPU and future Vega GPUs.

VariantX · Apr 17, 2017

So from my completely uneducated perspective, you can use fp16 where you just don't need that level of precision and do the task faster? Anyone got any other concrete, real world or theoretical examples where games can benefit? Ethomaz already posted a pretty good example that I can follow.

belvedere · Apr 17, 2017

AMD and Sony were upfront about their partnership for PS4 & Pro. Cerny made it sound like the Vega features (including FP16) were carefully considered to help the PS4 Pro "punch above its weight" to paraphrase.

While I don't think FP16 implementation in Pro is going to catch the world on fire, I wouldn't think Cerny and co. would have invested in it if it were completely useless.

Skinpop · Apr 17, 2017

VariantX said:
So from my completely uneducated perspective, you can use fp16 where you just don't need that level of precision and do the task faster? Anyone got any other concrete, real world or theoretical examples where games can benefit? Ethomaz already posted a pretty good example that I can follow.

maybe coloring in fp16 and lighting in fp32. I could see that working.

LordOfChaos · Apr 17, 2017

Much needed recently, good stuff blu.

Space_nut · Apr 17, 2017

Best post to explain

Sony said:
Teraflops in itself is not always a metric of power without context. TF stands for "Tera floating point operations per second", which is 1000000000000 floating point operations per second. The variable here is the floating point.

simplified example
To simplify it, you can see FP32 as a number with 31 decimals and FP16 as a number with 15 decimals, so

FP32: 1.1234567891234567891234567891234
FP16: 1.123456789123456

As you can see, the FP32 number is more precise as it has more decimals. Now imagine you want to send your coordinates to a person when you're lost. For sake of the example, imagine A to C below as landmarks

[A].....X.........Y.....[C]
............. Fp16..........FP32

If you are between point A and point B, an FP16 coordinate will suffice. However, if you are just outside of point B, between B and C, then it will not suffice as it is out of range of the coordinate of FP16 and requires FP32.

Generally, "game coordinates" are between points B and C, which is why teraflops are quoted in the context of FP32. The reason why it's not mentioned is because it's a commonly accepted metric. Like in the US when the acceleration of a car is mentioned, people say the car goes zero to sixty in 10 seconds. They don't say the car goes zero to sixty miles per hour in 10 seconds as the metric is redundant.

So, what about the PS4 Pro vs. Scorpio flops talk? In this example, a whole group of people is lost. If all the people are lost between point B and C (FP32), then per second, Scorpio can find roughly 50% more people (it's rounded up for sake of the example). So:

PS4 Pro per second:
[A].......................P1.....[C]
..................FP16..............FP32

[A].......................P2.....[C]
..................FP16..............FP32

Scorpio per second:
[A]..........................P1.....[C]
..................FP16..............FP32

[A]..........................P2.....[C]
..................FP16..............FP32

[A]..........................P3.....[C]
..................FP16..............FP32

So Scorpio is roughly 50% more capable than Pro in FP32. This number comes from Scorpio being 6TF and PS4Pro being 4.2TF fp32.

The situation changes however if there are people that get lost between A and B: PS4 Pro can process FP16, Scorpio can not. PS4 Pro can process 8.4TF FP16, compared to 4.2TF FP32, because now it only has to search in half the distance. PS4 Pro's FP16 is 8.4TF and Scorpio's FP32 is 6TF. So in FP16, PS4 Pro is 40% more capable. So if a person is lost between point A and B, Scorpio has to search for that person between A and C, while PS4 Pro can just look between A and B, and this makes Pro 40% faster to find people that are lost between A and B. Illustration:
PS4 Pro per second:

[A]......P1.....................[C]
..................FP16.............FP32

[A]......P2.....................[C]
..................FP16.............FP32

[A]......P3.....................[C]
..................FP16.............FP32

[A]......P4.....................[C]
..................FP16.............FP32

[A]......P5.....................[C]
..................FP16.............FP32

[A]......P6.....................[C]
..................FP16.............FP32

[A]......P7.....................[C]
..................FP16.............FP32

Scorpio per second:
[A]......P1........................[C]
...................................FP32

[A]......P2........................[C]
...................................FP32

[A]......P3........................[C]
...................................FP32

[A]......P4........................[C]
...................................FP32

[A]......P5........................[C]
...................................FP32

So if Scorpio finds 5 people per second, PS4 Pro can find 7 people per second (40% more). Is this it then? Is the PS4 Pro more powerful? The answer is yes and no. If all the people would be lost between A and B then PS4 Pro is more capable.

However, the reality is that if points A and C are the borders of "Game Engine Village", almost no one lives between A and B, in the FP16 area. And the people that do live between A and B are simply not that important to the overall economy of the village. If Game Engine Village exists in multiple parallel dimensions, then the different mayors (developers) can distribute the population between A and B and B and C, but the fact remains that people that will live between A and B are simply not that important and barely contribute to the economy of the village.
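The percentages in the quoted explanation can be rechecked from the TF figures it gives. A minimal Python sketch of that arithmetic, with variable names made up for illustration and the doubling for FP16 taken from the quote itself:

```python
PRO_FP32_TF = 4.2      # PS4 Pro FP32 figure from the quote
SCORPIO_FP32_TF = 6.0  # Scorpio FP32 figure from the quote
PRO_FP16_TF = 2 * PRO_FP32_TF  # double-rate FP16, as described in the quote

scorpio_vs_pro_fp32 = SCORPIO_FP32_TF / PRO_FP32_TF
pro_fp16_vs_scorpio_fp32 = PRO_FP16_TF / SCORPIO_FP32_TF

print(f'Scorpio FP32 vs Pro FP32: {scorpio_vs_pro_fp32 - 1:.0%} more')
print(f'Pro FP16 vs Scorpio FP32: {pro_fp16_vs_scorpio_fp32 - 1:.0%} more')
```

Both comparisons are peak-rate ratios only; as the quote goes on to say, they matter only to the extent that a workload can actually run in FP16.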

joesiv · Apr 17, 2017

Thanks Blu, always the voice of reason, bringing practical evidence backed by your own testing.

Good stuff!

My only suggestion would be to perhaps consider a test where there are several stages to the calculation. I believe this would make the end result even more clear.

As with anything in math, you want to maintain the maximum precision, or you will see artifacts. With computer graphics, often a value will be used all over the place and go through many stages of calculations to come to the end results. If you could do a few more operations on it, might be neat.

For others, Blu gives a good example of color reproduction, but often color gradation can be "good enough", but consider an old school example of polygonal culling, or anything to do with a Z-buffer (depth). If you can only have 65,000 stages of depth in a scene, it'll be very hard to know which object is in front of another. These types of precision artifacts are much easier to see.
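To make the depth example concrete, here is a small Python sketch. The two depth values are hypothetical, and storing linear depth directly as fp16 is an assumption made purely for illustration; real depth buffers use other formats and encodings.

```python
import struct

def to_fp16(v):
    """Round v to the nearest fp16 value."""
    return struct.unpack('<e', struct.pack('<e', v))[0]

def to_fp32(v):
    """Round v to the nearest fp32 value."""
    return struct.unpack('<f', struct.pack('<f', v))[0]

# Two surfaces 0.1 units apart, roughly 1000 units from the camera (hypothetical).
near, far = 1000.1, 1000.2

print(to_fp32(near) < to_fp32(far))   # fp32 still knows which one is in front
print(to_fp16(near) == to_fp16(far))  # fp16 stores both as the same depth
print(2 ** 16)                        # upper bound on distinct 16-bit patterns
```

Once both surfaces land on the same stored depth, a depth test has no way to order them, which is the kind of artifact described above.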

blu · Apr 17, 2017

flkraven said:
I'm not a dumb guy, at least I don't think I am. I read the OP twice, and I still don't get it. I see the difference in the pictures, but I still don't understand.

Test does precision-sensitive arithmetics in the range [0, 1]. It's trying to catch issues with fp16 precision. Int16 and int8 results are mainly for fun, but also indicative of how fp16 does notably better than the fixed-point int16, for the exact same amount of bits.

Seik said:
Thanks a lot for that, blu.

I heard a lot about FP16/32 lately without actually knowing what it was. This sheds a better light on the subject.

Np. If you are curious about a particular function to test in fp16 - let me know.

Rodelero said:
This is a nice idea, though it probably needs some work before it's going to be informative without leading to further confusion. I have no issue following it, but I'm not sure someone with no knowledge of graphics programming would have a clue. I'd suggest removing the integer based stuff, it's just extra noise given what you want to present. I'd also recommend that, for each test you do, you append a short summary explaining what is happening and what can be taken from this. As an example, for your first test:

"We can see that there is significant precision loss with FP16, especially with the low inputs. In the case of this specific test, FP16 isn't precise enough to cope with the extremely small results of x^8 when the input itself is low. In fact, the precision is so subpar that many of the results end up being the same as each other."

It also would be great if you could suggest when this kind of operation might be used too. The first test for example is the kind of thing you might do when calculating a (fairly wide) specular highlight. It would also be nice to have some difference images, as in, the output of abs(reference - fp32), abs(reference - fp16) and abs(fp32 - fp16). In the case of this particular test, you're using the two axes for the same thing, which is probably unnecessary. You may as well only use x and output a monochrome image.

Good points. I will actually use some of your own wording for the op, if you don't mind ;p

As for the two-axis pictures, the idea is to have other functions as well, so that's just for future-proofing ; )

Duxxy3 · Apr 17, 2017

I would think it would be useful in high fps games where detail is less needed. Think burnout.

Spaced Harrier · Apr 17, 2017

So is there any possible halfway house? An FP24? Running at 66% of time of FP32

Rodelero · Apr 17, 2017

Spaced Harrier said:
So is there any possible halfway house? An FP24? Running at 66% of time of FP32

The short answer is, no.

In a sense, FP16 is already the halfway house, with FP8 also being used in some domains.

Fisty · Apr 17, 2017

belvedere said:
AMD and Sony were upfront about their partnership for PS4 & Pro. Cerny made it sound like the Vega features (including FP16) were carefully considered to help the PS4 Pro "punch above its weight" to paraphrase.

While I don't think FP16 implementation in Pro is going to catch the world on fire, I wouldn't think Cerny and co. would have invested in it if it were completely useless.

Yeah they were probably insistent about getting it in Pro so devs could get used to it and start using it so they'll have stuff ready when PS5 launches. Unfortunately we probably won't see too much use outside of checkerboarding since they can't use fp16 on PS4 vanilla

Skinpop · Apr 17, 2017

Duxxy3 said:
I would think it would be useful in high fps games where detail is less needed. Think burnout.

it's not like it's either or. you can mix and use both where you need them for maximum performance(assuming proper hardware support). in some places using fp16 won't make any noticeable difference.

capitalCORN · Apr 17, 2017

OP, it's nice to simplify it into basic graphics terms. I do imagine that peripheral calculations get relegated to the GPU as well, in particular physics, for which FP32 would be quite advantageous.

ethomaz · Apr 17, 2017

Spaced Harrier said:
So is there any possible halfway house? An FP24? Running at 66% of time of FP32

That is what ATI Radeon used for years (give a lot for ATI Radeon 9800)... instead of going fully to FP32 like nVidia, they chose to stay in the middle with FP24.

ATI chose FP24 because it was more than enough for gamers, while nVidia said FP32 gives you top-notch image quality (the problem for nVidia was that FX cards ran like crap with FP32).

The first ATI GPU to support more than FP24 (eg. FP32) was R520.

Rodelero said:
The short answer is, no.

In a sense, FP16 is already the halfway house, with FP8 also being used in some domains.

FP24 is indeed a thing and it can be used

But actual hardware doesn't have a specific unit for FP24, so it will use the FP32 unit without any gain in performance.

PS. I don't know how Vega will handle FP24... I know it handles FP16 2x faster and INT8 4x faster compared with FP32... given that Vega packs them to get the performance boost, I believe FP24 won't see any gain because it can't be packed in any way inside the FP32 unit.

Hockeymac18 · Apr 17, 2017

Rodelero said:
The short answer is, no.

In a sense, FP16 is already the halfway house, with FP8 also being used in some domains.

FP24 was used in the past in ATI Radeon cards: https://en.wikipedia.org/wiki/ATi_Radeon_R300_Series

https://forum.beyond3d.com/threads/fp24-vs-fp32-pixel-shader-precision.10377/

mugurumakensei · Apr 17, 2017

Skinpop said:
maybe coloring in fp16 and lighting in fp32. I could see that working.

No, FP16 would primarily be useful on things in the background or collision calculations that don't require much precision where "close enough" is not going to appear noticeably wrong to a majority of gamers.

capitalCORN · Apr 17, 2017

Hockeymac18 said:
FP24 was used in the past in ATI Radeon cards: https://en.wikipedia.org/wiki/ATi_Radeon_R300_Series

https://forum.beyond3d.com/threads/fp24-vs-fp32-pixel-shader-precision.10377/

Yeah, the 900 line, when they had Nvidia's 300 and 500 line beat. They screwed the pooch with the X800 line, which had no SM3.0 support. Cutting bits of precision in code won't mean shit when the hardware is already optimized for specific targets.

SURGEdude · Apr 17, 2017

Thanks for this blu!

This topic has been much talked about and little understood by so many gamers, especially in light of the Switch launch.

Painguy · Apr 17, 2017

VariantX said:
So from my completely uneducated perspective, you can use fp16 where you just don't need that level of precision and do the task faster? Anyone got any other concrete, real world or theoretical examples where games can benefit? Ethomaz already posted a pretty good example that I can follow.

I can give an example from an OpenGL perspective. In OpenGL you can send data to the GPU with a particular function (API thing). You can tell the OpenGL state machine to interpret this data as fp16 by specifying it as GL_HALF_FLOAT. A good use of fp16 I can think of is screen space reflections. Most of the time you are going to blur out the reflection, so you don't need precision. You can also choose to reflect only things that are close to you (again, you need fewer bits since you don't need accurate data on things far away). Anyway, using these calls the data sits in the buffer as fp16 and gets converted to a float when the shader reads it.

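struct half {
    unsigned short bruh; // gives us 16 bits which we will tell the gpu to treat as a float
};

typedef struct half glhfloat;

void shit() {
    // C has no half type, so store the raw bits: 0x653A is the fp16
    // nearest to 1337.7331 (which rounds to 1338.0 at this magnitude)
    glhfloat idk = { 0x653A };
    GLuint buffer;
    ...
    glBindBuffer(GL_ARRAY_BUFFER, buffer); // bind first, then upload into the bound buffer
    glBufferData(GL_ARRAY_BUFFER, sizeof(idk), &idk, GL_STATIC_DRAW); // usage hint is a guess
    ...
    // one component per vertex, read from the buffer as half floats
    glVertexAttribPointer(0, 1, GL_HALF_FLOAT, GL_FALSE, sizeof(glhfloat), BUFFER_OFFSET(0));
}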

It should look something like this anyway. Now I can use whatever data I sent as a float in the shader.

I'm on mobile so I'm not gonna try too hard lol.
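One thing the snippet above glosses over is that C has no built-in half type, so the 16 bits have to be produced by hand before upload. Here is a minimal sketch of that conversion, assuming IEEE-754 binary32 floats on the CPU side. `float_to_half` is a hypothetical helper (not an OpenGL call), and it flushes values below the smallest normal fp16 to zero instead of producing subnormals.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Convert a binary32 float to binary16 bits, rounding to nearest even.
   Simplified: tiny values flush to signed zero, NaN payloads are dropped. */
uint16_t float_to_half(float f)
{
    uint32_t x;
    assert(sizeof(float) == sizeof(uint32_t));
    memcpy(&x, &f, sizeof x);                 /* reinterpret the float's bits */

    uint32_t sign  = (x >> 16) & 0x8000u;     /* sign moved to bit 15 */
    uint32_t exp32 = (x >> 23) & 0xFFu;       /* 8-bit biased exponent */
    uint32_t man32 = x & 0x7FFFFFu;           /* 23-bit mantissa */

    if (exp32 == 0xFFu)                       /* infinity or NaN */
        return (uint16_t)(sign | 0x7C00u | (man32 ? 0x0200u : 0u));

    int32_t e = (int32_t)exp32 - 127 + 15;    /* rebias from 127 to 15 */
    if (e >= 31) return (uint16_t)(sign | 0x7C00u);  /* too big: infinity */
    if (e <= 0)  return (uint16_t)sign;              /* too small: zero */

    uint32_t half = ((uint32_t)e << 10) | (man32 >> 13);  /* keep top 10 bits */
    uint32_t rem  = man32 & 0x1FFFu;                      /* 13 dropped bits */
    if (rem > 0x1000u || (rem == 0x1000u && (half & 1u)))
        half++;   /* a carry into the exponent is still a valid encoding */

    return (uint16_t)(sign | half);
}
```

`float_to_half(1337.7331f)` returns `0x653A`, which decodes to 1338.0: at that magnitude fp16 can only step in whole units.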

Paragon · Apr 17, 2017

Space_nut said:
Best post to explain

That's actually really confusing to me, and I already had a loose understanding of the difference between the two.
My understanding is that FP32 is not "a number with 31 decimals" nor is FP16 "a number with 15 decimals".

Someone please correct me if I'm wrong, but my understanding was that - as the name implies - floating point numbers use a fixed number of bits and, basically, the decimal moves places in the numbers those bits represent.
So for very small numbers, you have a lot of precision after the decimal; e.g. 1.23456789 while larger numbers have less precision; e.g. 123456.789

Note: the above is not an accurate representation, only used to simplify the explanation for people that haven't dealt with this sort of thing before.
I just hope that's not another "simplification" which actually complicates matters.

Really, it's just that FP16 is trading precision for performance.
There are times when that additional precision is not going to be necessary, and should speed things up without affecting image quality.
Unfortunately I suspect that it's going to be used improperly at times, and that will hurt image quality.
And as we're moving to HDR, I would have thought that games would be starting to focus on using more precision, rather than less, but I'm sure there are places you can cut the precision without affecting the end result visually.
I do wonder how much of an overall speed-up it's going to be. It's certainly not going to double game performance for example.

frankie_baby · Apr 17, 2017

Space_nut said:
Best post to explain

So your choice of best post to explain is one concluding that it's not very useful? Sounds biased to me.

blu · Apr 17, 2017

joesiv said:
Thanks Blu, always the voice of reason, bringing practical evidence backed by your own testing.

Np, I just don't like idling, so when I end up with spare time I revert to coding things just for amusement and/or satisfying curiosity.

joesiv said:
My only suggestion would be to perhaps consider a test where there are several stages to the calculation. I believe this would make the end result even more clear.

Well, as I'm currently on fp32-only hw, doing such a test was a tad limiting. I had to think of a way to make sure data goes to the desired scalar type, so I had to revert to passes and using texture args in between. If I had proper fp16 hw I'd have much larger flexibility.

joesiv said:
As with anything in math, you want to maintain the maximum precision, or you will see artifacts. With computer graphics, often a value will be used all over the place and go through many stages of calculations to come to the end results. If you could do a few more operations on it, might be neat.

Stay tuned ; )

Paragon said:
That's actually really confusing to me, and I already had a loose understanding of the difference between the two.
My understanding is that FP32 is not "a number with 31 decimals" nor is FP16 "a number with 15 decimals".

Someone please correct me if I'm wrong, but my understanding was that - as the name implies - floating point numbers use a fixed number of bits and, basically, the decimal moves places in the numbers those bits represent. So for very small numbers, you have a lot of precision after the decimal; e.g. 1.23456789 while larger numbers have less precision; e.g. 123456.789

Your understanding is correct.
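To put numbers on that, here is a small sketch that returns the gap between neighbouring fp16 values around a given magnitude. It assumes the IEEE-754 half layout (10 stored mantissa bits, smallest normal exponent -14); `half_spacing` is a made-up name, not part of any library.

```c
#include <assert.h>

/* Gap between adjacent fp16 values near v (hypothetical helper).
   Only meaningful inside the fp16 range, i.e. |v| <= 65504. */
double half_spacing(double v)
{
    const double min_normal = 1.0 / 16384.0;  /* 2^-14, smallest normal fp16 */
    double lo  = 1.0;                         /* lower edge of the binade [lo, 2*lo) */
    double gap = 1.0 / 1024.0;                /* 2^-10: spacing inside [1, 2) */

    if (v < 0.0) v = -v;
    assert(v <= 65504.0);

    /* Each doubling of magnitude doubles the gap... */
    while (v >= 2.0 * lo) { lo *= 2.0; gap *= 2.0; }
    /* ...and each halving halves it, until the subnormal range is reached. */
    while (v < lo && lo > min_normal) { lo *= 0.5; gap *= 0.5; }

    return gap;
}
```

Around 1.0 the gap is 1/1024, around 1337.7331 it is already 1.0, and near the top of the range (65504) it is 32. That is the "decimal moves places" effect in binary form.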

EduGAF: fp16 precision through pictures
