blu
Wants the largest console games publisher to avoid Nintendo's platforms.
This is a strictly educational thread - leave your console-warrior armaments at the wardrobe. Thank you.
Floating-point notations in general, and in particular those defined by the IEEE-754-1985 and IEEE-754-2008 standards, are an exponential form of fractional numbers, i.e. their value is obtained as mantissa * 2^exp (for the binary formats). You can read to your heart's desire about floats on the respective wiki pages.
Half-precision floating-point numbers, aka fp16, have both a smaller range and fewer significant digits (aka precision) than single-precision floating-point numbers, aka fp32. But of the two properties, fp16's range is worse by far - with a maximum exponent of only 15, the greatest finite number fp16 can represent is (2 - 2^-10) * 2^15 = 65504; in contrast, fp32 goes up to (2 - 2^-23) * 2^127 = 340282346638528859811704183484516925440. Now, as with all finite exponential notations, there's a catch: if you want absolute precision, you need to stay in the low end of the range. The higher you go in range, the worse your absolute precision, as every number is stored in the same number of mantissa bits, so when the number is big enough there are no bits left for any fractional part (for fp16 that happens from 1024 upward).
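For anyone who wants to poke at these numbers, here is a minimal sketch in Python. It is not part of the test itself: it assumes the standard library's `struct` half-float format (`"e"`) is a fair stand-in for fp16 storage, and the helper names `to_fp16` and `fp16_spacing` are made up for illustration.

```python
import math
import struct

def to_fp16(value: float) -> float:
    """Round a Python float to the nearest fp16 value, then widen it back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

# Largest finite values from the formulas above, in exact integer math.
FP16_MAX = (2**11 - 1) * 2**5     # (2 - 2^-10) * 2^15
FP32_MAX = (2**24 - 1) * 2**104   # (2 - 2^-23) * 2^127

def fp16_spacing(value: float) -> float:
    """Gap between neighbouring fp16 values near `value` (normal range only)."""
    exponent = math.frexp(value)[1] - 1   # value = m * 2^exponent, 1 <= m < 2
    return 2.0 ** (exponent - 10)         # 10 explicit mantissa bits

if __name__ == "__main__":
    print("fp16 max:", FP16_MAX)
    print("fp32 max:", FP32_MAX)
    for v in (0.1, 1.0, 100.3, 1024.25, 2048.7):
        print(v, "->", to_fp16(v), "spacing", fp16_spacing(v))
```

The spacing column is the point: it doubles with every power of two, reaching 1 at 1024 and 2 at 2048, which is where the fractional part disappears.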
Now that we know range is not fp16's forte, let's focus on precision. Below are four sample images devised to show precision artifacts. They show side-by-side a test subject (left side) and a reference (right side). The test works as follows:
There's a plane of x and y axes, color-coded as red and green, respectively. Each axis has a range of [0, 1) and a granularity of 1/512. For each point on the xy plane, a power function is computed, raising each coordinate of the pair (x, y) to the 8th power. The result is stored in one of 4 types of temporary storage: fp32, fp16, int16 and int8; the reference image (i.e. the right side) stores everything in fp32.
The test subject ultimately tries to reconstruct the original (x, y) pair as the inverse power of the value obtained from the temporary storage - i.e. for each point on the plane, the 1/8th power is computed from the value in temporary storage at that point. Basically, you can think of the left side of each image as the reconstruction of the right side of the same image, with precision-sensitive data passed via the aforementioned temporary storage.
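The round trip described above can be approximated on the CPU. The following is a rough sketch, not the actual test: it uses one axis only, it assumes `struct`'s `"f"` and `"e"` formats behave like fp32 and fp16 storage, and it assumes the integer storage is unsigned normalized fixed-point (the post does not say signed or unsigned). All function names are made up for illustration.

```python
import struct

STEPS = 512  # granularity of 1/512 over [0, 1), as in the test description

def store_fp32(value: float) -> float:
    return struct.unpack("<f", struct.pack("<f", value))[0]

def store_fp16(value: float) -> float:
    return struct.unpack("<e", struct.pack("<e", value))[0]

def store_unorm(value: float, bits: int) -> float:
    """Assumed fixed-point storage: clamp to [0, 1], quantize to 2^bits - 1 steps."""
    levels = (1 << bits) - 1
    return round(min(max(value, 0.0), 1.0) * levels) / levels

STORAGE = {
    "fp32": store_fp32,
    "fp16": store_fp16,
    "int16": lambda v: store_unorm(v, 16),
    "int8": lambda v: store_unorm(v, 8),
}

def reconstruct(x: float, store) -> float:
    """Raise x to the 8th power, pass it through storage, take the 1/8th power."""
    return store(x ** 8) ** (1.0 / 8.0)

def summarize(store):
    """Return (number of distinct reconstructed values, worst absolute error)."""
    xs = [i / STEPS for i in range(STEPS)]
    rebuilt = [reconstruct(x, store) for x in xs]
    worst = max(abs(a - b) for a, b in zip(xs, rebuilt))
    return len(set(rebuilt)), worst

if __name__ == "__main__":
    for name, store in STORAGE.items():
        distinct, worst = summarize(store)
        print(f"{name:>5}: {distinct:3d} distinct of {STEPS}, worst error {worst:.6f}")
```

Fewer distinct reconstructed values means more banding in the corresponding image; a large worst-case error near the low end means inputs squashed to zero.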
[Images: four side-by-side comparisons, one per temporary storage type - fp32, fp16, int16, int8]
There are several observations that could be made from the test, particularly with respect to fp16, but I'll leave that part for now to the inquisitive reader, in the hope that a healthy discussion forms. Back in a while.
ps: I'm open to suggestions re particular functions gaffers would like to see passed through this pipeline.
edit: Ok, I could have been a tad more verbose about the test procedure, so here are some details:
- This is all running in a GLSL shader on an NV Kepler.
- Temporary storage is a texture, which makes sure the GPU actually uses the desired storage type - most desktop GPUs (Kepler included) don't have fp16 ALUs and cannot keep fp16 in registers either.
- Integer-type storage keeps fractional values as fixed-point. Basically, all participating types of temporary storage keep fractions, but the integer ones cannot represent values above 1.0, which is fine for our purposes.
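To make the fixed-point point concrete, here is a small sketch of how a fraction could be kept in an integer. It assumes unsigned normalized storage where the integer code n stands for n / (2^bits - 1); the post does not spell out the exact texture formats, so treat `encode_unorm` and `decode_unorm` as hypothetical helpers rather than what the GPU literally does.

```python
def encode_unorm(value: float, bits: int) -> int:
    """Clamp to [0, 1] and map onto the integer codes 0 .. 2^bits - 1."""
    levels = (1 << bits) - 1
    clamped = min(max(value, 0.0), 1.0)
    return round(clamped * levels)

def decode_unorm(code: int, bits: int) -> float:
    """Map an integer code back to a fraction in [0, 1]."""
    return code / ((1 << bits) - 1)

if __name__ == "__main__":
    for bits in (8, 16):
        print(f"int{bits}: step size {decode_unorm(1, bits):.8f}")
        for x in (0.25, 0.5, 0.9, 1.5):
            code = encode_unorm(x ** 8, bits)
            print(f"  x={x}: x^8 stored as code {code} -> {decode_unorm(code, bits):.8f}")
```

Unlike a float, the step size here is the same everywhere in [0, 1], so small values like x^8 for low x get very few codes to themselves, and anything above 1.0 simply clamps.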
"We can see that there is significant precision loss with FP16, especially with the low inputs. In the case of this specific test, FP16 isn't precise enough to cope with the extremely small results of x^8 when the input itself is low. In fact, the precision is so subpar that many of the results end up being the same as each other."
<snip>
The first test for example is the kind of thing you might do when calculating a (fairly wide) specular highlight.
And my own take on the results:
- Test was deliberately chosen to do power computations - power computations are common for specular effects (where a light source's reflection on a surface is approximated via a power function), and power functions exacerbate precision issues. That said..
- Test is more sensitive to precision than a typical specular function: whereas the latter might suffer in its lobe shape, this test, by virtue of reconstructing back from exp to linear space, visually amplifies the precision effect.
- Test is meant to explore slightly more than just fp arithmetic - with modern GPUs nothing stops devs from using all kinds of fp and fixed-point computations.
- Fixed-point fractions are rather unfit for power functions - there's a very apparent banding on the exponents, even for large-ish types (e.g. int16), and the nether regions are, well, all gone.
- The one issue with fp16 vs fp32 in this test is the apparent underflow - while fp16 exhibits a smooth gradient (as expected), it just runs out of bits in the darkest region, where small x's and y's are simply squashed to zero.
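The fp16 underflow in the last point can be located with a few lines of Python. This is a sketch under assumptions: `struct`'s `"e"` format (round-to-nearest, subnormals kept) stands in for the fp16 texture, which may not match what a given GPU does, and `first_surviving_index` is a made-up helper name.

```python
import struct

def store_fp16(value: float) -> float:
    """Round to the nearest fp16 value (subnormals included) and widen back."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def first_surviving_index(steps: int = 512, power: int = 8):
    """Smallest i > 0 for which (i / steps) ** power is not stored as zero."""
    for i in range(1, steps):
        if store_fp16((i / steps) ** power) != 0.0:
            return i
    return None

if __name__ == "__main__":
    i = first_surviving_index()
    print(f"first input that survives fp16 storage: {i}/512 = {i / 512:.4f}")
    print("smallest positive fp16 value:", store_fp16(2.0 ** -24))
```

Under these assumptions every x below 59/512 (about 0.115) comes back as zero, since x^8 there rounds below the smallest fp16 subnormal of 2^-24. Hardware that flushes fp16 subnormals to zero would push that cutoff higher.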
To part two..