
Wolfenstein II and Far Cry 5 will support FP16 Rapid Packed Math

Syrus

Banned
Please.

AMD and nVidia themselves use the FP16 raw-power nomenclature, listing it as 2x FP32 on hardware that supports double rate.

The two metrics are commonly used by hardware manufacturers and have nothing to do with your console-wars bullshit...

Pro GPU: 4.2TFs FP32 and 8.4TFs FP16
Vega 64: 13.7TFs FP32 and 27.5TFs FP16

There is no nonsense when it is a fact of the hardware specs... I can't understand what you are calling bullshit or fighting against lol

Edit - Some AMD slides from Vega just to end this discussion:



You can even look at AMD official site if you wish.


Where's the receipts? Games do not use 100% FP16 RPM.

It's all misleading because it's theory. No one has done it yet in any game.
 

ethomaz

Banned
FP16 performance will most certainly be less than 2X when compared to GCN3 or GCN4 chips as they gain performance from FP16 as well.
That is to be seen with the future games using it... that is indeed the point of discussion in this thread, not denial of the raw FP16 flops of these new cards.

Where's the receipts? Games do not use 100% FP16 RPM.

It's all misleading because it's theory. No one has done it yet in any game.
Receipts for what? It is a fact and you can look at official specs for any card with double rate half-precision.

Now you want a performance comparison? Wait for the first games using FP16 to draw a picture, but remember FP32 isn't used 100% by games either, which makes your claim dumb.
 

Syrus

Banned
That is to be seen with the future games using it... that is indeed the point of discussion in this thread, not denial of the raw FP16 flops of these new cards.


My main point is that it seems they use FP16 only on things that don't need as much precision or detail, not for the entire game's calculations etc.

So far the only claim is a 14% increase in FPS in Wolfenstein, which is awesome.
 

martino

Member
I can tell you that if a dev moved 50% of the fp32 code to fp16 they would get what we are used to seeing from 6.3 TF fp32, but I don't know what % of the fp32 code devs will be able to get away with using fp16 for.

Why use a 50% example and not less?
This is the problem with you and fp16... the implication.
 

thuway

Member
I honestly wish we had a developer help explain how relevant this technique is and if it actually is the real deal or just marketing hype.


I kind of see it being analogous to the ESRAM in Xbox One, where it was used as a crutch that did wonders in the right situation.
 
Where's the receipts? Games do not use 100% FP16 RPM.

It's all misleading because it's theory. No one has done it yet in any game.

The ignorance of this post is that no game even uses 100% fp32 calculations. There's a lot more to video games than just floating-point calculations.
 

onQ123

Member
Why use a 50% example and not less?
This is the problem with you and fp16... the implication.


Because it's an example



What type of question is this?
 
I honestly wish we had a developer help explain how relevant this technique is and if it actually is the real deal or just marketing hype.


I kind of see it being analogous to the ESRAM in Xbox One, where it was used as a crutch that did wonders in the right situation.

Sebbbi A from Beyond3D (AKA an authority on the subject):

Sebbbi A from Beyond3D said:
All existing games (except a few HDR games) output image at 8 bits per channel (RGB8). Input textures are also commonly 8 bit per channel (and BC compressed = lower quality than 8 bit).

As your input and output data has only 8 bit precision you don't need to calculate all intermediate math at 32 bit. Games don't store intermediate buffers as 32 bit floats either. Rgba16f is used commonly for HDR data and rgb10 and rgba8 for other intermediate buffers. 16 bit float processing is fine for most math in games. Results cannot be distinguished by naked eye from full 32 bit float pipeline, as long as the developer knows what he/she is doing. Especially if temporal AA is used.

Unfortunately writing good mixed fp16/fp32 code requires good knowledge about floating point math behaviour and some basic numeric range analysis (inputs/outputs and intermediate values). It is possible to write math in a way that minimizes floating point issues, allowing you to use fp16 more often. Of course if you use fp16 in a wrong way, you get banding and other artifacts.

It's nothing like ESRAM, however. ESRAM usage couldn't be avoided on Xbox One. FP16 usage can. FP16 usage with hardware supporting RPM is merely an optimization route for improving performance in some areas of the graphics rendering pipeline. Some games will benefit from it more than others, as will some developers (i.e. not all graphics programmers are made equal).
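To make sebbbi's point concrete, here is a minimal sketch of the mixed-precision idea, written with CUDA's half types rather than a real console shader (the function and its inputs are hypothetical, and a GPU with native FP16 arithmetic is assumed): the low-range [0,1] shading terms run in FP16, since the output is only 8 bits per channel anyway, while the world-space position math stays in FP32, where FP16's roughly three decimal digits of precision would visibly break down.

```cuda
#include <cuda_fp16.h>   // __half types; FP16 arithmetic needs sm_53 or newer
#include <stdint.h>

__device__ uint8_t shade_pixel(float3 world_pos, float3 light_pos,
                               __half albedo, __half n_dot_l)
{
    // Keep the large-magnitude, precision-sensitive work in FP32...
    float dx = light_pos.x - world_pos.x;
    float dy = light_pos.y - world_pos.y;
    float dz = light_pos.z - world_pos.z;
    float inv_dist2 = 1.0f / (dx * dx + dy * dy + dz * dz);

    // ...then drop to FP16 for the low-range shading terms.
    __half atten  = __float2half(inv_dist2);
    __half shaded = __hmul(__hmul(albedo, n_dot_l), atten);

    // Quantising to RGB8 hides the FP16 rounding, as long as the ranges were chosen sensibly.
    float v = fminf(fmaxf(__half2float(shaded), 0.0f), 1.0f);
    return (uint8_t)(v * 255.0f + 0.5f);
}
```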
 

onQ123

Member
Where's the receipts? Games do not use 100% FP16 RPM.

It's all misleading because it's theory. No one has done it yet in any game.

You mean to tell me that you've been doing these drive-by posts for months & this is your understanding of things?
 

martino

Member
No... you're arguing against a problem with what AMD publishes and then blaming it on onQ123.

I'd quit now if I was you.

What I blame onQ123 for is what he hopes for and implies from fp16, not the way AMD publishes theoretical numbers.
If you don't see why 4.2 to 6.3 is a convenient example... it's because you don't want to see it.
 
I've been banned twice for this same information because people don't want to understand & just yell OMG he said PS4 Pro is 8.4TF!

Honest question: Didn't this clue you in that you are doing something wrong? Was it really that 'people don't want to understand' or that you presented the information in such a manner as to cause confusion? I'll say it again: Quoting specs and theoretical numbers may be ok in an environment where everyone is a developer and understands the information immediately without needing an explanation. GAF is not such an environment. You should have known that quoting specs out of context and without any explanation as to what these specs mean could (and did) derail entire threads as users latch on to the familiar teraflop figure and fight each other over what it actually means. That's on you because you did a poor job explaining what these specs mean.
 

onQ123

Member
What I blame onQ123 for is what he hopes for and implies from fp16, not the way AMD publishes theoretical numbers.
If you don't see why 4.2 to 6.3 is a convenient example... it's because you don't want to see it.

So you think that would be equal to me saying PS4 Pro is more powerful than Xbox One X?

Going from fp32 to fp16 is a compromise to begin with, plus Xbox One X has more & faster memory. You're fighting with your own insecurities; my post had nothing to do with Xbox One X.

I used 50% because it's simple & right there in the middle to give a good example to go on.
 
What I blame onQ123 for is what he hopes for and implies from fp16, not the way AMD publishes theoretical numbers.

This is your problem, right here. You're inferring malicious intent based on nothing.

Go back and read everything that onQ123 posted and he consistently and clearly qualified his statements by claiming each time that he's talking about a "theoretical max." metric.

If you don't see why 4.2 to 6.3 is a convenient example... it's because you don't want to see it.

Clearly you aren't grasping this because the very 4.2 TFLOPs figure you quote in this very post for PS4 Pro is also a "theoretical max." metric... just in 32 bit precision as opposed to 16 bit precision. It's just as unrealistic as the 2x fp16 metric, since NO GPU micro-architecture is 100% efficient.

What you're essentially arguing here is that onQ123 should instead try to pull a number out of his arse to estimate maximum fp16 performance, instead of using the published theoretical figure that the actual hardware designer, AMD, themselves released.... do you not see now how ridiculous you're sounding?

AMD themselves published that the Vega micro-architecture with RPM delivers 2x the fp16 ops per clock of 32-bit ops per clock (as per the image in my previous post).

If you have an argument about these numbers being disingenuous or deceptive then blame AMD.
 

onQ123

Member
Honest question: Didn't this clue you in that you are doing something wrong? Was it really that 'people don't want to understand' or that you presented the information in such a manner as to cause confusion? I'll say it again: Quoting specs and theoretical numbers may be ok in an environment where everyone is a developer and understands the information immediately without needing an explanation. GAF is not such an environment. You should have known that quoting specs out of context and without any explanation as to what these specs mean could (and did) derail entire threads as users latch on to the familiar teraflop figure and fight each other over what it actually means. That's on you because you did a poor job explaining what these specs mean.

No, because just like this thread it's people throwing BS all over the place & making a lot of noise while pointing at the person who is giving a factual statement as if he did something wrong.

Mods don't always fact-check; they just come in & try to clean up the thread.
 
Honest question: Didn't this clue you in that you are doing something wrong? Was it really that 'people don't want to understand' or that you presented the information in such a manner as to cause confusion? I'll say it again: Quoting specs and theoretical numbers may be ok in an environment where everyone is a developer and understands the information immediately without needing an explanation. GAF is not such an environment. You should have known that quoting specs out of context and without any explanation as to what these specs mean could (and did) derail entire threads as users latch on to the familiar teraflop figure and fight each other over what it actually means. That's on you because you did a poor job explaining what these specs mean.

Base your criticisms on his posts in this thread and you'll see that nowhere has he quoted the 2x fp16 rate without clarifying that it's a theoretical maximum.

Why are you trying to crucify this guy for making correct statements based on something he said in the past in another thread that he's already learned his lesson for?

I don't get why you guys are so intent on targeting this guy...

You're seriously derailing this thread.
 

Syrus

Banned
You mean to tell me that you've been doing these drive-by posts for months & this is your understanding of things?


I didn't mean every calculation; I meant where fp32 would be used, insert packed fp16 instead.

I've also been mentioning that your quotes are all theory, which you agree with; I'm just arguing that the theory means little until packed FP16 is heavily used and shown to have a huge boost in a game vs using fp32.

Which hasn't happened yet. So far the best thing we've got is here, a 14% fps boost, which is a great thing.

I just think people are going to take your 2x boost and run wild with it and be misinformed on it all.


I'll stop posting about you guys' quotes because I'm probably derailing the thread as much as I feel you are, and I don't want to be banned for this.
 

onQ123

Member
I didn't mean every calculation; I meant where fp32 would be used, insert packed fp16 instead.

I've also been mentioning that your quotes are all theory, which you agree with; I'm just arguing that the theory means little until packed FP16 is heavily used and shown to have a huge boost in a game vs using fp32.

Which hasn't happened yet. So far the best thing we've got is here, a 14% fps boost, which is a great thing.

I just think people are going to take your 2x boost and run wild with it and be misinformed on it all.


I'll stop posting about you guys' quotes because I'm probably derailing the thread as much as I feel you are, and I don't want to be banned for this.


Each & every time I have posted the specs I have always had context to show whether it's fp16 or fp32; I'm not sure what's misinforming about that.
 
From what I've managed to find on the topic, fp16 usage will only be relevant in certain situations and isn't a magical increase all round.

http://www.gamersnexus.net/...

"Rapid-packed math was shown effectively doubling the amount of hair strands ­that could be rendered per second. This is done with precision switching, where Vega is able to push FP16 rather than FP32 for reduced precision and increased speed. Precision switching isn’t deployable in every aspect of gaming – you’d have potential errors in data presentation – but it’s a fine use case for hair rendering. Hair doesn’t need to be precise, particularly with strands numbering in the tens of thousands. AMD mentioned that RPM has already been used in consoles and high-end compute GPUs, but that they were attempting to bring this feature into the PC gaming world. We continue to be dubious about gaming applications—AMD can develop the technology, but it’s up to software developers to actually make use of it."
 
Base your criticisms on his posts in this thread and you'll see that nowhere has he quoted the 2x fp16 rate without clarifying that it's a theoretical maximum.

Why are you trying to crucify this guy for making correct statements based on something he said in the past in another thread that he's already learned his lesson for?

I don't get why you guys are so intent on targeting this guy...

You're seriously derailing this thread.

If you feel this way I apologize and I'm out.
 

onQ123

Member
Posted?


RX Vega's Rapid Packed Math Could Narrow The Power Gap Between PS4 Pro and Xbox One X


RX Vega's launch is now imminent. Next Monday, AMD's latest high-end series of graphics cards will make its highly anticipated debut starting at $399.

Its launch and potential success could also have an indirect impact in the ongoing struggle between Sony's PlayStation 4 Pro and Microsoft's upcoming Xbox One X, due to be available in stores from November 7th for a price of $499.


Let's see exactly why. RX Vega will introduce, for the first time in desktop GPUs, the Rapid Packed Math feature. This allows two half-float operations (FP16) to be executed at the same time it would take for one full-float operation (FP32). In the words of AMD:

"Next-Gen Compute Units (NCUs) provide super-charged pathways for doubling processing throughput when using 16-bit data types.1 In cases where a full 32 bits of precision is not necessary to obtain the desired result, they can pack twice as much data into each register and use it to execute two parallel operations. This is ideal for a wide range of computationally intensive applications including image/video processing, ray tracing, artificial intelligence, and game rendering.
"


Just ahead of the PlayStation 4 Pro launch, System Architect Mark Cerny had revealed that the console would include a couple features from AMD's future roadmap. One of these is what AMD is now calling Rapid Packed Math; at the time, Cerny said that it has the potential to "radically increase performance".

"A few AMD roadmap features are appearing for the first time in PS4 Pro. One of the features appearing for the first time is the handling of 16-bit variables – it's possible to perform two 16-bit operations at a time instead of one 32-bit operation. In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance.
Talking to other game developers, we also got the same response."

Interestingly, Microsoft didn't opt to add this particular feature to their Xbox One X custom design. Perhaps they decided that it was not needed with the extra memory and GPU power. Also, this is a feature that developers would have to put time into researching and implementing, which is something that many third-party developers could choose to skip.

However, things got more interesting when AMD revealed last week that Wolfenstein II: The New Colossus and Far Cry 5, two of the most anticipated first-person shooters due in the upcoming months, will support Rapid Packed Math. You can watch Ubisoft's Steve Mcauley praising the feature in the video below.

This is important since with the feature now also available on PC with AMD's RX Vega GPUs, more game developers could be encouraged to use Rapid Packed Math across the board, PlayStation 4 Pro included. It's important to also note that this won't be enough to completely close the performance gap with the Xbox One X since it's simply too wide and FP16 operations cannot be used in every instance.

Still, it stands to reason that widespread adoption of Rapid Packed Math could help PlayStation 4 Pro versions of multiplatform games getting closer to the Xbox One X releases. It's hard to say how much exactly, since the benefits may vary depending on the specific technology used in any given game, but we'll be keeping an eye on this promising feature as more developers elect to use it.


Video
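As a rough sanity check on the figures quoted above (using the commonly cited PS4 Pro GPU configuration of 36 CUs at 911 MHz, with 64 lanes per CU and 2 ops per FMA): 36 × 64 × 2 × 0.911 GHz ≈ 4.2 TFLOPS in FP32, and issuing two FP16 operations per lane per clock doubles that peak to ≈ 8.4 TFLOPS. Both numbers are theoretical ceilings, not sustained in-game throughput. The mechanism itself looks roughly like the hedged CUDA sketch below, with CUDA's packed-half intrinsics standing in for Rapid Packed Math (which is a hardware/shader feature, not this exact API): one packed instruction applies the same operation to both 16-bit halves of a 32-bit register.

```cuda
#include <cuda_fp16.h>   // packed __half2 type; native FP16 math needs sm_53 or newer

// FP32 version: one fused multiply-add per element, per instruction.
__global__ void fma_fp32(const float* a, const float* b, const float* c,
                         float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = fmaf(a[i], b[i], c[i]);
}

// Packed FP16 version: each __half2 holds two values, and a single __hfma2
// computes a*b+c for BOTH of them -- the source of the "double rate" figure.
__global__ void fma_fp16x2(const __half2* a, const __half2* b, const __half2* c,
                           __half2* out, int n_pairs)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_pairs) out[i] = __hfma2(a[i], b[i], c[i]);
}
```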
 

onQ123

Member
From the F1 2017 PS4 Pro enhancements blog post

Finally, track shaders also receive an upgrade, once again thanks to the clever design of the PS4 Pro hardware. As the developers explained, “packed fp16 ALU operations, which are unique to the PS4 Pro in the console space, offer us a very powerful tool for optimising shaders, allowing us to enhance CU occupancy and reduce overall instruction count.”

For the less tech-savvy, this ultimately means the hardware design naturally lends itself to providing higher-quality environment shaders, allowing the team to give tracks a more finely detailed overall appearance.



Side note: I think it's time for a RPM/fp16 PS4 Pro thread that can stay open without the trolling. It seems that the trolls are doing a good job because there is no thread left open to post about PS4 Pro using fp16.
 
From the F1 2017 PS4 Pro enhancements blog post





Side note: I think it's time for a RPM/fp16 PS4 Pro thread that can stay open without the trolling. It seems that the trolls are doing a good job because there is no thread left open to post about PS4 Pro using fp16.

I was going to post this here since I just found out about it. It seems F1 2017 is the first game released using FP16 RPM, which, besides improving the resolution, allowed them to add better reflections and improved shadows.

We also need a thread for games that will be using it on PS4 Pro.
 

onQ123

Member
I was going to post this here since I just found out about it. It seems F1 2017 is the first game released using FP16 RPM, which, besides improving the resolution, allowed them to add better reflections and improved shadows.

We also need a thread for games that will be using it on PS4 Pro.

Good luck keeping that thread open lol, people are really acting like kids when it comes to something they don't want to understand. This isn't some fake made-up stuff, it's real, but yet people act like someone is claiming secret sauce & fairy dust is inside of the PS4 Pro.


But on the topic of F1 2017, I wonder how the end result using CBR & higher settings looks compared to how it would have looked if they had gone for native 4K with the same settings as the base PS4.
 

ethomaz

Banned
I read the blog, but I think most people want to see some measurable examples, like the same hardware running with RPM on/off.

AMD needs to show better examples, or perhaps we need to wait for the first games on PC for that.
 

onQ123

Member
I read the blog, but I think most people want to see some measurable examples, like the same hardware running with RPM on/off.

AMD needs to show better examples, or perhaps we need to wait for the first games on PC for that.

They showed the ROTR hair demo; I think that was a good example. Besides, AMD doesn't make games, so waiting for the games that take advantage of the hardware is the best bet.


Side note: the PS4 Pro & Xbox One X GPUs are separated by 4 more CUs + a higher clock rate on Xbox One X, while PS4 Pro has features like RPM from a GPU generation ahead of the Xbox One X GPU, so I think the difference between games will be smaller than what is expected from the 4.2 TF vs 6 TF fp32 numbers. My guess is that the extra/faster RAM is going to be the thing that sets Xbox One X apart with better textures, while PS4 Pro might have better effects or even better fps while using CBR or a lower resolution.

I remember a dev or someone said that F1 2017 was going to be native 4K on Xbox One X, but in the end they went with CBR. I wonder if that was because PS4 Pro was able to use CBR & have higher settings than the OG PS4, which probably made them decide it would be better to use CBR & higher settings vs native 4K with the same settings as the base hardware.
 

LordOfChaos

Member
AMD's figures for the FP16 performance increase. Not a magic performance doubler as some liked to bill it, but definitely a nice addition for no visual degradation (and more physics).

siggraph_vega_architecture_16.png
 

onQ123

Member
AMD's figures for the FP16 performance increase. Not a magic performance doubler as some liked to bill it, but definitely a nice addition for no visual degradation (and more physics).

siggraph_vega_architecture_16.png

Who said it was a magic performance doubler?

If everything could be done using fp16 & there were no other bottlenecks, maybe, but that's not the case, & the benefits will depend on what can be moved to fp16 without a noticeable drop in quality.
 

dr_rus

Member
AMD's figures for the FP16 performance increase. Not a magic performance doubler as some liked to bill it, but definitely a nice addition for no visual degradation (and more physics).

This image was posted and discussed previously.

Also - the jury's still out on the "no visual degradation" part.
 

Space_nut

Member
AMD's figures for the FP16 performance increase. Not a magic performance doubler as some liked to bill it, but definitely a nice addition for no visual degradation (and more physics).

siggraph_vega_architecture_16.png

And that's only for those specific rendering processes, not the whole GPU rendering frame. So that 20% faster may only be for a process that's 5% of the entire frame, so it's a 1% increase in performance ;)
 

score01

Member
And that's only for those specific rendering processes, not the whole GPU rendering frame. So that 20% faster may only be for a process that's 5% of the entire frame, so it's a 1% increase in performance ;)

Or, you know, the id Tech 6 demo quoted earlier, which had a 14% fps improvement with fp16 over fp32.

Onq123 redeemed.

Sounds like adding in RPM to the GPU was some great future proofing by Sony.
 
And that's only for those specific rendering processes, not the whole GPU rendering frame. So that 20% faster may only be for a process that's 5% of the entire frame, so it's a 1% increase in performance ;)
No need to downplay RPM just because X doesn't have it. X will still be the most powerful console until PS5 comes out.
 
This image was posted and discussed previously.

Also - the jury's still out on the "no visual degradation" part.

Not really.

Obviously, developers are only going to use fp16 and mixed precision where it makes sense in the rendering pipeline (i.e. where the additional precision of 32bit floats isn't necessary).

No dev is going to sacrifice visual fidelity of graphical effects for the higher performance, otherwise all console games would be 60fps.

So I think it's more accurate to claim that the jury's still out on the degree of performance gain mixed precision opens up in future games, or on the extent to which the graphics rendering pipeline can leverage mixed precision for the increased performance.
 

onQ123

Member
No need to downplay RPM just because X doesn't have it. X will still be the most powerful console until PS5 comes out.

I think there's more than RPM that's part of PS4 Pro but not Xbox One X, like the IWD & primitive shaders.


"The work distributor in PS4 Pro is very advanced," he claimed. "Not only does it have the fairly dramatic tessellation improvements from Polaris [AMD's GPU architecture], it also has some post-Polaris functionality that accelerates rendering of scenes with very small objects."







Edit:


Next-generation geometry engine

To meet the needs of both professional graphics and gaming
applications, the geometry engines in "Vega" have been
tuned for higher polygon throughput by adding new fast
paths through the hardware and by avoiding unnecessary
processing. This next-generation geometry (NGG) path is
much more flexible and programmable than before.
To highlight one of the innovations in the new geometry
engine, primitive shaders are a key element in its ability to
achieve much higher polygon throughput per transistor.
Previous hardware mapped quite closely to the standard
Direct3D rendering pipeline, with several stages including
input assembly, vertex shading, hull shading, tessellation,
domain shading, and geometry shading. Given the wide
variety of rendering technologies now being implemented
by developers, however, including all of these stages isn't
always the most efficient way of doing things. Each stage
has various restrictions on inputs and outputs that may
have been necessary for earlier GPU designs, but such
restrictions aren't always needed on today's more flexible
hardware.

”Vega's" new primitive shader support allows some parts of
the geometry processing pipeline to be combined and
replaced with a new, highly ecient shader type. These
flexible, general-purpose shaders can be launched very
quickly, enabling more than four times the peak primitive
cull rate per clock cycle.
In a typical scene, around half of the geometry will be
discarded through various techniques such as frustum
culling, back-face culling, and small-primitive culling. The
faster these primitives are discarded, the faster the GPU
can start rendering the visible geometry. Furthermore,
traditional geometry pipelines discard primitives after
vertex processing is completed, which can waste computing
resources and create bottlenecks when storing a large batch
of unnecessary attributes. Primitive shaders enable early
culling to save those resources.

The ”Vega" 10 GPU includes four geometry engines which
would normally be limited to a maximum throughput of
four primitives per clock, but this limit increases to more
than 17 primitives per clock when primitive shaders are
employed.⁷
Primitive shaders can operate on a variety of different
geometric primitives, including individual vertices,
polygons, and patch surfaces. When tessellation is enabled,
a surface shader is generated to process patches and control
points before the surface is tessellated, and the resulting
polygons are sent to the primitive shader. In this case, the
surface shader combines the vertex shading and hull
shading stages of the Direct3D graphics pipeline, while the
primitive shader replaces the domain shading and
geometry shading stages.
Primitive shaders have many potential uses beyond
high-performance geometry culling. Shadow-map
rendering is another ubiquitous process in modern engines
that could benefit greatly from the reduced processing
overhead of primitive shaders. We can envision even more
uses for this technology in the future, including deferred
vertex attribute computation, multi-view/multi-resolution
rendering, depth pre-passes, particle systems, and
full-scene graph processing and traversal on the GPU.
Primitive shaders will coexist with the standard hardware
geometry pipeline rather than replacing it. In keeping with
”Vega's" new cache hierarchy, the geometry engine can now
use the on-chip L2 cache to store vertex parameter data.
This arrangement complements the dedicated parameter
cache, which has doubled in size relative to the
prior-generation "Polaris" architecture. This caching setup
makes the system highly tunable and allows the graphics
driver to choose the optimal path for any use case.
Combined with high-speed HBM2 memory, these
improvements help to reduce the potential for memory
bandwidth to act as a bottleneck for geometry throughput.
Another innovation of "Vega's" NGG is improved load
balancing across multiple geometry engines. An intelligent
workload distributor (IWD) continually adjusts pipeline
settings based on the characteristics of the draw calls it
receives in order to maximize utilization.

One factor that can cause geometry engines to idle is
context switching. Context switches occur whenever the
engine changes from one render state to another, such as
when changing from a draw call for one object to that of a
different object with different material properties. The
amount of data associated with render states can be quite
large, and GPU processing can stall if it runs out of
available context storage. The IWD seeks to avoid this
performance overhead by avoiding context switches
whenever possible.
Some draw calls also include many small instances (i.e.,
they render many similar versions of a simple object). If an
instance does not include enough primitives to fill a
wavefront of 64 threads, then it cannot take full advantage
of the GPU's parallel processing capability, and some
proportion of the GPU's capacity goes unused. The IWD
can mitigate this effect by packing multiple small instances
into a single wavefront, providing a substantial boost to
utilization.


”Vega" NCU with Rapid Packed Math

GPUs today often use more mathematical precision than
necessary for the calculations they perform. Years ago, GPU
hardware was optimized solely for processing the 32-bit
floating point operations that had become the standard for
3D graphics. However, as rendering engines have become
more sophisticated—and as the range of applications for
GPUs has extended beyond graphics processing—the value
of data types beyond FP32 has grown.

The programmable compute units at the heart of "Vega"
GPUs have been designed to address this changing
landscape with the addition of a feature called Rapid
Packed Math. Support for 16-bit packed math doubles peak
floating-point and integer rates relative to 32-bit
operations. It also halves the register space as well as the
data movement required to process a given number of
operations. The new instruction set includes a rich mix of
16-bit floating point and integer instructions, including
FMA, MUL, ADD, MIN/MAX/MED, bit shifts, packing
operations, and many more.

For applications that can leverage this capability, Rapid
Packed Math can provide a substantial improvement in
compute throughput and energy efficiency. In the case of
specialized applications like machine learning and training,
video processing, and computer vision, 16-bit data types are
a natural fit, but there are benefits to be had for more
traditional rendering operations, as well. Modern games,
for example, use a wide range of data types in addition to
the standard FP32. Normal/direction vectors, lighting
values, HDR color values, and blend factors are some
examples of where 16-bit operations can be used.
With mixed-precision support, "Vega" can accelerate the
operations that don't benefit from higher precision while
maintaining full precision for the ones that do. Thus, the
resulting performance increases need not come at the
expense of image quality.
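The register-space and integer-rate claims in that excerpt are easy to illustrate as well. A hedged sketch, this time using CUDA's SIMD-in-a-word intrinsic __vadd2 as a stand-in for the packed INT16 instructions the whitepaper lists: two 16-bit integers share one 32-bit register and a single instruction updates both, which is why packed math halves register pressure and data movement on top of doubling the peak rate.

```cuda
// Each unsigned int holds two 16-bit values (low and high halfword).
__device__ unsigned int add_two_int16_pairs(unsigned int packed_a,
                                            unsigned int packed_b)
{
    // __vadd2 performs two independent 16-bit additions inside one 32-bit
    // register with one instruction -- the integer analogue of Rapid Packed Math.
    return __vadd2(packed_a, packed_b);
}
```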

 
Is Wolfenstein 2 going to use the same engine DOOM was using?
I think it's id Tech 6 but not sure.

DOOM looked GREAT!
Wolfy 2 using the same would be very nice.
 

onQ123

Member
The PS4 Pro GPU seems to be basically Vega without HBM, all the way down to the fact that it can swap apps in & out of GDDR5 to the DDR3 memory.

HBCC technology can be leveraged for consumer
applications, as well. The key limitation in that space is that
most systems won’t have the benefit of large amounts of
system memory (i.e., greater than 32 GB) or solid-state
storage on the graphics card. In this case, HBCC effectively
extends the local video memory to include a portion of
system memory. Applications will see this storage capacity
as one large memory space. If they try to access data not
currently stored in the local high-bandwidth memory, the
HBCC can cache the pages on demand, while less recently
used pages are swapped back into system memory. This unified memory pool is known as the HBCC Memory Segment (HMS).
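For a loose point of reference on what that kind of demand paging looks like from the software side, here is a hedged CUDA sketch using unified memory (not AMD's HBCC, just an analogous oversubscription mechanism on Pascal-or-newer NVIDIA hardware with a driver that supports it): an allocation larger than VRAM is addressed as one pool, pages migrate to the GPU on first touch, and cold pages are evicted back to system memory as space runs out.

```cuda
#include <cuda_runtime.h>

__global__ void touch(float* data, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;   // first touch faults the page onto the GPU
}

int main()
{
    size_t n = 1ull << 31;        // 8 GB of floats -- may well exceed VRAM
    float* data = nullptr;
    cudaMallocManaged(&data, n * sizeof(float));        // one address space spanning VRAM + system RAM
    touch<<<(unsigned)((n + 255) / 256), 256>>>(data, n);
    cudaDeviceSynchronize();      // pages stream in on demand; less recently used pages get evicted
    cudaFree(data);
    return 0;
}
```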




 

dr_rus

Member
Not really.

Obviously, developers are only going to use fp16 and mixed precision where it makes sense in the rendering pipeline (i.e. where the additional precision of 32bit floats isn't necessary).

No dev is going to sacrifice visual fidelity of graphical effects for the higher performance, otherwise all console games would be 60fps.

So I think it's more accurate to claim that the jury's still out on the degree of performance gain mixed precision opens up in future games, or on the extent to which the graphics rendering pipeline can leverage mixed precision for the increased performance.

How will you tell that you're seeing lower image quality due to 16-bit math being used if there's no way to compare it to 32 bits - on PS4 Pro, for example? Also, where does the line lie in quality degradation - is some minor shimmering in places okay for a 10% performance boost?

The jury is always out on such features. The fact is that 16 bit math usage can (and did, actually, back in GeForce FX days) impact quality. Developers aren't gods and they do make mistakes more often than not. The lure of using FP/INT16 a bit more often to get ahead of competition in PC space should not be underestimated either.
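A quick, hedged illustration of that failure mode (CUDA again, with made-up values): FP16 has an 11-bit significand, so above 2048 it can only represent values in steps of 2. That is harmless for a [0,1] blend factor but ruinous for something like a coordinate on a 4K render target, and misjudging which is which is exactly how the banding and shimmering mentioned above creep in.

```cuda
#include <cuda_runtime.h>
#include <cuda_fp16.h>
#include <cstdio>

__global__ void fp16_roundtrip_demo()
{
    // A [0,1] shading value survives the FP16 round trip essentially intact...
    float blend = 0.7312f;
    printf("blend  fp32: %.4f  fp16: %.4f\n",
           blend, __half2float(__float2half(blend)));

    // ...but a large-magnitude value snaps to the nearest representable step (2.0 here).
    float coord = 2049.37f;   // e.g. an x position on a 3840-wide render target
    printf("coord  fp32: %.2f  fp16: %.2f\n",
           coord, __half2float(__float2half(coord)));   // prints 2050.00
}

int main()
{
    fp16_roundtrip_demo<<<1, 1>>>();
    cudaDeviceSynchronize();
    return 0;
}
```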

I think there's more than RPM that's part of PS4 Pro but not Xbox One X, like the IWD & primitive shaders.

IWD seems to be little more than a marketing buzzword, and primitive shaders are a software (driver) construct able to run on any modern GPU. In fact, it's a solid bet that NV has had them (well, not them, but a similar s/w / driver optimization) running for some time now, as they are somewhat related to current TBIR implementations.

The PS4 Pro GPU seems to be basically Vega without HBM, all the way down to the fact that it can swap apps in & out of GDDR5 to the DDR3 memory.
Vega doesn't move programs anywhere; it uses system RAM as a slow memory pool for whatever resources a running 3D app requires. What Cerny speaks of is basically an implementation detail of their NUMA OS scheduler. It's a CPU/OS thing.
 

onQ123

Member
How will you tell that you're seeing lower image quality due to 16-bit math being used if there's no way to compare it to 32 bits - on PS4 Pro, for example? Also, where does the line lie in quality degradation - is some minor shimmering in places okay for a 10% performance boost?

The jury is always out on such features. The fact is that 16 bit math usage can (and did, actually, back in GeForce FX days) impact quality. Developers aren't gods and they do make mistakes more often than not. The lure of using FP/INT16 a bit more often to get ahead of competition in PC space should not be underestimated either.



IWD seems to be little more than a marketing buzzword, and primitive shaders are a software (driver) construct able to run on any modern GPU. In fact, it's a solid bet that NV has had them (well, not them, but a similar s/w / driver optimization) running for some time now, as they are somewhat related to current TBIR implementations.


Vega doesn't move programs anywhere; it uses system RAM as a slow memory pool for whatever resources a running 3D app requires. What Cerny speaks of is basically an implementation detail of their NUMA OS scheduler. It's a CPU/OS thing.

My point is that PS4 Pro shares these features with Vega; I'm not sure what your opinion of what the IWD & primitive shaders are has to do with that. How did you come to the conclusion that a hardware feature that hasn't been talked about much is little more than a buzzword?



Still a huge gap between the Pro and XBX. Memory is another thing I wish they would increase for higher-res textures, besides the lack of a UHD BD drive.

Did you quote the wrong post or something?


But what do you consider a huge gap? Some people don't even think the PS4 vs PS4 Pro is a huge gap.
 

kungfuian

Member
Cerny, “it's possible to perform two 16-bit operations at the same time, instead of one 32-bit operation. In other words, with full floats, PS4 Pro has 4.2 teraflops of computational power. With half floats, it now has double that -- which is to say, 8.4 teraflops of computational power. As I'm sure you understand, this has the potential to radically increase the performance of games.”

Cerny doesn't have a history of hyperbole or spouting marketing BS and the key word he uses here is radically. Not 'a little' or 'marginally' or any other adjective he could have used. He specifically said radically and if he says radically he probably means radically.
 

Gitaroo

Member
My point is that PS4 Pro shares these features with Vega; I'm not sure what your opinion of what the IWD & primitive shaders are has to do with that. How did you come to the conclusion that a hardware feature that hasn't been talked about much is little more than a buzzword?





Did you quote the wrong post or something?


But what do you consider a huge gap? Some people don't even think the PS4 vs PS4 Pro is a huge gap.

I mean memory in general between the Pro and XBX; even with the additional .5 GB they added, 4K frame buffers, UI, etc. already eat up a huge chunk of memory. If they designed the Pro for 4K, even if it's checkerboarding, they need much more for higher-quality textures. Textures are one of the most obvious things that stand out.
 

dr_rus

Member
My point is that PS4 Pro shares these features with Vega; I'm not sure what your opinion of what the IWD & primitive shaders are has to do with that. How did you come to the conclusion that a hardware feature that hasn't been talked about much is little more than a buzzword?

As I've said, HBCC is most certainly not in PS4Pro - and it would be pretty much useless there since this is a fixed h/w platform anyway.

IWD being a meaningless buzzword comes from the fact that there's no noticeable advantage of Vega against Polaris in a typical gaming geometry load. It is likely just a part of the whole "primitive shader" s/w optimization - which seems to be disabled in Vega for now which explains the results against Polaris.

Cerny, "it's possible to perform two 16-bit operations at the same time, instead of one 32-bit operation. In other words, with full floats, PS4 Pro has 4.2 teraflops of computational power. With half floats, it now has double that -- which is to say, 8.4 teraflops of computational power. As I'm sure you understand, this has the potential to radically increase the performance of games."

Cerny doesn't have a history of hyperbole or spouting marketing BS and the key word he uses here is radically. Not 'a little' or 'marginally' or any other adjective he could have used. He specifically said radically and if he says radically he probably means radically.

What's "radically"? Because it will be less than 2X in any case.
 