
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

https://www.reddit.com/r/AdvancedMi...ide_games_made_a_post_discussing_dx12/cul9auq

Has this been posted yet? An AMD employee says Maxwell 2 is incapable of async compute.

Good find.

Oxide effectively summarized my thoughts on the matter. NVIDIA claims "full support" for DX12, but conveniently ignores that Maxwell is utterly incapable of performing asynchronous compute without heavy reliance on slow context switching.
GCN has supported async shading since its inception, and it did so because we hoped and expected that gaming would lean into these workloads heavily. Mantle, Vulkan and DX12 all do. The consoles do (with gusto). PC games are chock full of compute-driven effects.
If memory serves, GCN has higher FLOPS/mm2 than any other architecture, and GCN is once again showing its prowess when utilized with common-sense workloads that are appropriate for the design of the architecture.
 

KingSnake

The Birthday Skeleton
I don't see the gloom and doom here. The benchmarks show that current Nvidia cards, which at the mid and high end are far more powerful than console GPUs, still perform perfectly well under DX12. They don't get the huge boost that AMD is seeing because AMD has been underperforming on DX11.

This isn't going to lead to a situation where the consoles "catch up" or anything like it. They started behind and are staying there.

Spot on. And while I get the enthusiasm of PC AMD card owners and the hope that the drivers will finally stop holding the cards back, I'm amazed by the PS4 owners gloating over this. I mean, I'm sure DirectX 12 will greatly improve the performance of the AMD APU in ... Oh, wait.
 

Arkanius

Member
Spot on. And while I get the enthusiasm of PC AMD card owners and the hope that the drivers will finally stop holding the cards back, I'm amazed by the PS4 owners gloating over this. I mean, I'm sure DirectX 12 will greatly improve the performance of the AMD APU in ... Oh, wait.

You know the PS4 uses an API equivalent to DX12 and Vulkan, right?
 

JeffG

Member
I'm amazed by the PS4 owners gloating over this. I mean, I'm sure DirectX 12 will greatly improve the performance of the AMD APU in ... Oh, wait.
Regardless of what API a machine has, the more developers use async compute, the more it will show up everywhere (on machines that support it).

Async compute is not dependent on DX12.
 
I'm talking about an obvious performance penalty from a single controller managing heterogeneous workloads.
Well, it's not obvious to me, so again, what performance penalty? Do you have any evidence the memory controllers in the APUs are being overloaded? Why would they be, assuming they're designed to handle those requests?

Thanks for enlightening me about shared memory.

Let's act as if it hasn't been massively used on low-end iGPUs and phones for decades, rather than on high-performance hardware.

You can't be taken seriously when you talk about hUMA as an alternative to 68 GB/s for the CPU and 336.5 GB/s for the GPU, instead of what it is: a solution for lower-end products.
What are you even talking about? Nobody said any of that stuff.

The good and evil narrative, so tiresome.
Indeed, but I'm starting to wonder if it's all you have…


That's not how it works. The scheduling is automatic, and more queues are just a resource: a resource that can be completely unnecessary in 90% of cases and give a slight advantage in 10% of cases. I don't know how you can't get that. It's the same as widening the memory bus without changing the bandwidth and expecting faster transfers.
Or putting 128 ROPs into the PS4. It won't help. It's not a simple system; everything depends on everything.
No, that's not really what it's like. Like I said, it gives more flexibility to improve scheduling. Will that benefit every workload? Obviously not, but it will improve some, particularly the larger and/or more complex loads. But like I said, it's just a tool, so the benefit gained will vary from developer to developer and from project to project.

The lack of performance comparisons in any tech paper for more than 1.5 years doesn't put it in a good light either.
We've seen the benefits. Why do you continue to pretend there's no evidence for it? ><

That's not true. hUMA was pushed by many as something that gives devs a tool to achieve things with compute that weren't possible before, or had only been done in a very basic way, like the things I've mentioned.
Sorry, I'm not sure what you're trying to say here. Are you saying that even without hUMA, you can sometimes perform some of the same techniques to a more limited degree? Again, that was never disputed. The argument is that hUMA makes these techniques easier to leverage by reducing their overhead, meaning you can use them in more places, and to greater benefit.

I'm not going to bother quoting more, because throughout this whole discussion you have not provided any real example to support any of your points or to counter mine.
I have, but you continue to pretend I haven't, while ignoring my requests that you explain what it is you're looking for. It's starting to seem that the only thing that will satisfy you is a hen's tooth.

If you want to quote me and engage in discussion with me, first, don't call me stupid; second, provide something to support your point. Something that is based on research.
What? I never called you stupid. =/
 

JeffG

Member
That's not the point. Is that API just launching now?

The PS4 api has been there since day 1.

Developers tend to code to the lowest common denominator (usually due to budget/time constraints)

The easier it is for them to adopt "best practices", the more you will see them implemented.

The API is just the details of the implementation; the first step is for developers to restructure their code to use the new features appropriately.
 

KingSnake

The Birthday Skeleton
The PS4 api has been there since day 1.

Developers tend to code to the lowest common denominator (usually due to budget/time constraints)

The easier it is for them to adopt "best practices", the more you will see them implemented.

The API is just the details of the implementation; the first step is for developers to restructure their code to use the new features appropriately.

There were examples given in this thread of games that already use async compute, so it's not as if devs who wanted to use it didn't. I agree, though, that more games using it might mean more games optimised for consoles too.
But 1st party developers had access to it from the start. So again, there's no reason for gloating; the performance there is already what it is.
 

Xbudz

Member
I hope I didn't make a mistake with my recent GTX 970 purchase.
Seems good so far, but will the AMD cards have the edge in the future?
 
I hope I didn't make a mistake with my recent GTX 970 purchase.
Seems good so far, but will the AMD cards have the edge in the future?

If a game made tons of async compute stuff and completely ignored NV specific stuff, then yes, an AMD card could do better in a game.

That is... assuming no other performance/graphical features favour NV hardware.
 

So in the end, it's all about:

[image: "But if Tali dies you never get to hear of..." meme]
 

dogen

Member
There were examples given in this thread of games that already use async compute, so it's not as if devs who wanted to use it didn't. I agree, though, that more games using it might mean more games optimised for consoles too.
But 1st party developers had access to it from the start. So again, there's no reason for gloating; the performance there is already what it is.

Nope, mostly early PS4 games didn't use it, or barely used it at all.
 

KingSnake

The Birthday Skeleton
As far as I know, yes.

Why would that be true? It's their API. Why wouldn't they be using it? I can't buy that, sorry.

Edit: if the post above is true, what would push the 1st party developers to use it in the future? It seems to be not such an easy win if only Infamous used it.
 

dogen

Member
Why would that be true? It's their API. Why wouldn't they be using it? I can't buy that, sorry.

Edit: if the post above is true, what would push the 1st party developers to use it in the future?

Because it takes time, and lots of those games started development before the specs were finalized.
 

ZOONAMI

Junior Member
Why would that be true? It's their API. Why wouldn't they be using it? I can't buy that, sorry.

Edit: if the post above is true, what would push the 1st party developers to use it in the future? It seems to be not such an easy win if only Infamous used it.

Optimizing games for pc and xbone using dx12 will push the use of async, which will translate to async being used in more ps4 ports as well. As for why 1st party devs haven't been doing it in the past, it's probably because it's still relatively new, and their experience building pc games under dx11 never used async.

So it just isn't something devs are used to, but it will become commonplace as dx12 takes over on pc and xbone.
 

KingSnake

The Birthday Skeleton
Not sure what your point is.

That the excuse of an early start of development doesn't hold up against the examples. We are already a year and a half on from Infamous, and according to your statement most of the games don't use the new revolutionary thing. On top of that, Infamous SS and Battlefield 4 are not exactly top models in terms of graphics and performance. So the question still remains: what will push the developers to use it more?

So it just isn't something devs are used to, but it will become commonplace as dx12 takes over on pc and xbone.

So considering the development cycles, with some luck it will become mainstream in the last year of life of the current consoles.
 

ZOONAMI

Junior Member
That the excuse of an early start of development doesn't hold up against the examples. We are already a year and a half on from Infamous, and according to your statement most of the games don't use the new revolutionary thing. On top of that, Infamous SS and Battlefield 4 are not exactly top models in terms of graphics and performance. So the question still remains: what will push the developers to use it more?

Optimizing games for pc and xbone using dx12 will push the use of async, which will translate to async being used in more ps4 ports as well. As for why 1st party devs haven't been doing it in the past, it's probably because it's still relatively new, and their experience building pc games under dx11 never used async.

So it just isn't something devs are used to, but it will become commonplace as dx12 takes over on pc and xbone.

This.
 

ZOONAMI

Junior Member

What?

As async becomes the industry standard, 1st party ps4 devs will follow suit to achieve more performance. Why is it a bad thing if we haven't seen everything the ps4 can do?

Edit: oh, I didn't see your edit. Yeah, that might be the case. But I think we will see more and more games using it starting next year.
 

dogen

Member
That the excuse of an early start of development doesn't hold up against the examples. We are already a year and a half on from Infamous, and according to your statement most of the games don't use the new revolutionary thing. On top of that, Infamous SS and Battlefield 4 are not exactly top models in terms of graphics and performance. So the question still remains: what will push the developers to use it more?

Well those games probably use it lightly, like ashes does.

I expect the next trials game, for example, to be a good showcase. According to the senior graphics programmer they're getting more than 30% more performance with it.
 
MDolenc at Beyond3D did a small benchmark.

It throws up some interesting figures:

[chart: ac_980ti_vs_fury_x.png]

- As you can see, Maxwell is up to 5 times faster if you don't fill all of its 31 queues; each time you do, you incur a time penalty.

- GCN 1.1 can eat up to 128 chunks without penalty, but it starts at a much slower speed that will slow down the rendering pipeline even at the lightest compute load.

So, at the end of the day, Nvidia will still be faster rendering beautiful graphics with not-too-stressful compute loads, and GCN will be better at less spectacular graphics with massive compute loads.

Faster but narrower vs slower but wider. Fancy sports car vs ugly truck, same story as ever.

tl;dr

Light async load:
Maxwell > GCN

Heavy async load:
GCN > Maxwell

This thing might be useful mainly on the PS4, thanks to it having a shitty CPU and spare ACEs, but not so much on PC (because of more resources) or XOne (because of an already maxed-out GPU).
 

vpance

Member
Infamous only used it for their particle system IIRC. Clearly there's a distinction between using it and USING it.
 

ZOONAMI

Junior Member
MDolenc at Beyond3D did a small benchmark.

It throws up some interesting figures:

[chart: ac_980ti_vs_fury_x.png]

- As you can see, Maxwell is up to 5 times faster if you don't fill all of its 31 queues; each time you do, you incur a time penalty.

- GCN 1.1 can eat up to 128 chunks without penalty, but it starts at a much slower speed that will slow down the rendering pipeline even at the lightest compute load.

So, at the end of the day, Nvidia will still be faster rendering beautiful graphics with not-too-stressful compute loads, and GCN will be better at less spectacular graphics with massive compute loads.

Faster but narrower vs slower but wider. Fancy sports car vs ugly truck, same story as ever.

tl;dr

Light async load:
Maxwell > GCN

Heavy async load:
GCN > Maxwell

This thing might be useful mainly on the PS4, thanks to it having a shitty CPU and spare ACEs, but not so much on PC (because of more resources) or XOne (because of an already maxed-out GPU).

Why does AMD seem to do better at higher resolutions and Nvidia better at 1080p? What are beautiful graphics? GameWorks effects that hit performance so hard you need an SLI setup to run them at high resolutions?
 

Blanquito

Member
MDolenc at Beyond3D did a small benchmark.

It throws up some interesting figures:

[chart: ac_980ti_vs_fury_x.png]

- As you can see, Maxwell is up to 5 times faster if you don't fill all of its 31 queues; each time you do, you incur a time penalty.

- GCN 1.1 can eat up to 128 chunks without penalty, but it starts at a much slower speed that will slow down the rendering pipeline even at the lightest compute load.

So, at the end of the day, Nvidia will still be faster rendering beautiful graphics with not-too-stressful compute loads, and GCN will be better at less spectacular graphics with massive compute loads.

Faster but narrower vs slower but wider. Fancy sports car vs ugly truck, same story as ever.

tl;dr

Light async load:
Maxwell > GCN

Heavy async load:
GCN > Maxwell

This thing might be useful mainly on the PS4, thanks to it having a shitty CPU and spare ACEs, but not so much on PC (because of more resources) or XOne (because of an already maxed-out GPU).

Following the conversation, we may want to wait for an updated version of the benchmark.

[Fake edit] What dogen said. Looks like seppi is providing feedback on what to do and what to look for to get good data out of it. [/Fake edit]
 
But 1st party developers had access to it from the start. So again, there's no reason for gloating; the performance there is already what it is.
Well, not really. Again, this is just a tool, and it takes skill to use a tool well. Skill typically comes through practice, so really, Sony 1st-party just has more practice than anyone else. Dating back to the PS3, actually, because similar job packaging and scheduling was done for the SPUs.

So considering the development cycles, with some luck it will become mainstream in the last year of life of the current consoles.
So, improving throughout this generation and commonplace in the next generation. And that's a problem because …?

On top of that, Infamous SS and Battlefield 4 are not exactly top models in terms of graphics and performance. So the question still remains: what will push the developers to use it more?
Oh, you're just trolling? heh You got me.


Who's that?
 
Disappointing, but not surprising. This is Nvidia efficiently investing in having the best DX11 cards, while also pressuring consumers to upgrade for DX12. It's a win-win for them.
 

tuxfool

Banned
The plot thickens..

So in the end, it's all about:

[image: "But if Tali dies you never get to hear of..." meme]


That post is wrong about Async Timewarp.

Go back a few pages and see the Nvidia presentation on GameworksVR posted by Locuza. They're preempting other operations in the graphics pipe in order to do Async Timewarp; they're not using compute at all for this.
 

DonasaurusRex

Online Ho Champ
Someone needs to break this down for me. What does this mean for DX12 games going forward? Are we going to see a trend of AMD cards performing more efficiently?

No, both cards will do well with DX12, that much is clear; it's just that Maxwell cannot take advantage of the async compute feature in DX12, which it doesn't have. I have no idea if Pascal has the feature, and furthermore DX12 has more performance features than just async. From what I hear Pascal is a monster anyway, so nothing to worry about here, but the good news is GPUs being used to a fuller extent and better performance for customers new and old. The gain by the 290X is wonderful news; that is the bump you're looking for. Great news.
 

Sijil

Member
Look, after reading what the EVGA guy wrote on Reddit and all that back and forth, the only thing I can say is that time will tell. If AMD one-ups Nvidia, then good on them, and I guess that will give Nvidia the incentive to do better.

In the meantime, people who are looking to get rid of their 980ti's, how much?
 
That post is wrong about Async Timewarp.

Go back a few pages and see the Nvidia presentation on GameworksVR posted by Locuza. They're preempting other operations in the graphics pipe in order to do Async Timewarp; they're not using compute at all for this.

Yup, async timewarp != async compute, but the other points in that Reddit post seemed reasonable enough. The games that show the biggest improvements from DX12 thus far are RTS titles with huge numbers of draw calls, which most other genres don't really need. Whether the apparent weakness of NV at 'true' async compute with DX12 shows up in titles over the next 2-3 years is still an open question to my mind.
 

dr_rus

Member
Regardless of what the Oxide guy says, I don't see PC GPU asynchronous compute support in a UE4 doc search (someone correct me plox). The rash conclusion I'm jumping to is that Epic may have put this on the back burner in view of Nvidia's overwhelming market share.

Oh well, if by some off chance Maxwell doesn't implement it, Pascal is more than guaranteed to have it, so it's just another year away.
So I'll have to say it again, it seems: ANY DX12 code supports Asynchronous Compute IF it has at least one compute job queue in addition to the obvious graphics job queue. You don't have to do anything more than have two shaders in your code - one for graphics and one for compute. The API + driver will detect this and will launch them either in parallel or serially depending on the h/w capabilities.
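(To illustrate the point above with a minimal, hypothetical D3D12 sketch, not code from the thread: the application just creates a direct (graphics) queue plus a compute queue on the same device, optionally with a fence for cross-queue synchronization. Whether the two queues actually run concurrently on the GPU is left to the driver and hardware, exactly as described.)

#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#include <cstdio>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    // Create a device on the default adapter.
    ComPtr<ID3D12Device> device;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device)))) {
        std::printf("No D3D12-capable device available.\n");
        return 1;
    }

    // Graphics ("direct") queue: accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Separate compute queue: having this second queue is all DX12 asks of the
    // application to make asynchronous compute *possible*; how the work is
    // actually scheduled (parallel or serial) is decided by the driver/hardware.
    D3D12_COMMAND_QUEUE_DESC compDesc = {};
    compDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&compDesc, IID_PPV_ARGS(&computeQueue));

    // Cross-queue synchronization with a fence: the graphics queue waits on the
    // GPU until the compute queue has signalled. In real code, command lists
    // holding the graphics and compute shaders would be executed on the two
    // queues around these calls.
    ComPtr<ID3D12Fence> fence;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));
    computeQueue->Signal(fence.Get(), 1); // compute side reaches point "1"
    gfxQueue->Wait(fence.Get(), 1);       // graphics side waits for point "1"

    std::printf("Created a direct queue and a compute queue with a shared fence.\n");
    return 0;
}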

There seems to be a lot of misinformation around the topic, partly because a lot of the info comes from the PS4, which has a different API and a completely different h/w profile and optimization guidelines.


Except it's also completely misleading. All execution on modern vec1 SIMDs is done in a serial fashion, so there are no "8 roads with 8 lanes for trucks which can be used to move freely"; rather, there are 8 roads with 8 lanes waiting to be picked from into the execution pipeline. The more you have, the higher the execution efficiency you may achieve.

I've heard before about Maxwell's queue granularity being worse than that of GCN. The question is: is it so bad that it's completely unusable, or are we looking at cases in which GCN will be better and cases where Maxwell will be better? Another question is whether Maxwell even needs async compute to keep its utilization in DX12 at peak.
 

frontieruk

Member
So I'll have to say it again, it seems: ANY DX12 code supports Asynchronous Compute IF it has at least one compute job queue in addition to the obvious graphics job queue. You don't have to do anything more than have two shaders in your code - one for graphics and one for compute. The API + driver will detect this and will launch them either in parallel or serially depending on the h/w capabilities.

There seems to be a lot of misinformation around the topic, partly because a lot of the info comes from the PS4, which has a different API and a completely different h/w profile and optimization guidelines.



Except it's also completely misleading. All execution on modern vec1 SIMDs is done in a serial fashion, so there are no "8 roads with 8 lanes for trucks which can be used to move freely"; rather, there are 8 roads with 8 lanes waiting to be picked from into the execution pipeline. The more you have, the higher the execution efficiency you may achieve.

I've heard before about Maxwell's queue granularity being worse than that of GCN. The question is: is it so bad that it's completely unusable, or are we looking at cases in which GCN will be better and cases where Maxwell will be better? Another question is whether Maxwell even needs async compute to keep its utilization in DX12 at peak.

My take on it ATM: keep your 980ti's. Async is probably going to be used for improved lighting and AA, which aren't going to overload draw calls, so you'll see Nvidia ahead in most games, as AMD will suffer from the latency of getting workloads into the queues, where NV is quicker at low levels of async compute. If only we could get benches from the Fable beta :( to confirm this.
 

Naminator

Banned
Buys 980Ti Friday, this news breaks today....



So it's time to bin it already. So the card will not be able to run anything on DX12 above 30FPS?

Yeah, you and everyone else in this thread who are ready to throw out a 980Ti and replace it with an R9 290 (because apparently now an R9 290 is equivalent to a 980Ti).

Go ahead and do it, you're making the right decision, because all benchmarks show that the 980Ti is just inferior to every AMD GCN 1.2 card ever made.

http://www.extremetech.com/gaming/2...he-singularity-amd-and-nvidia-go-head-to-head
 
My take on it ATM: keep your 980ti's. Async is probably going to be used for improved lighting and AA, which aren't going to overload draw calls, so you'll see Nvidia ahead in most games, as AMD will suffer from the latency of getting workloads into the queues, where NV is quicker at low levels of async compute. If only we could get benches from the Fable beta :( to confirm this.

 
My take on it ATM: keep your 980ti's. Async is probably going to be used for improved lighting and AA, which aren't going to overload draw calls, so you'll see Nvidia ahead in most games, as AMD will suffer from the latency of getting workloads into the queues, where NV is quicker at low levels of async compute. If only we could get benches from the Fable beta :( to confirm this.

Fable beta would definitely be interesting. A shame it is locked behind NDA atm.

BTW, earlier you .gif-posted my mention of Crysis 2 concerning tessellation but failed to respond. The tessellation myth in Crysis 2 was disproved years ago, just so you know, by CryEngine devs themselves, among many others. A shame that silly wccftech report taking pictures and videos in debug mode came out... they just did not understand how CryEngine works...
 

tuxfool

Banned
Fable beta would definitely be interesting. A shame it is locked behind NDA atm.

I don't think so. Fable Legends uses UE4 which seemingly isn't doing anything interesting in this area atm.

It is only interesting as a test of a different dx12 workload.
 