
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

Sijil

Member
Buys 980Ti Friday, this news breaks today....

So is it time to bin it already? Will the card really not be able to run anything on DX12 above 30 FPS?

I don't think that's going to happen anytime soon.

There seems to be a lot of hyperbole surrounding one alpha test of a single game engine. I doubt Nvidia, with their mountain of cash, would rest on their laurels and let AMD have an edge over them.
 

Arkanius

Member
There is this fantastic ELI5 explanation on a Reddit thread

Think of traffic flow moving from A->B.

NV GPUs: Has 1 road, with 1 lane for Cars (Graphics) and 32 lanes for Trucks (Compute).

But it cannot have both Cars and Trucks on the road at the same time. If the road is being used by Cars, Trucks have to wait in queue until all the Cars are cleared, then they can enter. This is the context switch that programmers refer to. It has a performance penalty.

AMD GCN GPUs: Has 1 Road (CP; Command Processor) with 1 lane for Cars & Trucks. Has an EXTRA 8 Roads (ACEs; Asynchronous Compute Engines) with 8 lanes each (64 total) for Trucks only.

So Cars and Trucks can move freely, at the same time, towards their destination, in parallel, asynchronously: Trucks through the ACEs, Cars through the CP. There is no context switch required.

NV's design is good for DX11, because DX11 can ONLY use 1 Road, period. GCN's ACEs are doing nothing in DX11, the extra roads are inaccessible/closed. DX12 opens all the roads.

https://www.reddit.com/r/pcgaming/comments/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/
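If you want it in API terms rather than car terms: here's a minimal D3D12 sketch (my own illustration, not from the Reddit post) of opening the two kinds of "road". It assumes the device has already been created and skips error handling; whether work on the compute queue actually overlaps with graphics is up to the hardware and driver, which is exactly what this thread is arguing about.

#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Assumes `device` was created earlier with D3D12CreateDevice.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& gfxQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue)
{
    // The "main road": a direct queue accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // An extra "road" for Trucks only: a compute-only queue.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));
}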
 
I don't think that's going to happen anytime soon.

There seems to be a lot of hyperbole surrounding one alpha test of a single game engine. I doubt Nvidia, with their mountain of cash, would rest on their laurels and let AMD have an edge over them.

Yep, they'll just GameWorks every popular DX12 game with their blood money
 
Err, the AMD APUs use GDDR5 as system RAM.

So? That doesn't change the fact that both the GPU and CPU have to battle over a single memory controller for memory access. That alone will leave real-world memory bandwidth far short of the theoretical figures, to say nothing of how queuing will hurt CPU performance every time you need to go to memory for data that isn't in the cache.

Also, physically, memory chips take time to be ready to write or read, losing several cycles each time they need to change state. So shared memory has its own advantages, but a discrete buffer will always be better for raw performance.

It's not as if the industry was dumb enough to overlook this until Sony invented shared memory for a cheap, decently performing console.



I believe that graph doesn't tell the full story. I mean, when you're not VRAM-limited on the GPU everything should be fine, as everything sits in the GPU's high-bandwidth local memory. However, when you run out of memory, you should see the slower PCIe bus become the bottleneck.
In that case you'll also see stuttering, as PCIe bandwidth is far lower than GDDR5/HBM bandwidth.

Well, if you run out of local RAM you will have to page, and that alone will hurt performance just from the interface overhead. You shouldn't be using PCIe like that to start with.

[chart: relative performance across PCIe bandwidth configurations (perfrel.gif)]


With single top-end card setups, the difference between PCIe 2.0 x8 (4 GB/s) and PCIe 3.0 x16 (~15.8 GB/s) is barely 5%, most of it coming from improvements to interface efficiency, which cut encoding overhead from 20% to a measly ~2%. Sure, some big SLI/CrossFire configurations might show larger gaps in some benchmarks, since inter-GPU traffic drives bandwidth usage up wildly, but those are extreme edge cases.
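For reference, here's the back-of-the-envelope math behind those figures (my numbers, not benchmarked): PCIe 2.0 runs at 5 GT/s per lane with 8b/10b encoding (the ~20% overhead), while PCIe 3.0 runs at 8 GT/s per lane with 128b/130b encoding (~1.5% overhead).

#include <cstdio>

int main() {
    // PCIe 2.0: 5 GT/s per lane, 8b/10b line coding -> 80% efficiency, /8 bits per byte
    double pcie2_x8  = 8  * 5.0 * (8.0 / 10.0)    / 8.0;
    // PCIe 3.0: 8 GT/s per lane, 128b/130b coding -> ~98.5% efficiency
    double pcie3_x16 = 16 * 8.0 * (128.0 / 130.0) / 8.0;
    std::printf("PCIe 2.0 x8  ~ %.2f GB/s\n", pcie2_x8);   // ~4.00
    std::printf("PCIe 3.0 x16 ~ %.2f GB/s\n", pcie3_x16);  // ~15.75
}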
 
Yes, I want a real-world example of it having a meaningful performance advantage in rendering in games.
I don't understand why the advantage isn't immediately apparent to you. You know the thing that shows you what the next block is in Tetris? It helps you plan ahead and fill the space more efficiently, right? Now imagine you can peek eight blocks ahead, and choose any block you like to use next. That would probably make the game a lot easier, right? How exactly would you go about proving that? It's practically a different game now. It's not possible to play the old game in the new way, and if you play the new game but restrict yourself to the old rules, obviously you won't see any improvement. Hell, maybe you're dumb and the extra information and options just serve to confuse you.

So how are we to "prove" that having more queues is better? If a game is designed to use 16 queues managed by two schedulers, then no, running it on a 64-queue system with eight schedulers won't make much difference. If the game is designed to use 17 or more queues, then it won't run on the 16-queue system at all. Does that mean it's not a valid test, or does it prove the point?
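For what it's worth, here is roughly what "using more queues" looks like from the application side in D3D12 (my sketch, not from any of the tests being discussed; it assumes the queues, command lists and fence already exist, and skips error handling). The API only lets you hand independent work to independent queues; whether the GPU actually overlaps them is the hardware's and driver's business.

#include <d3d12.h>

// Assumes gfxQueue/computeQueue were created as DIRECT and COMPUTE queues,
// gfxList/computeList are fully recorded command lists, and `fence` was created
// with device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence)).
void SubmitFrame(ID3D12CommandQueue* gfxQueue, ID3D12CommandQueue* computeQueue,
                 ID3D12CommandList* gfxList, ID3D12CommandList* computeList,
                 ID3D12Fence* fence, UINT64& fenceValue)
{
    // Kick off compute work on its own queue and mark its completion with a fence.
    computeQueue->ExecuteCommandLists(1, &computeList);
    computeQueue->Signal(fence, ++fenceValue);

    // Graphics work that does not depend on those compute results can run alongside it.
    gfxQueue->ExecuteCommandLists(1, &gfxList);

    // Gate any *later* graphics submissions that consume the compute output.
    gfxQueue->Wait(fence, fenceValue);
}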

It shows that You don't need hUMA to have proper CPU and GPU sync in gaming related scenarios.
No one ever claimed it was required. The claim is that it increases the cases where you're able to leverage GPGPU because it eliminates the copying. Your argument is a strawman.

Yes, I'm saying that it's not necessary for gaming. At least it hasn't been proven necessary yet by any dev.
You understand this isn't like a clock bump that has some repeatable and easily measured effect, right? Async compute is a new and better tool for developers that allows them to make new and better games. But yes, like any tool, it is what the craftsman makes of it, and it doesn't help you much if you leave it rusting in the corner of your garage. People are using the technique, and they're both improving utilization and trimming execution time with it. What more do you want?

Sure, they don't, they are not supercharged enough.
Actually, it may be a wiring issue. Chafing can cause that sometimes.


It's interesting to see AMD's architecture work well with DX12, but that doesn't make the years of relatively terrible DX11 performance pay off.
Yes, let's criticize AMD because it took years for anyone to bother using their very useful tech. =/
 

bj00rn_

Banned
So, async isn't really important because we're gonna keep ignoring it a while longer?

Where the fuck did that come from? Why did you deliberately try to put words into my mouth? Weird.

The context of this thread is Maxwell 2 and its lack of async support in one early test from what I understood. Generally PC games' async future is somewhat dependent on DX12 uptake speed. Are there any games supporting it out on the market? So is DX12 practically relevant for consumers right now? Yes or no?
 

aeolist

Banned
Yes, let's criticize AMD because it took years for anyone to bother using their very useful tech. =/

no, let's criticize AMD for having shitty DX10/11 performance in their drivers and focusing almost entirely on features that nobody was even able to utilize. it's good that mantle helped drive the industry toward the low-level API future we're slowly realizing but it's pretty much the only thing they've done well that i can remember from the last few years. in practical, real-world terms they've been vastly inferior to the CPU and GPU competition in almost every way.
 
It's like I've travelled back in time to the Riva TNT debates ('who even uses 32-bit colour?'): we have one data point and the debate has somehow spun off into AMD/Nvidia willy-waving (with consoles as a side dish, da fuq?). Right now we have one dev who is using async compute reporting anomalies in how NV handles it, which has led them to conclude it actually works in serial rather than in parallel. This could be a driver fault, it could be silicon; given how aggressive NV's PR is, I can't imagine they'll let this lie.

If all is as reported, your 970/980/Titan is still an amazing card; you may suffer a performance deficit vis-à-vis GCN if the title you want to play uses async the way AoS does. Given that you'll probably have swapped to a newer NV card by the time the AAA studios get around to rewriting their stuff for DX12, I wouldn't sweat it.

If this is confirmed via multiple other tests, then we can say NV have sacrificed the future for the present (and vice versa for AMD); until then this is just an interesting data point.
 

Alej

Banned
Where the fuck did that come from? Why did you deliberately try to put words into my mouth? Weird.

The context of this thread is Maxwell 2 and its lack of async support in one early test from what I understood. Generally PC games' async future is somewhat dependent on DX12 support. So is DX12 practically relevant for consumers right now? Yes or no?

You have to remember that consoles have AMD APUs and don't need to wait for DX12 to be relevant anywhere.
Devs can take advantage of this right now. Will they?

But what if asynchronous compute isn't that groundbreaking? Then you would be right to say it's not really important. Maybe you should argue about that, and not about when DX12 will be relevant (will it?).
 

Arkanius

Member
It's like I've travelled back in time to the Riva TNT debates ('who even uses 32-bit colour?'): we have one data point and the debate has somehow spun off into AMD/Nvidia willy-waving (with consoles as a side dish, da fuq?). Right now we have one dev who is using async compute reporting anomalies in how NV handles it, which has led them to conclude it actually works in serial rather than in parallel. This could be a driver fault, it could be silicon; given how aggressive NV's PR is, I can't imagine they'll let this lie.

If all is as reported, your 970/980/Titan is still an amazing card; you may suffer a performance deficit vis-à-vis GCN if the title you want to play uses async the way AoS does. Given that you'll probably have swapped to a newer NV card by the time the AAA studios get around to rewriting their stuff for DX12, I wouldn't sweat it.

If this is confirmed via multiple other tests, then we can say NV have sacrificed the future for the present (and vice versa for AMD); until then this is just an interesting data point.

It just sucks for people who upgraded to a 980 Ti expecting it to last a few years into the future (since it was marketed as DX12-compatible, with a higher DX feature level than AMD).

I'm waiting for Pascal myself, but I'm afraid Pascal might have the same uArch design as Maxwell but with HBM.
I would love for Nvidia to start investing in a single uArch instead of designing a new one every two years. AMD's GCN investment is starting to pay off: whenever one GCN card gains something, 2-3 years' worth of cards reap the benefits.

But let's be real: the 980 Ti destroys everything right now. From next year on, with DX12 games coming out, that might not be the case anymore.
 
no, let's criticize AMD for having shitty DX10/11 performance in their drivers and focusing almost entirely on features that nobody was even able to utilize. it's good that mantle helped drive the industry toward the low-level API future we're slowly realizing but it's pretty much the only thing they've done well that i can remember from the last few years. in practical, real-world terms they've been vastly inferior to the CPU and GPU competition in almost every way.
What? AMD has always been competitive with nVidia. Sure, there have been performance differences, but most of the time all cards were priced competitively. Let's not blow this out of proportion and call every AMD card from the 5870 onward shitty.
 

kinggroin

Banned
I mean, do we absolutely KNOW this is a silicon limitation on NV's side? Or are current drivers written to better facilitate DX11 rendering and compute techniques, so that parts of the silicon are utilized incorrectly when running DX12?
 

Crisium

Member
Some users at Beyond3D have tested the Async performance of both the 980 Ti and the 7870XT

980 Ti

[image: 980 Ti async compute benchmark results (pJqBBDS.png)]


7870XT

[image: 7870 XT async compute benchmark results (nu2NDtM.png)]


The 980 Ti can't run both tasks in parallel; they are run sequentially.
Apparently this is a huge blow for VR performance (I'm not much into VR yet). It seems Nvidia has latency issues that can cause nausea, due to having to run both tasks in sequence.
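As I understand it, the Beyond3D test boils down to a timing comparison: run the graphics load alone, the compute load alone, then both together. A rough sketch of that measurement logic (my reconstruction, not the actual Beyond3D source; the three workload callables are assumed to submit the work and block until the GPU finishes):

#include <chrono>

template <typename F>
double TimeMs(F&& work) {
    auto t0 = std::chrono::steady_clock::now();
    work();
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

template <typename G, typename C, typename B>
bool LooksAsynchronous(G&& runGraphicsOnly, C&& runComputeOnly, B&& runBoth) {
    double g    = TimeMs(runGraphicsOnly);
    double c    = TimeMs(runComputeOnly);
    double both = TimeMs(runBoth);
    // Parallel execution: combined time sits near max(g, c).
    // Serialized execution: combined time sits near g + c, which is what the 980 Ti plot shows.
    return both < 0.8 * (g + c);
}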

I'd like to hear more about this. I have heard before that AMD is preferred for VR. But either a) this is suppressed knowledge or b) it is bogus information. Here on GAF, there are many Nvidia PCs built with VR in mind and no one ever says anything.

See the posts by zlatan:
http://forums.anandtech.com/showthread.php?t=2437630

Both SDKs can work with any API. Most VR games will be based on DX12 and Vulkan. GameWorks VR works with API-specific extensions, while LiquidVR works with a VR-optimized Mantle version.

What's unique in LiquidVR is the latest-data-latch feature and the timewarp solution.
A latest data latch allows more efficient head tracking, which is not possible with standard APIs. That's why they use Mantle.
The timewarp is also different in LiquidVR. GameWorks VR works with draw-level preemption, which is very inefficient with long draws, because it will delay the context switch. AMD uses a compute-based pipeline, which is more efficient, and there is an extension for the Radeon R9 285/380/Fury, because these GPUs support fine-grained preemption. This is the ultimate solution for VR, and Nvidia told me they will switch to fine-grained preemption when their new architecture is ready.

For the best possible VR experience you should buy a GPU with fine-grained preemption support.
 
I mean, do we absolutely KNOW this is a silicon limitation on NV's side? Or are current drivers written to better facilitate DX11 rendering and compute techniques, so that parts of the silicon are utilized incorrectly when running DX12?

NV has not said a thing and the dev seems uncertain. So... who knows.
 
I don't think that's going to happen anytime soon.

There seems to be a lot of hyperbole surrounding one alpha test of a single game engine. I doubt Nvidia, with their mountain of cash, would rest on their laurels and let AMD have an edge over them.

Not to mention that the 980Ti and its relatives are powerful enough to blow through games even without these benefits. Also, I'm sure it'll be a long time before DX12 games stop offering DX11 in their settings (hell, even GTA 5 still supports DX10), so it's not like there's a huge problem here even IF it's true.
 

dogen

Member
You have to remember that consoles have AMD APUs and don't need to wait for DX12 to be relevant anywhere.
Devs can take advantage of this right now. Will they?

But what if asynchronous compute isn't that groundbreaking? Then you would be right to say it's not really important. Maybe you should argue about that, and not about when DX12 will be relevant (will it?).

Sebbbi says it gives them huge gains.
https://forum.beyond3d.com/posts/1866842/
https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-7#post-1868894
 

Bastardo

Member
NV has not said a thing and the dev seems uncertain. So... who knows.

Given the investments AMD has made in HSA, it is likely that their current cards implement async compute in a better way.
Having said that, it will be a generation or two before this feature is required for stable framerates, so don't worry, everybody: your top-of-the-line cards will play current games fine.
 

Man

Member
There is this fantastic ELI5 explanation on a Reddit thread

Think of traffic flow moving from A->B.

NV GPUs: Has 1 road, with 1 lane for Cars (Graphics) and 32 lanes for Trucks (Compute).

But it cannot have both Cars and Trucks on the road at the same time. If the road is being used by Cars, Trucks have to wait in queue until all the Cars are cleared, then they can enter. This is the context switch that programmers refer to. It has a performance penalty.

AMD GCN GPUs: Has 1 Road (CP; Command Processor) with 1 lane for Cars & Trucks. Has an EXTRA 8 Roads (ACEs; Asynchronous Compute Engines) with 8 lanes each (64 total) for Trucks only.

So Cars and Trucks can move freely, at the same time, towards their destination, in parallel, asynchronously: Trucks through the ACEs, Cars through the CP. There is no context switch required.

NV's design is good for DX11, because DX11 can ONLY use 1 Road, period. GCN's ACEs are doing nothing in DX11, the extra roads are inaccessible/closed. DX12 opens all the roads.

https://www.reddit.com/r/pcgaming/comments/3j1916/get_your_popcorn_ready_nv_gpus_do_not_support/
Nice.
 
That was a pretty good explanation.


So? That doesn't change the fact that both the GPU and CPU have to battle over a single memory controller for memory access.
"Battle"? You've some evidence the memory controller can't handle the traffic from both chips? Seems like a lot of overhead is going to be reduced just by eliminating all the copy commands.

That alone will leave real-world memory bandwidth far short of the theoretical figures, to say nothing of how queuing will hurt CPU performance every time you need to go to memory for data that isn't in the cache.
Sorry, what does this mean?

Also, physically, memory chips take time to be ready to write or read, losing several cycles each time they need to change state. So shared memory has its own advantages, but a discrete buffer will always be better for raw performance.
But again, shared memory will reduce the need for reads and writes, because it eliminates the shuffling.
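If it helps make the "no shuffling" point concrete, here's a hedged D3D12 illustration (mine, not from this thread) of how an engine can detect whether the extra staging copy is needed at all. On a cache-coherent unified-memory part, like the console APUs being discussed, the GPU can read the same allocation the CPU wrote; on a discrete card you would still record a copy into a DEFAULT-heap resource.

#include <d3d12.h>

// Assumes `device` exists; error handling omitted.
bool NeedsStagingCopy(ID3D12Device* device)
{
    D3D12_FEATURE_DATA_ARCHITECTURE arch = {};
    device->CheckFeatureSupport(D3D12_FEATURE_ARCHITECTURE, &arch, sizeof(arch));

    // CacheCoherentUMA: CPU writes land in memory the GPU reads efficiently,
    // so the upload-heap -> default-heap copy can be skipped.
    return !arch.CacheCoherentUMA;
}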

It's not as if the industry was dumb enough to overlook this until Sony invented shared memory for a cheap, decently performing console.
Err, shared memory has been around for a while, and using it like this is sorta AMD's thing. So, you're just opposed to this because you hate Sony? =/


Where the fuck did that come from? Why did you deliberately try to put words into my mouth? Weird.
Sorry, I was just making a joke, because it seemed like you were misplacing the blame a bit.

The context of this thread is Maxwell 2 and async from what I understood. Support in PC games is somewhat dependent on DX12 support. Is DX12 practically relevant for consumers right now? Yes or no?
I'm gathering that it isn't. What I don't understand is why AMD are to blame for that. People ignoring a technology doesn't serve as proof that it sucks. If anything, I would say that Mantle and DX12 serve as proof more people should've been paying attention to AMD for a while now.


no, let's criticize AMD for having shitty DX10/11 performance in their drivers and focusing almost entirely on features that nobody was even able to utilize. it's good that mantle helped drive the industry toward the low-level API future we're slowly realizing but it's pretty much the only thing they've done well that i can remember from the last few years. in practical, real-world terms they've been vastly inferior to the CPU and GPU competition in almost every way.
I don't follow PC stuff, but it seems like AMD created an advantageous tech, and it was never leveraged in DirectX, which is what everybody uses for everything. So eventually they came up with their own substitute, at which point MS said, "Oh, yeah, we do that too."

What should AMD have done differently? Go it alone from day one instead of hoping/expecting support for their new tech in DX? I realize DX12 is new, but GCN launched in 2011. Four years later, the rest of the industry finally starts to catch up and they're laughing at the "slackers" who wasted their time inventing stuff? =/
 

Herne

Member
No, the reason is that they are dirt cheap and desperate for contracts. That is also why AMD is bleeding money despite the PS4 selling well.

More likely Microsoft and Sony both worked with nVidia in the past and both companies fell out with them. Also, Nintendo has been working with AMD since they bought ArtX, and for reasons of compatibility - as well as a long-term relationship in good standing - they have no reason to switch to nVidia. And there are AMD's APU offerings, of course.
 

Azih

Member
It does make sense that AMD bet heavily on asynchronous compute and are only now reaping the benefits. Heck, looking at it historically, it seems like they created and pushed Mantle as a way of encouraging developers to use coding techniques that would show their GPUs in a better light. It didn't really gain any traction, but Microsoft's DX12 has certainly put the whole thing into the spotlight.
 

diffusionx

Gold Member
I was planning to build a new PC this winter, with a 980 Ti. I had to push the build back until spring for reasons, but after reading this I think it's the right decision. Actually, after the 970 deception, it'd be nice to give Nvidia the finger next time around.

Obviously it's not like all games will go DX12 tomorrow, but I think it will be adopted much, much faster than DX10/11 were. I am sure Nvidia will respond, but it won't be until their next round of GPUs at the earliest.
 

laxu

Member
Err…

Right, because PCs are even more powerful than physics. ><

My comment was simply about consoles lacking horsepower compared to PCs with their much faster CPU+GPU combos. Those are bound to become a limitation in a game engine sooner than having separate RAM and VRAM on PC will (which was the argument in the post I replied to).

I am not saying that unified memory architecture is in some way worse; it would be nice to see it on PCs, but that's not going to happen anytime soon.
 

Zane

Member
So, no one has entertained the very distinct possibility that this one developer was either using a poor implementation of Async Compute or encountered a driver bug? Seems far more likely than Maxwell 2 not supporting the feature it claims to.
 

bj00rn_

Banned
Sorry, I was just making a joke, because it seemed like you were misplacing the blame a bit.

Ah crap, no problem. I'm really tired and perhaps in a bad mood, so I just completely missed your joke :)

I'm gathering that it isn't. What I don't understand is why AMD are to blame for that. People ignoring a technology doesn't serve as proof that it sucks. If anything, I would say that Mantle and DX12 serve as proof more people should've been paying attention to AMD for a while now.

Why not? Perhaps even their TDPs would be healthier if so...
 

tuxfool

Banned
So, no one has entertained the very distinct possibility that this one developer was either using a poor implementation of Async Compute or encountered a driver bug? Seems far more likely than Maxwell 2 not supporting the feature it claims to.

Yes, we did, back on the first page...
 

Arkanius

Member
So, no one has entertained the very distinct possibility that this one developer was either using a poor implementation of Async Compute or encountered a driver bug? Seems far more likely than Maxwell 2 not supporting the feature it claims to.

I doubt the developer is lying about trading emails with Nvidia, with Nvidia telling them to disable the feature altogether since it introduces delays on their uArch.

Nvidia's silence on this matter is also pretty telling, to be honest. The last time Nvidia went quiet like this was when the 3.5 GB debate came up.
 

frontieruk

Member
So, no one has entertained the very distinct possibility that this one developer was either using a poor implementation of Async Compute or encountered a driver bug? Seems far more likely than Maxwell 2 not supporting the feature it claims to.

Could be the driver, but in previous articles the dev has gone on record saying their async code passed AMD's, Microsoft's and Nvidia's DX12 compilation tests, so it's not the code. It could just be leveraging AMD's strengths, since it started as a Mantle test case, like the Star Swarm demo.
 

dogen

Member
So, no one has entertained the very distinct possibility that this one developer was either using a poor implementation of Async Compute or encountered a driver bug? Seems far more likely than Maxwell 2 not supporting the feature it claims to.

Have you seen the results from the benchmark made by a dev on beyond3d?
 

mephixto

Banned
I just bought 10 sacks of corn.

Let's see how Nvidia will respond to this.

"We are focusing our efforts in bring you the best possible experience for the 99.999999999% of the games in the market also we are making progress to bring that support for the 0.00000001% games with DX12 and future releases.Thank you."
 

mephixto

Banned
Why do people still bring up the PS4 and X1? DX12 was in diapers when the GPUs of both consoles were already designed and complete. If you ever get performance gains from this, it's gonna be 1-3 fps at most.
 

dogen

Member
Why do people still bring up the PS4 and X1? DX12 was in diapers when the GPUs of both consoles were already designed and complete. If you ever get performance gains from this, it's gonna be 1-3 fps at most.

Nah, some console games in development are already seeing big gains.
 
I'm far from being a tech-savvy guy, but this is not the first time we've heard that DX12 boosts AMD GPUs' performance while not doing much for Nvidia's.

Not trying to sound like a fanboy here; I love every single NV card I've owned.

As I said, I believe this is not the first time this has been reported.
 
"Battle"? You've some evidence the memory controller can't handle the traffic from both chips? Seems like a lot of overhead is going to be reduced just by eliminating all the copy commands.

I'm talking about an obvious performance penalty from a single controller managing heterogeneous workloads.

Err, shared memory has been around for a while, and using it like this is sorta AMD's thing. So, you're just opposed to this because you hate Sony? =/

Thanks for enlightening me about shared memory.

Let's act as if it hasn't been used massively for decades on low-end iGPUs and phones, rather than on high-performance stuff.

You can't be taken seriously when you talk about hUMA as an alternative to 68 GB/s for the CPU and 336.5 GB/s for the GPU instead of what it is: a solution for lower-end products.

I'm gathering that it isn't. What I don't understand is why AMD are to blame for that. People ignoring a technology doesn't serve as proof that it sucks. If anything, I would say that Mantle and DX12 serve as proof more people should've been paying attention to AMD for a while now.

All AMD did was take the research that was already being done for DX12, release a proof-of-concept driver too buggy and lackluster to be used as an everyday driver, and then stop all investment in it, because it had no future from the beginning, which to an extent they knew when they released all their PR bullshit about it.


I don't follow PC stuff, but it seems like AMD created an advantageous tech, and it was never leveraged in DirectX, which is what everybody uses for everything. So eventually they came up with their own substitute, at which point MS said, "Oh, yeah, we do that too."

MS introduced DX12 later as a solid, usable API for a wide spectrum of hardware; that doesn't mean they modelled it on Mantle. Not at all.


What should AMD have done differently? Go it alone from day one instead of hoping/expecting support for their new tech in DX? I realize DX12 is new, but GCN launched in 2011. Four years later, the rest of the industry finally starts to catch up and they're laughing at the "slackers" who wasted their time inventing stuff? =/

Right, because no one else invents useful stuff, and every advantage you get from using Intel or Nvidia hardware is just bought with money, nothing else.

The good and evil narrative, so tiresome.
 

KKRT00

Member
So how are we to "prove" that having more queues is better? If a game is designed to use 16 queues managed by two schedulers, then no, running it on a 64-queue system with eight schedulers won't make much difference. If the game is designed to use 17 or more queues, then it won't run on the 16-queue system at all. Does that mean it's not a valid test, or does it prove the point?
That's not how it works. Scheduling is automatic, and more queues is just a resource. A resource that can be completely unnecessary in 90% of cases and give a slight advantage in 10% of cases. I don't know how You can't get that. It's the same as if You increased the width of the memory bus without changing the bandwidth and expected faster transfers.
Or put 128 ROPs into the PS4. It won't help. It's not a simple system; everything depends on everything else.
The lack of performance comparisons in any tech paper for more than 1.5 years doesn't put it in a good light either.


No one ever claimed it was required. The claim is that it increases the cases where you're able to leverage GPGPU because it eliminates the copying. Your argument is a strawman.
That's not true. hUMA was pushed by many as something that gives devs a tool to achieve things with compute that weren't possible before, or had only been done in a very basic way, like the things I've mentioned.

I'm not going to bother quoting more, because throughout this whole discussion You have not provided any real example for any of Your points, or anything that would counter mine.
The only thing You can do is talk about generic theoretical advantages that were claimed by AMD and discussed here and on Beyond3D at length 1.5 years ago.

If You want to quote me and engage in discussion with me, first, don't call me stupid; second, provide something to support Your point. Something that is based on research.
Because right now the discussion is similar to the 'coding to the metal' theory.
 