
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

What the fuck am I reading? What is this benchmark stress testing exactly?

The speed at which Nvidia Maxwell/Maxwell 2 GPUs and AMD GCN GPUs complete graphics and compute tasks, which can effectively reveal whether each architecture can handle both tasks asynchronously for better performance.
 
What the fuck am I reading? What is this benchmark stress testing exactly?
Basically, it's measuring/detecting the systems' ability to do what Cerny referred to as "fine-grained compute," which I explained here. AMD_Robert is referring to the bars in this visualizer.

Since the idea behind fine-grained compute is getting extra work done "for free" by hanging compute jobs onto CUs left idle by the normal rendering process, we should see a lot of overlap when running rendering and compute jobs simultaneously. The AMD cards display significant overlap, but the NV cards display little or no overlap, and running the compute and rendering jobs "concurrently" can often actually be slower than running them consecutively. So it seems that not only are NV not doing fine-grained compute at all, the need to actively flip between job types is actually penalizing them.

As sort of a side note, Cerny draws a distinction between being asynchronous and being fine grained, and perhaps NV are drawing the same distinction. In programming terms, asynchronous means "not synchronized," which basically means that the parent process isn't tied up while waiting for results from the child. As long as a call to the GPU doesn't block the CPU, NV can technically claim their solution is asynchronous, even if it isn't fine grained.
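
(For the curious, here's a minimal sketch of what "async compute" looks like from the application side in D3D12. This is my own illustration, not code from the benchmark: the app requests async compute by creating a second, compute-only command queue alongside the normal direct/graphics queue. Whether work on the two queues actually overlaps on the GPU is entirely up to the hardware and driver, which is exactly what this tool is probing.)

Code:
// Minimal D3D12 sketch (illustration only, not the benchmark's code):
// create a graphics (direct) queue plus a separate compute queue.
#include <windows.h>
#include <d3d12.h>
#include <wrl/client.h>
#pragma comment(lib, "d3d12.lib")

using Microsoft::WRL::ComPtr;

int main()
{
    ComPtr<ID3D12Device> device;
    // Default adapter, minimum feature level 11_0.
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_11_0,
                                 IID_PPV_ARGS(&device))))
        return 1;

    // Direct queue: accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ComPtr<ID3D12CommandQueue> gfxQueue;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Separate compute queue: compute and copy only. Work submitted here
    // *may* run concurrently with the direct queue if the GPU supports it.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ComPtr<ID3D12CommandQueue> computeQueue;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // Recorded command lists would then be submitted with
    // ExecuteCommandLists() on each queue, with ID3D12Fence objects used to
    // track completion without blocking the CPU (the "asynchronous" part).
    return 0;
}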
 
Another reading is that GCN async works like hyperthreading, injecting compute tasks directly into the idling stages of the rendering pipeline. GCN is way less efficient than Maxwell for rendering purposes, with more dumb stream processors than Nvidia cards. This way you can reduce the massive dark silicon on GCN cards, unused resources that aren't there to start with on Nvidia, since the Maxwell rendering pipeline is already maxed out.

Again, good for consoles with crappy CPUs, but too much hype for a feature that isn't even on the main DX12 feature set.
 

Vinland

Banned
Another reading is that GCN async works like hyperthreading, injecting compute tasks directly into the idling stages of the rendering pipeline. GCN is way less efficient than Maxwell for rendering purposes, with more dumb stream processors than Nvidia cards. This way you can reduce the massive dark silicon on GCN cards, unused resources that aren't there to start with on Nvidia, since the Maxwell rendering pipeline is already maxed out.

Again, good for consoles with crappy CPUs, but too much hype for a feature that isn't even on the main DX12 feature set.

Another reading is that GCN works by removing the handicap of a platform that is designed to execute a variety of workloads and maximize efficiency of highly parallelized tasks. GCN is more generalized, so the graphics pipeline can be repurposed for different types of work, not just pixel generation, at the same time. Nvidia has optimized their architecture for power-gating efficiency and DX11 performance, for gaming or scientific computing but not both at the same time, because that's where the market demand was.

This new execution model plays out well for consoles, where economic viability is the primary concern for a mass-produced SKU, and it also works out well for those wanting to use DX12 for more than gaming. Time will tell if PC game developers will need to adopt this paradigm shift.

see how easy it is to be positive and get a point across ;)
 
Another reading is that GCN async works like hyperthreading, injecting compute tasks directly into the idling stages of the rendering pipeline. GCN is way less efficient than Maxwell for rendering purposes, with more dumb stream processors than Nvidia cards. This way you can reduce the massive dark silicon on GCN cards, unused resources that aren't there to start with on Nvidia, since the Maxwell rendering pipeline is already maxed out.
That's a funny way of saying this helps the GCN cards to leverage their additional processing capabilities. So you're saying that even if fine-grained compute was working on the NV cards, it wouldn't help very much? :p

Again, good for consoles with crappy CPUs, but too much hype for a feature that isn't even on the main DX12 feature set.
Geez. You seem kinda salty. This is good for all CPUs.
 

mulac

Member
Honestly is this really a big deal? All nVidia need to do in the next round of 'x' cards is add this architecture in, yeah?

My 980GTX and 970GTX's on separate systems run beautifully on my current setup and will be future-proofed for at least the next 2 years of gaming...

Big issue over nothing or am I missing something?
 

tuxfool

Banned
Honestly is this really a big deal? All nVidia need to do in the next round of 'x' cards is add this architecture in, yeah?

My 980GTX and 970GTX's on separate systems run beautifully on my current setup and will be future-proofed for at least the next 2 years of gaming...

Big issue over nothing or am I missing something?

Well...Given that the Pascal GPUs are already being taped out (or are in planning), such a drastic change won't happen (if it isn't in Pascal) until Pascal+1.
 
Honestly is this really a big deal? All nVidia need to do in the next round of 'x' cards is add this architecture in, yeah?

My 980GTX and 970GTX's on separate systems run beautifully on my current setup and will be future-proofed for at least the next 2 years of gaming...

Big issue over nothing or am I missing something?

It's a huge deal. It requires several parts of the architecture to be completely redesigned from the ground up. This is highly unlikely to come with Pascal. If the 980 Ti drops to 290X levels of performance in an array of DX12 games, it's going to be chaos. AMD's architecture seems to be much more forward-looking.
 
This has gotten so damn confusing. Nvidia isn't supposed to do Async Compute, and yet the results are showing that Nvidia, even without utilizing this feature, still manages to somehow get the work done in what appears to be a significantly faster time than the AMD cards that are taking advantage of the feature. So what gives?

Is it because this test is possibly so specific and limited in scope that it's incapable of showcasing why the Nvidia GPU apparently not being able to do Async Compute (if that's truly the case, because I honestly don't know at this point) is meant to be a disadvantage? Or, perhaps more specifically, it's only a problem in games that are coded to take advantage of what appears to be an AMD-hardware-only feature at present. I guess the best way to look at this thing is that it isn't meant to showcase the benefits of Async Compute, but is instead designed to test whether or not the feature is actually present in the hardware. The ultimate point of the test isn't a race to see which GPU finishes first. Is anyone else getting that reading from these results?

Or perhaps the Nvidia results are actually meaningful, and this would mean that as long as developers take the Nvidia GPUs' specific needs into consideration, asynchronous compute's absence may not be such a bad thing after all? But of course this is all assuming that the situation isn't a far greater concern for Nvidia when running something a lot more complex and demanding, such as a full-fledged DX12 gaming application. Maybe it's under such conditions that Nvidia GPUs won't be able to entirely get away with not having the feature. So at the end of the day we still need to see more tests and more games to know what any of this means.
 

Macrotus

Member
It means that it is not working like it should, yes.
But it also seems like that's more of a driver problem.

---

I really doubt it, it more looks like something is fundamentally broken, probably with drivers.

Thx!
I guess I'll stay with Windows 8.1 for now.
 

Irobot82

Member
It's a huge deal. It requires several parts of the architecture to be completely redesigned from the ground up. This is highly unlikely to come with Pascal. If the 980 Ti drops to 290X levels of performance in an array of DX12 games, it's going to be chaos. AMD's architecture seems to be much more forward-looking.

I remember seeing slides saying Pascal was going to be compute-heavy. I would assume that also means async.
 
For anyone that bought a 980 Ti, you're playing in high-end territory now. You should have known you were gonna be obsolete really quickly. That's just how it works. FWIW, my last two cards were a 780 Ti and currently a 980 Ti. I pour hundreds of dollars down the drain because it's fun.

I can play MGSV: The Phantom Pain pretty painlessly in 4K, I know what I was getting into when I got my 980 Ti and I have no regrets.

If Pascal comes out next year and powerbombs my 980 Ti into the ground doing 4K then I'll buy a video card next year. Traditionally I've done 2 year upgrade cycles on cards but HBM2 generation seems to be a big deal. $650 is pretty steep for a 1-year gap filler but I'll manage. You can't take the money with you after you die anyways so fuck it, might as well spend it.
 

FLAguy954

Junior Member
More fuel to the fire!

Maxwell cards are now also crashing out of the benchmark as they spend >3000ms trying to compute one of the workloads.



https://www.reddit.com/r/pcgaming/comments/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/

And to add some more fuel, a couple of upcoming games use async compute as well (courtesy of Mahigan from OCN):

Mahigan from OCN said:
If this is all true, this is why Asynchronous Compute matters:


Mirror's Edge Catalyst will be released on February 23, 2016 for Xbox One, PS4, and PC.
Read more: http://www.vcpost.com/articles/8717...ion-technologies-glass-city.htm#ixzz3kSkxnueB


Rise of the Tomb Raider Q1 2016
Read more: http://gearnuke.com/rise-of-the-tom...breathtaking-volumetric-lighting-on-xbox-one/


Deus Ex: Mankind Divided Q1 2016
Read more: http://gearnuke.com/deus-ex-mankind-divided-use-async-compute-enhance-pure-hair-simulation/


Just three titles on the way. That's without mentioning Fable Legends and others...


That's why this is a very big deal: Pascal won't arrive before Q2 2016, by early estimates.
 

Renekton

Member
Honestly is this really a big deal? All nVidia need to do in the next round of 'x' cards is add this architecture in, yeah?

My 980GTX and 970GTX's on seperate systems run beautifully on current setup and will be future proofed for at least the next 2 years of gaming...

Big issue over nothing or am I missing something?
When Witcher 3 (a flagship Nvidia showcase) performed uncharacteristically poorly on the 780 Ti, and now this, you can tell Nvidia intended a faster replacement rate for its GPUs.
 

W!CK!D

Banned
This has gotten so damn confusing.

It really hasn't. People just don't want to believe what they see here and try to overcomplicate it. The result of this benchmark is:

Nvidia: total time = compute time + graphics time
AMD: total time = max(compute time, graphics time)

This means that Nvidia GPUs are not doing async compute in this benchmark. Period.

Async compute allows the developer to use GPGPU without negatively affecting rendering performance. Mark Cerny was, surprise, surprise, absolutely right from the beginning.
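
To make those two formulas concrete, here's a toy worked example with made-up numbers (mine, not benchmark data):

Code:
// Toy illustration of the serial vs. async totals (made-up numbers).
#include <algorithm>
#include <cstdio>

int main()
{
    const double computeMs  = 20.0; // hypothetical compute workload
    const double graphicsMs = 30.0; // hypothetical graphics workload

    const double serialTotal = computeMs + graphicsMs;          // the "Nvidia" case above
    const double asyncTotal  = std::max(computeMs, graphicsMs); // the "AMD" case above

    std::printf("serial: %.1f ms, async: %.1f ms, compute hidden for free: %.1f ms\n",
                serialTotal, asyncTotal, serialTotal - asyncTotal);
    return 0;
}

In other words, with proper async compute the shorter workload effectively disappears into the longer one, which is the "GPGPU without hurting rendering" point.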
 
This has gotten so damn confusing. Nvidia isn't supposed to do Async Compute, and yet the results are showing that Nvidia, even without utilizing this feature, still manages to somehow get the work done in what appears to be a significantly faster time than the AMD cards that are taking advantage of the feature. So what gives?
Something is weird about some of the compute timings on the AMD cards, but no one seems to be sure what's causing it. Basically, there's some minimum — and lengthy — amount of time that passes before the benchmark starts getting results back. Like, 40-50ms. AMD_Robert doesn't understand why, because he's working with stuff that has 10ms timing.

To me, it sorta sounds like the results are being delivered via some kind of lazy updating system. Like, maybe the results are actually available before that, but there's some system that comes by every 50ms and says, "You can get those results any time, you know." GCN has a lot of special stuff for keeping caches coherent and that sort of thing, so maybe that stuff isn't being used correctly or at all in this benchmark. Of course, I really have no idea what I'm talking about. :p

I guess the best way to look at this thing is that it isn't meant to showcase the benefits of Async Compute, but is instead designed to test whether or not the feature is actually present in the hardware. The ultimate point of the test isn't a race to see which GPU finishes first. Is anyone else getting that reading from these results?
Pretty much. Sounds like he designed this test specifically to see whether doing rendering and compute together was actually faster than doing them separately. It's not faster on the Nvidia cards, which indicates they're not doing fine-grained compute. To make things worse, the delay in switching job types means that if you need to do it a lot, the final result can actually take longer than normal. Meanwhile, the AMD cards see significant gains when running jobs concurrently, so that shows they are doing fine-grained compute correctly.
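
If the methodology still seems abstract, here's a CPU-side analogy of my own (nothing to do with the actual benchmark code): time two fake workloads one after the other, then time them launched concurrently, and compare. A concurrent total near the longer of the two means the work overlapped; a total near the sum means it effectively ran serially.

Code:
// CPU analogy of "run separately vs. run together" (illustration only).
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

using Clock = std::chrono::steady_clock;

static void graphicsJob() { std::this_thread::sleep_for(std::chrono::milliseconds(30)); }
static void computeJob()  { std::this_thread::sleep_for(std::chrono::milliseconds(20)); }

static double elapsedMs(Clock::time_point start)
{
    return std::chrono::duration<double, std::milli>(Clock::now() - start).count();
}

int main()
{
    // Serial: one job after the other.
    const auto t0 = Clock::now();
    graphicsJob();
    computeJob();
    const double serialMs = elapsedMs(t0);

    // Concurrent: launch both, wait for both.
    const auto t1 = Clock::now();
    auto g = std::async(std::launch::async, graphicsJob);
    auto c = std::async(std::launch::async, computeJob);
    g.wait();
    c.wait();
    const double concurrentMs = elapsedMs(t1);

    // Overlap => concurrent ~= max(30, 20); no overlap => ~= 30 + 20.
    std::printf("serial: %.1f ms, concurrent: %.1f ms\n", serialMs, concurrentMs);
    return 0;
}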
 

Bastardo

Member
Maxwell cards are now also crashing out of the benchmark as they spend >3000ms trying to compute one of the workloads.

https://www.reddit.com/r/pcgaming/comments/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/

In my personal experience (only a few cards), Nvidia has had worse drivers for all compute-based frameworks except for CUDA, which was always supported from day 1. I've had significant problems and slow execution using OpenCL with initial drivers, which were patched one after the other with new driver releases. Therefore I wouldn't attribute this specific 3000ms lag to the architecture, but rather to driver bugs. OpenCL has a similar maximum kernel runtime, btw, after which the driver kills the task.
 

dr_rus

Member
So, looking at those results... it appears the Nvidia card isn't showing async compute, correct?
Maxwell 2 cards aren't showing any performance wins from running compute asynchronously in this benchmark. They still run the code 2-5 times faster than their GCN counterparts though. Doesn't mean anything else right now.

It's sounding like referring to NV's approach as "granular" at all may be a bit generous.
There's nothing generous in saying that the different behaviour may be because of pre-emption granularity difference.

Well, obviously a broken implementation isn't gong to help much, but that doesn't imply they wouldn't benefit from a proper one.
How do you know that it's "broken"? Are all i5s "broken" compared to i7s since they don't have HT enabled? What about if we look at gaming workloads specifically?

Well, then it's a good thing no one is claiming that. Again, this is just a tool, and as such the results will depend on the project in question, the skill of the developer in using the tool, and as these tests are showing, the quality of the tool itself.
A lot of people around here are claiming that and AMD seems to be claiming that as well.

It also says, "This is a good way to utilize unused GPU resources."
If there are such resources, which is totally dependent on the GPU's architecture.

That's pretty much the opposite of what it says in the OP. "Ashes uses a modest amount of [Async Compute], which gave us a noticeable perf improvement."
"Noticeable" can mean anything. They aren't giving an exact figure which makes me doubt that it's anything to brag about.

Frankly, this is starting to sound like concern trolling. The fact that its utility varies does not diminish the technique in any way. It's a useful technique.
It is a useful technique on some architectures and not on others, and there is no direct indication that the latter can't cope just fine without it in DX12.

It really hasn't. People just don't want to believe what they see here and try to overcomplicate it. The result of this benchmark is:

Nvidia: total time = compute time + graphics time
AMD: total time = max(compute time, graphics time)

This means that Nvidia GPUs are not doing async compute in this benchmark. Period.

Async compute allows the developer to use GPGPU without negatively affecting rendering performance. Mark Cerny was, surprise, surprise, absolutely right from the beginning.
Right, and the results of Kepler cards, which are showing total time as even less than compute+graphics, are what exactly? You also seem to miss the part where Maxwell cards are several times faster than GCN ones even doing compute+graphics serially.

There is no clear understanding of what's going on in this benchmark right now, so don't try to oversimplify what we're looking at.
 
Again, good for consoles with crappy CPUs, but too much hype for a feature that isn't even on the main DX12 feature set.

Man, you gotta keep slinging that dig.

I've not spent much time in PC threads - I'm trying to catch up a little bit after getting my new build - but seriously, posters like you are the worst kind of fanboys.

And the biggest surprise? You actually have graphics card fanboys! What a fucking mess.

Edit: The 'Build a PC' thread is excellent, those guys in there are brilliant, so helpful and knowledgeable. That is a damn good thread.
 

nib95

Banned
Man, you gotta keep slinging that dig.

I've not spent much time in PC threads - I'm trying to catch up a little bit after getting my new build - but seriously, posters like you are the worst kind of fanboys.

And the biggest surprise? You actually have graphics card fanboys! What a fucking mess.

It's pretty amusing. Been watching him talk nonsense the entire thread, and be corrected by people over and over. His narrative has gone from this all being untrue, and the said Nvidia cards having been able to do this exact same type of async compute all along (despite the evidence, benchmarks, and everything else to the contrary), to pushing this notion that it's not even important except for consoles with terrible CPUs, lol.
 

KKRT00

Member
I wouldn't interpret beyond3d's benchmark yet; it needs a lot of tweaking.

Why? My GTX 970 outperforms the Fury X in compute by a factor of 5, and by a factor of 3 when it's using async.
 

Crayon

Member
I wouldn't interpret beyond3d's benchmark yet; it needs a lot of tweaking.

Why? My GTX 970 outperforms Fury X by a factor of 2.

I wouldn't interpret beyond3d's benchmark yet; it needs a lot of tweaking.

Why? My GTX 970 outperforms the Fury X in compute by a factor of 5, and by a factor of 3 when it's using async.

 

virtualS

Member
Another reading is that GCN works by removing the handicap of a platform that is designed to execute a variety of workloads and maximize efficiency of highly parallelized tasks. GCN is more generalized, so the graphics pipeline can be repurposed for different types of work, not just pixel generation, at the same time. Nvidia has optimized their architecture for power-gating efficiency and DX11 performance, for gaming or scientific computing but not both at the same time, because that's where the market demand was.

This new execution model plays out well for consoles, where economic viability is the primary concern for a mass-produced SKU, and it also works out well for those wanting to use DX12 for more than gaming. Time will tell if PC game developers will need to adopt this paradigm shift.

see how easy it is to be positive and get a point across ;)

If such a paradigm shift is occurring on consoles, why not on PC too via Vulkan and DX12? Game engines are game engines. The less reworking required the better.
 

W!CK!D

Banned
If such a paradigm shift is occurring on consoles, why not on PC too via Vulkan and DX12? Game engines are game engines. The less reworking required the better.

Aside from him being sarcastic in order to hold up a mirror to dr. apocalipsis, there is an essential factor that everyone here needs to internalize:

Consoles =|= PCs
Console development =|= PC development

Simple as that. Even if DX12 looks like a low-level API compared to DX11, it is not even remotely close to the low-level access that console devs have to the hardware. You really need to understand that. GNM, for example, allows the programmer to take full control. You'll never see something like that on a PC, and that is a matter of principle due to the nature of the PC as a gaming platform: you can't have all the advantages of PC gaming without facing the disadvantages.
 

Sh1ner

Member
My last 3 GPUs are all AMD. Even if AMD can do async and Nvidia can't, is it expected for AMD to see large gains across the board in DX12? My assumption is no; doesn't Ashes of the Singularity rely massively on async, which is why we see these huge gains for AMD?

Another assumption is that most new games won't be using async as heavily as Ashes of the Singularity. Can a dev or an enthusiast chime in on this? How crippling is this blow to game development for Nvidia not supporting async?
 

Sijil

Member
It's really bizarre, I think it's a relatively recent thing. I don't remember GPU fanboys existing 10 years ago

It was always there: 3dfx vs Matrox, Voodoo cards vs the rest, Nvidia vs AMD, Green team vs Red. It's just the net expanding and making it more obvious.
 
My last 3 GPUs are all AMD. Even if AMD can do async and Nvidia can't, is it expected for AMD to see large gains across the board in DX12? My assumption is no; doesn't Ashes of the Singularity rely massively on async, which is why we see these huge gains for AMD?

Another assumption is that most new games won't be using async as heavily as Ashes of the Singularity. Can a dev or an enthusiast chime in on this? How crippling is this blow to game development for Nvidia not supporting async?

You are wrong; the expectation is in fact larger gains across the board with DX12 for AMD, because their DX11 drivers suck in comparison.
 

Sh1ner

Member
Well, free gains are always nice. I just assumed it's too early to tell after one game and lots of synthetic benchmarks.
 
I understood very little of the technical stuff. I just have one question: If even without async nVidia cards perform better, why should I care? I want to upgrade my rig and I am completely and utterly confused by all this.
 

Renekton

Member
I understood very little of the technical stuff. I just have one question: If even without async nVidia cards perform better, why should I care? I want to upgrade my rig and I am completely and utterly confused by all this.
Why even come here haha.

Just ignore us, buy a 970 and have fun.
 

Mabufu

Banned
I understood very little of the technical stuff. I just have one question: If even without async nVidia cards perform better, why should I care? I want to upgrade my rig and I am completely and utterly confused by all this.

If the game relies on async, AMD will perform significantly better, even in lower-priced cards. Right now AMD is better equipped for handling DX12 and more future-proof.

Nvidia can perform better in DX11 scenarios because Nvidia's GPU architecture is straightforwardly optimized for it.

AMD GPUs have an architecture that is less efficient for DX11 but happens to be a lot more friendly to DX12 and async than Nvidia's.

Taking into account that AMD has its architecture in the console market, one can expect the game development industry to shift slightly to favor AMD's architecture, with the exception of the deals Nvidia can (and will) make.
 
It's pretty amusing. Been watching him talk nonsense the entire thread, and be corrected by people over and over. His narrative has gone from this all being untrue, and the said Nvidia cards having been able to do this exact same type of async compute all along (despite the evidence, benchmarks, and everything else to the contrary), to pushing this notion that it's not even important except for consoles with terrible CPUs, lol.

Lololol, butthurts everywhere.

From the very start of the thread:
That aside, Nvidia GPUs being bad at switching between compute and graphical tasks is old news. We all have suffered FPS tanking when activating PhysX even on high-end GPUs.

Some of you really have to learn how to read and write on a discussion board, beyond your heathen beliefs, and understand that brand loyalism isn't as strong in the PC sphere as it is in the console or even phone market.

Then, the CPU thing is important to note because the people who buy a 980 Ti/Fury are usually sitting on top of a high-end i7 with threads to spare. Telling said people that their current cards are obsolete for DX12 because they don't support an AMD feature that isn't part of DX12 is disingenuous and tendentious.

I understand that some of you have some sort of agenda, whether it's GPU-brand or console rooted doesn't matter, but some of us don't care about it. You keep saying that Maxwell is doomed for DX12 even when, async, serial or broken, Maxwell keeps being faster in a worst-case scenario than current AMD offerings.

Most people who have a clue will tell you the same: this is pretty good for PS4, not that good for XOne, and meaningless for PC gaming.

I enjoy techie stuff and try to learn how things work; I don't care so much about people's feelings about brands, tbh. I would enjoy you teaching me where I was wrong in this thread, so I could learn something useful. But remember, I don't care at all about your religion substitutes.

There is a reason Horse Armour is that funny on this forum.
 
Then, the CPU thing is important to note because the people who buy a 980 Ti/Fury are usually sitting on top of a high-end i7 with threads to spare. Telling said people that their current cards are obsolete for DX12 because they don't support an AMD feature that isn't part of DX12 is disingenuous and tendentious.

That's pure bullshit, there are a lot of AMD FX and even Intel i3/i5 owners out there (I'm on an AMD FX processor) that would benefit HUGELY from the DX12 boost the Fury cards are getting.
 
That's pure bullshit, there are a lot of AMD FX and even Intel i3/i5 owners out there (I'm on an AMD FX processor) that would benefit HUGELY from the DX12 boost the Fury cards are getting.

Most of it coming from DX12's reduced driver overhead, WDDM 2.0 and increased draw calls, features that even the lower-tier DX12 cards will enjoy.
 

FrunkQ

Neo Member
I understood very little of the technical stuff. I just have one question: If even without async nVidia cards perform better, why should I care? I want to upgrade my rig and I am completely and utterly confused by all this.

It's not that complex. If you are on/moving to Windows 10 and plan on playing new games, then from next year AMD will probably offer you the best bang for your buck. They may well "win" in many benchmarks for the next year or two until NVidia get Async worked out. But there are so many dependencies with each game and the capability of the rest of your rig that most people will probably not notice...

If you don't understand a lot of the conversation here, then you are probably not a "high-end PC" type and will just buy any card within your modest budget. The market has a habit of pricing cards of similar capability next to each other - especially in the low to mid-range space.

Chill... this is just an interesting development in graphics cards - kind of esoteric for now. But it is nice to see the "underdogs" upsetting the status quo. Let's face it, AMD getting a break in the PC space is a cool turn of events. Just when NVidia was running away with the graphics card business, this has happened in the nick of time to keep AMD in the game.
 