
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

dr_rus

Member
My take on it ATM: keep your 980 Tis. Async is probably going to be used for improved lighting and AA, which aren't going to overload draw calls, so you'll see Nvidia ahead in most games, as AMD will suffer from the latency of getting workloads into the queues where NV is quicker at low levels of async compute. If only we could get benches from the Fable beta :( to confirm this.
Compute is mostly used for post processing and animation simulation (hair, physics, etc.). It doesn't matter what it's used for, though, as much as how many compute jobs are launching in parallel to the graphics one. If MDolenc's benchmark gives an accurate comparative picture, then Maxwell 2 is fine at handling 31-63 compute jobs and starts to fall behind GCN when this number goes higher. Still, all this is empty talk outside of real world examples, so I hear you when you say that we should wait for Fable benchmarks. Tomorrow's Children is using only 3 compute queues in addition to the graphics one, for example, so Maxwell 2 may be just fine running D3D12 console ports as opposed to PC-exclusive stress tests like AoS.

I'm pointing out that those wanting to switch from a 980ti to a 290 aren't even looking at the best GCN GPUs.
GCN 1.2 has minimal changes compared to 1.1 though - framebuffer color compression enhancements and FP16 precision support with minimal performance gains - so from a user perspective a 290 isn't that far from being the "best GCN" right now.

That's closer to my understanding of how this stuff works. Basically, you've got turnstile-style access to a fixed pool of resources: the various math units on the GPU, each with its own specialty. So think of it like loading a roller coaster. Every cycle, the system hangs the next rendering job on the GPU, occupying some or all of those specialized units. These jobs are the people who paid for VIP passes. Then the system looks at the math units that haven't been assigned jobs, compares that to the 64 jobs waiting at their respective turnstiles - all managed by eight line attendants - and lets in whatever punters best fill the remaining seats before dispatching the train.

It sounds like NV does something similar, but instead of filling empty seats every cycle with jobs from the 31 compute queues, they actually alternate job types, pulling a job from the render queue on even cycles and a job (or more?) from the compute queues on the odd cycles. Then they're saying, "Well, at the end of the day, everybody gets to ride." While it's true they're seamlessly pulling jobs from both queue types, because they can't pull from multiple queue types simultaneously, they're not actually doing much to increase utilization. Any math unit not used in a given render operation remains idle; it just gets used on the following cycle. I'm assuming they'd at least be able to pull from all 31 queues on the compute cycle to attempt to fully saturate the math units, but they'd still have a lot of idle units on the render cycle.
NV's async granularity is lower than that of GCN but that's as much as we know right now.
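
To make the roller-coaster analogy above a bit more concrete, here's a toy simulation - a minimal sketch with invented unit counts and job sizes, not a description of how either architecture actually schedules work - comparing "fill the empty seats every cycle" against "alternate graphics cycles and compute cycles":

```python
# Toy model of the analogy above. Everything here (unit counts, job sizes) is
# invented purely for illustration; real GPUs do not schedule work this way.
import random

UNITS = 64        # pretend pool of math units available each cycle
CYCLES = 10_000

random.seed(1)

def graphics_work():
    return random.randint(20, UNITS)    # units one render job occupies

def compute_work():
    return random.randint(1, 8)         # units one small compute job occupies

def fill_compute(busy):
    """Keep adding compute jobs until no more fit in the remaining units."""
    while True:
        job = compute_work()
        if busy + job > UNITS:
            return busy
        busy += job

def utilization_fill_idle():
    """'GCN-style' in the analogy: top up leftover seats with compute riders."""
    used = sum(fill_compute(graphics_work()) for _ in range(CYCLES))
    return used / (UNITS * CYCLES)

def utilization_alternate():
    """'Maxwell-style' in the analogy: graphics on even cycles, compute on odd."""
    used = 0
    for cycle in range(CYCLES):
        used += graphics_work() if cycle % 2 == 0 else fill_compute(0)
    return used / (UNITS * CYCLES)

print(f"fill idle units every cycle: {utilization_fill_idle():.0%}")
print(f"alternate graphics/compute:  {utilization_alternate():.0%}")
```

In this toy, the alternating scheduler still does work every cycle, but whatever a render job doesn't use on a graphics-only cycle sits idle, which is essentially the point being made in the post above.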

Is this a trick question because adding async to the mix "just" increases your peak utilization? It will have empty spaces in its rendering pipeline that need filling, just like any other GPU, if that's what you're asking.
No, it's not a trick question, as it's pretty obvious that async shaders can actually lead to _worse_ utilization than serial execution when done the wrong way - this is especially true for architectures which aren't built for fast context switching and are built for maximum throughput inside one context - which coincidentally is what Maxwell 2 is. If you want an example of how this may happen, look no further than HT lowering your CPU performance in some benchmarks on PC - the same thing may easily happen with async compute on GPUs.
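
As a back-of-the-envelope sketch of that point - all numbers made up, nothing here measured from real hardware - consider a frame where compute either hides behind graphics well, or hides poorly while paying for every context switch:

```python
# Made-up numbers; this only illustrates the arithmetic of the argument above,
# not the behaviour of any actual GPU or driver.

def serial_ms(graphics_ms, compute_ms):
    return graphics_ms + compute_ms

def async_ms(graphics_ms, compute_ms, hidden_fraction, switch_cost_ms, switches):
    # Only part of the compute work actually hides behind graphics; the rest
    # still runs afterwards, and every context switch adds a fixed penalty.
    hidden = compute_ms * hidden_fraction
    return graphics_ms + (compute_ms - hidden) + switch_cost_ms * switches

g, c = 16.0, 8.0   # hypothetical per-frame graphics and compute costs
print(serial_ms(g, c))                                                          # 24.0 ms
print(async_ms(g, c, hidden_fraction=0.9, switch_cost_ms=0.05, switches=20))    # 17.8 ms - a win
print(async_ms(g, c, hidden_fraction=0.2, switch_cost_ms=0.5,  switches=20))    # 32.4 ms - worse than serial
```

Whether the overlap or the switching cost dominates is exactly the architecture- and workload-dependent part.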

Async shaders are hardly a magic pill which will make everything faster everywhere; saying they will is just stupid. There's a lot of talk in Q's presentation on how they've tweaked the wavefronts specifically for PS4's GPU to get the maximum out of async. There's that tidbit from UE4's Fable async compute code submission which says that it should be used with caution as the results may actually be worse than without it. There's also the OP's statement on them not getting a lot of performance out of the feature at all. So it's not clear cut whether a game should even use it on PC, as it's highly dependent on the workloads in question.

How likely is it that Pascal will have Asynchronous Compute?
Even Kepler (GK110) has async compute; it just can't run it alongside a graphics job. And to my knowledge Maxwell 2 supports async compute alongside graphics just fine, though obviously the architectural choices are different from GCN or any other architecture out there, and the benefits from running stuff asynchronously may be way less than on GCN.

In general, and aside from moving from 28nm to 16nm FF+, having HBM2 and double rate FP16, shouldn't Pascal have more architectural changes over Maxwell than Maxwell had over Kepler?
Unlikely, as Pascal appeared on the roadmap between Maxwell and Volta with the 3D memory feature moved into it. I'm thinking that Pascal is basically a tweaked Maxwell with an HBM bus, and the next big architecture update from NV will be Volta. But who knows? This stuff changes every month.

The performance of a video game is defined by more than just frames per second or frametimes. Asynchronous compute allows for higher throughput at lower latencies which easily makes it one of the most important features for VR gaming. Remember the beginning of this gen when Mark Cerny explained again and again the importance of async compute for the future of video games? That was before Morpheus was announced. Two years later it all makes sense.
Asynchronous compute makes latencies somewhat unpredictable, so in the end it may be a bad idea for code which is latency critical. You seem to be mixing the specific VR timewarp case with async compute in general. (I'm also pretty sure that this gen of VR won't be nearly as big as some of you think, but that's just me.)
 
We, IMO, still really have a hard time understanding what is happening. Has that benchmark gotten better yet?

This explains it well:

nVidia does async compute differently (and less efficiently) than AMD, but it still does it, using context switching. Guess what? AMD uses context switching too, but they have 8 engines with 8 queues each and nVidia has 1 engine with 32 queues - AMD can context switch faster, nVidia can fill queues faster. In a large draw-call situation, nVidia has even pointed out in their whitepapers that their context switching will take a hit (guess what Ashes likely has in it? Huge draw calls).

A game dev posts "As far as I know, Maxwell doesn’t really have Async Compute" (where in the previous sentence he said it was functional, but didn't work well with their code) and now Maxwell is crippled, doesn't support DX12, and doesn't do async compute? Don't get me wrong - GCN's architecture is much better suited to async compute, it was built for it. nVidia, however, still supports it - async timewarp, which relies on async compute, is a huge part of GameworksVR and what allows them to get frame times down to sub 2ms. https://www.reddit.com/r/oculus/comments/3gwnsm/nvidia_gameworks_vr_sli_test/

I'm still just boggled by how quickly this misinformation was picked up and ran with, I bet the PR folks at nVidia are going bonkers.

Horrible oversimplification:

AMD GCN: 8 engines × 8 queues each = up to 64 queues
Nvidia Maxwell: 1 engine × 32 queues = up to 32 queues
 

dogen

Member
This explains it well, from the horse's mouth, so to speak (Nvidia guy):



Horrible oversimplification:

AMD GCN: 8 engines × 8 queues each = up to 64 queues
Nvidia Maxwell: 1 engine × 32 queues = up to 32 queues

I'm 99% sure that guy doesn't work for nVidia.
 

dr_rus

Member
Won't this just put developers in the position of choosing extra performance on AMD at the expense of Nvidia chips?

It's not clear what performance we're talking about here. AoS is getting less than +30% (and by less I think they mean way less - around 5-10%), and the best example we have right now is Tomorrow's Children, which is getting around +30% on a fixed platform where they can fine-tune the code rather extensively. The latter is running only 3 compute queues (even if that's actually 3 ACEs with 8 queues each, this still gives us 24 queues, which is less than the 31 limit for Maxwell 2), which means that they didn't see much benefit in running more of them.

It may well be that while GCN can handle loads of asynchronous queues with little loss of performance, it won't actually be able to execute these queues in real time - each queue is still a program which must be executed; the more you have, the longer it will take to execute all of them. Will it be of any benefit to anyone if some code runs on a Fury at 5 fps while a 980 Ti handles it at only 1 fps? We really need more real games using the feature before we'll be able to make any conclusions.
 

frontieruk

Member
MDolenc said:
Well... That's interesting... Found a brand new behaviour on my GTX 680... Will post a new version a bit later; I still want to implement GPU timestamps, which could indicate better what's going on on GCN.

Ooh we could be getting closer...
 

DonasaurusRex

Online Ho Champ
It's not clear what performance we're talking about here. AoS is getting less than +30% (and by less I think they mean way less - around 5-10%), and the best example we have right now is Tomorrow's Children, which is getting around +30% on a fixed platform where they can fine-tune the code rather extensively. The latter is running only 3 compute queues (even if that's actually 3 ACEs with 8 queues each, this still gives us 24 queues, which is less than the 31 limit for Maxwell 2), which means that they didn't see much benefit in running more of them.

It may well be that while GCN can handle loads of asynchronous queues with little loss of performance, it won't actually be able to execute these queues in real time - each queue is still a program which must be executed; the more you have, the longer it will take to execute all of them. Will it be of any benefit to anyone if some code runs on a Fury at 5 fps while a 980 Ti handles it at only 1 fps? We really need more real games using the feature before we'll be able to make any conclusions.

System time or user time, though? Because if the user time is shorter, the gamer won't care if in actuality more CPU clock is being used.
 

bj00rn_

Banned
So glad I never upgraded to Maxwell

I'm curious; what exactly do you mean by that? I mean, even the top technical wizards around are struggling to come down clearly on either side of this matter yet. So it would be interesting to hear the details behind your conclusion :)
 

bj00rn_

Banned
Have any real technical wizards actually weighed in on this yet?

I don't know who's a legitimate technical wizard or not. That part was a bit tongue-in-cheek. But the point is still that even those who normally claim authority on the internets are unusually low key about a clear conclusion in this matter. So how is it possible to come to a conclusion like the previous poster did?
 
For anyone that bought a 980 Ti, you're playing in high-end territory now. You should have known you were gonna be obsolete really quickly. That's just how it works. FWIW, my last two cards were a 780 Ti and currently a 980 Ti. I pour hundreds of dollars down the drain because it's fun.
 

frontieruk

Member
hey that's me! lol

But anyway, I believe they were still speculating about the extra 40-50ms overhead on AMD.

That's on CPU timing though; the new test has added GPU timings, which is probably why the creator is seeing a new trend, but he's waiting for more data from the Nvidia and AMD guys in case it's just card-specific.
 

dogen

Member
That's on CPU timing though; the new test has added GPU timings, which is probably why the creator is seeing a new trend, but he's waiting for more data from the Nvidia and AMD guys in case it's just card-specific.

Yeah, that's what we're all waiting for. Should be interesting...
 

dogen

Member
First result.
290
Compute only ranges from 28ms to 420ms for 512 threads.
Graphics only is 36ms
Graphics + compute ranges from 28ms to 395ms for 512.
Graphics + compute single command list ranges from 54ms to 250ms


now a 980 ti
compute ranges from 5.7ms to 76.9ms
graphics only result is 16.5ms
graphics + compute result is 20.9ms to 92.4ms
graphics + compute single command list result:
20.6ms to over 3000ms (after 454 the timer seems to bug out)
 

FtsH

Member
First result.
290
Compute only ranges from 28ms to 420ms for 512 threads.
Graphics only is 36ms
Graphics + compute ranges from 28ms to 395ms for 512.
Graphics + compute single command list ranges from 54ms to 250ms

Now we just need a geforce to compare it to.

I need some education in layman's terms here. How should I relate these numbers to the GPU's capability? Is there some simple way to just say "lower is better" or "narrower range means async"?
 

frontieruk

Member
First result.
290
Compute only ranges from 28ms to 420ms for 512 threads.
Graphics only is 36ms
Graphics + compute ranges from 28ms to 395ms for 512.
Graphics + compute single command list ranges from 54ms to 250ms


now a 980 ti
compute ranges from 5.7ms to 76.9ms
graphics only result is 16.5ms
graphics + compute result is 20.9ms to 92.4ms
graphics + compute single command list result:
20.6ms to over 3000ms (after 454 the timer seems to bug out)

I was just looking over that. :D
 

dogen

Member
I need some education in layman's terms here. How should I relate these numbers to the GPU's capability? Is there some simple way to just say "lower is better" or "narrower range means async"?

A graphics + compute result (processing time in ms) being smaller than the equivalent separate graphics and compute tests combined would imply async compute, as long as the tasks themselves are conducive to async compute (i.e. have different bottlenecks).
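
For anyone who wants to eyeball their own results with that rule of thumb, here's a trivial helper - a sketch only, and the values in the example call are made up rather than taken from any card in this thread:

```python
# Reads three timings from the benchmark the way described above: if running
# graphics + compute together is clearly cheaper than the two separate runs
# added up, some overlap (async) is happening; if it's about equal, the work
# effectively ran serially.

def async_verdict(graphics_ms, compute_ms, combined_ms, tolerance=0.05):
    serial_estimate = graphics_ms + compute_ms
    saving = serial_estimate - combined_ms
    if saving > tolerance * serial_estimate:
        return f"overlapping: ~{saving:.1f} ms saved vs running serially"
    if saving < -tolerance * serial_estimate:
        return f"slower than serial by ~{-saving:.1f} ms"
    return "roughly serial (bars are additive)"

# hypothetical example values, purely to show the shape of the comparison:
print(async_verdict(graphics_ms=20.0, compute_ms=10.0, combined_ms=21.0))
```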
 

Irobot82

Member
A Fury X result got posted. That 980ti one bugged out before it could finish?

Also: The compiled results thus far in visual form.

Here

Thanks! So that makes sense.

Is there any significance to the numbers themselves? Not trying to get into a fanboy war, but is it possible to conclude something like "even if X card doesn't support async, it still delivers better real-world performance than Y"?

In the real world, it means nothing right now, as there are ZERO DX12 games on the market. So... who knows!
 

FtsH

Member
A graphics + compute result (processing time in ms) being smaller than the equivalent separate graphics and compute tests combined would imply async compute, as long as the tasks themselves are conducive to async compute (i.e. have different bottlenecks).

Thanks! So that makes sense.

Is there any significance to the numbers themselves? Not trying to get into a fanboy war, but is it possible to conclude something like "even if X card doesn't support async, it still delivers better real-world performance than Y"?
 

Blanquito

Member
I love this place. Thanks for your time and future input.

A graphics + compute result (processing time in ms) being smaller than the equivalent separate graphics and compute tests combined would imply async compute, as long as the tasks themselves are conducive to async compute (i.e. have different bottlenecks).

So, looking at those results... it appears the Nvidia card isn't showing async compute, correct?
 

frontieruk

Member
First result.
290
Compute only ranges from 28ms to 420ms for 512 threads.
Graphics only is 36ms
Graphics + compute ranges from 28ms to 395ms for 512.
Graphics + compute single command list ranges from 54ms to 250ms


now a 980 ti
compute ranges from 5.7ms to 76.9ms
graphics only result is 16.5ms
graphics + compute result is 20.9ms to 92.4ms
graphics + compute single command list result:
20.6ms to over 3000ms (after 454 the timer seems to bug out)
Fury X results
Compute only
26.03ms - 467.68ms

Graphics only: 26.01ms (64.50G pixels/s)

Graphics + compute:
26.70ms (62.83G pixels/s) to 512. 443.43ms (3.78G pixels/s)

Graphics, compute single commandlist:
52.04ms (32.24G pixels/s) [25.97] {64.50 G pixels/s} to 512. 234.23ms (7.16G pixels/s)
 

KKRT00

Member
Fresh Win 10, new GTX 970, 355.60 drivers
This is what GPU utilization looked like on this benchmark:

ei0wWLM.png
 

dogen

Member
Thanks! So that makes sense.

Is there any significance to the numbers themselves? Not trying to get into a fanboy war, but is it possible to conclude something like "even if X card doesn't support async, it still delivers better real-world performance than Y"?

I don't think it's a good real world test. I don't remember what it's doing, it might not be a similar workload to games at all.

https://forum.beyond3d.com/threads/dx12-performance-thread.57188/page-9#post-1869028

So the graphics part is just pushing triangles and is fillrate bound. That's very likely why nvidia is winning in the graphics-only portion; they have a much higher fillrate (right?). I think that's also a good indicator that it's not necessarily a very real-world test; I don't think games are often extremely fillrate bound (portions of the render process might be, though).
 

Macrotus

Member
Fresh Win 10, new GTX 970, 355.60 drivers
This is what GPU utilization looked like on this benchmark:

ei0wWLM.png

I'm not familiar with these types of things, but is that graph a positive one or a negative one?
I'm concerned because I also use a GTX 970.
 

FtsH

Member
In the real world, it means nothing right now, as there are ZERO DX12 games on the market. So... who knows!

Well...that's true.....

And what's the reason for Nvidia cards to show the ladder pattern while GCN cards give flat numbers across the test?
 

FtsH

Member
I don't think it's a good real world test. I don't remember what it's doing, it might not be a similar workload to games at all.

Cool. So what's the reason for Nvidia cards to show this ladder pattern while GCN cards give flat numbers across the test?
 

dogen

Member
Cool. So what's the reason for Nvidia cards to show this ladder pattern while GCN cards give flat numbers across the test?

Not sure.


edit - from my last post

So the graphics part is just pushing triangles and is fillrate bound. That's very likely why nvidia is winning in the graphics-only portion; they have a much higher fillrate (right?). I think that's also a good indicator that it's not necessarily a very real-world test; I don't think games are often extremely fillrate bound (portions of the render process might be, though).
 

KKRT00

Member
I'm not familiar with these types of things, but is that graph a positive one or a negative one?
I'm concerned because I also use a GTX 970.

It means that it is not working like it should, yes.
But it also seems like that's more of a driver problem.

---
Could those be the slow context switches AMD was talking about?
I really doubt it; it looks more like something is fundamentally broken, probably with the drivers.
 

FtsH

Member
Not sure.


edit - from my last post

So the graphics part is just pushing triangles and is fillrate bound. That's very likely why nvidia is winning in the graphics-only portion; they have a much higher fillrate (right?). I think that's also a good indicator that it's not necessarily a very real-world test; I don't think games are often extremely fillrate bound (portions of the render process might be, though).

Thanks again. So basically don't read too much into the results other than the info related to Async.
 

dogen

Member
Thanks again. So basically don't read too much into the results other than the info related to Async.

Even then I wouldn't say we can be completely sure, even though that's what it looks like. Maybe it really is driver related. We don't know yet.
 
NV's async granularity is lower than that of GCN but that's as much as we know right now.
It's sounding like referring to NV's approach as "granular" at all may be a bit generous.

No, it's not a trick question, as it's pretty obvious that async shaders can actually lead to _worse_ utilization than serial execution when done the wrong way - this is especially true for architectures which aren't built for fast context switching and are built for maximum throughput inside one context - which coincidentally is what Maxwell 2 is.
Well, obviously a broken implementation isn't going to help much, but that doesn't imply they wouldn't benefit from a proper one.

Async shaders are hardly a magic pill which will make everything faster everywhere; saying they will is just stupid.
Well, then it's a good thing no one is claiming that. Again, this is just a tool, and as such the results will depend on the project in question, the skill of the developer in using the tool, and as these tests are showing, the quality of the tool itself.

There's that tidbit from UE4's Fable async compute code submission which says that it should be used with caution as the results may actually be worse than without it.
It also says, "This is a good way to utilize unused GPU resources."

There's also the OP's statement on them not getting a lot of performance out of the feature at all.
That's pretty much the opposite of what it says in the OP. "Ashes uses a modest amount of [Async Compute], which gave us a noticeable perf improvement."

So it's not clear cut whether a game should even use it on PC, as it's highly dependent on the workloads in question.
It's not clear what performance we're talking about here. AoS is getting less than +30% (and by less I think they mean way less - around 5-10%), and the best example we have right now is Tomorrow's Children, which is getting around +30% on a fixed platform where they can fine-tune the code rather extensively. The latter is running only 3 compute queues (even if that's actually 3 ACEs with 8 queues each, this still gives us 24 queues, which is less than the 31 limit for Maxwell 2), which means that they didn't see much benefit in running more of them.

It may well be that while GCN can handle loads of asynchronous queues with little loss of performance, it won't actually be able to execute these queues in real time - each queue is still a program which must be executed; the more you have, the longer it will take to execute all of them. Will it be of any benefit to anyone if some code runs on a Fury at 5 fps while a 980 Ti handles it at only 1 fps? We really need more real games using the feature before we'll be able to make any conclusions.
Frankly, this is starting to sound like concern trolling. The fact that its utility varies does not diminish the technique in any way. It's a useful technique.
 

frontieruk

Member

As you're here...

Where's my Beta invite :'(

I signed up at the very very very first announcement.

My mate signed up just after E3 and is in; he can't play as he can't matchmake, but it's hella unfair :(

Joking...

maybe...

Nice to see you partaking in the fun btw ;)
 

Irobot82

Member
What the fack is that, all these bars and numbers o_O

Per the creator

Each bar in the chart shows the time it took for the async compute to finish.
The red block that floats to the top is the time it would take for the compute, by itself, to finish.
The blue block at the bottom is the time it would take for the graphics, by itself, to finish.

What we want here is for the red and blue to overlap; this signifies the async compute running faster than if you were to run the compute and graphics separately.
Sometimes we see a white gap between the two colors; this signifies that the async compute run is slower than it would have been if the two were run separately.
 
More fuel to the fire!

Maxwell cards are now also crashing out of the benchmark as they spend >3000ms trying to compute one of the workloads.

AMD_Robert said:
The author is not interpreting the results correctly.
Look at the height of the graphics bars.
Look at the height of the compute bars.

Notice how NVIDIA's async results are the height of those bars combined? This means the workloads are running serially, otherwise compute wouldn't have to wait on graphics and the bars would not be additive.
Compare that to the GCN results. Compute and graphics together, async shading bars are no higher than any other workload, demonstrating that frame latencies are not affected when the workloads are running together.
//EDIT: Asynchronous shading isn't simply whether or not a workload can contain compute and graphics. It's whether or not that workload can overlay graphics and compute, processing them both simultaneously without the pipeline latency getting any longer than the longest job. This is what GCN shows, but Maxwell does not.
//15:45 Central Edit: This benchmark has now been updated. GPU utilization of Maxwell-based graphics cards is now dropping to 0% under async compute workloads. As the workloads get more aggressive, the application ultimately crashes as the architecture cannot complete the workload before Windows terminates the thread (>3000ms hang).

https://www.reddit.com/r/pcgaming/comments/3j87qg/nvidias_maxwell_gpus_can_do_dx12_async_shading/
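
Reading the numbers posted earlier in the thread through that lens (this is just arithmetic on figures already in this thread, nothing new measured): for the 980 Ti, graphics-only (16.5ms) plus compute-only at the high end (76.9ms) comes to roughly 93.4ms, and the measured graphics + compute result is 92.4ms - essentially additive, i.e. the serial pattern described above. For the Fury X, graphics-only (26.01ms) plus compute-only at the low end (26.03ms) would be about 52ms run back to back, yet the measured combined result is 26.70ms, barely longer than either job alone - the overlapping pattern.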
 