KKRT00 · Sep 2, 2015

serversurfer said:
Again, this isn't a performance test. It merely tests for the presence of fine-grained compute.

Sure, compute performance is irrelevant, the behavior of GCN 1.1 vs 1.2 is irrelevant, drivers are irrelevant, and setting the test up to use Maxwell properly is also irrelevant. Jesus.
It's really not a great benchmark; it's more like a test of standard DX12 drivers with a common shader.

serversurfer said:
You still don't seem to get it. Yes, there are plenty of tricks you can employ to achieve similar performance gains, but you forget/ignore those same tricks are still available to the fine-grained systems.

Sure I don't. The other parts of my post of course don't matter; you had to mention only the one about GPU time in the context of algorithm optimization ...

I've checked your post history. It is all focused on Sony-oriented threads or "console wars comparison" threads, so I'm out of this discussion with you; it's a waste of time when you look at tech-related things only through one platform.

serversurfer · Sep 2, 2015

dogen said:
Nvm. Still trying to figure this out.

That's kinda what the thread is about. The new DX12 features intended to improve performance instead degraded it. When asked about it, NV told the dev to test if it was an NV card and, if so, treat it like DX11.

What the dev and I both find kinda odd is they're not checking for a specific NV family. In effect, they're saying if it's NV, just don't bother.

KKRT00 said:
Sure, compute performance is irrelevant, the behavior of GCN 1.1 vs 1.2 is irrelevant, drivers are irrelevant, and setting the test up to use Maxwell properly is also irrelevant. Jesus.

I never said that.

It's really not a great benchmark; it's more like a test of standard DX12 drivers with a common shader.

Again, it's testing for the presence of fine-grained compute.

Sure I don't. The other parts of my post of course don't matter; you had to mention only the one about GPU time in the context of algorithm optimization ...

The other parts of your post were irrelevant or simply wrong.

I've checked your post history. It is all focused on Sony-oriented threads or "console wars comparison" threads, so I'm out of this discussion with you; it's a waste of time when you look at tech-related things only through one platform.

Signing off with an ad hom? I accept your surrender.

dr. apocalipsis · Sep 2, 2015

serversurfer said:
I didn't think he was that funny. I thought the poor guy was as confused as you are. I was gonna type a long post to try to explain this stuff to him.

This, pretty much, summarizes all of your interventions and should disqualify you for the few people that still could take you seriously.

You are not worth the time.

DonasaurusRex · Sep 2, 2015

Hmm, seems like AMD chose wisely in its implementation of DX12 support. Happens every gen. Hopefully the presence of ACEs helps older-gen GPUs squeeze out some new perf in DX12. Either way, moving forward, 2016 is going to be exciting. Wonder what Khronos will bring to the table.

SeeNoWeevil · Sep 2, 2015

I'm guessing 980Ti vs Fury X Fable benchmark is what everyone is waiting to see.

frontieruk · Sep 2, 2015

SeeNoWeevil said:
I'm guessing 980Ti vs Fury X Fable benchmark is what everyone is waiting to see.

It'll be interesting to see what techs a MS first party will leverage on PC.

serversurfer · Sep 2, 2015

dr. apocalipsis said:
This, pretty much, summarizes all of your interventions and should disqualify you for the few people that still could take you seriously.

You are not worth the time.

I should be ignored because I try to be helpful and try to give people the benefit of the doubt?

You make very strange arguments. =/

Last Hearth · Sep 2, 2015

So I just built a PC with a 970 last week, upgraded from a 7950, I should have just kept the 7950?

frontieruk · Sep 2, 2015

Last Hearth said:
So I just built a PC with a 970 last week, upgraded from a 7950, I should have just kept the 7950?

No, you probably made the right decision for you at this time. All we have is speculation atm.

Alej · Sep 2, 2015

dr. apocalipsis said:
This, pretty much, summarizes all of your interventions and should disqualify you for the few people that still could take you seriously.

You are not worth the time.

Personal attacks then? Personal attacks.
*popcorn*

You weren't taken seriously from the get-go BTW.

Sijil · Sep 2, 2015

Last Hearth said:
So I just built a PC with a 970 last week, upgraded from a 7950, I should have just kept the 7950?

No. By the time DX12 is in common use there will be better cards to buy specifically for DX12, for the time being the 970 torpedoes the 7950.

Locuza · Sep 3, 2015

Sebastian Aaltonen from RedLynx about the little async compute test:

If the compiler is good, it could either skip the vector units completely by emitting pure scalar unit code (saving power) or emitting both scalar + vector in an interleaved "dual issue" way (the CU can issue both at the same cycle, doubling the throughput).

Benchmarking thread groups that are under 256 threads on GCN is not going to lead to any meaningful results, as you would (almost) never use smaller thread groups in real (optimized) applications. I would suspect a performance bug if a kernel thread count doesn't belong to {256, 384, 512}. Single lane thread groups result in less than 1% of meaningful work on GCN. Why would you run code like this on a GPU (instead of using the CPU)? Not a good test case at all. No GPU is optimized for this case.

Also I question the need to run test cases with tens of (or hundreds of) compute queues. Biggest gains can be had with one or two additional queues (running work that hits different bottlenecks each). More queues will just cause problems (cache trashing, etc).

https://forum.beyond3d.com/posts/1869700/
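
A rough way to picture the point about extra queues "running work that hits different bottlenecks each" is a toy model. This is only a sketch, not anything from the B3D thread: it assumes each job is described by how long it would keep the ALUs and the memory bus busy, that a job run alone takes as long as its slower resource, and that overlapped jobs share each resource perfectly. The job names and numbers are made up, and real hardware is nowhere near this ideal.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical workload: time it would occupy each resource (ms)."""
    name: str
    alu_ms: float   # time needed if only ALU throughput limited it
    bw_ms: float    # time needed if only memory bandwidth limited it


def serial_ms(jobs):
    # Run one after another: each job is limited by its own bottleneck.
    return sum(max(j.alu_ms, j.bw_ms) for j in jobs)


def overlapped_ms(jobs):
    # Idealized overlap: each resource only has to get through the total
    # demand placed on it, so the busiest resource sets the total time.
    return max(sum(j.alu_ms for j in jobs), sum(j.bw_ms for j in jobs))


shadow_pass = Job("shadow pass", alu_ms=2.0, bw_ms=8.0)   # bandwidth-bound
lighting = Job("lighting", alu_ms=6.0, bw_ms=1.0)         # ALU-bound
blur = Job("blur", alu_ms=1.0, bw_ms=6.0)                 # bandwidth-bound

mixed = [shadow_pass, lighting]   # different bottlenecks
same = [shadow_pass, blur]        # same bottleneck

print(serial_ms(mixed), overlapped_ms(mixed))  # 14.0 9.0
print(serial_ms(same), overlapped_ms(same))    # 14.0 14.0
```

In this model the pair with different bottlenecks drops from 14 ms to 9 ms, while the pair fighting over the same resource gains nothing. That is the whole effect in miniature; cache contention and scheduling costs, which the model ignores, can only make things worse.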

Dictator93 · Sep 3, 2015

Locuza said:
Sebastian Aaltonen from RedLynx about the little async compute test:

https://forum.beyond3d.com/posts/1869700/

So its task / benchmark is unrealistic as a real world use case. Hrm.

icecold1983 · Sep 3, 2015

Dictator93 said:
So its task / benchmark is unrealistic as a real world use case. Hrm.

For perf comparisons, yes. It's useful for determining whether async works or not.

Belmire · Sep 3, 2015

Excuse my ignorance, I am not as familiar with this subject material as many of the posters here.

One question: let's assume that Maxwell 2 can't do async the same way AMD can, or not at all, or even less of it. AMD currently does not support CR (Conservative Rasterization) or ROVs (Rasterizer Ordered Views). However, they were quoted saying that they can achieve the same results using other methods and that they have already demonstrated this in Dirt Rally.

Does this mean that Nvidia can take the same approach to solving its assumed async deficiency?

icecold1983 · Sep 3, 2015

Belmire said:
Excuse my ignorance, I am not as familiar with this subject material as many of the posters here.

One question: let's assume that Maxwell 2 can't do async the same way AMD can, or not at all, or even less of it. AMD currently does not support CR or ROVs. However, they were quoted saying that they can achieve the same results using other methods and that they have already demonstrated this in Dirt Rally.

Does this mean that Nvidia can take the same approach to solving its assumed async deficiency?

Nope. Async's purpose is to keep as much of the GPU saturated with work as possible.

Locuza · Sep 3, 2015

Belmire said:
One question: let's assume that Maxwell 2 can't do async the same way AMD can, or not at all, or even less of it. AMD currently does not support CR or ROVs. However, they were quoted saying that they can achieve the same results using other methods and that they have already demonstrated this in Dirt Rally.

Does this mean that Nvidia can take the same approach to solving its assumed async deficiency?

You can achieve similar things without CR or ROVs, but CR and ROVs exist because they directly target common problems and inefficiencies.
So alternatives will have drawbacks.
If Nvidia can't process async compute efficiently, then developers have to look at how to handle this problem.

Either look for a use case that doesn't hurt performance, use different processing paths per vendor, or don't use it at all.
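
For illustration, the "different processing paths per vendor" option could look something like the sketch below. The vendor IDs are the standard PCI ones; everything else (the function name, the path labels, and the idea that the developer keeps their own list of vendors where async pays off) is a hypothetical stand-in, not how any particular engine does it.

```python
# PCI vendor IDs (standard assigned values).
VENDOR_AMD = 0x1002
VENDOR_NVIDIA = 0x10DE
VENDOR_INTEL = 0x8086


def pick_compute_path(vendor_id, async_helps_on=frozenset({VENDOR_AMD})):
    """Return a label for the render path to use (labels are hypothetical).

    async_helps_on is an assumption the developer would fill in from
    their own profiling; it is not something the API reports.
    """
    if vendor_id in async_helps_on:
        return "async_compute"    # submit compute work on its own queue
    return "serial_compute"       # fold compute work into the graphics queue


for vid in (VENDOR_AMD, VENDOR_NVIDIA, VENDOR_INTEL):
    print(hex(vid), pick_compute_path(vid))
```

Keying on the vendor alone is the coarse version of this check. A finer one would key on the GPU family or on measured frame times, at the cost of more paths to test.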

Renekton · Sep 3, 2015

Given Nvidia's overwhelming marketshare, it's safe to say most devs won't support it regardless of use case benefits. Epic might drag its feet as well.

dogen · Sep 3, 2015

Renekton said:
Given Nvidia's overwhelming marketshare, it's safe to say most devs won't support it regardless of use case benefits. Epic might drag its feet as well.

They seem to be so far.

Locuza · Sep 3, 2015

On the other side, consoles dictate more or less how games are designed and which features they use.
Since many games will use it, the question is how developers will solve this on PC?
Globally deactivate it for every vendor? Making specific paths for each vendor?
Don't care about the other vendors?

Maybe we will see all three options later on.

icecold1983 · Sep 3, 2015

Renekton said:
Given Nvidia's overwhelming marketshare, it's safe to say most devs won't support it regardless of use case benefits. Epic might drag its feet as well.

async compute will see far more use than CR or ROV imo

Belmire · Sep 3, 2015

Locuza said:
You can achieve similar things without CR or ROVs, but CR and ROVs exist because they directly target common problems and inefficiencies.
So alternatives will have drawbacks.
If Nvidia can't process async compute efficiently, then developers have to look at how to handle this problem.

Either look for a use case that doesn't hurt performance, use different processing paths per vendor, or don't use it at all.

Thanks.

Vinland · Sep 3, 2015

Belmire said:
Thanks.

Look at it from the perspective of cellphone apps. If Oracle Java has a method in its API that does an array sort using extensions supported by the math hardware in modern Intel/AMD/SPARC/PowerPC processors, and the latest ARM processor has no equivalent feature, then Google with Android, and Oracle itself with embedded Java, have to find a way to get it done. They can try all sorts of ways, and if they come close, no one cares how it was done. If it is a common method call and it is super slow on ARM, then some devs may decide to do a platform check and call another implementation altogether that forgoes the feature in favor of the faster path. In many cases no one will even notice, as they don't have a point of reference for contrast. If you put the desktop version of the app side by side, you may notice some differences.

A lot of times the compiler makes these decisions for you, and sometimes you need to actively defend against it. That is why profilers and debuggers are really handy in the development studio of whatever SDK you are running.

Renekton · Sep 3, 2015

icecold1983 said:
async compute will see far more use than CR or ROV imo

Also I'm sure Frostbite will. Johan loves a challenge, he'd marry Cell if it was a woman.

(Please don't rule34 dragonball me :<)

hooijdonk17 · Sep 3, 2015

Renekton said:
Given Nvidia's overwhelming marketshare, it's safe to say most devs won't support it regardless of use case benefits. Epic might drag its feet as well.

Epic will drag its feet to enable a feature that current consoles can already take advantage of, and try to sell the engine to multi platform developers.

The world has run out of rationality.

Renekton · Sep 3, 2015

hooijdonk17 said:
Epic will drag its feet to enable a feature that current consoles can already take advantage of, and try to sell the engine to multi platform developers.

The world has run out of rationality.

For PC I mean D:

They got the X1 version running.

hooijdonk17 · Sep 3, 2015

Renekton said:
For PC I mean D:

They got the X1 version running.

Regardless of platform it will be the same engine; to think they will enable a feature on consoles only to cripple it for DX12 is beyond absurd.

foolishoptimist · Sep 3, 2015

Renekton said:
For PC I mean D:

They got the X1 version running.

I thought Lionhead did the work, that's why it's XB1 only.

frontieruk · Sep 3, 2015

foolishoptimist said:
I thought Lionhead did the work, that's why it's XB1 only.

It's getting back ported to the PC engine, but that will take time

foolishoptimist · Sep 3, 2015

frontieruk said:
It's getting back ported to the PC engine, but that will take time

My point was that so far Epic hasn't done anything to support this feature.

SeeNoWeevil · Sep 3, 2015

This is the only bench I can find of the Fury X vs the 980Ti. So why is everyone saying the 980Ti got outperformed?

frontieruk · Sep 3, 2015

SeeNoWeevil said:
This is the only bench I can find of the Fury X vs the 980Ti. So why is everyone saying the 980Ti got outperformed?

The Fury X performs the same as the 290X because it has the same architecture for async compute. The 7950 also puts up a good show.

The 290X was released almost 2 years ago, which is why there's a fuss: it's trading blows with a card that destroys it in DX11.

Dictator93 · Sep 3, 2015

SeeNoWeevil said:
This is the only bench I can find of the Fury X vs the 980Ti. So why is everyone saying the 980Ti got outperformed?

Where did you grab that graph from?

SeeNoWeevil · Sep 3, 2015

frontieruk said:
The Fury X performs the same as the 290X because it has the same architecture for async compute. The 7950 also puts up a good show.

The 290X was released almost 2 years ago, which is why there's a fuss: it's trading blows with a card that destroys it in DX11.

But it's not like people with 980Tis would swap for a 290x and have much worse DX11 performance over the next couple of years just so they could have very efficient DX12 in some future games, maybe. Or that the Fury X is now a better buy than the 980TI.

These new findings basically just mean, great DX11 performance is expensive!

Naminator · Sep 3, 2015

frontieruk said:
The Fury X performs the same as the 290X because it has the same architecture for async compute. The 7950 also puts up a good show.

The 290X was released almost 2 years ago, which is why there's a fuss: it's trading blows with a card that destroys it in DX11.

Fury X is Fury X.

Unless AMD has something better out right now, I don't see the point of trying to call it a 290X and pretending it came out 2 years ago.

SeeNoWeevil · Sep 3, 2015

Dictator93 said:
Where did you grab that graph from?

http://www.extremetech.com/gaming/2...he-singularity-amd-and-nvidia-go-head-to-head

KingSnake · Sep 3, 2015

SeeNoWeevil said:
This is the only bench I can find of the Fury X vs the 980Ti. So why is everyone saying the 980Ti got outperformed?

Going by the comments here I thought it was some kind of a bloodbath for Nvidia. Quick, take back your 980tis from ebay.

frontieruk · Sep 3, 2015

Naminator said:
Fury X is Fury X.

Unless AMD has something better out right now, I don't see the point of trying to call it a 290X and pretending it came out 2 years ago.

Except no one in here has actually been talking about the Fury X. It's all come about because the 290X outperforms the 980 Ti under DX12.

I think you're being a bit disingenuous by saying I pretended a Fury X was a 290X. I pointed out that, as the underlying architecture is the same, it's the older card that has caused the ruckus, not the new card. If you'd actually read the thread you'd see I've been one of the more level-headed commenters here, not actually taking a side, but go ahead and pull that fanboy shit out your ass. I've got nothing to hide; as I said, I've even advised people to keep recently bought NV cards, as it doesn't mean shit yet.

SeeNoWeevil said:
But it's not like people with 980Tis would swap for a 290x and have much worse DX11 performance over the next couple of years just so they could have very efficient DX12 in some future games, maybe. Or that the Fury X is now a better buy than the 980TI.

These new findings basically just mean, great DX11 performance is expensive!

Which has been pretty much the opinion of the level-headed commenters here. You'll even see me advise keeping recently purchased NV cards, as at the moment, apart from giving some geeks something to speculate about, it doesn't mean shit. Most games for the next year are going to support DX11.

dr. apocalipsis · Sep 3, 2015

frontieruk said:
Except no one in here has actually been talking about the Fury X. It's all come about because the 290X outperforms the 980 Ti under DX12.

SeeNoWeevil said:
And the Fury X?

icecold1983 said:
faster of course

Except some people actually did.

And then they keep trying to force AC as a key part of DX12 when it isn't even on the feature set of the API.

This is like saying Tessellation was a key feature of DX10 because many cards of the era were able to support it, but it wasn't a requisite of DX set until DX11.

Who knows, maybe Horse Armour is right and it becomes a requisite for DX13.

W!CK!D · Sep 3, 2015

SeeNoWeevil said:
So why is everyone saying the 980Ti got outperformed?

Fury X is different: devs have been working with GDDR5 for years and the code is optimized accordingly. HBM is a completely new approach to memory that'll most likely need different memory access patterns to unlock its full potential.

justsomeguy · Sep 3, 2015

Locuza said:
Sebastian Aaltonen from RedLynx about the little async compute test:

https://forum.beyond3d.com/posts/1869700/

"Biggest gains can be had with one or two additional queues (running work that hits different bottlenecks each). More queues will just cause problems (cache trashing, etc)."

Interesting - so maybe the XBox's 2 ACE units are fine after all, if I've understood correctly.

frontieruk · Sep 3, 2015

dr. apocalipsis said:
Except some people actually did.

And then they keep trying to force AC as a key part of DX12 when it isn't even on the feature set of the API.

This is like saying Tessellation was a key feature of DX10 because many cards of the era were able to support it, but it wasn't a requisite of DX set until DX11.

Who knows, maybe Horse Armour is right and it becomes a requisite for DX13.

Did I say it, though? He said I pretended a Fury X was a 290X; I pointed out that it's the two-year-old 290X causing the raised eyebrows.

As a side note, the graph is also the one that doesn't show the results where all the fuss started.

When 4xMSAA was enabled, the Fury pulled ahead. Which led to the whole NV saying the code was broken, yada yada.

Leading us to this thread.

W!CK!D · Sep 3, 2015

justsomeguy said:
Interesting - so maybe the XBox's 2 ACE units are fine after all, if I've understood correctly.

It's too early to judge the value of Sony's additional resources. They didn't pack 8 ACEs and 64 queues for no reason.

You have to consider that things like allocated resources and priorities for async shaders as well as communication between ACEs happen on driver level. It's impossible to predict what console devs will squeeze out of 8 ACEs manually.

dr. apocalipsis · Sep 3, 2015

W!CK!D said:
Devs are used to working with GDDR5 for years and the code is optimized accordingly.

Sweet lord...

W!CK!D said:
It's too early to judge the value of Sony's additional resources. They didn't pack 8 ACEs and 64 queues for no reason.

Actually, they are still overkill. They are just dispatchers and there is a reason for AMD doing their 64:4:1:1 ratio.

Arkanius · Sep 3, 2015

Beyond3D have gotten more results, and async compute in Maxwell 2 is "supported" through the driver offloading the compute calculations to the CPU and back, hence the huge added delay, and why it was faster for Oxide to disable it altogether for Nvidia.

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-21#post-1869774

Anyhow, sebbi says this:

This is not a performance (maximum throughput) benchmark. However it seems that less technically inclined people believe it is, because this thread is called "DX12 performance thread". This thread doesn't in any way imply that "asynchronous compute is broken in Maxwell 2", or that "Fiji (Fury X) is super slow compared to NVIDIA in DX12 compute". This benchmark is not directly relevant for DirectX 12 games. As some wise guy said at SIGGRAPH: graphics rendering is the killer-app for compute shaders. DX12 async compute will be mainly used by graphics rendering, and for this use case the CPU->GPU->CPU latency has zero relevance. All that matters is the total throughput with realistic shaders. Like hyperthreading, async compute throughput gains are highly dependent on the shaders you use. Test shaders that are not ALU / TMU / BW bound are not a good way to measure the performance (yes I know, this is not even supposed to be a performance benchmark, but it seems that some people think it is).

This benchmark has relevance for mixed tightly interleaved CPU<->GPU workloads. However it is important to realize that the current benchmark does not just measure async compute, it measures the whole GPU pipeline latency. The GPUs are good at hiding this latency internally, but are not designed to hide it to external observers (such as the CPU).

dr. apocalipsis · Sep 3, 2015

It will be interesting to see how some people here will try to discredit Sebastian now.

SimpleCRIPPLE · Sep 3, 2015

While still early, it's looking like AMD played the long game beautifully on this one. It's exciting that the GPU space is interesting for the first time in years, and I hope this brings AMD back in a big way and lights a fire under Nvidia's ass.

Everybody wins. I know it's a foreign concept for a lot of the internet, but it's possible.

Alexlf · Sep 3, 2015

Ouch! Nvidia better release some driver updates to allow async compute in-card or this is going to look reeeally bad. Not that it doesn't already.

dr_rus · Sep 3, 2015

So this graph (from the same B3D thread) kinda shows that Maxwell 2 does support async compute, but its implementation is far from ideal, as there are a lot of cases where running things serially may actually be faster:

Note the section with 2-31(ish) threads though - async compute is always faster than serial there. Couple this with what we know of the best example of async compute currently (and with some general knowledge of how this stuff works) and I would say that Maxwell 2 will handle async more or less fine in the first generation of DX12 titles (and it's not clear that we'll get the second one while this console gen is going).

There's also this graph:

Which kinda illustrates what I've said earlier about async making latencies less predictable and possibly leading to hitches in the graphics thread. Note as well that this is even worse on 1.2 Fiji.

This post is good at explaining some stuff as well:

Again, this "async compute" is not an API feature - it's not an optional capability that can be exposed to the API programmer. This is a WDDM driver/DXGK feature which can improve performance in GPU-bound scenarios. Developers would just use compute shaders for lighting and global illumination, and in AMD's implementation there are 2 to 8 ACE (asynchronous compute engine) blocks, which are dedicated command processors that completely bypass the rasterization/setup engine for compute-only tasks. In theory this means additional compute performance without stalling the main graphics pipeline.

Parallel execution is actually a built-in feature in Direct3D 12 - it's called "synchronization and multi-engine". There are three sets of functions for copy, compute and rendering, and these tasks can be parallelized by the runtime and driver when you have the right hardware. You just need to submit your compute shaders to the Direct3D runtime using the usual API calls, and on high-end AMD hardware with additional ACE blocks, you may use larger and more complex shaders and/or create additional command queues using multiple CPU threads. This will saturate the compute pipeline and you would still get fair performance gains compared to the traditional rendering path.
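
To make the three engine types a bit more concrete, here is a small model of which kind of work each queue type accepts. It is a sketch in plain Python, not the real API: in Direct3D 12 the corresponding command list types are `D3D12_COMMAND_LIST_TYPE_DIRECT`, `D3D12_COMMAND_LIST_TYPE_COMPUTE` and `D3D12_COMMAND_LIST_TYPE_COPY`, while the command names and helper functions below are made up for illustration.

```python
from enum import Enum


class QueueType(Enum):
    DIRECT = "direct"     # the graphics ("rendering") queue
    COMPUTE = "compute"
    COPY = "copy"


# Each queue type accepts a superset of the work the next one accepts.
ACCEPTS = {
    QueueType.DIRECT: {"draw", "dispatch", "copy"},
    QueueType.COMPUTE: {"dispatch", "copy"},
    QueueType.COPY: {"copy"},
}


def can_submit(queue, command):
    """True if this kind of command is allowed on this queue type."""
    return command in ACCEPTS[queue]


def narrowest_queue(command):
    """Pick the most specialized queue type that accepts the command."""
    for queue in (QueueType.COPY, QueueType.COMPUTE, QueueType.DIRECT):
        if can_submit(queue, command):
            return queue
    raise ValueError(f"unknown command: {command}")


print(narrowest_queue("dispatch"))             # QueueType.COMPUTE
print(can_submit(QueueType.COMPUTE, "draw"))   # False
```

The model only says where work may be submitted. Whether a compute queue actually runs alongside the graphics queue, or gets serialized behind it, is up to the driver and hardware, which is the point being argued in this thread.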

So when Oxide said they had to query hardware IDs for Nvidia cards then disable some features in the rendering path, it makes sense. When they talk about console developers getting 30% gains by using "async compute" - i.e. using compute shaders to accelerate lighting calculations in parallel to the main rendering stack - it makes sense as well.

But when Oxide says that the 900-series (Maxwell-2) don't have the required hardware but the Nvidia driver still exposes "async compute" capability, I don't think they can really tell this for sure, because this feature would be exposed through DXGK (DirectX Graphics Kernel) driver capability bits, and these are driver-level interfaces which are only visible to the DXGI and the Direct3D runtime, but not the API programmer (and the MSDN hardware developer documentation for WDDM 2.0 and DXGI 1.4 does not exist yet).

They are probably wrong on hardware support too, since Nvidia asserted to AnandTech that the 900-series have 32 scheduling blocks, of which 31 can be used for compute tasks.

So if Nvidia really asked Oxide to disable the parallel rendering path in their in-game benchmark, that has to be some driver problem rather than missing hardware support. Nvidia driver probably doesn't expose the "async" capabilities yet, so the Direct3D runtime cannot parallelize the compute tasks, or the driver is not fully optimized yet... not really sure, but it would take me quite enormous efforts to investigate even if I had full access to the source code.

serversurfer said:
I'm saying that referring to a non-granular system as merely having a "granularity difference" is generous and misleading. You're implying that NV's approach is somewhat granular, but it really isn't.

You can say whatever you want, but that won't make the granularity difference into something else. Having an on/off granularity (i.e. serial execution only) is still a granularity choice, which can be described as a coarser granularity, and Maxwell 2 to my knowledge has a finer granularity than that (i.e. it does support running compute threads in parallel with the graphics thread).

serversurfer said:
I refer to it as "broken" because NV refer to it as "fully compliant." Yes, it doesn't crash in response to the command, but the operations intended to improve performance instead degrade it. So I assume it's actually intended to deliver the claimed functionality, and generously refer to it as broken, yes. But you may be right too; maybe it was never intended to work correctly, and they were just misleading us when they said it would.

There is no implementation requirement for async compute in either WDDM 2.0 or DX12. You can support it in a serialized fashion, as a coarse-grained async pipeline, or as a finer-grained one. Note that GCN is the only h/w on the market right now which actually does support it in a fine-grained fashion.

serversurfer said:
Then I imagine you won't have any trouble providing us with some links.

Sure, if I stumble upon one I'll post it here, no problem.

serversurfer said:
Completely untrue. There are always unused resources, because not every processor is needed in every phase of the rendering pipeline. Try to keep up.

The amount of idle resources in a GPU is totally dependent on the workload this GPU is doing at the moment. Saying that there are always unused resources in a GPU is a plain lie.
What's more important to the question at hand is that the amount of idle resources in a GPU is completely dependent on that GPU's architecture. NV GPUs are known for their ability to achieve higher performance with fewer FLOPs / SPs and smaller die sizes than their GCN counterparts. They are able to do this because their architecture is made specifically to minimize the number of idle blocks per time slice, and to achieve that they try to extract more ILP per clock than their GCN counterparts.
This could mean that the reason NV didn't do the same level of TLP in Maxwell as AMD has in GCN is that they simply don't have as many idle resources in their GPUs, and going for a more efficient TLP would be a waste of effort: they wouldn't be able to run compute threads in parallel with the graphics one because utilization of the available resources has already peaked.
Are you keeping up?
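
The utilization argument above can be put into numbers with a deliberately crude model. It is a sketch under stated assumptions, not a measurement of any card: the graphics pass is assumed to keep a fixed fraction of the GPU busy, async compute is assumed to soak up exactly the idle remainder at no cost, and all figures are invented.

```python
def frame_ms(graphics_ms, utilization, compute_ms, async_on):
    """Toy frame time in milliseconds.

    compute_ms is the cost of the compute work when it has the whole GPU
    to itself; utilization is the fraction of the GPU that the graphics
    work keeps busy (0.0 to 1.0). Both are hypothetical inputs.
    """
    if not async_on:
        return graphics_ms + compute_ms
    # Idle capacity during the graphics pass, in "whole-GPU milliseconds".
    idle = (1.0 - utilization) * graphics_ms
    # Whatever does not fit into the idle capacity runs afterwards.
    leftover = max(0.0, compute_ms - idle)
    return graphics_ms + leftover


serial = frame_ms(10.0, 0.75, 2.0, async_on=False)       # 12.0
loose = frame_ms(10.0, 0.75, 2.0, async_on=True)         # 10.0
tight = frame_ms(10.0, 0.875, 2.0, async_on=True)        # 10.75
saturated = frame_ms(10.0, 1.0, 2.0, async_on=True)      # 12.0
print(serial, loose, tight, saturated)
```

Under these assumptions the gain from async shrinks as the graphics pass gets closer to saturating the GPU and disappears at full utilization. Whether any real architecture sits near that end of the scale is exactly what the model cannot tell you.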

serversurfer said:
So you claim they admitted to not getting a lot of performance out of the feature, despite his actual statement being that he got a noticeable improvement with only a modest amount of effort. When I call you out on completely misrepresenting what he said, your defense is, "No, he's the liar!!" ><

My defense is well stated above, but you seem to be unable to comprehend it, so I won't bother repeating myself.

serversurfer said:
It's a useful technique on any architecture that implements it correctly.

There is no "correct" implementation of TLP. Even the need for TLP is completely task dependent. It may well be that a "correct" implementation would be to not implement it at all.
How are you doing on keeping up with me?

serversurfer said:
This benchmark isn't designed to test actual performance; the GCN cards are dispatching jobs half-filled. This benchmark merely tests for the presence of fine-grained compute. The AMD cards pass that test, while the NV cards fail. We can't compare fine-grained performance because the current NV cards aren't capable of doing it at all.

Did that clear things up for you?

Things were rather clear for me from the start - we're discussing alpha software running on alpha drivers in a game made with AMD money to promote Mantle. And MDolenc's synthetic benchmark is actually showing a lot more stuff than you pretend it's showing. Performance on this particular task is a result as much as anything else, don't try to diminish it.

Arkanius said:
Beyond3D have gotten more results, and async compute in Maxwell 2 is "supported" through the driver offloading the compute calculations to the CPU and back, hence the huge added delay, and why it was faster for Oxide to disable it altogether for Nvidia.

https://forum.beyond3d.com/threads/dx12-async-compute-latency-thread.57188/page-21#post-1869774

Anyhow, sebbi says this:

Running any GPU workload on CPU for "emulation" (emulation of what and why would they even emulate this?) is a completely stupid idea all the time. I don't believe in this for a second. The CPU load is likely related to WDDM hitting the timeouts on Maxwell more than anything else.

AP90 · Sep 3, 2015

OK, I attempted to read and understand the tech lingo above on the PC/desktop end. I think I have a slight understanding now.

So what does this mean for current AMD laptop GPUs (R9 M200 series) in the future (the next 2 years)? Obviously the R9 M300 series, when released, will be a leap.

Secondly, does the ACE feature in the CPU/GPU setup for consoles potentially provide a boost in performance (Sony, MS and Nintendo), i.e. giving this gen a long stride like last gen?

Oxide: Nvidia GPU's do not support DX12 Asynchronous Compute/Shaders.