
AMD working closely with Microsoft on DX12, details Asynchronous Shading

From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

Excerpts from Tom's Hardware: http://www.tomshardware.com/news/amd-dx12-asynchronous-shaders-gcn,28844.html


In DirectX 12, however, a new merging method called Asynchronous Shaders is available, which is basically asynchronous multi-threaded graphics with pre-emption and prioritization. What happens here is that the ACEs (Asynchronous Compute Engines) on AMD's GCN-based GPUs will interleave the tasks, filling the gaps in one queue with tasks from another, kind of like merging onto a highway where nobody moves to the side for you.

The most basic GPUs have just two ACEs, while more elaborate GPUs carry eight.

3.PNG
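The "filling the gaps" idea in the excerpt can be sketched with a toy model (purely illustrative; the task names and timings are made up, and this is not how AMD's hardware scheduler actually works):

```python
# Toy model of interleaving two work queues: each task is (name, duration_ms);
# "stall" entries in the graphics queue are idle gaps that an async scheduler
# can fill with compute work instead of letting the GPU sit idle.

def serial_time(graphics, compute):
    """Run every task back to back: total time is the simple sum."""
    return sum(d for _, d in graphics) + sum(d for _, d in compute)

def interleaved_time(graphics, compute):
    """Overlap compute with graphics stalls; only leftover compute adds time."""
    stall_time = sum(d for name, d in graphics if name == "stall")
    compute_time = sum(d for _, d in compute)
    busy_time = sum(d for _, d in graphics)  # graphics queue still runs fully
    # Compute that fits inside the stalls is "free"; the rest extends the frame.
    return busy_time + max(0, compute_time - stall_time)

graphics = [("draw", 4), ("stall", 2), ("draw", 5), ("stall", 3), ("draw", 2)]
compute  = [("physics", 3), ("particles", 2)]

print(serial_time(graphics, compute))       # 21
print(interleaved_time(graphics, compute))  # 16: 5ms of compute fits in 5ms of stalls
```

Better utilization, not extra hardware: the frame only gets shorter to the extent that there were gaps to fill in the first place.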


Excerpt from Anandtech:
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading

Execution theory aside, what is the actual performance impact of asynchronous shaders? This is a bit harder of a question to answer at this time, though mostly because there’s virtually nothing on the PC capable of using async shaders due to the aforementioned API limitations. Thief, via its Mantle renderer, is the only PC game currently using async shaders, while on the PS4 and its homogeneous platform there are a few more titles making use of the tech.

Async_Games.png
 

Nikodemos

Member
Well, since they made Mantle an open standard, it was only a matter of time until other companies integrated their interpretations of it in their own APIs.
 

mrklaw

MrArseFace
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
 
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
The 290X is the only AMD card with 8; PS4's Liverpool also has 8. The 295X is sounding like a really good investment right about now, since it's a dual-GPU, single-PCIe-card setup.
 

wit3tyg3r

Member
I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.
 
I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.

I am currently running a 290x and was also contemplating jumping to team green, but the 390x may be the better buy. Only time will tell.
 
I thought the R9 290 had 8 ACEs as well. How many does it have?

That's a good question. I can't seem to find any documentation on how many ACEs it has. It most likely does have 8 since certain 290 cards can be flashed to 290x.


edit: As it turns out the 285x, 290, 290x all have 8 ACEs. The 295x technically has 16 ACEs because of its dual GPU, single card setup.
 
A quick reminder that these allow for better GPU utilization. Not EXTRA utilization.

With that being said... awww yis more performance.
 

wit3tyg3r

Member
I am currently running a 290x and was also contemplating jumping to team green, but the 390x may be the better buy. Only time will tell.

I've been an NVIDIA customer ever since I began building PCs (currently running a GTX 680). I've rarely had issues with them. However, over the past year or so, I've actually been considering jumping to Team Red. When it comes to pure performance, I know the NVIDIA cards have generally done better. But, the emerging issue has been company policies and business practices.

NVIDIA likes to license everything, making it more expensive for hardware manufacturers and game developers to use their technology. AMD seems to be more on board with Open Source initiatives and making sure that everyone has access to their tech, even NVIDIA. It makes AMD look like the "good guys" and I want to support them for that.

And this news with Asynchronous Shading tech makes me want to support Team Red even more!

I've been debating this issue with myself for several months and I'm honestly stuck. It's partly the reason why I didn't jump onto the 700 or 900 series when they released. I've been holding off to upgrade my 680 until I am set with either NVIDIA or AMD.
 

Kezen

Banned
Perhaps better compute capabilities can allow AMD to gain marketshare; I hope Nvidia will have something similar to combat them.
 
Perhaps better compute capabilities can allow AMD to gain marketshare; I hope Nvidia will have something similar to combat them.
I would think Nvidia has the same type of technology in the works, only with a different marketing term.

A quick reminder that these allow for better GPU utilization. Not EXTRA utilization.

With that being said... awww yis more performance.

Bringing console efficiency to the PC. Efficiency has always been lacking in the DX API.
 
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?

It's been rumored (confirmed?) that XB1 has two graphics command processors, vs. only one on pretty much all other GPUs.
 
I don't understand anything about this tech, but based on this drawing alone, I really hope its purpose is to resolve my traffic problems in Cities: Skylines.

https://www.youtube.com/watch?v=v3dUhep0rBs

It's been rumored (confirmed?) that XB1 has two graphics command processors, vs. only one on pretty much all other GPUs.

I believe that is the case. I read somewhere a while back that it's used to reduce CPU latency and to improve the GPU usage when switching back and forth between UI elements, snapping etc. But what do I know?
 

onQ123

Member
From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

Excerpts from Tom's Hardware: http://www.tomshardware.com/news/amd-dx12-asynchronous-shaders-gcn,28844.html






3.PNG


Excerpt from Anandtech:
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading



Async_Games.png



Me from 2 years ago on Beyond3D. They even use the cars in traffic as an example, like I did, lol.


https://forum.beyond3d.com/posts/1697620/


onQ said:
Hasn't it been said that the PS4 GPGPU has 8 Asynchronous Compute Engines instead of the 2 ACEs that the other AMD GCN cards have?


And with asynchronous compute, code can run on the same thread without having to wait for the other task to finish, as long as the tasks are not blocking each other. So graphics code that takes 16ms and compute code that takes 10ms can run at the same time on the same threads and take only 16ms to complete instead of 26ms.

This is what I'm getting from it. I could be wrong, but that's the way it seems to me after reading about asynchronous computing.


So even though it's not going to give you 2x the power for graphics, it can still run the graphics task and the compute task at the same time, because they just pass through each other instead of the slower car holding up traffic.


So you can use the full 1.84 TFLOPS for graphics and still run physics and other compute tasks on the GPGPU, as long as the tasks are not blocking one another.
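The arithmetic in the quoted post checks out under its own assumption of perfect overlap with no blocking; as a trivial sketch:

```python
def frame_time_serial(graphics_ms, compute_ms):
    # Without async compute, the tasks queue up one after another.
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms):
    # With perfect overlap and no contention, the longer task hides the shorter.
    return max(graphics_ms, compute_ms)

print(frame_time_serial(16, 10))  # 26
print(frame_time_async(16, 10))   # 16
```

In practice the two workloads contend for the same ALUs and bandwidth, so real frames land somewhere between the two numbers rather than at the ideal.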
 

Kinthalis

Banned
This is how compute was utilized in some PS4 games; a dev released a video or slide showing the compute tasks kind of running around the other 3D rendering work in the frame.

Didn't know that Nvidia didn't have a hardware feature for this though. Hmmm..
 

RoboPlato

I'd be in the dick
From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

One of the Q Games devs working on The Tomorrow Children had some great insight into it in the thread about their GDC talk. He said that it saved them 6ms of frame time in average conditions and 10ms in their stress tests. He also said that he thinks lots of multiplatform devs will use it as well, since it's relatively easy to use; you can just do more with it on PS4.
 
Yes, Q Games is utilizing async compute on PS4 for The Tomorrow Children. I believe they're using it primarily for the voxel cone tracing and their global illumination system.
 

Kinthalis

Banned
Yes, Q Games is utilizing async compute on PS4 for The Tomorrow Children. I believe they're using it primarily for the voxel cone tracing and their global illumination system.

Which is what made me think Nvidia had something similar, since their 900 series cards support voxel-based dynamic global illumination.
 

Locuza

Member
There was talk of needing a new card when DX12 was announced though?
You need a new card if you want every new feature.
DX12 will offer several different feature levels to support a wide range of GPUs.

It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
Sebbi from B3D says this:
Two additional compute queues (one high prio, one low) should be enough for most purposes. More might benefit some special cases, like running many simultaneous GPU accelerated middlewares that do not know about each other. High number of queues is more about convenience than performance, just like being able to run multiple software threads on a single CPU core (OS will time slice the CPU threads).
https://forum.beyond3d.com/threads/asynchronous-compute-what-are-the-benefits-was-ps4-async-compute-benefits.54891/page-10#post-1832253
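Sebbi's point (extra queues are a convenience, like an OS time-slicing many threads onto one core) can be illustrated with a toy two-priority scheduler; the task names are invented and this has nothing to do with actual GPU firmware:

```python
import heapq

# Toy "one high prio, one low" compute setup: lower number = higher priority.
HIGH, LOW = 0, 1

def run_order(tasks):
    """tasks: list of (priority, seq, name); returns the execution order."""
    heap = list(tasks)
    heapq.heapify(heap)
    return [name for _, _, name in (heapq.heappop(heap) for _ in range(len(heap)))]

tasks = [(LOW, 0, "light-probe update"), (HIGH, 1, "physics step"),
         (LOW, 2, "texture decompress"), (HIGH, 3, "skinning")]
print(run_order(tasks))
# → ['physics step', 'skinning', 'light-probe update', 'texture decompress']
```

Two priority levels already give the behavior that matters (urgent work jumps the line); more queues mostly help independent middlewares that don't know about each other, as the quote says.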

The 290X is the only AMD card with 8; PS4's Liverpool also has 8. The 295X is sounding like a really good investment right about now, since it's a dual-GPU, single-PCIe-card setup.
Kaveri, Tonga, Hawaii and the PS4 have 8 ACEs so far.
 

dr_rus

Member
Waiting patiently to buy a DX12 AMD GPU
All GCN GPUs should be DX12 compatible.
GCN1/2/3 GPUs (excluding the first gen "GCN0") should be able to handle DX12 FL12_0.
Fiji is likely to be the only discrete GPU from AMD which will be able to handle all DX12 features -- FL12.1+.

Well, since they made Mantle an open standard, it was only a matter of time until other companies integrated their interpretations of it in their own APIs.
Mantle and other APIs have nothing to do with asynchronous compute. It was available in NV's Kepler since the GK104 launch and in AMD's Tahiti since the 7970 launch.

Will my R9 280X be DX12 compatible?
Yes. But it won't support all features of DX12.

My understanding is that any GCN 1.0 or 1.1 GPU will be fully compatible with DX12.
No currently available AMD GPU will be "fully" DX12 compatible.

Which is why AMD is doing their best to make sure all of their learnings from Mantle are incorporated into DX12 and Vulkan.
Mantle and AMD have nothing to do with asynchronous compute.

It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature, but how much performance can actually be gained from asynchronous compute in games is up for discussion.

I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.
Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

Looking forward to a DX12 GPU from either AMD or Nvidia.
You can stop looking: all Maxwell 2 GPUs are fully compatible with the highest DX12 feature level.

Perhaps better compute capabilities can allow AMD to gain marketshare, I hope Nvidia will have something similar to combat them.
Better compute capabilities are known to be a reason for a loss of gaming GPU marketshare. This is why gaming Keplers were cut down and why some compute features are cut from Maxwell 2 as well. Then again if the number of active queues is an indication of better compute capabilities then Maxwell 2 is four times ahead of Hawaii here actually.
 
Mantle and AMD have nothing to do with asynchronous compute.
They didn't create async compute, but they created Mantle, which some would say is the precursor to DX12 and one step closer to bringing async compute to the mainstream. What other PC games do you know that use async compute?

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

64 queues? You may be right on this one, but I do recall an article where the ACEs are described as having multiple queues.

Edit:

Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1
http://www.eteknix.com/playstation-4-amd-radeon-r9-290x-gpu-share-8-asynchronous-compute-engines/
 

R_Deckard

Member
No currently available AMD GPU will be "fully" DX12 compatible.

As is any Nvidia GPU, but AMD cards support a lot more of DX12.

Mantle and AMD have nothing to do with asynchronous compute.

No, but the entire range of cards is much better designed and suited for this; the recent DX12 tests demonstrate this gap between AMD and Nvidia GPUs.

The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature but as for how much performance can actually be gained from asynchronous compute in games is up for discussion.

No it really is not; the figures from Q Games on the PS4 show that this alone saved them upwards of 20% in frame time, and that was not maxed out or filled.

It really will add (and already has added) real-world benefit; there is no mystery here.

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.
But the latest AMD has double the queue limit here.

You can stop looking: all Maxwell 2 GPUs are fully compatible with the highest DX12 feature level.

No they are not (at least from my information at this point); they are lacking in ROV, even Tier 2 of tiling, etc.

Better compute capabilities are known to be a reason for a loss of gaming GPU marketshare. This is why gaming Keplers were cut down and why some compute features are cut from Maxwell 2 as well. Then again if the number of active queues is an indication of better compute capabilities then Maxwell 2 is four times ahead of Hawaii here actually.
But see above, it is not, as it has fewer queues to handle, so at best it will be half as good, with slower and lesser bandwidth.
 
You are going to have to explain the math here.

Also... is async compute the same as dynamic parallelism, introduced with Kepler (and expanded upon with Maxwell)?

I was typing too fast and edited my comment; it's 8x8 for a total of 64. Dynamic Parallelism sounds like it would fit the bill; whatever it's called, it will be a total marketing term. "Blast Processing 2.0"
 

Locuza

Member
GCN1/2/3 GPUs (excluding the first gen "GCN0") should be able to handle DX12 FL12_0.
What is GCN 0?

Mantle and other APIs have nothing to do with asynchronous compute. It was available in NV's Kepler since the GK104 launch and in AMD's Tahiti since the 7970 launch.
They have something to do with it, since the API must have the capability for the developer to execute several queues; with vanilla OpenGL and DX11 this is not possible.
And the GK110 was the first Nvidia GPU which could handle different queues.

The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature but as for how much performance can actually be gained from asynchronous compute in games is up for discussion.
The Tomorrow Children from Q-Games uses 3 compute queues for the voxel lighting system.
I think they said it gave them around 20% performance.

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.
GCN Gen 2 should be able to dispatch 64 queues:
http://abload.de/img/78767566nug1.jpg
 

dr_rus

Member
They didn't create async compute, but they created Mantle which some would say is the precursor to DX12, and one step closer to bringing async compute to the mainstream. What other PC games do you know uses async compute?
Mantle is as much of a precursor to DX12 as a PS2 API is. The only thing Mantle did for DX12 is kick MS in their lazy ass and make them finish the DX12 work, which was started some time ago, faster.

Also - asynchronous compute has nothing to do with APIs. It's a hardware feature, APIs are built to allow access to hardware, not the other way around.


8 ACEs in GCN is the same as 32 compute queues in Maxwell 2. Each ACE/queue can have a number of threads running on it as usual - this has been done for I don't know how long, so that new threads can be launched while old threads are stalled waiting for data from memory. This is pretty much explicitly said in the Anandtech article you've linked to.

As any Nvidia GPU, but AMD cards support a lot more of DX12
Nope. Maxwell 2 is the only architecture which supports "a lot more of DX12" at the moment.

No but the entire range of cards are much better designed and suited for this, the recent DX12 tests demonstrate this issue from AMD to Nvidia GPU
If by recent DX12 tests you mean the Futuremark test, then that is a pure synthetic which is unlikely to reflect real games, and the tests are run on beta software. There are several tests which show the exact opposite right now, so it is hardly evidence of AMD's cards being "much better designed and suited for this".

No it really is not, the facts from Q games on the PS4 show that this alone save them upwards of 20% in frame time and this was not maximised or filled.

It really will (and has) added real world benefit already their is no mystic here.
PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.

But the latest AMD has double the que limit here.
No it's not.

UJRXEj6.png


No they are not (at least from my information at this point) they are lacking in ROV, Tier 2 even of Tiling etc
Yes they are.

DKiYoGI.jpg


But see above it is not as it has less queues to handle so at best it will be half as good with slower and lesser bandwidth.
It actually has four times more active queues. As for how many threads are in flight on these queues - that's a different question, but a direct comparison here is unlikely to provide any insight into efficiency gains, due to significant differences in architectures.

What is GCN 0?
GCN 1.0. I thought I'd explained this.

They have something to do with it, since the API must have the capilitiy for the developer to execute serveral queues, with Vanilla OGL und DX11 this is not possible.
And the GK110 was the first Nvidia GPU which could handle different queues.
GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.

The Tomorrow Children from Q-Games uses 3 compute queues for the voxel lighting system.
I think they said it gave them around 20% performance.
Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.

GCN Gen 2 should be able to dispatch 64 queues:
http://abload.de/img/78767566nug1.jpg
There's a bit of a terminology mismatch, since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls an ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.
 

Locuza

Member
PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.
Now this is kind of crazy talk; the underlying GPU technology is the same, and so are the benefits in principle.

No it's not.

UJRXEj6.png
This chart from Anandtech is misleading, especially when looking at the context.

GCN Gen 1 / Southern Islands / IP v6 has two ACEs, each can dispatch one queue.
GCN Gen 2 / Sea Islands / IP v7 can dispatch up to 8 queues per ACE.

http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf

Page 13:
Important differences between S.I. and C.I. GPUs

• Multi queue compute

Lets multiple user-level queues of compute workloads be bound to the device
and processed simultaneous. Hardware supports up to eight compute
pipelines with up to eight queues bound to each pipeline.

Bonaire can handle up to 16 queues, in contrast to the first GCN Gen 1 product stack with 77xx/78xx/79xx which is limited to 2.

So simply writing "two compute" in the table is insufficient.

(The table was first claiming 1 + 1 and 1 + 7 in mixed mode for GCN)
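The queue counts being argued over reduce to simple multiplication; a quick sketch using the figures as stated in this thread (treat them as the posters' claims, not vendor documentation):

```python
# ACEs x queues-per-ACE, per the numbers quoted in this thread.
gpus = {
    "GCN Gen 1 (Tahiti, 77xx/78xx/79xx)": (2, 1),  # 2 ACEs, 1 queue each
    "Bonaire (GCN Gen 2)":                (2, 8),  # 2 ACEs, 8 queues each
    "Hawaii / PS4 (GCN Gen 2)":           (8, 8),  # 8 ACEs, 8 queues each
}

for name, (aces, per_ace) in gpus.items():
    print(f"{name}: {aces * per_ace} compute queues")
# Hawaii / PS4: 8 * 8 = 64 compute queues
```

This is exactly why "two compute" in a per-ACE vs. per-queue table is ambiguous: counting engines gives 8, counting dispatchable queues gives 64.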

GCN 1.0. I thought I've explained this.
I don't see where, but I find it more likely that Fiji will have the same IP version as Tonga:
GCN Gen 3 in the end.

GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.
Any source on the capability of the GK104 chip to handle two compute queues?

Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.
I see nothing really special about it.
GCN Gen 2 sits on a shared memory bus, with access to 176 GB/s.
The CPU takes its slice, then you have some problems with memory contention, and the effective bandwidth for the PS4 GPU is under 140 GB/s; but from a raw power perspective, the 7850/7870 are in the same ballpark, featuring 154 GB/s.

The GPU doesn't need to be starved by memory access; bubbles in the work queue and dependencies can always occur.
For GCN, async compute should in principle be needed to utilize most of the ALUs.
And with bigger GPUs I only see more bubbles and a greater need for async compute.

There's a bit of terminology mismatch since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.
You can call it a queue processor that can dispatch up to 8 queues,
giving 64 different compute queues in total when you build in 8 ACEs.

If I look at Hyper-Q, then Nvidia can handle 32 compute queues with GK110.
http://www.pcgameshardware.de/screenshots/1280x1024/2012/05/gtc_2012_gk110_architecture_details_1.jpg
And if Anandtech got the right info from Nvidia, Maxwell Gen 2 offers the same 32 queues.
 

BladeSinner

Neo Member
Interesting; now all we need is info on the 3xx series. I am thinking of getting a Titan X, but something tells me I should hold out for either the 3xx series or the next generation of Nvidia cards. Definitely exciting times ahead for both AMD and Nvidia cards with DX12 coming up.
 

rambis

Banned
Working "very closely" indeed.



Async Shading isn't AMD-specific, but the more features the better!
Looks like we've found some of AMD's contributions to DX12. It's not uncommon for IHVs to provide architecture features to these API working groups.
 

joshcryer

it's ok, you're all right now
I think this is part of the longer-term HSA roadmap. AMD are innovating. I think there will be a shift eventually. Not with Carrizo, but something after, from what they are learning.
 

R_Deckard

Member
8 ACEs in GCN is the same as 32 compute queues in Maxwell 2. Each ACE/queue can have a number of threads running on it as usual - this was done since I don't know how long to be able to launch new threads while old threads are stalled waiting for data from memory. This is pretty much explicitly said in Anandtech article you've linked to.

No it's not; 8 ACEs = 64 queues of work that can be re-ordered within cycles, double what Nvidia has with the Maxwell 2 cards.

On top of this, all pre-Maxwell 2 cards can only serialize graphics or compute, not run them in parallel:

On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once – early implementations of HyperQ cannot be used in conjunction with graphics. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage..

http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading



Nope. Maxwell 2 is the only architecture which supports "a lot more of DX12" at the moment.

Yes, only the latest cards; as I said, all AMD cards from the past 5+ years support far more of DX12 than an Nvidia card.

If by recent DX12 tests you mean the Futuremark test then it is pure synthetics which is unlikely to show up in real games and the tests are run on the beta software. There are several tests which show the exact opposite right now so it is hardly an evidence of AMD's cards being "much better designed and suited for this".


PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.

What on earth is this? A GPU is a GPU; if the jobs are all render-bound, and all resources reside on the GPU and need no syncing with the CPU, it will work and improve on any identical resource. PS4 and X1 are (at least) 11.2 GCN cards; it DOES improve performance.
No it's not.
UJRXEj6.png

That list is wrong, and you seem to be coming from a point of little actual knowledge; the GCN numbers need to be increased by a factor of 8. 8x8 = 64, which is twice the amount of the Maxwell 2 card (which is the only one that can mix graphics and compute contexts, hence the gulf in performance on the 770-780, etc.).



Yes they are.

DKiYoGI.jpg



It actually has four times more active queues. As for how many threads are in flight on these queues - that's a different question, but a direct comparison here is unlikely to provide any insight into efficiency gains, due to significant differences in architectures.

Again, the list clearly shows far more support on AMD cards than Nvidia ones.

GCN 1.0. I thought I've explained this.


GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.


Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.


There's a bit of terminology mismatch since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.

You are playing semantics here and talking yourself in circles. A PS4 GPU is like any GCN card; the benefits are transferable and equally viable on a PC, so long as you are using async for render work. Only if you throw in compute work for the CPU, where a sync is needed to pass results back, is the split memory an issue on PC; that would be far less successful than on PS4 or X1.
 
You are going to have to explain the math here.

Also... is async compute the same as dynamic parallelism, introduced with Kepler (and expanded upon with Maxwell)?

No, dynamic parallelism basically allows the GPU to schedule work for itself. It's not for gaming.

EDIT: Also, this argument about the number of queues is pointless. GCN2 and higher do have 8 queues per ACE, but it's not like that many are necessary or even useful. More isn't always better; you only need enough for 100% shader utilization, which is the end goal here. I don't know how many it would take, but it's certainly not 64 or 32!
 

Kezen

Banned
I wonder how this is going to affect multiplatform games; will it tip the balance in AMD's favor? We shall see.

Will it also inflate the specs for a console-matching experience?

So many questions.
 

Marlenus

Member
No currently available AMD GPU will be "fully" DX12 compatible.

Technically GCN 1.1 and 1.2 support FL 12_0 completely but Maxwell 1 and lower do not.

Maxwell 2 supports FL 12_1 and I expect the next GCN iteration to do the same.

CA2AIhD.jpg



Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

Again, each ACE in GCN 1.1 and 1.2 supports 8 queues, so current GCN has 64 queues when paired with 8 ACE units.

EDIT: Looks like I was beaten to all of this.

Mantle is as much of a precursor to DX12 as an API of PS2 is. The only thing that Mantle did to Dx12 is kicked MS in their lazy ass and made them finish the DX12 work which was started some time ago faster.

If that is the case, why is the documentation for both almost identical?

CBBu9COWwAAPzZB.jpg:large
 