
AMD working closely with Microsoft on DX12, details Asynchronous Shading

From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

Excerpts from Tom's Hardware: http://www.tomshardware.com/news/amd-dx12-asynchronous-shaders-gcn,28844.html


In DirectX 12, however, a new merging method called Asynchronous Shaders is available, which is basically asynchronous multi-threaded graphics with pre-emption and prioritization. What happens here is that the ACEs (Asynchronous Compute Engines) on AMD's GCN-based GPUs will interleave the tasks, filling the gaps in one queue with tasks from another, kind of like merging onto a highway where nobody moves to the side for you.

The most basic GPUs have just two ACEs, while more elaborate GPUs carry eight.

3.PNG
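The "filling the gaps" idea in the excerpt can be sketched with a toy model (purely illustrative; the task names and timings are made up, and this is not how AMD's hardware scheduler actually works):

```python
# Toy model of interleaving two work queues: each task is (name, duration_ms);
# "stall" entries in the graphics queue are idle gaps that an async scheduler
# can fill with compute work instead of letting the GPU sit idle.

def serial_time(graphics, compute):
    """Run every task back to back: total time is the simple sum."""
    return sum(d for _, d in graphics) + sum(d for _, d in compute)

def interleaved_time(graphics, compute):
    """Overlap compute with graphics stalls; only leftover compute adds time."""
    stall_time = sum(d for name, d in graphics if name == "stall")
    compute_time = sum(d for _, d in compute)
    busy_time = sum(d for _, d in graphics)  # graphics queue still runs fully
    # Compute that fits inside the stalls is "free"; the rest extends the frame.
    return busy_time + max(0, compute_time - stall_time)

graphics = [("draw", 4), ("stall", 2), ("draw", 5), ("stall", 3), ("draw", 2)]
compute  = [("physics", 3), ("particles", 2)]

print(serial_time(graphics, compute))       # 21
print(interleaved_time(graphics, compute))  # 16: 5ms of compute fits in 5ms of stalls
```

Better utilization, not extra hardware: the frame only gets shorter to the extent that there were gaps to fill in the first place.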


Excerpt from Anandtech:
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading

Execution theory aside, what is the actual performance impact of asynchronous shaders? This is a bit harder of a question to answer at this time, though mostly because there’s virtually nothing on the PC capable of using async shaders due to the aforementioned API limitations. Thief, via its Mantle renderer, is the only PC game currently using async shaders, while on the PS4 and its homogeneous platform there are a few more titles making use of the tech.

Async_Games.png
 

Nikodemos

Member
Well, since they made Mantle an open standard, it was only a matter of time until other companies integrated their interpretations of it in their own APIs.
 

mrklaw

MrArseFace
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
 
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
The 290X is the only AMD card with 8; PS4's Liverpool also has 8. The 295X is sounding like a really good investment right about now, since it's a dual-GPU, single-PCIe-card setup.
 

wit3tyg3r

Member
I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.
 
I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.

I am currently running a 290x and was also contemplating jumping to team green, but the 390x may be the better buy. Only time will tell.
 
I thought the R9 290 had 8 ACEs as well. How many does it have?

That's a good question. I can't seem to find any documentation on how many ACEs it has. It most likely does have 8 since certain 290 cards can be flashed to 290x.


edit: As it turns out the 285x, 290, 290x all have 8 ACEs. The 295x technically has 16 ACEs because of its dual GPU, single card setup.
 
A quick reminder that these allow for better GPU utilization. Not EXTRA utilization.

With that being said... awww yis more performance.
 

wit3tyg3r

Member
I am currently running a 290x and was also contemplating jumping to team green, but the 390x may be the better buy. Only time will tell.

I've been an NVIDIA customer ever since I began building PCs (currently running a GTX 680). I've rarely had issues with them. However, over the past year or so, I've actually been considering jumping to Team Red. When it comes to pure performance, I know the NVIDIA cards have generally done better. But, the emerging issue has been company policies and business practices.

NVIDIA likes to license everything, making it more expensive for hardware manufacturers and game developers to use their technology. AMD seems to be more on board with Open Source initiatives and making sure that everyone has access to their tech, even NVIDIA. It makes AMD look like the "good guys" and I want to support them for that.

And this news with Asynchronous Shading tech makes me want to support Team Red even more!

I've been debating this issue with myself for several months and I'm honestly stuck. It's partly the reason why I didn't jump onto the 700 or 900 series when they released. I've been holding off to upgrade my 680 until I am set with either NVIDIA or AMD.
 

Kezen

Banned
Perhaps better compute capabilities can allow AMD to gain marketshare; I hope Nvidia will have something similar to combat them.
 
Perhaps better compute capabilities can allow AMD to gain marketshare; I hope Nvidia will have something similar to combat them.
I would think Nvidia has the same type of technology in the works, only with a different marketing term.

A quick reminder that these allow for better GPU utilization. Not EXTRA utilization.

With that being said... awww yis more performance.

Bringing console efficiency to the PC. Efficiency has always been lacking in the DX API.
 
It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?

It's been rumored (confirmed?) that XB1 has two graphics command processors, vs. only one on pretty much all other GPUs.
 
I don't understand anything about this tech, but based on this drawing alone, I really hope its purpose is to resolve my traffic problems in Cities: Skylines.

https://www.youtube.com/watch?v=v3dUhep0rBs

It's been rumored (confirmed?) that XB1 has two graphics command processors, vs. only one on pretty much all other GPUs.

I believe that is the case. I read somewhere a while back that it's used to reduce CPU latency and to improve the GPU usage when switching back and forth between UI elements, snapping etc. But what do I know?
 

onQ123

Member
From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

Excerpts from Tom's Hardware: http://www.tomshardware.com/news/amd-dx12-asynchronous-shaders-gcn,28844.html






3.PNG


Excerpt from Anandtech:
http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading



Async_Games.png



Me from 2 years ago on Beyond3D. They even use the cars in traffic as an example, like I did, lol.


https://forum.beyond3d.com/posts/1697620/


onQ said:
Hasn't it been said that the PS4 GPGPU has 8 Asynchronous Compute Engines instead of the 2 ACEs that the other AMD GCN cards have?


And with asynchronous compute, code can run on the same thread without having to wait for the other task to finish, as long as the tasks are not blocking each other. So graphics code that takes 16ms and compute code that takes 10ms can run at the same time on the same threads and take only 16ms to complete instead of 26ms.

This is what I'm getting from it. I could be wrong, but that's the way it seems to me after reading about asynchronous computing.


So even though it's not going to give you 2x the power for graphics, it can still run the graphics task and the compute task at the same time, because they just pass through each other instead of the slower car holding up traffic.


So you can use the full 1.84 TFLOPS for graphics and still run physics and other compute tasks on the GPGPU, as long as the tasks are not blocking one another.
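The arithmetic in the quoted post checks out under its own assumption of perfect overlap with no blocking; as a trivial sketch:

```python
def frame_time_serial(graphics_ms, compute_ms):
    # Without async compute, the tasks queue up one after another.
    return graphics_ms + compute_ms

def frame_time_async(graphics_ms, compute_ms):
    # With perfect overlap and no contention, the longer task hides the shorter.
    return max(graphics_ms, compute_ms)

print(frame_time_serial(16, 10))  # 26
print(frame_time_async(16, 10))   # 16
```

In practice the two workloads contend for the same ALUs and bandwidth, so real frames land somewhere between the two numbers rather than at the ideal.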
 

Kinthalis

Banned
This is how compute was utilized in some PS4 games; a dev released a video or slide showing the compute tasks kind of running around the other 3D rendering work in the frame.

Didn't know that Nvidia didn't have a hardware feature for this though. Hmmm..
 

RoboPlato

I'd be in the dick
From what I've read, asynchronous shaders will be one of DX12's more notable features, which is something we're already seeing in PS4 games. Now that future PC games should be making use of this tech, will we see more of an emphasis on async compute in multiplatform games as well? Will the inclusion of 8 ACEs in Sony's Liverpool SoC help keep the PS4 relevant as it begins to age?

One of the Q Games devs working on The Tomorrow Children had some great insight into it in the thread about their GDC talk. He said that it saved them 6ms of frame time in average conditions and 10ms in their stress tests. He also said that he thinks lots of multiplatform devs will use it as well, since it's relatively easy to use; you can just do more with it on PS4.
 
Yes, Q Games is utilizing async compute on PS4 for The Tomorrow Children. I believe they're using it primarily for the voxel cone tracing and their global illumination system.
 

Kinthalis

Banned
Yes, Q Games is utilizing async compute on PS4 for The Tomorrow Children. I believe they're using it primarily for the voxel cone tracing and their global illumination system.

Which is what made me think Nvidia had something similar, since their 900 series cards support voxel-based dynamic global illumination.
 

Locuza

Member
There was talk of needing a new card when DX12 was announced though?
You need a new card if you want every new feature.
DX12 will offer several different feature levels to support a wide range of GPUs.

It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
Sebbi from B3D says this:
Two additional compute queues (one high prio, one low) should be enough for most purposes. More might benefit some special cases, like running many simultaneous GPU accelerated middlewares that do not know about each other. High number of queues is more about convenience than performance, just like being able to run multiple software threads on a single CPU core (OS will time slice the CPU threads).
https://forum.beyond3d.com/threads/asynchronous-compute-what-are-the-benefits-was-ps4-async-compute-benefits.54891/page-10#post-1832253
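Sebbi's point (extra queues are a convenience, like an OS time-slicing many threads onto one core) can be illustrated with a toy two-priority scheduler; the task names are invented and this has nothing to do with actual GPU firmware:

```python
import heapq

# Toy "one high prio, one low" compute setup: lower number = higher priority.
HIGH, LOW = 0, 1

def run_order(tasks):
    """tasks: list of (priority, seq, name); returns the execution order."""
    heap = list(tasks)
    heapq.heapify(heap)
    return [name for _, _, name in (heapq.heappop(heap) for _ in range(len(heap)))]

tasks = [(LOW, 0, "light-probe update"), (HIGH, 1, "physics step"),
         (LOW, 2, "texture decompress"), (HIGH, 3, "skinning")]
print(run_order(tasks))
# → ['physics step', 'skinning', 'light-probe update', 'texture decompress']
```

Two priority levels already give the behavior that matters (urgent work jumps the line); more queues mostly help independent middlewares that don't know about each other, as the quote says.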

The 290X is the only AMD card with 8; PS4's Liverpool also has 8. The 295X is sounding like a really good investment right about now, since it's a dual-GPU, single-PCIe-card setup.
Kaveri, Tonga, Hawaii and the PS4 have 8 ACEs so far.
 

dr_rus

Member
Waiting patiently to buy a DX12 AMD GPU
All GCN GPUs should be DX12 compatible.
GCN1/2/3 GPUs (excluding the first gen "GCN0") should be able to handle DX12 FL12_0.
Fiji is likely to be the only discrete GPU from AMD which will be able to handle all DX12 features -- FL12.1+.

Well, since they made Mantle an open standard, it was only a matter of time until other companies integrated their interpretations of it in their own APIs.
Mantle and other APIs have nothing to do with asynchronous compute. It was available in NV's Kepler since the GK104 launch and in AMD's Tahiti since the 7970 launch.

Will my R9 280X be DX12 compatible?
Yes. But it won't support all features of DX12.

My understanding is that any GCN 1.0 or 1.1 GPU will be fully compatible with DX12.
No currently available AMD GPU will be "fully" DX12 compatible.

Which is why AMD is doing their best to make sure all of their learnings from Mantle are incorporated into DX12 and Vulkan.
Mantle and AMD have nothing to do with asynchronous compute.

It sounds interesting, but only having two ACEs might limit the efficiency on xbox one and most GCN GPUs. Which ones have 8 on the PC?
The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature, but how much performance can actually be gained from asynchronous compute in games is up for discussion.

I've been considering getting the GTX 980 Ti, if that ever gets announced. However, this may push me towards waiting for the R9 390X. I'm still curious to see how they will compare in raw performance.
Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

Looking forward to a DX12 GPU from either AMD or Nvidia.
You can stop looking: all Maxwell 2 GPUs are fully compatible with the highest DX12 feature level.

Perhaps better compute capabilities can allow AMD to gain marketshare, I hope Nvidia will have something similar to combat them.
Better compute capabilities are known to be a reason for a loss of gaming GPU marketshare. This is why gaming Keplers were cut down and why some compute features are cut from Maxwell 2 as well. Then again if the number of active queues is an indication of better compute capabilities then Maxwell 2 is four times ahead of Hawaii here actually.
 
Mantle and AMD have nothing to do with asynchronous compute.
They didn't create async compute, but they created Mantle, which some would say is the precursor to DX12 and one step closer to bringing async compute to the mainstream. What other PC games do you know that use async compute?

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

64 queues? You may be right on this one, but I do recall an article where the ACEs are described as having multiple queues.

Edit:

Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?print=1
http://www.eteknix.com/playstation-4-amd-radeon-r9-290x-gpu-share-8-asynchronous-compute-engines/
 

R_Deckard

Member
No currently available AMD GPU will be "fully" DX12 compatible.

As is any Nvidia GPU, but AMD cards support a lot more of DX12.

Mantle and AMD have nothing to do with asynchronous compute.

No, but the entire range of cards is much better designed and suited for this; the recent DX12 tests demonstrate this gap between AMD and Nvidia GPUs.

The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature but as for how much performance can actually be gained from asynchronous compute in games is up for discussion.

No it really is not; the figures from Q Games on the PS4 show that this alone saved them upwards of 20% in frame time, and that was not maxed out or filled.

It really will add (and already has added) real-world benefit; there is no mystery here.

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.
But the latest AMD has double the queue limit here.

You can stop looking: all Maxwell 2 GPUs are fully compatible with the highest DX12 feature level.

No they are not (at least from my information at this point); they are lacking in ROV, even Tier 2 of tiling, etc.

Better compute capabilities are known to be a reason for a loss of gaming GPU marketshare. This is why gaming Keplers were cut down and why some compute features are cut from Maxwell 2 as well. Then again if the number of active queues is an indication of better compute capabilities then Maxwell 2 is four times ahead of Hawaii here actually.
But see above, it is not, as it has fewer queues to handle, so at best it will be half as good, with slower and lesser bandwidth.
 
You are going to have to explain the math here.

Also... is async compute the same as dynamic parallelism, introduced with Kepler (and expanded upon with Maxwell)?

I was typing too fast and edited my comment; it's 8x8 for a total of 64. Dynamic Parallelism sounds like it would fit the bill; whatever it's called, it will be a total marketing term. "Blast Processing 2.0"
 

Locuza

Member
GCN1/2/3 GPUs (excluding the first gen "GCN0") should be able to handle DX12 FL12_0.
What is GCN 0?

Mantle and other APIs have nothing to do with asynchronous compute. It was available in NV's Kepler since the GK104 launch and in AMD's Tahiti since the 7970 launch.
They have something to do with it, since the API must have the capability for the developer to execute several queues; with vanilla OpenGL and DX11 this is not possible.
And the GK110 was the first Nvidia GPU which could handle different queues.

The efficiency of asynchronous compute in graphics is a moot point. It certainly is nice to have the feature but as for how much performance can actually be gained from asynchronous compute in games is up for discussion.
The Tomorrow Children from Q-Games uses 3 compute queues for the voxel lighting system.
I think they said it gave them around 20% performance.

Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.
GCN Gen 2 should be able to dispatch 64 queues:
http://abload.de/img/78767566nug1.jpg
 

dr_rus

Member
They didn't create async compute, but they created Mantle which some would say is the precursor to DX12, and one step closer to bringing async compute to the mainstream. What other PC games do you know uses async compute?
Mantle is as much of a precursor to DX12 as a PS2 API is. The only thing Mantle did for DX12 is kick MS in their lazy ass and make them finish the DX12 work, which was started some time ago, faster.

Also - asynchronous compute has nothing to do with APIs. It's a hardware feature, APIs are built to allow access to hardware, not the other way around.


8 ACEs in GCN is the same as 32 compute queues in Maxwell 2. Each ACE/queue can have a number of threads running on it as usual - this has been done for I don't know how long, so that new threads can be launched while old threads are stalled waiting for data from memory. This is pretty much explicitly said in the Anandtech article you've linked to.

As any Nvidia GPU, but AMD cards support a lot more of DX12
Nope. Maxwell 2 is the only architecture which supports "a lot more of DX12" at the moment.

No but the entire range of cards are much better designed and suited for this, the recent DX12 tests demonstrate this issue from AMD to Nvidia GPU
If by recent DX12 tests you mean the Futuremark test, then that is a pure synthetic which is unlikely to reflect real games, and the tests are run on beta software. There are several tests which show the exact opposite right now, so it is hardly evidence of AMD's cards being "much better designed and suited for this".

No it really is not, the facts from Q games on the PS4 show that this alone save them upwards of 20% in frame time and this was not maximised or filled.

It really will (and has) added real world benefit already their is no mystic here.
PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.

But the latest AMD has double the que limit here.
No it's not.

UJRXEj6.png


No they are not (at least from my information at this point) they are lacking in ROV, Tier 2 even of Tiling etc
Yes they are.

DKiYoGI.jpg


But see above it is not as it has less queues to handle so at best it will be half as good with slower and lesser bandwidth.
It actually has four times more active queues. As for how many threads are in flight on these queues - that's a different question, but a direct comparison here is unlikely to provide any insight into efficiency gains, due to significant differences in architectures.

What is GCN 0?
GCN 1.0. I thought I'd explained this.

They have something to do with it, since the API must have the capilitiy for the developer to execute serveral queues, with Vanilla OGL und DX11 this is not possible.
And the GK110 was the first Nvidia GPU which could handle different queues.
GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.

The Tomorrow Children from Q-Games uses 3 compute queues for the voxel lighting system.
I think they said it gave them around 20% performance.
Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.

GCN Gen 2 should be able to dispatch 64 queues:
http://abload.de/img/78767566nug1.jpg
There's a bit of a terminology mismatch, since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls an ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.
 

Locuza

Member
PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.
Now this is kind of crazy talk; the underlying GPU technology is the same, and so are the benefits in principle.

No it's not.

UJRXEj6.png
This chart from Anandtech is misleading, especially when looking at the context.

GCN Gen 1 / Southern Islands / IP v6 has two ACEs, each can dispatch one queue.
GCN Gen 2 / Sea Islands / IP v7 can dispatch up to 8 queues per ACE.

http://developer.amd.com/wordpress/media/2013/07/AMD_Sea_Islands_Instruction_Set_Architecture.pdf

Page 13:
Important differences between S.I. and C.I. GPUs

• Multi queue compute

Lets multiple user-level queues of compute workloads be bound to the device
and processed simultaneous. Hardware supports up to eight compute
pipelines with up to eight queues bound to each pipeline.

Bonaire can handle up to 16 queues, in contrast to the first GCN Gen 1 product stack with 77xx/78xx/79xx which is limited to 2.

So simply writing "two compute" in the table is insufficient.

(The table was first claiming 1 + 1 and 1 + 7 in mixed mode for GCN)
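The queue counts being argued over reduce to simple multiplication; a quick sketch using the figures as stated in this thread (treat them as the posters' claims, not vendor documentation):

```python
# ACEs x queues-per-ACE, per the numbers quoted in this thread.
gpus = {
    "GCN Gen 1 (Tahiti, 77xx/78xx/79xx)": (2, 1),  # 2 ACEs, 1 queue each
    "Bonaire (GCN Gen 2)":                (2, 8),  # 2 ACEs, 8 queues each
    "Hawaii / PS4 (GCN Gen 2)":           (8, 8),  # 8 ACEs, 8 queues each
}

for name, (aces, per_ace) in gpus.items():
    print(f"{name}: {aces * per_ace} compute queues")
# Hawaii / PS4: 8 * 8 = 64 compute queues
```

This is exactly why "two compute" in a per-ACE vs. per-queue table is ambiguous: counting engines gives 8, counting dispatchable queues gives 64.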

GCN 1.0. I thought I've explained this.
I don't see where, but I find it more likely that Fiji will have the same IP version as Tonga:
GCN Gen 3 in the end.

GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.
Any source on the capability of the GK104 chip to handle two compute queues?

Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.
I see nothing really special about it.
GCN Gen 2 sits on a shared memory bus, with access to 176 GB/s.
The CPU takes its slice, then you have some problems with memory contention, and the effective bandwidth for the PS4 GPU is under 140 GB/s; but from a raw power perspective, the 7850/7870 are in the same ballpark, featuring 154 GB/s.

The GPU doesn't need to be starved by memory access; bubbles in the work queue and dependencies can always occur.
For GCN, async compute should in principle be needed to utilize most of the ALUs.
And with bigger GPUs I only see more bubbles and a greater need for async compute.

There's a bit of terminology mismatch since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.
You can call it a queue processor that can dispatch up to 8 queues,
giving 64 different compute queues in total when you build in 8 ACEs.

If I look at Hyper-Q, then Nvidia can handle 32 compute queues with GK110.
http://www.pcgameshardware.de/screenshots/1280x1024/2012/05/gtc_2012_gk110_architecture_details_1.jpg
And if Anandtech got the right info from Nvidia, Maxwell Gen 2 offers the same 32 queues.
 

BladeSinner

Neo Member
Interesting; now all we need is info on the 3xx series. I am thinking of getting a Titan X, but something tells me I should hold out for either the 3xx series or the next generation of Nvidia cards. Definitely exciting times ahead for both AMD and Nvidia cards with DX12 coming up.
 

rambis

Banned
Working "very closely" indeed.



Async Shading isn't AMD-specific, but the more features the better!
Looks like we've found some of AMD's contributions to DX12. It's not uncommon for IHVs to provide architecture features to these API working groups.
 

joshcryer

it's ok, you're all right now
I think this is part of the longer-term HSA roadmap. AMD are innovating. I think there will be a shift eventually. Not with Carrizo, but something after, from what they are learning.
 

R_Deckard

Member
8 ACEs in GCN is the same as 32 compute queues in Maxwell 2. Each ACE/queue can have a number of threads running on it as usual - this was done since I don't know how long to be able to launch new threads while old threads are stalled waiting for data from memory. This is pretty much explicitly said in Anandtech article you've linked to.

No it's not; 8 ACEs = 64 queues of work that can be re-ordered within cycles, double what Nvidia has with the Maxwell 2 cards.

On top of this, all pre-Maxwell 2 cards can only serialize graphics or compute, not run them in parallel:

On a side note, part of the reason for AMD's presentation is to explain their architectural advantages over NVIDIA, so we checked with NVIDIA on queues. Fermi/Kepler/Maxwell 1 can only use a single graphics queue or their complement of compute queues, but not both at once – early implementations of HyperQ cannot be used in conjunction with graphics. Meanwhile Maxwell 2 has 32 queues, composed of 1 graphics queue and 31 compute queues (or 32 compute queues total in pure compute mode). So pre-Maxwell 2 GPUs have to either execute in serial or pre-empt to move tasks ahead of each other, which would indeed give AMD an advantage..

http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading



Nope. Maxwell 2 is the only architecture which supports "a lot more of DX12" at the moment.

Yes, only the latest cards; as I said, all AMD cards from the past 5+ years support far more of DX12 than an Nvidia card.

If by recent DX12 tests you mean the Futuremark test then it is pure synthetics which is unlikely to show up in real games and the tests are run on the beta software. There are several tests which show the exact opposite right now so it is hardly an evidence of AMD's cards being "much better designed and suited for this".


PS4's GPU is limited by its memory bus and PS4 rendering resolution/AA. What's good for a PS4 isn't necessarily good for a PC GPU, which may be limited by shading more than raster operations. As I've said, the exact impact of that feature on GPU utilization on PC is unknown at the moment.

What on earth is this? A GPU is a GPU; if the jobs are all render-bound, and all resources reside on the GPU and need no syncing with the CPU, it will work and improve on any identical resource. PS4 and X1 are (at least) 11.2 GCN cards; it DOES improve performance.
No it's not.
UJRXEj6.png

That list is wrong, and you seem to be coming from a point of little actual knowledge; the GCN numbers need to be increased by a factor of 8. 8x8 = 64, which is twice the amount of the Maxwell 2 card (which is the only one that can mix graphics and compute contexts, hence the gulf in performance on the 770-780, etc.).



Yes they are.

DKiYoGI.jpg



It actually has four times more active queues. As for how many threads are in flight on these queues - that's a different question, but a direct comparison here is unlikely to provide any insight into efficiency gains, due to significant differences in architectures.

Again, the list clearly shows far more support on AMD cards than Nvidia ones.

GCN 1.0. I thought I've explained this.


GK104 was the first NV GPU which could handle two compute queues. GK110 upped the number to 32 - four times more than the 290 series, 1.5 years earlier.


Again, on PS4. The PS4 is a non-conventional system by PC metrics. It has a lot of compute resources while having a comparatively narrow memory access path. If a GPU is starved by memory access in graphics queues, then launching pure compute queues in parallel can provide a substantial benefit.


There's a bit of terminology mismatch since "queues" and "threads" are used rather interchangeably by different companies. I do believe that what AMD calls ACE is a queue processor in NV's terminology. So 8 ACEs in GCN1/2 and 32 of these in Maxwell 2 is what we have right now.

You are playing semantics here and talking yourself in circles. A PS4 GPU is like any GCN card; the benefits are transferable and equally viable on a PC, so long as you are using async for render work. Only if you throw in compute work for the CPU, where a sync is needed to pass results back, is the split memory an issue on PC; that would be far less successful than on PS4 or X1.
 
You are going to have to explain the math here.

Also... is async compute the same as dynamic parallelism, introduced with Kepler (and expanded upon with Maxwell)?

No, dynamic parallelism basically allows the GPU to schedule work for itself. It's not for gaming.

EDIT: Also, this argument about the number of queues is pointless. GCN2 and higher do have 8 queues per ACE, but it's not like that many are necessary or even useful. More isn't always better; you only need enough for 100% shader utilization, which is the end goal here. I don't know how many it would take, but it's certainly not 64 or 32!
 

Kezen

Banned
I wonder how this is going to affect multiplatform games; will it tip the balance in AMD's favor? We shall see.

Will it also inflate the specs for a console-matching experience?

So many questions.
 

Marlenus

Member
No currently available AMD GPU will be "fully" DX12 compatible.

Technically GCN 1.1 and 1.2 support FL 12_0 completely but Maxwell 1 and lower do not.

Maxwell 2 supports FL 12_1 and I expect the next GCN iteration to do the same.

CA2AIhD.jpg



Maxwell 2 has 32 active queues while the latest GCN chips have 8. Not sure why you would get a GPU based on that metric as it is rather unclear by how much that feature is actually helping in games.

Again, each ACE in GCN 1.1 and 1.2 supports 8 queues, so current GCN has 64 queues when paired with 8 ACE units.

EDIT: Looks like I was beaten to all of this.

Mantle is as much of a precursor to DX12 as an API of PS2 is. The only thing that Mantle did to Dx12 is kicked MS in their lazy ass and made them finish the DX12 work which was started some time ago faster.

If that is the case, why is the documentation for both almost identical?

CBBu9COWwAAPzZB.jpg:large
 