
Quantum Break PC performance thread

TSM

Member
Unless it's due to async compute limitations. Nvidia's implementation is nowhere near as efficient as AMD's. So, if a game relies heavily on async in its DX12 implementation, then Nvidia cards will suffer. Pascal corrected a lot of the issues that Maxwell 2 had, such as:

Source Anandtech: [quoted excerpts on Pascal's async compute implementation and the comparison to AMD]

This could account for the difference between the 970 and the 1060 on DX12 and for the difference between the 1060 and 480. It seems to scale pretty well with the async compute capabilities of the cards.

Assuming it had anything to do with that, Remedy still chose to do that and botched the performance. That has nothing to do with DX12 and everything to do with the dev getting terrible performance out of Nvidia hardware. The difference in performance demonstrated by those benchmarks should frankly be embarrassing to a dev that decided to use DX12. DX11 is over 33% faster than DX12 on a GTX 1080 at 1080p.

Then there is the fact that with the RX 480 they didn't gain any performance with DX12 over DX11. There is usually a sizable gain between the two on AMD hardware, and Remedy sees no improvement. This probably rules out async having anything to do with these results, as there would be a bump in performance on AMD from async alone if they used it. I think Remedy is either very poor with DX12, or they shoehorned DX12 in late into development and it did more harm than good.
 
Assuming it had anything to do with that, Remedy still chose to do that and botched the performance. That has nothing to do with DX12 and everything to do with the dev getting terrible performance out of Nvidia hardware. The difference in performance demonstrated by those benchmarks should frankly be embarrassing to a dev that decided to use DX12. DX11 is over 33% faster than DX12 on a GTX 1080 at 1080p.

Then there is the fact that with the RX 480 they didn't gain any performance with DX12 over DX11. There is usually a sizable gain between the two on AMD hardware, and Remedy sees no improvement. This probably rules out async having anything to do with these results, as there would be a bump in performance on AMD from async alone if they used it. I think Remedy is either very poor with DX12, or they shoehorned DX12 in late into development and it did more harm than good.

It's possible, it's also possible that the 480 is being held back in other areas not associated with async.

There's no way for me to really know; it's just interesting how the performance scales between Maxwell and Pascal, and how the main difference between the two other than node size seems to be the async implementation. The engineering team didn't veer too far from Maxwell in many regards.
 

TSM

Member
It's possible, it's also possible that the 480 is being held back in other areas not associated with async.

There's no way for me to really know; it's just interesting how the performance scales between Maxwell and Pascal, and how the main difference between the two other than node size seems to be the async implementation. The engineering team didn't veer too far from Maxwell in many regards.

The fact that there is pretty much a 0% improvement using DX12 over DX11 on AMD across the board speaks volumes. As far as I know this is the first game that has ever seen such a result for a normal gaming workload. Something appears to be very wrong with Remedy's DX12 implementation.
 
The fact that there is pretty much a 0% improvement using DX12 over DX11 on AMD across the board speaks volumes. As far as I know this is the first game that has ever seen such a result for a normal gaming workload. Something appears to be very wrong with Remedy's DX12 implementation.

I'm not arguing against that. I'm just discussing the performance differences we are seeing and what within each card could be causing them to run the same code so differently.

I'm not claiming that Remedy's implementation is good, I'm talking about what could be causing three cards that are relatively similar in performance to get such different results running the same code.
 

PaulloDEC

Member
Core i7 3820
GTX 970
16gb RAM

I've only played a little so far, but things are looking good for 60fps at 1080p, all settings medium with upscaling on. I don't love the upscaling, but playing from a couple of meters away on my TV helps.
 

dr_rus

Member
That's just it: if a game's implementation of DX12 relies on running code asynchronously, then running serially could cause lower framerate and could stall out if the card isn't able to keep up with vital processes. Sort of like what seems to be happening with the 970 in the GF video, which leads to a driver crash.

Considering that the DX12 version was coded to maximize performance on a console, it's a safe bet that it heavily favors async compute as that should be the best way to maximize performance there.

No, it can't. A DX12 renderer without async compute is a complete clone of the DX11 submission model. The only reason for such gains in DX11 on NV h/w is that the DX12 renderer is badly optimized for anything but the (console) GCN GPUs, and it's most likely the usual resource management issue, which has nothing to do with async compute.

The reason AMD is losing some performance is that there's no async compute in DX11 (and some CPU limitation of their DX11 driver may be at play at the higher end). You can pretty much gauge the incredible performance increases AMD gets from async compute here from CB.de's results:



Fury X: 48.6 -> 52.2 (+7%)
390: 38.5 -> 43.2 (+12%)
480: a small loss of performance as Polaris is more efficient in graphics than GCN2/3 and because of this it will gain less from running compute asynchronously. The loss is likely due to AMD's DX11 driver being what it is. It's also possible that the game's async compute implementation just wasn't tweaked for Polaris GPUs.
 
No, it can't. A DX12 renderer without async compute is a complete clone of the DX11 submission model.

Yes, and if the code written for the DX12 implementation requires more than the DX11 submission model allows, then the card will struggle to run the DX12 code using the "clone of the DX11 submission model." There's really no discussion to be had here, this is really simple logic. You can't fit 6 gallons in a 5 gallon bucket. One gallon will have to wait to go in once more room is available. If that 6th gallon is needed before there's room for it in the pipeline, then you crash.

Again, I'm not saying that this is what's happening in Quantum Break, just that the performance between the 970, 1060, and 480 scales really well with their async compute implementation running on DX12.
 
No, it can't. A DX12 renderer without async compute is a complete clone of the DX11 submission model. The only reason for such gains in DX11 on NV h/w is that the DX12 renderer is badly optimized for anything but the (console) GCN GPUs, and it's most likely the usual resource management issue, which has nothing to do with async compute.

The reason AMD is losing some performance is that there's no async compute in DX11 (and some CPU limitation of their DX11 driver may be at play at the higher end). You can pretty much gauge the incredible performance increases AMD gets from async compute here from CB.de's results:



Fury X: 48.6 -> 52.2 (+7%)
390: 38.5 -> 43.2 (+12%)
480: a small loss of performance as Polaris is more efficient in graphics than GCN2/3 and because of this it will gain less from running compute asynchronously. The loss is likely due to AMD's DX11 driver being what it is. It's also possible that the game's async compute implementation just wasn't tweaked for Polaris GPUs.

This part isn't true. A game written for DX12 from the ground up will be entirely different. Even without async compute there are performance gains to be had on the GPU side once this happens.
 

Locuza

Member
The fact that there is pretty much a 0% improvement using DX12 over DX11 on AMD across the board speaks volumes. As far as I know this is the first game that has ever seen such a result for a normal gaming workload. Something appears to be very wrong with Remedy's DX12 implementation.
AMD does gain quite a lot if you use a weaker CPU:

https://www.computerbase.de/2016-09/quantum-break-steam-benchmark/3/#diagramm-quantum-break-1920-1080-fx-8370

8% for the 380, 26% for the 390, 24% with the RX 480 and an extreme 47% with the Fury X.

Also the frametimes are better:
https://www.computerbase.de/2016-09/quantum-break-steam-benchmark/4/#diagramm-frametimes-unter-steam-auf-dem-i7-6700k

In general DX12 is positive for AMD: you either get better performance or better frametimes, and you practically never lose anything.
Even with the 6700K at least Hawaii GPUs can get 12% more performance.

The DX12 implementation in QB is, at least for AMD, one of the better examples, since Deus Ex: Mankind Divided (maybe, like Tomb Raider, it will get better), Rise of the Tomb Raider (after the patches it should be better) and Total War: Warhammer (still beta) were all trash under DX12 for AMD even with weak CPUs.

[...] you can pretty much gauge the incredible performance increases AMD gets from async compute here from CB.de's results:
If it uses Multi-Engine at all, since it doesn't seem like it does:
[...]
But though it's definitely DirectX 12, there doesn't seem to be anything actually being put into the compute queue at all. It's empty. Even though AMD has quite the large advantage in all resolutions, it isn't because of asynchronous compute at all, but due to something else entirely. Taking a look at GPUView after a particularly grueling session revealed literally nothing but the render queue full, to the brim.
http://www.tweaktown.com/guides/7655/quantum-break-pc-performance-analysis/index2.html

[...]
480: a small loss of performance as Polaris is more efficient in graphics than GCN2/3 and because of this it will gain less from running compute asynchronously. The loss is likely due to AMD's DX11 driver being what it is. It's also possible that the game's async compute implementation just wasn't tweaked for Polaris GPUs.
Even if we assume Quantum Break does use Async Compute how would you explain the results from the 380 (GCN Gen 3) which are identical to the 480 (GCN Gen 4)?
 
AMD does gain quite a lot if you use a weaker CPU:

https://www.computerbase.de/2016-09/quantum-break-steam-benchmark/3/#diagramm-quantum-break-1920-1080-fx-8370

8% for the 380, 26% for the 390, 24% with the RX 480 and an extreme 47% with the Fury X.

Also the frametimes are better:
https://www.computerbase.de/2016-09/quantum-break-steam-benchmark/4/#diagramm-frametimes-unter-steam-auf-dem-i7-6700k

In general DX12 is positive for AMD: you either get better performance or better frametimes, and you practically never lose anything.
Even with the 6700K at least Hawaii GPUs can get 12% more performance.

The DX12 implementation in QB is, at least for AMD, one of the better examples, since Deus Ex: Mankind Divided (maybe, like Tomb Raider, it will get better), Rise of the Tomb Raider (after the patches it should be better) and Total War: Warhammer (still beta) were all trash under DX12 for AMD even with weak CPUs.


If it uses Multi-Engine at all, since it doesn't seem like it does:

http://www.tweaktown.com/guides/7655/quantum-break-pc-performance-analysis/index2.html


Even if we assume Quantum Break does use Async Compute how would you explain the results from the 380 (GCN Gen 3) which are identical to the 480 (GCN Gen 4)?
I'm trying to understand TweakTown's reasoning... They say there's no async, and their reasoning is that the compute queue is empty and the render queue is full... Isn't that what async compute is supposed to do, all the compute tasks get completed instead of waiting in the queue for all the tasks in the render queue to be completed before they can be processed? There's a shit ton of render tasks, and yet all of the compute tasks are getting completed immediately and none of them are having to wait in queue...
 
What are the best settings for the Steam version (DX11) to maintain good image quality (1440p) and 60 or near-60 FPS on a GTX 980 Ti?

Should I leave Upscaling on? I'm not interested in the craze that is "Max", just looking for decent image quality and decent performance, as I believe shooters need 60FPS, especially on PC.
 

Daingurse

Member
What are the best settings for the Steam version (DX11) to maintain good image quality (1440p) and 60 or near-60 FPS on a GTX 980 Ti?

Should I leave Upscaling on? I'm not interested in the craze that is "Max", just looking for decent image quality and decent performance, as I believe shooters need 60FPS, especially on PC.

I'd turn Upscaling off and lower settings. Upscaling can be decent when I'm playing on my TV due to the viewing distance, but I prefer having a cleaner looking image overall.
 
What are the best settings for the Steam version (DX11) to maintain good image quality (1440p) and 60 or near-60 FPS on a GTX 980 Ti?

Should I leave Upscaling on? I'm not interested in the craze that is "Max", just looking for decent image quality and decent performance, as I believe shooters need 60FPS, especially on PC.

Medium settings, textures and AF maxed, and upscaling on is a good starting place. You can increase settings and test from there if you have spare performance. Scaling the majority of settings up from medium destroys performance for very little visual gain.
 
I'd turn Upscaling off and lower settings. Upscaling can be decent when I'm playing on my TV due to the viewing distance, but I prefer having a cleaner looking image overall.

Medium settings, textures and AF maxed, and upscaling on is a good starting place. You can increase settings and test from there if you have spare performance. Scaling the majority of settings up from medium destroys performance for very little visual gain.

Thanks guys, I'll try that then as my performance was pretty bad as it defaulted me on a mix of Ultra and High.
 

PowerK

Member
Just reached chapter 3 act 1.

Super stable at 60 FPS at 1080p, no drops.

Playing at everything maxed out. Film Grain off, AA on, upscaling on, exclusive full screen.
Not surprised it's running well, because it's essentially running at 720p.
 

Locuza

Member
I'm trying to understand TweakTown's reasoning... They say there's no async, and their reasoning is that the compute queue is empty and the render queue is full... Isn't that what async compute is supposed to do, all the compute tasks get completed instead of waiting in the queue for all the tasks in the render queue to be completed before they can be processed? There's a shit ton of render tasks, and yet all of the compute tasks are getting completed immediately and none of them are having to wait in queue...

https://msdn.microsoft.com/en-us/library/windows/desktop/dn899217(v=vs.85).aspx

The concept behind Multi-Engine (publicly known as Async Compute) is to be able to define different queue types with different possible operations.

You can define any type of operation inside a 3D queue: classic rendering, compute and copy operations.
The 3D queue is practically a universal queue, where you can put any type of workload in it.
The Compute queue is more restrictive: only compute shaders and copy operations can be used. The Copy queue is only for copy operations.

What the developer will try to do is look at the different workloads his game processes and look out for different bottlenecks.
He then sorts out his workload: he puts some of it in the 3D queue and the rest in the Compute queue, and there is a start/end mechanism (fences) which tells the driver when certain queues are active in parallel or when a certain job needs to be done before another starts.

The idea behind it is to define the workload in a more independent manner at a given timeframe, i.e. to have a larger pool of work available at the same time, so the GPU has more freedom about how and when it executes the given workload.

In regards to Quantum Break the argument is that the GPUView snapshot shows no workload inside a Compute queue, meaning all workload is defined inside the 3D queue, running serially one after another.
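To make that a bit more concrete, here's a minimal D3D12 sketch of the queue split and the start/end (fence) mechanism described above. It's just an illustration under the usual assumptions (a device already exists), not anything from Remedy's code:

Code:
#include <windows.h>
#include <d3d12.h>

// Assumes 'device' (ID3D12Device*) was created elsewhere.
void CreateQueuesAndSync(ID3D12Device* device)
{
    // "3D" (direct) queue: accepts graphics, compute and copy work.
    D3D12_COMMAND_QUEUE_DESC gfxDesc = {};
    gfxDesc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    ID3D12CommandQueue* gfxQueue = nullptr;
    device->CreateCommandQueue(&gfxDesc, IID_PPV_ARGS(&gfxQueue));

    // Compute queue: compute shaders and copy operations only.
    D3D12_COMMAND_QUEUE_DESC computeDesc = {};
    computeDesc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    ID3D12CommandQueue* computeQueue = nullptr;
    device->CreateCommandQueue(&computeDesc, IID_PPV_ARGS(&computeQueue));

    // The start/end mechanism: a fence both queues signal/wait on.
    ID3D12Fence* fence = nullptr;
    device->CreateFence(0, D3D12_FENCE_FLAG_NONE, IID_PPV_ARGS(&fence));

    // e.g. the graphics queue signals once the pass the compute work depends on is submitted...
    gfxQueue->Signal(fence, 1);
    // ...and the compute queue waits for that point; its command lists can then
    // overlap with whatever graphics work is submitted afterwards.
    computeQueue->Wait(fence, 1);
}

If everything is recorded into the direct queue only (which is what the GPUView capture suggests for QB), the compute queue and fence simply wouldn't exist and the same work runs serially.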
 

Arkanius

Member
Basically, instead of throwing everything to max:
What are the base Xbox One settings, and what can I turn up before destroying performance on my 980 Ti?

I just want 60FPS at 1440p (with or without upscaling). I have G-Sync so it can fluctuate a bit.
 

Nillansan

Member
Basically, instead of throwing everything to max:
What are the base Xbox One settings, and what can I turn up before destroying performance on my 980 Ti?

I just want 60FPS at 1440p (with or without upscaling). I have G-Sync so it can fluctuate a bit.

I would love to know this as well.
 

Daingurse

Member
Basically, instead of throwing everything to max:
What are the base Xbox One settings, and what can I turn up before destroying performance on my 980 Ti?

I just want 60FPS at 1440p (with or without upscaling). I have G-Sync so it can fluctuate a bit.

I believe the XB1 settings are Medium across the board, with the exception of Texture Quality, which is Ultra.
 

4jjiyoon

Member
i can't seem to get vsync to function correctly.

the ingame one is double buffered so it constantly drops to 30fps from 60fps and back again.

so i did what i usually do and disabled it and turned it on with triple buffering in the nvidia control panel.

except, that's dropping to 30fps from 60fps and back again too. it shouldn't do that and it never has in any other game.

the only vsync option i can get working correctly is the newer "fast" option in the nvidia control panel then use rtss to stop it going above 60fps.

it's really weird. anyone else having this problem?
 

Diggler

Member
i can't seem to get vsync to function correctly.

the ingame one is double buffered so it constantly drops to 30fps from 60fps and back again.

so i did what i usually do and disabled it and turned it on with triple buffering in the nvidia control panel.

except, that's dropping to 30fps from 60fps and back again too. it shouldn't do that and it never has in any other game.

the only vsync option i can get working correctly is the newer "fast" option in the nvidia control panel then use rtss to stop it going above 60fps.

it's really weird. anyone else having this problem?

The Nvidia panel setting only works for OpenGL games.

You can try alt-tabbing in and out of the game once it's running, that sometimes forces triple buffering. Otherwise it's G-Sync time!
 

dr_rus

Member
Yes, and if the code written for the DX12 implementation requires more than the DX11 submission model allows, then the card will struggle to run the DX12 code using the "clone of the DX11 submission model." There's really no discussion to be had here, this is really simple logic. You can't fit 6 gallons in a 5 gallon bucket. One gallon will have to wait to go in once more room is available. If that 6th gallon is needed before there's room for it in the pipeline, then you crash.

Again, I'm not saying that this is what's happening in Quantum Break, just that the performance between the 970, 1060, and 480 scales really well with their async compute implementation running on DX12.
No, it won't struggle, it will just run the same commands as it would in DX11. Your gallons analogy doesn't make sense because the h/w capacity doesn't change between different APIs (it does when the h/w is able to take advantage of async compute - hence the gains in performance); the API is just software.

The cases of big performance loss in DX12 compared to DX11 on NV h/w are mostly resource management related, with most DX12 renderers being either straight ports from XBO (GCN h/w) or AMD-sponsored efforts, and just not optimized enough for NV h/w. QB is a prime example of this really, with gains of up to 33% in DX11 even on older Maxwell cards.

DX12 should at worst perform on the same level as DX11 on all h/w. When it doesn't, it's purely the DX12 renderer's fault and no one else's, as the only reason for such a regression is an inefficient implementation of functions which are handled by the driver in the DX11 API. There is no other reason why a software layer which even has more scheduling capabilities may produce worse results, other than a fault in this software layer itself or in the software which is using it.
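As a side note on what "functions which are handled by the driver in the DX11 API" means in practice, here's a tiny sketch of my own (hypothetical names, not Remedy's renderer): in DX12 the application itself has to transition resources between states, something the DX11 driver tracked automatically, and doing this badly per architecture is the classic way such a renderer loses performance:

Code:
#include <windows.h>
#include <d3d12.h>

// 'cmdList' and 'texture' are hypothetical objects created elsewhere.
void MakeRenderTargetReadable(ID3D12GraphicsCommandList* cmdList, ID3D12Resource* texture)
{
    D3D12_RESOURCE_BARRIER barrier = {};
    barrier.Type                   = D3D12_RESOURCE_BARRIER_TYPE_TRANSITION;
    barrier.Transition.pResource   = texture;
    barrier.Transition.Subresource = D3D12_RESOURCE_BARRIER_ALL_SUBRESOURCES;
    barrier.Transition.StateBefore = D3D12_RESOURCE_STATE_RENDER_TARGET;
    barrier.Transition.StateAfter  = D3D12_RESOURCE_STATE_PIXEL_SHADER_RESOURCE;

    // In DX11 the driver inserts this transition for you; in DX12 the renderer must,
    // and issuing barriers too often or too broadly stalls the GPU, with a cost that
    // can differ between GCN and Maxwell/Pascal.
    cmdList->ResourceBarrier(1, &barrier);
}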

This part isn't true. A game written for DX12 from the ground up will be entirely different. Even without async compute there are performance gains to be had on the GPU side once this happens.
Give me an example of such a gain besides async compute filling the graphics bubbles.
You also seem to not really understand that it doesn't matter what API there is on the s/w side, the scheduling on the h/w side is the same -- even in the case of AMD's ACEs, they are able to make use of them in DX11 via their driver if a game is coded for this.

It's possible, but it would be a strange decision considering that they are doing lots of compute stuff, including mip generation, judging from their presentations.

Even if we assume Quantum Break does use Async Compute how would you explain the results from the 380 (GCN Gen 3) which are identical to the 480 (GCN Gen 4)?
The 380 can be mostly bandwidth limited in any API. The same can be true for the 480, sure, although it should be generally more efficient and it has 40% faster RAM.
 

Mohasus

Member
so i did what i usually do and disabled it and turned it on with triple buffering in the nvidia control panel.

except, that's dropping to 30fps from 60fps and back again too. it shouldn't do that and it never has in any other game.

the only vsync option i can get working correctly is the newer "fast" option in the nvidia control panel then use rtss to stop it going above 60fps.

Triple buffering in CP is for OpenGL games.
Fast sync doesn't work like that, you'd need at least 2x the framerate to have it enabled.

I think borderless window + v-sync will force triple buffering.
 
No, it won't struggle, it will just run the same commands as it would in DX11. Your gallons analogy doesn't make sense because the h/w capacity doesn't change between different APIs (it does when the h/w is able to take advantage of async compute - hence the gains in performance); the API is just software.
The bolded is you paraphrasing the very thing you are arguing against. You are supporting what I said. Make up your mind.
It does matter whether the h/w is better able to make use of async. And if code is written in a way that throws more compute tasks at the pipeline than some hardware can handle, then that hardware will struggle, even in DX12.

Of course APIs are just software, and so are the tasks. So are drivers and game code. They all affect how the hardware functions. Hardware doesn't run efficiently on magic.
 

dr_rus

Member
The bolded is you paraphrasing the very thing you are arguing against. You are supporting what I said. Make up your mind.
No, I'm not. You're saying that DX12 may lead to a performance loss compared to DX11 on h/w which isn't able to take advantage of async compute. This is incorrect, such h/w won't show any gains but there is no technical reason for any loss (there's some additional CPU overhead in async scheduling in DX12 but it's minimal). Nobody's arguing the fact that h/w can gain performance from async compute in DX12 if it's capable.
 
No, I'm not. You're saying that DX12 may lead to a performance loss compared to DX11 on h/w which isn't able to take advantage of async compute. This is incorrect, such h/w won't show any gains but there is no technical reason for any loss (there's some additional CPU overhead in async scheduling in DX12 but it's minimal). Nobody's arguing the fact that h/w can gain performance from async compute in DX12 if it's capable.
If more compute tasks are used than DX11 can handle, then it does. DX12 isn't magic. The door swings both ways. DX12 sees improvements on this front because it allows those compute tasks to get processed faster, so they don't have to sit and wait. If a DX12 game is written correctly, it will rely on that to happen. It'll no longer be a boost but the norm. This will lose performance with a less capable async compute implementation.

DX12 QB was written to run on Xbox One. If Remedy wanted to maximize their compute tasks and there were CPU-bound tasks that could be moved, they could have moved tasks that were CPU-bound in the DX11 version over to the GPU, because DX12 allows for more tasks to be completed in a cycle since the GPU no longer has to complete the cycle before more compute tasks can be added.

If they did this, then poor compute scheduling could cause a bottleneck.

If any developer does this, then poor compute scheduling could become a bottleneck.

And again, since I seem to have to keep reminding you, I'm not saying that this is what's happening in QB. So please stop acting like I am.
 

Locuza

Member
[...]
Give me an example of such a gain besides async compute filling the graphics bubbles.
You also seem to not really understand that it doesn't matter what API there is on the s/w side, the scheduling on the h/w side is the same -- even in the case of AMD's ACEs, they are able to make use of them in DX11 via their driver if a game is coded for this.
Since this is forward-looking speculation, we should think about all the features you could potentially embrace with DX12 in comparison to DX11.
You probably know yourself that with explicit resource management, ExecuteIndirect and bindless resources developers could enhance or implement new rendering methods, improving performance without Async Compute.
Going into the future with Shader Model 6.0 there will be new operations directly aimed at improving GPU performance.

It's possible but would be a strange decision considering that they are doing lots of compute stuff, including mips generation judging from their presentations.
You could of course find simple reasons like an early DX12 launch on PC, where Async Compute would have meant additional effort and validation.

The 380 can be mostly bandwidth limited in any API. The same can be true for the 480, sure, although it should be generally more efficient and it has 40% faster RAM.
It's likely true for the RX 480 too, since the raw power is also nearly 40% higher, but there is of course better DCC and more L2$.
In the end there are different bottlenecks: maybe the game is bottlenecked by a weak geometry front-end, maybe it's more starved for bandwidth, and the ALU:bandwidth ratio is always important.
The gains from Async Compute in Ashes of the Singularity are neither smaller nor bigger for the RX 480 in comparison to the 380(X):
https://www.computerbase.de/2016-06/radeon-rx-480-test/11/#diagramm-ashes-of-the-singularity-async-compute_2
 
I've just bought the Steam version, I only have a 1080P monitor so if I turn scaling off will it render in 1080P?

Or is it a good idea to turn it on? I have a 6800K @ 4.2Ghz and a 1080.
 

Daingurse

Member
I've just bought the Steam version, I only have a 1080P monitor so if I turn scaling off will it render in 1080P?

Or is it a good idea to turn it on? I have a 6800K @ 4.2Ghz and a 1080.

I'd turn scaling off on your rig, especially since you only have a 1080p display. You would be able to get away with some good settings at that res with your GPU.
 

kittoo

Cretinously credulous
Why does this game look so blurry even at 1080p? I tried turning off anti-aliasing but that didn't help either. What the hell is going on?

Edit: Have kept all settings at highest. I have a 1070 and i7 4770k.
 

kittoo

Cretinously credulous
Well, when it's on you are playing at 720p (on a 1080p display) upscaled to 1080p. That's why it looks blurry.

When you turn it off you are playing at your native resolution.

The fuck? Why wouldn't they write this in the description? It just said it would make the performance better.
 
Amazon is now estimating delivery at two weeks from next Tuesday. I'm guessing the game discs went gold this week upon digital release.
 

Locuza

Member
If more compute tasks are used than DX11 can handle, then it does. DX12 isn't magic. The door swings both ways. DX12 sees improvements on this front because it allows those compute tasks to get processed faster, so they don't have to sit and wait. If a DX12 game is written correctly, it will rely on that to happen. It'll no longer be a boost but the norm. This will lose performance with a less capable async compute implementation.
The shaders are the same between the APIs since you use the current Shader Model 5.(1).
The infrastructure is different, but if you claim that there are too many compute tasks for DX11 to handle, then the only logical explanation for me would be that the driver on the CPU side can't keep up with the batches (draw calls/compute dispatches) the application is issuing, which would be a very rich and extreme case, something I would only expect in the upcoming years.

DX12 QB was written to run on Xbox One. If Remedy wanted to maximize their compute tasks and there were CPU-bound tasks that could be moved, they could have moved tasks that were CPU-bound in the DX11 version over to the GPU, because DX12 allows for more tasks to be completed in a cycle since the GPU no longer has to complete the cycle before more compute tasks can be added.
It's a heavy code modification to transfer tasks from the CPU to the GPU; having two different implementations is very unlikely.
In most cases the developer would do the transition and abandon the old solution.

If they did this, then poor compute scheduling could cause a bottleneck.

If any developer does this, then poor compute scheduling could become a bottleneck.
I believe this is unlikely; you see the results with Time Spy and Ashes of the Singularity, where the compute queue is not used under Maxwell, the execution is serialized and the performance doesn't seem broken.
You might pay something for the synchronisation points and serialization, but the overhead shouldn't become a bottleneck.
For me it sounds unreasonable that a DX12 implementation should perform worse on Nvidia even with Async Compute.
If the general optimisation for the architecture is solid, it shouldn't be (much) slower than DX11.

Why does this game look so blurry even at 1080p? I tried turning off anti-aliasing but that didn't help either. What the hell is going on?

Edit: Have kept all settings at highest. I have a 1070 and i7 4770k.
You could try Reshade:
http://screenshotcomparison.com/comparison/186265
http://screenshotcomparison.com/comparison/186264
 

coughlanio

Member
Just grabbed this and was pleasantly surprised at the performance. Running on my 2560x1440 165Hz G-Sync monitor, default Ultra settings (1080p), I get about 120FPS average (100 min, 155 max) on my GTX 1080. A little blurry, but I think it looks pretty good. Running everything ultra, 2560x1440 with upscaling off, I average about 70FPS.
 
Okay, I'm running everything maxed out, upscale is off and I've capped it to 30 FPS. It looks gorgeous, but what's this weird blurring whenever a character moves?
 

Daingurse

Member
Okay, I'm running everything maxed out, upscale is off and I've capped it to 30 FPS. It looks gorgeous, but what's this weird blurring whenever a character moves?

I've heard it's like some kind of temporal artifacting? It's one of the blemishes in the visuals for sure. Shit just looks weird.
 
Okay, I'm running everything maxed out, upscale is off and I've capped it to 30 FPS. It looks gorgeous, but what's this weird blurring whenever a character moves?

I've heard it's like some kind of temporal artifacting? It's one of the blemishes in visuals for sure. Shit just looks weird.

It is from the upscale/TAA (yep, even when you turn it off). It decreases with framerate, so sadly it is really obvious at 30fps.
 

dr_rus

Member
If more compute tasks are used than DX11 can handle, then it does. DX12 isn't magic. The door swings both ways. DX12 sees improvements on this front because it allows those compute tasks to get processed faster, so they don't have to sit and wait. If a DX12 game is written correctly, it will rely on that to happen. It'll no longer be a boost but the norm. This will lose performance with a less capable async compute implementation.
A. There's no limit on the number of compute tasks DX11 can handle.
B. A game can't rely on a task being processed in some set time on PC.
You're talking about execution latency, which will only be different if a GPU is able to execute some compute tasks in parallel in DX12 and can't in DX11. If it can't execute them in parallel in any API due to h/w limitations, there won't be any difference in execution time between these APIs.

DX12 QB was written to run on Xbox One. If Remedy wanted to maximize their compute tasks and there were CPU-bound tasks that could be moved, they could have moved tasks that were CPU-bound in the DX11 version over to the GPU, because DX12 allows for more tasks to be completed in a cycle since the GPU no longer has to complete the cycle before more compute tasks can be added.
What cycle? You can't just move tasks from the CPU to the GPU; it doesn't work like this. If some task would theoretically be executed more efficiently on the GPU than on the CPU, this will be true for any API the task is being run through. Async compute changes the execution order between graphics and compute, filling the gaps in the graphics pipeline; it doesn't make a GPU magically more suited for compute than under DX11. In fact, it doesn't affect the GPU's compute capability at all.

If they did this, then poor compute scheduling could cause a bottleneck.
For the third time - no, it can't. There's no "poor compute scheduling" in DX12 with or without async compute running concurrently.

And again, since I seem to have to keep reminding you, I'm not saying that this is what's happening in QB. So please stop acting like I am.
I'm not talking about QB, I'm talking about your false claim that running a DX12 renderer with async compute on h/w which can't run async compute concurrently can make it run worse than a DX11 renderer performing the same tasks would.

Since this is forward-looking speculation, we should think about all the features you could potentially embrace with DX12 in comparison to DX11.
You probably know yourself that with explicit resource management, ExecuteIndirect and bindless resources developers could enhance or implement new rendering methods, improving performance without Async Compute.
Going into the future with Shader Model 6.0 there will be new operations directly aimed at improving GPU performance.
I've yet to see any example of better resource management by a DX12 renderer than what is there in DX11, so excuse me if I don't hold my breath over this providing any benefit, ever. It's also hardly something unlocking hidden GPU potential.
ExecuteIndirect is mostly a CPU-side API optimization; I'm not sure what it has to do with the GPU either.
Bindless models and SM6 are not exclusive to DX12 and are there in DX11.x too.

It's likely true for the RX 480 too, since the raw power is also nearly 40% higher, but there is of course better DCC and more L2$.
In the end there are different bottlenecks: maybe the game is bottlenecked by a weak geometry front-end, maybe it's more starved for bandwidth, and the ALU:bandwidth ratio is always important.
The gains from Async Compute in Ashes of the Singularity are neither smaller nor bigger for the RX 480 in comparison to the 380(X):
https://www.computerbase.de/2016-06/radeon-rx-480-test/11/#diagramm-ashes-of-the-singularity-async-compute_2

I wouldn't use AotS as a good indication of async gains, as it's a rather unorthodox renderer and it has (had?) some strange pipeline changes between async on and off.

Smaller gains from async on the 480 can be seen in a number of games now, although all of them can be explained by something else, of course. It's however a bit strange to try to debunk this, as the simple fact of Polaris's better graphics pipeline naturally leads to less headroom for running compute alongside it - so in theory this must happen in some cases anyway, even if it isn't actually happening in QB.
 

Locuza

Member
[...]
I've yet to see any example of better resource management by a DX12 renderer than what is there in DX11, so excuse me if I don't hold my breath over this providing any benefit, ever.
Well, we all have yet to see one, at least for Nvidia.
Since no side can prove or state facts about the future, there isn't much left to do but wait for the upcoming DX12 titles.

ExecuteIndirect is mostly a CPU-side API optimization; I'm not sure what it has to do with the GPU either.
Bindless models and SM6 are not exclusive to DX12 and are there in DX11.x too.
It depends on the use case: you can kill off round trips between the processors, saving additional latency and idle time, and you can build new mechanisms living entirely on the GPU.
Intel gave away the performance numbers for their simple asteroids demo:


DICE and Confetti presented methods for triangle culling and a visibility buffer, one key foundation being the expanded capabilities coming with ExecuteIndirect.
Beginning with page 22:
http://frostbite-wp-prd.s3.amazonaws.com/wp-content/uploads/2016/03/29204330/GDC_2016_Compute.pdf

Also page 22:
http://www.conffx.com/Visibility_Buffer_GDCE.pdf
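For a rough idea of the mechanism (my own sketch with hypothetical names, not the DICE/Confetti code): a compute pass culls geometry and writes the surviving draw arguments into a GPU buffer, and ExecuteIndirect then consumes them without a CPU round trip:

Code:
#include <windows.h>
#include <d3d12.h>

// 'device', 'cmdList', 'argumentBuffer', 'countBuffer' and 'maxDraws' are hypothetical
// objects/values created elsewhere.
void RecordGpuDrivenDraws(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList,
                          ID3D12Resource* argumentBuffer, ID3D12Resource* countBuffer,
                          UINT maxDraws)
{
    // Each argument slot in the buffer is one indexed draw call.
    D3D12_INDIRECT_ARGUMENT_DESC arg = {};
    arg.Type = D3D12_INDIRECT_ARGUMENT_TYPE_DRAW_INDEXED;

    D3D12_COMMAND_SIGNATURE_DESC sigDesc = {};
    sigDesc.ByteStride       = sizeof(D3D12_DRAW_INDEXED_ARGUMENTS);
    sigDesc.NumArgumentDescs = 1;
    sigDesc.pArgumentDescs   = &arg;

    ID3D12CommandSignature* cmdSig = nullptr;
    device->CreateCommandSignature(&sigDesc, nullptr, IID_PPV_ARGS(&cmdSig));

    // A compute shader has already written the surviving draws into 'argumentBuffer' and
    // a draw count into 'countBuffer'; the GPU executes up to 'maxDraws' of them directly.
    cmdList->ExecuteIndirect(cmdSig, maxDraws, argumentBuffer, 0, countBuffer, 0);
}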

SM6 may also come to DX11.3/4, but explicit resource management, command list generation, ExecuteIndirect and the new binding model are only available under DX12, as long as I didn't get anything wrong:
https://msdn.microsoft.com/en-us/library/windows/desktop/dn859252(v=vs.85).aspx
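As a small illustration of the binding model difference (again a hypothetical sketch of my own, not from any of the linked material): instead of DX11's per-slot binds, DX12 lets you park all descriptors in a shader-visible heap and bind the whole table at once:

Code:
#include <windows.h>
#include <d3d12.h>

// 'device', 'cmdList' and 'numTextures' are hypothetical and created elsewhere;
// the root signature is assumed to expose a descriptor table at root parameter 0.
void BindAllSceneTextures(ID3D12Device* device, ID3D12GraphicsCommandList* cmdList, UINT numTextures)
{
    // One shader-visible heap holds descriptors for every texture in the scene.
    D3D12_DESCRIPTOR_HEAP_DESC heapDesc = {};
    heapDesc.Type           = D3D12_DESCRIPTOR_HEAP_TYPE_CBV_SRV_UAV;
    heapDesc.NumDescriptors = numTextures;
    heapDesc.Flags          = D3D12_DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE;

    ID3D12DescriptorHeap* heap = nullptr;
    device->CreateDescriptorHeap(&heapDesc, IID_PPV_ARGS(&heap));
    // ... one CreateShaderResourceView() call per texture would fill this heap ...

    // Bind the whole table once; shaders can then index any of the textures.
    ID3D12DescriptorHeap* heaps[] = { heap };
    cmdList->SetDescriptorHeaps(1, heaps);
    cmdList->SetGraphicsRootDescriptorTable(0, heap->GetGPUDescriptorHandleForHeapStart());
}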

I wouldn't use AotS as a good indication of async gains, as it's a rather unorthodox renderer and it has (had?) some strange pipeline changes between async on and off.

Smaller gains from async on the 480 can be seen in a number of games now, although all of them can be explained by something else, of course. It's however a bit strange to try to debunk this, as the simple fact of Polaris's better graphics pipeline naturally leads to less headroom for running compute alongside it - so in theory this must happen in some cases anyway, even if it isn't actually happening in QB.
I agree, the variance under AotS is too high for precise arguments but unfortunately there isn't much choice.
At least there is also Time Spy:
http://www.legitreviews.com/3dmark-time-spy-benchmark-dx12-async-compute-performance-tested_184260

10.6% for the 380X Nitro, 12.1% with the RX 480 and 12.8% for the Fury X.

I would be interested in cases where you could see smaller gains from Async Compute on the RX 480, but there are currently only two applications which allow comparisons between Async Compute on/off and are publicly tested.
There might be a third with Rise of the Tomb Raider, since there is an Async Compute option in the config, but I didn't find results for it.
Fortunately with Gears of War 4 there will be an upcoming title which also allows Async Compute on/off comparisons.
 