[...]
Vega 10 still looses a lot of performance even with minimum tessellation factor - which means that they will push for no or low tessellation again in GE titles.
For games CB writes the following:
Dasselbe Verhalten zeigt sich auch in Spielen. So nutzt Rise of the Tomb Raider keine hohen Tessellation-Faktoren. Während die GeForce GTX 1080 durch das Feature sieben Prozent an Geschwindigkeit verliert, ist es auf der Radeon RX Vega nur ein Prozent. Die Radeon R9 Fury X verliert hohe 18 Prozent, die Radeon RX 580 noch sechs Prozent.
Die HairWorks-Integration in The Witcher 3 nutzt dagegen hohe Tessellation-Faktoren und schon dreht sich das Bild um. Die GeForce GTX 1080 verliert 16 Prozent durch die hübschere Haardarstellung, die Radeon RX Vega 64 höhere 23 Prozent. Radeon R9 Fury X und RX 580 werden gleich um 31 Prozent langsamer.
Tomb Raider:
GTX 1080 looses 7% performance with Tess on.
RX Vega 1%.
Fury X 18%
RX 580 6%
TW3 with Hairworks:
GTX 1080 16%
RX Vega 23%
Fury X and RX 580 both 31%
Quite confusing is the topic around the primitive shader or the new Next-Generation-Geometry fast-path.
According to Computerbase and Golem it's not implemented yet, according to Anandtech (I belive) AMD said it is.
But synthetic tests show no higher throuhput than the classic pipeline can do, so it might be rather not implemented or only for certain applications.
In the white paper AMD mentions a peak rate of 17 primitives vs. the 11 they claimed in the beginning but the native/classic pipeline only achieves 4.
http://radeon.com/_downloads/vega-whitepaper-11.6.17.pdf
In general terms AMD doubled the size of the parameter cache so that might lead to less stalls overall.
Huge wins when using async compute - even compared to Polaris - means that shader core is idling more than it should. Seems to be even bigger than these of Fiji which is really surprising.
Ashes of the Singularity is not fully deterministic.
I wouldn't use it for precise claims because the results vary too much.
On the next page is Gears of War 4 with Async Compute.
There Vega profits around 7%, RX 580 also around 7% and the Fury X about 10%.
All IPC improvements in NCU went into 16/8-bit packed math apparently. I'm not even sure that this counts as an IPC improvement actually.
Well it increases the instructions per clock cycles but it's quite misleading when you think in more general terms.
In the beginning one AMD architect said they increased the instruction buffers but in most cases that's probably just a minor perf bonus, AMD already increased the instruction buffers with Polaris and per clock we didn't saw huge improvements.
Effective bandwidth went down somewhat - but what's more important is that this isn't compensated by the DCC improvements as can be seen from the results of black texture on Fiji clock/bandwidth.
In general it's good to see that with RX Vega the texel rate seems normal.
Vega FE showed one quarter less throughput than the Fury X, now RX Vega and Fury X are looking the same.
On the effective bandwidth side the RX Vega results are also better than on the Vega FE.
The random texture throuhput (effectivly no compression help) shows 18% better results, with black textures only 4%.
But looking at the RX Vega @ 1.05 Ghz results with overclocked memory for the same 512 GB/s show still huge deficit in comparison to the Fury X.