Another interesting set of tweets is this one, open to see all.
So as I see it, Sony has faster caches, giving a decent boost that's guaranteed.
20% faster, let's say.
The cache scrubbers provide a further boost; how much will differ, but he does suggest he thinks it'll matter more than CU count or clocks. It saves bandwidth since there's no re-fetching.
He also mentions the latency of the caches. Seeing what Sony has done with the I/O breakthroughs, and what we can see of the redesign of the Zen 2 CPU in places, it's not really a surprise to claim they may have reconfigured some more bits to improve latency; otherwise there's no point modifying anything, and you may as well use stock parts if there's no improvement.
We have the I/O that won't be a bottleneck and can feed data at previously unheard-of levels, reducing the amount that needs to be stored in RAM and leaving more for what's on screen, not what might be used.
The whole system design is clear: wherever possible latency is reduced, and speed and efficiency are essential.
So all these things add up, the marginal gains I've mentioned before. Some large benefits, like the clocks, matter so much and give PS5 some big advantages. Then add in the SSD and I/O, add in the small gains here and there, and all together this is at times allowing PS5 to match Series X and even outperform it.
There are a few things to keep in mind here. For one, Matt isn't always (I'd even say, most of the time he isn't) speaking about this stuff from a strictly PS5 vs Series X POV; a lot of it he's talking about in general, just in relation to what they have done with PS5 design-wise. So it's always best to take his statements from a neutral stance unless he specifies something we know is specific to PS5...
...like the cache scrubbers, which he touches on. Like I was saying earlier, fundamentally the SRAM technology embedded in both Sony's and MS's systems is the same. If one has a 1 ns latency on the L0$, then the other will too, because that type of specific thing is going to be standardized by the architecture. What he's addressing in terms of faster caches isn't something inherent to having literally faster/lower-latency cache; it's about using the cache coherency engines and cache scrubbers to do pinpoint evictions of stale data in the cache instead of flushing the whole cache.
By that notion, yeah, on one hand we could say "up to 20% faster caches", but there are two ways to actually look at that. The first is to look at it from leveraging the coherency engines/cache scrubbers themselves, but the issue here is that that's always going to be a very situational, per-game type of thing. Think of it like how DLSS functions for games running on Nvidia cards: you can have two games both heavily leveraging the tech, but that doesn't mean they're going to see equal gains, because differences in game engines, game code, asset sizes and amounts, asset streaming rates, game logic, and overall render activity, plus the use of various effects and the rate of their usage, will ALL have differing impacts.

It's just about the exact same with the cache scrubbers and coherency engines; some games will benefit much more from using them than others, that's just the nature of things. So you're not always looking at 20% higher cache throughput in all instances; a lot of the time it will probably be less, quite a bit less in fact, but again it all depends on the requirements of the game. Another thing impacting this, like I was saying before, is the specific type of scrubbing Sony uses (which we don't know). There's demand scrubbing and patrol scrubbing: demand scrubbing is explicitly initiated as it's needed; patrol scrubbing kicks in when there's enough downtime in GPU activity on whatever given cycles to initiate it. Each has its benefits and drawbacks; I can't say which is better, though (probably better to read a few posts from members on B3D who know more about how cache scrubbing works).
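Just to make the flush-vs-scrub idea concrete, here's a toy sketch in Python. To be clear, this is not how the actual silicon works and the line counts are made up; it only shows why evicting the specific stale lines costs far less re-fetching than dumping the whole cache:

```python
# Toy illustration (not actual hardware behaviour): compare how many bytes
# must be re-fetched after new data lands in RAM, depending on whether the
# GPU flushes its whole cache or only evicts ("scrubs") the stale lines.
LINE_SIZE = 64                      # bytes per cache line (illustrative value)
cached_lines = set(range(1024))     # line addresses currently resident in a cache
overwritten = set(range(100, 140))  # addresses the I/O unit just replaced in RAM

# Full flush: every line is invalidated, so everything still needed gets re-read.
refetch_full_flush = len(cached_lines) * LINE_SIZE

# Pinpoint scrub: only the lines that actually went stale are evicted.
refetch_scrubbed = len(cached_lines & overwritten) * LINE_SIZE

print(f"full flush re-fetch : {refetch_full_flush} bytes")  # 65536
print(f"targeted scrub      : {refetch_scrubbed} bytes")    # 2560
```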
The other way to look at it is by looking at cache bandwidth on a per-frame or, better yet, per-cycle basis. As mentioned before, if the L0$ is 1 ns in latency (assumedly), then that puts PS5's total L0$ at around 10.2 TB/s and Series X's total L0$ at around 12.14 TB/s. If both are running games with a 60 FPS target, then per frame Series X still has the higher cache bandwidth. PS5's caches work faster due to the GPU clock, but this is to make up for having fewer compute units (I know there's the meme that TFs don't mean everything, but one of the things they ARE good for is estimating L0$ bandwidth). The cache coherency engines and cache scrubbers can help push up the overall efficiency of PS5's cache usage, but the literal limit is always going to be around 10.2 TB/s for the L0$, because that's where the GPU's theoretical performance peak sits.
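For anyone wondering where figures like these come from, they fall straight out of the clocks and CU counts if you assume each CU's L0$ can move on the order of 128 bytes per clock. That per-clock width is my assumption based on public RDNA material, not anything Sony or MS have confirmed for their specific parts:

```python
# Rough total L0$ bandwidth estimate. The 128 bytes/clock per CU figure is an
# assumption drawn from public RDNA documentation, not a confirmed console spec.
BYTES_PER_CLK_PER_CU = 128

def total_l0_bandwidth_tbs(cu_count, clock_ghz):
    # bytes/clock * CUs * clocks/second -> bytes/second, reported in TB/s
    return BYTES_PER_CLK_PER_CU * cu_count * clock_ghz * 1e9 / 1e12

print(f"PS5      ~{total_l0_bandwidth_tbs(36, 2.23):.2f} TB/s")   # ~10.28 TB/s
print(f"Series X ~{total_l0_bandwidth_tbs(52, 1.825):.2f} TB/s")  # ~12.15 TB/s
```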
Going back to your 20% figure, then, this doesn't actually seem too reflective of the differential we'd see between Sony's and Microsoft's systems. Since Sony has a peak of 10.2 TB/s, you're effectively saying Microsoft's GPU would perform at a 20% deficit from its 12.14 TB/s theoretical peak, or down to about 9.7 TB/s (or, another way of expressing it, around 9.7 TF). However, this isn't true at all, because for starters we already have performance results that show it to be inaccurate: the academic benchmark of Photo Mode in Control, for example, shows the Series X GPU performing effectively 16% better than Sony's. Yes, it's just Photo Mode, but even in real game mode the deficit in performance between PS5 and Series X is nowhere near 20%; it seems to be more around 2% - 6% favoring Sony. That roughly seems to be the case for 3P multiplat performance so far where Sony wins out, never mind a few outliers.*
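For clarity, here's the simple arithmetic behind that paragraph; the figures are the theoretical ones used in this post, not measurements:

```python
# What a blanket "20% faster caches" claim would imply for Series X,
# read as a 20% deficit from its theoretical L0$/TF peak.
series_x_peak = 12.147   # theoretical total, TB/s (tracks the TF figure)

implied = series_x_peak * 0.80
print(f"implied Series X throughput: {implied:.2f} TB/s (~9.7 TF)")  # ~9.72
# Actual multiplat results so far sit around a 2%-6% PS5 advantage,
# nowhere near what a flat 20% cache deficit would suggest.
```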
So then that leaves one (actually two) other areas where PS5's faster GPU clock would translate to caches that are actually effectively faster: per cycle and per CU. And I mean that in terms of a single CU per cycle (hence sticking to one rather than two). 10.275 TF / 36 gives about 285.41 GF/s per CU, or about 282.65 GB/s of L0$ bandwidth. OTOH, for Series X, 12.147 TF / 52 gives around 233.596 GF/s per CU, or around 231.26 GB/s of L0$ bandwidth. This is kind of why I said a while ago that if PS5 is performing at peak GPU levels, then for a Series X version of that game to run at similar levels you would maybe need to saturate around 44 of its CUs, given the GPU clock differences. However, this also assumes the game is leveraging equal technologies on both systems; that extra CU headroom needed on Series X could either grow or shrink depending on what features the game uses on that platform that may not be present on PS5 (and vice versa, if there's something specific to, say, PS5's GE not present on Series X that a game would need to use).
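Here's the quick math behind the per-CU figures and that ~44 CU estimate (same caveat as above: it assumes equal per-CU cache width on both machines, which is my assumption):

```python
# Per-CU throughput, and roughly how many Series X CUs would need saturating
# to match PS5's total peak at the lower clock. Figures follow this post.
ps5_tf,      ps5_cus      = 10.275, 36
series_x_tf, series_x_cus = 12.147, 52

ps5_per_cu      = ps5_tf      * 1000 / ps5_cus        # ~285.4 GF/s per CU
series_x_per_cu = series_x_tf * 1000 / series_x_cus   # ~233.6 GF/s per CU
print(f"per-CU: PS5 {ps5_per_cu:.1f} GF/s, Series X {series_x_per_cu:.1f} GF/s")

# CUs the Series X would need busy to deliver PS5's total peak at its clock:
print(f"CUs needed on Series X: {ps5_tf * 1000 / series_x_per_cu:.1f}")  # ~44.0
```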
I think in terms of the faster GPU cache bandwidths, these are the things Matt was alluding/referring to.
*
EDIT: Actually, I went back and thought about this part some more, and figure it might be higher than 2%-6%. In practice, currently the Series X might be underperforming relative to its theoretical peak by around 15% - 16%, or exactly what the actual raw difference in GPU capability between the two systems is.
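(Quick sanity check on that 15% - 16% figure; it really is just the raw TF gap between the two GPUs:)

```python
# Raw theoretical gap between the two GPUs, using the TF figures from above.
series_x_tf, ps5_tf = 12.147, 10.275
print(f"{(1 - ps5_tf / series_x_tf) * 100:.1f}%")  # ~15.4%
```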
However, I don't think that is primarily in PS5's favor due to the cache scrubbers, for reasons I talk about here. Those would mainly help PS5 maintain its peak performance more consistently when it's really needed, but this early in the gen I'd suspect few games are really leveraging the scrubbers. Therefore, I'd say the reasons for underperformance on Series X's side with a lot of the 3P games are (remember, I'm an optimist here) probably unavailability of certain system OS/GDK tools, instability of certain GDK tools, and limited availability of some RDNA 2 features.
Though the latter would be a somewhat curious major reason IMHO, since it relies on outright featuresets to obtain a theoretical performance advantage that should just be readily accessible from raw, generic hardware usage; so I'm leaning more toward the other two reasons. We'll probably find out more on some of this from MS's tech presentation(s) this month.