The fact that each of XSX's CUs has its own L0 cache is very nice, since they wouldn't really function without it. By the way, I know those are the fastest caches; PS5's are even faster. If we are discussing per-CU efficiency (like we were before your reply), I can't see how it helps your cause. Secondly, I didn't claim to know XSX's L1 cache size, did I? I only said it has to feed 4 more CUs at 20% lower bandwidth. I used the 5700 XT as a reference; it has 128 KB. I don't know if it's really possible to increase that amount by 20% like the L2 cache. We'll see.
Man, I can almost taste the facetiousness in this post...
Anyway, it's like I said earlier about the PS5 caches being "faster": that only starts to matter if the graphical task in question runs across a long enough stretch of cycles. Otherwise, on a cycle-for-cycle basis, the larger GPU with the larger physical amount of cache is going to be able to crunch more data in parallel than the smaller GPU with its smaller physical array of cache.
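To put rough numbers on that, here's a back-of-envelope sketch; the 16 KB per-CU L0$ figure is taken from the RDNA 1 whitepaper, and carrying it over to RDNA 2 is just an assumption, so treat the totals as illustrative rather than confirmed specs:

```python
# Back-of-envelope: total physical L0$ resident on two same-architecture
# GPUs that differ only in CU count and clock. The 16 KB per-CU L0$ figure
# is from the RDNA 1 whitepaper; RDNA 2 keeping it is an assumption.

KB = 1024
L0_PER_CU = 16 * KB

gpus = {
    "wider GPU (52 CUs @ 1.825 GHz)":   52,
    "narrower GPU (36 CUs @ 2.23 GHz)": 36,
}

for name, cus in gpus.items():
    total_l0 = cus * L0_PER_CU
    print(f"{name}: {total_l0 // KB} KB of L0$ holding unique data every cycle")

# 52 CUs -> 832 KB of L0$ vs 36 CUs -> 576 KB: the wider part simply keeps
# more unique data sitting in the fastest cache pool, independent of clock.
```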
I don't know what was being discussed before my reply, but everything I'm bringing up fits neatly into that discussion; it's germane, it's pertinent. You don't get to rule out something that sits squarely within what you were discussing (doesn't physical cache allocation at the L0$ level affect per-CU efficiency? I'd certainly think it does) just because it raises a point you either didn't consider or, now that it's been pointed out, don't like.
Why would it not be possible to increase the L1$ size? These systems are at least on 7nm DUV Enhanced; even a few slight architectural changes here and there would free up budget for larger caches. I'm not saying it's 100% a lock that they increased the L1$ size, just that it's premature to assume they did not when they've already increased the L2$ amount.
Otherwise yes, it's true that if the L1$ sizes are the same for both, then Series X feeds more CUs at a 20% reduced clock. But you won't need to hit the L1$ as frequently in the first place if you have more physical L0$ letting a higher amount of unique data stay resident in the absolute fastest cache pool available. And that's where, on the same architecture, the larger GPU has a very clear advantage; it always has and always will (unless we're talking about GPUs of two different architectures where the smaller one has a much larger L1$, but that's not what we're dealing with here regarding RDNA 2. The only discrete GPUs I can think of doing that are some upcoming Intel Xe parts that are very L0$-happy).
It'd be really nice if we stopped confusing cache speed with cache bandwidth. IMO the former should refer to overall data throughput measured across a span of time (many cycles), while the latter should refer to single-cycle throughput, which depends on the actual cache sizes. Assuming the L1$ is the same, their bandwidth is the same, and the speed advantage from the faster clocks only starts showing a perceptible difference for a given graphical task once that task does enough data processing in the caches to cross a certain threshold. We can apply this to the L0$ as well.
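Quick illustration of what I mean by that distinction; the clocks are the publicly quoted ones, but the 128 bytes-per-cycle per-CU L0$ read rate is purely an assumed placeholder, not a confirmed RDNA 2 figure:

```python
# "Bandwidth" (per-cycle throughput) vs "speed" (throughput over time).
# Clocks are the quoted ones; the 128 B/cycle per-CU L0$ rate is an
# illustrative assumption only.

BYTES_PER_CYCLE_PER_CU = 128

def cache_stats(cus, clock_ghz):
    per_cycle  = cus * BYTES_PER_CYCLE_PER_CU          # "bandwidth": bytes each cycle
    per_second = per_cycle * clock_ghz * 1e9 / 1e12    # "speed": TB/s over time
    return per_cycle, per_second

for name, cus, clk in [("52 CU @ 1.825 GHz", 52, 1.825),
                       ("36 CU @ 2.23 GHz",  36, 2.23)]:
    pc, ps = cache_stats(cus, clk)
    print(f"{name}: {pc} B/cycle, ~{ps:.1f} TB/s aggregate L0$ traffic")

# 52 CUs: 6656 B/cycle, ~12.1 TB/s; 36 CUs: 4608 B/cycle, ~10.3 TB/s.
# The higher clock narrows the gap over time but never flips the per-cycle
# picture, and the difference only becomes perceptible for tasks that spend
# enough cycles working out of the caches in the first place.
```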
It is not about being smart or stupid. It is about costs.
Were they stupid because they have only 10 GB of full-speed RAM and the rest travelling at a significantly lower speed?
No, because if PC GPU benchmarks are anything to go by, many, MANY games reserve a chunk of VRAM as a just-in-case cache, even when the game isn't actually using that cache at any given moment.
So you'd think smarter utilization of the VRAM, cutting down on the chunks of it held as a cache, would make better use of it... thankfully MS has built features like SFS into XvA to enable exactly that kind of smarter utilization of a smaller VRAM budget. Sony has a great solution too; it's different from MS's, but both are valid and each makes a few tradeoffs to hit its marks. At least regarding MS's, I don't think those tradeoffs are what you're highlighting here, going by the extensive research into this.
Were they stupid to allow Sony an advantage on shared resources on the GPU where clock is the only differentiator across consoles (could have gone for an extra Shader Engine and thus overall smaller Shader Arrays)?
You do realize the RAM still needs to hold the OS, CPU-bound tasks and audio data, correct? Realistically we're looking at 14 GB for everything outside of the OS reserve for PS5 (NX Gamer's brought up the whole idea of caching the data to the SSD before; not that it's a realistic option IMHO outside of some tertiary OS utilities seeing as how the vast bulk of critical OS tasks expect the speed and byte-level addressable granularity of volatile memory to work with), and if we're talking games with similar CPU and audio budgets on both platforms, at most you have 1 extra GB for the GPU on PS5 vs. Series X, but you sacrifice half a gig of RAM for CPU and audio-bound data.
Yes, it does have a faster SSD, but there are still a lot of aspects of the I/O data pathway that are apparently CPU-bound once the data is actually in RAM.
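Putting that budget math in one place: the Series X figures (13.5 GB to games, split as 10 GB @ 560 GB/s plus 3.5 GB @ 336 GB/s) are the publicly stated ones, while the ~2 GB PS5 OS reserve and the workload split are assumptions purely to show the arithmetic:

```python
# Rough game-memory budget following the reasoning above. Series X figures
# are the publicly stated ones; the PS5 OS reserve and the workload split
# are placeholder assumptions for illustration.

XSX_GAME_RAM  = 13.5                          # GB available to games
XSX_GPU_POOL  = 10.0                          # GB at 560 GB/s (GPU-optimal)
XSX_CPU_AUDIO = XSX_GAME_RAM - XSX_GPU_POOL   # 3.5 GB at 336 GB/s

PS5_GAME_RAM  = 14.0                          # GB, assuming ~2 GB OS reserve

# If a PS5 title gives its GPU 1 GB more than Series X's GPU-optimal pool...
ps5_gpu       = XSX_GPU_POOL + 1.0            # 11 GB
ps5_cpu_audio = PS5_GAME_RAM - ps5_gpu        # 3.0 GB left for CPU + audio

print(f"Series X: {XSX_GPU_POOL} GB GPU / {XSX_CPU_AUDIO} GB CPU+audio")
print(f"PS5:      {ps5_gpu} GB GPU / {ps5_cpu_audio} GB CPU+audio")
# i.e. at most ~1 GB extra for the GPU on PS5, paid for with ~0.5 GB less
# for CPU- and audio-bound data.
```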
Was Sony stupid for not adding more CUs? Nope...
Glad we agree on this part.
It is possible that they believe the current solution fits their targets, regardless of whether it gives them an absolutely dominating position or not (12 TFLOPS may well be enough to have).
Just because MS happens to have more TF performance doesn't mean they didn't aim for a balanced design target, either. This is a common misconception and comes from a binary mode of thinking, where everything's either a hard either/or. Console design is much more complicated than that.