I have never said it is 'only' a software-based solution and never will. The point is that it contains clear process steps that add CPU overhead.
You're grossly overstating this overhead. Given how often you bring it up, you keep treating it as if it were taxing on the system, when all available evidence points to the opposite.
Also, if we want to get technical, PS5's solution does not 100% absolve the CPU from doing anything; the CPU still needs to instruct the processor core of the I/O block on what data to fetch, move, etc., and to initialize it for work in the first place. That still requires the CPU's input; it's just that all of the grunt work from there on is handled by the I/O block. On the Series systems a bit more after that point is handled by the CPU, but only at about 1/10th of a CPU core (the OS core), which is nowhere near the CPU overhead you seem to think it is.
Once again: the point is that the PS5 has a hardware path without CPU overhead while the XSX does not.
On an absolute technical level I've already disproven this with the response above.
A hardware path without CPU overhead always beats a path with CPU overhead on latency (and throughput), everything else being equal. And in this case 'everything else' is not even equal; it tilts in the PS5's favor.
Again, this has been disproven. There is no path 100% "without" CPU overhead, because the CPU still needs to initialize and coordinate dedicated hardware components to do their job, similar to how it is (traditionally) responsible for issuing instructions to the GPU.
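The hand-off pattern being argued over here (CPU writes a small command descriptor, dedicated hardware does the heavy lifting) can be sketched as a toy model. Everything below is illustrative: `IoBlock`, the descriptor fields, and the byte counts are my own assumptions, not either console's actual interface, and the "hardware" is just a method call backed by `zlib`.

```python
import zlib

# Toy model: the CPU's work is writing a few descriptor fields; the I/O
# block's work scales with the payload it fetches and decompresses.
NAND = {}  # pretend flash storage: offset -> compressed bytes

class IoBlock:
    def __init__(self):
        self.cpu_descriptor_bytes = 0   # CPU-side cost: descriptor fields only
        self.block_bytes_processed = 0  # "hardware"-side cost: the payload

    def submit(self, op, offset, dst):
        # The CPU's entire contribution: writing a handful of descriptor
        # fields and kicking off the block (modelled here as a method call).
        self.cpu_descriptor_bytes += 3 * 8  # e.g. three 64-bit fields
        return self._process(op, offset, dst)

    def _process(self, op, offset, dst):
        data = NAND[offset]                  # fetch from "flash"
        self.block_bytes_processed += len(data)
        return zlib.decompress(data)         # decompress "in hardware"

NAND[0x4000] = zlib.compress(b"texture data " * 10_000)
io = IoBlock()
out = io.submit("read_decompress", 0x4000, dst=0x1000)
# Descriptor cost is tiny and fixed; payload cost scales with the data.
print(io.cpu_descriptor_bytes, io.block_bytes_processed)
```

The point the model makes is the one both posters are circling: the CPU's setup cost exists but is constant and small, while the data-proportional work lands on the dedicated block.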
The reason for my writing 'willing to bet' is the above, together with the acknowledgement that none of us has actual data from these machines in hand, i.e. we are basing our writing on available information.
This is a faith-based argument, though. And from what I can tell, you haven't looked at all the available information where MS is concerned. At the end of the day you aren't really disagreeing with anything I'm saying, either, because my argument has never been that XvA suddenly "closes the gap" or surpasses PS5's solution, though you seem to think that is what I've been arguing.
My entire point is that XvA's optimizations greatly narrow the SSD I/O performance gap between Series X and PS5, and that it is implemented in a way where direct comparisons aren't even particularly valid a good deal of the time. It is also more hardware-agnostic, which of course comes with its own drawbacks (though not nearly in the way you seem to be postulating), but has the benefit of being scalable to future hardware implementations in a way Sony's solution is not (since it relies so much more on fixed-function hardware).
You are absolutely right. They have chosen a flexible solution (i.e. CPU overhead) that can be updated over time, but lost throughput and latency in the process. That is the tradeoff they have made, and I respect them for it. You are trying to argue there was no cost to that decision, though, which I find odd.
Latency is not a fixed, static element; algorithms and hardware designs improve over time, and therefore latency can drop. You will eventually see PC SSD I/O solutions, using quality drives and the aspects of XvA that MS ports over (such as DirectStorage), that outperform Sony's solution. Will that be immediate? No. But within a year or two it is absolutely possible. THAT is the benefit of a more hardware-agnostic design that nonetheless maps very well onto new hardware to leverage it as fully as possible.
CPU overhead is a big deal, so I am not sure why you dispute that. Honestly, the entire field of RISC processors, for example, is about this question: 'how can I lower the loss of speed in the CPU and still maintain some level of flexibility?'
No, RISC processors came about as a means of paring down complex CISC instruction sets and isolating specific instructions to simpler hardware implementations for devices targeting specific markets. The trade-off is that simple instructions execute faster on RISC, while more complex operations take more instructions (and cycles) to complete. None of that has anything to do with CPU overhead.
Speaking of which, you're still overstating it in this specific case. A software-based solution running on two generations of processors, the latter with a more modern design, better architecture, etc., will run better (and therefore incur much lower CPU overhead) on the latter. Again, MS have been able to reduce the XvA stack's CPU cost to 1/10th of a core; do you think this would suddenly balloon on a more advanced desktop CPU paired with SSD I/O hardware similar to the next-gen systems'? Absolutely not.
I get where you are going wrong there, but Sony haven't committed to an ASIC for zlib or derivatives like Kraken, Oodle, etc.
The IO complex is using two programmable co-processors and specialist super low latency SRAM to accommodate compression and decompression algorithm acceleration in whatever solution it evolves into over the generation. So even if they used zlib, they'd just update the IO complex code, and because the IO complex moves data directly into the unified RAM, it wouldn't need any CUs for that type of work. To put it in context, Cerny stated that the IO complex decompressor is supposed to be equal to 9 of the PS5's Zen 2 cores attempting the same work, which is also why I said the XsX might be using AVX2 for its CPU decompression, as the Zen 2 cores are clearly suited to the work.
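Taking Cerny's public figures at face value, the implied per-core throughput can be back-of-enveloped. The 8.5 GB/s value below is my own midpoint of the "typical" 8-9 GB/s compressed-output range he quoted, so treat this as a rough sketch rather than a measurement:

```python
# Back-of-envelope using figures from Cerny's "Road to PS5" presentation.
raw_bw = 5.5          # GB/s, PS5 raw SSD bandwidth
typical_out = 8.5     # GB/s, assumed midpoint of the quoted 8-9 GB/s range
zen2_equiv_cores = 9  # decompressor said to match ~9 Zen 2 cores on this work

per_core = typical_out / zen2_equiv_cores
print(f"~{per_core:.2f} GB/s of decompressed output per Zen 2 core")  # ~0.94
```

That roughly 1 GB/s-per-core figure is why the offload matters: sustaining the same stream in software would consume most of an eight-core CPU.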
As for your Goosen quote, that feels like an end to the idea of AVX2 decompression on XsX, and possibly even the CUs for decompression. It is just a shame that they don't seem to want to reveal the details of the decompression block in the way Sony have - maybe that's what the ARM processors you mentioned are being used for.
In regard to SFS, I suspect Dirt is able to replace data mid-frame because it is (probably) only streaming from the 10GB GPU pool, with pre-staged assets ready to use for that race. What I was referring to was streaming in from the SSD the way an open-world game does. It would defeat the purpose of SFS's savings on bandwidth and memory use to stage the assets in GPU RAM in the traditional way, but you can't derive and return sampler feedback data without lower-order placeholder data loaded and rendered first. And because of the round-trip delay of the sampler feedback info, some of those early transition frames will need to be rendered as-is AFAIK, which is the data latency I was talking about.
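That round-trip delay can be expressed as a simple frame count. Both constants below are illustrative assumptions; real readback and I/O latencies depend on the engine, the drive, and how the feedback buffer is scheduled:

```python
# Toy frame-count model of the sampler-feedback round trip described above.
FEEDBACK_READBACK_FRAMES = 1  # GPU-written feedback becomes CPU-visible
TILE_IO_FRAMES = 1            # request + SSD read + upload of the tile

def first_high_res_frame():
    frame = 0                          # frame 0: low-mip placeholder rendered
    frame += FEEDBACK_READBACK_FRAMES  # feedback now readable on the CPU
    frame += TILE_IO_FRAMES            # high-res tile now resident in memory
    return frame                       # earliest frame that can sample it

# With these assumptions, frames 0 and 1 render "as is" with placeholder data.
print(first_high_res_frame())
```

However fast the SSD path is, the placeholder frames before `first_high_res_frame()` are exactly the transition frames described in the post above.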
"specialist super low latency SRAM"
There are many different types of SRAM with varying latency, which also often depends on the size. For all we know, Sony could've gone for cheaper SRAM whose latency, while still somewhat lower than that of DDR-based memories, isn't lower by a hefty amount. Conversely, if they've gone with high-quality, very-low-latency SRAM, then the cache size will be quite small, which will affect other performance metrics of the SSD I/O. Seeing as they've had to be considerate of both GDDR6 and NAND memory prices, I'd suggest they've gone with some middle-of-the-road SRAM for the I/O block cache, so it remains to be seen just how low the latency actually is.
Yes, you touch on them being able to update the code for the I/O hardware; that is essentially firmware, the software to the hardware. It's the same thing I was just telling Elog about. People underestimate how crucial good software is to leveraging the hardware that lies underneath. So I am at least glad you have acknowledged this by stating how Sony would evolve the firmware over the generation. This is also what MS will do for their I/O hardware over the generation.
Also, yes, PS5's I/O block can seemingly do 9 Zen 2 cores' worth of decompression, but how does that lead to the claim that Series X would need to use its actual CPU, or the GPU's CUs, for data decompression? That system has a decompression block as well. Is it as high-throughput as Sony's? No. But it doesn't need to be. So this really works against the idea that MS would need to use the CPU for decompression work, or CU cores for texture decompression (unless I missed something and you suggested Sony would have to use some CUs on their GPU for a similar purpose, in case the I/O decompression hardware is not doing this?).
So with that said, I don't see why you would state this and then assume Goosen's quote suggests what you think it does. If Series X were a PC with no specialized decompression hardware, then I'd agree this is what they'd be doing, and maybe it is something they can do on PC. But I don't see any context in which Series X (or Series S) needs to set aside these resources when there is no insistence that PS5 does, and both systems have I/O-purposed hardware in them to handle these very tasks.
EDIT: Wanted to add real quick: if the insistence comes from the fact that PS5's SSD I/O has more raw horsepower behind it, we also need to keep in mind that it needs that horsepower for its higher raw bandwidth targets. Series X targets lower raw bandwidth with its solution, so it doesn't need as much of that hardware built into the I/O block (this is not to suggest its solution is meek by any measure, however).