I feel like most of this thread is talking about GPU cores as if they're CPU cores, where you have to program explicit multithreading or the extra ones just sit idle. It doesn't really work like that, and I maintain that applying the term "cores" to GPU groupings of ALUs was a mistake that continues to fool uninformed people.
GPU workloads are "embarrassingly parallel", and GPUs are built around that; the parallelism is inherent in their nature. You're not sitting there going, oh fuck, I have to make a thread for CU 51 now. If there's an issue scaling up to a higher CU count, something is bottlenecking it or not feeding it fast enough. The PS5's GPU gets simplified as being weaker because people treat shaders × 2 FLOPs × clock speed = TFLOPS as everything they need to know, but the higher clock speed also clocks the rest of its logic higher. For example, its pixel fill rate is 142 Gpixel/s vs 116 on the XSX, and the same goes for the command processor and the other logic that feeds the Compute Units.
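The paper numbers above are simple arithmetic. This is a small sketch of it, assuming the commonly quoted public specs (PS5: 36 CUs at up to 2.23 GHz; XSX: 52 CUs at 1.825 GHz; both 64 ROPs; 64 shader ALUs per RDNA 2 CU; one FMA counted as 2 FLOPs). The function names are just made up for illustration.

```python
# Commonly quoted public specs (assumed); the PS5 clock is its variable-frequency maximum.
CONSOLES = {
    "PS5": {"cus": 36, "clock_ghz": 2.23, "rops": 64},
    "XSX": {"cus": 52, "clock_ghz": 1.825, "rops": 64},
}

ALUS_PER_CU = 64          # shader ALUs per RDNA 2 compute unit
FLOPS_PER_ALU_CYCLE = 2   # one fused multiply-add counts as 2 FLOPs


def peak_tflops(cus, clock_ghz):
    """Paper compute figure: shaders * 2 FLOPs * clock (GHz -> TFLOPS via /1000)."""
    shaders = cus * ALUS_PER_CU
    return shaders * FLOPS_PER_ALU_CYCLE * clock_ghz / 1000.0


def pixel_fill_gpix(rops, clock_ghz):
    """Peak pixel fill rate in Gpixel/s: one pixel per ROP per clock."""
    return rops * clock_ghz


if __name__ == "__main__":
    for name, s in CONSOLES.items():
        print(f"{name}: {peak_tflops(s['cus'], s['clock_ghz']):.2f} TFLOPS, "
              f"{pixel_fill_gpix(s['rops'], s['clock_ghz']):.1f} Gpixel/s")
```

This prints 10.28 TFLOPS / 142.7 Gpixel/s for the PS5 and 12.15 TFLOPS / 116.8 Gpixel/s for the XSX. The point is that clock speed shows up in both formulas, while CU count only shows up in the first one.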
It sounds like maybe these bottlenecks are being worked around, gradually exposing more of the XSX's higher peak shader performance, and the APIs, OS and toolset are surely improving to help with that too. The XBO generation also showed that Microsoft's API was heavier despite being one of the "low level" ones; maybe some of that is still going on and improving as well. The PS5, for its part, has its own hardware advantages. FLOPS figures are just the simplest on-paper baseline; it's like comparing CPUs by clock speed alone.
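To make the "something is bottlenecking it" idea concrete, here's a deliberately crude toy model, not how any real GPU or game actually behaves. It assumes the stages overlap perfectly, so the slowest stage sets the frame time, and it reuses the paper TFLOPS and fill-rate figures from the specs above. The two workloads are made-up numbers purely to show the crossover, not measurements from any game.

```python
# Toy bottleneck model (assumption): compute and pixel output overlap perfectly,
# so frame time is set by whichever stage takes longest.
SPECS = {
    "PS5": {"tflops": 10.28, "gpix_per_s": 142.7},
    "XSX": {"tflops": 12.15, "gpix_per_s": 116.8},
}

# Hypothetical per-frame workloads: TFLOP of shader work, Gpixels of pixel writes.
WORKLOADS = {
    "compute-heavy": (0.15, 0.5),
    "fill-heavy": (0.05, 2.0),
}


def frame_time_ms(spec, shader_tflop, gpixels):
    """Return the frame time in ms as the max of the per-stage times."""
    compute_ms = shader_tflop / spec["tflops"] * 1000.0
    fill_ms = gpixels / spec["gpix_per_s"] * 1000.0
    return max(compute_ms, fill_ms)


if __name__ == "__main__":
    for wname, (tflop, gpix) in WORKLOADS.items():
        times = {c: frame_time_ms(s, tflop, gpix) for c, s in SPECS.items()}
        faster = min(times, key=times.get)
        print(wname, {c: round(t, 2) for c, t in times.items()}, "faster:", faster)
```

With these invented numbers the XSX wins the compute-heavy frame and the PS5 wins the fill-heavy one, even though the TFLOPS ranking never changes. Real frames are far messier (memory bandwidth, caches, front-end, API overhead), which is the whole point of not judging by one paper figure.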