As I said before, the Tempest Engine (or half of it, since the other half is used for audio) is more than capable of handling the BVH traversal and intersection portion of the RT calculation. There are other parts to RT, but this portion is just basic vector math, and since the Tempest Engine is built to calculate sound bouncing off objects, the math should be almost exactly the same when it is pointed at light bouncing off objects.
This part also involves a lot of branching on collisions, and I believe it is best offloaded to components that can do the calculations in parallel, which in this case mainly means CUs. So if the Tempest Engine takes the BVH part of RT completely off the CUs, and in doing so lessens the impact of having fewer CUs for RT, then it really can be considered secret sauce.
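For anyone wondering what "basic vector math" means here, the ray/box check at the heart of BVH traversal looks roughly like the classic slab test below. This is only my own illustrative sketch in Python; the function name `ray_hits_box` is made up, and it says nothing about how the Intersection Engine or the Tempest Engine actually implement it in hardware.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if a ray starting at `origin` hits the axis-aligned box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray runs parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        # Distances along the ray to the two planes on this axis.
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval in which the ray is inside all slabs seen so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False  # the interval is empty, so the ray misses the box
    return True


# A ray along +x from the origin against a box sitting between x=1 and x=2.
print(ray_hits_box((0, 0, 0), (1, 0, 0), (1, -1, -1), (2, 1, 1)))
```

A BVH traversal is basically this test repeated down a tree of boxes, with a ray/triangle test at the leaves, so it is a handful of subtractions, divisions and comparisons per node.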
On the other hand, the rest of RT is still done on the GPU, but I think the rest, like shading, depends more on the clock speed of the hardware, so there too the higher clocks on PS5 can in fact make it stronger at RT. Remember that RT performance isn't captured by a floating-point throughput number, and it doesn't scale directly with TFLOPs either; that is why nVidia (being first to market) invented Giga Rays for measuring their RT capabilities. That figure is completely missing, for both consoles, from the official technical specifications released by MS and Sony. But what I figure is that PS5 can do more Giga Rays per second of intersections than Series X (mostly because of higher clocks alone, without even factoring Tempest in).
This is speculation on my part, but it is mostly based on scouring nVidia's white papers for its RT & RTX GPUs and AMD's proposed RT patents; the Tempest part comes mostly from how Cerny decided to frame his explanation of the Tempest Engine. Remember that he purposefully left one half of TE's reason for existing out of his presentation and only alluded to it. When someone like him goes out of his way to explain how it is similar to the PS3 Cell's SPUs and what they are strong at, it is best to really pay attention. He explains why they chose to make a CU behave like an SPU for audio, but then says TE is much more than just audio and that game developers will have access to half of its power for 'other reasons'.
Now I will put up a partial transcript of what Cerny said and let you decide.
At 44:35, explaining what the Tempest Engine is:
"It is based on AMD's GPU technology, we modified a Compute Unit in such a way as to make it very close to SPUs in PS3. Remember when I said that they were ideal for audio. So, the Tempest Engine has no caches, just like an SPU, all data access is via DMA (Direct Memory Access), just like an SPU. Our target was that it would have more power than a CPU thanks to the parallelism that a GPU can achieve, and it would be more efficient than a GPU thanks to the SPU like architecture. The goal being to make possible near 100% utilization of CUs vector units."
At the 46:03 mark, explaining what TE can actually do:
"We want to be able to throw an overwhelming amount of processing power at the problem.... In fact with the Tempest Engine we even got enough power that we can allocate some to the games, to the extent that games want to make use of convolution reverb and other algorithms that are computationally expensive or need high bandwidth..."
And back when talking about RT, at 30:02:
"The CUs contain a new specialized unit called the Intersection Engine, which can calculate the intersection of rays with boxes and triangles. To use the Intersection Engine, first you build what is called an acceleration structure, it is data in ram that contains all of your geometry. There is a specific set of formats that you can use, they're variations on the same BVH concept. Then in your shader program you use a new instruction that asks the Intersection Engine to check a ray against a BVH. While the Intersection Engine is processing the requested ray/triangle or ray/box intersections, the shaders are free to do other work. Having said that, the ray tracing instruction is pretty memory intensive, so it is good mix with logic heavy code."
To me this is like an open book: the geometry data of the game (or of that exact frame, to be precise) resides in RAM, which is also being streamed on the fly thanks to the incredible speed of the SSD (within a 1 second margin); the Tempest Engine uses half of its overwhelming processing power to build the acceleration structure, or BVH if you will, and updates it every frame; finally, the Intersection Engines in the CUs check the rays from in-game light sources against the acceleration structure made by the Tempest Engine. This way the BVH isn't built by the CUs and doesn't subtract power from them, and the number of CUs becomes unimportant: you are limited not by the # of CUs but by the processing power and DMA bandwidth of the Tempest Engine. Let me remind you that sound is processed far more often per second than a frame of graphics, think 192 kHz audio vs 60 Hz or even 144 Hz screens, and that should give the Tempest Engine the necessary power for simple vector calculations.
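To put a number on that sound vs screen comparison, here is the simple arithmetic as a quick Python sketch. The helper name `audio_samples_per_frame` is my own, and it only compares update rates; it tells you nothing about how much work is done per sample or per frame.

```python
def audio_samples_per_frame(sample_rate_hz, frame_rate_hz):
    """How many audio samples fall inside one rendered frame."""
    return sample_rate_hz / frame_rate_hz


# 192 kHz audio against 60 Hz and 144 Hz displays.
for fps in (60, 144):
    print(fps, "Hz:", round(audio_samples_per_frame(192_000, fps)), "samples per frame")
```

So at 192 kHz the audio side ticks 3200 times for every frame of a 60 Hz screen, and roughly 1333 times per frame at 144 Hz.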