Tensor cores are much smaller and faster than CUDA cores at the cost of reduced precision.
Bovine feces.
Tensor cores are not faster, let alone "much faster", than normal shaders (which you referred to as "CUDA cores").
It's not about fp precision either.
It's the fact that shaders are more like generic compute units, sort of small and (compared to the x86 world) dumb CPUs.
Whereas tensor cores are like AVX support in CPUs: they can only do that one bunch of fp ops (a 4 x 4 "matrix", but in reality you could view it as simply 16 ops).
So a normal shader can do many things, while a tensor core can do only that one thing.
In the past, a shader could do one int + one fp op per cycle.
With Ampere, NV shaders can do either int + fp or fp + fp per cycle.
That is why Huang thought it was a wise idea to claim that Ampere has twice the number of shaders it actually has.
Oh well, of all the bazinga marketing coming out of it, this is perhaps the least harmful...
Also has the benefit of using a separate, dedicated piece of silicon rather than using the standard CUDA cores/Stream Processors and thus taking some performance away from rendering.
I see, "separate" is how things get "better". Will remember that, thanks for the insight.
And on to the math: the 3060 has 120 "tensor cores" and 3584/2 => 1792 shaders.
So the shaders could do 2*1792 => 3584 fp ops per cycle, while the "tensor cores" can do 16*120 => 1920, roughly half the fp power of the shaders.
Which, indeed, isn't insignificant, although it's not clear whether the tensor cores are really independent or share some of the ALUs with the normal shaders.
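For anyone who wants to check the arithmetic, here is a quick sketch in Python. The unit counts are the ones quoted above; the per-unit figures (2 fp ops per shader per cycle, 16 ops per tensor core per cycle) are this post's own simplifications, not vendor-published throughput numbers, and the helper name is made up for illustration.

```python
# Back-of-the-envelope comparison using the figures from this post.
MARKETED_CUDA_CORES = 3584   # 3060 "CUDA core" count as advertised
TENSOR_CORES = 120           # 3060 tensor core count
FP_OPS_PER_SHADER = 2        # assumption: Ampere shader doing fp + fp per cycle
OPS_PER_TENSOR_CORE = 16     # assumption: 4 x 4 "matrix" viewed as 16 ops


def ops_per_cycle(units: int, ops_per_unit: int) -> int:
    """Hypothetical helper: total ops per cycle for a group of identical units."""
    return units * ops_per_unit


# The advertised count is double the number of physical shaders.
shaders = MARKETED_CUDA_CORES // 2

shader_ops = ops_per_cycle(shaders, FP_OPS_PER_SHADER)
tensor_ops = ops_per_cycle(TENSOR_CORES, OPS_PER_TENSOR_CORE)
ratio = tensor_ops / shader_ops

print(f"shaders: {shaders}")
print(f"shader fp ops per cycle: {shader_ops}")
print(f"tensor ops per cycle: {tensor_ops}")
print(f"tensor / shader ratio: {ratio:.2f}")
```

Under those assumptions the ratio lands a bit above one half. Change OPS_PER_TENSOR_CORE if you count the 4 x 4 operation differently and the conclusion shifts accordingly.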
On top of that, low-end GPUs with tensor cores are still outperformed by higher-end GPUs from earlier gens, so the "uh, you need these to apply some NN post-processing to the TAA(u) upscaling, also known as DLSS 2.0" claim is... likely a lie.