
Ray Tracing in NV/AMD, demystified.

psorcerer

Banned
I see a lot of bullshit flying around in all the threads about how RT is implemented in NV/AMD and how one is superior to the other or vice versa.
I hope I can put all of these arguments to rest once and for all. (err, who am I kidding)
TL;DR NV and AMD solutions are almost exactly the same. Nothing to see here.

Now, if we open the Nvidia whitepaper on ray tracing, we can see how it is built (roughly).
Each Turing SM (streaming multiprocessor) has a piece of special silicon called the "RT unit". It sits close to the TEX fetch units, in fact just before them, meaning it is on the VRAM->L1 cache path, right in front of the texture units.
And each 4 texture units have 1 RT unit serving them.

But what's the performance of an RT unit?
We do not know.
But we do know the texture unit performance: for a typical 2080Ti (in boost mode) it's ~420GTex/sec (272 texture units at a ~1.545GHz boost clock).
Because the RT units sit on the path to the texture cache, they cannot possibly fetch from VRAM faster than that.
I suspect NV doesn't state actual max performance numbers simply because the intersection check itself is almost instant (probably a small number of clocks) once the sample is loaded from memory.
How do I know this?
From the same presentation we can see that the RT core accelerates ray->AABB/triangle intersection within a BVH structure.
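
A quick back-of-the-envelope check of that fillrate number (a minimal sketch; the TMU count and boost clock are public spec-sheet figures, not from the whitepaper):

```python
# Back-of-the-envelope texture fillrate: the ceiling the RT units sit behind.
# 272 TMUs and a ~1.545GHz boost clock are public 2080Ti spec-sheet figures.
tmus = 272
boost_ghz = 1.545

gtex_per_sec = tmus * boost_ghz  # billions of texel fetches per second
print(f"~{gtex_per_sec:.0f} GTex/sec")  # ~420 GTex/sec
```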

What's BVH?
It's a tree of boxes around each "object"/"group of objects" in the scene; once a box encloses a sufficiently small object, the tree's leaves are the triangles that object is built from.
So essentially the BVH includes your whole scene, everything and every fucking triangle.
That in turn means BVH structures are huge. How much memory exactly depends on how precise you want your effects to be, but it's a lot.
The size and maintenance of the BVH (you need to add new objects to the tree, and refit it when an object moves) is the first stumbling block of real-time ray tracing.
So, to test a ray against the BVH, we ask the texture cache to load a small BVH "node" into our RT unit, check for intersection, then load the next node, and so on.
Nowhere does Nvidia mention any special cache for the BVH, or any cache inside the RT unit whatsoever, so until further notice we should assume no internal memory in RT units.
That's why each RT unit is effectively bottlenecked by how fast texture data (in our case, a piece of the BVH tree) can be fetched from VRAM.
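
To make that traversal loop concrete, here is a minimal sketch of what each step does. The node layout and the recursive walk are illustrative; the real hardware format and traversal order are not public.

```python
# Minimal sketch of BVH traversal: fetch a node (the memory access that goes
# through the texture path), slab-test the ray against its box, descend on hit.
# The node layout here is illustrative; the real hardware format is not public.
from dataclasses import dataclass, field

@dataclass
class BVHNode:
    lo: tuple                      # AABB min corner (x, y, z)
    hi: tuple                      # AABB max corner (x, y, z)
    children: list = field(default_factory=list)  # nodes, or triangles at leaves

def ray_hits_aabb(origin, inv_dir, lo, hi):
    """Slab test: clip the ray against the box's three pairs of planes."""
    tmin, tmax = 0.0, float("inf")
    for o, inv, l, h in zip(origin, inv_dir, lo, hi):
        t1, t2 = (l - o) * inv, (h - o) * inv
        tmin = max(tmin, min(t1, t2))
        tmax = min(tmax, max(t1, t2))
    return tmin <= tmax            # the ray overlaps the box

def traverse(node, origin, inv_dir, hits):
    if not ray_hits_aabb(origin, inv_dir, node.lo, node.hi):
        return                     # prune this whole subtree
    for child in node.children:
        if isinstance(child, BVHNode):
            traverse(child, origin, inv_dir, hits)  # another node fetch + test
        else:
            hits.append(child)     # leaf: candidate triangle for an exact test
```

Every call to `ray_hits_aabb` is preceded by a node fetch, and with no internal cache that fetch goes out through the texture path, which is exactly the bottleneck described above.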

Now, we do have another number for the 2080Ti, "10Grays/sec". How was it calculated?
According to the same whitepaper, it's the best synthetic benchmark result they could achieve on primary-ray intersections, for a specific curated list of BVHs.
If we look at a more realistic multi-ray benchmark here, we can see that performance drops further, to 3-3.5Grays/sec.
And these are synthetic numbers. Actual games will have even less performance available.

So what will happen in actual games?
Let's return to the whitepaper: the RT core is invoked by scheduling an instruction from the shader, and the result is returned to the shader engine (probably in a register).
It means that "rasterization" is not going anywhere: after we get the intersection, it's up to the shader itself to determine what to do with it, how to color the pixel, how to render the shadow, etc.
RT units accelerate BVH traversal and nothing else; all the usual shaders still need to run and use their normal, unaccelerated FLOPS to render the final image.
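
To picture where the RT unit sits in a frame, here is a rough sketch of the per-pixel flow, with Python standing in for shader code; all names in it are illustrative, not the DXR API.

```python
# Rough shape of the per-pixel flow, with the RT unit as a black box.
# Python stands in for shader code here; all names are illustrative, not DXR.

def trace_ray(bvh_root, origin, direction):
    # The only step the RT unit accelerates: walk the BVH and return the
    # nearest hit, or None on a miss. Stubbed out; see the traversal sketch above.
    return None

def shade_pixel(x, y):
    origin = (0.0, 0.0, 0.0)                  # building the camera ray: plain ALU work
    direction = (0.0, 0.0, 1.0)
    hit = trace_ray(None, origin, direction)  # handed off to the RT unit
    if hit is None:
        return (0.4, 0.6, 0.9)                # "miss" path: sky color, ordinary FLOPS
    # Everything below runs on the normal shader cores, unaccelerated:
    # material evaluation, texture sampling, lighting, follow-up shadow rays...
    return (1.0, 1.0, 1.0)

print(shade_pixel(0, 0))                      # -> the sky color from the stub
```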

What about AMD?
Let's check the AMD patent here.
What do we see? We see "intersection engines" (IE) that are colocated with the texture cache units.
Each TEX unit gets an intersection engine that pulls a BVH node from the cache and returns the result to the shader.
It's exactly the same path, but with 1 IE per 1 TEX unit.
The only difference is that the "NV RT" unit sits before the cache, while the "AMD RT" unit sits after it.

So what about bottlenecks?
It's exactly the same. For the XSeX RDNA2 GPU we have a texture fillrate of 208 TEX units * 1.825GHz = 379.6GTex/sec.
And we have a number from MSFT: "380 billion BVH traversals per second".
Doesn't that ring a bell? Yep, it does. We are still limited by the same ~380GTex/sec.
Same as NV.

Can we compare 2080Ti to XSeX?
Yep, now we can.
We can approximately calculate the theoretical difference in max RT performance between the 2080Ti and the XSeX: 420 vs 380 GTex/sec, i.e. 10Grays/sec vs 9Grays/sec.
Pretty close. But again, actual in-game numbers will be much, much lower.
Probably to the point where there is no difference at all.

What about PS5?
Simple: 144 TEX units * 2.23GHz = 321GTex/sec (yes, that's a boost clock, but it's a boost clock for NV too).
Which puts it at a theoretical 7.6Grays/sec. Not bad, but lower than the other two.
For reference, NV states 6Grays/sec for the 2070, so the PS5 is still better than that.
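
Since it's all the same arithmetic, here is the whole comparison in one place. A sketch that assumes Grays/sec scales linearly with texture fillrate (the premise of this whole post); TMU counts and clocks are public spec figures.

```python
# Back-of-the-envelope RT comparison, assuming Grays/sec scales linearly
# with texture fillrate. TMU counts and clocks are public spec figures.
gpus = {
    "2080Ti": 272 * 1.545,  # GTex/sec
    "XSeX":   208 * 1.825,
    "PS5":    144 * 2.23,
}
baseline = gpus["2080Ti"]   # pinned to Nvidia's own 10 Grays/sec figure
for name, gtex in gpus.items():
    grays = 10.0 * gtex / baseline
    print(f"{name}: {gtex:6.1f} GTex/sec -> ~{grays:.1f} Grays/sec")
# 2080Ti: 420.2 -> ~10.0, XSeX: 379.6 -> ~9.0, PS5: 321.1 -> ~7.6
```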

Questions?
 

SighFight

Member
Wow, thank you. So the difference in RT power is about the same as the difference in raw compute power: ~18% more ray intersects/sec. You also mention the large memory consumption. How does the faster, slightly smaller XSX memory pool stack up against the larger unified pool that's a bit slower on the PS5? Possible to predict the impact?
 

UnNamed

Banned
Nvidia showed RT cores as separate cores from shader cores. What I don't understand is whether this implementation means the RT "instructions" are inside the shader cores, so they become some sort of more complex cores, or whether they are tied to them, separate, etc. Sorry for my incorrect use of specific terms.
 

psorcerer

Banned
Nvidia showed RT cores as separate cores from shader cores

That was misleading. That's how it looks in Nvidia whitepaper.
[Image: tensorcore.jpg]
 

GamingKaiju

Member
psorcerer that was an excellent explanation, thank you for taking the time to write that up.

So based on your research we could see a better RT implementation on the XSX than on the PS5, due to the higher Grays/sec of the XSX?

From my knowledge, the NVMe stuff in the PS5 wouldn't make that much of an impact on RT, due to the data being held in VRAM.

I'm not really interested in the fanboy stuff; I'm generally more interested in the RT implementation of these new consoles.
 

psorcerer

Banned
So based on your research we could see a better RT implementation on the XSX than on the PS5, due to the higher Grays/sec of the XSX?

Yes.

From my knowledge, the NVMe stuff in the PS5 wouldn't make that much of an impact on RT, due to the data being held in VRAM.

The NVMe stuff will impact the "shader" phase a lot, though, and that's still part of the RT pipeline.
Overall I do not expect heavy RT usage in anything that looks better than Minecraft.
Light usage (like in Control) will be bottlenecked by memory latency, I suppose.
With GDDR6 in both consoles, latency will be virtually the same.
 

Racer!

Member

Could it be, though, that you could dynamically "switch out" precomputed ray-traced textures for static scenery (like changing weather/time of day in a title like GT Sport) on the fly, while real-time ray tracing only the animated scenery, for example reflections and shadows on a car?
 

psorcerer

Banned
Could it be, though, that you could dynamically "switch out" precomputed ray-traced textures for static scenery (like changing weather/time of day in a title like GT Sport) on the fly, while real-time ray tracing only the animated scenery, for example reflections and shadows on a car?

Yep, something like that. There is no point in ray tracing static stuff anyway; it can be prebaked at much better quality.
Some would argue that off-screen reflections/shadows could be accelerated for statics too, but I think that's so minor...
 

Racer!

Member
Yep, something like that. There is no point in ray tracing static stuff anyway; it can be prebaked at much better quality.
Some would argue that off-screen reflections/shadows could be accelerated for statics too, but I think that's so minor...

Wow, that would benefit a lot of genres, I think. It seems that SSD will be the key to a lot of new game design opportunities.
 
Judging from Cerny's talk, he thinks that RT is a gimmick (at the current state of the technology).
I also think it's a gimmick.
Hope NV Ampere will prove me wrong.

I'm not saying it's not a gimmick, but haven't we seen it look great in a game like Control, albeit at a substantial performance hit?
 

psorcerer

Banned
I'm not saying it's not a gimmick, but haven't we seen it look great in a game like Control, albeit at a substantial performance hit?

I'm not impressed. But people are, so it's ok.
I think that's the definition of "gimmick": "some people are impressed". :messenger_tears_of_joy:


So in short: if somebody wants the best RT experience (my favorite next-gen feature), he should buy the XSX?

Probably better to buy NV Ampere, if you have the money.
But if not, then yes, XSeX is a good choice.
 

Racer!

Member
I think SSD is kind of off topic here.
You can prebake and use static assets even without SSD.

Yes, but using those assets in harmony with the SSD might make up for some of that lower ray tracing performance. Which of course would work on both consoles.
 

psorcerer

Banned
Yes, but using those assets in harmony with the SSD might make up for some of that lower ray tracing performance.

The SSD was made for other things.
The PS5 vision is to remove the bottlenecks in data streaming from persistent storage to the screen.
A lot like what the PS2 was: a data-streaming machine that could pipe from storage to pixel in the least time possible.
The problem is, no multiplatform game will ever use it that way.
In the PS2 era, the PS2 was the absolute king of sales, so games were made for the PS2 first and then ported to other devices.
We are living in a world where the PC is king for multiplatforms. And if you develop a game for PC, which is "low streaming", "high RAM", "CPU limited", then the PS5 is the worst possible machine for it.
 

Ascend

Member
Yup.

AMD should also bring good cards this time around. Obviously I don't expect them to match NVidia's stuff but RDNA2 is a massive upgrade over anything they've come up with in many years. If the XSX is a freaking APU with 12TFs, their discrete GPU should get 15TF easily.
15TF? I would be surprised if they do not get to 18TF.
 
I'm not impressed. But people are, so it's ok.
I think that's the definition of "gimmick": "some people are impressed". :messenger_tears_of_joy:




Probably better to buy NV Ampere, if you have the money.
But if not, then yes, XSeX is a good choice.
I don’t have the money for NV but I will buy XSX :)
 

rashbeep

Banned
Judging from Cerny's talk, he thinks that RT is a gimmick (at the current state of the technology).
I also think it's a gimmick.
Hope NV Ampere will prove me wrong.

He's downplaying it because it can't be offered at a price that makes sense for most people.

I think he is smart enough to know it's not a gimmick.
 

psorcerer

Banned
He's downplaying it because it can't be offered at a price that makes sense for most people.

Nope. It is a gimmick.
Any complex shaders and it's toast. Any open world and it's toast.
You can use some RT stuff for physics, audio, shadows, etc.
But full RT is a gimmick. It's enough to analyze Quake 2 RTX on a 2080 to understand that.
Shadows recalculated once every 4 frames? :messenger_tears_of_joy:
 
Imagine full-scene, per-pixel RT. Maybe 10 years from now?
 

Ascend

Member
Nope. It is a gimmick.
Any complex shaders and it's toast. Any open world and it's toast.
You can use some RT stuff for physics, audio, shadows, etc.
But full RT is a gimmick. It's enough to analyze Quake 2 RTX on a 2080 to understand that.
Shadows recalculated once every 4 frames? :messenger_tears_of_joy:
It might be, it might not be. If implemented correctly, you don't need separate effects like ambient occlusion, shadows, lighting, reflections, alphas for transparency, texture space diffusion... The RT handles all of them naturally.
 

VFXVeteran

Banned
I doubt that the consoles will have the same RT performance (in terms of FPS) as the 2080Ti at 4K. It'll be interesting to see, but I'm betting that in a real-world scenario the consoles' pipelines will lag far behind once you push pixel throughput higher. We'll see.
 

A.Romero

Member
Personally I like RT in BFV a lot, but it does come with a substantial hit. It's really only useful if it's matched with DLSS; otherwise you have to choose between playing at high res/high fps or with RT.

I'm expecting creative uses of RT next gen, but nothing that requires a lot of raw power. As usual, the raw power will be on PC.
 