
Digital Foundry on the XSX teraflops advantage: It's kinda all blowing up in the face of the Xbox Series X

Lysandros

Member
In reality, the SX's graphics card is more powerful; the system has more CUs and a higher texture rate.
The entire point of this thread is to highlight the fact that the Series X is overall more powerful, and yet, not to the degree that was earlier suggested.
The XSX's GPU having more CUs and texture fill rate doesn't make it "a more powerful graphics card" (the XSX doesn't have a graphics 'card', by the way; it's an integrated APU). Those are just two metrics where it happens to be slightly ahead. The power of a GPU is the sum of all its throughputs relevant to game performance. There are at least as many other GPU metrics where the XSX is actually at a deficit compared to the PS5, and those cannot be ignored because of one's personal wishes. This is the same tired, faulty and partial view that always fails to grasp the reality of the situation.

And as a regular contributor to this thread, I wholeheartedly disagree with the "the entire point of this thread is to highlight the fact that the Series X is overall more powerful" part. With all due respect, maybe you should speak for yourself there.
 
Last edited:

HeWhoWalks

Gold Member
The XSX's GPU having more CUs and texture fill rate doesn't make it "a more powerful graphics card" (the XSX doesn't have a graphics 'card', by the way; it's an integrated APU). Those are just two metrics where it happens to be ahead. There are at least as many other GPU metrics where it's actually at a deficit compared to the PS5, and those cannot be ignored regardless of one's personal wishes. This is the same tired, faulty and partial view that always fails to grasp the reality of the situation.

And as a regular contributor to this thread, I wholeheartedly disagree with the "the entire point of this thread is to highlight the fact that the Series X is overall more powerful" part. With all due respect, maybe you should speak for yourself there.
So, we've been led to believe the XSX is more powerful, yet... it isn't? Quite the revelation.

No, I don't speak for myself. DF is saying the talk of the XSX being "more powerful" might be blowing up in its face. I used examples to show how this scenario might be true. Yes, both graphics cards have advantages over each other, but unless you've been living under a rock for the past three years, the narrative has been that the XSX's graphics card is stronger, the system has a higher teraflop count (irrelevant in my eyes, but that doesn't matter), and so forth. Therefore, that's what everyone has lived by.

However we choose to play slow now isn't my problem. There was a narrative. The truth might be different, but that doesn't change that said narrative has blown up in the XSX's face.
 

Hoddi

Member
Nanite on vs off.

[Image: Nanite on vs. off comparison]
 

DaGwaphics

Member
However we choose to play slow now isn't my problem. There was a narrative. The truth might be different, but that doesn't change that said narrative has blown up in the XSX's face.

The issue is that it's your narrative (and in this case DF's as well, if they honestly said that the XSX was "blowing up in MS's face"). These were two $500 disc configurations that are very similar yet different in places, and as a result they've been performing very similarly to each other, with occasional differences, for the last three years.

The XSX is a great machine; it isn't blowing up in anyone's face from a performance or consumer perspective. I will say that from the business side, PS5 took some chances and went outside the lines a bit in terms of how consoles are designed, and saved themselves some money in the process. That might sting MS a bit, but at the same time MS has never made a major push to earn a profit directly from hardware, whereas Sony has drifted in that direction. MS will live.

I've seriously got to get a board or something and reinforce my computer shelf so I can start using my XSX, it's still wrapped up in the felt at the moment. LOL
 

Lysandros

Member
So, we've been led to believe the XSX is more powerful, yet... it isn't? Quite the revelation.

No, I don't speak for myself. DF is saying the talk of the XSX being "more powerful" might be blowing up in its face. I used examples to show how this scenario might be true. Yes, both graphics cards have advantages over each other, but unless you've been living under a rock for the past three years, the narrative has been that the XSX's graphics card is stronger, the system has a higher teraflop count (irrelevant in my eyes, but that doesn't matter), and so forth. Therefore, that's what everyone has lived by.

However we choose to play slow now isn't my problem. There was a narrative. The truth might be different, but that doesn't change that said narrative has blown up in the XSX's face.
Mate, maybe you are unfamiliar with my posts since 2020. Let me just say that I am certainly not one of the clueless who got manipulated by either the senseless initial marketing or DF's routine supportive propaganda. I am in fact one of their harshest critics. I was always, at core, interested in addressing misleading narratives about these systems with objective facts and logic. So no, not "everyone has lived by it".
 
Last edited:

HeWhoWalks

Gold Member
Mate, maybe you are unfamiliar with my posts since 2020. Let me just say that I am certainly not one of the clueless who got manipulated by either the senseless initial marketing or DF's routine supportive propaganda. I am in fact one of their harshest critics. I was always, at core, interested in addressing misleading narratives about these systems with objective facts and logic. So no, not "everyone has lived by it".
Cool, mate, maybe you haven’t, but I said a narrative. What you do as an individual is irrelevant to that. Clearly I was talking in general.

DaGwaphics

Nah, it definitely isn’t a narrative exclusive to the pieces of DNA that make up my brain, but like I said, I get it. Have at it. This thread wouldn’t exist if that were the case.

As for using your XSX, you should do that! Despite our debate, it’s a nice piece of kit! :)
 
Last edited:

Lysandros

Member
Cool, mate, maybe you haven’t, but I said a narrative. What you do as an individual is irrelevant to that. Clearly I was talking in general.

DaGwaphics

Nah, it definitely isn’t a narrative exclusive to the pieces of DNA that make up my brain, but like I said, I get it. Have at it. This thread wouldn’t exist if that were the case.

As for using your XSX, you should do that! Despite our debate, it’s a nice piece of kit! :)
Fair. Just to close, I wasn't alone. I was among friends who fought this very narrative fervently. Yes, we were the minority, but we were quite vocal, and I think our efforts did pay off in the end, at least to some degree. With that, my existential rant ends.
 
Last edited:

HeWhoWalks

Gold Member
Fair. Just to close, I wasn't alone. I was among friends who fought this very narrative fervently. Yes, we were the minority, but we were quite vocal, and I think our efforts did pay off in the end, at least to some degree. With that, my existential rant ends.
In truth, I always questioned it, especially when you dissected both machines closely. Just felt pretty even with some variations here and there. So, I agree with you there!
 

rnlval

Member
To my understanding, they will have to customize the cores again with Zen 4 for the PS4 backwards compatibility.
Despite common instruction set support, timing with resource synchronization can be an issue without resource tracking.
 

Justin9mm

Member
Who cares about XSX or PS5 specs when we have the laziest, most incapable devs of a generation? The optimisation and state of games launched this gen has been unacceptable. This has been the slowest gen in terms of the evolution of the consoles' capabilities.

It's sad that Sony is probably releasing a Pro to brute-force performance because devs in general don't want to put in the optimisation work. The PS5 and XSX are very capable consoles. The situation is ridiculous.
 
Last edited:
Who cares about XSX or PS5 specs when we have the laziest, most incapable devs of a generation? The optimisation and state of games launched this gen has been unacceptable. This has been the slowest gen in terms of the evolution of the consoles' capabilities.

It's sad that Sony is probably releasing a Pro to brute-force performance because devs in general don't want to put in the optimisation work. The PS5 and XSX are very capable consoles. The situation is ridiculous.
The in-the-trenches devs themselves are not lazy, especially the ones doing optimization coding (that's down-to-the-metal stuff).

It’s the higher-ups that mandate stuff like Jedi Survivor having global RT, which just kills the resolution and framerate. Most of those devs were just the ones trying to make that stuff work however they could.
 
Last edited:

DenchDeckard

Moderated wildly
Have you completed the check and analysis?
The differences in graphics settings will be pinned down more accurately if many people confirm them.

I'm working this morning then gotta take the car to the garage. My kids were on my xbox all night last night.

I'll grab an xbox shot and also get some shots on pc at different settings.

All being well I can report back this evening.
 

DenchDeckard

Moderated wildly
Yes.

On consoles if you enable 120FPS mode, it disables nanite/lumen. Disabling 120FPS mode turns it back on.

Is the top shot with nanite and lumen and the bottom without?

Is it PC ultra settings? Because the top image seems to have the trees with less foliage, like the Xbox shot. 🤔

When you zoom in on the foliage, it's actually higher quality but less dense, while the denser one looks like lower-quality 2D sprites, but fuller.
 
Last edited:

Darius87

Member
This TFLOPs-driven narrative is only reliable when comparing components of the same series from the same manufacturer, so mostly in the PC space. Even though the PS5 and XSX have the same manufacturer and the same component family, how they work in practice is quite different, mainly because of the PS5's architectural improvements/customizations. By that I mean removing bottlenecks at the system level (e.g. I/O, latency, CPU) and its new power-delivery approach, a constant power supply with variable clocks (PS5) versus variable power draw at fixed clocks (everything else), which allows the PS5 to more easily stay near its theoretical max TFLOPs output - and, of course, better dev tools.
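Just to put numbers on that: the headline TF figures are nothing more than CU count times clock speed, so here's the arithmetic as a quick sketch (Python, using the publicly quoted CU counts and clocks):

```python
# Peak FP32 throughput for an RDNA2-style GPU:
# CUs x 64 shader ALUs per CU x 2 ops per clock (FMA) x clock in GHz.
def peak_tflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz / 1000.0  # GFLOPs -> TFLOPs

print(f"PS5: {peak_tflops(36, 2.23):.2f} TF")   # ~10.28 TF (at max boost clock)
print(f"XSX: {peak_tflops(52, 1.825):.2f} TF")  # ~12.15 TF (fixed clock)
```

It's a theoretical ceiling, nothing more; how close either machine sits to it at any moment is exactly the part the narrative skips.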
 

PaintTinJr

Member
The XSX's GPU having more CUs and texture fill rate doesn't make it "a more powerful graphics card" (the XSX doesn't have a graphics 'card', by the way; it's an integrated APU). Those are just two metrics where it happens to be slightly ahead.

Is that second metric for texturing misleading for advanced games?

IIRC we had the technical info direct from Xbox, and their diagram says texturing or BVH acceleration - whereas Cerny's Road to PS5 has him state that a BVH query can run in the background while shaders keep running. So as the gen progresses, and BVH use for lighting becomes the de facto standard in high-end AAA games, the effective texture rate on the XSX is going to fall below the base PS5 capability by quite a bit.

This situation would also support a kit-bashing bottleneck on the XSX that The Coalition encountered - assuming the PS5's geometry engine uses the BVH accelerators to kit-bash in real time before the vertex pipeline. Because the BVH accelerator work for the previous frame ends before the gathering/denoising/scaling, the BVH accelerators would presumably sit idle, meaning the next frame's kit-bashing could occur before the next frame's time slice begins and would, in theory, finish and free up the BVH accelerators before they are needed for the Lumen lighting pass - whereas on the XSX everything would need to work around the liberal use of sampler calls (for texturing), if I'm not mistaken.
 

DeepEnigma

Gold Member
Is the top shot with nanite and lumen and the bottom without?

Is it PC ultra settings? Because the top image seems to have the trees with less foliage, like the Xbox shot. 🤔

When you zoom in on the foliage, it's actually higher quality but less dense, while the denser one looks like lower-quality 2D sprites, but fuller.
Might be a nanite thing with how it impacts foliage. You can tell lumen is off, however.
 

winjer

Gold Member
Is that second metric for texturing misleading for advanced games?

IIRC we had the technical info direct from Xbox, and their diagram says texturing or BVH acceleration - whereas Cerny's Road to PS5 has him state that a BVH query can run in the background while shaders keep running. So as the gen progresses, and BVH use for lighting becomes the de facto standard in high-end AAA games, the effective texture rate on the XSX is going to fall below the base PS5 capability by quite a bit.

This situation would also support a kit-bashing bottleneck on the XSX that The Coalition encountered - assuming the PS5's geometry engine uses the BVH accelerators to kit-bash in real time before the vertex pipeline. Because the BVH accelerator work for the previous frame ends before the gathering/denoising/scaling, the BVH accelerators would presumably sit idle, meaning the next frame's kit-bashing could occur before the next frame's time slice begins and would, in theory, finish and free up the BVH accelerators before they are needed for the Lumen lighting pass - whereas on the XSX everything would need to work around the liberal use of sampler calls (for texturing), if I'm not mistaken.

In RDNA2, be it on PC, the Series consoles or the PS5, ray intersection tests are accelerated by a few instructions in the TMUs. And in all cases, BVH traversal is done on the shaders or the CPU.
The BVH is not ray tracing itself. It's just a way of defining where to cast rays to test against triangles, thus reducing the number of rays necessary to test.
A BVH structure subdivides a scene's geometry into bounding boxes, and each box might contain further bounding boxes, which increase granularity. This way, rays are only cast into areas that actually have geometry.
The result is a tree of boxes that has to be traversed level by level. And this is a somewhat intensive process, especially on the L2 cache of RDNA2.
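If it helps anyone picture it, here's a toy sketch of that idea in Python - an AABB tree plus a ray/box slab test. Purely illustrative; it's not how the actual driver lays out or walks its BVH:

```python
from dataclasses import dataclass, field

@dataclass
class BVHNode:
    # Axis-aligned bounding box: (min_xyz, max_xyz)
    box_min: tuple
    box_max: tuple
    children: list = field(default_factory=list)   # inner node: child boxes
    triangles: list = field(default_factory=list)  # leaf node: geometry to test

def ray_hits_box(origin, inv_dir, box_min, box_max) -> bool:
    # Standard slab test; inv_dir = 1/direction per axis (precomputed).
    tmin, tmax = -float("inf"), float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        tmin, tmax = max(tmin, min(t1, t2)), min(tmax, max(t1, t2))
    return tmax >= max(tmin, 0.0)

def traverse(node, origin, inv_dir, hits):
    # Walk the tree level by level; only descend into boxes the ray touches,
    # so intersection tests are only run where there is actually geometry.
    if not ray_hits_box(origin, inv_dir, node.box_min, node.box_max):
        return
    if node.triangles:                       # leaf: hand the triangles over
        hits.extend(node.triangles)          # for the actual intersection tests
    for child in node.children:
        traverse(child, origin, inv_dir, hits)
```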
 

JackMcGunns

Member
Well said. Xbox wanted to have a bigger TF number, regardless of what needed to be sacrificed in order to achieve it.

DF lost my respect years ago with all the damage control they did for the Xbox One. Now it seems that they're doing preemptive damage control before the arrival of a PS5 Pro.


Everyone ignored what Oliver said? :unsure: Even Richard later mentions API and software, which are variables that have always been a factor with closed boxes, and then there's the market-leader factor. When the PS5 is your bread and butter and you start building your engine there as your base console, there's very little incentive to exploit such a minimal difference even if it's there; you just focus on making the game work. There are so many games running at a higher internal resolution on Series X that there shouldn't even be a question about whether there's a difference or not. Whether it matters or not is a different topic, but one good takeaway is that the Series X improved greatly upon the situation Xbox had with Xbox One vs PS4, and that's excellent news for Xbox fans: that was an entire generation of a product that was significantly inferior, but now the Series X has delivered the goods.
 
Last edited:

PaintTinJr

Member
In RDNA2, be it on PC, the Series consoles or the PS5, ray intersection tests are accelerated by a few instructions in the TMUs. And in all cases, BVH traversal is done on the shaders or the CPU.
The BVH is not ray tracing itself. It's just a way of defining where to cast rays to test against triangles, thus reducing the number of rays necessary to test.
A BVH structure subdivides a scene's geometry into bounding boxes, and each box might contain further bounding boxes, which increase granularity. This way, rays are only cast into areas that actually have geometry.
The result is a tree of boxes that has to be traversed level by level. And this is a somewhat intensive process, especially on the L2 cache of RDNA2.
I wasn't talking about implementation specifics for the code, more just about where the real-time kit-bashing might fit in the pipeline and get to use the hardware.

I'm working on the assumption that the PS5 demo's real-time kit-bashing uses the BVH acceleration feature to build a new BVH (octree/BSP tree, etc.) for two intersecting (kit-bashed) BVH structures - ones that were prebuilt offline when Nanite is enabled on a model in a UE5 project.
 

winjer

Gold Member
I wasn't talking about implementation specifics for the code, more just about where the real-time kit-bashing might fit in the pipeline and get to use the hardware.

I'm working on the assumption that the PS5 demo's real-time kit-bashing uses the BVH acceleration feature to build a new BVH (octree/BSP tree, etc.) for two intersecting (kit-bashed) BVH structures - ones that were prebuilt offline when Nanite is enabled on a model in a UE5 project.

Nanite has two options to trace against. One is to use a simplified proxy geometry that roughly resembles the geometry rendered for the player to see. This is much faster, because there are fewer triangles to run intersection tests against. But it can have small visual errors, as the proxy geometry does not match the high-detail geometry.
The other option is to trace directly against the geometry that Nanite builds. This is slower but more accurate.

The BVH is neither of these geometries or meshes. The BVH is just a set of boxes that encompass an object, or part of an object, so as to define where rays are cast. Because if we cast a ton of rays into a place where there is nothing, we are just wasting resources.
For the BVH traversal, it doesn't really matter if an object inside a bounding box has 100 triangles or 100,000. What matters is identifying the region where to cast ray intersection tests.
It does matter for the ray accelerators, simply because doing 100 intersection tests is much faster than doing 100,000.

At least on PC, RDNA2 uses a BVH with 6-8 levels. This is defined by the AMD driver. But since MS and Sony have their own drivers, they could use different depths.
At least on PC, having that granularity allows casting fewer rays, increasing performance.
 

DeepEnigma

Gold Member
Yeah it looks like nanite makes the foliage look like the series X version.

I'm going to try and have a proper look tonight on all 3.
However, it does not seem to be that way on the PS5 version, so it's odd.

To compare Fortnite footage you need to launch a game with both consoles in a group and spectate the same player,
or join the same game, to have exactly the same time of day.
This. Might just have my friend stand where I am standing when we cross play and snap a shot on her X when I snap a shot on the PS5.
 

PaintTinJr

Member
Nanite has two options to trace against. One is to use a simplified proxy geometry that roughly resembles the geometry rendered for the player to see. This is much faster, because there are fewer triangles to run intersection tests against. But it can have small visual errors, as the proxy geometry does not match the high-detail geometry.
The other option is to trace directly against the geometry that Nanite builds. This is slower but more accurate.

The BVH is neither of these geometries or meshes. The BVH is just a set of boxes that encompass an object, or part of an object, so as to define where rays are cast. Because if we cast a ton of rays into a place where there is nothing, we are just wasting resources.
For the BVH traversal, it doesn't really matter if an object inside a bounding box has 100 triangles or 100,000. What matters is identifying the region where to cast ray intersection tests.
It does matter for the ray accelerators, simply because doing 100 intersection tests is much faster than doing 100,000.

At least on PC, RDNA2 uses a BVH with 6-8 levels. This is defined by the AMD driver. But since MS and Sony have their own drivers, they could use different depths.
At least on PC, having that granularity allows casting fewer rays, increasing performance.
We are talking at cross purposes here. Nanite doesn't automatically mean real-time kit-bashing.

Kit-bashing is the ability to slam geometry together in a very imprecise way for naturally irregular geometry that produces random or high-frequency facets - like rocks, or the wood of a tree being bashed together to form the joins of the root, branches and sub-branches. And this is a technique that can be done accurately offline, for example in Blender, by applying a Boolean modifier (difference, union or intersect) to two objects, which performs a set-theory operation on the two models.

AFAIK, the operation is done by the program producing a BVH structure for each model and then performing the set-theory operation by intersecting the bounding volumes (as lines) of one structure against the other, returning a final BVH whose geometry represents the result of the kit-bashing.
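For reference, the offline version of that is roughly what a Blender Boolean does; a minimal Python sketch (the object names 'TreeTrunk' and 'Rock' are just placeholders):

```python
import bpy

# Offline kit-bash: merge two high-poly meshes with a Boolean union.
# 'TreeTrunk' and 'Rock' are placeholder object names in the current scene.
base = bpy.data.objects["TreeTrunk"]
piece = bpy.data.objects["Rock"]

mod = base.modifiers.new(name="KitBash", type='BOOLEAN')
mod.operation = 'UNION'     # also 'DIFFERENCE' or 'INTERSECT'
mod.object = piece

# Applying the modifier bakes the result into real geometry, which is
# exactly the step that explodes memory/poly counts for scan-density meshes.
bpy.context.view_layer.objects.active = base
bpy.ops.object.modifier_apply(modifier=mod.name)
```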

In the first Unreal Engine 5 demo, the PS5 was able to kit-bash billions of polygons in real time from the Nanite models, before the Nanite rendering pass, to then let Nanite produce 1-4 polygons per pixel at (2560x)1404p resolution - a subset of roughly 4-16M polygons per frame for the on-screen camera angle, drawn from the billions available in the models.
 

winjer

Gold Member
We are talking at cross purposes here. Nanite doesn't automatically mean real-time kit-bashing.

Kit-bashing is the ability to slam geometry together in a very imprecise way for naturally irregular geometry that produces random or high-frequency facets - like rocks, or the wood of a tree being bashed together to form the joins of the root, branches and sub-branches. And this is a technique that can be done accurately offline, for example in Blender, by applying a Boolean modifier (difference, union or intersect) to two objects, which performs a set-theory operation on the two models.

AFAIK, the operation is done by the program producing a BVH structure for each model and then performing the set-theory operation by intersecting the bounding volumes (as lines) of one structure against the other, returning a final BVH whose geometry represents the result of the kit-bashing.

In the first Unreal Engine 5 demo, the PS5 was able to kit-bash billions of polygons in real time from the Nanite models, before the Nanite rendering pass, to then let Nanite produce 1-4 polygons per pixel at (2560x)1404p resolution - a subset of roughly 4-16M polygons per frame for the on-screen camera angle, drawn from the billions available in the models.

But Lumen on the PS5 is not using hardware RT; it's using SDFs.
Of course, SDFs and Virtual Shadow Maps in UE5 use a form of ray tracing. But it's not the hardware RT that some games have on PC.
 

PaintTinJr

Member
But Lumen on the PS5 is not using hardware RT; it's using SDFs.
Of course, SDFs and Virtual Shadow Maps in UE5 use a form of ray tracing. But it's not the hardware RT that some games have on PC.
AFAIK the difference is that the RT units on Nvidia cards - in UE4 games and with the higher-end features of UE5 Lumen - do the non-SDF BVH ray tracing as a simultaneous pass on the dedicated units and return the RT pass result for gathering, whereas on AMD it is part of the shader/compute shader, meaning it has to care about the BVH being available and about when the ray tests are done, all at the expense of both passes being combined into one pass in which they have to share resources.

The benefit on AMD can be better hardware utilisation across all workloads, workload flexibility - with the potential for redundant BVH loading and processing - for new RT algorithms unsuited to Nvidia's RT units, and potentially lower latency.
 
Last edited:

winjer

Gold Member
AFAIK the difference is that the RT units on Nvidia cards - in UE4 games and with the higher-end features of UE5 Lumen - do the non-SDF BVH ray tracing as a simultaneous pass on the dedicated units and return the RT pass result for gathering, whereas on AMD it is part of the shader/compute shader, meaning it has to care about the BVH being available and about when the ray tests are done, all at the expense of both passes being combined into one pass in which they have to share resources.

The benefit on AMD can be better hardware utilisation across all workloads, workload flexibility - with the potential for redundant BVH loading and processing - for new RT algorithms unsuited to Nvidia's RT units, and potentially lower latency.

Wave (or warp) occupancy drops significantly on both AMD and Nvidia when using ray tracing.
For example, on RDNA3, wave occupancy while using RT can drop to half of what the GPU can usually sustain.
This is an issue with the front end of the GPU as it tries to schedule waves; with RT, these are much more difficult to coordinate.
That's why Nvidia introduced Shader Execution Reordering, to improve on this issue. And it does quite a bit, although it still isn't perfect. I bet Nvidia will continue to improve SER in the next generations.

This means that RDNA2 and RDNA3, while using RT, have a lot of their shaders sitting idle. So having them do the BVH traversal is not a limiting factor.
AMD did try to improve wave execution with bigger caches and registers on RDNA3, and this results in higher shader occupancy.

The other issue is that AMD's ray accelerators are still limited in the number of rays they can cast, compared to Nvidia's.
AMD does have a more granular BVH, with 6-8 levels. This takes longer to traverse, but means fewer rays to test for intersections.
Nvidia is using a 2-3 level BVH, so it's faster to traverse. It has less granularity and requires more intersection tests, but Nvidia's RT cores are powerful enough to brute-force this and still come out well ahead of AMD.
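A crude back-of-the-envelope way to see that depth-versus-tests tradeoff (toy model only - it assumes a balanced tree and a single leaf visited per ray, which real traversal doesn't do):

```python
import math

def traversal_cost(num_tris: int, branching: int, tris_per_leaf: int):
    """Toy model of one ray walking a balanced BVH down to a single leaf.

    Returns (box tests, triangle tests). Real traversal visits several
    branches per ray, so treat these as relative costs only.
    """
    depth = math.ceil(math.log(num_tris / tris_per_leaf, branching))
    box_tests = branching * depth      # test every child at each level
    return box_tests, tris_per_leaf    # then intersect the leaf's triangles

N = 1_000_000
# Deeper, more granular tree: more box tests per ray, fewer triangle tests.
print(traversal_cost(N, branching=4, tris_per_leaf=4))    # ~(36, 4)
print(traversal_cost(N, branching=4, tris_per_leaf=256))  # ~(24, 256)
```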
 

PaintTinJr

Member
Wave (or warp) occupancy drops significantly on both AMD and Nvidia when using ray tracing.
For example, on RDNA3, wave occupancy while using RT can drop to half of what the GPU can usually sustain.
This is an issue with the front end of the GPU as it tries to schedule waves; with RT, these are much more difficult to coordinate.
That's why Nvidia introduced Shader Execution Reordering, to improve on this issue. And it does quite a bit, although it still isn't perfect. I bet Nvidia will continue to improve SER in the next generations.

This means that RDNA2 and RDNA3, while using RT, have a lot of their shaders sitting idle. So having them do the BVH traversal is not a limiting factor.
AMD did try to improve wave execution with bigger caches and registers on RDNA3, and this results in higher shader occupancy.

The other issue is that AMD's ray accelerators are still limited in the number of rays they can cast, compared to Nvidia's.
AMD does have a more granular BVH, with 6-8 levels. This takes longer to traverse, but means fewer rays to test for intersections.
Nvidia is using a 2-3 level BVH, so it's faster to traverse. It has less granularity and requires more intersection tests, but Nvidia's RT cores are powerful enough to brute-force this and still come out well ahead of AMD.
Obviously my reason for discussing this follows on from a much earlier comment of mine in the thread, saying that the Xbox is the outlier in terms of TF/s not measuring up.

The context with regard to kit-bashing and BVH accelerators was the Xbox AlphaPoint demo lacking geometry (megascans were just 10%) and lacking real-time kit-bashing, for the performance reasons The Coalition noted in their 100M-poly demo at 1080p30. My first theory to explain the major shortcoming relative to the PS5 demo was the PS5 having a custom geometry engine with hardware-accelerated kit-bashing, but on a second pass my theory is that it is because the Series technical info says the BVH and TMU are joined, so shaders can't texture while waiting for a BVH lookup to return a result - unlike Cerny's words in the Road to PS5 saying you can kick off a query and continue shading while waiting.

So, in relation to your interesting PC AMD/Nvidia GPU info, the question is: does the Series' BVH/TMU blocking limitation come from standard AMD PC hardware, or is the Series the odd one out? And do the 6-8 levels of granularity apply to the Series, and to the PS5 - given that kit-bashing performance on the PS5 was crazy impressive from billions of polygons, and Cerny has patents on BVH/RT?

Your point about the scheduling and efficiency issues with Nvidia's ASIC-type black-box RT is interesting. I suspect it is partly due to the overhead of loading the BVH data to the RT units over a shared bus, and it certainly aligns with async on Nvidia being described as 'async lite' because of the difficulty of getting a big benefit from it.

From the info you provided, I'd hazard a guess that another benefit of the AMD solution in the long run is that it will easily scale with all their future designs and gain easy performance from more units at smaller lithography nodes, whereas Nvidia's external RT/AI solutions would be expected to struggle to scale because of internal GPU bus contention, scheduling and bandwidth.
 
Last edited:

winjer

Gold Member
Obviously my reason for discussing this follows on from a much earlier comment of mine in the thread, saying that the Xbox is the outlier in terms of TF/s not measuring up.

The context with regard to kit-bashing and BVH accelerators was the Xbox AlphaPoint demo lacking geometry (megascans were just 10%) and lacking real-time kit-bashing, for the performance reasons The Coalition noted in their 100M-poly demo at 1080p30. My first theory to explain the major shortcoming relative to the PS5 demo was the PS5 having a custom geometry engine with hardware-accelerated kit-bashing, but on a second pass my theory is that it is because the Series technical info says the BVH and TMU are joined, so shaders can't texture while waiting for a BVH lookup to return a result - unlike Cerny's words in the Road to PS5 saying you can kick off a query and continue shading while waiting.

The issue with the ray accelerators being in the TMUs is that if there is a texture calculation in flight at the same time an RT instruction is sent there, the RT instruction will stall until the texture work is done.
The solution for MS was introduced with DXR 1.1, which added inline ray tracing. This allows the RT work to be executed at any point during the shader pipeline.
This reduces contention with texture work on the TMUs. And Sony probably has a similar thing with their RT API.

So, in relation to your interesting PC AMD/Nvidia GPU info, the question is: does the Series' BVH/TMU blocking limitation come from standard AMD PC hardware, or is the Series the odd one out? And do the 6-8 levels of granularity apply to the Series, and to the PS5 - given that kit-bashing performance on the PS5 was crazy impressive from billions of polygons, and Cerny has patents on BVH/RT?

Your point about the scheduling and efficiency issues with Nvidia's ASIC-type black-box RT is interesting. I suspect it is partly due to the overhead of loading the BVH data to the RT units over a shared bus, and it certainly aligns with async on Nvidia being described as 'async lite' because of the difficulty of getting a big benefit from it.

From the info you provided, I'd hazard a guess that another benefit of the AMD solution in the long run is that it will easily scale with all their future designs and gain easy performance from more units at smaller lithography nodes, whereas Nvidia's external RT/AI solutions would be expected to struggle to scale because of internal GPU bus contention, scheduling and bandwidth.

The 6-8 level BVH is something that is done by AMD's driver. I don't know what Sony or MS are doing with theirs, because they do not use AMD's drivers; they use their own drivers on the consoles.
The other thing to consider is that on consoles it might be better to calculate the BVH traversal on the CPU, because CPUs are much better at branching calculations than a GPU.
On PC, the CPU is far away from the GPU. There is quite a bit of latency going through the PCIe bus, and it increases the data going over the bus.
But consoles have an SoC, so the CPU is right next to the GPU. The CPU can therefore calculate the BVH and pass it on to the GPU much faster.
I suspect that because the Spider-Man port to PC calculates the BVH on the CPU; that is probably related to how the PS5 handles the BVH.

NVidia's solution is scaling very well with every iteration. And with things like SER, it will improve shader occupancy.
They might even improve the unit for BVH traversal, allowing the GPU to be more effective with deeper structures.

Shaders are not that good at calculating BVH traversal, so dedicated hardware would be better, both in performance and in power usage.
If AMD ever manages to improve wave occupancy to the point of not having idle shaders, it will mean that BVH traversal ends up contending for resources.
 

PaintTinJr

Member
The issue with the ray accelerators being in the TMUs is that if there is a texture calculation in flight at the same time an RT instruction is sent there, the RT instruction will stall until the texture work is done.
The solution for MS was introduced with DXR 1.1, which added inline ray tracing. This allows the RT work to be executed at any point during the shader pipeline.
This reduces contention with texture work on the TMUs. And Sony probably has a similar thing with their RT API.



The 6-8 level BVH is something that is done by AMD's driver. I don't know what Sony or MS are doing with theirs, because they do not use AMD's drivers; they use their own drivers on the consoles.
The other thing to consider is that on consoles it might be better to calculate the BVH traversal on the CPU, because CPUs are much better at branching calculations than a GPU.
On PC, the CPU is far away from the GPU. There is quite a bit of latency going through the PCIe bus, and it increases the data going over the bus.
But consoles have an SoC, so the CPU is right next to the GPU. The CPU can therefore calculate the BVH and pass it on to the GPU much faster.
I suspect that because the Spider-Man port to PC calculates the BVH on the CPU; that is probably related to how the PS5 handles the BVH.

NVidia's solution is scaling very well with every iteration. And with things like SER, it will improve shader occupancy.
They might even improve the unit for BVH traversal, allowing the GPU to be more effective with deeper structures.

Shaders are not that good at calculating BVH traversal, so dedicated hardware would be better, both in performance and in power usage.
If AMD ever manages to improve wave occupancy to the point of not having idle shaders, it will mean that BVH traversal ends up contending for resources.
In Spider-Man they aren't using kit-bashing, or kit-bashing of megascans, because buildings are largely engineered - even when rustic - and megascanning a skyscraper is beyond the scope of scanning, unless I missed a video about drones slowly hovering inch by inch over one.

I suspect you are correct that the highest-quality aspects of RT in the remaster/Miles Morales and the incoming Spider-Man 2 are all doing the precise, non-SDF foreground RT on the PS5 CPU cores, which are unburdened of I/O. But kit-bashing a 15M-polygon model with a 5-million-polygon rock would eat up way too much of the 40 Gbps CPU bandwidth Cerny discussed in the Road to PS5, and at those geometry levels I don't believe hardware RT acceleration is practical.

Kit-bash acceleration on the PS5 GPU makes more sense, IMO, when you consider the BVH depths involved in generating an optimal intersection of the BVHs so that minimal geometry remains to be rendered - when starting with megascan assets.

I agree Nvidia RT is currently scaling well for foreground hardware-accelerated RT, but AFAIK utilisation in non-RT gaming is superior on AMD's design, shown by higher frame rates - including with Nanite and software Lumen, where shaders and async compute shaders do the work - and it becomes a battle between GDDR6 and GDDR6X, if I'm not mistaken.

So I'm of the belief that - as conventional UE4-type resource needs plateau and are met easily by both manufacturers' top cards - AMD will catch up by being able to scale up their lighter BVH acceleration units at the heart of their GPUs, with stacking and lithography changes, much more easily than Nvidia can re-engineer their RT/AI units and the slower async bus and scheduling issues sitting on the periphery of a design too tied to a monolithic layout.
 
Last edited:

sinnergy

Member
Personally, I think developers need to take into account the devices that they are developing for from the start. As a digital designer and web designer/developer, we also build for a large range of devices and specifications... it is what it is, and it is a lot of work... too much, to be honest.

So some games/projects are not planned right... even MS ones.
 
Last edited:
Personally, I think developers need to take into account the devices that they are developing for from the start. As a digital designer and web designer/developer, we also build for a large range of devices and specifications... it is what it is, and it is a lot of work... too much, to be honest.

So some games/projects are not planned right... even MS ones.

I don't think it's just planning; it's that each system has its own strengths and weaknesses. That's why we see results flip-flop between the two. It's not necessarily a bad thing, because it means they are on par with each other, which is great for developers and consumers.
 

winjer

Gold Member
In Spider-Man they aren't using kit-bashing, or kit-bashing of megascans, because buildings are largely engineered - even when rustic - and megascanning a skyscraper is beyond the scope of scanning, unless I missed a video about drones slowly hovering inch by inch over one.

I suspect you are correct that the highest-quality aspects of RT in the remaster/Miles Morales and the incoming Spider-Man 2 are all doing the precise, non-SDF foreground RT on the PS5 CPU cores, which are unburdened of I/O. But kit-bashing a 15M-polygon model with a 5-million-polygon rock would eat up way too much of the 40 Gbps CPU bandwidth Cerny discussed in the Road to PS5, and at those geometry levels I don't believe hardware RT acceleration is practical.

Kit-bash acceleration on the PS5 GPU makes more sense, IMO, when you consider the BVH depths involved in generating an optimal intersection of the BVHs so that minimal geometry remains to be rendered - when starting with megascan assets.

I agree Nvidia RT is currently scaling well for foreground hardware-accelerated RT, but AFAIK utilisation in non-RT gaming is superior on AMD's design, shown by higher frame rates - including with Nanite and software Lumen, where shaders and async compute shaders do the work - and it becomes a battle between GDDR6 and GDDR6X, if I'm not mistaken.

So I'm of the belief that - as conventional UE4-type resource needs plateau and are met easily by both manufacturers' top cards - AMD will catch up by being able to scale up their lighter BVH acceleration units at the heart of their GPUs, with stacking and lithography changes, much more easily than Nvidia can re-engineer their RT/AI units and the slower async bus and scheduling issues sitting on the periphery of a design too tied to a monolithic layout.

Megascans are just a way of getting fast results with high-quality assets.
But after the scan, it's still necessary to clean up and arrange the model.
And for older game engines, like UE4, it still requires paring down geometry and creating several LODs.
On UE5, this process is automated to some extent.
Regarding kit-bashing models, that is just a technique artists use to blend models. It's not limited by hardware.
But it does have the disadvantage of creating geometry overdraw. Modern GPUs have the ability to cull some hidden geometry.
And modern game engines can also cull geometry using things like HZB (hierarchical Z-buffer) occlusion culling.
For example, UE4 and UE5 have several HZB algorithms enabled by default. This culling pass takes some time to process.
But it's usually worth it, because removing hidden geometry means less time wasted transforming and shading hidden triangles.
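For anyone who hasn't run into HZB before, this is roughly what the technique boils down to - a quick illustrative Python sketch, not the UE implementation:

```python
import numpy as np

def build_hzb(depth: np.ndarray) -> list:
    """Build a hierarchical Z pyramid: each mip stores the FARTHEST depth of
    the 2x2 block below it, so one texel conservatively bounds a screen region."""
    mips = [depth]
    while mips[-1].shape[0] > 1:
        d = mips[-1]
        h, w = d.shape[0] // 2, d.shape[1] // 2
        mips.append(d[:2*h, :2*w].reshape(h, 2, w, 2).max(axis=(1, 3)))
    return mips

def is_occluded(mips, x0, y0, x1, y1, nearest_depth) -> bool:
    """Test an object's screen-space bounding rect against the pyramid.
    Pick a mip where the rect covers only a few texels, then compare the
    object's nearest depth against the stored farthest occluder depth."""
    size = max(x1 - x0, y1 - y0, 1)
    level = min(int(np.ceil(np.log2(size))), len(mips) - 1)
    d = mips[level]
    sx0, sy0 = x0 >> level, y0 >> level
    sx1, sy1 = (x1 >> level) + 1, (y1 >> level) + 1
    # Occluded if even the object's closest point is behind everything already drawn there.
    return nearest_depth > d[sy0:sy1, sx0:sx1].max()
```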
 

PaintTinJr

Member
Megascans are just a way of getting fast results with high-quality assets.
But after the scan, it's still necessary to clean up and arrange the model.
And for older game engines, like UE4, it still requires paring down geometry and creating several LODs.
On UE5, this process is automated to some extent.
Regarding kit-bashing models, that is just a technique artists use to blend models. It's not limited by hardware.
But it does have the disadvantage of creating geometry overdraw. Modern GPUs have the ability to cull some hidden geometry.
And modern game engines can also cull geometry using things like HZB (hierarchical Z-buffer) occlusion culling.
For example, UE4 and UE5 have several HZB algorithms enabled by default. This culling pass takes some time to process.
But it's usually worth it, because removing hidden geometry means less time wasted transforming and shading hidden triangles.
Re-topologizing megascans isn't really practical.

The "edit" is like touching up a photo where the detail is way beyond what someone could draw like to fit with the photo. Re-topologizing by using the scan to trace over is an option - but the resultant mesh isn't a megascan,

I'm not sure if you've done 3D scanning with your smartphone (like I can on my Xperia 1, and on my previous XZ Premium with Sony's 3D Creator app), which typically produces a scan with ~100K polys. To use those scans you typically take them as-is - maybe just clipping them or texture painting, basic stuff - or re-scan for better results, because scanning doesn't produce nice grids of regular quads that can be decimated easily for LODs, like professional artist-made models; instead it produces a web of triangles, with each triangle roughly the same area. Models at megascan level can also be produced via virtual clay modelling/sculpting in 3D modelling programs like ZBrush, Mudbox or Blender - like the statue with the shield in the PS5 demo, at 30M polys per statue.

Kit-bashing in real time, like the PS5 demo did, allowed them to render 4-16M polys per frame and make it look like it was rendering 10 billion per frame. You could let an artist do that offline, but the workflow isn't going to be good if you have to compile the kit-bashing results into 10 billion actual, statically placed polygons - losing all the storage savings of reusing a few 15M-poly models (300MB to 1GB of data, per the UE5 demo info) and all the redundant processing by the geometry and fragment pipelines, since scene-local geometry that passes a coarse bounding-box cull still needs to be transformed by the model-view matrix and tested by the geometry pipeline, and potentially the fragment pipeline. So the HiZ buffer in UE5 on PS5 is probably getting used mostly with the cascading shadow maps.

I have my own technique from 15 years ago for a type of HiZ buffering, using just one Z-buffer and the stencil buffer, to accurately and efficiently draw everything from 1mm to 1000km in one frustum - or out to where something becomes smaller than a fragment after projection - so I am very familiar with the costs/tradeoffs surrounding hidden-surface removal and the accuracy needed for shadow-map testing, and why hierarchical Z-buffering is needed.

I think that explains why the AlphaPoint demo by The Coalition used just 100M polys, with only 10% of that 100M-poly scene data being megascans (so just one 10M-poly megascan mesh), and did the rest with non-kit-bashed static Nanite geometry and 2D decal techniques.

This major difference between the demos is probably the central reason PS5-only games are going to look completely different from multiplatform results built around the Series as the lowest common denominator. The more effective the real-time kit-bashing, and the denser the polygons it works with, the better the set-theory results of the kit-bashing - resulting in less overdraw from Nanite, because sub-pixel triangles will only make it past the kit-bashing in edge cases and typically won't survive the geometry-pipeline culling to even reach the fragment pipeline. That is in line with Cerny's tweets about culling geometry early, before rendering, when the PS5 was showcased and people jumped on the 'RDNA 1.5' and geometry-shader DX acronyms that the PS5 had other names for.
 
Last edited:

winjer

Gold Member
Re-topologizing megascans isn't really practical.

The "edit" is like touching up a photo where the detail is way beyond what someone could draw to fit in with the photo. Re-topologizing by tracing over the scan is an option - but the resultant mesh isn't a megascan.

I'm not sure if you've done 3D scanning with your smartphone (like I can on my Xperia 1, and on my previous XZ Premium with Sony's 3D Creator app), which typically produces a scan with ~100K polys. To use those scans you typically take them as-is - maybe just clipping them or texture painting, basic stuff - or re-scan for better results, because scanning doesn't produce nice grids of regular quads that can be decimated easily for LODs, like professional artist-made models; instead it produces a web of triangles, with each triangle roughly the same area. Models at megascan level can also be produced via virtual clay modelling/sculpting in 3D modelling programs like ZBrush, Mudbox or Blender - like the statue with the shield in the PS5 demo, at 30M polys per statue.

Kit-bashing in real time, like the PS5 demo did, allowed them to render 4-16M polys per frame and make it look like it was rendering 10 billion per frame. You could let an artist do that offline, but the workflow isn't going to be good if you have to compile the kit-bashing results into 10 billion actual, statically placed polygons - losing all the storage savings of reusing a few 15M-poly models (300MB to 1GB of data, per the UE5 demo info) and all the redundant processing by the geometry and fragment pipelines, since scene-local geometry that passes a coarse bounding-box cull still needs to be transformed by the model-view matrix and tested by the geometry pipeline, and potentially the fragment pipeline. So the HiZ buffer in UE5 on PS5 is probably getting used mostly with the cascading shadow maps.

I have my own technique from 15 years ago for a type of HiZ buffering, using just one Z-buffer and the stencil buffer, to accurately and efficiently draw everything from 1mm to 1000km in one frustum - or out to where something becomes smaller than a fragment after projection - so I am very familiar with the costs/tradeoffs surrounding hidden-surface removal and the accuracy needed for shadow-map testing, and why hierarchical Z-buffering is needed.

I think that explains why the AlphaPoint demo by The Coalition used just 100M polys, with only 10% of that 100M-poly scene data being megascans (so just one 10M-poly megascan mesh), and did the rest with non-kit-bashed static Nanite geometry and 2D decal techniques.

This major difference between the demos is probably the central reason PS5-only games are going to look completely different from multiplatform results built around the Series as the lowest common denominator. The more effective the real-time kit-bashing, and the denser the polygons it works with, the better the set-theory results of the kit-bashing - resulting in less overdraw from Nanite, because sub-pixel triangles will only make it past the kit-bashing in edge cases and typically won't survive the geometry-pipeline culling to even reach the fragment pipeline. That is in line with Cerny's tweets about culling geometry early, before rendering, when the PS5 was showcased and people jumped on the 'RDNA 1.5' and geometry-shader DX acronyms that the PS5 had other names for.

Megascans have too much geometry to be used in a game engine like UE4, for example. It's always necessary to pare back the geometry and also to create the lower LODs.
Even UE5 has to do this to some extent, despite Nanite having such high geometry throughput.

I don't know what you mean by kit-bashing, then; I always understood it as mixing up a few models to make a bigger one. The term comes from cinema and TV, from the people who make models and sets.
For games, one example would be taking a bunch of rock models to make a landscape. I wonder if you have a different definition for it.

At least on UE4 and UE5, HZB culling is software based, so it should be identical on the PS5, the Series consoles and PC.
The same goes for Nanite rasterization, as it's also software based, running on shaders.

My guess is that the performance differences between these consoles come down to the software, meaning the APIs and drivers.
Sony seems to have significantly lower overhead, meaning they can issue more draw calls than the Series consoles.

On PC, one thing I've noticed is that several games on the MS Store run a handful of fps slower than the same games on Steam or Epic.
And this seems to be an issue with the overhead caused by UWP.
I would not be surprised if the Series consoles were also losing performance because of UWP.
 

PaintTinJr

Member
Megascans have too much geometry to be used in a game engine like UE4, for example. It's always necessary to pare back the geometry and also to create the lower LODs.
Even UE5 has to do this to some extent, despite Nanite having such high geometry throughput.

I don't know what you mean by kit-bashing, then; I always understood it as mixing up a few models to make a bigger one. The term comes from cinema and TV, from the people who make models and sets.
For games, one example would be taking a bunch of rock models to make a landscape. I wonder if you have a different definition for it.

At least on UE4 and UE5, HZB culling is software based, so it should be identical on the PS5, the Series consoles and PC.
The same goes for Nanite rasterization, as it's also software based, running on shaders.

My guess is that the performance differences between these consoles come down to the software, meaning the APIs and drivers.
Sony seems to have significantly lower overhead, meaning they can issue more draw calls than the Series consoles.

On PC, one thing I've noticed is that several games on the MS Store run a handful of fps slower than the same games on Steam or Epic.
And this seems to be an issue with the overhead caused by UWP.
I would not be surprised if the Series consoles were also losing performance because of UWP.
No. In regards to early geometry culling, a pro-PS5 tweet definitely referenced geometry being removed before the vertex pipeline by the custom geometry engine, IIRC. And IMO, hardware-accelerated kit-bashing of the megascans' BVH representations - on the PS5's BVH accelerators, which don't block sampler/texturing work - prior to the vertex pipeline seems like the right candidate to explain how 1GB of data with a 300MB streaming pool was able to generate frames that looked like they came from 10 billion polys, when the Series couldn't do the real-time kit-bashing, used just one megascan, and was forced to limit all scene geometry to 100M polys in the AlphaPoint demo - and looked it.

Unfortunately, when I mentioned UE4 in a prior post you took it literally, rather than as the shorthand I intended for all pre-hardware-RT rasterization techniques, when I was discussing AMD vs Nvidia performance in games.

And no, my definition of megascans is the same as yours, but I think the part I'm struggling to explain in my posts is the crippling cost of actually converting the procedural description of a kit-bashed scene into literal geometry - after the kit-bashing.

While the scene is just a few 10-30M-poly assets with a procedural description of how they recursively bash together to form a scene with 10 billion polys, the memory and processing requirements are manageable. The video below is something I watched a few months back; although it doesn't kit-bash, it nicely shows how applying a complex geometry node tree (a procedural description of geometry) to turn a mesh from polys into Lego bricks can overwhelm Blender with actual geometry once it is no longer a virtual description. That, I believe, is the cheat the PS5 does: it avoids fully applying the procedural description until the camera is set, per frame, and then minimalistically handles only the data in the frustum that isn't occluded - after kit-bashing bounding boxes to occlude as much as possible - and then gets all the benefits of Nanite and Lumen on top.
 

winjer

Gold Member
No. In regards to early geometry culling, a pro-PS5 tweet definitely referenced geometry being removed before the vertex pipeline by the custom geometry engine, IIRC. And IMO, hardware-accelerated kit-bashing of the megascans' BVH representations - on the PS5's BVH accelerators, which don't block sampler/texturing work - prior to the vertex pipeline seems like the right candidate to explain how 1GB of data with a 300MB streaming pool was able to generate frames that looked like they came from 10 billion polys, when the Series couldn't do the real-time kit-bashing, used just one megascan, and was forced to limit all scene geometry to 100M polys in the AlphaPoint demo - and looked it.

Unfortunately, when I mentioned UE4 in a prior post you took it literally, rather than as the shorthand I intended for all pre-hardware-RT rasterization techniques, when I was discussing AMD vs Nvidia performance in games.

And no, my definition of megascans is the same as yours, but I think the part I'm struggling to explain in my posts is the crippling cost of actually converting the procedural description of a kit-bashed scene into literal geometry - after the kit-bashing.

While the scene is just a few 10-30M-poly assets with a procedural description of how they recursively bash together to form a scene with 10 billion polys, the memory and processing requirements are manageable. The video below is something I watched a few months back; although it doesn't kit-bash, it nicely shows how applying a complex geometry node tree (a procedural description of geometry) to turn a mesh from polys into Lego bricks can overwhelm Blender with actual geometry once it is no longer a virtual description. That, I believe, is the cheat the PS5 does: it avoids fully applying the procedural description until the camera is set, per frame, and then minimalistically handles only the data in the frustum that isn't occluded - after kit-bashing bounding boxes to occlude as much as possible - and then gets all the benefits of Nanite and Lumen on top.


There seems to be some confusion here.
All GPUs for the last 20 years have had some level of geometry culling. RDNA2 also has hardware to do this, but it's likely the same on the PS5 and Series S/X.
In the case of UE4 and UE5, the engine gives devs the option to use only hardware culling or to use HZB culling, which is done in software and is the default option. So the PS5 and Series S/X will have the same software culling enabled.

RDNA2 does not have dedicated BVH traversal acceleration. Traversal is done either on the shaders or on the CPU, in software.
Neither the PS5 nor the Series S/X has any hardware to accelerate BVH traversal.

Also consider that UE5 on modern GPUs will not use a vertex stage. Because it uses Mesh Shaders, it uses a new geometry pipeline: just Amplification Shaders, Mesh Shaders, rasterization and pixel shaders.
The PS5 supposedly uses Primitive Shaders, but that's not much different; it just has a couple more steps left in from the old pipeline.

UE5 has a cache for render mesh & BLAS data with a size of 200MB by default, and a streaming pool for Nanite with a value of 512MB by default. These can be changed to alleviate I/O dependency.
UE5 also has the option to use compression for Nanite. So if a system has a good I/O and decompression setup, it can save space using compression.
 

PaintTinJr

Member
There seems to be some confusion here.
All GPUs for the last 20 years have had some level of geometry culling. And RDNA2 also has hardware to do this, but it's likely the same on the PS5 and Series S/X.
In the case of UE4 and UE5, the engine gives devs the option to have only hardware culling, or to have HZB culling, which is done in software, which is the default option. So the PS5 and Series S/X will have the same software culling enabled.

RDNA2 does not have BVH acceleration. It's either done in shaders or in the CPU. And it's software based.
Neither the PS5 or Series S/X have any hardware to accelerate BVH traversal.

Also consider that UE5 on modern GPUs will not use a vertex stage. Because it uses Mesh Shaders, it will use the new geometry pipeline: just Amplification Shaders, Mesh Shaders, Rasterization and Pixel Shaders.
The PS5 supposedly uses Primitive Shaders, but it's not much different; it only has a couple more steps left in from the old pipeline.

UE5 has a cache for render mesh & BLAS data with a size of 200MB by default, and a streaming pool for Nanite with a default of 512MB. These can be changed to alleviate I/O dependency.
And UE5 also has the option to use compression for Nanite, so a system with a good I/O setup and decompression can save space using compression.
I'm going to guess you haven't done much - if any - graphics programming, which is why you are making statements that are technically false.

For a start, any unit inside a GPU doing a task offloaded from the CPU is hardware acceleration, so the BVH units in the PS5 and Series GPUs are accelerators, as are shaders and compute shaders - which, I assume, you think being programmable software somehow invalidates the acceleration they provide.

What is different between AMD and Nvidia - and for the purpose of this discussion we'll say the Nvidia RT accelerators are like the fixed-path ASICs of the pre-OpenGL 1.5 era, when hardware T&L was coined as acceleration - is that the Nvidia accelerators, and even the CUDA units, work independently of the conventional programmable graphics pipeline, so they run in parallel without blocking once the setup of the parallel jobs is complete.

Secondly, Mesh Shaders are still part of the geometry pipeline that encompasses the vertex and geometry shader stages. Mesh shading doesn't magically save the GPU from having to use CUs to transform the vertices, texture coordinates and vertex normals of assets from model space through the transforms to be projected as fragments in the viewport. Mesh Shaders just become a different abstract entry point to that same functionality.
Even if GPU extensions have existed for basic geometry culling, geometry culling as a meaningful process has previously been a CPU + shader/compute shader task - unless we are being verbose and focusing on frustum clip-plane culling, which is supposed to be somewhat redundant in a game engine for all but the polygons that straddle the walls of the frustum and trigger primitive subdivision, so the inside part gets kept and the outside part gets culled.
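To be clear about what I mean by "basic" culling, this is the textbook frustum-vs-bounding-box test - nothing vendor specific, just a sketch with my own names:

```cpp
#include <array>

// Standard frustum vs. axis-aligned bounding box test:
// a plane is (normal.xyz, d) with the normal pointing into the frustum.
struct Plane { float nx, ny, nz, d; };

// Returns false if the box is entirely outside one plane (trivially culled).
// Boxes that straddle a plane are kept and left for clipping/rasterization.
bool boxInFrustum(const std::array<Plane, 6>& planes,
                  const float mn[3], const float mx[3]) {
    for (const Plane& p : planes) {
        // Pick the box corner furthest along the plane normal (the "positive vertex").
        float px = (p.nx >= 0.0f) ? mx[0] : mn[0];
        float py = (p.ny >= 0.0f) ? mx[1] : mn[1];
        float pz = (p.nz >= 0.0f) ? mx[2] : mn[2];
        if (p.nx * px + p.ny * py + p.nz * pz + p.d < 0.0f)
            return false;   // even the closest corner is outside this plane
    }
    return true;            // inside or straddling: keep it
}
```

Everything that passes this is still handed on, whether it later turns out to be occluded or not - which is why the meaningful culling has ended up in CPU and compute code rather than in that fixed stage.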

In the discussion we were having, I'm talking about the models' BVH representations getting kit-bashed in real time, accelerated via the BVH units - probably as an async compute shader - while the previous frame's gather, denoise and upscale is taking place in a shader. Cerny said in his Road to PS5 talk that the BVH units can resolve a query while the shader is still running, meaning (AFAIK) that unlike the Series hardware, the PS5 doesn't block the texture/memory unit while waiting on a BVH query, so it can async both shader and BVH work via compute.


As for UE5, it will exploit the custom hardware in the PS5 as part of Sony and Epic's partnership - Sony owns a few percent of Epic - so if, as I suspect, the PS5 can kit-bash in real time via the custom geometry engine using the BVH units, then that will be an option for developers, in addition to the default provided by UE5. Hierarchical z-buffering, by its very name z-buffering, doesn't cull geometry but fragments, as the z-buffer operates in the fragment shader pipeline - unless the terminology is being used loosely as a catchy name for an algorithm. It just sounds like a frustum partition algorithm to improve the accuracy of z-buffer fragments when projected, i.e. everything is already past the frustum culling to be in the rendering process.
 
Last edited:

winjer

Gold Member
For a start, any unit inside a GPU doing a task offloaded from the CPU is hardware acceleration, so the BVH units in the PS5 and Series GPUs are accelerators, as are shaders and compute shaders - which, I assume, you think being programmable software somehow invalidates the acceleration they provide.

What is different between AMD and Nvidia - and for the purpose of this discussion we'll say the Nvidia RT accelerators are like the fixed-path ASICs of the pre-OpenGL 1.5 era, when hardware T&L was coined as acceleration - is that the Nvidia accelerators, and even the CUDA units, work independently of the conventional programmable graphics pipeline, so they run in parallel without blocking once the setup of the parallel jobs is complete.

The concept of hardware acceleration usually refers to a unit designed specifically to run certain instructions. Shaders are generalist parallel processors, so they are not hardware accelerators.
For example, if a GPU has a unit specifically to decode AV1, that is hardware-accelerated decode/encode of AV1 video. But if it's run on shaders, then it's just software based.
An example of this relating to UE5 is rasterization. A GPU has specific units for rasterization, what you called ASICs. But Epic chose to do software rasterization, because it is more flexible and better suited to their engine.

In the case of RT, Nvidia has units to specifically accelerate both BVH traversal and ray testing.
In the case of RDNA2, it only has some instructions inside the TMUs that accelerate ray testing. But BVH traversal is done in software, in the GPU shaders.
RDNA2 does not have any hardware to accelerate BVH traversal. It's just shaders, and they don't even have instructions to speed up the traversal itself.
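To make the hardware/software split concrete, here's a rough C++ sketch of the shape of a traversal loop. It's only illustrative - not AMD's or anyone's actual implementation, and the names are mine:

```cpp
#include <algorithm>
#include <cstdint>
#include <utility>
#include <vector>

// Minimal shape of a software BVH traversal loop (illustrative C++, not shader
// code). The ray/box and ray/triangle *tests* are what RDNA2's intersection
// instructions in the TMUs help with; the stack handling and the loop itself
// stay in the shader program, which is why traversal is called "software".
struct Ray  { float o[3], d[3], tMax; };
struct Node {
    float   mn[3], mx[3];
    int32_t left = -1, right = -1;    // child indices, -1 means this is a leaf
    int32_t firstTri = 0, triCount = 0;
};

// Classic slab test: the part a hardware ray/box instruction would replace.
static bool rayHitsBox(const Ray& r, const float mn[3], const float mx[3]) {
    float tNear = 0.0f, tFar = r.tMax;
    for (int a = 0; a < 3; ++a) {
        float inv = 1.0f / r.d[a];
        float t0 = (mn[a] - r.o[a]) * inv;
        float t1 = (mx[a] - r.o[a]) * inv;
        if (inv < 0.0f) std::swap(t0, t1);
        tNear = std::max(tNear, t0);
        tFar  = std::min(tFar,  t1);
        if (tNear > tFar) return false;
    }
    return true;
}

// Walks the tree with an explicit stack; the leaf triangle tests are left as a stub.
int countCandidateLeaves(const std::vector<Node>& nodes, const Ray& ray) {
    int32_t stack[64];
    int top = 0, leaves = 0;
    stack[top++] = 0;                                  // start at the root node
    while (top > 0) {
        const Node& n = nodes[stack[--top]];
        if (!rayHitsBox(ray, n.mn, n.mx)) continue;    // prune this whole subtree
        if (n.left < 0) ++leaves;                      // leaf: ray/tri tests would go here
        else { stack[top++] = n.left; stack[top++] = n.right; }
    }
    return leaves;
}
```

The loop, the stack and all the scheduling around it are the "traversal" part that RDNA2 leaves to the shader program; on Nvidia the RT core walks that loop in dedicated hardware too, which is where the per-ray cost difference comes from.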

Secondly, Mesh Shaders are still part of the geometry pipeline that encompasses the vertex and geometry shader stages. Mesh shading doesn't magically save the GPU from having to use CUs to transform the vertices, texture coordinates and vertex normals of assets from model space through the transforms to be projected as fragments in the viewport. Mesh Shaders just become a different abstract entry point to that same functionality.
Even if GPU extensions have existed for basic geometry culling, geometry culling as a meaningful process has previously been a CPU + shader/compute shader task - unless we are being verbose and focusing on frustum clip-plane culling, which is supposed to be somewhat redundant in a game engine for all but the polygons that straddle the walls of the frustum and trigger primitive subdivision, so the inside part gets kept and the outside part gets culled.

I should have been clearer about what I was talking about when referring to Mesh Shaders.
I'm talking about the new GPU pipeline for geometry rendering, introduced with DX12_2
Previously in DX12, we had these stages: Input Assembler; Vertex Shader; Hull Shader; Tessellation; Domain Shader; Geometry Shader; Rasterization; Pixel Shader.
But with the new pipeline it's just: Amplification Shader; Mesh Shader; Rasterization; Pixel Shader.
It's a simpler pipeline that reduces overhead and increases geometry throughput significantly.

GPUs have been doing hardware culling to prevent overdraw even before the existence of programmable shaders.
Of course it wasn't as advanced as what we have today, but it did offer performance improvements.
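Back-face rejection is a good example of that early fixed-function culling - it's just a sign test on the projected triangle. A sketch, with made-up types:

```cpp
// Back-face culling, one of the oldest fixed-function culls: a triangle whose
// screen-space winding is clockwise (negative signed area) faces away from the
// camera and can be dropped before any pixel work happens.
struct Vec2 { float x, y; };

bool isFrontFacing(const Vec2& a, const Vec2& b, const Vec2& c) {
    // Twice the signed area of the projected triangle (counter-clockwise > 0).
    float area2 = (b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x);
    return area2 > 0.0f;
}
```

That alone throws away roughly half of a closed mesh's triangles before any pixel work is done.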

In the discussion we were having, I'm talking about the models' BVH representations getting kit-bashed in real time, accelerated via the BVH units - probably as an async compute shader - while the previous frame's gather, denoise and upscale is taking place in a shader. Cerny said in his Road to PS5 talk that the BVH units can resolve a query while the shader is still running, meaning (AFAIK) that unlike the Series hardware, the PS5 doesn't block the texture/memory unit while waiting on a BVH query, so it can async both shader and BVH work via compute.

We already talked about this. That feature is called inline ray tracing.
It's something that the Series X and RDNA2 on PC can do as well. Even Nvidia's hardware benefited from it, as it reduces contention in the execution pipeline.

BTW, can you point me to where Cerny said that?

As for UE5, it will exploit the custom hardware in the PS5 as part of Sony and Epic's partnership - Sony owns a few percent of Epic - so if, as I suspect, the PS5 can kit-bash in real time via the custom geometry engine using the BVH units, then that will be an option for developers, in addition to the default provided by UE5. Hierarchical z-buffering, by its very name z-buffering, doesn't cull geometry but fragments, as the z-buffer operates in the fragment shader pipeline - unless the terminology is being used loosely as a catchy name for an algorithm. It just sounds like a frustum partition algorithm to improve the accuracy of z-buffer fragments when projected, i.e. everything is already past the frustum culling to be in the rendering process.

But geometry can be culled using a Hierarchical Z-buffer:
That is what the r.HZBOcclusion cvar does in Unreal.
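Conceptually the test looks like this - a rough C++ sketch with invented names and a simplified mip selection, not Unreal's actual code:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Toy Hierarchical-Z occlusion test. Each HZB mip stores the FARTHEST depth of
// the region it covers (standard depth, larger = farther), so a coarse sample
// gives a conservative answer: if the object's nearest depth is behind even the
// farthest occluder in that region, nothing of it can be visible.
struct Hzb {
    int width0, height0;                       // mip 0 resolution
    std::vector<std::vector<float>> mips;      // mips[level][y * w + x], max depth

    float maxDepth(int level, int x, int y) const {
        int w = std::max(1, width0 >> level);
        int h = std::max(1, height0 >> level);
        x = std::clamp(x, 0, w - 1);
        y = std::clamp(y, 0, h - 1);
        return mips[level][y * w + x];
    }
};

// screenMin/screenMax: the object's bounds in pixels; nearestDepth: its closest depth.
bool isOccluded(const Hzb& hzb, const float screenMin[2], const float screenMax[2],
                float nearestDepth) {
    float widthPx = std::max(screenMax[0] - screenMin[0], screenMax[1] - screenMin[1]);
    // Pick the mip where the bounds cover only a handful of texels.
    int level = std::clamp((int)std::ceil(std::log2(std::max(widthPx, 1.0f))),
                           0, (int)hzb.mips.size() - 1);
    int x0 = (int)screenMin[0] >> level, x1 = (int)screenMax[0] >> level;
    int y0 = (int)screenMin[1] >> level, y1 = (int)screenMax[1] >> level;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (nearestDepth <= hzb.maxDepth(level, x, y))
                return false;   // the object could be in front of something here
    return true;                // behind the farthest occluder everywhere: cull it
}
```

The important detail for this discussion is that the test runs per object (or per cluster) against its bounds, before any of its triangles enter the geometry pipeline - so it genuinely culls geometry, even though the data it tests against comes from the depth buffer.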

This is not related to UE5, but it's a good example of occlusion culling with Hierarchical-Z
 

Insane Metal

Gold Member
PS5/XSX are as close as it gets in terms of overall specs, aside from the SSD. I think that kind of discussion is useless. Yes, MS came out saying it'd beat the PS5 easily, which it didn't and won't. But they have to talk up their machine, so that's to be expected; every company does that. With that said, both consoles are great machines.
 