
Inside Unreal: In-depth look at PS5's Lumen in the Land of Nanite demo (only 6.14 GB of geometry) and Deep dive into Nanite

Status
Not open for further replies.

PaintTinJr

Member
I find the "Secret Power of the Tempest Engine" talk hilarious, considering we are talking about a single CU, which the PS5 GPU already has 72 of (36 dual CUs/WGPs). Yes, I'm sure a part delivering 1/72nd of the power of the PS5 is the secret to unlocking its almighty power. By my calculations, Tempest would have 0.143 Teraflops, if it was running at the same speed as the GPU. So that's a nice bump over the PS4 CPU, like Cerny was saying.
It is more like an SPU, according to Cerny, where low latency was key to the SPUs being so versatile and powerful on the PS3. So it might be more about how fast it can run a pre-pass to eliminate redundant RT work that the GPU would otherwise do with its compute cores for the SW RT mesh distance field rendering, than about the 52 versus 36 you appear to be angling at. HW RT is part of Lumen, but only for a very short distance, and the real overall IQ will depend on how performant the first two parts of Lumen's software rendering are, and over what distance that performance holds (200m? 1km?), IMHO.
 

FireFly

Member
It is more like an SPU according to Cerny - where low latency was key to them being so versatile and powerful on the PS3- so it might be more about how fast it can pre-pass to eliminate redundant RT work that the GPU would do with the compute cores for the SW RT mesh distance field rendering, than about the 52 versus 36 you appear to be angling at. HW RT is part of lumen, but for a very short distance, and the real overall IQ will be how performant the first two parts of lumen's software rendering work, and for how far at that performance(200m? or 1km?) IMHO.
It's not 52 vs 36. It's 72 vs. 1! So that super efficient "SPU" like processor that according to Cerny might be able to hit 100% efficiency in a specific task due to some of those changes, can contribute a whole 1.4% to the peak floating point power of the system. I'm sure that whole 1.4% of extra compute is going to revolutionise everything!
 
staticshock

Are the internal names Reverb and Topaz references to Lumen and Nanite respectively? The first being related to reflections and the second being related to geometry.

No. They are strictly references to the two demo projects and have no other meaning whatsoever.

Reverb = Lumen in the land of nanite.
Topaz = Valley of the ancient

They make it clear in the Welcome to UE5 Early Access Livestream that’s available on YouTube.

It has absolutely nothing to do with the 3D audio chip in the PS5 that does convolution reverb. The Xbox Series consoles also have hardware that does convolution reverb.

People like PaintTinJr will keep coming up with new things to push their bs agenda, just like “playing in the editor is less demanding than the compiled packaged version”. Daniel Wright saw that and debunked it, and then they came out with even more BS. As the saying goes, you can’t cure stupid. You also can’t cure fanaticism.

 

PaintTinJr

Member
It's not 52 vs 36. It's 72 vs. 1! So that super efficient "SPU" like processor that according to Cerny might be able to hit 100% efficiency in a specific task due to some of those changes, can contribute a whole 1.4% to the peak floating point power of the system. I'm sure that whole 1.4% of extra compute is going to revolutionise everything!
Well look at L1, L2 and LLC, in a PC with 8GBs of memory, or look at some workloads with and without HT available. Or an ASIC for that matter.

Regardless of 72 vs 1, if you need the 1 and don't always need the 72, then its contribution can be more than its floating point power suggests when it is viewed as a black box unit. If it only provided 1.4%, as you are trying to assert, then they would have just gone with 38 CUs instead, because that is cheaper, less risky, etc. Clearly, for its cost, that one bespoke CU is already more performant at what it does than 4 of those 72, no?
 

PaintTinJr

Member
First of all, Lumen IS NOT PUSHING harder in the Lumen in the Land of Nanite demo. Lumen has always been heavy, then and now. In fact Lumen is heavier than before, because it uses ray tracing hardware, which it didn’t use before.
He literally says the Valley of the Ancient is using very little GI (i.e. Lumen) and is mostly directly lit. So unless Lumen isn't for high quality real-time GI with reflections, and the warrior graveyard isn't an indoor scene in the context of Lumen's SW GI, you are failing to comprehend what was said, and would therefore be wrong.
 
He literally says the land of the ancients is using very little GI - ie Lumen - and is mostly directly lit, so unless lumen isn't for high quality real-time GI with reflections, and the warrior grave yard isn't an indoor scene in context of lumen's sw GI then you are failing to comprehend what was said, and would therefore be wrong.

No he didn’t. You are literally making stuff up and have absolutely no clue what you are talking about. Zero. None. Zip. Nil. Nada.

It makes it impossible to have a discussion because every statement you make is literally made up.
 

FireFly

Member
Well look at L1, L2 and LLC, in a PC with 8GBs of memory, or look at some workloads with and without HT available. Or an ASIC for that matter.

Regardless of 72 vs 1, if you need the 1 and don't always need the 72, then its contribution can be more than its floating point power suggests when it is viewed as a black box unit. If it only provided 1.4%, as you are trying to assert, then they would have just gone with 38 CUs instead, because that is cheaper, less risky, etc. Clearly, for its cost, that one bespoke CU is already more performant at what it does than 4 of those 72, no?
Well, 1.4% is the theoretical peak, which Cerny believes they can hit 100% of. A regular CU wouldn't be able to hit 100% of its theoretical capacity. So yes, Tempest is more efficient at using its budget, which justifies the custom silicon. So, say it’s 3X more efficient. That's ~4%. Nothing to get excited about, and certainly nothing that could allow you to double the number of pixels rendered.
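For reference, the percentages being argued over here can be reproduced with a few lines of arithmetic. This is only a sketch using the figures quoted in the thread (72 GPU units, 1 Tempest unit, and the 3X efficiency guess); the 10.28 TFLOPS GPU peak is Sony's published figure, Tempest running at GPU clocks is the hypothetical from the earlier post, and the function name is made up for illustration.

```python
# Back-of-envelope check of the Tempest numbers quoted in this thread.
# Assumptions: 72 GPU units and 1 Tempest unit (the thread's counting),
# 10.28 TFLOPS GPU peak, Tempest hypothetically clocked like the GPU.

GPU_PEAK_TFLOPS = 10.28
GPU_UNITS = 72
TEMPEST_UNITS = 1


def tempest_share(efficiency_multiplier: float = 1.0) -> float:
    """Tempest throughput as a fraction of GPU peak, scaled by an efficiency guess."""
    return (TEMPEST_UNITS / GPU_UNITS) * efficiency_multiplier


tempest_tflops = GPU_PEAK_TFLOPS * tempest_share()

print(f"Tempest peak: {tempest_tflops:.3f} TFLOPS")
print(f"Share of GPU peak: {tempest_share():.1%}")
print(f"Share if 3X more efficient: {tempest_share(3.0):.1%}")
```

The efficiency multiplier is a pure guess in the post, so the last line is only as good as that guess.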
 
He literally says the land of the ancients is using very little GI - ie Lumen - and is mostly directly lit, so unless lumen isn't for high quality real-time GI with reflections, and the warrior grave yard isn't an indoor scene in context of lumen's sw GI then you are failing to comprehend what was said, and would therefore be wrong.

The slide literally says Indoor GI “Quality”.
He is not saying, and never said, that there is less GI. GI is a combination of direct lighting and indirect lighting. He is saying that if your indirect lighting system has low quality (for example splotches, dots, artifacts, light leaking), the flaws would be minimized in outdoor scenes with abundant direct lighting.

The direct lighting doesn’t replace the indirect lighting; there isn’t less Lumen in the outdoor scene. Direct lighting combines with it. But in an indoor scene, where there is less direct lighting to combine with, the flaws of your indirect lighting will be visible.

But there isn’t more indirect lighting there.
 

IntentionalPun

Ask me about my wife's perfect butthole
PaintTinJr: Are you suggesting the PS5 UE5 demo used a bunch of their specialized hardware? Because the only thing they used were primitive shaders; there was no use of anything else, certainly not the 3D audio chip lol

It was all in the Eurogamer interview they did.. software based RT for Lumen, nanite used primitive shaders (which are not a unique PS5 feature in any way), but beyond that, everything was just the CUs on the GPU and the CPU.
 

PaintTinJr

Member
No he didn’t. You are literally making stuff up and have absolutely no clue what you are talking about. Zero. None. Zip. Nil. Nada.

It makes it impossible to have a discussion because every statement you make is literally made up.

Lumen: Inside Unreal Engine 5 said:
Lumen Overview @10:05

Daniel:

So let me just give you a quick overview of our motivation for Lumen.

Goals slide:

Daniel: When we set out to make this thing, we wanted to solve the problem of dynamic global illumination. And instead of making something that was targeted at current gen consoles and trying to scale it up, we just kind of bit the bullet and targeted next gen consoles so that we could make something really high quality. But we did also set out to scale to high-end PC, especially enterprise use cases where quality needs to be top notch. That's also part of Lumen's goals. And we really wanted to provide performant reflections together with dynamic GI, because it's really difficult to do dynamic global illumination together with reflections.

That's not something that existed before in the engine.

Goals slide with Valley of the Ancients openworld pic.

Chance: That's great.
Daniel: Lumen also needs to work with large open worlds, which is a big focus of Unreal Engine 5, particularly with Nanite. Nanite makes crazy levels possible, millions of instances. Just ridiculous content, really. And Lumen needs to work with all of that. It can't be the thing keeping people from making awesome levels. While all running in real time. And that means all of Lumen's algorithms had to stream, and they all need to be GPU based.

Goals Interior GI window lit room pic.

Daniel: But indirect lighting can't just work outdoors. It's much more important - indirect lighting shows up much more prominently indoors. And this also is by far the hardest problem for dynamic global illumination because the entire room can be lit by a very small area. And we have to find that area reliably in order to give good image quality. And it's not just enough to solve these individually, but we have to solve them seamlessly so that you can walk into a room that looks great and walk back out into a complex open world. So these have been the problems that we've been focusing on and trying to solve with Lumen.

Chance: Daniel, since I'm curious as a not artist here. Indoor quality, by far, the hardest problem in real time GI. Why is that? Is it because of all of the different geometry that's kind of in an enclosed space and trying to get the bounce right of the things? Or lights affect more things that are closer to the camera? Is there a reason that is, say, harder than an outdoor space?

Daniel: Yeah, it's that there are a lot more sheltered areas where only GI is on screen. Like in this screenshot here, the direct lighting is just in the bottom left. Everything else is Lumen. Lumen skylight, Lumen global illumination. So every pixel, Lumen is providing all of the lighting for it.

Goals slide with Valley of the Ancients openworld pic.

Whereas outdoor - this is not a middle of the day scene - but in what we shipped for Valley of the Ancients, because the sun is what you see the most. So we can get away with a lower quality there. And that is true of most outdoor scenes.

Chance: Got you.
up to @13:16

When he says get away with lower quality he is IMHO clearly talking about the cheap fast lighting like this image's right picture.

QygJbz8.jpg
 

Md Ray

Member
yeah but apart from the last thread where snake29 was very present .. it's been a long time . basically since the presentation of the engine many have been saying inaccuracies .. from the mythical @Bo_Hazem to @ethomaz , and again @assurdum, Panajev2001a, @James Sawyer Ford, @Papacheeks and many, many others that I blocked some time ago and who do not come to mind now. The thread on the consoles, the one on the next gen and finally this one on the engine will remain milestones. The crows have been served
I should have been on that list.
Sad Cartoon GIF
 

PaintTinJr

Member
PaintTinJr: Are you suggesting the PS5 UE5 demo used a bunch of their specialized hardware? Because the only thing they used were primitive shaders; there was no use of anything else, certainly not the 3D audio chip lol

It was all in the Eurogamer interview they did.. software based RT for Lumen, nanite used primitive shaders (which are not a unique PS5 feature in any way), but beyond that, everything was just the CUs on the GPU and the CPU.
I was speculating that the ~50% downgrade, from 1400p30 with 14ms on average to spare in the UE5 PS5 demo shown 12 months ago, doesn't reconcile with Valley of the Ancient now supposedly only being capable of 1080p30 on PS5. Some big reason is needed to explain those compute resources changing so heavily. The 5ms Nanite is being taxed more in the Valley, but the ~15ms Lumen should have a lesser burden because of lower GI. I'm now inclined to think the original demo numbers are misleading because it maybe wasn't interactive at all, and the minute you add interactivity the Lumen compute budget might triple.

I don't normally read the likes of Eurogamer if I can help it, but if you can link that article you are referring to it would be appreciated(y)
 

IntentionalPun

Ask me about my wife's perfect butthole
I was speculating that the ~50% downgrade, from 1400p30 with 14ms on average to spare in the UE5 PS5 demo shown 12 months ago, doesn't reconcile with Valley of the Ancient now supposedly only being capable of 1080p30 on PS5. Some big reason is needed to explain those compute resources changing so heavily. The 5ms Nanite is being taxed more in the Valley, but the ~15ms Lumen should have a lesser burden because of lower GI. I'm now inclined to think the original demo numbers are misleading because it maybe wasn't interactive at all, and the minute you add interactivity the Lumen compute budget might triple.

I don't normally read the likes of Eurogamer if I can help it, but if you can link that article you are referring to it would be appreciated(y)


Explains that Lumen at the time was showing off software based RT:

"Lumen uses ray tracing to solve indirect lighting, but not triangle ray tracing," explains Daniel Wright, technical director of graphics at Epic. "Lumen traces rays against a scene representation consisting of signed distance fields, voxels and height fields. As a result, it requires no special ray tracing hardware."
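As a rough illustration of what tracing a ray against a signed distance field means, here is a minimal sphere-tracing sketch. It is not Lumen's code: the single-sphere scene, the function names, and the step and distance limits are all assumptions chosen for the example.

```python
import math

# Minimal sphere tracing against a signed distance field (SDF).
# Illustrative only: the scene is one hypothetical sphere, not Lumen's data.


def sphere_sdf(p, center=(0.0, 0.0, 5.0), radius=1.0):
    """Signed distance from point p to a sphere (negative inside)."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, max_dist=100.0, eps=1e-4):
    """March along a normalized ray; return the hit distance, or None on a miss."""
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:       # close enough to the surface: report a hit
            return t
        t += dist            # the SDF value is always a safe step size
        if t > max_dist:
            break
    return None


hit = sphere_trace((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), sphere_sdf)
miss = sphere_trace((0.0, 0.0, 0.0), (0.0, 1.0, 0.0), sphere_sdf)
print(hit, miss)
```

Because each step only needs one distance lookup and no triangle intersection test, this kind of tracing runs on ordinary compute shaders, which is consistent with the quote's point that no special ray tracing hardware is required.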

Explains that nanite is mostly software based, but does use the PS5 primitive shaders:

"The vast majority of triangles are software rasterised using hyper-optimised compute shaders specifically designed for the advantages we can exploit," explains Brian Karis. "As a result, we've been able to leave hardware rasterisers in the dust at this specific task. Software rasterisation is a core component of Nanite that allows it to achieve what it does. We can't beat hardware rasterisers in all cases though so we'll use hardware when we've determined it's the faster path. On PlayStation 5 we use primitive shaders for that path which is considerably faster than using the old pipeline we had before with vertex shaders."

These are direct dev quotes so really have little to do with Digital Foundry.


DF themselves prove their incompetence, I believe, in this very article.. repeating the myth that the demo actually rendered billions of triangles lol (might have been a different article though TBH)
 

PaintTinJr

Member
Well, 1.4% is the theoretical peak, which Cerny believes they can hit 100% of. A regular CU wouldn't be able to hit 100% of its theoretical capacity. So yes, Tempest is more efficient at using its budget, which justifies the custom silicon. So, say it’s 3X more efficient. That's ~4%. Nothing to get excited about, and certainly nothing that could allow you to double the number of pixels rendered.
Whether it is useful beyond their audio - which was just speculation - you are still trying to frame the Tempest Engine's ability to do its work in terms of CUs, as though it isn't a bespoke piece of compute hardware, and can be trivially ignored or replaced by something else.

IIRC the Tempest engine is designed to run CPU-type code rather than GPU code, but with GPU efficiency, which was also just one of many capabilities the SPUs had in the PS3. So if an algorithm that can't run well on a GPU needs to be accelerated, then the Tempest engine can likely be that solution. And if one of those algorithms was aligned to Lumen rather than Nanite, then accelerating it could potentially have a multi-fold benefit in other components if latency is the source of the bottleneck.
 

PaintTinJr

Member

Explains that Lumen at the time was showing off software based RT:



Explains that nanite is mostly software based, but does use the PS5 primitive shaders:



These are direct dev quotes so really have little to do with Digital Foundry.


DF themselves proves their incompetence I believe in this very article.. repeating the myth that the demo actually rendered billions of triangles lol (might have been a different article though TBH)
Yeah it doesn't sound likely it uses Tempest, but sadly what you quoted didn't rule it out either IMO because SW RT is general purpose code, and Tempest is a processor for a specific set of general purpose problems.
 

PaintTinJr

Member
Serious question....Did geordiemp retire and pass his armchair privileges down? Cause it's quite funny to see how many different, yet unique ways, PaintTinJr can spin this somehow.

I'm surprised you haven't taken umbrage with Daniel for actually putting a Ray count number of +200 rays per pixel in his info for good quality GI from RT hardware - assuming it would scale and maintain frame-rate, and BVH's could be updated as needed - as that completely disarms the warriors like yourself pushing the: "if only the consoles hadn't cheaped out on HW RT " narrative "we could have perfect RT in all games".

Because AFAIK, a simple bit of maths puts the lower bound of 200 rays per pixel at 3840x2160 and 60fps at ~100 GigaRays/sec, which is considerably more than a 10 GigaRays/sec RTX 2080 Ti monster can deliver, and that card is still ~2.5 billion rays per second short of the ~12.5 GigaRays/sec lower bound for 1080p30.
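The ray-budget arithmetic can be checked with a short script. This is a sketch that assumes a flat 200 rays per pixel with no reuse across frames; the function name is made up, and the 10 GigaRays/sec figure is the one cited in the post.

```python
# Ray-budget arithmetic for the figures quoted above.
# Assumption: 200 rays per pixel as a flat lower bound, no reuse across frames.


def gigarays_per_second(width: int, height: int, fps: int, rays_per_pixel: int = 200) -> float:
    """Rays traced per second, in billions (GigaRays/sec)."""
    return width * height * fps * rays_per_pixel / 1e9


uhd_60 = gigarays_per_second(3840, 2160, 60)
fhd_30 = gigarays_per_second(1920, 1080, 30)
RTX_2080_TI_GIGARAYS = 10.0  # figure cited in the post

print(f"4K60:    {uhd_60:.1f} GigaRays/sec")
print(f"1080p30: {fhd_30:.1f} GigaRays/sec")
print(f"Shortfall at 1080p30: {fhd_30 - RTX_2080_TI_GIGARAYS:.1f} GigaRays/sec")
```

The exact values come out slightly under the rounded numbers in the post (about 99.5 and 12.4 rather than 100 and 12.5), which doesn't change the argument.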
 

drotahorror

Member
You should really put a comma after deep dive. Otherwise it reads as an instruction LOL.

On topic, are we now at a consensus that the IO in PS5 is overkill, seeing as Unreal 5 seems to work extremely well with normal SSD's or, are we excited to see where engines go with the extra headroom?

I was just jokin around. Between all of Ubisoft's deep-dives, it seems like it's sort of a new buzz word right now.
 

FireFly

Member
Whether it is useful beyond their audio - which was just speculation - you are still trying to frame the Tempest Engine's ability to do its work in terms of CUs, as though it isn't a bespoke piece of compute hardware, and can be trivially ignored or replaced by something else.

IIRC the Tempest engine is designed to run CPU-type code rather than GPU code, but with GPU efficiency, which was also just one of many capabilities the SPUs had in the PS3. So if an algorithm that can't run well on a GPU needs to be accelerated, then the Tempest engine can likely be that solution. And if one of those algorithms was aligned to Lumen rather than Nanite, then accelerating it could potentially have a multi-fold benefit in other components if latency is the source of the bottleneck.
It contains the same vector ALUs as contained within the GPU, but 1/72nd of them. So even if it's several times more efficient at occupying those ALUs than the GPU, with a given algorithm, it's still only going to be able to do a fraction of the work.
 
When he says get away with lower quality he is IMHO clearly talking about the cheap fast lighting like this image's right picture.

QygJbz8.jpg

Why do you keep making stuff up? Why do you need "IMHO" rather than just listening to the creator of Lumen?
Are you even listening to the video? That's not cheap fast lighting. He is explaining straight up how Lumen works, which is tracing against the Global Distance Field when the setting is Global Tracing, and then tracing against the more accurate Mesh Distance Field for 2 meters if you have Detail Tracing selected.

You can open up the Valley project and switch it to Global Tracing and compare the FPS. There's nothing cheap about that.

It works the same way indoor as it does outdoor. You are literally making stuff up.
The Valley of the Ancient project uses detail tracing and Final Gather 4. (the highest).

Heck the Lumen in the Land of nanite only has 1-2 light sources and/or one directional light on at one time and lumen scales with the number of light sources.

2 spot light at the beginning which gets turned off after she enters the crack.
The one directional light for the rock climbing which gets turned off after she enters the building.
Then one point light while she's in the building which gets turned off for her to open the ceiling.
The one directional light once the ceiling is open which is the same for the end flying scene.
 
PaintTinJr

I meant to say you can crank up the Valley demo to Final Gather 4, which is only for arch viz (it is originally set to 1, and probably was the same in last year's demo), and Lumen to Detail Tracing (originally set to Global Tracing, and probably the same in last year's demo), with very little FPS cost (~5).
There's nothing cheap about Global Tracing. Detail Tracing just adds a tiny bit of cost for very minimal gain, since it's only 2 meters. It's good for arch viz.
So what you are trying to push, that last year's demo has better Lumen, is straight nonsense. Even Daniel says Lumen has improved tremendously quality-wise compared to the PS5 demo.
"Lumen 12 months ago. In that Lumen in the land of nanite Demo. We greatly improved lumen since then."
 

JeloSWE

Member
You should really put a comma after deep dive. Otherwise it reads as an instruction LOL.

On topic, are we now at a consensus that the IO in PS5 is overkill, seeing as Unreal 5 seems to work extremely well with normal SSD's or, are we excited to see where engines go with the extra headroom?
An example where the SSD's overkill speed can be indispensable, shown by R&C, is that you can do super fast, movie-like cuts between two entirely different locations. Another example is hitting the crystals and instantly swapping in an entire level; it's quite the joy to experience. I'd venture there will be many unexpected uses for its overkill speed that we haven't thought of yet. But the fact that I no longer have to stare at a loading screen when fast traveling is huge for me.
 
An example where the SSD's overkill speed can be indispensable, shown by R&C, is that you can do super fast, movie-like cuts between two entirely different locations. Another example is hitting the crystals and instantly swapping in an entire level; it's quite the joy to experience. I'd venture there will be many unexpected uses for its overkill speed that we haven't thought of yet. But the fact that I no longer have to stare at a loading screen when fast traveling is huge for me.
You got any data sizes for what is being streamed in? How fast is the throughput? And are you implying this isn't possible anywhere else?
 
For real, the guy vanished completely, is he banned, or just had a mental breakdown after all his BS theories came out not to be true?
I'm assuming the latter. After constantly referring to how PC will never be able to run this demo, because of the lack of SSD I/O, latency, cache scrubbers, etc., it must have been a huge hit to his pride and ego. And I don't blame him for going missing literally the day after the Unreal Engine 5 Deep Dive. He didn't even stay here to see Land of Nanite running on PC!


Someone should seriously check on him, to make sure he's ok.
 

JeloSWE

Member
You got any data sizes for what is being streamed in? How fast is the throughput? And are you implying this isn't possible anywhere else?
You can find the numbers on the internet. But say, on average, 9GB/s at 60fps equates to 150MB per frame rendered, or enough to fill the entire RAM in less than 2 sec. And if you are loading in a level, you can start by streaming in only the assets that are in your field of view, so you don't need to load all assets and fill RAM entirely before you start rendering the scene.

I don't think you can find any reasonably priced consumer hardware yet that can match the speed of this HW integration, fetch priority levels and low level API. It's coming, but it won't be here for a while.

Also, you can think of the SSD speed in two ways: you have latency, i.e. the fastest possible fetch you can do, and then you have sustained throughput, like many cars next to each other on a wide highway. Having low latency is maybe more useful when running a game than overall throughput, and the PS5 is ridiculously fast, so we'll eventually see novel uses for it, I think.
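The per-frame and fill-the-RAM figures above follow from simple division. Here is a sketch, assuming 9 GB/s average effective throughput as stated in the post, the PS5's 16 GB of RAM, and decimal units (1 GB = 1000 MB) to match the post's rounding; the function names are made up for illustration.

```python
# Streaming budget arithmetic for the figures in the post above.
# Assumptions: 9 GB/s average effective throughput, 16 GB of RAM,
# decimal units (1 GB = 1000 MB) to match the post's rounding.

THROUGHPUT_GB_PER_S = 9.0
RAM_GB = 16.0


def mb_per_frame(fps: int) -> float:
    """Data that can be streamed in during one frame, in MB."""
    return THROUGHPUT_GB_PER_S * 1000 / fps


def seconds_to_fill_ram() -> float:
    """Time to stream an amount of data equal to total RAM."""
    return RAM_GB / THROUGHPUT_GB_PER_S


print(f"Per frame at 60fps: {mb_per_frame(60):.0f} MB")
print(f"Per frame at 30fps: {mb_per_frame(30):.0f} MB")
print(f"Time to fill RAM:   {seconds_to_fill_ram():.2f} s")
```

Note this only covers sustained throughput; it says nothing about the latency side of the argument.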
 
You can find the numbers on the internet. But say, on average, 9GB/s at 60fps equates to 150MB per frame rendered, or enough to fill the entire RAM in less than 2 sec. And if you are loading in a level, you can start by streaming in only the assets that are in your field of view, so you don't need to load all assets and fill RAM entirely before you start rendering the scene.

I don't think you can find any reasonably priced consumer hardware yet that can match the speed of this HW integration, fetch priority levels and low level API. It's coming, but it won't be here for a while.

Also, you can think of the SSD speed in two ways: you have latency, i.e. the fastest possible fetch you can do, and then you have sustained throughput, like many cars next to each other on a wide highway. Having low latency is maybe more useful when running a game than overall throughput, and the PS5 is ridiculously fast, so we'll eventually see novel uses for it, I think.
No, I'm specifically asking for those details on R&C when switching worlds, as many imply it can't be done elsewhere, which would have to mean its SSD is faster than outdated DDR3 RAM, DDR4, etc. Which would mean it's much faster than what Cerny said. Otherwise, it's a very disingenuous thing to imply, as we have all witnessed in this thread: all the people who ate crow for saying this wouldn't work on PC, or would require an unreleased super SSD to run.

I even claimed this demo would run much better on pci 3.0 SSD and look how some tried to clown me, but ended up being clowns themselves for not listening to any of us who at least have a bit of common sense and basic understanding of these things.




What if it runs even better on a pci 3.0 SSD though? This won't be a hypothetical question anymore, as we'll be able to test the ins and outs soon. And myself and others will be reporting our findings.

PS5 specifically. (From the article)

Rewrite specifically for ps5 only? Or just for next gen i/o altogether? You know, moving from the slow ass HDD's from over 10 years ago. SSD's have been in existence for well over 10 plus years now. Just keep my posts bookmarked. You'll be eating crow when the demo releases.


You still don't understand, apparently. Until you know what the actual speeds were, the marketing deal (same as last go around), or why Sweeney won't respond directly to questions pertaining to PS5 vs PC performance in the demo, you can kinda figure out where it's going. No matter how hard you fight, you'll never stop the inevitable, which is the UE5 demo releasing for PC. When I can run it on a pci 3.0 SSD with better performance than what has been shown, what will your excuse be? And the demo will release before PC gets DirectStorage I/O updates. Isn't that crazy, that PC won't require that update and will still have better performance?!

His reading skills seem to be a bit lacking :). And also capacity for understanding things if he'd actually read the facts everywhere.


systemr123, did you ever gain the capacity for understanding things? Crow must taste amazing.
 

mitchman

Gold Member
You can find the numbers on the internet. But say, on average, 9GB/s at 60fps equates to 150MB per frame rendered, or enough to fill the entire RAM in less than 2 sec. And if you are loading in a level, you can start by streaming in only the assets that are in your field of view, so you don't need to load all assets and fill RAM entirely before you start rendering the scene.

I don't think you can find any reasonably priced consumer hardware yet that can match the speed of this HW integration, fetch priority levels and low level API. It's coming, but it won't be here for a while.

Also, you can think of the SSD speed in two ways: you have latency, i.e. the fastest possible fetch you can do, and then you have sustained throughput, like many cars next to each other on a wide highway. Having low latency is maybe more useful when running a game than overall throughput, and the PS5 is ridiculously fast, so we'll eventually see novel uses for it, I think.
The real gem here is the priority levels. R&C basically divides the level up into priority levels based on LOD level (simplified). LOD 0, closest to the camera, gets highest priority, LOD 1 gets lower priority, etc. The API and SSD will then proceed with a huge patch of assets and load them according to the priority level, and it's fairly transparent to the game. The end result is that the important quality assets that must be high quality are near instant with placeholders for higher LODs in the same patch, then it will proceed to load the higher LODs/priority levels.
So even if loading the whole level takes longer than a second, for the perceived performance, it appears to be almost instant.
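The priority-level idea described above can be sketched with an ordinary priority queue. To be clear, this is not the PS5 I/O API or Insomniac's code: the class, the function, and the asset names and sizes are all hypothetical, and the only thing it shows is that lower-numbered priority levels get serviced first regardless of submission order.

```python
import heapq
from dataclasses import dataclass, field

# Sketch of priority-ordered asset loading as described in the post above.
# Hypothetical names throughout; this is not the PS5 I/O API.


@dataclass(order=True)
class LoadRequest:
    priority: int                          # lower value = load sooner (LOD 0 first)
    name: str = field(compare=False)
    size_mb: float = field(compare=False)


def load_order(requests):
    """Return asset names in the order a priority queue would service them."""
    heap = list(requests)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap).name)
    return order


# Requests are submitted in arbitrary order, as one big batch.
requests = [
    LoadRequest(2, "distant_mountains", 900.0),
    LoadRequest(0, "player_area", 120.0),
    LoadRequest(1, "mid_range_buildings", 400.0),
]

print(load_order(requests))
```

In this toy version the game submits everything at once and the queue decides the order, which is roughly what "fairly transparent to the game" suggests; how the real hardware and API schedule requests is not something the post spells out.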
 

FireFly

Member
Why are you keep comparing Tempest with GPU, Tempest is more comparable to SPU in PS3, which is a CPU.
In his talk, Cerny said that they took an RDNA CU, and removed the cache, relying on DMA instead, to create an SPU-like device. And PaintTinJr speculated that "lumen began life at a lower fidelity/performance for running on the PS5's 3D audio tempest accelerator solution and was extended to include graphics when they realised how effective it could be". And moreover that: "there is a possibility that lumen can be real-time (single frame ~ 1400p30) [...] using a combination of the GPU and Tempest engine to eliminate redundancy by lumen's sw traced lighting being built as a superset of the audio tracing".

So I am not the one saying Tempest is designed for graphics workloads. I'm merely pointing out that if you do want to use it that way, its total compute power is pretty limited, relative to the PS5 GPU.
 

Panajev2001a

GAF's Pleasant Genius
In his talk, Cerny said that they took an RDNA CU, and removed the cache, relying on DMA instead, to create an SPU-like device. And PaintTinJr speculated that "lumen began life at a lower fidelity/performance for running on the PS5's 3D audio tempest accelerator solution and was extended to include graphics when they realised how effective it could be". And moreover that: "there is a possibility that lumen can be real-time (single frame ~ 1400p30) [...] using a combination of the GPU and Tempest engine to eliminate redundancy by lumen's sw traced lighting being built as a superset of the audio tracing".

So I am not the one saying Tempest is designed for graphics workloads. I'm merely pointing out that if you do want to use it that way, its total compute power is pretty limited, relative to the PS5 GPU.
Depending on where it is located (we'd need some schematics), it could have lower latency and easier control from CPU code to work as a co-processor, and there may be some advantages to that for some very coarse-grained work.
 

JeloSWE

Member
No, I'm specifically asking for those details on R&C when switching worlds, as many imply it can't be done elsewhere, which would have to mean its SSD is faster than outdated DDR3 RAM, DDR4, etc. Which would mean it's much faster than what Cerny said. Otherwise, it's a very disingenuous thing to imply, as we have all witnessed in this thread: all the people who ate crow for saying this wouldn't work on PC, or would require an unreleased super SSD to run.


I even claimed this demo would run much better on a PCIe 3.0 SSD, and look how some tried to clown me but ended up being clowns themselves for not listening to any of us who at least have a bit of common sense and a basic understanding of these things.
While a desktop PC with a good SSD can read data at a pretty high speed, the biggest difference with the PS5 SSD solution is its ability to stream and decompress the files stored with Kraken directly into the unified GPU/CPU RAM on the fly. Doing this on a PC is basically impossible with any type of consumer HW at the moment and would require every core of a very beefy CPU dedicated to the task, leaving no compute power left for the game.

lMojBrt.jpg
 
While a desktop PC with a good SSD can read data at a pretty high speed, the biggest difference with the PS5 SSD solution is its ability to stream and decompress the files stored with Kraken directly into the unified GPU/CPU RAM on the fly. Doing this on a PC is basically impossible with any type of consumer HW at the moment and would require every core of a very beefy CPU dedicated to the task, leaving no compute power left for the game.

lMojBrt.jpg
Again, that isn't saying anything about data throughput for R&C or saying why it can't run on PC, or run better. Remember how several people were saying how UE5 demo was impossible to run on PC, and how it uses much more data throughput than R&C? Well we all saw how easily that was put to rest, as the guy ran the demo inside of the editor.

R&C isn't available on PC (yet?), so there's no way to compare currently, and the developers haven't stated the data streaming throughput. DirectStorage and RTX I/O will definitely help *in the future*, but there are no games that are impossible to run on PC currently, because this tech isn't available to the public yet.
 

JeloSWE

Member
Again, that isn't saying anything about data throughput for R&C or saying why it can't run on PC, or run better. Remember how several people were saying how UE5 demo was impossible to run on PC, and how it uses much more data throughput than R&C? Well we all saw how easily that was put to rest, as the guy ran the demo inside of the editor.

R&C isn't available on PC (yet?), so there's no way to compare currently, and the developers haven't stated the data streaming throughput. DirectStorage and RTX I/O will definitely help *in the future*, but there are no games that are impossible to run on PC currently, because this tech isn't available to the public yet.
Well, I never said anything about the UE5 demo; a regular SSD can certainly stream in data fast enough when traversing that world for Nanite. But no consumer hardware can read compressed data and decompress it straight into RAM as quickly as the PS5 does right now. I don't see any PC being able to jump between worlds as fast as RC does at the moment unless you are already storing both levels in RAM; a PC just can't read compressed data from disk that fast. In RC, when you hit a crystal, the whole level is swapped in a few frames. It's very impressive and basically couldn't be done on PC unless the data is resident in RAM.
 
Well, I never said anything about the UE5 demo; a regular SSD can certainly stream in data fast enough when traversing that world for Nanite. But no consumer hardware can read compressed data and decompress it straight into RAM as quickly as the PS5 does right now. I don't see any PC being able to jump between worlds as fast as RC does at the moment unless you are already storing both levels in RAM; a PC just can't read compressed data from disk that fast. In RC, when you hit a crystal, the whole level is swapped in a few frames. It's very impressive and basically couldn't be done on PC unless the data is resident in RAM.
Which is my entire point. Just like with UE5, being able to run from RAM and any SSD, anything can be done. And a PC wouldn't need the data to be super compressed. That's why it's funny when people say R&C couldn't be done on PC. It definitely can, while pushing higher fidelity, higher framerates, and better raytracing. But many people want you to believe it's impossible, or that DirectStorage isn't here yet and therefore it's impossible, which is all b.s.
 

JeloSWE

Member
Which is my entire point. Just like with UE5, being able to run from RAM and any SSD, anything can be done. And a PC wouldn't need the data to be super compressed. That's why it's funny when people say R&C couldn't be done on PC. It definitely can, while pushing higher fidelity, higher framerates, and better raytracing. But many people want you to believe it's impossible, or that DirectStorage isn't here yet and therefore it's impossible, which is all b.s.
Well, consider this scenario, which actually does happen in RC: there are a couple of instances where he basically goes between several worlds in quick succession. If RC's levels take up near 16GB of RAM, then you'd need at least double that amount to swap between 2 worlds, but you'd also need more so you can start caching the 3rd level coming up while the other two are switching. So, for simplicity's sake, you'd need 48GB to handle this, and the data can't be compressed on disk either, so the game will take up a lot more space as well. Thus, if you have cash, you can certainly build a PC that could run a game like RC without problem. But you would have to have A LOT of RAM to compensate for the SSD speed on the PS5. Currently nothing beats the PS5 in this regard if you consider cost.
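The arithmetic behind that 48GB figure can be sketched as below. The 16GB per level is the post's own hypothetical number, not a measured footprint for R&C, and `ram_needed_gb` is just an illustrative helper name.

```python
def ram_needed_gb(level_size_gb, levels_resident):
    """RAM needed to keep several uncompressed levels resident at once."""
    return level_size_gb * levels_resident


LEVEL_SIZE_GB = 16  # hypothetical per-level footprint taken from the post

two_worlds = ram_needed_gb(LEVEL_SIZE_GB, 2)    # swapping between 2 worlds
three_worlds = ram_needed_gb(LEVEL_SIZE_GB, 3)  # plus the 3rd level being cached
print(two_worlds, three_worlds)
```

With a smaller per-level footprint, or with only the visible assets loaded, the total shrinks proportionally.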
 
Well, consider this scenario, which actually does happen in RC: there are a couple of instances where he basically goes between several worlds in quick succession. If RC's levels take up near 16GB of RAM, then you'd need at least double that amount to swap between 2 worlds, but you'd also need more so you can start caching the 3rd level coming up while the other two are switching. So, for simplicity's sake, you'd need 48GB to handle this, and the data can't be compressed on disk either, so the game will take up a lot more space as well. Thus, if you have cash, you can certainly build a PC that could run a game like RC without problem. But you would have to have A LOT of RAM to compensate for the SSD speed on the PS5. Currently nothing beats the PS5 in this regard if you consider cost.
Where are you getting these absurd numbers from? Who said they are filling up 16gb of RAM in the first place, or that each instance uses 16gb of RAM? Why would the character data, the 2-3gb of RAM dedicated to the OS, physics, miscellaneous data, need to be swapped constantly? Do you have any info on this data, or is it a highly unlikely hypothetical scenario?
 

JeloSWE

Member
Where are you getting these absurd numbers from? Who said they are filling up 16gb of RAM in the first place, or that each instance uses 16gb of RAM? Why would the character data, the 2-3gb of RAM dedicated to the OS, physics, miscellaneous data, need to be swapped constantly? Do you have any info on this data, or is it a highly unlikely hypothetical scenario?
I'm just giving an example; they are by no means actual numbers, and I don't know the details of the OS. And as mitchman mentioned, you wouldn't need to load all assets completely either, just what is visible on screen. Still, while a PC in theory can do everything the PS5 can if you keep the game resident in RAM, it's in no way cost effective.
 

Rea

Member
In his talk, Cerny said that they took an RDNA CU, and removed the cache, relying on DMA instead, to create an SPU-like device. And PaintTinJr speculated that "lumen began life at a lower fidelity/performance for running on the PS5's 3D audio tempest accelerator solution and was extended to include graphics when they realised how effective it could be". And moreover that: "there is a possibility that lumen can be real-time (single frame ~ 1400p30) [...] using a combination of the GPU and Tempest engine to eliminate redundancy by lumen's sw traced lighting being built as a superset of the audio tracing".

So I am not the one saying Tempest is designed for graphics workloads. I'm merely pointing out that if you do want to use it that way, its total compute power is pretty limited, relative to the PS5 GPU.
Do you know that the Tempest engine supports 2 wavefronts? One is for 3D audio and the other is for games. I don't know how that one wavefront can be used for games. I'm not a dev; only devs will know.
 

PaintTinJr

Member
In his talk, Cerny said that they took an RDNA CU, and removed the cache, relying on DMA instead, to create an SPU-like device. And PaintTinJr speculated that "lumen began life at a lower fidelity/performance for running on the PS5's 3D audio tempest accelerator solution and was extended to include graphics when they realised how effective it could be". And moreover that: "there is a possibility that lumen can be real-time (single frame ~ 1400p30) [...] using a combination of the GPU and Tempest engine to eliminate redundancy by lumen's sw traced lighting being built as a superset of the audio tracing".

So I am not the one saying Tempest is designed for graphics workloads. I'm merely pointing out that if you do want to use it that way, its total compute power is pretty limited, relative to the PS5 GPU.
You quoted me speculating about the Tempest's potential audio tracing being a subset of the GPU's HW/SW RT superset, as the slide from Road to PS5 indicates, so saying I said it was used for graphics is either you misunderstanding me, or you are incorrectly paraphrasing my words.
 
An example where the SSD's overkill speed can be indispensable, shown by R&C, is that you can do super fast movie-like cuts between two entirely different locations. Another example is hitting the crystals and instantly swapping in an entire level; it's quite the joy to experience. I'd venture there will be many unexpected uses for its overkill speed we haven't thought of yet. But the fact that I no longer have to stare at a loading screen when fast traveling is huge for me.
You can find the numbers on the internet. But say, on average, 9GB/s at 60fps equates to 150MB per frame rendered, or it will fill the entire RAM in less than 2 seconds. And if you are loading in a level, you can start by streaming in only the assets that are in your field of view, so you don't need to load all assets and fill RAM entirely before you start rendering the scene.

I don't think you can find any reasonably priced consumer hardware yet that can match the speed of this HW integration, fetch priority levels and low level API. It's coming but it's not here for a while.

Also, you can think of the SSD speed in two ways: you have latency, which is the fastest possible fetch you can do, and then you have sustained throughput, like with many cars next to each other on a wide highway. Having low latency is maybe more useful when running a game than overall throughput, and the PS5 is ridiculously fast, so we'll eventually see novel usages for it, I think.
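For anyone who wants to check the per-frame figure quoted above, here is a small Python sketch of the arithmetic. The 9GB/s average and 60fps come from the post, the 16GB is the PS5's total RAM, and decimal units (1GB = 1000MB) are assumed. The function names are only illustrative.

```python
def mb_per_frame(throughput_gb_s, fps):
    """Data that can arrive during a single rendered frame, in MB."""
    return throughput_gb_s * 1000 / fps


def seconds_to_fill(ram_gb, throughput_gb_s):
    """Time needed to stream enough data to fill the given amount of RAM."""
    return ram_gb / throughput_gb_s


per_frame = mb_per_frame(9, 60)  # average decompressed throughput at 60fps
fill = seconds_to_fill(16, 9)    # whole 16GB of RAM at the same rate
print(per_frame, round(fill, 2))
```

This only covers sustained throughput; it says nothing about the latency of an individual fetch, which is the other half of the point made above.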
While a desktop PC with a good SSD can read data at a pretty high speed, the biggest difference with the PS5 SSD solution is its ability to stream and decompress the files stored with Kraken directly into the unified GPU/CPU RAM on the fly. Doing this on a PC is basically impossible with any type of consumer HW at the moment and would require every core of a very beefy CPU dedicated to the task, leaving no compute power left for the game.
Well, I never said anything about the UE5 demo; a regular SSD can certainly stream in data fast enough when traversing that world for Nanite. But no consumer hardware can read compressed data and decompress it straight into RAM as quickly as the PS5 does right now. I don't see any PC being able to jump between worlds as fast as RC does at the moment unless you are already storing both levels in RAM; a PC just can't read compressed data from disk that fast. In RC, when you hit a crystal, the whole level is swapped in a few frames. It's very impressive and basically couldn't be done on PC unless the data is resident in RAM.
Well, consider this scenario, which actually does happen in RC: there are a couple of instances where he basically goes between several worlds in quick succession. If RC's levels take up near 16GB of RAM, then you'd need at least double that amount to swap between 2 worlds, but you'd also need more so you can start caching the 3rd level coming up while the other two are switching. So, for simplicity's sake, you'd need 48GB to handle this, and the data can't be compressed on disk either, so the game will take up a lot more space as well. Thus, if you have cash, you can certainly build a PC that could run a game like RC without problem. But you would have to have A LOT of RAM to compensate for the SSD speed on the PS5. Currently nothing beats the PS5 in this regard if you consider cost.
Every single thing you said is completely wrong.
Everything. The R&C portal can be replicated in under 2 seconds using regular old UE4 Level Streaming (Not even the new World Partition in UE5) and a SATA SSD.
This can be done continuously, there's no limit.

No superfast SSD. No DirectStorage. No RTX IO.
The 48GB crap is the same nonsense as people saying that you need 128GB to run the UE5 demo, and even then you won't be able to run the flying scene.

Portal Transition Effect using Level Streaming - The Gabmeister



TransitionZone.gif


PortalDiagram.jpg
 

FireFly

Member
You quoted me speculating about the Tempest's potential audio tracing being a subset of the GPUs hw/sw rt superset, as the slide from Road to Ps5 indicates, so saying I said it was used for graphics is either you misunderstanding me, or you are incorrectly paraphrasing my words.
I opted to directly quote rather than paraphrase your position, because it was not 100% clear to me. However, my understanding of what you said was that Tempest together with the PS5's GPU could accelerate Lumen, such that "1400p30" rendering would be possible, instead of 1080p30. Feel free to correct me if I have misunderstood.
 

PaintTinJr

Member
PaintTinJr

I meant to say you can crank up the Valley demo to Final Gather 4, which is only for arch viz (originally set to 1, and probably the same with last year's demo), and Lumen to Detail Tracing (originally set to Global Tracing, and probably the same with last year's demo) with very little FPS cost (~5).
There's nothing cheap about Global Tracing. Detail Tracing just adds a tiny bit of cost for very minimal gain, since it only covers 2 meters. It's good for arch viz.
So what you are trying to push, that last year's demo has better Lumen, is straight nonsense. Even Daniel says Lumen has improved tremendously quality-wise compared to the PS5 demo.
"Lumen 12 months ago. In that Lumen in the land of nanite Demo. We greatly improved lumen since then."
You are not comprehending receipts and you aren't comprehending how the Lumen pipeline works and uses less compute as you get further from near frustum clip plane.

Lumen has a hierarchical system using the depth buffer and surface normal from the Nanite pass, AFAIK. So, assuming HW RT is enabled, you have ray tests in close range with HW RT, which then overflow into SW RT (with mesh distance fields first at close range, or 1km in the PS5 demo, then global distance fields, which he states are lower quality and very fast in the video).

In an indoor GI lit scene, all the rays remain in the more performance demanding distances, so consume more compute, whereas outdoors has less bounced light at close range, and resolves mostly by direct lighting or cheaper global distance field GI.
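The fallback order described above can be sketched as a simple distance-based selection. This is only an illustration of the hierarchy as stated in this post, not Epic's actual Lumen implementation. The function name and both range thresholds are hypothetical placeholders.

```python
def pick_trace_method(distance_m, hw_rt_enabled=True,
                      hw_rt_range_m=2.0, mesh_df_range_m=200.0):
    """Pick the tracing tier for a ray hit at the given distance from the camera.

    Both range thresholds are hypothetical placeholders, not engine defaults.
    """
    if hw_rt_enabled and distance_m <= hw_rt_range_m:
        return "hardware_rt"            # most expensive, shortest range
    if distance_m <= mesh_df_range_m:
        return "mesh_distance_field"    # software tracing, per-mesh fields
    return "global_distance_field"      # cheapest, lowest quality, longest range


for d in (1.0, 50.0, 1000.0):
    print(d, pick_trace_method(d))
```

Under this model an indoor scene keeps most of its rays in the first two tiers, which is where the higher compute cost in the argument above comes from.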

If you still don't accept that, save us all some trouble and please put me on ignore.
 
unknown.png


Yeah, this is definitely in the same league as Ratchet and Clank.... As long as you're blind.
What a weird take....



Obviously it's a proof of concept... Pretty sure there is more than 1 person who worked on R&C, vs this....

One is heavily funded; the other is made just to prove a PC is more than capable of doing this, and that it doesn't require Cerny's architecture for it to be possible. It's what many of us have been saying all along: people need to stop thinking a certain way because of he said, she said, and think with logic and rationale for a second.


Md Ray, do you have anything to add besides laughing emojis? You're reminding me of your younger brother ethomaz now....
 