DF: Unreal Engine 5 Matrix City Sample PC Analysis: The Cost of Next-Gen Rendering

SlimySnake

The Contrarian
Ouch. Seeing how heavy UE5 is on the CPU, and how badly it scales with core count, maybe the massive exodus to UE5 by so many studios is a little premature.
What's crazy is that even after I turn off all traffic, all pedestrians and all parked cars, the game stays at around the same fps. Literally no change, maybe an fps or two. So either all of that shit is still running in the background, or Lumen and Nanite are the CPU-intensive parts, and that does not bode well for any game using those two technologies.
 

NXGamer

Member
He accused other people of being mean and fallacious.
And spouted some stuff about the engine being heavy, while ignoring that the game is heavy on all machines.
Where did I do this again?

Yeah, PCs can vary a bit. But not by a 60% performance difference or more, with very similar hardware.
And using vastly more RAM. I just tested my system with no programs open in the background, except MSI Afterburner. RAM usage was 7.5GB, almost half of what his PC was using.
WTF was he running in the background to have such RAM usage?
We even had a user with a 2700X, underclocking his RAM below the spec of what NXGamer was using, and still getting a much higher frame rate.

Now seeing these results with this demo, it kind of makes sense why his results always seemed off.
Cannot see any shots or videos here, so imagining your results!

PCs differ; it is not just CPU/GPU, it is everything: RAM, mobo, drivers, PSU...

Here is an RTX 2070 Super with a 10700K getting lower performance than me. Again, a 3070 Ti with a 10700K is up and down from my results, and using 10GB of RAM.

As I commented in my video and earlier in this thread, the game on my system is memory/cache bound and the GPU is being under-utilised. The game is very well threaded, but Nanite and the readbacks of the materials are stalling the GPU, and thus performance suffers heavily.
 
Last edited:

01011001

Gold Member
Performance on this seems to be very weird and all over the place.

On my RTX 3060 Ti and Ryzen 5600X combo, the compiled version runs at ~31fps with standard settings (everything on 3, with TSR upsampling, 1440p).

My CPU is at ~70% and my GPU is completely maxed at all times.

Someone has compiled a version with DLSS and TSR, which you can switch between in real time.

And going from TSR to DLSS Quality gives me a 25% performance uplift, to above 40fps (while also looking better); my CPU doesn't get more usage, though.
 
Last edited:

PaintTinJr

Member
What's crazy is that even after I turn off all traffic, all pedestrians and all parked cars, the game stays at around the same fps. Literally no change, maybe an fps or two. So either all of that shit is still running in the background, or Lumen and Nanite are the CPU-intensive parts, and that does not bode well for any game using those two technologies.
I get the same (but obviously at lower fps) with the pre-compiled demo by default, DLSS enabled. But after pressing 1 on the keyboard, disabling DLSS and turning on TSR, the crowd does affect my fps a bit more, by another 2-5fps.
 

SlimySnake

The Contrarian
Where did I do this again?


Cannot see any shots or videos here, so imagining your results!

PCs differ; it is not just CPU/GPU, it is everything: RAM, mobo, drivers, PSU...

Here is an RTX 2070 Super with a 10700K getting lower performance than me. Again, a 3070 Ti with a 10700K is up and down from my results, and using 10GB of RAM.

As I commented in my video and earlier in this thread, the game on my system is memory/cache bound and the GPU is being under-utilised. The game is very well threaded, but Nanite and the readbacks of the materials are stalling the GPU, and thus performance suffers heavily.
Please download and run some benchmarks using this demo. There is something seriously wrong with your build. You can compare the GPU usage in your benchmarks vs those videos you posted, and you will see even your 3600/6800 PC is performing extremely poorly. Your GPU utilization at 1080p is around 50%, and the same when you push it to native 4K. In my tests, and in every other benchmark I've seen, GPU utilization at higher resolutions is almost always in the 90s and very rarely drops to 70%.


I am not saying your conclusions are wrong, but your benchmarks simply are. I appreciate you taking the time to engage with us, but I suspect a better use of your time would be to simply redo your benchmarks using the demo everyone else has been benchmarking with. I can understand memory speed, SSD speed and other non-CPU/GPU parts messing with benchmarks here and there, but your results are off by 50-100% at times. That's not a single-digit difference.

P.S. The 2070 benchmark you posted is running at 1440p while offering better performance than your 1080p benchmarks, which should be a dead giveaway that there is something wrong with your benchmark or your system.
 
Nice work exposing the fraud! :messenger_ok:
 

SlimySnake

The Contrarian
I get the same (but obviously at lower fps) with the pre-compiled demo by default, DLSS enabled. But after pressing 1 on the keyboard, disabling DLSS and turning on TSR, the crowd does affect my fps a bit more, by another 2-5fps.
Wait, I thought 1 was native. I've been running it on 1, assuming it would run at native 4K to match my screen resolution. Does it only enable TSR when the resolution is 1440p or 1080p?

My config file has the resolution quality set to 100%, so I assumed it would not be using TSR at native 4K.

[ScalabilityGroups]
sg.ResolutionQuality=100
sg.ViewDistanceQuality=3
sg.AntiAliasingQuality=3
sg.ShadowQuality=3
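For what it's worth, `sg.ResolutionQuality` only controls the screen percentage; the anti-aliasing method is a separate setting, so TSR can still be active as the anti-aliaser even at 100% (native) resolution. A sketch of the relevant console variables (these are standard UE5 cvar names; whether this demo build respects Engine.ini overrides is an assumption):

```ini
; Engine.ini -- standard UE5 cvars (this demo build may override them)
[SystemSettings]
r.ScreenPercentage=100   ; 100 = native internal resolution, no upsampling
r.AntiAliasingMethod=4   ; 0=None, 1=FXAA, 2=TAA, 3=MSAA, 4=TSR
```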

I think the Zen 2 CPUs, with their somewhat low clocks, are holding back the console GPUs and even PC GPUs. I've hated on my i7-11700K processor a lot these past few months, but I'm very impressed by the fact that I can get 40-45 fps at native 4K. 4x the PS5 resolution and 50% more fps to boot. Absolutely insane performance uplift for a GPU that is only 2x more powerful than the PS5/XSX GPUs. However, the results are roughly 2x better than my 2080, so it's not like the 3080 is performing miracles here; it is scaling like it should. I was getting 23-25 fps at 1440p, so 2x better than the 2080, just like Nvidia advertised.

So clearly, the console GPUs are being held back by the CPUs, and that's why I don't think the PS5 I/O is doing much here to help it perform better. It should be doing way better than 1080p 25 fps. Unless both NX Gamer and DF have gotten their pixel counting wrong. VG Tech said that the demo was 1404p on both the XSX and PS5, which could explain the drops to 25 fps. At 1080p, I just never saw my 2080 drop below 30 fps, aside from the initial runs when it had shader compile stutters.
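The resolution/fps arithmetic above can be sanity-checked with a quick pixel-throughput calculation (a rough sketch using only the frame rates quoted in this post; it ignores upscaling and CPU cost):

```python
def pixel_throughput(width, height, fps):
    """Pixels rendered per second at a given resolution and frame rate."""
    return width * height * fps

# Figures quoted above: PS5 at 1080p/25fps vs. a 3080 at native 4K/40fps.
ps5 = pixel_throughput(1920, 1080, 25)
rtx3080 = pixel_throughput(3840, 2160, 40)

print(f"{rtx3080 / ps5:.1f}x the pixel throughput")  # 6.4x
# A ~2x faster GPU delivering ~6.4x the pixels/sec suggests the console
# run is limited by something other than the GPU.
```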

P.S. My Gen 4 SSD, CPU, mobo and GPU combo might be giving me that long-awaited PCIe Gen 4 performance boost, so there is that.
 
Last edited:

yamaci17

Member
Well, here's the thing: not every architecture scales perfectly with frequency. For example, I did some weird tests a couple of months earlier to measure CPU scaling. I nerfed my CPU to console levels: 6 cores, 6 threads at 1GHz gave roughly the per-core performance of the CPUs found in the PS4 Pro and Xbox One X.

I was shocked to see it could push a consistent, solid 30fps in the majority of games. Games tested: RDR 2, Horizon Zero Dawn, AC Odyssey and a couple of others.

By mere logic, going from 1GHz to 4GHz should've given me a 4x performance increase. Tests were done at 360p/720p to take the GPU out of the equation.

RDR 2 benefitted the most: it was rendering 95fps @ 4GHz (34fps @ 1GHz).
Odyssey was the worst: it was rendering 62fps @ 4GHz (barely 30fps @ 1GHz).
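Put as a quick efficiency calculation (a sketch using only the numbers above):

```python
def scaling_efficiency(fps_low, fps_high, ghz_low, ghz_high):
    """Observed speedup as a fraction of ideal linear frequency scaling."""
    return (fps_high / fps_low) / (ghz_high / ghz_low)

rdr2 = scaling_efficiency(34, 95, 1.0, 4.0)
odyssey = scaling_efficiency(30, 62, 1.0, 4.0)

print(f"RDR 2:   {rdr2:.0%} of linear scaling")    # ~70%
print(f"Odyssey: {odyssey:.0%} of linear scaling") # ~52%
```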

It was incredible to see the lack of frequency scaling in Odyssey. My motivation behind the test was: "jeez, this game runs at a rock-solid 30fps on a 1.6GHz PS4 chip, so how the hell am I CPU bound at 60fps with a CPU that is 3-4 times faster than the consoles?"

As it turns out, you could still get the very same PS4-like CPU performance at lower frequencies; the game simply did not benefit from high frequencies.

So, back to the topic: yes, 3.5-3.7GHz is cool and all, but for Zen/Zen+/Zen 2, in certain situations, frequency scaling is notoriously poor. Beyond that, the underlying issue of having two CCX groups is still present on the consoles, and cross-CCX communication needs low-latency memory because of Infinity Fabric. That's where problems arise: the consoles have high-bandwidth but enormously high-latency memory, so cross-CCX latency is probably through the roof on them.

And then, finally, there's the 8MB cache issue. These are gaming consoles, not cheap knockoff budget laptops; they should've somehow been given at least 16MB of cache. If you ask me, 32MB would've been even better. Maybe optimizations such as cache scrubbers and whatnot alleviate having so little cache, but it's still a factor when you look at the consoles' raw CPU performance.

In short, some people act as if the consoles have a CPU near a 3700X. In reality, it's most likely somewhere near a Ryzen 1700 (not joking). All the tests I've done so far put my 3.7GHz 2700 about 25-30% ahead, where I expected worse. Even in the UE5 demo, where the PS5 drops to the 20s, I only drop to 25-26 fps.


amazing innit?
 
Last edited:

winjer

Gold Member
Where did I do this again?

Here, for example:

Cannot see any shots or videos here, so imagining your results!

Already posted several screenshots from my PC. Here one of my posts.

PC's differ, it is not just a CPU/GPU it is everything, Ram, Mobo, Drivers, PSU....

Here is an RTX2070Super with a 10700K getting lower performance than meAgain 3070 Ti with a 10700K is up and down from my results and using 10GB of Ram.

As I commented in my video and earlier in this thread, the game on my system is Memory/Cache bound and the GPU is being under utilised. The game is very well threaded but the Nanite and Readbacks of the materials is stalling on the GPU and this performance suffers heavily.

Nice work, mate. You are comparing videos of people running the demo at a higher resolution than your test.
Remember, you tested at 1080p and got 19-23 fps walking around, and 14-19 fps while driving.
But in these videos, one is running the demo at 1440p and the other at 3440x1440.
Even at higher resolutions than yours, they are not getting results as bad as yours.
 
Last edited:
Please download and run some benchmarks using this demo. There is something seriously wrong with your build. You can compare the GPU usage in your benchmarks vs those videos you posted, and you will see even your 3600/6800 PC is performing extremely poorly. Your GPU utilization at 1080p is around 50%, and the same when you push it to native 4K. In my tests, and in every other benchmark I've seen, GPU utilization at higher resolutions is almost always in the 90s and very rarely drops to 70%.


I am not saying your conclusions are wrong, but your benchmarks simply are. I appreciate you taking the time to engage with us, but I suspect a better use of your time would be to simply redo your benchmarks using the demo everyone else has been benchmarking with. I can understand memory speed, SSD speed and other non-CPU/GPU parts messing with benchmarks here and there, but your results are off by 50-100% at times. That's not a single-digit difference.

P.S. The 2070 benchmark you posted is running at 1440p while offering better performance than your 1080p benchmarks, which should be a dead giveaway that there is something wrong with your benchmark or your system.
I have a 3060 Ti and a 3700X, and I get 70% or lower GPU usage with traffic on max, and 90% with traffic off. Turning traffic off also adds a lot of fps for me.
Also, that fichier link is trolling me. It always stalls at 6.9GB downloaded.
 
Last edited:

PaintTinJr

Member
Wait, I thought 1 was native. I've been running it on 1, assuming it would run at native 4K to match my screen resolution. Does it only enable TSR when the resolution is 1440p or 1080p?

My config file has the resolution quality set to 100%, so I assumed it would not be using TSR at native 4K.

[ScalabilityGroups]
sg.ResolutionQuality=100
sg.ViewDistanceQuality=3
sg.AntiAliasingQuality=3
sg.ShadowQuality=3

I think the Zen 2 CPUs, with their somewhat low clocks, are holding back the console GPUs and even PC GPUs. I've hated on my i7-11700K processor a lot these past few months, but I'm very impressed by the fact that I can get 40-45 fps at native 4K. 4x the PS5 resolution and 50% more fps to boot. Absolutely insane performance uplift for a GPU that is only 2x more powerful than the PS5/XSX GPUs. However, the results are roughly 2x better than my 2080, so it's not like the 3080 is performing miracles here; it is scaling like it should. I was getting 23-25 fps at 1440p, so 2x better than the 2080, just like Nvidia advertised.

So clearly, the console GPUs are being held back by the CPUs, and that's why I don't think the PS5 I/O is doing much here to help it perform better. It should be doing way better than 1080p 25 fps. Unless both NX Gamer and DF have gotten their pixel counting wrong. VG Tech said that the demo was 1404p on both the XSX and PS5, which could explain the drops to 25 fps. At 1080p, I just never saw my 2080 drop below 30 fps, aside from the initial runs when it had shader compile stutters.

P.S. My Gen 4 SSD, CPU, mobo and GPU combo might be giving me that long-awaited PCIe Gen 4 performance boost, so there is that.
I'm not sure; that's why people questioning NXGamer and DF (even if Alex's hypothesis about disabling HW Lumen is wrong) over results that are totally transparent feels like the wrong way round to me.

Anyone claiming better performance should at the very least have compiled it themselves and understand what the settings they are comparing actually are.

edit:
Going by what Carmack said many, many years ago about compiler complexity, the impact of choosing the best flags, and the time it takes to try them, something as simple as the difference between using the free Community edition of Visual Studio and the paid version, or using Intel's paid compiler (which IIRC is less favourable for AMD chips), could be enough to show a decent delta, DLSS or not. And it would only take someone accidentally choosing a Debug (instead of Release) configuration in Visual Studio for a large amount of performance to be lost, too.

Given that the general public only has access to the Community compiler, IMHO that's the performance that's better for discussion.
 
Last edited:

NXGamer

Member
Here, for example:



Already posted several screenshots from my PC. Here one of my posts.



Nice work, mate. You are comparing videos of people running the demo at a higher resolution than your test.
Remember, you tested at 1080p and got 19-23 fps walking around, and 14-19 fps while driving.
But in these videos, one is running the demo at 1440p and the other at 3440x1440.
Even at higher resolutions than yours, they are not getting results as bad as yours.
Top response was aimed at people stating my work history and experience, which was incorrect and made up. You are creating a straw man here; poor show.

Screenshots are static; if I stand still in some areas of the city, I can also get 34fps.

I have explained this to death, but I will try again, as you appear to be struggling: the game is not GPU bound (entirely, anyway), and rendering at higher resolutions, even on a 2070, just increases utilisation, not performance. I could render at 4K here and still be getting similar results.
 
Last edited:
Please download and run some benchmarks using this demo. There is something seriously wrong with your build. You can compare the GPU usage in your benchmarks vs those videos you posted, and you will see even your 3600/6800 PC is performing extremely poorly. Your GPU utilization at 1080p is around 50%, and the same when you push it to native 4K. In my tests, and in every other benchmark I've seen, GPU utilization at higher resolutions is almost always in the 90s and very rarely drops to 70%.


I am not saying your conclusions are wrong, but your benchmarks simply are. I appreciate you taking the time to engage with us, but I suspect a better use of your time would be to simply redo your benchmarks using the demo everyone else has been benchmarking with. I can understand memory speed, SSD speed and other non-CPU/GPU parts messing with benchmarks here and there, but your results are off by 50-100% at times. That's not a single-digit difference.

P.S. The 2070 benchmark you posted is running at 1440p while offering better performance than your 1080p benchmarks, which should be a dead giveaway that there is something wrong with your benchmark or your system.
This demo clearly has tampered settings.
 

winjer

Gold Member
Top response was aimed at people stating my work history and experience, which was incorrect and made up. You are creating a straw man here; poor show.

People were just speculating about your technical background.
All you had to do was correct them and be done with it. Insults were not necessary.

Screenshots are static; if I stand still in some areas of the city, I can also get 34fps.

I have explained this to death, but I will try again, as you appear to be struggling: the game is not GPU bound (entirely, anyway), and rendering at higher resolutions, even on a 2070, just increases utilisation, not performance. I could render at 4K here and still be getting similar results.

Here are a few more of me driving around and crashing.
Also remember that there are videos and screenshots from other people in this thread. And other outlets, including Digital Foundry.

Now, might I ask: are you looking for more evidence of why there is a difference in results, or are you just trying to undermine other people's arguments?

 
Well, here's the thing: not every architecture scales perfectly with frequency. For example, I did some weird tests a couple of months earlier to measure CPU scaling. I nerfed my CPU to console levels: 6 cores, 6 threads at 1GHz gave roughly the per-core performance of the CPUs found in the PS4 Pro and Xbox One X.

I was shocked to see it could push a consistent, solid 30fps in the majority of games. Games tested: RDR 2, Horizon Zero Dawn, AC Odyssey and a couple of others.

By mere logic, going from 1GHz to 4GHz should've given me a 4x performance increase. Tests were done at 360p/720p to take the GPU out of the equation.

RDR 2 benefitted the most: it was rendering 95fps @ 4GHz (34fps @ 1GHz).
Odyssey was the worst: it was rendering 62fps @ 4GHz (barely 30fps @ 1GHz).

It was incredible to see the lack of frequency scaling in Odyssey. My motivation behind the test was: "jeez, this game runs at a rock-solid 30fps on a 1.6GHz PS4 chip, so how the hell am I CPU bound at 60fps with a CPU that is 3-4 times faster than the consoles?"

As it turns out, you could still get the very same PS4-like CPU performance at lower frequencies; the game simply did not benefit from high frequencies.

So, back to the topic: yes, 3.5-3.7GHz is cool and all, but for Zen/Zen+/Zen 2, in certain situations, frequency scaling is notoriously poor. Beyond that, the underlying issue of having two CCX groups is still present on the consoles, and cross-CCX communication needs low-latency memory because of Infinity Fabric. That's where problems arise: the consoles have high-bandwidth but enormously high-latency memory, so cross-CCX latency is probably through the roof on them.

And then, finally, there's the 8MB cache issue. These are gaming consoles, not cheap knockoff budget laptops; they should've somehow been given at least 16MB of cache. If you ask me, 32MB would've been even better. Maybe optimizations such as cache scrubbers and whatnot alleviate having so little cache, but it's still a factor when you look at the consoles' raw CPU performance.

In short, some people act as if the consoles have a CPU near a 3700X. In reality, it's most likely somewhere near a Ryzen 1700 (not joking). All the tests I've done so far put my 3.7GHz 2700 about 25-30% ahead, where I expected worse. Even in the UE5 demo, where the PS5 drops to the 20s, I only drop to 25-26 fps.


amazing innit?
You are exactly right. The consoles have 3.5GHz mobile Zen 2 with reduced cache (8MB L3), similar to a 1700X in raw performance. We already knew this years ago, after the first PS5 benchmarks.

Which makes the performance of those consoles, compared to ~5GHz CPUs with 32MB of L3 cache (costing more than the whole console), even more impressive.

People using those high-end CPUs to run benchmarks against the consoles while claiming they are only comparing GPUs are either ignorant or dishonest.
 
Last edited:

DaGwaphics

Member
Whats crazy is that even after I turn off all traffic, all pedestrians and all parked cars, the game stays around the same fps. literally no change. maybe an fps or two. so either all of that shit is still running in the background or Lumens and nanite are what are cpu intensive, and that does not bode well for any game using those two technologies.

That's the result almost everyone is getting on YT. It looks like the real bottleneck here is the GI, as that is the only setting that drastically improves FPS and lowers CPU load.
 
You are exactly right. The consoles have 3.5GHz mobile Zen 2 with reduced cache (8MB L3), similar to a 1700X in raw performance. We already knew this years ago, after the first PS5 benchmarks.

Which makes the performance of those consoles, compared to ~5GHz CPUs with 32MB of L3 cache (costing more than the whole console), even more impressive.

People using those high-end CPUs to run benchmarks against the consoles while claiming they are only comparing GPUs are either ignorant or dishonest.
The consoles got a fairly optimized demo. The PC did not.
 

yamaci17

Member
You are exactly right. The consoles have 3.5GHz mobile Zen 2 with reduced cache (8MB L3), similar to a 1700X in raw performance. We already knew this years ago, after the first PS5 benchmarks.

Which makes the performance of those consoles, compared to ~5GHz CPUs with 32MB of L3 cache (costing more than the whole console), even more impressive.

People using those high-end CPUs to run benchmarks against the consoles while claiming they are only comparing GPUs are either ignorant or dishonest.
Yes, but again, my CPU is Zen+ (worse IPC than Zen 2), clocked at 3.7GHz for test purposes, with only 16MB of cache (still more than the consoles). I get practically similar, or actually a tad better, performance than the consoles with a nearly comparable CPU.
 
As I commented in my video and earlier in this thread, the game on my system is memory/cache bound and the GPU is being under-utilised. The game is very well threaded, but Nanite and the readbacks of the materials are stalling the GPU, and thus performance suffers heavily.

This is ridiculously incorrect. The I/O, super-fast-SSD nonsense again.
In your video, it's using 11.3GB of system memory. For me, for example, at no point did it even touch 5GB.
The demo has huge memory leaks, so I'm not surprised you were getting 11-12GB if you were changing settings and quitting and relaunching. On top of that, you compiled with the dev tools.

Here is a performance video of various resolutions and settings (we don't know how many times they ran the demo before the video started). It starts at 9GB of system RAM and ends up at 32GB of system RAM usage as they switched settings and resolutions, demonstrating the memory leaks.

Here is my memory usage. It never touched 5GB of system memory. Your analysis is incorrect in every way possible.
You need to compile in Shipping mode (it only takes ~30 mins), then do a fresh shutdown and startup, launch only the demo, and then look at the memory usage.


 
Last edited:

Hoddi

Member
Well, here's the thing: not every architecture scales perfectly with frequency. For example, I did some weird tests a couple of months earlier to measure CPU scaling. I nerfed my CPU to console levels: 6 cores, 6 threads at 1GHz gave roughly the per-core performance of the CPUs found in the PS4 Pro and Xbox One X.

I was shocked to see it could push a consistent, solid 30fps in the majority of games. Games tested: RDR 2, Horizon Zero Dawn, AC Odyssey and a couple of others.

By mere logic, going from 1GHz to 4GHz should've given me a 4x performance increase. Tests were done at 360p/720p to take the GPU out of the equation.

RDR 2 benefitted the most: it was rendering 95fps @ 4GHz (34fps @ 1GHz).
Odyssey was the worst: it was rendering 62fps @ 4GHz (barely 30fps @ 1GHz).

It was incredible to see the lack of frequency scaling in Odyssey. My motivation behind the test was: "jeez, this game runs at a rock-solid 30fps on a 1.6GHz PS4 chip, so how the hell am I CPU bound at 60fps with a CPU that is 3-4 times faster than the consoles?"
It's not as strange as you think once you factor memory bandwidth into the equation. AC Odyssey is a DX11 game that pushes 80k+ draw calls in a frame, which is enormously bandwidth intensive. People have noted that the game doesn't really scale with faster processors, and that's partially because bandwidth matters as well as the CPU and GPU.

The console CPUs might have fairly small caches, but that could be alleviated somewhat by high(ish) system bandwidth.


 

winjer

Gold Member
It's not as strange as you think once you factor memory bandwidth into the equation. AC Odyssey is a DX11 game that pushes 80k+ draw calls in a frame which is enormously bandwidth intensive. People have noted that the game doesn't really scale with faster processors and it's partially because bandwidth matters as well as the CPU+GPU.

The console CPUs might have fairly small caches but it could be alleviated somewhat by high(ish) system bandwidth.

In the case of CPU caches, they are there more to reduce access latency than to improve memory bandwidth, although they can also do the latter.
CPUs run instructions that are very dependent on each other, with lots of branching, and that makes predicting the data needed a lot more difficult. When that data is not available, the CPU pipeline has to wait until it is fetched.
In most cases, CPUs do many fetches of small amounts of data. This means the time to load data into registers for execution is one of the major bottlenecks in the CPU's pipeline.
This is why, on today's CPUs, most of the transistors are spent on branch prediction, caches, and logic to keep those caches loaded with the right data for the next instructions.
So having less cache, even if it's L3, means more cache misses, and that means fetching data from memory.
Now compare the latency of memory and of an L3 cache. For Zen 2, L3 latency is around 10 nanoseconds, but the latency to access memory on Zen 2 with DDR4 is 70-80ns. It can go into the 60s with optimized timings.
On the consoles, however, the memory controller is optimized for bandwidth, not latency; it has a latency of around 140ns. So when there is a cache miss, it takes much longer to get that data from memory.
And because there is less L3 cache, only 4+4MB vs 16+16MB on PC, the probability of a cache miss is higher.

Now, this is not the end for the console CPUs. There are other factors at play in the performance of a CPU in games.
So don't take this example of cache as the only contributing factor to the performance of a system.
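As a rough illustration of what those latency numbers mean in practice, here is a toy average-memory-access-time (AMAT) calculation. The L3 hit rates are hypothetical, chosen only to show the direction of the effect; the latencies are the ones quoted above:

```python
def amat_ns(l3_hit_rate, l3_latency_ns, dram_latency_ns):
    """Average access time for requests that reach L3, in nanoseconds."""
    return l3_hit_rate * l3_latency_ns + (1 - l3_hit_rate) * dram_latency_ns

# Desktop Zen 2: big L3 (assume a 95% hit rate), ~75ns DRAM.
pc = amat_ns(0.95, 10, 75)
# Console: a quarter of the L3 (assume a 90% hit rate), ~140ns memory.
console = amat_ns(0.90, 10, 140)

print(f"PC:      {pc:.2f} ns")       # 13.25
print(f"Console: {console:.2f} ns")  # 23.00
```

Even with hit rates only five points apart, the console's miss penalty roughly doubles the average cost of a memory access in this toy model.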
 
Last edited:

PaintTinJr

Member
That's the result almost everyone is getting on YT. It looks like the real bottleneck here is the GI, as that is the only setting that drastically improves FPS and lowers CPU load.
Going by the original Nanite/Lumen numbers, my guess at the cause would be latency between Nanite and Lumen on PC (PCIe bandwidth), because Lumen is still two passes IIRC and needs to wait for Nanite to complete. Disabling Lumen GI on my system doubles the frame rate and maxes out GPU utilisation, because it goes back to the HW T&L accelerated shader model (e.g. DX10/11, AFAIK) and in all likelihood uses proxy models.
 

PaintTinJr

Member
This is ridiculously incorrect. The I/O, super-fast-SSD nonsense again.
In your video, it's using 11.3GB of system memory. For me, for example, at no point did it even touch 5GB.
The demo has huge memory leaks, so I'm not surprised you were getting 11-12GB if you were changing settings and quitting and relaunching. On top of that, you compiled with the dev tools.

Here is a performance video of various resolutions and settings (we don't know how many times they ran the demo before the video started). It starts at 9GB of system RAM and ends up at 32GB of system RAM usage as they switched settings and resolutions, demonstrating the memory leaks.

Here is my memory usage. It never touched 5GB of system memory. Your analysis is incorrect in every way possible.
You need to compile in Shipping mode (it only takes ~30 mins), then do a fresh shutdown and startup, launch only the demo, and then look at the memory usage.


I know English probably isn't your first language, but why do you so aggressively conflate what people say with this alt account?

Even my use of the word "apparently" - which you quoted - should have told you my 5hr comment was referencing someone else's claim. The irony is that the 5hr figure for compiling the shaders came from a DF video, or from a comment on Beyond3D about Alex compiling the sample with HW Lumen disabled, IIRC.
 
Last edited:

Hoddi

Member
In the case of CPU caches, they are there more to reduce access latency than to improve memory bandwidth, although they can also do the latter.
CPUs run instructions that are very dependent on each other, with lots of branching, and that makes predicting the data needed a lot more difficult. When that data is not available, the CPU pipeline has to wait until it is fetched.
In most cases, CPUs do many fetches of small amounts of data. This means the time to load data into registers for execution is one of the major bottlenecks in the CPU's pipeline.
This is why, on today's CPUs, most of the transistors are spent on branch prediction, caches, and logic to keep those caches loaded with the right data for the next instructions.
So having less cache, even if it's L3, means more cache misses, and that means fetching data from memory.
Now compare the latency of memory and of an L3 cache. For Zen 2, L3 latency is around 10 nanoseconds, but the latency to access memory on Zen 2 with DDR4 is 70-80ns. It can go into the 60s with optimized timings.
On the consoles, however, the memory controller is optimized for bandwidth, not latency; it has a latency of around 140ns. So when there is a cache miss, it takes much longer to get that data from memory.
And because there is less L3 cache, only 4+4MB vs 16+16MB on PC, the probability of a cache miss is higher.

Now, this is not the end for the console CPUs. There are other factors at play in the performance of a CPU in games.
So don't take this example of cache as the only contributing factor to the performance of a system.
I don't disagree at all. I didn't mean that fast memory is a solution, simply that it can help compensate for a smaller L3 cache.

In any case, I've tried reducing the cache frequency on my own CPU (9900K), and it didn't cause a linear drop in this demo. Performance only dropped from ~43fps at 4.3GHz to ~38fps at 2.5GHz, while reducing it all the way to 0.8GHz had a bigger effect, at ~18fps.
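Normalising those numbers shows how non-linear the drop is (a sketch using only the fps figures reported above):

```python
# (cache clock in GHz, average fps) as reported above
runs = [(4.3, 43), (2.5, 38), (0.8, 18)]
base_ghz, base_fps = runs[0]

for ghz, fps in runs[1:]:
    clock_kept = ghz / base_ghz
    fps_kept = fps / base_fps
    print(f"{ghz} GHz: {clock_kept:.0%} of the clock keeps {fps_kept:.0%} of the fps")
# 2.5 GHz: 58% of the clock keeps 88% of the fps
# 0.8 GHz: 19% of the clock keeps 42% of the fps
```

The fps falls far more slowly than the cache clock, which is what you would expect if the workload is only partly bound by cache/memory latency.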
 
Last edited:

SlimySnake

The Contrarian
You are exactly right. Consoles have 3.5ghz mobile Zen 2 with reduced cache (8MB L3 cache) similar to about 1700x in raw performance. We already knew this years ago after the first PS5 benchmarks.

Which makes the performance of those consoles compared to ~5GHz CPUs having 32MB of L3 cache (costing more than the whole console) even more impressive.

People using those high end CPUs to make benchmarks against consoles and claiming they are only comparing GPUs are either ignorant or dishonest.
I have said this several times before. Alex using an i9-11900K to compare PC GPUs to console GPUs is fucking ridiculous, even in games that are not CPU bound. It's why I like NX Gamer using the 2700X in his tests to compare PC GPUs to the PS5, because that CPU kinda sucks and holds back his 2070 in ways the PS5 CPU is likely holding back the PS5 GPU. What I don't care for is when he starts comparing two different PC GPUs, one with a 2700X and one with a 3600, like he did for Deathloop. It's like, what are you doing?

BTW, there was a test done on that supposed PS5 APU that was released recently, with the GPU disabled, and it offered similar performance to the 2700X. I can't find the benchmarks, but I vaguely recall seeing it outperform the 1700X and come much closer to 2700X performance.
I don't disagree at all. I didn't mean that fast memory is a solution but simply that it can help against a smaller L3 cache.

In any case, I've tried reducing the cache frequency on my own CPU and it didn't cause any linear drop in this demo. Performance only dropped from ~43fps at 4.3GHz to ~38fps at 2.5GHz, while reducing it all the way to 0.8GHz had a bigger effect at ~18fps.
This is very interesting. Would you mind running another test at the 3.5GHz clock speeds we see in consoles?

What CPU do you have?
 

Hoddi

Member
I have said this several times before. Alex using an i9-11900K to compare PC GPUs to console GPUs is fucking ridiculous, even in games that are not CPU bound. It's why I like NX Gamer using the 2700X in his tests to compare PC GPUs to the PS5, because that CPU kinda sucks and holds back his 2070 in ways the PS5 CPU is likely holding back the PS5 GPU. What I don't care for is when he starts comparing two different PC GPUs, one with a 2700X and one with a 3600, like he did for Deathloop. It's like, what are you doing?

BTW, there was a test done on that supposed PS5 APU that was released recently, with the GPU disabled, and it offered similar performance to the 2700X. I can't find the benchmarks, but I vaguely recall seeing it outperform the 1700X and come much closer to 2700X performance.

This is very interesting. Would you mind running another test at the 3.5GHz clock speeds we see in consoles?

What CPU do you have?
I forgot to mention that it's a 9900K, so it's not directly comparable to Zen 2. But dropping the core clock to 3.5GHz makes the cache run at 3.2GHz, and performance seems to stabilize in the 35fps range.

In terms of cache, it has 18MB of combined L2+L3 vs 12MB in the consoles. Still a difference, but not quite as big as those 40MB+ monsters like the 12900K and the Ryzen X3D chips.
 
Last edited:

twilo99

Member
Has anyone tested the 5800X3D with this demo? I've been waiting to see if the extra cache makes a difference here, since it seems to be working nicely for a lot of other engines out there.

This whole thing with the 3D cache from AMD bodes well for the consoles... RDNA3 will also have stacked cache.
 

PaintTinJr

Member
Just tested. Flying around low to the ground and I bring my GPU usage below 30%
Why do you think it drops when flying (fast? slow?)? And is there any visual difference when the utilisation lowers - and lowers from what on your GPU?

Without testing myself, I would think lower utilisation at that level might mean Lumen isn't getting updated every frame - maybe because the image is relatively unchanged over multiple frames, so it just uses cached results, or - less likely IMHO - because the traversal speed is too quick for the GI to update before it's already outdated, so it pauses updating and recalculates fully when it next can.
 

StreetsofBeige

Gold Member
What did Ethomaz get a straight perm ban for??
Doesn't pinpoint anything in the Ban tab, but I'd guess just an accumulation of fan-warrior posts. The guy will defend Sony to the bone, muddying up every thread. If there's one person I had to guess might be a video game employee defending his company on GAF, it's him. IMO the guy has a good probability of working at Sony. Or he doesn't, and is just a giant warrior.
 
Last edited:

Hezekiah

Banned
Doesn't pinpoint anything in the Ban tab, but I'd guess just an accumulation of fan warrior posts. The guy will defend Sony to the bone muddying up every thread. If there's one person I had to guess might be a video game employee defending his company on GAF it's him. IMO, the guy has a good probability of working at Sony. Or he doesn't and is just a giant warrior.
Seemed like there was no build-up to it though.

Plus his posts were often detailed and well thought out whether you agreed with them or not.

Also, if we're talking about somebody on here potentially being a video game employee it's gotta be Senjutsusage or Riky....
 
Doesn't pinpoint anything in the Ban tab, but I'd guess just an accumulation of fan warrior posts. The guy will defend Sony to the bone muddying up every thread. If there's one person I had to guess might be a video game employee defending his company on GAF it's him. IMO, the guy has a good probability of working at Sony. Or he doesn't and is just a giant warrior.
Vine Ok GIF
 

NXGamer

Member
People were just speculating about your technical background.
All you had to do was correct them and be done with it. Insults were not necessary.
I did not need to; it was done already by a link, and I have had this conversation here multiple times. The comments were not "speculating", they were stated as fact and you know it.
I simply replied with the same energy (to a lesser degree) as was being pointed towards me, no insults at all, as, again, you change the facts.

Here are a few more of me driving around and crashing.
Also remember that there are videos and screenshots from other people in this thread. And other outlets, including Digital Foundry.

Now might I ask if you are asking for more evidence to find why there is a difference in results, or if you are just trying to undermine other people's arguments?
I have shared ones that match mine, and others here have stated the same. I even cover this in the video: a PC is not a fixed spec, and my results are what they are.

Nothing has changed: Nanite and Lumen are the big costs and they affect GPU utilisation far too much. As I show in the video, my performance increases by 80%+ by turning RT off, and utilisation increases also.

Even on your rig and others it has heavy dips and issues; this conversation on the state of this demo is borderline obsessive at this point.

This is ridiculously incorrect. The I/O, super-fast SSD nonsense again.
In your video, it's using 11.3GB of system memory. For me, for example, at no point did it even touch 5GB of system memory.
The demo has huge memory leaks, so I'm not surprised you were getting 11-12GB if you were changing settings and quitting and relaunching. On top of that, you compiled with the dev tools.

Here is a performance video at various resolutions and settings (we don't know how many times they ran the demo before the video started). It starts at 9GB of system RAM usage and ends up at 32GB as they switched settings and resolutions.
Demonstrating the memory leaks.


Here is my memory usage. It never touched 5GB of system memory. Your analysis is incorrect in every way possible.
You need to compile in shipping mode (it only takes ~30 mins).
Then do a fresh shutdown and startup, and only then launch the demo and look at the memory usage.


Learn some manners; a discussion needs to be civil, and your barbed response here is uncalled for and incorrect.

My demo does not exceed 5GB either, and you are speculating on what I did, so let me clarify for you.

1) I compiled a Release Package and tested that with only changes coming from .ini and command console.
2) I ran the demo for at least 45 minutes before any performance testing
3) I rebooted my machine and booted straight into the demo between tests.

Memory leaks, maybe, but the memory issue I stated is either readback, sharing between system RAM/VRAM over PCIe, or cache related. It is not that it runs out of memory; it is that, for whatever reason, the engine is creating huge bubbles, so the GPU is not maximised, and even the CPU is not once you go above 4 cores.
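Since the thread disagrees on whether the climbing RAM usage is a leak or readback pressure, one hedged way to check would be to sample the demo's resident memory over a long, settings-free run and flag sustained growth. A sketch (Linux /proc only; the interval, sample count and 100MB threshold are arbitrary assumptions):

```python
# Sketch of a leak check: sample a process's resident set size (RSS) over
# time and flag monotonic growth above a threshold. Linux-only (/proc);
# the interval, sample count and 100 MB threshold are arbitrary choices.
import time

def rss_kb(pid):
    """Read VmRSS (in kB) from /proc/<pid>/status."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])
    return 0

def leak_suspected(samples, min_growth_kb=100_000):
    """Suspect a leak if usage never falls and total growth is large."""
    rising = all(b >= a for a, b in zip(samples, samples[1:]))
    return rising and (samples[-1] - samples[0]) >= min_growth_kb

def watch(pid, interval_s=60, count=10):
    """Collect RSS samples for a running process and apply the heuristic."""
    samples = [rss_kb(pid)]
    for _ in range(count - 1):
        time.sleep(interval_s)
        samples.append(rss_kb(pid))
    return leak_suspected(samples)
```

A true leak should trip this even with the camera idle and no settings changes; readback or caching pressure would be expected to plateau instead.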

Just tested. Flying around low to the ground and I bring my GPU usage below 30%
Yep, I get that also. You can even turn off all traffic/pedestrians and see utilisation bump up, then turn each back on at 25% and it remains better than when they were off; then you move and it drops again.
The RT element and Nanite are the main issue on my side; the Nanite materials and readback alone take half of my frame-time at 25fps (~20ms).
 
Last edited:

DaGwaphics

Member
Going by the original Nanite/Lumen numbers, my guess for the cause would be latency between Nanite and Lumen on PC (PCIe bandwidth), because Lumen is still two passes IIRC and needs to wait for Nanite to complete. Disabling Lumen GI on my system doubles the frame-rate and maxes GPU utilisation - because it is back to the hardware T&L accelerated shader model (e.g. DX10/11, AFAIK) and uses proxy models in all likelihood.

Okay. I was under the impression that there were no light sources at all when GI was disabled (or shadows or anything like that); the night portion is basically just blacked out, other than the windows being shaded yellow. I don't have the PC to mess around with this myself (I'm just looking at YT, LOL).
 

NXGamer

Member
Please download and run some benchmarks using this demo. There is something seriously wrong with your build. You can compare the GPU usage in your benchmarks vs those videos you posted, and you will see that even your 3600/6800 PC is performing extremely poorly. The GPU utilization at 1080p is around 50%, same when you push it to native 4K. In my tests, and in every other benchmark I've seen, the GPU utilization at higher resolutions is almost always in the 90s and very rarely drops to 70%.


I am not saying your conclusions are wrong, but your benchmarks simply are. I appreciate you taking the time to engage with us, but I suspect a better use of your time would be to simply redo your benchmarks using this demo that everyone's been using. I can understand memory speed, SSD speeds and other non-CPU/GPU parts messing with benchmarks here and there, but your results are off by 50-100% at times. That's not a single-digit difference.

P.S. The 2070 benchmark you posted is running at 1440p and offering better performance than your 1080p benchmarks, which should be a dead giveaway that something is wrong with your benchmark or your system.
I downloaded it and got the exact same results, same issues.

I even clean-wiped the GPU driver and did it all again; it's still inconsistent and can hover between 16-33fps depending on city section, action on screen, etc.
 

winjer

Gold Member
I have shared ones that match mine, and others here have stated the same. I even cover this in the video: a PC is not a fixed spec, and my results are what they are.

Nothing has changed: Nanite and Lumen are the big costs and they affect GPU utilisation far too much. As I show in the video, my performance increases by 80%+ by turning RT off, and utilisation increases also.

Even on your rig and others it has heavy dips and issues; this conversation on the state of this demo is borderline obsessive at this point.

We all know this demo is heavy. That is not the issue we were discussing.

In the same scene, standing still, at 1080p with settings at 3, your PC does 23 fps. Mine does 39.
While driving it's difficult to get a concrete number, but your PC runs in the teens and mine runs in the twenties.
yamaci17, with his underclocked 2700X, has results closer to mine than to yours.
There is clearly some issue with your test. There are people on this forum who could help you troubleshoot it, but only if you want to.
I know you have tested another package. If you upload your packaged demo, we can test it, just to make sure.
 
The I/O, super-fast SSD nonsense again.

Why does any talk regarding the I/O upset you?

Remember both the PS5 and Series have revamped I/O systems. Any benefits that the PS5 gets can also be applied to the Series and PC as well.

I know the benefits of the PS5 I/O system get stated a lot here, but most of them can also be applied to the Series and PC. If devs benefit from one I/O system, it will most likely trickle down to all the other platforms. It's a win for all platform owners and nothing to get upset about.
 

yamaci17

Member
Why does any talk regarding the I/O upset you?

Remember both the PS5 and Series have revamped I/O systems. Any benefits that the PS5 gets can also be applied to the Series and PC as well.

I know the benefits of the PS5 I/O system get stated a lot here, but most of them can also be applied to the Series and PC. If devs benefit from one I/O system, it will most likely trickle down to all the other platforms. It's a win for all platform owners and nothing to get upset about.

The talk is legit; the examples are wrong.

He made an entire 20-minute video about how the I/O subsystems helped the PS5 secure a 70-80% lead over the 2700X (the 2700X rendering 14-15 fps against the PS5 rendering 24-30 fps in a similar place). It's simply wrong, since the 2700X renders 30-32 fps at those locations. This fact completely invalidates the video's comparison - not the broader point itself, but the way he presented it.
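For reference, the percentage leads implied by those frame rates work out as follows (pure arithmetic on the quoted figures, no new measurements):

```python
# Percentage lead implied by the fps figures quoted above. Just arithmetic
# on the numbers in the post; no new measurements.

def lead_pct(pc_fps, console_fps):
    """How far ahead the console is, as a percentage of the PC result."""
    return (console_fps / pc_fps - 1) * 100

print(f"{lead_pct(14, 24):.0f}%")   # low end: 24 vs 14 fps -> ~71%
print(f"{lead_pct(15, 30):.0f}%")   # high end: 30 vs 15 fps -> 100%
```

So the quoted 14-15 vs 24-30 fps figures actually imply a 71-100% lead; and if the 2700X really does 30-32 fps at those locations, the lead disappears entirely.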
 
Last edited:

PaintTinJr

Member
Why does any talk regarding the I/O upset you?

Remember both the PS5 and Series have revamped I/O systems. Any benefits that the PS5 gets can also be applied to the Series and PC as well.

I know the benefits of the PS5 I/O system get stated a lot here, but most of them can also be applied to the Series and PC. If devs benefit from one I/O system, it will most likely trickle down to all the other platforms. It's a win for all platform owners and nothing to get upset about.
I don't think he is upset at all. I think it is all an act and part of a coordinated effort to discredit dissenting voices and control a narrative. Go back and look at how it happens: people are in a contentious discussion, maybe even by themselves, potentially out on a limb - or, as earlier in this thread IIRC, someone with a seemingly normal account pops up to tee up discrediting people - and this account comes in aggressive, with barbs, trying for a gotcha to get them to shut up and leave. That's what it looked like he was trying on NXGamer. It all looks like shilling IMO, and I suspect it is an alt account.
 
The talk is legit; the examples are wrong.

He made an entire 20-minute video about how the I/O subsystems helped the PS5 secure a 70-80% lead over the 2700X (the 2700X rendering 14-15 fps against the PS5 rendering 24-30 fps in a similar place). It's simply wrong, since the 2700X renders 30-32 fps at those locations. This fact completely invalidates the video's comparison - not the broader point itself, but the way he presented it.

Well, the same probably applies to the Series as well, since they show similar performance in the Matrix demo.

P.S. I know the XSS is below the PS5 and XSX, but its I/O performance is pretty similar to the premium consoles, with the PS5 having the edge in theory.
 

DenchDeckard

Gold Member
I don't think he is upset at all. I think it is all an act and part of a coordinated effort to discredit dissenting voices and control a narrative. Go back and look at how it happens: people are in a contentious discussion, maybe even by themselves, potentially out on a limb - or, as earlier in this thread IIRC, someone with a seemingly normal account pops up to tee up discrediting people - and this account comes in aggressive, with barbs, trying for a gotcha to get them to shut up and leave. That's what it looked like he was trying on NXGamer. It all looks like shilling IMO, and I suspect it is an alt account.

If this were true, I would be absolutely astonished and quite pissed that someone has deceived us.

I would also be astonished if all the evidence that has been shared over the last 48 hours is true, and we have someone making 20-minute videos that are shared all over the internet, and who is commissioned by IGN to create content, presenting inaccurate data and stating theories as if they were truth about why a console is beating PC performance by like 60 percent, when it is categorically wrong and false.

I just hope that IF the issues are identified and proven true, they have the decency to pull the video and redo it, highlighting their honest mistake.
 
Last edited:

DaGwaphics

Member
Well the same probably applies to the Series as well since they show similar performance with the Matrix Demo.

P.S I know the XSS is below the PS5 and XSX but the I/O performance is pretty similar to the premium consoles with the PS5 having the edge in theory.

The issue at hand is that it is difficult to say the I/O of the next-gen systems helped them move past a specific CPU when most examples of said CPU in action perform on par with or better than the consoles. PC appears to have no trouble matching console performance (other than the typical issue with shader compilation, a basic cost of not being a fixed platform), with several CPU/GPU combinations offering a more stable 30fps experience at higher detail levels.

The hard part for the PC crowd is that PC is having a hard time doubling that performance level to hit 60fps. But it will probably be easily doable with the mid-range offerings of Zen 4 and whatever is next from Intel.
 
now that's an amazing suggestion

He just posted:
I downloaded it and got the exact same results, same issues.

I even clean-wiped the GPU driver and did it all again; it's still inconsistent and can hover between 16-33fps depending on city section, action on screen, etc.

Makes absolutely no sense why a 10700K gets double that. Dude's just sticking with his fake narrative. I think he's beyond saving.
 

PaintTinJr

Member
He just posted:


Makes absolutely no sense why 10700k gets double that. Dude's just sticking with his fake narrative. I think he's beyond saving.
In the photos, the GPU info is iffy, with a generic GFX description and less detail - like I've only seen in a Hyper-V session - while in the shots from NXGamer's video the GPU shows full details: the GPU name, a higher clock, a specified VRAM clock I believe, higher load, lower framerate and a different temp (I think it was 6 degrees higher, might have been lower) in the comparative shot, yet a reasonably lower power draw. Clearly the two systems aren't an exact match, yet no one is checking those details while bemoaning NXGamer's result. Why is that?

Maybe NXGamer has 32GB or more - the other system showed 16GB - and so his PC might be operating in single-channel memory mode, like the vast majority of PCs in the wild, because the OS and engine all fit inside one 16GB (or 32GB) module on his system, whereas on the other system they exceed an 8GB module and need to span two, no matter what. But the list of potential causes is endless. It might be any number of things: a faulty high-precision motherboard clock (or the feature being disabled) causing random stalls, a silent PSU issue capping GPU supply, a bifurcation config issue - for an NVMe drive plus GPU - throttling something, differences in SSD overprovisioning, virtual memory config differences, even whether SMT is enabled or disabled, etc, etc, etc.
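On the single- vs dual-channel point, the bandwidth gap is easy to put numbers on. Assuming DDR4-3200 (an assumption, since neither system's memory kit is confirmed in the thread):

```python
# Peak theoretical memory bandwidth for one vs two DDR4-3200 channels.
# DDR4-3200 is an assumed spec; each channel is 64 bits (8 bytes) wide.

transfers_per_s = 3200 * 10**6   # 3200 MT/s
bytes_per_transfer = 8           # 64-bit channel

single = transfers_per_s * bytes_per_transfer / 1e9   # GB/s
dual = 2 * single

print(f"single channel: {single:.1f} GB/s")   # 25.6 GB/s
print(f"dual channel:   {dual:.1f} GB/s")     # 51.2 GB/s
```

Halving peak bandwidth by populating only one channel could plausibly produce the kind of spread being argued about here, which is why it belongs on the checklist.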
 
Last edited:

PaintTinJr

Member
If this were true, I would be absolutely astonished and quite pissed that someone has deceived us.

I would also be astonished if all the evidence that has been shared over the last 48 hours is true, and we have someone making 20-minute videos that are shared all over the internet, and who is commissioned by IGN to create content, presenting inaccurate data and stating theories as if they were truth about why a console is beating PC performance by like 60 percent, when it is categorically wrong and false.

I just hope that IF the issues are identified and proven true, they have the decency to pull the video and redo it, highlighting their honest mistake.
I was only talking about that one forum account that randomly appears to gotcha people, so I wasn't expecting your follow-up points about IGN.

It feels odd that you keep doubling down on the benchmarks of a "sample" not meant for consumer use, one that apparently needed 30 minutes to compile from thousands and thousands of files. The margin of latitude - before calling someone's credibility into question - is surely greater than the performance delta, no? Especially from non-technical people, I should hope.
 

DenchDeckard

Gold Member
I was only talking about that one forum account that randomly appears to gotcha people, so I wasn't expecting your follow-up points about IGN.

It feels odd that you keep doubling down on the benchmarks of a "sample" not meant for consumer use, one that apparently needed 30 minutes to compile from thousands and thousands of files. The margin of latitude - before calling someone's credibility into question - is surely greater than the performance delta, no? Especially from non-technical people, I should hope.

Well, if someone is making a video that could be viewed by thousands upon thousands of people, and is also making claims as to why a system is outperforming a high-end PC which could be categorically and factually wrong, I would hope that they would be called out for it. Especially if it is not intentional, so the person can be given a chance to rectify the content at fault.

If it is not at fault, then myself and others can be ignored and we can move on. There's nothing wrong with questioning the data here when so many people have vastly different performance results from the content in question.
 
Last edited: