> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.

I'm talking about this: "XSX GPU has additional CUs for raytracing and texture operations."
XSX GPU has additional CUs for raytracing and texture operations. TMUs can be used as an ROPS workaround.
> 6 / 4.2 = 1.42, hence X1X has 42% extra GPU FP32 compute and texture operation power.

This is my thought. I still don't expect much difference, though. This is the closest these two consoles (Sony and Microsoft) have ever been spec-wise. Even on paper we are talking about only a 17% difference in the GPU and a negligible CPU power difference. The One X had a 50% more powerful GPU than the PS4 Pro, and the PS4 had a 50% more powerful GPU than the base Xbox One. The PS3 was way more powerful than the Xbox 360 on paper but was extremely hard to develop for, and the original Xbox was much more powerful than the PS2 but came out much later.
Poor people here, in this thread. They are so confused.
XSX/PS5 already have nearly 300 games, and most of those games run better on XSX. Just because most games don't have 500-million-dollar ad campaigns and don't take over most YouTube ad slots doesn't mean those games don't exist.
People here are acting like XSX/PS5 together only have 50 or 60 games, over a month after launch.
> TMUs can be used as an ROPS workaround.

That's BS.
Try again.
> PS5 GPU with 8 MB L2 cache is unproven. NAVI 21 still has a 4MB L2 cache.

I think I see what you're saying here, given the TMUs can still be used for texture texel image data. I'm not 100% clear how it would work in practice, but it should technically be doable, since texels and pixels are somewhat interchangeable where TMUs are concerned.
Aside: I want to touch back real quick on the discussion of cache bandwidths. Let's just focus on L1$ here. I don't know the actual latency of the L1$ on AMD's GPUs (and I mean L1$ in the sense that it plays the role L2$ does on a CPU), but theoretically we know the peaks. For workloads that favor 36 CUs or fewer, PS5's clock gives it the cache advantage for both L1$ and L2$; I figure their L1$ peak is at most 4.11 TB/s. At 36 CUs on Microsoft's system, it would be at most about 3.36 TB/s.
If 3P games are having a hard time right now saturating all 52 CUs on Series X, then this alone is a good indication of where a problem might be on that end, because the L1$ on PS5 has roughly a 22% bandwidth advantage over Series X's (the clock-rate difference) for workloads saturating 36 CUs or fewer. The same can be said for L0$ rates if CU saturation is at 36 units or fewer.
In order to overcome that on Series X, with their clocks, they'd need to saturate at least 44 CUs, as that'd get their bandwidth to 4.11 TB/s. However, Series systems lack cache scrubbers, so any cache misses that go beyond the L2$ (therefore go into system memory) may take a bigger hit because the full cache line that needs an update has to be flushed out, that data has to then go into GDDR6 memory (if it isn't there already), then get populated back into the cache.
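The peak figures above can be reproduced with a simple linear model. A sketch under two assumptions: peak L1$ bandwidth scales with saturated CUs times clock, and the per-CU constant (~51.2 bytes/clock) is inferred from the 4.11 TB/s figure in this thread, not taken from an official AMD spec:

```python
# Back-of-envelope peak L1$ bandwidth model: saturated CUs x clock x bytes/clock.
# The 51.2 B/CU/clock constant is inferred from the figures quoted in-thread,
# NOT from an official AMD spec sheet.
def l1_peak_tbps(cus, clock_ghz, bytes_per_cu_clock=51.2):
    """Peak L1$ bandwidth in TB/s at a given CU saturation and GPU clock."""
    return cus * clock_ghz * bytes_per_cu_clock / 1000.0

print(round(l1_peak_tbps(36, 2.230), 2))  # PS5, all 36 CUs saturated
print(round(l1_peak_tbps(44, 1.825), 2))  # Series X needs ~44 CUs to match PS5
print(round(l1_peak_tbps(52, 1.825), 2))  # Series X, all 52 CUs saturated
```

Under this model, Series X at 44 CUs lands at the same ~4.11 TB/s peak that PS5 reaches at 36 CUs, which is the break-even point discussed above.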
Series X having more GDDR6 bandwidth should theoretically mitigate some of that hit, but consider this: at full GPU saturation, the L1$ bandwidth on Series X is at most 4.858 TB/s. That's only about a 750 GB/s advantage over PS5's. Repeating this with the respective L2$ bandwidths would show a somewhat smaller advantage for Series X, but still an advantage. Where it gets somewhat interesting is if Sony has increased their L2$ size, say to 8 MB, as that could make a bit of a difference. Also, we already know that Series X has something of a bandwidth dropoff in the GDDR6 pool if the GPU has to go beyond the 10 GB specifically allocated to it. The size of that dropoff is unknown at this time.
I'm bringing this up because, theoretically, if that dropoff isn't too large but we're still running into noticeable performance issues on Series X compared to PS5, then I have to think the "tools" being looked into are focusing on that "split" RAM bandwidth access. They will probably focus on ways to pack GPU-relevant data into the fast 10 GB pool more efficiently so that it doesn't spill over into the 6 GB pool, or need to access storage as frequently (if either is already happening).
CU saturation in and of itself is actually NOT the issue here. But suppose a game runs on an engine that targets a saturation point lower than what another system has on offer (in terms of CUs), AND that other system, with proverbial food left on the table, already requires you to "eat" that food at a slower pace (lower GPU clock) no matter what, due to its fixed-frequency design. THAT can create utilization problems for multiplat game performance.
Of course, if you keep all the aforementioned in mind, you can also figure that even things like texture fillrate run worse on Series X if CU utilization is at 36 or lower, because that's yet another thing affected by the clock rate in a sizable fashion. That also puts a bigger hit on things that are bound tightly to the pixel fillrate. You're right in stating that the TMUs can be used in lieu of the ROPs for certain (maybe most) pixel-bound or alpha-related work, but that'd have to be predicated on at least 44 CUs being saturated; even then, you trade away some texel performance on a few of those CUs to boost the pixel fillrate, so ideally you'd probably want to saturate another 4 CUs, bringing that to 48.
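To put numbers on the fillrate point, here is a rough sketch assuming RDNA 2's 4 TMUs per CU, so peak texel fillrate = active CUs × 4 × clock (a theoretical peak, not a measured figure):

```python
# Rough peak texel fillrate in Gtexels/s, assuming 4 TMUs per RDNA 2 CU.
def texel_fillrate_gtexps(active_cus, clock_ghz, tmus_per_cu=4):
    return active_cus * tmus_per_cu * clock_ghz

print(texel_fillrate_gtexps(36, 2.230))  # PS5 at full saturation: ~321
print(texel_fillrate_gtexps(36, 1.825))  # Series X limited to 36 CUs: ~263
print(texel_fillrate_gtexps(52, 1.825))  # Series X at full saturation: ~380
```

So if a cross-gen engine only ever lights up 36 CUs, Series X's lower clock leaves its texture fillrate behind PS5's; the advantage only appears once the extra CUs actually get saturated.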
How many cross-gen 3P games are actually doing this type of saturation or optimization? Probably very few TBH. Cyberpunk is probably one of the few examples but even that game still has a few performance issues here and there on MS's platform. Sometimes I do wonder how much of this could've been avoided in terms of behind-the-scenes dev headaches if they clocked the GPU even 100 MHz higher.
> TMUs can be used as an ROPS workaround.

DX12 FL 12_1 ROV (rasterizer-ordered views) and MSAA will be missing with a texture-unit-based ROPs workaround, hence alternatives have to be used. Any ROPs fixed-function features will be missing with the texture-unit-based ROPs workaround.
Shows slide where you bypass part of ROPs processing with CUs.
If you look at the previous slides, it was an issue they found where you don't have enough ROPs to use all the bandwidth required by the RGB blending example.
BTW the presentation transcript:
“Writing through a UAV bypasses the ROPs and goes straight to memory. This solution obviously does not apply to all sorts of rendering; for one, we are skipping the entire graphics pipeline as well, on which we still depend for most normal rendering. However, in the cases where it applies it can certainly result in a substantial performance increase. Cases where we are initializing textures to something else than a constant color, simple post-effects: this would be useful.”
> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.

Are you sure RDNA 2 does both at the same time?
> Are you sure RDNA 2 does both at the same time?

The concurrency issue is a different debate. My argument is that the RDNA 2 CU contains both texture and raytracing hardware units, regardless of any concurrency design limitations.
Because Series X does either 4 texture ops or 4 ray-tracing ops per CU per cycle.
PS5 GPU with 8 MB L2 cache is unproven. NAVI 21 still has a 4MB L2 cache.
> Cerny's statement is true for GCN vs RDNA, but NAVI 21 is about a 2X scale-up of the RX 5700 XT's gaming results.

"Most powerful" can only be defined as best-performing. So far, for the time being, the best-performing console is the PS5.
If you can't beat your competitor at the metric "power" is meant to address, game performance, then you're not more powerful at running games. Maybe more powerful at cleaning dishes, or at some strand of code unrelated to gaming, perhaps...
The question causes tribal warfare for obvious reasons, but there is something to be said about marketing departments misleading people with "power" narratives (an ambiguous term) in something as complex as gaming performance. Obviously the marketing departments don't care, because they're trying to sell you a product. The problem is when the sheep buy into it verbatim despite real-world evidence showing different results.
To quote some scripture,
Cerny 26:24: "This continuous improvement in AMD technology means it's dangerous to rely on teraflops as an absolute indicator of performance, and CU count should be avoided as well."
> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.
You're wrong.
> If you compare CUs and memory speed and bandwidth, then YES.

I own both platforms, but every generation I buy each platform's exclusives, and multiplatform games on the system that displays them best (and usually it's a collection; I don't mix). Because why play the inferior version when you have both systems, right?
On paper, the XSX is more powerful, but in the last few games the PS5 has had some advantages, aside from COD: Black Ops Cold War, which is better on Xbox Series X (no dips while ray tracing is on).
Is it really some stupid tools we are missing on Xbox, or is the PS5 generally the better system?
> Yeah, it has nothing to do with the fact that it was developed for PS4 Pro, which has the same number of CUs, just faster...

It's a shame when people look at PS5 performance in cross-gen titles versus Series X and immediately assume there's something wrong with the Xbox. Maybe it's worth exploring why the PS5 is as efficient as it is?
Everyone gather in a circle and hold hands. The seance for conjuring an x-ray for PS5's die shot will begin shortly...
> Yeah, it has nothing to do with the fact that it was developed for PS4 Pro, which has the same number of CUs, just faster...
Or possibly PS5 has some form of the rumored Infinity Cache, and its CPU has its L3 cache unified. We won't know for sure until an x-ray shot of the die.
What we do know is that the cache scrubbers reduce GPU overhead by enabling fine-grained eviction of data from the GPU's caches. No other GPU on the market has this. No constant flushing of whole caches, and smaller penalties on cache misses, means the GPU does what it's supposed to do: draw. Not to mention the memory-bandwidth savings that creates.
Overall, yes, we'll have to wait for real next-gen games to see the difference. I just find it funny how people just want to compare numbers without actually focusing on how the whole system works.
> I'm leaning far more toward PS5 having no infinity cache than anything else. It's still possible for Sony to have some sort of custom implementation with the caches, but the x-ray shot is needed. There have been patents showing the L3 cache unified, though. I believe good old George posted about that. GPU cache scrubbers will definitely be putting in work to keep that GPU drawing as much as possible. I think people underestimate it.

PS5 doesn't have any Infinity Cache; the APU is too small for it. At most they might've increased the size of the GPU L2$ a bit, maybe to 5 MB, maybe to 6 MB, maybe to 8 MB. But there's no 128 MB of IC, or even 32 or 16 MB; the APU just isn't big enough for that, and IC resides on the GPU (which is part of the APU).
I do think the cache scrubbers are giving some benefits though, that should be readily noticeable. Touched on it a bit in the previous post.
> I'm leaning far more toward PS5 having no infinity cache than anything else. It's still possible for Sony to have some sort of custom implementation with the caches, but the x-ray shot is needed. There have been patents showing the L3 cache unified, though. I believe good old George posted about that. GPU cache scrubbers will definitely be putting in work to keep that GPU drawing as much as possible. I think people underestimate it.

PS5 has a "large" SRAM in the I/O complex, very probably for a similar task. Thread focus: even MS itself only speaks of the most powerful Xbox ever. The PS5 will remain the best place for multiplatform games.
Yes SX is more powerful.
SX = goku
Ps5 = vegeta
Sony's API and dev tools are much better now, allowing PS5 devs to push the system harder, so it punches above its weight.
Of course the gap is not as large as between the XO and PS4, but nonetheless we should see the SX gaining on the PS5 over the years. A comeback is on.
While a lot is made of the PS5's high clocks, you have to remember the TDP and temperature limitations of a closed-box console. Current cross-gen games are not hammering the silicon inside either APU that hard. Once the likes of AVX2 are used, they will eat up the thermal headroom available now.
Also, the RDNA 2 PC GPUs we've seen are highly TDP-limited and perform just as well with an underclock and undervolt; I'm not sure the clocks above 2 GHz are worth every penny.
> PS5 has a "large" SRAM in the I/O complex, very probably for a similar task. Thread focus: even MS itself only speaks of the most powerful Xbox ever. The PS5 will remain the best place for multiplatform games.

I believe the SRAM in the I/O complex just grants faster access to cached memory as it relates to the SSD. If it were to benefit the PS5's GPU, it would need to be directly on the GPU for direct access. That's not to say it doesn't help the GPU in some roundabout way when you look at the system as a whole.
> FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.

I know what it has, but you said additional CUs for ray tracing; that is not true.
> PS5 doesn't have any Infinity Cache; the APU is too small for it. At most they might've increased the size of the GPU L2$ a bit, maybe to 5 MB, maybe to 6 MB, maybe to 8 MB. But there's no 128 MB of IC, or even 32 or 16 MB; the APU just isn't big enough for that, and IC resides on the GPU (which is part of the APU).
> I do think the cache scrubbers are giving some benefits though, that should be readily noticeable. Touched on it a bit in the previous post.

NAVI 21's 128 MB Infinity Cache (L3 cache) is about 100 mm² in size. Scaling from the Infinity Cache's size, a 4 MB L3 cache would be ~3.125 mm².
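The area estimate follows from simple linear scaling of the 128 MB ≈ 100 mm² figure quoted above (this ignores any fixed per-block overhead, so treat it as a rough lower bound):

```python
# Linear area scaling: if 128 MB of Infinity Cache occupies ~100 mm^2,
# estimate the die area of a smaller SRAM pool of the same density.
def sram_area_mm2(size_mb, ref_size_mb=128.0, ref_area_mm2=100.0):
    return ref_area_mm2 * size_mb / ref_size_mb

print(sram_area_mm2(4.0))  # 3.125 mm^2
print(sram_area_mm2(8.0))  # 6.25 mm^2
```

A few mm² for a modestly larger L2$ is plausible within the PS5 APU's area budget, which is why the "bigger L2$, no 128 MB IC" reading fits the die-size argument.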
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.
It doesn't say TMUs are being used as ROPs. It says you can bypass the ROPs in compute shaders and modify texture buffers; that is not the same as doing the ROPs' work.
> Actually, XSX has superior raytracing when compared to PS5.

And it performs worse than PS5 in comparisons, so how does that work?
> 6 / 4.2 = 1.42, hence X1X has 42% extra GPU FP32 compute and texture operation power.

I'm not sure what point you are trying to make here. Whether we are talking about a 40% or 50% difference in GPU, or a 45% difference in memory bandwidth, the point I was trying to get across still stands.
The PS4 Pro GPU has rapid packed math with 8.4 TFLOPS FP16, while the X1X GPU has 6 TFLOPS FP16 (Polaris packed math), hence the PS4 Pro GPU has 40% extra FP16 compute power compared to the X1X GPU.
326 / 224 = 1.45, hence X1X has 45% extra memory bandwidth for compute, textures and raster.
At 4K resolution, PS4 Pro GPU is mostly memory bandwidth bottlenecked.
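The ratios above, using the spec figures as quoted in this thread, work out as follows:

```python
# Worked ratios behind the percentages, using the spec figures quoted in-thread.
x1x_fp32, pro_fp32 = 6.0, 4.2    # FP32 TFLOPS
pro_fp16, x1x_fp16 = 8.4, 6.0    # FP16 TFLOPS (PS4 Pro has rapid packed math)
x1x_bw, pro_bw = 326.0, 224.0    # memory bandwidth in GB/s

print(x1x_fp32 / pro_fp32)  # ~1.43 -> ~42-43% FP32/texture advantage for X1X
print(pro_fp16 / x1x_fp16)  # 1.4   -> 40% FP16 advantage for PS4 Pro
print(x1x_bw / pro_bw)      # ~1.46 -> ~45% bandwidth advantage for X1X
```

Note the asymmetry: X1X wins on FP32 throughput and bandwidth, while PS4 Pro wins on FP16 throughput, which is the whole point of the comparison.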
On paper, the PS4 Pro's GPU, with 64 ROPs and 8.4 TFLOPS of FP16 via RPM, is a pretty good GPU, but it was gimped by 224 GB/s of memory bandwidth.

> I'm not sure what point you are trying to make here. Whether we are talking about a 40% or 50% difference in GPU, or a 45% difference in memory bandwidth, the point I was trying to get across still stands.
The PS4 on paper was stronger than the Xbox One in every way, and the games proved that out, with many games running at 1080p on PS4 with enhanced AA or other graphical features while the Xbox One ran games at 900p, or sometimes as low as 720p. Same with the PS4 Pro and One X: the One X ran far more games at full 4K or 1800p, with greater graphical features, than the PS4 Pro. Those major differences (on paper) were borne out in real-world results.
This gen is going to show very minimal differences between the two in multi-platform games, as both systems are super close in power.
Shader resource view (SRV) and Unordered Access view (UAV) - UWP applications
"Shader resource views typically wrap textures in a format that the shaders can access them. An unordered access view provides similar functionality, but enables the reading and writing to the texture (or other resource) in any order."

(docs.microsoft.com)
"UAV Texture" is texture unit usage. LOL
Texture units are the primary read/write (scatter-gather) units for GPGPU work.
Stream ALU processors don't have gather-scatter load-store units.
> A texture is a buffer. It says it bypasses them to directly access the UAV textures (or buffers) in a compute shader. That is not doing the ROPs' work; rather, it's accessing in compute shaders the buffers that ROPs can use. There is a difference.

1. ROPs are the traditional graphics read and write units, with fixed-function graphics features, e.g. MSAA.
Actually, XSX has superior raytracing when compared to PS5.
And nobody can see it in any game, funny that. Try harder
COD: Black Ops Cold War performs better on Xbox Series X with ray tracing.

COD: Black Ops Cold War Performs Better on Xbox Series X With Ray Tracing, While Better on PS5 in 120fps Mode - TechnoSports
"PlayStation 5 and Xbox Series X, both have launched in the market, now the two consoles are being compared depending upon the performance like always. For testing Call of Duty: Black Ops Cold War is a nice option, as it is a next-gen game, supports multiplatform, and offers both 120fps and Ray..." (technosports.co.in)
Try again.
> It wasn't.

No, that was a bug that went away when reloading the same point. Try again.
It wasn't.
There was an issue on PS5 where the frame rate could randomly drop below 60fps during scenes that it previously ran at a solid 60fps. This issue wasn't encountered on Xbox Series X.
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.
> It doesn't prove it.

Yes it was.
Try again.
It doesn't prove it.
> SMH, so you're just trying to make it sound better lol

FACT: The XSX GPU has more hardware RT resources compared to the PS5. My argument is the NVIDIA RTX way!
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.