> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.

I'm talking about this: "XSX GPU has additional CUs for raytracing and texture operations."
XSX GPU has additional CUs for raytracing and texture operations. TMUs can be used as an ROPS workaround.
> 6 / 4.2 = 1.42, hence X1X has 42% extra GPU FP32 compute and texture operation power.

This is my thought. I still don't expect much difference, though. This is the closest these two consoles (Sony and Microsoft) have ever been spec-wise. Even on paper we are talking about only a 17% difference in the GPU and a negligible CPU power difference. The One X had a 50% more powerful GPU than the PS4 Pro, and the PS4 had a 50% more powerful GPU than the base Xbox One. The PS3 was way more powerful than the Xbox 360 on paper but was extremely hard to develop for, and the original Xbox was much more powerful than the PS2 but came out much later.
Poor people here, in this thread. They are so confused.
XSX/PS5 already have nearly 300 games, and most of those games run better on XSX. Just because most games don't have 500-million-dollar ad campaigns and don't take over most YouTube ad slots doesn't mean those games don't exist.
People here are acting like XSX/PS5 together only have 50 or 60 games, over a month after launch.
> TMUs can be used as an ROPS workaround.

That's BS.
Try again.
> PS5 GPU with 8 MB L2 cache is unproven. NAVI 21 still has a 4MB L2 cache.

I think I see what you're saying here, given the TMUs can still be used for texture texel image data. I'm not 100% clear how it would work in practice, but it should technically be doable, since texels and pixels are somewhat interchangeable where TMUs are concerned.
Aside: I want to touch back real quick on the discussion of cache bandwidths. Let's just focus on L1$ here. I don't know the actual latency of the L1$ on AMD's GPUs (and I mean L1$ in the sense that it plays the role L2$ does on a CPU), but theoretically we know the peaks. For workloads that favor 36 CUs or fewer, PS5's clock gives it the cache advantage for both L1$ and L2$; I figure their L1$ peak is at most 4.11 TB/s. At 36 CUs on Microsoft's system, it would be at most about 3.36 TB/s.
If 3P games are having a hard time right now saturating all 52 CUs on Series X, then this alone is a good indication of where a problem might be on that end, because the L1$ on PS5 has roughly a 22% bandwidth advantage over Series X's (the clock-rate difference) for workloads saturating 36 CUs or fewer. The same can be said for L0$ rates if CU saturation is at 36 units or fewer.
In order to overcome that on Series X, with their clocks, they'd need to saturate at least 44 CUs, as that'd get their bandwidth to 4.11 TB/s. However, Series systems lack cache scrubbers, so any cache misses that go beyond the L2$ (therefore go into system memory) may take a bigger hit because the full cache line that needs an update has to be flushed out, that data has to then go into GDDR6 memory (if it isn't there already), then get populated back into the cache.
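The peak figures above can be reproduced with a simple linear model. A sketch under two assumptions: peak L1$ bandwidth scales with saturated CUs times clock, and the per-CU constant (~51.2 bytes/clock) is inferred from the 4.11 TB/s figure in this thread, not taken from an official AMD spec:

```python
# Back-of-envelope peak L1$ bandwidth model: saturated CUs x clock x bytes/clock.
# The 51.2 B/CU/clock constant is inferred from the figures quoted in-thread,
# NOT from an official AMD spec sheet.
def l1_peak_tbps(cus, clock_ghz, bytes_per_cu_clock=51.2):
    """Peak L1$ bandwidth in TB/s at a given CU saturation and GPU clock."""
    return cus * clock_ghz * bytes_per_cu_clock / 1000.0

print(round(l1_peak_tbps(36, 2.230), 2))  # PS5, all 36 CUs saturated
print(round(l1_peak_tbps(44, 1.825), 2))  # Series X needs ~44 CUs to match PS5
print(round(l1_peak_tbps(52, 1.825), 2))  # Series X, all 52 CUs saturated
```

Under this model, Series X at 44 CUs lands at the same ~4.11 TB/s peak that PS5 reaches at 36 CUs, which is the break-even point discussed above.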
Series X having more GDDR6 bandwidth should theoretically mitigate some of that hit, but consider this: at full GPU saturation, the L1$ bandwidth on Series X is at most 4.858 TB/s. That's only about a 750 GB/s advantage over PS5's. Repeating this with the respective L2$ bandwidths would show a somewhat smaller advantage for Series X, but still an advantage. Where it gets somewhat interesting is if Sony has increased their L2$ size, say to 8 MB, as that could make a bit of a difference. Also, we already know that Series X has something of a bandwidth dropoff in the GDDR6 pool if the GPU has to go beyond the 10 GB specifically allocated to it. The size of that dropoff is unknown at this time.
I'm bringing this up because, theoretically, if that dropoff isn't too large but we're still running into noticeable performance issues on Series X compared to PS5, then I have to think the "tools" being looked into are focusing on that "split" RAM bandwidth access. They will probably focus on ways to pack GPU-relevant data into the fast 10 GB pool more efficiently so that it doesn't spill over into the 6 GB pool, or need to access storage as frequently (if either is already happening).
CU saturation in and of itself is actually NOT the issue here. But suppose a game runs on an engine that targets a saturation point lower than what another system has on offer (in terms of CUs), AND that other system, with proverbial food left on the table, already requires you to "eat" that food at a slower pace (lower GPU clock) no matter what, due to its fixed-frequency design. THAT can create utilization problems for multiplat game performance.
Of course, if you keep all the aforementioned in mind, you can also figure that even things like texture fillrate run worse on Series X if CU utilization is at 36 or lower, because that's yet another thing affected by the clock rate in a sizable fashion. That also puts a bigger hit on things that are bound tightly to the pixel fillrate. You're right in stating that the TMUs can be used in lieu of the ROPs for certain (maybe most) pixel-bound or alpha-related work, but that'd have to be predicated on at least 44 CUs being saturated; even then, you trade away some texel performance on a few of those CUs to boost the pixel fillrate, so ideally you'd probably want to saturate another 4 CUs, bringing that to 48.
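To put numbers on the fillrate point, here is a rough sketch assuming RDNA 2's 4 TMUs per CU, so peak texel fillrate = active CUs × 4 × clock (a theoretical peak, not a measured figure):

```python
# Rough peak texel fillrate in Gtexels/s, assuming 4 TMUs per RDNA 2 CU.
def texel_fillrate_gtexps(active_cus, clock_ghz, tmus_per_cu=4):
    return active_cus * tmus_per_cu * clock_ghz

print(texel_fillrate_gtexps(36, 2.230))  # PS5 at full saturation: ~321
print(texel_fillrate_gtexps(36, 1.825))  # Series X limited to 36 CUs: ~263
print(texel_fillrate_gtexps(52, 1.825))  # Series X at full saturation: ~380
```

So if a cross-gen engine only ever lights up 36 CUs, Series X's lower clock leaves its texture fillrate behind PS5's; the advantage only appears once the extra CUs actually get saturated.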
How many cross-gen 3P games are actually doing this type of saturation or optimization? Probably very few TBH. Cyberpunk is probably one of the few examples but even that game still has a few performance issues here and there on MS's platform. Sometimes I do wonder how much of this could've been avoided in terms of behind-the-scenes dev headaches if they clocked the GPU even 100 MHz higher.
> TMUs can be used as an ROPS workaround.

DX12 FL 12_1 ROV (rasterizer-ordered views) and MSAA will be missing with a texture-unit-based ROPs workaround, hence alternatives have to be used. Any ROPs fixed-function features will be missing with the texture-unit-based ROPs workaround.
Shows slide where you bypass part of ROPs processing with CUs.
If you look at the previous slides, it was an issue they found where you don't have enough ROPs to use all the bandwidth required by the RGB blending example.
BTW the presentation transcript:
“Writing through a UAV bypasses the ROPs and goes straight to memory. This solution obviously does not apply to all sorts of rendering; for one, we are skipping the entire graphics pipeline as well, on which we still depend for most normal rendering. However, in the cases where it applies it can certainly result in a substantial performance increase. Cases where we are initializing textures to something else than a constant color, simple post-effects: this would be useful.”
> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.

Are you sure RDNA 2 does both at the same time?
> Are you sure RDNA 2 does both at the same time?

The concurrency issue is a different debate. My argument is that the RDNA 2 CU contains both texture and raytracing hardware units, regardless of any concurrency design limitations.
Because Series X does either 4 texture ops or 4 ray-tracing ops per CU per cycle.
PS5 GPU with 8 MB L2 cache is unproven. NAVI 21 still has a 4MB L2 cache.
> Cerny's statement is true for GCN vs RDNA, but NAVI 21 is about a 2X scale-up of the RX 5700 XT's gaming results.

"Most powerful" can only be defined as best-performing. So far, for the time being, the best-performing console is the PS5.
If you can't beat your competitor at the metric "power" is meant to address, game performance, then you're not more powerful at running games. Maybe more powerful at cleaning dishes, or at some strand of code unrelated to gaming, perhaps...
The question causes tribal warfare for obvious reasons, but there is something to be said about marketing departments misleading people with "power" narratives (an ambiguous term) in something as complex as gaming performance. Obviously the marketing departments don't care, because they're trying to sell you a product. The problem is when the sheep buy into it verbatim despite real-world evidence showing different results.
To quote some scripture,
Cerny 26:24: "This continuous improvement in AMD technology means it's dangerous to rely on teraflops as an absolute indicator of performance, and CU count should be avoided as well."
> RDNA 2 DCU contains texture and raytracing hardware units, NOT just texture units. The XSX GPU uses RDNA 2 CUs, NOT RDNA 1 CUs.
You're wrong.
> If you compare CUs and memory speed and bandwidth, then YES.

I own both platforms, but every generation I buy each platform's exclusives, and multiplatform games on the system that displays them best (and usually it's a collection; I don't mix). Because why play the inferior version when you have both systems, right?
On paper, the XSX is more powerful, but in the last few games the PS5 has had some advantages, aside from COD: Black Ops Cold War, which is better on Xbox Series X (no dips while ray tracing is on).
Is it really some stupid tools we are missing on Xbox, or is the PS5 generally the better system?
> Yeah, it has nothing to do with the fact that it was developed for PS4 Pro, which has the same number of CUs, just faster...

It's a shame when people look at PS5 performance in cross-gen titles versus Series X and immediately assume there's something wrong with the Xbox. Maybe it's worth exploring why the PS5 is as efficient as it is?
Everyone gather in a circle and hold hands. The seance for conjuring an x-ray for PS5's die shot will begin shortly...
> Yeah, it has nothing to do with the fact that it was developed for PS4 Pro, which has the same number of CUs, just faster...
Or possibly PS5 has some form of the rumored Infinity Cache, and its CPU has its L3 cache unified. We won't know for sure until an x-ray shot of the die.
What we do know is that the cache scrubbers reduce GPU overhead by enabling fine-grained eviction of data from the GPU's caches. No other GPU on the market has this. No constant flushing of whole caches, and smaller penalties on cache misses, means the GPU does what it's supposed to do: draw. Not to mention the memory-bandwidth savings that creates.
Overall, yes, we'll have to wait for real next-gen games to see the difference. I just find it funny how people just want to compare numbers without actually focusing on how the whole system works.
> I'm leaning far more toward PS5 having no infinity cache than anything else. It's still possible for Sony to have some sort of custom implementation with the caches, but the x-ray shot is needed. There have been patents showing the L3 cache unified, though. I believe good old George posted about that. GPU cache scrubbers will definitely be putting in work to keep that GPU drawing as much as possible. I think people underestimate it.

PS5 doesn't have any Infinity Cache; the APU is too small for it. At most they might've increased the size of the GPU L2$ a bit, maybe to 5 MB, maybe to 6 MB, maybe to 8 MB. But there's no 128 MB of IC, or even 32 or 16 MB; the APU just isn't big enough for that, and IC resides on the GPU (which is part of the APU).
I do think the cache scrubbers are giving some benefits though, that should be readily noticeable. Touched on it a bit in the previous post.
> I'm leaning far more toward PS5 having no infinity cache than anything else. It's still possible for Sony to have some sort of custom implementation with the caches, but the x-ray shot is needed. There have been patents showing the L3 cache unified, though. I believe good old George posted about that. GPU cache scrubbers will definitely be putting in work to keep that GPU drawing as much as possible. I think people underestimate it.

PS5 has a "large" SRAM in the I/O complex, very probably for a similar task. Thread focus: even MS itself only speaks of the most powerful Xbox ever. The PS5 will remain the best place for multiplatform games.
Yes SX is more powerful.
SX = goku
Ps5 = vegeta
Sony's API and dev tools are much better now, allowing PS5 devs to push the system harder, so it punches above its weight.
Of course the gap is not as large as between the XO and PS4, but nonetheless we should see the SX gaining on the PS5 over the years. A comeback is on.
While a lot is made of the PS5's high clocks, you have to remember the TDP and temperature limitations of a closed-box console. Current cross-gen games are not hammering the silicon inside either APU that hard. Once the likes of AVX2 are used, they will eat up the thermal headroom available now.
Also, the RDNA 2 PC GPUs we've seen are highly TDP-limited and perform just as well with an underclock and undervolt; I'm not sure the clocks above 2 GHz are worth every penny.
> PS5 has a "large" SRAM in the I/O complex, very probably for a similar task. Thread focus: even MS itself only speaks of the most powerful Xbox ever. The PS5 will remain the best place for multiplatform games.

I believe the SRAM in the I/O complex just grants faster access to cached memory as it relates to the SSD. If it were to benefit the PS5's GPU, it would need to be directly on the GPU for direct access. That's not to say it doesn't help the GPU in some roundabout way when you look at the system as a whole.
> FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.

I know what it has, but you said additional CUs for ray tracing; that is not true.
> PS5 doesn't have any Infinity Cache; the APU is too small for it. At most they might've increased the size of the GPU L2$ a bit, maybe to 5 MB, maybe to 6 MB, maybe to 8 MB. But there's no 128 MB of IC, or even 32 or 16 MB; the APU just isn't big enough for that, and IC resides on the GPU (which is part of the APU).
> I do think the cache scrubbers are giving some benefits though, that should be readily noticeable. Touched on it a bit in the previous post.

NAVI 21's 128 MB Infinity Cache (L3 cache) is about 100 mm² in size. Scaling from the Infinity Cache's size, a 4 MB L3 cache would be ~3.125 mm².
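The area estimate follows from simple linear scaling of the 128 MB ≈ 100 mm² figure quoted above (this ignores any fixed per-block overhead, so treat it as a rough lower bound):

```python
# Linear area scaling: if 128 MB of Infinity Cache occupies ~100 mm^2,
# estimate the die area of a smaller SRAM pool of the same density.
def sram_area_mm2(size_mb, ref_size_mb=128.0, ref_area_mm2=100.0):
    return ref_area_mm2 * size_mb / ref_size_mb

print(sram_area_mm2(4.0))  # 3.125 mm^2
print(sram_area_mm2(8.0))  # 6.25 mm^2
```

A few mm² for a modestly larger L2$ is plausible within the PS5 APU's area budget, which is why the "bigger L2$, no 128 MB IC" reading fits the die-size argument.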
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.
It doesn't say TMUs are being used as ROPs. It says you can bypass the ROPs in compute shaders and modify texture buffers; that is not the same as doing the ROPs' work.
> Actually, XSX has superior raytracing when compared to PS5.

And it performs worse than PS5 in comparisons, so how does that work?
> 6 / 4.2 = 1.42, hence X1X has 42% extra GPU FP32 compute and texture operation power.

I'm not sure what point you are trying to make here. Whether we are talking about a 40% or 50% difference in GPU, or a 45% difference in memory bandwidth, the point I was trying to get across still stands.
The PS4 Pro GPU has rapid packed math with 8.4 TFLOPS FP16, while the X1X GPU has 6 TFLOPS FP16 (Polaris packed math), hence the PS4 Pro GPU has 40% extra FP16 compute power compared to the X1X GPU.
326 / 224 = 1.45, hence X1X has 45% extra memory bandwidth for compute, textures and raster.
At 4K resolution, PS4 Pro GPU is mostly memory bandwidth bottlenecked.
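The ratios above, using the spec figures as quoted in this thread, work out as follows:

```python
# Worked ratios behind the percentages, using the spec figures quoted in-thread.
x1x_fp32, pro_fp32 = 6.0, 4.2    # FP32 TFLOPS
pro_fp16, x1x_fp16 = 8.4, 6.0    # FP16 TFLOPS (PS4 Pro has rapid packed math)
x1x_bw, pro_bw = 326.0, 224.0    # memory bandwidth in GB/s

print(x1x_fp32 / pro_fp32)  # ~1.43 -> ~42-43% FP32/texture advantage for X1X
print(pro_fp16 / x1x_fp16)  # 1.4   -> 40% FP16 advantage for PS4 Pro
print(x1x_bw / pro_bw)      # ~1.46 -> ~45% bandwidth advantage for X1X
```

Note the asymmetry: X1X wins on FP32 throughput and bandwidth, while PS4 Pro wins on FP16 throughput, which is the whole point of the comparison.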
On paper, the PS4 Pro's GPU, with 64 ROPs and 8.4 TFLOPS of FP16 via RPM, is a pretty good GPU, but it was gimped by 224 GB/s of memory bandwidth.

> I'm not sure what point you are trying to make here. Whether we are talking about a 40% or 50% difference in GPU, or a 45% difference in memory bandwidth, the point I was trying to get across still stands.
The PS4 on paper was stronger than the Xbox One in every way, and the games proved that out, with many games running at 1080p on PS4 with enhanced AA or other graphical features while the Xbox One ran games at 900p, or sometimes as low as 720p. Same with the PS4 Pro and One X: the One X ran far more games at full 4K or 1800p, with greater graphical features, than the PS4 Pro. Those major differences (on paper) were borne out in real-world results.
This gen is going to show very minimal differences between the two in multi-platform games, as both systems are super close in power.
Shader resource view (SRV) and Unordered Access view (UAV) - UWP applications
"Shader resource views typically wrap textures in a format that the shaders can access them. An unordered access view provides similar functionality, but enables the reading and writing to the texture (or other resource) in any order."

(docs.microsoft.com)
"UAV Texture" is texture unit usage. LOL
Texture units are the primary read/write (scatter-gather) units for GPGPU work.
Stream ALU processors don't have gather-scatter load-store units.
> A texture is a buffer. It says it bypasses them to directly access the UAV textures (or buffers) in a compute shader. That is not doing the ROPs' work; rather, it's accessing in compute shaders the buffers that ROPs can use. There is a difference.

1. ROPs are the traditional graphics read and write units, with fixed-function graphics features, e.g. MSAA.
Actually, XSX has superior raytracing when compared to PS5.
And nobody can see it in any game, funny that. Try harder
COD: Black Ops Cold War performs better on Xbox Series X with ray tracing.

COD: Black Ops Cold War Performs Better on Xbox Series X With Ray Tracing, While Better on PS5 in 120fps Mode - TechnoSports
"PlayStation 5 and Xbox Series X, both have launched in the market, now the two consoles are being compared depending upon the performance like always. For testing Call of Duty: Black Ops Cold War is a nice option, as it is a next-gen game, supports multiplatform, and offers both 120fps and Ray..." (technosports.co.in)
Try again.
> It wasn't.

No, that was a bug that went away when reloading the same point. Try again.
It wasn't.
There was an issue on PS5 where the frame rate could randomly drop below 60fps during scenes that it previously ran at a solid 60fps. This issue wasn't encountered on Xbox Series X.
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.
> It doesn't prove it.

Yes it was.
Try again.
It doesn't prove it.
> SMH, so you're just trying to make it sound better lol

FACT: The XSX GPU has more hardware RT resources compared to the PS5. My argument is the NVIDIA RTX way!
FACTS: When compared to PS5, XSX has additional raytracing units via the additional CU count.