
Analysis 12.15 - 10.28= 1.87 Teraflops difference between the XSX and PS5 (52 CU's vs. 36 CU's)

Marlenus

Member
Jul 29, 2013
1,369
315
495
UK
Language and I used this.
It is wrong.

FLOPS have always been calculated as clock speed * shaders * operations per clock.

ROPs, TMUs and rasterisation have nothing to do with ALU performance, which is what FLOPS measure.
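As a quick check of that formula against the numbers in the thread title, here is a minimal sketch. It assumes 64 shaders per CU and 2 operations per clock (one fused multiply-add), plus the published clocks of 1825 MHz for the XSX and up to 2230 MHz for the PS5; the function name is only illustrative.

```python
def tflops(compute_units, clock_mhz, shaders_per_cu=64, ops_per_clock=2):
    """Peak FP32 throughput in TFLOPS: clock speed * shaders * operations per clock."""
    shaders = compute_units * shaders_per_cu
    return shaders * ops_per_clock * clock_mhz * 1e6 / 1e12

xsx = tflops(52, 1825)  # assumed published XSX clock
ps5 = tflops(36, 2230)  # assumed PS5 peak ("up to") clock
print(f"XSX {xsx:.2f} TFLOPS, PS5 {ps5:.2f} TFLOPS, difference {xsx - ps5:.2f}")
```

Nothing in the calculation touches ROPs, TMUs or the rasteriser; only the ALU count and the clock enter into it.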
 
  • Thoughtful
Reactions: Connxtion

Neur4lN01s3

Neophyte
Mar 19, 2020
91
129
125
The narrative of "faster and narrower" and "more balanced" has to stop.
This is simply history repeating itself with the shoe on the other foot.

Some facts

The Xbox One launched with 12 CUs at 853 MHz.
The PS4 launched with 18 CUs at 800 MHz.

The PS4 had 50% more CUs than the Xbox One.
The Xbox One had about 7% more clock speed.

The XSX has 52 CUs at 1825 MHz.
The PS5 has 36 CUs at up to 2230 MHz.

The XSX has 44% more CUs than the PS5.
The PS5 has up to 22% more clock speed.
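To see how the CU advantage and the clock deficit net out in each generation, here is a rough sketch. It assumes the published clocks (853 MHz for the Xbox One, 800 MHz for the PS4, 1825 MHz for the XSX, up to 2230 MHz for the PS5) and that shaders per CU and operations per clock are the same on both sides, so they cancel out of the ratio.

```python
def advantage(wide, fast):
    """Return (CU ratio, clock ratio, TFLOPS ratio) of the wide GPU over the fast one.

    Each GPU is a (compute_units, clock_mhz) pair.
    """
    cu_ratio = wide[0] / fast[0]
    clock_ratio = wide[1] / fast[1]
    return cu_ratio, clock_ratio, cu_ratio * clock_ratio

generations = {
    "2013: PS4 vs Xbox One": ((18, 800), (12, 853)),
    "2020: XSX vs PS5": ((52, 1825), (36, 2230)),
}
for label, (wide, fast) in generations.items():
    cu, clock, total = advantage(wide, fast)
    print(f"{label}: {cu:.2f}x CUs * {clock:.2f}x clock = {total:.2f}x TFLOPS")
```

Under those assumptions the wider chip keeps a paper lead both times, roughly 1.41x in 2013 and 1.18x in 2020, even after the narrower chip's clock advantage is counted.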

We've heard wasted cycles, and balanced before.




That's from the 2013 Xbox One interview Digital Foundry did when MS was getting slammed for having 12 vs. 18 CUs.

We all know how this ended up working.
We all know how this played across the generation.

Stop hoping for something different this gen.
 

rnlval

Member
Jun 26, 2017
363
299
255
Sector 001
gpucuriosity.wordpress.com
Yup. We can also see the same with PC GPUS very clearly.

You're quoting paper specs, not real-world clock speed results.

From https://www.techpowerup.com/review/nvidia-geforce-rtx-2080-super-founders-edition/33.html



The RTX 2080 Super Founders Edition averages 1919 MHz, which works out to about 11.79 TFLOPS.




The RTX 2080 Ti Founders Edition averages 1824 MHz, which works out to about 15.876 TFLOPS.
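For reference, here is a small sketch of how those two figures fall out of the measured clocks, next to the paper numbers. The CUDA core counts (3072 and 4352) and the rated boost clocks (1815 MHz and 1635 MHz) are not from the review page quoted above; they are the commonly listed Founders Edition specs and should be treated as assumptions.

```python
def fp32_tflops(cuda_cores, clock_mhz):
    # Two FP32 operations per core per clock (one fused multiply-add).
    return cuda_cores * 2 * clock_mhz * 1e6 / 1e12

cards = {
    # name: (CUDA cores, rated boost MHz, measured average MHz)
    "RTX 2080 Super FE": (3072, 1815, 1919),
    "RTX 2080 Ti FE": (4352, 1635, 1824),
}
for name, (cores, boost_mhz, average_mhz) in cards.items():
    paper = fp32_tflops(cores, boost_mhz)
    measured = fp32_tflops(cores, average_mhz)
    print(f"{name}: paper {paper:.2f} TFLOPS, at measured clock {measured:.2f} TFLOPS")
```

Both cards run above their rated boost in practice, so the gap between them at real-world clocks differs from the gap on the spec sheet.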
 

Dontero

Member
Apr 19, 2018
2,607
2,629
595
What is the key factor behind implementing only 36 CUs? Why not use, like, 48, or whatever?

Cost? power consumption?
Cost. XSX will be more expensive.

I also don't understand why they picked power as the baseline rather than temperature or clocks. Power as a baseline makes sense for server hardware, where megawatts are being delivered and fluctuations in power delivery can literally make or break a large server installation.
And like I predicted, Cerny focused on some small bullshit (audio) rather than on something big, much like he did before with the Garlic bus on PS4. The PS5 SSD setup is more worthy of being touted as the main feature than audio.

If the XSX has a lower-spec sibling like the rumors say, it won't matter either way, because the baseline for PS5 will be much higher, which will have a direct effect on games.
 

sinnergy

Member
Jun 16, 2007
3,166
1,142
1,135
LOL. Good 'ol Tom sticking to his 9tf.
He is not wrong. 9.2 was stress tested in the GitHub leak; that's full power with locked clocks on both CPU and GPU, which gives predictable performance and is nice to program for.

If you want to get frisky as a dev, you can opt to optimize for a higher clock on the CPU or GPU. But in the early days, besides first party, we won't see much of that is my bet; it takes too much time to get a game to market.
 

Neur4lN01s3

Neophyte
Mar 19, 2020
91
129
125
Xbox Series X DirectX 12 Ultimate

DirectX Raytracing 1.1
DirectX Raytracing (DXR) brings a new level of graphics realism to video games, previously only achievable in the movie industry. The effects achievable by DXR feel more real, because in a sense they are more real: DXR traces paths of light with true-to-life physics calculations, which is a far more accurate simulation than the heuristics based calculations used previously.

We’ve already seen an unprecedented level of visual quality from titles that use DXR 1.0 since we unveiled it, and built DXR 1.1 in response to developer feedback, giving them even more tools with which to utilize DXR.

DXR 1.1 is an incremental addition over the top of DXR 1.0, adding three major new capabilities:

  • GPU Work Creation now allows Raytracing. This enables shaders on the GPU to invoke raytracing without an intervening round-trip back to the CPU. This ability is useful for adaptive raytracing scenarios like shader-based culling / sorting / classification / refinement. Basically, scenarios that prepare raytracing work on the GPU and then immediately spawn it.
  • Streaming engines can more efficiently load new raytracing shaders as needed when the player moves around the world and new objects become visible.
  • Inline raytracing is an alternative form of raytracing that gives developers the option to drive more of the raytracing process, as opposed to handing work scheduling entirely to the system (dynamic-shading). It is available in any shader stage, including compute shaders, pixel shaders etc. Both the dynamic-shading and inline forms of raytracing use the same opaque acceleration structures.
When to use inline raytracing
Inline raytracing can be useful for many reasons:

  • Perhaps the developer knows their scenario is simple enough that the overhead of dynamic shader scheduling is not worthwhile. For example, a well constrained way of calculating shadows.
  • It could be convenient/efficient to query an acceleration structure from a shader that doesn’t support dynamic-shader-based rays. Like a compute shader or pixel shader.
  • It might be helpful to combine dynamic-shader-based raytracing with the inline form. Some raytracing shader stages, like intersection shaders and any hit shaders, don’t even support tracing rays via dynamic-shader-based raytracing. But the inline form is available everywhere.
  • Another combination is to switch to the inline form for simple recursive rays. This enables the app to declare there is no recursion for the underlying raytracing pipeline, given inline raytracing is handling recursive rays. The simpler dynamic scheduling burden on the system can yield better efficiency.
Scenarios with many complex shaders will run better with dynamic-shader-based raytracing, as opposed to using massive inline raytracing uber-shaders. Meanwhile, scenarios that have a minimal shading complexity and/or very few shaders will run better with inline raytracing.

If the above all seems quite complicated, well, it is! The high-level takeaway is that both the new inline raytracing and the original dynamic-shader-based raytracing are valuable for different purposes. As of DXR 1.1, developers not only have the choice of either approach, but can even combine them both within a single renderer. Hybrid approaches are aided by the fact that both flavors of DXR raytracing share the same acceleration structure format, and are driven by the same underlying traversal state machine.
https://devblogs.microsoft.com/directx/dxr-1-1/


Variable Rate Shading
Variable Rate Shading (VRS) allows developers to selectively vary a game’s shading rate. This lets them ‘dial up’ the GPU power in more important parts of the game for better visuals and ‘dial back’ the GPU power in less important areas of a game for better speed. Variable Rate Shading also has the advantage of being relatively low cost to implement for developers.
https://devblogs.microsoft.com/directx/variable-rate-shading-a-scalpel-in-a-world-of-sledgehammers/
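To put a rough number on the 'dial back' idea, here is a toy model and not the D3D12 API: the screen is split into tiles, each tile gets a coarseness such as 2x2 (one pixel-shader invocation shared by a 2x2 pixel block), and we count invocations. The tile size, the rate names and the layout of the frame are all hypothetical.

```python
# Hypothetical shading rates: (width, height) of the pixel block
# that shares a single pixel-shader invocation.
RATES = {"1x1": (1, 1), "2x2": (2, 2), "4x4": (4, 4)}

def shader_invocations(tile_rates, tile_size=16):
    """Count pixel-shader invocations for a 2D grid of tile_size x tile_size pixel tiles."""
    total = 0
    for row in tile_rates:
        for rate in row:
            width, height = RATES[rate]
            total += (tile_size // width) * (tile_size // height)
    return total

# A 4x4 grid of tiles: full rate in the centre, coarser towards the edges.
frame = [
    ["4x4", "2x2", "2x2", "4x4"],
    ["2x2", "1x1", "1x1", "2x2"],
    ["2x2", "1x1", "1x1", "2x2"],
    ["4x4", "2x2", "2x2", "4x4"],
]
full = shader_invocations([["1x1"] * 4 for _ in range(4)])
vrs = shader_invocations(frame)
print(f"full rate: {full} invocations, with VRS: {vrs} ({vrs / full:.0%})")
```

In this made-up frame the centre is shaded at full rate while the total pixel-shader work drops to well under half; how much of that is visible on screen depends entirely on where the coarse tiles land.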


Mesh Shaders
Mesh Shaders give developers more programmability than ever before. By bringing the full power of generalized GPU compute to the geometry pipeline, mesh shaders allow developers to build more detailed and dynamic worlds than ever before.

Prior to mesh shader, the GPU geometry pipeline hid the parallel nature of GPU hardware execution behind a simplified programming abstraction which only gave developers access to seemingly linear shader functions. For instance, the developer writes a vertex shader function that is called once for each vertex in a model, implying serial execution. However, behind the scenes, the hardware packs adjacent vertices to fill a SIMD wave, then executes 32 or 64 vertex shader functions in parallel on a single shader core. This model has worked extremely well for many years, but it is leaving performance and flexibility on the table by hiding the details of what the hardware is really doing from developers.

Mesh shaders change this by making geometry processing behave more like compute shaders. Rather than a single function that shades one vertex or one primitive, mesh shaders operate across an entire compute thread group, with access to group shared memory and advanced compute features such as cross-lane wave intrinsics that provide even more fine grained control over actual hardware execution. All these threads work together to shade a small indexed triangle list, called a ‘meshlet’. Typically there will be a phase of the mesh shader where each thread is working on a separate vertex, then another phase where each thread works on a separate primitive – but this model is completely flexible allowing data to be shared across threads, new vertices or primitives created as needed, existing primitives clipped or culled, etc.

Along with this new flexibility of thread allocation comes a flexibility of input data formats. Mesh shader no longer uses the Input Assembler block, which was previously responsible for fetching index and vertex data from memory. Instead, shader code is free to read whatever data is needed from any format it likes. This enables novel new techniques such as index buffer compression, or the use of multiple different index buffers for different channels of vertex data. Such approaches can reduce memory usage and also reduce the memory bandwidth used during rendering, thus boosting performance.

Although more flexible than the previous geometry pipeline, the mesh shader model is also much simpler:


Along with mesh shader comes an optional new shader stage called the Amplification Shader. This runs before the mesh shader, runs some computations, determines how many mesh shader thread groups are needed, and then launches that many mesh shaders:


Amplification shaders are especially useful for culling, as they can determine which meshlets are visible, testing each set of between 32-256 triangles against a geometric bounding volume, normal cone, or more advanced techniques such as portal visibility planes, before deciding whether to launch a mesh shader thread group for that meshlet. Previously, culling was typically performed on a coarser per-mesh level to decide whether to draw an object at all, and also on a finer per-triangle level at the end of the geometry pipeline. This new intermediate level of culling improves performance when drawing models that are only partially occluded. For instance, if part of a character is on screen while just one arm is not, an amplification shader can cull that entire arm after much less computation than it would have taken to shade all the triangles within it.
https://devblogs.microsoft.com/directx/coming-to-directx-12-mesh-shaders-and-amplification-shaders-reinventing-the-geometry-pipeline/
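The per-meshlet culling described above can be sketched on the CPU as follows. This is plain Python rather than HLSL, it uses only a bounding-sphere test against clip planes (no normal cones or portal planes), and the `Meshlet` class, the plane convention and the sample character are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Meshlet:
    center: tuple   # bounding-sphere centre (x, y, z)
    radius: float   # bounding-sphere radius
    triangles: int  # number of triangles in this meshlet

def sphere_outside_plane(center, radius, plane):
    """plane is (nx, ny, nz, d) with a unit normal pointing into the visible half-space."""
    nx, ny, nz, d = plane
    distance = nx * center[0] + ny * center[1] + nz * center[2] + d
    return distance < -radius

def visible_meshlets(meshlets, planes):
    """Indices of meshlets whose bounding sphere is not fully outside any plane."""
    return [
        index for index, meshlet in enumerate(meshlets)
        if not any(sphere_outside_plane(meshlet.center, meshlet.radius, p) for p in planes)
    ]

# Hypothetical character: body in view, one arm beyond the clip plane x >= -1.
planes = [(1.0, 0.0, 0.0, 1.0)]
character = [
    Meshlet((0.0, 1.0, 5.0), 0.5, 128),   # torso
    Meshlet((0.5, 1.2, 5.0), 0.3, 64),    # right arm
    Meshlet((-2.0, 1.2, 5.0), 0.3, 64),   # left arm, off screen
]
launch = visible_meshlets(character, planes)
skipped = sum(m.triangles for i, m in enumerate(character) if i not in launch)
print(f"launch mesh shader groups for meshlets {launch}, skipping {skipped} triangles")
```

One sphere test per meshlet decides whether a whole group of triangles is shaded at all, which is the saving the article attributes to the amplification stage.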


Sampler Feedback
Sampler Feedback enables better visual quality, shorter load time, and less stuttering by providing detailed information to enable developers to only load in textures when needed.

Suppose you are a game developer shading a complicated 3D scene. The camera moves swiftly throughout the scene, causing some objects to be moved into different levels of detail. Since you need to aggressively optimize for memory, you bind resources to cope with the demand for different LODs. Perhaps you use a texture streaming system; perhaps it uses tiled resources to keep those gigantic 4K mip 0s non-resident if you don’t need them. Anyway, you have a shader which samples a mipped texture using A Very Complicated sampling pattern. Pick your favorite one, say anisotropic.

The sampling in this shader has you asking some questions.

What mip level did it ultimately sample? Seems like a very basic question. In a world before Sampler Feedback there’s no easy way to know. You could cobble together a heuristic. You can get to thinking about the sampling pattern, and make some educated guesses. But 1) You don’t have time for that, and 2) there’s no way it’d be 100% reliable.

Where exactly in the resource did it sample? More specifically, what you really need to know is: which tiles? Could be in the top left corner, or right in the middle of the texture. Your streaming system would really benefit from this so that you’d know which mips to load up next.

Sampler feedback solves this by allowing a shader to efficiently query what part of a texture would have been needed to satisfy a sampling request, without actually carrying out the sample operation. This information can then be fed back into the game’s asset streaming system, allowing it to make more intelligent, precise decisions about what data to stream in next. In conjunction with the D3D12 tiled resources feature, this allows games to render larger, more detailed textures while using less video memory.
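As a very loose illustration of that feedback loop, here is a CPU-side simulation and not the D3D12 API. It assumes an isotropic footprint (mip level taken as log2 of the texels spanned per pixel), records the most detailed mip requested per tile, and then asks which tiles need more detail than is currently resident. All names and sample values are hypothetical, and real hardware accounts for anisotropy and filtering that this sketch ignores.

```python
import math

def requested_mip(texel_footprint, mip_count):
    """Approximate mip level for an isotropic footprint: log2 of texels spanned per pixel, clamped."""
    level = math.floor(math.log2(max(texel_footprint, 1.0)))
    return min(level, mip_count - 1)

def record_feedback(samples, mip_count):
    """Build a min-mip feedback map: the most detailed mip requested for each tile."""
    feedback = {}
    for tile, texel_footprint in samples:
        mip = requested_mip(texel_footprint, mip_count)
        feedback[tile] = min(mip, feedback.get(tile, mip_count - 1))
    return feedback

def tiles_to_stream(feedback, resident):
    """Tiles whose requested mip is more detailed than the most detailed resident mip."""
    return sorted(
        (tile, mip) for tile, mip in feedback.items()
        if mip < resident.get(tile, float("inf"))
    )

# Hypothetical frame: (tile coordinate, texels spanned per pixel) for each sample.
samples = [((0, 0), 1.0), ((0, 0), 4.0), ((1, 0), 8.0), ((2, 1), 2.5)]
feedback = record_feedback(samples, mip_count=6)
resident = {(0, 0): 0, (1, 0): 5, (2, 1): 1}  # most detailed mip loaded per tile
print(tiles_to_stream(feedback, resident))
```

Here only tile (1, 0) needs new data, and only down to mip 3, so the streaming system can skip the larger mips for everything else.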

Sampler feedback also enables Texture-space shading (TSS), a rendering technique which de-couples the shading of an object in world space from the rasterization of the shape of that object to the final target.

TSS is a technique that allows game developers to do expensive lighting computations in object space, and write them to a texture— for example, something that looks like a UVW unwrapping of the object. Since nothing is being rasterized the shading can be done using compute, without the graphics pipeline at all. Then, in a separate step, bind the texture and rasterize to screen space, performing a dead simple sample. This approach reduces aliasing and allows computing lighting less often than rasterization. Decoupling these two rates allows the use of more sophisticated lighting techniques at higher framerates.


Setup of a scene using texture-space shading
One obstacle in getting TSS to work well is figuring out what in object space to shade for each object. Everything? That would be hardly efficient. What if only the left-hand side of an object is visible? With the power of sampler feedback, the rasterization step can simply record what texels are being requested and only perform the application’s expensive lighting computation on those.



https://devblogs.microsoft.com/directx/coming-to-directx-12-sampler-feedback-some-useful-once-hidden-data-unlocked/
 

wintersouls

Member
Jan 26, 2020
157
228
220
The XSX is only about 18% faster in TF count than the PS5. If the XSX costs $100 more than the PS5, do you think it is worth it?

Do you buy a console for TF or for games? I did not buy a Switch for its TF, but for its games.

On the other hand, hardware experts you can read online have already said several times that the PS5 does some things better than the Xbox Series X because of its new architecture. Not everything is TF, and until the first games are shown, we won't know.
 

LordKasual

Member
Jul 28, 2016
5,545
1,083
430
If the performance of the PS5 holds up as much as we expect it to, then yes, the XSX is inefficient.

But I mean, it doesn't really matter that much. It would just mean that in the future, more consoles would adopt the PS5's architecture, although I'm sure that's coming anyway.
 

dEvAnGeL

Member
Feb 11, 2012
3,132
52
625
Do you buy a console for TF or for games? I did not buy a Switch for its TF, but for its games.

On the other hand, hardware experts you can read online have already said several times that the PS5 does some things better than the Xbox Series X because of its new architecture. Not everything is TF, and until the first games are shown, we won't know.
You seem angry, very angry, play DOOM ETERNAL as therapy. ✌
 
  • LOL
Reactions: MiyazakiHatesKojima

wintersouls

Member
Jan 26, 2020
157
228
220
You seem angry, very angry, play DOOM ETERNAL as therapy. ✌

What?

I don't know what your job is, but I hope it has nothing to do with judging people's moods, because you're not very good at it.


I recommend growing flowers as an alternative to playing fortune tellers on the mood of strangers.
 

ThatGuy707

Member
Jun 29, 2012
60
6
440
First off, Microsoft did do a damn fine job! They are kicking goals already with the XSX.

That being said, I'm pretty sure it was the other way round with months of being told 9 TF for PS5. I feel it was more Sony fans getting a kicking from Xbox fans.

But just to be clear, that presentation from Cerny was not meant for consumers; it was only streamed because the GDC event for developers was cancelled due to coronavirus. It was just a bad decision on their part to be so silent and then have every Tom, Dick and Harry watching on the edge of their seat for a glimpse of the PS5, because it was streamed and they put it out in a tweet. This presentation was not meant for you or me or any general consumer not involved in developing games.
So it was devs they were trying to convince to ignore the TFlops? I mean it was clearly for consumers. They didn't even give specific numbers about how the PS5 will throttle its performance. I am sure devs need to know that. Sony def knew consumers would be watching this.
 

pyrocro

Member
Mar 19, 2020
90
109
185
So it was devs they were trying to convince to ignore the TFlops? I mean it was clearly for consumers. They didn't even give specific numbers about how the PS5 will throttle its performance. I am sure devs need to know that. Sony def knew consumers would be watching this.
The DX12 Ultimate videos MS released are for developers; Sony's presentation was not developer-centric.
It was more a way of managing consumer expectations.
 

Xaero Gravity

Member
May 12, 2013
12,752
11,678
1,140
Canada Eh
It's not this, it's just loads of stuff you have posted lately. It's like all the speculation just made some people crack lately lol.
I've noticed that too. Normally cool people have suddenly snapped and turned into raving fanboys. It's like they're being possessed by the spirit of OutrageousFarts.
 
Oct 26, 2018
10,411
12,512
590
I've noticed that too. Normally cool people have suddenly snapped and turned into raving fanboys. It's like they're being possessed by the spirit of OutrageousFarts.
For sure. I love it. Puts the forum back in balance.

But this will die off. I'll give it one more week, and all these discussions in these next gen threads will tone down. I'm already getting beat, so I've focused on the Meme thread, as typing stuff out about specs and insider BS takes a toll.

There's no way it can go on for another 7-8 months like this.

It actually doesn't help when every nuance about SeX and PS5 gets a separate thread. Someone can find one system has balloon packaging and the other styrofoam and that would be a new thread.
 
  • Like
Reactions: Connxtion

Shmunter

Gold Member
Aug 25, 2018
3,516
6,863
625
For sure. I love it. Puts the forum back in balance.

But this will die off. I'll give it one more week, and all these discussions in these next gen threads will tone down. I'm already getting beat, so I've focused on the Meme thread, as typing stuff out about specs and insider BS takes a toll.

There's no way it can go on for another 7-8 months like this.

It actually doesn't help when every nuance about SeX and PS5 gets a separate thread. Someone can find one system has balloon packaging and the other styrofoam and that would be a new thread.
7-8 months if we're lucky, with the bat soup fever and impending financial ruin.

But this battle will evolve. Once the Xbox vs. PS clans stop taking shots at each other and all that tapers off, a new phase of Xbox & PS unity will emerge. This new coalition will mount an attack on the PC peasants and their LOL hard drives and SATA SSDs.

Plans need to be made, battle lines drawn. Mindsets will be altered - by force of repetition 24/7. Have at you!
 

Shin

Gold Member
Feb 4, 2013
5,646
4,123
995
Considering that it was Xbox's goal from the start (CU and architectural gains aside), I reckon Sony probably went about it the same way.
Their goals were and/or are different, whatever ended up being the CU count is a side effect of their design philosophies.

"12 TFLOPs was our goal from the very beginning. We wanted a minimum doubling of performance over Xbox One X to support our 4K60 and 120 targets. And we wanted that doubling to apply uniformly to all games," explains Andrew Goossen. "To achieve this, we set a target of 2x the raw TFLOPs of performance knowing that architectural improvements would make the typical effective performance much higher than 2x. We set our goal as a doubling of raw TFLOPs of performance before architectural improvements were even considered - for a few reasons. Principally, it defined an audacious target for power consumption and so defined our whole system architecture.
 

Mendou

Anti-Semite/Xenophobic - report me if I use slurs
Jan 31, 2020
179
497
270
The narrative of "faster and narrower" and "more balanced" has to stop.
This is simply history repeating itself with the shoe on the other foot.

Some facts

The Xbox One launched with 12 CUs at 853 MHz.
The PS4 launched with 18 CUs at 800 MHz.

The PS4 had 50% more CUs than the Xbox One.
The Xbox One had about 7% more clock speed.

The XSX has 52 CUs at 1825 MHz.
The PS5 has 36 CUs at up to 2230 MHz.

The XSX has 44% more CUs than the PS5.
The PS5 has up to 22% more clock speed.

We've heard wasted cycles, and balanced before.




That's from the 2013 Xbox One interview Digital Foundry did when MS was getting slammed for having 12 vs. 18 CUs.

We all know how this ended up working.
We all know how this played across the generation.

Stop hoping for something different this gen.
Straight up facts. I guess what goes around comes around this next gen.
 
  • LOL
Reactions: kareemna