
Inside the Scorpio Engine: the processor architecture deep dive

jroc74

Phone reception is more important to me than human rights
We already know that Scorpio will pretty much run everything in 4K other than some 720p titles which will be checkerboard 4K, as stated in this article.

That's proof in the pudding that this thing is more powerful than the Pro...

Proof was the specs themselves.

I honestly don't think anyone is claiming the Pro is more powerful. Maybe some are insinuating the Pro isn't that bad in comparison, and I don't think that's an issue either.

Doesn't the XBO have some type of CPU advantage over the PS4? That comes up in debates too, and I don't think it was a problem then.

It's just a specs debate that some are getting testy about.
 

jaypah

Member
Say what you want but MS is absolutely genius. They took "Gaf/Reddit" mainstream. They threw napalm on a simmering oil fire. All this chip deep dive architecture shared memory bus DCC jargon although valid is the new this vs that. I don't ask my Doctor how certain chemicals in my prescriptions bind to particular receptors. I just know it works. Can't wait till E3!

So taking GAF talking points to DF is going mainstream? Lmao, nah dude. It's like going from Marrero to Westwego.
 

Crayon

Member
It's not a secret. It's an undeniably useful capability that the PS4 Pro has, and the Xbox One does not. It doesn't magically make the PS4 Pro stronger than Scorpio (and no-one in this thread is suggesting that)... but it does help.

Sure it's real. But the best case performance boost, however much that ends up being, will take some amount of special attention to leverage. The more effort put in, the closer to the full potential of the feature can be used. I see the outcomes, all informed solely by my experience from years of online videogame drama:

1. Teams that are willing and able to get the most out of the feature will likewise be putting in extra effort to leverage other hardware features and general efficiency of their game. These are probably going to be first parties. Their games are going to look the best because of an overarching commitment to demonstrate the prowess of the hardware. It will be nigh impossible to deduce the benefit of the one specific secret sauce feature in an exclusive game that's pushing the hardware in every direction.

2. A neat use of the secret sauce feature that is easy to implement without much time or imagination could be found. It becomes very common in Pro patches, but then it gets overused and we all get sick of it and start complaining.

3. Really specific information about how this feature is used will be hard to come by and could even reveal contradictory reports. Can take on a vaguely mythical quality.

Soon enough, we will be able to compare scorpio games to pro games in the flesh. Compared to how revealing and informative that is going to be, talk of fp16 or jaguar evolved or the like is bench racing. I'm open minded and I find it fun but I can't take it too seriously.
 

KageMaru

Member
Based on dev comments here, it seems well established that the majority of focus will remain on PS4/XBO since that is where the install base is. So while disappointing, I wonder if they didn't bother with FP16 due to the assumption that these mid gen refreshes won't receive the attention necessary to make it worth while.

Maybe fp16 doesnt matter, maybe it does. We will see when we get to compare gears to los2, forza to gt etc

Can't really agree with using exclusives to compare two pieces of hardware. Different engines, priorities, teams, budgets, etc. all make comparing two different exclusives extremely difficult. This is especially true when these exclusives won't ever appear on the other console, which would be the only real way to use them in comparisons.

A good example is Forza and GT. GT4 looked better than Forza even though the Xbox was ~3x more powerful than the PS2. I'm sure Turn 10 has improved a lot from their first game, but there will still be a difference in shaders, assets, lighting, etc between Forza 7 and GT Sport making the comparison impossible.
 

jroc74

Phone reception is more important to me than human rights
Next it will be whether or not 1800p and checkerboarding is distinguishable from native 4K

Thats already been discussed in Pro threads. The threads where DF did the comparisons. But of course it will come up again. Those threads will be something else, lol.

Happy easter everyone!

Happy Easter. My kids are happy this weekend. And I got tons of Easter candy to eat. Hmmm.
 

geordiemp

Member
Next it will be whether or not 1800p and checkerboarding is distinguishable from native 4K

Funnily enough, on my 55-inch 4K TV in the living room at 8 ft it is actually really hard to see unless you screen grab and stare, and yes, I have a lot of different Pro games, from native 4K to 1800c. Heck, even 1440p looks pretty good and the difference is hard to see.

I am much more susceptible to frame rate above 1080p....A MUCH better difference would be 60 FPS instead of 30, but I dont think Pro or Scorpio are powerful enough to do that above 1080p for 30 FPS limited games. My view has not changed since Neo reveal, and playing pro patched games has not changed it

I was disappointed in Pro having Jag, also in Scorpio having jag......But appreciate why.......Next gen eh...happy easter
 
I still have no idea what effect any of this shit will have in game and the more people argue about it, the more I think it's hypothetical dick waving that won't have any real effect when people sit down to play something on their PS4 Pro or Xbox One Two.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Ok, my post wasn't entirely correct, but neither is just comparing the mantissa and forgetting the exponent, and the extra bits of the fp32 exponent also give it way more decimal places, especially for larger numbers.
Of course. I was just being anally retentive.

But to be completely correct: going fp32 to fp16 reduces the range of the numbers you can represent by a factor of over 65k
True. Then again, at the top end of the exponent fp32 represents 128 binary digits, of which only the leading 24 digits can be anything other than zero, so its humongous range is rather irrelevant for most graphics purposes, and more practical ranges are limited to about 20-26 binary (integer) digits. In comparison, fp16's meaningful top end goes to 8-12 binary (integer) digits, so I'm not sure a factor of 64K or beyond from there would get us anything particularly meaningful in fp32 in the context of graphics.
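
To put rough numbers on the range and precision points traded above, here is a minimal Python sketch. It uses only the standard library's struct module, whose "e" and "f" format codes round a value to IEEE 754 half and single precision. The helper names (round_to_fp16, round_to_fp32, fits_fp16) are made up for illustration, and CPU-side rounding is only a stand-in for what a GPU does.

```python
import struct


def round_to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fits_fp16(x: float) -> bool:
    """True if x is within fp16's finite range (struct raises on overflow)."""
    try:
        struct.pack("<e", x)
        return True
    except OverflowError:
        return False


# Largest finite values, from the format definitions.
FP16_MAX = (2.0 - 2.0**-10) * 2.0**15    # 65504.0
FP32_MAX = (2.0 - 2.0**-23) * 2.0**127

# Every integer up to these limits is exactly representable;
# above them the formats start skipping integers.
FP16_EXACT_INT_LIMIT = 2**11   # 11 significant bits (10 stored + 1 implicit)
FP32_EXACT_INT_LIMIT = 2**24   # 24 significant bits (23 stored + 1 implicit)

if __name__ == "__main__":
    print("fp16 max finite value:", FP16_MAX)
    print("fp32 max / fp16 max: %.3e" % (FP32_MAX / FP16_MAX))
    print("2049 stored as fp16:", round_to_fp16(2049.0))
    print("2**24 + 1 stored as fp32:", round_to_fp32(2.0**24 + 1.0))
    print("1,000,000 fits in fp16:", fits_fp16(1.0e6))
```

The useful part is the pair of exact-integer limits: they match the "only the leading 24 digits" remark for fp32 and show how quickly fp16 runs out of significant bits, independent of how wide either exponent range is.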
 
Cool article. Downplay on power of scorpio because it doesnt utilize half precision??? Wut....

L
M
F
A
O

..sigh...

Anyway, the more i hear them talk about their process of building and directing how scorpio will work, i am way more certain on a 399 price point.

I cant wait
 

Inuhanyou

Believes Dragon Quest is a franchise managed by Sony
I think if FP16 was good for gaming a solution would have been made at least a decade ago.

FP16 has been used in gaming for many years. PS3 used it a lot, as it had the RSX/Cell combo. Only AMD has not used it for a while.
 

ethomaz

Banned
I think there are a lot of misunderstandings here...

1. Pro really is an 8.4 TFLOPS machine in FP16 and a 4.2 TFLOPS machine in FP32... both are true because they are two different ways of measuring raw power... Scorpio is a 6 TFLOPS machine in both FP32 and FP16... any side trying to damage control is just that, trying...

2. FP16 is actually not used in games, except on mobile phones, because it really decreases the quality of the final graphics rendered by the GPU... ATI showed in the past that FP24 was the minimum below which the graphical difference becomes perceptible in games... so by the old ATI claim, going from FP32 to FP24 is imperceptible, but FP16 is a real downgrade in graphics compared with FP32 or FP24... that is a really old claim... tech has evolved along with the number of pixels on screen... which means that today the FP32 vs FP24 difference may be perceptible to us gamers...

3. Having said point 2... there are cases where you don't need FP32 and can use FP16 without losing quality... cases where FP16 and FP32 deliver the same graphical result, or the result is close enough that 90% of gamers won't see a difference... these cases can take advantage of Pro's 2x faster FP16 operations.

4. In the end Scorpio is 43% stronger than Pro and that won't change with any magic sauce, but if devs can use FP16 in these imperceptible cases then Pro can narrow the power gap with Scorpio... the catch is that I don't believe 30% of graphics tasks can be changed from FP32 to FP16 without losing quality... in the end this faster FP16 unit will help with an average of 10-20% of graphics processing... that is obviously a Pro advantage but nowhere close to killing the power gap.
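
The arithmetic behind point 4 can be sketched with an Amdahl's-law style estimate. This is a simplifying assumption, not something measured on either console: it pretends a fixed fraction of GPU time can move to FP16 and run at exactly twice the rate, while everything else is unchanged. The function names are hypothetical; the 4.2 and 6.0 TFLOPS figures are the ones quoted in this thread.

```python
def overall_speedup(fp16_fraction: float, fp16_rate: float = 2.0) -> float:
    """Amdahl-style estimate: only fp16_fraction of the work runs fp16_rate times faster."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - fp16_fraction) + fp16_fraction / fp16_rate)


def fraction_needed(target_speedup: float, fp16_rate: float = 2.0) -> float:
    """Invert the estimate: share of work that must move to FP16 to reach target_speedup."""
    # From 1/S = 1 - f * (1 - 1/r), solve for f.
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / fp16_rate)


PRO_FP32_TFLOPS = 4.2       # figure quoted in the thread
SCORPIO_FP32_TFLOPS = 6.0   # figure quoted in the thread

if __name__ == "__main__":
    gap = SCORPIO_FP32_TFLOPS / PRO_FP32_TFLOPS
    print("Scorpio / Pro FP32 ratio: %.3f" % gap)
    for fraction in (0.1, 0.2, 0.3, 0.4):
        print("FP16 share %.0f%% -> overall speedup %.3f"
              % (fraction * 100, overall_speedup(fraction)))
    print("FP16 share needed to match the ratio: %.2f" % fraction_needed(gap))
```

Under these assumptions a 30% FP16 share works out to roughly an 18% overall gain, and closing the full 6.0 / 4.2 gap would need about 60% of the work to run at double rate, which is the kind of share the post doubts is achievable.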
 

godhandiscen

There are millions of whiny 5-year olds on Earth, and I AM THEIR KING.
The FP16 instructions require devs to code specifically for them. Microsoft had the option to choose hardware customizations and went for the DX12 instruction optimization, since it would improve the already existing catalogue and any future titles built on DX12. Scorpio will most likely have a huge advantage on multiplatform titles built on DX12, more than the regular 30% TF difference. However, Sony exclusives that take advantage of the FP16 instructions could close the visual gap.


Basically Sony is pulling the special sauce now. We'll see how it goes. During the PS3 era first party titles managed to take advantage of the cell.
 

ethomaz

Banned
The FP16 instructions require devs to code specifically for them. Microsoft had the option to choose hardware customizations and went for the DX12 instruction optimization, since it would improve the already existing catalogue and any future titles built on DX12. Scorpio will most likely have a huge advantage on multiplatform titles built on DX12, more than the regular 30% TF difference. However, Sony exclusives that take advantage of the FP16 instructions could close the visual gap.


Basically Sony is pulling the special sauce now. We'll see how it goes. During the PS3 era first party titles managed to take advantage of the cell.
The DX12 optimizations are the same as on XB1 and XB1S... they are nothing different from what the low-level PS4/Pro API does (or any low-level API does... even the customized command processor is already there for any dev or publisher to use).

DF tried to make it something new and amazing but it is not.

Even when using DX12, the dev will in the end need to optimize for the low-level PS4 API, which gives the same benefits/advantages as DX12 (or even more, because DX12 still needs to prove itself).
 

Interesting tidbit on DCC

3) Try 32-bit floating point depth buffer formats (D32F) instead of 16-bit (D16) for better performance.

D32Fs actually may compress smaller than D16s when used as shader resources, and compress exactly the same way when not shader compatible. They are only different in allocation size and bandwidth when decompressed, which typically isn’t too frequent (but may happen when a dense mesh with many micro-triangles is rendered into a small screen-space area). D32F also allows you to use reverse Z for added precision, so that can be leveraged for nearly free. Keep in mind that on GCN, there’s no such thing as a real 24-bit depth target. Under the hood, those are handled as 32-bit, just with 8 bit of precision thrown away – so there’s no cost in switching from D24 to D32 targets.
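
The reverse-Z remark in the quoted tip can be illustrated with a small sketch. It assumes a conventional perspective depth mapping with hypothetical NEAR and FAR plane distances, evaluates the mapping in double precision on the CPU, and then rounds to 32-bit float with the standard library's struct module. Real hardware computes depth differently, so treat this as a picture of the precision argument only.

```python
import struct


def round_to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def standard_depth(z: float, near: float, far: float) -> float:
    """Conventional mapping: near plane -> 0.0, far plane -> 1.0."""
    return far * (z - near) / (z * (far - near))


def reversed_depth(z: float, near: float, far: float) -> float:
    """Reverse-Z mapping: near plane -> 1.0, far plane -> 0.0."""
    return near * (far - z) / (z * (far - near))


# Hypothetical scene setup, chosen only for illustration.
NEAR, FAR = 0.1, 10000.0
Z_A, Z_B = 5000.0, 5001.0   # two distant surfaces one unit apart

if __name__ == "__main__":
    std_a = round_to_fp32(standard_depth(Z_A, NEAR, FAR))
    std_b = round_to_fp32(standard_depth(Z_B, NEAR, FAR))
    rev_a = round_to_fp32(reversed_depth(Z_A, NEAR, FAR))
    rev_b = round_to_fp32(reversed_depth(Z_B, NEAR, FAR))
    print("standard Z distinguishes the two surfaces:", std_a != std_b)
    print("reverse Z distinguishes the two surfaces: ", rev_a != rev_b)
```

With the conventional mapping, distant depths crowd up against 1.0, where a 32-bit float is at its coarsest in that interval. Reverse Z moves them next to 0.0, where floats are densest, which is why the tip describes the extra precision as nearly free on a float depth target.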
 

ethomaz

Banned
Interesting tidbit on DCC
This is old and based on GCN parts where FP32 and FP16 deliver the same performance.

With GCN 5.0, FP16 is twice as fast as FP32 and that changes everything... so it is not only about memory and bandwidth anymore but also performance, which the link didn't cover because it is something new in the hardware.

- GCN 5.0 FP16 gives you 2x performance, 1/2 memory and bandwidth use.

- GCN 4.0 or below gives you 1/2 memory and bandwidth use.
 

dr_rus

Member
I think it'll likely head higher towards 50% as devs gain more familiarity in how to handle it. From what I've read 2xFP16 will see worth while usage as time goes on, especially for console devs.
I seriously doubt that. Most of the figures in the range of +50% come from people describing the speedup they got from using FP16 for some specific part of the code, so this is basically a best-case scenario. When it is added to the rest of the code, which remains FP32, it will result in a much smaller overall gain. People should really be expecting no more than 25% overall here.

I could easily have misunderstood, but I thought what you describe is the effect of using FP16 at all, regardless of rate. What then is the difference between using FP16 and using packed FP16?
I was talking about the Pro's memory bandwidth which is only 218GB/s compared to Scorpio's 326GB/s which is basically 50% more. This difference will mean much more for Scorpio than its flops advantage as this difference specifically is what makes native 4K possible for example.

Usage of packed FP16 registers will help on all GCN3+ h/w but it's not about bandwidth, it's about the ability to perform FP32 math on FP16 data with much less latency which results in a speed up. The bandwidth remains the same.

This is old and based on GCN parts where FP32 and FP16 deliver the same performance.

With GCN 5.0, FP16 is twice as fast as FP32 and that changes everything... so it is not only about memory and bandwidth anymore but also performance, which the link didn't cover because it is something new in the hardware.

- GCN 5.0 FP16 gives you 2x performance, 1/2 memory and bandwidth use.

- GCN 4.0 or below gives you 1/2 memory and bandwidth use.

Internal processing precision has nothing to do with memory bandwidth or use, or with surface formats for that matter.

Moreover, GCN5 FP16 won't give you 2x performance.

But even if the above was correct, you wouldn't be able to get both 2x performance and 1/2 bandwidth use as 2xFP16 = 1xFP32 in bandwidth use.
 

ethomaz

Banned
Internal processing precision has nothing to do with memory bandwidth or use, or with surface formats for that matter.

Moreover, GCN5 FP16 won't give you 2x performance.

But even if the above was correct, you wouldn't be able to get both 2x performance and 1/2 bandwidth use as 2xFP16 = 1xFP32 in bandwidth use.
Of course, you are using two instructions instead of one... the memory use will be the same lol

Using the same memory and bandwidth to execute 2 instructions in FP16 instead of one means you are using half the memory and bandwidth you would need for 2 instructions in FP32.
 

timlot

Banned
So we should start seeing native 4K games from PS4 Pro, right? I mean, that's the elephant in the room concerning Pro's performance. If 1st and 3rd party AAA titles like Horizon continue to be checkerboarded, then obviously FP16 isn't changing that narrative.
 

ethomaz

Banned
So we should start seeing native 4K games from PS4 Pro, right? I mean, that's the elephant in the room concerning Pro's performance. If 1st and 3rd party AAA titles like Horizon continue to be checkerboarded, then obviously FP16 isn't changing that narrative.
Neither console has the power for 4K.

What you will see are 1080p games on XB1 delivering 4K on Scorpio, while 1080p games on PS4 won't hit 4K on Pro... save for non-demanding games on both, like sports, racing, indies, etc., that will hit 4K.

2x FP16 helps in a few cases but it won't change that... even Scorpio having 2x FP16 wouldn't make that difference either.
 

dr_rus

Member
Of course, you are using two instructions instead of one... the memory use will be the same lol

Using the same memory and bandwidth to execute 2 instructions in FP16 instead of one means you are using half the memory and bandwidth you would need for 2 instructions in FP32.

Memory and bandwidth consumption has no relation to shader math precision. Usage of FP16 improves on chip latency as you can perform two operations instead of one (same instruction for both, btw, so no, not "2 instructions") but the result written from the pipeline to memory will be the same as with FP32 since this is what you're gunning for in the first place. You're confusing two different things - internal shader processing precision and external data storage format.
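
A rough way to picture the distinction drawn here between register-level packing and the stored output format, using Python's struct module as a stand-in. This is only a byte-level model with made-up helper names (pack_two_fp16, unpack_two_fp16, packed_add); it does not reproduce any real GCN instruction.

```python
import struct


def pack_two_fp16(a: float, b: float) -> bytes:
    """Pack two half-precision values into one 32-bit word."""
    return struct.pack("<2e", a, b)


def unpack_two_fp16(word: bytes) -> tuple:
    """Split a 32-bit word back into its two half-precision lanes."""
    return struct.unpack("<2e", word)


def packed_add(x: bytes, y: bytes) -> bytes:
    """Model one packed operation: the same add is applied to both fp16 lanes."""
    x0, x1 = unpack_two_fp16(x)
    y0, y1 = unpack_two_fp16(y)
    return pack_two_fp16(x0 + y0, x1 + y1)


if __name__ == "__main__":
    packed = pack_two_fp16(1.5, -2.25)
    single = struct.pack("<f", 1.5)
    # Two fp16 values occupy exactly the space of one fp32 value.
    print("bytes for 2 x fp16:", len(packed), "| bytes for 1 x fp32:", len(single))

    result = packed_add(packed, pack_two_fp16(0.5, 0.25))
    print("lanes after one packed add:", unpack_two_fp16(result))

    # What gets written out is a separate choice: storing each lane to a
    # 32-bit target costs 4 bytes per value regardless of the math precision.
    stored = b"".join(struct.pack("<f", lane) for lane in unpack_two_fp16(result))
    print("bytes written when the target format is fp32:", len(stored))
```

The model shows both halves of the argument: one 32-bit register slot carries two fp16 operands, so one operation covers two values, while the number of bytes that leave the pipeline is set by the output format rather than by the precision used for the math.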
 
Memory and bandwidth consumption has no relation to shader math precision. Usage of FP16 improves on chip latency as you can perform two operations instead of one (same instruction for both, btw, so no, not "2 instructions") but the result written from the pipeline to memory will be the same as with FP32 since this is what you're gunning for in the first place. You're confusing two different things - internal shader processing precision and external data storage format.

No, no, we're blitting our shader microcode to DCC encoded RTs so that its losslessly encoded and we get major bandwidth savings on the fetches, reducing I$ stalls, which is how we get to 12 TFLOPS.
 

ethomaz

Banned
Memory and bandwidth consumption has no relation to shader math precision. Usage of FP16 improves on chip latency as you can perform two operations instead of one (same instruction for both, btw, so no, not "2 instructions") but the result written from the pipeline to memory will be the same as with FP32 since this is what you're gunning for in the first place. You're confusing two different things - internal shader processing precision and external data storage format.
No... I'm not.

Using FP16 instead of FP32 cuts memory and bandwidth use in half (without compression or tricks), and your point that it is the same instruction rather than 2 different ones doesn't change anything for graphics, because graphics instructions are all the same instruction executed in parallel.

No, no, we're blitting our shader microcode to DCC encoded RTs so that its losslessly encoded and we get major bandwidth savings on the fetches, reducing I$ stalls, which is how we get to 12 TFLOPS.
Exactly... FP16 uses less memory and bandwidth than FP32.

If you can do that twice as fast (which is what GCN 5.0, P100 and mobile GPUs do) then it is the best current hardware scenario.
 

leeh

Member
No, no, we're blitting our shader microcode to DCC encoded RTs so that its losslessly encoded and we get major bandwidth savings on the fetches, reducing I$ stalls, which is how we get to 12 TFLOPS.
Wat8.jpg
 

godhandiscen

There are millions of whiny 5-year olds on Earth, and I AM THEIR KING.
The DX12 optimizations are the same as on XB1 and XB1S... they are nothing different from what the low-level PS4/Pro API does (or any low-level API does... even the customized command processor is already there for any dev or publisher to use).

DF tried to make it something new and amazing but it is not.

Even when using DX12, the dev will in the end need to optimize for the low-level PS4 API, which gives the same benefits/advantages as DX12 (or even more, because DX12 still needs to prove itself).
No, that is not true; the X1 doesn't have all of the DX12 optimizations. For Scorpio, Microsoft added more hardware-baked shortcuts after identifying common instructions in current engines. DX12.1 wasn't even used when the X1 shipped. The XB1 has some but Scorpio will have additional ones, which is why the scaling of 900p games to 4K is possible. Unless new game engines are made in the next two years, those optimizations will be beneficial.
Also, not all devs will optimize for FP16. If an effect cannot be computed in FP16, it won't be rewritten to fit. The precision gap between FP16 and FP32 is not just a factor of two, it is orders of magnitude. A shader that looks good in FP32 would look pixelated or perform much slower in FP16. Shaders have to be designed to fit FP16.
 

ethomaz

Banned
No, that is not true; the X1 doesn't have all of the DX12 optimizations. For Scorpio, Microsoft added more hardware-baked shortcuts after identifying common instructions in current engines. DX12.1 wasn't even used when the X1 shipped. The XB1 has some but Scorpio will have additional ones, which is why the scaling of 900p games to 4K is possible. Unless new game engines are made in the next two years, those optimizations will be beneficial.
Also, not all devs will optimize for FP16. If an effect cannot be computed in FP16, it won't be rewritten to fit. The precision gap between FP16 and FP32 is not just a factor of two, it is orders of magnitude. A shader that looks good in FP32 would look pixelated or perform much slower in FP16. Shaders have to be designed to fit FP16.
Source for these new DX12 instructions?

The improvements from XB1 to Scorpio are the same ones found from GCN 1.0 to GCN 4.0 (these are included from PS4 to Pro too).

Where did you see an example of a 900p game on XB1 being 4K on Scorpio?
 

JaggedSac

Member
So here is what we know so far.

Scorpio has a 6 inch penis with decent girth.

PS4 Pro has a 4 inch penis with not as much girth.

However during certain scenarios PS4 Pro can double it's length to 8 inches.
 

Syrus

Banned
So here is what we know so far.

Scorpio has a 6 inch penis with decent girth.

PS4 Pro has a 4 inch penis with not as much girth.

However during certain scenarios PS4 Pro can double it's length to 8 inches.


Lol. Its what people believe but in actuality maybe if Pro flexes its penis it will be 4.2 inches
 
Comparing scorpio to ps4 pro exclusives isnt going to work because sony 1st party developers are better and they have a higher minimum they can target
 
So here is what we know so far.

Scorpio has a 6 inch penis with decent girth.

PS4 Pro has a 4 inch penis with not as much girth.

However during certain scenarios PS4 Pro can double it's length to 8 inches but when it does it get's even thinner, like a pencil or stretch armstrongs fist.

FTFY
 

Crayon

Member
So here is what we know so far.

Scorpio has a 6 inch penis with decent girth.

PS4 Pro has a 4 inch penis with not as much girth.

However during certain scenarios PS4 Pro can double it's length to 8 inches.

If doubling of length persists for more than four hours, consult your physician.
 

Marmelade

Member
Oh there's a game dev in the thread! (I might be wrong but BriareosGAF worked on Infinite Warfare)
Apologies to other game devs already present

No, no, we're blitting our shader microcode to DCC encoded RTs so that its losslessly encoded and we get major bandwidth savings on the fetches, reducing I$ stalls, which is how we get to 12 TFLOPS.

I didn't understand a word of that but I'm pretty sure it was tongue in cheek because of that 12TF figure..

So what are your thoughts on the FP16 debate?
 

godhandiscen

There are millions of whiny 5-year olds on Earth, and I AM THEIR KING.
Source for these new DX12 instructions?

The improvements from XB1 to Scorpio are the same ones found from GCN 1.0 to GCN 4.0 (these are included from PS4 to Pro too).

Where did you see an example of a 900p game on XB1 being 4K on Scorpio?

Here:
[UPDATE 7/4/17 20:44: Microsoft's Andrew Goossen has been in touch to clarify that D3D12 support at the hardware level is actually a part of the existing Xbox One and Xbox One S too. "Scorpio builds on the Command Processor capability present in the original Xbox One," we're told. "Our implementation of D3D12 supports all Xbox Ones, and games have already shipped that use it. When a game using D3D12 starts up, we reprogram the GPU's Command Processor front-end. The 50 per cent CPU rendering overhead improvement was reported by shipping games. The amount of win is dependent on the game engine and content, and not all games will see that size of improvement. Scorpio's Command Processor provides additional capability and programmability beyond what Xbox One/Xbox One S can do. We plan to take advantage of this in the future."]

The 900p games scaled to 4K is a promise seen only on their profiler.
But the proof of the pudding is in the eating. Specs are one thing, but Microsoft is promising that both 900p and 1080p Xbox One games should be able to run at native 4K on Project Scorpio. We needed to see validation of this, meaning we needed to see software - a tough call so many months out from release.
Scorpio doesn't have the TF to brute force 900p games running at 4k but they have achieved it thanks to the additional customizations they added to the hardware as a whole.
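
The "can't brute force it" point can be checked with back-of-the-envelope arithmetic. The sketch below compares pixel-count ratios against the raw compute ratio. The 6.0 TFLOPS figure is the one used in this thread; the 1.31 TFLOPS figure for the original Xbox One and the 1600x900 size for "900p" are assumptions brought in for the comparison, and treating GPU cost as proportional to pixel count is a deliberate simplification.

```python
# Width x height for the resolutions being discussed.
RESOLUTIONS = {
    "900p": (1600, 900),     # assumed 16:9 900p
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}

XBOX_ONE_TFLOPS = 1.31   # assumption: commonly cited figure, not stated in this thread
SCORPIO_TFLOPS = 6.0     # figure quoted in the thread


def pixels(name: str) -> int:
    """Total pixels per frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(source: str, target: str) -> float:
    """How many times more pixels the target resolution has than the source."""
    return pixels(target) / pixels(source)


COMPUTE_RATIO = SCORPIO_TFLOPS / XBOX_ONE_TFLOPS

if __name__ == "__main__":
    print("compute ratio (Scorpio / Xbox One): %.2f" % COMPUTE_RATIO)
    for source in ("900p", "1080p"):
        ratio = pixel_ratio(source, "4K")
        verdict = "covered" if COMPUTE_RATIO >= ratio else "not covered"
        print("%s -> 4K needs %.2fx the pixels: %s by raw TFLOPS alone"
              % (source, ratio, verdict))
```

On these numbers the raw compute ratio sits between the two pixel ratios: it covers 1080p to 4K but falls short of 900p to 4K, which is consistent with the claim that the 900p case has to lean on something beyond TFLOPS, such as bandwidth or other hardware changes.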
 

ethomaz

Banned
Here:


The 900p games scaled to 4K is a promise seen only on their profiler.

Scorpio doesn't have the TF to brute force 900p games running at 4k but they have achieved it thanks to the additional customizations they added to the hardware as a whole.
The quote just shows that, except for the improvements from GCN 1.0 to GCN 4.0, everything is exactly like XB1 and XB1S.

And the promise about 900p to 4K is just that... a promise... let's wait and see a real example.

I will be the first to give proper congrats if they show Halo 5 or Gears 4 at native 4K.

No its not , not in the console space
Not in gaming space to be fair.

FP16 is a big deal for computational tasks... that is why the Tesla P100 already does it 2x faster.
 