
Inside the Scorpio Engine: the processor architecture deep dive

timlot

Banned
Nobody's gonna make a game on the Pro using FP16 exclusively

But, that is where the hidden power is. Mark Cerny, "With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

If I pay for a PS4 Pro, I want to see this radical increase in performance that the Lead System Architect of the PS4 (per his Twitter description) says can be achieved by using 16-bit computation.
 
Nobody's gonna make a game on the Pro using FP16 exclusively

After reading through page after page of this nonsense, this is where I stand too. Sure, in theory doubling the FP16 calculations could be quite useful in certain situations. In practice, how many games are going to take advantage of that when it's extra work you don't need for PS4, XBone, Scorpio and a whole bunch of PCs?
 

Marmelade

Member
But, that is where the hidden power is. Mark Cerny, "With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

If I pay for a PS4 Pro, I want to see this radical increase in performance that the Lead System Architect of the PS4 (per his Twitter description) says can be achieved by using 16-bit computation.

"This has the potential to radically increase performance" in scenarios where you can get away with using half precision
 

c0de

Member
But, that is where the hidden power is. Mark Cerny, "With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

If I pay for a PS4 Pro, I want to see this radical increase in performance that the Lead System Architect of the PS4 (per his Twitter description) says can be achieved by using 16-bit computation.

The same was said about gpgpu.
 

timlot

Banned
"This has the potential to radically increase performance" in scenarios where you can get away with using half precision

His quote ends at "radically increase performance." There was no caveat. If I were a Pro owner, I would be demanding that developers use the Pro to the full potential that the Lead Architect of the system says is possible. Why are they having to checkerboard with an 8.4TF GPU?
Use the radical-increase-in-performance method that Mark Cerny has suggested, and let's see that 8.4TF instead of just talking about it.

"One of the features appearing for the first time is the handling of 16-bit variables - it's possible to perform two 16-bit operations at a time instead of one 32-bit operation," he says, confirming what we learned during our visit to VooFoo Studios to check out Mantis Burn Racing. "In other words, at full floats, we have 4.2 teraflops. With half-floats, it's now double that, which is to say, 8.4 teraflops in 16-bit computation. This has the potential to radically increase performance."

http://www.eurogamer.net/articles/d...tation-4-pro-how-sony-made-a-4k-games-machine
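Cerny's 4.2 and 8.4 figures are theoretical peaks, and they fall straight out of simple arithmetic. A minimal sketch, assuming the publicly reported PS4 Pro GPU configuration of 36 GCN compute units at 911MHz, 64 ALU lanes per CU, and one fused multiply-add (counted as two FLOPs) per lane per clock; `peak_tflops` is just an illustrative name:

```python
def peak_tflops(compute_units: int, clock_mhz: float, lanes_per_cu: int = 64,
                packed_fp16: bool = False) -> float:
    """Theoretical peak throughput in TFLOPS for a GCN-style GPU.

    Each lane retires one fused multiply-add (2 FLOPs) per clock. With packed
    FP16, one 32-bit lane carries two half-precision values, so the count doubles.
    """
    flops_per_lane_per_clock = 2  # one FMA = one multiply + one add
    if packed_fp16:
        flops_per_lane_per_clock *= 2  # two 16-bit FMAs in one 32-bit lane
    flops = compute_units * lanes_per_cu * flops_per_lane_per_clock * clock_mhz * 1e6
    return flops / 1e12


pro_fp32 = peak_tflops(36, 911)
pro_fp16 = peak_tflops(36, 911, packed_fp16=True)
print(f"PS4 Pro FP32: {pro_fp32:.2f} TF, FP16: {pro_fp16:.2f} TF")
```

These are ceilings, not what a game sees: the doubling only applies to instructions that are actually issued as packed FP16.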
 

ethomaz

Banned
SappYoda posted a way better example of FP16 vs FP32 than mine... it is a good article.

http://m.neogaf.com/showpost.php?p=234027793

I found an interesting article written in 2004 about FP16 vs FP32

http://www.hwupgrade.it/articoli/skvideo/1013/radeon-x800-e-il-momento-di-r420_15.html

Here is a sample, translated using Google:
Ever since VS and PS 2.0 introduced floating-point calculation, there have been many discussions about what the standard number of bits should be. ATI chose 24 bits per component, which gives a total precision of 96 bits (24 bits * 3 (RGB) + 24 bits (alpha)). According to the Canadian company, this mode offers a good compromise between quality and speed. NVIDIA, however, took a more flexible route: 16 bit and 32 bit. If a shader does not call for overly complex calculations, the American company believes FP16 precision is more than enough; otherwise, 32 bits per channel can be used, for a total of 128 bits. However, the main problem with the NV3x architecture has always been that enabling FP32 calculation causes a real collapse in performance compared to FP16 mode. This was to be expected, given that higher precision requires more registers and more bandwidth.
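To put the article's bit counts in perspective, here is a small sketch of the three formats' layouts and how fine-grained each one is near 1.0. It assumes FP16 and FP32 are the IEEE 754 half and single formats, and that ATI's FP24 is the commonly cited layout of 1 sign, 7 exponent and 16 mantissa bits; the per-pixel totals reproduce the article's 96-bit and 128-bit figures:

```python
# (sign bits, exponent bits, mantissa bits) for each shader float format.
# FP24 is the layout usually attributed to ATI's R300-era hardware (assumed).
FORMATS = {
    "FP16": (1, 5, 10),
    "FP24": (1, 7, 16),
    "FP32": (1, 8, 23),
}


def describe(name: str) -> tuple:
    """Return (bits per channel, epsilon near 1.0, bits per RGBA pixel)."""
    sign, exponent, mantissa = FORMATS[name]
    bits = sign + exponent + mantissa
    # Gap between 1.0 and the next representable value (machine epsilon).
    epsilon = 2.0 ** -mantissa
    # A full RGBA pixel stores four channels of this format.
    rgba_bits = 4 * bits
    return bits, epsilon, rgba_bits


for name in FORMATS:
    bits, eps, rgba = describe(name)
    print(f"{name}: {bits} bits/channel, epsilon {eps:.1e}, {rgba} bits per RGBA pixel")
```

Each step up buys roughly two extra decimal digits of precision, which is where FP16's banding and lighting artifacts come from when a long chain of calculations accumulates rounding error.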

In that article there are some screenshot comparisons using Far Cry. Can you spot the difference?

FP16:
shader_nv40_16_1.jpg


FP32:
shader_nv40_32_1.jpg


FP16:
shader_nv40_16_2.jpg


FP32:
shader_nv40_32_2.jpg


And here are some benchmarks with FP16 vs FP32:
http://www.hwupgrade.it/articoli/skvideo/1013/radeon-x800-e-il-momento-di-r420_19.html
 

c0de

Member
And? It is being put to good use. In UC4, for example:
https://twitter.com/jqgregory/status/663196554883297280


And I'm sure many other games use it too. HZD uses GPU-based procedural placement and from what I understand, according to the GDC slides, it's traditionally done on the CPU.

Of course it is used, but games are still limited by the CPU because GPGPU is not a universal tool where you can just throw load at it and it magically runs better than before.
 

ethomaz

Banned
On mobile but was anyone able to highlight the screenshot differences? I couldn't see anything obvious.
There are differences...

The FP16 pictures are less sharp, the lighting loses intensity, and there are artifacts not present in the FP32 pictures... The FP32 pictures are clean, with better reflections.
 
There's nothing specific I can discuss (and my comments should not be construed as the official position of my employer, etc.), but I will say it's one of the more fun things in the job to be on the leading edge of disclosure with new hardware. And where I am in my career now one of the few challenges left is to participate in the design process for hardware, so it's been fun seeing tastes of that through time since we work so closely with IHV partners. We also have some incredibly talented engineers on staff here who take it as a personal challenge to discover features and nuances in the hardware before first party does, which is always fun.
Let me just say, apart from all this Scorpio folderol, this is a very fascinating peek into the world of high-end developers. Thanks so much for putting up with the goofball posts of inexpert folks like me, and continuing to participate.

Wait, what? ...In both cases you're looking at the same bandwidth usage as two FP16 numbers will consume the same bandwidth as one FP32. You're winning in latency of shader operations which basically means that you're winning in performance.

Buffers generally reside in device memory (VRAM), and they have no relation to shader processing, as you're writing results from the shader execution to some surface format which in 99% of cases is already either 32 or 64 bits.
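The "two FP16 numbers consume the same bandwidth as one FP32" point can be seen at the byte level. A minimal sketch using Python's standard `struct` module, whose `e` format code is IEEE 754 half precision; it only shows storage layout, not how a GPU's packed-math instructions are actually issued:

```python
import struct

# Two half-precision values packed side by side...
packed = struct.pack("<ee", 0.5, 0.25)
# ...occupy exactly as many bytes as one single-precision value.
single = struct.pack("<f", 0.5)
print(len(packed), len(single))  # 4 4

# Unpacking recovers both halves (these values are exactly representable).
print(struct.unpack("<ee", packed))  # (0.5, 0.25)

# Read as one 32-bit word, the two halves sit in the low and high 16 bits.
word = struct.unpack("<I", packed)[0]
low, high = word & 0xFFFF, word >> 16
print(hex(low), hex(high))  # 0x3800 0x3400
```

That 32-bit word is what a single register slot (or a single fetch) carries, which is why packed math doubles arithmetic per lane without doubling storage or traffic.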
Let me try to explain what I meant again:

Double-rate FP16 increases performance, for the tasks where it's applicable. I'm trying to understand what visible effect this extra performance could be used for. I suggested higher-quality shader work (i.e. double the computations), but you say they typically have to run in FP32 to avoid artifacting, so this isn't an option. What is then? I don't mean, "What tasks can double-rate FP16 be used for?", but "The benefits of double-rate FP16 will have what visible effect?" Is higher framerates the result? Something else?

After reading through page after page of this nonsense, this is where I stand too. Sure, in theory doubling the FP16 calculations could be quite useful in certain situations. In practice, how many games are going to take advantage of that when it's extra work you don't need for PS4, XBone, Scorpio and a whole bunch of PCs?
The thing is, it's not necessarily extra work. Multiple examples have been given in the thread about games that use FP16 already, for some small portions of their pipeline. Now, PS4 Pro (and Switch) can run those at double speed. Games will use this; some already are.

More important to the impact is the question of how much improvement can a game see by using it--1%? 10%? 30%?--and how will that manifest? Understandably there are very few developers willing to commit to definite answers, because each game's situation differs, both from other games and even versus itself over time. The only consensus is that FP16 is one of the tools in everyone's arsenal, and double-rate calculation will improve performance of FP16.
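One way to frame the "1%? 10%? 30%?" question is Amdahl's law: only the portion of the frame that can run as packed FP16 gets faster. A rough sketch, assuming that portion is purely ALU-bound and runs at exactly double rate (memory-bound work would gain less); the fractions below are made-up inputs, not measurements from any game:

```python
def frame_speedup(fp16_fraction: float, fp16_speedup: float = 2.0) -> float:
    """Amdahl's-law estimate of whole-frame speedup.

    fp16_fraction: share of frame time spent in ALU work that can move to
    packed FP16 (hypothetical; games don't publish this number).
    fp16_speedup: how much faster that share runs (2.0 = ideal double rate).
    """
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    remaining = (1.0 - fp16_fraction) + fp16_fraction / fp16_speedup
    return 1.0 / remaining


for share in (0.02, 0.2, 0.5):
    gain = (frame_speedup(share) - 1.0) * 100
    print(f"{share:.0%} of the frame in FP16 -> {gain:.1f}% faster")
```

Even in this idealised model, a 10% gain needs close to a fifth of the whole frame to be FP16-eligible ALU work, which is why answers vary so much from game to game.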

Ultimately, the question is only of academic interest (which doesn't mean it's of no interest). Even waiting to see real games won't tell us more. Without very explicit postmortems, we won't know which portions of the end result were possible due to double-rate FP16 rather than some other optimization. And speaking personally, postmortems that include sufficient technical detail often surpass my ability to parse them.

There's still much satisfaction to be gained hearing experts talk to each other, though, in any field.
 

MilkyJoe

Member
There are differences...

The FP16 pictures are less sharp, the lighting loses intensity, and there are artifacts not present in the FP32 pictures... The FP32 pictures are clean, with better reflections.

So in a nutshell, FP16 looks worse than 32, so it's a pointless conversation?
 

DeeBatch

Member
Huhuh 😂 😂 😂 😂 😂 😂 😂 😂 😂 😂

lol, you wish it was the same as the PS4 Pro. The Jaguar customization inside Scorpio goes much, much further, hence why Scorpio will have native 4K 60fps games. Somebody disputing it or interpreting it differently is your proof? C'mon. Did you not read the article where they say that games running at 900p 30fps can run at 1080p 60fps if the dev wants? The PS4 Pro CPU is not able to do this, so please stop; the CPUs are not the same at all.

"So, eight cores, organised as two clusters with a total of 4MB of L2 cache. These are unique customised CPUs for Scorpio running at 2.3GHz.

The new x86 cores in Scorpio are 31 per cent faster than Xbox One's, with extensive customisation to reduce latency in order to keep the processor occupied more fully, while CPU/GPU coherency also gets a performance uplift. There's significant hardware offloading too - some of which is inherited from Xbox One, some of which is radically new.

"We essentially moved Direct3D 12," says Goossen. "We built that into the command processor of the GPU and what that means is that, for all the high frequency API invocations that the games do, they'll all natively implemented in the logic of the command processor - and what this means is that our communication from the game to the GPU is super-efficient."

Processing draw calls - effectively telling the graphics hardware what to draw - is one of the most important tasks the CPU carries out. It can suck up a lot of processor resources, a pipeline that traditionally takes thousands - perhaps hundreds of thousands - of CPU instructions. With Scorpio's hardware offload, any draw call can be executed with just 11 instructions, and just nine for a state change.

Man, look, you don't know what you are talking about, so you should stop making factual statements. The Jags inside the Scorpio are not "gutted" and "changed completely." They are slightly modified. And yes, the Scorpio has other small modifications and additions, just like the Pro has. What exactly are you trying to argue?

"A lot of really specific custom work went into this."
Of course, the base hardware designs across the various components and blocks within the Scorpio Engine SoC (system on chip) are indeed based on technology derived from AMD - the CPU technology has been customised to the point where Microsoft doesn't refer to them as Jaguar architecture any more, but that is clearly the starting point from which the Project Scorpio design began.

Adding to the list of enhancements, Microsoft increased performance in CPU/GPU coherency and enhanced and improved the speed of the GPU command processor to offload a lot of work from the CPU too, specifically with DirectX 12 engines.

I don't know what I am talking about? They were heavily customized, not slightly. Honestly, there is going to be a big difference in graphics quality and frame rates between the 2 consoles; anyone thinking otherwise is, IMO, setting themselves up for disappointment. I don't want to go back and forth on this. From the info thus far, Scorpio is a native 4K console: with 900p and 1080p Xbox One game engines, the majority of games will be native 4K. PS4 Pro is the opposite: the majority will be checkerboarded, and the few native 4K games will lack 4K assets or ultra settings. FP16 won't change that, we can agree on this, right?
 
His quote ends at "radically increase performance." There was no caveat. If I were a Pro owner, I would be demanding that developers use the Pro to the full potential that the Lead Architect of the system says is possible. Why are they having to checkerboard with an 8.4TF GPU?
And his quote starts at "With half-floats". That is a caveat. At no point does Mr. Cerny say that all work will be possible using half-floats. Indeed, he suggests just the opposite, by saying the double rate has (my emphasis) "potential to radically increase performance", not "this radically increases performance right now". As with other hardware features, it remains up to developers to find ways to leverage it.

So in a nutshell, FP16 looks worse than 32, so it's a pointless conversation?
No, in a nutshell FP16 can't be used everywhere because of precision issues. However, there are processes where nothing visible is lost by using FP16, which is why games are using it.
 

ethomaz

Banned
So in a nutshell, FP16 looks worse than 32, so it's a pointless conversation?
That was always a fact.

The point is: is the trade-off worth the performance gains? Some benchmarks with the 6800 Ultra showed an increase from 50 to 60fps using FP16.

That is a 20% performance gain.
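The 20% figure is the framerate increase; measured as frame time, the same change saves a smaller share of each frame. A quick check of the arithmetic for the 50 to 60fps result quoted above (`fps_gain` is just an illustrative helper):

```python
def fps_gain(before_fps: float, after_fps: float) -> tuple:
    """Return (% framerate increase, % frame-time reduction)."""
    fps_increase = (after_fps / before_fps - 1.0) * 100
    frame_time_before = 1000.0 / before_fps  # milliseconds per frame
    frame_time_after = 1000.0 / after_fps
    time_saved = (1.0 - frame_time_after / frame_time_before) * 100
    return fps_increase, time_saved


up, saved = fps_gain(50, 60)
print(f"{up:.0f}% more fps, {saved:.1f}% less time per frame")
```

Both numbers describe the same result (20ms down to about 16.7ms per frame); frame time is the one that adds up when budgeting work within a frame.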
 
What's going on in here.

vvvvvvvv

Can someone explain this fp16 and fp32 stuff to me like I'm five?

I'm surprised people are giving any relevance to this discussion... but I want to know if this actually matters.

Is it confirmed that Scorpio doesn't have fp16?

If yes, does it mean it isn't a big leap over PS4 Pro anymore? And why can't Microsoft just add fp16 themselves? Does it require special hardware?

Or, can we still comfortably say Scorpio is a beast and all the other engineering put into the console deserves high praise?

I wish you hadn't asked this. This is going to make things go south really fast. Hold on to your butts, guys... the thread is about to go nuclear.

I'm firmly palming both of my cheeks Hawk269 sama but it's all I can do to hang on. (╬ಠ益ಠ)
 

belvedere

Junior Butler
I had to look but I noticed it. To those claiming any game would ever be 100% FP16, that's not how I recall Cerny explaining its utilization at all. I thought FP16 was in some way used for CB implementation, in specific use cases.
 

ethomaz

Banned
So the OS is a VM? Can anyone explain why?
It's no different from XB1.

There is a host OS using a hypervisor to manage two virtual machine instances:

1) Xbox OS: the main OS with menus and apps (UWP apps or games run here), with limited hardware resources.
2) Game OS: the OS that runs the games, with access to more hardware resources.
xbox-one-operating-system-architecture-diagram-sdk-leak.jpg
 
The Jaguar customization inside Scorpio goes much, much further, hence why Scorpio will have native 4K 60fps games.
PS4 Pro has native 4K 60fps games, so you're wrong here.

Did you not read the article where they say that games running at 900p 30fps can run at 1080p 60fps if the dev wants? The PS4 Pro CPU is not able to do this, so please stop; the CPUs are not the same at all.
There are games that go from 900p 30fps on standard PS4 to 1080p 60fps on Pro, so you're wrong here.

From the info thus far, Scorpio is a native 4K console: with 900p and 1080p Xbox One game engines, the majority of games will be native 4K. PS4 Pro is the opposite: the majority will be checkerboarded, and the few native 4K games will lack 4K assets or ultra settings. FP16 won't change that, we can agree on this, right?
We haven't actually seen how many games will be native 4K on Scorpio, but Microsoft's claims don't seem implausible. So with all this you're probably right.
 

LordOfChaos

Member
SappYoda posted a way better example of FP16 vs FP32 than mine... it is a good article.

http://m.neogaf.com/showpost.php?p=234027793



I don't think we can apply the benchmark part to today; there are different degrees of GPU support for FP16 operations. Those older cards with split pixel and vertex shaders had an easier time of it, as there's a clear distinction between when you can use half precision and when you can't, while a newer unified shader core doesn't know in advance what kind of work it will be doing, so we went back to FP32 for everything for some time.

Unified shaders have all of the transistors and computational power necessary for FP32 precision. 32-bit values still eat up additional register space, cache room, and (if it comes to it) memory bandwidth, so FP16 is still a boost there, and Rapid Packed Math is another benefit with two values per ALU.
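To make the register-space point concrete: on GCN, how many wavefronts a SIMD can keep in flight (which is what hides memory latency) depends on how many vector registers each shader uses. A simplified sketch, assuming GCN's commonly cited limits of 256 VGPRs per lane and 10 wavefronts per SIMD, and idealised pairing of two FP16 values per 32-bit register; `fp16_vgprs` is a hypothetical helper, not how any real shader compiler allocates registers:

```python
# Assumed GCN limits; real compilers also round allocations up to a granularity.
VGPRS_PER_LANE = 256
MAX_WAVES_PER_SIMD = 10


def waves_per_simd(vgprs_per_wave: int) -> int:
    """How many wavefronts fit on one SIMD given each wave's VGPR usage."""
    if vgprs_per_wave <= 0:
        raise ValueError("a shader uses at least one VGPR")
    return min(MAX_WAVES_PER_SIMD, VGPRS_PER_LANE // vgprs_per_wave)


def fp16_vgprs(fp32_vgprs: int, fp16_share: float) -> int:
    """VGPRs needed if a share of the 32-bit values become packed halves
    (two per register). Idealised: assumes every half finds a partner."""
    halves = round(fp32_vgprs * fp16_share)
    return (fp32_vgprs - halves) + (halves + 1) // 2


full = 64
mixed = fp16_vgprs(full, 0.5)
print(full, waves_per_simd(full), mixed, waves_per_simd(mixed))  # 64 4 48 5
```

In this toy case, moving half of a 64-register shader's values to FP16 lets one more wavefront fit per SIMD, a benefit that exists independently of double-rate arithmetic.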

There's another benchmark with the 7800, I'd like to see these for the current crop of GPUs.

shadermark-mse.gif
 

Tripolygon

Banned
There are differences...

The FP16 pictures are less sharp, the lighting loses intensity, and there are artifacts not present in the FP32 pictures... The FP32 pictures are clean, with better reflections.
Not the best gif but this should highlight some of the things you pointed out. With how advanced rendering has become, there are ways to hide these things and developers can use high precision where needed.
 

Space_nut

Member
I think the biggest difference you'll see between Scorpio and Pro is the inclusion of 4K assets for textures, poly meshes, and post effects. That is something only having more RAM and bandwidth than the base-model consoles can provide.
 

DeeBatch

Member
PS4 Pro has native 4K 60fps games, so you're wrong here.


There are games that go from 900p 30fps on standard PS4 to 1080p 60fps on Pro, so you're wrong here.


We haven't actually seen how many games will be native 4K on Scorpio, but Microsoft's claims don't seem implausible. So with all this you're probably right.

What games went from 900p 30fps to 1080p 60fps on PS4 Pro? How many titles?

I know PS4 Pro has native 4K games; I said that won't be the case most of the time. Checkerboarding will be the de facto 4K approach on PS4 Pro.
 

Tripolygon

Banned
I think the biggest difference you'll see between Scorpio and Pro is the inclusion of 4K assets for textures, poly meshes, and post effects. That is something only having more RAM and bandwidth than the base-model consoles can provide.
Hi again, there is no such thing as 4K assets or 4K poly meshes, and no developer is using 4K textures for effects or post effects. Developers will, however, increase resolution where they see fit, just as you can choose higher settings for AA, AF, AO, texture resolution (not 4K), LOD levels and draw distance.
 

Space_nut

Member
Hi again, there is no such thing as 4K assets or 4K poly meshes, and no developer is using 4K textures for effects or post effects. Developers will, however, increase resolution where they see fit, just as you can choose higher settings for AA, AF, AO, texture resolution (not 4K), LOD levels and draw distance.

Again, all of that will be better on Scorpio due to RAM and bandwidth.

Not only can Scorpio run Forza 6 at 4K 60fps, it can run it with dynamic weather, 4K textures, LOD0 for all cars at all times, max cars on track, ultra settings, and still have a bit of headroom left.
 

onQ123

Member
Here's a thought - could you make a longer post detailing your position to clarify it for people that (you feel) don't understand? Here's some questions that might be worth answering: (1) Which is more powerful, Scorpio, or the PS4 Pro? (2) How much of an advantage d'you think the double-rate FP16 capability offers? (3) How much smaller do you expect the gap between the Pro's GPU (4.2TF at FP32), and Scorpio's GPU (6TF at FP32) to be due to double-rate FP16?

OK, to make it simple: PS4 Pro has 36 delivery trucks that move at 91 MPH, and Xbox Scorpio has 40 delivery trucks that move at 117 MPH. Each of these trucks can fit 64 packages when packed following the 32-bit rule, which fills the truck with 64 boxes holding one package each. Sometimes packages don't really need a whole box, but the 32-bit rule does not allow these smaller packages to be double-stuffed into one box; each package has to be shipped in the same size box, so each truck moves 64 packages per trip. PS4 Pro sometimes breaks this 32-bit rule and uses the 16-bit rule: take the packaging peanuts out of the boxes and stuff two packages in each box. Not every package can be delivered this way, and some will break without the packaging peanuts, but for the packages that can be shipped this way, PS4 Pro can fit up to 128 packages in its delivery trucks instead of 64.
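Putting numbers on the truck analogy shows what "not every package can be delivered this way" means in practice. A toy sketch, assuming the trucks stand for compute units, the speeds for GPU clocks (presumably 911MHz and 1172MHz scaled down), and that the whole job is ALU-bound; it ignores memory bandwidth, which the analogy leaves out:

```python
# Figures from the analogy; the CU/clock mapping is an assumption.
PRO_TRUCKS, PRO_MPH = 36, 91
SCORPIO_TRUCKS, SCORPIO_MPH = 40, 117
BOXES = 64  # boxes per truck, i.e. lanes per compute unit


def packages_per_hour(trucks: int, mph: float, halfable_share: float = 0.0) -> float:
    """Relative delivery rate when `halfable_share` of the packages can be
    double-stuffed (two per box) under the 16-bit rule."""
    boxes_per_package = (1.0 - halfable_share) + halfable_share / 2
    return trucks * mph * BOXES / boxes_per_package


scorpio = packages_per_hour(SCORPIO_TRUCKS, SCORPIO_MPH)
pro = packages_per_hour(PRO_TRUCKS, PRO_MPH)
# Solve 36*91 / (1 - s/2) == 40*117 for s, the share that must be halfable.
break_even = 2 * (1 - (PRO_TRUCKS * PRO_MPH) / (SCORPIO_TRUCKS * SCORPIO_MPH))
print(f"Scorpio/Pro at 32-bit: {scorpio / pro:.2f}x; "
      f"Pro catches up if {break_even:.0%} of packages are halfable")
```

Under these assumptions, the Pro only matches Scorpio's 32-bit rate if about 60% of all packages qualify for the 16-bit rule.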
 
That's a good question, actually. I'm not sure I remember the original rationale (e.g. why not patch on the PPU via write combine?), but it's not entirely uncommon to do such things, really; usually you just have a custom front-end processor on the GPU that does DMA, etc., while on the RSX the real cost was the mode switch. Needless to say, one of the earliest tasks leveraging the SPU was to move shader program patching there. (Another fun fact: stock Unreal 3 had only one SPU module, and it either did shader program patching or EDGE culling depending on the arguments passed. On Call of Duty we had in excess of 64 SPU modules by the time we finished MW3.)



I think Michal Drobot might discuss our up-sample approach at SIGGRAPH (other primary author here is Jorge Jimenez), but we don't use entity ID data as part of our custom up-sample in Infinite Warfare. DICE's is much closer to the "stock" up-sample arrangement.



Like anything sufficiently interesting, it's complicated, and workloads vary depending on requirements, context, etc. That's really my point most of the time posting in threads like this--it is essentially impossible for the consumer to ever understand the environment in which decisions like this are made or the real underlying reasons and rationale. And really, there's no reason to--that's why I like when DF just focuses on the end result *as observable by the consumer* e.g. doing analysis of frame pacing, resolution, etc. It avoids them getting into the dangerous territory of speculating about how hardware is designed, leveraged, and how software is written, which is far more complicated than possible to express accurately in web articles.

Look forward to reading it.
 

Ushay

Member
There are differences...

The FP16 pictures are less sharp, the lighting loses intensity, and there are artifacts not present in the FP32 pictures... The FP32 pictures are clean, with better reflections.
So why is FP16 being talked up like it's amazing?
 

Inuhanyou

Believes Dragon Quest is a franchise managed by Sony
So why is FP16 being talked up like it's amazing?

My guess is that the cutbacks to effects in certain spots (like lower-resolution transparency effects) can be optimized to provide a better overall utilization result than the 4.2TF figure would have one believe.

But I think it's being overblown just to compete with Scorpio. Nothing will overcome that power.
 

Rodelero

Member
Can 16 and 32 be used on the same frame?

Yes. A developer has total control over the precision for each variable and operation in the shaders. In a modern game, the colour of each pixel on the screen depends on a massive series of calculations. The developer can control which are done at FP16, and which are done at FP32.
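A minimal sketch of why precision gets chosen per value rather than per frame. It uses Python's `struct` half-precision code to emulate storing a value in an FP16 register; that emulation is an assumption for illustration, since a real shader would simply declare the variable with its shading language's half-precision type:

```python
import struct


def to_fp16(x: float) -> float:
    """Round a value to IEEE 754 half precision and back ('e' format code)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Colour-like values in [0, 1] survive FP16 with a tiny error...
color = 0.7231
print(abs(to_fp16(color) - color))

# ...but a large world-space coordinate loses its fractional part entirely,
# because FP16 values between 1024 and 2048 are spaced 1.0 apart.
position = 1500.3
print(to_fp16(position))  # 1500.0
```

So in the same frame, a developer might keep positions and depth math in FP32 while doing some colour or lighting terms in FP16.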

Ushay said:
So why is FP16 being talked up like it's amazing?

It's not. The discussion/argument/debate/farce going on in here is primarily about one group trying to show that FP16 is more or less worthless, and another group trying to show that it's not. For the most part it's become a completely circular discussion over something that's really not that big of a deal.
 

timlot

Banned
OK, to make it simple: PS4 Pro has 36 delivery trucks that move at 91 MPH, and Xbox Scorpio has 40 delivery trucks that move at 117 MPH. Each of these trucks can fit 64 packages when packed following the 32-bit rule, which fills the truck with 64 boxes holding one package each. Sometimes packages don't really need a whole box, but the 32-bit rule does not allow these smaller packages to be double-stuffed into one box; each package has to be shipped in the same size box, so each truck moves 64 packages per trip. PS4 Pro sometimes breaks this 32-bit rule and uses the 16-bit rule: take the packaging peanuts out of the boxes and stuff two packages in each box. Not every package can be delivered this way, and some will break without the packaging peanuts, but for the packages that can be shipped this way, PS4 Pro can fit up to 128 packages in its delivery trucks instead of 64.

Except your 16-bit package may contain two Blu phones while the 32-bit package contains an iPhone.
 
What games went from 900p 30 fps to 1080p 60fps on ps4 pro? how many titles ?
Only a couple, both because far fewer PS4 games start at 900p than on Xbox One, and because doubling of framerate is liable to be rare on both platforms. For example, how many Xbox One games go from 900p 30fps to 1080p 60fps on Scorpio?

I know PS4 Pro has native 4K games; I said that won't be the case most of the time. Checkerboarding will be the de facto 4K approach on PS4 Pro.
You also said that we know Scorpio's CPU is better than the Pro's because it allows native 4K games. My point was that, since the PS4 Pro's CPU can achieve it too, that obviously doesn't prove Scorpio's CPU to be better.
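For scale, here is how much raw pixel throughput the targets in this exchange imply relative to 900p30. A back-of-the-envelope sketch that counts only shaded pixels per second, assuming checkerboarding shades roughly half the samples each frame; it says nothing about the CPU side of the argument, beyond the fact that going from 30 to 60fps also halves the time available for game logic per frame:

```python
# Standard 16:9 resolutions for the targets mentioned in this exchange.
RESOLUTIONS = {
    "900p": (1600, 900),
    "1080p": (1920, 1080),
    "4K": (3840, 2160),
}


def pixels_per_second(res: str, fps: int, shaded_fraction: float = 1.0) -> float:
    """shaded_fraction < 1 approximates techniques like checkerboarding,
    which shade roughly half the samples each frame."""
    width, height = RESOLUTIONS[res]
    return width * height * fps * shaded_fraction


base = pixels_per_second("900p", 30)
print(pixels_per_second("1080p", 60) / base)    # 2.88
print(pixels_per_second("4K", 30) / base)       # 5.76
print(pixels_per_second("4K", 30, 0.5) / base)  # 2.88
```

By this crude measure, 900p30 to 1080p60 asks for almost three times the pixel work, about the same as checkerboarded 4K at 30fps.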
 

SappYoda

Member
The Switch is also capable of double FP16 operations.

As with Kepler and Fermi before it, Maxwell only features dedicated FP32 and FP64 CUDA cores, and this is still the same for X1. However in recognition of how important FP16 performance is, NVIDIA is changing how they are handling FP16 operations for X1. On K1 FP16 operations were simply promoted to FP32 operations and run on the FP32 CUDA cores; but for X1, FP16 operations can in certain cases be packed together as a single Vec2 and issued over a single FP32 CUDA core.

Using FP16, Switch could achieve 1 TFLOP if Nintendo enabled a 1GHz GPU clock, closing the gap to Xbox One's 1.31 TFLOPS.

Source: http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview/2
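The 1 TFLOP figure can be checked the same way as the console numbers. A minimal sketch, assuming Tegra X1's 256 Maxwell CUDA cores at the hypothetical 1GHz clock from the post, Xbox One's 12 GCN compute units (64 lanes each) at 853MHz, and one FMA (2 FLOPs) per lane per clock, doubled for packed FP16:

```python
def peak_gflops(fp32_lanes: int, clock_ghz: float, fp16_packed: bool = False) -> float:
    """Theoretical peak GFLOPS: one FMA (2 FLOPs) per lane per clock,
    doubled when two FP16 values are issued as a Vec2 on one FP32 lane."""
    flops_per_lane = 2 * (2 if fp16_packed else 1)
    return fp32_lanes * flops_per_lane * clock_ghz


switch_fp16 = peak_gflops(256, 1.0, fp16_packed=True)  # 256 CUDA cores at 1GHz
xbox_one = peak_gflops(12 * 64, 0.853)                  # 12 CUs x 64 lanes at 853MHz
print(switch_fp16, round(xbox_one))
```

Note that the two results are not like-for-like: one is a half-precision peak, the other a single-precision one.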
 

anothertech

Member
OK, to make it simple: PS4 Pro has 36 delivery trucks that move at 91 MPH, and Xbox Scorpio has 40 delivery trucks that move at 117 MPH. Each of these trucks can fit 64 packages when packed following the 32-bit rule, which fills the truck with 64 boxes holding one package each. Sometimes packages don't really need a whole box, but the 32-bit rule does not allow these smaller packages to be double-stuffed into one box; each package has to be shipped in the same size box, so each truck moves 64 packages per trip. PS4 Pro sometimes breaks this 32-bit rule and uses the 16-bit rule: take the packaging peanuts out of the boxes and stuff two packages in each box. Not every package can be delivered this way, and some will break without the packaging peanuts, but for the packages that can be shipped this way, PS4 Pro can fit up to 128 packages in its delivery trucks instead of 64.
Lol nice. But I don't think they'll get it still. Cause Mark Cerny obviously doesn't know how the ps4 pro he built works.

Honestly, ppl just mad that there's more Vega in pro than Scorpio, so they gonna downplay as much as possible.

There is literally no visual evidence to compare games with or without Vega's RPM in action, and there won't be until Scorpio is released and games that actually use this tech can be compared back to back. It will be a while, so people need to chill.

The answer everyone is looking for is this: having Vega's RPM is absolutely a good thing. It will help close the gap between Scorpio's brute-force 4K and the Pro's checkerboard 4K.

But Scorpio fans shouldn't really worry about this, as we have no evidence at the moment of what this will actually do in practice.

What Scorpio fans should be scared of, however, is the possibility that the Pro's focus on checkerboard rendering in conjunction with RPM tech will make Pro games run with better IQ/effects/fps than the brute-force native 4K of Scorpio's focus. That would be a ridiculous outcome, but very possible if they just do Xbone-quality ports in native 4K and keep the frame rate.
 

Marmelade

Member
The Switch is also capable of double FP16 operations.



Using FP16, Switch could achieve 1 TFLOP if Nintendo enabled a 1GHz GPU clock, closing the gap to Xbox One's 1.31 TFLOPS.

Source: http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview/2

The FP16 wet dreams have got to stop, whether about the Pro, Switch or whatever.
It's not gonna double the peak performance of your platform of choice in each and every case.
That's what some people here are trying to explain.
Let's not make it into a bigger deal than it really is.
 