
Confirmed: The Nintendo Switch is powered by an Nvidia Tegra X1


Pasedo

Member
I'm picking up my Switch and copy of BOTW today after work. We have a four-day long weekend here in Australia, so I'll enjoy putting it through its paces. I agree. I too have a beastly, overclocked GTX 1080 FTW gaming PC, and although I'm used to seeing games at 1080p Ultra, the art direction of games like UC4 and Ninty titles still blows me away.
 
I find that very hard to believe. That would mean that MS had Scorpio GPU silicon before Sony had Neo's, and that just doesn't compute considering their launch timeframes. FP16x2 will most certainly be in Scorpio, I think, and it will most certainly have more Vega features than PS4Pro, not less. HDMI 2.1 support, for example, can be one such feature.
That would make sense, but so far none of the preview material has included any Vega features for Scorpio. (I highly doubt HDMI version is a GPU feature; from what I can tell 2.1 uses the same hardware as 2.0, and can potentially be achieved with firmware update.) And the Gamasutra article, presumably using info straight from Microsoft, says the GPU is custom with "Polaris features", no mention of Vega.
 
There's this benchmark by blu:

http://www.neogaf.com/forum/showpost.php?p=192321356&postcount=99

Code:
| CPU                   | N-way SIMD ALUs  | flops/clock | remarks                                        |
|-----------------------|------------------|-------------|------------------------------------------------|
| IBM PowerPC 750CL     | 2-way            | 1.51        | g++ 4.6, paired-singles via autovectorization  |
| AMD Bobcat            | 2-way            | 1.47        | clang++ 3.4, SSE2 via intrinsics               |
| Intel Sandy Bridge    | 8-way            | 9.04        | clang++ 3.6, AVX256 via generic vectors        |
| Intel Ivy Bridge      | 8-way            | 9.09        | clang++ 3.6, AVX256 via generic vectors        |
| Intel Haswell         | 8-way            | 9.56        | clang++ 3.6, AVX256 + FMA3 via generic vectors |
| Intel Xeon Phi (KNC)  | 16-way           | 6.62        | icpc 14.0.4, MIC via intrinsics                |
| iMX53 Cortex-A8       | 2-way            | 2.23        | clang++ 3.5, NEON via inline asm               |
| RK3368 Cortex-A53     | 2-way            | 2.40        | clang++ 3.5, A32* NEON via inline asm          |
| AppliedMicro X-Gene 1 | 2-way            | 2.71        | clang++ 3.5, A64 NEON via generic vectors      |
| Apple A7              | 4-way            | 11.07       | apple clang++ 7.0.0, A64 NEON via intrinsics   |
| Apple A8              | 4-way            | 12.19       | apple clang++ 7.0.0, A64 NEON via intrinsics   |
| Apple A9              | 4-way            | 16.79       | apple clang++ 7.x.x, A64 NEON via intrinsics   |

No A57 there, but it does show an A53 beating the 750CL quite handily.

EDIT: Actually it appears to be a Cortex A32, which I assume is a bit lower performance than A53? (and obviously well below A57).

To expand on this, according to Wikipedia's list of ARM processors:

A53 - 2.3 DMIPS/MHz

A57 - 4.6 DMIPS/MHz

So the A57 may roughly double what the A53 did in blu's benchmark. By these numbers, Switch's A57s are a little above 3x the performance of Wii U's CPU cores clock-for-clock, and at their set clock frequencies the Switch would be about 2.6x the CPU power of the Wii U. That is a significant difference in single-core CPU performance. Those calculations don't take any of Espresso's customizations into account, though.
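
For anyone who wants to see where the ~3x / ~2.6x figures come from, here's a back-of-the-envelope sketch. It combines blu's flops/clock numbers with the DMIPS/MHz ratio above and the commonly reported clock speeds (Switch CPU at 1020MHz, Wii U Espresso at ~1243MHz); the clocks and the 2x A57-vs-A53 factor are assumptions, so treat it as rough napkin math, not a measurement.

Code:
#include <cstdio>

int main() {
    // Per-clock figures from blu's table and the DMIPS/MHz list above.
    const double a53_flops_per_clock      = 2.40;       // RK3368 Cortex-A53 result
    const double espresso_flops_per_clock = 1.51;       // IBM PowerPC 750CL result
    const double a57_over_a53             = 4.6 / 2.3;  // ~2x, from DMIPS/MHz

    // Assumed clocks: Switch A57 at 1020 MHz, Wii U Espresso at ~1243 MHz.
    const double clock_ratio = 1020.0 / 1243.0;

    const double per_clock = (a53_flops_per_clock / espresso_flops_per_clock) * a57_over_a53;
    printf("per-clock estimate : %.1fx\n", per_clock);               // ~3.2x
    printf("at set clocks      : %.1fx\n", per_clock * clock_ratio); // ~2.6x
    return 0;
}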

That would make sense, but so far none of the preview material has included any Vega features for Scorpio. (I highly doubt HDMI version is a GPU feature; from what I can tell 2.1 uses the same hardware as 2.0, and can potentially be achieved with firmware update.) And the Gamasutra article, presumably using info straight from Microsoft, says the GPU is custom with "Polaris features", no mention of Vega.
Off topic, but I didn't expect that. If this is the case, that can definitely narrow the performance gap between those two.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
So the A57 may roughly double what the A53 did in blu's benchmark. By these numbers, Switch's A57s are a little above 3x the performance of Wii U's CPU cores clock-for-clock, and at their set clock frequencies the Switch would be about 2.6x the CPU power of the Wii U. That is a significant difference in single-core CPU performance. Those calculations don't take any of Espresso's customizations into account, though.
Just a note: this test uses the paired-singles SIMD of the Espresso, otherwise known as 750CL. Apropos, getting gcc to use paired singles was a nuisance, thanks to the way this feature is supported by this compiler.
 

dr_rus

Member
That would make sense, but so far none of the preview material has included any Vega features for Scorpio. (I highly doubt HDMI version is a GPU feature; from what I can tell 2.1 uses the same hardware as 2.0, and can potentially be achieved with firmware update.) And the Gamasutra article, presumably using info straight from Microsoft, says the GPU is custom with "Polaris features", no mention of Vega.
2.1 is a h/w change, no firmware update will work unless the h/w already was there - for which it had to be released at least after the spec draft which appeared around autumn 2016. So Pascals are surely out.
 

Thraktor

Member
2.1 is a h/w change, no firmware update will work unless the h/w already was there - for which it had to be released at least after the spec draft which appeared around autumn 2016. So Pascals are surely out.

It requires a hardware change if they use the extra bandwidth (which I doubt they will). It's conceivable that they could support some of the new features like VRR in firmware to be "HDMI 2.1 compliant" without necessarily supporting the full HDMI 2.1 spec. I'd actually be interested to see if Sony make a move to patch VRR support into the PS4 Pro now that MS are doing so with Scorpio.
 
Just a note: this test uses the paired-singles SIMD of the Espresso, otherwise known as 750CL. Apropos, getting gcc to use paired singles was a nuisance, thanks to the way this feature is supported by this compiler.
Oh ok. Thanks for clarifying that, and for putting in the work to give us these numbers. Would we still have to take the modified cache system with EDRAM into account?

On a related note, I have read that the OS on the Wii U didn't use any resources off of the 750CL cores, but used a smaller ARM core (probably an ARM9 @ <500MHz). If that was true, it was good for games, since the CPU ended up being an issue even with 100% availability. When we compare it to the Switch, though, the difference in OS performance makes a lot of sense. The Switch is devoting a whole A57 core to background tasks compared to Wii U's tiny IO processor. The power difference is likely beyond an order of magnitude. No wonder the Switch OS is so snappy compared to the Wii U!
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
2.1 is a h/w change, no firmware update will work unless the h/w already was there - for which it had to be released at least after the spec draft which appeared around autumn 2016. So Pascals are surely out.
But HDMI protocols, hw or not, are completely orthogonal to GPU architectures. You can equip an fp16-fb-yielding GPU with an ASIC designed to do the OETF and call it a day for the GPU, no?

Oh ok. Thanks for clarifying that, and for putting in the work to give us these numbers. Would we still have to take the modified cache system with EDRAM into account?
Test runs from L1 on all participating architectures.
 

Donnie

Member
Oh ok. Thanks for clarifying that, and for putting in the work to give us these numbers. Would we still have to take the modified cache system with EDRAM into account?

On a related note, I have read that the OS on the Wii U didn't use any resources off of the 750CL cores, but used a smaller ARM core (probably an ARM9 @ <500MHz). If that was true, it was good for games, since the CPU ended up being an issue even with 100% availability. When we compare it to the Switch, though, the difference in OS performance makes a lot of sense. The Switch is devoting a whole A57 core to background tasks compared to Wii U's tiny IO processor. The power difference is likely beyond an order of magnitude. No wonder the Switch OS is so snappy compared to the Wii U!

The ARM core ran security and IO tasks (IOSU), but the main WiiU Operating System ran on the main Espresso core (the one with a full 2MB cache) AFAIR. But I think that main core was also available for games (?)
 
But HDMI protocols, hw or not, are completely orthogonal to GPU architectures. You can equip an fp16-fb-yielding GPU with an ASIC designed to do the OETF and call it a day for the GPU, no?


Test runs from L1 on all participating architectures.
Oh ok. That's that then. It's funny seeing the 750CL beating Bobcat. :)

The ARM core ran security and IO tasks (IOSU), but the main WiiU Operating System ran on the main Espresso core (the one with a full 2MB cache) AFAIR. But I think that main core was also available for games (?)
Eh? Wow, well Espresso didn't really have power to spare like that, so that would be unfortunate. Maybe 30% or less was partitioned to OS tasks? Even if it was that much, that is still a fraction of the power devoted to it on the Switch. I think they could have been better off using small ARM cores for the OS. What do you think?
 

dr_rus

Member
It requires a hardware change if they use the extra bandwidth (which I doubt they will). It's conceivable that they could support some of the new features like VRR in firmware to be "HDMI 2.1 compliant" without necessarily supporting the full HDMI 2.1 spec. I'd actually be interested to see if Sony make a move to patch VRR support into the PS4 Pro now that MS are doing so with Scorpio.
This is highly unlikely, as there is no such thing as partial support of the HDMI 2.1 spec. It could be possible, of course, but that would be a hack of the HDMI spec more than anything else, and so far the only such hack I'm aware of is the inclusion of HDR10 metadata into HDMI 1.4 transport on the PS4 - but that one is pure software, and I'm not so sure that VRR will be pure s/w.

But HDMI protocols, hw or not, are completely orthogonal to GPU architectures. You can equip an fp16-fb-yielding GPU with an ASIC designed to do the OETF and call it a day for the GPU, no?
Well, yes, but TMDS h/w is a part of said GPUs and you will in fact need a whole new GPU chip even if you want to update only this part specifically. Considering that HDMI 2.1 doubles the signaling frequency of the interface this may not be such an easy update and may in fact cause issues in other parts of the chip. So basically it most likely isn't worth it outside of building a totally new GPU and thus any "updates" of such kind are unlikely.
 

joesiv

Member
This is highly unlikely, as there is no such thing as partial support of the HDMI 2.1 spec. It could be possible, of course, but that would be a hack of the HDMI spec more than anything else, and so far the only such hack I'm aware of is the inclusion of HDR10 metadata into HDMI 1.4 transport on the PS4 - but that one is pure software, and I'm not so sure that VRR will be pure s/w.


Well, yes, but TMDS h/w is a part of said GPUs and you will in fact need a whole new GPU chip even if you want to update only this part specifically. Considering that HDMI 2.1 doubles the signaling frequency of the interface this may not be such an easy update and may in fact cause issues in other parts of the chip. So basically it most likely isn't worth it outside of building a totally new GPU and thus any "updates" of such kind are unlikely.

I think you might misunderstand how the semi-custom chips work with AMD. From what I gather, AMD gives a list of features - parts of the puzzle - and their roadmaps, and the customer (Sony/MS) can pick and choose the components that they want for their custom silicon.

It's not strictly a full-meal-deal, Vega-or-Polaris type of deal; you can take certain pieces (like HDMI support) from the later Vega generation and mate them with a Polaris core. Or you can go the other way (Sony seems to have done this) and take more core Vega parts, which support double-rate FP16, without it being a fully Vega chip.

It seems that MS had specific goals in mind with their custom chip, which were different from what Sony had.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Well, yes, but TMDS h/w is a part of said GPUs and you will in fact need a whole new GPU chip even if you want to update only this part specifically. Considering that HDMI 2.1 doubles the signaling frequency of the interface this may not be such an easy update and may in fact cause issues in other parts of the chip. So basically it most likely isn't worth it outside of building a totally new GPU and thus any "updates" of such kind are unlikely.
Check this out. (the video from the talk is even more interesting but it's lengthy)
 

dr_rus

Member
I think you might misunderstand how the semi-custom chips work with AMD. From what I gather, AMD gives a list of features - parts of the puzzle - and their roadmaps, and the customer (Sony/MS) can pick and choose the components that they want for their custom silicon.

It's not strictly a full-meal-deal, Vega-or-Polaris type of deal; you can take certain pieces (like HDMI support) from the later Vega generation and mate them with a Polaris core. Or you can go the other way (Sony seems to have done this) and take more core Vega parts, which support double-rate FP16, without it being a fully Vega chip.

It seems that MS had specific goals in mind with their custom chip, which were different from what Sony had.

Check this out. (the video from the talk is even more interesting but it's lengthy)

Sorry, got lost between two threads as in each of them people started talking about HDMI 2.1 for some reason =)

Yes, of course you can add HDMI 2.1 to Scorpio's SoC while the rest of the SoC is based on Polaris tech, for example. However, even this alone would mean that Scorpio's SoC is more advanced than Neo's.

As for the HDMI encoder being external to the SoC - well, this is possible, but I don't see why MS would want to go this way.
 

Pasedo

Member
Been a long while since I've been excited to wake up just to play a game. The Switch and BOTW is an addictive combo. In regards to tech, I'm curious: BOTW uses some realistic textures like rock, tree and dirt, and special effects like water reflections and mist, and I've noticed they're predominantly on static assets. I'm not a developer, so I don't understand how this all works with the hardware, but I'm guessing it's less taxing when textures are on assets which don't need to move or interact? And if better-quality textures are on moving objects, such as more realistic character models, is this where good memory bandwidth is important, as assets need to continuously load into memory to keep adjusting and refreshing on a moving object? Just curious, as Nintendo prefer more simplistic character models, which I guess is meant to get the most out of less powerful hardware. Also, could 3rd parties apply this technique to more easily port their games to Switch? E.g. retain decent-quality surrounding textures but really reduce textures on moving objects? IMO I'm not really fussed if character models have super-high detail or realistic textures, simply because you forget about it while it's moving.
 

mitchman

Gold Member
This is highly unlikely, as there is no such thing as partial support of the HDMI 2.1 spec. It could be possible, of course, but that would be a hack of the HDMI spec more than anything else, and so far the only such hack I'm aware of is the inclusion of HDR10 metadata into HDMI 1.4 transport on the PS4 - but that one is pure software, and I'm not so sure that VRR will be pure s/w.

As long as you don't advertise it as 2.1 compliant, it's perfectly possible to add fw upgradable software features. You mention HDR, but an earlier example is 3D support on HDMI 1.3 in the PS3.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Been a long while since I've been excited to wake up just to play a game. The Switch and BOTW is an addictive combo. In regards to tech, I'm curious: BOTW uses some realistic textures like rock, tree and dirt, and special effects like water reflections and mist, and I've noticed they're predominantly on static assets. I'm not a developer, so I don't understand how this all works with the hardware, but I'm guessing it's less taxing when textures are on assets which don't need to move or interact? And if better-quality textures are on moving objects, such as more realistic character models, is this where good memory bandwidth is important, as assets need to continuously load into memory to keep adjusting and refreshing on a moving object? Just curious, as Nintendo prefer more simplistic character models, which I guess is meant to get the most out of less powerful hardware. Also, could 3rd parties apply this technique to more easily port their games to Switch? E.g. retain decent-quality surrounding textures but really reduce textures on moving objects? IMO I'm not really fussed if character models have super-high detail or realistic textures, simply because you forget about it while it's moving.
There's a difference in the texturing techniques applied to skinned versus rigid meshes, but other than that there are no differences that I can think of. Skinned meshes do take more computational resources to get certain kinds of textures, as those have to be done in tangent space, and that requires a bit more computation than, say, object-space techniques. But none of those things depend on texture resolution.
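
To illustrate the extra per-sample work blu is talking about, here's a tiny sketch (purely illustrative C++, not engine code; the names are made up). With a tangent-space normal map on a skinned mesh, each sampled normal has to be rotated by the interpolated tangent/bitangent/normal basis before lighting, which is a handful of extra multiply-adds per pixel compared to using an object-space map on a rigid mesh.

Code:
struct Vec3 { float x, y, z; };

// world_normal = sample.x * T + sample.y * B + sample.z * N
// (the per-pixel tangent-space -> world-space rotation; an object-space map on a
//  rigid mesh can skip this and reuse the object's single model-to-world rotation)
inline Vec3 tangentToWorld(const Vec3& t, const Vec3& b, const Vec3& n, const Vec3& sample) {
    return { sample.x * t.x + sample.y * b.x + sample.z * n.x,
             sample.x * t.y + sample.y * b.y + sample.z * n.y,
             sample.x * t.z + sample.y * b.z + sample.z * n.z };
}

int main() {
    Vec3 t{1.0f, 0.0f, 0.0f}, b{0.0f, 1.0f, 0.0f}, n{0.0f, 0.0f, 1.0f};
    Vec3 s{0.0f, 0.0f, 1.0f};                // a "flat" tangent-space sample
    Vec3 w = tangentToWorld(t, b, n, s);     // comes out as the surface normal (0,0,1)
    return static_cast<int>(w.z);
}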
 

Pasedo

Member
There's a difference in the texturing techniques applied to skinned versus rigid meshes, but other than that there are no differences that I can think of. Skinned meshes do take more computational resources to get certain kinds of textures, as those have to be done in tangent space, and that requires a bit more computation than, say, object-space techniques. But none of those things depend on texture resolution.

For us less technical minds it would be good to get examples of what certain technical improvements will do to visuals. For example, if the Switch had, say, better bandwidth, does that mean it can pump out graphics at a higher native resolution? Also, at a higher resolution wouldn't it need more CPU power to process all the high-res textures etc., thus affecting frame rates? So could you then say that the bandwidth matched the capability of the CPU and any higher bandwidth was wastage?
 

dr_rus

Member
As long as you don't advertise it as 2.1 compliant, it's perfectly possible to add fw upgradable software features. You mention HDR, but an earlier example is 3D support on HDMI 1.3 in the PS3.

Thing is, I doubt that HDMI's VRR will be purely s/w.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
For us less technical minds it would be good to get examples of what certain technical improvements will do to visuals. For example, if the Switch had, say, better bandwidth, does that mean it can pump out graphics at a higher native resolution? Also, at a higher resolution wouldn't it need more CPU power to process all the high-res textures etc., thus affecting frame rates? So could you then say that the bandwidth matched the capability of the CPU and any higher bandwidth was wastage?
There's definitely a plateau point for BW beyond which a given GPU setup would get little-to-no gains. Now, when considering the BW for the TX1, we have yet to collect data points, but in general:
1. framebuffer BW is not part of main RAM BW by virtue of tiling
2. framebuffer tiles are subject to color compression schemes
3. main RAM BW is shared with CPU by virtue of UMA
4. textures residing in RAM are also subject to various texture compression schemes

So, for a given scenario, we can try to produce a cumulative BW estimate, but in general, it's rather hard (and also rather pointless) to try to pin the cumulative BW as sufficient or not per se.
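
Just to put some (made-up) numbers on blu's point, here's the flavour of estimate he means. Everything below is an illustrative assumption - resolution, overdraw, compression ratio and the ~25.6GB/s LPDDR4 figure commonly quoted for the Switch - so it's napkin math about why a single "is the BW enough?" answer is elusive, not a real TX1 measurement.

Code:
#include <cstdio>

int main() {
    // Assumed scenario: 1280x720 @ 60fps, 32-bit colour + 32-bit depth,
    // one read + one write per pixel, ~3x overdraw, 2:1 average colour/depth compression.
    const double pixels      = 1280.0 * 720.0;
    const double fps         = 60.0;
    const double bytes_px    = (4.0 + 4.0) * 2.0;  // colour + depth, read + write
    const double overdraw    = 3.0;
    const double compression = 0.5;

    const double fb_gbs = pixels * fps * bytes_px * overdraw * compression / 1e9;
    printf("naive framebuffer traffic : ~%.1f GB/s\n", fb_gbs);

    // Per blu's points, tiling keeps much of that traffic on-chip, and what does hit
    // main RAM competes with CPU traffic and (compressed) texture fetches for the
    // same ~25.6 GB/s budget - hence a single cumulative figure isn't very meaningful.
    return 0;
}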
 

Branduil

Member
The cel-shading in BotW is just for stylization. The character models are still pretty complex and all that grass is certainly resource-intensive.
 

Pasedo

Member
There's definitely a plateau point for BW beyond which a given GPU setup would get little-to-no gains. Now, when considering the BW for the TX1, we have yet to collect data points, but in general:
1. framebuffer BW is not part of main RAM BW by virtue of tiling
2. framebuffer tiles are subject to color compression schemes
3. main RAM BW is shared with CPU by virtue of UMA
4. textures residing in RAM are also subject to various texture compression schemes

So, for a given scenario, we can try to produce a cumulative BW estimate, but in general, it's rather hard (and also rather pointless) to try to pin the cumulative BW as sufficient or not per se.

Something else I've noticed in Nintendo 1st party games, and I'm not sure if it has anything to do with the framebuffer tiles you're mentioning, but there's this sense of efficiency in the visuals. The best way to describe it is as if they have a predetermined set of textures, e.g. rock surface, water, dirt, and because these are standardised it's almost as if they can be stored in such a way that they can easily be used with very little impact on system resources. This then frees the system to focus on moving elements. Is this essentially what goes on? It's difficult to notice this uniformity in 3rd party games. There's so much going on it seems like a complicated mess. Which is why I don't have much hope for 3rd party games with high-end visuals coming over to Switch, because the development approach appears to be completely different and less efficient, which means they will always struggle on hardware which suits Nintendo's more efficient development style.
 
Something else I've noticed in Nintendo 1st party games, and I'm not sure if it has anything to do with the framebuffer tiles you're mentioning, but there's this sense of efficiency in the visuals. The best way to describe it is as if they have a predetermined set of textures, e.g. rock surface, water, dirt, and because these are standardised it's almost as if they can be stored in such a way that they can easily be used with very little impact on system resources. This then frees the system to focus on moving elements. Is this essentially what goes on? It's difficult to notice this uniformity in 3rd party games. There's so much going on it seems like a complicated mess. Which is why I don't have much hope for 3rd party games with high-end visuals coming over to Switch, because the development approach appears to be completely different and less efficient, which means they will always struggle on hardware which suits Nintendo's more efficient development style.

Nintendo devs build games that focus on working around their console's weaknesses to fully optimize their games. On the other hand, third-party devs that make multi-plat ports are concerned more about scaling games properly onto the Switch from PS4 base builds. I've also noticed a trend with most Nintendo games that they typically don't have a ton of objects rendered at once, so this seems to help against bottlenecks like RAM and CPU. They also don't focus too heavily on textures, but they do prioritize graphical fidelity and a locked, smooth framerate.

Honestly, from the 3rd party ports we've seen so far on Switch (I Am Setsuna, Snake Pass, and Lego City Undercover being the most notable examples), devs have made the Switch hold up on its own quite nicely vs the PS4/Xbone versions. Nearly identical to the twins, minus a few graphical features that aren't easily detectable unless you pause the game and are fairly minor, as well as either a frame rate cut in half or a resolution downgrade somewhere between 1.5x and 2.25x below the PS4 version.
 
I would expect that a Pro version would be something that runs games in docked mode on the go.

So a pretty small improvement over the base version.
 
I would expect that a Pro version would be something that runs games in docked mode on the go.

So a pretty small improvement over the base version.

That's a very significant improvement considering docked mode is 2.5x the GPU performance of handheld mode. If Pro mode's handheld = the current Switch docked, that's 400 GFLOPS. 400 x 2.5 = 1 TFLOP. Factor in mixed precision mode and it could easily boost to 1.5 TFLOPS if devs take advantage of it. It would surpass the Xbone GPU as a result. Newer architecture could mean more modern tools and more GPU efficiency per FLOP over 2011 tech, which could make it trade blows with the PS4 base version. I'm expecting a RAM bump to 8GB and a CPU upgrade equivalent to PS3/Xbone too, of course.

This will likely be 2-3 years from now, by which point the PS4 Pro and Scorpio will be the default standard consoles over the base PS4 and Xbone.
 
We were discussing this topic earlier.

http://m.neogaf.com/showthread.php?t=1360738&page=1#post233972309

So Scorpio doesn't have 2x FP16 after all. Seems like a strange omission IMO... even using fp16 in a fraction of the code could have easily saved devs more than the Switch's total raw power. I don't think it will change the chances of developers utilizing 2x FP16 either way, though, since the XB1 brand is currently not leading in sales anywhere.

So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.
 
So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.

no not even close. stop listening to fanboys
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.
In terms of efficiency - quite likely. In terms of raw speed - we'll see.
 
So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.

FP16 is not going to make up that raw a FLOPS difference, especially with all the other tech in Scorpio. Everyone would be doing it if the benefits could make up that gap in power.

In terms of efficiency - quite likely. In terms of raw speed - we'll see.

Would you say it's a major oversight on MS's part if FP16 is really that much of a game changer?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Would you say it's a major oversight on MS's part if FP16 is really that much of a game changer?
The fact that (a) some computations are fine in fp16, and (b) transistors are getting more and more expensive, makes fp16 an inevitable step for an age where Moore's law has reached the end of its economically viable run.
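
A quick illustration of the "some computations are fine in fp16" point - this little sketch assumes a compiler with the _Float16 extension (recent clang/gcc), so it's just a demonstration of fp16's ~10-bit mantissa, not anyone's actual shader code: values in a normalized [0,1] range (colours, lighting terms) survive the conversion fine, while large magnitudes (e.g. world-space positions) lose precision quickly.

Code:
#include <cstdio>

int main() {
    // fp16 has a 10-bit mantissa (~3 decimal digits of precision).
    _Float16 colour   = (_Float16)0.73f;    // spacing near 0.73 is ~0.0005 - effectively exact
    _Float16 position = (_Float16)4096.5f;  // spacing near 4096 is 4.0 - the .5 is rounded away

    printf("colour   : %f\n", (float)colour);    // ~0.729980
    printf("position : %f\n", (float)position);  // 4096.000000
    return 0;
}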
 
So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.

I can especially see Japanese developers taking advantage of 2x fp16. UE4, for example, uses fp16 by default on mobile, so games ported up to the Switch can take good advantage of that.

FP16 is not going to make up that raw a FLOPS difference, especially with all the other tech in Scorpio. Everyone would be doing it if the benefits could make up that gap in power.



Would you say it's a major oversight on MS's part if FP16 is really that much of a game changer?
Scorpio has numerous other optimizations, so the hardware should be able to outperform the PS4Pro in most cases either way. It is still a weird omission, since the other newer systems and AMD's Vega GPUs will support the feature. The lack of double-rate fp16 can narrow the gap between Scorpio and the other systems to a certain degree.

Of course, Sony can also get slick and advertise that their machine can perform 8.4TFLOPS* to downplay Scorpio's raw performance. For a machine that is being advertised for its power, it makes it all the more bizarre that it wasn't included, since it would have made it easier to win the "numbers" game.
 
no not even close. stop listening to fanboys

FP16 is not going to make up that raw a FLOPS difference, especially with all the other tech in Scorpio. Everyone would be doing it if the benefits could make up that gap in power.

I don't mean to say that the PS4Pro will perform better overall, that's not at all what I meant. I just meant it could be a significant advantage in this area, namely the double speed FP16 processing. The same way that this FP16 advantage exists for the Switch over the XB1 but doesn't bring it to the same overall level.

Sorry if that was confusing.
 
The fact that (a) some computations are fine in fp16, and (b) transistors are getting more and more expensive, makes fp16 an inevitable step for an age where Moore's law has reached the end of its economically viable run.

Yeah, like, I understand that we are eventually going to move more toward FP16, simply due to not being able to endlessly chase hardware forever. But I was more asking, does the added benefit from FP16 really bring the PS4 Pro that close to Scorpio?

I don't mean to say that the PS4Pro will perform better overall, that's not at all what I meant. I just meant it could be a significant advantage in this area, namely the double speed FP16 processing. The same way that this FP16 advantage exists for the Switch over the XB1 but doesn't bring it to the same overall level.

Sorry if that was confusing.

Oh okay, gotchu.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Yeah, like, I understand that we are eventually going to move more toward FP16, simply due to not being able to endlessly chase hardware forever. But I was more asking, does the added benefit from FP16 really bring the PS4 Pro that close to Scorpio?
Well, it's an optimisation means - if used properly it can surely bring the pro closer to scorpio. How much closer? Hard to tell ATM. And when we say an 'optimisation means', that always comes at an implied effort, but the situation here is definitely not ps3-cell level of effort.
 
So the PS4Pro would probably wind up with a fairly significant advantage over Scorpio? I guess depending on how much developers leverage the FP16 advantage?

And this being a Switch thread, I still think the PS4Pro having the 2xFP16 advantage will benefit the Switch due to more developers utilizing it.

With mixed precision mode, I could see the PS4 Pro hit 6 TFLOPS when devs support it, which is more or less going to be the same as Scorpio's 6 TFLOPS. But Scorpio still has 50% more RAM, more bandwidth, and a better CPU. It's gonna be interesting to see the ports go head to head.

Scorpio also has native directx12 support though, right?
 
That's a very significant improvement considering docked mode is 2.5x the GPU performance of handheld mode. If Pro mode's handheld = the current Switch docked, that's 400 GFLOPS. 400 x 2.5 = 1 TFLOP. Factor in mixed precision mode and it could easily boost to 1.5 TFLOPS if devs take advantage of it. It would surpass the Xbone GPU as a result. Newer architecture could mean more modern tools and more GPU efficiency per FLOP over 2011 tech, which could make it trade blows with the PS4 base version. I'm expecting a RAM bump to 8GB and a CPU upgrade equivalent to PS3/Xbone too, of course.

This will likely be 2-3 years from now, by which point the PS4 Pro and Scorpio will be the default standard consoles over the base PS4 and Xbone.

That's quite the small step compared to people who are already talking about moving to the next architecture because reasons.

Also no comment about that FP16 fantasy.
 

LordOfChaos

Member
There's this benchmark by blu:

http://www.neogaf.com/forum/showpost.php?p=192321356&postcount=99

Code:
| CPU                   | N-way SIMD ALUs  | flops/clock | remarks                                        |
|-----------------------|------------------|-------------|------------------------------------------------|
| IBM PowerPC 750CL     | 2-way            | 1.51        | g++ 4.6, paired-singles via autovectorization  |
| AMD Bobcat            | 2-way            | 1.47        | clang++ 3.4, SSE2 via intrinsics               |
| Intel Sandy Bridge    | 8-way            | 9.04        | clang++ 3.6, AVX256 via generic vectors        |
| Intel Ivy Bridge      | 8-way            | 9.09        | clang++ 3.6, AVX256 via generic vectors        |
| Intel Haswell         | 8-way            | 9.56        | clang++ 3.6, AVX256 + FMA3 via generic vectors |
| Intel Xeon Phi (KNC)  | 16-way           | 6.62        | icpc 14.0.4, MIC via intrinsics                |
| iMX53 Cortex-A8       | 2-way            | 2.23        | clang++ 3.5, NEON via inline asm               |
| RK3368 Cortex-A53     | 2-way            | 2.40        | clang++ 3.5, A32* NEON via inline asm          |
| AppliedMicro X-Gene 1 | 2-way            | 2.71        | clang++ 3.5, A64 NEON via generic vectors      |
| Apple A7              | 4-way            | 11.07       | apple clang++ 7.0.0, A64 NEON via intrinsics   |
| Apple A8              | 4-way            | 12.19       | apple clang++ 7.0.0, A64 NEON via intrinsics   |
| Apple A9              | 4-way            | 16.79       | apple clang++ 7.x.x, A64 NEON via intrinsics   |

No A57 there, but it does show an A53 beating the 750CL quite handily.

EDIT: Actually it appears to be a Cortex A32, which I assume is a bit lower performance than A53? (and obviously well below A57).

What workload is that btw? If KNC is doing worse than Haswell, it's obviously not something that is making good use of the vector instruction set on KNC. Similarly, Haswell+ should do a fair bit better than anything previous if AVX2 is being used, but I'm guessing the workload is just not hammering FLOPS in a way that shows up.

Apple's SoCs are indeed pretty crazy, but this also seems like the worst-case scenario for Intel.
 
That's quite the small step compared to people who are already talking about moving to the next architecture because reasons.

Also no comment about that FP16 fantasy.

Yes, the Switch Pro does seem more like a half-generation upgrade like the PS4 Pro and Scorpio, but we're talking about a handheld here, so it's a lot more significant than you think. I'd reckon we'd have to go to 7nm to reach that power with good battery life and high power efficiency.

It's not a fantasy when you see it in games like I Am Setsuna, Snake Pass, and Lego City Undercover; the fp16 feature narrows the gap quite a bit. It can explain why the PS4 Pro version of Snake Pass has double the framerate AND pushes out 56% more pixels than the OG PS4 version. On paper, the PS4 Pro is only 2.2x more powerful in GPU power, with a minor bump in CPU and bandwidth, but we saw more than 3x the power of the OG PS4 displayed on the Pro. Mixed precision mode would explain the extra power, and the same goes for the Switch.
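
For what it's worth, here's the arithmetic behind that "more than 3x" claim spelled out - the input figures are the ones quoted in the post above (that poster's numbers, not independently verified here):

Code:
#include <cstdio>

int main() {
    const double framerate_gain = 2.0;   // claimed: double the framerate
    const double pixel_gain     = 1.56;  // claimed: +56% more pixels per frame
    const double paper_gpu_gap  = 2.2;   // claimed: PS4 Pro vs base PS4 raw GPU ratio

    // pixels pushed per second relative to the base PS4
    const double observed_gain = framerate_gain * pixel_gain;
    printf("observed throughput gain : %.2fx (vs %.1fx on paper)\n", observed_gain, paper_gpu_gap);
    return 0;
}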
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
What workload is that btw? If KNC is doing worse than Haswell, it's obviously not something that is making good use of the vector instruction set on KNC. Similarly, Haswell+ should do a fair bit better than anything previous if AVX2 is being used, but I'm guessing the workload is just not hammering FLOPS in a way that shows up.

Apple's SoCs are indeed pretty crazy, but this also seems like the worst-case scenario for Intel.
It's a mat4x4 multiplicator that takes two arguments, does the multiplication and then loops. It's doing worse flops on low-superscalarity uarchs than a large dense matrix multiplicator due to the much smaller computational kernel and the implications from that, eg. higher data dependencies.

Re KNC, it's using the ISA optimally - the entire multiplication is carried via 4 ALU ops and only 4 explicit swizzles - much better than AVX, but the uarch is low-superscalarity, high-latency and is not tuned for small kernels. Here's the generated code for the inner-most loop - it's optimal under the condition of no unrolling:

Code:
4030d0: vmovaps 0x604cc0(%rax),%zmm4
4030da: vpermf32x4 $0x0,0x604d00(%rdx),%zmm0
4030e5: vpermf32x4 $0x55,0x604d00(%rdx),%zmm1
4030f0: vmulps %zmm4{aaaa},%zmm0,%zmm5
4030f6: inc %ecx
4030f8: vpermf32x4 $0xaa,0x604d00(%rdx),%zmm2
403103: vfmadd231ps %zmm4{bbbb},%zmm1,%zmm5
403109: vpermf32x4 $0xff,0x604d00(%rdx),%zmm3
403114: vfmadd231ps %zmm4{cccc},%zmm2,%zmm5
40311a: add %rsi,%rdx
40311d: vfmadd231ps %zmm4{dddd},%zmm3,%zmm5
403123: vmovnrngoaps %zmm5,0x604d40(%rax,%r14,1)
40312e: add %rsi,%rax
403131: cmp $0x3938700,%ecx
403137: jb 4030d0
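
For readers who'd rather see the kernel in C++ than in KNC assembly, here's a rough sketch of the kind of loop blu describes (my own reconstruction from his description, not his actual benchmark code) - two 4x4 matrices multiplied over and over, so what gets measured is the quality of the SIMD codegen rather than memory bandwidth:

Code:
#include <cstddef>

struct alignas(64) Mat4 { float m[4][4]; };

// row_i(C) = sum_k A[i][k] * row_k(B) - the broadcast-and-FMA pattern visible in the asm above.
inline void mul(const Mat4& a, const Mat4& b, Mat4& c) {
    for (int i = 0; i < 4; ++i) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
        for (int k = 0; k < 4; ++k)
            for (int j = 0; j < 4; ++j)
                acc[j] += a.m[i][k] * b.m[k][j];  // a.m[i][k] is the broadcast scalar
        for (int j = 0; j < 4; ++j)
            c.m[i][j] = acc[j];
    }
}

int main() {
    Mat4 a{}, b{}, c{};
    // The 0x3938700 in the asm's cmp is 60,000,000 iterations; loop a similar amount and
    // time it externally to get flops/clock (4x4x4 mul-adds = 128 flops per iteration).
    for (std::size_t i = 0; i < 60000000; ++i)
        mul(a, b, c);
    return static_cast<int>(c.m[0][0]);  // keep the result live so the loop isn't optimized away
}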
 

Pasedo

Member
So I've been doing a little reading, and it seems it's very easy to port games from Unreal Engine 4 over to the Switch/X1. NVIDIA themselves said it took only a week to port the Elemental demo over, plus a little extra polishing time. And a lot of 3rd party games are on UE4. I imagine a straight port in its current form would mean drastically cut graphics and probably terrible frame rates. But is it possible for devs to use two game engines at a time? For example, I imagine the engine Nvidia developed with Nintendo is super efficient and gets the most out of the hardware, e.g. a lot of FP16 techniques on perhaps custom and fixed shaders etc. So what if devs utilised some of those features on, say, certain surfaces and terrain that are probably less important visually, and used the rest of the resources to port the original assets and code over at a higher native resolution? This should help to bump up frame rates, yeah?
 