
What is the actual power of the Nintendo Switch?

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Oh, well there you go. What makes the X1 different from other big.LITTLE ARM SOCs in that regard? Could the "custom" (lol) X1 in Switch have been modified to change this? Or has it been confirmed that all of the engineering hours Nvidia put into the SOC were strictly software related?
It all originates from the fact TX1 uses an in-house coherence interconnect (which is strictly cluster-switching). Had NV used a standard (for the time) ARM CCI it'd have been fine for HMP.

http://www.anandtech.com/show/8811/nvidia-tegra-x1-preview said:
However, rather than a somewhat standard big.LITTLE configuration as one might expect, NVIDIA continues to use their own unique system. This includes a custom interconnect rather than ARM's CCI-400, and cluster migration rather than global task scheduling which exposes all eight cores to userspace applications. It's important to note that NVIDIA's solution is cache coherent, so this system won't suffer from the power/performance penalties that one might expect given experience with previous SoCs that use cluster migration.

Furthermore, from some switch system-call reverse engineering, 4 cores are visible to user processes.

[ed] Sorry, just spotted this:

It doesn't affect how the code is written, but it does affect how long it takes to execute.
ARM is a RISC CPU, x86 isn't. In other words, something that is a single instruction taking a single cycle on x86 could be multiple instructions taking multiple cycles on a RISC CPU.
That's an utterly reductive explanation of the conceptual differences between RISC and CISC.

Most ARM and x86 same-semantic instructions take the same or similar number of clocks. What ARM lacks vs x86 is the ability of ALU ops to take a mem operand [which is never free in terms of clocks on x86], but ARM more than compensates that with larger register files, both GPR and SIMD.

Actually, if what you said was remotely true, all benchmarks would indicate higher performance per-clock for similarly-specced x86 vs ARM, and a mere glance at popular benchmarks like geekbench shows that's not the case.
 
It all originates from the fact TX1 uses an in-house coherence interconnect (which is strictly cluster-switching). Had NV used a standard (for the time) ARM CCI it'd have been fine for HMP.



Furthermore, from some switch system-call reverse engineering, 4 cores are visible to user processes.

Ah, thanks. Seems like both Nvidia and Nintendo left something on the table, here, then. Kind of adds weight to the theory that Nintendo got X1 for a good price simply because Nvidia had foundry time they had to fill due to lack of Shield sales.
 

Vinci

Danish
Its true power is that I can play Breath of the Wild anywhere and at any time, with the ability to bounce in and out of the game within seconds.

But to your specific definition of 'power': Based on what I've seen, it's set somewhere between last gen and this one, possibly edging slightly closer to this one.
 

FinalAres

Member
Its true power is that I can play Breath of the Wild anywhere and at any time, with the ability to bounce in and out of the game within seconds.

But to your specific definition of 'power': Based on what I've seen, it's set somewhere between last gen and this one, possibly edging slightly closer to this one.

Again, I'm going to call you out on this, because whilst I agree that the Switch is great for reasons other than its power, I don't like the fanboy-marketing bullshit.

The specific definition of power you're talking about also happens to be the universally accepted definition talked about in games. The power is also much closer to last-gen and nowhere near this gen.

The Switch is what it is, and it's better for it. Do I want a handheld as powerful as an Xbox One with 30 minutes of battery that leaves me with hot hands? Hell no. The Switch is a beast for its size and it's probably the most impressive piece of gaming tech on the market at the moment. There! Look, I was able to be complimentary about the Switch without lying through my teeth.
 

Rodin

Member
So... After a few comments in this thread, my revised version looks like this:

Xbox 360 - 240 GFLOPS - 48 vector processors * 10 FLOPs/cycle per processor * 500MHz
Wii U - probably 170 GFLOPS - 8 clusters * 20 cores/cluster * 550MHz * 2 (20 cores/cluster seems to be speculation; might be 32 or, a lot less likely, 40)
Xbox One - 1300 GFLOPS - 768 cores * 853MHz * 2
PS4 - 1800 GFLOPS - 1152 cores * 800MHz * 2

Nintendo Switch docked: ~400 GFLOPS - 256 cores * 768MHz * 2 (there was a rumor of ~900MHz, but I didn't find a good source, so sticking with 768)
Nintendo Switch undocked: ~200 GFLOPS - 256 cores * 384MHz * 2

So, better than a Wii U when portable, and around 1/3 of an XB1 when docked.

Notes:
Removed PS3 because Cell is strange and hard to compare directly.
The memory BW of the Switch is lower than the XB1's: ~3x slower for main memory, plus the XB1 has 32MB of ~8x faster memory.
XB1 has 8 x86 cores @ 1.7GHz (~6 available for games), while Switch has 4 ARM cores probably at 1GHz (3 available for games). So probably 2-3x slower overall?
This is a rough estimate. It doesn't mean that a game that runs on XB1 can run on Switch at 0.6x resolution (0.6^2 = 0.36 -> about 1/3 of the total pixels).
The 360 numbers are off; they should be only a bit above 200 GFLOPS IIRC. The architecture is also pretty ancient at this point (R600 is about 10 years old).
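For anyone wanting to sanity-check the quoted list, each per-GPU figure is just shader units × FLOPs per unit per clock × clock. A minimal Python sketch using the same (partly speculative, especially for Switch) numbers:

```python
# Rough GFLOPS arithmetic behind the quoted list. Clocks and unit counts for Switch
# are the rumored figures above, not confirmed specs.
def gflops(units, flops_per_unit_per_clock, clock_mhz):
    return units * flops_per_unit_per_clock * clock_mhz / 1000.0

print(gflops(48, 10, 500))    # Xbox 360: 240.0 (refined to ~216 later in the thread, per the vec4+1 ALUs)
print(gflops(768, 2, 853))    # Xbox One: ~1310
print(gflops(1152, 2, 800))   # PS4: ~1843
print(gflops(256, 2, 768))    # Switch docked (assumed 768MHz): ~393
print(gflops(256, 2, 384))    # Switch undocked (assumed 384MHz): ~197
```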


The specific definition of power you're talking about also happens to be the universally accepted definition talked about in games. The power is also much closer to last-gen and nowhere near this gen.
Nah, it's in the middle. Maybe a bit closer to Wii U than Xbox One, but yeah.
 

Buggy Loop

Member
All those GFLOPS calculations and comparisons don't mean squat when you compare a GPU architecture and pipeline that is over 10 years old vs one that is ~2 years old.
 
The handheld mode is basically a Wii U+ (BOTW, for example, runs better in that mode than on Wii U).
The docked mode is like 2.5x that, which should allow in many cases for a decent resolution bump (rough pixel math below).
You saw it with MK8D, which is MK8 but at 1080p on the TV.
Switch also has more RAM, which helps; I believe the double item-boxes were disabled in the original due to a RAM issue.
The architecture is also really modern compared to Wii U, which helps.
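A quick illustration of the pixel math behind that resolution bump (frame sizes are just standard 720p/1080p; nothing here is a confirmed spec):

```python
# Pixel-count ratio behind the docked resolution bump (illustrative only).
pixels_720p = 1280 * 720       # 921,600
pixels_1080p = 1920 * 1080     # 2,073,600
print(pixels_1080p / pixels_720p)  # 2.25 -- a 720p portable game needs ~2.25x the pixel
                                   # throughput to hit 1080p docked, which fits inside the
                                   # ~2-2.5x docked GPU uplift discussed in this thread.
```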

It's not good to only use Wii U ports as a comparison, because those games were engineered based on Wii U's architecture. The true comparisons of what it can do vs Wii U will come with ground-up games engineered to make the most of the extra RAM and the Tegra X1, with Mario Odyssey and others.
 
Kid Gohan most definitely. Adult Gohan is the Wii U.

In that case, perhaps the Switch is adult Gohan after he began training again and regained his ultimate power. That would work for the Switch, since Gohan's power is now in a relevant range but is still (currently) not a match for Goku at full power (PS4). :)

Heh, that was fun.
 

Rodin

Member
It's not good to only use Wii U ports as a comparison, because those games were engineered based on Wii U's architecture. The true comparisons of what it can do vs Wii U will come with ground-up games engineered to make the most of the extra RAM and the Tegra X1, with Mario Odyssey and others.
Considering that 3D World was finished around fall 2013, I'm pretty sure development for Odyssey started on Wii U as well. It still looks much better despite being a game with open-ended areas vs small linear levels.
 

Hilarion

Member
All those GFLOPS calculations and comparisons don't mean squat when you compare a GPU architecture and pipeline that is over 10 years old vs one that is ~2 years old.

XBox360 still used DirectX8.

You're right, of course. While on paper the Switch's GPU is roughly on par with the 360's when undocked and double the 360's when docked, the use of a modern architecture and an API not from 10 years ago means that the difference is much larger. Also, 8x as much RAM.
 

Astral Dog

Member
In that case, perhaps the Switch is adult Gohan after he began training again and regained his ultimate power. That would work for the Switch, since Gohan's power is now in a relevant range but is still (currently) not a match for Goku at full power (PS4). :)

Heh, that was fun.
Goku did a very special legend ritual with five other Saiyans, trained with Gods and Angels for years to achieve Godhood, and this punk Gohan comes close in a few days with an old man dancing. 😬
 

Rolf NB

Member
That's an utterly reductive explanation of the conceptual differences between RISC and CISC.

Most ARM and x86 same-semantic instructions take the same or similar number of clocks. What ARM lacks vs x86 is the ability of ALU ops to take a mem operand [which is never free in terms of clocks on x86], but ARM more than compensates that with larger register files, both GPR and SIMD.

Actually, if what you said was remotely true, all benchmarks would indicate higher performance per-clock for similarly-specced x86 vs ARM, and a mere glance at popular benchmarks like geekbench shows that's not the case.
x86_64 caught up in GPRs, 2003ish.

There is no ISA-induced difference in what the execution cores can or cannot do; nobody implements x86-aware execution units anymore, and x86 is just a translation frontend at this point.

The real strength of x86 is that it is effectively transparent code compression, so relatively small caches can already do wonders and there's little memory interference between code and data.

XBox360 still used DirectX8.
I'm not sure you're sure what that is supposed to even mean, but no.
 

Jawmuncher

Member
Did MS say anything about why Minecraft has less draw distance on Switch and Wii U, hardware-wise? Found that really odd.
 

Costia

Member
The 360 numbers are off; they should be only a bit above 200 GFLOPS IIRC. The architecture is also pretty ancient at this point (R600 is about 10 years old).
Nah, it's in the middle. Maybe a bit closer to Wii U than Xbox One, but yeah.
The 360 numbers are not off; I even provided the calculation.
If you think they are off, provide a source and I will fix it.

That's an utterly reductive explanation of the conceptual differences between RISC and CISC....
I compared Switch (ARM) to XB1 (x86) in a previous post.
Since I am not familiar with Jaguar's and ARM's implementations, this was meant as a warning not to take the CPU frequency as a base for direct comparison, since the philosophy behind their architectures is different.
If in reality the difference isn't significant, it just makes my previous comparison more "accurate".
Can you link to benchmarks? I was looking for some, but couldn't find one where I could guarantee it's the same code/version running on both.
Most ARM and x86 same-semantic instructions take the same or similar number of clocks. What ARM lacks vs x86 is the ability of ALU ops to take a mem operand [which is never free in terms of clocks on x86], but ARM more than compensates that with larger register files, both GPR and SIMD.
Actually, if what you said was remotely true, all benchmarks would indicate higher performance per-clock for similarly-specced x86 vs ARM, and a mere glance at popular benchmarks like geekbench shows that's not the case.
Why isn't it free? I would assume that as long as it's a cache hit and the CPU has enough of "other" stuff to do, it will be free as far as throughput is concerned.
Is this still true on Intel x86 vs ARM?
SSE is comparable to NEON?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
x86_64 caught up in GPRs, 2003ish.
Temporarily. ARM64 has 31 GPRs; x86-64 still has 16.

There is no ISA-induced difference in what the execution cores can or cannot do; nobody implements x86-aware execution units anymore, and x86 is just a translation frontend at this point.
There are plenty of ISA-induced differences - run an EBS profiler some time and watch the decoder stalls in the frontend on a reasonably big codebase.

The real strength of x86 is that it is effectively transparent code compression, so relatively small caches can already do wonders and there's little memory interference between code and data.
"Relatively small caches", as in the extra 1.5K-uop cache that SNB introduced to tackle the fronted congestion issues (on top of the L1 I-cache)?

"Effectively transparent code compression", as in average op-length of 5+bytes in modern, richly REX-prefixed x86-64 code, going to 6+bytes in VEX-prefixed AVX code?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
I compared Switch (ARM) to XB1 (x86) in a previous post.
Since I am not familiar with Jaguar's and ARM's implementations, this was meant as a warning not to take the CPU frequency as a base for direct comparison, since the philosophy behind their architectures is different.
The philosophy is different, but not on the level you described - there is practically no difference there today.

If in reality the difference isn't significant, it just makes my previous comparison more "accurate".
Well, not really. As things are today, RISC and CISC have ops of the same complexity (on a semantic level), but RISC is a load/store paradigm with 'reduced instruction decoder complexity', whereas CISC can have mem operands pretty much anywhere and also very long (in terms of encoding sequence) ops, resulting in complex decoders. Apropos, x86 is an example of how not to design a CISC ISA:

http://www.agner.org/optimize/forwardcom.pdf said:
Some commonly used instruction sets are poorly designed from the beginning. These systems have been augmented many times with extensions and patches. One of the worst cases is the widely used x86 instruction set and its many extensions. The x86 instruction set is the result of a long history of short-sighted extensions and patches. The result of this development history is a very complicated architecture with thousands of different instruction codes, which is very difficult and costly to decode in a microprocessor. We need to learn from past mistakes in order to make better choices when designing a new instruction set architecture and the software that supports it.

Can you link to benchmarks? I was looking for some, but couldn't find one where I could guarantee it's the same code/version running on both.
We've had pages and pages of those in past NX "speculation" threads - just look up any geekbench Jaguar/A57 comparison and normalize by clock as needed.

Why isn't it free? I would assume that as long as it's a cache hit and the CPU has enough of "other" stuff to do, it will be free as far as throughput is concerned.
Cache is not free - a cache hit still adds a small amount of clocks (2-3 for L1, much more for L2, etc) to the op latency, so a CISC op that has a mem operand normally has the overall latency of a 'load op + consumer op' pair. And throughput is of no relevance to the CISC/RISC distinction - superscalarity is orthogonal to these paradigms (though, historically, it was introduced with RISC).
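As a rough illustration of that 'load op + consumer op' point (cycle counts here are assumed and vary by microarchitecture):

```python
# Illustrative latency arithmetic only; exact cycle counts differ per core design.
L1_LOAD_LATENCY = 3   # assumed L1 load-to-use latency in cycles (the 2-3 figure quoted above)
ALU_LATENCY     = 1   # a simple dependent integer ALU op

# An x86 ALU op with a memory operand behaves like a load feeding an ALU op:
print(L1_LOAD_LATENCY + ALU_LATENCY)  # ~4 cycles of dependent latency, vs 1 cycle for the
                                      # register-only form -- the mem operand is not "free",
                                      # even though throughput can still be 1 op/cycle.
```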

SSE is comparable to NEON?
SSE is an obsolete 2-arg SIMD. AVX128 and AVX256 (and soon AVX512) are comparable to NEON, as they are all modern SIMD ISAs. But SIMD paradigms are moving on as we speak.
 
Thing is, PS4 is barely 900p these days and Xbox One is at 720p.
So what will the Switch be - 576p docked, 360p undocked?
Devs can put out 1080p for all the games they have worked on since last generation, but when you are prioritizing resolution, you are sacrificing power that could go towards framerate and graphical fidelity in exchange. It's good to have some balance between all 3, or prioritize one or two over the others.

We will likely never see Switch games at Vita resolution - 720p at the lowest (or occasionally ~675p with dynamic resolution). But this is entirely up to the devs. We could get 1080p Call of Duty games if devs choose to do so, but it will be at the cost of graphical fidelity (lower textures, polygons, lighting, shadows, anti-aliasing, effects) and/or framerate.

So far we've seen games with lower resolution, games with lower resolution but nearly identical graphical fidelity (Snake Pass) apart from some lower-quality effects and framerate, as well as games with identical resolution to the PS4 version but with framerate and graphical fidelity sacrificed (Lego City Undercover).
 

Astral Dog

Member
That's not how it works. 360 and PS3 can handle 1080p games, but when you are prioritizing resolution, you are sacrificing power that could go towards framerate and graphical fidelity in exchange. It's good to have some balance between all 3, or prioritize one or two over the others.

We will likely never see Switch games at Vita resolution - 720p at the lowest (or occasionally ~675p with dynamic resolution). But this is entirely up to the devs. We could get 1080p Call of Duty games if devs choose to do so, but it will be at the cost of graphical fidelity (lower textures, polygons, lighting, shadows, anti-aliasing) and/or framerate.
Snake Pass is sub-Vita, and I assume it won't be the only one.
 

dogen

Member
Compared to Wii U:
In handheld mode the GPU is like a Wii U plus (think 3DS > New 3DS), but it has the benefit of 3x the usable RAM, a significantly stronger CPU, and an architecture about 5-10 years newer (yes, I realize that's a wide gap, but the architecture on the Wii U was a weird hodgepodge of older and newer).

The New 3DS had twice as many cores as the original, and each was over 3x faster.
 
Snake Pass is sub-Vita, and I assume it won't be the only one.
For undocked when in dynamic mode, yeah. Docked mode is 720p, but dynamic resolution drops it at some moments into the 600p range.

If CoD ends up at 720p and 60fps while maximizing fidelity to be close to the PS4's, I'll be happy. I think we may see this more than the 1080p route with less fidelity for third party games.
 
Snake Pass is sub-Vita, and I assume it won't be the only one.

Many Vita games were sub-Vita.

And they weren't lower-res versions of a PS4 game with a couple of effects missing.

It's a weird way of describing the game's resolution, is basically all I'm saying.
 

Rodin

Member
The 360 numbers are not off; I even provided the calculation.
If you think they are off, provide a source and I will fix it.

The 360 GPU architecture is different: for every five shaders, there is one that does a single floating point operation instead of the usual two. So it would be 216 GFLOPS.

Aside from that, Xbox One doesn't use 8 cores for games.
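For reference, here is how that vec4+1 layout produces the 216 GFLOPS figure (a sketch based on the descriptions in this thread, not an official spec sheet):

```python
# Xenos (360 GPU) FLOPS with the vec4+1 ALU layout described in the thread (illustrative).
alus        = 48     # unified shader ALUs
vec_lanes   = 4      # vector lanes, each doing a multiply-add = 2 FLOPs/clock
scalar_flop = 1      # the 5th lane only manages 1 FLOP/clock
clock_ghz   = 0.5

flops_per_clock = alus * (vec_lanes * 2 + scalar_flop)   # 48 * 9 = 432
print(flops_per_clock * clock_ghz)                       # 216.0 GFLOPS -- vs 240 if the
                                                         # scalar lane also did a full MADD
```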
 

ec0ec0

Member
This is false. GC to Wii was the exact same architecture boosted up; Switch has a drastically more modern featureset than Wii U and is capable of far more than a Wii U when docked.

Why does that matter if that extra power is only going to be used for going from 720p undocked to 900p/1080p docked? It is not going to be used to make games with better physics, AI, graphics, textures, or bigger, more complex worlds, etc...

In those regards, Switch games will look like games from the Wii U generation.
 
Why does that matter if that extra power is only going to be used for going from 720p undocked to 900p/1080p docked? It is not going to be used to make games with better physics, AI, graphics, textures, or bigger, more complex worlds, etc...

In those regards, Switch games will look like games from the Wii U generation.

Except, you know, Lego City has all the extra visual upgrades seen in the PS4 version, running at 720p undocked. It's a night-and-day difference with the physically based rendering, and on top of that you get better lighting elsewhere, and better shadows, normal maps, an improved draw distance and higher quality assets.

I was talking about GFLOPS when mentioning Switch docked, not what the games will look like.

Snake Pass is sub-Vita, and I assume it won't be the only one.

So what? The most demanding Vita games at launch went down to as low as 360p. But strong antialiasing meant that few actually noticed that Everybody's Golf and NFS: Most Wanted were rendering at 1/4 of the resolution of their PS3 counterparts. That's not the case with Snake Pass on Switch - the resolution doesn't drop nearly that far, even in handheld mode.

Late Vita stuff like World of Final Fantasy and Atelier Firis naturally appears to go even lower than 360p; the former looks like a mess even on the Vita's display - there's no antialiasing at all and it's significantly below native resolution.
 
Seems like a step above Wii U, a few steps behind the PS4 and Xbox One. It's not my primary means of playing third party stuff and I'm in it for the first party stuff, so I don't really mind.
 

Astral Dog

Member
Many Vita games were sub-Vita.

And they weren't lower-res versions of a PS4 game with a couple of effects missing.

It's a weird way of describing the game's resolution, is basically all I'm saying.
You are right, I think it's actually sub-SD in handheld mode D:

But most games should have that 720p minimum locked, with dynamic res here and there.
 

gamerMan

Member
Obviously, the Switch is not as powerful as the PS4, but I think it can run all modern games with specially designed assets and lower resolutions. Games haven't advanced a whole lot since last generation aside from resolutions. Games are still using the same AI, lighting models, and physics. In my opinion, Grand Theft Auto 5 ran on last generation but it is still far more impressive than anything released this generation.

Right now there is too much risk for a AAA 3rd party developer to spend the time to get their game running on it. If the Switch's install base gets to 15 million, then developers are going to figure out how to get their games on there.
 

matthewuk

Member
Let's see how they compare in terms of PC specs.

Xbox 360

Pentium 4 @ 3.2GHz (tri-core)
512MB RAM
Radeon X1900 XT GPU, no VRAM but 10MB eDRAM

A typical 2006 Dell with a top graphics card (for 2006).

PS3

Pentium 4 @ 3.2GHz (single core, but 8 additional FPU/SIMD units)
256MB RAM
Nvidia 7900, 256MB VRAM

An exotic 2006 Linux PC with high-end graphics but low RAM.

Wii U

Pentium M (Dothan) @ 1.2GHz (tri-core)
2GB RAM
Radeon HD 6450M, no VRAM but 32MB eDRAM

A midrange laptop with a modest CPU but better-than-average dedicated graphics.

Switch

Core i3 ULV @ 1GHz (quad core)
4GB RAM
Nvidia GeForce 830M / TX1, no VRAM

A Microsoft Surface-style configuration with the dedicated graphics option.

Note: this isn't really to say how powerful the Switch is. It's just a thought experiment.
 

Costia

Member
The 360 GPU architecture is different: for every five shaders, there is one that does a single floating point operation instead of the usual two. So it would be 216 GFLOPS.
Aside from that, Xbox One doesn't use 8 cores for games.

My calculation is 5 multiply-add operations/cycle per vector unit, so I think it already takes what you said into account...
The ~6/8 XB1 cores point is mentioned in my post.

The philosophy is different, but not on the level you described - there is practically no difference there today....
Thanks for the explanations.
What about Intel/Apple?
Is their advantage mostly better prediction/instruction-reordering optimizations?
IIRC Apple's version of ARM has a bigger instruction reorder buffer.
 
Well, 240p will not do most games justice in visuals.
If the GPU was at least as powerful as the GameCube's, we'd have Wii games in textures, polygons, AA and lighting at 240p. It has a higher pixel density because of the small screen, though, so it would look way better on the 3DS screen vs a TV.

Heck, imagine if Nintendo came out with a 480p screen with the Wii GPU and the New 3DS CPU and RAM, and maybe better battery life. Call it the New 3DS, sell it for 200 to 230, and give devs the option to give 3DS games the PS4 Pro treatment, and it would have sold like crazy. Of course, it's 20/20 hindsight, but Ninty really dun goofed by upgrading everything but the GPU.
 

matthewuk

Member
No, it's definitely more powerful than a 360; it's more an exercise in showing how things have gotten smaller, more efficient and more powerful. Don't let the modest specs fool you. A good example is my laptop: its integrated graphics are a cheap part to keep costs down, but it beats the socks off the 360, which was made from high-end parts in its day.
 

Rodin

Member
My calculation is 5 multiply-add operations/cycle per vector unit, so I think it already takes what you said into account...
The ~6/8 XB1 cores point is mentioned in my post.
Maybe my wording wasn't perfect, but I found two posts from more tech-savvy people than me.

It's probably more accurate to label Xenos as 216 GFLOPS given the vec4+1 nature of the ALUs, but anyway. There will be some other difficulties in comparing to the vec5 of the R7xx generation.

Similarly, there are some quirky... quirks of G7x that would make the theoretical flops laughable.

Just for reference, since we are flopping around here, Xenos pushes a "mere" ~216 GFLOPS. Its architecture is a bit unique, and 1 out of every 5 shaders does a single floating point operation per cycle rather than 2.

Edit: Damn you, Al!

Hope these help.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Thanks for the explanations.
What about Intel/Apple?
Is their advantage mostly better prediction/instruction-reordering optimizations?
IIRC Apple's version of ARM has a bigger instruction reorder buffer.
Intel/Apple's advantage stems mainly from higher transistor counts - their target markets allow them to use fab nodes and die sizes that come at a premium. Ergo they have wider designs, better caches, larger LLCs - all that jazz. OK, Apple's team is also damned talented - they currently produce arguably the best designs on the planet in their class.
 
If the GPU was at least as powerful as the GameCube's, we'd have Wii games in textures, polygons, AA and lighting at 240p. It has a higher pixel density because of the small screen, though, so it would look way better on the 3DS screen vs a TV.

Heck, imagine if Nintendo came out with a 480p screen with the Wii GPU and the New 3DS CPU and RAM, and maybe better battery life. Call it the New 3DS, sell it for 200 to 230, and give devs the option to give 3DS games the PS4 Pro treatment, and it would have sold like crazy. Of course, it's 20/20 hindsight, but Ninty really dun goofed by upgrading everything but the GPU.

To have a 4x increase in resolution, the GPU would have to be around 4x stronger than the original 3DS's. Wii's GPU is not 4x stronger than the 3DS's, and it is missing the fixed-function shaders that were used to make current games on the 3DS look as good as they do on the system. Besides that, it would probably run too hot for a portable form factor, and would have BC issues without additional hardware.
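The 4x figure is just pixel count; a minimal sketch, assuming a hypothetical screen that doubles the 3DS's 400x240 panel in each dimension:

```python
# Why a 480p-class screen needs roughly 4x the GPU of a 240p target (pixels only, illustrative).
pixels_3ds  = 400 * 240   # 3DS top screen
pixels_hypo = 800 * 480   # hypothetical doubled-resolution screen
print(pixels_hypo / pixels_3ds)  # 4.0 -- fill rate and shading work scale roughly with pixel count
```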

How does the CPU in the Switch compare to the Jaguar CPU in the PS4 / Xbox One?

Blu and I did some math with some benchmarks. The Switch's CPU came out to roughly 80% of the performance of the PS4's per core. When we consider the diminishing returns of splitting tasks among more than 3 cores, and not knowing how much of the 4th core is available to Switch devs, we may be looking at something like 50% of the CPU performance of the PS4 at full utilization.
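A rough reconstruction of how that estimate shakes out (the per-core ratio and core counts are the assumptions stated above; purely illustrative):

```python
# Back-of-the-envelope CPU comparison (assumed numbers from the post above).
per_core_ratio    = 0.8   # Switch A57 core vs PS4 Jaguar core, per the benchmark math
switch_game_cores = 3     # cores known to be available to Switch games
ps4_game_cores    = 6     # Jaguar cores commonly cited as available to PS4 games

print(per_core_ratio * switch_game_cores / ps4_game_cores)  # ~0.4 -- naive ratio; imperfect
                                                            # scaling across the wider Jaguar
                                                            # setup pushes the estimate toward ~50%
```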
 