
SemiAccurate: Nintendo NX handheld to use Nvidia Tegra-based SoC

Wii U turned out the way it did for backwards compatibility. For them to make NX that weak, they'd either be going for GBA-level battery life or intentionally trying to make it as weak as possible just because, and I'm not even sure if the former can be done due to how low it would need to scale down. Assuming a Tegra K1 (nothing before that would be cost effective or efficient), they'd need to clock it at 100MHz. Would they really go that far? I can understand how you don't want to be burned again and choose to remain skeptical, though.

Also, keep in mind that 3DS was originally meant to use a Tegra, so it's possible that they just didn't have time to choose a better design after that fell through.




It's not because of BC that Wii U ended up with 3 cores or such a low-tier GPU with 160 shader units. It could still have had BC with 6 cores and a 320 or 480 shader unit part. They aimed for a certain power consumption.
 

Vena

Member
With Nintendo's track record, under normal circumstances i'd never consider them going for it. But the "industry leading chip" and "modern custom chip" lend more credibility to it being an option.

The track record which would have put a fresh off the presses Tegra chip in the 3DS? That track record?
 

MuchoMalo

Banned
It's not because of BC that Wii U ended up with 3 cores or such a low-tier GPU with 160 shader units. It could still have had BC with 6 cores and a 320 or 480 shader unit part. They aimed for a certain power consumption.

It's because of BC that they had to use the Wii's CPU, take up a ton of die space with eDRAM to substitute for Wii's embedded memory (which also kept them on 40nm for the GPU), and spend extra time modifying an older architecture with the fixed-function hardware that Wii required. Wii U wasn't the best they could do in a 30W console; not by a long shot.

Besides, if they make the handheld as weak as you think, the console would only need to be a GC->Wii kind of jump from Wii U in the best case (weaker than Wii U in the worst case) for any sort of sharing to work.
 

ozfunghi

Member
The track record which would have put a fresh off the presses Tegra chip in the 3DS? That track record?

Please. Spare me. You've never heard the phrase "lateral thinking with withered technology"?

"would have put"... right. And what info is there that this would have been a spankin' new chip, and not an older chip, stripped and downclocked, with some extra features bolted on top? So to answer your question: no, not that trackrecord. The one they have with hardware they actually did use. In real life. At least the last decade.
 

NeOak

Member
19.4 watts in a gaming benchmark here.



I already knew that, but he based his claim around Tegra X1 drawing 5W while under load, when it's actually closer to 20W. Do you seriously think the next Tegra is going to be greater than an order of magnitude more power efficient?

Thanks for the link.

It's because of BC that they had to use the Wii's CPU, take up a ton of die space with eDRAM to substitute for Wii's embedded memory (which also kept them on 40nm for the GPU), and spend extra time modifying an older architecture with the fixed-function hardware that Wii required. Wii U wasn't the best they could do in a 30W console; not by a long shot.

Besides, if they make the handheld as weak as you think, the console would only need to be a GC->Wii kind of jump from Wii U in the best case (weaker than Wii U in the worst case) for any sort of sharing to work.

I believe making more out of it would have been exponentially more expensive while keeping it at 30W with that process node.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
It's worth keeping in mind the difference between perf/W at the highest achievable clocks and perf/W at the kind of clocks you would be likely to see in a handheld. When Nvidia were claiming a 2x perf/W increase from TK1 to TX1, they were of course cherry-picking the test, but they also would have cherry-picked the point at which the perf/W gap was the greatest, and that point is likely to be a lot closer to the clocks we'd be looking at for a handheld than the max achievable clocks. This isn't to say that I think the real-world perf/W ratio between the TX1 and TK1 in a handheld would be anywhere close to 2, but I do think it's at least somewhere above 1.
Indeed. Power draw does not scale linearly with clock.

Ditto for the difference between Maxwell and Pascal, comparing the performance of GM200 versus GP100 at 250W+ envelopes won't necessarily translate to the benefits when moving from one architecture to the other in a 2W TDP, which could be much higher or much lower for all we know.
Agreed.

Incidentally, on the subject of the transistor implication of Pascal's increased FP64 support, it appears that GP100 is the only Pascal die that's actually making the shift to 64 ALU SMs with half-rate FP64 support, and GP104 (and presumably all smaller chips) are apparently sticking to the 128 ALU Maxwell SMs with very limited FP64 support.
Ah. That bit about the fp64 had escaped me somehow (I know, how did it do that, amidst that flurry of reviews and analyses). So Maxwell SMs are here to stay, good. A 'hypothetical Maxwell @16FF' does not seem that hypothetical anymore.

This seems to be borne out with GP104 having a similar ALU density per billion transistors to Maxwell, and with GTX 1080 benchmarks seeing a performance boost roughly in line with raw floating point performance growth (i.e. indicating few architectural changes). This could actually be a good thing for Nintendo, as they could in theory get a 28nm "Maxwell" chip for a home console and a 16nm "Pascal" chip for the handheld with the same SMs across both.
Indeed.
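To put rough numbers on that point: dynamic power goes roughly as C·V²·f, and the voltage needed to hold a given clock rises with that clock, so power falls faster than performance as you clock down. Here's a quick Python sketch of that first-order relation; the voltage/frequency pairs are made up for illustration and aren't measurements of any Tegra part.

Code:
# Why perf/W improves at lower clocks: dynamic power ~ C * V^2 * f, and the
# supply voltage needed to sustain a clock rises with the clock, so power
# grows faster than performance. All numbers are illustrative, not measured.

def dynamic_power(freq_ghz, volts, cap=1.0):
    """Relative dynamic power in arbitrary units, P ~ C * V^2 * f."""
    return cap * volts ** 2 * freq_ghz

# Hypothetical DVFS points: (clock in GHz, supply voltage in V)
for freq, volts in [(0.3, 0.70), (0.6, 0.80), (1.0, 1.00)]:
    power = dynamic_power(freq, volts)
    print(f"{freq * 1000:.0f} MHz @ {volts:.2f} V -> "
          f"relative power {power:.2f}, relative perf/W {freq / power:.2f}")

The exact curve depends on the process and how far the voltage can actually be dropped, but the shape is why a wide, slow GPU beats a narrow, fast one at the same performance.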
 

Thraktor

Member
Please. Spare me. You've never heard the phrase "lateral thinking with withered technology"?

"would have put"... right. And what info is there that this would have been a spankin' new chip, and not an older chip, stripped and downclocked, with some extra features bolted on top? So to answer your question: no, not that trackrecord. The one they have with hardware they actually did use. In real life. At least the last decade.

There are many sources confirming that the 3DS was originally specified to use the Tegra 2, even to the point of distributing dev kits using the chip (which was, at the time, the most advanced SoC Nvidia had on offer). Why they switched to their internally-developed chip with Pica200 graphics IP is a different matter, but if they were willing to use a top-of-the-line SoC at that time, when there was little need to (as the DS had shown them they could succeed without competing on performance) why wouldn't they be willing to do so now, when there would be a benefit to doing so (in being able to more easily develop games which run across both it and a home console)?
 

Earendil

Member
Indeed. Power draw does not scale linearly with clock.


Agreed.


Ah. That bit about the fp64 had escaped me somehow (I know, how did it do that, amidst that flurry of reviews and analyses). So Maxwell SMs are here to stay, good. A 'hypothetical Maxwell @16FF' does not seem that hypothetical anymore.


Indeed.

It's worth keeping in mind the difference between perf/W at the highest achievable clocks and perf/W at the kind of clocks you would be likely to see in a handheld. When Nvidia were claiming a 2x perf/W increase from TK1 to TX1, they were of course cherry-picking the test, but they also would have cherry-picked the point at which the perf/W gap was the greatest, and that point is likely to be a lot closer to the clocks we'd be looking at for a handheld than the max achievable clocks. This isn't to say that I think the real-world perf/W ratio between the TX1 and TK1 in a handheld would be anywhere close to 2, but I do think it's at least somewhere above 1.

Ditto for the difference between Maxwell and Pascal, comparing the performance of GM200 versus GP100 at 250W+ envelopes won't necessarily translate to the benefits when moving from one architecture to the other in a 2W TDP, which could be much higher or much lower for all we know. Incidentally, on the subject of the transistor implication of Pascal's increased FP64 support, it appears that GP100 is the only Pascal die that's actually making the shift to 64 ALU SMs with half-rate FP64 support, and GP104 (and presumably all smaller chips) are apparently sticking to the 128 ALU Maxwell SMs with very limited FP64 support. This seems to be borne out with GP104 having a similar ALU density per billion transistors to Maxwell, and with GTX 1080 benchmarks seeing a performance boost roughly in line with raw floating point performance growth (i.e. indicating few architectural changes). This could actually be a good thing for Nintendo, as they could in theory get a 28nm "Maxwell" chip for a home console and a 16nm "Pascal" chip for the handheld with the same SMs across both.

Posts like this help me to understand how my wife feels when I talk shop with my buddies.
 

ozfunghi

Member
There are many sources confirming that the 3DS was originally specified to use the Tegra 2, even to the point of distributing dev kits using the chip (which was, at the time, the most advanced SoC Nvidia had on offer). Why they switched to their internally-developed chip with Pica200 graphics IP is a different matter, but if they were willing to use a top-of-the-line SoC at that time, when there was little need to (as the DS had shown them they could succeed without competing on performance) why wouldn't they be willing to do so now, when there would be a benefit to doing so (in being able to more easily develop games which run across both it and a home console)?

No problem. They considered a modern chip for 3DS. That hardly constitutes a "track record" though. Which was the point he was trying to make. I never claimed they wouldn't or couldn't use a modern chip. I've been vocal enough in this discussion and clear about that. But you can't deny the past hardware iterations where Nintendo very clearly went with "lateral thinking with withered technology". Be it CPUs, be it handheld screens or motion sensors...
 

MacTag

Banned
No problem. They considered a modern chip for 3DS. That hardly constitutes a "track record" though. Which was the point he was trying to make. I never claimed they wouldn't or couldn't use a modern chip. I've been vocal enough in this discussion and clear about that. But you can't deny the past hardware iterations where Nintendo very clearly went with "lateral thinking with withered technology". Be it CPUs, be it handheld screens or motion sensors...
N64 and GC didn't really follow the withered technology ideology either. Neither did SNES in some respects; Nintendo has a pretty diverse history when it comes to architecture design.
 

MCN

Banned
N64 and GC didn't really follow the withered technology ideology either. Neither did SNES in some respects; Nintendo has a pretty diverse history when it comes to architecture design.

But..but they changed once, they can never change again. That would be inconceivable!
 

Thraktor

Member
Posts like this help me to understand how my wife feels when I talk shop with my buddies.

Basically, even if we knew the configuration of a Tegra GPU used in a Nintendo handheld (i.e. 128, 192, 256, etc. CUDA "cores") and knew the manufacturing tech (i.e. 28nm, 20nm or 16nm), it's still very difficult to say for sure what the performance would be at the ~2W power envelope we'd expect a Nintendo handheld to occupy.

As a further note, I'd be reasonably confident in saying that, if the Tegra rumour is true, the only reason we'd be getting a chip with SMs other than Maxwell (i.e. 128 ALU per SM) is if they use TK1 as an off-the-shelf chip (which isn't an impossible proposition itself). If it's the TX1, Parker, or any kind of custom chip whether on 28nm, 20nm or 16nm, it seems extremely likely that it would use Maxwell SMs, and hence the question is how many of those SMs would it use (1 or 2 being the immediate options) and what clock would you get from them in a handheld chip (dependent on process, and difficult to estimate, as discussed above)?

That being the case, and using FP32 Gflops as a proxy for performance, here's the general scale of performance we'd be looking at:

Code:
        300 MHz   400 MHz   500 MHz   600 MHz
1 SM       76.8     102.4     128.0     153.6
2 SM      153.6     204.8     256.0     307.2

The rightmost column is pretty much a "best case on 16nm" kind of situation.
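For anyone wanting to check the arithmetic, the table is just SMs × 128 ALUs × 2 FLOPs (one FMA) per clock; here's a quick Python sketch that reproduces it.

Code:
# FP32 GFLOPS for Maxwell-style SMs: 128 CUDA cores per SM, each retiring
# one FMA (2 FLOPs) per clock, so GFLOPS = SMs * 128 * 2 * clock_in_GHz.

def fp32_gflops(sms, clock_mhz, alus_per_sm=128, flops_per_alu=2):
    return sms * alus_per_sm * flops_per_alu * clock_mhz / 1000

clocks = [300, 400, 500, 600]
print("      " + "".join(f"{c} MHz".rjust(10) for c in clocks))
for sms in (1, 2):
    print(f"{sms} SM  " + "".join(f"{fp32_gflops(sms, c):10.1f}" for c in clocks))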

No problem. They considered a modern chip for 3DS. That hardly constitutes a "track record" though. Which was the point he was trying to make. I never claimed they wouldn't or couldn't use a modern chip. I've been vocal enough in this discussion and clear about that. But you can't deny the past hardware iterations where Nintendo very clearly went with "lateral thinking with withered technology". Be it CPUs, be it handheld screens or motion sensors...

I see your point, but I think trying to ascribe any particular track record to Nintendo on hardware, and especially ascribing it based on a quote from someone who left the company two decades ago, is a flawed way of looking at things. Nintendo, like any other company designing mass-market products, will use expensive advanced technology when they feel it's integral to the design of the device, and will use older, cheaper technologies in areas they feel are less important. The 3DS and Wii U are testament to this, as they used an expensive modern 3D screen in the former and custom-designed wireless video streaming hardware in the latter. In both cases Nintendo obviously felt that getting the right technology was important to their vision of the product, and in both cases they spent the money to do so.

The reason that I believe they may be willing to use a relatively powerful SoC in the handheld this time is that, unlike at any time prior, a powerful SoC is integral to the design of the device, in that it allows them to much more easily share a library of software between it and the home console. The fact that they've already talked about their desire to make cross-device development easier between their handheld and home consoles, and that they've merged the previously separate design departments shows that the ease of making games run across both devices is going to be one of the primary hardware goals for them. That being the case, and if they want their home console to be within the performance ballpark necessary for third party ports (which perhaps they don't), then the natural deduction is that they should attempt to procure the highest-performing handheld SoC they can get within their budget and TDP limits.
 

ozfunghi

Member
But..but they changed once, they can never change again. That would be inconceivable!

Maybe read the original statement before posting crap like this.

N64 and GC didn't really follow the withered technology ideology either. Neither did SNES in some respects, Nintendo has a pretty diverse history when it comes to architecture design.

Which is why i said "in the last decade".

I see your point, but I think trying to ascribe any particular track record to Nintendo on hardware, and especially ascribing it based on a quote from someone who left the company two decades ago, is a flawed way of looking at things. Nintendo, like any other company designing mass-market products, will use expensive advanced technology when they feel it's integral to the design of the device, and will use older, cheaper technologies in areas they feel are less important. The 3DS and Wii U are testament to this, as they used an expensive modern 3D screen in the former and custom-designed wireless video streaming hardware in the latter. In both cases Nintendo obviously felt that getting the right technology was important to their vision of the product, and in both cases they spent the money to do so.

The reason that I believe they may be willing to use a relatively powerful SoC in the handheld this time is that, unlike at any time prior, a powerful SoC is integral to the design of the device, in that it allows them to much more easily share a library of software between it and the home console. The fact that they've already talked about their desire to make cross-device development easier between their handheld and home consoles, and that they've merged the previously separate design departments shows that the ease of making games run across both devices is going to be one of the primary hardware goals for them. That being the case, and if they want their home console to be within the performance ballpark necessary for third party ports (which perhaps they don't), then the natural deduction is that they should attempt to procure the highest-performing handheld SoC they can get within their budget and TDP limits.

He may have passed away two decades ago, but Wii, DS, Wii U, 3DS... are not from two decades ago. So i don't think that's a fair point to dismiss my argument.

Also, again, as my post history shows, i'm not claiming they will not or cannot go with a modern design. I'm just saying it's not unlike them to NOT go there. They DO have a track record. I've followed (and posted in) the WUST threads. "No way they'll release ANOTHER underpowered console". That was impossible. They learned their lesson, etc etc... They would actually have to go out of their way to come up with something not at least 3x more powerful than the Xbox 360... Trust me, i've been there.

And i know there are plenty of arguments to support the idea of them going with a modern/powerful chip, i never claimed otherwise. I merely said they have their track record against them. Why are we even debating this?
 

MacTag

Banned
Which is why i said "in the last decade".
Lateral thinking from withered hardware is an architecture ideology from over 30 years ago. The Game & Watch, Famicom or Game Boy weren't exactly last decade either.

We've only had a single hardware cycle launch in the last decade before NX: 3DS and Wii U. Both launches signaled a contracted or failed design approach, and neither was exactly a cheap/withered technology design despite their low competitive performance. I'm not sure repeating that makes the most sense for Nintendo but we'll have to see what they do.
 

ozfunghi

Member
Lateral thinking from withered hardware is an architecture ideology from over 30 years ago. The Game & Watch, Famicom or Game Boy weren't exactly last decade either.

We've only had a single hardware cycle launch in the last decade before NX: 3DS and Wii U. Both launches signaled a contracted or failed design approach, and neither was exactly a cheap/withered technology design despite their low competitive performance. I'm not sure repeating that makes the most sense for Nintendo but we'll have to see what they do.

What does any of this have to do with what i said? Just because that's a 30-year-old quote doesn't mean Wii, Wii U, DS, 3DS... didn't happen in the past decade, does it? So, do they or do they not have a track record of re-using old technology? Yes they do.
 
What does any of this have to do with what i said? Just because that's a 30-year-old quote doesn't mean Wii, Wii U, DS, 3DS... didn't happen in the past decade, does it? So, do they or do they not have a track record of re-using old hardware? Yes they do.

Let's clarify, though:

It's their market leading successes that have always been connected to their lateral thinking with withered technology strategy.
 

ozfunghi

Member
Let's clarify, though:

It's their market leading successes that have always been connected to their lateral thinking with withered technology strategy.

I'm about to punch my screen.

Please go back to my original quote which was RIDICULOUSLY pulled out of context.
 

MacTag

Banned
What does any of this have to do with what i said? Just because that's a 30-year-old quote doesn't mean Wii, Wii U, DS, 3DS... didn't happen in the past decade, does it? So, do they or do they not have a track record of re-using old technology? Yes they do.
DS and Wii didn't actually launch in the past decade, even if their cycles extended into it; just 3DS and Wii U did. Both really used a mix of technologies and neither was exactly cheap either.
 
I'm about to punch my screen.

Please go back to my original quote which was RIDICULOUSLY pulled out of context.

I'm not arguing with your original quote, just trying to clarify what parts of Nintendo's "track record" we can really link to the philosophy you're describing.

I don't think Wii U or 3DS exactly qualify.
 

ozfunghi

Member
DS and Wii didn't actually launch in the past decade, even if their cycles extended into it; just 3DS and Wii U did. Both really used a mix of technologies and neither was exactly cheap either.

Fucking hell. Wii was released at the end of 2006, DS two years or so earlier. Wii U and 3DS weren't cheap, because they put their budget towards different features. Which doesn't contradict anything i've said. We don't know what NX is; for all we know the NX will use a holographic projector, again leaving no room for a cutting-edge CPU/GPU.

Please read my original statement and please tell me where any of this contradicts what i said. I'm not responding to this bullshit any longer.
 

MacTag

Banned
Fucking hell. Wii was released at the end of 2006, DS two years or so earlier. Wii U and 3DS weren't cheap, because they put their budget towards different features. Which doesn't contradict anything i've said. We don't know what NX is; for all we know the NX will use a holographic projector, again leaving no room for a cutting-edge CPU/GPU.

Please read my original statement and please tell me where any of this contradicts what i said. I'm not responding to this bullshit any longer.
Neither DS nor Wii was designed in the past decade. If you're going to disqualify systems like SNES, GameCube or N64 due to not falling within that set timeline, then DS and Wii would also be out based on what you said.

3DS was planned to use Tegra 2, and that falling through at the last minute is what led to the ARM11/Pica200 combo. And despite being based on older tech, Wii U's SoC was estimated to cost close to what the PS4 APU did a year later. Both systems were designed as premium devices despite their lower comparative visual capabilities; neither really follows Yokoi's withered-technology approach from the 1980s, not even in their architectures.
 

MuchoMalo

Banned
I believe making more out of it would have been exponentially more expensive while keeping it at 30W with that process node.

lolno. There were two newer revisions of that same uArch on that node. Take out the eDRAM and use something like a Brazos APU, and it would have been cheaper than Wii U SoP.
 

Turrican3

Member
I think trying to ascribe any particular track record to Nintendo on hardware, and especially ascribing it based on a quote from someone who left the company two decades ago, is a flawed way of looking at things.
To be fair, Nintendo *does* have a track record regarding handhelds not using cutting edge hardware.
And while it's true they were quite close to reversing that trend with Tegra on 3DS, the final hardware was a bit different than that, so I'd argue that, as of now, this is still accurate.

I do understand, though, why this time they might actually want to go the (more or less) powerful route.
 
The handheld and console hardware development teams were merged a few years ago, so I wouldn't necessarily use previous handhelds as a determining factor as to what to expect from the NX handheld.

This is another part of what makes the wait all the more exciting though. It's unlikely they'll return to a DS-like form factor, so anything is game.
 

Earendil

Member
Basically, even if we knew the configuration of a Tegra GPU used in a Nintendo handheld (i.e. 128, 192, 256, etc. CUDA "cores") and knew the manufacturing tech (i.e. 28nm, 20nm or 16nm), it's still very difficult to say for sure what the performance would be at the ~2W power envelope we'd expect a Nintendo handheld to occupy.

As a further note, I'd be reasonably confident in saying that, if the Tegra rumour is true, the only reason we'd be getting a chip with SMs other than Maxwell (i.e. 128 ALU per SM) is if they use TK1 as an off-the-shelf chip (which isn't an impossible proposition itself). If it's the TX1, Parker, or any kind of custom chip whether on 28nm, 20nm or 16nm, it seems extremely likely that it would use Maxwell SMs, and hence the question is how many of those SMs would it use (1 or 2 being the immediate options) and what clock would you get from them in a handheld chip (dependent on process, and difficult to estimate, as discussed above)?

That being the case, and using FP32 Gflops as a proxy for performance, here's the general scale of performance we'd be looking at:

Code:
        300 MHz   400 MHz   500 MHz   600 MHz
1 SM       76.8     102.4     128.0     153.6
2 SM      153.6     204.8     256.0     307.2

The rightmost column is pretty much a "best case on 16nm" kind of situation.

Thanks for the explanation. There was a time I would have easily understood what you guys were talking about, but years of being a web developer instead of an application developer have made me forget most of the low level knowledge I ever had (not that it would still be relevant 20 years later).
 

Ganondolf

Member
Whatever chips Nintendo use, I expect they are aiming for about Wii U specs. This is their normal handheld cycle (being similar in power to their previous-gen home console).
 
Whatever chips Nintendo use, I expect they are aiming for about Wii U specs. This is their normal handheld cycle (being similar in power to their previous-gen home console).
It's usually two systems behind: GBA was SNES, DS was PS1/N64, and 3DS was GameCube.
There wasn't a leap from GameCube to Wii, so maybe we'll get something close to 360/Wii U?
 
That phrase is just a fancier way of saying "Cheap Gimmicks".
Not really.

I wouldn't refer to the Game Boy as a cheap gimmick because it used an LCD screen that was "withered." Gimmicks by definition are used to attract attention, and I can't say that screen was meant to attract attention. It was used to create a device that was super cheap, durable, and energy efficient.
 

Hcoregamer00

The 'H' stands for hentai.
Not really.

I wouldn't refer to the Game Boy as a cheap gimmick because it used an LCD screen that was "withered." Gimmicks by definition are used to attract attention, and I can't say that screen was meant to attract attention. It was used to create a device that was super cheap, durable, and energy efficient.

Going by that metric, a single-touch capacitive IPS LCD screen at 480p/540p would qualify. The big question is this: what Nvidia CPU/GPU combination fits the "cheap, durable, and energy efficient" description?

Whatever the case is, it will be more powerful than the PS Vita by virtue of the huge technological gains since both the 3DS and PS Vita were released.
 

Thraktor

Member
While looking into Nvidia's upcoming Parker chip, I figured it would be worth discussing the theoretically possible, but still very, very unlikely scenario that Nintendo could use the same SoC die across both handheld and home versions of NX with simply a substantial difference in clock speeds between the two. I should emphasise that I don't consider Parker suitable to be this chip (for reasons I'll go into momentarily), but it's worth looking into Parker as a guide for where Nvidia's plans are with Tegra:


  • Manufactured on TSMC's 16FF+ process
  • Hex-core CPU: 2x Denver, 4x A57
  • "Pascal" GPU, likely Maxwell-style 128 ALU SMs, probably 3-4 SMs (ie 384 or 512 ALUs)
  • GPU clocks should exceed 1GHz, possibly around 1.2GHz
  • LPDDR4 memory, 128-bit, 51GB/s
As for the reasons I wouldn't expect this to be suitable for, well, either device, to be honest with you: the first is those Denver cores. Without going into too much detail, Denver isn't actually a "true" ARMv8 CPU. It uses an internal VLIW-based instruction set and dynamically recompiles ARM code to that instruction set. This gets in the way of two things that are vitally important to a console/handheld CPU: predictable performance and straightforward optimisation. Denver has shown itself to have fairly erratic performance in its debut in the Nexus 9, performing well in certain situations and poorly in others, depending on how well suited they are to its peculiar architecture. I wouldn't be all that confident in its ability to run, for example, pathfinding routines with any degree of efficiency. On the optimisation front, trying to write ARM code which is optimised for Denver would be like trying to write x86 code which is going to be emulated on Itanium, i.e. something which would send even the best coders into the depths of insanity. Something to be avoided at all costs when you want to make porting to your platform as quick and painless as possible.

Secondly, while 51 GB/s is plenty of memory bandwidth for a handheld, it would be completely insufficient for the home console. Think of it like the XBO not being able to use its eSRAM at all and having to run everything off its main DDR3 pool, but with even less bandwidth than that.

That all being said, it may be worth considering what a hypothetical Tegra chip for both home console and handheld might look like. We'll call it the TN1:


  • Manufactured on TSMC 16FF+
  • CPU: 8x A72 (2GHz+ on home console, a lot less on handheld)
  • GPU: 4 SMs, 512 ALU (1.2GHz+ on home console, ~300MHz on handheld, or 3 SMs at ~400MHz for yields)
  • RAM: 4x 64-bit LPDDR4 (full 256-bit bus used on home console for ~120GB/s, 64-bit bus on handheld for ~30GB/s)
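To spell out what those hypothetical specs translate to in raw numbers (the TN1 itself is just this thought experiment, and the LPDDR4 speed grade below is my assumption, chosen to match the bandwidth figures listed):

Code:
# Peak figures for the hypothetical TN1 above. The LPDDR4 speed grade
# (~3733 MT/s) is an assumption picked to hit the ~120/~30 GB/s numbers.

ALUS = 4 * 128           # 4 Maxwell-style SMs, 128 ALUs each
FLOPS_PER_ALU = 2        # one FMA per ALU per clock

def peak_tflops(clock_ghz):
    return ALUS * FLOPS_PER_ALU * clock_ghz / 1000

def lpddr4_bandwidth_gbs(bus_bits, mts=3733):
    return bus_bits / 8 * mts / 1000

print(f"Console GPU @ 1.2 GHz: {peak_tflops(1.2):.2f} TFLOPS FP32")
print(f"Console GPU @ 1.5 GHz: {peak_tflops(1.5):.2f} TFLOPS FP32")
print(f"Handheld GPU @ 0.3 GHz: {peak_tflops(0.3) * 1000:.0f} GFLOPS FP32")
print(f"Console 256-bit LPDDR4: ~{lpddr4_bandwidth_gbs(256):.0f} GB/s")
print(f"Handheld 64-bit LPDDR4: ~{lpddr4_bandwidth_gbs(64):.0f} GB/s")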
Before we get into the inherent craziness of Nintendo releasing a handheld with an SoC like this, let's look at the advantages Nintendo would get for using a single SoC across both devices:


  • Reduced R&D cost: you only have to pay Nvidia to design a single die, and you only have to go through one tape-out and validation process.
  • Simpler procurement: you only need to deal with a single order for a single piece of inventory, and you reduce inventory risk, as if (for example) the home console doesn't sell as well as you expect, you can use those chips for the handheld instead.
  • Binning: You can bin dies for the different products, which usually isn't possible with semi-custom console chips. For example you can test the dies to see which ones run better at lower voltages and use those for the handheld. Alternatively, you can only enable 3 SMs in the handheld, allowing you to use dies which would otherwise be considered faulty.
  • Perfect scaling: You want precisely five times the GPU performance in the home console versus the handheld? How about the exact same chip running at five times the clock?
  • Handheld energy efficiency: Using a large, low-clocked GPU in the handheld will give better performance per Watt than a smaller, higher-clocked GPU would. (ie 512 ALUs at 300MHz will consume less power than 256 ALUs at 600MHz would for the same performance)
And the disadvantages:


  • Chip cost: As most chips will end up in handhelds, you end up using a much bigger (and hence more expensive) die in the handheld than you need to for a given performance level, substantially increasing your costs.
  • Limit to voltage binning: With let's say 75% of dies going into handhelds, there wouldn't be a huge gain from binning the most energy-efficient chips for the handheld. You'd get to design your handheld for the 25th percentile performance, which is better than the 0th percentile, but not by a whole lot.
  • CPU choice: The optimal CPU cores for a handheld chip running at ~2W and a home console running an order of magnitude higher are going to be quite different. You either end up limiting the peak performance of the home console (say with A53s) or forcing the mobile CPU to run at extremely low frequencies (say with A57s/A72s).
CPU

The thing that really interests me from the above is the CPU choice. Unlike the GPU performance and memory quantity and bandwidth, CPU requirements don't scale down when you go from a 1080p console to a 540p handheld. Game logic doesn't vary with resolution, and if you want demanding games to run across both devices you'll want to squeeze as much CPU performance out of the handheld as possible.

As I've argued before, if you're designing an SoC for a handheld today, A53s make the most sense, as they provide the best performance at the kind of thermal limit (ie <1W) that's going to be allocated to a handheld CPU. The fact that they're so small also means you can squeeze eight or more of them on a small, cheap SoC and still get a fairly good amount of performance out of them. In this situation, though, they're probably not going to give the kind of performance you'd want from a home console CPU. They should clock to over 2GHz on 16FF+, and by blu's matrix mult benchmark they would actually comfortably outperform PS4's and XBO's CPUs at that clock, but in other circumstances their performance may be found a bit wanting for CPU-intensive multi-platform games. That pretty much leaves us with A72s.

Fortunately, there happens to be a very good resource on the performance and power consumption of A53 and A72 cores on 16FF+ in the form of Anandtech's review of the Huawei Mate 8, which uses a 16FF+ Kirin 950 SoC with said cores. From this, we can estimate the kind of clock speed that we might expect to be able to run eight A72 cores in a handheld on a 16FF+ chip. We'll assume that the combination of 16nm and very low clocks has been extremely successful in bringing down the GPU's power consumption to the point where it consumes under 1W in operation, and there's a full 1W left over for the CPU (i.e. pretty much the best case scenario).

One challenge to estimating the achievable clocks, even with the data from the Anandtech article, is that the Kirin 950 applies a minimum supply voltage of 775mV to each A72 cluster at 1.5GHz and below, a conservative decision by Huawei on the basis that they're using early 16FF+ silicon and the A72 clusters spend most of their time north of 1.5GHz in any case. This won't be Nintendo's strategy, as they'll use a fixed clock, and will want to keep the supply voltage as low as possible to maintain that clock, to keep power consumption down. The manufacturing process would also be about a year more mature, so it would be able to do so more reliably.

What this means is that, below 1.5GHz, the power consumption figures for the A72 clusters in the Kirin 950 wouldn't reflect the power consumption expected from A72 clusters in a Nintendo handheld. Fortunately, Anandtech does give us a few graphs and data points which we can use to estimate the actual clocks we may be looking at.

Working from the data available in the article, I've come to an estimated 800MHz clock speed for two quad-core clusters of A72s on 16FF+ in a 1W combined TDP. This is actually higher than I'd expected, but obviously it's a lot lower than they'd be clocked in a console environment (perhaps by a factor of 3). While there aren't many benchmarks I can find across the A72 and Jaguar, the most suitable one I can find from a gaming point of view would be the Geekbench single-core floating point test (as multi-core would include the A53s on big.little ARM SoCs). Taking this as a guide, the octo-core A72 at 800MHz would actually only perform about 20% worse than the 1.6GHz octo-core Jaguar used in the PS4. This is far closer than I would have thought for such a TDP-constrained CPU, and it would actually make ports of more CPU-intensive PS4 and XBO games within the ballpark of possibility. That being said, this is assuming a full 1W is available for the CPU (it could be half that) and assuming that Geekbench's floating point test is a reasonable analog for game performance (it may not be), so take the comparison with a grain of salt.
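To show the scaling behind that ~20% figure, here's the back-of-envelope version. The per-clock ratio between A72 and Jaguar is inferred from the Geekbench comparison rather than being a measured constant, and perfect scaling across all eight cores is assumed, so treat it as illustrative only.

Code:
# Rough scaling behind the "about 20% worse than PS4's CPU" estimate.
# The per-core, per-clock ratio is an assumption inferred from the Geekbench
# single-core FP comparison, and perfect 8-core scaling is assumed.

A72_PER_CLOCK_VS_JAGUAR = 1.6   # assumed A72 advantage per core per GHz

def relative_cpu_perf(cores, clock_ghz, per_clock=1.0):
    return cores * clock_ghz * per_clock

handheld = relative_cpu_perf(8, 0.8, A72_PER_CLOCK_VS_JAGUAR)  # 8x A72 @ 800 MHz
ps4 = relative_cpu_perf(8, 1.6)                                # 8x Jaguar @ 1.6 GHz

print(f"Handheld CPU vs PS4 CPU: {handheld / ps4:.0%}")        # ~80%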

Cost

Aside from CPU performance, the cost of such a chip is something we could also look at, although with much less rigor and a much larger margin of error. Die cost is pretty much just a function of die size and manufacturing node, so first we'll try to estimate die size of our TN1. The CPU is the easy part, as ARM have told us that a quad-core cluster of A72s on 16FF+ with 2MB cache is around 8mm², giving us 16mm² for our CPU. The GPU is harder to estimate without any Pascal die photos to measure off, but using the absurd oversimplification that this GPU has 1/5th the SMs of GP104, therefore must be 1/5th the size, we've got a value of 63mm². The 128-bit LPDDR4 interface on the 16nm A9X takes up around 24mm² of space, so a 256-bit interface would need around 48mm². Then, add about 25% for the remaining blocks (audio, crypto, codecs, etc.) and we come to 159mm², which is to put it mildly a giant fucking die to try to squeeze into a handheld. The A9X is 147mm², though, so let's just roll with it as a hypothetical.
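Adding that up explicitly (all of these are the area guesses from the paragraph above, not measured figures):

Code:
# Die-area estimate for the hypothetical TN1, in mm^2, using the guesses above.
cpu_area  = 2 * 8        # two quad-core A72 clusters at ~8 mm^2 each (ARM's figure)
gpu_area  = 63           # crude: 1/5 of GP104's SMs, so roughly 1/5 of its area
mem_iface = 2 * 24       # 256-bit LPDDR4, scaled from the A9X's 128-bit interface

subtotal = cpu_area + gpu_area + mem_iface
total = subtotal * 1.25  # +25% for audio, crypto, codecs, I/O and other blocks

print(f"CPU {cpu_area} + GPU {gpu_area} + memory {mem_iface} = {subtotal} mm^2")
print(f"With the 25% uplift: ~{total:.0f} mm^2")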

Said A9X is our best comparison point for the cost of the TN1, but we don't actually have any direct information on the A9X's cost. There's a blog post which attempts to estimate the price of the A9X, and comes to the value of $37.30 including packaging, but it's worth keeping in mind that even the author admits that "there is room for error" in the estimate, so it could be north or south of that. I can't say I'd do any better, though, so let's take $37.30 as the cost of an A9X. There are a few aspects of the hypothetical TN1 which would both increase and decrease its price relative to the A9X. The first, and most obvious, is that it's appearing about 18 months later, meaning a more mature manufacturing process with higher yields and likely lower wafer costs, bringing down the price.

On the wafer cost side, this paper (PDF) estimated a 5.5% reduction in unyielded wafer costs for 16nm FinFET from Q4/2015 to Q4/2016, and if we extrapolate that to 18 months we'd see an 8.25% wafer cost decrease since the A9X, which means we can assume a $7,686 cost per 300mm 16nm wafer from TSMC for the TN1 (going by the blog's $8,400 wafer estimate).

Yield improvements are much more difficult to estimate. The A9X calculation worked on an estimate of 65% yield for the 147mm² die, which works out to a fault probability of about 0.3% per mm². We would expect this fault probability to drop over time (increasing yields), although the increased die size will have the opposite effect. We do actually have a useful data point on this, which is the existence of the 16nm GP104 die in consumer products about half-way between the launch of the A9X and TN1. Nvidia reportedly sees about 60% gross margin from its high-end desktop GPU sales, and we would assume they wouldn't release the GTX 1080 unless it gave them similar margins to the product it's replacing, so we should be able to assume that the price at which Nvidia sells the GTX 1080 chip to EVGA, Asus, etc. gives them about a 60% gross margin.

Taking the $599 price point of the GTX 1080, let's strip away 25% of that for retailer margins, leaving $449.25 going back to EVGA. Let's assume EVGA themselves work on around 15% margin and 10% goes on logistics, leaving $336.94 for the full graphics card, of which the major costs will be the GPU chip and the GDDR5X memory. The GDDR5X is obviously more expensive than GDDR5, perhaps significantly so given Nvidia's choice to not use it in the GTX 1070, but difficult to estimate. Regular GDDR5 is likely to have reduced quite a bit since Sony was reportedly paying $88 for 8GB of 5.5GT/s on a 256-bit bus, but the bump to 10GT/s GDDR5X may be equal and opposite, so let's just assume an $88 cost for the GDDR5X today as it's the only data point we have. That leaves $248.94 for the GPU chip and other components, of which we'll assume somewhere around $200 is the GPU. This puts the cost to Nvidia at around $80 for them to retain their 60% gross. With an estimated $8,053 wafer cost, this would indicate a fault rate of pretty close to 0.2% per mm² for 16FF+ at the moment.
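Retracing that chain of guesses in one place, so it's clear where the ~$80 comes from (every percentage here is an assumption on my part, not public pricing data):

Code:
# GTX 1080 margin chain from the paragraph above; all percentages are guesses.
retail        = 599.00
to_partner    = retail * (1 - 0.25)             # strip ~25% retailer margin
card_bom      = to_partner * (1 - 0.15 - 0.10)  # partner margin + logistics
gddr5x        = 88.00                           # assumed, extrapolated from PS4-era GDDR5
gpu_and_misc  = card_bom - gddr5x
gpu_price     = 200.00                          # assumed share of that for the GP104 chip
nvidia_cost   = gpu_price * (1 - 0.60)          # 60% gross margin for Nvidia

print(f"Board partner price: ${to_partner:.2f}, card BoM: ${card_bom:.2f}")
print(f"GPU + other components: ${gpu_and_misc:.2f}")
print(f"Implied manufacturing cost per GP104: ~${nvidia_cost:.0f}")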

Now, the reduction in fault rate won't be linear over time, and should be expected to be closer to an exponential curve, with fairly rapid reduction early on, followed by much slower reduction in faults as the process matures. With only two data points, it's hard to estimate where we are in that curve, but we're probably past the biggest reductions if a chip like GP104 is even viable. Hence, I'm going to estimate that yields for our March 2017 launch TN1 will be in the order of 0.15% per mm², not as big a jump from the A9X to the GP104, but a reasonable enough jump for a still-maturing node.

Given a $7,686 wafer cost, a 0.15% fault probability per mm² and a 159mm² die, my calculations give me a "raw" die cost of $25.86. Add about $5 for packaging to give $30.86, and then a 15% gross margin for Nvidia (which is roughly what AMD are getting for their semi-custom chips) to give a final cost to Nintendo of $36.31 per TN1.
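For anyone who wants to poke at these numbers, here's roughly the model I'm working from. It assumes a simple Poisson-style yield model (yield = exp(-fault rate × area)) and the standard dies-per-wafer approximation, which lands near the A9X's ~65% and the figures above rather than reproducing them to the cent.

Code:
import math

# Rough die-cost model: Poisson-style yield (yield = exp(-faults_per_mm2 * area))
# and the usual dies-per-wafer approximation. These are assumptions about the
# model, so results land near, not exactly on, the figures quoted above.

def dies_per_wafer(die_mm2, wafer_mm=300):
    r = wafer_mm / 2
    return math.pi * r ** 2 / die_mm2 - math.pi * wafer_mm / math.sqrt(2 * die_mm2)

def yield_rate(die_mm2, faults_per_mm2):
    return math.exp(-faults_per_mm2 * die_mm2)

def raw_die_cost(wafer_cost, die_mm2, faults_per_mm2):
    return wafer_cost / (dies_per_wafer(die_mm2) * yield_rate(die_mm2, faults_per_mm2))

print(f"A9X yield check (147 mm^2 at 0.3%/mm^2): {yield_rate(147, 0.003):.0%}")

raw = raw_die_cost(7686, 159, 0.0015)   # TN1: $7,686 wafer, 159 mm^2, 0.15%/mm^2
packaged = raw + 5                      # ~$5 for packaging
to_nintendo = packaged / (1 - 0.15)     # 15% gross margin for Nvidia

print(f"Raw die ~${raw:.2f}, packaged ~${packaged:.2f}, to Nintendo ~${to_nintendo:.2f}")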

Is that feasible for a Nintendo handheld? Well, IHS estimated the cost of the 3DS's SoC at around $10, so it's a hell of a lot more than they've spent before. On the other hand, Nintendo reportedly spent $33.80 on the 3DS's 3D screen, so perhaps they're not entirely unwilling to spend that kind of money on a handheld component.

Handheld BoM

Let's assume for a second that, apart from the TN1 and RAM, Nintendo is keeping every other component in the device as close to the cheap end of the spectrum as possible to get this to work. Then, aside from SoC and RAM, you'd be looking at a BoM similar to cheap $70 4.5" 480p smartphones like the Huawei Y560 or Honor Bee. Nintendo would be adding physical controls, but the modem wouldn't be needed, and the screen, battery, etc. would all be very similar. These are obviously sold at extremely thin margins, similar to a console, so if they're selling for about $70 we're probably looking at a BoM of $35-$40, or closer to $30 once you remove the SoC and RAM. We'll need to add the TN1 to that, but also RAM. For RAM, given the performance of the device, you'd probably be looking at 3GB of LPDDR4. IHS's Galaxy S7 teardown estimates a price of $25 for a 4GB chip of LPDDR4 in early 2016 on a PoP package. Nintendo would be looking for 3GB, a year later, and without the expensive PoP packaging, so the price would certainly be lower, but it's still going to be relatively high-end RAM by then, so let's assume $17 for the 3GB. For the remaining changes, let's also assume that Nintendo will include more flash memory than these cheap phones, and the physical controls and Amiibo NFC chip add a bit of cost, bringing the entire BoM up by about $15.

So, after taking all this into account, our estimated BoM for a TN1-powered handheld with a 480p screen is $98.31. It should be emphasised that there are plenty of sources for error in this estimate, with many guesses along the way, but that does put us in the ballpark necessary to sell for $199 retail while breaking even. Which is kind of crazy for a handheld which could in theory handle PS4 ports, but such is technological progress.
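The $98.31 is just the sum of the pieces above; spelled out (all figures are the estimates from this post, not quotes from suppliers):

Code:
# Handheld BoM estimate, adding up the figures from the paragraphs above.
base_phone_bom = 30.00   # cheap 480p smartphone BoM minus its SoC and RAM
tn1_soc        = 36.31   # die-cost estimate above
lpddr4_3gb     = 17.00   # assumed 3GB LPDDR4 price, non-PoP, early 2017
extras         = 15.00   # extra flash, physical controls, Amiibo NFC

total = base_phone_bom + tn1_soc + lpddr4_3gb + extras
print(f"Estimated handheld BoM: ${total:.2f}")   # ~$98, ballpark for $199 retail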

Home console BoM

As the TN1 would also be used in an NX home console in this scenario, it's also worth looking at the cost implications there. A $36.31 SoC would be substantially cheaper than would usually be expected in a home console, particularly compared to the ~$100 chips used in the Wii U and PS4, but it would give them scope to spend more on other components while keeping price reasonable. The first of these would be RAM, where they could use the same 3GB LPDDR4 parts as the handheld for 12GB overall at $68. This may seem like a lot, but the total SoC+RAM cost would still be just $104, compared to $188 for the PS4 and $170 for the XBO at launch. If they drop the optical drive and hard drive (estimated at $28 and $37 respectively for PS4 launch, although both have reduced since), they could include a sizeable pool of flash (ie 256GB+) while still at least breaking even at a sub $300 launch price. At a 2GHz CPU clock and a 1.5GHz GPU clock we'd be looking at a TDP around 40W, so you'd be getting a fairly compact, power-efficient console for your money.
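And the console-side comparison in the same terms (the PS4/XBO figures are the launch-era teardown estimates cited above):

Code:
# SoC + RAM cost comparison for the hypothetical TN1 console.
tn1_console = 36.31 + 4 * 17.00   # TN1 plus four 3GB LPDDR4 parts (12GB total)
ps4_launch  = 188.00              # launch-era estimate for PS4 APU + 8GB GDDR5
xbo_launch  = 170.00              # launch-era estimate for XBO

print(f"Hypothetical NX console SoC + RAM: ~${tn1_console:.0f}")
print(f"PS4 at launch: ${ps4_launch:.0f}, XBO at launch: ${xbo_launch:.0f}")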

TL;DR

In theory, it seems that it would be possible for Nintendo to use a single 16nm chip in both the home NX and the handheld NX (clocked substantially lower in the latter), where the home console would have roughly 1.5 NV Tflops of GPU performance and would be capable of running PS4/XBO games, and the handheld would have roughly 300 NV Gflops of GPU performance and would be capable of running such games at 480p. That all being said, I absolutely don't expect it to happen, as even if Nintendo wanted these kinds of performance levels, they could do so at much lower cost if they used a different, smaller chip in the handheld.
 
TL;DR

In theory, it seems that it would be possible for Nintendo to use a single 16nm chip in both the home NX and the handheld NX (clocked substantially lower in the latter), where the home console would have roughly 1.5 NV Tflops of GPU performance and would be capable of running PS4/XBO games, and the handheld would have roughly 300 NV Gflops of GPU performance and would be capable of running such games at 480p. That all being said, I absolutely don't expect it to happen, as even if Nintendo wanted these kinds of performance levels, they could do so at much lower cost if they used a different, smaller chip in the handheld.

Excellent post. I'm not technical enough to add value to this analysis. However, this does make me very interested in the graphical capabilities of an NX handheld. That we're even rationally talking about ballpark PS4 ports is amazing.
 
My thinking is GameCube and 3DS were almost the same.
If you consider DS and N64 equivalent, then sure. It's probably a bit weaker in some cases, but it has better shaders allowing for prettier games.
If NX follows the same generational leap it would be around 360 (since Wii wasn't a generational leap tech wise)
 

Nanashrew

Banned
My thinking is GameCube and 3DS were almost the same.

If I recall correctly, spec-wise on paper, the 3DS is a tad lower than what was in the GameCube. However, the 3DS renders to a smaller screen, uses more modern tech, and has modern instruction sets and shader libraries that can make plenty of games look like they're on a similar level. The biggest bottleneck is the CPU, though; the n3DS should be more capable since its CPU is roughly 3x faster and has more cores.

Makes me hopeful for their next handheld since that was their biggest upgrade for a revision ever.
 
Reason why I figured NVIDIA got shut out of the PS4/Xbone, and why I'm surprised Nintendo bit on their bait. Everything I've heard about NVIDIA's cooperation with console manufacturers has been far from positive in its implications.

This is why I'm hoping the Nvidia rumors aren't true. This could negatively impact the NX console and/or handheld if Nvidia is up to their old tricks again.

Actually, let's say some time down the line Nintendo decides to do a refresh a la PS4 Neo/Xbox Elite. Could they switch to an AMD GPU without too big of an impact? I know this has never been done.
 

Trace

Banned
This is why I'm hoping the Nvidia rumors aren't true. This could negatively impact the NX console and/or handheld if Nvidia is up to their old tricks again.

Actually, let's say some time down the line Nintendo decides to do a refresh a la PS4 Neo/Xbox Elite. Could they switch to an AMD GPU without too big of an impact? I know this has never been done.

No, console games are programmed down to the intricacies of the GPU, you can't just switch architectures like that.
 

LeleSocho

Banned
Thraktor, yours is a nice post, but i don't think a shared SoC in both consoles is a feasible thing, for the reasons you yourself have listed.
I think it will simply be limited to a shared architecture for both CPU and GPU.
 
In my opinion it's hard to believe that such a battery- and price-sensitive system as a dedicated gaming handheld will go with anything other than PowerVR.
 
Thraktor, yours is a nice post, but i don't think a shared SoC in both consoles is a feasible thing, for the reasons you yourself have listed.
I think it will simply be limited to a shared architecture for both CPU and GPU.



That also sounds crazy expensive for an SoC Nintendo would use. I also find the argument that, because Nintendo used an expensive screen on the 3DS, they could use an expensive SoC, to be a bit off.
The 3D screen was the selling point for Nintendo. It was basically their gimmick, the reason for the device to exist the way it did. Now, when you think of it, I don't see Nintendo using hardware power as a selling point.

Now, I understand the reasoning behind a shared library and the need for a powerful handheld, and I share this thinking too. But what people should see is that it's highly likely the handheld will be the lowest common denominator. Basically, I don't see the NX home console being faster than XBO... In fact, I see it slower. I expect more of a 128 Gflops handheld and a 512 to 768 Gflops home console. And yes, it would still be able to receive PS4One ports. Because it all comes down to architecture and scalability.
 
In my opinion it's hard to believe that such a battery- and price-sensitive system as a dedicated gaming handheld will go with anything other than PowerVR.



PowerVR, Qualcomm, ARM Mali... heck, even Tegra is a possibility. The thing is, there are a lot of possibilities, and I don't see why you're thinking PowerVR is the only one making sense for a handheld. Btw, Tegra X1 is more power efficient than PowerVR's offering at 20nm.
 
I wonder what kind of screen would be used?
I'm guessing OLED is out of the question. With all that (potential) power, you would need a nice screen to show it off!
It's certainly going to be an interesting unveiling.
This thing could make (get back to good sales) or break them!
 