Question to you guys:
Where do you find these CPU power consumption figures?
Going by AnandTech, the A72 doesn't seem to consume that much:
http://www.anandtech.com/show/9878/the-huawei-mate-8-review/3
The graph I posted above is based on that article, but it's worth keeping in mind that I'm quoting figures for quad-core clusters of A72s and A53s (not individual cores), and I'm assuming constant clock speeds (unlike phone SoCs where clocks jump up and down as required).
Thraktor, let's say that Nintendo goes for a 6-inch screen instead, and only 4 or 6 cores. How would that affect your estimates? It looks like you're looking for configurations where the entire CPU is under 1W, and thus you feel that they'll be limited to a 2W SoC?
I don't really take screen size into account, as generally a larger screen means space for a larger battery, which cancels out the increased power draw from the larger screen. Regarding power draw, yes, I would be assuming a total of about 2W for the SoC, with that heavily weighted towards the GPU (so the GPU consuming perhaps twice the power of the CPU). It's conceivable that Nintendo could use a slightly higher TDP, though, and/or a different allocation of power between CPU and GPU, so there's definitely some wiggle room there.
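As a quick sanity check on that split, here is a minimal sketch of the arithmetic, assuming exactly 2W for the SoC and a GPU drawing exactly twice what the CPU does. The function name `split_soc_power` is just made up for illustration.

```python
def split_soc_power(total_w, gpu_to_cpu_ratio):
    """Split a total SoC power budget between CPU and GPU.

    gpu_to_cpu_ratio is how many times more power the GPU gets than the CPU
    (2.0 means "GPU consumes twice the power of the CPU").
    """
    cpu_w = total_w / (1.0 + gpu_to_cpu_ratio)
    gpu_w = total_w - cpu_w
    return cpu_w, gpu_w


# Assumed figures from the post: ~2W total, GPU at roughly 2x the CPU.
cpu_w, gpu_w = split_soc_power(2.0, 2.0)
print(f"CPU: {cpu_w * 1000:.0f} mW, GPU: {gpu_w * 1000:.0f} mW")
# prints "CPU: 667 mW, GPU: 1333 mW"
```

So a 2:1 weighting leaves roughly two thirds of a watt for the whole CPU side, which is the ballpark the sub-1W CPU configurations are aiming at.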
This is my theory.
Nintendo will most likely have its own custom version of the Tegra rather than an X1 or X2.
From a pure handheld perspective they will probably devote most of the power budget to the GPU rather than the CPU, so I would expect 4xA53.
However, I also think an important requirement for the NX is that it can emulate earlier Nintendo consoles.
Emulating the Wii CPU on an A53 is most likely not possible. Does anyone know of any Dolphin tests running on A53s?
Because of that I think they will add at least one higher performance CPU core.
nVidia's Denver CPU is extra interesting here because it is actually not a native ARM CPU.
It translates ARM instructions to its native instruction set, and has hardware to optimize this by storing the translated instructions in a rather large cache.
http://www.anandtech.com/show/8701/the-google-nexus-9-review/4
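To illustrate the general idea of "translate once, then reuse the translated code from a cache", here is a toy sketch. It is not how Denver is actually implemented; the `TranslationCache` class, the LRU eviction policy and the `fake_translate` stand-in are all assumptions made up for the example.

```python
from collections import OrderedDict


class TranslationCache:
    """Toy LRU cache mapping guest code addresses to translated blocks."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.hits = 0
        self.misses = 0

    def fetch(self, guest_addr, translate):
        # Fast path: the block was translated earlier, so reuse it.
        if guest_addr in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(guest_addr)
            return self.blocks[guest_addr]
        # Slow path: translate now, store the result, evict the oldest if full.
        self.misses += 1
        native = translate(guest_addr)
        self.blocks[guest_addr] = native
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)
        return native


def fake_translate(guest_addr):
    # Hypothetical stand-in for the expensive guest-to-native translation step.
    return f"native_block_for_{guest_addr:#x}"


cache = TranslationCache(capacity=2)
for addr in [0x1000, 0x1000, 0x2000, 0x1000, 0x3000, 0x2000]:
    cache.fetch(addr, fake_translate)
print(cache.hits, cache.misses)
```

Code that loops over the same blocks mostly hits the fast path, while code that jumps around unpredictably keeps paying the translation cost.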
Nintendo could potentially work with nVidia to make the Denver CPU execute PowerPC code very efficiently.
On the other hand, an A72 or A73 would be fast enough to emulate the Wii CPU the traditional way, so it might be unnecessarily complex to write the Denver translator.
If Nintendo also wants to be able to emulate the Wii U, they would probably need at least 3 fast cores, and 3 A72/A73 cores running fast enough to emulate the Wii U would draw a lot of power.
In the end my theory is 4xA53 + 1 Denver, or 4xA53 + 4 Denver depending on if they want to emulate Wii U or not.
The Denver cores would run at low clock speed in handheld mode but could run either ARM or PowerPC code.
In docked mode they could run at full speed.
What do you guys think?
In theory Denver is well-suited for emulation (that's effectively what it's doing when it runs ARM code), but there would be a few drawbacks when using it in a console:
- While running ARM code, it performs well in synthetic benchmarks, but poorly in less predictable real-world scenarios. If the same is true while running PPC code, then there's the potential for stuttering or erratic performance of VC games, which would be difficult to fix (unlike traditional emulation where there's a software emulation layer you can tweak on a per-game basis if necessary).
- The A72 is much smaller, much more power efficient, and significantly outperforms it in both general purpose and game-related scenarios. It doesn't really make sense to give up that much performance just for the sake of slightly better VC.
- The A72 should be able to handle GC and probably Wii games adequately anyway. The TX1 (which uses the older A57 cores at 1.9GHz) can run Dolphin very smoothly, and that's on top of Android with a reverse-engineered emulator. Nintendo can run their emulation as close to the metal as they like, and obviously have a perfectly detailed knowledge of how the original hardware operates, so you would expect them to achieve better performance with their own emulator. In addition, the A72 has higher performance per clock than A57, and could potentially clock higher even in handheld mode (see discussion below), so there wouldn't really even be a need for Denver.
This does follow on neatly to a thought that I had about possible CPU configurations for NX, though, in relation to discussions with blu above about single thread versus multi-thread performance. It occurs to me that a more esoteric core configuration might be worthwhile, and that it's worth considering a 2:4:2 config consisting of 2x A72, 4x A53 and 2x A35 (with the A35s reserved for the OS, with crypto blocks but without NEON). Clocked down to ~800MHz the A35s could consume as little as 50mW while handling background OS duties (which would also keep standby power down), and the remaining six cores would provide a good mix between single thread and multi thread performance in the power envelope we're looking at.
Given that I have the data in front of me in a spreadsheet, I quickly threw together this graph showing the tradeoff between single and multi threaded performance within a 660mW power budget as you alter power draw between the A72s and A53s in this scenario:
As the graph moves to the right more of the 660mW is allocated to the A72s, while on the left more is allocated to the A53s. The blue line shows total multi-threaded performance at that particular power allocation, and the red line shows the peak single thread performance (ie performance of the most powerful individual core). The two lines are scaled relative to just using 4x A53 in the same power envelope. The relative performance between A53 and A72 at the same clock is based on the same Geekbench floating point figure I've been using in my previous comparisons.
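For anyone who wants to play with this kind of tradeoff themselves, here is a rough sketch of how the two curves can be computed. The spreadsheet data behind the graph isn't reproduced here, so every constant below except the 660mW budget and the core counts is an illustrative placeholder, and the cubic power model (power proportional to clock cubed) is purely an assumption.

```python
# Toy model of the single- vs multi-thread tradeoff described above.
# All K_* and IPC_* constants are made-up placeholders, not measured figures.

BUDGET_MW = 660.0        # power budget from the post
N_A72, N_A53 = 2, 4      # core counts from the 2:4:2 proposal

# Assumed per-core model: power_mW = K * f_GHz ** 3, performance = IPC * f_GHz
K_A72, K_A53 = 150.0, 60.0      # hypothetical power coefficients
IPC_A72, IPC_A53 = 1.8, 1.0     # hypothetical relative performance per clock


def cluster_freq_ghz(power_mw, n_cores, k):
    """Clock each core in a cluster can sustain for a given cluster power."""
    if power_mw <= 0:
        return 0.0
    return (power_mw / (n_cores * k)) ** (1.0 / 3.0)


def tradeoff(a72_share):
    """Return (multi, single) performance relative to 4x A53 on the full budget."""
    f72 = cluster_freq_ghz(BUDGET_MW * a72_share, N_A72, K_A72)
    f53 = cluster_freq_ghz(BUDGET_MW * (1.0 - a72_share), N_A53, K_A53)
    # Baseline: per-core performance of a quad-A53 using the whole budget.
    base = IPC_A53 * cluster_freq_ghz(BUDGET_MW, N_A53, K_A53)
    multi = (N_A72 * IPC_A72 * f72 + N_A53 * IPC_A53 * f53) / (N_A53 * base)
    single = max(IPC_A72 * f72, IPC_A53 * f53) / base
    return multi, single


for pct in range(0, 101, 10):
    multi, single = tradeoff(pct / 100.0)
    print(f"A72 share {pct:3d}%: multi x{multi:.2f}, single x{single:.2f}")
```

With 0% allocated to the A72s both values come out at exactly 1.0, which matches the way the lines are scaled relative to a quad-A53 setup. The shape of the curves in between depends entirely on the placeholder constants, so swap in real per-core power and benchmark figures before reading anything into the output.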
Looking at the graph, it would appear that dedicating around 70% of the power budget to the A72s would give a good balance between single and multi-thread performance, giving a moderate boost to each over a quad-A53 setup. (For reference, this would put the A72s at about 1150MHz and the A53s at about 1000MHz.)
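In absolute terms, that split works out as follows. This is just the arithmetic on the 660mW budget, assuming the split is exactly 70/30 and that power is shared evenly between the cores within each cluster.

```python
BUDGET_MW = 660        # CPU power budget from the post
A72_SHARE_PCT = 70     # approximate share given to the A72 pair
N_A72, N_A53 = 2, 4    # core counts in the proposed config

a72_cluster_mw = BUDGET_MW * A72_SHARE_PCT / 100
a53_cluster_mw = BUDGET_MW - a72_cluster_mw

print(f"A72 cluster: {a72_cluster_mw:.0f} mW ({a72_cluster_mw / N_A72:.1f} mW per core)")
print(f"A53 cluster: {a53_cluster_mw:.0f} mW ({a53_cluster_mw / N_A53:.1f} mW per core)")
```

That is roughly 462mW for the two A72s and 198mW for the four A53s.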
Getting back to VC, though, this kind of setup would allow them to achieve substantially better emulation performance by changing the power distribution between the different cores and the GPU. In this case, where single-threaded CPU performance is a priority, they could clock the A72s up to about 2GHz, clock the A53s down considerably (or even disable them altogether) and clock the GPU down quite a bit as well, all while staying in a ~2W TDP for the full chip. Given Dolphin performance on TX1, that should allow Nintendo to achieve good quality GC/Wii emulation even in handheld mode, which wouldn't really be possible at all on A53s alone.
Furthermore, they could also adopt phone-style burst clocks when browsing the OS and using non-gaming apps like the web browser. There wouldn't be anything stopping them from clocking the A72s up as high as 2.5GHz for short bursts, which would result in a very snappy user experience.