I don't know how anybody would expect an NX home console released this year to have any more than 30% or so more CPU performance than PS4/XBO, let alone getting anywhere near a 125W desktop CPU.
Let's run through the different options Nintendo have open to them for a CPU for a home console releasing this year:
- x86 ISA -
Advantages:
- Large ecosystem of software, compilers, etc., etc.
- Several options which hit performance required for home console
Disadvantages:
- Only two vendors. If they want to create an NX successor with binary BC then Intel and AMD are their only options (unless VIA suddenly starts competing on performance)
- No options which hit energy efficiency required for handheld, which means using two different ISAs for the two devices, which means added cost in tools/OS/etc. development
- No binary-level BC with Wii U
Intel x86 cores:
Advantages:
- Higher end cores hit performance required for home console
Disadvantages:
- Likely more expensive than any other option
- No options which hit energy efficiency required for handheld
Synthesizable: No
Can be fabbed on-die with the GPU: No (unless they're crazy enough to use Intel IGP for a game console)
Broadwell, Skylake, etc.
Advantages:
- Probably the highest per-thread performance within a console CPU TDP
- Could be fabbed on 14nm for a 2016 launch
Disadvantages:
- Expensive
- Large die area
- See general Intel and x86 disadvantages above
Airmont (Atom)
Advantages:
- Small die area
- Low power consumption
Disadvantages:
- Lower performance per clock than either Puma or A72
AMD x86 cores:
Advantages:
- Plausible cores at least match performance required for home console (as they're already used in PS4/XBO)
- Should be cheaper than comparable Intel cores
- Nintendo have a longstanding relationship with AMD
Disadvantages:
- No options which hit energy efficiency required for handheld
Synthesizable: No
Can be fabbed on-die with the GPU: Yes (but only AMD GPUs)
Puma
Advantages:
- Would slightly outperform Jaguar cores used in PS4 and XBO
- AMD have ample expertise with Puma-based APUs
- Relatively small die area
Disadvantages:
- AMD aren't developing any x86 follow-ups to Puma, making the design of NX2 more difficult
- Likely lower performance than ARM A72 (which is one of the reasons AMD has dropped future development in favour of custom ARM cores)
Excavator
Advantages:
- Better performance per thread than Puma at high TDPs
Disadvantages:
- Power consumption required to get that performance is far, far beyond what's feasible in a console
- Large die area
- Even given the power consumption, performance per thread isn't that good
Zen
Advantages:
- Substantially better performance per thread than Puma
- Should provide high performance per thread even at console-level TDP
Disadvantages:
- Only available on 14nm, which means it's unlikely to be feasible for a 2016 console
- Probably a relatively large die area
- ARMv8 ISA -
Advantages:
- Large ecosystem of software, compilers, etc., etc.
- Several options which hit performance required for home console
- Several options which hit efficiency required for handheld
- Nintendo have a long history of ARM-based devices
- A large number of vendors developing binary-compatible cores across the performance spectrum
Disadvantages:
- Of the available cores, none quite hit the performance of high-end x86 or Power ISA cores
- No binary-level BC with Wii U
ARM in-house cores:
Advantages:
- Can be synthesised on-die with pretty much any GPU architecture
- Higher-end cores hit performance required for home console
- AMD have shown they're happy to work with reference ARM designs
- Relatively cheap
- Nintendo have already designed several SoCs with ARM's in-house cores
Disadvantages:
- No options which quite hit per-thread performance of Skylake/Zen
Synthesizable: Yes
Can be fabbed on-die with the GPU: Yes
A72
Advantages:
- Should moderately exceed performance of Jaguar on 28nm
- Relatively small die area
- See general ARM advantages above
Disadvantages:
- Not quite the per-thread performance of Skylake/Zen
A53
Advantages:
- Very energy efficient
- Tiny die area
- Could use exactly the same core on the handheld
Disadvantages:
- Doesn't have the per-thread performance necessary for a home console
AMD ARM cores:
K12
Advantages:
- Should exceed performance of Jaguar by a significant margin
- Nintendo have a longstanding relationship with AMD
Disadvantages:
- Won't be ready until 2017
Synthesizable: No
Can be fabbed on-die with the GPU: Yes (but only AMD GPUs)
Nvidia ARM cores:
Denver
Advantages:
- Probably exceeds the performance of Jaguar
- Could be integrated in a single die with Nvidia's GPU architecture
Disadvantages:
- Inconsistent benchmarks point to potential issues with dynamic recompilation to internal instruction set
- Nintendo may not have the best relationship with Nvidia, as the 3DS was apparently initially due to use a Tegra SoC, which was then dropped in favour of a custom chip with Pica graphics
Synthesizable: No
Can be fabbed on-die with the GPU: Yes (but only Nvidia GPUs)
Other ARM cores (Qualcomm, Samsung, etc.)
Advantages:
- Some offer performance exceeding ARM reference designs
- Some can be fabbed with synthesizable GPUs (e.g. Mali, PowerVR, etc.) on the same die
Disadvantages:
- Can't be fabbed on-die with AMD or Nvidia desktop-class GPUs
Synthesizable: No (in general)
Can be fabbed on-die with the GPU: Yes (but only synthesizable GPUs such as Mali or PowerVR)
- Power ISA -
Advantages:
- Nintendo have ample experience with Power ISA
- Could provide binary-level BC with Wii U
- Cores are available which hit performance required for home console (and then some!)
Disadvantages:
- No options which hit energy efficiency required for handheld
- Only one vendor actively working on Power cores, and they're not exactly the kind of mid-range cores you'd use in a games console
- No options to fab on the same die as GPU
IBM Power cores:
Advantages:
- Nintendo have a long history of working with IBM
- IBM have been putting out chips on 22nm for a while now
Disadvantages:
- See general Power ISA disadvantages above
Synthesizable: No
Can be fabbed on-die with the GPU: No
POWER8
Advantages:
- Massively exceeds performance required for home console
Disadvantages:
- Enormous die area, far too large for a console CPU
- Very high TDP, far too high for a console CPU
- Lot of redundant functionality that's a waste for a console CPU
PowerPC A2
Advantages:
- Should hit performance levels required for a home console
- Relatively small die area
- Relatively power efficient (particularly on 22nm)
- Lots of floating point/SIMD performance
Disadvantages:
- Excessive floating point performance (game consoles don't need an exclusively 64-bit FP pipeline)
- Probably not that great in non-floating point tasks
- While it could technically provide BC with Wii U code, it's a very different architecture to Espresso, so unlikely to be able to run Wii U code at full speed
PowerPC 750
Advantages:
- Same architecture as used in Wii U, so easy and reliable BC
- Small die area
- Relatively energy efficient
Disadvantages:
- A 32-bit core, so system RAM would be limited to 4GB
- Doesn't hit performance levels of Jaguar
- No successors in development, so just kicking the can down the road
- MIPS ISA -
Advantages:
- Em, easier N64 BC perhaps?
Disadvantages:
- More than are worth going into here
Now, I could go further down along the list into obscure ISAs like RISC-V, but we've more than covered every realistic option open to Nintendo.
Based on the above, I think it's safe to rule out all Power ISA cores, as Wii U binary compatibility can't be worth that much to Nintendo. I think we can also rule out Intel's offerings, both due to the cost of the chips themselves and the inability to fab them on a single die with the GPU. Similarly I'd rule out all non-reference ARM cores, as the reference cores are the only ARM cores available this year which could be included on an SoC with an AMD GPU (which has to be by far the most likely GPU option). Then Excavator can be ruled out for heat and die area, the same reasons Sony and MS ruled out its predecessors, and 14nm is very unlikely to be feasible this year, ruling out Zen.
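To make the elimination easier to follow, here's a rough sketch of it as a filter over the cores listed above. The tuple layout, the `RULES` list and the `eliminate` helper are all made up for illustration. The last rule (dropping the A53 on per-thread performance) isn't spelled out in the paragraph above; it's taken from the A53's own entry.

```python
# (name, ISA, vendor) for every core discussed above.
CANDIDATES = [
    ("Broadwell/Skylake", "x86", "Intel"),
    ("Airmont", "x86", "Intel"),
    ("Puma", "x86", "AMD"),
    ("Excavator", "x86", "AMD"),
    ("Zen", "x86", "AMD"),
    ("A72", "ARMv8", "ARM"),
    ("A53", "ARMv8", "ARM"),
    ("K12", "ARMv8", "AMD"),
    ("Denver", "ARMv8", "Nvidia"),
    ("Other ARM (Qualcomm, Samsung, etc.)", "ARMv8", "other"),
    ("POWER8", "Power", "IBM"),
    ("PowerPC A2", "Power", "IBM"),
    ("PowerPC 750", "Power", "IBM"),
]

# Ordered (reason, predicate) pairs; the first matching rule wins.
RULES = [
    ("Power ISA: Wii U BC isn't worth that much",
     lambda name, isa, vendor: isa == "Power"),
    ("Intel: cost, and can't share a die with the GPU",
     lambda name, isa, vendor: vendor == "Intel"),
    ("Non-reference ARM core: can't go on an SoC with an AMD GPU this year",
     lambda name, isa, vendor: isa == "ARMv8" and vendor != "ARM"),
    ("Heat and die area",
     lambda name, isa, vendor: name == "Excavator"),
    ("14nm unlikely to be feasible this year",
     lambda name, isa, vendor: name == "Zen"),
    # Implicit in the paragraph; taken from the A53 entry in the list.
    ("Per-thread performance too low for a home console",
     lambda name, isa, vendor: name == "A53"),
]


def eliminate(candidates, rules):
    """Return (survivors, {ruled-out name: first reason that applied})."""
    survivors, ruled_out = [], {}
    for name, isa, vendor in candidates:
        reason = next((why for why, hit in rules if hit(name, isa, vendor)), None)
        if reason is None:
            survivors.append(name)
        else:
            ruled_out[name] = reason
    return survivors, ruled_out


survivors, ruled_out = eliminate(CANDIDATES, RULES)
print(survivors)  # ['Puma', 'A72']
```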
So, we (or more accurately Nintendo) are basically reduced to two options: Puma or A72. Judging by single-core Geekbench 3 32-bit benchmarks (which are unfortunately all I have to work with for both, even though they will be affected by things like memory configurations), Puma provides about a 10% performance-per-clock boost over Jaguar, and should be able to clock a bit higher in the same thermal envelope (although it's hard to say by how much). At 2GHz, you could expect about a 25% boost over XBO's CPU, for the same number of cores.
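As a back-of-the-envelope check on the Puma figure, here's the arithmetic as a small sketch. It assumes per-core throughput simply scales as (performance per clock) x (clock speed), and it takes 1.75GHz as the XBO's CPU clock, which is what the estimate above seems to rely on. The function name and parameters are just for illustration.

```python
XBO_CLOCK_GHZ = 1.75  # assumed baseline: Xbox One Jaguar clock


def per_core_uplift(ipc_gain, clock_ghz, baseline_clock_ghz=XBO_CLOCK_GHZ):
    """Per-core throughput relative to a Jaguar core at the baseline clock.

    ipc_gain is the fractional performance-per-clock gain over Jaguar
    (0.10 means 10% better per clock).
    """
    return (1.0 + ipc_gain) * clock_ghz / baseline_clock_ghz


puma = per_core_uplift(ipc_gain=0.10, clock_ghz=2.0)
print(f"Puma @ 2GHz vs an XBO core: {puma:.2f}x")
```

That comes out at roughly a quarter more per core, in line with the figure above. A single benchmark and a linear clock-scaling assumption are both doing a lot of work here.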
The A72 hits a 45% higher single-core Geekbench score per clock than Jaguar, and should clock a bit higher as well (they're hitting 1.8GHz in 28nm phone SoCs, so in a console environment we could assume 2GHz at least). A 2GHz 8-core A72 with two cores reserved for the OS would then give developers about 40% more to work with than they have on XBO. (Again, this is just on the basis of this one benchmark.)
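The same kind of rough sum can be done for the A72 across all the cores games get to use. This sketch assumes perfect scaling across cores and a 1.75GHz XBO clock. The post doesn't say how many XBO cores are counted as available to developers, so that's left as a parameter here, and the two values tried (6 and 7) are assumptions.

```python
def aggregate_uplift(ipc_gain, clock_ghz, game_cores,
                     base_clock_ghz, base_game_cores):
    """Total game-available CPU throughput relative to the baseline console.

    Assumes throughput = per-clock performance x clock x usable cores,
    i.e. ideal scaling with core count.
    """
    new = (1.0 + ipc_gain) * clock_ghz * game_cores
    old = base_clock_ghz * base_game_cores
    return new / old


# 8-core A72 at 2GHz with two cores reserved for the OS -> 6 game cores.
for xbo_game_cores in (6, 7):  # assumed values, not from the post
    ratio = aggregate_uplift(0.45, 2.0, 6, 1.75, xbo_game_cores)
    print(f"XBO with {xbo_game_cores} game cores: {ratio:.2f}x")
```

Under these assumptions the result is quite sensitive to the XBO core count: with seven cores open to games the ratio lands close to the 40% quoted above, while with six it comes out noticeably higher.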
If Nintendo want more performance than that (and given their history, I would be very surprised if they did), then more cores would be pretty much their only answer. Such a route isn't without its difficulties, though, as developers may struggle to parallelise their code well enough to make proper use of such a CPU.
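One standard way to put numbers on the parallelisation problem is Amdahl's law (my choice of illustration, not something the figures above are based on). The sketch below shows how the speedup from doubling the game-available cores depends on the fraction of the workload that can actually run in parallel. The parallel fractions and core counts are purely illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speedup over one core when only part of the work scales."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


for p in (0.5, 0.8, 0.95):  # illustrative parallel fractions
    six = amdahl_speedup(p, 6)
    twelve = amdahl_speedup(p, 12)
    print(f"parallel fraction {p:.2f}: 6 cores {six:.2f}x, "
          f"12 cores {twelve:.2f}x, gain from doubling {twelve / six:.2f}x")
```

Unless nearly all of the work parallelises, doubling the core count buys far less than double the performance, which is why extra cores are a harder sell than faster ones.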