Ok, a mis-transcript is a valid concern. And it increasingly looks like one :/
Yeah, I've found the original quote:
Devinder Kumar said:
...we projected earlier this year that we would have at least one to two semi-custom design wins and I'm pleased to report that we have those design wins, the work to design the products that has already started, the contract assigned and those parts get introduced in 2016.
[...]
I'm not going to give too much detail. I'll say that one is x86 and one is ARM, and at least one will be on gaming, right. But that's about as much as you going to get out me today, because the customers from the standpoint to be fair to them. It is their product. They launch it.
(Link)
So either one or both of their late 2014 semi-custom wins are gaming devices, and then there's, I believe, another semi-custom win in 2015 that they haven't commented on as much. For all we know, they could all be games consoles at this point.
An HPC part could, but in practice those looking for HPC server parts tend to go with stand-alone GPUs. If your workload maps well to GPGPU, then going with larger GPUs over smaller ones normally pays back in efficiency, since larger GPUs have better FLOPS/watt ratings. That's just the norm though; exceptions occur.
True, but integrating CPU, GPU and RAM on a single package may have advantages of its own, for example removing the PCIe bottleneck and allowing both CPU and GPGPU code to operate on the same HBM pool. You could also substantially increase node density and simplify cooling. I could see situations where this kind of approach may be suitable, such as simulations which are only partially suited to GPGPU and where CPU-GPU data transfers are a bottleneck. I don't know how typical that is, but given that Nvidia have developed NVLink as a replacement for PCIe in HPC, it's obviously an issue in some cases. Considering that PCIe v3 provides about 16GB/s on an x16 interface, while NVLink provides 80-192GB/s (from Nvidia's 5-12x claim), having a shared 1024GB/s interface to a common memory pool may be advantageous in situations where you would otherwise be copying a huge amount of memory back and forth between CPU and GPU.
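To put those figures in perspective, here's a quick back-of-envelope sketch in Python using the bandwidth numbers quoted above (the 64GB working set is just an illustrative assumption, and these are peak figures; sustained bandwidth is always lower):

```python
# Back-of-envelope: time to move a hypothetical 64 GB working set between
# CPU and GPU over each interconnect, at the peak bandwidths quoted above.
working_set_gb = 64  # illustrative assumption, not from any real workload

links_gb_s = {
    "PCIe 3.0 x16": 16,         # ~16 GB/s theoretical peak
    "NVLink (5x claim)": 80,    # Nvidia's low-end claim
    "NVLink (12x claim)": 192,  # Nvidia's high-end claim
    "Shared HBM pool": 1024,    # with a shared pool you may avoid the copy
                                # entirely; this is the worst case
}

for name, bandwidth in links_gb_s.items():
    seconds = working_set_gb / bandwidth
    print(f"{name:>20}: {seconds:7.4f} s per transfer")
```

Even taking the numbers at face value, PCIe v3 spends 4 seconds on a transfer that a shared HBM pool could service in well under a tenth of a second, and with truly shared memory the copy may not be needed at all.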
That said, I'm more than willing to admit that I'm even less qualified to talk about this than I am for the rest of my contributions to this thread. I took a short module related to HPC in college, but they don't let undergrads play with the supercomputers, and we never even got as far as GPU-accelerated HPC, let alone the potential bottlenecks inherent in different GPGPU workloads.