Chobel
Balance it if old
www.eurogamer.net/articles/digitalfoundry-the-complete-xbox-one-interview
So this is the complete article discussing the Xbox One architecture. It contains all the old stuff we saw before (balance, the 10% GPU reserve, etc.), but it also includes some new information, like Microsoft's GPGPU approach, NAND memory and a few other things.
CPU access to eSRAM
Digital Foundry: And you have CPU read access to the ESRAM, right? This wasn't available on Xbox 360 eDRAM.
Nick Baker: We do but it's very slow.
GPGPU approach
Digital Foundry: So what is your general approach to GPGPU? Sony has made a big deal about its wider compute pipelines in order to get more utilisation of the ALU. What is your philosophy for GPGPU on Xbox One?
Andrew Goossen: Our philosophy is that ALU is really, really important going forward but like I said we did take a different tack on things. Again, on Xbox One our Kinect workloads are running on the GPU with asynchronous compute for all of our GPGPU workloads and we have all the requirements for efficient GPGPU in terms of fast coherent memory, we have our operating system - that takes us back to our system design. Our memory manager on game title side is completely rewritten. We did that to ensure that our virtual addressing for the CPU and GPU are actually the same when you're on that side. Keeping the virtual addresses the same for both CPU and GPU allows the GPU and CPU to share pointers. For example, a shared virtual address space along with coherent memory along with eliminating demand paging means the GPU can directly traverse CPU data structures such as linked lists.
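What Goossen describes can be pictured in code. The sketch below is my own illustration, not Xbox One API code: the names `Node` and `gpu_sum_list` are hypothetical, and `gpu_sum_list` merely stands in for a GPGPU kernel. The point it shows is that when CPU and GPU share one virtual address space with coherent memory, a pointer value built on the CPU side is valid as-is on the GPU side, so a kernel can walk a CPU-built linked list directly instead of flattening and copying it.

```cpp
#include <cstddef>

// A CPU-side data structure built from raw pointers. Under a shared
// virtual address space, these pointer values mean the same thing to
// the GPU, which is what lets the two processors share pointers.
struct Node {
    int value;
    Node* next;
};

// Stands in for a GPGPU kernel: it dereferences CPU pointers as-is.
// Without unified addressing (and without eliminating demand paging),
// the list would first have to be serialized into GPU-visible memory.
int gpu_sum_list(const Node* head) {
    int sum = 0;
    for (const Node* n = head; n != nullptr; n = n->next)
        sum += n->value;
    return sum;
}
```

Usage: build a list on the "CPU" side (`Node c{3, nullptr}; Node b{2, &c}; Node a{1, &b};`) and hand `&a` straight to the "kernel"; `gpu_sum_list(&a)` traverses it with no marshalling step.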
On the system side we're running in a complete generic Windows memory manager but on the game side we don't have to worry about back-compat or any of these nasty issues. It's very easy for us to rewrite the memory manager and so we've got coherent memory, the same virtual addressing between the two, we have synchronisation mechanisms to coordinate between the CPU and GPU that we can run on there. I mean, we invented DirectCompute - and then we've also got things like AMP that we're making big investments on for Xbox One to actually make use of the GPU hardware and the GPGPU workloads.
The other thing I will point out is that also on the internet I see people adding up the number of ALUs and the CPU and adding that onto the GPU and saying, "Ah, you know, Microsoft's CPU boost doesn't make much of a difference." But there still are quite a number of workloads that do not run efficiently on GPGPU. You need to have data parallel workloads to run efficiently on the GPU. The CPU nowadays can run non-data parallel workloads but you're throwing away massive amounts of performance. And for us, getting back to the balance and being able to go back and tweak our performance with the overhead in the margin that we had in the thermals and the silicon design, it kind of enabled us to go back and look at things. We looked at our launch titles and saw that - hey we didn't make the balance between CPU and GPU in terms of our launch titles - we probably under-tweaked it when we designed it two or three years ago. And so it was very beneficial to go back and do that clock raise on the CPU because that's a big benefit to your workloads that can't be running data parallel.
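The data-parallel distinction Goossen draws can be made concrete with a minimal contrast (my illustration, not from the interview): in the first loop every iteration is independent, so it maps cleanly onto GPU ALUs; in the second, each step depends on the previous one, so it cannot be spread across the GPU's parallel lanes and is exactly the kind of workload that benefits from the CPU clock raise instead.

```cpp
#include <cstddef>
#include <vector>

// Data-parallel: out[i] depends only on v[i], so all iterations can
// run at once. This is the shape of workload GPGPU handles well.
std::vector<int> scale(const std::vector<int>& v, int k) {
    std::vector<int> out(v.size());
    for (std::size_t i = 0; i < v.size(); ++i)
        out[i] = v[i] * k;  // independent per element: GPU-friendly
    return out;
}

// Loop-carried recurrence: step i needs the result of step i-1, so
// the work is inherently serial and stays on the CPU.
int recurrence(const std::vector<int>& v) {
    int acc = 0;
    for (int x : v)
        acc = acc * 2 + x;  // serial dependency: not data-parallel
    return acc;
}
```

Adding the CPU's ALUs to the GPU's FLOP count, as Goossen notes, hides this distinction: only the first kind of loop would actually run efficiently on the GPU.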
Digital Foundry: The GPU compute comparison seems to be about Xbox One's high coherent read bandwidth vs. raw ALU on PS4. But don't the additional ACEs added to PS4 aim to address that issue?
Andrew Goossen: The number of asynchronous compute queues provided by the ACEs doesn't affect the amount of bandwidth or number of effective FLOPs or any other performance metrics of the GPU. Rather, it dictates the number of simultaneous hardware "contexts" that the GPU's hardware scheduler can operate on any one time. You can think of these as analogous to CPU software threads - they are logical threads of execution that share the GPU hardware. Having more of them doesn't necessarily improve the actual throughput of the system - indeed, just like a program running on the CPU, too many concurrent threads can make aggregate effective performance worse due to thrashing. We believe that the 16 queues afforded by our two ACEs are quite sufficient.
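Goossen's thread analogy can be sketched as a toy scheduling model. This is my own simplification with made-up parameters, not Microsoft's or AMD's scheduler: throughput is capped by the number of hardware contexts, so queues beyond that cap add nothing, and in this model each excess queue charges a small switching cost, mirroring the thrashing he describes for oversubscribed CPU threads.

```cpp
#include <algorithm>

// Toy model: 'contexts' is the number of simultaneous hardware
// contexts the scheduler can run; 'queues' is how many compute
// queues are feeding it. Only min(queues, contexts) execute at a
// time, so extra queues can't raise throughput, and (in this
// model) each one waiting to be swapped in costs 'switch_cost'.
double effective_throughput(int queues, int contexts, double switch_cost) {
    int running = std::min(queues, contexts);
    int excess  = std::max(queues - contexts, 0);
    return running - switch_cost * excess;
}
```

Under this model, going from 8 to 16 queues on 16 contexts helps (more contexts stay fed), but going from 16 to 64 only adds swapping overhead, which is the shape of the argument that 16 queues across two ACEs are "quite sufficient".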
eSRAM Latency doesn't really matter?
Digital Foundry: [...] Does low latency here materially affect GPU performance?
Nick Baker: You're right. GPUs are less latency sensitive. We've not really made any statements about latency.