Originally Posted by James Prior (Tech Writer for Rage3D)
I'm hearing that PS4 GCN has 8 ACEs, each capable of running 8 CLs. I believe Tahiti is 2 per ACE, 2 ACEs.
edit: 7970/7870 have two ACEs. That is quite a leap.
What are Asynchronous Compute Engines?
Originally Posted by Rage3D
For use in compute applications, the Southern Islands GCN Tahiti design includes dual Asynchronous Compute Engines (ACE) for the independent scheduling and dispatch of work items, necessary for efficient multi-tasking. This allows compute workloads to operate in parallel with graphics workloads, and facilitates fast context switching so that demands by workloads that exceed concurrency abilities can be given needed resources. Despite featuring PCI-Express 3.0 which doubles interface bandwidth from 8GB/s to 16GB/s, plus support for numerous data and protocol commands, the internal dual DMA engines can push data through the bus to saturate that bandwidth.
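For anyone wondering where the 8GB/s and 16GB/s figures in that quote come from, here is a rough back-of-the-envelope check. It assumes an x16 link and counts one direction only (both are my assumptions, the quote doesn't say), and uses the standard PCIe 2.0 and 3.0 signalling rates and line encodings. The function name is just something I made up for illustration.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s, lanes, payload_bits, encoded_bits):
    """Usable one-direction bandwidth in GB/s for a PCIe link.

    transfer_rate_gt_s: raw signalling rate per lane in GT/s
    payload_bits / encoded_bits: line-encoding efficiency (e.g. 8b/10b)
    """
    usable_gbit_s = transfer_rate_gt_s * lanes * payload_bits / encoded_bits
    return usable_gbit_s / 8  # convert bits to bytes


# Assumption: x16 slot, per-direction figures.
gen2 = pcie_bandwidth_gb_s(5.0, 16, 8, 10)     # PCIe 2.0: 5 GT/s, 8b/10b encoding
gen3 = pcie_bandwidth_gb_s(8.0, 16, 128, 130)  # PCIe 3.0: 8 GT/s, 128b/130b encoding

print(f"PCIe 2.0 x16: {gen2:.2f} GB/s")
print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
```

So the "16GB/s" is a rounded number: under these assumptions PCIe 3.0 x16 works out to a bit under 16 GB/s per direction, because the raw rate goes up by 1.6x and the rest of the doubling comes from the more efficient encoding.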
Originally Posted by Anandtech
Meanwhile on the compute side, AMD’s new Asynchronous Compute Engines serve as the command processors for compute operations on GCN. The principal purpose of ACEs will be to accept work and to dispatch it off to the CUs for processing. As GCN is designed to concurrently work on several tasks, there can be multiple ACEs on a GPU, with the ACEs deciding on resource allocation, context switching, and task priority. AMD has not established an immediate relationship between ACEs and the number of tasks that can be worked on concurrently, so we’re not sure whether there’s a fixed 1:X relationship or whether it’s simply more efficient for the purposes of working on many tasks in parallel to have more ACEs.
One effect of having the ACEs is that GCN has a limited ability to execute tasks out of order. As we mentioned previously, GCN is an in-order architecture, and the instruction stream on a wavefront cannot be reordered. However, the ACEs can prioritize and reprioritize tasks, allowing tasks to be completed in a different order than they’re received. This allows GCN to free up the resources those tasks were using as early as possible rather than having the task consuming resources for an extended period of time in a nearly-finished state. This is not significantly different from how modern in-order CPUs (Atom, ARM A8, etc.) handle multi-tasking.
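To make the "in-order instructions, out-of-order tasks" distinction a bit more concrete, here is a toy sketch in Python. It is only a conceptual model, not how the hardware actually works: the `Task` class, the `run` function, the task names and the priority numbers are all hypothetical, and a real ACE juggles many tasks concurrently rather than one at a time. The point is just that tasks can finish in a different order than they were submitted while each task's own instruction stream is never reordered.

```python
import heapq
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Task:
    priority: int                       # lower value is dispatched first
    seq: int                            # arrival order, used as a tie-breaker
    name: str = field(compare=False)
    instructions: list = field(compare=False)


def run(tasks):
    """Dispatch tasks by priority; run each task's instructions strictly in order."""
    queue = []
    arrival = count()
    for name, priority, instructions in tasks:
        heapq.heappush(queue, Task(priority, next(arrival), name, instructions))

    completed = []  # task names in completion order
    trace = []      # (task name, instruction) pairs in execution order
    while queue:
        task = heapq.heappop(queue)      # highest-priority task goes next
        for instr in task.instructions:  # in-order within the task
            trace.append((task.name, instr))
        completed.append(task.name)
    return completed, trace


# Hypothetical workloads, listed in submission order.
submitted = [
    ("physics", 2, ["load", "integrate", "store"]),
    ("audio", 0, ["load", "mix", "store"]),
    ("culling", 1, ["load", "test", "store"]),
]

completion_order, trace = run(submitted)
print(completion_order)  # ['audio', 'culling', 'physics']
```

Submission order was physics, audio, culling, but completion order follows priority. Within each task the instructions still run exactly as listed, which is the sense in which the quote calls GCN an in-order architecture.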