It's also while looking at A9X's memory subsystem, however, that we find our second and final curveball for A9X: the L3 cache. Or rather, the lack thereof. For multiple generations now Apple has used an L3 cache on both their phone and tablet SoCs to help feed both the CPU and GPU, as even a fast memory bus can't keep up with a low latency local cache. Even as recently as A9, Apple included a 4MB victim cache. However for A9X there is no L3 cache; the only caches on the chip are the individual L1 and L2 caches for the CPU and GPU, along with some even smaller amounts of cache for various other functional blocks.
The big question right now is why Apple would do this. Our traditional wisdom here is that the L3 cache was put in place to service both the CPU and GPU, but especially the GPU. Graphics rendering is a memory bandwidth-intensive operation, and as Apple has consistently been well ahead of many of the other ARM SoC designers in GPU performance, they have been running headlong into the performance limitations imposed by narrow mobile memory interfaces. An L3 cache, in turn, would alleviate some of that memory pressure and keep both CPU and GPU performance up.
One explanation may be that Apple deemed the L3 cache no longer necessary with the A9X's 128-bit LPDDR4 memory bus; that 51.2GB/sec of bandwidth meant that they no longer needed the cache to avoid GPU stalls. However while the use of LPDDR4 may be a factor, Apple's ratio of bandwidth-to-GPU cores of roughly 4.27GB/sec-to-1 core is identical to A9's, which does have an L3 cache. With A9X being a larger A9 in so many ways, this alone isn't the whole story.
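The bandwidth-per-core comparison can be checked with a few lines of arithmetic. This is only a sketch: the 51.2GB/sec figure comes from the text above, while the A9 bandwidth (25.6GB/sec, i.e. a 64-bit LPDDR4 bus at the same data rate) and the GPU core counts (12 for A9X, 6 for A9) are assumptions not stated in this section.

```python
# Peak memory bandwidth in GB/sec.
A9X_BANDWIDTH_GBPS = 51.2  # from the text: 128-bit LPDDR4
A9_BANDWIDTH_GBPS = 25.6   # assumption: 64-bit LPDDR4 at the same data rate

# GPU core counts. Assumptions, not stated in this section.
A9X_GPU_CORES = 12
A9_GPU_CORES = 6


def bandwidth_per_core(bandwidth_gbps, gpu_cores):
    """Return peak memory bandwidth available per GPU core, in GB/sec."""
    return bandwidth_gbps / gpu_cores


a9x_ratio = bandwidth_per_core(A9X_BANDWIDTH_GBPS, A9X_GPU_CORES)
a9_ratio = bandwidth_per_core(A9_BANDWIDTH_GBPS, A9_GPU_CORES)

print(f"A9X: {a9x_ratio:.2f} GB/sec per GPU core")
print(f"A9:  {a9_ratio:.2f} GB/sec per GPU core")
```

Under those assumptions both chips land on the same ratio, which is why the wider bus on its own does not explain dropping the L3 cache.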
What's especially curious is that the L3 cache on the A9 wasn't costing Apple much in the way of space. Chipworks puts the size of A9's 4MB L3 cache block at a puny ~4.5 mm², which is just 3% of the size of A9X. So although there is a cost to adding L3 cache, unless there are issues we can't see even with a die shot (e.g. routing), Apple didn't save much by getting rid of the L3 cache.
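For a sense of scale, the area share works out as follows. The 4.5 mm² figure is Chipworks' estimate quoted above; the A9X die area of roughly 147 mm² is an assumption brought in for illustration, as it is not given in this section.

```python
L3_AREA_MM2 = 4.5          # from the text: Chipworks' estimate for A9's 4MB L3
A9X_DIE_AREA_MM2 = 147.0   # assumption: approximate A9X die size

# Fraction of the A9X die that an A9-sized L3 block would occupy.
l3_share = L3_AREA_MM2 / A9X_DIE_AREA_MM2

print(f"L3 block as a share of the A9X die: {l3_share:.1%}")
```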
Our own Andrei Frumusanu suspects that it may be a power matter, and that Apple was using the L3 cache to save on power-expensive memory operations on the A9. With A9X however, it's a tablet SoC that doesn't face the same power restrictions, and as a result doesn't need a power-saving cache. This would be coupled with the fact that with double the GPU cores, there would be a lot more pressure on just a 4MB cache versus the pressure created by A9, which in turn may drive the need for a larger cache and ultimately an even larger die size.
As it stands there's no one obvious reason, and it's likely that all 3 factors (die size, LPDDR4, and power needs) played a part here, with only those within the halls of One Infinite Loop knowing for sure. However I will add that since Apple has removed the L3 cache, the GPU L2 cache must be sizable. Imagination's tile based deferred rendering technology needs an on-chip cache to hold tiles in to work on, and while they don't need an entire frame's worth of cache (which on iPad Pro would be over 21MB), they do need enough cache to hold a single tile. It's much harder to estimate GPU L2 cache size from a die shot (especially with Apple's asymmetrical design), but I wouldn't be surprised if A9X's GPU L2 cache is greater than A9's or A8X's.
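The gap between a full frame and a single tile is easy to put numbers on. The sketch below assumes the iPad Pro's 2732x2048 panel and a 32-bit color buffer, which together reproduce the "over 21MB" figure; the 32x32 pixel tile size is a hypothetical value chosen for illustration, not a confirmed detail of A9X's GPU.

```python
import math

# Assumptions: iPad Pro panel resolution and 32 bits (4 bytes) per pixel.
FRAME_WIDTH_PX = 2732
FRAME_HEIGHT_PX = 2048
BYTES_PER_PIXEL = 4

# Hypothetical tile edge length in pixels, for illustration only.
TILE_EDGE_PX = 32

# Color storage for one full frame versus one tile.
frame_bytes = FRAME_WIDTH_PX * FRAME_HEIGHT_PX * BYTES_PER_PIXEL
tile_bytes = TILE_EDGE_PX * TILE_EDGE_PX * BYTES_PER_PIXEL

# Number of tiles needed to cover the frame (partial tiles round up).
tiles_per_frame = (math.ceil(FRAME_WIDTH_PX / TILE_EDGE_PX)
                   * math.ceil(FRAME_HEIGHT_PX / TILE_EDGE_PX))

print(f"Full frame: {frame_bytes / 2**20:.2f} MiB")
print(f"One tile:   {tile_bytes / 2**10:.0f} KiB")
print(f"Tiles per frame: {tiles_per_frame}")
```

Even with generous per-pixel storage, a single tile is several orders of magnitude smaller than the frame, which is why a tile-based renderer can get by on on-chip cache rather than a frame-sized buffer.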