Ok, re GPU blocks/processors, the following can all be assumed to be in there:
  • Command Processor and Thread Scheduler (not necessarily the same block)
  • Trisetup and rasterizer (R800 dropped that and delegated the workload to SPs)
  • Global Data Share (traditionally not very large, and likely housed comfortably in one of the numerous embedded memory pools, at a much larger size)
  • A bunch of caches (vertex, texture) which could be really tiny or not so much (again, memory pools ahoy)
  • DMA engines
  • Ring buses
  • Tessellator (likely still sitting in fixed-function silicon)

BTW, a quick Google search for ARM9 die area led me to this article discussing a Qualcomm broadband/app processor (yes, stand-alone ARM9 dies are a bit hard to track down these days). The ARM9 there appears to come to ~0.8 mm² @ 40nm (the original part is 90nm, so I've applied a squares rule, i.e. scaled the area by the square of the feature-size ratio, (40/90)²).
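
For anyone who wants to redo the squares-rule arithmetic, here is a minimal sketch. It assumes ideal scaling, where area shrinks with the square of the feature-size ratio; real shrinks are usually less generous. The function name `scale_die_area` is made up for illustration, and the 90nm input is back-derived from the ~0.8 mm² figure above, not taken from the article.

```python
def scale_die_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Ideal 'squares rule': area scales with (to_nm / from_nm) squared."""
    return area_mm2 * (to_nm / from_nm) ** 2


# Shrink factor going from 90nm to 40nm: (40/90)^2 = 16/81, roughly 0.2
factor = scale_die_area(1.0, from_nm=90, to_nm=40)

# Assumption: working backwards from ~0.8 mm^2 @ 40nm gives the implied
# 90nm area (about 4 mm^2). This is NOT a number quoted from the article.
implied_90nm_area = 0.8 / factor

print(f"shrink factor: {factor:.4f}")
print(f"implied 90nm area: {implied_90nm_area:.2f} mm^2")
print(f"scaled back to 40nm: {scale_die_area(implied_90nm_area, 90, 40):.2f} mm^2")
```

In other words, an ARM9 block of roughly 4 mm² at 90nm would land at about a fifth of that at 40nm under this rule, which is where the ~0.8 mm² comes from.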