Have you seen an ARM (and ARMv8, for the modernized 64-bit version) or, more importantly, a PowerPC floorplan for comparison? All this means nothing without one. I haven't found one with instruction decode outlined in such a way for either of the above.
Nevertheless, here's the Cortex-A5 (single-issue, in-order) from the CPU's official page:
The unit that's responsible for feeding the pipeline with ops is dubbed the PFU (prefetch unit). The A5's PFU does the equivalent job to the ISA decoder + ucode ROM + branch prediction units on the Bobcat/Jaguar floorplans. Also, the A5 is a much smaller CPU than those AMD CPUs in transistor count, so the frontend naturally takes up a bigger portion of the floorplan.
Too bad that those are intrinsic parts of processors that contribute to their die size and power draw, which is what we're talking about? "Remove half the stuff and the other stuff starts to look bigger, eh?"
Erm, every single transistor on the CPU and support logic, heck, the entire motherboard, is 'contributing to the power draw' (some of them negatively) and performance. Also, the AMD x86 cores we've been discussing here are both parts of APUs - i.e. technically they're on the same die with the entire darn computer. Shall we count those too?
In reality, there are logical units in the CPU core design, and L1 caches (and tags), and sometimes L2 caches (and tags). Comparing logical units to caches is pointless - caches can be relatively easily (from the design perspective, not necessarily the production perspective) reconfigured, have their associativity changed, etc. That's why CPU IP vendors (like ARM) offer the same CPU core design with various cache configurations. Logical units are much more rigid - notice how Jaguar's floorplan is Bobcat's with small modifications (50% larger fp/simd unit, bus unit moved around to make room for the fp unit expansion, and of course no L2 blocks, which are arbitrarily shown on the Bobcat floorplan), and yet Jaguar still constitutes a separate design. Since AMD does not target that many markets with those CPUs, their design features a single cache configuration (multiple APU versions based on caches would be infeasible). But if you take a CPU with bigger target markets (ARM's, Intel's), you'll find various cache configurations. Those cache configurations, though affecting performance in the general case, do not change the inherent CPU core logic design - a Cortex-A8 with 128KB L2$ is as much a Cortex-A8 as its 256KB L2$ sibling.
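To put the core-vs-cache distinction in code terms, here's a rough Python sketch. All class and field names are made up for illustration, only the 128KB/256KB L2 sizes come from the example above, and the issue width and in-order values are just my assumption about the A8, so treat it as a toy model rather than anything a vendor actually ships:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CoreLogic:
    """Hypothetical stand-in for the rigid part: the core's logical units."""
    name: str
    issue_width: int   # assumed value, for illustration only
    in_order: bool     # assumed value, for illustration only


@dataclass(frozen=True)
class CacheConfig:
    """Hypothetical stand-in for the easily reconfigured part."""
    l2_kb: int


@dataclass(frozen=True)
class CpuProduct:
    core: CoreLogic
    cache: CacheConfig


def same_core_design(a: CpuProduct, b: CpuProduct) -> bool:
    # Two products share a core design if their core logic is identical,
    # regardless of which cache configuration is bolted on.
    return a.core == b.core


# One core design, two cache configurations (sizes taken from the post).
a8_core = CoreLogic(name="Cortex-A8", issue_width=2, in_order=True)
a8_small = CpuProduct(core=a8_core, cache=CacheConfig(l2_kb=128))
a8_large = CpuProduct(core=a8_core, cache=CacheConfig(l2_kb=256))

print(same_core_design(a8_small, a8_large))  # same core logic
print(a8_small == a8_large)                  # different products overall
```

The point of splitting it this way is that swapping the CacheConfig never touches CoreLogic, which is roughly how the same core ends up in several differently cached parts.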
Not sure what the Power8 article has to do with anything. It doesn't make any comparisons with Intel, nor is it overly technical compared to AnandTech, so I'm not sure why that was thrown in here. That said, I'll be following its launch with interest (and I wonder if we'll move our Power7 systems to them...). Also, if that was your meaning, I never said IBM wasn't strong in the mainframe market. In fact, just the opposite: a page back, I said where IBM has excelled is in high-performance, high-power-draw uses (usually with crazy cooling rigs and die sizes much larger than what Intel bothers with).
It's a POWER7 article. If you had actually read it, you'd have seen how the 8-core (32 threads with 4x SMT) POWER7 design uses half the transistors of the 8-core (16 threads with 2x SMT) Nehalem EX. Of course that's thanks to IBM's L3 eDRAM tech, so there's not much point in comparing those numbers, as POWER7 would wipe the floor with Intel's chip (higher performance, half the transistors).
edit: fixed A5 OOO and superscalarity misinformation (thanks, DonMigs85) and a few typos