Another tasty, juicy nugget from Lady Gaia:
"Peak teraflop numbers are always misleading, which is why you see so many technically astute people write them off as a meaningful standalone metric. If you're looping then not every instruction is an FMAD, by definition. If you're actually processing meaningful inputs then you'll have execution stalls while waiting on data from the cache or RAM. If you're using the results of the operations to perform derivative operations then you're subject to additional small pipeline bubbles. The peak number is, at best, a relative benchmark and not a very good one. Highly optimized code is lucky to hit 30-40% of the theoretical peak, and it's far more important to run real-world code quickly than to win a paper spec war.
Unfortunately, the general public is addicted to overly simplistic measures, so the theoretical peak is what they're going to compare. As a rough measure across two similar architectures, it should be in the ballpark. Without hands-on time with a dev kit I certainly can't prove otherwise. I am expecting that we'll see far more impact from other design considerations than from the question of how fast straight-line code can run under ideal conditions. That's why I keep advocating waiting for games before coming to any conclusions."
Unf getting juicy.
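
For anyone wondering where the paper number comes from, here's a rough Python sketch of the arithmetic behind the quote. The usual convention is cores x clock x 2, since one FMAD counts as two floating-point operations. The core count and clock below are made-up round numbers, not the specs of any real console or GPU, and the function names are just mine. The 30-40% range is the figure Lady Gaia gives for highly optimized code, not something I measured.

```python
from typing import Tuple


def peak_tflops(shader_cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak, assuming every core retires one FMAD every cycle.

    An FMAD (fused multiply-add) is counted as 2 floating-point operations,
    which is where the default flops_per_cycle=2 comes from.
    """
    # cores * GHz * FLOPs-per-cycle gives GFLOPS; divide by 1000 for TFLOPS.
    return shader_cores * clock_ghz * flops_per_cycle / 1000.0


def sustained_range(peak: float, low: float = 0.30, high: float = 0.40) -> Tuple[float, float]:
    """Scale the peak by the 30-40% range quoted for highly optimized code."""
    return peak * low, peak * high


if __name__ == "__main__":
    # Hypothetical part: 2000 shader cores at 1.0 GHz (illustrative numbers only).
    peak = peak_tflops(shader_cores=2000, clock_ghz=1.0)
    low, high = sustained_range(peak)
    print(f"paper peak: {peak:.2f} TFLOPS")
    print(f"optimized real-world code: roughly {low:.2f} to {high:.2f} TFLOPS")
```

The point is that the formula has no inputs for cache misses, dependent operations, or loop overhead, which is exactly why it only works as a rough comparison between similar architectures.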