I'm not sure why stream processor count is an interesting metric, other than for comparing within one architecture. Nvidia has been favouring fatter GPU cores that do more and take more space, and thus have a lower count; AMD has been favouring the opposite. They're just different ways of getting at the same end goal. In past architectures, Nvidia's designs used to be odd arrangements like 4 thin cores plus 1 "fat" core, and things like that.
This is completely unlike a 4 core CPU delivering the same performance as an 8 core one, as GPUs are "embarrassingly parallel" devices and don't have such issues scaling performance with core count.
As for the higher on-paper FLOPS: that's directly BECAUSE of the higher shader core count. It's just a mathematical statement, nothing more: (stream processors) × (frequency in GHz) × 2 (operations per core per clock) = GFLOPS for AMD architectures. It doesn't denote performance. Again, I never got why "Nvidia does more with less flops" should be a particularly interesting statement. What's the price, and what's the performance, in the end? One could just as well say "for gaming performance X, AMD has more flops than Nvidia", which is equally meaningless.
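To show how mechanical that formula is, here is a minimal Python sketch of it. The function name `peak_gflops` and the example figures (2304 stream processors at 1.266 GHz) are my own illustrative assumptions, not numbers taken from anything above; plug in whatever card you like.

```python
def peak_gflops(stream_processors: int, clock_ghz: float, ops_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS: stream processors x clock (GHz) x ops per clock.

    ops_per_clock defaults to 2, matching the formula above (a fused
    multiply-add is counted as two floating point operations).
    """
    return stream_processors * clock_ghz * ops_per_clock


if __name__ == "__main__":
    # Illustrative values only (assumed for this example).
    sp_count = 2304
    clock = 1.266
    print(f"{sp_count} SPs at {clock} GHz: {peak_gflops(sp_count, clock):.0f} GFLOPS on paper")
```

Note that nothing in there knows anything about the workload, the memory bandwidth, or the drivers, which is exactly why the result is a paper figure and not a performance measurement.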
Now, it's less efficient, yes, but I'd be interested in how much of that is down to GloFo's shitty process compared to TSMC's. AMD was legally obligated to order a certain number of wafers from GloFo, so they chose the mid-range 480 to fit the bill, while higher end cards will be made at TSMC and should improve the efficiency equation.