Heh, it's even more of a difference than I calculated. Do you have a link to where you got that info?
http://www.anandtech.com/show/4061/amds-radeon-hd-6970-radeon-hd-6950/4
I did the math. Again, it is a rough estimation, but GCN further improved on the efficiency that VLIW4 offered.
There are even ancillary benefits within the individual SPUs. While the SP count changed, the register file did not, leading to less pressure on each SPU's registers, as now only 4 SPs vie for register space. Even scheduling is easier: there are fewer SPs to schedule, and since they're all alike, the scheduler no longer has to take into consideration the difference between the w/x/y/z units and the t-unit.
Meanwhile, in terms of gaming the benefits are similar. Games that were already failing to fully utilize the VLIW5 design now have additional SIMDs to take advantage of, and as rendering is still an embarrassingly parallel operation as far as threading is concerned, it's very easy to further divide the rendering workload into more threads to take advantage of this change. The extra SIMDs mean that Cayman has additional texturing horsepower over Cypress, and the overall compute:texture ratio has been reduced, a beneficial situation for any games that are more texture/filtering bound than compute bound.
This is mainly why VLIW4 replaced VLIW5: AMD could remove 1 ALU for every 5 they had and keep roughly the same performance, so 160 cores would work out to 128. Of course, the Wii U has 176 GFLOPS; those extra 16 GFLOPS would be further covered by the move from VLIW4 to GCN, which went from 4-ALU groups to 16-ALU SIMDs, allowing the 128 cores (ALUs) to handle multiple thread waves at once. Think of it like a catch-all: threads feed into a wider band of processing than before without changing the available number of cores, and since you can now issue multiple instructions at once, you don't have to worry about ALUs going unused. From a processing standpoint, VLIW4 to GCN was a smaller jump than VLIW5 to VLIW4. Hope you are satisfied with that; you can read more about it, and AnandTech is a great deep dive into VLIW4/5 and GCN if you search their archives. This is what the WUST thread did to me, btw.
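If you want to check the arithmetic yourself, here's a quick sketch of the back-of-envelope math above. The 160 ALUs at 550 MHz for the Wii U GPU are the commonly cited specs; the 2 FLOPs per ALU per clock assumes a fused multiply-add, which is the standard way peak GFLOPS get quoted.

```python
# Back-of-envelope check of the numbers in the post.
# Assumed specs: Wii U GPU = 160 VLIW5 ALUs at 550 MHz.

VLIW5_ALUS = 160
CLOCK_MHZ = 550

# Peak throughput: each ALU does a multiply-add (2 FLOPs) per clock.
gflops = VLIW5_ALUS * CLOCK_MHZ * 2 / 1000
print(gflops)  # 176.0 GFLOPS, matching the figure above

# Dropping the t-unit (1 ALU out of every 5-wide group) gives the
# rough VLIW4-equivalent core count.
vliw4_equivalent = VLIW5_ALUS * 4 // 5
print(vliw4_equivalent)  # 128
```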
As for Maxwell over GCN, I used ~28% better performance; in reality it can be as much as 40%, but like I said, this is a rough estimation on the safe side. I don't even add in the extra functionality, like just being able to do certain effects better and more efficiently than R700's 2008 VLIW5 engine was capable of.
To everyone being downright dense: yes, Zelda on a docked Switch could do 1080p 60fps. It is straight up 4 times as powerful, before feature enhancements, and that is enough to handle the game at the higher demand. The only issue is that the Wii U's architecture is very different, while PS4, XB1, and Switch roughly share the same structure. I.e., a bad Wii U port can still run badly on Switch, while a bad PS4 port should be handled much better, since PS4 and Switch are much more alike.
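A rough sketch of how the "~4x" estimate chains together, for anyone following along. The docked Switch figures (256 Maxwell cores at 768 MHz) are assumed specs, not from the post; the 1.28 factor is the ~28% Maxwell-over-GCN estimate from earlier, and the 160/128 factor is the VLIW5-to-VLIW4 discount.

```python
# Rough estimation chain for the "~4x Wii U" claim.
# Assumed specs (not from the post): docked Switch GPU = 256 Maxwell
# cores at 768 MHz, 2 FLOPs/core/clock.

WIIU_GFLOPS = 160 * 550 * 2 / 1000      # 176.0
SWITCH_GFLOPS = 256 * 768 * 2 / 1000    # ~393.2

raw_ratio = SWITCH_GFLOPS / WIIU_GFLOPS
print(round(raw_ratio, 2))  # 2.23 on raw peak FLOPS alone

# Discount the Wii U's VLIW5 peak to its VLIW4-equivalent (128/160),
# then credit Maxwell with the ~28% edge over GCN from the post.
effective_ratio = raw_ratio * (160 / 128) * 1.28
print(round(effective_ratio, 2))  # ~3.57 before the VLIW4-to-GCN gains
```

The remaining gap to ~4x is the VLIW4-to-GCN efficiency improvement the post describes, which it doesn't pin to a single number.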