It's more a question of bandwidth vs. latency. If you have to build your software around bandwidth, you have to do a lot of caching to make sure data flows properly. If you have to build it around latency, you can simply prioritize: there's no worrying about caching or fitting data into little pools of memory, because it's all accessible at the same time. Voila, developers can just get shit done.
HSA / hUMA essentially says, "hey, we want to base everything on priority rather than caches." It simplifies the entire process. The ideal computer-scientist architecture uses a single-address-space OS, avoids task switching, and is effectively functional, but that's beyond the scope of this discussion. A unified memory space merely moves us toward that eventuality.
Bandwidth is far more important than latency at this level.
No, bandwidth stops mattering past a point with modern CPUs, somewhere around 20-odd GB/s.
They flat-out cannot process data any faster than that.
So the claim that GDDR5 is an advantage over PCs doesn't hold.
Caching is by definition automatic; you cannot just decide to write programs that are insensitive to latency. You will always need a small, fast memory pool (unless stacking delivers dramatically better latency).
Sharing a memory space with a GPU does not change that.