Can someone briefly explain what this all means (hUMA tech)?
The GPU and CPU each have on-chip memory called a cache, which has much higher bandwidth and lower latency than DRAM.
A problem can arise because each device checks its own cache before going to DRAM: if the CPU or GPU has a cached value that is different from the value in the other device's cache, you lose coherency.
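To make that concrete, here is a rough toy model in Python. It is only a sketch: the `Cache` class, the `dram` dictionary and the address `0x10` are all made up for illustration, and real hardware is far more involved. It assumes a write-back cache with no coherency mechanism at all.

```python
# Toy model: two private caches in front of one shared DRAM.
# Every name here is hypothetical and exists only for illustration.

dram = {0x10: 1}              # address -> value in main memory


class Cache:
    def __init__(self):
        self.lines = {}       # address -> locally cached copy

    def read(self, addr):
        # Check the cache first; only go to DRAM on a miss.
        if addr not in self.lines:
            self.lines[addr] = dram[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Assumed write-back behaviour: the new value stays in this cache.
        self.lines[addr] = value


cpu, gpu = Cache(), Cache()

gpu.read(0x10)                # GPU caches the old value 1
cpu.write(0x10, 2)            # CPU updates only its own copy
stale = gpu.read(0x10)        # GPU hits its own cache and still sees 1

print(cpu.read(0x10), stale)  # prints "2 1"
```

The two devices now disagree about what is stored at the same address, which is exactly the loss of coherency described above.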
What hUMA is trying to do, from what I understand, is make sure that all values that require coherency have it, either by snooping the other device's cache (which seems to be the case for the CPU) or by selectively invalidating a cache line (which seems to be the case for the GPU).
Selectively invalidating a single cache line might not seem like a big deal, but it seems that in the past the GPU's entire L2 cache had to be flushed instead of a single line. That meant that if you only wanted to share a small amount of data coherently between the two, you would lose the rest of the L2, and pretty much every access would hit the DRAM again.
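A quick way to see why that matters is to count DRAM accesses in a toy model. Again this is just a sketch under assumptions: the `L2Cache` class, the 1000-line working set and the idea that only address 0 is shared are all invented for the example, and nothing here reflects actual GPU hardware numbers.

```python
# Toy model of a GPU L2 cache. Names and sizes are hypothetical.

class L2Cache:
    def __init__(self):
        self.lines = set()        # addresses currently cached
        self.dram_accesses = 0    # how many reads missed and went to DRAM

    def read(self, addr):
        if addr not in self.lines:
            self.dram_accesses += 1   # miss: fetch the line from DRAM
            self.lines.add(addr)

    def invalidate_line(self, addr):
        self.lines.discard(addr)      # drop just the one shared line

    def flush_all(self):
        self.lines.clear()            # drop the whole cache


def run(make_coherent):
    l2 = L2Cache()
    working_set = range(1000)
    for addr in working_set:          # warm the cache
        l2.read(addr)
    l2.dram_accesses = 0              # only count accesses after this point
    make_coherent(l2)                 # the CPU changed address 0
    for addr in working_set:          # GPU touches its working set again
        l2.read(addr)
    return l2.dram_accesses


misses_after_flush = run(lambda l2: l2.flush_all())
misses_after_line = run(lambda l2: l2.invalidate_line(0))

print(misses_after_flush, misses_after_line)  # prints "1000 1"
```

In this model, flushing the whole L2 sends every one of the 1000 reads back to DRAM, while invalidating the single shared line costs just one extra DRAM access.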