Support NeoGAF

Myshkin · Aug 24, 2013

ConteZero76 said:
The only thing I don't get is paging... GPU has to work with paging enabled, so shaders are context-binded to a thread ?

I don't see where hUMA itself stops you from passing to the GPU arbitrary pointers to both code and data. Where is the hardware support for memory access protections implemented?

jond76 · Aug 24, 2013

Whole thing was a marketing ploy to trick is into learning about hUMA.

Bravo, AMD!

seattle6418 · Aug 24, 2013

I miss Jeff Rigby. He was all over that Kabini Kaveri stuff when most of us didn't knew what it was and how it was related to PS4.

IN&OUT · Aug 24, 2013

PS4 has a unified memory that work inline with hUMA concept. it's not AMD problem that MS fucked up the design of X1 w/ esram and slow DDR3 memory.

Now MS is angry and started pulling strings? why not equip X1 with future proof specs to avoid all this in the first place instead of being cheap ass and charging premium for inferior tech!

MS is setting on a mountain of CASH, they could've easily created the most powerful console ever conceived ! but they think people are stupid, they don't understand specs and hardware. funny thing that we have Sony which is in the brink of bankruptcy investing in cutting edge and expensive RAM, equip PS4 with better GPU , and devoted a 5 year project to develop PS4 with an eye to industry future trends and needs to future proof the console. also above all that we find Sony charging less for PS4 despite the very superior tech found inside it compared to X1 !

MS just don't care, they said it themselves.

Joeki11a · Aug 24, 2013

Kabini, hUma,

Wtf

phosphor112 · Aug 24, 2013

ConteZero76 said:
This should explain what hUMA is.

The only thing I don't get is paging... GPU has to work with paging enabled, so shaders are context-binded to a thread ?

Hmm. Maybe neither PS4/X1 have hUMA then.

PS4 can snoop CPU cache, ~~but I don't think there is a similar bus to do that for the CPU to GPU cache.~~ See edit below.

I don't think there is any sort of that feature in the X1 at all. Not with what the documents say at least.

Major edit:

Look at this.

Look at the Onion/+ Bus.

It's bidirectional.

Onion = GPU > GPU cache > CPU Cache > Main RAM. Vice versa for other way around.
Onion+ = GPU > CPU Cache > Main RAM. Vice versa.

Once the pipeline hits either the CPU or GPU's cache, the system SHOULD be able to directly access the CPU/GPU without having to go all the way out to main RAM.

So, it SHOULD have coherency...right?

ElTorro · Aug 24, 2013

phosphor112 said:
So, it SHOULD have coherency...right?

Yes, since both the CPU/GPU probe the CPU's caches via Onion, if you choose to use that bus, CPU and GPU are fully cache-coherent. And thanks to the volatile tag, the GPU can bypass its own caches, meaning that they are not compromised by CPU/GPU interaction. This seems to be different on the XB1 where the GPU cache, from my understanding of the leaked documents, must be flushed.

idrinkalone · Aug 24, 2013

Does this mean I can't have halo at 1080p 60 fps? If not i don't care.

Thrakier · Aug 24, 2013

idrinkalone said:
Does this mean I can't have halo at 1080p 60 fps? If not i don't care.

You can ALWAYS have a halo at 1080p/60FPS, no matter the console.

SenjutsuSage · Aug 24, 2013

idrinkalone said:
Does this mean I can't have halo at 1080p 60 fps? If not i don't care.

Corrine's got us covered

thunder_snail · Aug 24, 2013

ElTorro said:
Yes, since both the CPU/GPU probe the CPU's caches via Onion, if you choose to use that bus, CPU and GPU are fully cache-coherent. And thanks to the volatile tag, the GPU can bypass its own caches, meaning that they are not compromised by CPU/GPU interaction. This seems to be different on the XB1 where the GPU cache, from my understanding of the leaked documents, must be flushed.

But doesn't this mean huma like computing is limited by the relatively narrow bandwidth of onion? (relatively narrow for gpu, not cpu) Do gpgpu computing typically require a lot of bandwidth?

ConteZero76 · Aug 24, 2013

thunder_snail said:
But doesn't this mean huma like computing is limited by the relatively narrow bandwidth of onion? (relatively narrow for gpu, not cpu) Do gpgpu computing typically require a lot of bandwidth?

IIRC Cerny said that you can explicitly flag as "coherent" or "not coherent" different memory zones (probably on a page level) so a clever paging don't engulf onion without a reason.

thunder_snail · Aug 24, 2013

W!CKED said:
CPU and GPU working in concert requires low latency more than anything. That's what Onion is for. And with its 20GB/s it even has 25% more bandwidth than the PCIe bus in a PC. For tasks that are not latency sensitive (rendering or GPGPU for eye candy) you can use the Garlic bus with maximum bandwidth.

And for clarification: Onion bus was introduced with Llano a couple of years ago. It's also called "Fusion Compute Link". Only because a system has this busses doesn't automatically mean that it has hUMA. HSA is super complocicated, isn't it? ^_^

Well, with huma, I'd imagine the GPU utilizing its many cores and high bandwidth to work on huge data structures while the CPU takes over infrequent branch conditions (since gpus suck at that as far as I know). So I think the GPU would still benefit a lot from high bandwidth.

But since this is a gaming system and the GPU will probably spend most of its time on graphics, hopefully, onion may be enough. 20 GB/s is an overkill for jaguar cpus anyways.

I'll just have to wait and buy an amd kaveri laptop to play around with this tech. Gonna be very educating.

Manmademan · Aug 24, 2013

IN&OUT said:
PS4 has a unified memory that work inline with hUMA concept. it's not AMD problem that MS fucked up the design of X1 w/ esram and slow DDR3 memory.

Now MS is angry and started pulling strings? why not equip X1 with future proof specs to avoid all this in the first place instead of being cheap ass and charging premium for inferior tech!

MS is setting on a mountain of CASH, they could've easily created the most powerful console ever conceived ! but they think people are stupid, they don't understand specs and hardware. funny thing that we have Sony which is in the brink of bankruptcy investing in cutting edge and expensive RAM, equip PS4 with better GPU , and devoted a 5 year project to develop PS4 with an eye to industry future trends and needs to future proof the console. also above all that we find Sony charging less for PS4 despite the very superior tech found inside it compared to X1 !

MS just don't care, they said it themselves.

This is late, but it's worth addressing I think. it's erroneous to say MS doesn't care and prefers to sit on a mountain of cash while giving gamers shit hardware and telling them to go F themselves if they don't like it.

MS is publically traded and answerable to shareholders. they already lost their shirts establishing xbox two Gens back. a second money losing console of that nature isn't happening. Xbone needs to have a clear path to profitability.

second, the xbox division doesn't really make it all that much money for MS. Other divisions are much more important, and at the moment they're struggling with windows 8, are fending off google docs as an office competitor, losing badly with windows phone and getting completely, utterly and totally owned with surface. they can't afford to throw money at the xbox just to kill Sony, there's too much else at stake for them.

amstradcpc · Aug 24, 2013

W!CKED said:
CPU and GPU working in concert requires low latency more than anything. That's what Onion is for. And with its 20GB/s it even has 25% more bandwidth than the PCIe bus in a PC. For tasks that are not latency sensitive (rendering or GPGPU for eye candy) you can use the Garlic bus with maximum bandwidth.

And for clarification: Onion bus was introduced with Llano a couple of years ago. It's also called "Fusion Compute Link". Only because a system has this busses doesn't automatically mean that it has hUMA. HSA is super complocicated, isn't it? ^_^

Yeah,until someone says if really there is a real unified memory address instead of a virtual one we will not know if its hUMA or not.I bet is still virtual like Llano or Cerny would have said the contrary.

thunder_snail · Aug 24, 2013

Are you sure kaveri won't have an option for gddr5? I'm sure I've read somewhere they are gonna release an apu based on ps4's design. So they already have the memory controller worked out. Gddr5 dimms may not exist yet but a lot of ultrabooks these days have memory chips on the motherboard itself. Also if they go for an apu with say, 6 cu's, may be ddr3 will be enough.

Panajev2001a · Aug 24, 2013

W!CKED said:
CPU and GPU working in concert requires low latency more than anything. That's what Onion is for. And with its 20GB/s it even has 25% more bandwidth than the PCIe bus in a PC. For tasks that are not latency sensitive (rendering or GPGPU for eye candy) you can use the Garlic bus with maximum bandwidth.

And for clarification: Onion bus was introduced with Llano a couple of years ago. It's also called "Fusion Compute Link". Only because a system has this busses doesn't automatically mean that it has hUMA. HSA is super complocicated, isn't it? ^_^

In a console environment there often are additional paths which in a way break nice and easy abstractions but provide additional performance headroom. This could be a way to explain both the use of onion, onion+, and garlic as you were also explaining in your post.

From leaks and interviews, it is apparent that, according to the memory region you store data in, you will access data through garlic or onion and if you want to skip GPU data caches you can access data through onion+.

It is possible to think that for hUMA to work, for addresses to be shared between CPU and GPU, AMD needed both a new GPU core compared to their previous APU's as well as other software customizations including OS support which would take much less time to enable in a custom BSD based OS than on Windows and OS X.

Also, it is conceivable that hUMA works only when CPU and GPU access data mapped on the onion memory region and not when accessed over onion+ or garlic. It would make sense and it would not be too hard to manage either IMHO for developers accustomed to crazier setups

.
According to Cerny's Digital Foundry/EG interview and what he was saying about the HSA software stack, it is possible that it might be a feature that will get in the SDK post launch.

ElTorro · Aug 24, 2013

thunder_snail said:
But doesn't this mean huma like computing is limited by the relatively narrow bandwidth of onion? (relatively narrow for gpu, not cpu) Do gpgpu computing typically require a lot of bandwidth?

The GPU uses 4 memory controllers, each with two 32bit wide channels. [1] Jaguar most likely, like Kabini [2], uses a single 64bit wide memory controller. That must limit the bandwidth on Onion.

[1] http://www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf
[2] http://www.anandtech.com/show/6976/...wering-xbox-one-playstation-4-kabini-temash/4

/edit: Quote from the second article

The major change between AMD’s Temash/Kabini Jaguar implementations as what’s done in the consoles is really all of the unified memory addressing work and any coherency that’s supported on the platforms. Memory buses are obviously very different as well, but the CPU cores themselves are pretty much identical to what we’ve outlined here.

So much for the "based on Kabini implies no hUMA" argument.

Panajev2001a · Aug 24, 2013

amstradcpc said:
Yeah,until someone says if really there is a real unified memory address instead of a virtual one we will not know if its hUMA or not.I bet is still virtual like Llano or Cerny would have said the contrary.

It would still be virtual, CPU an GPU would share addresses and the same virtual to physical memory pages mapping. GCN can already use virtual memory and access data outside its directly accessible physical memory through paging, what seems to be required for hUMA is an extension (share the same virtual address space and pointers to it as the CPU) of that not a complete revolution I think.

sleeping_dragon · Aug 24, 2013

This thread has no huma.

Hoo-doo · Aug 24, 2013

red731 said:
Good. GOOD.

Also here is Yung Humma!

came for this. Not disappointed.

ElTorro · Aug 24, 2013

By the way, the upcoming Kaveri (which officially supports hUMA) seems to be AMD's first APU to use GDDR5. [1] It's not far-fetched to conclude that the PS4s memory architecture is based on that.

[1] http://gamingio.com/2013/04/amd-to-...mory-architecture-for-cpuapus-as-cost-lowers/

Panajev2001a · Aug 24, 2013

ElTorro said:
By the way, the upcoming Kaveri (which officially supports hUMA) seems to be AMD's first APU to use GDDR5. [1] It's not far-fetched to conclude that the PS4s memory architecture is based on that.

[1] http://gamingio.com/2013/04/amd-to-...mory-architecture-for-cpuapus-as-cost-lowers/

Good point.

Krakn3Dfx · Aug 25, 2013

ElTorro said:
By the way, the upcoming Kaveri (which officially supports hUMA) seems to be AMD's first APU to use GDDR5. [1] It's not far-fetched to conclude that the PS4s memory architecture is based on that.

[1] http://gamingio.com/2013/04/amd-to-...mory-architecture-for-cpuapus-as-cost-lowers/

Kind of puts everything in place when it comes to what AMD said and then had to get from under at Gamescom.

Bundy · Aug 25, 2013

ElTorro said:
By the way, the upcoming Kaveri (which officially supports hUMA) seems to be AMD's first APU to use GDDR5. [1] It's not far-fetched to conclude that the PS4s memory architecture is based on that.

[1] http://gamingio.com/2013/04/amd-to-...mory-architecture-for-cpuapus-as-cost-lowers/

Very interesting!
Thank you for the link.

dragonelite · Aug 26, 2013

W!CKED said:
Some new information on HSA from Hotchips 25:

Full slide at Planet3DNow

Isn't the X1 apu presentation today?

ekim · Aug 26, 2013

dragonelite said:
Isn't the X1 apu presentation today?

Yes it is. 9:30AM PDT afaik.

googleplex · Aug 26, 2013

... And the performance gap gets bigger.

ekim · Aug 26, 2013

googleplex said:
... And the performance gap gets bigger.

Want to elaborate?

dragonelite · Aug 26, 2013

ekim said:
Yes it is. 9:30AM PDT afaik.

So i can expect some slides later tonight or so here in euroland

robertsan21 · Aug 26, 2013

this is one confusing thread

Is hUMA used for PS4? or are the jury still out on that?

Will Xbox one have it?

Subjacked · Aug 26, 2013

ekim said:
Yes it is. 9:30AM PDT afaik.

What event is it at?

ekim · Aug 26, 2013

Subjacked said:
What event is it at?

Hot Chips.

www.hotchips.org

Finalizer · Aug 26, 2013

robertsan21 said:
this is one confusing thread

Is hUMA used for PS4? or are the jury still out on that?

Will Xbox one have it?

We're probably not gonna get a definitive answer one way or another until these systems launch and the NDAs lift. It seems likely that PS4 supports hUMA in some form at least, since it looks like it's got the parts in place to support it. Jury's out on Xbone, though personally I wouldn't be surprised if it had some sort of solution of its own... But that's a tale straight from my ass, so don't put any stock into that.

Curious about this Xbone APU presentation. Wonder if we'll get any interesting insights out of that.

benny_a · Aug 26, 2013

robertsan21 said:
this is one confusing thread

Is hUMA used for PS4? or are the jury still out on that?

Seems likely based on the modification that Cerny talked about back in April.

robertsan21 said:
Will Xbox one have it?

Doesn't seem like it. The OP gives a good rundown.

Subjacked · Aug 26, 2013

ekim said:
Hot Chips.

www.hotchips.org

Thanks, curious what we will get from this. Hopefully something positively surprising.

ekim · Aug 26, 2013

I checked this table :
http://en.wikipedia.org/wiki/Heterogenous_System_Architecture#AMD_HSA_Implementation

and if I'm not misunderstanding something most of the listed 2013/Kaveri/2014 HSA features are indeed in Xbox One's APU. (Can be validated by a mod if wanted)
- passing pointers between CPU/GPU
- GPU uses pageable system memory via CPU pointers (well that's basically an implication from the above point)
- context switch
- pre-emption (which is basically context switching)

It really seems that only the eSRAM prevents the box from being "hUMA" by AMD's definition:
- Fully coherent memory between CPU & GPU

But MS might have an own solution for this:
from B3D (http://forum.beyond3d.com/showpost.php?p=1777116&postcount=5697)

Nick Baker (Engineer Console Architecture)
Source: http://www.youtube.com/watch?v=vg_DR0leAYw
21:50

"We had to invest a lot in coherency through the chips. There's been I/O coherency for awhile, but we really wanted to get the software out of the mode of managing caches and you know, put in hardware coherency for the first time on a mass scale in the living room on the GPU."

I guess that's what they will talk about in the hot chips session.

joshcryer · Aug 26, 2013

ekim said:
But MS might have an own solution for this:
from B3D (http://forum.beyond3d.com/showpost.php?p=1777116&postcount=5697)

I guess that's what they will talk about in the hot chips session.

He goes on immediately after to mention nested page tables, which is what this is probably about. It sounds like, to me, that they have solution where you can pass a pointer from GPU/CPU at an API level using a page walker, but that's going to come with some overhead, and if the cache needs to be flushed every time, it's going to cost a lot. They may have a cache level solution that keeps it from being flushed.

The entire OS seems to sit in its own VM, which is interesting, and which probably means that a huge chunk of memory is going to be dedicated to the OS.

cebri.one · Aug 26, 2013

Just a reminder...

http://www.vgleaks.com/durango-memory-system-overview/

There are two types of coherency in the Durango memory system:

Fully hardware coherent
I/O coherent
The two CPU modules are fully coherent. The term fully coherent means that the CPUs do not need to explicitly flush in order for the latest copy of modified data to be available (except when using Write Combined access).

The rest of the Durango infrastructure (the GPU and I/O devices such as, Audio and the Kinect Sensor) is I/O coherent. The term I/O coherent means that those clients can access data in the CPU caches, but that their own caches cannot be probed.

When the CPU produces data, other system clients can choose to consume that data without any extra synchronization work from the CPU.

The total coherent bandwidth through the north bridge is limited to about 30 GB/s.

The CPU requests do not probe any other non-CPU clients, even if the clients have caches. (For example, the GPU has its own cache hierarchy, but the GPU is not probed by the CPU requests.) Therefore, I/O coherent clients must explicitly flush modified data for any latest-modified copy to become visible to the CPUs and to the other I/O coherent clients.

The GPU can perform both coherent and non-coherent memory access. Coherent read-bandwidth of the GPU is limited to 30 GB/s when there is a cache miss, and its limited to 10 15 GB/s when there is a hit. A GPU memory page attribute determines the coherency of memory access.

ekim · Aug 26, 2013

http://cn.linkedin.com/in/vinberlei

APU/dGPU?

ElTorro · Aug 26, 2013

ekim said:
APU/dGPU?

Dedicated GPU (in contrast to iGPU: integrated GPU).

Hexa · Aug 26, 2013

ekim said:
http://cn.linkedin.com/in/vinberlei

APU/dGPU?

Wait what? I thought by definition an APU can't have a dGPU.

ekim · Aug 26, 2013

ElTorro said:
Dedicated GPU (in contrast to iGPU: integrated GPU).

That's why I was wondering - but I guess the person in question just did verification tests on APUs/dGPUs and PS4/X1 APUs. I first understood as if these consoles have an APU + dGPU. That was pretty much unbelievable.

Hexa said:
Wait what? I thought by definition an APU can't have a dGPU.

Afaik, Richland APU's iGPUs can be used for Crossfire with a dGPU.
edit:nevermind - misreading.

ElTorro · Aug 26, 2013

Hexa said:
Wait what? I thought by definition an APU can't have a dGPU.

That guy worked with certainty on more than just one project.

KidBeta · Aug 26, 2013

ekim said:
I checked this table :
http://en.wikipedia.org/wiki/Heterogenous_System_Architecture#AMD_HSA_Implementation

and if I'm not misunderstanding something most of the listed 2013/Kaveri/2014 HSA features are indeed in Xbox One's APU. (Can be validated by a mod if wanted)
- passing pointers between CPU/GPU
- GPU uses pageable system memory via CPU pointers (well that's basically an implication from the above point)
- context switch
- pre-emption (which is basically context switching)

It really seems that only the eSRAM prevents the box from being "hUMA" by AMD's definition:
- Fully coherent memory between CPU & GPU

But MS might have an own solution for this:
from B3D (http://forum.beyond3d.com/showpost.php?p=1777116&postcount=5697)

I guess that's what they will talk about in the hot chips session.

Could you provide your evidence for the context switch / pre-emption I have yet to read or even hear anything that suggests the XBONE has it.

For the first two points they are a standard feature of GCN, so it would be surprising if the XBONE didn't have them.

ElTorro · Aug 27, 2013

W!CKED said:
Also interesting: Sound processors in Xbox One has a lot of grunt. Cerny said he wants to use GPGPU for sound. Can't wait to see which solution is better.

I don't think that compares. Cerny said that the quite specific use case of raytracing for audio could be done by GPGPU. I don't see how the audio processor in the XB1 could do raytracing since this task depends on the representation of the scene's geometry and, thus, needs fast access to main memory.

In addition, the XB1's audio processor explicitly has "pathways" to integrate calculations performed on the CPU indicating that it won't work well as a general purpose processor. I guess it's there to perform programmable tasks on audio streams. In this respect, we don't really know what the PS4's audio chip can do since all we have are Cerny's two sentences on that issue.

Brad Grenz · Aug 27, 2013

It should be noted that the audio processing that Cerny has talked about doing on the GPU is not something you can do on the Xbox One's audio processor. If an Xbox One wanted to do the same audio ray casting it would have to use the GPU, too.

EDIT: Beaten

ElTorro · Aug 27, 2013

W!CKED said:
What is your take on hUMA for Xbox One, guys?

The only difference I can spot is the lack of finegrained GPU cache control in the XB1 and ESRAM/DME. But that was all known before the HotChips presentation from the leaked documents so I don't think that we have gained that much more information.

benny_a · Aug 27, 2013

W!CKED said:
Two compute command processors (ACEs) and most likely two compute queues for Xbox One.

Could it be that they just don't list all of them so they actually have more and the 2x2 they display are just stand-ins?

ElTorro · Aug 27, 2013

benny_a said:
Could it be that they just don't list all of them so they actually have more and the 2x2 they display are just stand-ins?

If I remember correctly, a 2x2 setup is the standard one in GCN while the added ACEs in the PS4 are among the explicit modifications.

Support NeoGAF

AMD: PlayStation 4 supports hUMA, Xbox One does not

Member

Banned

Member

Banned

Banned

Banned

I wanted to dominate the living room. Then I took an ESRAM in the knee.

Banned

Member

Banned

Member

Member

Member

Member

Member

Member

GAF's Pleasant Genius

I wanted to dominate the living room. Then I took an ESRAM in the knee.

GAF's Pleasant Genius

Banned

Banned

I wanted to dominate the living room. Then I took an ESRAM in the knee.

GAF's Pleasant Genius

Member

Banned

Member

Member

Member

Member

Member

Member

Neo Member

Member

Member

extra source of jiggaflops

Neo Member

Member

it's ok, you're all right now

Member

Member

I wanted to dominate the living room. Then I took an ESRAM in the knee.

Member

Member

I wanted to dominate the living room. Then I took an ESRAM in the knee.

Junior Member

I wanted to dominate the living room. Then I took an ESRAM in the knee.

Member

I wanted to dominate the living room. Then I took an ESRAM in the knee.

extra source of jiggaflops

I wanted to dominate the living room. Then I took an ESRAM in the knee.

Similar threads