
PS4's memory subsystem has separate buses for CPU (20GB/s) and GPU (176GB/s)

DBT85

Member
The few snippets I've seen in here so far are interesting so I'll check the whole article.

But the thread delivered already.

Really nice article, tons of details on PS4 architecture and SDK tools. Sony seems to be in a very good position with PS4 deployment.


Article also confirmed two things - 2 CPU cores are dedicated to the PS4 OS, and the PSN download speed limit is [was] 12Mbps.

Regarding the download speed, I'm certain I've downloaded at my Internet max of 40Mbps from PSN before. This test was on an "up to 20Mbps" line and Steam only got 16Mbps. I find it hard to believe they only had that line available to test on, but whatever.
 

i-Lo

Member
I went ahead and made a "clearer" version of the diagram posted on the first page.

Hope it helps.


Stop taking away our flops yo.

EDIT: Did you know that PS2 had about 6GFlops of computational power? Every Gflop counts, least of all dat xtra 23.
 
Also, as I understand it, the purpose of Onion+ is to pass things like that constant data directly to the GPU from the CPU rather than having an intermediary memory layer in between them (GPU cache). No need for it to reside in GPU cache when the CPU cache is actually holding onto it.

I also assume the unified memory allows this to happen on a grander scale directly from the main memory when either needs to access the other's large chunk of data.
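To make that concrete, here's a minimal sketch in plain C++ of the synchronization an Onion+-style, cache-bypassing path is meant to remove. The struct and the commented helper names are made up for illustration; none of this is the actual PS4 SDK.

```cpp
// Hypothetical illustration only -- flush_cpu_caches/invalidate_gpu_caches
// are made-up names, not real PS4 SDK calls.
struct Constants { float time; float lightDir[3]; };

void update_gpu_constants(Constants* shared, const Constants& fresh) {
    // The CPU writes the small per-frame data; it lands in the CPU's caches.
    *shared = fresh;

    // Without a coherent, cache-bypassing path, the GPU could still read a
    // stale copy out of its own L1/L2, so you'd need something like:
    //   flush_cpu_caches(shared, sizeof(Constants));
    //   invalidate_gpu_caches(shared, sizeof(Constants));
    //
    // With an Onion+-style access, the GPU read skips its L1/L2 and sees the
    // coherent data directly, so that explicit synchronization goes away for
    // small CPU<->GPU traffic.
}
```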
 
" "The PS4's GPU is very programmable. There's a lot of power in there that we're just not using yet. So what we want to do are some PS4-specific things for our rendering but within reason - it's a cross-platform game so we can't do too much that's PS4-specific," he reveals.

"There are two things we want to look into: asynchronous compute where we can actually run compute jobs in parallel... We [also] have low-level access to the fragment-processing hardware which allows us to do some quite interesting things with anti-aliasing and a few other effects."

Fucking xbone holding back the ps4.

#unleashPS4 campaign, go!
 

Locuza

Member
It's one of the major modifications by the hardware team:
The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:
"First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"
I disagree with Cerny. I would call the PS4 AMD Station A and the Xbox One AMD Station B with some custom stuff.
The FCL enhancements, the unified memory access, the cache bypass, the eight ACEs and so on are more or less simply AMD's IP, without major changes from Sony.

I can confidently say that there are multiple buses named after vegetables. Whether this is good or bad depends on how much you like onions/garlic/chives.
Actually, AMD now calls them the FCL (Fusion Compute Link) and the RMB (Radeon Memory Bus).
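For context on the PCIe comparison in the Cerny quote above, a rough back-of-envelope with peak per-direction figures (the 20 GB/s value is Cerny's; real-world throughput on all of these is lower):

```cpp
// Rough peak figures, per direction, for the buses being compared.
constexpr double kOnionPlusPeak = 20.0;   // GB/s, PS4 coherent CPU<->GPU bus (per Cerny)
constexpr double kPcie2x16      = 8.0;    // GB/s, PCIe 2.0 x16
constexpr double kPcie3x16      = 15.75;  // GB/s, PCIe 3.0 x16
static_assert(kOnionPlusPeak > kPcie3x16, "the coherent bus beats a PCIe 3.0 x16 link");
```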


Asynchronous compute is one of the areas where the PS4 has a very significant advantage over the Xbone. The PS4 has extra CUs (18 vs. 12 for the Xbone) that can be applied to asynchronous compute. The PS4 also has custom-modified compute queues: 64 vs. the standard 2 on AMD GCN parts.

It's great that PS4 ports are already looking at taking advantage of asynchronous compute this early in the lifecycle.
Maybe there was some influence from Sony, but the ACEs are not different from the ones AMD is using in their products nowadays.
Kabini has 4 ACEs with 32 queues for 2 CUs ;)
Bonaire/7790 also has 4 ACEs with 32 queues (the chart with 2 classic ACEs must be wrong)
I guess the Xbox One also has 4 ACEs with 32 queues.
And Kaveri will, like the PS4, have 8 ACEs with 64 queues for 8 CUs.

The real custom stuff about the GPU is the second graphics command processor with high priority for the OS.
 
The thing that puzzles me is the developers' mention of the need to allocate data correctly. When I first read about AMD's HSA, I assumed that everything would be unified and could be customized to the needs of the developers as required. I thought there wouldn't be any need at all for data separation, as to me and my limited knowledge it seemed that CPU and GPU data would be fed through a common bus and allocated dynamically as needed. Now I'm confused :)

You can still do exactly that. HSA tries to optimize through generalization. One thing to keep in mind is that this design IS NOT the final HSA stage. The final HSA goal is to incorporate CPU and GPU into one... giant mixture of things so they can "know" what each other (CPU/GPU) are doing. These buses allow devs to do exactly that WITHOUT needing that final stage.

So, you have Garlic, which is just GPU to RAM. Perfect for graphics.
Then, more importantly, you have Onion and Onion+, which allow this "HSA" access to shine. You have a CPU process that needs to jump into compute or vice versa. Onion/+ allows the GPU to see what's in the CPU cache... and easily grab the data for use. No need to jump to main RAM when it's already cached by the CPU, ready to go to the GPU.
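A rough mental model in code, if it helps: you pick the bus per allocation based on who touches the data. The enum values and alloc_shared() below are hypothetical stand-ins, not real SDK names.

```cpp
#include <cstddef>

// Hypothetical sketch of "pick your bus per allocation".
// Neither the enum nor alloc_shared() are real PS4 SDK names.
enum class MemMapping {
    GarlicWriteCombined,  // GPU-centric: full-speed GPU access, uncached for the CPU
    OnionCoherent         // CPU-centric: snooped by the CPU caches, modest bandwidth
};

void* alloc_shared(std::size_t bytes, MemMapping mapping);  // assumed allocator

void setup_frame_resources() {
    // Big, GPU-consumed data (textures, vertex buffers): map through Garlic
    // so the GPU can stream it at full memory bandwidth without disturbing
    // the CPU's caches.
    void* textures = alloc_shared(std::size_t{256} << 20, MemMapping::GarlicWriteCombined);

    // Small data the CPU keeps touching (per-frame constants, compute job
    // inputs/outputs): map through Onion so the CPU caches stay coherent
    // with what the GPU reads and writes.
    void* jobData = alloc_shared(std::size_t{4} << 20, MemMapping::OnionCoherent);

    (void)textures; (void)jobData;
}
```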

Stop taking away our flops yo.

EDIT: Did you know that PS2 had about 6GFlops of computational power? Every Gflop counts, least of all dat xtra 23.

I rubbed butter and syrup all over dos GFLOPS and stuffed it in my belly.
 
PS4 games will give you bad breath, and be delicious. Confirmed.

PS4: Chock full of Folic Acid

So if the unified memory doesn't negate the need to separate data in the way you described, what is its main benefit?

One massive advantage of unified memory is that the developers get to dictate how the RAM is used. With the PS3 the memory was split at a fixed 50-50 ratio. With a unified system, developers can tailor their allocations to their needs.
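A toy illustration of that flexibility, with made-up numbers (and remember the usable pool is smaller than the full 8 GB, since the OS reserves part of it):

```cpp
#include <cstddef>

// On a unified pool the CPU/GPU split is just a number each team picks,
// not a hardware boundary like the PS3's 256 MB + 256 MB arrangement.
struct MemoryBudget {
    std::size_t gpuAssets;  // textures, render targets, geometry
    std::size_t cpuData;    // game logic, audio, streaming buffers
};

// A geometry-heavy racer and a simulation-heavy strategy game can carve
// very different ratios out of the same physical memory:
constexpr MemoryBudget racer    { std::size_t{3500} << 20, std::size_t{1000} << 20 };
constexpr MemoryBudget strategy { std::size_t{2000} << 20, std::size_t{2500} << 20 };
```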
 

mintylurb

Member
Not in the slightest... and sadly there are more posters like him - the only difference being they don't make their bs as obvious and sometimes hide it in walls of text...

Hahah. I for one can't wait for those posters to start posting in this thread. This is going to be amusing.

So is this good or bad, I am untechnical as hell. How does this impair/improve performance?

It's good. Total 196GB bw. Though that bw can't compete against x1's infinite power of the cloud.
 

Sweep14

Member
Something told me we were not going to go into next gen without hearing about the unlocked power these machines have. So all we need now is a hard number for the percentage of the machines' potential that the first-gen games are going to be using: 50, 70, 90 or 110%.

IIRC, on Guerrilla's slide about KZ SF, they said their code was only using one (1) CU...
 

benny_a

extra source of jiggaflops
It's good. Total 196GB bw. Though that bw can't compete against x1's infinite power of the cloud.
Go away with this adding of various bandwidths. There is no reason to get less precise.

It will just be used in fanboy wars about "See, the Sony people are advertising bullshit memory bandwidths too! LOL hypocritical!"

So is this good or bad, I am untechnical as hell. How does this impair/improve performance?
I believe, as someone that is untechnical, all you need to know is that John Carmack thinks that Sony "made wise engineering choices."
 

mintylurb

Member
Go away with this adding of various bandwidths. There is no reason to get less precise.

It will just be used in fanboy wars about "See, the Sony people are advertising bullshit memory bandwidths too! LOL hypocritical!"
You're taking this way too seriously, benny.
 
there's nothing wrong with them having these separate buses.

you use the bus you need for the tasks at hand.

obviously the gpu is going to want to use the fat radeon bus for loading textures and all that good stuff.

some of the cpu stuff doesn't need a giant bus like the one used f/ the gpu. it would be pointless to have for the cpu alone.

this really doesn't change a thing for the ps4. this is NOT a bad thing
 

benny_a

extra source of jiggaflops
You're taking this way too seriously, benny.
I read a lot of these threads and I can already see the future derails provided by some well-known posters who will allude to posts like the ones you make.

I'm just anticipating being annoyed in the future, but I guess that is being too serious, so you're correct.
 
That's what he's saying. I doubt he's talking about the PC.

He isn't, actually. It wouldn't matter if the power of the Xbone were much higher than the PS4's. They still wouldn't be able to do much stuff specific to its architecture because they have to keep the game multiplatform.

I've always thought the comment about developing for the lowest common denominator to be a bit misguided. Even if both consoles had the same power, multiplatform games would still perform worse than exclusives because they can't take advantage of specific hardware features of either one. Which is why I've never been too fond of having more than 1 top-of-the-line console. And before you say competition is always better for the consumer, realize that in many cases, especially involving economies of scale, it isn't.
 

Ryoku

Member
It's good. Total 196GB bw. Though that bw can't compete against x1's infinite power of the cloud.

Correct me if I'm wrong, but this doesn't mean that the total BW is 196GB/s.
The memory itself has a maximum bandwidth of 176GB/s.
The GPU has access to the full available bandwidth.
The CPU has access to the same pool of memory, but through a slower bus with a maximum of 20GB/s.

Right or wrong?
 

RoboPlato

I'd be in the dick
Correct me if I'm wrong, but this doesn't mean that the total BW is 196GB/s.
The memory itself has a maximum bandwidth of 176GB/s.
The GPU has access to the full available bandwidth.
The CPU has access to the same pool of memory, but through a slower bus with a maximum of 20GB/s.

Right or wrong?

You are correct. It's also worth noting that GCN GPUs are rated to handle only 153GB/s, so the division between the CPU and GPU is pretty much perfect for the bandwidth.
 

WolvenOne

Member
It is really just about whether an individual address in main memory should be mapped to CPU-L1/L2 cache (Onion) or not (Garlic). CPU-L1/L2 is (a) of limited size and (b) highly relevant to the CPU but at the same time irrelevant to the GPU. Hence, you issue access commands to CPU-relevant data through Onion, and access to CPU-irrelevant data through Garlic. As a result the GPU does not bully the CPU.

Ah, so, fewer conflicts over resources, gotcha.
 

kvn

Member
What? Why? It's about games tech, why shouldn't we be allowed to discuss it? It's a great opportunity for all of us to gain some knowledge on how the internals of a console work.

I don't want to prohibit discussion, but - no offense - your thread's title and content, due to your presumably lacking knowledge of the topic, often lead / will lead (and/or have already led) to clueless people posting nonsense.

Fortunately there are competent posters who already clarified the matter, so no hard feelings.
 
Boo. I understand the reasoning, but it still doesn't sit well with me when people don't go all out and take advantage of available resources when it comes to anything ><

They will take advantage of both systems, just not at launch. Maybe in the next 2 years you will start seeing the systems' advantages being played out.
 

Perkel

Banned
Really nice article, tons of details on PS4 architecture and SDK tools. Sony seems to be in a very good position with PS4 deployment.


Article also confirmed two things - 2 CPU cores are dedicated to the PS4 OS, and the PSN download speed limit is [was] 12Mbps.

Nope it did not.

edit: It is true! I stand corrected
 

i-Lo

Member
You are correct. It's also worth noting that GCN GPUs are rated to handle only 153GB/s, so the division between the CPU and GPU is pretty much perfect for the bandwidth.

Yup, both the retail 7850 & 7870 operate with a bandwidth of 153.6GB/s. The GPU in the PS4 would at the least have access to an extra 2.4GB/s at any given time. Plus, the CPU won't be accessing data at that rate 100% of the time anyway.
 

Ryoku

Member
You are correct. It's also worth noting that GCN GPUs are rated to handle only 153GB/s, so the division between the CPU and GPU is pretty much perfect for the bandwidth.

Yeah, okay. I questioned myself about it since a lot of people thought the bandwidth to the GPU wouldn't be compromised.

Theoretically speaking, if the CPU utilizes the maximum bandwidth of 20GB/s, the GPU has access to a maximum of 156GB/s.
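Spelling that arithmetic out as a worst-case sketch (in practice the CPU won't sit at its 20GB/s peak):

```cpp
// Back-of-envelope sharing of the single unified GDDR5 pool. There is one
// physical pool at 176 GB/s; the 20 GB/s Onion path and the 176 GB/s Garlic
// path do not add up to 196 GB/s of real bandwidth.
constexpr double kTotalBandwidth = 176.0;  // GB/s (256-bit GDDR5 at 5.5 Gbps)
constexpr double kCpuBusPeak     = 20.0;   // GB/s, coherent CPU path

// If the CPU actually saturated its bus, the GPU would be left with:
constexpr double kGpuWorstCase = kTotalBandwidth - kCpuBusPeak;  // 156 GB/s

// Even that worst case is slightly above the 153.6 GB/s a retail 7850/7870
// has to itself.
static_assert(kGpuWorstCase > 153.6, "still above a stock 7850/7870");
```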
 

MORT1S

Member
Nope it did not.

Their game logic is created for 2 cores, while the rest of the cores do some other stuff. There is no confirmation in the text of what you say.

It did. There is a diagram captioned:

Owing to confidentiality agreements, Reflections couldn't go into too much depth on the relationship of the Onion and Garlic buses with the rest of the PS4's processor, but we suspect that ExtremeTech's block diagram is pretty close to the mark. Note that the PS4 has two Jaguar CPU clusters for eight cores in total, two of which are reserved by the operating system.

Maybe you missed it?
 

rothbart

Member
The CPU has to go through the GPU to access RAM. So it's actually 176 minus 20 = 156, I believe.

And all of this is routed through the 6x Blu-ray interface cutting it down to a measly 216MB/sec. That, in turn, is farmed out to the cloud at a rate of 1.5MB/sec, calculating in the average latency and work-time in the cloud, and it all effectively forms a minute black hole tearing apart the space-time continuum for each and every operation.

Sony am doomed.
 

quest

Not Banned from OT
I can understand the Xbone needing two cores for their 2-and-a-half-OS setup, but I'm perplexed as to why the PS4 would need two as well.

I am assuming it's for background stuff, multitasking and social sharing. It does not matter anyway; multiplatform games will be built around 6 cores anyways. Also, Sony first party will be using compute, so the extra core is not going to be that big of a deal. The extra ACE units easily make up for this.
 

FranXico

Member
I finally understood where that 196 GB/s value for the theoretical bandwidth peak comes from.

Thank you all for this informative thread! :)
 