
VGLeaks Rumor: Durango Memory System Overview & Example

Takuya

Banned
http://www.vgleaks.com/durango-memory-system-overview/
&
http://www.vgleaks.com/durango-memory-system-example/

durango_memory.jpg


Memory

As you can see on the right side of the diagram, the Durango console has:

8 GB of DRAM.
32 MB of ESRAM.

DRAM

The maximum combined read and write bandwidth to DRAM is 68 GB/s (gigabytes per second). In other words, the sum of read and write bandwidth to DRAM cannot exceed 68 GB/s. You can realistically expect that about 80 – 85% of that bandwidth will be achievable (54.4 GB/s – 57.8 GB/s).
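The 54.4 GB/s to 57.8 GB/s range is just the 68 GB/s peak scaled by the 80% and 85% efficiency figures. A minimal sketch of that arithmetic (the function and variable names are made up for illustration, not taken from the article):

```python
# Peak combined DRAM read+write bandwidth quoted in the article, in GB/s.
DRAM_PEAK_GBPS = 68.0


def achievable_range(peak_gbps, low_eff=0.80, high_eff=0.85):
    """Scale a peak bandwidth by a low and a high efficiency factor."""
    return peak_gbps * low_eff, peak_gbps * high_eff


low_gbps, high_gbps = achievable_range(DRAM_PEAK_GBPS)
print(f"{low_gbps:.1f} GB/s to {high_gbps:.1f} GB/s")
```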

DRAM bandwidth is shared between the following components:

- CPU
- GPU
- Display scan out
- Move engines
- Audio system
- ESRAM

ESRAM

The maximum combined ESRAM read and write bandwidth is 102 GB/s. Having high bandwidth and lower latency makes ESRAM a really valuable memory resource for the GPU.

ESRAM bandwidth is shared between the following components:

- GPU
- Move engines
- Video encode/decode engine

System coherency

There are two types of coherency in the Durango memory system:

Fully hardware coherent

I/O coherent

[...]

The CPU

The Durango console has two CPU modules, and each module has its own 2 MB L2 cache. Each module has four cores, and each of the four cores in each module also has its own 32 KB L1 cache.

When a local L2 miss occurs, the Durango console probes the adjacent L2 cache via the north bridge. Since there is no fast path between the two L2 caches, to avoid cache thrashing, it’s important that you maximize the sharing of data between cores in a module, and that you minimize the sharing between the two CPU modules.

Typical latencies for local and remote cache hits are shown in this table.

- Remote L2 hit: approximately 100 cycles
- Remote L1 hit: approximately 120 cycles
- Local L1 hit: 3 cycles for 64-bit values, 5 cycles for 128-bit values
- Local L2 hit: approximately 30 cycles

Each of the two CPU modules connects to the north bridge by a bus that can carry up to 20.8 GB/s in each direction.
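To get a feel for why cross-module sharing hurts, the latency figures above can be plugged into a simple weighted average. This is only a rough sketch: the hit fractions are invented for illustration, the 64-bit L1 figure is used throughout, and misses that go all the way to DRAM are ignored because the article gives no latency for them.

```python
# Approximate latencies in cycles, taken from the table above.
LATENCY_CYCLES = {
    "local_l1": 3,     # 64-bit values
    "local_l2": 30,
    "remote_l2": 100,
    "remote_l1": 120,
}


def average_latency(hit_fractions):
    """Weighted average latency for a mix of cache hits (fractions must sum to 1)."""
    total = sum(hit_fractions.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("hit fractions must sum to 1")
    return sum(LATENCY_CYCLES[kind] * frac for kind, frac in hit_fractions.items())


# Hypothetical access mixes, not measured data.
mostly_local = {"local_l1": 0.90, "local_l2": 0.08, "remote_l2": 0.01, "remote_l1": 0.01}
more_remote = {"local_l1": 0.90, "local_l2": 0.04, "remote_l2": 0.03, "remote_l1": 0.03}

local_avg = average_latency(mostly_local)
remote_avg = average_latency(more_remote)
print(f"{local_avg:.1f} vs {remote_avg:.1f} cycles")
```

Even with these made-up mixes, moving a few percent of accesses from the local L2 to the other module raises the average noticeably, which is the point of keeping shared data inside one module.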

From a program standpoint, normal x86 ordering applies to both reads and writes. Stores are strongly ordered (becoming visible in program order with no explicit memory barriers), and reads are out of order.

Keep in mind that if the CPU uses Write Combined memory writes, then a memory synchronization instruction (SFENCE) must follow to ensure that the writes are visible to the other client devices.

The GPU

The GPU can read at 170 GB/s and write at 102 GB/s through multiple combinations of its clients. Examples of GPU clients are the Color/Depth Blocks and the GPU L2 cache.

The GPU has a direct non-coherent connection to the DRAM memory controller and to ESRAM. The GPU also has a coherent read/write path to the CPU’s L2 caches and to DRAM.

For each read and write request from the GPU, the request uses one path depending on whether the accessed resource is located in “coherent” or “non-coherent” memory.

Some GPU functions share a lower-bandwidth (25.6 GB/s), bidirectional read/write path. Those GPU functions include:

- Command buffer and vertex index fetch
- Move engines
- Video encoding/decoding engines
- Front buffer scan out

As the GPU is I/O coherent, data in the GPU caches must be flushed before that data is visible to other components of the system.

The available bandwidth and requirements of other memory clients limit the total read and write bandwidth of the GPU.

Move engines

The Durango console has 25.6 GB/s of read and 25.6 GB/s of write bandwidth shared between:

- Four move engines
- Display scan out and write-back
- Video encoding and decoding

The display scan out consumes a maximum of 3.9 GB/s of read bandwidth (multiply 3 display planes × 4 bytes per pixel × HDMI limit of 300 megapixels per second), and display write-back consumes a maximum of 1.1 GB/s of write bandwidth (multiply 30 bits per pixel × 300 megapixels per second).
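The two display figures can be checked by multiplying out the numbers in the parentheses. A minimal sketch, assuming decimal gigabytes (1 GB = 10^9 bytes); the function names are made up for illustration. Note that the straight product for scan out comes to 3.6 GB/s, a little under the quoted 3.9 GB/s, and the article does not say what accounts for the difference. The write-back figure matches after rounding.

```python
HDMI_PIXELS_PER_SECOND = 300e6  # HDMI limit quoted in the article


def scan_out_read_gbps(planes=3, bytes_per_pixel=4, pixel_rate=HDMI_PIXELS_PER_SECOND):
    """Read bandwidth for display scan out, in GB/s (1 GB = 1e9 bytes)."""
    return planes * bytes_per_pixel * pixel_rate / 1e9


def write_back_gbps(bits_per_pixel=30, pixel_rate=HDMI_PIXELS_PER_SECOND):
    """Write bandwidth for display write-back, in GB/s (1 GB = 1e9 bytes)."""
    return bits_per_pixel / 8 * pixel_rate / 1e9


read_gbps = scan_out_read_gbps()
write_gbps = write_back_gbps()
print(f"scan out: {read_gbps:.2f} GB/s, write-back: {write_gbps:.3f} GB/s")
```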

You may wonder what happens when the GPU is busy copying data and a move engine is told to copy data from one type of memory to another. In this situation, the memory system of the GPU shares bandwidth fairly between source and destination clients. The maximum bandwidth can be calculated by using the peak-bandwidth diagram at the start of this article.
 

Cidd

Member
Interesting that the max bandwidth when using main ram and esram is 136.4 GB not 170.

I was expecting it to be lower like maybe around 155 or so, But 136 that's a pretty steep dip. Now I'm curious to see the PS4 RAM bandwidth, If it takes the same hit then things just got a lot more interesting.
 

Shayan

Banned
that is misleading

only 30mb of that esram will write at 170g/s

if the rumored 8g GDDR3 is true then the write speed will be 68g/s

Still based on old information..

(or is it??) :D

nothing prevents MS from delaying launch by 3/6 months with better RAM, unless they are going for a casual market where they feel they have everything to attract that casual crowd
 

McHuj

Member
Interesting that the max bandwidth when using main ram and esram is 136.4 GB not 170.

That's for a memcopy. That theoretical ~170 was if the GPU could read from both the DRAM and ESRAM, which isn't in their table (and may not be possible anyways).
 

PSGames

Junior Member
I was expecting it to be lower like maybe around 155 or so, But 136 that's a pretty steep dip. Now I'm curious to see the PS4 RAM bandwidth, If it takes the same hit then things just got a lot more interesting.
this hit is due to transferring information from dram to esram. The ps4 has one memory pool so it won't have this issue.

4Ty9MOO.jpg
 

x-Lundz-x

Member
So, someone who can understand these charts explain to me please.

What does this equal in terms of PS4 power?

Same, 75% etc????
 

sangreal

Member
Interesting that the max bandwidth when using main ram and esram is 136.4 GB not 170.

136.4 is transferring data between the two memory pools. Reading from both pools is 170. It's pretty obvious transferring would be limited by the 68GB/s of the slower pool. Since you can only read 68GB/s you can't write that data faster to the esram, and since you can only write 68GB/s there wouldn't be any point to reading faster from the esram.
 

DieH@rd

Banned
According to their example, main RAM is ~55-58GB/s [theoretical 68GB/s, but that will never happen] and they presume that the usual CPU+northbridge workload will be around 25GB/s, with the possibility of hitting a saturation point of 30GB/s. That means that main RAM will be shared 50/50 between CPU/northbridge and GPU. The GPU will be able to work with ~130GB/s [~30 to main RAM (that's both read+write) and ~100 to ESRAM, presuming that the move engines are shut down].
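The budget above is a subtraction followed by an addition. A rough sketch with the thread's figures, assuming the ESRAM contributes its full quoted 102 GB/s peak and the move engines are idle; `gpu_budget` is a made-up name for illustration.

```python
ESRAM_PEAK_GBPS = 102.0  # quoted ESRAM peak; the post rounds it to ~100


def gpu_budget(dram_achievable_gbps, cpu_northbridge_gbps, esram_gbps=ESRAM_PEAK_GBPS):
    """Return (GPU share of DRAM, GPU total) in GB/s after the CPU side takes its cut."""
    gpu_dram = dram_achievable_gbps - cpu_northbridge_gbps
    return gpu_dram, gpu_dram + esram_gbps


# ~55 GB/s achievable DRAM bandwidth, ~25 GB/s typical CPU+northbridge load.
gpu_dram, gpu_total = gpu_budget(55.0, 25.0)
print(gpu_dram, gpu_total)

# Same DRAM figure with the CPU side saturated at ~30 GB/s.
gpu_dram_busy, gpu_total_busy = gpu_budget(55.0, 30.0)
print(gpu_dram_busy, gpu_total_busy)
```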

Nothing new really, we knew from before that this Durango architecture has memory problems that are really nicely fixed in PS4.
 
Interesting that the max bandwidth when using main ram and esram is 136.4 GB not 170.

That's the bandwidth when the gpu is moving data between them...

It's also a bit misleading to call it that way because you are both reading from one pool at 68GB/s and writing to the other at the same speed... That operation leaves you with ~40GB/s unused on esram, but you are not moving data at 136GB/s
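Put as a quick sketch, using the rounded 68 and 102 GB/s peaks from the article (the function name is made up for illustration): the payload moves at the speed of the slower pool, the "combined" figure counts that payload twice (once read, once written), and whatever the faster pool has left over stays available to other clients.

```python
DRAM_PEAK_GBPS = 68.0    # combined read+write peak quoted for DRAM
ESRAM_PEAK_GBPS = 102.0  # combined read+write peak quoted for ESRAM


def copy_between_pools(src_peak_gbps, dst_peak_gbps):
    """Model a straight copy: read from one pool, write to the other."""
    payload = min(src_peak_gbps, dst_peak_gbps)  # actual data moved per second
    traffic = 2 * payload                        # read traffic plus write traffic
    spare = max(src_peak_gbps, dst_peak_gbps) - payload  # left on the faster pool
    return payload, traffic, spare


payload, traffic, spare = copy_between_pools(DRAM_PEAK_GBPS, ESRAM_PEAK_GBPS)
print(payload, traffic, spare)
```

With these rounded inputs the combined traffic comes out at 136 rather than the article's 136.4, presumably because the 68 figure is itself rounded, and the spare ESRAM bandwidth is 34 GB/s, in the same ballpark as the ~40 mentioned above.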
 

Cidd

Member
That's the bandwidth when the gpu is moving data between them...

It's also a bit misleading to call it that way because you are both reading from one pool at 68GB/s and writing to the other at the same speed... That operation leaves you with ~40GB/s unused on esram, but you are not moving data at 136GB/s

So it's even worse than they posted? I don't like the sound of this.
 

gcubed

Member
this is boring, another rehash. I want to know software information. Why are they reserving 3gb (if thats still true). Is the API requirement true?
 
So it's even worse than they posted? I don't like the sound of this.

It's not worse, it's actually the same thing they are saying, just worded differently... They wrote it as if Durango had 170GB/s of total memory bandwidth and that operation consumes 136... That's partially correct, but the actual transfer is happening at only 68GB/s...

I dunno how much the gpu itself will be transferring data between them, though. The point of DMEs is to operate in parallel with both the gpu and cpu doing these data transfers when they are busy doing other work...
 

Clear

CliffyB's Cock Holster
What strikes me about that block diagram is that its complication is a result of very specific system software design goals, far more so than increasing/improving application performance.
 

Cidd

Member
It's not worse, it's actually the same thing they are saying, just worded differently... They wrote it as if Durango had 170GB/s of total memory bandwidth and that operation consumes 136... That's partially correct, but the actual transfer is happening at only 68GB/s...

I dunno how much the gpu itself will be transferring data between them, though. The point of DMEs is to operate in parallel with both the gpu and cpu doing these data transfers when they are busy doing other work...

Ah, thanks for clearing that up, so any idea why they decided to combine both bandwidths?
Is that something special to the Xbox 720?
 
Everyone else being able to read directly from the cpu caches is new? I mean, was that touted as a feature for jaguar before?

On 360 the gpu could also access the L2 cache directly, and use it as a stream buffer, but not sure that was widely used in games, but that sounds like the same concept.
 
Everyone else being able to read directly from the cpu caches is new? I mean, was that touted as a feature for jaguar before?

On 360 the gpu could also access the L2 cache directly, and use it as a stream buffer, but not sure that was widely used in games, but that sounds like the same concept.

Somewhere in that graphic there should be an HSA memory management unit (HMMU) that does this in hardware. In Xbox 360 memexport was software IIRC.

http://developer.amd.com/wordpress/media/2012/10/hsa10.pdf
 

Eideka

Banned
The gap in power between the 720 and the PS4 is terrifying. I wonder how multiplats on 720 will fare, there are reasons to be worried on that front.

I wouldn't be surprised if 3rd parties systematically enhance the PS4 versions of their games with graphical features.
 

Osiris

I permanently banned my 6 year old daughter from using the PS4 for mistakenly sending grief reports as it's too hard to watch or talk to her
Everyone else being able to read directly from the cpu caches is new? I mean, was that touted as a feature for jaguar before?

On 360 the gpu could also access the L2 cache directly, and use it as a stream buffer, but not sure that was widely used in games, but that sounds like the same concept.

It's a big part of AMD's APU push, so something both PS4 and the next Xbox will be capable of. (Assuming the leaks are correct and MS are using Jaguar cores)

See here for more detail.
 

shandy706

Member
The gap in power between the 720 and the PS4 is terrifying. I wonder how multiplats on 720 will fare, there are reasons to be worried on that front.

I wouldn't be surprised if 3rd parties systematically enhance the PS4 versions of their games with graphical features.

Unfortunately (fortunately for MS) that won't matter if the majority buys the new Xbox. Not talking Neogaf "hardcore/Sony supporting" gamers obviously, but if Microsoft comes out swinging with advertising and gets both many hardcore gamers and casual....the developers will most likely design/aim for the best profit level.

I have a feeling MS has something up their sleeve. These "leaks" are all so old now...they've got the current Durango information (even if it's the same) locked up in some nuclear silo. lol
 

mrklaw

MrArseFace
Unfortunately (fortunately for MS) that won't matter if the majority buys the new Xbox. Not talking hardcore gamers obviously, but if Microsoft comes out swinging with advertising and gets both many hardcore gamers and casual....the developers will most likely design/aim for the best profit level.

PCs will still be more powerful and likely to be the lead development platform anyway, with optimisations for console.

If the strengths of both platforms can be properly leveraged (PS4 seems fairly transparent, Durango will depend on how automatic some of these subsystems are), then I can see Durango getting the downports this time round.

both should still look great, but considering the tiniest differences seem enough in face-offs..
 

Cidd

Member
Unfortunately (fortunately for MS) that won't matter if the majority buys the new Xbox. Not talking hardcore gamers obviously, but if Microsoft comes out swinging with advertising and gets both many hardcore gamers and casual....the developers will most likely design/aim for the best profit level.

Well that's the problem, isn't the majority of sales at launch the hardcore crowd? I can see the Xbox fans buying the 720 for first party games, but if third party games have a clear advantage in appearance in favor of the PS4 then MS has got a lot to worry about.

It's even more grim if they're launching around the same time.
 
Ah, thanks for clearing that up, so any idea why they decided to combine both bandwidths?
Is that something special to the Xbox 720?

Why they decided to add the bandwidth for data transfers or why they are adding 102+68 as the total bandwidth?

For the transfers, they are probably adding them up to show that even when transferring data from one memory to the other, esram still has some bandwidth available that other clients can consume...

The added total bandwidth is a bit curious too. It implies that while the gpu can read from both at the same time, it can only write to one of them at a time... That limits some scenarios they hinted would be possible before...
 

Eideka

Banned
Unfortunately (fortunately for MS) that won't matter if the majority buys the new Xbox. Not talking Neogaf "hardcore/Sony supporting" gamers obviously, but if Microsoft comes out swinging with advertising and gets both many hardcore gamers and casual....the developers will most likely design/aim for the best profit level.
I don't see how enhancing the PS4 version harms the next Xbox, it's the hardware that matters after all. I have trouble believing they will cater to the lowest-spec machine and not scale up from there, given how easy the PS4 is to develop for.

I have a feeling MS has something up their sleeve. These "leaks" are all so old now...they've got the current Durango information (even if it's the same) locked up in some nuclear silo. lol
For the sake of competition, I hope so.

PCs will still be more powerful and likely to be the lead development platform anyway, with optimisations for console.
It has never been the case this generation save for a few exceptions, why would the PC be the lead for next-gen multiplatform games ?
Most likely one of the consoles.

@PhatSaqs : if those rumored specs are accurate then yes the PS4 is far ahead in every department.
 

PSGames

Junior Member
It's not worse, it's actually the same thing they are saying, just worded differently... They wrote it as if Durango had 170GB/s of total memory bandwidth and that operation consumes 136... That's partially correct, but the actual transfer is happening at only 68GB/s...

I dunno how much the gpu itself will be transferring data between them, though. The point of DMEs is to operate in parallel with both the gpu and cpu doing these data transfers when they are busy doing other work...

Question: so the GPU can read both at 170GB/s, but wouldn't the data have to be transferred from the dram to the esram first at 68GB/s in 32MB chunks? So even if you can read both at 170GB/s, it takes a lot of lower-bandwidth steps before it even gets to that point, correct?
 
According to their example, main RAM is ~55-58GB/s [theoretical 68GB/s, but that will never happen] and they presume that the usual CPU+northbridge workload will be around 25GB/s, with the possibility of hitting a saturation point of 30GB/s. That means that main RAM will be shared 50/50 between CPU/northbridge and GPU. The GPU will be able to work with ~130GB/s [~30 to main RAM (that's both read+write) and ~100 to ESRAM, presuming that the move engines are shut down].

Nothing new really, we knew from before that this Durango architecture has memory problems that are really nicely fixed in PS4.

These seem like normal overheads that I'm quite sure exist on every architecture including PS4.

AKA PS4 won't be doing whatever it does at 176 GB/s, the practical limit will be like 140 GB/s or something (just an example, not an actual accurate number)
 