
VGLeaks: Durango's Move Engines

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Also that LZ encode engine will be useful for off-TV play like the Wii U, or to stream the Xbox 720 to your tablet or smartphone.
 

Thraktor

Member
But it doesn't look like that's all that possible on Durango either.

Sorry, meant Durango. Codename confusion with all the leaks and rumors flying around.

Also that LZ encode engine will be useful for off-TV play like the Wii U, or to stream the Xbox 720 to your tablet or smartphone.

Nope, it's just to move data around within the console (i.e. you'll have encode/decode done at each end of the data bus in hardware, effectively invisible to software). The compression ratio isn't going to be anything amazing, anyway, especially compared to the lossy video compression that'd be used for streaming.
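To put very rough numbers on that (the ratio here is illustrative, not from the leak): compression on the bus only scales effective bandwidth by whatever ratio the data actually achieves, so

    effective bandwidth ≈ link bandwidth × compression ratio
    e.g. 25.6GB/s × 1.5 ≈ 38GB/s, if the rumored DME peak is right and the data happened to compress 1.5:1

Generic LZ on mixed game data won't do much better than that, while the lossy video codecs used for streaming manage ratios in the hundreds, which is why this isn't an off-TV-play feature.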
 
They don't seem strictly DMAs to me. Their main purpose seems to be moving pieces of data in parallel with the GPU (especially in stages where the GPU is doing other stuff), so the data is where the GPU needs it to be when it's ready to access...

Tiling textures (as well as render targets) pretty much explains how 32MB could effectively be used both for rendering into and for gathering data, actually improving overall system bandwidth (but I don't see them getting close to 170GB/s at all, though).

Since the scanout shares the same link with them, I would guess that they can also gather all the tiles from both memories and send them directly to the display out, without having to unite them into a single buffer first (at least that would really make sense in a design like this XD).
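For reference, assuming the rumored peaks, the 170GB/s figure is just the two pools added together:

    68GB/s (DDR3) + 102.4GB/s (eSRAM) ≈ 170GB/s

and you'd only approach that if every access were split across both pools at once, which is exactly what the tiling tricks above are trying to set up.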
 

DieH@rd

Banned
They only assist, they don't handle the entire bandwidth. Maybe they are enough to make for a well-balanced system overall.

Well, Durango will maybe be better balanced with these new Move Engine blocks... but it will still have less processing power than Orbis, which, by the look of things, also has a much simpler architecture [balanced from the start].
 

gofreak

GAF's Bob Woodward
They don't seem strictly DMAs to me. Their main purpose seems to be moving pieces of data in parallel with the GPU (especially in stages where the GPU is doing other stuff), so the data is where the GPU needs it to be when it's ready to access...

That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.

Well, Durango will maybe be better balanced with these new Move Engine blocks...

I wouldn't say that. These units are there because the memory system on Durango is, relatively speaking, all over the place. Orbis has a very simple, elegant setup. It doesn't need this kind of help on that front in the first place.
 
So it is basically exactly as I expected. DMA engines. The only minor surprise here is hardware decompression.

It's just a way to make better use of the limited bandwidth and keep the GPU fed. It's not adding to FLOP count or anything, as some of the rumors were leading people to believe.
 

KidBeta

Junior Member
From what I can tell, it effectively lets you move a tad more (5GB/s) than 1/3 of the bandwidth from the main RAM to the ESRAM using these MEs. So if you want peak bandwidth you'll have to add other stuff as well, but that doesn't mean they won't be useful; they just aren't the megaton a lot of people expected.
 

Biggzy

Member
So it is basically exactly as I expected. DMA engines. The only minor surprise here is hardware decompression.

It's just a way to make better use of the limited bandwidth and keep the GPU fed. It's not adding to FLOP count or anything, as some of the rumors were leading people to believe.

I thought it was obvious from the start that these would compensate somewhat for the lack of bandwidth that DDR3 brings.
 

Durante

Member
I thought it was obvious from the start that these would compensate somewhat for the lack of bandwidth that DDR3 brings.
It was the most likely scenario by far, but some of our resident "insiders" seemed to hint that there was more to it.
 

szaromir

Banned
Well, Durango will maybe be better balanced with these new Move Engine blocks... but it will still have less processing power than Orbis, which, by the look of things, also has a much simpler architecture [balanced from the start].
What's the obsession with comparing to Orbis?
 

spwolf

Member
Well, also, LZ77 could be LZX... which is pretty good compression-wise for data (it's what's mostly in CAB files). I wonder now what Sony uses.

I would not be surprised if the 360 had something similar, since it had "no installs".
 
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?
 
That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.



I wouldn't say that. These units are there because the memory system on Durango is, relatively speaking, all over the place. Orbis has a very simple, elegant setup. It doesn't need this kind of help on that front in the first place.
In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?
 

gofreak

GAF's Bob Woodward
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?

You don't have to. You could just keep it simple and do all your reads from DDR3 and use the eSRAM in a fairly simple way for GPU output buffers.

But that may not get the most out of the system. If you want to use eSRAM as anything other than a write buffer, you will have to get into data juggling in a bigger way. And there are more and less optimal approaches to that, requiring more or less pain.
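A rough sketch of what that juggling could look like, assuming a double-buffered slice of eSRAM. Every type and function name below is a placeholder made up for illustration; the real Durango SDK interface is unknown.

#include <stddef.h>

/* Hypothetical declarations, not a real API. */
typedef struct DmeFence DmeFence;
extern DmeFence *dme_copy_async(void *dst, const void *src, size_t bytes);
extern void      dme_wait(DmeFence *fence);
extern void      gpu_render_batch(const void *esram_tiles);

#define TILE_BATCH_BYTES (2 * 1024 * 1024)  /* arbitrary example size */

/* Assumes batch 0 was already staged into esram_buf[0] before the call. */
void draw_with_juggling(void *esram_buf[2], const void *ddr3_tiles[], int num_batches)
{
    for (int batch = 0; batch < num_batches; ++batch) {
        int cur = batch & 1;
        DmeFence *pending = NULL;

        /* Kick a move engine: stage the NEXT batch of tiles from DDR3
           into the other half of eSRAM while the GPU is busy. */
        if (batch + 1 < num_batches)
            pending = dme_copy_async(esram_buf[cur ^ 1],
                                     ddr3_tiles[batch + 1],
                                     TILE_BATCH_BYTES);

        /* Meanwhile the GPU reads the CURRENT batch from eSRAM. */
        gpu_render_batch(esram_buf[cur]);

        /* Don't flip buffers until the staged copy has finished. */
        if (pending)
            dme_wait(pending);
    }
}

Get the overlap right and the copies are essentially free; get it wrong and you've just reinvented a stall.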

In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?


See my comment above. They're there to help movement of data between eSRAM and DDR3, and super optimal use of available bandwidth on Durango will require doing that, depending on the needs of your game. It's about making best use of the eSRAM, for reads as well as writes if necessary, so that you're not stuck with 68GB/s.
 
That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.

True, but they usually work upon request. The only way this setup makes sense is if they work in advance, semi-aware of which data will be needed before it is, moving it beforehand in a way that increases the rate at which the GPU is fed... They seem to me more akin to how a cache would work.
 

KidBeta

Junior Member
In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?

Orbis is also rumored to have hardware-based compression. Where it sits, I dunno.
 
The Cell has a DMA engine for each SPU and the main PPE core. It did help the PS3 make better use of its given bandwidth. But I don't recall anyone trying to sell it as a groundbreaking feature or anything. It was just a way to keep the CPU fed with data more efficiently. But no one was saying the PS3 had more bandwidth because of DMA engines. But all of a sudden now...
 

szaromir

Banned
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?
People mostly complained about the smaller amount of memory available and the lack of flexibility (fixed 256MB/256MB memory pools).
 
You don't have to. You could just keep it simple and do all your reads from DDR3 and use the eSRAM in a fairly simple way for GPU output buffers.

But that may not get the most out of the system. If you want to use eSRAM as anything other than a write buffer, you will have to get into data juggling in a bigger way. And there are more and less optimal approaches to that, requiring more or less pain.
Thanks for the clarification
 
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?

Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.
 

KidBeta

Junior Member
Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.

There are some things that are nearly impossible to pre-empt, though, so this won't really work for everything, will it?
 

Chev

Member
Can't current GPUs just use compressed textures directly from RAM? So the ability to decode from compressed storage to a texture isn't new, it's just something they'd need to support anyway?
It's a different use. What GPUs can do is read compressed textures directly, specifically for their rendering purposes; as such, those are compression systems designed for textures and nothing else. What the units described here do is much more generic decompression (LZ), and you can put the decompressed data, be it text, sounds, whatever, wherever you want, like in RAM, something which used to be done on the CPU. It's generic compression acceleration, not specifically texture compression.
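The contrast, as a sketch. Placeholder names again (gpu_bind_texture and dme_lz_decode are made up for illustration, not documented calls):

#include <stddef.h>

/* Hypothetical declarations, not a real API. */
extern void gpu_bind_texture(int slot, const void *bc_texture);
extern void dme_lz_decode(void *dst_ram, const void *src, size_t compressed_bytes);

void two_kinds_of_compression(const void *bc1_texture,
                              const void *lz_blob, size_t lz_bytes,
                              void *dst_ram)
{
    /* Case 1: GPU-native texture compression (BC/DXT-style). The texture
       stays compressed in memory; the GPU's texture units decode texels
       on the fly as shaders sample it. Textures only. */
    gpu_bind_texture(0, bc1_texture);

    /* Case 2: generic LZ via a move engine. Works on any byte stream;
       the output is plain data sitting in RAM, exactly as if the CPU
       had run the decompressor itself. */
    dme_lz_decode(dst_ram, lz_blob, lz_bytes);
}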
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Nope, it's just to move data around within the console (i.e. you'll have encode/decode done at each end of the data bus in hardware, effectively invisible to software). The compression ratio isn't going to be anything amazing, anyway, especially compared to the lossy video compression that'd be used for streaming.

That's a pity. I guess we can rule out off-TV play from the Xbox 720 then. Unless there is something we don't know about in the GPU.
 

daveo42

Banned
So it's basically like having a separate physics card in your PC without any memory? It's good that it can work independently of GPU/CPU loads if paths to the ESRAM and DDR3 RAM are available, but it seems like they went with the PS3 approach of making programming for it insanely difficult.

Especially with a comment like this:
These accelerators are truly fixed-function, in the sense that their algorithms are embedded in hardware. They can usually be considered black boxes with no intermediate results that are visible to software. When used for their designed purpose, however, they can offload work from the rest of the system and obtain useful results at minimal cost.

Good that they have hardware-embedded algorithms, but bad if you want to create custom ones that pass through the move engine. Not even sure if that would be possible for devs.
 

Durante

Member
So it's basically like having a separate physics card in your PC without any memory?
No. It's like being able to transfer data from your main memory on PC to your GPU memory without using up CPU or GPU time. And while transparently doing some compression/decompression. Useful, but by no means groundbreaking.

And entirely unnecessary in a true UMA system such as Orbis.
 

scently

Member
That's a pity. I guess we can rule out off-TV play from the Xbox 720 then. Unless there is something we don't know about in the GPU.

The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.
 
Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.

I don't see how. Especially for moving data in and out of the ESRAM, which is where your FB is gonna be. There's gonna be some pretty careful orchestration going on.
 

KidBeta

Junior Member
The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.

It uses the same data paths, though, so if you end up using it you lose bandwidth for the DMEs.
 

scently

Member
No. It's like being able to transfer data from your main memory on PC to your GPU memory without using up CPU or GPU time. And while transparently doing some compression/decompression. Useful, but by no means groundbreaking.

And entirely unnecessary in a true UMA system such as Orbis.

Indeed. They are there to help mitigate stalls.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.

Great. I didn't know that.
 
The Cell has a DMA engine for each SPU and the main PPE core. It did help the PS3 make better use of its given bandwidth. But I don't recall anyone trying to sell it as a groundbreaking feature or anything. It was just a way to keep the CPU fed with data more efficiently. But no one was saying the PS3 had more bandwidth because of DMA engines. But all of a sudden now...

AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.
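Putting numbers on the "read from both pools" part, assuming the rumored peaks (68GB/s DDR3, 102.4GB/s eSRAM): if a fraction f of the data is staged in eSRAM, the read takes time proportional to max(f/102.4, (1-f)/68), which is smallest when both pools finish together:

    f/102.4 = (1-f)/68  →  f ≈ 0.6

Stage roughly 60% of the tiles in eSRAM and the combined read rate approaches 68 + 102.4 ≈ 170GB/s; stage a naive 50/50 split like the first scenario and you top out around 136GB/s, because the DDR3 half becomes the straggler.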
 

oldergamer

Member
AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.

Now that could be a pretty large bandwidth saving per frame.
 

KidBeta

Junior Member
Just an FYI, the numbers I posted only relate to using it in raw copy or tiled mode.

As per the document, these are the only two modes that attain the peak bandwidth rate.

If using compression, I assume it gets less, but how much is unknown.
 

Durante

Member
AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.
But in the end, even in that idealized scenario, you are still using more memory bandwidth than Orbis, which simply reads only the parts of the texture it needs from its one memory pool.
 

KidBeta

Junior Member
I'm very curious as to how fast these work when using the JPEG or LZ compression.

It's not mentioned in the document anywhere, and we know it's not at the peak rate.

EDIT:

Scrap that, found them.

221MB/s for 4:2:2 sampling
373MB/s for 4:2:0 sampling

which is, per frame at 60Hz:

3.7MB/frame (4:2:2)
6.2MB/frame (4:2:0)

LZ rates:

Encode: 150-200 MB/s

Decode: 200 MB/s

which is [raw]:

3.333MB/frame at 60Hz
6.666MB/frame at 30Hz

which is [end to end]:

1.666MB/frame at 60Hz
3.333MB/frame at 30Hz
 