
VGLeaks: Durango's Move Engines

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Also that LZ encode engine will be useful for off-TV play like the Wii U, or to stream the Xbox 720 to your tablet or smartphone.
 

Thraktor

Member
But it doesn't look like that's all that possible on Durango either.

Sorry, meant Durango. Codename confusion with all the leaks and rumors flying around.

Also that LZ encode engine will be useful for off-TV play like the Wii U, or to stream the Xbox 720 to your tablet or smartphone.

Nope, it's just to move data around within the console (i.e. you'll have encode/decode done at each end of the data bus in hardware, effectively invisible to software). The compression ratio isn't going to be anything amazing, anyway, especially compared to the lossy video compression that'd be used for streaming.
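To put very rough numbers on that (the ratio here is illustrative, not from the leak): compression on the bus only scales effective bandwidth by whatever ratio the data actually achieves, so

    effective bandwidth ≈ link bandwidth × compression ratio
    e.g. 25.6GB/s × 1.5 ≈ 38GB/s, if the rumored DME peak is right and the data happened to compress 1.5:1

Generic LZ on mixed game data won't do much better than that, while the lossy video codecs used for streaming manage ratios in the hundreds, which is why this isn't an off-TV-play feature.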
 
They don't seem strictly DMAs to me. Their main purpose seems to be moving pieces of data in parallel with the GPU (especially in stages where the GPU is doing other stuff), so the data is where the GPU needs it to be when it's ready to access...

Tiling textures (as well as render targets) pretty much explains how 32MB could effectively be used both for rendering into and for gathering data, actually improving overall system bandwidth (but I don't see them getting close to 170GB/s at all, though).

Since the scanout shares the same link with them, I would guess that they can also gather all the tiles from both memories and send them directly to the display out, without having to unite them into a single buffer first (at least that would really make sense in a design like this XD).
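For reference, assuming the rumored peaks, the 170GB/s figure is just the two pools added together:

    68GB/s (DDR3) + 102.4GB/s (eSRAM) ≈ 170GB/s

and you'd only approach that if every access were split across both pools at once, which is exactly what the tiling tricks above are trying to set up.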
 

DieH@rd

Banned
They only assist, they don't handle the entire bandwidth. Maybe they are enough to make for a well-balanced system overall.

Well, Durango will maybe be better balanced with these new Move Engine blocks... but it will still have less processing power than Orbis, which, by the look of things, also has a much simpler architecture [balanced from the start].
 

gofreak

GAF's Bob Woodward
They don't seem strictly DMAs to me. Their main purpose seems to be moving pieces of data in parallel with the GPU (especially in stages where the GPU is doing other stuff), so the data is where the GPU needs it to be when it's ready to access...

That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.

Well, Durango will maybe be better balanced with these new Move Engine blocks...

I wouldn't say that. These units are there because the memory system on Durango is, relatively speaking, all over the place. Orbis has a very simple, elegant setup. It doesn't need this kind of help on that front in the first place.
 
So it is basically exactly as I expected. DMA engines. The only minor surprise here is hardware decompression.

It's just a way to make better use of the limited bandwidth and keep the GPU fed. It's not adding to FLOP count or anything, as some of the rumors were leading people to believe.
 

KidBeta

Junior Member
From what I can tell, it effectively lets you move a tad more (5GB/s) than 1/3 of the bandwidth from the main RAM to the ESRAM using these MEs. So if you want peak bandwidth you'll have to add other stuff as well, but that doesn't mean they won't be useful; they just aren't the megaton a lot of people expected.
 

Biggzy

Member
So it is basically exactly as I expected. DMA engines. The only minor surprise here is hardware decompression.

It's just a way to make better use of the limited bandwidth and keep the GPU fed. It's not adding to FLOP count or anything, as some of the rumors were leading people to believe.

I thought it was obvious from the start that these would compensate somewhat for the lack of bandwidth that DDR3 brings.
 

Durante

Member
I thought it was obvious from the start that these would compensate somewhat for the lack of bandwidth that DDR3 brings.
It was the most likely scenario by far, but some of our resident "insiders" seemed to hint that there was more to it.
 

szaromir

Banned
Well, Durango will maybe be better balanced with these new Move Engine blocks... but it will still have less processing power than Orbis, which, by the look of things, also has a much simpler architecture [balanced from the start].
What's the obsession with comparing to Orbis?
 

spwolf

Member
Well, also, LZ77 could be LZX... which is pretty good compression-wise for data (it's what's mostly in CAB files). I wonder now what Sony uses.

I would not be surprised if the 360 had something similar, since it had "no installs".
 
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?
 
That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.



I wouldn't say that. These units are there because the memory system on Durango is, relatively speaking, all over the place. Orbis has a very simple, elegant setup. It doesn't need this kind of help on that front in the first place.
In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?
 

gofreak

GAF's Bob Woodward
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?

You don't have to. You could just keep it simple and do all your reads from DDR3 and use the eSRAM in a fairly simple way for GPU output buffers.

But that may not get the most out of the system. If you want to use eSRAM as anything other than a write buffer, you will have to get into data juggling in a bigger way. And there are more and less optimal approaches to that, requiring more or less pain.
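A rough sketch of what that juggling could look like, assuming a double-buffered slice of eSRAM. Every type and function name below is a placeholder made up for illustration; the real Durango SDK interface is unknown.

#include <stddef.h>

/* Hypothetical declarations, not a real API. */
typedef struct DmeFence DmeFence;
extern DmeFence *dme_copy_async(void *dst, const void *src, size_t bytes);
extern void      dme_wait(DmeFence *fence);
extern void      gpu_render_batch(const void *esram_tiles);

#define TILE_BATCH_BYTES (2 * 1024 * 1024)  /* arbitrary example size */

/* Assumes batch 0 was already staged into esram_buf[0] before the call. */
void draw_with_juggling(void *esram_buf[2], const void *ddr3_tiles[], int num_batches)
{
    for (int batch = 0; batch < num_batches; ++batch) {
        int cur = batch & 1;
        DmeFence *pending = NULL;

        /* Kick a move engine: stage the NEXT batch of tiles from DDR3
           into the other half of eSRAM while the GPU is busy. */
        if (batch + 1 < num_batches)
            pending = dme_copy_async(esram_buf[cur ^ 1],
                                     ddr3_tiles[batch + 1],
                                     TILE_BATCH_BYTES);

        /* Meanwhile the GPU reads the CURRENT batch from eSRAM. */
        gpu_render_batch(esram_buf[cur]);

        /* Don't flip buffers until the staged copy has finished. */
        if (pending)
            dme_wait(pending);
    }
}

Get the overlap right and the copies are essentially free; get it wrong and you've just reinvented a stall.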

In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?


See my comment above. They're there to help movement of data between eSRAM and DDR3, and super optimal use of available bandwidth on Durango will require doing that, depending on the needs of your game. It's about making best use of the eSRAM, for reads as well as writes if necessary, so that you're not stuck with 68GB/s.
 
That's how memory management units generally work though, no? It's not like any GPU or CPU has to stall on a memory request if it has other things to work on. Memory operations are handled independently.

True, but they usually work upon request. The only way this setup makes sense is if they work in advance, semi-aware of which data will be needed before it is, moving it beforehand in a way that increases the rate at which the GPU is fed... They seem to me more akin to how a cache would work.
 

KidBeta

Junior Member
In the end, doesn't this effectively increase the bandwidth, though, or at least mean that the limited bandwidth is compensated for? Seems like 8GB of DDR3 with a lot of tricks to "increase" bandwidth might be the optimal solution?

Orbis is also rumored to have hardware-based compression. Where it sits, I dunno.
 
The Cell has a DMA engine for each SPU and the main PPE core. It did help the PS3 make better use of its given bandwidth. But I don't recall anyone trying to sell it as a groundbreaking feature or anything. It was just a way to keep the CPU fed with data more efficiently. But no one was saying the PS3 had more bandwidth because of DMA engines. But all of a sudden now...
 

szaromir

Banned
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?
People mostly complained about the smaller amount of memory available and the lack of flexibility (fixed 256MB/256MB memory pools).
 
You don't have to. You could just keep it simple and do all your reads from DDR3 and use the eSRAM in a fairly simple way for GPU output buffers.

But that may not get the most out of the system. If you want to use eSRAM as anything other than a write buffer, you will have to get into data juggling in a bigger way. And there are more and less optimal approaches to that, requiring more or less pain.
Thanks for the clarification
 
OK, total not-a-tech-guy here, but if I am reading this correctly, this would mean that as a programmer you would have to anticipate what was going through the move engines, ESRAM and system RAM all at once? Isn't that the kind of stuff people complained about on the PS3, with the SPEs and 2 pools of RAM?

Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.
 

KidBeta

Junior Member
Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.

There are some things that are nearly impossible to pre-empt, though, so this won't really work for everything, will it?
 

Chev

Member
Can't current GPUs just use compressed textures directly from RAM? So the ability to decode from compressed storage to a texture isn't new, it's just something they'd need to support anyway?
It's a different use. What GPUs can do is read compressed textures directly, specifically for their rendering purposes; as such, those are compression systems designed for textures and nothing else. What the units described here do is much more generic decompression (LZ), and you can put the decompressed data, be it text, sounds, whatever, wherever you want, like in RAM, something which used to be done on the CPU. It's generic compression acceleration, not specifically texture compression.
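The contrast, as a sketch. Placeholder names again (gpu_bind_texture and dme_lz_decode are made up for illustration, not documented calls):

#include <stddef.h>

/* Hypothetical declarations, not a real API. */
extern void gpu_bind_texture(int slot, const void *bc_texture);
extern void dme_lz_decode(void *dst_ram, const void *src, size_t compressed_bytes);

void two_kinds_of_compression(const void *bc1_texture,
                              const void *lz_blob, size_t lz_bytes,
                              void *dst_ram)
{
    /* Case 1: GPU-native texture compression (BC/DXT-style). The texture
       stays compressed in memory; the GPU's texture units decode texels
       on the fly as shaders sample it. Textures only. */
    gpu_bind_texture(0, bc1_texture);

    /* Case 2: generic LZ via a move engine. Works on any byte stream;
       the output is plain data sitting in RAM, exactly as if the CPU
       had run the decompressor itself. */
    dme_lz_decode(dst_ram, lz_blob, lz_bytes);
}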
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Nope, it's just to move data around within the console (i.e. you'll have encode/decode done at each end of the data bus in hardware, effectively invisible to software). The compression ratio isn't going to be anything amazing, anyway, especially compared to the lossy video compression that'd be used for streaming.

That's a pity. I guess we can rule out off-TV play from the Xbox 720 then. Unless there is something we don't know about in the GPU.
 

daveo42

Banned
So it's basically like having a separate physics card in your PC without any memory? It's good that it can work independently of GPU/CPU loads if paths to the ESRAM and DDR3 RAM are available, but it seems like they went with the PS3 approach of making programming for it insanely difficult.

Especially with a comment like this:
These accelerators are truly fixed-function, in the sense that their algorithms are embedded in hardware. They can usually be considered black boxes with no intermediate results that are visible to software. When used for their designed purpose, however, they can offload work from the rest of the system and obtain useful results at minimal cost.

Good that they have hardware-embedded algorithms, but bad if you want to create custom ones that pass through the move engine. Not even sure if that would be possible for devs.
 

Durante

Member
So it's basically like having a separate physics card in your PC without any memory?
No. It's like being able to transfer data from your main memory on PC to your GPU memory without using up CPU or GPU time. And while transparently doing some compression/decompression. Useful, but by no means groundbreaking.

And entirely unnecessary in a true UMA system such as Orbis.
 

scently

Member
That's a pity. I guess we can rule out off-TV play from the Xbox 720 then. Unless there is something we don't know about in the GPU.

The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.
 
Perhaps so, but since it's MS, I would bet that they are coming up with some compiler tricks to make this, at least to some extent, transparent to developers.

I don't see how. Especially for moving data in and out of the ESRAM, which is where your FB is gonna be. There's gonna be some pretty careful orchestration going on.
 

KidBeta

Junior Member
The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.

It uses the same data paths, though, so if you end up using it you lose bandwidth for the DMEs.
 

scently

Member
No. It's like being able to transfer data from your main memory on PC to your GPU memory without using up CPU or GPU time. And while transparently doing some compression/decompression. Useful, but by no means groundbreaking.

And entirely unnecessary in a true UMA system such as Orbis.

Indeed. They are there to help mitigate stalls.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
The Durango has a video compress/decompress unit that can do that, and it's separate from these DMEs. The Orbis has a video compress/decompress unit also, so I expect both of them to be able to stream to other devices just like the Wii U and its gamepad.

Great. I didn't know that.
 
The Cell has a DMA engine for each SPU and the main PPE core. It did help the PS3 make better use of its given bandwidth. But I don't recall anyone trying to sell it as a groundbreaking feature or anything. It was just a way to keep the CPU fed with data more efficiently. But no one was saying the PS3 had more bandwidth because of DMA engines. But all of a sudden now...

AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.
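Putting numbers on the "read from both pools" part, assuming the rumored peaks (68GB/s DDR3, 102.4GB/s eSRAM): if a fraction f of the data is staged in eSRAM, the read takes time proportional to max(f/102.4, (1-f)/68), which is smallest when both pools finish together:

    f/102.4 = (1-f)/68  →  f ≈ 0.6

Stage roughly 60% of the tiles in eSRAM and the combined read rate approaches 68 + 102.4 ≈ 170GB/s; stage a naive 50/50 split like the first scenario and you top out around 136GB/s, because the DDR3 half becomes the straggler.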
 

oldergamer

Member
AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.

Now that could be a pretty large bandwidth saving per frame.
 

KidBeta

Junior Member
Just an FYI, the numbers I posted only relate to using it in raw copy or tiled mode.

As per the document, these are the only two modes that attain the peak bandwidth rate.

If using compression, I assume it gets less, but how much is unknown.
 

Durante

Member
AFAICS, you are comparing apples to oranges.

Think of this scenario: you have a 1MB texture in main RAM. You know that you'll have to read it, so you take half of it and copy it to the eSRAM, and when the GPU needs to read it, it will be able to read from both memories, effectively increasing the bandwidth.

Now, throwing everything that has been said about Durango together, I think the system would actually operate like this: you have a 1MB texture in main memory; early analysis of the scene indicates that only a portion of it is actually visible in this frame, so you gather the tiles of this texture that would be visible, move them to the eSRAM, and when the GPU needs them, it can read the tiles from both memory pools. The difference now is that the total read would be less than 1MB, because you eliminated the portions of the texture that wouldn't be rendered anyway.
But in the end, even in that idealized scenario, you are still using more memory bandwidth than Orbis, which simply reads only the parts of the texture it needs from its one memory pool.
 

KidBeta

Junior Member
I'm very curious as to how fast these work when using the JPEG or LZ compression.

It's not mentioned in the document anywhere, and we know it's not at the peak rate.

EDIT:

Scrap that, found them.

221MB/s for 4:2:2 sampling
373MB/s for 4:2:0 sampling

which is, per frame at 60Hz:

3.7MB/frame (4:2:2)
6.2MB/frame (4:2:0)

LZ rates:

Encode: 150-200 MB/s

Decode: 200 MB/s

which is [raw]:

3.333MB/frame at 60Hz
6.666MB/frame at 30Hz

which is [end to end]:

1.666MB/frame at 60Hz
3.333MB/frame at 30Hz
 