
VGLeaks: Durango's Move Engines

mrklaw

MrArseFace
Someone should let the MS engineers know that they might as well not worry about these two extra DMAs, because there is not much benefit in having them.

I realise you're being sarcastic, but don't put words into my mouth.

*if* these DMEs are extensions of GCN's standard DMA engines (seems logical), then the main differences between this implementation and Orbis would be that Durango has compression/decompression built in, and that it has two more engines than stock GCN

Considering they *share* the DMA bus, you get no overall bandwidth advantage, just potentially more flexibility

And as GCN can use compressed textures directly, it seems like this is mainly useful for tiling - which might be *very* useful, I don't know.

Basically retyping what I said
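For readers wondering what "tiling" actually buys you, here is a toy sketch (my addition, not from the thread, and not any real GPU's format) of rearranging a row-major texture into 4x4 tiles so that a small rectangle of texels becomes one contiguous run of memory:

```python
def linear_to_tiled(x, y, width, tile=4):
    """Map (x, y) in a row-major image of `width` texels to an offset
    in a tiled layout made of `tile` x `tile` blocks stored one after
    another (illustrative 4x4 tiles, not any real console's format)."""
    tiles_per_row = width // tile
    tile_index = (y // tile) * tiles_per_row + (x // tile)   # which tile
    within = (y % tile) * tile + (x % tile)                  # offset inside it
    return tile_index * tile * tile + within

# Tile an 8x8 texture: texel (x, y) lives at offset y*8 + x linearly.
width = 8
linear = list(range(width * width))
tiled = [0] * len(linear)
for y in range(width):
    for x in range(width):
        tiled[linear_to_tiled(x, y, width)] = linear[y * width + x]

# The first tile now holds the top-left 4x4 block, contiguously.
print(tiled[:16])  # [0, 1, 2, 3, 8, 9, 10, 11, 16, 17, 18, 19, 24, 25, 26, 27]
```

In the row-major layout, reading a 4x4 block touches four widely separated address ranges; in the tiled layout the same block is 16 consecutive entries, which is the locality texture units want.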
 

THE:MILKMAN

Member
He either moved to Australia or he's pretending to. One follower asked him on Twitter how much money it would take for him to pull the auction, and he said that for $3K he would deliver it in person within the States.

And it's just the alpha kit with the latest XDK, it seems.

Funny how we never heard anything else about the Beta kit he was having delivered weeks ago. Unless I missed it?

Anyway, I think this whole thing is a wind-up at this point. I can't tell if he's a master troll, a journalist/developer on a wind-up, or a Microsoft plant to contain/control leaks.

If he is fully legit I'll be amazed.
 

ekim

Member
Funny how we never heard anything else about the Beta kit he was having delivered weeks ago. Unless I missed it?

Anyway, I think this whole thing is a wind-up at this point. I can't tell if he's a master troll, a journalist/developer on a wind-up, or a Microsoft plant to contain/control leaks.

If he is fully legit I'll be amazed.

I can promise you that he is legit to some extent. Alpha kits are rumored to be built by the devs themselves from a construction manual, using off-the-shelf parts. Beta kits seem to be the property of MS.
 

KidBeta

Junior Member
I realise you're being sarcastic, but don't put words into my mouth.

*if* these DMEs are extensions of the GCNs standard DMA engines (seems logical) then the main differences between this implementation and orbis would be that durango has compression/decompression built in, and it has two more engines than GCN

Considering they *share* the DMA bus you get no overall bandwidth advantages, just potentially more flexibility

And as GCN can use compressed textures directly it seems like this is mainly useful for tiling - which might be *very* useful, I don't know.

Basically retyping what I said

Sony's going to have similar LZ hardware in their console; whether they also have JPEG decompression, I do not know.
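The "LZ" being discussed is the Lempel-Ziv family of compressors. A minimal sketch of the decode side, which is the cheap direction and the one worth baking into fixed-function hardware; the token format here is invented for illustration, real LZ bitstreams are packed bit formats:

```python
def lz_decode(tokens):
    """Decode a toy LZ77 stream. Each token is either a literal byte
    ('lit', b) or a back-reference ('ref', distance, length) that copies
    `length` bytes starting `distance` bytes back in the output.
    The token format is made up for illustration only."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == 'lit':
            out.append(tok[1])
        else:  # back-reference: copy from already-produced output
            _, dist, length = tok
            for _ in range(length):        # byte-by-byte so overlapping
                out.append(out[-dist])     # copies (run repeats) work
    return bytes(out)

# "abcabcabc": three literals plus one overlapping back-reference.
tokens = [('lit', ord('a')), ('lit', ord('b')), ('lit', ord('c')),
          ('ref', 3, 6)]
print(lz_decode(tokens))  # b'abcabcabc'
```

The decoder is just appends and backwards copies, which is why a small dedicated block can do it at full bus speed; the encode side (finding the matches) is the expensive half, which fits the rumor that only one engine encodes.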
 

THE:MILKMAN

Member
I can promise you that he is legit to some extent. Alpha kits are rumored to be built by the devs themselves from a construction manual, using off-the-shelf parts. Beta kits seem to be the property of MS.

Wait wat!? That sounds so amateurish. It would also mean that, at best, he knows someone with documents and a construction manual etc. ... not really direct and traceable access.

If he ever did meet Microsoft after the first eBay dev kit fiasco, that is where him being legit ended, IMO.
 
Wait wat!? That sounds so amateurish. It would also mean that, at best, he knows someone with documents and a construction manual etc. ... not really direct and traceable access.

If he ever did meet Microsoft after the first eBay dev kit fiasco, that is where him being legit ended, IMO.

It's pretty much what I heard. Not surprised. From what I know, developers were told to build games around a particular set of PC specs. Early 360 kits were basically Power Mac G5s with a Windows kernel built for the G5 processor.

 

KidBeta

Junior Member
First, nobody is thinking it's magic pixie dust. Second, there are people here and at B3D (including developers) who know what DMA is and how it's different from what is rumored. Now if you're smarter than all of them, please show the error of their ways...

The problem here is that it's not different from DMA at all. At most you could say it's DMA to different pools of memory with compression shoved on the end; that's it. No magic pixie dust that a computer built in the past couple of decades hasn't had at least half of.
 

scently

Member
The problem here is that it's not different from DMA at all. At most you could say it's DMA to different pools of memory with compression shoved on the end; that's it. No magic pixie dust that a computer built in the past couple of decades hasn't had at least half of.

And that is why it's different: it has and performs more functionality than typical DMAs.
 

KidBeta

Junior Member
And that is why it's different: it has and performs more functionality than typical DMAs.

Sure, I'll concede that, but it doesn't add magical unseen power; it's been done before in computers, probably multiple times as well.
 

scently

Member
Sure, I'll concede that, but it doesn't add magical unseen power; it's been done before in computers, probably multiple times as well.

The point of discussion is not whether it performs magic, which you keep bringing up for whatever reason, but whether it would have some benefit to the system. The reason I prefer the discussion on B3D is that, instead of constantly saying 'yeah, but ___ don't need it' or 'it's been done before', they engage in discussion about the benefits and technical abilities of the rumored components in the system. You just don't seem interested in discussing the system at all. Why on earth you feel it is necessary to point out that it is in other computers and systems is beyond me. That is not, or at least should not be, the focus of this thread at all.
 
Could people maybe drop the whole special magic thing? At this point it seems like only some are interested in trying to analyse the design of Durango and the move engines, and coming up with theories about how it helps performance.

I don't see what some have to gain by constantly making everything about Durango look...anything but positive.

What's the point of it? Also, I agree that there are viral marketers spread throughout the internet forums, but is there a rule of thumb that says only MS has them? Where are the Sony ones? Nobody seems to point them out.

Anyway, from what I understand these move engines seem to at least save GPU cycles. Can anybody with knowledge and no interest in console wars explain what exactly that means in relation to performance?
 

KidBeta

Junior Member
The point of discussion is not whether it performs magic, which you keep bringing up for whatever reason, but whether it would have some benefit to the system. The reason I prefer the discussion on B3D is that, instead of constantly saying 'yeah, but ___ don't need it' or 'it's been done before', they engage in discussion about the benefits and technical abilities of the rumored components in the system. You just don't seem interested in discussing the system at all. Why on earth you feel it is necessary to point out that it is in other computers and systems is beyond me. That is not, or at least should not be, the focus of this thread at all.

Because some people like to think that any rumor that comes out about any of the systems is exclusive hardware that never existed before, and I feel the need to tell people multiple times so they remember that this is not new and it has been done before.

But on topic.

I cannot help but wonder how big these chips are and how much they cost to manufacture. Surely it cannot be that much, but then again we are adding compression and decompression into the mix, which must increase their size at least a little.
 
Could people maybe drop the whole special magic thing? At this point it seems like only some are interested in trying to analyse the design of Durango and the move engines, and coming up with theories about how it helps performance.

I don't see what some have to gain by constantly making everything about Durango look...anything but positive.

What's the point of it? Also, I agree that there are viral marketers spread throughout the internet forums, but is there a rule of thumb that says only MS has them? Where are the Sony ones? Nobody seems to point them out.

Anyway, from what I understand these move engines seem to at least save GPU cycles. Can anybody with knowledge and no interest in console wars explain what exactly that means in relation to performance?
Basically the biggest advantage is that they can move data even when the CPU/GPU is stalled. It's not the biggest thing but it effectively allows them to keep resources flowing at all times. The only drawback I see is that developers are gonna have to do a lot of the footwork for this to be effective.
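The point above about moving data while the CPU/GPU is busy can be sketched in software with double buffering: a background copy thread stands in for a move engine, filling the next buffer while the foreground "GPU" processes the current one. A loose analogy of my own, not how the hardware actually schedules anything:

```python
import threading

def render_frames(source_frames, process):
    """Double-buffer: a background 'move engine' thread copies the next
    frame's data while the foreground 'GPU' processes the current one.
    Purely an analogy for DMA overlapping computation."""
    results = []
    buffers = [None, None]

    def copy_into(slot, frame):
        buffers[slot] = list(frame)   # stand-in for a DMA transfer

    copy_into(0, source_frames[0])    # prime the first buffer
    for i in range(len(source_frames)):
        cur, nxt = i % 2, (i + 1) % 2
        copier = None
        if i + 1 < len(source_frames):
            copier = threading.Thread(
                target=copy_into, args=(nxt, source_frames[i + 1]))
            copier.start()            # next copy runs during processing
        results.append(process(buffers[cur]))
        if copier:
            copier.join()             # copy must finish before buffer reuse
    return results

frames = [[1, 2], [3, 4], [5, 6]]
print(render_frames(frames, sum))  # [3, 7, 11]
```

The `join()` is the software equivalent of the synchronization a developer would have to manage by hand, which is the "footwork" mentioned above.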
 
Anyway, from what I understand these move engines seem to at least save GPU cycles. Can anybody with knowledge and no interest in console wars explain what exactly that means in relation to performance?

Compared to not having them? Sure. But that's why modern GPUs have DMAs. So it isn't an issue.
 

THE:MILKMAN

Member
And that is why it's different: it has and performs more functionality than typical DMAs.

Different, sure. But all this is, in the end, is a solution to a problem. Which is great.

But it has been said/hinted here that these DMEs and other stuff elevate performance above its on-paper specs. And aegies reported he'd heard Microsoft suggesting to third-party devs that performance was comparable to a GTX 680.

That is a big claim from Microsoft if true. I'll wait to see the games though.
 
The point of discussion is not whether it performs magic, which you keep bringing up for whatever reason, but whether it would have some benefit to the system. The reason I prefer the discussion on B3D is that, instead of constantly saying 'yeah, but ___ don't need it' or 'it's been done before', they engage in discussion about the benefits and technical abilities of the rumored components in the system. You just don't seem interested in discussing the system at all. Why on earth you feel it is necessary to point out that it is in other computers and systems is beyond me. That is not, or at least should not be, the focus of this thread at all.
Different perspectives. GAF (some guys at GAF...) seems more interested in discussing the hardware design, while B3D discusses what could be done with it (tiling, mega-meshes...). Both are interesting.
 

KidBeta

Junior Member
So this is just a traditional design? For example, does Orbis, which is apparently based on Pitcairn (the 7850), also employ these 4 DMA engines?

It would certainly have at least the two that the GCN architecture uses; it might also have others depending on what it needs. So to put it simply: yes, this is just a traditional design with some decompression hardware thrown into the mix.
 
So this is just a traditional design? For example, does Orbis, which is apparently based on Pitcairn (the 7850), also employ these 4 DMA engines?

Orbis wouldn't have more than 2 (which is how many a 7850 has). Durango needs more since the small size of the ESRAM means you have to move data much more often.
 
Orbis wouldn't have more than 2 (which is how many a 7850 has). Durango needs more since the small size of the ESRAM means you have to move data much more often.

Alright.

What are the advantages of having a system like this, though? Why go with expensive ESRAM, which will take a huge portion of die space, and opt for 2 extra DMAs, instead of using a specific amount of cheap DDR3 for the OS and going with, for example, 2 or 3 GB of something akin to GDDR5?

For example, get 2 GB of DDR3 for the OS, and go with VRAM for the GPU? From what I understand, it would make more sense to have 3 GB of GDDR5 than 4/5 GB of DDR3 + ESRAM, right?

Is there a big gulf in cost?
 
Having multiple memory buses of sufficient width adds a lot of board complexity and requires a lot of space on an APU. It would place significant limits on how far the chip design could be shrunk down in the future and ultimately cost more. It's obvious the design goal for Durango was to have a large amount of inexpensive RAM. The ESRAM and DMEs are just there to compensate for the shortcoming that would otherwise entail. If they just wanted the system to be fast it would look more like Orbis. Whatever their strategic goals are, they involve having lots of RAM available.
 

daxter01

8/8/2010 Blackace was here
Alright.

What are the advantages of having a system like this, though? Why go with expensive ESRAM, which will take a huge portion of die space, and opt for 2 extra DMAs, instead of using a specific amount of cheap DDR3 for the OS and going with, for example, 2 or 3 GB of something akin to GDDR5?

For example, get 2 GB of DDR3 for the OS, and go with VRAM for the GPU? From what I understand, it would make more sense to have 3 GB of GDDR5 than 4/5 GB of DDR3 + ESRAM, right?

Is there a big gulf in cost?

I guess that would make the board too complex with 2 memory controllers, and the high latency of GDDR5 is going to cause problems.
 

scently

Member
So this is just a traditional design? For example, does Orbis, which is apparently based on Pitcairn (the 7850), also employ these 4 DMA engines?

I cannot say whether Orbis has the DMAs; apparently all recent AMD GCN-based GPUs have 2. But what I can tell you about the DMEs on Durango is that there are 4 of them. Also, in addition to the usual DMA function, all 4 of them can tile and untile, one can decode LZ and JPEG, and one can encode LZ. So it adds quite a bit of functionality to the standard DMA.
 
I cannot say whether Orbis has the DMAs; apparently all recent AMD GCN-based GPUs have 2. But what I can tell you about the DMEs on Durango is that there are 4 of them. Also, in addition to the usual DMA function, all 4 of them can tile and untile, one can decode LZ and JPEG, and one can encode LZ. So it adds quite a bit of functionality to the standard DMA.

Aren't those DMAs used to get data onto and off the PCIe bus?
I'm pretty sure I read about that in AMD's OpenCL programmer's guide, but it has been a while.
 

JaggedSac

Member
Having multiple memory buses of sufficient width adds a lot of board complexity and requires a lot of space on an APU. It would place significant limits on how far the chip design could be shrunk down in the future and ultimately cost more. It's obvious the design goal for Durango was to have a large amount of inexpensive RAM. The ESRAM and DMEs are just there to compensate for the shortcoming that would otherwise entail. If they just wanted the system to be fast it would look more like Orbis. Whatever their strategic goals are, they involve having lots of RAM available.

Given the bandwidth of the system, did they realize that adding more CUs would not do anything since they would be starved, and adding more ROPs would do nothing since the GPU couldn't push out enough to need that many?
 

liquidboy

Banned
Interesting DMA vs DME observation made at B3D:

The DMEs are not standard DMA units; they also offer conversion between tiled and linear memory models, and that does increase the efficiency. Indeed, the GPU can copy between ESRAM and RAM on its own using DMA engines, but AFAIK they are only one-way fetching, so it would have to fire up its DMA for every copy - i.e. copy a chunk from RAM to cache, set it up again to copy from cache to ESRAM. Plus, that would have to be repeated for every line of a texture if it's being tiled, while you can just set up a DME once and have everything copied without ever stopping.
 

Reiko

Banned
This will improve efficiency (if it's not something that all GCN DMAs can do; it might be, after all, since it's graphics related). But this will not increase performance beyond a handful of percent at most.

A "handful"? Could you describe that in more detail, or was that just an opinion?
 

KidBeta

Junior Member
A "handful"? Could you describe that in more detail, or was that just an opinion?

http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/

For powerful graphics hardware with decent CPUs and high-bandwidth memory access, like that seen in both Orbis and Durango, it would not take a lot of resources to perform this texture swizzling.

This is far from an operation that is going to take up half of Orbis's CPU/GPU; very far from it. But we will see if it's included in Orbis; if it is, then it's clear it comes from the GCN DMAs and not Microsoft's patented secret sauce.
 

Reiko

Banned
http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/

For powerful graphics hardware with decent CPUs and high-bandwidth memory access, like that seen in both Orbis and Durango, it would not take a lot of resources to perform this texture swizzling.

This is far from an operation that is going to take up half of Orbis's CPU/GPU; very far from it. But we will see if it's included in Orbis; if it is, then it's clear it comes from the GCN DMAs and not Microsoft's patented secret sauce.

Much better.

We'll see. Again... judging from the posts here and around the net... there never was any secret sauce. The Durango is supposed to be an efficient machine by design.
 
A clever, super efficient box that potentially matches and perhaps even in the right hands, exceeds the brute force styling of the competition.

No. The bandwidth situation may not be as catastrophic as a (pitiful) 68GB/s would suggest, but it's still categorically worse than simply having enough bandwidth to your main memory pool to begin with. Move engines and eSRAM are both a band-aid for a problem that doesn't exist on Orbis. Durango's GPU still has far fewer functional units than Orbis as well, and no amount of eSRAM or move engines is going to change that. It's simply a slower system, and it's very difficult to think of any scenario where Durango is going to match Orbis.

Sony's system is the one that has fewer potential bottlenecks and development pitfalls (no need to juggle a 32MB memory pool and no potential for performance hiccups when using compute shader code), not Microsoft's. It's a more developer-friendly architecture, and it's faster. Durango's one (questionable) advantage over Orbis is more RAM; that's it. Whether Durango has a powerful enough GPU and enough bandwidth to really take full and meaningful advantage of that larger RAM pool remains to be seen, but most evidence would suggest not.
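As an aside on where that 68GB/s figure comes from: it is consistent with the rumored configuration of DDR3-2133 on a 256-bit bus (an assumption based on the leaks, not a confirmed spec):

```python
# Rumored Durango main-memory config (an assumption from the leaks,
# not a confirmed spec): DDR3-2133 on a 256-bit bus.
transfers_per_sec = 2133e6      # DDR3-2133: 2133 million transfers/second
bytes_per_transfer = 256 // 8   # 256-bit bus = 32 bytes per transfer
bandwidth_gb = transfers_per_sec * bytes_per_transfer / 1e9
print(round(bandwidth_gb, 1))   # 68.3
```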
 
This seems to have happened to quite a few standard features of modern graphics hardware.

It's weird

First, nobody is thinking it's magic pixie dust. Second, there are people here and at B3D (including developers) who know what DMA is and how it's different from what is rumored. Now if you're smarter than all of them, please show the error of their ways...

Please, there were people who thought this was the special sauce; read the first 10 pages. And of course it's different: two of them have added benefits, and no one's ever stated anything different. But it's still based on the exact same idea as DMAs, which have been around since forever, and that idea is to shuffle data between the subsystems on a chip. This isn't some exclusive sauce that makes Durango a paradigm shift from what's out there. What is laughable is the horribly stupid idea that desktop GPUs and Orbis are these archaic brute-force devices (people who say this need to just stop) while Durango is introducing completely new hardware, never used before in a console, that makes it super efficient and a crown of engineering excellence.
 

KidBeta

Junior Member
Much better.

We'll see. Again... judging from the posts here and around the net... there never was any secret sauce. The Durango is supposed to be an efficient machine by design.

I'd like to update my quote and say that it's a pretty performance-critical thing (yay for changing opinions after you find out they are wrong).

It seems it's a pretty performance-critical operation, which every graphics card's memory controller performs.
 

liquidboy

Banned
Can you link the quote?

Oops, thought I had quoted it...

Here it is again:

The DMEs are not standard DMA units; they also offer conversion between tiled and linear memory models, and that does increase the efficiency. Indeed, the GPU can copy between ESRAM and RAM on its own using DMA engines, but AFAIK they are only one-way fetching, so it would have to fire up its DMA for every copy - i.e. copy a chunk from RAM to cache, set it up again to copy from cache to ESRAM. Plus, that would have to be repeated for every line of a texture if it's being tiled, while you can just set up a DME once and have everything copied without ever stopping.

link to beyond3d quote
 

Reiko

Banned
I'd like to update my quote and say that it's a pretty performance-critical thing (yay for changing opinions after you find out they are wrong).

It seems it's a pretty performance-critical operation, which every graphics card's memory controller performs.

More info is always good to read.
 

liquidboy

Banned
http://fgiesen.wordpress.com/2011/01/17/texture-tiling-and-swizzling/

For powerful graphics hardware with decent CPUs and high-bandwidth memory access, like that seen in both Orbis and Durango, it would not take a lot of resources to perform this texture swizzling.

This is far from an operation that is going to take up half of Orbis's CPU/GPU; very far from it. But we will see if it's included in Orbis; if it is, then it's clear it comes from the GCN DMAs and not Microsoft's patented secret sauce.

That's a pretty bold statement, and I would argue that it "depends" on the type of textures.

If we're talking about mega-textures and using virtualized techniques, I'm guessing that would definitely impact perf and the results. I've read the article you linked about texture tiling and swizzling; there's not much info on the type of textures he was using, but I'm guessing, since it was written in 2011, that he wasn't really exploring the mega-texture idea.

All I'm suggesting is that maybe, just maybe, the design of the system revolves around improving certain scenarios like mega-textures, virtualized approaches to textures, etc.

And here's a link to a nice explanation of these mega-meshes/textures from the Lionhead devs. Incidentally, 343's Corrine Yu also points out the importance of this in future gaming!
 

KidBeta

Junior Member
That's a pretty bold statement, and I would argue that it "depends" on the type of textures.

If we're talking about mega-textures and using virtualized techniques, I'm guessing that would definitely impact perf and the results. I've read the article you linked about texture tiling and swizzling; there's not much info on the type of textures he was using, but I'm guessing, since it was written in 2011, that he wasn't really exploring the mega-texture idea.

All I'm suggesting is that maybe, just maybe, the design of the system revolves around improving certain scenarios like mega-textures, virtualized approaches to textures, etc.

Did you also read the part where tiling is (logically) free in hardware?

Because that should make it obvious to everyone here that it's going to be present in both systems.
 

liquidboy

Banned
There have been some great points made on other forums about doing work on the GPU vs the CPU vs dedicated chips (like these DMEs)...

Statement 1:
The GPU already has dedicated DMA hardware; you wouldn't need to use shader programs just to move stuff around in RAM... and I don't see how a "data movement engine" would automagically lead to greater GPU utilization just by existing.

Answer to the Statement 1:
But the whole point of this is to free up GPU resources. Why have the GPU do it if you can have something else do it, while the GPU gets on with its rendering tasks and fetches what it needs from ESRAM when possible?

As well, the added functionality in 2 of the DMEs allows for things to be done that would otherwise require GPU compute resources or CPU resources. Again, things that could be better used for running the game than for compressing/decompressing data.


- reference to above statements -




Statement 2
Memory access can be a lot more efficient using a dedicated chunk of silicon like this (DME).

Answer to Statement 2
Gaining you HOW MUCH exactly, really...? A few tenths of a percent? What? It can't be any huge amount, that's for sure. Copying data must only take a tiny fraction of frame time.

Answer to Answer above
It's all about bus utilization.

Imagine a CPU doing the swizzling, loading and storing data. Do you use temporal or non-temporal memory ops? Either way, the CPU quickly issues a series of loads and stores, then it stalls waiting for data.

Some of the accesses are adjacent, so the prefetcher fires up; this helps with subsequent adjacent loads - good. But because of the swizzling and the boundaries (remember, we can copy to and from subregions of textures), the next load is somewhere completely different and again we have a stall. Meanwhile the prefetcher has already fetched data ahead of the first series of loads, wasting bandwidth.

So we waste expensive silicon (our CPU core) moving data around, wasting bandwidth doing so. We're not talking a few percent here; more like 25-50%.

Alternatively you could run the texture cookie-cutter on the GPU. That won't waste any bandwidth, but the CU doing the moving will have its shader array just sit there while you copy data around, and if your jpeg-decode-to-texture is part of a demand-loading texture pipeline, you'd have a lot of CPU overhead setting it up.


- reference to above statements -
 

liquidboy

Banned
Did you also read the part where tiling is (logically) free in hardware?

Because that should make it obvious to everyone here that it's going to be present in both systems.


I do not know of the section you speak of ... please point it out so I can re-read that!
 

KidBeta

Junior Member
There have been some great points made on other forums about doing work on the GPU vs the CPU vs dedicated chips (like these DMEs)...

2 of the DMEs are pretty obviously just the GCN DMAs (the two with no functionality other than moving data). And Orbis will also be able to swizzle for free in its memory controller; to assume otherwise is not logical.

I do not know of the section you speak of ... please point it out so I can re-read that!

While somewhat awkward in software, this kind of bit-interleaving is relatively easy and cheap to do in hardware since no logic is required (it does affect routing complexity, though).

Another hint to remember is that DMA is also dedicated silicon for data transfers that happen in parallel with computation.
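The "bit-interleaving" in the quoted article is Morton/Z-order addressing. A quick sketch of my own: the magic-mask trick below is the standard way to do in software what hardware gets essentially for free by routing each bit to a different wire.

```python
def part1by1(n):
    """Spread the low 16 bits of n apart so a zero bit sits between
    each original bit: 0b_dcba -> 0b_0d0c0b0a."""
    n &= 0xFFFF
    n = (n | (n << 8)) & 0x00FF00FF
    n = (n | (n << 4)) & 0x0F0F0F0F
    n = (n | (n << 2)) & 0x33333333
    n = (n | (n << 1)) & 0x55555555
    return n

def morton(x, y):
    """Z-order (Morton) address: interleave the bits of x and y.
    In hardware this needs no logic gates at all, just bit routing,
    which is the point the quote is making."""
    return part1by1(x) | (part1by1(y) << 1)

# The first 4x4 texels visited in Z order:
print([morton(x, y) for y in range(4) for x in range(4)])
# [0, 1, 4, 5, 2, 3, 6, 7, 8, 9, 12, 13, 10, 11, 14, 15]
```

In software it takes a handful of shift/mask ops per coordinate, which is "somewhat awkward"; in silicon the interleaved address is literally just a different wiring of the same bits.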
 

liquidboy

Banned
2 of the DMEs are pretty obviously just the GCN DMAs (the two with no functionality other than moving data). And Orbis will also be able to swizzle for free in its memory controller; to assume otherwise is not logical.





Another hint to remember is that DMA is also dedicated silicon for data transfers that happen in parallel with computation.

My apologies, I'm not very well versed in the design of DMA in GPUs... the DMA that people are mentioning, is it sitting inside the GPU or in separate chips on the SoC?
 
Durango's one (questionable) advantage over Orbis is more RAM, that's it.

There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.
 

Reiko

Banned
There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.

Either someone is right... Or there will be some shocked gamers come launch.

We could really use more info for a clearer picture.
 

KidBeta

Junior Member
My apologies, I'm not very well versed in the design of DMA in GPUs... the DMA that people are mentioning, is it sitting inside the GPU or in separate chips on the SoC?

There's a tonne of DMA that goes on inside a normal computer system, and whilst I am not versed in the specifics of GPU-based DMA (the information is not released to the public), I can tell you that it is usually bits of silicon that sit on a shared bus to read/write data without tying up other resources. A good example: your Ethernet card will probably DMA all the data it receives into a buffer in memory instead of constantly talking to the CPU about it.
 

liquidboy

Banned
Another interesting statement made regarding Durango vs 360, in line with the idea of mega-textures/meshes:

The raw texel rate is almost five times that of the 360. The anisotropic filtering algorithms have also evolved enormously since 2005.

The system seems optimized for megamesh/megatexture-type rendering, with the hardware-assisted decompression features together with the GPU's use of virtual address translation (which might remove the need for indirection in the texture lookup, cutting the cost of anisotropic filtering).

reference here
 