
VGLeaks: Durango's Move Engines

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.

No. You are talking about split pools, two memory controllers and two buses. It really isn't that difficult: they couldn't settle for 4 GB, but they still wanted a unified pool of RAM, so they went with a solution much like the 360's.

Either someone is right... Or there will be some shocked gamers come launch.

We could really use more info for a clearer picture.

There is no right or wrong, there are just the facts. Again, no secret hardware, no UFOs, etc. They are just two solutions to one problem, with different emphasis. The leaks are all validated; no one has come out against the known information. Companies are not going to change hardware in the last few months to make forum warriors happy.
 

liquidboy

Banned
There's a tonne of DMA that goes on inside a normal computer system, and whilst I'm not versed in the specifics of GPU-based DMA (the information is not released to the public), I can tell you that it's usually bits of silicon that sit on a shared bus to request/write data without tying up other resources. A good example of this: your Ethernet card will probably DMA all the data it gets into a buffer in memory instead of constantly talking to the CPU about it.

Oh, I get the idea of DMA throughout a system. It's more about the DMA that people keep bringing up when comparing it to the DMEs.

I guess I'm getting confused because I immediately think people are talking about the DMA in GPUs.
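The Ethernet-card example above can be sketched in a few lines of Python. This is a toy model only (the class and function names are made up for illustration, not any real hardware interface): a DMA-style transfer does one bulk copy into shared memory and interrupts the CPU once, while programmed I/O involves the CPU for every word.

```python
# Toy model of DMA-style transfer vs. programmed I/O.
# All names here are illustrative, not real hardware interfaces.

class SharedMemory:
    def __init__(self, size):
        self.cells = [0] * size

class DmaEngine:
    """Copies a whole block into shared memory on behalf of a device."""
    def __init__(self, memory):
        self.memory = memory
        self.cpu_interrupts = 0  # how often the CPU had to get involved

    def transfer(self, data, dest_offset):
        # One bulk copy over the bus; the CPU is free the whole time.
        self.memory.cells[dest_offset:dest_offset + len(data)] = data
        self.cpu_interrupts += 1  # a single completion interrupt

def programmed_io(memory, data, dest_offset):
    """The non-DMA alternative: the CPU moves every word itself."""
    interrupts = 0
    for i, word in enumerate(data):
        memory.cells[dest_offset + i] = word
        interrupts += 1  # CPU touches the bus for each word
    return interrupts

mem = SharedMemory(1024)
packet = list(range(256))

dma = DmaEngine(mem)
dma.transfer(packet, dest_offset=0)

pio_interrupts = programmed_io(SharedMemory(1024), packet, 0)

print(dma.cpu_interrupts)  # 1
print(pio_interrupts)      # 256
```

The point is only the ratio of CPU involvement, not timing; a real NIC or move engine obviously works at the silicon level, not in Python.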
 

KidBeta

Junior Member
I can promise you that he is legit to some extent. Alpha kits are rumored to be built by the devs themselves based on a construction manual, using off-the-shelf parts. Beta kits seem to be the property of MS.

This is so wrong; if this is what superdae is, then he just confirmed himself a troll.
 
No. You are talking about split pools, two memory controllers and two buses. It really isn't that difficult: they couldn't settle for 4 GB, but they still wanted a unified pool of RAM, so they went with a solution much like the 360's.

From the CPU/GPU perspective the pools would be unified (both the CPU and the GPU access the same memory); they would only be split from the application perspective, which is a good thing in this case. None of the other things would pose more of an obstacle than they do in the PS3, Vita, GameCube or any other system that's ever used multiple memory types.
 
There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.

I don't believe that for a second... 3 GB for an OS reservation, while the 360 uses 32 MB? I can see them using 512 MB to 1.5 GB.
What are they using it for, so I can play games, stream my gameplay and store it in the cloud while buffering porn from 20 different browser tabs, to stream later picture-in-picture over the game I'm playing while storing the latest Californication episode?
 

itsgreen

Member
I don't believe that for a second... 3 GB for an OS reservation, while the 360 uses 32 MB? I can see them using 512 MB to 1.5 GB.
What are they using it for, so I can play games, stream my gameplay and store it in the cloud while buffering porn from 20 different browser tabs, to stream later picture-in-picture over the game I'm playing while storing the latest Californication episode?

Why would one ever need more than 64k?
 

mrklaw

MrArseFace
There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.


Durango does seem to be going for efficiency, and that shouldn't be underestimated. If Durango gets 90% utilisation of its GPU and Orbis only 80% (entirely made-up numbers), then that closes the effective FLOP gap. I.e. Durango may be aiming to achieve similar results to Orbis using less raw muscle.


I'm still on the fence regarding tiling. Weren't MS pushing this last time (tiled forward rendering) and it got left behind? What's different this time that devs will actually use it? More direct support in hardware making it easier to implement?
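The utilisation argument above can be worked through numerically. Plugging in the leaked peak figures (roughly 1.23 TFLOPS for Durango, 1.84 TFLOPS for Orbis) together with the poster's admittedly made-up utilisation rates:

```python
# Effective vs. raw FLOP gap under assumed GPU utilisation.
durango_peak, orbis_peak = 1.23, 1.84  # TFLOPS (leaked peak figures)
durango_util, orbis_util = 0.90, 0.80  # utilisation (made-up numbers)

raw_gap = orbis_peak / durango_peak
effective_gap = (orbis_peak * orbis_util) / (durango_peak * durango_util)

print(f"raw gap:       {raw_gap:.2f}x")        # ~1.50x
print(f"effective gap: {effective_gap:.2f}x")  # ~1.33x
```

So even under those generous assumptions the gap narrows but does not disappear, which is consistent with the "less raw muscle" framing.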
 

nib95

Banned
There's one thing that makes no sense to me. According to some of you, Durango's architecture has practically no advantages over the straightforward solution that Orbis employs - in fact, it seems to have a number of disadvantages - apart from the larger memory pool. That's allegedly because of Microsoft's non-gaming ambitions that require almost 3 gigs of RAM, which would not leave enough for games if they went with 4 gigs of GDDR5. So here's the thing I don't understand: if that was really the case, wouldn't it then be simpler to just put 3 gigs of DDR3 there for the system to use, and 3 (or even 4) additional gigs of GDDR5 for the games? The combination of DDR3 and GDDR5 is already common in the PC world, and it would hardly be significantly (if any) more expensive than 8 GB of DDR3 + ESRAM + customized DMEs + more problematic development because of the bottlenecks and the more complex architecture. I mean, if we can see that, surely it wouldn't escape all those Microsoft and AMD engineers.

No, it'd be far more complex. You can't just mix different RAM types on the same motherboard like that; you'd need a completely different set-up and components for each, a different bus, etc. They went for the cheaper and less hardware-complex (heat, size and cost) solution of the two.
 
Why would one ever need more than 64k?

It's not like an embedded console OS needs that fucking much. If Sony can do most of the same stuff with 512 MB, like the rumors are shouting, how embarrassing would that be for MS: a hardware company beating you, the software company, at making a leaner OS that does the same thing.
Win8 with 10 tabs open in Chrome doesn't even take more than 2.1 GB right now, with Steam, Origin, Dropbox and a load of other stuff running in the background. And in Europe, where I am (the Netherlands), most of that DVR stuff is useless.

Durango does seem to be going for efficiency, and that shouldn't be underestimated. If Durango gets 90% utilisation of its GPU and Orbis only 80% (entirely made-up numbers), then that closes the effective FLOP gap. I.e. Durango may be aiming to achieve similar results to Orbis using less raw muscle.

I'm still on the fence regarding tiling. Weren't MS pushing this last time (tiled forward rendering) and it got left behind? What's different this time that devs will actually use it? More direct support in hardware making it easier to implement?

The development of tiled deferred rendering?
 

scently

Member
Durango does seem to be going for efficiency, and that shouldn't be underestimated. If Durango gets 90% utilisation of its GPU and Orbis only 80% (entirely made-up numbers), then that closes the effective FLOP gap. I.e. Durango may be aiming to achieve similar results to Orbis using less raw muscle.


I'm still on the fence regarding tiling. Weren't MS pushing this last time (tiled forward rendering) and it got left behind? What's different this time that devs will actually use it? More direct support in hardware making it easier to implement?

The tiling in the 360 was in regard to using 4xMSAA; it had nothing to do with the rendering itself. It was needed because you cannot fit a 720p 4xMSAA framebuffer in the 10 MB of eDRAM. The eSRAM in Durango can handle a 1080p 2xMSAA framebuffer without tiling; of course a game might need more framebuffers and render targets, so to go higher than 2x you have to tile, and if you are using a deferred renderer you can render part of your buffers to the eSRAM and the rest to the DDR3 RAM. ROPs are optimized to deliver 4xMSAA, which needs a large cache close to the ROPs to perform at full speed. Mind you, the ROPs in Durango seem to be the same as the ones in AMD GCN GPUs, but the ROP cache capacity seems to have been increased, and from the diagram the ROPs are directly connected to the eSRAM unit.

Does this mean Durango games will have very good MSAA? Well, we'll have to wait and see the games.
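The framebuffer sizes behind that tiling claim are easy to check. The arithmetic below assumes 32-bit color plus 32-bit depth per sample, which is a common but not universal layout; real titles vary, so treat this as a sanity check, not a statement about any particular game.

```python
# Rough framebuffer-size arithmetic behind the MSAA/tiling claim.
# Assumes 4 bytes of color + 4 bytes of depth per sample.
def fb_bytes(width, height, msaa, bytes_per_sample):
    return width * height * msaa * bytes_per_sample

MiB = 1024 * 1024
EDRAM_360 = 10 * MiB       # Xbox 360 eDRAM
ESRAM_DURANGO = 32 * MiB   # leaked Durango eSRAM size

# Xbox 360: 720p with 4xMSAA, color + depth per sample
x360 = fb_bytes(1280, 720, 4, 4 + 4)
# Durango: 1080p with 2xMSAA, color + depth per sample
durango = fb_bytes(1920, 1080, 2, 4 + 4)

print(f"720p 4x:  {x360 / MiB:.1f} MiB (fits in 10 MiB? {x360 <= EDRAM_360})")
print(f"1080p 2x: {durango / MiB:.1f} MiB (fits in 32 MiB? {durango <= ESRAM_DURANGO})")
```

Under these assumptions the 720p 4xMSAA target overflows the 360's 10 MB by a wide margin (hence tiling), while a 1080p 2xMSAA color+depth target just squeezes into 32 MB, matching the post above.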
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
From the CPU/GPU perspective the pools would be unified (both the CPU and the GPU access the same memory); they would only be split from the application perspective, which is a good thing in this case. None of the other things would pose more of an obstacle than they do in the PS3, Vita, GameCube or any other system that's ever used multiple memory types.

This doesn't make sense; you can't do what you are describing. The closest thing is the PC model: DDR for the system RAM, GDDR5 for the VRAM, connected via some bus.
 

Karak

Member
B3D user Gubbi posted a bit of information about the decompression bits and what they would normally take using regular Jaguar cores instead of the DMEs. I could be wrong, but that's what I'm getting from it.

I think they wanted LZ decompression for DXT texture data. The quoted 200 MB/s compressed stream is 30% faster than a single core on my 2600S-based workstation; they'd need two Jaguar cores to get that kind of performance.

The 200 MB/s of compressed data would decompress to 300-400 MB/s of DXT data, or 300-800 Mtexels/s, i.e. roughly 5-13 Mtexels per 60 Hz frame. Probably fast enough by any measure.
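Gubbi's numbers can be reproduced with some back-of-the-envelope arithmetic. The compression ratios (1.5-2x for LZ over DXT data) are implied by the 200→300-400 MB/s figures rather than stated, and the bytes-per-texel values are the standard DXT block sizes (DXT1 = 0.5 B/texel, DXT5 = 1 B/texel):

```python
# Reconstructing the DXT decompression throughput figures.
compressed_rate = 200e6        # bytes/s of LZ-compressed stream
ratios = (1.5, 2.0)            # implied LZ compression ratios on DXT data
bytes_per_texel = (1.0, 0.5)   # DXT5 ... DXT1 block sizes

lo_rate = compressed_rate * min(ratios)  # 300 MB/s of decompressed DXT
hi_rate = compressed_rate * max(ratios)  # 400 MB/s of decompressed DXT

texels_lo = lo_rate / max(bytes_per_texel)  # worst case: DXT5 at 1.5x
texels_hi = hi_rate / min(bytes_per_texel)  # best case: DXT1 at 2x

print(f"{texels_lo / 1e6:.0f}-{texels_hi / 1e6:.0f} Mtexels/s")
print(f"{texels_lo / 60 / 1e6:.1f}-{texels_hi / 60 / 1e6:.1f} Mtexels per 60 Hz frame")
```

This lands on the 300-800 Mtexels/s range quoted above; divided by 60 frames it works out to about 5-13 Mtexels per frame.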
 

sangreal

Member
Is VGLeaks doing an article today? I seemed to have the notion in my head that it would be the audio processor.

Looks like they will have something soon.

dxH2hmm.jpg


j5JGAYg.jpg
 

Alx

Member
I've seen these "plane" things mentioned in previous discussions; is there any info on them already? From the last picture it looks like a simple overlay/merging of images, but there are two planes for "title" and one for "system". Since both "title" images are merged, I suppose it can't be 3D, so maybe augmented reality? (The A*C1 + (1-A)*C0 also suggests transparency, though.)
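The A*C1 + (1-A)*C0 expression in the diagram is the standard "over" alpha blend: plane C1 composited on top of plane C0 with opacity A. A minimal per-pixel sketch in Python (illustrative only, not Durango's actual hardware path):

```python
# Standard alpha blending: top plane C1 over bottom plane C0 with opacity A.
def blend(c0, c1, a):
    """Blend two RGB pixels; a = opacity of the top plane, 0.0-1.0."""
    return tuple(a * top + (1 - a) * bottom for top, bottom in zip(c1, c0))

background = (0.0, 0.0, 1.0)  # bottom plane: blue
hud        = (1.0, 0.0, 0.0)  # top plane: red

print(blend(background, hud, a=1.0))   # fully opaque HUD -> (1.0, 0.0, 0.0)
print(blend(background, hud, a=0.25))  # mostly background -> (0.25, 0.0, 0.75)
```

With A = 1 the top plane completely covers the bottom one; anything in between gives the transparency Alx is pointing at.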
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
That block diagram looks amazingly complex.

I was thinking the same thing. What is this block between the move engines and the memory? It basically says that no matter how many move engines are reading/writing in parallel, there is still a max bandwidth of 25.6 GB/s to the memory?

Edit: If this is an APU, why are the CPU cores going to memory via a slower northbridge interface? No HSA then? I hope Orbis doesn't do this; it kind of negates AMD's recent work on CPU/GPU direct memory sharing.
 

CrunchinJelly

formerly cjelly
I've seen these "plane" things mentioned in previous discussions; is there any info on them already? From the last picture it looks like a simple overlay/merging of images, but there are two planes for "title" and one for "system". Since both "title" images are merged, I suppose it can't be 3D, so maybe augmented reality? (The A*C1 + (1-A)*C0 also suggests transparency, though.)

This might help:

http://www.faqs.org/patents/app/20110304713#ixzz2JcQBViQq
 

mrklaw

MrArseFace
So we can only have two layers of parallax? That's a bit shit in a modern game. Or maybe one can be a Mode 7 background, with a scrolling playfield over the top? That'd be pretty neat.
 

gofreak

GAF's Bob Woodward
One new tidbit in that diagram is that output bandwidth from the GPU seems to be capped in a way read bandwidth isn't.

I.e. even if you want to write to both DDR3 and eSRAM, you can never exceed 102 GB/s of output writes.
 

Girsej

Member
Durango does seem to be going for efficiency, and that shouldn't be underestimated. If Durango gets 90% utilisation of its GPU and Orbis only 80% (entirely made-up numbers), then that closes the effective FLOP gap. I.e. Durango may be aiming to achieve similar results to Orbis using less raw muscle.


I'm still on the fence regarding tiling. Weren't MS pushing this last time (tiled forward rendering) and it got left behind? What's different this time that devs will actually use it? More direct support in hardware making it easier to implement?

This is a great point, but based on info from devs, Sony has really stepped up with their SDK and has also given devs a lot more access to code closer to native than Durango does. So from the sounds of it, Sony has their own form of "efficiency" that should make it easier for devs to optimize performance.
 

scently

Member
One new tidbit in that diagram is that output bandwidth from the GPU seems to be capped in a way read bandwidth isn't.

I.e. even if you want to write to both DDR3 and eSRAM, you can never exceed 102 GB/s of output writes.

That refers to the eSRAM; the DDR3 is still 68.
 

gofreak

GAF's Bob Woodward
That refers to the eSRAM; the DDR3 is still 68.

Well, to the 'GPU memory system'. But what I mean is, you cannot 'combine' the memory pools to exceed 102GB/s of output bandwidth. You can read more than that, but not write.
 

gofreak

GAF's Bob Woodward
So we can only have two layers of parallax? That's a bit shit in a modern game. Or maybe one can be a Mode 7 background, with a scrolling playfield over the top? That'd be pretty neat.

Games can composite as many layers as they want in the ordinary way.

I think all this display plane system is doing is letting you separate different elements and then hand them to a system that composites them in a way that's optimal for the connected display, so the application is abstracted away from display specifics. So, for example, you might have UI elements on one plane and the main rendered image on another, and the display plane system will scale or otherwise manipulate the UI plane depending on the display before compositing it with the main rendered image. The system also has a display plane of its own to output to, which the display plane system will composite in with the correct scaling etc. for your display if you have a system overlay open. And so on.

But it doesn't place any limit on how many application specific layers the game might want to composite together before dumping them in one frame to the display plane system.
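The flow gofreak describes can be sketched in a few lines. This is a toy model under the post's own interpretation (the function names and the nearest-neighbour scaler are illustrative, not anything from the leak): each plane is rendered at its own resolution, then a final stage scales every plane to the display and blends them back to front.

```python
# Toy display-plane compositor: scale each plane to the display, then
# blend back-to-front. None pixels are treated as transparent.
def scale_to(display_w, display_h, plane):
    """Nearest-neighbour scale of a plane (a 2-D list of pixels)."""
    src_h, src_w = len(plane), len(plane[0])
    return [[plane[y * src_h // display_h][x * src_w // display_w]
             for x in range(display_w)]
            for y in range(display_h)]

def composite(display_w, display_h, planes):
    """Composite planes back-to-front onto a blank output frame."""
    out = [[0] * display_w for _ in range(display_h)]
    for plane in planes:  # back to front
        scaled = scale_to(display_w, display_h, plane)
        for y in range(display_h):
            for x in range(display_w):
                if scaled[y][x] is not None:
                    out[y][x] = scaled[y][x]
    return out

game = [[1, 1], [1, 1]]           # "main render" plane, low resolution
hud  = [[None, 2], [None, None]]  # UI plane, transparent except one cell

frame = composite(4, 4, [game, hud])
```

The game never sees the display: it hands off two planes at different resolutions, and the compositor produces the final 4x4 frame, which is the abstraction the post is arguing for.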
 

Durante

Member
I don't really get the point of display planes, at least from that illustration.

If it's a software feature, then why limit it to 3 and why make such a huge fuss about it?
If it's a hardware feature, then why? How long does a modern GPU take to scale and blend an image, a few microseconds?
 

scently

Member
Well, to the 'GPU memory system'. But what I mean is, you cannot 'combine' the memory pools to exceed 102GB/s of output bandwidth. You can read more than that, but not write.

Oh, I see. I think their next article will be the memory system, as it was part of their vote, as well as the display planes, so we will probably get them within a three-day period.
 

MaulerX

Member
I think I'll stop trying to figure it out, lol. One look at that complex block diagram and I sure as hell wouldn't be surprised if all of that working in harmony could indeed get close to Orbis.
 

scently

Member
I don't really get the point of display planes, at least from that illustration.

If it's a software feature, then why limit it to 3 and why make such a huge fuss about it?
If it's a hardware feature, then why? How long does a modern GPU take to scale and blend an image, a few microseconds?

I think the display planes have to do with rendering or sending video to a different device, though not with all the information at once.

Or it could be a reference to their roadmap leak, which indicated that you can be playing a game while having something like a news feed on the same screen.

There was a patent on display planes by MS, and that seemed to indicate that a game can render at a certain resolution while its UI renders at a different resolution.

Whatever they are doing with it, I am sure it's necessary, as it adds to the BOM.
 

McHuj

Member
I don't really get the point of display planes, at least from that illustration.

If it's a software feature, then why limit it to 3 and why make such a huge fuss about it?
If it's a hardware feature, then why? How long does a modern GPU take to scale and blend an image, a few microseconds?

I could see a benefit if the blending were automatic, straight from DDR, and did not pollute the caches of the CPU and GPU, or the eSRAM. It may not be the compute that's getting offloaded but the memory traffic.
 

liquidboy

Banned
I think the display planes have to do with rendering or sending video to a different device, though not with all the information at once.

Or it could be a reference to their roadmap leak, which indicated that you can be playing a game while having something like a news feed on the same screen.

There was a patent on display planes by MS, and that seemed to indicate that a game can render at a certain resolution while its UI renders at a different resolution.

Whatever they are doing with it, I am sure it's necessary, as it adds to the BOM.



Here's a nice Beyond3D discussion of someone's interpretation of display planes... not suggesting it's right, but interesting nonetheless.

http://forum.beyond3d.com/showpost.php?p=1705296&postcount=610
 

mrklaw

MrArseFace
I don't really get the point of display planes, at least from that illustration.

If it's a software feature, then why limit it to 3 and why make such a huge fuss about it?
If it's a hardware feature, then why? How long does a modern GPU take to scale and blend an image, a few microseconds?

For a video overlay I can understand it: you might want to pop a system notification over an HDMI video input, for instance. But in games I have no idea why it's needed. Even if you are going to use dynamic resolution for your game (or sub-HD, please no), it should be trivial to render your main 3D view at that res and scale it up before you put the HUD on. Maybe they found some current-gen games suffering from lower resolutions being scaled up, so they think decoupling the HUD is useful?


And gofreak, I was taking the piss, SNES-style.
 
This 170 GB/s read speed is IMO really misleading. I don't know if this is intentional, but it's 102 GB/s for the 32 MB of eSRAM and 68 GB/s for the 8 GB of DDR3. Why didn't they use two arrows in this illustration? They should have known that you can't just add those numbers.
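The arithmetic being disputed is trivial but worth making explicit: the 170 GB/s figure is the sum of the two pools' leaked read bandwidths, which is only meaningful if the GPU is reading from both pools at the same time.

```python
# The disputed 170 GB/s figure: a sum of two independent read paths.
esram_read = 102  # GB/s, 32 MB eSRAM (leaked figure)
ddr3_read  = 68   # GB/s, 8 GB DDR3 (leaked figure)

combined = esram_read + ddr3_read
print(combined)  # 170 -- only reachable when reading both pools at once
```

Any single buffer lives in one pool or the other, so it is capped at that pool's own figure; the combined number describes aggregate traffic, not the speed of any one read.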
 

mrklaw

MrArseFace
Gemüsepizza;47580706 said:
This 170 GB/s entry is really misleading. I don't know if this is intentional, but it's 102 GB/s for the 32 MB of eSRAM and 68 GB/s for the 8 GB of DDR3. Why didn't they use two arrows in this illustration? They should have known that you can't just add those numbers.

Perhaps the GPU can read from both simultaneously? So you'd keep some data in DDR3 and some in the eSRAM, and combine them in the GPU somehow?
 
I don't really get the point of display planes, at least from that illustration.

If it's a software feature, then why limit it to 3 and why make such a huge fuss about it?
If it's a hardware feature, then why? How long does a modern GPU take to scale and blend an image, a few microseconds?

Without really knowing what the display plane diagram means, I definitely see a use. They can say Display Plane 1 = game world, Display Plane 2 = HUD, Display Plane 3 = Guide/OS.

Why would they want to split it out this way? Maybe they have native support for dynamic resolution, in which case you'd want Display Plane 1 to have the dynamic resolution while Planes 2 and 3 stay independent. I guess streaming or recording could be another reason you'd want the separate planes.
 