
EDGE: "Power struggle: the real differences between PS4 and Xbox One performance"

DBT85

Member

Did they just completely not answer the question about the CUs, or is it me?

Memory bandwidth is one thing, but graphics capability is clearly another. PlayStation 4 enjoys a clear advantage in terms of on-board GPU compute units - a raw stat that is beyond doubt, and in turn offers a huge boost to PS4's enviable spec sheet. Andrew Goosen first confirms that both Xbox One and PS4 graphics tech is derived from the same AMD "Island" family before addressing the Microsoft console's apparent GPU deficiency in depth.

"Just like our friends we're based on the Sea Islands family. We've made quite a number of changes in different parts of the areas... The biggest thing in terms of the number of compute units, that's been something that's been very easy to focus on. It's like, hey, let's count up the number of CUs, count up the gigaflops and declare the winner based on that. My take on it is that when you buy a graphics card, do you go by the specs or do you actually run some benchmarks?" he says.

"Firstly though, we don't have any games out. You can't see the games. When you see the games you'll be saying, 'what is the performance difference between them'. The games are the benchmarks. We've had the opportunity with the Xbox One to go and check a lot of our balance. The balance is really key to making good performance on a games console. You don't want one of your bottlenecks being the main bottleneck that slows you down."
 
The only major difference between the two is the eSRAM, and it's pretty obvious what its point is. There are no hidden secrets or fruits we have missed.

I think it's literally impossible to say this with any sort of certainty. We don't even know where about 4 to 5MB of the supposed 47MB of on-chip storage Microsoft says is built into the Xbox One SoC actually went. Where that 4 to 5MB, or potentially more if I'm misremembering the numbers, went could be fairly important. Just saying.
 

KidBeta

Junior Member
"On the SoC, there are many parallel engines - some of those are more like CPU cores or DSP cores. How we count to fifteen: [we have] eight inside the audio block, four move engines, one video encode, one video decode and one video compositor/resizer," says Nick Baker.

That answers one question.

Just like our friends we're based on the Sea Islands family.

And another.

"From a power/efficiency standpoint as well, fixed functions are more power-friendly on fixed function units," adds Nick Baker. "We put data compression on there as well, so we have LZ compression/decompression and also motion JPEG decode which helps with Kinect. So there's a lot more to the Data Move Engines than moving from one block of memory to another."

And another.
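For what it's worth, the decompress-as-you-copy idea the Move Engines implement in fixed function is easy to picture in software. A minimal sketch using Python's zlib as a stand-in for the console's LZ decode (illustrative only, not how the hardware is programmed):

```python
import zlib

# Illustrative only: the Move Engines do this in fixed-function hardware; zlib
# stands in for the console's LZ decode to show the decompress-on-copy idea.
def stage_compressed_asset(compressed_blob: bytes) -> bytes:
    """Assets sit compressed in main memory and are expanded as part of the
    copy into working memory, so CPU/GPU time isn't spent on the decode."""
    return zlib.decompress(compressed_blob)

texture = bytes(range(256)) * 64            # 16KB of stand-in texture data
in_main_memory = zlib.compress(texture)     # stored compressed in DDR3
in_working_memory = stage_compressed_asset(in_main_memory)
assert in_working_memory == texture
```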
 

IN&OUT

Banned
Well, Kaz Hirai himself said they are using off-the-shelf parts:

http://www.psxextreme.com/ps4-news/590.html

Edit: and yes - I've seen the second part of the sentence. ;)

Sony designed the Cell with IBM and Toshiba. Kaz meant that the PS4 APU is built by AMD, just like the X1's, but his statement doesn't mean that Sony didn't add any customizations at all.

All your questions are answered in Mark Cerny's Gamasutra interview, but people tend to ignore what they don't like. Tidbits from the Cerny interview:

For example, this is his take on its GPU: "It's ATI Radeon. Getting into specific numbers probably doesn't help clarify the situation much, except we took their most current technology, and performed a large number of modifications to it."

To understand the PS4, you have to take what you know about Cerny's vision for it (easy to use, but powerful in the long term) and marry that to what the company has chosen for its architecture (familiar, but cleverly modified.) That's what he means by "supercharged."

The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:

"First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!

"Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."

Thirdly, said Cerny, "The original AMD GCN architecture allowed for one source of graphics commands, and two sources of compute commands. For PS4, we’ve worked with AMD to increase the limit to 64 sources of compute commands -- the idea is if you have some asynchronous compute you want to perform, you put commands in one of these 64 queues, and then there are multiple levels of arbitration in the hardware to determine what runs, how it runs, and when it runs, alongside the graphics that's in the system."

Source:
http://www.gamasutra.com/view/feature/191007/inside_the_playstation_4_with_mark_.php?page=2
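To make the 'volatile bit' idea a little more concrete, here is a toy software model of what Cerny describes; my own sketch, not the real hardware: lines touched by asynchronous compute carry a tag, so compute can invalidate or write back only its own lines without flushing the graphics data sharing the same L2.

```python
class CacheLine:
    def __init__(self, addr, data, volatile=False):
        self.addr, self.data, self.volatile = addr, data, volatile
        self.dirty = False

class ToyL2:
    """Toy model of an L2 whose lines carry a 'volatile' tag for compute work."""
    def __init__(self):
        self.lines = {}

    def access(self, addr, data=None, from_compute=False):
        line = self.lines.setdefault(addr, CacheLine(addr, None, from_compute))
        if data is not None:
            line.data, line.dirty = data, True
        return line.data

    def invalidate_volatile(self):
        # Compute is about to read fresh data from system memory:
        # drop only the lines it marked, leave graphics lines untouched.
        self.lines = {a: l for a, l in self.lines.items() if not l.volatile}

    def writeback_volatile(self, system_memory):
        # Compute finished: flush only its own dirty lines back to memory.
        for line in self.lines.values():
            if line.volatile and line.dirty:
                system_memory[line.addr] = line.data
                line.dirty = False
```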
 

KidBeta

Junior Member
I think it's literally impossible to say this with any sort of certainty. We don't even know where about 4 to 5MB of the supposed 47MB of on-chip storage Microsoft says is built into the Xbox One SoC actually went. Where that 4 to 5MB, or potentially more if I'm misremembering the numbers, went could be fairly important. Just saying.

There's redundancy built into a number of parts, including the two deactivated CUs in the GPU, that would contribute to this. I don't think they have anything magic hiding, but don't stop believing :).
 
The main gaps as I see it are
1) memory bandwidth
2) 12CU vs 18 CU
3) 2ACE vs 8 ACE (longer term benefits perhaps)

You can't leave off:

4) 16 ROPs vs. 32 ROPs
5) 48 TMUs vs. 72 TMUs

Bandwidth, CUs, ROPs and TMUs are the fundamental building blocks of any GPU, and different games and different scenarios within those games will stretch each area differently. You can't be dramatically slower in every single important area of GPU performance and expect to compete. You can try to offset this difference in the future with a more forward-looking design that is better set up to take advantage of new rendering techniques and algorithms, but to make matters worse it's Sony that has the advantage here, with a GPU more tailored to GPU compute.

The difference will be obvious on day one and the gap will only become wider as the generation progresses.
 

Nozem

Member
Both systems have 8GB of RAM, but Sony chose 8GB of wide, fast GDDR5 with 176MB/s of peak throughput while Microsoft opted for DDR3, with a maximum rated bandwidth of just 68GB/s - clearly significantly lower.

Yeah that should be GB, not MB. Bit of a blunder for a technical article.

In theory, Xbox One's circa 200MB/s of "real-life" bandwidth trumps PS4's 176GB/s peak throughput.

And again!
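For reference, both corrected peak figures fall straight out of bus width times effective transfer rate. A quick sanity check, assuming the commonly reported 256-bit interfaces at 5500MT/s for the PS4's GDDR5 and 2133MT/s for the Xbox One's DDR3:

```python
def peak_bandwidth_gb_s(bus_width_bits: int, transfer_rate_mt_s: float) -> float:
    """Peak theoretical bandwidth: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * transfer_rate_mt_s * 1e6 / 1e9

print(peak_bandwidth_gb_s(256, 5500))  # PS4 GDDR5 -> 176.0 GB/s
print(peak_bandwidth_gb_s(256, 2133))  # XB1 DDR3  -> 68.256 GB/s (~68 GB/s)
```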
 

satam55

Banned
I wish I understood .000001% of what you guys are talking about :/

All I know is I go between GAF and an Xbox One forum, and I don't understand what exactly is going on, except over there they bash GAF in every other post, saying you guys are "Sony Fanboys" who twist and spin stuff to make MS look bad and the One look weak.

I honestly have no idea, but I put more faith in what you guys say than in a random Xbox forum with a handful of members.

Wish there was a Cliffs Notes attached to every post lol.
I feel like I grasped the PS3/Cell vs. 360 argument better last gen; now I'm completely lost...

LMAO!!!!!! I wonder how many of them are junior members that were banned from GAF the last several months.
 
What do you mean heavily customized? But Penello said:
"As a matter of fact, they actually go out and they talk about how proud they are about their off-the-shelf parts. Our guys’ll say, we touched every single component in the box and everything there is tweaked for optimum performance."

And Penello wouldn't say something like that if he wasn't 100% confident in that.

Yeah, he's 100% fair.
 
There's redundancy built into a number of parts, including the two deactivated CUs in the GPU, that would contribute to this. I don't think they have anything magic hiding, but don't stop believing :).

They don't count redundancies as part of their chip design when mentioning on-chip storage. Sure, they likely did that for the "5 billion transistors," but I certainly don't think they would for on-chip storage. That's real storage that's being used for whatever it's being used for inside the system. I'm obviously in no position to say what that is.

The 47MB of on-chip storage isn't redundancy. They wouldn't have mentioned it in such a way if it was. There are likely redundant CUs on the GPU that aren't used. No, not CUs that will magically be enabled one day, just backup to protect against chip failure, and I'm sure other things too, but I wouldn't have any idea what those are. This is an example that lines up well with what was done on the Xbox 360: there were more unified shader pipelines on the 360 GPU than what was mentioned, but those were disabled for redundancy reasons, and Microsoft didn't magically include them as part of the overall chip design, which makes me doubt they would do such a thing with on-chip storage if some of it were redundant and unusable by the system. Also, the LZ compression/decompression and JPEG decode isn't ONLY for Kinect, but nice try trying to make it appear that way. :)
 

gofreak

GAF's Bob Woodward
They're boxing with shadows here.

I'm seeing them construct arguments against the design that they can tear down, but that people weren't actually making?

The whole 'you can't add bandwidth' criticism was not about concurrent access to the memory pools. I'm not sure who was claiming you couldn't concurrently access eSRAM and DDR3?

The argument that eSRAM is all fine and dandy from a development ease perspective because the 360 was the easier console to dev for in the last cycle is also ignoring the fact that there are simpler designs around today, and that criticism levelled at it around development ease is made in that context.

Going by more recent comments from Cerny they're also misinterpreting/twisting the VG Leaks information about Sony's design...

Their talk about Move Engines is also, again, focussing on how it helps in the context of XB1's design while making it sound like they're an advantage in general.

My final comment would be that it's a little unfortunate Richard Leadbetter was the channel for this. It does little to dispel the notion that he's been a bit too close to MS sources of late. That there are echoes of themes that he communicated in previous articles seems rather coincidental too.
 

IN&OUT

Banned

Did an MS engineer just admit that GDDR5 is the best memory setup?
"Yeah, I think that's right. In terms of getting the best possible combination of performance, memory size, power, the GDDR5 takes you into a little bit of an uncomfortable place," says Nick Baker. "Having ESRAM costs very little power and has the opportunity to give you very high bandwidth. You can reduce the bandwidth on external memory - that saves a lot of power consumption and the commodity memory is cheaper as well so you can afford more. That's really a driving force behind that... if you want a high memory capacity, relatively low power and a lot of bandwidth there are not too many ways of solving that."

Also, when asked about the CU count compared to the PS4, he brought up that 12 CUs is the most 'balanced' setup, then claimed that Sony agrees with him based on a VGLeaks article? What about Cerny, who denied that rumor?

How in hell do you guys want me to believe anything MS says, seriously guys!!
 
So much balance. The word is in the article like 15 times. I also like how they've included a video of their representative Xbox One Radeon 7850 and PS4 7870 XT...

Is the implication supposed to be that the PS4 is imbalanced?

Edit: They seem to have missed people's contention with the bandwidth adding. On the matter of the ESRAM, people weren't just taking issue with whether one could add bandwidth, they were saying it was disingenuous to add the bandwidth when it only applies to a very small part of the RAM, compared to the unified pool on the PS4.

On the idea that there are bottlenecks with regard to CU scaling, they seem to be ignoring that other components of the PS4's GPU hold even more of an advantage than the 40% the CUs do?

I don't know why they're talking about the old VGLeaks document when it's well known now that there is no division between the compute units and developers can use them for whatever they see fit.
 

KidBeta

Junior Member
They don't count redundancies as part of their chip design when mentioning on-chip storage. Sure, they likely did that for the "5 billion transistors," but I certainly don't think they would for on-chip storage. That's real storage that's being used for whatever it's being used for inside the system. I'm obviously in no position to say what that is.

The 47MB of on-chip storage isn't redundancy. They wouldn't have mentioned it in such a way if it was. There are likely redundant CUs on the GPU that aren't used. No, not CUs that will magically be enabled one day, just backup to protect against chip failure, and I'm sure other things too, but I wouldn't have any idea what those are. This is an example that lines up well with what was done on the Xbox 360: there were more unified shader pipelines on the 360 GPU than what was mentioned, but those were disabled for redundancy reasons, and Microsoft didn't magically include them as part of the overall chip design, which makes me doubt they would do such a thing with on-chip storage if some of it were redundant and unusable by the system. Also, the LZ compression/decompression and JPEG decode isn't ONLY for Kinect, but nice try trying to make it appear that way. :)

Really, they don't count redundancies? How do you know?

Also, JPEG is not used in games. Seriously, google game texture formats and learn why your idea is a horrible one; you'd blow out all your caches. Its primary purpose, per the interview, is for Kinect. It might have another use somewhere in the realm of decoding JPEGs from external devices, but other than that...

The interview also practically confirms that it's the HD 7790, too.
 

IN&OUT

Banned
So much balance. The word is in the article like 15 times. I also like how they've included a video of their representative Xbox One Radeon 7850 and PS4 7870 XT...

Is the implication supposed to be that the PS4 is imbalanced?

Funny thing is that they are quoting a VGLeaks article that was debunked by Cerny himself! That's so funny it's sad, actually.
 

artist

Banned
14CUs, 2 disabled for yield.

misterx redeemed. Reiko redeemed. Oh my lawd.

edit: They decided to go for 12 @ 853 over 14 @ 800 .. for balance purposes? Not yield? Gotcha *wink* *wink*
 
They're boxing with shadows here.

I'm seeing them construct arguments against the design that they can tear down, but that people weren't actually making?

The whole 'you can't add bandwidth' criticism was not about concurrent access to the memory pools. I'm not sure who was claiming you couldn't concurrently access eSRAM and DDR3?

The argument that eSRAM is all fine and dandy from a development ease perspective because the 360 was the easier console to dev for in the last cycle is also ignoring the fact that there are simpler designs around today, and that criticism levelled at it around development ease is made in that context.

Going by more recent comments from Cerny they're also misinterpreting/twisting the VG Leaks information about Sony's design...

Their talk about Move Engines is also, again, focussing on how it helps in the context of XB1's design while making it sound like they're an advantage in general.

My final comment would be that it's a little unfortunate Richard Leadbetter was the channel for this. It does little to dispel the notion that he's been a bit too close to MS sources of late. That there are echoes of themes that he communicated in previous articles seems rather coincidental too.

I see this all the time, and I think it's totally bogus. Richard Leadbetter has written and been involved in so many, in my view, awesome articles on that site, a lot of them (some of the best, actually) about the PS4, but nobody gives him any credit when he does those, or accuses him of somehow being too close to Sony or Sony devs when he writes them? I say whatever war people are waging with the guy is beyond ridiculous. The guy isn't allowed to report on the Xbox One or relay information on it without people attacking his credibility, but he can report plenty of times on the PS4, often with very positive and very insightful information that we weren't aware of before, without a negative word uttered anywhere?

Again, this ridiculous proxy fight that some on GAF keep manufacturing about Leadbetter's motivations and allegiances is tired, played out, and, above all, extremely pitiful and transparent. Some of the articles that people most cite on here for some of the great insight we've gotten on the PS4 have been written by Richard Leadbetter, or haven't people actually realized this fact? Do people's eyes only turn red with anger when he's reporting anything that could be construed as mildly positive Xbox One information?

Really, they don't count redundancies? How do you know?

Also, JPEG is not used in games. Seriously, google game texture formats and learn why your idea is a horrible one; you'd blow out all your caches. Its primary purpose, per the interview, is for Kinect. It might have another use somewhere in the realm of decoding JPEGs from external devices, but other than that...

You need to read more about what the JPEG decoding is doing on VGLeaks. It's converting the JPEGs into different pieces of information that apparently can be used in games.
 

Ishan

Junior Member
I'm almost tempted to take a grad-level hardware class so I can debunk half the hardware arguments. The software ones I can see through already... ah well, either way I guess the console wars will go on.
 

B_Boss

Member
The most interesting aspect of this home console 'power struggle' between MS and Sony seems so obvious to me: Mark Cerny, lol. This guy is either one of the most arrogant human beings on earth or he is on to something, and with a personality and background such as his, I'd assume that when he speaks about computer and game design he knows exactly what he is talking about.

So let's skip a lot of the rumors, hearsay, etc. What has Cerny said? Let's take a look (emphasis mine)...

I think it's very clear what he is implying and saying, in more ways than one, here:



There's a lot of indie developer love at Sony, but you don't seem to have as much of a focus on exclusive titles as Microsoft. Is that perception accurate?

I believe an underappreciated aspect of what's going on is that PlayStation 4 is the world's most powerful games console, and I think that is going to become increasingly apparent as time goes by.

Perhaps many details can be discussed, argued, etc., but I think the 'bigger picture' has been clearly seen and understood by Cerny himself. The man has called the PS4 the most powerful console ever made; why are we still arguing such a moot point now, lol? (Rhetorical.) Cerny 'dropped the mic' a long time ago.

(source: http://www.theverge.com/2013/8/22/4646696/ps4-mark-cerny-interview-gamescom-2013)
 

satam55

Banned
The main gaps as I see it are
1) memory bandwidth
2) 12CU vs 18 CU
3) 2ACE vs 8 ACE (longer term benefits perhaps)




For (1), the crux IMO will be how much of a game's rendering time can be spent solely using the ESRAM. If they can get a significant percentage using that, then at least the memory bandwidth can be significantly mitigated. I think Intel said 32MB of eDRAM gave them a 90-95% hit rate when used as a cache? But the tools sound a bit manual at the moment, and the ESRAM doesn't behave as a cache, so they need to improve there.

For (2), I don't see how they bridge that gap. Maybe there are some stats that show what amount of bandwidth the GCN architecture can leverage, and maybe there is an optimal number of CUs, and 18 is actually too many for that bandwidth and so cannot be fully utilised. I doubt it, but I'm just throwing it out there. I'd like to see data on the relationship between memory bandwidth and number of CUs in GCN

For (3) this is another area that I think PS4 will just pull further away from Xbox with. You have 50% more CUs *plus* have mechanisms in place to utilise them more efficiently. That sounds like win-win to me. Then add in Sony first party familiarity with SPE coding for CELL, and you have teams perhaps better set up for exploiting GPGPU


I'm sure there is more to the Xbox One than the simple stats. It'll be interesting to hear more about the architecture too. But I don't see how they close the performance gap to the PS4 in a meaningful way.

Don't forget about:

1. 768 Shader Cores vs. 1152 Shader Cores
2. 16 ROPs (Render Output Pipelines) vs. 32 ROPs
3. 12.8 Gigapixels/sec vs. 25.6 Gigapixels/sec
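Those aren't independent stats, for what it's worth; they follow directly from the CU and ROP counts at an 800MHz clock. A quick check, assuming GCN's 64 shader cores per CU:

```python
# GCN ballpark arithmetic: 64 shader cores per CU; pixel fillrate = ROPs x clock.
def shader_cores(cus: int) -> int:
    return cus * 64

def fillrate_gpix_s(rops: int, clock_ghz: float) -> float:
    return rops * clock_ghz

print(shader_cores(12), shader_cores(18))                  # 768 vs 1152 shader cores
print(fillrate_gpix_s(16, 0.8), fillrate_gpix_s(32, 0.8))  # 12.8 vs 25.6 Gpixels/s
```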
 
So much balance. The word is in the article like 15 times. I also like how they've included a video of their representative Xbox One Radeon 7850 and PS4 7870 XT...

Is the implication supposed to be that the PS4 is imbalanced?

Edit: They seem to have missed people's contention with the bandwidth adding. On the matter of the ESRAM, people weren't just taking issue with whether one could add bandwidth, they were saying it was disingenuous to add the bandwidth when it only applies to a very small part of the RAM, compared to the unified pool on the PS4.

On the idea that there are bottlenecks with regard to CU scaling, they seem to be ignoring that other components of the PS4's GPU hold even more of an advantage than the 40% the CUs do?

I don't know why they're talking about the old VGLeaks document when it's well known now that there is no division between the compute units and developers can use them for whatever they see fit.

Yeah, the FUD strategy seems to be repeating the word "balance" as many times as possible to imply the PS4 is unbalanced.

14CUs, 2 disabled for yield.

misterx redeemed. Reiko redeemed. Oh my lawd.

edit: They decided to go for 12 @ 853 over 14 @ 800 .. for balance purposes? Not yield? Gotcha *wink* *wink*

Yeah, cute that they omitted the cost part of that particular cost/benefit analysis.

Anyway, they should probably retitle the article. This isn't DF vs Xbox One Architecture, it's Xbox One Architecture PR via DF.
 

KidBeta

Junior Member
You need to read more about what the JPEG decoding is doing on VGLeaks. It's converting the JPEGs into different pieces of information that apparently can be used in games.

Yet you're still not listening; it's the most retarded format on the planet to use for textures or anything game-related.

1. The GPU doesn't support it internally in its caches and TMUs, so you get a huge blowout on the cache and on the eSRAM, as it has to be decoded.

2. It's meant for photos, not game textures or game data, which is why it's not used.

3. To even use it in RGB space you have to convert it via the shaders.

4. I can't think of a single reason to use them. Can you give me one?
 

gofreak

GAF's Bob Woodward
I see this all the time, and I think it's totally bogus. Richard Leadbetter has written and been involved in so many, in my view, awesome articles on that site, a lot of them (some of the best, actually) about the PS4, but nobody gives him any credit when he does those, or accuses him of somehow being too close to Sony or Sony devs when he writes them?

Deep dives on individual games are a little different to competitive hardware comparisons.

And I'm just talking about the latest run of articles. There's just some fairly unquestioning acceptance of what he's presented with. Maybe Richard isn't in a position to challenge what they're saying, maybe he's better at benchmarks than understanding hardware - he's excellent at benchmarks. He may not have the technical confidence to challenge some of the more slippery/questionable suggestions made by MS here. And it's not about giving companies/developers a platform to share views, but his own commentary and who/what it aligns with perhaps a little too easily. It may not be wilful bias, but his critics will probably only find fuel in stuff like this.
 
This part I personally think is just blatant common sense. People have been fighting this basic notion forever on here, suggesting what is and isn't possible, and here it is straight from the horse's mouth, restating what other, much less technically proficient posters have been saying on here forever. ESRAM is but a basic extension of eDRAM on the Xbox 360, and just like they designed eDRAM to work well with system memory on the Xbox 360, they did the same yet again with the Xbox One, only with fewer limitations compared to eDRAM.

"This controversy is rather surprising to me, especially when you view as ESRAM as the evolution of eDRAM from the Xbox 360. No-one questions on the Xbox 360 whether we can get the eDRAM bandwidth concurrent with the bandwidth coming out of system memory. In fact, the system design required it," explains Andrew Goosen.

"We had to pull over all of our vertex buffers and all of our textures out of system memory concurrent with going on with render targets, colour, depth, stencil buffers that were in eDRAM. Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM.

But, you know, these guys don't know what they're talking about and are just spreading PR lies.

Deep dives on individual games are a little different to competitive hardware comparisons.

And I'm just talking about the latest run of articles. There's just some fairly unquestioning acceptance of what he's presented with. Maybe Richard isn't in a position to challenge what they're saying, maybe he's better at benchmarks than understanding hardware - he's excellent at benchmarks. He may not have the technical confidence to challenge some of the more slippery/questionable suggestions made by MS here. And it's not about giving companies/developers a platform to share views, but his own commentary and who/what it aligns with perhaps a little too easily. It may not be wilful bias, but his critics will probably only find fuel in stuff like this.

Nobody complains when he writes fantastic articles on the PS4, also largely taking what he is told with fairly unquestioning acceptance. This fantastic article about how The Crew was ported to PS4 was also written by Richard Leadbetter, as were many more articles that I feel too lazy to bother looking up.

http://www.eurogamer.net/articles/digitalfoundry-how-the-crew-was-ported-to-playstation-4

Where's the controversy about his motives when he does things like this? And these literally aren't the only fantastic articles this guy has on or relating to the PS4. Anyway, that's enough about this. You guys can carry on. Forgot how hardcore this thread was. :)
 

artist

Banned
He explained it in the text: Xbox One only has 16 ROPs, which are decoupled from the CUs. Increasing CU count wouldn't increase pixel fillrate. Increasing clock does increase pixel fillrate.

Xbox One @ 800MHz = 12.8GPix/s
Xbox One @ 853MHz = 13.6GPix/s
PS4 @ 800 = 25.6 GPix/s
I'm inclined to believe that 14 CUs with 16 ROPs would still be a more balanced GPU; just look at the 7790. I don't recall seeing it ROP-limited in any of the benches, as far as I can remember, but please correct me if I'm wrong. Yeah yeah, I know, I'm not an engineer whose hands are literally dripping with the silicon logic of the Xbone APU, so what do I know!
 
Did an MS engineer just admit that GDDR5 is the best memory setup?

Sounds like they confirmed that they chose DDR3 + eSRAM for cost, "power draw" and memory capacity reasons, as we all expected. If even Microsoft can't pretend it was chosen as superior to GDDR5, then that about closes that silly argument. It's a band-aid to hit 8GB at low cost, nothing more, nothing less.

Edit: Nice of them to admit they are fillrate limited as well (explanation of why they get more performance from increasing clocks rather than enabling CUs). So much for people saying the ROP advantage didn't matter if it's already proving to be a major bottleneck in launch software.
 

astraycat

Member
Now that I've actually finished reading it, there's a lot of interesting tidbits in there. It still leaves me with a bunch of questions though.

ESRAM is fully integrated into our page tables and so you can kind of mix and match the ESRAM and the DDR memory as you go

This confirms that the ESRAM is absolutely not a cache but a developer-controlled scratchpad, which renders the Intel 32MiB EDRAM cache size arguments irrelevant.

"If you're only doing a read you're capped at 109GB/s, if you're only doing a write you're capped at 109GB/s," he says. "To get over that you need to have a mix of the reads and the writes but when you are going to look at the things that are typically in the ESRAM, such as your render targets and your depth buffers, intrinsically they have a lot of read-modified writes going on in the blends and the depth buffer updates. Those are the natural things to stick in the ESRAM and the natural things to take advantage of the concurrent read/writes."

This still doesn't really explain it to me. Unless there's some special-purpose hardware there that can do a read-modify-write on the ESRAM side of the 1024-bit bus, there's just no way to send more than 1024 bits per cycle, which tops out at 109GB/s at 853MHz.
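The 109GB/s cap he's computing does check out as bus width times clock; a quick sketch, assuming the 1024-bit aggregate interface and the 853MHz clock quoted in the article:

```python
bus_width_bits = 1024      # 4 x 256-bit eSRAM controllers in aggregate
clock_hz = 853e6           # eSRAM runs at the (upclocked) GPU clock
one_way_gb_s = bus_width_bits / 8 * clock_hz / 1e9
print(round(one_way_gb_s, 1))   # ~109.2 GB/s in one direction
# Anything above that figure requires reads and writes to be in flight on
# different lanes in the same cycle, which is exactly the question being raised here.
```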

"Everybody knows from the internet that going to 14 CUs should have given us almost 17 per cent more performance," he says, "but in terms of actual measured games - what actually, ultimately counts - is that it was a better engineering decision to raise the clock. There are various bottlenecks you have in the pipeline that can cause you not to get the performance you want if your design is out of balance."

This is really telling. This is an admission that the CUs can (and do) starve due to deficiencies elsewhere in the GPU pipeline. It's too bad that the bottlenecks aren't actually elaborated.
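For reference, the 'almost 17 per cent' figure and the trade-off they describe are easy to check on paper. A rough sketch, assuming GCN's 64 ALUs per CU at 2 FLOPs per clock; it says nothing about the bottlenecks Goosen alludes to:

```python
# Rough ALU math: 64 ALUs per CU, 2 FLOPs per ALU per clock (fused multiply-add).
def gflops(cus: int, clock_ghz: float) -> float:
    return cus * 64 * 2 * clock_ghz

base       = gflops(12, 0.800)   # 1228.8 GFLOPS, the original 12 CU / 800MHz target
more_cus   = gflops(14, 0.800)   # 1433.6 GFLOPS -> ~16.7% more ALU ("almost 17 per cent")
more_clock = gflops(12, 0.853)   # 1310.2 GFLOPS -> ~6.6% more ALU
print(round(more_cus / base - 1, 3), round(more_clock / base - 1, 3))
# The clock bump buys less raw ALU than two extra CUs would, but it also speeds
# up every fixed-function stage (ROPs, front end, etc.), which is the "balance"
# argument being made; it doesn't say where the actual bottlenecks are.
```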

"But we also increase the performance in areas surrounding bottlenecks like the drawcalls flowing through the pipeline, the performance of reading GPRs out of the GPR pool, etc. GPUs are giantly complex. There's gazillions of areas in the pipeline that can be your bottleneck in addition to just ALU and fetch performance."

The GPR comment is pretty weird. GPR allocation for shaders is done during compilation of the shader, and the only real rule is to minimize them without having to resort to spilling to some sort of scratch buffer. What he could mean, instead of allocation, is just filling them in the first place (loading uniforms and the like). That's a sort of difficult problem, but one I would hope that they have a handle on.

You can use the Move Engines to move these things asynchronously in concert with the GPU so the GPU isn't spending any time on the move. You've got the DMA engine doing it.

"From a power/efficiency standpoint as well, fixed functions are more power-friendly on fixed function units," adds Nick Baker. "We put data compression on there as well, so we have LZ compression/decompression and also motion JPEG decode which helps with Kinect. So there's a lot more to the Data Move Engines than moving from one block of memory to another."

The GCN DMA engines are part of what they're calling a Data Move Engine? Is the Data Move Engine really just a bunch of other bits of fixed function that they've grouped together even though they're actually separate?

Microsoft's approach to asynchronous GPU compute is somewhat different to Sony's - something we'll track back on at a later date. But essentially, rather than concentrate extensively on raw compute power, their philosophy is that both CPU and GPU need lower latency access to the same memory. Goosen points to the Exemplar skeletal tracking system on Kinect on Xbox 360 as an example for why they took that direction.

"Exemplar ironically doesn't need much ALU. It's much more about the latency you have in terms of memory fetch, so this is kind of a natural evolution for us," he says. "It's like, OK, it's the memory system which is more important for some particular GPGPU workloads."

Here comes latency. Are they talking about the GDDR5 vs. DDR3 latency, or ESRAM vs. main memory, or something else entirely?
 

gofreak

GAF's Bob Woodward
This part I personally think is just blatant common sense. People have been fighting this basic notion forever on here, suggesting what is and isn't possible, and here it is straight from the horse's mouth, restating what other, much less technically proficient posters have been saying on here forever. ESRAM is but a basic extension of eDRAM on the Xbox 360, and just like they designed eDRAM to work well with system memory on the Xbox 360, they did the same yet again with the Xbox One, only with fewer limitations compared to eDRAM.



But, you know, these guys don't know what they're talking about and are just spreading PR lies.

No one is lying but they're arguing with a point I'm not sure (m)any were making, at least within more informed critiques of the architecture?

The argument about adding bandwidths together and the appropriateness of doing that is not about concurrent memory access as they're framing it. Adding bandwidths together wasn't appropriate on 360 either.

The reason it is not appropriate is that x amount of bandwidth to one small pool of memory + y amount of bandwidth to one larger pool of memory is not equivalent to x+y amount of bandwidth to one large pool. You don't have the same flexibility about how to use that bandwidth as in the latter case. In some use cases it may make little or no difference but in others, and thus in the general case, it does make a difference. Adding together bandwidths and presenting them as one number in order to compare to one batch of bandwidth to one pool of memory runs the risk of suggesting they're equivalent when they're not. That's all.
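That point can be put into numbers with a crude model (my own simplification, using the eSRAM's one-way 109GB/s figure to stay conservative): each byte of a frame's traffic is pinned to one pool, both pools transfer concurrently, and the achieved aggregate bandwidth depends entirely on how much of the traffic you can steer at the small pool.

```python
# 'esram_fraction' = share of a frame's traffic that fits in, and targets, the 32MB eSRAM.
def effective_bw(esram_fraction: float, ddr3_bw: float = 68.0, esram_bw: float = 109.0) -> float:
    t_ddr3  = (1.0 - esram_fraction) / ddr3_bw    # time per GB of total traffic on DDR3
    t_esram = esram_fraction / esram_bw           # time per GB of total traffic on eSRAM
    return 1.0 / max(t_ddr3, t_esram)             # aggregate GB/s actually achieved

for f in (0.0, 0.3, 109 / 177, 0.9):
    print(f"{f:.2f} -> {effective_bw(f):.1f} GB/s")
# 0.00 ->  68.0 GB/s  (nothing fits in the small pool)
# 0.30 ->  97.1 GB/s
# 0.62 -> 177.0 GB/s  (only at the ideal split do the two figures really "add")
# 0.90 -> 121.1 GB/s  (the small pool becomes the bottleneck)
```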
 

Guymelef

Member
Yeah, the FUD strategy seems to be repeating the word "balance" as many times as possible to imply the PS4 is unbalanced.



Yeah, cute that they omitted the cost part of that particular cost/benefit analysis.

Anyway, they should probably retitle the article. This isn't DF vs Xbox One Architecture, it's Xbox One Architecture PR via DF.

The sad thing is that it isn't the first time.
"Unprecedented partnership" on.
 
This part I personally think is just blatant common sense. People have been fighting this basic notion forever on here, suggesting what is and isn't possible, and here it is straight from the horse's mouth, restating what other, much less technically proficient posters have been saying on here forever. ESRAM is but a basic extension of eDRAM on the Xbox 360, and just like they designed eDRAM to work well with system memory on the Xbox 360, they did the same yet again with the Xbox One, only with fewer limitations compared to eDRAM.

"This controversy is rather surprising to me, especially when you view as ESRAM as the evolution of eDRAM from the Xbox 360. No-one questions on the Xbox 360 whether we can get the eDRAM bandwidth concurrent with the bandwidth coming out of system memory. In fact, the system design required it," explains Andrew Goosen.

"We had to pull over all of our vertex buffers and all of our textures out of system memory concurrent with going on with render targets, colour, depth, stencil buffers that were in eDRAM. Of course with Xbox One we're going with a design where ESRAM has the same natural extension that we had with eDRAM on Xbox 360, to have both going concurrently. It's a nice evolution of the Xbox 360 in that we could clean up a lot of the limitations that we had with the eDRAM.

But, you know, these guys don't know what they're talking about and are just spreading PR lies.

Jesus Christ, but it's not a simple evolution of the Xbox 360 design. Just look at this comparison:

Current gen

PS3: ~25GB/s

Xbox 360: ~23GB/s + eDRAM

Next gen:

PS4: 176GB/s

Xbox One: 68GB/s + eSRAM

Do you notice something?
 

KidBeta

Junior Member
This still doesn't really explain it to me. Unless there's some special-purpose hardware there that can do a read-modify-write on the ESRAM side of the 1024-bit bus, there's just no way to send more than 1024 bits per cycle, which tops out at 109GB/s at 853MHz.

The eSRAM is made from 4x 256-bit memory controllers, each of which accesses 8 1MB modules.

You can read and write to separate controllers, and therefore separate modules, at the same time; in fact you have to in order to get their peak bandwidth numbers, but purely reading or purely writing will always max out at 109GB/s, even if you're doing it over all 4 controllers.
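That layout is easy to sanity-check against the figures in the interview (four 256-bit controllers, eight 1MB blocks each, running at the 853MHz GPU clock):

```python
controllers      = 4
modules_per_ctrl = 8
module_size_mb   = 1
ctrl_width_bits  = 256
clock_hz         = 853e6

capacity_mb = controllers * modules_per_ctrl * module_size_mb   # 32 MB total
per_ctrl_bw = ctrl_width_bits / 8 * clock_hz / 1e9               # ~27.3 GB/s per controller
one_way_bw  = controllers * per_ctrl_bw                          # ~109.2 GB/s all-read or all-write
print(capacity_mb, round(per_ctrl_bw, 1), round(one_way_bw, 1))
# Higher combined peaks only appear when reads and writes land on different
# controllers in the same cycle; a pure read or pure write stream tops out here.
```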
 
Gemüsepizza said:
Jesus Christ, but it's not a simple evolution of the Xbox 360 design. Just look at this comparison:

Current gen

PS3: ~25GB/s

Xbox 360: ~23GB/s + eDRAM

Nextgen:

PS4: 176GB/s

Xbox One: 68GB/s + eSRAM

Do you notice something?

Yeah, I notice something alright. You're playing the traditional console warrior card of not even mentioning the very real bandwidth that is on the ESRAM. Nice try leaving out the bandwidth figure on the 360's eDRAM as well, in an attempt to cover up what your real intention was in leaving out the embedded memory bandwidth on the Xbox One. :)
 

KidBeta

Junior Member
Yeah, I notice something alright. You're playing the traditional console warrior card of not even mentioning the very real bandwidth that is on the ESRAM. Nice try leaving out the bandwidth figure on the 360's eDRAM as well, in an attempt to cover up what your real intention was in leaving out the embedded memory bandwidth on the Xbox One. :)

I'll fix it for him, then.

Nextgen:

PS4: 176GB/s

Xbox One: 68GB/s + 140GB/s

Do you notice something?
 
Deep dives on individual games are a little different to competitive hardware comparisons.

And I'm just talking about the latest run of articles. There's just some fairly unquestioning acceptance of what he's presented with. Maybe Richard isn't in a position to challenge what they're saying, maybe he's better at benchmarks than understanding hardware - he's excellent at benchmarks. He may not have the technical confidence to challenge some of the more slippery/questionable suggestions made by MS here. And it's not about giving companies/developers a platform to share views, but his own commentary and who/what it aligns with perhaps a little too easily. It may not be wilful bias, but his critics will probably only find fuel in stuff like this.

Perhaps he didn't have the opportunity to question MS. Perhaps if he took a harder approach he wouldn't get as much information as he does. Either way, I'm pleased that he gets this information out there, we can question it and/or rip it to shreds from there.
 

astraycat

Member
The eSRAM is made from 4x 256-bit memory controllers, each of which accesses 8 1MB modules.

You can read and write to separate controllers, and therefore separate modules, at the same time; in fact you have to in order to get their peak bandwidth numbers, but purely reading or purely writing will always max out at 109GB/s, even if you're doing it over all 4 controllers.

Sure. But 4x 256-bits is 1024-bits, and 1024-bits * 853MHz is 109GB/s. Even if you can choose for each individual lane to read or write, each cycle you only have enough lanes for 1024 bits.
 
Yeah, I notice something alright. You're playing the traditional console warrior card of not even mentioning the very real bandwidth that is on the ESRAM. Nice try leaving out the bandwidth figure on the 360's eDRAM as well, in an attempt to cover up what your real intention was in leaving out the embedded memory bandwidth on the Xbox One. :)

My point is, on Xbox 360 the eDRAM was a bonus for developers. On Xbox One, the eSRAM is needed to counter the massive bandwidth difference of the main memory.
 
Sums up my perspective as well, and on a much broader level than just performance. Seemed MS believed that either Sony would be totally asleep at the wheel or that they themselves could throw out whatever device they wanted and the masses would lap it up thanks to the Xbox brand. I wonder if higher-ups at MS thought they had a chance at becoming the Apple of the living room and failed to notice they totally missed their mark.

This!


Edit: Messed up the quote :(
 