
DF: Xbone Specs/Tech Analysis: GPU 33% less powerful than PS4

Rourkey

Member
Is the Xbone architecture that complicated? The slower unified memory backed up by the fast RAM is basically the same setup as the 360, which devs have been working with for 8 years.

I see multi-platform games being developed on the Xbone first, so devs don't have the headache of porting to a platform with less RAM later.

What they won't have this generation is the struggle of trying to develop their game for two consoles with such differing hardware, which forced them to pick one and then shoehorn the game onto the other. This time they basically have the same AMD CPU and the same AMD GPU (just a model up for the PS4), with only differing RAM configurations. The Xbone's 5 GB free is plenty; until recently Sony thought 4 GB including the OS was going to be fine.

It'll be exclusive 1st party games, where devs get more time and resources, that show off the best graphics, as was the case this generation. And I think the law of diminishing returns this gen means that time and resources will be the biggest overall factors in how great games look.
 
Latency is not nearly as important for rendering. It's brute speed/bandwidth that matters, which is why all newer versions of RAM sacrifice latency for more bandwidth. Latency-wise, we're talking nanosecond differences that really don't matter in the bigger picture.

And then you have to remember that the 32 MB of ESRAM is still slower than the GDDR5 in the PS4 (102 GB/s vs 176 GB/s). The ESRAM isn't useless, but it's certainly not going to make up for the slower DDR3, nor compete with the massive amount of high-speed GDDR5 in the PS4.

Hold your horses. The latency on the ESRAM is orders of magnitude lower than the GDDR5 latency in the PS4. And no, GPUs are designed to deal with latency, but they can always benefit from having even lower latency, which is why low-latency caches are so important to overall GPU performance.

Also, you talk about that 176GB/s of memory bandwidth in the PS4 as if it all belongs to the GPU. It doesn't. 20GB/s of that is used for the CPU. The GPU in the Xbox One is able to utilize the bandwidth from both the DDR3 and the ESRAM simultaneously. The GPU has a theoretical read of up to 170GB/s (yes, both pools of memory combined), but of course it needs to serve its CPU just like the PS4 will have to, so it will have less than the full 170GB/s of memory bandwidth.
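For a rough sense of what those figures mean side by side, here's a back-of-envelope sketch using the numbers quoted in this thread; the CPU share is the 20GB/s claim above, not a published split:

```python
# Back-of-envelope peak-bandwidth figures, using the numbers quoted in this thread.
# The CPU share is an illustrative assumption, not a published spec.

ddr3_bw   = 68.0    # GB/s, Xbox One DDR3 (figure quoted in thread)
esram_bw  = 102.0   # GB/s, Xbox One ESRAM (figure quoted in thread)
gddr5_bw  = 176.0   # GB/s, PS4 GDDR5 (figure quoted in thread)
cpu_share = 20.0    # GB/s, rough CPU claim on the bus (assumption from the post above)

xb1_combined_peak = ddr3_bw + esram_bw           # 170 GB/s, theoretical best case
xb1_gpu_available = xb1_combined_peak - cpu_share
ps4_gpu_available = gddr5_bw - cpu_share

print(f"XB1 combined peak:       {xb1_combined_peak:.0f} GB/s")
print(f"XB1 GPU after CPU share: {xb1_gpu_available:.0f} GB/s (only 32 MB of it at ESRAM speed)")
print(f"PS4 GPU after CPU share: {ps4_gpu_available:.0f} GB/s (whole 8 GB pool)")
```

The caveat, which posters raise below, is that only 32 MB is ever served at the ESRAM rate.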

According to game programmers with far more experience with this than either you or me, low latency ESRAM that can be used as a memory resource would indeed make up a great deal of performance. It has been said by these same game programmers that shaders spend more time waiting on memory than they do computing values, so performance of the Xbox One GPU could be greatly enhanced in memory-limited situations. Also, it has been said that there are a number of algorithms that would benefit from fast low latency memory, and that there are instances in which having fast low latency memory would actually save bandwidth, or not use quite as much. I'm not saying that it instantly becomes as powerful as the PS4 or stronger, but programmers are on record saying this.
 
incredulous.gif

Bookmarked that quote with this gif.

So good.
 

TAJ

Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate; only love can do that.
There has never been an Apples to Apples comparison before in console history as far as I know.
PS2 to GameCube is probably the closest comparison in power, but it doesn't really fit the bill since the PS2 was quite exotic, with some real standout strengths in key areas.
The PS4 should be much, much more adept at handling 1080p with richer effects and/or higher framerate than the Xbone.

Gamecube to Wii. It's not a perfect fit, but it's easily the closest I can think of.
 

nib95

Banned
Hold your horses. The latency on the ESRAM is orders of magnitude lower than the GDDR5 latency in the PS4. And no, GPUs are designed to deal with latency, but they can always benefit from having even lower latency, which is why low-latency caches are so important to overall GPU performance.

Also, you talk about that 176GB/s of memory bandwidth in the PS4 as if it all belongs to the GPU. It doesn't. 20GB/s of that is used for the CPU. The GPU in the Xbox One is able to utilize the bandwidth from both the DDR3 and the ESRAM simultaneously. The GPU has a theoretical read of up to 170GB/s (yes, both pools of memory combined), but of course it needs to serve its CPU just like the PS4 will have to, so it will have less than the full 170GB/s of memory bandwidth.

According to game programmers with far more experience with this than either you or me, low latency ESRAM that can be used as a memory resource would indeed make up a great deal of performance. It has been said by these same game programmers that shaders spend more time waiting on memory than they do computing values, so performance of the Xbox One GPU could be greatly enhanced in memory-limited situations. Also, it has been said that there are a number of algorithms that would benefit from fast low latency memory, and that there are instances in which having fast low latency memory would actually save bandwidth, or not use quite as much. I'm not saying that it instantly becomes as powerful as the PS4 or stronger, but programmers are on record saying this.

I read an article earlier from Sony that said the system design removed that issue of waiting for the memory that you mentioned. Perhaps someone has a link? Something to do with moving on to new instructions immediately without having to wait on anything. The system has been designed so that the GDDR5's latency is more than sufficient, as has been the case with all modern-day GPUs that put bandwidth above latency. Lest we forget that DDR2 has less latency than DDR3, and DDR3 less than DDR4. Same with GDDR RAM, but the bandwidth/speed benefits far outweigh the loss in latency.

Also, yes, the bandwidth of the ESRAM can be combined with that of the DDR3, but it's still only for a paltry 32 MB. It still means that the available 5 GB of the total 8 GB of DDR3 in the XO will only run at 68 GB/s. The ESRAM running faster does not make the DDR3 run faster. People need to realise this.
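For reference, the 68 GB/s figure falls straight out of the DDR3 configuration widely reported for the Xbox One (DDR3-2133 on a 256-bit bus); a quick sketch:

```python
# Peak DDR3 bandwidth = bus width (bytes) * transfer rate (transfers/s).
# DDR3-2133 on a 256-bit bus is the configuration widely reported for the Xbox One.

bus_width_bits = 256
transfers_per_sec = 2133e6          # DDR3-2133: 2133 mega-transfers per second

peak_bw = (bus_width_bits / 8) * transfers_per_sec / 1e9
print(f"DDR3 peak bandwidth: {peak_bw:.1f} GB/s")   # ~68.3 GB/s
```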
 

Cidd

Member
I read an article earlier from Sony that said the system design removed that issue of waiting for the memory that you mentioned. Perhaps someone has a link? Something to do with moving on to new instructions immediately without having to wait on anything.

Also, yes, the bandwidth of the ESRAM can be combined with that of the DDR3, but it's still only for a paltry 32 MB. It still means that the available 5 GB of the total 8 GB of DDR3 in the XO will only run at 68 GB/s. The ESRAM running faster does not make the DDR3 run faster. People need to realise this.

I wouldn't waste my time, he's been saying the same thing for months now even when others already corrected him.
 
Hold your horses. The latency on the ESRAM is orders of magnitude lower than the GDDR5 latency in the PS4. And no, GPUs are designed to deal with latency, but they can always benefit from having even lower latency, which is why low-latency caches are so important to overall GPU performance.

...

According to game programmers with far more experience with this than either you or me, low latency ESRAM that can be used as a memory resource would indeed make up a great deal of performance. It has been said by these same game programmers that shaders spend more time waiting on memory than they do computing values, so performance of the Xbox One GPU could be greatly enhanced in memory-limited situations. Also, it has been said that there are a number of algorithms that would benefit from fast low latency memory, and that there are instances in which having fast low latency memory would actually save bandwidth, or not use quite as much. I'm not saying that it instantly becomes as powerful as the PS4 or stronger, but programmers are on record saying this.

I'll quote from my previous post in case you missed it:

The latency advantage is only really useful for compute tasks; thing is, if you're doing compute on the XBOne you're taking even more resources from the already weak GPU, which is not a good idea. I guess you also have to take into account what effect the 6 extra ACEs in the PS4 GPU will have on compute.

Also, the latency advantage may be insignificant. If we say ESRAM has 5ns latency and GDDR5 50ns (numbers I've made up out of thin air), that's a seemingly significant difference of 10x. This doesn't take into account any latency in the GPU pipeline though; if the GPU compute pipeline adds 300ns to that figure you end up with 305ns latency vs 350ns, not as significant as it might have seemed at first.
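Playing with those made-up numbers (illustrative only, nothing measured):

```python
# Illustrative only: the raw latency gap shrinks once you add the fixed
# pipeline cost that both memories pay. Numbers are the made-up ones above.

esram_latency_ns = 5.0        # made-up figure from the post
gddr5_latency_ns = 50.0       # made-up figure from the post
pipeline_overhead_ns = 300.0  # made-up fixed cost of the GPU pipeline

raw_ratio = gddr5_latency_ns / esram_latency_ns
effective_ratio = (gddr5_latency_ns + pipeline_overhead_ns) / (esram_latency_ns + pipeline_overhead_ns)

print(f"Raw latency ratio:       {raw_ratio:.1f}x")        # 10.0x
print(f"Effective latency ratio: {effective_ratio:.2f}x")  # ~1.15x
```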
 
I read an article earlier from Sony that said the system design removed that issue of waiting for the memory that you mentioned. Perhaps someone has a link? Something to do with moving on to new instructions immediately without having to wait on anything.

Also, yes, the bandwidth of the ESRAM can be combined with that of the DDR3, but it's still only for a paltry 32 MB. It still means that the available 5 GB of the total 8 GB of DDR3 in the XO will only run at 68 GB/s. The ESRAM running faster does not make the DDR3 run faster. People need to realise this.


http://www.gamasutra.com/view/feature/191007/

The CPU and GPU are on a "very large single custom chip" created by AMD for Sony. "The eight Jaguar cores, the GPU and a large number of other units are all on the same die," said Cerny. The memory is not on the chip, however. Via a 256-bit bus, it communicates with the shared pool of ram at 176 GB per second

"One thing we could have done is drop it down to 128-bit bus, which would drop the bandwidth to 88 gigabytes per second, and then have eDRAM on chip to bring the performance back up again," said Cerny. While that solution initially looked appealing to the team due to its ease of manufacturability, it was abandoned thanks to the complexity it would add for developers. "We did not want to create some kind of puzzle that the development community would have to solve in order to create their games. And so we stayed true to the philosophy of unified memory."
 

nib95

Banned
http://www.gamasutra.com/view/feature/191007/

The CPU and GPU are on a "very large single custom chip" created by AMD for Sony. "The eight Jaguar cores, the GPU and a large number of other units are all on the same die," said Cerny. The memory is not on the chip, however. Via a 256-bit bus, it communicates with the shared pool of ram at 176 GB per second

"One thing we could have done is drop it down to 128-bit bus, which would drop the bandwidth to 88 gigabytes per second, and then have eDRAM on chip to bring the performance back up again," said Cerny. While that solution initially looked appealing to the team due to its ease of manufacturability, it was abandoned thanks to the complexity it would add for developers. "We did not want to create some kind of puzzle that the development community would have to solve in order to create their games. And so we stayed true to the philosophy of unified memory."

This is not the quote I was speaking of, though obviously it is still relevant.
 

coldfoot

Banned
The GPU in the Xbox One is able to utilize the bandwidth from both the DDR3 and the ESRAM simultaneously. The GPU has a theoretical read of up to 170GB/s (yes, both pools of memory combined), but of course it needs to serve its CPU just like the PS4 will have to, so it will have less than the full 170GB/s of memory bandwidth.
With 32MB, reading all the contents of the ESRAM should be over in about ~300 µs, then you're back to the DDR3 pool again.
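A quick check of that figure, using the 102 GB/s ESRAM number from the thread:

```python
# Time to stream the entire ESRAM once at its quoted peak bandwidth.

esram_bytes = 32 * 1024 * 1024      # 32 MB
esram_bw = 102e9                    # 102 GB/s, figure quoted in the thread

t_seconds = esram_bytes / esram_bw
print(f"{t_seconds * 1e6:.0f} microseconds")   # ~330 µs, i.e. roughly 0.3 ms
```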

According to game programmers with far more experience with this than either you or me, low latency ESRAM that can be used as a memory resource would indeed make up a great deal of performance. It has been said by these same game programmers that shaders spend more time waiting on memory than they do computing values, so performance of the Xbox One GPU could be greatly enhanced in memory-limited situations
Link?

Also, it has been said that there are a number of algorithms that would benefit from fast low latency memory, and that there are instances in which having fast low latency memory would actually save bandwidth, or not use quite as much. I'm not saying that it instantly becomes as powerful as the PS4 or stronger, but programmers are on record saying this.
Link to these algorithms and who's said this?
 
Save for the fact that we are not aware of whether PS4 will also do the same and to what extent.

PS3 and 360 both did it, Vita does it, phones do it; this is just the first time either MS or Sony have ever explicitly told anyone "hey guys, we reserve GPU cycles for OS purposes!"
 

Shayan

Banned
I wouldn't waste my time, he's been saying the same thing for months now even when others already corrected him.

It's quite funny how people are combining the ESRAM and DDR3 bandwidth to make the Xbone look better. The DDR3 will only run at 68 GB/s. The bandwidth of that 32 MB of ESRAM can be combined with it to reach 170 GB/s (68 + 102 for the ESRAM), but again, only the 32 MB of ESRAM would have that bandwidth. Only 32 MB, yes, as opposed to the GDDR5, which is not only more efficient than DDR3 but has a bandwidth of 176 GB/s across the whole pool.
 
I wouldn't waste my time, he's been saying the same thing for months now even when others already corrected him.

For the record, I'm going based off of information I've read from actual experienced game programmers on beyond3d. Clearly we won't get anywhere on this, but just know that some of the things you guys are saying are just way off.

Latency is NOT only important for compute tasks, for whoever said that. That's simply not accurate. Shadowrunner, it seems to me you're taking a range of things that really aren't accurate and combining them to fit with a certain view, but that stuff you said about the latency out of nowhere just isn't accurate. Latency becomes even better the further up you go into the caches. I'm talking about actual memory accesses from off-chip, for when the information isn't inside the GPU's L2 and L1 caches.

Either way, I don't want to get bogged down in a crazy tech discussion. It's too much on the brain right now lol. I just want to look at threads and see what people are saying about various subjects and obsess over game trailers for games I can't play yet. I'm starving, going to make something to eat. I enjoyed the little chat, though :)
 
Here's the thread/links I was speaking of.

PS4's GPU customization revealed (paging Jeff)

Summary.

I see that, but it isn't the same, not even close, as low latency 6T-SRAM. Again, won't get too bogged down in a tech discussion as my posts would be way longer than I want them to be, and I'm feeling too lazy to go through that currently. Simply put the PS4 is the stronger machine. Nobody disputes that. However, the Xbox One is being severely underestimated due to that fact. Weaker than the PS4 doesn't mean weak in general.
 
I read an article earlier from Sony that said the system design removed that issue of waiting for the memory that you mentioned. Perhaps someone has a link? Something to do with moving on to new instructions immediately without having to wait on anything. The system has been designed so that the GDDR5's latency is more than sufficient, as has been the case with all modern-day GPUs that put bandwidth above latency. Lest we forget that DDR2 has less latency than DDR3, and DDR3 less than DDR4. Same with GDDR RAM, but the bandwidth/speed benefits far outweigh the loss in latency.

I think you're referring to the 8 ACEs that provide the 64 compute queues, the idea being that with enough queues running in parallel you mitigate the issues with latency, in much the same way GPUs get around latency in graphics tasks by having work spread across hundreds of threads.

The compute queues also help when the GPU's ALUs are idle running non-ALU-bound tasks, by fitting in compute jobs sitting in the queue whenever it gets the chance, driving up the overall resource efficiency of the GPU.
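To put the latency-hiding idea in rough numbers (all figures below are illustrative assumptions, not measurements):

```python
# Rough illustration of latency hiding: if each wavefront issues some ALU work
# and then stalls on a memory access, you need enough wavefronts in flight to
# cover the stall. Both figures below are illustrative assumptions.

miss_latency_cycles = 300      # assumed off-chip access latency in GPU cycles
alu_cycles_per_wavefront = 4   # assumed ALU work issued between memory accesses

wavefronts_needed = miss_latency_cycles / alu_cycles_per_wavefront
print(f"~{wavefronts_needed:.0f} wavefronts in flight to hide the stall")  # ~75

# More independent queues (e.g. 8 ACEs / 64 compute queues) make it easier to
# keep that many wavefronts resident and the ALUs busy.
```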
 

Shayan

Banned
For the record, I'm going based off of information I've read from actual experienced game programmers on beyond3d. Clearly we won't get anywhere on this, but just know that some of the things you guys are saying are just way off.

Latency is NOT only important for compute tasks, for whoever said that. That's simply not accurate. Shadowrunner, it seems to me you're taking a range of things that really aren't accurate and combining them to fit with a certain view, but that stuff you said about the latency out of nowhere just isn't accurate. Latency becomes even better the further up you go into the caches. I'm talking about actual memory accesses from off-chip, for when the information isn't inside the GPU's L2 and L1 caches.

Either way, I don't want to get bogged down in a crazy tech discussion. It's too much on the brain right now lol. I just want to look at threads and see what people are saying about various subjects and obsess over game trailers for games I can't play yet. I'm starving, going to make something to eat. I enjoyed the little chat, though :)

It doesn't matter what one programmer at a certain forum said. Latency isn't a problem when you have ultra-fast GDDR5 which can load and unload almost at the same time. The problem happens when program X needs to access sector Z of the RAM pool and program Y also needs to access it concurrently. Then the loading/unloading alone would take eons.

There is nothing that can make that slow DDR3 run beyond 68 GB/s for heavy graphical/computational tasks.
 

nib95

Banned
I see that, but it isn't the same, not even close, as low latency 6T-SRAM. Again, won't get too bogged down in a tech discussion as my posts would be way longer than I want them to be, and I'm feeling too lazy to go through that currently. Simply put the PS4 is the stronger machine. Nobody disputes that. However, the Xbox One is being severely underestimated due to that fact. Weaker than the PS4 doesn't mean weak in general.

The performance of the XO is all relative. We're talking about its performance in relation to the PS4, and presently it's a good degree behind. The PS4 has a 50% advantage based on AMD's numbers, then a further extreme bandwidth advantage with the RAM, and then another big advantage in available RAM and hardware resources (based on the RAM and resources reserved by the OS). It's not a small difference, and the ESRAM won't do squat to minimise it.

You say that the XO is not a weak machine, but I think it absolutely is. In fact, I think the PS4 is weak as well. Personally I'd have liked even greater performance from both of these next-gen consoles, though I appreciate that's a lot to ask given pricing and the economy. Having said that, I'm certainly a lot happier with Sony's hardware direction and, as such, performance advantage this time around.
 
With 32MB, reading all the contents of the ESRAM should be over in about ~300 µs, then you're back to the DDR3 pool again.


Link?


Link to these algorithms and who's said this?

Links coming up.

http://beyond3d.com/showpost.php?p=1696970&postcount=245

No the benefit of the EDRAM in 360 was moving all of the GPU's output bandwidth to a separate pool of memory, with enough bandwidth for it to never be a bottleneck.
The SRAM performs a similar function, and potentially more.
If the pool is actually SRAM as opposed to some EDRAM variant, then it would have very low latency, this would mean that using it as a data source would greatly increase GPU performance in memory limited cases.

If it really is SRAM that memory is a big block on the die, and 64MB was probably not practical.


http://beyond3d.com/showpost.php?p=1697305&postcount=401

AFAICS the Data Move engines aren't going to make up any computational difference, they might let software better exploit the 32MB scratch pad.
If that Scratchpad is low latency then it could make a large difference to the efficiency of the system.
IMO and from what I've been told, most shaders spend more time waiting on memory than they do computing values, if that pool of memory is similar to L2 Cache performance you'd be looking at a cache miss dropping from 300+ GPU cycles to 10-20. Hiding 10-20 cycles is a lot easier than hiding 300.

IF the ESRAM pool is low latency then I think the Durango architecture is interesting.

http://beyond3d.com/showpost.php?p=1738812&postcount=39

On Xbox 360, the EDRAM helps a lot with backbuffer bandwidth. For example in our last Xbox 360 game we had a 2 MRT g-buffer (deferred rendering, depth + 2x8888 buffers, same bit depth as in CryEngine 3). The g-buffer writes require 12 bytes of bandwidth per pixel, and all that bandwidth is fully provided by EDRAM. For each rendered pixel we sample three textures. Textures are block compressed (2xDXT5+1xDXN), so they take a total 3 bytes per sampled texel. Assuming a coherent access pattern and trilinear filtering, we multiply that cost by 1.25 (25% extra memory touched by trilinear), and we get a texture bandwidth requirement of 3.75 bytes per rendered pixel. Without EDRAM the external memory bandwidth requirement is 12+3.75 bytes = 15.75 bytes per pixel. With EDRAM it is only 3.75 bytes. That is a 76% saving (over 4x external memory bandwidth cost without EDRAM). Deferred rendering is a widely used technique in high end AAA games. It is often criticized to be bandwidth inefficient, but developers still love to use it because it has lots of benefits. On Xbox 360, the EDRAM enables efficient usage of deferred rendering.

Also a fast read/write on chip memory scratchpad (or a big cache) would help a lot with image post processing. Most of the image post process algorithms need no (or just a little) extra memory in addition to the processed backbuffer. With large enough on chip memory (or cache), most post processing algorithms become completely free of external memory bandwidth. Examples: HDR bloom, lens flares/streaks, bokeh/DOF, motion blur (per pixel motion vectors), SSAO/SSDO, post AA filters, color correction, etc, etc. The screen space local reflection (SSLR) algorithm (in Killzone Shadow Fall) would benefit the most from fast on chip local memory, since tracing those secondary rays from the min/max quadtree acceleration structure has quite an incoherent memory access pattern. Incoherent accesses are latency sensitive (lots of cache misses) and the on chip memories tend to have smaller latencies (of course it's implementation specific, but that is usually true, since the memory is closer to the execution units, for example Haswell's 128 MB L4 should be lower latency than the external memory). I would expect to see a lot more post process effects in the future as developers are targeting cinematic rendering with their new engines. Fast on chip memory scratchpad (or a big cache) would reduce bandwidth requirement a lot.

That's precisely what Xbox One has, fast on chip memory scratchpad. And, as you can see, straight from a programmer, he says that fast read/write on chip memory scratchpad would help a lot with image post processing.
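For what it's worth, sebbbi's per-pixel arithmetic checks out; here is a straight restatement of the numbers in the quote above:

```python
# Per-pixel external bandwidth for the g-buffer pass described in the quote above.

gbuffer_write_bytes = 12.0      # depth + 2x 8888 render targets
texture_bytes = 3.0 * 1.25      # three block-compressed textures at ~1 byte/texel each,
                                # +25% for trilinear overfetch -> 3.75 bytes

without_edram = gbuffer_write_bytes + texture_bytes   # 15.75 bytes/pixel off-chip
with_edram = texture_bytes                            # 3.75 bytes/pixel off-chip

saving = 1.0 - with_edram / without_edram
print(f"External bandwidth saving: {saving:.0%}")     # ~76%
```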


B3D is filled with bullshitters. Come on.

Not the bullshitters. The confirmed programmers. Hell, one of them, I believe, is even from a sony first party that gaf literally worships :p

That's all I can do now, I'm dying of hunger. No more tech talk from me for the rest of the night, but this isn't the only example of this stuff.
 

JaggedSac

Member
B3D is filled with bullshitters. Come on.

The guy he is referring to, at least the one I think he is referring to, sebbbi, is legit. ESRAM is a legitimate way to mitigate at least some of the performance issues of low bandwidth; Cerny himself said it. It of course has issues, one being more work for programmers, but it also has some benefits (at least in regards to cost) in easier manufacturing.
 
This is not the quote I was speaking of, though obviously it is still relevant.

You're talking about the custom caching system they put in place. It's in that same article from Gamasutra.

Here they are. I would think this would make the lower latency issue non-existent.

"First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"

"Next, to support the case where you want to use the GPU L2 cache simultaneously for both graphics processing and asynchronous compute, we have added a bit in the tags of the cache lines, we call it the 'volatile' bit. You can then selectively mark all accesses by compute as 'volatile,' and when it's time for compute to read from system memory, it can invalidate, selectively, the lines it uses in the L2. When it comes time to write back the results, it can write back selectively the lines that it uses. This innovation allows compute to use the GPU L2 cache and perform the required operations without significantly impacting the graphics operations going on at the same time -- in other words, it radically reduces the overhead of running compute and graphics together on the GPU."
 

amayo15

Banned

Yeah...I'm not an expert, sorry. I've read plenty of posts in the past that said RSX shader throughput was 364 GFLOPS whilst Xenos was 240. There have been people praising the superiority of the PS3 over the 360 until even a couple months ago - but we never saw evidence, honestly.

Did they pull the 364GF out of their ass or something?

Sorry if I offended anyone with my lack of knowledge, haha.
 

vpance

Member
You say that the XO is not a weak machine, but I think it absolutely is. In fact, I think the PS4 is weak as well.

PS4 weak? Nope. GPU utilization will be much higher than what we see in PC games. Plus, where does all that PC power go anyway? Into resolution. Pretty sure we'll have that moment where we all go, damn, 1.84TF is doing that??

Xbone is weak for sure though. Do they even have the 8 ACEs like the PS4? Probably not gonna see much compute use from them.
 
That's precisely what Xbox One has, fast on chip memory scratchpad. And, as you can see, straight from a programmer, he says that fast read/write on chip memory scratchpad would help a lot with image post processing.

Yeah, wasn't the whole point of the eDRAM on the 360 supposed to be the "free" AA? I'm no graphics programmer but it makes sense that a fast cache would help out a lot for operations that are going to hit the same data repeatedly per frame, which post-processing would do a lot.

It's not going to close the gap between the two systems, but it could give the Xbone some cheap image quality boosts that the PS4 has to work a bit harder for.
 
Yeah...I'm not an expert, sorry. I've read plenty of posts in the past that said RSX shader throughput was 364 GFLOPS whilst Xenos was 240. There have been people praising the superiority of the PS3 over the 360 until even a couple months ago - but we never saw evidence, honestly.

Did they pull the 364GF out of their ass or something?

Sorry if I offended anyone with my lack of knowledge, haha.

RSX actually does have more raw shader power (I think it's more like 255 GFLOPS vs 240 though), but they're completely different architectures; Xenos is much more efficient and RSX has a bunch of bottlenecks (vertex processing).

Regardless, this is a completely different situation now. Both GPUs are the same architecture and are directly comparable. It's apples to apples. The PS4 GPU having 50% more raw shader power, 18 CUs vs 12, will make a big difference. It's like directly comparing a 7770 to a more powerful 7850.
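The raw shader math behind that 50% figure is simple (a sketch; both GPUs assumed at the 800 MHz clock reported at the time):

```python
# GCN peak FP32 throughput = CUs * 64 lanes * 2 ops per cycle (FMA) * clock.
# 800 MHz is the clock reported for both consoles at the time of this thread.

def gcn_gflops(compute_units, clock_ghz=0.8):
    return compute_units * 64 * 2 * clock_ghz

ps4 = gcn_gflops(18)   # ~1843 GFLOPS, the oft-quoted 1.84 TF
xb1 = gcn_gflops(12)   # ~1229 GFLOPS

print(f"PS4: {ps4:.0f} GFLOPS, XB1: {xb1:.0f} GFLOPS, ratio {ps4 / xb1:.2f}x")
```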
 
He's not wrong that ESRAM has latency advantages you could exploit in certain situations. The problem is he's acting like it's a magic bullet when it's actually just another trade-off. If you want to dedicate the ESRAM to making some GPGPU compute code run really well you're sacrificing the ability to use it for something else. If you put your frame-buffers in DDR3 you'll saturate that bus pretty quickly, run into contention issues, etc. If your dataset can't all fit in 32MB there's a lot of copy overhead to moving more data to the ESRAM. Thus it only makes sense to use the ESRAM in this fashion if the latency advantages gain you more performance than you would lose by having to accommodate that approach. It's not an automatic thing, and there will be lots of situations where it does not make sense.
 

nib95

Banned
Links coming up.

http://beyond3d.com/showpost.php?p=1696970&postcount=245

http://beyond3d.com/showpost.php?p=1697305&postcount=401

http://beyond3d.com/showpost.php?p=1738812&postcount=39

That's precisely what Xbox One has, fast on chip memory scratchpad. And, as you can see, straight from a programmer, he says that fast read/write on chip memory scratchpad would help a lot with image post processing.

Not the bullshitters. The confirmed programmers. Hell, one of them, I believe, is even from a sony first party that gaf literally worships :p

That's all I can do now, I'm dying of hunger. No more tech talk from me for the rest of the night, but this isn't the only example of this stuff.

There's a big difference when talking about the 360 though, since its GDDR3 was quite low bandwidth. GDDR5 overcomes the latency issues mainly because of the massively quicker speed/bandwidth. There's a developer post about this somewhere on B3D. Essentially the bandwidth advantages of GDDR5 overcome any latency woes.

Remember, the GDDR5 in the PS4 is 176 GB/s. By contrast, the GDDR3 in the 360 was 22.4 GB/s, and the eDRAM in the 360 was 256 GB/s (the ESRAM in the Xbox One is slower at 102 GB/s).
 

TAJ

Darkness cannot drive out darkness; only light can do that. Hate cannot drive out hate; only love can do that.
That's precisely what Xbox One has, fast on chip memory scratchpad. And, as you can see, straight from a programmer, he says that fast read/write on chip memory scratchpad would help a lot with image post processing.

The embedded memory that he's talking about also had quite a bit more bandwidth than Xbone's.
 

CLEEK

Member
Yeah...I'm not an expert, sorry. I've read plenty of posts in the past that said RSX shader throughput was 364 GFLOPS whilst Xenos was 240. There have been people praising the superiority of the PS3 over the 360 until even a couple months ago - but we never saw evidence, honestly.

Did they pull the 364GF out of their ass or something?

Sorry if I offended anyone with my lack of knowledge, haha.

The RSX and Xenos are different architectures, so the FLOPS figure doesn't give an accurate picture of performance. You can have lower FLOPS and yet better performance, which was the case with the PS3 and 360.

In a current example, the 7970 (AMD) has 20% more FLOPS than the nVidia 680, but the nVidia card performs better.

To use a car analogy, you can't just look at the BHP of separate models to get an idea of which car will have the fastest top speed.
 
Yeah...I'm not an expert, sorry. I've read plenty of posts in the past that said RSX shader throughput was 364 GFLOPS whilst Xenos was 240. There have been people praising the superiority of the PS3 over the 360 until even a couple months ago - but we never saw evidence, honestly.

Did they pull the 364GF out of their ass or something?

Sorry if I offended anyone with my lack of knowledge, haha.

I believe the difference in FLOPS figures was due to Xenos using unified shaders and RSX not; they were counted in different ways. RSX used fixed-function units, and if you add all of them up it comes to that FLOPS figure; the problem is that all the units could never be used at the same time, so the real peak FLOPS figure was much lower. That's my understanding at least.
 

coldfoot

Banned
Not only are those quotes different from what you're stating, they're also dependent on the ESRAM actually being low latency. Also, he's going by what he's been told, which means he's not really the expert on that. They also explicitly state that it's not going to make up any difference in compute.
 
You're talking about the custom caching system they put in place. It's in that same article from Gamasutra.

Here they are. I would think this would make the lower latency issue non-existent.

Those things help, but people are looking for words that they think match other things and then associating them with things that they really have no direct correlation to. What is basically happening is people are finding the word latency, and then suggesting that this instance of the word latency, as shown in this sentence or paragraph pertaining to the PS4, invalidates any latency advantage that ESRAM would offer over GDDR5, and it's just not accurate. It doesn't work that way.

We are referring to latency and memory accesses to off chip memory and how that might affect performance. Yes, the PS4 has things to help it with latency, but it can't be compared to low latency SRAM and used to suddenly suggest that the SRAM's low latency advantage over GDDR5 is non-existent simply because the word latency shows up in reference to one of the PS4's optimizations designed to help better utilize the system.

Said before: PS4 is the obviously stronger system and nothing is going to change that, but the PS4 doesn't need to get advantages out of thin air where it simply doesn't have them. It has all the necessary advantages that it would ever need to be a more powerful console than the Xbox One. I really do hate looking at these consoles in such a competitive context, because that really isn't what's important. What's important is that the developers have the necessary power to make incredible games on both systems, and I think they have that in spades on both consoles. So the PS4 has more breathing room, but then nobody is disputing that.

Anyway, I'm really done now. Starving. Going to tear through my kitchen. When I'm done nothing will be left.
 
GPU ROP count is a significant element in AA, and the PS4 has 32 vs 16 in the Xbone.

Sure, but if you can't feed them fast enough wouldn't they end up being slower overall? Like I said, I'm not a graphics programmer. :p

I guess my take is that Microsoft wouldn't have included the ESRAM if it didn't make a significant difference. They're targeting a cheaper system overall, and that stuff takes up a lot of die space that they could have dedicated to something else, or removed to lower the cost of the SoC.
 

nib95

Banned
I guess my take is that Microsoft wouldn't have included the ESRAM if it didn't make a significant difference.

I think Microsoft had little choice in adding the Esram when they banked on DDR3 (more for system use and manufacturing benefits based on what I have read). The Esram is essentially offering a slight helping hand to the DDR3, but my guess is they would have dropped it altogether if they'd gone GDDR5 instead.
 

CLEEK

Member
I guess my take is that Microsoft wouldn't have included the ESRAM if it didn't make a significant difference. They're targeting a cheaper system overall, and that stuff takes up a lot of die space that they could have dedicated to something else, or removed to lower the cost of the SoC.

MS would have known from the start they need several GB of RAM for the Win8 side of things, so would know they needed 8GB total at least. So the Xbone was designed to have cheaper, low bandwidth DDR3. This would suit the Win8 side of things perfectly, and to mitigate the bandwidth issue for the gaming side, ESRAM was required.

Of course the ESRAM will make a big difference, but it doesn't have the benefits of the Sony design.

As the interviews with Cerny showed, Sony toyed with a similar idea, but just went with a big pool of GDDR5 for everything. A solution that gives more bandwidth, less complexity and has GPGPU benefits (but costs more to manufacture).
 
Those things help, but people are looking for words that they think match other things and then associating them with things that they really have no direct correlation to. What is basically happening is people are finding the word latency, and then suggesting that this instance of the word latency, as shown in this sentence or paragraph pertaining to the PS4, invalidates any latency advantage that ESRAM would offer over GDDR5, and it's just not accurate. It doesn't work that way.

We are referring to latency and memory accesses to off chip memory and how that might affect performance. Yes, the PS4 has things to help it with latency, but it can't be compared to low latency SRAM and used to suddenly suggest that the SRAM's low latency advantage over GDDR5 is non-existent simply because the word latency shows up in reference to one of the PS4's optimizations designed to help better utilize the system.

Said before: PS4 is the obviously stronger system and nothing is going to change that, but the PS4 doesn't need to get advantages out of thin air where it simply doesn't have them. It has all the necessary advantages that it would ever need to be a more powerful console than the Xbox One. I really do hate looking at these consoles in such a competitive context, because that really isn't what's important. What's important is that the developers have the necessary power to make incredible games on both systems, and I think they have that in spades on both consoles. So the PS4 has more breathing room, but then nobody is disputing that.

Anyway, I'm really done now. Starving. Going to tear through my kitchen. When I'm done nothing will be left.

...but... the PS4 can take data from off-chip memory (the GDDR5 RAM), bypass both the L2 and L1 caches, and feed it directly into the processor if it needs to. That's nuts. That's cutting out so much latency all by itself. How can we not compare both systems if they both address the same issue, the latency of data access?
 
Links coming up.

http://beyond3d.com/showpost.php?p=1696970&postcount=245

http://beyond3d.com/showpost.php?p=1697305&postcount=401

http://beyond3d.com/showpost.php?p=1738812&postcount=39

That's precisely what Xbox One has, fast on chip memory scratchpad. And, as you can see, straight from a programmer, he says that fast read/write on chip memory scratchpad would help a lot with image post processing.

Not the bullshitters. The confirmed programmers. Hell, one of them, I believe, is even from a sony first party that gaf literally worships :p

That's all I can do now, I'm dying of hunger. No more tech talk from me for the rest of the night, but this isn't the only example of this stuff.

The problem is none of these points give any indication of the overall gain across the system from having the low latency ESRAM. Most points are also made about a very specific use case. You are using some out of context.

Like with anything you read, it is open to your own interpretation of what is being said.

I'm a long-time member of B3D and have read through all those comments, and many counter comments that are also valid. I certainly haven't come to the same conclusions as you.

How much of an effect do you think the ESRAM will have on XB1's performance based on the posts you have read? Can you come up with any guesstimates at all? Can you quantify any of what you have read into real-world figures?

It's quite obvious lower latency is better, nobody can argue that, but if you are going to hold it up as something other than a minor advantage it needs to be quantified.

It's the same with the ACEs: nobody has any idea how much benefit they will bring, and thus there's not much talk about the 8 in the PS4 vs 2 in the XB1.
 

obonicus

Member
I know I certainly won't. But, I think you're underestimating the performance gap. Consider the impact of every reviewer recommending that people buy the PS4 version each and every time for the next 7 years.

Something worth noting, though. Back during the bad PS3 port days, a confirmed dev on B3D mentioned that devs were often encouraged to pare down Xbox versions of games so the PS3 versions wouldn't look that bad by comparison. Sometimes by Sony, sometimes by the company itself (think of the PR shitstorms that inferior PS3 versions - e.g. Bayonetta - caused).

I don't know how much of this was hearsay, but the user was a developer.

I wouldn't be surprised if most devs didn't want to chance it, straight off the bat, or are waiting for someone to try making an inferior XBO port and see what the repercussions are.
 
Something worth noting, though. Back during the bad PS3 port days, a confirmed dev on B3D mentioned that devs were often encouraged to pare down Xbox versions of games so the PS3 versions wouldn't look that bad by comparison. Sometimes by Sony, sometimes by the company itself (think of the PR shitstorms that inferior PS3 versions - e.g. Bayonetta - caused).

I don't know how much of this was hearsay, but the user was a developer.

I wouldn't be surprised if most devs didn't want to chance it, straight off the bat, or are waiting for someone to try making an inferior XBO port and see what the repercussions are.

Titles have to compete with others on the same platform; once a few devs start pushing the hardware, the others will follow or be left behind. I'd imagine a game would get worse PR if it's obvious it's being intentionally held back.

Market share plays a role also; it may not be so even between the two this time.
 
I think Microsoft had little choice in adding the Esram when they banked on DDR3 (more for system use and manufacturing benefits based on what I have read). The Esram is essentially offering a slight helping hand to the DDR3, but my guess is they would have dropped it altogether if they'd gone GDDR5 instead.

Agreed that it was basically forced by their DDR3 choice. I just think it must be giving them more than a "slight" helping hand or they would have left it out and saved a few bucks. But slight is subjective, and it's going to be hard to judge how the differences really play out until we see a few games running on both platforms.

As the interviews with Cerny showed, Sony toyed with a similar idea, but just went with a big pool of GDDR5 for everything. A solution that gives more bandwidth, less complexity and has GPGPU benefits (but costs more to manufacture).

I imagine the "best of both worlds" design would be to have a big pool of GDDR5, a chunk of ESRAM (or another fast on-die cache), and the PS4's beefier GPU, but that would have pushed costs up even higher, and Sony was already pushing it with their move to 8GB. It makes sense that they chose to leave the ESRAM out, but that doesn't necessarily mean that they wouldn't have used it if they could.
 
I imagine the "best of both worlds" design would be to have a big pool of GDDR5, a chunk of ESRAM (or another fast on-die cache), and the PS4's beefier GPU, but that would have pushed costs up even higher, and Sony was already pushing it with their move to 8GB. It makes sense that they chose to leave the ESRAM out, but that doesn't necessarily mean that they wouldn't have used it if they could.

I may be mistaken, but I was under the impression nVidia does that. That's why they have no micro-stuttering.
 
Agreed that it was basically forced by their DDR3 choice. I just think it must be giving them more than a "slight" helping hand or they would have left it out and saved a few bucks. But slight is subjective, and it's going to be hard to judge how the differences really play out until we see a few games running on both platforms.

Pulling numbers out of my ass, I'd say real-world usage might be seeing bandwidth around the 120 GB/s mark, maybe?

It's a big advantage over just DDR3 for sure, almost double; they had to have it, they'd be screwed without it.

It just doesn't hold up to 256-bit GDDR5.

Sony just got lucky that they could up the GDDR5 to 8 GB. MS couldn't guarantee it would be available and needed the 8 GB for their planned OS functions, so they went with their only choice. Though I'm sure they could have gone for a much higher bandwidth interface to the ESRAM, which would have made it much more interesting, like how it was with the PS2; I'm not sure why they didn't??
 

Truespeed

Member
Something worth noting, though. Back during the bad PS3 port days, a confirmed dev on B3D mentioned that devs were often encouraged to pare down Xbox versions of games so the PS3 versions wouldn't look that bad by comparison. Sometimes by Sony, sometimes by the company itself (think of the PR shitstorms that inferior PS3 versions - e.g. Bayonetta - caused).

I don't know how much of this was hearsay, but the user was a developer.

I wouldn't be surprised if most devs didn't want to chance it, straight off the bat, or are waiting for someone to try making an inferior XBO port and see what the repercussions are.

LOL, I have a hard time believing Sony would ever tell another company to gimp the 360 version because their 360 architected game just wasn't performant on the PS3. And besides, sabotaging your crown jewel just to quell comparison arguments would have been negligent. Now, there were clauses stipulated by Sony that if you time delayed the PS3 version then you had to add additional content which seems fair to me.

As for your Bayonetta example - that was never supposed to be released for the PS3 initially and when Sega did make the call they farmed it out to some unknown Japanese port mill with a broken homepage.
 

Razgreez

Member
I imagine the "best of both worlds" design would be to have a big pool of GDDR5, a chunk of ESRAM (or another fast on-die cache), and the PS4's beefier GPU, but that would have pushed costs up even higher, and Sony was already pushing it with their move to 8GB. It makes sense that they chose to leave the ESRAM out, but that doesn't necessarily mean that they wouldn't have used it if they could.

Adding complexity is not the best of both worlds. In fact it's a worse situation than they have now. The actual "best of both worlds" would be stacked RAM on a wide bus.
 