
Ars Technica: Penello's XB1 Numbers, Ars "Sniff Test" Analysis

Ravidrath

Member
What the hell. Can it really be that bad? I thought devs were already used to the eDRAM on the X360. Is the eSRAM on X1 really that different from a dev perspective?

I'm not a programmer, but from what I understand the eDRAM in the 360 was all automatic and for a single purpose.

By making the eSRAM flexible and multifunctional, they made it so that it had to be managed manually. And that's where the dev optimization issues come in.

It's by no means insurmountable and could be fixed entirely through APIs already. Just requires more work.
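A minimal sketch of what "managed manually" means in practice (illustrative Python, not the real XDK API; the buffer names and sizes are assumptions for the example):

```python
# Toy model of why manually managed eSRAM is more work than the 360's
# fixed-purpose eDRAM: the developer has to decide what lives in the
# 32 MB pool and evict/tile anything that doesn't fit.

ESRAM_SIZE_MB = 32

class Esram:
    def __init__(self):
        self.used_mb = 0
        self.contents = {}

    def alloc(self, name, size_mb):
        # Manual management: making the working set fit is the dev's job.
        if self.used_mb + size_mb > ESRAM_SIZE_MB:
            raise MemoryError(f"{name} does not fit; evict or tile something")
        self.used_mb += size_mb
        self.contents[name] = size_mb

esram = Esram()
esram.alloc("color_1080p_rgba8", 8)   # ~1920*1080*4 bytes ≈ 8 MB
esram.alloc("depth_1080p_d32", 8)
esram.alloc("gbuffer_normals", 8)
esram.alloc("gbuffer_albedo", 8)
# One more full-size target would raise MemoryError: that juggling is
# exactly what an automatic eDRAM-style setup never exposed to the dev.
```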
 

KidBeta

Junior Member
For reference and more details:

I would not jump to the conclusion that Albert is lying... I think we really need to let the dust settle and see more details from the people developing, or from the engineers, to know more

Well, taking ERP's comment, it makes sense why Sony went with 2x the ROPs. And don't forget they plan to use the CUs for other stuff when they're idle for graphics.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Me too. Both he and Major have a sense of humor. I hope a lot of the haters don't chase them away.



It's by no means insurmountable and could be fixed entirely through APIs already. Just requires more work.

The fact that games like Ryse are running well on actual hardware pretty much "reduces" this to an issue of additional effort.
 

twobear

sputum-flecked apoplexy
Albert has put himself in such a tricky predicament..... What can he really say now?


He can't really admit he was wrong and say the PS4 actually IS more powerful. He and Major Nelson have denied that far too long.


He can't say the Microsoft "Technical Fellow" was wrong because he played him up as one of the smartest men around who has one of the rarest and most prestigious jobs at Microsoft...


All he can really do is stand by his and the "Technical Fellow's" original claims and specs, and basically say that some of the users on GAF don't know what they are talking about and that the Ars article is completely false.


Two nights ago, when people told him straight up that his math was completely wrong and that you cannot add the numbers that way, he told us all "Yes you can." Now that Ars is also saying you cannot add the numbers that way, he will HAVE to keep saying "Yes you can." Otherwise he is admitting that the Technical Fellow, one of the smartest men at Microsoft, is wrong.


Don't expect him to come clean or admit he or anyone else was wrong. He simply can't at this point, he's in too deep...

Yep, really foolish to argue that the gap is smaller than it obviously is. They should have stuck with the 'having weaker hardware doesn't mean the games are less fun' angle and pointed at the PS2 and 360.

I kind of feel like coming clean might possibly work at this stage, or at least quietly ramping down the bullish comparison chat until people move on to something else.

I suspect they'll continue to deny it, though. It's frustrating to be patronised, and honestly it devalues their interaction with GAF if all they're going to do is obfuscate and be economical with the truth.
 

teiresias

Member
Someone give this man a damn island. GAF would have broken mere mortals months ago; not only does Penello embrace GAF's hate... he revels in it.

You sir are amazing.

What you call reveling in it and amazing, I call avoiding the issue after trying to pass off tech FUD so blatantly obvious that Ars couldn't let it pass.
 
"ATVI was doing the CoD: Ghosts port to nextgen. It took three weeks for PS4 and came out at 90 FPS unoptimized, and four months on Xbone and came out at 15 FPS."

That means screen tearing on PS4 and a locked 15 fps on Xbone. 60 is a multiple of 15, so no tearing and perfect, stable, awesome IQ with an extra 160% cinematic experience.

From now on, I decide to believe this rumor above any other source, call it B3D, DF or simple HW analysis.
 

Freki

Member
OH MY GOD. That's awesome.

Avatar changed!

So why is the bidirectional bandwidth of your eSRAM 204GB/s although your one-directional bandwidth is 109GB/s - shouldn't it be 218GB/s?

Why do you compare the added bandwidth of 272GB/s (68GB/s DDR3 + 204GB/s eSRAM) vs. the PS4's 176GB/s?
For the Xbox it only works this way for 32MB of data, while it works for the whole 8GB on the PS4.

And what's the exact problem with more CUs when we are talking about highly parallelizable tasks? Why don't they scale (nearly) linearly in performance when going from 12 -> 18 CUs in the given architecture?
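For reference, the arithmetic behind the 204GB/s figure, per Microsoft's own explanation in interviews (the 7-of-8-cycles detail is their claim, not independently verified), sketched in Python:

```python
# eSRAM one-directional bandwidth at the raised 853 MHz GPU clock:
clock_hz  = 853e6
bus_bytes = 128                          # 1024-bit read (or write) path
one_way   = clock_hz * bus_bytes / 1e9   # ~109.2 GB/s

# Naive doubling gives ~218 GB/s, but per MS a write can only be slotted
# in alongside a read on 7 of every 8 cycles, hence the ~204 figure:
peak = one_way * (1 + 7/8)               # ~204.7 GB/s
print(round(one_way, 1), round(peak, 1))
```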
 
Albert has put himself in such a tricky predicament..... What can he really say now?


He can't really admit he was wrong and say the PS4 actually IS more powerful. He and Major Nelson have denied that far too long.


He can't say the Microsoft "Technical Fellow" was wrong because he played him up as one of the smartest men around who has one of the rarest and most prestigious jobs at Microsoft...


All he can really do is stand by his and the "Technical Fellow's" original claims and specs, and basically say that some of the users on GAF don't know what they are talking about and that the Ars article is completely false.


Two nights ago, when people told him straight up that his math was completely wrong and that you cannot add the numbers that way, he told us all "Yes you can." Now that Ars is also saying you cannot add the numbers that way, he will HAVE to keep saying "Yes you can." Otherwise he is admitting that the Technical Fellow, one of the smartest men at Microsoft, is wrong.


Don't expect him to come clean or admit he or anyone else was wrong. He simply can't at this point, he's in too deep...

You can add the numbers that way though.
 

GameSeeker

Member
This is anecdotal from E3, but...

I've heard the architecture with the ESRAM is actually a major hurdle in development because you need to manually fill and flush it.

So unless MS's APIs have improved to the point that this is essentially automatic, the bandwidth and hardware speed are probably irrelevant.

For reference, the story going around E3 went something like this:

"ATVI was doing the CoD: Ghosts port to nextgen. It took three weeks for PS4 and came out at 90 FPS unoptimized, and four months on Xbone and came out at 15 FPS."

While I'm sure the Xbone version will get optimized to 60fps with extra work, this (and other anecdotes) points to a big strength of the PS4: it's easier to program for than the Xbone.

Cerny did an excellent job in designing an architecture that focused on "time to triangle" and fixed one of the major flaws of the PS3 architecture, namely that it was very complicated to program for.

The PS4 is both the more powerful console and the easier to program console. That combination will put Sony in a good position with game developers.
 
So why is the bidirectional bandwidth of your eSRAM 204GB/s although your one-directional bandwidth is 109GB/s - shouldn't it be 218GB/s?

And what's the exact problem with more CUs when we are talking about highly parallelizable tasks?

He never said more CUs were bad. He said it wasn't perfect scaling: 50% more CUs doesn't get you 50% more performance. Ars says it does.
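A toy Amdahl-style model makes the sublinear-scaling claim concrete; the 70% shader-bound fraction below is purely illustrative, not a measured number:

```python
# If only part of the frame scales with CU count (the rest being limited
# by fill rate, bandwidth, setup, etc.), 1.5x the CUs gives < 1.5x perf.
def speedup(cus_base, cus_new, shader_bound=0.7):
    scale = cus_new / cus_base
    return 1 / ((1 - shader_bound) + shader_bound / scale)

print(round(speedup(12, 18), 2))  # ~1.3x, not the 1.5x raw CU count implies
```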
 

spwolf

Member
I would love it if the Xbox One endorsed version of Ghost ran at half of the framerate / resolution of the PS4 version.

Sweet irony. The DF thread will be delicious.

In honesty though, I doubt early development troubles from when MS tools were months behind really mean anything for final performance. It's troubling, though, that we haven't seen Ghosts or BF4 running on Xbox One yet.

I kind of believe 90fps because the trailer looks like an old-gen game compared to KZ, but still, 15fps for XB1 is hard to believe unless MS tools are really, really behind.
 
How? Isn't it simultaneous bandwidth? The GPU isn't reading 1 source at 200+ GB/s, it's 2 sources. Therefore it can't be added, since no single path is 200+?
(It's still 32MB..)


Here you see that the GPU can read from the ESRAM and DDR3 at the same time. So you can add the bandwidths to get the peak bandwidth. It's not a very useful metric but it's also not an inaccurate one.

Our peak on paper is 272gb/sec. (68gb/sec DDR3 + 204gb/sec on ESRAM).
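Taking those figures at face value, the disputed addition is just this (the objection in the thread being about what the sum means, not the arithmetic itself):

```python
xb1_ddr3  = 68    # GB/s, applies to the whole 8 GB of DDR3
xb1_esram = 204   # GB/s peak, applies only to the 32 MB eSRAM pool
ps4_gddr5 = 176   # GB/s, applies to the whole 8 GB of GDDR5

print(xb1_ddr3 + xb1_esram)  # 272 "on paper"
# The objection: a sum over two pools of very different sizes isn't
# comparable to one 176 GB/s figure that covers all 8 GB.
```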
 

Freki

Member
He never said more CUs were bad. He said it wasn't perfect scaling: 50% more CUs doesn't get you 50% more performance. Ars says it does.

That's what I'd like to know - why doesn't it scale linearly (or nearly linearly) for highly parallelizable tasks? I know that there are diminishing returns at some point - but surely not at 12 vs 18?
 

RedAssedApe

Banned
The PS4 is both the more powerful console and the easier to program console. That combination will put Sony in a good position with game developers.

But DirectX!

In all seriousness though...the fact that Albert needs to keep going "I need to get back to you guys about your questions" in itself tells me enough about what he actually knows about the Xbone architecture. Don't shoot the messenger. :)

Why not have an actual engineer come on and do an AMA? And not that softball garbage they are doing on IGN.
 

Pain

Banned
I kind of believe 90fps because the trailer looks like an old-gen game compared to KZ, but still, 15fps for XB1 is hard to believe unless MS tools are really, really behind.
Of course they're behind. The horrible messaging and the post-PS4 release should be proof enough.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Here you see that the GPU can read from the ESRAM and DDR3 at the same time. So you can add the bandwidths to get the peak bandwidth. It's not a very useful metric but it's also not an inaccurate one.

This slide also shows that adding the 30GB/s of cache-coherent bandwidth is utter nonsense, since it is already included in the 68GB/s cap.
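To make the double-counting concrete (paper figures as cited in this thread; that the coherent path is routed within the DDR3 figure is the reading of the slide above):

```python
ddr3_total    = 68   # GB/s, all DDR3 traffic
coherent_link = 30   # GB/s, cache-coherent path, already inside the 68
esram_peak    = 204  # GB/s

print(ddr3_total + esram_peak)                  # 272, the "peak on paper"
print(ddr3_total + coherent_link + esram_peak)  # 302, double-counts the 30
```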
 

TheD

The Detective
Here you see that the GPU can read from the ESRAM and DDR3 at the same time. So you can add the bandwidths to get the peak bandwidth. It's not a very useful metric but it's also not an inaccurate one.

You cannot add the bandwidth of the two pools and then compare the total to the PS4's memory bandwidth, which is what Albert did.
 

Ushae

Banned
Have we even seen any multiplatform PS4/XB1 games perform on XB1 yet?

Wait for launch? My bet is identical performance across the board.

This article made me chuckle at how much of an armchair general this Ars Technica reviewer is. How much experience does he have in console development again?
 

Vizzeh

Banned
Here you see that the GPU can read from the ESRAM and DDR3 at the same time. So you can add the bandwidths to get the peak bandwidth. It's not a very useful metric but it's also not an inaccurate one.
So the X1 can utilise that amount of bandwidth with fewer ROPs, CUs, etc., when the more powerful PS4 GPU was surmised to have a 176GB/s sweet spot?
 
You cannot add the bandwidth of the two pools and then compare the total to the PS4's memory bandwidth, which is what Albert did.

Of course you can. What's stopping you?

Albert was sure to write that it was peak on paper, and technically afaik, his statement was correct.
 

Freki

Member
Of course you can. What's stopping you?

Albert was sure to write that it was peak on paper, and technically afaik, his statement was correct.

Don't you think it's disingenuous to compare bandwidth that works only for 32MB vs. bandwidth that works for 8GB?

Edit: lol - or to put it way more eloquently:
Wrote this in the first thread:

You can add bandwidths in scenarios where you can saturate the bandwidth to the ESRAM pool with meaningful reads/writes of data that fits into that pool, and where you can saturate the bandwidth to main memory with meaningful reads/writes as well. In a simplified rendering workflow, you could read texture data from main memory [until that bandwidth is saturated] and read/write pixels from and into several pixel buffers [in ESRAM until that bandwidth is saturated]. Pixel buffers generally fit into small memory pools since their size is determined by the number of pixels in the target resolution multiplied by the amount of information (e.g. color) per pixel.

In such scenarios you can add up bandwidths because you are lucky enough that the bandwidth and pool-size needs of your workflow match the architecture.

But you may be limited in flexibility if there are workflows whose needs do not fit the architecture. If you need to read/write more data from or into main memory than DDR3 allows, you are bottlenecked and can't saturate the theoretical bandwidth sum. If your ESRAM pool cannot hold all of your buffers, then you are limited by that. If you constantly need to copy data (like textures) between main memory and eSRAM, then you "waste" bandwidth on both paths that you would not have to waste if you had a single fast pool of memory where no copying is necessary. (The XB1 seems to have its DMEs to support such copy operations from a computational-resources perspective, but the general concept still holds true: you are mitigating a problem that you don't have with a single pool.)

So it depends on the scenario and the details. But you can't add up bandwidths generally as if there were no difference between a single memory pool setup and the XB1's setup.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Albert was sure to write that it was peak on paper, and technically afaik, his statement was correct.

Wrote this in the first thread:

You can add bandwidths in scenarios where you can saturate the bandwidth to the ESRAM pool with meaningful reads/writes of data that fits into that pool, and where you can saturate the bandwidth to main memory with meaningful reads/writes as well. In a simplified rendering workflow, you could read texture data from main memory [until that bandwidth is saturated] and read/write pixels from and into several pixel buffers [in ESRAM until that bandwidth is saturated]. Pixel buffers generally fit into small memory pools since their size is determined by the number of pixels in the target resolution multiplied by the amount of information (e.g. color) per pixel.

In such scenarios you can add up bandwidths because you are lucky enough that the bandwidth and pool-size needs of your workflow match the architecture.

But you may be limited in flexibility if there are workflows whose needs do not fit the architecture. If you need to read/write more data from or into main memory than DDR3 allows, you are bottlenecked and can't saturate the theoretical bandwidth sum. If your ESRAM pool cannot hold all of your buffers, then you are limited by that. If you constantly need to copy data (like textures) between main memory and eSRAM, then you "waste" bandwidth on both paths that you would not have to waste if you had a single fast pool of memory where no copying is necessary. (The XB1 seems to have its DMEs to support such copy operations from a computational-resources perspective, but the general concept still holds true: you are mitigating a problem that you don't have with a single pool.)

So it depends on the scenario and the details. But you can't add up bandwidths generally as if there were no difference between a single memory pool setup and the XB1's setup.
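The same point as a rough model (illustrative Python, not a simulator; the demand and copy-traffic numbers are made up to show the shape of the argument):

```python
# Bandwidths are only additive when each pool is kept busy with data
# that actually lives there; copies between pools cost both paths.
def effective_bw(esram_demand, ddr3_demand, copy_traffic=0):
    ESRAM_BW, DDR3_BW = 204, 68   # GB/s, paper peaks from the thread
    esram_used = min(esram_demand + copy_traffic, ESRAM_BW)
    ddr3_used  = min(ddr3_demand + copy_traffic, DDR3_BW)
    # Useful throughput excludes the bandwidth burned on copying.
    return (esram_used - copy_traffic) + (ddr3_used - copy_traffic)

print(effective_bw(204, 68))                   # 272: the ideal case
print(effective_bw(204, 68, copy_traffic=30))  # 212: copies eat the sum
```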
 

TheD

The Detective
Of course you can. What's stopping you?

Albert was sure to write that it was peak on paper, and technically afaik, his statement was correct.

The fact is that only a tiny amount of the RAM is that fast, and comparing it like that to the PS4's bandwidth is extremely dishonest.
 

Cuth

Member
ERP not only misunderstood what was said about the CUs, but even then he points out how much more fillrate and bandwidth will help (something the PS4 has a lot more of than the XB1).

GPUs are also heavily latency tolerant! bandwidth is far, far more important to them!
Attacking people for not talking latency in regards to the GPU proves you are way out of your depth!
I think you should read again the message from ERP that was quoted in this thread, because he also talks about latency:
ERP said:
The eSRAM will certainly provide an advantage under some circumstances, and I'm interested if the ROP difference will end up being a factor, or the lower eSRAM latency will end up nullifying it.
So, for him the low latency could help to reduce the ROP advantage PS4 has.
For you it's basically useless.

Now, I hope not to sound offensive; my position is pretty simple: unless you have direct experience with at least one of the two next-gen consoles and at least as much game-development experience as ERP, I'll give more weight to his opinion compared to yours.
 

Vizzeh

Banned
I've seen this reference, and I believe it makes sense.

"Despite his insistence to the contrary, memory bandwidth CANNOT simply be added together. Memory pools do not combine together to create Voltron; they're separate. While they can send data back and forth, they can only do so at the speed of their respective bandwidth. DDR3 will not send faster than 68 GB/s. Ever. You can't change that by adding eSRAM. It's still 68 GB/s. You can send data from DDR3 to eSRAM at 68 GB/s, where it can then be sent out to the CPU or GPU at 214 GB/s (theoretical maximum after the GPU clock speed increase), but it has to arrive at the eSRAM before it can be sent out at this speed. As long as it is sitting in DDR3, the eSRAM has absolutely no effect on its speed of transmission. Pretending that you can add 214 to 68 for the total system bandwidth is just nonsense. It means nothing. It's like adding the clock speeds of the CPU and GPU together and pretending that you can process instructions faster."
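A toy timing model of that staging argument (figures are the thread's paper numbers; the 204GB/s serve rate is the eSRAM peak used elsewhere in the thread):

```python
# Data sitting in DDR3 can only reach eSRAM at DDR3 speed, so the fast
# pool does nothing for the first hop.
def staged_transfer_time(size_gb):
    DDR3_BW, ESRAM_BW = 68, 204         # GB/s
    t_stage = size_gb / DDR3_BW         # DDR3 -> eSRAM, capped at 68
    t_serve = size_gb / ESRAM_BW        # eSRAM -> GPU, fast once resident
    return t_stage + t_serve

# The 68 GB/s hop dominates; adding eSRAM didn't make DDR3 any faster.
print(staged_transfer_time(0.032))      # 32 MB worth of data, in seconds
```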
 

TheD

The Detective
I think you should read again the message from ERP that was quoted in this thread, because he also talks about latency:

So, for him the low latency could help to reduce the ROP advantage PS4 has.
For you it's basically useless.

Now, I hope not to sound offensive; my position is pretty simple: unless you have direct experience with at least one of the two next-gen consoles and at least as much game-development experience as ERP, I'll give more weight to his opinion compared to yours.

The simple fact is that GPUs do not care much about latency, due to the loads they process, and claiming that the fillrate could be affected by something GPUs are not really affected by is bullshit.

And BTW, the post of ERP's that is quoted does not state that he has worked on the PS4 at all!
 

nib95

Banned
I wouldn't respond to Sony PR / viral marketing either if I were Albert.

Pretty sure it's bannable to just go around accusing people of being shills on baseless grounds.

It is somewhat frustrating when ignorance compels people to make inept posts without actually rebutting points in a mature manner. You seem to do this a lot. You can't actually rebut, respond to, or correct points, so you just lash out with accusations instead.
 

Faustek

Member
so, ps4 is faster, but xbone is ..stronger? :)



and op, who da fuck is dat 'ars' dude? never heard of him. :)



seriously now, i cant wait for next gen multiplats
although looks like I'll be playing them on pc haha

Sorry for not answering earlier, but work, man.

No, I'm saying that they can up the speed of the Xbone as much as they want, but it's still the weaker machine.
 

ypo

Member
The guy's been full of shit. What else is new? I can't stand these company PR men trying to pass off as being sincere.
 
Don't you think it's disingenuous to compare bandwidth that works only for 32MB vs. bandwidth that works for 8GB?

Edit: lol - or to put it way more eloquently:

Of course it's disingenuous. It's not untrue, though. You can't say "You can't add them!!" as many people were, because you totally can.

Fun fact: technically the high bandwidth will apply to slightly more than 32MB, as a small amount of DDR3 traffic is added to make up the simultaneous total.
 

KidBeta

Junior Member
Of course you can. What's stopping you?

Albert was sure to write that it was peak on paper, and technically afaik, his statement was correct.

The eSRAM's peak bandwidth is only ~200GB/s when you read and write at the same time; purely reading, it's ~170GB/s, and writing from the GPU is ~104GB/s.

So the X1 can utilise that amount of bandwidth with fewer ROPs, CUs, etc., when the more powerful PS4 GPU was surmised to have a 176GB/s sweet spot?

The XB1 GPU can only ever write at a max of ~104 (108?) GB/s to any number of its pools combined.
 

Vizzeh

Banned
Of course it's disingenuous. It's not untrue, though. You can't say "You can't add them!!" as many people were, because you totally can.

Fun fact: technically the high bandwidth will apply to slightly more than 32MB, as a small amount of DDR3 traffic is added to make up the simultaneous total.

Read the post above; it makes sense. DDR3 won't go faster than 68GB/s, and eSRAM can't go faster than 218GB/s (in real terms, 133GB/s). The GPU can read them separately, but the figures can't be added: no transmission from one source can exceed the eSRAM's 218GB/s (and that's the paper spec). I dunno if the GPU can even utilise that bandwidth on the X1 when the PS4's sweet spot is apparently 176GB/s and it has double the ROPs/CUs, etc.
 