
EDGE: "Power struggle: the real differences between PS4 and Xbox One performance"

X1 equiv. on the market is $100-120 (7770)

PS4 equiv. on the market is about $200-220 (7850 - 7870; I call it a 7860, right in the middle of both)

Hmm, so how do people think the GPU price decreases will pan out?

Has the 7770 hit rock bottom yet? Will it decrease more?

Does the 7850/7870 have more room for price drops?
 

twobear

sputum-flecked apoplexy
maybe. I wouldn't be surprised if someone crunched numbers to figure out if it was worth including more ESRAM in order to facilitate higher resolutions.

for a company that wanted to argue it had designed its console 'for the future', it's certainly not as forward-looking a piece of hardware as the Xbox 360 was.

Literally every aspect of the console echoes with the ways Microsoft wanted to either make or save money with it. There's nothing humane or charming about it.
 

AndyD

aka andydumi
Hmm, so how do people think the GPU price decreases will pan out?

Has the 7770 hit rock bottom yet? Will it decrease more?

Does the 7850/7870 have more room for price drops?

Hard to say because whole cards include a lot more than just the chip.

But a good rule of thumb is that the XX50/XX70 cards drop to about $60-100 before they disappear. Looking at it from that price perspective, the 7770 is very near the bottom and the 7850 has a long way to go.
 
Hard to say because whole cards include a lot more than just the chip.

But a good rule of thumb is that the XX50/XX70 cards drop to about $60-100 before they disappear. Looking at it from that price perspective, the 7770 is very near the bottom and the 7850 has a long way to go.

I thought that might be the case. It seems that since the XB1 is using the cheaper solutions, they are also using the older solutions (hence the cheapness), so it does appear they will have difficulty dropping prices further for parts that have already come pretty close to rock bottom (DDR3, GPU, etc.).
 

vcc

Member
This is an odd situation...

Esram prices generally don't change.

When memory becomes obsolete it generally goes up in price. DDR3 will be more expensive by the end of the Xbox One's life than it is now, yet DDR3 still hasn't quite hit rock bottom in price.

GDDR5 prices will continue to become cheaper over the PS4's life.

The eSRAM price would be built into the cost of the APU; as the size of the APU comes down, the cost of the APU comes down.
 

AndyD

aka andydumi
I thought that might be the case. It seems that since the XB1 is using the cheaper solutions, they are also using the older solutions (hence the cheapness), so it does appear they will have difficulty dropping prices further for parts that have already come pretty close to rock bottom (DDR3, GPU, etc.).

Indeed. Which is why the rumored bill of materials circulating has MS making a profit and Sony not.
 

astraycat

Member
Yeah 32MB broken into even smaller 4x8MB chunks. Not much you can do with 8MB with the next-gen engines that are coming down the pipes.

This isn't really true. All memory comes in physical chunks, and those chunks are further broken down into physical pages. Those pages are then virtually mapped into contiguous memory addresses. As far as the developer is concerned, it'll be treated as one contiguous 32MiB chunk of memory.

It's like having multiple sticks of RAM in your computer. They're physically separate, but can be virtually contiguous. Hell, because of virtual memory, physical contiguity has very little bearing on programming these days.
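To make that concrete, here's a rough Python sketch of the idea (the page size and physical base addresses are made up purely for illustration): four physically separate 8MiB chunks get presented to the program as one flat 32MiB virtual range.

Code:
# Illustrative only: physically scattered chunks mapped into one
# contiguous virtual range via a simple page table.
PAGE_SIZE = 4096  # assumed page size in bytes

# Four physically separate 8MiB chunks at made-up physical addresses.
physical_chunks = [0x10000000, 0x20000000, 0x30000000, 0x40000000]
CHUNK_SIZE = 8 * 1024 * 1024

# Page table: virtual page index -> physical address of that page.
page_table = []
for base in physical_chunks:
    for off in range(0, CHUNK_SIZE, PAGE_SIZE):
        page_table.append(base + off)

def virt_to_phys(vaddr):
    # Translate an address in the flat 0..32MiB virtual range.
    page, offset = divmod(vaddr, PAGE_SIZE)
    return page_table[page] + offset

print(hex(virt_to_phys(0)))                 # lands in the first chunk
print(hex(virt_to_phys(10 * 1024 * 1024)))  # lands in the second chunk

The code only ever sees the flat virtual range; which physical chunk a page lives in is invisible to it, which is the point.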
 
Indeed. Which is why the rumored bill of materials circulating has MS making a profit and Sony not.

Yes, it does appear that you get quite a good deal out of the PS4 at present.

The XB1 seems rather highly priced, although the Kinect cost is debatable, so I'm not sure how much the console itself costs.
 

Pbae

Member
Had a stupid question but thought someone could clarify.

Most people are suggesting the XB1 is similar in performance to the Radeon 7770. I get that, but was that specific card chosen because it has comparable tflops?

What I really want to know is: if the Xbox One cannot absorb the 10% GPU reservation that Kinect causes, would it still be reasonable to label the XB1 as similar to the 7770 rather than a lower-specced card?
 
But there's no way they'd do this. They'd rather have games run at a lower resolution or framerate than the competition than completely break the user experience they're marketing. MS is marketing the XB1 as a TV extender that does games as well. All the running of apps and snap mode, etc., is key to their vision. If they break that...they trash their vision.

Imagine people's confusion and complaints when they can run their apps or use snap mode with every game EXCEPT for COD, let's say. Or, if you just let developers kill that when they want, how long is it before none of the apps or snap mode works with ANY of the games you're playing?

No, MS won't do that. People just have to realize, I believe, that MS is targeting the XB1 differently than it did the 360. They also already said they didn't target the higher end of graphics, etc., for gaming, people just didn't listen. In the end, MS is banking on the app and TV functionality making the XB1 take off like the Wii did. Remember, the Wii's graphics capabilities were NOTHING like the X360 or the PS3, but it did very well. MS wants to do that, and the XB1 system is the way they believe they can make that happen.

Are we ignoring the fact that:
1 - There isn't any game like Wii Sports to hook people on Kinect like it happened with the Wiimote
2 - Wii launched at $249, Xbox One is launching at $499
3 - "Apps" and TV are hardly groundbreaking features...
 

Skeff

Member
This isn't really true. All memory comes in physical chunks, and those chunks are further broken down into physical pages. Those pages are then virtually mapped into contiguous memory addresses. As far as the developer is concerned, it'll be treated as one contiguous 32MiB chunk of memory.

It's like having multiple sticks of RAM in your computer. They're physically separate, but can be virtually contiguous. Hell, because of virtual memory, physical contiguity has very little bearing on programming these days.

Whilst you're right about it all being one 32MB addressable block, so there's never a need to try to squish things into an 8MB block as suggested by the poster you quoted, the ESRAM is on 4 separate 256-bit buses, so to achieve higher than ~27.5GB/s you need to split your data across multiple lanes.

This can be automated to an extent; for instance, a render target could be sent to ESRAM and automatically be written across all 4 lanes, and for *most* situations this would be fine. The problem arises when data must all be pulled from the same lane, which is limited by the 256-bit bus it is on. The difficulty for a programmer is planning ahead and writing the correct data to the correct lanes to try to prevent future bottlenecks.
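For anyone wanting to sanity-check that ~27.5GB/s per-lane figure, it falls straight out of the lane width and the clock (853MHz and 256-bit lanes are the publicly stated numbers; this is just back-of-envelope arithmetic):

Code:
# Rough bandwidth check; not real console code.
ESRAM_CLOCK_HZ = 853e6    # ESRAM runs at the GPU clock
LANE_WIDTH_BITS = 256
NUM_LANES = 4

per_lane = LANE_WIDTH_BITS / 8 * ESRAM_CLOCK_HZ / 1e9
total_one_way = per_lane * NUM_LANES

print(f"per lane : {per_lane:.1f} GB/s")       # ~27.3 GB/s
print(f"all lanes: {total_one_way:.1f} GB/s")  # ~109 GB/s in each direction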
 

mrklaw

MrArseFace
Wasn't the entire point of using eSRAM to save money?

Long term, yes. But if they lose market share and attach rate, those losses are unlikely to be offset by long-term production cost savings.

I think they should have gone with either:
1) split memory pool with dedicated memory for the OS (DDR3) and games (GDDR5), with some kind of move engine to move data back and forth if necessary. Then the whole APU can be used for CPU+GPU and could be on par with PS4

2) DDR3 8GB as currently, but a daughter die with 64-128MB EDRAM, giving higher bandwidth and again freeing up space on the APU so they can have a larger GPU.


They couldn't have put more ESRAM on the APU - it's already around 1.6 billion transistors, isn't it? Doubling it to 64MB would be over 3 billion transistors - that's as many as an entire GTX 480 - just for the RAM! You'd basically have no space left for a GPU.
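As a rough check of that 1.6 billion figure, assuming standard 6T SRAM cells and ignoring decoders, sense amps and other overhead:

Code:
# Approximate transistor count for 32MB of 6T SRAM (overhead ignored).
ESRAM_BYTES = 32 * 1024 * 1024
TRANSISTORS_PER_BIT = 6  # standard 6T SRAM cell

transistors = ESRAM_BYTES * 8 * TRANSISTORS_PER_BIT
print(f"{transistors / 1e9:.2f} billion")  # ~1.61 billion for 32MB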
 
Had a stupid question but thought someone could clarify.

Most people are suggesting the XB1 is similar in performance to the Radeon 7770. I get that, but was that specific card chosen because it has comparable tflops?

What I really want to know is: if the Xbox One cannot absorb the 10% GPU reservation that Kinect causes, would it still be reasonable to label the XB1 as similar to the 7770 rather than a lower-specced card?

I believe the comparisons to the 7770 are indeed from before the 10% reduction in GPU power due to Snap and other media functionality (I doubt much GPU is reserved for Kinect).

It is still unclear whether any GPU on the PS4 is reserved, but I can't see why it would be. I assume it's designed much like the Vita, and I don't believe any of the Vita's GPU is reserved for OS use.

Long term, yes. But if they lose market share and attach rate, those losses are unlikely to be offset by long-term production cost savings.

I think they should have gone with either:
1) split memory pool with dedicated memory for the OS (DDR3) and games (GDDR5), with some kind of move engine to move data back and forth if necessary. Then the whole APU can be used for CPU+GPU and could be on par with PS4

2) DDR3 8GB as currently, but a daughter die with 64-128MB EDRAM, giving higher bandwidth and again freeing up space on the APU so they can have a larger GPU.

Agreed too much was sacrificed with the ESram approach
 

Pbae

Member
I believe the comparisons to the 7770 are indeed from before the 10% reduction in GPU power due to Snap and other media functionality (I doubt much GPU is reserved for Kinect).

It is still unclear whether any GPU on the PS4 is reserved, but I can't see why it would be. I assume it's designed much like the Vita, and I don't believe any of the Vita's GPU is reserved for OS use.

Thank you for your prompt and concise answer. I appreciate it because I was under the impression that the XB1 is comparable to the 7770 even with the 10% blocked off.

In your opinion, what card would best match the XB1 in its current state? If I wanted to find out myself, would I just google "1.1 TFLOP graphics card"?
 
7870 2GB vs 7770 1GB:
http://www.anandtech.com/bench/product/548?vs=536

Here's a sloppy screen of the benchmark between the two; you get the idea.
KEEP IN MIND, though, that the CPUs in both the PS4 and Xbox One will be the bottleneck and aren't good enough to warrant the huge difference seen here.

 

lyrick

Member
PS4 GPU is a lot closer to a 7850...

Those cards (the 7870 and 7770) use between 150W and 200W on their own under load; neither of those GPUs found its way into either console. People need to be looking at mobile Pitcairn results (7970M).
 
Thank you for your prompt and concise answer. I appreciate it because I was under the impression that the XB1 is comparable to the 7770 even with the 10% blocked off.

In your opinion, what card would best match the XB1 in its current state? If I wanted to find out myself, would I just google "1.1 TFLOP graphics card"?

Well, the FLOP count for the 7770 is around 1,300 GFLOPS (1.3 TFLOPS), and with 10% reserved the XB1's GPU is approximately 1.18 TFLOPS (0.9 * 1.31).

So the XB1 really sits between a 7750 and a 7770 (far closer to a 7770, though); I guess an underclocked 7770, then?
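If you want to reproduce those numbers yourself, the arithmetic is just shader count x 2 ops per cycle x clock, using the public shader counts and clocks (640 at 1000MHz for the 7770 GHz Edition, 768 at 853MHz for the XB1):

Code:
# Back-of-envelope FLOPS from publicly stated shader counts and clocks.
def tflops(shaders, clock_ghz):
    # each shader can do one FMA (two floating-point ops) per cycle
    return shaders * 2 * clock_ghz / 1000

hd7770 = tflops(640, 1.0)      # ~1.28 TFLOPS
xb1 = tflops(768, 0.853)       # ~1.31 TFLOPS
print(hd7770, xb1, xb1 * 0.9)  # XB1 with 10% reserved -> ~1.18 TFLOPS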
 

Marlenus

Member
Whilst your right about it all being one 32mb addressable block, and so there's never a need to try and squish things into an 8mb block as suggested by the poster you quoted, the esram is on 4 separate 256bit buses and so to achieve higher than ~27.5Gb/s you need to split your daa into multiple lanes.

This can be automated to an extent, for instance a Render target could be sent to esram and it could automatically be written across all 4 lanes, and for *most* situations this would be fine, the problem arises when Data must all be pulled from the same lane which is limited by the 256 bit bus it is on. The difficulty is as a programmer to plan ahead and write the correct data to the correct lanes to try and prevent future bottlenecks.

This applies to any memory pool. Take GDDR5, for example: to get 8GB of it you are using 8 physical memory modules, each attached to a 32-bit bus. It is the same with the Xbox One's DDR3, except there it is four 64-bit channels with 2GB per channel.
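Same idea in numbers: peak bandwidth is just bus width times data rate, whatever the memory type (5500MT/s GDDR5 for the PS4 and 2133MT/s DDR3 for the XB1 are the public figures; this is only a sketch of the arithmetic):

Code:
# Peak bandwidth = bytes per transfer * transfers per second.
def peak_gb_per_s(bus_bits, mt_per_s):
    return bus_bits / 8 * mt_per_s / 1000

print(peak_gb_per_s(256, 5500))  # PS4 GDDR5 -> 176.0 GB/s
print(peak_gb_per_s(256, 2133))  # XB1 DDR3  -> ~68.3 GB/s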
 

hesido

Member
Whilst you're right about it all being one 32MB addressable block, so there's never a need to try to squish things into an 8MB block as suggested by the poster you quoted, the ESRAM is on 4 separate 256-bit buses, so to achieve higher than ~27.5GB/s you need to split your data across multiple lanes.

This can be automated to an extent; for instance, a render target could be sent to ESRAM and automatically be written across all 4 lanes, and for *most* situations this would be fine. The problem arises when data must all be pulled from the same lane, which is limited by the 256-bit bus it is on. The difficulty for a programmer is planning ahead and writing the correct data to the correct lanes to try to prevent future bottlenecks.
I don't think this is accurate. The data would be divided into 4 chunks automatically by the memory controller, like dual-channel memory setups on PC but quad. This way bandwidth is maximised. Correct me if I'm wrong.
 

Marlenus

Member
I don't think this is accurate. The data would be divided into 4 chunks automatically by the memory controller, like dual-channel memory setups on PC but quad. This way bandwidth is maximised. Correct me if I'm wrong.

I am pretty sure you are right because all memory systems are split into smaller chunks.
 

Skeff

Member
This applies to any memory pool. Take GDDR5, for example: to get 8GB of it you are using 8 physical memory modules, each attached to a 32-bit bus. It is the same with the Xbox One's DDR3, except there it is four 64-bit channels with 2GB per channel.

I know; however, the DF interview implies that the ESRAM may well have to be manually managed into "lanes" by developers:

Nick Baker: Over that interface, each lane - to ESRAM is 256-bit making up a total of 1024 bits and that's in each direction. 1024 bits for write will give you a max of 109GB/s and then there's separate read paths again running at peak would give you 109GB/s. What is the equivalent bandwidth of the ESRAM if you were doing the same kind of accounting that you do for external memory... With DDR3 you pretty much take the number of bits on the interface, multiply by the speed and that's how you get 68GB/s. That equivalent on ESRAM would be 218GB/s. However, just like main memory, it's rare to be able to achieve that over long periods of time so typically an external memory interface you run at 70-80 per cent efficiency.

The same discussion with ESRAM as well - the 204GB/s number that was presented at Hot Chips is taking known limitations of the logic around the ESRAM into account. You can't sustain writes for absolutely every single cycle. The writes is known to insert a bubble [a dead cycle] occasionally... One out of every eight cycles is a bubble, so that's how you get the combined 204GB/s as the raw peak that we can really achieve over the ESRAM. And then if you say what can you achieve out of an application - we've measured about 140-150GB/s for ESRAM. That's real code running. That's not some diagnostic or some simulation case or something like that. That is real code that is running at that bandwidth. You can add that to the external memory and say that that probably achieves in similar conditions 50-55GB/s and add those two together you're getting in the order of 200GB/s across the main memory and internally.

One thing I should point out is that there are four 8MB lanes. But it's not a contiguous 8MB chunk of memory within each of those lanes. Each lane, that 8MB is broken down into eight modules. This should address whether you can really have read and write bandwidth in memory simultaneously. Yes you can there are actually a lot more individual blocks that comprise the whole ESRAM so you can talk to those in parallel and of course if you're hitting the same area over and over and over again, you don't get to spread out your bandwidth and so that's why one of the reasons why in real testing you get 140-150GB/s rather than the peak 204GB/s is that it's not just four chunks of 8MB memory. It's a lot more complicated than that and depending on how the pattern you get to use those simultaneously. That's what lets you do read and writes simultaneously. You do get to add the read and write bandwidth as well adding the read and write bandwidth on to the main memory. That's just one of the misconceptions we wanted to clean up.

This, to me, along with what we know about developers managing the ESRAM, means that developers are likely responsible for where in memory the data goes.

This is what I meant when I said it can be managed automatically, but perhaps not with the greatest benefit.
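For what it's worth, the 204GB/s and 140-150GB/s numbers in that quote do line up if you take 109GB/s per direction, knock one cycle in eight off the writes, and apply the ~70% efficiency he mentions. A quick sketch of that arithmetic (all figures taken from the quote above):

Code:
# Reconstructing the ESRAM peak figures from the interview numbers.
read_peak = 109.0            # GB/s, reads can issue every cycle
write_peak = 109.0 * 7 / 8   # GB/s, one write bubble every 8 cycles

combined = read_peak + write_peak
print(f"combined peak: {combined:.0f} GB/s")         # ~204 GB/s
print(f"typical real : ~{combined * 0.7:.0f} GB/s")  # ~143 GB/s, in the quoted 140-150 range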
 

Rolf NB

Member
Wasn't the entire point of using eSRAM to save money?
Does not compute.

DRAM is built in massive quantities on specialized/tuned fab processes. Area for area, an off-the-shelf DRAM chip will always be cheaper than any custom chip you can fab yourself, simply for economies of scale, and processes specifically tweaked for a single design.

And DRAM also being denser than SRAM further increases the advantage.

Embedded SRAM is never, in any rationally driven comparison, a cost benefit over DRAM.

Embedded memory is motivated by performance. Bandwidth, mostly. A 512-bit memory bus is a radical, high-end thing if you're talking about connecting discrete memory chips. But in the realm of embedded memory, it's low end.

If your embedded memory is outmatched in bandwidth by a discrete solution, you've fucked up. Because you're taking on all that expense and gain nothing from it. And that's what happened with the Xbone.
 

hesido

Member
I know; however, the DF interview implies that the ESRAM may well have to be manually managed into "lanes" by developers:



This, to me, along with what we know about developers managing the ESRAM, means that developers are likely responsible for where in memory the data goes.

This is what I meant when I said it can be managed automatically, but perhaps not with the greatest benefit.
That's him trying to explain the magical extra 109GB/s of bandwidth. To get that extra bandwidth you need an optimal interleaving of reads and writes; otherwise the memory controller should be doing the 4-lane work automatically.

Edit: honestly, I can't grasp his explanation to its full extent, that extra bandwidth in practice...
 

Skeff

Member
That's him trying to explain the magical extra 109GB/s of bandwidth. To get that extra bandwidth you need an optimal interleaving of reads and writes; otherwise the memory controller should be doing the 4-lane work automatically.

That's basically what I just said, isn't it? It probably can be done automatically, but for the best results (higher than 109GB/s) developers are expected to assign memory locations themselves. With the state the ESRAM tools are currently in, I wouldn't be surprised to find that the memory must be allocated manually rather than merely can be allocated manually.

EDIT: I'm not saying the system definitely can't do this automatically; I'm saying it appears as if Microsoft would like developers to do this manually and may not allow it to be done automatically at this time (which would align with the difficulties some developers are reporting).
 

hesido

Member
That's basically what I just said, isn't it? It probably can be done automatically, but for the best results (higher than 109GB/s) developers are expected to assign memory locations themselves. With the state the ESRAM tools are currently in, I wouldn't be surprised to find that the memory must be allocated manually rather than merely can be allocated manually.

EDIT: I'm not saying the system definitely can't do this automatically; I'm saying it appears as if Microsoft would like developers to do this manually and may not allow it to be done automatically at this time (which would align with the difficulties some developers are reporting).
Oh, I get what you mean better now. I also just understood the engineer better; I had missed that the memory write and read lanes were separate, which is why I couldn't understand the extra 109GB/s of bandwidth.
Your point about the complexity holds, however the explanation is slightly different: the 4-lane write+read is still automatic, but to maximize bandwidth, programmers have to decide what memory address to put data at and when to read or write it. The data is always transferred in 4 chunks. Just a technicality; your suggestion is not far off.
 

Skeff

Member
Oh, I get what you mean better now. I also just understood the engineer better; I had missed that the memory write and read lanes were separate, which is why I couldn't understand the extra 109GB/s of bandwidth.
Your point about the complexity holds, however the explanation is slightly different: the 4-lane write+read is still automatic, but to maximize bandwidth, programmers have to decide what memory address to put data at and when to read or write it. The data is always transferred in 4 chunks. Just a technicality; your suggestion is not far off.

I think we agree but we're both having difficulty describing it to each other.
 

viveks86

Member
Do we know the X1's fillrates?

Here you go! 41 GTexels per second, 13.6 GPixels per second; 204 GB/s peak for the eSRAM and 68 GB/s peak for the DDR3. There are claims that they can realistically achieve 140 to 150 GB/s on the eSRAM. I think it's very similar to the 7770, except for clock rate and RAM setup.
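Those fillrates fall straight out of unit counts times clock; 48 texture units and 16 ROPs at 853MHz are the public XB1 figures, so as a quick check:

Code:
# Fillrate = number of units * clock.
CLOCK_GHZ = 0.853
TMUS, ROPS = 48, 16

print(f"texel fill: {TMUS * CLOCK_GHZ:.1f} GTexel/s")  # ~40.9
print(f"pixel fill: {ROPS * CLOCK_GHZ:.1f} GPixel/s")  # ~13.6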

 

astraycat

Member
Whilst you're right about it all being one 32MB addressable block, so there's never a need to try to squish things into an 8MB block as suggested by the poster you quoted, the ESRAM is on 4 separate 256-bit buses, so to achieve higher than ~27.5GB/s you need to split your data across multiple lanes.

This can be automated to an extent; for instance, a render target could be sent to ESRAM and automatically be written across all 4 lanes, and for *most* situations this would be fine. The problem arises when data must all be pulled from the same lane, which is limited by the 256-bit bus it is on. The difficulty for a programmer is planning ahead and writing the correct data to the correct lanes to try to prevent future bottlenecks.

If people are actually worried about this (which I doubt) they can just interleave the physical pages across virtual memory. Doing so would be trivial.

Far more of a worry is how to manage swapping things in and out, and how to handle that at a resource level while ensuring one render job doesn't stomp another.
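A minimal sketch of what that page interleaving could look like (the page size and round-robin scheme here are illustrative assumptions, not how the actual XB1 memory controller is documented to work):

Code:
# Consecutive virtual pages cycle round-robin through the four lanes,
# so a linear walk through virtual memory naturally touches all lanes.
PAGE_SIZE = 4096
NUM_LANES = 4

def lane_for(virtual_page):
    lane = virtual_page % NUM_LANES
    offset_in_lane = (virtual_page // NUM_LANES) * PAGE_SIZE
    return lane, offset_in_lane

for vp in range(8):
    print(vp, lane_for(vp))  # pages 0..7 map to lanes 0,1,2,3,0,1,2,3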
 

Finalizer

Member
PS4 GPU is a lot closer to a 7850...

For fairness' sake, I'd agree, a 7850/7770 comparison gives a better idea. Maybe a 7870/7790 if you wanna overshoot on both of 'em.

Still, by this point I'm not so sure we really need the PC FPS comparisons anymore. We've already got new console games with resolution differences between the systems coming out of the gate; the power disparity is plainly evident by now.

Does not compute.

Really? I'd also been under the impression that the ESRAM was chosen for cost reasons, being cheaper than EDRAM, or at least would become so after a die shrink. I recall reading something along those lines in the past... Maybe that Anandtech article from after the Xbone reveal? Gonna go look to see if that's where I read it.
 

Skeff

Member
If people are actually worried about this (which I doubt) they can just interleave the physical pages across virtual memory. Doing so would be trivial.

Far more of a worry is how to manage swapping things in and out, and how to handle that at a resource level while ensuring one render job doesn't stomp another.

Yes, interleaving is what I meant when I said it could be done automatically; however, a more complex solution may be required for out-of-order memory reads/writes. A very basic example: if we were to store the alphabet across the 4 lanes, it could automatically be written to memory as:

Code:
ABCD
EFGH
IJKL
MNOP
QRST
UVWX
YZ

(where the 4 lanes are the 4 vertical columns)

The interleaving would automatically assign the data ABCDEFGHIJKLMNOPQRSTUVWXYZ in this way, and while reading or writing it sequentially we would of course achieve the maximum speed automatically. However, when we do out-of-order reads/writes we may not get the maximum bandwidth. For example, if we were to pull my username "Skeff" (we would obviously only pull F once), we would be reading:

Lane 1 - 1 character E
Lane 2 - 1 character F
Lane 3 - 2 characters S and K
Lane 4 - nothing.

I hope that makes sense and that I managed to put my point across a little more clearly this time. I think this would only be an issue because the XB1 could have bandwidth issues anyway: given the size of the ESRAM and the way it is likely going to have to move tiled frame buffers back and forth between DDR3 and ESRAM, this could cause some (albeit minor) issues.

Clarification: I don't expect anyone to store letters in these address spaces; it was just an example to illustrate my non-sequential read/write point. I'm also aware the same could be said about GDDR5, but with the memory architecture of the XB1 and the need to exceed 109GB/s, I feel the effect will be exacerbated on the XB1.
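Here's the same example in a few lines of Python, purely for illustration: the letters laid out round-robin across the four lanes, then counting how many reads each lane has to serve for an out-of-order access like "Skeff":

Code:
# Illustrative only: which lane serves each letter of an out-of-order read.
from collections import Counter
import string

NUM_LANES = 4
# A -> lane 1, B -> lane 2, C -> lane 3, D -> lane 4, E -> lane 1, ...
lane_of = {ch: i % NUM_LANES + 1 for i, ch in enumerate(string.ascii_uppercase)}

def reads_per_lane(word):
    # one read per distinct character, grouped by the lane it lives in
    return Counter(lane_of[ch] for ch in set(word.upper()))

print(reads_per_lane("Skeff"))
# lane 3 serves two reads (S and K), lanes 1 and 2 one each (E and F),
# lane 4 serves none, so the busiest lane's bus limits the effective bandwidth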
 
I'm sure the suits think 720p is good enough. There are certainly plenty of people who agree with them, and I think for a lot of people it will be.

But just like with last gen and SD, by the end of the gen the number of people okay with 720p is going to be smaller than it is at the launch of the gen.

And I still suspect Ghosts is deferred, since they've touted real-time volumetric/HDR lighting. I mean, again, it might not be, and they haven't confirmed that it is, but if it isn't... well, it's really hard to explain why the Xbox One isn't running it closer in resolution to the PS4 version. After all, there's no reason we should expect more than the 50% difference in resolution we see between the Xbox One version of BF4 and the PS4 version.

It makes perfect sense that the PS4 version of COD is higher resolution than BF4 is. You just have to look at the two games. So why isn't the Xbox One version, if not because of framebuffer limitations?

Makes sense

Literally every aspect of the console echoes with the ways Microsoft wanted to either make or save money with it. There's nothing humane or charming about it.

You hit the nail right on the head
 
But a) we don't know how much cooling the PS4 has and b) the Xbone has a cavernous empty space inside the shell that isn't doing anything for cooling at all.

Airflow, for cooling. Negative space isn't ideal for keeping anything cool.

Thanks for alleviating some of my fears, haha. Call me illogical, but I still can't get over how something so small can pack that much power and that many components with an internal power supply and have no heat problems. Let's hope magic Sony HW engineering has solved this. We shall see after launch anyhow, with the Americans beta testing the consoles :p
 
Thanks for alleviating some of my fears, haha. Call me illogical, but I still can't get over how something so small can pack that much power and that many components with an internal power supply and have no heat problems. Let's hope magic Sony HW engineering has solved this. We shall see after launch anyhow, with the Americans beta testing the consoles :p

It's still to be determined, as we'll most certainly get a proper teardown sometime near launch. I doubt either console will suffer from hardware failures, although I wonder what cboat meant in that one post about risk.
 

kitch9

Banned
Thanks for alleviating some of my fears, haha. Call me illogical, but I still can't get over how something so small can pack that much power and that many components with an internal power supply and have no heat problems. Let's hope magic Sony HW engineering has solved this. We shall see after launch anyhow, with the Americans beta testing the consoles :p

The GPU in my PC pulls nearly 300w (7990) and that cools itself with a heatsink and 3 fans.

The PS4 APU by comparison will pull circa 100W..... Seriously, I've had wet farts that were harder to cool than the PS4.
 
Thanks for alleviating some of my fears, haha. Call me illogical, but I still can't get over how something so small can pack that much power and that many components with an internal power supply and have no heat problems. Let's hope magic Sony HW engineering has solved this. We shall see after launch anyhow, with the Americans beta testing the consoles :p

All I know is the PS4 hardware has been in development for a long time and doesn't appear to be rushed, so I don't think overheating will be an issue.
 
The GPU in my PC pulls nearly 300w (7990) and that cools itself with a heatsink and 3 fans.

The PS4 APU by comparison will pull circa 100W..... Seriously, I've had wet farts that were harder to cool than the PS4.

It's probably closer to 125W: roughly 25W for the CPU and the rest for the GPU.
 

onQ123

Member
You back, yung?
Yep

Never knew why he was banned or how he came back since he was a junior/juniored member.

I was banned for simply asking why the guy was wrong for mistaking the transgender person for a man dressed up as a woman at a gaming event when people are always dressed up in cosplay at gaming events.


I guess the mod didn't understand the question & just used his itchy trigger finger.
 