
Matt weighs in on PS5 I/O, PS5 vs XSX and what it means for PC.





I found this interesting and I wonder whether or not the XSX can do the same thing. I wouldn't know how it affects games however.

This is mumbo-jumbo. Both consoles are using NAND, which is a page and block-level technology. In other words, data is read at the page level, and written at the block level. This is just the nature of the technology. You cannot access NAND data at the byte (let alone bit) level unless you place it in a different type of memory with that privilege first such as SRAM, DRAM, MRAM or NOR Flash (the latter has very slow write speeds btw, but is the only one AFAIK with bit-level read accessibility since it's often used for XIP), at which point that isn't the NAND providing that level of granularity, but the different memory some of that data is in.

So basically, when it comes to explaining the basics of how SSDs (particularly the NAND they use) work, that giant continuous paragraph gets it wrong; it describes something that isn't possible. The very closest you could get to that is if they were using something like ReRAM (which doesn't even exist in any notable capacities AFAIK and is still in very early stages), MRAM (capacities of only a few MBs and very expensive even for that), or 3D XPoint (AFAIK quite expensive per GB, though it exists in pretty large capacities and mass availability mainly for data-center markets). But even those "only" offer byte-level accessibility as the lowest granularity.
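To make the page/block point concrete, here is a minimal C++ sketch of what any flash controller has to do to serve a small read: the NAND only hands over whole pages, so even a 4-byte request means pulling a full page into a RAM-side buffer first. The geometry, class and function names here are all invented for illustration; this is not either console's actual controller interface.

```cpp
#include <array>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Illustrative geometry only -- real parts differ (e.g. 16 KiB pages, hundreds of pages per block).
constexpr std::size_t kPageSize      = 4096;  // smallest readable/programmable unit
constexpr std::size_t kPagesPerBlock = 64;    // smallest erasable unit is a whole block

// Toy in-memory stand-in for a NAND die: access is only offered per page / per block.
class FakeNand {
public:
    explicit FakeNand(std::size_t blocks) : data_(blocks * kPagesPerBlock * kPageSize, 0xFF) {}

    void readPage(std::size_t page, uint8_t* out) const {            // page-granular read
        std::memcpy(out, &data_[page * kPageSize], kPageSize);
    }
    void programPage(std::size_t page, const uint8_t* in) {          // page-granular write
        std::memcpy(&data_[page * kPageSize], in, kPageSize);
    }
    void eraseBlock(std::size_t block) {                             // block-granular erase
        std::memset(&data_[block * kPagesPerBlock * kPageSize], 0xFF, kPagesPerBlock * kPageSize);
    }
private:
    std::vector<uint8_t> data_;
};

// Reading 4 bytes still costs a whole-page transfer into a RAM buffer; byte granularity
// only exists once the data sits in DRAM/SRAM, never on the NAND itself.
uint32_t readU32(const FakeNand& nand, std::size_t byteOffset) {
    std::array<uint8_t, kPageSize> pageBuf{};
    nand.readPage(byteOffset / kPageSize, pageBuf.data());
    uint32_t value = 0;
    std::memcpy(&value, pageBuf.data() + (byteOffset % kPageSize), sizeof(value));
    return value;
}

int main() {
    FakeNand nand(8);
    std::printf("value at offset 12345: %u\n", readU32(nand, 12345));
}
```

The same applies in reverse: updating a few bytes means a page-sized read-modify-write, and re-programming an already-written page requires erasing its whole block, which is why real flash translation layers redirect writes to fresh pages instead.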


The CPU really is 3.2 GHz for the PS5; that is the mode developers are coding around to keep the GPU up in speed. When games push harder, that will drop more as the GPU needs to steal more and more power. Thanks to the fixed clocks, the Series X has a 10% CPU cushion off the bat, minus 10% of a core for I/O.

I've heard about the profiles, but IIRC didn't Cerny clarify with DF that the final retail system will handle profiles autonomously? I would be very curious whether that could even be done, since doesn't the game code itself determine the amount of power usage going to the CPU and GPU?

They would need some extremely good detection logic and the ability to switch power load between the components almost instantly if handling things automatically on their end. At least if the devs decided on profiles, they could query the hardware to adjust power loads ahead of time for specific instances where they expect power usage to be greater than usual. That's just what I'm thinking of in my head, though.
 

JTCx

Member
Who knows, we all saw the GitHub leak, and the max frequency tested was 2 GHz. Maybe Sony got lucky this gen with a GPU that can clock higher; 2 GHz was still high. So it could well be luck and being caught off guard, or it was always the plan, there is no way to tell now. But we will hear the stories in a couple of years.
As it stands now, one runs at 2.23 GHz with 36 CUs, and the other at 1.825 GHz with 52 CUs. Imo it's like supercharging a Honda Civic instead of getting an Audi R8, it all feels a bit weird.
Right. Sony got lucky in the lab with those goggles on, hoping that last-minute upclocking would work.

Lmao
 

sinnergy

Member
Right. Sony got lucky in the lab with those goggles on, hoping that last-minute upclocking would work.

Lmao
That shit happens, like with the extra 4 gigs of RAM in the PS4; it's also a process that can have setbacks or some luck. Like most things in life.

There are a few discoveries we use today that were accidental or down to luck when they experimented.
 
Last edited:

Leyasu

Banned
Not this time, these are all running on hardware rather than target specs like the Inside Xbox episode we saw where everything was running on PC. Keep spinning though haha
You won't be seeing anything that is truly built from the ground up for these consoles until 2022/23. Perhaps at the very earliest 2021.

Keep your expectations in check.
 
That shit happens, like with the extra 4 gigs of RAM in the PS4; it's also a process that can have setbacks or some luck. Like most things in life.

There are a few discoveries we use today that were accidental or down to luck when they experimented.

That was swapping out one set of chips for another. It really isn't the same thing.
 

geordiemp

Member
Just like the other side uses the SSD? In the end the GPU draws the pictures, which benefits from more CUs and more bandwidth. Or is there another component that draws the final image?

And where do the pictures come from, though?
Why do people keep repeating this nonsense?

- XSX is faster in computing everything and transferring work data
- PS5 is faster in transferring static game data into the pool of work data

Both are faster than the other in some respect, what is stronger even supposed to mean here other than being faster?

Not so sure. TF compute, yes; faster at general things like caches, which are tied to clocks? No.

I/O and SSD data is no longer static if it's pulled in every frame; it becomes more dynamic and fluid with game state.

Then there is driver and API overhead, PlayStation being more direct vs MS abstraction layers. I am not expecting much difference in 3rd party, is my view, not as much as you hope it is going to be.

And then there will be games that leverage that IO.......
 
Last edited:

JTCx

Member
That shit happens, like with the extra 4 gigs of RAM in the PS4; it's also a process that can have setbacks or some luck. Like most things in life.

There are a few discoveries we use today that were accidental or down to luck when they experimented.
PS4's 8GB of RAM was based on dev feedback, just like the whole design of the PS5. Your analogy is weak.

Get yo lab coats ready and make those last minute upclocking because of XSX specs. LOL
 

quest

Not Banned from OT
This is mumbo-jumbo. Both consoles are using NAND, which is a page and block-level technology. In other words, data is read at the page level, and written at the block level. This is just the nature of the technology. You cannot access NAND data at the byte (let alone bit) level unless you place it in a different type of memory with that privilege first such as SRAM, DRAM, MRAM or NOR Flash (the latter has very slow write speeds btw, but is the only one AFAIK with bit-level read accessibility since it's often used for XIP), at which point that isn't the NAND providing that level of granularity, but the different memory some of that data is in.

So basically when it comes to explaining the basics of how SSDs (particularly the NAND they use) works, that giant continuous paragraph gets it wrong, they are fantasizing something that isn't possible. The very closest you could get with that is if they were using something like ReRAM (which doesn't even exist in any notable capacities AFAIK and is still in very early stages), MRAM (capacities of only a few MBs and very expensive even for that), or 3D Xpoint (AFAIK quite expensive per-GB though it exists in pretty large capacities and mass availability mainly for data center markets). But those "only" offer byte-level accessibility at lowest granularity level.




I've heard about the profiles, but IIRC didn't Cerny clarify with DF that the final retail system will handle profiles autonomously? Which I would be very curious if could even be done, since doesn't the game code itself determine the amount of power usage that would be going to the CPU and GPU?

They would need some extremely good detection logic and ability to switch power load through the components almost instantly if handling things automatically on their end, at least if the devs decided on profiles they could query the hardware for adjusting power loads ahead of time for specific instances where they expect power usage to be greater than usual. That's just what I'm thinking off in my head, though.
The profiles allow developers to code against what the variable clocks will kind of be like without a true simulation. The retail units do it automatically. That is why any demo on kits really isn't a true test, no variable clocks. Could be locked at full CPU and GPU.
 

onQ123

Member
What does that have to do with the conversation? Edit: Nvr mind. Maybe it relates to PC gaming somehow.
Because Matt said that the PS5 I/O is like something from a mid-gen refresh, and I'm saying if the difference is big enough and noticeable, the companies doing cloud gaming services can take advantage of this selling point because they can store their games on even faster SSDs and build games around that with no loading.
 
Last edited:

geordiemp

Member
Because Matt said that the PS5 I/O is like something from a mid-gen refresh, and I'm saying if the difference is big enough and noticeable, the companies doing cloud gaming services can take advantage of this selling point because they can store their games on even faster SSDs and build games around that with no loading.

Not on windows lol
 
The profiles allow developers to code against what the variable clocks will kind of be like without a true simulation. The retail units do it automatically. That is why any demo on kits really isn't a true test, no variable clocks. Could be locked at full CPU and GPU.

Interesting; I guess we'll see (somewhat) soon enough. I'm still curious how in the retail units, they will be able to adjust the power distribution automatically by detecting the code. What exactly are they detecting in the code to determine how to adjust the load? Are they using some type of analyzing utility program in the OS for this? Or a different co-processor (so at least some part of code has to be sent twice potentially)?

Are devs flagging parts of their code to trigger a power distribution load shift? And how much is all of that going to cost in terms of needed hardware and overall performance? There likely has to be some kind of price to it if devs are not managing the power load distribution themselves, but that would require them to be a LOT more mindful about their code and know ahead of time how much resources that code is going to be using, that way they know the operations generated from the code are in a given power profile.

There's just so many (maybe almost too many) questions surrounding their variable frequency setup and that's even with accounting everything explained in Road to PS5. Would look forward to them detailing it further some time in the future.
 
Last edited:

Elog

Member
This is mumbo-jumbo. Both consoles are using NAND, which is a page and block-level technology. In other words, data is read at the page level, and written at the block level. This is just the nature of the technology. You cannot access NAND data at the byte (let alone bit) level unless you place it in a different type of memory with that privilege first such as SRAM, DRAM, MRAM or NOR Flash (the latter has very slow write speeds btw, but is the only one AFAIK with bit-level read accessibility since it's often used for XIP), at which point that isn't the NAND providing that level of granularity, but the different memory some of that data is in.

Is that limitation hardware or software driven? And if it is hardware driven, is it a hardware standard rather than a hard hardware limitation? The question I ask is, since the PS5 has a custom memory controller for the SSD, it does not seem far-fetched that Cerny could have changed the way it is read, assuming it is not a hard hardware limitation. The programmer that wrote the text a) clearly had experience with the PS5 dev kit and b) wrote that text way ahead of the Cerny speech. Seems fairly credible to me given the timestamp.
 

alstrike

Member
The CPU really is 3.2 GHz for the PS5; that is the mode developers are coding around to keep the GPU up in speed. When games push harder, that will drop more as the GPU needs to steal more and more power. Thanks to the fixed clocks, the Series X has a 10% CPU cushion off the bat, minus 10% of a core for I/O.

 
Is that limitation hardware or software driven? And if it is hardware driven, is it a hardware standard rather than a hard hardware limitation? Question I ask is that since the PS5 have a custom memory controller for the SSD, it does not seem far fetched that Cerny could have changed the way it is read assuming it is not a hard hardware limitation. The programmer that wrote the text a) clearly had experience from the PS5 dev kit and b) wrote that text way ahead of the Cerny speech. Seems fairly credible to me given the timestamp.

No, it's hardware-based. NAND technology is inherently page-addressable on reads and block-addressable on writes by its nature. There's nothing anyone, Cerny included, could do to change that unless they use a technology different from NAND altogether.

I've heard a few things about how the PS5 handles the look-up tables for data entries on the NAND that seems pretty novel and has its own advantages, but none of that changes how the NAND memory itself functions. I don't know who that poster is but unless they were able to clarify or provide even the slightest hint as to what was being done with PS5's I/O hardware to somehow access data on the NAND at bit-level granularity, then they're making it up.

Maybe they are referring to some small NOR Flash cache? Which wouldn't make any sense due to NOR flash having very bad write speeds relative to NAND. Other technologies like MRAM and 3D Xpoint are byte-level at lowest (for reads), not bit-level (correction: they ARE bit-level for writes of data...maybe. MRAM most certainly is. Would think 3D Xpoint is, too). I'd assume PS5 is maybe using the SRAM cache of the memory controller to (very quickly) read pages/blocks of data into and then pick out select data to write at bit-level if needed, then that data in the cache gets written back out to the NAND at some point (if the cache is only large enough for but so many pages/blocks worth of data to hold then that's going to influence how soon the data needs to be written back out to NAND or deleted from the cache itself).
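If the quoted claim maps onto anything real, it would have to look something like the sketch below: whole pages get staged into the controller's SRAM, bit-level edits happen there, and dirty pages get flushed back out at page granularity. This is purely my speculation; every name and size in it is hypothetical, not anything Sony has described.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

constexpr std::size_t kPageSize = 4096;  // assumed page size, for illustration only

// Hypothetical controller-side page cache: NAND is only touched at page granularity,
// but once a page is resident in SRAM the controller can flip individual bits.
struct PageCache {
    struct Entry { std::vector<uint8_t> bytes; bool dirty = false; };
    std::unordered_map<std::size_t, Entry> resident;   // page index -> cached copy

    Entry& fetch(std::size_t page) {                    // whole-page read from NAND into SRAM
        auto& e = resident[page];
        if (e.bytes.empty()) e.bytes = readPageFromNand(page);
        return e;
    }
    void setBit(std::size_t byteOffset, unsigned bit) { // bit-level edit happens in SRAM...
        Entry& e = fetch(byteOffset / kPageSize);
        e.bytes[byteOffset % kPageSize] |= (1u << bit);
        e.dirty = true;
    }
    void flush() {                                      // ...but goes back to NAND as full pages
        for (auto& [page, e] : resident)
            if (e.dirty) { writePageToNand(page, e.bytes); e.dirty = false; }
    }

    // Stand-ins for the page-granular NAND transfers; the real transfers are what the
    // cache exists to amortise.
    static std::vector<uint8_t> readPageFromNand(std::size_t) { return std::vector<uint8_t>(kPageSize, 0); }
    static void writePageToNand(std::size_t, const std::vector<uint8_t>&) {}
};

int main() {
    PageCache cache;
    cache.setBit(12345, 3);  // bit-level edit, backed by a whole-page read
    cache.flush();           // whole-page write-back
}
```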
 
Last edited:

martino

Member
Interesting; I guess we'll see (somewhat) soon enough. I'm still curious how in the retail units, they will be able to adjust the power distribution automatically by detecting the code. What exactly are they detecting in the code to determine how to adjust the load? Are they using some type of analyzing utility program in the OS for this? Or a different co-processor (so at least some part of code has to be sent twice potentially)?

Are devs flagging parts of their code to trigger a power distribution load shift? And how much is all of that going to cost in terms of needed hardware and overall performance? There likely has to be some kind of price to it if devs are not managing the power load distribution themselves, but that would require them to be a LOT more mindful about their code and know ahead of time how much resources that code is going to be using, that way they know the operations generated from the code are in a given power profile.

There's just so many (maybe almost too many) questions surrounding their variable frequency setup and that's even with accounting everything explained in Road to PS5. Would look forward to them detailing it further some time in the future.

you can see a (too) quick look at smartshift here :
 

Elog

Member
Maybe they are referring to some small NOR Flash cache? Which wouldn't make any sense due to NOR flash having very bad write speeds relative to NAND. Other technologies like MRAM and 3D Xpoint are byte-level at lowest (for reads), not bit-level (correction: they ARE bit-level for writes of data...maybe. MRAM most certainly is. Would think 3D Xpoint is, too). I'd assume PS5 is maybe using the SRAM cache of the memory controller to (very quickly) read pages/blocks of data into and then pick out select data to write at bit-level if needed, then that data in the cache gets written back out to the NAND at some point (if the cache is only large enough for but so many pages/blocks worth of data to hold then that's going to influence how soon the data needs to be written back out to NAND or deleted from the cache itself).

So the only way to make the original text correct would be to assume - right or wrong - that he refers to the information in the SRAM of the I/O controller that can be read in that way, i.e. from the first stop where a read page from the SSD ends up. Thank you for your answer!
 
you can see a (too) quick look at smartshift here :


NGL if it was in there I missed it xD. I did see them mention FreeSync but are they and Smartshift the same thing?

So the only way to make the original text correct would be to assume - right or wrong - that he refers to the information in the SRAM of the I/O controller that can be read in that way, i.e. from the first stop where a read page from the SSD ends up. Thank you for your answer!

Yep, pretty much. And depending on the quality of the SRAM, it's probably anywhere between 16 MB (the high-quality, very fast type) and maybe 64 MB/128 MB (more PSRAM territory; still very fast and very low latency, but it's more repurposed DRAM designed to act like SRAM).

You're welcome!
 

geordiemp

Member
you can see a (too) quick look at smartshift here :


I think too many people mix up SmartShift, which is simply transferring power between CPU and GPU, and the more important predictive clock and power settings based on intended workload. The only common bit is they might have the same granularity (SmartShift is 2 ms; I don't know what the other is).

The concept is straightforward; the complexity is how Sony managed to get logic that can predictively downclock when known circumstances that generate excessive heat present themselves, such as an uncapped map screen.

It could be very complex or even very simple. I wonder if they patented the concept?

One thing is certain: controlling heat just as it is generated is always more efficient than control that is waiting on the thermal result.
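Nobody outside Sony knows what the logic actually is, but the shape of an activity-driven (rather than thermal) controller is easy to sketch. The toy loop below is only my guess at the structure: every couple of milliseconds, estimate the power the issued workload is about to draw, and scale clocks to stay inside a fixed SoC budget before the heat ever materialises. All the numbers, names and the linear power model are made up.

```cpp
#include <cstdio>

// Toy model of activity-based (not thermal) power steering, evaluated every ~2 ms.
// Every figure here is invented purely to show the control structure.
struct SocBudget {
    double totalWatts = 200.0;                 // fixed SoC power budget (assumed value)
    double cpuMaxGHz  = 3.5, gpuMaxGHz = 2.23; // ceilings; clocks only ever go down from here
};

// "Estimated" draw derived from counted activity (instruction/occupancy counters),
// not from a temperature sensor -- the distinction being made in the post above.
struct ActivityEstimate { double cpuWatts, gpuWatts; };

void stepEvery2ms(const SocBudget& b, const ActivityEstimate& est,
                  double& cpuGHz, double& gpuGHz) {
    const double demand = est.cpuWatts + est.gpuWatts;
    if (demand <= b.totalWatts) {              // under budget: run both at max clocks
        cpuGHz = b.cpuMaxGHz; gpuGHz = b.gpuMaxGHz;
        return;
    }
    // Over budget: shed the excess *before* it turns into heat, instead of reacting
    // to a temperature reading afterwards. (Crude model: power treated as linear in clock.)
    const double scale = b.totalWatts / demand;
    cpuGHz = b.cpuMaxGHz * scale;
    gpuGHz = b.gpuMaxGHz * scale;
}

int main() {
    SocBudget soc;
    double cpu = soc.cpuMaxGHz, gpu = soc.gpuMaxGHz;
    stepEvery2ms(soc, {70.0, 160.0}, cpu, gpu);   // a heavy "map screen" style spike
    std::printf("cpu %.2f GHz, gpu %.2f GHz\n", cpu, gpu);
}
```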
 
Last edited:

Elog

Member
So it is fairly clear what the I/O complex does in terms of high quality textures, and on the spot changes to VRAM and load times.

The question I have tried to find answers to - and have not found - is the following:

Assume you have a graphics card on a PC (pick your model) and you get 70 FPS on average at ultra settings and 1440p, with 5% lows at 55 FPS. What does utilisation look like across cores and shaders during rendering? What drives the dips into the lows? I have read several articles and book chapters now without getting solid numbers. There are multiple hints that utilisation across CUs and shaders is very far from even, but no one gives practical answers. What I try to understand is what the value of variable frequency can be (i.e. one CU starts to hit the roof and becomes the bottleneck, and a frequency increase solves that issue), and how much cache management by the GPU matters for efficiency, i.e. how much GPU cache management impacts GPU performance. In both of these areas the PS5 seems to have a strength; I just cannot get my arms around what that strength might result in in practice.

Does anyone have practical knowledge about GPU utilisation patterns under load (please note that this is very different from the GPU utilisation number you get under Windows; that number does not tell you if the GPU is actually being used or not, just that it is fired up)?
 

Yoshi

Headmaster of Console Warrior Jugendstrafanstalt
Not so sure. TF compute, yes; faster at general things like caches, which are tied to clocks? No.
All computations are faster. CPU and GPU are both faster and RAM is also faster. Caching is a more complex matter because it depends on what you want to cache and what part of caching you are discussing specifically. If you want to load data from SSD to RAM, to keep it there, that process takes longer on Xbox than on PlayStation, if you want to cache something you have just computed, then you are just keeping it in RAM, so there is no speed difference (other than maybe the cost of moving from registers to RAM, which is negligible). Technically you could also use the SSD to cache stuff you have computed, but that is not a very good idea, because if you use SSD as an extension of RAM you will kill the SSD very quickly. Moreover, SSD I/O is not fast enough for this to be practical.
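To put a rough number on the "kill the SSD very quickly" point: assume, purely for illustration, a TLC drive rated for about 600 TB written over its lifetime and a game spilling computed data to it at a sustained 1 GB/s. Both figures are assumptions of mine, not specs of either console's drive.

```cpp
#include <cstdio>

int main() {
    // Assumed figures for illustration only -- neither console's drive endurance is public.
    const double enduranceTBW  = 600.0;   // total terabytes-written rating of a typical TLC drive
    const double writeRateGBs  = 1.0;     // hypothetical sustained write rate if SSD were RAM spill
    const double secondsPerDay = 86400.0;

    const double tbPerDay      = writeRateGBs * secondsPerDay / 1000.0;  // ~86 TB written per day
    const double daysToWearOut = enduranceTBW / tbPerDay;                // ~7 days at this rate

    std::printf("%.0f TB/day written -> drive endurance exhausted in about %.1f days\n",
                tbPerDay, daysToWearOut);
}
```

Even if the real endurance were several times higher, swap-style write traffic burns through it in weeks rather than years.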
I/O and SSD data is no longer static if it's pulled in every frame; it becomes more dynamic and fluid with game state.
What?
Then there is driver and API overhead, PlayStation being more direct vs MS abstraction layers. I am not expecting much difference in 3rd party, is my view, not as much as you hope it is going to be.
We are discussing the hardware here and, you know, Microsoft is an OS developer first and foremost. Without knowing both systems very well, it is impossible to make any judgements on OS footprint, but I would be surprised if Microsoft ran into a disadvantage in their core market, specifically.
I am not expecting much difference in 3rd party, is my view, not as much as you hope it is going to be.
What I hope it is going to be? Please tell me more. Why would I hope for a significant difference on third party titles between both systems?
 

geordiemp

Member
All computations are faster. CPU and GPU are both faster and RAM is also faster. Caching is a more complex matter because it depends on what you want to cache and what part of caching you are discussing specifically. If you want to load data from SSD to RAM, to keep it there, that process takes longer on Xbox than on PlayStation, if you want to cache something you have just computed, then you are just keeping it in RAM, so there is no speed difference (other than maybe the cost of moving from registers to RAM, which is negligible). Technically you could also use the SSD to cache stuff you have computed, but that is not a very good idea, because if you use SSD as an extension of RAM you will kill the SSD very quickly. Moreover, SSD I/O is not fast enough for this to be practical.

What?
We are discussing the hardware here and, you know, Microsoft is an OS developer first and foremost. Without knowing both systems very well, it is impossible to make any judgements on OS footprint, but I would be surprised if Microsoft ran into a disadvantage in their core market, specifically.

What I hope it is going to be? Please tell me more. Why would I hope for a significant difference on third party titles between both systems?

Nope

Rasterisation? Working caches and everything that runs off the GPU base clock. So NO, do you know anything about this? Do you know the ROP layouts and counts in both consoles? I don't, it's not been said yet.

You used the word static, I just repeated it; you tell me what is static about loading data from the SSD every frame, as Tim Sweeney said. You used the word static because it plays down the role of the SSD and I/O when it is read, it's subtle FUD.

I am talking about API and driver latency due to layers of abstraction, not the size of the OS. Do you want some examples to read about so we can properly discuss, as it seems to have gone miles over your head.

You're over-egging the difference, therefore I am assuming you're hoping for a performance delta that people will care about. You didn't say it directly, but the tone of your post and incorrect assumptions clearly said everything I need to know.
 
Last edited:

martino

Member
I think too many people mix up smart shift, which is simply transferring power between CPU and GPU, and the more important predictive clock and power settings based on intended work load. The only common bit is they might have same granularity (smart shift is 2ms , I dont know what the other is).

The concept is straight forward, the complexity is how did sony manage to get logic that can predictively downclock when known circumstances that generate excessive heat present themselves such as an uncontrolled map screen.

It could be very complex or even very simple, I wonder if they patented the concept ?

One thing is cerain, controlling heat just as it is generated is always more efficient than control that is waiting on the thermal result.

If you have the need for it, indeed it is.
 

Yoshi

Headmaster of Console Warrior Jugendstrafanstalt
Rasterisation? Working caches and everything that runs off the GPU base clock. So NO, do you know anything about this? Do you want some educational links for reading?
I am really interested in what you want to tell me about rasterisation. Are you talking hardware rasterisation or software rasterisation?
You used the word static, I just repeated it; you tell me what is static about loading data from the SSD every frame, as Tim Sweeney said. You used the word static because it plays down the role of the SSD and I/O when it is read, it's subtle FUD.
Static means that you load data that is not the result of some computations you have made, but that you are loading data that is fixed on the disc. E.g. loading a texture is static. Loading a buffered intermediate result in a dynamic programming problem is not static because the buffered information depends on the specific instance you are solving. I did not use the word static to play down anything, I was describing the kind of data you were loading. You have dynamic data, such as positional data of moving agents and you have static data such as geometry and textures you are loading from disc.

I am talking about API and driver latency due to layers of abstraction, not the size of the OS. Do you want some examples to read about so we can properly discuss, as it seems to have gone miles over your head.
Where did I say anything about size of the OS, I was talking the computational footprint, because the OS is a major step of abstraction and you yourself brought up OS as a factor. We - as people who have not developed low level code on PS5 or XSX - have no idea how big the computational footprint of OS and other abstraction layers is on either system, so anything you are saying in that regard is pure speculation. Moreover, ease of development is a major goal of both console manufacturers, and also of developers of engines such as Unity and Unreal, which inherently are an important layer of abstraction. Abstraction is what makes modern game development possible. If everything had to be written to the metal in Assembly, you wouldn't see many games released.

You're over-egging the difference, therefore I am assuming you're hoping for a performance delta that people will care about. You didn't say it directly, but the tone of your post and incorrect assumptions clearly do.
Your assumption is about as good as your approximation of computer science expertise. I could not care less whether people will care about performance differences between XSX and PS5, I am not a console vendor and I have no stock in either MS nor Sony. In fact, I think that the differences in performance between XSX and PS5 on multi platform titles will have no impact on either system's sales.
 

quest

Not Banned from OT
I think too many people mix up smart shift, which is simply transferring power between CPU and GPU, and the more important predictive clock and power settings based on intended work load. The only common bit is they might have same granularity (smart shift is 2ms , I dont know what the other is).

The concept is straight forward, the complexity is how did sony manage to get logic that can predictively downclock when known circumstances that generate excessive heat present themselves such as an uncontrolled map screen.

It could be very complex or even very simple, I wonder if they patented the concept ?

One thing is cerain, controlling heat just as it is generated is always more efficient than control that is waiting on the thermal result.
I'm sure it's rather simple: they have hardware to capture instructions. Each instruction has a point value; AVX-256 is a 10, a.k.a. worst for heat. If the point total gets over a certain value in a time frame, it downclocks for so many cycles, then the counter is reset. That would be the really high-level view of how I'm guessing it is done.
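Turning that guess into pseudocode, it really would be cheap hardware: weight each retired instruction by how power-hungry it is, and if the running score inside a window crosses a threshold, drop the clock for a while and reset the counter. The weights, window and threshold below are invented; this is just the guess above written out, not how the PS5 actually does it.

```cpp
#include <cstdint>

// Toy version of the "points per instruction" guess. All constants are invented.
enum class Op : uint8_t { Scalar = 1, Vector128 = 4, Vector256 = 10 };  // AVX-256-class = worst for heat

struct PowerGovernor {
    static constexpr uint32_t kWindow    = 100000;  // evaluation window, in retired instructions
    static constexpr uint32_t kThreshold = 400000;  // heat "points" allowed per window
    static constexpr uint32_t kThrottle  = 50000;   // how long to hold the lower clock

    uint32_t score        = 0;        // points accumulated in the current window
    uint32_t leftInWindow = kWindow;
    uint32_t throttleLeft = 0;        // >0 means we are currently downclocked

    bool onInstruction(Op op) {       // returns true while running at the reduced clock
        score += static_cast<uint32_t>(op);
        if (throttleLeft > 0) { --throttleLeft; return true; }
        if (score > kThreshold) {     // too hot a mix: downclock and reset the counter
            throttleLeft = kThrottle;
            score = 0;
            leftInWindow = kWindow;
            return true;
        }
        if (--leftInWindow == 0) {    // window elapsed without tripping: reset
            score = 0;
            leftInWindow = kWindow;
        }
        return false;
    }
};

int main() {
    PowerGovernor gov;
    for (int i = 0; i < 200000; ++i) gov.onInstruction(Op::Vector256);  // sustained AVX-heavy burst
}
```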
 

geordiemp

Member
I am really interested in what you want to tell me about rasterisation. Are you talking hardware rasterisation or software rasterisation?

Static means that you load data that is not the result of some computations you have made, but that you are loading data that is fixed on the disc. E.g. loading a texture is static. Loading a buffered intermediate result in a dynamic programming problem is not static because the buffered information depends on the specific instance you are solving. I did not use the word static to play down anything, I was describing the kind of data you were loading. You have dynamic data, such as positional data of moving agents and you have static data such as geometry and textures you are loading from disc.


Where did I say anything about size of the OS, I was talking the computational footprint, because the OS is a major step of abstraction and you yourself brought up OS as a factor. We - as people who have not developed low level code on PS5 or XSX - have no idea how big the computational footprint of OS and other abstraction layers is on either system, so anything you are saying in that regard is pure speculation. Moreover, ease of development is a major goal of both console manufacturers, and also of developers of engines such as Unity and Unreal, which inherently are an important layer of abstraction. Abstraction is what makes modern game development possible. If everything had to be written to the metal in Assembly, you wouldn't see many games released.


Your assumption is about as good as your approximation of computer science expertise. I could not care less whether people will care about performance differences between XSX and PS5, I am not a console vendor and I have no stock in either MS nor Sony. In fact, I think that the differences in performance between XSX and PS5 on multi platform titles will have no impact on either system's sales.

You can look up rasterisation and Rops yourself, your homework, also look at Ps4pro...Rops..

Some reading on APIs and abstraction differences on PS4 / XB1; we don't know what they are on PS5 and XSX yet. However, go read below.


Go read above and come back with your thoughts for starters.

Oles Shishkovstov: Let's put it that way - we have seen scenarios where a single CPU core was fully loaded just by issuing draw-calls on Xbox One (and that's surely on the 'mono' driver with several fast-path calls utilised). Then, the same scenario on PS4, it was actually difficult to find those draw-calls in the profile graphs, because they are using almost no time and are barely visible as a result.

In general - I don't really get why they choose DX11 as a starting point for the console. It's a console! Why care about some legacy stuff at all? On PS4, most GPU commands are just a few DWORDs written into the command buffer, let's say just a few CPU clock cycles. On Xbox One it easily could be one million times slower because of all the bookkeeping the API does.


PS: the graphics data that stays in RAM is the static data; the data for streaming changes dynamically, therefore it's the other way around.
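To illustrate the "few DWORDs in the command buffer" point from the 4A quote above, here is a generic mock-up of a thin command stream; it is not Sony's or Microsoft's actual API. The CPU-side cost of a draw on a path like this is literally appending a handful of 32-bit words, while a thick DX11-style driver spends its time on validation and bookkeeping behind the same call.

```cpp
#include <cstdint>
#include <vector>

// Generic mock-up of a "to the metal" command stream -- not any real console API.
// The point: a draw call can be nothing more than a few 32-bit words the GPU front-end parses.
enum PacketId : uint32_t { kSetPipeline = 0x01, kDrawIndexed = 0x02 };

struct CommandBuffer {
    std::vector<uint32_t> dwords;   // memory the GPU reads directly

    void setPipeline(uint32_t pipelineHandle) {
        dwords.push_back(kSetPipeline);
        dwords.push_back(pipelineHandle);     // 2 DWORDs total
    }
    void drawIndexed(uint32_t indexCount, uint32_t firstIndex) {
        dwords.push_back(kDrawIndexed);
        dwords.push_back(indexCount);
        dwords.push_back(firstIndex);         // 3 DWORDs total -- a few CPU cycles
    }
};

// A thick driver path spends its time elsewhere: hazard/state validation, resource
// residency bookkeeping, allocation -- the overhead 4A were describing on the DX11-era API.

int main() {
    CommandBuffer cb;
    cb.setPipeline(42);
    cb.drawIndexed(36, 0);   // a cube's worth of indices, queued for ~5 DWORDs of CPU work
}
```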
 
Last edited:
So it is fairly clear what the I/O complex does in terms of high quality textures, and on the spot changes to VRAM and load times.

The question I have tried to find answers to - and have not found - is the following:

Assume you have a graphics card on a PC (pick your model) and you get 70 FPS on average at ultra-settings and 1440p with 5% lows at 55 FPS. What does utilisation look like across cores and shaders during rendering? What drives the dips into the lows? I have read several article and book chapters now without getting solid numbers. There are multiple hints that utilisation across CUs and shaders are very far from even but no-one gives practical answers. The question I try to understand is what the value can be of variable frequency, i.e. one CU is starting to hit the roof and be the bottle-neck and a frequency increase solves that issue and how much cache management by the GPU matters to increase efficiency, i.e. how much does GPU cache management impact GPU performance. In both if these areas PS5 seems to have a strength - I just cannot get my arms around what that strength might result in in practice.

Anyone has practical knowledge about GPU utilisation pattern under load (please note that this is very different from the GPU utilisation number you get under windows - that number does not tell you if it is actually used or not - just that it is fired up)?

I don't think there's an easy or even standardized answer here because it comes down a lot to the engine a game is running, the GPU programming language being used (CUDA, etc.), and the programming techniques of the application in question. My understanding of things on this front isn't super-detailed, but I assume generally CUs are filled with work depending on the requirements of the workload, and (usually) that's done sequentially, i.e. when the first CU has its caches occupied and it's working on data, then the next CU in the block is given data to work with; once all CUs in a block are occupied, then CUs in the next block are assigned tasks by the scheduler, etc.

So in that type of environment, the GPU operating faster will clear out its taskwork in the caches more quickly and is sooner available to queue for more work on the task. However, it's also worth keeping in mind that's not the only way work can be assigned to CUs in the GPU; there's also asynchronous compute which, with the frontend improvements in the RDNA2 architecture, should allow for much better saturation of even lower-demanding tasks across a wider array of the GPU's CUs, so that you don't end up with extreme unevenness in GPU hardware utilization.

So with that type of example, a task only needing say 3 TF of computational power could be spread out over 18 CUs instead of 9 CUs for example on PS5 (each CU there is roughly 285 GF); that would give the task more L1 and L2 cache to work with in parallel, in addition to the speed of processing data through the caches based on the GPU clock. If I had to take a guess, I think one of the reasons Sony focused on high clocks is because they were very aware of the problems you mentioned earlier and maybe weren't confident AMD could improve the frontend to such a degree that higher and smarter rates of parallelized CU utilization would give the results they wanted without going absolutely massive on the GPU size, which would've driven up costs. So they chose a strategy of higher clocks instead, and whatever frontend improvements came would be a "nice bonus" on top of that, but at least this allowed them to go with a smaller GPU, banking on yield improvements with the shift to 7nm to save on costs even if they would need higher-quality silicon due to the higher clock.
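For anyone who wants to check the per-CU arithmetic being used here (the 285 GF figure just falls out of the public clock and the standard 64 lanes x 2 ops per clock per CU; how a real scheduler actually spreads the work is far messier than this):

```cpp
#include <cstdio>

int main() {
    // Public figures: PS5 GPU = 36 CUs at up to 2.23 GHz; a CU issues 64 FP32 lanes x 2 ops (FMA) per clock.
    const double clockGHz = 2.23;
    const int lanes = 64, opsPerLane = 2;

    const double gflopsPerCU = clockGHz * lanes * opsPerLane;   // ~285 GFLOPS per CU
    std::printf("per-CU peak: %.0f GFLOPS\n", gflopsPerCU);

    // The 3 TF example from the post, spread over 9 CUs vs 18 CUs.
    const int cuCounts[] = {9, 18};
    for (int cus : cuCounts)
        std::printf("3 TF over %2d CUs -> %.0f GFLOPS per CU (%.0f%% of peak)\n",
                    cus, 3000.0 / cus, 100.0 * (3000.0 / cus) / gflopsPerCU);

    // Note: at 9 CUs the figure lands above 100% of peak, i.e. 9 CUs at 2.23 GHz top out
    // around 2.57 TF, so a true 3 TF job needs at least 11 CUs -- the wider spread is what
    // buys the extra cache and register space per unit of work.
}
```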

MS, on the other hand, I think they had a lot more confidence in AMD's design team to make breakthroughs on the frontend and improve parallelized CU workloads by magnitudes. The fact we're seeing much bigger GPUs taking the focus front-and-center from AMD (as well as Nvidia and Intel) seems to show that they've made those improvements. They went with a more modest GPU clock in understanding that those frontend and architectural improvements would translate to bigger gains on parts of the GPU performance that are not as reliant on the clocks, and also knowing those would benefit more from a larger GPU. Even if the larger die size would affect the pricing, they wouldn't need such super high-quality silicon for the GPU as the clocks are lower, offsetting some of that extra cost.

Overall I'd say both Sony and MS made smart decisions with their GPUs based on what they saw in the roadmaps and what they saw in terms of actual performance from their current-gen systems when starting next-gen design work. As for the things you mention with the PS5: to my understanding the system is in a "continuous boost" mode by default, so the GPU is already at its full clock speed of 2.23 GHz and lowers it based on load demands, but that is the result of the power distribution being scaled back, not the frequency itself being directly changed. So depending on the GPU task, the GPU will vary the frequency by adjusting the power to itself, but it never goes past 2.23 GHz.

I don't think the power load adjustment works on a CU-level, i.e selectively scaling the power load per CU. They are either all operating at one frequency or all operating at another frequency depending on the power allotted to the GPU. So the example you mention, I don't think that's actually something which can happen on the GPU side. This would also extend to the caches; they're either all operating at one frequency or all operating at another frequency, it's not a case of one CU operating at 2.23 GHz and another at 2.5 GHz and yet another at 1.9 GHz but across the board they're all operating at a net power load resulting in a frequency at or below 2.23 GHz.

.
 

geordiemp

Member
I'm sure rather simple they have hardware to capture instructions. Each instruction has a point value avx 256 is a 10 aka worst for heat. If the point value gets over a certain value in a time frame it down clocks for so many cycles then the counter is reset. That would be the really high level view of how im guessing how it is done.

We may never find out until someone spills the beans. I am sure Sony have engineered and tested the scenarios to death and come up with a simple (and clever) way. That is really the mystery of predictive workload control; I am sure the millisecond rate at which they can change clocks and voltages is understood.
 

geordiemp

Member
I don't think the power load adjustment works on a CU-level, i.e selectively scaling the power load per CU. They are either all operating at one frequency or all operating at another frequency depending on the power allotted to the GPU. So the example you mention, I don't think that's actually something which can happen on the GPU side. This would also extend to the caches; they're either all operating at one frequency or all operating at another frequency, it's not a case of one CU operating at 2.23 GHz and another at 2.5 GHz and yet another at 1.9 GHz but across the board they're all operating at a net power load resulting in a frequency at or below 2.23 GHz.

.

I think there will be only one GPU clock, and it will be either 2.23 GHz or less; my gut feel is it will change in 2 ms steps, and in frequency steps, like SmartShift (I read somewhere). Changing power 2 ms in advance, or just as it's happening, is better than waiting 1000 milliseconds for temperature-rise steps, is my "feel" for the efficiency: remove edge cases early before you get hot in the first place, versus the old way of ramping the fan.

I look forward to more Cerny talk on this as its my cuppa tea.
 
Last edited:

Ar¢tos

Member
All computations are faster. CPU and GPU are both faster and RAM is also faster. Caching is a more complex matter because it depends on what you want to cache and what part of caching you are discussing specifically. If you want to load data from SSD to RAM, to keep it there, that process takes longer on Xbox than on PlayStation, if you want to cache something you have just computed, then you are just keeping it in RAM, so there is no speed difference (other than maybe the cost of moving from registers to RAM, which is negligible). Technically you could also use the SSD to cache stuff you have computed, but that is not a very good idea, because if you use SSD as an extension of RAM you will kill the SSD very quickly. Moreover, SSD I/O is not fast enough for this to be practical.

What?
We are discussing the hardware here and, you know, Microsoft is an OS developer first and foremost. Without knowing both systems very well, it is impossible to make any judgements on OS footprint, but I would be surprised if Microsoft ran into a disadvantage in their core market, specifically.

What I hope it is going to be? Please tell me more. Why would I hope for a significant difference on third party titles between both systems?
For an OS developer they could have done A LOT better than the triple OS of the X1.

It's safe to assume that the XSX OS is going to be Windows-based; that for me is enough to predict a bigger overhead than the PS5's FreeBSD-based OS.
You can get a fully functional BSD distro taking only 16 MB of space and using only 100 MB of RAM.
 

kuncol02

Banned
No, it's hardware-based. NAND technology is inherently page-addressable on reads and block-addressable on writes by its nature. There's nothing anyone, Cerny included, could do to change that unless they use a technology different from NAND altogether.

I've heard a few things about how the PS5 handles the look-up tables for data entries on the NAND that seems pretty novel and has its own advantages, but none of that changes how the NAND memory itself functions. I don't know who that poster is but unless they were able to clarify or provide even the slightest hint as to what was being done with PS5's I/O hardware to somehow access data on the NAND at bit-level granularity, then they're making it up.

Maybe they are referring to some small NOR Flash cache? Which wouldn't make any sense due to NOR flash having very bad write speeds relative to NAND. Other technologies like MRAM and 3D Xpoint are byte-level at lowest (for reads), not bit-level (correction: they ARE bit-level for writes of data...maybe. MRAM most certainly is. Would think 3D Xpoint is, too). I'd assume PS5 is maybe using the SRAM cache of the memory controller to (very quickly) read pages/blocks of data into and then pick out select data to write at bit-level if needed, then that data in the cache gets written back out to the NAND at some point (if the cache is only large enough for but so many pages/blocks worth of data to hold then that's going to influence how soon the data needs to be written back out to NAND or deleted from the cache itself).
They could theoretically allow bit-by-bit data changes in software, but then the controller would still write and read whole blocks. That not only makes no sense, it would also shorten the PS5's life so badly that it could probably compete with the first generation of X360.
 

martino

Member
You can look up rasterisation and Rops yourself, your homework, also look at Ps4pro...Rops..

Some reading on apis and abstraction differences on ps4 / xb1, we dont know what they are on ps5 and XSX yet...however, go read below.


Go read above and come back with your thoughts for starters.

same studio later with dx12 api :

Ben Archard: Actually, we've got a great perf boost on Xbox-family consoles on both GPU and CPU thanks to DX12.X API. I believe it is a common/public knowledge, but GPU microcode on Xbox directly consumes API as is, like SetPSO is just a few DWORDs in command buffer. As for PC - you know, all the new stuff and features accessible goes into DX12, and DX11 is kind of forgotten. As we are frequently on the bleeding edge - we have no choice!

 
Last edited:

geordiemp

Member
same studio later with dx12 api :




I know it was fixed, I did not even engage with the "million times slower" bit... I was just giving an example.

Do you think DX12 is as direct as Sony's APIs?

In general MS do have more abstraction than Sony; what do you think the real-world cost in % will be?

Abstraction is great for BC, but it's not free, that is my point.
 
Last edited:

kuncol02

Banned
For an OS developer they could have done A LOT better than the triple OS of the X1.

It's safe to assume that XSX OS is going to be Windows based, that for me is enough to predict a bigger overhead than the PS5 freebsd based OS.
There was a link to an article about that like a week ago. The actual cost of that virtualization was really small, way smaller than you would expect. I guess that's also the basis for their backward compatibility.
 
They could theoretically allow bit by bit data change in software but then controller would write and read that by whole blocks, but that not only makes no sense, that would also shorten PS5 life so bad, that it could probably compete with first generation of X360.

By that point would the controller just need a fat CPU in there xD? Yeah, conceptually that wouldn't sound like a good idea, either. I know NAND endurance for TLC (which is what I suspect PS5 and XSX are using) would be better than in the past, but there's still a limit.

I think there will be only 1 GPU clock, and it will be either 2.23 Ghz or less and my gut feel is it will chage in 2ms steps and steps in frequency like smartshift (I read somewhere). Changing power for 2ms in advance or just as its happening is better than waiting 1000 millisconds for temperature rising steps is my "feel" for the efficiency, remove edge cases early before you get hot in first place and the old way of ramping the fan.

I look forward to more Cerny talk on this as its my cuppa tea.

But there's still way more questions than answers here. If it's autonomous on Sony's end, it must be pretty costly and/or sophisticated hardware (let alone what utility the OS has to manage for this, or if that utility is being done through microcode on the hardware for this purpose then that just adds to the cost) to implement. I can't picture this being cheaper than just going with fixed clocks and a very good fan/cooling system, not for the degree of timing adjustment they are claiming and knowing that the driving factor on this would be the power budget of the resources game code is using.

That's why I asked if there's something like flag exceptions in the game code to trigger detection for initiating a power load adjustment; if there's honestly nothing on the developer's end, then the hardware for enabling this must be somewhat costly because it has to be beefy enough to detect the power load at all times, and it has to have a means of being able to detect the power load and THAT needs to be able to do some type of analysis on what code is being generated I'd assume.

So we know SmartShift is part of the solution here, but it isn't the only component. At least SmartShift seems to be baked into the AMD architecture, so those costs would roll in with the costs for the APU itself on AMD's end in terms of R&D, silicon sourcing and pricing, etc. The more I think about it, between wondering what hardware is there to enable the variable frequency, the Dualsense controller tech, the hardware silicon in the I/O block and the cooling system, that $599 is starting to look like more of a reality and that's for "just" the 825 GB option (if there are somehow two SKUs planned).
 
Last edited:

martino

Member
I know it was fixed, but in general MS do have more abstraction than Sony, what do you think the real world percentage cost in % will be ?

Abstraction is great for BC, but its not free,

If I had to bet from my ass, I would say 2 to 5%.
We are still in a case, even for the hypervisor, where it's tailored for a few specific pieces of hardware.

The best question is: what does it mean for latency?
 
Last edited:

Yoshi

Headmaster of Console Warrior Jugendstrafanstalt
You can look up rasterisation and Rops yourself, your homework, also look at Ps4pro...Rops..
I am not interested in finding out what they are, I am interested in finding out what you think they are.
Some reading on apis and abstraction differences on ps4 / xb1, we dont know what they are on ps5 and XSX yet...however, go read below.
What relevance does a technological issue in Xbox One back then have for PS5 / XSX?
Ps the graphic data that stays in RAM is the static data, the data for streaming changes dynamically therefore its the other way around.
The data you obtain from SSD does not change dynamically. You have a huge chunk of data that is static (the game itself), from which you poll different parts at different times, and you have data that is dynamic because it gets computed based on user input every time. You know, there is a reason illegal copies of console games are often called ROMs (read only memory). In fact, physical games are still distributed in read only memory formats. This is only possible because the data is static. I really get the impression that your understanding of technological issues is about as deep as "oh, static sounds negative, so if he uses static close to my favourite console, then he must be badmouthing my darling".

Reading static data is important to a game. The extremely fast access to static game data on Nintendo 64 was one of its major advantages over PlayStation.
 

geordiemp

Member
By that point would the controller just need a fat CPU in there xD? Yeah, conceptually that wouldn't sound like a good idea, either. I know NAND endurance for TLC (which is what I suspect PS5 and XSX are using) would be better than in the past, but there's still a limit.



But there's still way more questions than answers here. If it's autonomous on Sony's end, it must be pretty costly and/or sophisticated hardware (let alone what utility the OS has to manage for this, or if that utility is being done through microcode on the hardware for this purpose then that just adds to the cost) to implement. I can't picture this being cheaper than just going with fixed clocks and a very good fan/cooling system, not for the degree of timing adjustment they are claiming and knowing that the driving factor on this would be the power budget of the resources game code is using.

That's why I asked if there's something like flag exceptions in the game code to trigger detection for initiating a power load adjustment; if there's honestly nothing on the developer's end, then the hardware for enabling this must be somewhat costly because it has to be beefy enough to detect the power load at all times, and it has to have a means of being able to detect the power load and THAT needs to be able to do some type of analysis on what code is being generated I'd assume.

So we know SmartShift is part of the solution here, but it isn't the only component. At least SmartShift seems to be baked into the AMD architecture, so those costs would roll in with the costs for the APU itself on AMD's end in terms of R&D, silicon sourcing and pricing, etc. The more I think about it, between wondering what hardware is there to enable the variable frequency, the Dualsense controller tech, the hardware silicon in the I/O block and the cooling system, that $599 is starting to look like more of a reality and that's for "just" the 825 GB option (if there are somehow two SKUs planned).

I don't believe so; the logic has to be cleverly thought out and engineered, but I don't think we will see a large chunk of silicon for this = hardware cost.

Yes, Sony likely spent more on engineering; I don't know how each factors that into base console prices. Both probably spend more on marketing lol.

I also don't think the cooling will be expensive. As Cerny himself said, it is impressive, and impressive in engineering terms means performant and cost-effective, but I have no opinions on the cooling as we have not seen it.

Sony looks to have spent more money on audio if you ask my thoughts, there is a shit load of R&D there.
 
Last edited:

Codeblew

Member
The difference between their CPUs is small: 3.5 GHz vs 3.8 GHz. It's basically nothing.
The XSX CPU is only 3.8 GHz when not in SMT mode; it is 3.66 GHz when in SMT mode. I assume most games will use SMT mode, since it gives better performance if you use more than 8 CPU threads.
 

geordiemp

Member
You are actually delusional.

But do inform us of this supposedly latent I/O, I'm sure it'll be riveting fanfic.


As I said, I won't get any intellectual conversation with you, thanks. Do you ever discuss anything that interests you other than warrior mode?

And no, calling me names just looks bad on your IQ.

I will ignore you as you have nothing to say. Ever.
 
Last edited:

Tripolygon

Banned
Regardless of the fact that PS5 SSD architecture is faster, Xbox Series X SSD is really fast too and will enable amazing things and afford developers lots of freedom.

Q: What does Xbox Series X/next-generation development enable in current or future projects that you could not have achieved with the current generation of consoles?

A: There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.

The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?


This is the true game changer to me, not just the raw SSD speed.
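To put rough numbers on what loading "in the middle of a frame" buys you, using the publicly quoted raw throughput figures (about 2.4 GB/s for XSX and 5.5 GB/s for PS5; compressed effective rates are higher), here's the per-frame budget arithmetic:

```cpp
#include <cstdio>

int main() {
    // Publicly quoted *raw* SSD throughput; compressed effective rates are higher still.
    struct Console { const char* name; double rawGBs; };
    const Console consoles[] = { {"XSX", 2.4}, {"PS5", 5.5} };

    const double frameMs[] = {33.3, 16.7};   // 30 fps and 60 fps frame budgets

    for (const auto& c : consoles)
        for (double ms : frameMs)
            std::printf("%s @ %.1f ms/frame: ~%.0f MB of fresh data available per frame\n",
                        c.name, ms, c.rawGBs * 1000.0 * (ms / 1000.0));
}
```

Tens of megabytes of fresh asset data every frame, on either box, is a very different world from HDD-era streaming.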
 
Last edited:

quest

Not Banned from OT
Not bad for reusing the OG Xbox I/O and a last-second SSD Phil got at Best Buy. From the Dirt developer. The PS5 has a badass storage solution, but a 12 TF APU can easily be fed with less plus fancy tricks; don't need force lol


A: There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.

The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?
 

THE:MILKMAN

Member
Not bad for reusing the OG xbox io and last second SSD phil got at best buy. From dirt developer. The PS5 has a bad ass storage solution but a 12tf apu can easily be fed with less with fancy tricks don't need force lol


A: There are loads of things for us to still explore in the new hardware but I’m most intrigued to see what we can do with the new SSD drive and the hardware decompression capabilities in the Xbox Velocity Architecture.

The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load?

Treating GPU memory (GDDR6) as a virtual disk? I thought the idea was that 100GB of the SSD was virtual RAM?
 
And you think the slower SSD and more importantly latent IO can feed high assets fast every frame ?

If you're not going to engage intellectually, why should I bother?
"The drive is so fast that I can load data mid-frame, use it, consume it, unload and replace it with something else in the middle of a frame, treating GPU memory like a virtual disk. How much texture data can I now load? "

Do you think this is slow? These things seem to be crazy fast. Both are gonna be amazing.
 