
Next-Gen PS5 & XSX |OT| Console tEch threaD


Locuza

Member
We can indeed look at the performance of current games on PS5 versus XSX versus PC and gain some confidence in the fact that if any downclocking is occurring, it simply isn't meaningful to overall performance. In almost every game tested, the PS5 GPU outperforms a PC desktop GPU with similar TF numbers. Likewise, in most non-BC titles those 10TF on PS5 are delivering higher performance than the 12TF on XSX... so again, any downclocking on the PS5 GPU isn't appearing to be meaningful to real-world performance.
You can gain that confidence and it may be true, but again, the statement wasn't that a huge downclock can happen but that 18% is the worst-case TF advantage for the Xbox Series X.
It may grow to 20, 22 or even 25%; without real data we couldn't tell anyway, because even when the difference is larger, there could be other hardware and software factors which would mask the difference.

Of course, but when the performance in real games consistently demonstrates that the GPU is outperforming equivalent-TF desktop GPUs and even the 12TF XSX GPU, there becomes strong reason to simply believe that Mark Cerny wasn't lying or spouting optimistic marketing (and historical precedent should tell you that Mark Cerny rarely does, for that matter).

There's also the notion that the Road to PS5 talk was originally intended for devs. So there's very little reason to mislead devs with a statement like that. They are the ones who will be working with the hardware at the end of the day and will very quickly and very easily be able to call stuff like that out as BS if it was.

Normally, Technical Experts like Andrew Goosen on the Xbox side, and Mark Cerny on the PS side, speak plainly, concisely and factually. They aren't company PR people so aren't at all given to spouting marketing spiel. So it seems odd to want to dismiss their comments on the basis of "it could be marketing". These aren't the guys that do that.

If it was Spencer or Jim Ryan, then sure.
Whenever people speak to the public and are recorded, as is the case with the YT video, there is a legal and marketing step behind it.
With the last point I don't mean that you get sugar-coated BS, but it does affect how information is presented and means some caveats go unmentioned.

For example a short interview with Mark Cerny and the modifications done for the PS4:
"The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:
  • "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"

You probably did not hear the caveat that the Onion+ bus on the PS4 shares bandwidth with the Onion bus and the CPU bus at the back end, and that the more you stress the CPU buses, the larger the bandwidth penalty on the GPU side.
Though you might have seen this diagram:
PS4-GPU-Bandwidth-140-not-176.png


And in practice you simply don't want to push 20GB/s through that bus.
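To make that caveat concrete, here is a minimal sketch of the contention idea, with made-up numbers loosely based on the diagram above (176GB/s of GDDR5 and a roughly 20GB/s coherent Onion/Onion+ path); the linear sharing model and the figures are illustrative assumptions, not measured PS4 behaviour.

```cpp
// Toy model of shared-bus contention: the more the CPU pushes over the
// coherent (Onion/Onion+) path, the less effective bandwidth the GPU sees.
// All numbers are illustrative assumptions, not real PS4 measurements.
#include <cstdio>

int main() {
    const double total_dram_bw   = 176.0; // GB/s, theoretical GDDR5 peak
    const double coherent_bw_cap = 20.0;  // GB/s, assumed shared coherent path

    // Sweep CPU traffic over the shared coherent path.
    for (double cpu_traffic = 0.0; cpu_traffic <= coherent_bw_cap; cpu_traffic += 5.0) {
        // Whatever the CPU consumes on the shared path is unavailable to the
        // GPU's coherent reads/writes, and it also eats into the DRAM budget.
        double gpu_coherent_left = coherent_bw_cap - cpu_traffic;
        double gpu_dram_left     = total_dram_bw - cpu_traffic;
        std::printf("CPU coherent traffic %5.1f GB/s -> GPU coherent headroom %5.1f GB/s, "
                    "GPU DRAM budget %6.1f GB/s\n",
                    cpu_traffic, gpu_coherent_left, gpu_dram_left);
    }
    return 0;
}
```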

In a similar vein, saying that the PS4 supports 64 compute queues may give the impression that all of them are useful in practice, although you don't want more than a couple of compute queues; otherwise you will lose performance because of cache thrashing and synchronization overhead.

In relative terms Mark Cerny presents technical details really well, and I take his claims seriously, but one still shouldn't take them as absolute truth with no strings attached.

Further, for the public realm it doesn't matter much whether developers are misled.
Few if any can/will mention (precise) technical details because of NDAs.
The CPU doesn't impact PS5's variable clocks. The clock frequency isn't varied on the basis of power; i.e. it's not measuring consumed power and adjusting frequency to keep within a threshold. It's deterministically adjusting frequency on the basis of GPU workload and GPU hardware occupancy. So whatever the CPU workload it won't impact GPU clocks.

It kinda seems like you might be conflating Smart Shift with their GPU variable clocks. Smart Shift will raise the power ceiling for the CPU if the GPU is idle, but I don't think it works the same the other way around because the GPU clocks are fixed at the top end, based not on overall APU power, but GPU stability limitations (as attested to by Cerny himself).

So I doubt the CPU running flat out would reduce the GPU power ceiling, as a) I would argue that is logically poor design, and b) it simply doesn't need to, because the cooling system capacity will be sized for peak CPU and GPU power consumption (which with the variable GPU clock regime is still way below that of a fixed-clock GPU).
It can, because it's the other way around: if the CPU is not fully stressed, the "activity/power credits" can be used by the GPU.
So if a game is heavily stressing the GPU, even exceeding the threshold, the extra budget from the CPU side may be enough to keep the GPU clock rate at 2.23GHz.
But if both units run close to the limit, there will be no extra budget left for the GPU, and both can be forced to run at lower clocks.
PS5-Main-Chip-2-pcgh.png



That's not how it works. The GPU clock frequency adjustment is performed on the basis of GPU workload. So it will change rapidly and many times within the time span of rendering a single frame.

So for a 30fps game, you could see the clocks adjust up and down multiple times within that 33.3ms frame time.

So when Cerny says the GPU will be at max clocks most of the time, it's pretty clear he knows what he's talking about.
If there is enough headroom the GPU will never downclock and will run at a fixed 2.23GHz; if the threshold is exceeded it has to lower the frequency, and the average frequency over that time period will be lower as a result.
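Neither post is quoting Sony's actual algorithm, but the mechanism being described can be sketched as a tiny simulation: sample GPU activity many times inside one 33.3ms frame, estimate power from activity and clock, and pull the clock down only in the slices where that estimate would exceed the budget. The sampling interval, the power model (power roughly proportional to activity times f³) and all the wattages below are assumptions for illustration only.

```cpp
// Toy simulation of activity-driven clocking inside one 33.3 ms frame.
// Power model, budget and activity trace are illustrative assumptions only.
#include <cmath>
#include <cstdio>
#include <vector>

int main() {
    const double f_max  = 2.23;   // GHz, PS5 GPU cap
    const double f_min  = 1.80;   // GHz, arbitrary floor for the sketch
    const double budget = 200.0;  // W, assumed GPU power budget
    const double k      = 20.0;   // scale chosen so ~90% activity at f_max hits ~200 W

    // Pretend activity samples (fraction of transistors switching) across a frame,
    // roughly one sample every ~2 ms of a 33.3 ms frame.
    std::vector<double> activity = {0.55, 0.60, 0.70, 0.95, 1.00, 0.98, 0.80, 0.65,
                                    0.50, 0.45, 0.60, 0.75, 0.85, 0.90, 0.97, 0.70};

    double freq_sum = 0.0;
    for (std::size_t i = 0; i < activity.size(); ++i) {
        // Assume dynamic power ~ activity * f^3 (f*V^2 with V roughly tracking f).
        double f = f_max;
        double power = k * activity[i] * f * f * f;
        if (power > budget) {
            // Downclock just enough to get back under the budget.
            f = std::cbrt(budget / (k * activity[i]));
            if (f < f_min) f = f_min;
        }
        freq_sum += f;
        std::printf("slice %2zu: activity %.2f -> %.3f GHz\n", i, activity[i], f);
    }
    std::printf("average clock across the frame: %.3f GHz\n", freq_sum / activity.size());
    return 0;
}
```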
 
Would like to see it as well, but it looks like it's probably AMD.

Honestly, the text seems more "added" (can't find my words, time to go to bed 😆).

I was asking because, yes, Sampler Feedback is an API-level feature, with a custom optimization on XSX (a hardware implementation of the residency map).
But for VRS, Nvidia has had a hardware implementation since the RTX 2000 series, and with all the information we have, I would be very surprised if it is not done the same way on RDNA2.
 

Stooky

Member
You can gain that confidence and it may be true, but again, the statement wasn't that a huge downclock can happen but that 18% is the worst-case TF advantage for the Xbox Series X.
It may grow to 20, 22 or even 25%; without real data we couldn't tell anyway, because even when the difference is larger, there could be other hardware and software factors which would mask the difference.


Whenever people speak to the public and are recorded, as is the case with the YT video, there is a legal and marketing step behind it.
With the last point I don't mean that you get sugar-coated BS, but it does affect how information is presented and means some caveats go unmentioned.

For example a short interview with Mark Cerny and the modifications done for the PS4:
"The three "major modifications" Sony did to the architecture to support this vision are as follows, in Cerny's words:
  • "First, we added another bus to the GPU that allows it to read directly from system memory or write directly to system memory, bypassing its own L1 and L2 caches. As a result, if the data that's being passed back and forth between CPU and GPU is small, you don't have issues with synchronization between them anymore. And by small, I just mean small in next-gen terms. We can pass almost 20 gigabytes a second down that bus. That's not very small in today’s terms -- it’s larger than the PCIe on most PCs!"

You probably did not hear the caveat that the Onion+ bus on the PS4 shares bandwidth with the Onion bus and the CPU bus at the back end, and that the more you stress the CPU buses, the larger the bandwidth penalty on the GPU side.
Though you might have seen this diagram:
PS4-GPU-Bandwidth-140-not-176.png


And in practice you simply don't want to push 20GB/s through that bus.

In a similar vein, saying that the PS4 supports 64 compute queues may give the impression that all of them are useful in practice, although you don't want more than a couple of compute queues; otherwise you will lose performance because of cache thrashing and synchronization overhead.

In relative terms Mark Cerny presents technical details really well, and I take his claims seriously, but one still shouldn't take them as absolute truth with no strings attached.

Further, for the public realm it doesn't matter much whether developers are misled.
Few if any can/will mention (precise) technical details because of NDAs.

It can, because it's the other way around: if the CPU is not fully stressed, the "activity/power credits" can be used by the GPU.
So if a game is heavily stressing the GPU, even exceeding the threshold, the extra budget from the CPU side may be enough to keep the GPU clock rate at 2.23GHz.
But if both units run close to the limit, there will be no extra budget left for the GPU, and both can be forced to run at lower clocks.
PS5-Main-Chip-2-pcgh.png




If there is enough headroom the GPU will never downclock and will run at a fixed 2.23GHz; if the threshold is exceeded it has to lower the frequency, and the average frequency over that time period will be lower as a result.

Lol, there's enough power to supply both CPU and GPU at full clocks. Under load, the GPU/CPU can transfer power to help each other out; this can happen multiple times during a frame render. We should never be able to tell the difference.
 
Jesus, that thing is tiny. That would make this thing 20% smaller and, more importantly, 20% cheaper than the XSX. That explains how they were able to hit that $399 price point by just removing the disc drive. The BOM on the chip is probably
$30-40 lower.


God, he's such a prick lmao.

Sony should also be able to manufacture many more PS5 chips than XSX chips. Supply is a big deal right now, and that's why they are leading the way in terms of sales, since demand is still high for both.

Sony struck that right balance IMHO, even though I know you're big on TFLOPS.
 
I find that interesting; so according to that AMD chart, VRS & SF work on different GPU architectures and are software/API features.


HVScojk.jpg
This feature has been ready to use since the RTX 2000 series and has its limits, as it cannot be used on all the textures in each frame.

Also, this feature is used by many console warriors who don't know what "virtual texturing" is, or that even its predecessor, tiled textures, was used as the magic weapon for the Xbox One. It's funny that people really think all the marketing is 100% true without doubt, and they get angry if you don't agree.
 
It's not the English language that's the problem, it's your English terms and nouns that are confusing; the 36 CUs on the PS5 aren't hypothetical, they are physical. And I wasn't clear on why you stated 1.8GHz, when obviously everybody knows 36 CUs at 1.8GHz would perform worse than Series X.

No it's your reading comprehension that's the problem because nowhere did I say that the 36CUs are hypothetical. I'm arguing that the current 2.23GHz PS5 GPU is faster than a "HYPOTHETICAL PS5" with fixed clocks and the same number of CUs. The 1.8Ghz clock was a number I picked for a hypothetical PS5 with fixed clocks, because a PS5 with fixed clocks would not be able to run at 2.23GHz, it would be lower. It could be 1.86 or 1.9 or 1.7GHz, it doesn't matter, because the fact is, with fixed clocks it would be clocked lower than the variable clock PS5 at 2.23GHz max. and would thus perform worse.


About the PS5 performing better in frame rates vs Series X: I said my guess would be the variable and efficient nature of the PS5's design. That is my guess, not universal truth; it could be a lot of things together, from tools to the hardware itself. And the variable clocks are, to my guess, one of the reasons. Otherwise there's no reason we shouldn't discuss it; let's all go to sleep and accept it's just magic that the PS5 with 10TF is outperforming 12TF.

Variable clocks only allow the PS5 to run nominally at higher clocks. So it would be more technically correct to say the higher clocks are the reason for the higher performance on PS5.

If Cerny set the variable clock limit at 1.8GHz, the console would run worse than XSX in all games, universally.

So the variable clock regime itself doesn't have any inherent performance benefits. It only opens the door to clocking the system higher at the maximum, which means you perform better because of the higher clock-speed.

You can gain that confidence and it may be true, but again, the statement wasn't that a huge downclock can happen but that 18% is the worst-case TF advantage for the Xbox Series X.
It may grow to 20, 22 or even 25%; without real data we couldn't tell anyway, because even when the difference is larger, there could be other hardware and software factors which would mask the difference.

I disagree based on my understanding of how the variable clocks regime as described by Cerny works. I highly doubt Sony would have designed a system that drops clock frequency often and significantly. It would have been better to go with a 52CU lower clocked chip like MS did otherwise.

They clearly didn't and designed for a smaller faster chip from the outset and they clearly reasoned their variable clock regime would net them sustained performance wins without needing to design a cooling system for edge case power-virus-like performance spikes that would rarely occur in QA'd and shipped game code.

It can, because it's the other way around: if the CPU is not fully stressed, the "activity/power credits" can be used by the GPU.
So if a game is heavily stressing the GPU, even exceeding the threshold, the extra budget from the CPU side may be enough to keep the GPU clock rate at 2.23GHz.
But if both units run close to the limit, there will be no extra budget left for the GPU, and both can be forced to run at lower clocks.
PS5-Main-Chip-2-pcgh.png



I'm not seeing anything in the above diagram and Cerny talk that corroborates your claims here. Can I ask you where you find this information?

My understanding of Smart Shift is that it raises the power ceiling for the GPU or CPU based on the surplus power of the other processor available at any one time.

I can't see a reason why it should lower the floor of either, when you can just as easily size the cooling and power delivery solutions to cope with the worst case... as this is what you would do anyway with a conventional system with CPU and GPU with fixed clocks.

What you're suggesting would defeat the point of what Smart Shift is intended to achieve, in my mind. And I may be wrong, but it would seem like a backwards approach.

If there is enough headroom the GPU will never downclock and will run at a fixed 2.23GHz; if the threshold is exceeded it has to lower the frequency, and the average frequency over that time period will be lower as a result.

Not at all, because the GPU frequency depends on workload and hardware occupancy, and in 99% of cases during rendering the GPU won't be running at near 100% utilisation, so it would rarely need to drop clocks. It's extremely hard to run at close to sustained, full GPU utilization, and it essentially only happens in edge cases like with a power virus; which incidentally is precisely the case Cerny cites in his talk.

The system will rarely need to downclock the GPU, which is precisely what Cerny himself said.

Edit:

roops67 on here and Llabe Brave over on Era have discussed this topic in depth and written very good explanations about this and how it works:

 

M1chl

Currently Gif and Meme Champion
This feature has been ready to use since the RTX 2000 series and has its limits, as it cannot be used on all the textures in each frame.

Also, this feature is used by many console warriors who don't know what "virtual texturing" is, or that even its predecessor, tiled textures, was used as the magic weapon for the Xbox One. It's funny that people really think all the marketing is 100% true without doubt, and they get angry if you don't agree.

Gotta be honest on this one, I got sucked into the hype, because there was talk about how it's a DX12/Turing/RDNA2 feature, and I didn't know back then that it was virtual texturing, because I was sucked into thinking "it would not be advertised if it were not new".

Pretty embarrassing of me, since I have experience in this industry.
 
The system will rarely need to downclock the GPU, which is precisely what Cerny himself said.

I also think it would be extremely bad for developers to deal with a system with highly variable clock speeds. One of the focuses of the PS5 was to make development easier not harder. Can't imagine extremely variable clocks making things easier for the devs.
 
Precisely. It would run counter to one of the PS5's main design goals, as you say.

Just imagine a really crazy extreme case.

If the PS5 had extremely variable clock speeds that often dropped it to around 4TF (again, an extreme example), how are developers going to design their games around that?

I can only imagine it making development a lot more difficult, because the system's power would be extremely unpredictable. If they had a system that rarely drops from 10TF, that's a lot easier to work with.
 

Locuza

Member
I disagree based on my understanding of how the variable clocks regime as described by Cerny works. I highly doubt Sony would have designed a system that drops clock frequency often and significantly. It would have been better to go with a 52CU lower clocked chip like MS did otherwise.

They clearly didn't and designed for a smaller faster chip from the outset and they clearly reasoned their variable clock regime would net them sustained performance wins without needing to design a cooling system for edge case power-virus-like performance spikes that would rarely occur in QA'd and shipped game code.
52 CUs lead to a larger chip, and fixed clock rates come with the downside of unused clock potential.
That's a reason why every CPU and GPU nowadays clocks dynamically based on the circumstances, and the PS5 uses similar principles, even though the parameters it uses are different and it follows a reference model for deterministic behaviour.

Just as an example, when the GPU is clocked at 2.23GHz it could consume 170W in one game and 200W in another.
That's because, depending on the game, more or fewer transistors will switch.
Logic which is not in active use is also aggressively clock-gated, saving energy.

Now some dev studio might develop a game which behaves similarly to a power virus and would consume 250W at 2.23GHz, so they would need to reduce the frequency to 1.9GHz to be at 200W again, if that's the level of power supply and cooling solution they want to be at.
If one is using a fixed clock target, one must think about the worst-case scenario with the most transistors switching and design the cooling solution around that.
That strategy is of course adjusted entirely around the worst case; most games don't behave that way, and performance potential for many applications suffers because of it, leading to 1.9GHz for all games.
Instead it obviously makes sense to dynamically downclock the machine when a certain threshold is reached.
Now most games can run at 2.23GHz and benefit from better performance, while the worst-case scenarios are also covered: the machine will automatically downclock to 1.9GHz to keep the 200W target.
47-1080.2582637642.jpg


This is just an extreme illustration of the idea. As Mark Cerny said, and as is also evident from the power/frequency curves of GPU measurements, if you run at the top frequency levels of a design (and the PS5 sits at a high spot), efficiency can be hugely improved by lowering the frequency by just a couple of percent.

172884-mi60-vs-mi25-chart-1260x709_0.jpg
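As a rough worked example of the 250W power-virus scenario above, assuming dynamic power scales roughly as f³ (f·V² with voltage tracking frequency); the real top-of-curve behaviour Cerny describes is even steeper, so the required clock drop would be smaller still, and all the wattages here are illustrative only:

```cpp
// Worked example for the 250 W "power virus" scenario above, assuming
// dynamic power scales roughly as f^3 (f*V^2 with voltage tracking frequency).
// All numbers are illustrative; the real power/frequency curve at the top end
// is steeper, so the needed clock reduction would be even smaller.
#include <cmath>
#include <cstdio>

int main() {
    const double f0      = 2.23;  // GHz, clock at which the worst-case title draws 250 W
    const double p_virus = 250.0; // W, assumed worst-case draw at f0
    const double budget  = 200.0; // W, assumed sustained power/cooling target

    // Solve budget = p_virus * (f/f0)^3 for f.
    double f = f0 * std::cbrt(budget / p_virus);
    std::printf("clock needed to hold %.0f W: %.2f GHz (a %.1f%% reduction)\n",
                budget, f, 100.0 * (1.0 - f / f0));

    // Conversely: how much power does a small clock drop save for a 200 W title?
    for (double drop : {0.02, 0.05, 0.10}) {
        double p = budget * std::pow(1.0 - drop, 3);
        std::printf("%2.0f%% lower clock -> roughly %.0f W instead of %.0f W\n",
                    100.0 * drop, p, budget);
    }
    return 0;
}
```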


And those are Mark Cerny's words, if that's a trustworthy person for the people here. He further stated that they expect the processing units to stay at or close to max frequency most of the time, meaning that yes, the clock goes down and may even go down consistently, but that they expect the downclock to be pretty minor.
Which ties into my initial statement that 18% is the worst-case advantage for the XSX, with the potential that it may be larger.
Later I said 20, 22 or 25% just as an example, and also to state that even with larger downclocks it's hard to tell in final games without precise data.
Now, how it will develop over the years is something I definitely can't foresee, and I take Mark Cerny's claim as an orientation point, as many others do.

I'm not seeing anything in the above diagram and Cerny talk that corroborates your claims here. Can I ask you where you find this information?

My understanding of Smart Shift is that it raises the power ceiling for the GPU or CPU based on the surplus power of the other processor available at any one time.

I can't see a reason why it should lower the floor of either, when you can just as easily size the cooling and power delivery solutions to cope with the worst case... as this is what you would do anyway with a conventional system with CPU and GPU with fixed clocks.

What you're suggesting would defeat the point of what Smart Shift is intended to achieve, in my mind. And may be wrong, but it would seem like a backwards approach.
According to the diagram and Mark Cerny's presentation, SmartShift appears to work only in one direction, giving power to the GPU if extra power is available.
This means that in practice the total GPU budget is tied to CPU utilization.
It will never go below a certain threshold, let's say 150W, but when both units are stressed, the chances of a necessary GPU downclock could be much higher because the GPU can't use free power from the CPU side anymore.
Which is why I made the point that with next-generation games that stress more of the GPU and the CPU, we may see the PS5 GPU clocking lower than 2.23GHz.
ps5-erklrungxvjw8.jpg
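A minimal sketch of the one-directional budget sharing described above (the total SoC budget, the CPU share and the 150W floor are assumed figures for illustration): the GPU's effective ceiling is its own base allocation plus whatever the CPU isn't using, so a busy CPU never takes power away from the GPU's base share, it just stops topping it up.

```cpp
// Toy model of one-directional SmartShift-style budget sharing as described
// above: unused CPU allocation tops up the GPU ceiling, never the reverse.
// All wattages are illustrative assumptions, not real PS5 figures.
#include <algorithm>
#include <cstdio>

struct Budgets {
    double soc_total = 200.0; // W, assumed total SoC budget
    double cpu_alloc = 50.0;  // W, assumed CPU share of that budget
};

// Effective GPU power ceiling for a given CPU draw.
double gpu_ceiling(const Budgets& b, double cpu_draw) {
    double cpu_used   = std::min(cpu_draw, b.cpu_alloc);
    double cpu_unused = b.cpu_alloc - cpu_used;       // credit the GPU can borrow
    return (b.soc_total - b.cpu_alloc) + cpu_unused;  // base GPU share + shifted credit
}

int main() {
    Budgets b;
    for (double cpu_draw : {10.0, 25.0, 40.0, 50.0}) {
        std::printf("CPU drawing %4.1f W -> GPU ceiling %5.1f W\n",
                    cpu_draw, gpu_ceiling(b, cpu_draw));
    }
    return 0;
}
```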

Not at all, because the GPU frequency depends on workload and hardware occupancy, and in 99% of cases during rendering the GPU won't be running at near 100% utilisation, so it would rarely need to drop clocks. It's extremely hard to run at close to sustained, full GPU utilization, and it essentially only happens in edge cases like with a power virus; which incidentally is precisely the case Cerny cites in his talk.

The system will rarely need to downclock the GPU, which is precisely what Cerny himself said.
Here is something which I don't really get.
I just said that if the threshold is exceeded the GPU has to downclock and the average clock rate for that time period will be lower.
You say "not at all", but at the same time you are basically saying the same thing: that the system will need to downclock the GPU in cases where the threshold is exceeded, just adding the stance that it should happen very rarely.

I have the feeling we basically agree on many points, but my statements are being taken further than they were meant.
 
So if a game is heavily stressing the GPU and even exceeding the threshold, the extra budget from the CPU side may be enough to keep the GPU clock rates still at 2.23GHz.
That is not true. The power budget for the PS5's SoC is designed in such a way that there is no need to choose anything; it is enough to power both the CPU and the GPU. You just really misunderstand how variable frequency works. And all this was done for the sake of maximum efficiency, maximum efficiency when it is needed, in every cycle. The most interesting thing is that you do not realize that the Xbox Series X will suffer more from a lack of power when all GPU compute units are loaded (it also has a lower-rated power supply unit), because they will consume the most energy, and SIMD instructions on the CPU (AVX2) consume a lot of power as well. However, this is not all for the sake of dispute; I just know that you are mistaken in this case. With all due respect.
 

Panajev2001a

GAF's Pleasant Genius
there is no need to choose anything, it is enough to power both the CPU and the GPU
This is true, and we also need to consider that SmartShift can switch power allocation incredibly rapidly (and that the variable clock rate can lower power usage quickly with a very minor frequency drop for a brief time), meaning the impact even on titles that make efficient use of the HW is minor if not essentially zero (unless you count corner cases like static menu screens, which are the use cases Cerny and many others have highlighted as having tons of wasted energy potential).

SmartShift also helps the GPU keep its clocks high even if the GPU were to hit a corner case of very, very high utilisation running very power-hungry operations while the CPU has some headroom to spare... and again, in real-world cases there is almost always headroom, especially given how fast the whole power transfer works (an incredibly tiny fraction of a frame).

On the other side, you could design PS5 software where the system may need to downclock, as the fixed power allocation does not cover 100% utilisation at full clocks running power-hungry instructions (an unlikely corner case, but devs do have power profiling in their tools and dev-only clock profiles). But the subtle hint Cerny was making in his DF interview, for example, was that a conventional fixed-clock system tends not to be built for true 100% efficient utilisation with the most power-hungry instructions at full speed either (you can take a PS4 or an Xbox and design a game that could shut it down, so as not to ruin the HW).
 

M1chl

Currently Gif and Meme Champion
That is not true. The power budget for the PS5's SoC is designed in such a way that there is no need to choose anything; it is enough to power both the CPU and the GPU. You just really misunderstand how variable frequency works. And all this was done for the sake of maximum efficiency, maximum efficiency when it is needed, in every cycle. The most interesting thing is that you do not realize that the Xbox Series X will suffer more from a lack of power when all GPU compute units are loaded (it also has a lower-rated power supply unit), because they will consume the most energy, and SIMD instructions on the CPU (AVX2) consume a lot of power as well. However, this is not all for the sake of dispute; I just know that you are mistaken in this case. With all due respect.
Well, is it like that? If you have enough power budget you can stress the whole APU; why would it not be possible? I am not sure what the max TDP is, but so far we've seen a max of about 200W of console power input, which means something like 180W for the APU itself. I simply don't see the reason why on XSX you would be power-limited when you stress one part of the APU.
 

onesvenus

Member
My understanding of Smart Shift is that it raises the power ceiling for the GPU or CPU based on the surplus power of the other processor available at any one time.
How does that contradict his claim of 18% at worst? It's really simple and I can't see how people are trying to twist his words:
- On one side, if you only take into account the maximum theoretical TF value, XSX is 18% above PS5.
- On the other hand, the actual TF value of the PS5 can be lower, we don't know to what extent, because it scales the frequency to be below, not above, the 2.23GHz clock.

That's the very definition of a lower bound. Thus you can say that the TF value of the XSX will be 18% higher than the one in the PS5, at worst.

Is that a simplification of the overall performance? Yes, it is, but regarding TF it's completely true, and I can't see how an engineer like you can't see it.
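For reference, plugging the public specs (52 CUs at 1.825GHz for XSX, 36 CUs at up to 2.23GHz for PS5) into the lower-bound arithmetic looks like this; the downclocked PS5 figures are purely hypothetical values to show how the gap would widen:

```cpp
// Lower-bound arithmetic on the TF gap: XSX's advantage is ~18% if the PS5
// holds 2.23 GHz, and only grows under any hypothetical PS5 downclock.
#include <cstdio>

// FP32 TFLOPS for an RDNA2-style GPU: CUs * 64 lanes * 2 ops (FMA) * clock(GHz) / 1000.
double tflops(int cus, double ghz) { return cus * 64 * 2 * ghz / 1000.0; }

int main() {
    const double xsx = tflops(52, 1.825);           // ~12.15 TF
    const double ps5_clocks[] = {2.23, 2.10, 2.00}; // max clock, then hypothetical downclocks

    for (double clk : ps5_clocks) {
        double ps5 = tflops(36, clk);
        std::printf("PS5 @ %.2f GHz: %.2f TF vs XSX %.2f TF -> XSX ahead by %.0f%%\n",
                    clk, ps5, xsx, 100.0 * (xsx / ps5 - 1.0));
    }
    return 0;
}
```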

The power budget for the PS5's SoC is designed in such a way that there is no need to choose anything; it is enough to power both the CPU and the GPU
Do we have any proof of that?
There are two lines of thought here:
- Either the power budget is not enough to keep the CPU and GPU frequencies at their max values, and when the CPU needs to do more work the GPU frequency goes down
- Or the power budget is enough to run both at their max frequencies, but they chose not to in order to save on power consumption.

AFAIK we don't have any evidence supporting one case over the other.
 
Well, is it like that? If you have enough power budget you can stress the whole APU; why would it not be possible? I am not sure what the max TDP is, but so far we've seen a max of about 200W of console power input, which means something like 180W for the APU itself. I simply don't see the reason why on XSX you would be power-limited when you stress one part of the APU.
The PS5's priority power system is designed to take into account the actual load, with a constant voltage applied to the SoC. The variable frequencies of the CPU and GPU are chosen in order to flexibly distribute the entire power package allocated to the SoC by tracking activity. I am surprised that everyone thinks that by seriously loading the CPU, the GPU will automatically downclock. It will drop only if it is not loaded enough and the CPU needs more voltage. If the load is so high that it loads the CPU and GPU to 100% every cycle (which is an almost unrealistic scenario), then to avoid overloading, the monitoring system will either slightly reduce the overall frequency for the CPU and GPU or, based on activity data, select the higher-priority option, the CPU or the GPU. You can think for yourself what will happen to the XSX chip in the same scenario.
 

geordiemp

Member
52 CUs lead to a larger chip, and fixed clock rates come with the downside of unused clock potential.
That's a reason why every CPU and GPU nowadays clocks dynamically based on the circumstances, and the PS5 uses similar principles, even though the parameters it uses are different and it follows a reference model for deterministic behaviour.

Just as an example, when the GPU is clocked at 2.23GHz it could consume 170W in one game and 200W in another.
That's because, depending on the game, more or fewer transistors will switch.
Logic which is not in active use is also aggressively clock-gated, saving energy.

Now some dev studio might develop a game which behaves similarly to a power virus and would consume 250W at 2.23GHz, so they would need to reduce the frequency to 1.9GHz to be at 200W again, if that's the level of power supply and cooling solution they want to be at.
If one is using a fixed clock target, one must think about the worst-case scenario with the most transistors switching and design the cooling solution around that.
That strategy is of course adjusted entirely around the worst case; most games don't behave that way, and performance potential for many applications suffers because of it, leading to 1.9GHz for all games.
Instead it obviously makes sense to dynamically downclock the machine when a certain threshold is reached.
Now most games can run at 2.23GHz and benefit from better performance, while the worst-case scenarios are also covered: the machine will automatically downclock to 1.9GHz to keep the 200W target.
47-1080.2582637642.jpg


This is just an extreme illustration of the idea. As Mark Cerny said, and as is also evident from the power/frequency curves of GPU measurements, if you run at the top frequency levels of a design (and the PS5 sits at a high spot), efficiency can be hugely improved by lowering the frequency by just a couple of percent.

172884-mi60-vs-mi25-chart-1260x709_0.jpg


And those are Mark Cerny's words, if that's a trustworthy person for the people here. He further stated that they expect the processing units to stay at or close to max frequency most of the time, meaning that yes, the clock goes down and may even go down consistently, but that they expect the downclock to be pretty minor.
Which ties into my initial statement that 18% is the worst-case advantage for the XSX, with the potential that it may be larger.
Later I said 20, 22 or 25% just as an example, and also to state that even with larger downclocks it's hard to tell in final games without precise data.
Now, how it will develop over the years is something I definitely can't foresee, and I take Mark Cerny's claim as an orientation point, as many others do.


According to the diagram and Mark Cerny's presentation, SmartShift appears to work only in one direction, giving power to the GPU if extra power is available.
This means that in practice the total GPU budget is tied to CPU utilization.
It will never go below a certain threshold, let's say 150W, but when both units are stressed, the chances of a necessary GPU downclock could be much higher because the GPU can't use free power from the CPU side anymore.
Which is why I made the point that with next-generation games that stress more of the GPU and the CPU, we may see the PS5 GPU clocking lower than 2.23GHz.
ps5-erklrungxvjw8.jpg


Here is something which I don't really get.
I just said that if the threshold is exceeded the GPU has to downclock and the average clock rate for that time period will be lower.
You say "not at all", but at the same time you are basically saying the same thing: that the system will need to downclock the GPU in cases where the threshold is exceeded, just adding the stance that it should happen very rarely.

I have the feeling we basically agree on many points, but my statements are being taken further than they were meant.

Your analysis does not take into account the time granularity of SmartShift, which is rumoured to be 2ms, and it looks at this from a PC perspective rather than an APU one.

You are trying to look at the power and frequency usage as an average.

In any 2ms window, the CPU and GPU can have different levels of activity, and across a frame this is not the same as a PC-style average analysis.

See the Spider-Man example:

Gp6UMYl.png


The final two data points are the 6700 clocks, plus the fact that turning off the centrifugal fan on the PS5 for the games tested still let the PS5 keep going for a long time. This suggests there is massive contingency, that the cooling is probably over-engineered for these settings, and that the PS5 could probably do 6700-level frequencies.
 

Panajev2001a

GAF's Pleasant Genius
How does that contradict his claim of 18% at worst?

Well, perhaps because it is a pretty uninteresting point to say 18% is the worst possible gap if you are only trying to talk about very, very rare scenarios where clock scaling is holding the game back significantly, as if they were extremely meaningful. We can either go with the "technically true is the best kind of true" approach, or the "trying to suggest the situation is worse/scarier than it is" one, or somewhere in the middle.

Another key thing is that 18% is only the CU TFLOPS difference; there is a lot more HW in the GPU that benefits from the higher clocks (all the hardware outside of the CUs: RBs, HW scheduler, rasteriser, Geometry Engine, etc...)... including the CUs themselves when running branchy code, for example, and when doing repeated computation on the same CU that might not trivially scale linearly as you try to spread it across many CUs. It also ignores the extra pressure that adding more CUs puts on the same-sized shared L1 cache in each Shader Array.

Still relevant (the indications and numbers MS has provided suggest that the per-clock throughput of the "common" HW outside of the CUs and of the GDDR6 memory controller is the same, so this gives a good picture once you adjust it for the number of CUs):
hLybZbu.jpg

IzXQkeV.jpg

Ai1cklr.jpg
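Assuming, as the tables above suggest, that the per-clock throughput of the shared front-end hardware is the same on both consoles, those units scale purely with clock, while CU TFLOPS scale with CU count times clock. A small sketch of that split (the equal-per-clock rate is the assumption here):

```cpp
// Split the comparison into clock-scaled front-end throughput (assumed equal
// per clock on both consoles) versus CU-count-scaled compute throughput.
#include <cstdio>

int main() {
    const double ps5_clk = 2.23, xsx_clk = 1.825; // GHz
    const int    ps5_cus = 36,   xsx_cus = 52;

    // Anything issued at a fixed rate per clock (rasteriser, geometry, ROPs, ...)
    // scales only with frequency under the equal-per-clock assumption.
    std::printf("front-end rate ratio (PS5/XSX): %.2f\n", ps5_clk / xsx_clk);

    // CU ALU throughput scales with CU count * clock.
    std::printf("CU FLOPS ratio (XSX/PS5):       %.2f\n",
                (xsx_cus * xsx_clk) / (ps5_cus * ps5_clk));
    return 0;
}
```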
 

onesvenus

Member
Well, perhaps because it is a pretty uninteresting point to say 18% is the worst possible gap if you are only trying to talk about very, very rare scenarios where clock scaling is holding the game back significantly, as if they were extremely meaningful. We can either go with the "technically true is the best kind of true" approach, or the "trying to suggest the situation is worse/scarier than it is" one, or somewhere in the middle.

Another key thing is that 18% is only the CU TFLOPS difference; there is a lot more HW in the GPU that benefits from the higher clocks (all the hardware outside of the CUs: RBs, HW scheduler, rasteriser, Geometry Engine, etc...)... including the CUs themselves when running branchy code, for example, and when doing repeated computation on the same CU that might not trivially scale linearly as you try to spread it across many CUs. It also ignores the extra pressure that adding more CUs puts on the same-sized shared L1 cache in each Shader Array.

Still relevant (the indications and numbers MS has provided suggest that the per-clock throughput of the "common" HW outside of the CUs and of the GDDR6 memory controller is the same, so this gives a good picture once you adjust it for the number of CUs):
hLybZbu.jpg

IzXQkeV.jpg

Ai1cklr.jpg
I agree with all you said; I was just pointing out that that exact sentence is not wrong. It's only talking about TFLOPS and nothing else, which, as you and others said, doesn't give us all the information, but that doesn't make the sentence false.
One thing is what he is saying and another, entirely different, thing is what we make of what he is saying when we read it; we should all be more careful about not thinking people are saying things they are not.
 

jroc74

Phone reception is more important to me than human rights
I also think it would be extremely bad for developers to deal with a system with highly variable clock speeds. One of the focuses of the PS5 was to make development easier not harder. Can't imagine extremely variable clocks making things easier for the devs.

Precisely. It would run counter to one of the PS5's main design goals, as you say.
And like y'all said... game analysis and comparisons should already confirm this.
 

M1chl

Currently Gif and Meme Champion
That is not true. The power budget for the PS5's SoC is designed in such a way that there is no need to choose anything; it is enough to power both the CPU and the GPU. You just really misunderstand how variable frequency works. And all this was done for the sake of maximum efficiency, maximum efficiency when it is needed, in every cycle. The most interesting thing is that you do not realize that the Xbox Series X will suffer more from a lack of power when all GPU compute units are loaded (it also has a lower-rated power supply unit), because they will consume the most energy, and SIMD instructions on the CPU (AVX2) consume a lot of power as well. However, this is not all for the sake of dispute; I just know that you are mistaken in this case. With all due respect.
The PS5's priority power system is designed to take into account the actual load, with a constant voltage applied to the SoC. The variable frequencies of the CPU and GPU are chosen in order to flexibly distribute the entire power package allocated to the SoC by tracking activity. I am surprised that everyone thinks that by seriously loading the CPU, the GPU will automatically downclock. It will drop only if it is not loaded enough and the CPU needs more voltage. If the load is so high that it loads the CPU and GPU to 100% every cycle (which is an almost unrealistic scenario), then to avoid overloading, the monitoring system will either slightly reduce the overall frequency for the CPU and GPU or, based on activity data, select the higher-priority option, the CPU or the GPU. You can think for yourself what will happen to the XSX chip in the same scenario.
The bolded part is what I am talking about. I know about the PS5, sorry that I didn't specify that. So I am asking if there is some additional info about power delivery on XSX?
 
So I am asking if there is some additional info about power delivery on XSX?
Well, I meant that if the PS5 suffers from a lack of power, then on the XSX it will be even more pronounced. But I doubt that this is generally a case that needs to be discussed. Personally, I like both systems; each has its own interesting features, and all that remains is to see the games.
 

M1chl

Currently Gif and Meme Champion
Well, I meant that if the PS5 suffers from a lack of power, then on the XSX it will be even more pronounced. But I doubt that this is generally a case that needs to be discussed. Personally, I like both systems; each has its own interesting features, and all that remains is to see the games.
Ohhh, I didn't read what you were reacting to. Sorry, never mind. I don't think either manufacturer underestimated the PSU and power delivery; it seems like the number of VRMs on the board has almost doubled from last gen on both of these machines, so I am not worried : )
 
And like yall said....game analysis, comparisons should already confirm this.
All games so far have been built to run on Jaguar CPUs, so the Ryzen CPU in the PS5 is half asleep anyway. The one exception I can think of is Avengers, where the CPU gets hit really hard, and those performance results have been quite interesting indeed.
 

Elog

Member
I still think that a lot of you are not aware how little of the silicon is actually used at "100% GPU and CPU" utilisation at a given frequency. The actual power consumption will be a function of the actual number of transistors that are engaged, and not just the frequency.

Here is a sample chart of two AMD cards (Nvidia is no different) from Wu et al (https://doi.org/10.1145/3377138).
9mvzlRf.png
As you can see from the graph, for a sample "100% GPU" utilisation workload the average actual utilisation is around 40% or so. This is mainly driven by two things: architecture (i.e. how well the GPU can feed the CUs with relevant information, and how all the individual components are scaled vis-à-vis each other to avoid queuing) and programming (i.e. how well the code is written to avoid underutilisation).

This is why power consumption can vary like crazy at the same "100% utilisation" and frequency. This is what most people miss when discussing the SmartShift technology and cooling solutions. I am fairly certain that most of the GPUs in our PCs would have severely inadequate cooling solutions if the software actually could get 100% of the transistors to be used - the GPU would shut down, because the cooling solution has been selected to deal with a standard world and then some - which ends up fairly far from 100% actual utilisation.

In other words - looking at the graph above, if you have a card with a theoretical peak of 10 TFLOPS, the graphics you see on the screen represent an actual TFLOP output of roughly 4 TFLOPS. The rest is not used. What Cerny has tried to do with the PS5 is to increase that 40% to a higher number. Let us for the sake of argument say that a design increases that number to 50% instead of 40% due to better I/O - that is a 25% increase in performance. Now everyone can see that an 18% difference in theoretical peak TFLOPS is not that much at the end of the day.

The second dimension - as mentioned - is software, and for consoles this is the development environment and how optimised the API set is at utilising the silicon. It is not random that Sony spent all that time with Epic to ensure that the UE5 Sony module is optimised as much as it can be for the PS5. That is probably more important than the hardware as such, given how many transistors are unused on average. Personally, I believe that MS's decision to unify the development environment across PC, XSS and XSX is a challenge here - it makes it increasingly harder to get good transistor utilisation when you go for one-solution-fits-all.

If you are successful as a hardware and software designer in increasing the transistor utilisation, you need a system that can handle that. That means you need a really good cooling solution (we all know what happened towards the end with the PS4: when the transistor utilisation went up, the cooling solution was on life support to handle it), and you need a power management system that can survive a fairly significant increase in transistor utilisation rate. Here is the reason for the dual-side and liquid cooling solution, and why Sony went with SmartShift. That is why SmartShift currently does not mean that the frequency is decreased at all - but in the future that might very well be the case if transistor utilisation is increased. Please note, if transistor utilisation is increased by 25% so that the power draw is too high, you probably need to adjust frequency by less than 10% to make the thermal and power envelope adequate. In other words - it is still a performance win. Or in other words - you will not need to downclock unless you see a net performance increase from increased transistor utilisation.
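The trade-off in that last paragraph can be put into numbers under a simple model: assume useful throughput scales with utilisation times clock, and power with utilisation times clock cubed. Then a 25% jump in transistor utilisation needs only about a 7% clock reduction to stay inside the same power envelope and still nets roughly +16% performance; the model is an illustrative assumption in the spirit of the post, not a measured figure.

```cpp
// Back-of-envelope check of the claim above: raise transistor utilisation by
// 25%, find the clock reduction that keeps power constant (power ~ util * f^3),
// and see what net performance change remains (performance ~ util * f).
#include <cmath>
#include <cstdio>

int main() {
    const double util_gain = 1.25; // +25% transistor utilisation

    // Keep power constant: util_gain * f_scale^3 == 1  =>  f_scale = util_gain^(-1/3)
    double f_scale    = std::pow(util_gain, -1.0 / 3.0);
    double perf_scale = util_gain * f_scale;

    std::printf("clock reduction needed: %.1f%%\n", 100.0 * (1.0 - f_scale));
    std::printf("net performance change: +%.1f%%\n", 100.0 * (perf_scale - 1.0));
    return 0;
}
```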

Apologies for the long post - I just feel that the discussion ends up in some weird rabbit holes at times.
 

Rea

Member
I see some people still don't understand the PS5's variable frequency.

"Regarding locked profiles, we support those on our dev kits, it can be helpful not to have variable clocks when optimising. Released PS5 games always get boosted frequencies so that they can take advantage of the additional power," explains Cerny.

Even if devs lock the frequency to 2.0GHz or even 1.8GHz during optimisation, the final games will be boosted automatically by the PS5, and consumers like us will enjoy the extra performance.
 
No it's your reading comprehension that's the problem because nowhere did I say that the 36CUs are hypothetical. I'm arguing that the current 2.23GHz PS5 GPU is faster than a "HYPOTHETICAL PS5" with fixed clocks and the same number of CUs. The 1.8Ghz clock was a number I picked for a hypothetical PS5 with fixed clocks, because a PS5 with fixed clocks would not be able to run at 2.23GHz, it would be lower. It could be 1.86 or 1.9 or 1.7GHz, it doesn't matter, because the fact is, with fixed clocks it would be clocked lower than the variable clock PS5 at 2.23GHz max. and would thus perform worse.




Variable clocks only allow the PS5 to run nominally at higher clocks. So it would be more technically correct to say the higher clocks are the reason for the higher performance on PS5.

If Cerny set the variable clock limit at 1.8GHz, the console would run worse than XSX in all games, universally.

So the variable clock regime itself doesn't have any inherent performance benefits. It only opens the door to clocking the system higher at the maximum, which means you perform better because of the higher clock-speed.



I disagree based on my understanding of how the variable clocks regime as described by Cerny works. I highly doubt Sony would have designed a system that drops clock frequency often and significantly. It would have been better to go with a 52CU lower clocked chip like MS did otherwise.

They clearly didn't and designed for a smaller faster chip from the outset and they clearly reasoned their variable clock regime would net them sustained performance wins without needing to design a cooling system for edge case power-virus-like performance spikes that would rarely occur in QA'd and shipped game code.



I'm not seeing anything in the above diagram and Cerny talk that corroborates your claims here. Can I ask you where you find this information?

My understanding of Smart Shift is that it raises the power ceiling for the GPU or CPU based on the surplus power of the other processor available at any one time.

I can't see a reason why it should lower the floor of either, when you can just as easily size the cooling and power delivery solutions to cope with the worst case... as this is what you would do anyway with a conventional system with CPU and GPU with fixed clocks.

What you're suggesting would defeat the point of what Smart Shift is intended to achieve, in my mind. And I may be wrong, but it would seem like a backwards approach.



Not at all, because the GPU frequency depends on workload and hardware occupancy, and in 99% of cases during rendering the GPU won't be running at near 100% utilisation, so it would rarely need to drop clocks. It's extremely hard to run at close to sustained, full GPU utilization, and it essentially only happens in edge cases like with a power virus; which incidentally is precisely the case Cerny cites in his talk.

The system will rarely need to downclock the GPU, which is precisely what Cerny himself said.

Edit:

roops67 on here and Llabe Brave over on Era have discussed this topic in depth and written very good explanations about this and how it works:

Here is where you mentioned a hypothetical 36CU GPU with a fixed 1.8GHz clock; maybe I understood you wrong.

8o6dY5g.jpg
And yes, variable clocks are one of the reasons the PS5 holds steady frame rates; I don't get why you can't comprehend this. The nature of modern engines is variable: everything is becoming variable these days, from resolution to effects. The engine is constantly fluctuating things to keep a target frame rate, and this needs efficient, adaptable silicon. Soon we'll even have variable compute units, not just clocks, but that's just my guess. So back to the PS5: yes, 36 CUs with a fixed 2.23GHz could have been a thing, but it's pointless, because not every workload needs constant fixed clocks every frame; as I've mentioned before, you don't need 2.23GHz on a GPU when looking at the sky. So what's the point of fixed clocks?

The Series X is having a hard time wasting constant power on fixed clocks; it's just common sense that traditional fixed clocks are an old tradition. Games aren't rendering a fixed budget; everything is variable, and you don't need 100% of the clocks, cores and VRAM in every frame or workload, which creates throttling problems. The Series X would have beaten the PS5, or become more efficient, if it variably clocked from 1.8GHz to 2GHz, but it's needlessly stuck at 1.8GHz because that was the heat target, and it never needed that, because it already keeps solid frame rates, just not as well as the PS5. If it variably clocked from 1.8 to 2GHz it would have kept steadier frame rates than the PS5. And remember, game engines are still rasterized, and clocks win in rasterization; that's why a faster 2.23GHz 10-teraflop GPU beats a 1.8GHz 12-teraflop GPU, because you're not solving astronomy and the big bang with a game. That's 10% of the time; 90% of the time a game is just trying to throw polygons, textures and pixels on screen. So the variable clocks help, because without them the PS5 might have been clocked lower and suffered, but it isn't, because the variable clocks let it hit 2.23GHz without overheating. It was an engineering challenge and Cerny solved it. And the games show it.
 

skit_data

Member
Here is where you mentioned a hypothetical 36CU GPU with a fixed 1.8GHz clock; maybe I understood you wrong.

8o6dY5g.jpg
And yes, variable clocks are one of the reasons the PS5 holds steady frame rates; I don't get why you can't comprehend this. The nature of modern engines is variable: everything is becoming variable these days, from resolution to effects. The engine is constantly fluctuating things to keep a target frame rate, and this needs efficient, adaptable silicon. Soon we'll even have variable compute units, not just clocks, but that's just my guess. So back to the PS5: yes, 36 CUs with a fixed 2.23GHz could have been a thing, but it's pointless, because not every workload needs constant fixed clocks every frame; as I've mentioned before, you don't need 2.23GHz on a GPU when looking at the sky. So what's the point of fixed clocks?

The Series X is having a hard time wasting constant power on fixed clocks; it's just common sense that traditional fixed clocks are an old tradition. Games aren't rendering a fixed budget; everything is variable, and you don't need 100% of the clocks, cores and VRAM in every frame or workload, which creates throttling problems. The Series X would have beaten the PS5, or become more efficient, if it variably clocked from 1.8GHz to 2GHz, but it's needlessly stuck at 1.8GHz because that was the heat target, and it never needed that, because it already keeps solid frame rates, just not as well as the PS5. If it variably clocked from 1.8 to 2GHz it would have kept steadier frame rates than the PS5. And remember, game engines are still rasterized, and clocks win in rasterization; that's why a faster 2.23GHz 10-teraflop GPU beats a 1.8GHz 12-teraflop GPU, because you're not solving astronomy and the big bang with a game. That's 10% of the time; 90% of the time a game is just trying to throw polygons, textures and pixels on screen. So the variable clocks help, because without them the PS5 might have been clocked lower and suffered, but it isn't, because the variable clocks let it hit 2.23GHz without overheating. It was an engineering challenge and Cerny solved it. And the games show it.
Stop, dude, he's discussing a non-existent, hypothetical 36CU GPU at 1.8GHz.

God, you are like the monarchjt of Sony fanboys.
 

Interfectum

Member
Bottom line... the PS5 is keeping up and, a lot of the time, staying ahead of the Series X across all of these ports. Months before release no one predicted this would happen, even after watching the Cerny video. He clearly engineered a fantastic console.

I think it'll be interesting to compare games made from the ground up for next gen in a couple of years. Will the RDNA2 / updated tools show its secret sauce on Series X, or will PS5 keep pace with its speedy I/O? :pie_thinking:
 

IntentionalPun

Ask me about my wife's perfect butthole
I find that interesting; so according to that AMD chart, VRS & SF work on different GPU architectures and are software/API features.


HVScojk.jpg
Someone took 2 slides from an AMD presentation and then added the sentence at the bottom.

On AMD's site they call VRS "hardware supported" only on RDNA 2 cards:

VRS is hardware supported on AMD RDNA 2 architecture graphics cards such as the AMD Radeon RX 6000 Series.


They don't mention hardware with Mesh Shaders, which has everyone wondering if it will exist outside of the 6X series... but DX12U certainly only supports them on RDNA2 and Turing (nVidia) cards.

And right now OpenGL and Vulkan only support mesh shaders on nVidia cards... they don't have any AMD version, not even for the 6x cards yet.

And it honestly doesn't make a ton of sense to think these all aren't features requiring specific hardware... it's not like Vulkan and OpenGL have any reason not to support advanced features on any GPU they can, and really at this point why would Microsoft even ignore cards for DX12U? Devs don't have to use DX12U and if DX12U could support all of these things on RDNA1, why wouldn't they?

They are all features that require APIs built around them because they modify how the graphics pipeline works... not because this is all "hardware independent" stuff that is largely done in software.
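For what it's worth, on the PC side these are exactly the kinds of capabilities an engine probes at runtime through DX12 Ultimate rather than features drivers can simply emulate everywhere. A minimal sketch of that check (Windows/D3D12 only, error handling trimmed); on hardware without the corresponding support the tiers come back as NOT_SUPPORTED:

```cpp
// Minimal DX12 Ultimate capability probe: VRS, mesh shader and sampler
// feedback tiers report NOT_SUPPORTED on hardware without the corresponding
// support. Windows-only sketch; link against d3d12.lib.
#include <windows.h>
#include <d3d12.h>
#include <cstdio>

int main() {
    ID3D12Device* device = nullptr;
    if (FAILED(D3D12CreateDevice(nullptr, D3D_FEATURE_LEVEL_12_0,
                                 IID_PPV_ARGS(&device)))) {
        std::puts("no D3D12 device");
        return 1;
    }

    D3D12_FEATURE_DATA_D3D12_OPTIONS6 opts6 = {};
    D3D12_FEATURE_DATA_D3D12_OPTIONS7 opts7 = {};
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS6, &opts6, sizeof(opts6));
    device->CheckFeatureSupport(D3D12_FEATURE_D3D12_OPTIONS7, &opts7, sizeof(opts7));

    std::printf("Variable rate shading tier: %d\n", (int)opts6.VariableShadingRateTier);
    std::printf("Mesh shader tier:           %d\n", (int)opts7.MeshShaderTier);
    std::printf("Sampler feedback tier:      %d\n", (int)opts7.SamplerFeedbackTier);

    device->Release();
    return 0;
}
```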
 
Stop, dude, he's discussing a non-existent, hypothetical 36CU GPU at 1.8GHz.

God, you are like the monarchjt of Sony fanboys.
As I said, I might have read him wrong, and you sound disturbed, jumping in and calling people fanboys just because they aren't fantasising along with your ideas. In fact, it takes a fanboy to know one, so you must be the mother of all Xbox fanboys. Likewise.
 