
Simulating Gonzalo (Rumoured NextGen/PS5 leak)

llien

Member
Welp, isn't 200W where the unbearably loud PS4 Pro is?
If so, no, thanks.

Between the 5700 and the XT looks like the most realistic bet.
Which is quite a bump vs earlier hopes of between Vega 56 and 64.
And of course there is the spec arms race, so, perhaps, closer to the XT.

And I'm not buying the not-enough-for-4K talk. It is about settings for the current gen; you CAN comfortably run most games at 4K even on a non-XT 5700.
And the multiplier to be applied to the abstract next gen is highly debatable; I'd argue that it is RT that will make the difference, potentially reducing raster load, as the raster tricks for shadows and reflections could be done by RT instead.
 
Last edited:

R600

Banned
Welp, isn't 200W where the unbearably loud PS4 Pro is?
If so, no, thanks.

Between the 5700 and the XT looks like the most realistic bet.
Which is quite a bump vs earlier hopes of between Vega 56 and 64.
And of course there is the spec arms race, so, perhaps, closer to the XT.

And I'm not buying the not-enough-for-4K talk. It is about settings for the current gen; you CAN comfortably run most games at 4K even on a non-XT 5700.
And the multiplier to be applied to the abstract next gen is highly debatable; I'd argue that it is RT that will make the difference, potentially reducing raster load, as the raster tricks for shadows and reflections could be done by RT instead.
The PS4 Pro is rarely over 160W. The Xbox One X can hit 190W, but rarely.
 

_sqn_

Member
Some god-tier topic you made here, kudos for that.







It's basically a stock GTX 1080, like people expected, with new stuff on it.

Honestly, a 20k total score ain't bad.
Nope, RTX 2070 (even a little above).
 
So I repeated the power testing over different frequencies and analyzed the power draw of the GPU die. For that I logged the die power reading at the highest possible sampling rate and averaged the power consumption over the duration of Graphics Test 1 in Fire Strike (GT1) for each datapoint. GT1 is, on average, the most power-hungry of the Fire Strike tests, I think; that's why I used it as a stress test.
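For anyone who wants to reproduce the averaging step, here is a minimal sketch of what it amounts to, assuming a hypothetical CSV log of timestamp/die-power samples (the column names and log format are placeholders, not what my logging tool actually emits):

```python
import csv

def average_die_power(log_path, t_start, t_end):
    """Average the logged GPU die power over one GT1 run.

    Assumes a CSV with 'timestamp' (seconds) and 'gpu_die_power' (watts)
    columns -- both names are placeholders for whatever the logger writes.
    """
    samples = []
    with open(log_path, newline="") as f:
        for row in csv.DictReader(f):
            t = float(row["timestamp"])
            if t_start <= t <= t_end:  # keep only samples inside the GT1 window
                samples.append(float(row["gpu_die_power"]))
    return sum(samples) / len(samples) if samples else float("nan")

# e.g. average_die_power("firestrike_log.csv", 0.0, 60.0)
```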

powerscalinggpuonly10tjsx.png


Mind that because I haven't done more runs on each datapoint, the likelihood of some deviation in the exact location of each datapoint is high, but that shouldn't harm the general trend too much (you have to choose your trade-offs when time is limited).

So we generally see the same exponential behaviour as with the wall-power chart, but in an even more pronounced manner. Going from a 1750MHz target clock to 2100MHz doubles the power draw of the GPU die.

When we overlay the wall power draw from last time, we can see something interesting:

powerscalinggpuonlyszjn3.png


The differential between both lines - representing the power requirements of every component besides the GPU die itself - is roughly constant over large parts of the spectrum. That means that on the lower-frequency side the non-GPU power weighs in somewhat disproportionately compared to the reference point at 1800MHz F_target.


So you have the data. Do what you want with it. I will have a write-up of my thoughts on the matter later on.
 
125W APU @ 1.8GHz .. good

That's 125W @ 1.75GHz real frequency, and that's just the GPU alone in my measurements. But you were probably right if converted to an APU: the CPU part is incredibly efficient at 3.2GHz (20-25W), and that's probably the power you save from redundant components and from data traveling less "distance".

I just noticed this older graph .. what's happening after 1.8? .. the power really takes off relatively .. do you have the voltages for each frequency?

You have to increase voltage disproportionately at higher frequencies, otherwise the GPU gets unstable. That's been the case in overclocking with every piece of silicon since forever.
Yeah, I noted the voltages for the different over- and underclocks somewhere. I will provide them with my write-up, after I have finalized my testing.
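The voltage point above is the whole story behind the runaway curve: dynamic power scales roughly with frequency times voltage squared, so once voltage has to climb, power climbs much faster than clocks. A toy illustration (the frequency/voltage pairs below are invented for the example, not the measured values):

```python
# Toy dynamic-power model: P_dyn is roughly proportional to f * V^2.
# The (frequency, voltage) pairs below are illustrative, NOT measured values.
K = 0.05  # arbitrary proportionality constant, for shape only

points = [(1500, 0.80), (1750, 0.90), (2000, 1.05), (2100, 1.15)]  # (MHz, V), hypothetical

for f_mhz, volts in points:
    power = K * f_mhz * volts ** 2
    print(f"{f_mhz} MHz @ {volts:.2f} V -> ~{power:.0f} (relative units)")
# Because voltage climbs with frequency, the 1750 -> 2100 step roughly doubles power.
```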
 
So I got a second sample card with which I repeated the testing (which makes the possibility that both cards are golden samples or rubbish very slim). The second GPU behaves very much the same. I've gone a little more aggressive on the undervolts and better controlled the voltage changes for the frequency steps. I didn't go as far as to hit the stability limit, though, to ensure that this is still somewhat representative of the vast amount of silicon that will be produced. But the power consumption of the die went down quite a bit, and the runaway power effect moved to a higher frequency.


Here are the results:

powerscalinggpuonly3pkch.png



You can see the improvement in the quality of the results if you compare them to the old ones:

powerscalinggpuonly232j3x.png




I just noticed this older graph .. what's happening after 1.8? .. the power really takes off relatively .. do you have the voltages for each frequency?

I'm not too happy with this graph anymore; it is kinda deceiving. Firstly, the Y-axis scaling is very unfortunate because it hides the differences. Secondly, at lower power levels the non-GPU power components start to dominate the whole picture, which hides the impact of GPU scaling.

I think this graph is much better now:

powerscalingfactor2ojoe.png


One can see more clearly now that the GPU power draw scales quite pleasantly below 1800MHz (F_target). If you go from 1800 to 1500, you can save 27% of GPU power for just 13% less performance.




*Please note that I added one more frequency step, at 2150MHz, in the recent graphs.
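To put that 1800 -> 1500 trade-off in perspective, a quick back-of-the-envelope check of the efficiency gain, using only the 27% power / 13% performance figures from the post:

```python
# Figures from the post: going from 1800MHz to 1500MHz F_target
power_saving = 0.27  # fraction of GPU die power saved
perf_loss = 0.13     # fraction of performance lost

perf_per_watt_gain = (1 - perf_loss) / (1 - power_saving) - 1
print(f"Performance per watt improves by ~{perf_per_watt_gain:.0%}")  # ~19%
```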
 
Last edited:

SlimySnake

Flashless at the Golden Globes
So I got a second sample card with which I repeated the testing (which makes the possibility that both cards are golden samples or rubbish very slim). The second GPU behaves very much the same. I've gone a little more aggressive on the undervolts and better controlled the voltage changes for the frequency steps. I didn't go as far as to hit the stability limit, though, to ensure that this is still somewhat representative of the vast amount of silicon that will be produced. But the power consumption of the die went down quite a bit, and the runaway power effect moved to a higher frequency.


Here are the results:

powerscalinggpuonly3pkch.png



You can see the improvement in the quality of the results if you compare them to the old ones:

powerscalinggpuonly232j3x.png




I'm not too happy with this graph anymore; it is kinda deceiving. Firstly, the Y-axis scaling is very unfortunate because it hides the differences. Secondly, at lower power levels the non-GPU power components start to dominate the whole picture, which hides the impact of GPU scaling.

I think this graph is much better now:

powerscalingfactor2ojoe.png


One can see more clearly now that the GPU power draw scales quite pleasantly below 1800MHz (F_target). If you go from 1800 to 1500, you can save 27% of GPU power for just 13% less performance.




*Please note that I added one more frequency step, at 2150MHz, in the recent graphs.
Thanks for all your hard work. This is extremely interesting.

Do you think we can have a 56 CU GPU in a console under 200W? Or even exactly 200W? I would love to see these tests again when AMD eventually releases the big Navi with 48-56 CUs. A 56 CU part clocked at 1.55GHz gets us to 11.1 TFLOPs, which matches a couple of Scarlett rumors. I wonder if adding 40% more CUs at 1.55GHz adds 40% to the die power. Do you happen to have the cut-down 36 CU 5700 card by any chance?
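For reference, the 11.1 TFLOPs figure falls straight out of the usual formula (CUs x 64 stream processors x 2 FLOPs per clock x frequency). A minimal sketch checking the numbers; the function name is just for illustration:

```python
def navi_tflops(cus, clock_ghz):
    """Peak FP32 throughput: CUs * 64 shaders * 2 FLOPs per clock * frequency."""
    return cus * 64 * 2 * clock_ghz / 1000  # result in TFLOPs

print(navi_tflops(56, 1.55))  # ~11.1 TFLOPs, the 56 CU case above
print(navi_tflops(40, 1.55))  # ~7.9 TFLOPs, a full 40 CU Navi10 at the same clock
```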

The latest Flute leak also shows a cut-down CPU with maybe only 8MB of L3 cache instead of the 32MB that comes with the 3700X. It performs like a 1700X as well. It's possible that Sony or MS or whoever is making the Gonzalo/Flute was able to make the CPU smaller, thus saving more in TDP.
 
Thanks for all your hard work. This is extremely interesting.

Do you think we can have a 56 CU GPU in a console under 200W? Or even exactly 200W? I would love to see these tests again when AMD eventually releases the big Navi with 48-56 CUs. A 56 CU part clocked at 1.55GHz gets us to 11.1 TFLOPs, which matches a couple of Scarlett rumors. I wonder if adding 40% more CUs at 1.55GHz adds 40% to the die power. Do you happen to have the cut-down 36 CU 5700 card by any chance?

The latest Flute leak also shows a cut-down CPU with maybe only 8MB of L3 cache instead of the 32MB that comes with the 3700X. It performs like a 1700X as well. It's possible that Sony or MS or whoever is making the Gonzalo/Flute was able to make the CPU smaller, thus saving more in TDP.

No, I don't have a 5700 non-XT.

Yeah, I think that would be possible, at least with very sophisticated cooling (that means a step up from your standard vapour chamber / X1X). The problem with this line of thought is that just because something is possible doesn't automatically make it real. The Flute leak is kinda symptomatic of this (I suppose with Flute you mean this one: https://pastebin.com/y8qXme7b). The problem with this is that, if you followed and understood the next-gen thread here, you can actually fabricate a very sophisticated-looking "leak". The Flute one is a perfect example of that. The figures look plausible, yet it's fake, as you can see from the stated GDDR6 bandwidth: with the stated 12 memory packages you would have a 384-bit bus, which would give you at least around 670GB/s of bandwidth and NOT 560GB/s. At this point I would disregard every leak till announcement.
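That bandwidth sanity check is easy to reproduce. A minimal sketch, assuming 32-bit-wide GDDR6 packages at 14Gbps per pin (the common parts at the time); the function name is just for illustration:

```python
def gddr6_bandwidth_gb_s(packages, gbps_per_pin=14, bits_per_package=32):
    """Peak bandwidth in GB/s for a GDDR6 configuration."""
    bus_width_bits = packages * bits_per_package
    return bus_width_bits * gbps_per_pin / 8

print(gddr6_bandwidth_gb_s(12))  # 384-bit bus -> 672 GB/s
print(gddr6_bandwidth_gb_s(10))  # a 320-bit bus is what would land at 560 GB/s
```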


I don't think it makes sense to try to save further power on the CPU side, because that's already nearly down to nothing.
 

SlimySnake

Flashless at the Golden Globes
No, I don't have a 5700 non-XT.

Yeah, I think that would be possible, at least with very sophisticated cooling (that means a step up from your standard vapour chamber / X1X). The problem with this line of thought is that just because something is possible doesn't automatically make it real. The Flute leak is kinda symptomatic of this (I suppose with Flute you mean this one: https://pastebin.com/y8qXme7b). The problem with this is that, if you followed and understood the next-gen thread here, you can actually fabricate a very sophisticated-looking "leak". The Flute one is a perfect example of that. The figures look plausible, yet it's fake, as you can see from the stated GDDR6 bandwidth: with the stated 12 memory packages you would have a 384-bit bus, which would give you at least around 670GB/s of bandwidth and NOT 560GB/s. At this point I would disregard every leak till announcement.


I don't think it makes sense to try to save further power on the CPU side, because that's already nearly down to nothing.
Lol no, not the pastebin. It's a benchmark that showed up and Komachi captured it.



The ID is 13F9; Gonzalo was 13F8. People think it could be the next revision of Gonzalo. The benchmark was quickly deleted, but it showed performance equivalent to a 1700X.

Here is a reply talking about how the cache was halved, if not quartered. That should make the CPU smaller and more power efficient, which could explain how Gonzalo can hit 1.8GHz.



I will try to find some screenshots of the benchmark.

Edit: Found it.

AMD-Flute-semicustom-APU-Zen-2-konzole-UserBenchmark-2.png


Edit #2: See the link below for a comparison with the Ryzen 1700X.

 
Last edited:

SlimySnake

Flashless at the Golden Globes
DemonCleaner What do you think of the latest explosive Oberon leak?

2.0GHz???

Your tests show over 150W for the GPU alone at 2.0GHz. Though I guess it would be 10% less if Sony is going with the 36 CU 5700. So 145W for the GPU alone, 20W for the CPU, 30W for RAM. That's already 195W before the SSD, UHD drive and other mobo stuff.
 

Hellgardia

Member
DemonCleaner What do you think of the latest explosive Oberon leak?

2.0GHz???

Your tests show over 150W for the GPU alone at 2.0GHz. Though I guess it would be 10% less if Sony is going with the 36 CU 5700. So 145W for the GPU alone, 20W for the CPU, 30W for RAM. That's already 195W before the SSD, UHD drive and other mobo stuff.

I think a lower-clocked / higher-CU-count chip is more likely than a higher-clocked / lower-CU one...
 

V4skunk

Banned
GPU overclocking is different from CPU overclocking nowadays. You can't just lock the clock frequency of a GPU. On recent AMD cards you can only define a frequency maximum; clocks will always auto-adjust based on environmental conditions (power limit, thermal limit, bandwidth constraints).



Yeah, I will definitely do something like that (it's basically the whole goal of this exercise to get a feeling for the power/perf sweet spot of RDNA). It's a lot of work and I'm still figuring out the best way to do it (I will probably step up the power limit in increments; it won't work to step up the frequency once you run into a power-limited scenario. What makes it even harder is that you have to find a stable voltage to complement the corresponding clock rate. Too many variables...). Hope I get there on the weekend, but no promises.



Sorry man. Maybe the image host isn't up to the task. Does anyone else have this problem?
If you use MSI Afterburner you have much more direct control of the GPU than using the AMD software.
 
DemonCleaner What do you think of the latest explosive Oberon leak?

2.0GHz???

Your tests show over 150W for the GPU alone at 2.0GHz. Though I guess it would be 10% less if Sony is going with the 36 CU 5700. So 145W for the GPU alone, 20W for the CPU, 30W for RAM. That's already 195W before the SSD, UHD drive and other mobo stuff.

Well, I wouldn't rule it out. Komachi has a good track record; he leaked the CU redesign for RDNA a few months in advance of the 5700 launch, for example.

What I would rule out is that it would still be on standard 7nm, for the aforementioned reasons. It must be TSMC N7P or 7nm EUV / 6nm in that case.

What speaks for the leak is that it has two legacy modes that imitate the PS4 and PS4 Pro clock frequencies. This fits in nicely with some Sony patents for BC we've seen earlier this year.


I think the leak doesn't rule out that Sony or MS would go broader than 36 CUs, though.


BTW: the 30W figure is not just the RAM. RAM alone would be ~18W in my example in post 2, at 24GB and a 384-bit bus. 16GB @ 256-bit (which I sincerely hope won't happen, because it would be a disaster) would be around 12W.
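For what it's worth, the ~18W and ~12W figures follow directly from a per-module estimate (roughly 1.5W per GDDR6 package, the same assumption used in the prognosis post); a minimal sketch:

```python
WATTS_PER_GDDR6_MODULE = 1.5  # rough per-package estimate used in this thread

print(12 * WATTS_PER_GDDR6_MODULE)  # 24GB on a 384-bit bus (12 x 2GB) -> ~18W
print(8 * WATTS_PER_GDDR6_MODULE)   # 16GB on a 256-bit bus (8 x 2GB)  -> ~12W
```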

If you use MSI Afterburner you have much more direct control of the GPU than using the AMD software.

I use Afterburner + RivaTuner Statistics Server for my frame-by-frame analysis on a daily basis. Sadly, I don't know what feature you mean; as far as I can see there are no additional GPU controls there compared to Wattman.
 
Last edited:

V4skunk

Banned
Well, I wouldn't rule it out. Komachi has a good track record; he leaked the CU redesign for RDNA a few months in advance of the 5700 launch, for example.

What I would rule out is that it would still be on standard 7nm, for the aforementioned reasons. It must be TSMC N7P or 7nm EUV / 6nm in that case.

What speaks for the leak is that it has two legacy modes that imitate the PS4 and PS4 Pro clock frequencies. This fits in nicely with some Sony patents for BC we've seen earlier this year.


I think the leak doesn't rule out that Sony or MS would go broader than 36 CUs, though.


BTW: the 30W figure is not just the RAM. RAM alone would be ~18W in my example in post 2, at 24GB and a 384-bit bus. 16GB @ 256-bit (which I sincerely hope won't happen, because it would be a disaster) would be around 12W.



I use Afterburner + RivaTuner Statistics Server for my frame-by-frame analysis on a daily basis. Sadly, I don't know what feature you mean; as far as I can see there are no additional GPU controls there compared to Wattman.
Wattman sucks compared to Afterburner. In Afterburner I can control and lock clock speeds, remove overclocking limits, etc... Wattman is really basic.
 
Wattman sucks compared to Afterburner. In Afterburner I can control and lock clock speeds, remove overclocking limits, etc... Wattman is really basic.

Sorry, that's plainly not true. You can't lock clocks on recent architectures with any software anymore. Wattman was basic when it was released; now it isn't. The GUI is different for every AMD arch, and for Navi10 there are many more changeable parameters than in Afterburner at the time of writing this.
 

Colbert

Banned
DemonCleaner
Nice analysis. I think you underestimate the power consumption, but this is as close as you can get testing from a PC. My concern would still be the cooling with a 3.2GHz CPU and a 1.8GHz GPU. I would rather think that, to increase the possible yield, we are looking at 1.6GHz for the GPU (+/-).

When it comes to estimating the die size, I used a similar but less detailed approach. You can look at it here:
https://bit.ly/2lNGR9e
 
Last edited:
DemonCleaner
Nice analysis. I think you underestimate the power consumption, but this is as close as you can get testing from a PC. My concern would still be the cooling with a 3.2GHz CPU and a 1.8GHz GPU. I would rather think that, to increase the possible yield, we are looking at 1.6GHz for the GPU (+/-).

When it comes to estimating the die size, I used a similar but less detailed approach. You can look at it here:
https://bit.ly/2lNGR9e

Thanks, and nice thread yourself! Yeah, I also think they would/should rather go lower clock and broader on the GPU side, as I wrote in my summary in post 2. This is provided they'd go with plain 7nm in the end. Considering the state of 7nm EUV/6nm, there is the possibility that it was the plan to go with that all along and that they therefore chose those relatively high clock speeds. After testing Zen 2, I can very confidently say that 3.2GHz for the CPU, even on 7nm, really isn't a problem.

On being over-optimistic about the power consumption: I don't think so. Granted, I did choose relatively low voltages, but that's the only method to do a power-over-frequency analysis (that I know of). So you might have to use slightly higher voltages for each frequency point in a console to hit your required yields. On the other hand, I used a lot of worst-case assumptions in the whole process, which should compensate for that. Again, please read post 2, or better yet my post history in this thread, concerning that.

What I really think some people in those next-gen threads are criminally underestimating is how hard 250W TDP (not PSU power) is to cool in a console-sized system. That will not happen. Even a 200W TDP console would require them to jump through quite a lot of hoops and might be overly optimistic.
 

Proelite

Member
Any thoughts on 2GHz?

N7P theoretically should give a 5-7% perf boost for free. Maybe that's why the 1.8GHz became 2GHz?
 
Any thoughts on 2GHz?

N7P theoretically should give a 5-7% perf boost for free. Maybe that's why the 1.8GHz became 2GHz?


Well, I think 2GHz on plain 7nm in a console environment is really stretching it...

N7P (DUV-based) could be borderline possible:



^ That's N7 vs. N7P. Strangely, the gap seems to widen somewhat at higher clocks. The advertised speed improvements at the same power are usually only seen in relatively simple logic; I doubt we would see the full 7% in something as complex as a 300-400mm² die. The pro: you could just produce those with existing capacities.

So if AMD isn't further reworking its hardware pipeline / design philosophies to improve clock speeds even more [which I think is unlikely at this point in time], that leaves us with (EUV-based) N7+. On that front you have to ask whether TSMC would be able to provide a sufficient amount of wafers for one or more mass-market devices.
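As a rough sanity check on the "free" N7P uplift: applying the quoted 5-7% to Gonzalo's earlier 1.8GHz GPU clock (back-of-the-envelope only, not a claim about the actual silicon):

```python
base_clock_ghz = 1.8  # the earlier Gonzalo GPU clock

for uplift in (0.05, 0.07):
    print(f"+{uplift:.0%} -> {base_clock_ghz * (1 + uplift):.2f} GHz")
# Prints 1.89 and 1.93 GHz, i.e. process gains alone don't quite reach a flat 2.0 GHz.
```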
 
Last edited:
Power Prognosis:



powerbalancenbknz.png


So we learned from the testing that if we clock Navi down slightly, we improve power efficiency disproportionately. At F_t = 1500MHz the Navi10 GPU die consumed just 87W. Let's round that up to 90W to ensure our underclock would be viable for a wider range of silicon quality.

So, 90W for 40 active CUs. What would happen if we scale that up to a 52 CU console APU as shown above?

OK, let's just assume the power at the same frequency scales linearly. That would be something of a worst-case scenario, though, because the ratio of front-end components to CUs would not be constant in the higher-CU GPU, meaning that those parts would contribute less to the total power requirements of the die.

So linear scaling to 52 CUs would bring us to 117W of die power. For the CPU side we just take the 24W we derived in the previous chapter.
That said, there are some redundant components in those GPU and CPU figures that wouldn't be present twice in an APU (memory controllers, for example). So from that perspective, this is yet again worst case.

Following our method from the Interpretation chapter, that would give us the following:

GPU: 117W
CPU: 24W
RAM: 12 x 1.5W = 18W (12 x 2GB GDDR6 modules)
PCB/AUX: 15W
PSU losses: 31W

wallpower0qkpn.png
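For completeness, a minimal sketch that reproduces the prognosis above in one place. The only derived assumption is the PSU efficiency, which the stated 31W of losses puts at roughly 85%:

```python
# Worst-case console power prognosis, reusing the figures from this post.
gpu_die = 90 * 52 / 40        # 90W at 40 CUs, scaled linearly to 52 CUs -> ~117W
cpu = 24                      # W, Zen 2 @ 3.2GHz from the earlier chapter
ram = 12 * 1.5                # W, 12 x 2GB GDDR6 modules
pcb_aux = 15                  # W, board power, fans, storage, etc.

dc_power = gpu_die + cpu + ram + pcb_aux     # ~174W delivered by the PSU
psu_efficiency = 0.85                        # implied by the stated 31W of PSU losses
wall_power = dc_power / psu_efficiency       # ~205W drawn at the wall

print(f"DC: ~{dc_power:.0f} W, wall: ~{wall_power:.0f} W")
```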



Good work, me of the past.
 