
Next-Gen PS5 & XSX |OT| Console tEch threaD


DaMonsta

Member
You're not saying what you think yourself. I don't care about outside sources regarding this thing. I care about what people on GAF are saying when I'm on GAF.

Just like @IntentionalPun, who just got banned from this thread: he didn't express his stance or understanding either, and just tried to pick apart my posts.

Again, are you saying XSX has variable clocks or fixed clocks? I don't understand why you are having such difficulty saying what you think on this particular issue.
The hell are you talking about man, lol.

Your stance was that Microsoft was to blame for the confusion over the PS5's variable clocks because they advertised the Xbox clocks as fixed.

Now you are talking about some whole other stuff. I don't even know what to say at this point, so whatever man, believe what you want to believe.
 
What I understand by fixed versus variable is the following.
MS will have power levels (PLs) fixed in hardware, so PL0 is when the console is doing absolutely nothing and clocks will be substantially lower, fixed at some value like (this is just hypothetical) 300MHz or 400MHz. At this level it goes to the store, opens a website, pretty much basic stuff.
PL1 is something like 1000MHz for side-scrollers, 2D games and the like; it's for games that are not pushing the hardware at all.
And finally PL2 is 1825MHz for all other games.
All these frequencies are fixed at a value.

For Sony there will be an envelope for these power levels, like PL0 is between 300-400MHz, PL1 is 900-1000MHz and PL2 is 2130-2230MHz, and the clock will vary in MHz between the bottom and top of the envelope while staying in the power target. Staying in the power target is the guarantee with Sony's setup; what varies is the frequency.

Although I like to remind everyone that all hardware on the market has 'race to idle' set in its profile, and these switches can happen in nanoseconds (the switching itself), sometimes staying at the switched level for only a few nanoseconds before dropping back to a previous power level. People with GPU-Z can confirm this by watching the core clock frequency while browsing the internet, watching YouTube, watching highly compressed HEVC, playing a basic 2D game, playing a 3D side-scroller, or playing a game that pushes the hardware to its limits. All have different power targets and PL levels, and the clock actually varies a lot even at a high PL, as a PL is generally divided into smaller sub-PLs, just like a target envelope in the PC space.

From the confusion here I think no one has an enthusiast-grade PC or has tried OCing their hardware, because if you had, you would know about the PLs defined in GPU BIOSes and how you can change the behavior of the card, even down to defining new fan speeds for an OCed PL, right down in the nitty-gritty part of the hardware.

Actually, I remember overwriting hex bytes in a BIOS and unlocking a gimped Nvidia card to its full potential when the manufacturer had disabled SMs that were actually working, and also bricking an AMD card trying to do the same, sending it to RMA with no hopes but receiving a new card all the same, then doing it again successfully; that's how I turned my 290 into a 290X.
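To make the fixed-versus-variable distinction concrete, here is a tiny Python sketch (purely illustrative; the PL values are the hypothetical numbers above, not real firmware tables for either console):

Code:
# Illustrative model only: the PL values are the hypothetical numbers from the
# post above, not real firmware tables for either console.

FIXED_PLS = {          # "XSX-style": each power level maps to one fixed clock (MHz)
    "PL0": 400,
    "PL1": 1000,
    "PL2": 1825,
}

ENVELOPE_PLS = {       # "PS5-style": each power level is a (min, max) clock envelope (MHz)
    "PL0": (300, 400),
    "PL1": (900, 1000),
    "PL2": (2130, 2230),
}

def fixed_clock(pl: str) -> int:
    """Fixed model: a given PL always returns the same clock."""
    return FIXED_PLS[pl]

def envelope_clock(pl: str, load: float) -> float:
    """Envelope model: the clock slides within the PL's range to stay inside
    the power target; 'load' is a 0..1 proxy for how power-hungry the workload is."""
    lo, hi = ENVELOPE_PLS[pl]
    return hi - (hi - lo) * load   # heavier load -> lower clock, same power budget

print(fixed_clock("PL2"))          # 1825, no matter the workload
print(envelope_clock("PL2", 0.0))  # 2230.0 (light workload)
print(envelope_clock("PL2", 1.0))  # 2130.0 (worst-case workload)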
 
[vomit reaction GIF]


Imagine 10 wankers liking this sh*tpost. Good lord, console warriors are borderline pathologically disturbed.
Your response (to be quite honest) is shit. Imagine labeling others wankers because they liked another member's post.
 
How are they getting to that number? Each GDDR6 chip provides 56 GB/s of bandwidth, right? So if the CPU only needs 1 GB of data, and it's on one chip, then 48 GB/s of CPU access could be provided through that one chip, right?

Unless I have that wrong, of course. That said, I thought about the possibility that the slower memory bandwidth can be allocated dynamically, i.e. if non-GPU processors like the CPU only need a small slice of the 336 GB/s total, the system just gives as much as needed. For example, if the CPU only needs 112 GB/s of bandwidth, just two of the 2 GB chips are tapped, while the others stay utilized by the GPU, including the 1 GB chips that are not part of the slower bandwidth pool.

I figure that should be possible; there's no reason to lock six chips away from the GPU for slower bandwidth on a given set of cycles when there could be many instances where the CPU only needs one or two chips, i.e. a very small amount of that total slower bandwidth figure. We need more info on how the memory setup works, but I picture that type of dynamic allocation to the slower pool helping somewhat with contention issues.

And honestly, it's a bit of an assumed conclusion; I've seen some people elsewhere seemingly think that when the non-GPU processors access memory, it "locks up" the six 2 GB chips altogether from GPU access regardless of how much memory and bandwidth those other processors need. That sounds like a dumb design oversight IMHO; it would make more sense that those chips can supply up to 336 GB/s through the 1 GB portions of the six 2 GB chips, not that the chips are locked up as some sort of mode setting regardless of how much is being requested.
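As a back-of-envelope check on that idea, here's a quick sketch (56 GB/s per chip and the 10-chip layout come from the published specs; the notion that the slow allocation could shrink to one or two chips is just the assumption being floated here, nothing MS has confirmed):

Code:
import math

# Back-of-envelope only: 56 GB/s per GDDR6 chip, 10 chips total on XSX.
# Whether the "slow" allocation can shrink to one or two chips is an assumption.
PER_CHIP_BW = 56            # GB/s per GDDR6 chip
TOTAL_CHIPS = 10
SLOW_POOL_CHIPS = 6         # the six 2GB chips whose upper halves form the slower pool

print(PER_CHIP_BW * TOTAL_CHIPS)       # 560 GB/s -> the advertised fast-pool peak
print(PER_CHIP_BW * SLOW_POOL_CHIPS)   # 336 GB/s -> the advertised slow-pool peak

def chips_needed(cpu_bw_gbs: float) -> int:
    """How many chips a CPU demand of cpu_bw_gbs GB/s would occupy
    if the slow pool could be tapped chip by chip (hypothetical)."""
    return math.ceil(cpu_bw_gbs / PER_CHIP_BW)

print(chips_needed(48))    # 1: one chip could in principle cover a 48 GB/s CPU demand
print(chips_needed(112))   # 2: two chips for 112 GB/s, leaving eight for the GPU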

As far as any upgrades, well if we don't hear anything about delays by sometime in June we can assume the systems are set for release this year. If anything gets announced for delays, it'll be before July IMHO. Just a gut feeling (plus usually that's about the time full production on the systems at mass scale begins I believe, for fall launches).

This description is exactly correct. Thank you for posting this.
 
My argument is about the GPU's point of view on memory bandwidth. My reason for the 168 GB/s figure is to directly counter Lady Gaia's 48 GB/s CPU argument.

My argument is based on GDDR6 having two channels per chip: with either the odd or the even channels of the 6GB memory range in use, the GPU can still access the 10GB address range while the CPU/DSP/file IO/residual GPU work is busy with the 3.5GB address range.

This is exactly correct. The XSX GPU can always access the lower 1GB of all ten memory chips at 56GB/s each. Always. There is zero contention in that pathway, as it uses the 320-bit bus (32-bit x 10 lanes).

The CPU also has access to both the 320-bit bus and the 192-bit (32-bit x 6 lanes) bus to reach the top 1GB of the six 2GB chips. The CPU can access both buses, while the GPU can only see 10GB, as that's the only bus it's attached to.

OS, audio and other slow memory needs will optimally sit on the 192-bit bus, in the six 1GB upper portions allocated to slower memory (56GB/s x 6 lanes = 336GB/s).
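For anyone who wants to sanity-check those bus figures, the arithmetic is just this (a sketch assuming 14 Gbps GDDR6 on a 32-bit interface per chip, per the published XSX specs):

Code:
# XSX memory bus arithmetic from the published figures:
# 14 Gbps GDDR6, 32-bit interface per chip, 10 chips.
GBPS_PER_PIN = 14
BITS_PER_CHIP = 32

def bw_gbs(chips: int) -> float:
    """Peak bandwidth in GB/s for a group of chips."""
    return chips * BITS_PER_CHIP * GBPS_PER_PIN / 8

print(BITS_PER_CHIP * 10)   # 320-bit bus across all ten chips
print(BITS_PER_CHIP * 6)    # 192-bit slice across the six 2GB chips
print(bw_gbs(1))            # 56.0 GB/s per chip
print(bw_gbs(10))           # 560.0 GB/s -> the 10GB "GPU optimal" pool
print(bw_gbs(6))            # 336.0 GB/s -> the 6GB "standard" pool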
 

Shmunter

Member
If you REALLY had looked at my post history, you would have known I always wanted both consoles to be the same; I was even OK with MS having a slight edge (which turned out to be true). It's Xbox fanboys who try to push this narrative that there's a huge gap between the SX and PS5, when they just have different approaches to reach their goals.
There is a huge gap. But it’s not in computing power, it’s in the memory subsystem.
 

Gavin Stevens

Formerly 'o'dium'
If you have two options:

A) you can eat every day for a week.

B) you can eat only five days out of a week, maybe four.

Which is the better option?

Oh and btw, option A had better meals, with better quality meat. But gets to you a little slower each day.

Tough one huh.
 
I think MS decided to emphasize that their clocks were fixed because they somehow got wind that the PS5's were not. Variable is the standard elsewhere, but it is still very different in the console space, where a piece of hardware cannot expect piecemeal upgrades over the next 5-7 years. This is a huge change for the console market and one I welcome. To everybody saying that the XSX is more PC-like, I completely disagree. I think the PS5 is the one taking a massive step towards PC while also innovating with the SSD in a way even PCs haven't seen yet.
PCs don't have one pool of memory: they have one CPU pool of slower DDR4 and one GPU pool of faster GDDR6.
 
There is a huge gap. But it’s not in computing power, it’s in the memory subsystem.

The memory subsystem gap is vast if you include the DirectStorage element, where the XSX has direct and instant access to 100GB of game data with no need to stream. Although something similar was used before on creator systems with the Radeon Pro SSG, we don't know how that would operate in a game setting.

As far as compute power goes, if the XSX keeps the same TMU and ROP counts as what we expect for the PS5 (144 and 64 respectively), then the PS5 will actually have the texture and pixel advantage at 2.23GHz. I would not expect MS to hamper their design by increasing their CU count to 52 without increasing TMU and ROP counts accordingly. (I don't think TMUs can be separated from CUs in the AMD architecture, can they?)

That said, since we expect that the XSX is really a 56 CU part with 4 CUs disabled, we can look for similarly higher base counts with disabled units in the shader, TMU and ROP counts as well. A comparison of *potential* compute power would look like this:

System     Clock      CUs (base/active)  Shaders (base/active)  TMUs (base/active)  Texel rate                     ROPs  Pixel rate
PS5        2.233GHz   40/36              2560/2304              160/144             144 x 2.233 = 321.6 Gtexels/s  64    64 x 2.233 = 142.9 Gpixels/s
XSX (nom)  1.825GHz   56/52              3584/3328              160/144             144 x 1.825 = 262.8 Gtexels/s  64    64 x 1.825 = 116.8 Gpixels/s

However, I think those TMU and ROP counts don't match the CU count in the way we know AMD designs their cards. The best guess as to the graphics pipeline of the XSX GPU comes from here:

XSX (exp)  1.825GHz   56/52              3584/3328              224/208             208 x 1.825 = 379.6 Gtexels/s  80    80 x 1.825 = 146.0 Gpixels/s

So the jump to 2.233GHz really DOES help the PS5 close the power gap considerably, if those numbers can be effectively maintained in a game scenario.
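For anyone who wants to reproduce the fill-rate numbers in that table, the arithmetic is spelled out below (the PS5 TMU/ROP counts and the XSX "expected" configuration are the same guesses as in the table, not confirmed specs):

Code:
# Fill-rate arithmetic behind the table above. TMU/ROP counts for the PS5 and
# the "expected" XSX configuration are guesses, not confirmed by Sony or MS.

def gtexels(tmus: int, clock_ghz: float) -> float:
    return tmus * clock_ghz       # Gtexels/s = active TMUs x clock

def gpixels(rops: int, clock_ghz: float) -> float:
    return rops * clock_ghz       # Gpixels/s = ROPs x clock

print(gtexels(144, 2.233), gpixels(64, 2.233))   # PS5:       ~321.6 GT/s, ~142.9 GP/s
print(gtexels(144, 1.825), gpixels(64, 1.825))   # XSX (nom): ~262.8 GT/s, ~116.8 GP/s
print(gtexels(208, 1.825), gpixels(80, 1.825))   # XSX (exp): ~379.6 GT/s, ~146.0 GP/s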
 

rnlval

Member
This is exactly correct. The XSX GPU can always access the lower 1GB of all ten memory chips at 56GB/s each. Always. There is zero contention in that pathway, as it uses the 320-bit bus (32-bit x 10 lanes).

The CPU also has access to both the 320-bit bus and the 192-bit (32-bit x 6 lanes) bus to reach the top 1GB of the six 2GB chips. The CPU can access both buses, while the GPU can only see 10GB, as that's the only bus it's attached to.

OS, audio and other slow memory needs will optimally sit on the 192-bit bus, in the six 1GB upper portions allocated to slower memory (56GB/s x 6 lanes = 336GB/s).
Also, each data element has associated address data.

Alternative XSX memory layout




[diagram: glasses-and-straws illustration of the XSX memory layout]





Blue fluid, odd 16-bit straws: 168 GB/s + 112 GB/s (two blue glasses) = 280 GB/s for an 8GB slice

Green fluid, even 16-bit straws: 168 GB/s + 112 GB/s (two green glasses) = 280 GB/s for an 8GB slice

Total: 560 GB/s
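Adding up the "straws" in that picture (a quick sketch; 28 GB/s per 16-bit channel is just half of the 56 GB/s per-chip figure, and the odd/even channel split across pools is this diagram's own assumption, not a confirmed XSX detail):

Code:
# Per-channel arithmetic behind the glasses-and-straws picture. The odd/even
# channel split across the two slices is the diagram's assumption, not a spec.
CHANNEL_BW = 56 / 2    # 28 GB/s per 16-bit GDDR6 channel (two channels per chip)

blue  = 6 * CHANNEL_BW + 2 * 2 * CHANNEL_BW   # six single straws + two full glasses
green = 6 * CHANNEL_BW + 2 * 2 * CHANNEL_BW   # mirrored on the even channels

print(blue)          # 280.0 GB/s over one 8GB slice
print(green)         # 280.0 GB/s over the other 8GB slice
print(blue + green)  # 560.0 GB/s total, matching the headline figure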
 

Shmunter

Member
The memory subsystem gap is vast if you include the DirectStorage element, where the XSX has direct and instant access to 100GB of game data with no need to stream. Although something similar was used before on creator systems with the Radeon Pro SSG, we don't know how that would operate in a game setting.

As far as compute power goes, if the XSX keeps the same TMU and ROP counts as what we expect for the PS5 (144 and 64 respectively), then the PS5 will actually have the texture and pixel advantage at 2.23GHz. I would not expect MS to hamper their design by increasing their CU count to 52 without increasing TMU and ROP counts accordingly. (I don't think TMUs can be separated from CUs in the AMD architecture, can they?)

That said, since we expect that the XSX is really a 56 CU part with 4 CUs disabled, we can look for similarly higher base counts with disabled units in the shader, TMU and ROP counts as well. A comparison of *potential* compute power would look like this:

System     Clock      CUs (base/active)  Shaders (base/active)  TMUs (base/active)  Texel rate                     ROPs  Pixel rate
PS5        2.233GHz   40/36              2560/2304              160/144             144 x 2.233 = 321.6 Gtexels/s  64    64 x 2.233 = 142.9 Gpixels/s
XSX (nom)  1.825GHz   56/52              3584/3328              160/144             144 x 1.825 = 262.8 Gtexels/s  64    64 x 1.825 = 116.8 Gpixels/s

However, I think those TMU and ROP counts don't match the CU count in the way we know AMD designs their cards. The best guess as to the graphics pipeline of the XSX GPU comes from here:

XSX (exp)  1.825GHz   56/52              3584/3328              224/208             208 x 1.825 = 379.6 Gtexels/s  80    80 x 1.825 = 146.0 Gpixels/s

So the jump to 2.233GHz really DOES help the PS5 close the power gap considerably, if those numbers can be effectively maintained in a game scenario.
Agreed. Anything is possible and I'm totally comfortable with it. But what's clear to me is that, at these levels, it's a fool's errand to be chasing pixel-pushing potential. Neither is lacking and both will deliver. A few extra pixels here or there do not a change make.

What's truly new and opens fresh opportunities is access to massive data at unprecedented speeds. Data that was locked behind suboptimal hard-drive delivery is no longer a barrier. This is what defines the next gen for development, and the inherent positive results for the end user. It's pointless to discredit the reality of the matter, but hey, it's entertaining for the most part, very entertaining indeed.
 
Also, each data element has associated address data.

Alternative XSX memory layout




[diagram: glasses-and-straws illustration of the XSX memory layout]





Blue fluid, odd 16-bit straws: 168 GB/s + 112 GB/s (two blue glasses) = 280 GB/s for an 8GB slice

Green fluid, even 16-bit straws: 168 GB/s + 112 GB/s (two green glasses) = 280 GB/s for an 8GB slice

Total: 560 GB/s


I like this diagram. Would it be more accurate to have it so that glasses 5 and 7 are green AND that there is one straw in each glass that can ONLY drink from the green (GPU RAM) fluid?

I don't know the answer, but the literature supposes that the GPU can only use 10 of the 20 straws and can only "see", and therefore "drink", from the bottom 1GB of each glass. I'm very interested in this representation and I have never seen it illustrated in this way. Thank you.
 
Agreed. Anything is possible and I'm totally comfortable with it. But what's clear to me is that, at these levels, it's a fool's errand to be chasing pixel-pushing potential. Neither is lacking and both will deliver. A few extra pixels here or there do not a change make.

What's truly new and opens fresh opportunities is access to massive data at unprecedented speeds. Data that was locked behind suboptimal hard-drive delivery is no longer a barrier. This is what defines the next gen for development, and the inherent positive results for the end user. It's pointless to discredit the reality of the matter, but hey, it's entertaining for the most part, very entertaining indeed.

Indeed, the last generation that was this architecturally interesting was the PS3/X360: Cell vs. eDRAM. I'm learning a lot as well.

It will be great to see what tradeoffs developers value and how that impacts their visions for their games. First- and second-party devs are stuck with their hardware, and I'm sure they will achieve great things for each system. I'm hoping that third-party devs don't chicken out, and instead commission separate teams to maximize the capabilities of each system, so that we can do accurate and entertaining head-to-head comparisons of titles.
 

rnlval

Member
I like this diagram. Would it be more accurate to have it so that glasses 5 and 7 are green AND that there is one straw in each glass that can ONLY drink from the green (GPU RAM) fluid?

I don't know the answer, but the literature supposes that the GPU can only use 10 of the 20 straws and can only "see", and therefore "drink", from the bottom 1GB of each glass. I'm very interested in this representation and I have never seen it illustrated in this way. Thank you.
Each GDDR6 chip has two 16-bit straws (channels), which enables a full-duplex read/write pattern per chip.

It depends on how MS slices the physical memory addresses and maps them to virtual memory addresses. A static hardware design is nice for this trickery.
 
Each GDDR6 chip has two 16-bit straws (channels), which enables a full-duplex read/write pattern per chip.

It depends on how MS slices the physical memory addresses and maps them to virtual memory addresses. A static hardware design is nice for this trickery.

Yes indeed 32 bit width. My oversight.
 

rnlval

Member
Yes indeed 32 bit width. My oversight.
GDDR6's dual-channel design improves random access handling, which arrives just in time for large-scale "Fusion" APUs such as the PS5 and XSX, and for GPGPU server workloads.

GDDR5/GDDR5X was designed for yesteryear's GPU workloads. Don't worry about last-gen GPUs.
 
The memory subsystem gap is vast if you include the DirectStorage element, where the XSX has direct and instant access to 100GB of game data with no need to stream. Although something similar was used before on creator systems with the Radeon Pro SSG, we don't know how that would operate in a game setting.

As far as compute power goes, if the XSX keeps the same TMU and ROP counts as what we expect for the PS5 (144 and 64 respectively), then the PS5 will actually have the texture and pixel advantage at 2.23GHz. I would not expect MS to hamper their design by increasing their CU count to 52 without increasing TMU and ROP counts accordingly. (I don't think TMUs can be separated from CUs in the AMD architecture, can they?)

That said, since we expect that the XSX is really a 56 CU part with 4 CUs disabled, we can look for similarly higher base counts with disabled units in the shader, TMU and ROP counts as well. A comparison of *potential* compute power would look like this:

System     Clock      CUs (base/active)  Shaders (base/active)  TMUs (base/active)  Texel rate                     ROPs  Pixel rate
PS5        2.233GHz   40/36              2560/2304              160/144             144 x 2.233 = 321.6 Gtexels/s  64    64 x 2.233 = 142.9 Gpixels/s
XSX (nom)  1.825GHz   56/52              3584/3328              160/144             144 x 1.825 = 262.8 Gtexels/s  64    64 x 1.825 = 116.8 Gpixels/s

However, I think those TMU and ROP counts don't match the CU count in the way we know AMD designs their cards. The best guess as to the graphics pipeline of the XSX GPU comes from here:

XSX (exp)  1.825GHz   56/52              3584/3328              224/208             208 x 1.825 = 379.6 Gtexels/s  80    80 x 1.825 = 146.0 Gpixels/s

So the jump to 2.233GHz really DOES help the PS5 close the power gap considerably, if those numbers can be effectively maintained in a game scenario.
Huh. If it's the second case, both are really, really close; how interesting. PS5 = 86% and 98% of XSX respectively. That's good enough IMO.
 

SonGoku

Member
psorcerer, are you a dev? I'd like clarification on the XSX memory configuration.
From what I've read online, both pools can't be accessed simultaneously; access to either pool must switch on a cycle-by-cycle basis because the bus is saturated. As a result, whenever the slower pool is accessed by the CPU, the average bandwidth available to the GPU is lower due to wasted cycles.

To get around this limitation, devs would presumably use the 10GB pool for VRAM and system RAM, delegating the extra 3.5GB as a low-priority cache (assets, textures, etc.).
 

3liteDragon

Member
Engadget writes their own articles.

Your claim was that Microsoft purposely caused confusion by advertising that their clocks are fixed.

I ask one more time: where has Microsoft advertised the Series X clocks as "fixed"?


No I’m not playing the dumb game with you.

Post where they advertised it or admit you pulled that from your ass.

For the CPU (timestamped):



For the GPU AGAIN (timestamped):



They mentioned it SEVERAL times, like Richard said, during their visit to Microsoft HQ; this was posted two days before "The Road to PS5" event was live-streamed on Wednesday, March 18th. Meaning Microsoft knew beforehand that Sony were going with boost clocks, and they wanted to make sure everyone knew that their clocks are always LOCKED at the exact same frequency at all times.

Another popular tech YouTuber, Austin Evans, also visited Microsoft HQ and posted a video that very same day covering all of the Series X's specs (timestamped).



He also mentions that the CPU runs at "sustained" clock speeds, and he EVEN goes on to say "and that's not some kind of BOOST speed or anything, it can run at 3.8 GHz SUSTAINED, pretty much forever." It's almost as if he was told to say that, rather than choosing to mention it out of nowhere, like people don't know that or something.

When he's talking about the GPU's specs (timestamped):



Even on the graphic, it's mentioned that the GPU clock is "sustained", as in the clock speed is locked and will always remain the same.
 

Neo_game

Member
A quote from Liabe Brave of era regarding the memory of XSX:


Don't kill me I am just quoting

Yes, I think it is true, as I have read other analyses saying the same thing. So as soon as games start using more than 10GB of RAM, the bandwidth of the two consoles will start converging.

BTW, who is Liabe Brave?
 

nosseman

Member
What I understand by fixed versus variable is the following.
MS will have power levels (PLs) fixed in hardware, so PL0 is when the console is doing absolutely nothing and clocks will be substantially lower, fixed at some value like (this is just hypothetical) 300MHz or 400MHz. At this level it goes to the store, opens a website, pretty much basic stuff.
PL1 is something like 1000MHz for side-scrollers, 2D games and the like; it's for games that are not pushing the hardware at all.
And finally PL2 is 1825MHz for all other games.
All these frequencies are fixed at a value.

For Sony there will be an envelope for these power levels, like PL0 is between 300-400MHz, PL1 is 900-1000MHz and PL2 is 2130-2230MHz, and the clock will vary in MHz between the bottom and top of the envelope while staying in the power target. Staying in the power target is the guarantee with Sony's setup; what varies is the frequency.

Although I like to remind everyone that all hardware on the market has 'race to idle' set in its profile, and these switches can happen in nanoseconds (the switching itself), sometimes staying at the switched level for only a few nanoseconds before dropping back to a previous power level. People with GPU-Z can confirm this by watching the core clock frequency while browsing the internet, watching YouTube, watching highly compressed HEVC, playing a basic 2D game, playing a 3D side-scroller, or playing a game that pushes the hardware to its limits. All have different power targets and PL levels, and the clock actually varies a lot even at a high PL, as a PL is generally divided into smaller sub-PLs, just like a target envelope in the PC space.

From the confusion here I think no one has an enthusiast-grade PC or has tried OCing their hardware, because if you had, you would know about the PLs defined in GPU BIOSes and how you can change the behavior of the card, even down to defining new fan speeds for an OCed PL, right down in the nitty-gritty part of the hardware.

Actually, I remember overwriting hex bytes in a BIOS and unlocking a gimped Nvidia card to its full potential when the manufacturer had disabled SMs that were actually working, and also bricking an AMD card trying to do the same, sending it to RMA with no hopes but receiving a new card all the same, then doing it again successfully; that's how I turned my 290 into a 290X.

A CPU and/or GPU can "idle" at locked clocks and still use significantly less power than on max load.
 
Yes, I think it is true, as I have read other analyses saying the same thing. So as soon as games start using more than 10GB of RAM, the bandwidth of the two consoles will start converging.

BTW, who is Liabe Brave?
He's a ResetEra member. The guy is technically very good and doesn't get into console warfare, so he's a good person to read.
 

FeiRR

Banned
They mentioned it SEVERAL times, like Richard said, during their visit to Microsoft HQ; this was posted two days before "The Road to PS5" event was live-streamed on Wednesday, March 18th. Meaning Microsoft knew beforehand that Sony were going with boost clocks, and they wanted to make sure everyone knew that their clocks are always LOCKED at the exact same frequency at all times.

Another popular tech YouTuber, Austin Evans, also visited Microsoft HQ and posted a video that very same day covering all of the Series X's specs (timestamped).
While watching both videos (DF's and the other YouTuber's), I noticed how much of the wording they used was almost identical, which means they had been told exactly what to say and how.
 

mitchman

Gold Member
The memory subsystem gap is vast if you include the DirectStorage element, where the XSX has direct and instant access to 100GB of game data with no need to stream.
You're trying to make it sound like a memory-mapped filesystem or memory-mapped files are somehow a new and revolutionary thing. They're not. All OSes have supported them for decades, including Sony's version of FreeBSD used on the PS4 and PS5, and they're widely used by many applications billions of people use daily on their computers. The only new and interesting thing here is if they somehow manage to compete with the DMA access and the GPU cache scrubbers that the PS5 has; then they would be closer in performance.
 

Panajev2001a

GAF's Pleasant Genius
You're trying to make it sound like a memory-mapped filesystem or memory-mapped files are somehow a new and revolutionary thing. They're not. All OSes have supported them for decades, including Sony's version of FreeBSD used on the PS4 and PS5, and they're widely used by many applications billions of people use daily on their computers. The only new and interesting thing here is if they somehow manage to compete with the DMA access and the GPU cache scrubbers that the PS5 has; then they would be closer in performance.

They are competing with them. Maybe with a bit fewer resources dedicated to it, lower bandwidth and fewer HW accelerators, but their point about including Zen 2 cores' worth of logic to accelerate I/O access and extract the performance the SSD makes available (the lack of which, plus backwards-compatibility concerns, is why you are seeing DirectStorage premiering on Xbox rather than on Windows 10/10X) is really the same.
 

psorcerer

Banned
psorcerer, are you a dev? I'd like clarification on the XSX memory configuration.
From what I've read online, both pools can't be accessed simultaneously; access to either pool must switch on a cycle-by-cycle basis because the bus is saturated. As a result, whenever the slower pool is accessed by the CPU, the average bandwidth available to the GPU is lower due to wasted cycles.

To get around this limitation, devs would presumably use the 10GB pool for VRAM and system RAM, delegating the extra 3.5GB as a low-priority cache (assets, textures, etc.).

Yes and no.
It's kind of more complicated. But it's not about "bus saturation", because there is no single bus...
We need to think about it in terms of clients and servers, requests and responses.
We have 5 64-bit memory controllers (MCs), which are the "servers", and 3 "clients": the CPU, GPU and SSD.
The 5 MCs are not equal: 3 of them have 2x2GB chips ("bigger", 4GB servers), and 2 of them have 2x1GB chips ("smaller", 2GB servers).
But the bandwidth per server is the same: 2x56GB/sec = 112GB/sec.
Now take the naive scenario where we have access only from the GPU client and it is uniformly randomly distributed.
We will get 2x the requests to the "bigger" servers (more data there -> more requests, since the accesses are uniformly distributed).
Numbers: let's say we want 560GB/sec. The GPU has a typical access size of 128B, which means 560G/128 = 4480M requests per second.
How are they distributed? 4480/16GB*4GB = 1120Mreq/sec for the "bigger" MCs and 560Mreq/sec for the "smaller" ones (2x the size = 2x the requests).
But each server can serve only 112GB/sec / 128B = 896Mreq/sec. I.e. the 4GB servers will be overwhelmed and serve only 896Mreq, while the smaller servers will be underutilized and serve their 560Mreq happily.
Overall bandwidth will be: (896M*3 + 560M*2) * 128B = 476GB/sec.
But that's not what will happen.

MSFT divided the RAM addresses into 2 pools: 10GB and 6GB.
Now if the GPU works only with the 10GB pool, it will always get 560GB/sec.
But what about the other clients?
This generation the CPU bandwidth was around 10-20GB/sec (max).
Next gen it will probably double to 20-40GB/sec; let's say it's 30GB/sec.
So we have: 530GB/sec for the GPU and 30GB/sec for the CPU (and SSD).
A typical CPU request is smaller, 64B, but it is still served by the same MC. Let's assume that CPU requests always come in pairs, so it's still 2*64B = 128B.
So, 4240Mreq from the GPU and 240Mreq from the CPU.
GPU requests are randomly distributed over the 10GB and hit only the lower 1GB of each chip, i.e. 2GB per MC (that's how the pool is configured): 4240/10GB*2GB = 848Mreq.
The CPU is using the 6GB pool: 240/6GB*2GB = 80Mreq.
Now the smaller MCs get 848Mreq/sec and the bigger ones get 848+80 = 928Mreq/sec.
The bigger ones are still slightly oversaturated and the smaller ones underutilized.
The total bandwidth is: (896*3 + 848*2)*128 = 548GB/sec (of which the GPU gets 519GB/sec and the CPU gets 29GB/sec).
A much, much better situation.
Let's say it's even fancier: the GPU gets 520GB/sec from the 10GB pool and 10GB/sec from the 6GB pool, and the CPU uses only the 6GB pool at 30GB/sec.
Sparing the math, that gets us to 832Mreq + 106.7Mreq => 544GB/sec (of which the GPU gets 515GB/sec and the CPU gets 29GB/sec).
Why is the CPU getting almost all it asks for while the GPU suffers?
Because the distribution is still uniform: larger "servers" still attract more requests, but arbitration by pool size eases the problem.
It gets much more interesting when we start factoring in the CPU's smaller typical access size and lower latency requirements.
That will eat even more bandwidth out of the GPU: for each 1GB/sec served to the CPU, the GPU will suffer a 2GB/sec bandwidth reduction (but that obviously happens on both next-gen consoles in exactly the same manner).

So the answer is: yes, it will lower the GPU bandwidth, and no, probably not as badly as in the naive case.
And a bonus: memory usage.
From the bandwidth numbers we can assume, for the last (realistic) case: 9GB of the 10GB pool used in 1 second, <1GB of the 6GB pool used in the same 1 second. So yes, it seems the other 2.5GB available will be best used for SSD decompression.
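If anyone wants to play with these numbers, here's a minimal sketch of the same request-distribution reasoning (illustrative only; it assumes 128-byte requests, 56 GB/s chips and 30 GB/s of CPU demand exactly as above, and it lands on the same 476 and 548 GB/s headline figures):

Code:
# Minimal sketch of the request-distribution argument above. The inputs (128 B
# requests, 56 GB/s per chip, 30 GB/s of CPU demand) are the post's working
# assumptions, not measured figures.
REQ = 128e-9                 # GB per request (128 bytes)
MC_CAP = 112 / REQ           # requests/s one 64-bit MC (two chips) can serve

def served(demands):
    """demands: requests/s aimed at each MC; returns total GB/s actually served."""
    return sum(min(d, MC_CAP) for d in demands) * REQ

# Naive case: the GPU asks for 560 GB/s spread uniformly over all 16 GB.
gpu = 560 / REQ
naive = [gpu * 4 / 16] * 3 + [gpu * 2 / 16] * 2   # three "big" 4GB MCs, two "small" 2GB MCs
print(round(served(naive)))                       # ~476 GB/s

# Split pools: GPU demands 530 GB/s over the 10GB pool (2GB on every MC),
# CPU demands 30 GB/s over the 6GB pool (2GB on each big MC only).
gpu, cpu = 530 / REQ, 30 / REQ
split = [gpu * 2 / 10 + cpu * 2 / 6] * 3 + [gpu * 2 / 10] * 2
print(round(served(split)))                       # ~548 GB/s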
 
You're trying to make it sound like a memory-mapped filesystem or memory-mapped files are somehow a new and revolutionary thing. They're not. All OSes have supported them for decades, including Sony's version of FreeBSD used on the PS4 and PS5, and they're widely used by many applications billions of people use daily on their computers. The only new and interesting thing here is if they somehow manage to compete with the DMA access and the GPU cache scrubbers that the PS5 has; then they would be closer in performance.

I don't think there is a clear analog to this technology except AMD's SSG technology. This isn't a simple virtual swap file. The SSD implementation bypasses the normal "ask the CPU to fetch game data over the PCIe bus, then dump it into VRAM" process by storing files for entire projects locally on the GPU itself. Those files remain resident until cleared by the application or user, and only need to be migrated to that bespoke GPU-connected partition of the SSD once.

The GPU can now communicate directly with this local storage (that portion of the SSD has its own channels directly to the GPU; no PCIe access necessary) when its own VRAM capacity has been exceeded or is tapped for other uses. The GPU will be able to dump or pull geometry, textures, shaders and other models into the SSD rather than limited VRAM, and the CPU can also work on that data simultaneously.

I'm not aware of another implementation of NAND direct to GPU with 100GB of instant access other than the Radeon Pro SSG. If you can provide a similar outline, I would be glad to read up on it.
 

Gavin Stevens

Formerly 'o'dium'
A terrible analogy from a bogus insider

I was told 10.5 with heating issues (correct), which then got revised to 11.6 without them after he got the willies. I was also correct on show dates and other things, down to the day, in advance. Plus I'm still around posting, unlike most, have my name and face here, and carry on as normal.

But please, continue.
 

Vroadstar

Member
I was told 10.5 with heating issues (correct), which then got revised to 11.6 without them after he got the willies. I was also correct on show dates and other things, down to the day, in advance. Plus I'm still around posting, unlike most, have my name and face here, and carry on as normal.

But please, continue.

And that's the embarrassing part. Looking forward to your food analogy for fake insiders
 

Gavin Stevens

Formerly 'o'dium'
And that's the embarrassing part. Looking forward to your food analogy for fake insiders

It’s embarrassing that I’m still around posting, just because I trusted the wrong info...?

Man, you guys take this shit way too seriously. I got good info the first time round, look up my posts. Bang on, even down to the heat issues that are now being talked about. Posted it, was roasted ("I'm just salty because it has to be better than Xbox." Fuck logic). Got bad info again, posted it, was roasted. Both times because it wasn't what people wanted to hear, and everybody was on the "it must be 13+TF!" power train. The second lot of info, I was promised it was it, and it checked out. So I ran with it and had a laugh, when in reality the first lot was bang on. He got the willies at the traction I was getting and fed me bad numbers. But the first info he gave me was bang on.

But it's embarrassing because I still post, still offer help and insight in technical discussion, stuff I have experience in? All because the numbers weren't what you wanted? On a plastic box, which will still outsell the other regardless...?

Let me tell you what's embarrassing: the way people are acting, both in this thread and others. As soon as they hear something they don't want to hear, they jump into fangirl mode and try to downplay it. That 10-foot pole can't be 10 foot, because *reasons*. Sometimes a 10-foot pole is 10 foot. It doesn't matter; it won't change your purchase of the pole. But people try to offer sound technical advice here and you brush them off because some sad git in his bedroom posted a YouTube video about how Cerny made a 10-foot pole reach a cat in a tree 20 foot up.

It’s remarkable. And THAT my friend is embarrassing.
 

geordiemp

Member
How are they getting to that number? Each GDDR6 chip provides 56 GB/s of bandwidth, right? So if the CPU only needs 1 GB of data, and it's on one chip, then 48 GB/s of CPU access could be provided through that one chip, right?

Unless I have that wrong, of course. That said, I thought about the possibility that the slower memory bandwidth can be allocated dynamically, i.e. if non-GPU processors like the CPU only need a small slice of the 336 GB/s total, the system just gives as much as needed. For example, if the CPU only needs 112 GB/s of bandwidth, just two of the 2 GB chips are tapped, while the others stay utilized by the GPU, including the 1 GB chips that are not part of the slower bandwidth pool.

I figure that should be possible; there's no reason to lock six chips away from the GPU for slower bandwidth on a given set of cycles when there could be many instances where the CPU only needs one or two chips, i.e. a very small amount of that total slower bandwidth figure. We need more info on how the memory setup works, but I picture that type of dynamic allocation to the slower pool helping somewhat with contention issues.

And honestly, it's a bit of an assumed conclusion; I've seen some people elsewhere seemingly think that when the non-GPU processors access memory, it "locks up" the six 2 GB chips altogether from GPU access regardless of how much memory and bandwidth those other processors need. That sounds like a dumb design oversight IMHO; it would make more sense that those chips can supply up to 336 GB/s through the 1 GB portions of the six 2 GB chips, not that the chips are locked up as some sort of mode setting regardless of how much is being requested.

As far as any upgrades, well if we don't hear anything about delays by sometime in June we can assume the systems are set for release this year. If anything gets announced for delays, it'll be before July IMHO. Just a gut feeling (plus usually that's about the time full production on the systems at mass scale begins I believe, for fall launches).

No, it's a shared bus and the contention is real; and if the CPU and audio data is in the slow-access pool, it has a bigger effect. That's why so many posters never expected this arrangement and why nobody does it, otherwise it would be more common.

Also note that GPUs do not operate on their own; the CPU needs to tell them what to display, so the contention happens in every frame unless the CPU data is in the 10GB, especially for large-memory games with 4K high-quality assets.

Best post I have read


[screenshot of the quoted post]


Note: if the XSX manages to keep the most frequent accesses in the 10GB, it will have very high bandwidth. Maybe they have some more tricks up their sleeve?
 

Vroadstar

Member
I don't even know why you attack him, lol. There are two possibilities:
He's fake,
or
the source told him false information, as he claims.
Do you have solid proof it's the first case?

I quoted his food analogy, which is ridiculous and obviously skewed towards his favored so-called "plastic boxes". For somebody who actually claims to have experience in this and to offer help and insight in technical discussion, did you learn a lot from that food analogy?

And being wrong is not solid proof enough? And his out was that he was given wrong info by his buddy because his post got traction? Do you actually believe that? What did he expect? You post it on GAF, a gaming forum at that, and people won't notice it?
 

Gavin Stevens

Formerly 'o'dium'
I quoted his food analogy, which is ridiculous and obviously skewed towards his favored so-called "plastic boxes". For somebody who actually claims to have experience in this and to offer help and insight in technical discussion, did you learn a lot from that food analogy?

You can use whatever analogy you want. The simple truth is one unit has more of X than the other, and you can say the SX > the PS5 on pretty much most things, or the PS5 > the SX on SSD. Those are facts. If you want to get into the finer detail, then you can't just move the goalposts to suit your arguments. Both machines have a LOT of shit inside that is purpose-built purely to help push their respective system, yet all I seem to read most of the time here is that the PS5 has it all and the SX is essentially a Windows machine from 1997.

So yes. If you talk on pure system grunt, theres a clear winner.

It means fuck all at the end of the day, because pound for pound I've had more fun with my Switch than I have with EITHER of the current-gen machines, and that's sure as shit not winning any system grunt awards. Hence, play the games on the system you want.

But discuss the technical aspects in a fair way and don't blindly believe everything from one and nothing from the other. Otherwise you're no better than those YouTube idiots who don't even know what a gigawatt is.

And being wrong is not solid proof enough? And his out was that he was given wrong info by his buddy because his post got traction? Do you actually believe that? What did he expect? You post it on GAF, a gaming forum at that, and people won't notice it?

You DO know that Tommy got the SX spec correct based on *reasons*, and then all of a sudden he was the saviour of PlayStation because he quoted it as a 13TF machine. It was OK then, but not now...? I wonder why that is...?

But yes, I was given a set of specs first, which I posted about, as well as showcase dates. The dates came true, so I assumed the specs would work out too. It took me over a week to even get a message back from the person who gave me those specs, a first-party dev who I have worked with before, I may add. And yes, the original specs he gave me (roughly 10.5, but with heating issues, so they didn't know if it was going to be a lower max clock) were fine. I even remember most people here having a go at me because "why are you talking TARGET specs, it should be a set-in-stone number" and "how can it be having heat issues?!". Look these up. Later on, he gave me revised specs, which were 11.6, which again, when I did some searching, looked to be, well, OK. They were reported elsewhere, so I made a fun game of it, which got some laughs. I don't regret that, it was fun.

Over a week after the spec reveal he finally got back to me, and told me that because I had posted the original specs that were correct, he fed me BS ones to stop anything getting traced back to him, as his company was one of the few that had access to those speeds at the time.

OR... I made it all up, for attention, and even posted hundreds of times about technical talk, explanations, help, discussion, all of it fake, just for... REASONS...?

Yeah, I know which sounds more plausible.

But whatever. Just buy the damn box you like the look of, and stop worrying about shit you wont even see 99% of the time.

SlimySnake It wasn't supposed to be a riddle, more of an analogy, so sorry about that. I wouldn't be posting a bloody riddle, not yet... That's coo coo...
 

SlimySnake

Flashless at the Golden Globes
Guys, stop harassing Odium. I'm all for giving them a good ribbing, but the time for that has passed. Let's not turn into the Xbox Era Discord and lead harassment campaigns against individuals just because they got a lousy number wrong. It's not like anyone else, including lord GitHub, got it right anyway. Clearly this thing was in flux.

Odium, just an FYI: posting riddles after how you guys jerked us around for months is a bad idea. The wounds are still fresh.
 
I quoted his food analogy, which is ridiculous and obviously skewed towards his favored so-called "plastic boxes". For somebody who actually claims to have experience in this and to offer help and insight in technical discussion, did you learn a lot from that food analogy?

And being wrong is not solid proof enough? And his out was that he was given wrong info by his buddy because his post got traction? Do you actually believe that? What did he expect? You post it on GAF, a gaming forum at that, and people won't notice it?

Sheesh dude, you sure have issues. Leave that man alone and stop that childish behaviour. Your personal attacks and blowing things out of proportion are really annoying.
 
I'm not denying GDDR6 is more efficient and advanced; the penalties from wasted cycles would be much more severe with GDDR5!
It's a physical limitation: both pools can't be accessed simultaneously (on the same cycle), and if they were, the average bandwidth would be lower than the PS5's. The right path would be to switch access on a cycle-by-cycle basis, as Lady Gaia pointed out.

Anyway, I'm way out of my element here, just relaying what more knowledgeable people have said on the matter.
@Fafalada thoughts?
[screenshots of the quoted posts]

I'm not sure why accessing the "slower" 6GB over its 3 x 64-bit channels (assuming it's 64-bit channels, like RDNA1) would disable the remaining two 64-bit channels.

MS said that access to the 6GB standard memory is at 336GB/s, not that accessing that 6GB of memory reduces the entire system bandwidth to 336GB/s.

Now it might be that, depending on what data the GPU was waiting for and how it was striped across the memory channels, the CPU, IO etc. could effectively block or slow accesses that also use the remaining channels on the remaining 128 bits of the bus... but that's very different from saying that a CPU access inherently blocks access to memory channels it's not using.

There's a whole lot of imagining worst-case scenarios for the XSX memory setup!
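A quick way to see that distinction in numbers (a sketch, assuming 64-bit memory controllers at 112 GB/s each, as in the posts above; how the real arbitration behaves is exactly the open question):

Code:
# Illustrates the difference between "the 6GB region peaks at 336 GB/s" and
# "touching the 6GB region drops the whole system to 336 GB/s". Assumes five
# 64-bit MCs at 112 GB/s each; real arbitration behaviour is the open question.
MC_BW = 112                 # GB/s per 64-bit memory controller (2 chips x 56 GB/s)
BIG_MCS, SMALL_MCS = 3, 2   # MCs with 2GB chips vs. MCs with 1GB chips

slow_pool_peak = BIG_MCS * MC_BW                  # 336 GB/s: ceiling of the 6GB region
whole_system   = (BIG_MCS + SMALL_MCS) * MC_BW    # 560 GB/s: all channels together

# Worst case while the CPU monopolises the three big MCs:
gpu_left_over = SMALL_MCS * MC_BW                 # 224 GB/s still free on the small MCs

print(slow_pool_peak, whole_system, gpu_left_over)   # 336 560 224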
 
I quoted his food analogy, which is ridiculous and obviously skewed towards his favored so-called "plastic boxes". For somebody who actually claims to have experience in this and to offer help and insight in technical discussion, did you learn a lot from that food analogy?

And being wrong is not solid proof enough? And his out was that he was given wrong info by his buddy because his post got traction? Do you actually believe that? What did he expect? You post it on GAF, a gaming forum at that, and people won't notice it?
No, being wrong is not enough to call people fake, especially because he was the closest to the truth.
So I'm going to report you.
 
No, it's a shared bus and the contention is real; and if the CPU and audio data is in the slow-access pool, it has a bigger effect. That's why so many posters never expected this arrangement and why nobody does it, otherwise it would be more common.

Also note that GPUs do not operate on their own; the CPU needs to tell them what to display, so the contention happens in every frame unless the CPU data is in the 10GB, especially for large-memory games with 4K high-quality assets.

Best post I have read


[screenshot of the quoted post]


Note: if the XSX manages to keep the most frequent accesses in the 10GB, it will have very high bandwidth. Maybe they have some more tricks up their sleeve?

The CPU is not tying up the bus. The GPU always has access to its 10GB of memory at the lower 1GB address on each chip. The GPU, AFAIK, can only see that 10GB as its dedicated VRAM, for the most part.

Yes and no.
It's kind of more complicated. But it's not about "bus saturation", because there is no single bus...
We need to think about it in terms of clients and servers, requests and responses.
We have 5 64-bit memory controllers (MCs), which are the "servers", and 3 "clients": the CPU, GPU and SSD.
The 5 MCs are not equal: 3 of them have 2x2GB chips ("bigger", 4GB servers), and 2 of them have 2x1GB chips ("smaller", 2GB servers).
But the bandwidth per server is the same: 2x56GB/sec = 112GB/sec.
Now take the naive scenario where we have access only from the GPU client and it is uniformly randomly distributed.
We will get 2x the requests to the "bigger" servers (more data there -> more requests, since the accesses are uniformly distributed).
Numbers: let's say we want 560GB/sec. The GPU has a typical access size of 128B, which means 560G/128 = 4480M requests per second.
How are they distributed? 4480/16GB*4GB = 1120Mreq/sec for the "bigger" MCs and 560Mreq/sec for the "smaller" ones (2x the size = 2x the requests).
But each server can serve only 112GB/sec / 128B = 896Mreq/sec. I.e. the 4GB servers will be overwhelmed and serve only 896Mreq, while the smaller servers will be underutilized and serve their 560Mreq happily.
Overall bandwidth will be: (896M*3 + 560M*2) * 128B = 476GB/sec.
But that's not what will happen.

MSFT divided the RAM addresses into 2 pools: 10GB and 6GB.
Now if the GPU works only with the 10GB pool, it will always get 560GB/sec.
But what about the other clients?
This generation the CPU bandwidth was around 10-20GB/sec (max).
Next gen it will probably double to 20-40GB/sec; let's say it's 30GB/sec.
So we have: 530GB/sec for the GPU and 30GB/sec for the CPU (and SSD).
A typical CPU request is smaller, 64B, but it is still served by the same MC. Let's assume that CPU requests always come in pairs, so it's still 2*64B = 128B.
So, 4240Mreq from the GPU and 240Mreq from the CPU.
GPU requests are randomly distributed over the 10GB and hit only the lower 1GB of each chip, i.e. 2GB per MC (that's how the pool is configured): 4240/10GB*2GB = 848Mreq.
The CPU is using the 6GB pool: 240/6GB*2GB = 80Mreq.
Now the smaller MCs get 848Mreq/sec and the bigger ones get 848+80 = 928Mreq/sec.
The bigger ones are still slightly oversaturated and the smaller ones underutilized.
The total bandwidth is: (896*3 + 848*2)*128 = 548GB/sec (of which the GPU gets 519GB/sec and the CPU gets 29GB/sec).
A much, much better situation.
Let's say it's even fancier: the GPU gets 520GB/sec from the 10GB pool and 10GB/sec from the 6GB pool, and the CPU uses only the 6GB pool at 30GB/sec.
Sparing the math, that gets us to 832Mreq + 106.7Mreq => 544GB/sec (of which the GPU gets 515GB/sec and the CPU gets 29GB/sec).
Why is the CPU getting almost all it asks for while the GPU suffers?
Because the distribution is still uniform: larger "servers" still attract more requests, but arbitration by pool size eases the problem.
It gets much more interesting when we start factoring in the CPU's smaller typical access size and lower latency requirements.
That will eat even more bandwidth out of the GPU: for each 1GB/sec served to the CPU, the GPU will suffer a 2GB/sec bandwidth reduction (but that obviously happens on both next-gen consoles in exactly the same manner).

So the answer is: yes, it will lower the GPU bandwidth, and no, probably not as badly as in the naive case.
And a bonus: memory usage.
From the bandwidth numbers we can assume, for the last (realistic) case: 9GB of the 10GB pool used in 1 second, <1GB of the 6GB pool used in the same 1 second. So yes, it seems the other 2.5GB available will be best used for SSD decompression.

The only thing I would add is that the GPU is locked out of the 6GB/336GB/s pool and can ONLY see the 10GB pool. It always has access to the lower 1GB address space on each of the 10 chips regardless of what the CPU is doing. The CPU has access to all 16GB, both the upper and lower addresses, but devs are encouraged to use the upper 1GB addresses for CPU, audio and OS storage. The distribution of access and usage isn't uniform, but your example is mostly about right.
 

geordiemp

Member
MS said that access to the 6GB standard memory is at 336GB/s.

That's right: accessing the slower-access portion gives you that, and on a shared common bus nothing else is happening at the same time; there are no two buses or individual straws. And while the system is accessing the slower-access RAM, precious time is wasted.

Maybe MS will increase the RAM amount and the way it is connected up before release, who knows? Maybe they will keep all the CPU, GPU, sound and everything running in the 10GB if they can and do some clever jiggery-pokery... who knows.

Maybe most games won't need more than 10GB...

The CPU is not tying up the bus. The GPU always has access to its 10GB of memory at the lower 1GB address on each chip.

Sorry, I believe a professional who wrote optimisation and profiling tools for a living; unless you have credentials or expertise in this area, I think Lady Gaia is correct for now, and so did everyone else on Era, NX Gamer, etc.
 