
Platform Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

rntongo

Banned
Feb 25, 2020
343
657
275
That would NEVER pass Sony or MS certification. Letting devs create their own low-level hardware access is an instant recipe for disaster. All it takes is a dev making an error and suddenly you have a jailbroken console and rampant piracy.
You misunderstood me entirely. Devs have low-level access to the hardware through APIs provided by Sony and MSFT. I should have just said software libraries so that we don't have such a semantic debate.
 

Apollo Helios

Member
Feb 27, 2020
295
1,252
395
The CPU can consume data from ANYWHERE. Not just RAM.
There are penalties for each. Devs CAN, but up until now they DIDN'T, for a reason. If the latency of the PS5 I/O and SSD solution is much much better than XSX's, as I figure, then the SSD option will still be viable for only one system. A software stack cannot reduce latency on its own; the hardware must be close at hand and engineered and designed with that in mind. Blown-up shots of the XSX motherboard put the SSD really far away, and we know for a fact that for MS SSD was much like an afterthought on hw level, tacked on as a software stack with the DirectStorage API as the wishful savior. We will see this very soon. But a load comparison might already give you an idea: XSX only incrementally improves load times over One X, much like putting a SATA SSD into a console this gen only halved load times. For PS5, the leaked sub-1-second load time is groundbreaking. It proves PS5 can use the SSD like RAM, and it might be the only one that can.
 

Dodkrake

Member
Feb 28, 2020
192
516
265
Reminder: the RTX 2080 has separate integer and floating-point math units, hence it has higher operations-per-cycle potential when integer and floating-point workloads are combined.

Each Turing CUDA FP core has a corresponding CUDA integer core.



From https://www.gamersnexus.net/guides/3364-nvidia-turing-architecture-technical-deep-dive




RTX 2080 FE has a 1897 MHz average clock speed, hence roughly 11.17 TFLOPS average (2944 CUDA cores × 2 FLOPs per clock × 1.897 GHz), not including TIOPS for INT32 performance.
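
For anyone who wants to check the arithmetic behind a TFLOPS figure like this, here is a rough sketch. It assumes the usual peak-FP32 formula (shader cores × 2 FLOPs per clock for a fused multiply-add × clock speed), the publicly listed 2944 CUDA cores for the RTX 2080, and 52 CUs × 64 lanes for the XSX GPU. The function name is only illustrative.

```python
def peak_fp32_tflops(shader_cores: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting one fused multiply-add as 2 FLOPs."""
    return shader_cores * 2 * clock_mhz * 1e6 / 1e12


# Assumption: 2944 CUDA cores (public RTX 2080 spec), 1897 MHz average clock.
rtx_2080 = peak_fp32_tflops(2944, 1897)

# Assumption: 52 CUs x 64 shader lanes each, fixed 1825 MHz GPU clock.
xsx = peak_fp32_tflops(52 * 64, 1825)

print(f"RTX 2080 @ 1897 MHz: {rtx_2080:.2f} TFLOPS")
print(f"XSX      @ 1825 MHz: {xsx:.2f} TFLOPS")
```

Neither number accounts for Turing's separate INT32 units, which is the point being made in this post.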

RDNA CUs' ALUs share integer and floating-point workloads.

XSX GPU can be clocked higher when CPU usage is reduced, as per the SmartShift function.




Atm, XSX uses a fixed-frequency strategy, i.e. CPU at 3.6 GHz in 16-thread mode or CPU at 3.8 GHz in 8-thread mode, and GPU at 1.825 GHz.

XSX's fixed-frequency strategy applies before AMD's SmartShift is enabled, but the CPU itself has a mini-SmartShift, since disabling SMT reduces power consumption, which enables a higher clock speed. XSX's SmartShift exists in hardware while waiting to be fully enabled.

Any pure TFLOPS argument hides Turing's TIOPS CUDA core hardware.
So the new FUD is that the XSX has SmartShift. LOL
 
  • LOL
Reactions: Panajev2001a
Aug 28, 2019
2,165
3,960
510
www.instagram.com
There are penalties for each. Devs CAN but up until now DIDN'T for a reason. If the latency on PS5 IO and SSD solution is much much better than XSX as I figure then SSD option will be viable only for one system still. Software stack can not reduce latency on its own, hw must be close at hand + engineered and designed for that in mind. XSX blown up shots of motherboard puts SSD really far away and we know for a fact that for MS SSD was much like an afterthought on hw level and tacked on as software stack with DirectStorage API as the wishful savior. We will see these very soon. But a load comparison might already give you an idea, with One X vs XSX only incrementally improving the load times, like in this gen putting an SATA SSD in and only halving load times. For PS5 the leaked sub 1 second load time is ground breaking proves PS5 can use SSD like RAM, and might be the only one at that.
The worst of takes, holy crap. Some people are seemingly set in their ways, huh? xD.

Like, how are you expecting people genuinely interested in nuanced discussion to bother with this? It reads like warrior wishful thinking rather than a moderate viewpoint based on actual facts.

Like I said, he was a grump! Like a teacher with a class full of dumb-asses, he probably often thought! As for the GitHub thing, all I'll say is that it wasn't the full details. As for others dismissing GitHub after Matt's comments, all I can say is that mods/insiders always get put on a pedestal. That really isn't his fault. We can all read his posts and decide for ourselves what to believe, e.g. he's also said that the 18% the XSX is ahead in compute, and the various numbers given for both so far, bear out in practice.
Meh, it is what it is, I guess. Water under the bridge at this point. FWIW, Matt doesn't have access to the hardware or devkits, so even if he has claimed the 18% figure or such, that could've been well after others had done so. I don't recall him suggesting anything regarding PS5's TFs before Road to PS5, and in fact I believe he thought it was going to be around the 12 TF range or so, if he didn't outright say so.

You're right, though, that he can't be held responsible for other people putting his word on a pedestal. With that said, he has to be aware that people do put his word on a pedestal, so whether he wants it or not, he has to bear an additional responsibility because of that.
 
  • Like
Reactions: rntongo and Ascend

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
The worst of takes, holy crap. Some people are seemingly set in their ways, huh? xD.

Like, how are you expecting people genuinely interested in nuanced discussion to bother with this? It reads like warrior wishful thinking rather than a moderate viewpoint based on actual facts.
Speaking of facts, how important is latency in an I/O system?

I understand that the further away the SSD is from the APU the more latency there will be. But what I don't understand is how far away does it have to be for it to be a problem?
 
Aug 28, 2019
2,165
3,960
510
www.instagram.com
Latency with regard to PCIe is not a major concern given where XSX has its SSD on the board. Again, MS has a team of engineers; if they felt that location was bad for latency, they wouldn't have placed the SSD there. PCIe 4.0 can support trace lengths of up to 12 inches (1 foot).

As another example, Ethernet cables can run for hundreds of feet (up to 100 meters for standard twisted pair) with no or minimal signal degradation. Based on the PCIe 4.0 spec, XSX's SSD location on the PCB is a non-issue; the trace is probably 3-4 inches in total, well within the spec's supported trace distance.

Also, if they feel the need, they likely have a retimer on the board to compensate for channel attenuation, but I honestly don't see the need with the design they have in place.

EDIT: Actually thinking about it for a sec and looking at a shot of the PCB with the SSD, there could be a retimer along the trace path if you look closely. I can't ascertain what the green lines (they aren't green on the actual PCB of course) are going over in the pic, might just be a result of a PCB mockup shot for marketing purposes.
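
To put rough numbers on the "how far is too far" question, here is a back-of-the-envelope sketch. It assumes signals travel along an FR4 trace at about c / sqrt(4), i.e. roughly half the speed of light, and it assumes a NAND read takes on the order of 50 microseconds. Both figures are ballpark assumptions for illustration, not measurements of either console.

```python
import math

SPEED_OF_LIGHT_M_PER_S = 299_792_458
METERS_PER_INCH = 0.0254

# Assumption: ~50 microseconds for one NAND flash read (ballpark figure).
ASSUMED_NAND_READ_NS = 50_000


def trace_delay_ns(length_inches: float, dielectric_constant: float = 4.0) -> float:
    """One-way propagation delay along a PCB trace, in nanoseconds."""
    velocity = SPEED_OF_LIGHT_M_PER_S / math.sqrt(dielectric_constant)
    return length_inches * METERS_PER_INCH / velocity * 1e9


for inches in (1, 4, 12):
    delay = trace_delay_ns(inches)
    share = delay / ASSUMED_NAND_READ_NS
    print(f"{inches:>2} in trace: {delay:.2f} ns ({share:.4%} of an assumed NAND read)")
```

Under those assumptions, even a full 12-inch trace adds a couple of nanoseconds, which is tiny next to the time the flash itself takes to return data. Trace length matters for signal integrity, not for latency you would notice.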
 
  • Like
Reactions: Trueblakjedi

Trueblakjedi

Member
Mar 30, 2020
177
378
220
There are penalties for each. Devs CAN but up until now DIDN'T for a reason. If the latency on PS5 IO and SSD solution is much much better than XSX as I figure then SSD option will be viable only for one system still. Software stack can not reduce latency on its own, hw must be close at hand + engineered and designed for that in mind. XSX blown up shots of motherboard puts SSD really far away and we know for a fact that for MS SSD was much like an afterthought on hw level and tacked on as software stack with DirectStorage API as the wishful savior. We will see these very soon. But a load comparison might already give you an idea, with One X vs XSX only incrementally improving the load times, like in this gen putting an SATA SSD in and only halving load times. For PS5 the leaked sub 1 second load time is ground breaking proves PS5 can use SSD like RAM, and might be the only one at that.
This sounds purely like assertion on your part. "much much better.. as i figure", "Wishful savior", "really far away" are all subjective and dismissive, which should keep this post from being taken seriously.

1: To start with, the XSX GPU has 560 GB/s of memory bandwidth. Why? Don't gimme the dummy answer of "wide and slow", because that isn't an answer. The team at Xbox chose that bandwidth for that part. Why? If they could not fill the GPU with work, it would have been better to go with a 448 GB/s system bandwidth. So why keep it if they COULDN'T fill it? No one on this board can answer that.

2: If the I/O was an afterthought, why did they design a HW block with a decompression rate of 6+ GB/s as part of their solution, in addition to an SSD that is fast by any other reckoning? You don't design ASICs as an afterthought. Remember, MS staff have been using XSXs in their homes since before the new year, so the design was complete quite a while ago. Where is your factual basis for this assertion: "we know for a fact that for MS SSD was much like an afterthought on hw level"? Where? I need external proof, from somewhere other than your mind.

3: Why would Phil Spencer speak about using the SSD as RAM a year ago if this was an afterthought?

4: You DO KNOW that storage-attached GPUs are a thing on the roadmaps for RDNA3 and future Nvidia architectures, right? Do you think AMD and Nvidia both independently decided that was gonna be a thing, and MS didn't have a clue and tucked it into DirectX 12 Ultimate randomly, like a dollar in a shoe?
"Just as GPUDirect RDMA (Remote Direct Memory Address) improved bandwidth and latency when moving data directly between a network interface card (NIC) and GPU memory, a new technology called GPUDirect Storage enables a direct data path between local or remote storage, like NVMe or NVMe over Fabric (NVMe-oF), and GPU memory. Both GPUDirect RDMA and GPUDirect Storage avoid extra copies through a bounce buffer in the CPU’s memory and enable a direct memory access (DMA) engine near the NIC or storage to move data on a direct path into or out of GPU memory – all without burdening the CPU or GPU. "

"Like HBCC, NVCache will use some of the system RAM and SSD of your system, and super-speed game load times and optimize VRAM usage.

Read more: https://www.tweaktown.com/news/72399/nvidia-ampere-rumor-nvcache-speeds-up-load-times-optimize-vram-usage/index.html"

So you think that with that access, insight and 4 years of design time... and readily available technology to put into their console... it was an afterthought, yeah? That it was about Sony, and not about bringing all of these architectures under the DX12 umbrella, console and PC? That it was about responding to a Sony solution that the PCIe 4.0 specification itself exceeds on its own without special HW?

Some folks are special.
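
For reference, the bandwidth figures in points like these fall out of simple arithmetic. This is a sketch assuming the publicly stated XSX memory setup (320-bit bus, 14 Gbps GDDR6) and a 4-lane PCIe 4.0 link, which is a typical NVMe configuration and not a confirmed detail of either console. Function names are illustrative.

```python
def gddr6_bandwidth_gb_s(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: one data pin per bus bit, 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


def pcie4_bandwidth_gb_s(lanes: int) -> float:
    """Peak PCIe 4.0 link bandwidth in GB/s: 16 GT/s per lane, 128b/130b encoding."""
    return lanes * 16 * (128 / 130) / 8


# Assumption: 320-bit bus at 14 Gbps per pin (public XSX spec).
print(f"320-bit GDDR6 @ 14 Gbps: {gddr6_bandwidth_gb_s(320, 14):.0f} GB/s")

# A 256-bit bus at the same pin speed gives the 448 GB/s figure.
print(f"256-bit GDDR6 @ 14 Gbps: {gddr6_bandwidth_gb_s(256, 14):.0f} GB/s")

# Assumption: x4 link, before protocol overhead.
print(f"PCIe 4.0 x4: {pcie4_bandwidth_gb_s(4):.2f} GB/s")
```

These are theoretical peaks. Sustained numbers will be lower once protocol overhead and real access patterns are taken into account.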
 

rnlval

Member
Jun 26, 2017
363
298
255
Sector 001
gpucuriosity.wordpress.com
John Carmack's statement against Tim Sweeney's argument has already shown the direction for PC, i.e. unbuffered I/O and fixing the GPU drivers.

 
  • Like
Reactions: Trueblakjedi

rntongo

Banned
Feb 25, 2020
343
657
275
There are penalties for each. Devs CAN but up until now DIDN'T for a reason. If the latency on PS5 IO and SSD solution is much much better than XSX as I figure then SSD option will be viable only for one system still. Software stack can not reduce latency on its own, hw must be close at hand + engineered and designed for that in mind. XSX blown up shots of motherboard puts SSD really far away and we know for a fact that for MS SSD was much like an afterthought on hw level and tacked on as software stack with DirectStorage API as the wishful savior. We will see these very soon. But a load comparison might already give you an idea, with One X vs XSX only incrementally improving the load times, like in this gen putting an SATA SSD in and only halving load times. For PS5 the leaked sub 1 second load time is ground breaking proves PS5 can use SSD like RAM, and might be the only one at that.
I don't know if you've been following this thread for long enough, but we've gone over most of what you've said, and we've debunked some of it.

For example, the PS5 SSD demo ran on an older, slower devkit and had an easier workload, cutting down an 8-second load, compared to the XSX demo, which had to cut down a 50-second load.

Secondly, we have not seen what the XSX XVA can do. According to sources that I and others have included in other replies, the demos do not use the decompression block or the DirectStorage API.
 
  • Like
Reactions: Trueblakjedi

rntongo

Banned
Feb 25, 2020
343
657
275
Just wanted to say that Kirby Louise sounds very reasonable...



It's a shame that people will dismiss her claims, but whatever. Just wanted to post a few more things she said;

She definitely knows what she's talking about and has experience developing games. I know she's spoken well about certain aspects of the PS5. But because she follows and retweets some inflammatory tweets, it puts some people off.
 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
But because she follows and retweets some inflammatory tweets, it puts some people off.
She really needs to stop that. I would love it if she didn't do that, but I really can't trust her opinion if she retweets people like Tim Dog, for example.

I've seen other developers shut those people down when they spread FUD. She should do the same.
 
  • Like
Reactions: rntongo

rntongo

Banned
Feb 25, 2020
343
657
275
Besides some of her retweets she's solid. She has some good insights into how the two systems perform. During these times she's one of the best sources of information.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
She really needs to stop that. I would love it if she didn't do that, but I really can't trust her opinion if she retweets people like Tim Dog, for example.

I've seen other developers shut those people down when they spread FUD. She should do the same.
I think she's quite young. In one of her tweets she says she has a part-time job to pay for college.
 
  • Triggered
Reactions: Neo_game

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
I think she's quite young. In one of her tweets she's saying she has a part time job to cover for her college.
Ok so she's an inexperienced developer who is a bit immature.

I'm sure she will get better with time but it's a little difficult to take her completely seriously.
 
Aug 28, 2019
2,165
3,960
510
www.instagram.com
Just wanted to say that Kirby Louise sounds very reasonable...



It's a shame that people will dismiss her claims, but whatever. Just wanted to post a few more things she said;

If true, that's very interesting. For example, for PS5, if the I/O is delivering the data directly to RAM, does that mean it shares the bus with the CPU and GPU? Basically, does that mean the CPU and GPU cannot access data while it's delivering data to RAM?

This is from me at least knowing how the APUs are designed: there will be bus contention between CPU and GPU on both PS5 and XSX. So from what she's saying, PS5 is writing storage contents to RAM through the I/O (faster than XSX can, obviously), but if everything is sharing bus access, could it be assumed the CPU and GPU are not accessing the RAM while this is being done? Or maybe there is a specific memory range the I/O can write to, so the CPU and/or GPU can still access data in the other ranges? Perhaps this could also explain part of the reason the I/O block has the cache coherency engines, although I can think of other reasons for them to be there.

I'm assuming the I/O block shares the bus anyway. In the Road to PS5 presentation we clearly see it has a pathway directly to system RAM, but we know the RAM is also linked to the CPU and GPU, and PS5 isn't using a "split" memory pool like XSX (which is itself not even a true split pool like we see on PC or older game consoles). It's hUMA, so the logical conclusion is that when the I/O block is sending data to or receiving data from memory, the CPU and GPU have to wait. That might explain why it's got such beefy hardware in it: the quicker it can do its work sending data into RAM, the sooner the CPU and GPU can get access back along the memory bus.

It also feels like they're probably right about XSX's setup, because I figure MS would want to invest further in ExecuteIndirect, a feature which will likely make its way to AMD GPUs in short order thanks to their partnership, and it can already build on work AMD has done with their SSG cards. Since the 100 GB partition is being treated as extended memory, the game code doesn't need to make any specific calls for the data there that it wouldn't need to make when addressing data in actual RAM. The trade-off is that data in the 100 GB partition is orders of magnitude slower to reach than data in GDDR6 RAM. That might also be part of the reason they went with a slower SSD I/O (in terms of paper specs) and why they still leave a portion of the work to the CPU (1/10th of a single core).

Hell, we might not need the August presentation at this rate xD. This is just me trying to piece together some speculation on MS's and Sony's respective approaches, though. I like them both for different reasons. It's pretty easy to picture where the respective strengths and weaknesses are, but overall they're both very viable methods, assuming this scatterbrained speculation turns out to be even partly on the mark.
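
To give a feel for the size of that trade-off, here is a quick sketch comparing bandwidth only. It takes the 560 GB/s GDDR6 figure mentioned earlier in the thread and assumes the publicly stated XSX SSD throughput of 2.4 GB/s raw and 4.8 GB/s with compression. Latency is a separate and likely larger gap that this does not model.

```python
import math

GDDR6_GB_S = 560.0          # XSX fast memory pool, as quoted in this thread
SSD_RAW_GB_S = 2.4          # assumption: public XSX raw SSD throughput
SSD_COMPRESSED_GB_S = 4.8   # assumption: public XSX figure with compression


def slowdown(ram_gb_s: float, storage_gb_s: float) -> float:
    """How many times more bandwidth RAM offers than storage."""
    return ram_gb_s / storage_gb_s


for label, rate in (("raw", SSD_RAW_GB_S), ("compressed", SSD_COMPRESSED_GB_S)):
    ratio = slowdown(GDDR6_GB_S, rate)
    print(f"GDDR6 vs SSD ({label}): {ratio:.0f}x, about {math.log10(ratio):.1f} orders of magnitude")
```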

Ok so she's an inexperienced developer who is a bit immature.

I'm sure she will get better with time but it's a little difficult to take her completely seriously.
TBF she knows more than most people on this forum and Era. Any social media hijinks they're getting up to now, they'll grow out of, and they probably don't even engage all that much with some of the people they follow on Twitter anyway.

Regardless none of that's particularly important to me personally; if they have a firm grasp on their stuff and can elucidate on it, I'm willing to listen. Nothing in those tweets seemed outwardly negative towards PS5, simply not referring to PS5 in mentions doesn't qualify as being outwardly negative on it or anything.
 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
I've seen some of that Tim Dog kind of stuff though which is why I'm a bit reluctant.

But maybe my problem isn't with her but another user on Gaf that uses her tweets in the wrong way.

Also does Kirby have a PS5 developer kit?
 
  • Fire
Reactions: Neo_game

oldergamer

Member
Aug 20, 2004
1,982
703
1,520
So, some serious info in this thread. So we now think the 100 GB has a direct connection to the CPU/GPU? So there's less latency for direct access than there would be for copying into RAM. That is very different from what Sony has.
 
  • Like
Reactions: Trueblakjedi

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
I've seen some of that Tim Dog kind of stuff though which is why I'm a bit reluctant.

But maybe my problem isn't with her but another user on Gaf that uses her tweets in the wrong way.

Also does Kirby have a PS5 developer kit?
Nope. She says it's too expensive for her. She develops for the Switch, PC and Xbox because of that. As a hobby I might add.

If true that's very interesting. For example, for PS5 if the I/O is delivering the data directly to RAM then does that mean it shares the bus with the CPU and GPU? Basically meaning if it's delivering data to RAM the CPU and GPU cannot access data at that time?

This is from me at least knowing how the APUs are designed and there will be bus contention between CPU and GPU on both PS5 and XSX, so from what she's saying PS5 is writing storage contents to RAM through the I/O (faster than XSX can obviously) but if everything is sharing bus access it could be assumed the CPU and GPU are not accessing the RAM while this is being done? Or maybe there is a specific memory range the I/O can write to so the CPU and/or GPU can still access data in the other ranges (perhaps this could also explain part of the reason the I/O block has the cache coherency engines (although there are other reasons I can assume the cache coherency engines are there))?

I'm assuming the I/O block shares the bus anyway; in the Road to PS5 presentation we clearly see it has a pathway directly to system RAM but we know the RAM is also linked to the CPU and GPU and PS5 isn't using a "split" memory pool like XSX (which is in itself not even a true split pool like we see on PC or older game consoles). It's hUMA, so logical conclusion is when it's sending or receiving data to memory the CPU and GPU have wait, which might explain why it's got such beefy hardware in it; the quicker it can do its work in sending data into RAM the sooner the CPU and GPU can get access back along the memory bus.

It also feels like they're probably right about XSX's setup, because I figure MS would want to invest further into executeIndirect and that is a feature which'll likely make its way to AMD GPUs in short order thanks to their partnership, and it can already leverage off work AMD's done with their SSG cards. Since the 100 GB partition is being treated as extended memory, there aren't any specific calls to the data there the game code needs to do that they wouldn't need to do addressing data in the actual RAM. Trade-off being the data in the 100 GB partition is magnitudes slower than GDDR6 RAM. And that probably also might be part of a reason they went with a slower SSD I/O (in regards the paper specs) and why they still relegate a portion of the taskwork on the CPU (1/10th of a single core).

Hell, we might not need the August presentation at this rate xD. This is just me trying to piece together some speculation on MS and Sony's respective approaches though. I like them both for different reasons, it's pretty easy to picture where respective strengths and weaknesses are at but overall they're both very viable methods, assuming this speculation that's been scatterbrained turns out to be even partly on the mark.
Maybe this will help you further. I'll post every other tweet since you can read two at the same time;

 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
Nope. She says it's too expensive for her. She develops for the Switch, PC and Xbox because of that. As a hobby I might add.


Maybe this will help you further. I'll post every other tweet since you can read two at the same time;

So which one will produce better results in the end?
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
So which one will produce better results in the end?
Hard to tell. It seems as if the PS5 is designed to load as much as possible as fast as possible towards RAM, while the XSX is designed to load only what is required, bypassing the RAM when convenient. If she is right, of course. The programming will play a role;

 
Aug 28, 2019
2,165
3,960
510
www.instagram.com
Also does Kirby have a PS5 developer kit?
I have zero idea. They have Xbox Series X as one of the platforms in their Twitter bio; if they don't have a PS5 devkit then they might know a dev who is working with one and they talk back and forth.

So, some serious info in this thread. So we now think the 100 GB has a direct connection to the CPU/GPU? So there's less latency for direct access than there would be for copying into RAM. That is very different from what Sony has.
Well, there's a lot of info but there's also a lot of speculation, to be perfectly fair. It's possible both systems have direct CPU/GPU connections but one prioritizes that (at the expense of slower/more overhead writing to RAM) and the other prioritizes writing directly to RAM (at the expense of slower/more overhead with direct CPU/GPU access).

From what we've seen of the PS5, the I/O complex has a direct path to the system RAM, so the assumption is that the system is able to utilize that to write out required data to RAM rather than having the CPU do it (the CPU still communicates with the I/O complex to instruct what data to write to the RAM though, or what data to take from RAM and write back out to storage). We didn't see any connection from the I/O complex to the GPU and given the focus Cerny put on the SSD in that presentation, if that was a feature, they surely would've highlighted it at the event since it's kind of a big deal.

The thing though is that since PS5's I/O complex handles data transfer operations from RAM to NAND and vice-versa pretty much all on its own (aside from the CPU instructing it on what to do, and it's possible they have some features in the I/O complex to store a state of instructions that can be triggered through some type of event process), and has the speed necessary to facilitate this, it can stream in needed data very quickly, especially if it is compressed.

The question is if the I/O complex is writing and reading data to/from NAND to/from RAM, do the CPU and/or GPU access data at that moment? Given the PS5's RAM setup, I'm going to say "no", because the I/O complex has to share the bus with the APU given the hUMA memory setup. Unless the I/O complex can write data to a virtualized partition range and the CPU/GPU can access data not in that range while it is reading or writing to that range, but I don't think that is possible without some intermediate hardware. However, it might be why the cache coherency engines are in place: maybe the I/O complex can have its most recent operations checked by the CPU, that way the CPU knows what data might've been updated by the I/O complex while it is operating on a memory range.

While the I/O complex writes to a range, the CPU can take needed data and put it in other locations to make room for more data being written to the partitioned range by the I/O complex. I figure something like that could be done seeing as OSes on computers allocate memory ranges for different programs to contain their contents in; those partitioned ranges for those applications are considered private so that one application doesn't know the contents of the data in the other program's range, though. I'm guessing there's a way they could have the PS5's OS manage a partition range/bank for the I/O Complex to write data to in RAM, and use the cache coherency engines to assist in the CPU knowing if new updated versions of data are in the partition range the I/O complex writes to, that way maybe the CPU can get the updated data from there instead?

I'm actually really curious to see how having the I/O complex write data contents directly to RAM plays into any potential bus contention because, again, if the CPU and GPU already have to share the bus (same as with XSX), then I assume another part of the system sending and retrieving data from RAM also has to share the bus.

As for the XSX, it seems like the CPU is using that 1/10th of a single core to move data between NAND and RAM, using things like the decompression block along the way if needed. So there's an extra step there compared to PS5, but I'm assuming from there the CPU can write any incoming data from the NAND into either of the two pools of the "split" GDDR6 memory. However, it seems the XSX might have a direct connection between the NAND and GPU for GPU-formatted data, so if the GPU needs any of that type of data it can basically page the 100 GB partition of NAND on the SSD as extended memory (albeit orders of magnitude slower than GDDR6, but that's just NAND in general tbh).

The fact they already have features on XBO's GPU for executeIndirect (allowing the GPU to perform some limited operations without waiting on the CPU), and the fact they are obviously continuing that with XSX, would suggest they are probably rolling with this setup. Ironically, while people like Moore's Law Is Dead did touch on the idea, they did so regarding PS5, not XSX. But from the way the systems seem to be approaching it, it might be XSX taking this GPU/NAND direct link approach. I guess both systems could do both approaches, but that would probably overly complicate the hardware design and not be worth the costs required to cover that much hardware and software functionality.

I'm just speculating based on some actual info and some stuff that can be extrapolated/inferred from that, and some questions on my end about possible design choices that may or may not even be possible. August can't come soon enough xD.

Nope. She says it's too expensive for her. She develops for the Switch, PC and Xbox because of that. As a hobby I might add.


Maybe this will help you further. I'll post every other tweet since you can read two at the same time;

That's quite the gold mine of speculative info tbh x3. Really appreciate this; I got some other speculative details on the SSD I/Os a few days ago and been trying to formulate some thoughts in private. This might help with sorting out some of that stuff I've been sitting on.

I can definitely see the viability to this approach if it's what MS is doing (and there's a decent bit of evidence to suggest it's what they're doing), while there's also a lot of viability to Sony's approach. They're both seemingly doing a lot through some divergent means playing to some historical and market strengths of the respective platform holders.

I am curious to know how it is possible for a college student to get an Xbox devkit?
Same way a single Chinese dude can be developing a game for XSX?

This would not be the first time a young dev got access to then-new console devkit hardware, or devkits to develop on in general. MS probably has some contracts or plans for devs of different financial persuasions.
 

oldergamer

Member
Aug 20, 2004
1,982
703
1,520
So the differences between the two systems.

PS5: 5.5 GB/s SSD copy to 16 GB RAM
XSX: 2.4 GB/s SSD copy to 16 GB RAM + 2.4 GB/s direct access to 100 GB by CPU/GPU

So for tiled resources using SFS, Xbox can access those right from the 100 GB without copying into system memory. If that is the case then that's an interesting setup.
 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
Hard to tell. It seems as if the PS5 is designed to load as much as possible as fast as possible towards RAM, while the XSX is designed to load only what is required, bypassing the RAM when convenient. If she is right, of course. The programming will play a role;

But in the end they are both limited by what they can pull from the SSD. So I believe that the speeds of the drives will still be very important even with Microsoft's ability to load the data directly into the CPU/GPU.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
So the differences between the two systems.

PS5: 5.5 GB/s SSD copy to 16 GB RAM
XSX: 2.4 GB/s SSD copy to 16 GB RAM + 2.4 GB/s direct access to 100 GB by CPU/GPU

So for tiled resources using SFS, Xbox can access those right from the 100 GB without copying into system memory. If that is the case then that's an interesting setup.
Indeed. The way I am speculating that it will work is that you'll have low quality MIPs in RAM. When the game requests a higher quality MIP, rather than requiring to load it from RAM to GPU, it's loaded from SSD to GPU instead, and afterwards stored to RAM for re-use if required. If it's not in use for a long time it will get overwritten by some other texture that follows the same cycle.
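To make that cycle concrete, here is a minimal sketch of it as a least-recently-used cache. Everything in it is an assumption for illustration: the `MipCache` class, the `fake_ssd_read` stand-in, and the capacity of two textures are hypothetical, and nothing here reflects how the XSX actually schedules its streaming.

```python
from collections import OrderedDict


class MipCache:
    """Toy RAM-side cache of high-quality MIPs with least-recently-used eviction."""

    def __init__(self, capacity, load_from_ssd):
        self.capacity = capacity            # hypothetical RAM budget, counted in textures
        self.load_from_ssd = load_from_ssd  # callable standing in for an SSD read
        self.resident = OrderedDict()       # texture id -> MIP data, oldest use first
        self.ssd_reads = 0

    def request(self, texture_id):
        if texture_id in self.resident:
            # Already in RAM: mark it as recently used and reuse it
            self.resident.move_to_end(texture_id)
            return self.resident[texture_id]
        # Not in RAM: pull it from the SSD, then keep a copy for re-use
        data = self.load_from_ssd(texture_id)
        self.ssd_reads += 1
        self.resident[texture_id] = data
        if len(self.resident) > self.capacity:
            # Over budget: overwrite whatever has gone unused the longest
            self.resident.popitem(last=False)
        return data


def fake_ssd_read(texture_id):
    # Hypothetical stand-in for reading a high-quality MIP from storage
    return f"hq-mip:{texture_id}"


cache = MipCache(capacity=2, load_from_ssd=fake_ssd_read)
cache.request("rock")
cache.request("grass")
cache.request("rock")   # served from RAM, no SSD read
cache.request("water")  # SSD read; "grass" is evicted as least recently used
```

After those four requests only three SSD reads have happened, and "rock" and "water" are what remain resident.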
 

rnlval

Member
Jun 26, 2017
363
298
255
Sector 001
gpucuriosity.wordpress.com
Hard to tell. It seems as if the PS5 is designed to load as much as possible as fast as possible towards RAM, while the XSX is designed to load only what is required, bypassing the RAM when convenient. If she is right, of course. The programming will play a role;

Well, the Linux partition regime has virtual memory (Linux-swap) in a separate partition, hence bypassing the normal file system.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
But in the end they are both limited by what they can pull from the SSD. So I believe that the speeds of the drives will still be very important even with Microsoft's ability to load the data directly into the CPU/GPU.
According to Kirby Louise, for some reason she claims diminishing returns with SSDs over 2GB/s. Why that number, I have no idea. The speed of the drives will be mainly important to off-set the relatively low jump in amount of RAM compared to previous gens. Theoretically, the faster the drive, the less you need to keep in RAM, all else being equal.
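A rough back-of-envelope version of that last point, as a sketch only. The 10 GB working set and the 2 second window are made-up numbers for illustration; the drive speeds are the raw 2.4 GB/s and 5.5 GB/s figures quoted earlier in the thread, and compression is ignored.

```python
def resident_gb_needed(needed_gb, window_s, drive_gb_per_s):
    """GB that must already sit in RAM if the drive streams the rest within the window."""
    streamed_gb = drive_gb_per_s * window_s
    return max(0.0, needed_gb - streamed_gb)


NEEDED_GB = 10.0  # hypothetical data the game will touch in the next window
WINDOW_S = 2.0    # hypothetical look-ahead window in seconds

xsx_resident = resident_gb_needed(NEEDED_GB, WINDOW_S, 2.4)  # raw XSX speed from the thread
ps5_resident = resident_gb_needed(NEEDED_GB, WINDOW_S, 5.5)  # raw PS5 speed from the thread

print(f"2.4 GB/s drive: keep about {xsx_resident:.1f} GB resident")
print(f"5.5 GB/s drive: keep about {ps5_resident:.1f} GB resident")
```

Under these assumed numbers the slower drive has to keep about 5.2 GB resident ahead of time, while the faster drive can stream the whole 10 GB inside the window.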
 

Trueblakjedi

Member
Mar 30, 2020
177
378
220
According to Kirby Louise, for some reason she claims diminishing returns with SSDs over 2GB/s. Why that number, I have no idea. The speed of the drives will be mainly important to off-set the relatively low jump in amount of RAM compared to previous gens. Theoretically, the faster the drive, the less you need to keep in RAM, all else being equal.
2 TB or GB?
 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
Theoretically, the faster the drive, the less you need to keep in RAM, all else being equal.
That's pretty interesting. So with that applied to the XSX and PS5 it's possible that the PS5 wouldn't need as much RAM for the assets. I wonder what they can use the extra space for.
 

rnlval

Member
Jun 26, 2017
363
298
255
Sector 001
gpucuriosity.wordpress.com
According to Kirby Louise, for some reason she claims diminishing returns with SSDs over 2GB/s. Why that number, I have no idea. The speed of the drives will be mainly important to off-set the relatively low jump in amount of RAM compared to previous gens. Theoretically, the faster the drive, the less you need to keep in RAM, all else being equal.
One of the limiters is the number of pixels with the framebuffer. Excess subpixel data leads to a wasteful overdraw situation.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
One of the limiters is the number of pixels with the framebuffer. Excess subpixel data leads to a wasteful overdraw situation.
That might indeed be one of the reasons. So maybe that implies that around 2GB/s should be enough for per pixel detail at 4k?
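One way to sanity check that guess is to divide the drive speed by the number of pixels drawn per second. This is only a sketch: the 60 fps target and decimal gigabytes are my assumptions, and there is no indication this is how the 2GB/s figure was actually derived.

```python
WIDTH, HEIGHT = 3840, 2160  # 4K framebuffer
FPS = 60                    # assumed frame rate


def bytes_per_pixel_per_frame(drive_bytes_per_s, width=WIDTH, height=HEIGHT, fps=FPS):
    """New bytes the drive can deliver for every pixel of every frame."""
    return drive_bytes_per_s / (width * height * fps)


at_2gb = bytes_per_pixel_per_frame(2.0e9)    # the 2GB/s figure from the claim
at_5_5gb = bytes_per_pixel_per_frame(5.5e9)  # PS5 raw speed quoted in the thread

print(f"2.0 GB/s: {at_2gb:.2f} bytes per pixel per frame")
print(f"5.5 GB/s: {at_5_5gb:.2f} bytes per pixel per frame")
```

At 2 GB/s that works out to roughly 4 bytes of fresh data per pixel per frame, which is about one uncompressed 32-bit texel per pixel, and 5.5 GB/s gives roughly 11 bytes.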
 

Neo_game

Member
Mar 19, 2020
159
258
240
Same way a single Chinese dude can be developing a game for XSX?

This would not be the first time a young dev got access to then-new console devkit hardware, or devkits to develop on in general. MS probably has some contracts or plans for devs of different financial persuasions.
As far as I know the dev making Bright Memory Infinite is expecting to finish the game on PC by 2020 and then he is planning to port the game to next-gen consoles. I do not think he has any devkit. He said according to his schedule the game should get completed on PC by December.

I will be really surprised if she even has a devkit. Unless she has some serious favor from Microsoft.
 

MasterCornholio

Gold Member
Mar 27, 2020
1,688
4,397
505
That might indeed be one of the reasons. So maybe that implies that around 2GB/s should be enough for per pixel detail at 4k?
Sounds to me like Sony aimed higher because they wanted everything to be almost instant on the system.

2GB/s could be enough for 4K assets but 5.5GB/s is way more than that. And I know that the PS5 isn't an 8K console.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
As far as I know the dev making Bright Memory Infinite is expecting to finish the game on PC by 2020 and then he is planning to port the game to next-gen consoles. I do not think he has any devkit.

I will be really surprised if she even has a devkit. Unless she has some serious favor from Microsoft.
She's probably using ID@Xbox, and if she passes qualifications she gets a dev kit for free. This is on her Github page;

Where's the support for PlayStation?
PS4/5 development kits are far too expensive for me to obtain (they are in excess of $4000, compared to Switch's $400 and Xbox's $0 dev mode). Unless someone wants to donate me a couple of PS4/5 devkits, I have no plans to support the platform. I apologize for any inconvenience this may cause. Talk to Sony about lowering the price of their devkits to something reasonable if you are unhappy. You are of course free to make your own PS4/5 fork of Void2D, as long as it remains open source. If I do manage to get my hands on some PS4/5 devkits I may even use your code in the main branch, and of course you will be fully credited.


 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
The other limit is AMD GPU's geometry performance. Convince me that AMD RDNA 2 GPU can beat RTX 2080/RTX 2080 Ti on geometry performance at a saturated state.



Asteroid mesh shader demo running at 4K 60 fps on RTX 2080 (TU104).
I don't think we have that info, although this is a reference. Yeah it's WCCFTech, but the video is legit from the MS DX12 YouTube channel.

 

Neo_game

Member
Mar 19, 2020
159
258
240
She's probably using ID@Xbox, and if she passes qualifications she gets a dev kit for free. This is on her Github page;

Where's the support for PlayStation?
PS4/5 development kits are far too expensive for me to obtain (they are in excess of $4000, compared to Switch's $400 and Xbox's $0 dev mode). Unless someone wants to donate me a couple of PS4/5 devkits, I have no plans to support the platform. I apologize for any inconvenience this may cause. Talk to Sony about lowering the price of their devkits to something reasonable if you are unhappy. You are of course free to make your own PS4/5 fork of Void2D, as long as it remains open source. If I do manage to get my hands on some PS4/5 devkits I may even use your code in the main branch, and of course you will be fully credited.


Yes I just saw that. I still find it a little weird that in her free time from college she is making an open source engine for Xbox and Switch but not for mobile. I know it will be free, but there's a higher likelihood of people trying her game engine on mobile platforms than on consoles.
 

oldergamer

Member
Aug 20, 2004
1,982
703
1,520
She's probably using ID@Xbox, and if she passes qualifications she gets a dev kit for free. This is on her Github page;

Where's the support for PlayStation?
PS4/5 development kits are far too expensive for me to obtain (they are in excess of $4000, compared to Switch's $400 and Xbox's $0 dev mode). Unless someone wants to donate me a couple of PS4/5 devkits, I have no plans to support the platform. I apologize for any inconvenience this may cause. Talk to Sony about lowering the price of their devkits to something reasonable if you are unhappy. You are of course free to make your own PS4/5 fork of Void2D, as long as it remains open source. If I do manage to get my hands on some PS4/5 devkits I may even use your code in the main branch, and of course you will be fully credited.


4000 is nothing. I remember when devkits cost 12k for psx.
 

Ascend

Gold Member
Jul 23, 2018
1,607
2,139
485
Yes I just saw that. I still find it a little weird that in her free time from college she is making an open source engine for Xbox and Switch but not for mobile. I know it will be free, but there's a higher likelihood of people trying her game engine on mobile platforms than on consoles.
Maybe she simply prefers fixed platforms. There are more things to take into account if you want to develop for Android for example. She's still developing for Windows though, so... I don't know. We can ask her lol.

4000 is nothing. I remember when devkits cost 12k for psx.
It can be a lot for a college student. Depending on where she's from, that might be 3 months or more of her whole monthly income.
 

oldergamer

Member
Aug 20, 2004
1,982
703
1,520
So my question now is, if the XSX SSD has 4 lanes to system memory (16 GB), at what bandwidth is the CPU/GPU able to access data directly from the 100 GB? I don't see how it could be the same 4 lanes if those components see it like system memory.
 

rntongo

Banned
Feb 25, 2020
343
657
275
That would NEVER pass Sony or MS certifications. Letting devs create their own low level hardware access is instant recipe for disaster. All it takes is a dev making an error and suddenly you have a jailbreaked console and rampant piracy.
I think we're having a semantic issue here. Sony & Microsoft provide low level APIs which devs can use for example to directly access the RT hardware. So on the Xbox version of DX12U, devs can directly access the GPU and use their own RT libraries instead of those provided by DirectX Ray Tracing API.

Here is a quote:
"In grand console tradition, we also support direct to the metal programming including support for offline BVH construction and optimisation. With these building blocks, we expect ray tracing to be an area of incredible visuals and great innovation by developers over the course of the console's lifetime."
 

rntongo

Banned
Feb 25, 2020
343
657
275
And for those wondering, no there is no 100GB partition of the XSX SSD. It is simply up to 100GB of the game install that acts as virtual RAM.

Here is the quote:
"The idea, in basic terms at least, is pretty straightforward - the game package that sits on storage essentially becomes extended memory, allowing 100GB of game assets stored on the SSD to be instantly accessible by the developer."

I can see how this is possible on the PS5 using the second I/O co-processor. But since there are indications that it will be directly accessible to the CPU on the XSX it's even more impressive and I look forward to finding out how exactly they did this. Also it shows there's more to the I/O architecture in the XSX.
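For anyone trying to picture "the game package becomes extended memory", the closest everyday analogy I know of is a memory-mapped file on a regular OS. The sketch below is only that analogy: the file contents, the offsets and the asset name are invented for illustration, and it says nothing about how the XSX hardware or DirectStorage actually implement it.

```python
import mmap
import os
import tempfile

# Build a small stand-in "game package" on disk (contents are made up)
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"HEADER--" + b"\x00" * 4096 + b"TEXTURE_42")
    package_path = f.name

with open(package_path, "rb") as f:
    # Map the whole file into the address space without reading it all up front
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as package:
        offset = 8 + 4096                     # hypothetical location of one asset
        asset = package[offset:offset + 10]   # only the touched pages get paged in

os.remove(package_path)
print(asset)
```

The program addresses the package like a byte array, and the OS decides when the underlying storage actually gets read.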
 