
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Is what is being said impossible, or is it simply different from the traditional way of doing things? I'm asking because people sometimes have trouble seeing new possibilities when they have been in something for too long. Like, it's hard for a tennis player to bat properly in baseball, and vice versa.

I'm not blind to new possibilities, but there is no evidence of such. Remember, they are using AMD components here, and massive customisations are expensive. But let's wildly speculate for fun.

There are variations that could be considered.
Imagine we wanted to create a very fast cache that could be written to directly (or even via VRAM, as caches are good for repeat reads anyway). We decide to make it bigger and more useful, to pull stuff from the SSD or just commonly used stuff from VRAM. Let's make it 64MB or even 128MB. We also want to make it part of the die, to improve access times. The fastest way to do this is SRAM (which current caches are made from), which has nanosecond access times. 128MB might not sound like much, but there are roughly 16,600,000 ns in a single 16.6 ms frame (60fps) and SRAM has around 50-100 ns access times.

Congrats, you have just created the ESRAM the Xbox One uses, which not only increased the die size and heat profile of the chip, but was so hard to use that it was removed from the Xbox One X.
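To put those numbers in perspective, here's a quick back-of-the-envelope sketch (the latency figure is the ballpark one quoted above, not a measured value for any specific chip):

# Frame-budget arithmetic for the figures quoted above (ballpark, not measured).
FRAME_TIME_NS = 16.6e6     # one 60 fps frame is ~16,600,000 ns
SRAM_ACCESS_NS = 100       # assumed on-die SRAM access time (upper end of 50-100 ns)

accesses_per_frame = FRAME_TIME_NS / SRAM_ACCESS_NS
print(f"Sequential SRAM accesses per frame: {accesses_per_frame:,.0f}")
# -> roughly 166,000 sequential accesses per frame, which is why a small,
#    low-latency on-die pool (like the Xbox One's 32 MB ESRAM) could punch
#    above its size.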

Developers like large, homogeneous hunks of RAM; even having two different memory bandwidth profiles like the XSX has is worrisome (it's not a massive gulf, but you have to wonder why they did it).

I think a big part of what people are missing is that once you have the data in RAM, the next part is what matters, and it's that part (rendering) where the XSX will shine.
 

Lethal01

Member
Definitely more than 2 or 3

And all of this before the actual game reveal event in July. They may still reveal something in June.


The Medium and Scorn teams are not AAA.
The Hellblade team is also AA if I remember right, but the game does look pretty good.

So yeah 2 or 3
 
Last edited:

Bernkastel

Ask me about my fanboy energy!
The medium and Scorn teams are definitely not AAA.
Hellblade team is also AA if I remember right, but the game does look pretty good.

So yeah 2 or 3
Hellblade was AA, but Hellblade II is a full 100-person, first-party AAA project. The Medium was originally announced in 2012 for 360, PS3 and Wii U but is now console exclusive to XSX; it is Bloober Team's "most ambitious project yet" and they call it the "biggest game we've built". At this point The Medium is to XSX what Until Dawn is to PS4, and you don't call those AA. Scorn is the same, with development starting in 2013. Both The Medium and Scorn are way too ambitious to simply be called AA.
 

Lethal01

Member
Hellblade was AA, but Hellblade II is a full 100 person first party AAA project. Medium was originally announced in 2012 for 360, PS3 and WiiU but is now console exclusive to XSX, it is Bloober Team's "most ambitious project yet" and they call it "biggest game we've built". At this point Medium is like what Until Dawn is to PS4, you dont call them AA. Scorn is the same with development starting in 2013. Both Medium and Scorn are way too ambitious to be simply called AA.

Hellblade, sure.

Bloober and Scorn, while ambitious for the companies and visually impressive, aren't reaching AAA level when it comes to budget and team size.
That said, whether they are AAA doesn't really matter that much. I think the point being made is that they don't have the "heavy hitters" on the level of something like RE8 or DMC6 or Elden Ring etc.

We have seen about 3 games of that level. I'll dip out now; whether Scorn is in the same lane as Halo or Bloodborne 2 doesn't affect the architecture.
 
Last edited:
Interesting. According to Kirby (38m40s), even PS5 I/O will be too slow to stream data that's only visible on the screen.
You aren't pulling in multiple 4K textures or entire meshes. 150MB of NEW data per frame is a lot of data. (You only pull in the new pages you need; pages are 64KB in typical implementations.)
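To put a rough number on that, here's a quick sketch using the publicly quoted peak throughput figures (sustained rates in real games will differ):

# Per-frame streaming budget at the publicly quoted peak rates (approximate).
PAGE_SIZE_KB = 64                     # typical tiled-resource / PRT page size

def per_frame_budget(throughput_gb_s, fps=60):
    """Return (MB of new data per frame, number of 64 KB pages that fits)."""
    mb_per_frame = throughput_gb_s * 1000 / fps
    pages = int(mb_per_frame * 1024 // PAGE_SIZE_KB)
    return mb_per_frame, pages

for label, rate in [("XSX raw 2.4 GB/s", 2.4),
                    ("XSX compressed 4.8 GB/s", 4.8),
                    ("PS5 compressed ~9 GB/s", 9.0)]:
    mb, pages = per_frame_budget(rate)
    print(f"{label}: ~{mb:.0f} MB/frame, ~{pages:,} pages of 64 KB")
# At ~9 GB/s that is ~150 MB of brand-new data per 60 fps frame, i.e. ~2,400
# pages of 64 KB -- which is the point being made above.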
 
Last edited:

Frederic

Banned
Hellblade sure,

Bloober and scorn while ambitious for the company and visually impressive aren't reaching AAA level when it comes to budget and team size.
That said whether they are AAA doesn't really matter that much. I think the point that was being made is they don't have the "Heavy hitters" on the level of something like RE8 or DMC6 or elden ring etc.

We have seen about 3 games of that level. I'll dip out now, whether Scorn is in the same lane as Halo or Bloodbourne 2 doesn't affect eh architecture.

What about PS5 then? What AAA games do we know of? Please don't tell me it's Godfall
 

Bernkastel

Ask me about my fanboy energy!
Hellblade sure,

Bloober and scorn while ambitious for the company and visually impressive aren't reaching AAA level when it comes to budget and team size.
That said whether they are AAA doesn't really matter that much. I think the point that was being made is they don't have the "Heavy hitters" on the level of something like RE8 or DMC6 or elden ring etc.

We have seen about 3 games of that level. I'll dip out now, whether Scorn is in the same lane as Halo or Bloodbourne 2 doesn't affect eh architecture.
Their actual game reveal event is in July though. And we still have a lot coming before the actual event.
 

Ascend

Member
I'm not blind to new possibilities, but there is no evidence of such. Remember, they are using AMD components here, and massive customisations are expensive. But let's wildly speculate for fun.

There are variations that could be considered.
Imagine we wanted to create a very fast cache that could be written to directly (or even via VRAM, as caches are good for repeat reads anyway). We decide to make it bigger and more useful to pull stuff from the SSD or just commonly used stuff from VRAM. Let's make it 64mb or even 128mb. We also want to make it part of the die, to improve access times. The fastest way to do this is SRAM (which current caches are made using) which access nanosecond access times. 128mb might not sound like much, but there are 16,000,000ns in a single 16.6ms frame (60fps) and SRAM has around 50-100ns access times.

Congrats you have just created ESRAM which the Xbox One uses, which not only increases the die-size and heat profile of the chip, but was so hard to use, it was removed from the Xbox One X.
What would the access time be on an NVMe SSD with minimal API overhead?

And going back... Imagine if you actually wanted to load data from SSD directly to the GPU, bypassing VRAM, and you wanted to do that without API overhead. How would you go about solving that problem?

Developers like large, homogeneous hunks of RAM, even having two different memory profiles like the XSX has is worrisome (it's not a massive gulf, but you have to wonder why they did it.
It's quite obvious why they did it: a wider memory bus and more bandwidth. If they had to go with 16 GB of RAM, normally you would do either 8 x 2GB or 16 x 1GB chips. The former would give you a 256-bit bus, and the latter would give you a 512-bit bus. The former is what the PS5 does. The former would likely bottleneck the GPU and the latter was most likely too expensive, so they went with the middle ground of a 320-bit bus, which is 10 RAM chips (6 x 2GB + 4 x 1GB).
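The arithmetic behind that, as a rough sketch (14 Gbps GDDR6 is the pin speed implied by the published 560 GB/s figure; the rest follows from each chip having a 32-bit interface):

# Bus width and bandwidth for each chip count (GDDR6, 32-bit per chip, 14 Gbps).
PIN_SPEED_GBPS = 14
BITS_PER_CHIP = 32

def bandwidth_gb_s(num_chips):
    bus_width = num_chips * BITS_PER_CHIP
    return bus_width * PIN_SPEED_GBPS / 8      # Gbit/s across the bus -> GB/s

print(bandwidth_gb_s(8))    # 8 chips  -> 256-bit bus -> 448 GB/s (the PS5 layout)
print(bandwidth_gb_s(16))   # 16 chips -> 512-bit bus -> 896 GB/s (too expensive)
print(bandwidth_gb_s(10))   # 10 chips -> 320-bit bus -> 560 GB/s (the XSX layout)
# The mixed 6 x 2GB + 4 x 1GB population is what creates the two profiles:
# 10 GB striped across all ten chips at 560 GB/s, and the remaining 6 GB sitting
# on only the six 2GB chips at 6 * 32 * 14 / 8 = 336 GB/s.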

I think a big part of what people are missing is, once you have the data in RAM, the next part is important, and it's that part where the XSX will shine. (Rendering).
I don't think we're missing this. It's just the least interesting aspect to talk about.
 
What would the access time be on an NVMe SSD with minimal API overhead?

And going back... Imagine if you actually wanted to load data from SSD directly to the GPU, bypassing VRAM, and you wanted to do that without API overhead. How would you go about solving that problem?

Usually in the microseconds range, which is a couple of orders of magnitude slower than cache, but faster than the average frametime in a game.
Loading data while bypassing RAM is like cooking a bowl of rice one grain at a time. GPUs deal with pointers and stack data stored in registers, and also use L1 cache to share data across threads. Then you just have VRAM. There is nowhere else to load data.
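Roughly, the orders of magnitude look like this (all latencies are ballpark assumptions for illustration, not measured figures for any console):

import math

FRAME_NS = 16.6e6                      # one 60 fps frame
latencies_ns = {
    "on-die SRAM cache": 100,          # nanoseconds
    "GDDR6 / system RAM": 300,         # a few hundred nanoseconds (assumed)
    "NVMe SSD random read": 80_000,    # tens of microseconds (assumed)
}

for name, ns in latencies_ns.items():
    per_frame = FRAME_NS / ns
    print(f"{name:22s}: {ns:>7,} ns, ~10^{math.log10(per_frame):.1f} accesses per frame")
# The SSD is ~2-3 orders of magnitude slower than cache, yet a single access is
# still ~200x shorter than a whole frame: too slow to treat like RAM, fast
# enough to fetch a handful of things mid-frame.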


What data is stored in a register / cache is highly dependent on what is being rendered, at what quality, and how it was scheduled. It's controlled by the runtime. Most of this data comes from RAM in the first place to be used by the runtime.
 

Ascend

Member
Loading data by bypassing RAM is like cooking a bowl of rice, one grain of rice one at a time. GPUs deal with pointers and stack data stored in registers, and also uses L1 cache to share data across threads. Then you just have VRAM. There is nowhere else to load data.
Which is where we can argue that you can let part of the SSD appear as RAM to the CPU & GPU.

What data is stored in a register / cache is highly dependent on what is being rendered, what quality, and how it was scheduled. It's controlled by the runtime. Most of this data is gotten from RAM in the first place to be used by the runtime.
Most? And where does the rest come from?

You haven't really answered what I asked. You've shown the problems, not how you would solve those problems if you were given the task to do so. If you only had manual cars, and someone gave you a task to create an automatic one, you wouldn't get anywhere by saying how shifting gears is always manual with a stick shift and that you have to have a clutch operated by a human to disconnect the engine from the gear, implying it's an impossible problem to solve.

So yeah. Anyone is free to take a bite;
Imagine if you actually wanted to be able to load data from SSD directly to the GPU, bypassing VRAM when desired, and you wanted to do that without API overhead. How would you go about solving that problem?
 
Which is where we can argue that you can let part of the SSD appear as RAM to the CPU & GPU.
That's Virtual Memory.
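For what it's worth, "making storage appear as RAM" is what a memory-mapped file already does. A minimal sketch in Python, purely illustrative of the concept (the asset file name is made up, and this is not a claim about how either console's OS exposes its SSD):

import mmap

# Map a (hypothetical) asset package so the whole file becomes addressable.
with open("assets.pak", "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as assets:
        header = assets[:16]                      # touching bytes faults the page in
        chunk = assets[10_000_000:10_065_536]     # a 64 KB slice from deep in the file
        print(len(header), len(chunk))
# Everything is "accessible" the moment it is mapped, but each page still costs a
# real SSD read the first time it is touched -- which is exactly the access vs.
# transfer distinction being argued about in this thread.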
Most? And where does the rest come from?
Let's just say all when it comes to games.
You haven't really answered what I asked. You've shown the problems, not how you would solve those problems if you were given the task to do so. If you only had manual cars, and someone gave you a task to create an automatic one, you wouldn't get anywhere by saying how shifting gears is always manual with a stick shift and that you have to have a clutch operated by a human to disconnect the engine from the gear, implying it's an impossible problem to solve.
So yeah. Anyone is free to take a bite;
Imagine if you actually wanted to be able to load data from SSD directly to the GPU, bypassing VRAM when desired, and you wanted to do that without API overhead. How would you go about solving that problem?

It's irrelevant because no-one is actually doing that or trying to. Do you think the XSX is bypassing VRAM? Why?
 
Last edited:

oldergamer

Member
You certainly have offered nothing.
What did you offer? Any ideas on what this is beyond "oh, it's standard swap space and nothing more"? When I read your posts, it sounds like you are just saying MS re-invented the wheel. I highly doubt that, whatever it is, MS invested time and research just to end up with a bog-standard piece of hardware. I'm sure you wouldn't think the same about Sony hardware.

If this ends up being some customized tech that does X, Y or Z above your expectations, I hope you come back in and acknowledge it.

We will know who is right or wrong in July.
 
Last edited:

Segslack

Neo Member
The problem here is that we do not have any new information regarding MS's solution, just the same old tweets being shared over and over again... MS really needs to come out and show something more concrete...
 
What did you offer? any ideas on what this is beyond, oh it's standard swap space and nothing more. When i read your posts, it sounds like you are just saying MS re-inveted the wheel. I highly doubt whatever it is, MS invested time and research to just end up with a bog standard hardware piece. I'm sure you wouldn't think the same about sony hardware.

If this ends up being some customized tech that does X Y or Z above your expectations, I hope you come back in and acknowledge it.

We will know who is right or wrong in july.
I'm sure you'll find I'm interested in the tech, not any particular platform. I'm sure MS have tons of optimizations in their hardware stack. What I'm against is pulling random shit from thin air and dubious sources without any evidence or at least knowledge of the subject matter. I've explained every concept in detail, but I'm sure not going to stoke the flames of fanboyism. Stay grounded, use facts, and yes, everything is reinventing the wheel, or we'd still be on stones.

If you don't agree with it, lay it out, but don't ask me for hypotheticals about mystical hardware.

Sony laid out a hardware stack for loading data directly into VRAM; it's been backed by developers who know what they're talking about, and it seems to be a good solution.
In response, I've heard that the XSX must be loading stuff directly into the GPU and bypassing RAM, because some rando said XSX has figured it out and the Sony solution isn't any good.

If I say the Sony solution is good, am I a fanboy?
If I say the XSX solution is good am I a fanboy?

Only one is based on reality here.

I'm also not a fan of Smart Shift, and do believe the Sony GPU will be left wanting in a lot of games.

Am I a XSX fanboy now?
 
Last edited:

quest

Not Banned from OT
I'm sure you'll find I'm interested in the tech, not any particular platform. I'm sure MS have tons of optimizations in their hardware stack. What I'm against is the pulling random shit from thin air and dubious sources without any evidence or at least knowledge about the subject matter. I've explained every concept in detail, but I'm sure not going to stoke the flames of fanboism Stay grounded, use facts, and yes, everything is reinventing the wheel, or we'd still be on stones.

If you don't agree with it, lay it out, but don't ask me for hypotheticals about mystical hardware.

Sony laid out a hardware stack for loading data directly into VRAM, it's been backed by developers who know what they're talking about, and seems to be a good solution.
In response, I've heard that the XSX must be loading stuff directly into the GPU and bypassing RAM, because some rando said XSX has figured it out, and the Sony solution isn't any good.

If I say the Sony solution is good, am I a fanboy?
If I say the XSX solution is good am I a fanboy?

Only one is based on reality here.

I'm also not a fan of Smart Shift, and do believe the Sony GPU will be left wanting in a lot of games.

Am I a XSX fanboy now?
Just misinformed; in these parts, variable clocks = fixed. Sony took crappy SmartShift and fixed it so it works as a permanent boost. Once you are properly educated you will see the error of your ways and see that the Cerny shift is the biggest breakthrough in 4 decades.
 

Ascend

Member
You certainly have offered nothing.
Oh give me a break. You're dismissing everything and then pretend nothing has been offered. Things have been said for a specific reason. If you want to dismiss them because it is inconvenient to your biased dinosaur views, that's your problem.

You still can't answer what they mean with 100 GB being instantly accessible, while at the same time you are claiming that all XSX does is the same as before. It's nothing but a dismissal of the possibility that they have something innovative here.

And your thinking doesn't even make any sense. That's why you're so quick to dismiss the 'instantly accessible' statement. You claim it is either virtual memory or normal transfer from SSD to RAM. Well guess what.

If it is virtual memory in the traditional sense, it's not instantly accessible, because you have a lot of overhead.
If it is normal transfer from SSD to RAM, then why only 100GB of the SSD and not the whole drive?

It does not add up. So it is something else. What else is there? Exactly. Transfer directly from the SSD to the GPU with nothing in between. If there is another possibility I'll gladly hear it. But I still have not seen anything that complies with that statement.

That you wish to dismiss this as a possibility due to being stuck in the old ways of doing things is your problem. You sound like the people asking who will pick the cotton if we abolish slavery. That is not the important part.
And when I tried to give you the incentive to think further on how you would solve the issues you yourself are bringing forth, your reply is a mere shallow dismissive "irrelevant since no one is doing that or trying to do that". Really? No one is trying to solve obvious limitations to get more performance? How do you know this is not exactly what XVA is doing?
It was supposed to be a thought experiment to drive us closer to what is happening. But you're not interested in that. You're only interested in saying that the XSX has nothing new.

So yeah. Why are you even in this thread?

I'm also not a fan of Smart Shift, and do believe the Sony GPU will be left wanting in a lot of games.
The GPU will do fine. Either you don't understand SmartShift, or you don't understand hardware.

In response, I've heard that the XSX must be loading stuff directly into the GPU and bypassing RAM, because some rando said XSX has figured it out, and the Sony solution isn't any good.
No one said the Sony solution isn't any good.
...
The cat is out of the bag. I guess it's quite clear now what you're trying to do here. Despite your seemingly above average knowledge, you're no different than the ones trying to disrupt this thread.

If you don't agree with it, lay it out, but don't ask me for hypotheticals about mystical hardware.
Ah yes... "Mystical hardware". The fallacies keep rolling about.

What we have are statements made by MS, and unless you are saying they are falsely advertising, which they can be sued for, their statements mean something. And how do we figure out what they mean? By hypotheses, by speculating, by sharing information and putting the puzzle pieces together. It's either that, or waiting for more information.
If you are not interested in hypothesizing, you'd better simply leave this thread and go wait somewhere else. Allow the ones that are actually interested in discussing possibilities to interact with each other without the constant bickering of "oh, we already did all that when dinosaurs roamed the earth".
 

KingT731

Member
You still can't answer what they mean with 100 GB being instantly accessible, while at the same time you are claiming that all XSX does is the same as before. It's nothing but a dismissal of the possibility that they have something innovative here.

And your thinking doesn't even make any sense. That's why you're so quick to dismiss the 'instantly accessible' statement. You claim it is either virtual memory or normal transfer from SSD to RAM. Well guess what.

If it is virtual memory in the traditional sense, it's not instantly accessible, because you have a lot of overhead.
If it is normal transfer from SSD to RAM, then why only 100GB of the SSD and not the whole drive?

It does not add up. So it is something else. What else is there? Exactly. Transfer directly from the SSD to the GPU with nothing in between. If there is another possibility I'll gladly hear it. But I still have not seen anything that complies with that statement.

That you wish to dismiss this as a possibility due to being stuck in the old ways of doing things is your problem. You sound like the people asking who will pick the cotton if we abolish slavery. That is not the important part.
And when I tried to give you the incentive to think further on how you would solve the issues you yourself are bringing forth, your reply is a mere shallow dismissive "irrelevant since no one is doing that or trying to do that". Really? No one is trying to solve obvious limitations to get more performance? How do you know this is not exactly what XVA is doing?
It was supposed to be a thought experiment to drive us closer to what is happening. But you're not interested in that. You're only interested in saying that the XSX has nothing new.
I think the issue here is that people are assuming you can transfer everything instantly, instead of the more logical reading that the SYSTEM can view XXX amount of data on the SSD as virtual RAM that is ACCESSIBLE instantly. Does this mean the transfer of said data is instant? Absolutely not.
 
You can call me a dinosaur all you like and say I know nothing about hardware. Doesn't bother me; let that be your narrative. Mystical because you think something must be there, and you made up direct GPU-to-SSD access without evidence or even basic knowledge of why that doesn't make sense. I know nothing though, I'll get back to rubbing sticks together.
 

Ascend

Member
I think the issue here seems to be that people are assuming you can transfer everything instantly instead of the more logical approach of something like the SYSTEM can view XXX amount of data on the SSD as Virtual RAM that is ACCESSIBLE instantly. Does this mean the transfer of said data is instant? Absolutely Not.
I don't think people are making that mistake. Obviously you cannot transfer 100GB instantly, if the transfer limit is 2.4GB/s raw. You can access anything within that 100GB instantly, but the load speed would still be 2.4 GB/s raw. But then the question still remains, why that 100GB and not the full SSD? And again, if you have API overhead, it cannot be considered to be instant...

Let me quote something else here... Old article... A year old in fact...

Thanks to their speed, developers can now use the SSD practically as virtual RAM. The SSD access times come close to the memory access times of the current console generation. Of course, the OS must allow developers access that goes beyond that of a pure storage medium. Then we will see how the address space will increase immensely - comparable to the change from Win16 to Win32 or in some cases Win64.

Of course, the SSD will still be slower than the GDDR6 RAM that sits directly on top of the die. But the ability to directly supply data to the CPU and GPU via the SSD will enable game worlds to be created that will not only be richer, but also more seamless. Not only in terms of pure loading times, but also in terrain mapping. A graphic designer no longer has to worry about when GDDR6 ends and when the SSD starts.


 

FireFly

Member
I don't think people are making that mistake. Obviously you cannot transfer 100GB instantly, if the transfer limit is 2.4GB/s raw. You can access anything within that 100GB instantly, but the load speed would still be 2.4 GB/s raw. But then the question still remains, why that 100GB and not the full SSD? And again, if you have API overhead, it cannot be considered to be instant...
In the Beyond3D forums they were suggesting that the memory address space could be mapped to the SSD in 100 GB chunks.

In any case all physical processes take time, so nothing is truly instantaneous. What counts as "instant" depends entirely on the context in which the statement was made, which we don't fully know.
 

Ascend

Member
The purpose of the residency sample is to generate memory addresses that reach the page table hardware in the graphics processor but do not continue on to become full memory requests. Instead, the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate.
That is extremely important information. I bet that previously they became full memory requests, causing loads of data that ultimately went unused.
 
Whatever is mapped becomes unavailable... it's not hard to understand why you don't map the entire drive. You COULD do so, but you don't, as you probably want that space for your games and stuff. Hence me saying they either reserve a partition permanently, or a file which takes UP to 100GB depending on dev needs.
 

THE:MILKMAN

Member
I think this is an example where a marketing term ('instantly accessible') has made a rod for Microsoft's back because they decided not to fully explain and flesh out what it is, and what it means.
Transfer directly from the SSD to the GPU with nothing in between.
I'm confused about how that makes any sense. The RAM is orders of magnitude faster than the SSD, at 560GB/s to the GPU, isn't it? Why would you transfer it from the SSD at a max 2.4GB/s rate (missing out on decompression by going direct to the GPU?), and where is that data stored? Let's say Microsoft do have something new that allows a direct SSD-to-GPU path (I assume from here it goes to the screen?), what would be the point in having 16GB of RAM?
The SSD access times come close to the memory access times of the current console generation
Whether they are talking about latency or BW I'm fairly confident this isn't close to being true, right!?

Obviously this is all above my tech knowledge so apologies for the basic questions.
 

Ascend

Member
I'm confused about how that makes any sense? The RAM is orders of magnitude faster than the SSD at 560GB/s to the GPU isn't it? Why would you transfer it from the SSD at a max 2.4GB rate (missing out on decompression going direct to the GPU?) and where is that data stored? Let's say Microsoft do have something new that does allow a direct SSD to GPU path (I assume from here it goes to the screen?), what would be the point in having 16GB RAM?
Ok. With "nothing" in between, I meant no RAM. So that's my mistake. You definitely need the decompression block. Unless you're decompressing in advance from the SSD, and then storing it again in decompressed form within a pre-assigned 100GB-sized portion of the SSD, which doesn't sound very efficient at first glance, because that means you lower the efficiency of your bandwidth in multiple ways. Not to mention you degrade your SSD a lot quicker.

The idea of transferring directly from the SSD to the GPU would mean transferring to the GPU cache only the portions that are strictly necessary. The whole idea of SFS seems to be to have a low-quality texture in place at all times in RAM, and once a high-quality texture has been confirmed to be needed, you stream that texture in. Some people think it HAS to go from SSD to RAM to GPU. Some of us think it's more efficient to bypass the RAM and read it directly into the GPU cache from the SSD, since the SSD is being seen as RAM anyway. That's what seems to be the closest to what MS's marketing says.
In either case, you would have loaded that texture ages ago into RAM with the traditional way of rendering, and it might not have been used at all. That's where the bandwidth savings come from: reading and loading only what is actually needed, rather than what we suspect will be needed.
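As a rough sketch of the streaming loop being described (concept only; the feedback set and page requests here are simplified stand-ins, not the real sampler feedback API):

RESIDENT = set()                      # 64 KB texture pages currently in memory
PENDING = set()                       # async SSD reads in flight

def request_page(page_id):
    """Stand-in for kicking off an async SSD read of one 64 KB page."""
    PENDING.add(page_id)

def stream_frame(feedback):
    """feedback: the set of pages the GPU actually tried to sample last frame."""
    for page in feedback - RESIDENT - PENDING:
        request_page(page)            # only the pages that were really touched
    # Until they arrive, shaders keep sampling the low-res mip that is always
    # resident, so nothing stalls; the extra detail appears a frame or two later.

def on_page_loaded(page_id):
    PENDING.discard(page_id)
    RESIDENT.add(page_id)

# The traditional approach loads every mip of every texture that *might* be
# visible, most of which is never sampled; that unused data is the bandwidth
# and RAM this scheme claims back.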

Whether they are talking about latency or BW I'm fairly confident this isn't close to being true, right!?

Obviously this is all above my tech knowledge so apologies for the basic questions.
Yeah they are talking about latency, not transfer speeds. Access time is basically how long it takes to 'locate' the data on the drive before you can actually load it. Remember that HDDs have seek times. SSDs do not. At least not in the traditional HDD sense.
 
Last edited:

THE:MILKMAN

Member
Ascend Let's hope Microsoft clears this all up this month rather than leaving it until later in the summer. I can't see them talking about this stuff at their July games blowout.
 

Ascend

Member
Whatever is mapped becomes unavailable.. it’s not hard to understand why you don’t map the entire drive. You also COULD do so, but you don’t, as you probably want that space for your games and stuff. Hence me saying they either reserve a partition permanently, or a file which takes UP to 100gb depending on dev needs.
Oh. So you CAN say sensible things without being dismissive all the time.

I suspect the latter, where the game already on the SSD will be mapped. Having to copy data from the SSD to a fixed 100GB portion seems like a waste, both for SSD space and for SSD durability.
But that still begs the question, why 100GB then? Why not simply say the full game? Or why not 96GB, or 128GB?

Ascend Ascend Let's hope Microsoft clear this all up this month rather than leaving it until later in the summer. I can't see them talking about this stuff at their July games blowout.
Hopefully. There are some patents available that we can still delve deeper into before that. Like this one;

 
Last edited:

Panajev2001a

GAF's Pleasant Genius
That is extremely important information. I bet that previously they became full memory requests, causing loading, which ultimately were left unused.

This is clever: it provides a very efficient HW implementation of a common software optimisation for PRT (which avoided the extra memory requests at the cost of shader complexity and performance), and offers an additional instruction which can be used to trigger asynchronous page faults/automatic PRT texture data updates (they do not block waiting for the data to arrive: the GPU takes the request, the operation returns, and in the meantime the GPU figures out what needs to be loaded in, and when, transparently to you).

So yeah, a PRT-based solution without doing any of this in HW or SW/shader would waste performance and cause stalls in the shader, but ultimately you are requesting memory you need. This is a way to allow you to say "is this part of the texture in memory already? No? Then make sure you figure out exactly what you need and transfer it soon". —>
the residency of the PRT at those addresses is checked and missing pages are non-redundantly logged and requested to be filled by the OS or a delegate.

Without doing this in HW or in the shader, a regular texture sampling instruction could trigger a page fault, cause the missing texture section to be loaded into memory, and that particular shader would stall until the request was fulfilled.
Basic PRT helps you reference sparse texture data (load different chunks of the same texture without loading it all into memory); this gives you an instruction to hint to the GPU that you will want this memory at some point in the future. A system can be built on top of it to automate calling this operation/triggering the pre-fetching mechanism early, based on what the GPU has been rendering and other game feedback.
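In code form, the difference being described is roughly this (simplified Python stand-ins for the two behaviours, not the actual GPU instructions):

resident_pages = set()                    # texture pages currently backed in memory
missing_log = set()                       # non-redundant log of pages to fill later

def load_page_from_ssd(page):             # stand-in for the actual transfer
    resident_pages.add(page)

def read_texels(page):                    # stand-in for sampling resident data
    return b""

def blocking_sample(page):
    """Old behaviour: sampling a non-resident page waits for the load (a stall)."""
    if page not in resident_pages:
        load_page_from_ssd(page)          # the shader sits here until data arrives
    return read_texels(page)

def residency_sample(page):
    """New behaviour: only walk the page table; never issue the full request."""
    if page in resident_pages:
        return True
    missing_log.add(page)                 # logged once; the OS/streaming system
    return False                          # fills it in later, the shader moves on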
 
Last edited:

Lethal01

Member
Oh. So you CAN say sensible things without being dismissive all the time.

I suspect the latter, where the game already on the SSD will be mapped. Having to copy data from the SSD to a fixed 100GB portion seems like a waste, both for SSD space and for SSD durability.
But that still begs the question, why 100GB then? Why not simply say the full game? Or why not 96GB, or 128GB?


Hopefully. There are some patents available that we can still delve deeper into before that. Like this one;


Yeah, the 100GB part is what makes me the most curious. If they had just said "instant access to the SSD" I'd actually have more confidence (but still extremely little) about what they could mean.
 

sendit

Member
I'm sure you'll find I'm interested in the tech, not any particular platform. I'm sure MS have tons of optimizations in their hardware stack. What I'm against is the pulling random shit from thin air and dubious sources without any evidence or at least knowledge about the subject matter. I've explained every concept in detail, but I'm sure not going to stoke the flames of fanboism Stay grounded, use facts, and yes, everything is reinventing the wheel, or we'd still be on stones.

If you don't agree with it, lay it out, but don't ask me for hypotheticals about mystical hardware.

Sony laid out a hardware stack for loading data directly into VRAM, it's been backed by developers who know what they're talking about, and seems to be a good solution.
In response, I've heard that the XSX must be loading stuff directly into the GPU and bypassing RAM, because some rando said XSX has figured it out, and the Sony solution isn't any good.

If I say the Sony solution is good, am I a fanboy?
If I say the XSX solution is good am I a fanboy?

Only one is based on reality here.

I'm also not a fan of Smart Shift, and do believe the Sony GPU will be left wanting in a lot of games.

Am I a XSX fanboy now?

Don’t worry about it. I got ignored for calling him out on his bullshit.
 

Tripolygon

Banned
Yeah the 100GB part is what makes me the most curious, if they just said "instant access to the SSD" I'd actually have more confidence(but still extremely little) about what they could mean.
It is not really that hard to understand if you read the statement within context. People are taking that particular piece out of context.
The idea, in basic terms at least, is pretty straightforward - the game package that sits on storage essentially becomes extended memory, allowing 100GB of game assets stored on the SSD to be instantly accessible by the developer. It's a system that Microsoft calls the Velocity Architecture and the SSD itself is just one part of the system.
By the very nature of an SSD's low seek times and low latency compared to an HDD, coupled with the optimization of their IO and software stack, a developer can pull any file within that "100GB game package" 'instantly', at 2.4GB/s raw or 4.8GB/s compressed.

There is no 'direct' addressable connection from a supposed 100GB partition on the SSD to the GPU. It goes SSD - CPU - RAM - GPU. The CPU is still partially responsible for file IO. The latency of an SSD is too great to work out of directly when compared to RAM.
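For a sense of what 'instantly' amounts to at those rates, a quick sketch (published peak figures; sustained rates will be lower):

RAW_GB_S = 2.4
COMPRESSED_GB_S = 4.8
FRAME_MS = 16.6

for size_mb in (0.064, 1, 10, 100):            # from a 64 KB page up to a 100 MB asset
    for label, rate in (("raw", RAW_GB_S), ("compressed", COMPRESSED_GB_S)):
        ms = size_mb / (rate * 1000) * 1000    # MB divided by MB/s, expressed in ms
        print(f"{size_mb:7.3f} MB {label:10s}: {ms:6.2f} ms ({ms / FRAME_MS:.2f} frames)")
# A 64 KB page takes a fraction of a millisecond; a 100 MB asset still takes
# ~20-40 ms, i.e. a frame or two -- "instant" to a player, but not free from the
# renderer's point of view.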
 
Last edited:

Tripolygon

Banned
I think this is an example where a marketing term ('instantly accessible') has made a rod for Microsoft's back because they decided not to fully explain and flesh out what it is, and what it means.

I'm confused about how that makes any sense? The RAM is orders of magnitude faster than the SSD at 560GB/s to the GPU isn't it? Why would you transfer it from the SSD at a max 2.4GB rate (missing out on decompression going direct to the GPU?) and where is that data stored? Let's say Microsoft do have something new that does allow a direct SSD to GPU path (I assume from here it goes to the screen?), what would be the point in having 16GB RAM?

Whether they are talking about latency or BW I'm fairly confident this isn't close to being true, right!?

Obviously this is all above my tech knowledge so apologies for the basic questions.
You pretty much got it.
 

Ascend

Member
It is not really that hard to understand if you read the statement within context. People are taking that particular piece out of the context.

By the very nature of SSD low seek and low latency when compared to HDD, coupled with the optimization of their IO and software stack, it allows a developer to pull any file within that "100GB of game package" 'instantly', at 2.4GB/s raw or 4.8GB/s compressed.

There is no 'direct' addressable connection from a supposed 100GB partition on the SSD to GPU. It goes SSD - CPU - RAM - GPU. The CPU is still partially responsible for file IO. The latency of SSD are too great to work out of when compared to RAM.
That is, once again, the traditional way of doing things. Let me quote the whole part that you're referencing;

The form factor is cute, the 2.4GB/s of guaranteed throughput is impressive, but it's the software APIs and custom hardware built into the SoC that deliver what Microsoft believes to be a revolution - a new way of using storage to augment memory (an area where no platform holder will be able to deliver a more traditional generational leap). The idea, in basic terms at least, is pretty straightforward - the game package that sits on storage essentially becomes extended memory, allowing 100GB of game assets stored on the SSD to be instantly accessible by the developer. It's a system that Microsoft calls the Velocity Architecture and the SSD itself is just one part of the system.


Once again the same question arises. If everything is being done the same, why are they claiming they have a new way of using storage to augment memory? If it is new, that means it's not traditional virtual memory. Potentially, it also means not going Storage -> RAM -> GPU. Emphasis on the word potentially.
 

Tripolygon

Banned
Once again the same question arises. If everything is being done the same, why are they claiming they have a new way of using storage to augment memory? If it is new, that means it's not traditional virtual memory. Potentially, it also means not going Storage -> RAM -> GPU. Emphasis on the word potentially.
We are going from caching data in RAM to pulling data from the SSD into RAM just in time. The very nature of the SSD means a larger traditional RAM pool is not really needed ("an area where no platform holder will be able to deliver a more traditional generational leap"). A traditional generational leap is about a 16x RAM increase; that is not feasible. They've also optimized their file system and IO system with SFS and a new texture compression format.

This is a new way to use storage to augment memory.
 
Last edited:

THE:MILKMAN

Member
I think it's also important to point out that the above quotes from the DF article are Richard's words and not quotes from Microsoft reps.
 

Tripolygon

Banned
I think also important to point out the above quotes from the DF article are Richard's words and not quotes from Microsoft reps.
This is true. Here are Microsoft's own words:
Enter Xbox Velocity Architecture, which features tight integration between hardware and software and is a revolutionary new architecture optimized for streaming of in game assets. This will unlock new capabilities that have never been seen before in console development, allowing 100 GB of game assets to be instantly accessible by the developer.

100 GB of game assets instantly accessible by the developer. To me this means that when a developer needs a game asset from their game package, they can stream it instantly into RAM and use it, without needing to have precached the data in RAM like they do traditionally this gen. That is how fast and efficient their new file and IO architecture is.

The other proposed interpretation takes a certain leap in logic and requires a technological breakthrough we have yet to see. If they could work straight out of the SSD then they would essentially have no need for RAM. Even the Xbox One's relatively fast 68.3GB/s RAM needed some help from an even faster 102.4GB/s bidirectional ESRAM pool for the GPU. Latency is king.
 
Last edited:
I hope they will release a 2TB+ proprietary Seagate SSD for that expansion slot, 'cause I know 1 TB is going to be gobbled up quickly. Gonna trade in my Xbox One X for the XSX, and I am loving the fact that I can finally play old games with a drastic reduction in loading times. Everything is going MS's way of building the most powerful and well balanced console.
 

Ascend

Member
If they can work straight out of SSD then they essentially have no need for RAM.
That's not true at all. The SSD is not fast enough to not require RAM, but it is fast enough to reduce RAM usage compared to previous gens. To me it seems more efficient to have your RAM basically loaded up with assets that are constantly re-used, and when something is suddenly needed that was never required before, why not avoid storing it in RAM altogether and simply load it from the SSD at that moment? It avoids the need to constantly dump and rewrite RAM.
 
You certainly have offered nothing.

You're looking at "instant" in the wrong context. It's not a reference to speed or latency, but rather the fact that the data in the 100 GB pool of the SSD doesn't need to go into RAM in order for it to be accessed by the GPU directly.

Which, yes, is something that is doable. Whether you want to consider it analogous to bank switching or not is beside the point. The bigger point is that this has already been demonstrated in its own way through AMD's SSG Pro graphics card line, so you can picture what's being discussed here as a reduced/scaled-down implementation of that (those cards have 2 TB of NAND).

The basic way those cards work is that the GPU has an HBCC for direct access to the on-board NAND, bypassing the PCIe bus, CPU, and other system interfaces. If XSX has a method inspired by this then it would work somewhat differently of course, but GPU-bound data formatted for GPU access in the 100 GB cluster of the drive can likely be accessed through GPU hardware modifications for asset streaming purposes.
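Conceptually, an HBCC-style setup just treats VRAM as a cache in front of a much larger pool. A toy sketch of the idea (illustrative only; the real controller is hardware, and whether XSX does anything like this is unknown):

from collections import OrderedDict

VRAM_PAGES = 4                                 # tiny on purpose, to show eviction

class HBCCSketch:
    def __init__(self):
        self.vram = OrderedDict()              # page id -> data, in LRU order

    def read(self, page):
        if page in self.vram:                  # hit: served at VRAM speed
            self.vram.move_to_end(page)
            return self.vram[page]
        data = self._fetch_from_nand(page)     # miss: paid at NAND/SSD speed
        if len(self.vram) >= VRAM_PAGES:
            self.vram.popitem(last=False)      # evict the least-recently-used page
        self.vram[page] = data
        return data

    def _fetch_from_nand(self, page):
        return bytes(64 * 1024)                # stand-in for a 64 KB NAND read

# The GPU just issues addresses; whether a page is served from VRAM or has to be
# pulled off the NAND first is the controller's problem -- which is what "the SSD
# appears as memory" boils down to.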

The extent of that is unknown though, partly because MS are still keeping a hold on the details. But we should probably know more within a month or two. Anyway, it's certainly an idea to keep open as a possibility; crazier technologies have come to fruition in systems of the past, and we're seeing other customizations on both systems to suit their specific requirements.

Also, you might want to give this a read ;)

imagine having a 100gb slow VRAM on the go. This is what the xbox can do and same goes for playstation as well. Its like DDR2 slow ram

Actually the PS5 is doing something different, in that the I/O block is accessing the RAM directly, but the speeds are fast enough to essentially implement the functionality conceptually through a different method with its own advantages (and a couple of tradeoffs, like any technological implementation).

The "DDR2-slow RAM" concept you're referring to assumes the maximum peak compression rate, for data on the SSD that compresses particularly well at that rate, through to system memory. How much of the data will actually reach that rate is up for debate, however.

This is true. Here is Microsoft's own words


100GB game assets to be instantly accessible by the developer. This to me means when a developer needs a game asset from their game package, they can stream it instantly into RAM and use it without needing to have precached the data in RAM like they do traditionally this gen. That is how fast and efficient their new file and IO architecture is.

The other proposed interpretation takes a certain leap in logic and a technological breakthrough we have yet to see. If they can work straight out of SSD then they essentially have no need for RAM. Even Xbox One relatively fast 68.3GB/s RAM needed some help from an even faster 102.4GB/s bidirectional SRAM pool for the GPU. Latency is king.

It's not so much about working directly out of the SSD to the point that RAM is not needed; it's about streaming data from SSD to GPU in cases where the GPU can expect to work with a certain limited amount of data streamed in per frame off the storage, enough to be usefully consumed for some extended taskwork by the GPU in question.

So the idea has never been to use it for all types of graphics data workloads; that's why the GDDR6 is still there in the first place. However, I do see it as possible that the idea I mention here and your own idea of streaming the data directly into RAM without it being precached first are simultaneously doable, depending on technological and GPU-specific design customizations that may have been made.

Me personally? I prefer to keep the door open on it until official wording directly from the source says otherwise. I wouldn't consider it a crazy thing to expect if a company wanted to leverage some existing technologies along with proprietary innovations from their R&D laboratories.
 
Last edited:

Ascend

Member
Kind of funny how quickly the thread died after someone claimed that no one is looking into transferring data directly from storage to GPU, and we have a link to NVIDIA explaining that exact thing....

And even more of a coincidence, nVidia talks about GPUDirect Storage for this feature, and somehow, the API that is considered to be a gamechanger for the XSX (and for the future in Windows) is called DirectStorage. That is quite the coincidence.

Yes I'm aware the consoles use AMD and not nVidia. But most likely this will become a spec for DX12U, and RDNA2 is DX12U compliant, so they will have it too.
 
Last edited: