
Xbox Velocity Architecture - 100 GB is instantly accessible by the developer through a custom hardware decompression block

Panajev2001a

GAF's Pleasant Genius
What would you rate 4.8 GB/s of compressed textures that have already been rejected down to what is visible via SFS?

You can add SFS, PRT, or your own virtual texturing on top of that 4.8 GB/s, but that applies to both consoles and is just how you use the bus made available to you, on top of the compression applied to the data (btw, I read interesting comments the other day about an additional encoding that can be applied to textures before compressing with Kraken, at the cost of an extra software decoding step that is not free CPU or GPU wise, and which can yield compression rates closer to BCPack than I thought; need to find the link... oh, oldergamer, thanks for the links: https://www.neogaf.com/threads/xbox...re-decompression-block.1532436/post-258312495 ).

The real number is the uncompressed one; then you can factor in average compression rates (and the tricks used to improve them, which you pay for and which do not come for free in HW), then virtual texturing and whatever fancy scheme you use to stream the right data in and out.
 
Last edited:

oldergamer

Member
You can add SFS, PRT, or your own virtual texturing on top of that 4.8 GB/s, but that applies to both consoles and is just how you use the bus made available to you, on top of the compression applied to the data (btw, I read interesting comments the other day about an additional encoding that can be applied to textures before compressing with Kraken, at the cost of an extra software decoding step that is not free CPU or GPU wise, and which can yield compression rates closer to BCPack than I thought; need to find the link... oh, oldergamer, thanks for the links: https://www.neogaf.com/threads/xbox...re-decompression-block.1532436/post-258312495 ).

The real number is the uncompressed one; then you can factor in average compression rates (and the tricks used to improve them, which you pay for and which do not come for free in HW), then virtual texturing and whatever fancy scheme you use to stream the right data in and out.
I'm not sure you can use the CPU in this without a penalty, but to be honest I'm not sure I understand how this all works. In my mind there are two different compressions at play. You either have general file compression on the drive (on top of texture compression) or you don't.

I guess my question is: if everything on PS5 is compressed with Kraken via its general file compression, so all files and textures are pre-compressed on the drive, then everything needs to be decompressed by that custom block before it can be used. However, what happens in the case of textures compressed with Kraken? I mean, Kraken isn't a GPU DXTC format. Do they need to be decompressed before the texels can be looked up? If that isn't the case, let me know.

So how are you looking up the texels to reject what isn't visible if it's pre-compressed with Kraken? For Xbox, my understanding was that BCPack is a compression format specifically for textures, added on top of other DXT compression (at least I'm pretty sure that is the case).

If that is the case, the GPU would not need the texture decompressed and could read the texels directly from a compressed texture. The unneeded texels could then be rejected before having to load the visible ones into memory. Correct me if I'm wrong on any of that.
 
Last edited:

longdi

Banned
Sort of - but we've used it to hand-wave a GPU delta that had an even longer 'list' of differences for the past 7 years. At this point, expanding the 'GPU is faster' into a list of 'stuff' comes across like pretending things have changed 'just because', even though architecturally the differences have never been smaller in the history of consoles.

While I think MS has done a fair share of good things with software-side integration in the last 2 years, 'cloud integration' is one of those buzzwords that's about on par with 'blockchain' in terms of end-user relevance or desirability. Especially in a game console.
But as far as 'match-ups' go - I guess I'm different from most users since buying any console is far from a done deal for me at this point, so it's really not a relative-comparison question... yet.
I/O improvements (on both) are the only system-selling thing that has been announced to date (and I say this as someone that's been using 3-4 GB/s NVMe for several years, most of it lost to a PC software-stack dead end -_-), but I'll need more than that to sell me on one.

That'll take a lot of explaining given that in 40 years of console history 'proprietary storage' was never considered a win by consumers. Unless we're now going back in time claiming Sony was right all along with the Vita...

Yes, for those with a fast PC, the next-gen consoles don't seem terrifying. If anything they do bring up the minimum requirements by a lot. The excitement has to be MS choosing a higher tier than expected. In PC terms, a 5800 is more exciting than a 5700. :messenger_bicep:
I also think the cloud will play a bigger part in opening up the gap, like how DLSS comes from the cloud, plus the streaming/Game Pass side.
As for proprietary storage, I feel neither PCIe 4 SSDs nor MS storage cards will be cheap. At least with the storage cards, Xbox players are 100% guaranteed to expand their storage painlessly.
 

Fafalada

Fafracer forever
You're saying that if we take away the bandwidth advantage of the XSX then...the advantage won't matter?
I'm saying these elements work together to deliver an increase - it's 'not' compounding/stacking benefits, even though the 'list' gets longer.
The same goes for I/O or any other subsystem - eg. if the decompressors weren't rated for the SSD speeds they are paired with, you'd effectively end up wasting drive bandwidth when loading compressed data. But again it's just two numbers working together - not compounding.
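To put toy numbers on that (purely illustrative; the 2.4/4.8 figures are the ones already quoted in this thread, the mismatched case is hypothetical):

```python
# Effective load rate is capped by whichever side is slower: the raw drive read
# expanded by the compression ratio, or the decompressor's output rate.
def effective_load_rate(drive_gbps, ratio, decompressor_out_gbps):
    return min(drive_gbps * ratio, decompressor_out_gbps)

print(effective_load_rate(2.4, 2.0, 4.8))  # 4.8 - decompressor matched to the drive
print(effective_load_rate(2.4, 2.0, 3.0))  # 3.0 - drive bandwidth partly wasted
```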

A lack of information on PS5's part has no bearing on the information we do have from MS.
MS didn't share information on the hw implementation though - Sony did.

So you're saying it's not an advantage because both HW have it, as it's based on a broadly available RDNA2 featureset?
I'm saying that without MS giving any other detail (and Cerny's talk didn't allude to it being exclusive to PS5) - that's the likely explanation.

It is also noteworthy that the RTX 2080 Ti renders the scene in about 40 microseconds using the regular pass-through method at 1440p whereas the Xbox Series X renders in around 100 microseconds at 4K.
I'm not sure if you got the numbers wrong - but this implies 2080Ti would still be at least 20% faster at 4k...

Finally you are wrong about RT HW concurrency on XSX.
The intersection tests are run on their own (so yes, that's 'parallel') - but the rest of the RT pipeline still uses regular compute. Costs will vary depending on specific use-cases, but it's not the free addition that PR made it sound like. As for rapid packed math - that's part of the RDNA whitepapers so it's all public. Last I checked, integer math shares the execution units with FP, so there's no extra concurrency there.
The PR you quoted there makes no mention of concurrency either; it's just referencing added support for packed ops (which is optional in the design spec, so presumably not every RDNA GPU has it).

Your understanding of the featureset is fairly incomplete based on what we know today.
That I agree with - MS has been incredibly cagey around their custom stuff (like BCPack). But the above isn't really based on assumptions; the RT/RDNA patents/papers are public and so far we have no indications (from either company) that any of it is incorrect or different.
 

Panajev2001a

GAF's Pleasant Genius
I'm not sure you can use the CPU in this without a penalty

Of course, as I was saying in my post there would be a decoding penalty on either the CPU side or the GPU one, but that is the joy of programming: tradeoffs :). What do you want to achieve and what are you prioritising?

I guess my question is: if everything on PS5 is compressed with Kraken via its general file compression, so all files and textures are pre-compressed on the drive, then everything needs to be decompressed by that custom block before it can be used. However, what happens in the case of textures compressed with Kraken? I mean, Kraken isn't a GPU DXTC format. Do they need to be decompressed before the texels can be looked up? If that isn't the case, let me know.

You are right that Kraken is not a GPU-native format, but BCPack is not one either AFAICS, and reading texel data from the GPU to reject textures that are not visible would be a very high latency operation and would still consume quite a bit of bandwidth.
I honestly need to read more about BCPack itself, but having a BCPack HW decompressor sitting between the SSD and RAM (the point is that it sits there, outside of the GPU texture units) indicates that the GPU is not reading texture data straight from the SSD without any decoding happening in between (that is reserved for textures compressed in GPU-native formats, which I expect to be what is exported for the game and then compressed further with...).

There is a lot of space you can save by rearranging the data appropriately (see what was suggested as a Kraken optimisation, or what Apple achieved on the App Store by switching the order of encryption and compression of .ipa files), or by designing a custom decompressor block around a specific data layout (which is what MS did with the BCPack decompressor).

Now, the question you were perhaps investigating, and that I am also thinking about, is better framed as: “assuming we have a huge texture atlas compressed with either BCPack or Kraken, can the SSD transfer only a portion of it to memory, and thus make it visible to the GPU, without transferring the entire file, decompressing it, cutting out the needed portion, and throwing away the rest?”

References:

From what I remember and can look up online, native GPU texture compression formats allow the GPU to reference a subset/tile of the texture data and transfer only that chunk, as the compression splits the texture into blocks that can be independently decoded.

BCPack and Kraken apply an additional compression step on top of that without affecting the tileability of the resource or breaking virtual texturing, AFAIK; it would be quite a big waste if you had to choose between LZMA/Kraken/BCPack compression and tiled resources (in which case the question is whether the set of tiles you need, in the GPU-native compression format, is smaller than the whole texture with the additional LZMA/Kraken/BCPack applied).
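To make the block/tile point concrete, here is a rough sketch (my own illustration, using plain BC1 numbers, not BCPack or Kraken internals):

```python
# BC1/DXT1 encodes each 4x4 texel block in 8 bytes, so an aligned tile of a mip
# maps to a predictable set of block offsets and can be fetched and decoded
# without touching the rest of the texture.
BLOCK_DIM, BLOCK_BYTES = 4, 8

def bc1_block_offsets(tex_width, tile_x, tile_y, tile_w, tile_h):
    """Byte offsets of the BC1 blocks covering a 4-texel-aligned tile."""
    blocks_per_row = tex_width // BLOCK_DIM
    return [(by * blocks_per_row + bx) * BLOCK_BYTES
            for by in range(tile_y // BLOCK_DIM, (tile_y + tile_h) // BLOCK_DIM)
            for bx in range(tile_x // BLOCK_DIM, (tile_x + tile_w) // BLOCK_DIM)]

# A 64x64 tile of a 4096x4096 BC1 mip needs only 256 blocks, i.e. 2 KB:
print(len(bc1_block_offsets(4096, 0, 0, 64, 64)) * BLOCK_BYTES)  # 2048
```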
 

martino

Member
If anything they do bring up the minimum requirements by a lot.

and enthusiasts there are only happy about that.

I also think the cloud will play a bigger part in opening up the gap, like how DLSS comes from the cloud, plus the streaming/Game Pass side.
As for proprietary storage, I feel neither PCIe 4 SSDs nor MS storage cards will be cheap. At least with the storage cards, Xbox players are 100% guaranteed to expand their storage painlessly.
 
Last edited:

longdi

Banned
Idk man, my cloud stocks have been on a tear. A lot more cloud hype than even with the lockdown future.
I feel that MS can offer a lot of xCloud things that leave Sony straggling. There is a reason why Sony jumped on Azure. 🤷‍♀️
 

oldergamer

Member
You are right that Kraken is not a GPU-native format, but BCPack is not one either AFAICS, and reading texel data from the GPU to reject textures that are not visible would be a very high latency operation and would still consume quite a bit of bandwidth.
I honestly need to read more about BCPack itself, but having a BCPack HW decompressor sitting between the SSD and RAM (the point is that it sits there, outside of the GPU texture units) indicates that the GPU is not reading texture data straight from the SSD without any decoding happening in between (that is reserved for textures compressed in GPU-native formats, which I expect to be what is exported for the game and then compressed further with...).

So the thing is, I think BCPack actually is a better version of block compression than the standard DXTC formats. The question has been asked, but MS said they would talk more about it at a later date, which is odd if it was anything but. I think the decompressor hardware MS has can decompress zlib, as they support that for regular files. All GPUs can decompress the textures as they read the texels. So why would you need hardware to decompress that data (outside of the space saved on the SSD)? They can be loaded into memory as is.

There is a lot of space you can save by rearranging the data appropriately (see what was suggested as a Kraken optimisation, or what Apple achieved on the App Store by switching the order of encryption and compression of .ipa files), or by designing a custom decompressor block around a specific data layout (which is what MS did with the BCPack decompressor).
The thing is though, MS never called the hardware decompressor a BCPack decompressor, or did I miss that somewhere?

Now, the question you were perhaps investigating, and that I am also thinking about, is better framed as: “assuming we have a huge texture atlas compressed with either BCPack or Kraken, can the SSD transfer only a portion of it to memory, and thus make it visible to the GPU, without transferring the entire file, decompressing it, cutting out the needed portion, and throwing away the rest?”

Right, that is what I'm thinking; however, we already know that Kraken is general-purpose compression. It treats all files the same way and isn't specific to textures.

From what I remember and can look up online, native GPU texture compression formats allow the GPU to reference a subset/tile of the texture data and transfer only that chunk, as the compression splits the texture into blocks that can be independently decoded.

That's right, so no need to decompress textures if they are loaded into memory.

BCPack and Kraken apply an additional compression step on top of that without affecting the tileability of the resource or breaking virtual texturing, AFAIK; it would be quite a big waste if you had to choose between LZMA/Kraken/BCPack compression and tiled resources (in which case the question is whether the set of tiles you need, in the GPU-native compression format, is smaller than the whole texture with the additional LZMA/Kraken/BCPack applied).
Isn't that the case right now though, that you have to choose between using Kraken or standard texture compression formats? I kinda think that is where the mystery is: does Kraken support compression without affecting how the hardware can access the texels? We actually don't need PS5 to tell us this; the answer should be there in the Kraken documentation.
 
Let me see if I follow you correctly. You're saying that if we take away the bandwidth advantage of the XSX then...the advantage won't matter?

I'm not sure I follow your meaning there.

On your next point regarding the compound view of performance: yes, that is exactly the point of this entire thread. Taken together, the CU advantage, the RT and ML HW, the decompressor speed and the additional virtual RAM mean XVA should be a very competitive solution to the PS5's I/O implementation. What else would we be talking about?

What's amazing is you say all that and then pronounce that the overall I/O solution "isn't in favor of XSX anyway." OK, thank you for your opinion on that.

Well, with respect to mesh shaders on XSX, the implementation has been described as quite powerful. A lack of information on PS5's part has no bearing on the information we do have from MS.

DirectX12 is the API construct that runs on the XSX hardware to expose its features.

So you're saying it's not an advantage because both HW have it, as it's based on a broadly available RDNA2 featureset? OK, possibly.

The UE5 demo had all the time and opportunity in the world to showcase that, but they didn't.

They did actively reference using the primitive shader HW in the PS5 to accelerate scene construction at 1440p @ 30fps.

By contrast:

"Principal Engineer at Microsoft/Xbox ATG (Advanced Technologies Group), Martin Fuller, has showcased how the new technique would help devs...

It is also noteworthy that the RTX 2080 Ti renders the scene in about 40 microseconds using the regular pass-through method at 1440p whereas the Xbox Series X renders in around 100 microseconds at 4K. The Xbox Series X, however, delivers much faster render times even at 4K than the (standard pass-through) NVIDIA GeForce RTX 2080 Ti, which goes to show the benefits of the new Mesh Shaders in the DirectX 12 Ultimate API being embedded in Turing and RDNA 2 GPUs."

Having a next-gen console compare favorably with a top-of-the-line discrete graphics card is a good thing, no?

Finally you are wrong about RT HW concurrency on XSX. From the horse's mouth, Andrew Goossen:

"Without hardware acceleration, this work could have been done in the shaders but would have consumed over 13 TFLOPs alone. For the Xbox Series X, this work is offloaded onto dedicated hardware and the shader can continue to run in parallel with full performance. " In parallel with full performance.

On integer concurrency:

"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

So the XSX has RDNA 2 shader arrays, HW for RT independent of that shader array, AND special hardware designed for int/ML work... all concurrent.

All this insight comes directly from Goossen or Fuller who are responsible for the XSX feature set.

I'm not sure where else we can go with this part of the conversation because it seems that our facts are incompatible here.

Your understanding of the featureset is fairly incomplete based on what we know today. And there hasn't even been a deep dive as to how it all works together yet.

Makes sense to me. IIRC AMD split off the compute-focused architecture stuff to CDNA2. I wonder if it's possible MS have some CDNA2 hardware on the GPU specifically for compute, that way the RDNA2 CUs can be used for graphics and/or asynchronous compute and the dedicated hardware for RT?

If so, the "12 TF" banter would actually be underselling the system's computational capabilities by a fair margin.
 
Last edited:

martino

Member
"We knew that many inference algorithms need only 8-bit and 4-bit integer positions for weights and the math operations involving those weights comprise the bulk of the performance overhead for those algorithms," says Andrew Goossen. "So we added special hardware support for this specific scenario. The result is that Series X offers 49 TOPS for 8-bit integer operations and 97 TOPS for 4-bit integer operations. Note that the weights are integers, so those are TOPS and not TFLOPs. The net result is that Series X offers unparalleled intelligence for machine learning."

Oh, I missed that info.


tell me more.
 

FireFly

Member
Let me see if I follow you correctly. You're saying that if we take away the bandwidth advantage of the XSX then...the advantage won't matter?
I think the point is that the extra bandwidth is needed to take advantage of the extra compute power, so they are two sides of the same coin. On PC, bandwidth and GPU performance increase in lockstep, and every generation Nvidia and AMD have to figure out how to increase memory speeds and/or bus widths in proportion to the performance gains they are targeting.
 

Fafalada

Fafracer forever
Isn't that the case right now though, that you have to choose between using Kraken or standard texture compression formats?
No - standard treatment has been to use a lossless compressor (usually some derivative of LZ) on top of DXT - going on for about 20 years now.
Compression ratios can then be further increased by messing around with signal quality when encoding DXT (in simplest terms - look for area of low-perceptual loss and flatten the details so lossless compressor has more to work with - if you want a simple analogue - think of what VRS does at runtime but applied to source texture data) and data rearranging - eg. in a DXT texture, sort index and color blocks together instead of interleaved like they are natively, which makes data 'nicer' for a lossless codec to shrink (something that MS had a proprietary format for since X360, and I highly suspect BCPack is based on).
Actually, Panajev's post already explained this in a ton of detail - especially if you read any of the links he embedded, so I'm just repeating it here.
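If it helps, here's a toy version of the rearranging idea (my own illustration, with zlib as a stand-in lossless codec; BCPack's actual layout isn't public):

```python
import zlib

# BC1 interleaves 4 bytes of endpoint colors with 4 bytes of 2-bit indices per
# block. Splitting the two streams groups similar data together, which a generic
# lossless codec usually compresses better; the decoder just reverses the shuffle
# before handing blocks to the GPU.
def deinterleave_bc1(data: bytes) -> bytes:
    colors, indices = bytearray(), bytearray()
    for off in range(0, len(data), 8):
        colors += data[off:off + 4]       # two RGB565 endpoints
        indices += data[off + 4:off + 8]  # 16 x 2-bit selectors
    return bytes(colors + indices)

def packed_size(data: bytes) -> int:
    return len(zlib.compress(data, 9))

# With real texture data, packed_size(deinterleave_bc1(tex)) typically comes out
# smaller than packed_size(tex).
```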

I kinda think that is where the mystery is, does kraken support compression without affecting how the hardware can access the texels?
The new decompressors sit in the SSD/I/O block - they don't have anything to do with how the GPU accesses the texels. Both companies made that part pretty clear - there were no allusions made to new GPU block formats.
 
Last edited:

Andodalf

Banned
to me this mean you won't the cloud for dlss equivalent.

Not sure if won't means want there? But DLSS uses cloud computing in that massive servers work to generate the dataset that is then used by a PC, offline, to implement DLSS. On the local side, it's nice to have custom hardware that enables faster implementation of DLSS. On PC this is Nvidia's Tensor cores, which seem to be CUs customized to work at lower precision, like FP16, INT8 and INT4, as well as their custom TF32. XSX has CUs customized to support these operations as well.

So for DLSS, you need massive offsite rendering to be done in the cloud, but DLSS games will have you install the "result" data from that, which you then work with, ideally with low-precision maths.
 

oldergamer

Member
No - standard treatment has been to use a lossless compressor (usually some derivative of LZ) on top of DXT - going on for about 20 years now.
Compression ratios can then be further increased by messing around with signal quality when encoding DXT (in simplest terms - look for area of low-perceptual loss and flatten the details so lossless compressor has more to work with - if you want a simple analogue - think of what VRS does at runtime but applied to source texture data) and data rearranging - eg. in a DXT texture, sort index and color blocks together instead of interleaved like they are natively, which makes data 'nicer' for a lossless codec to shrink (something that MS had a proprietary format for since X360, and I highly suspect BCPack is based on).
Actually Panajev's post already explained this in a ton of detail - especially if you read any of the links he embedded, so I'm just repeating here.
Yeah, I read the first two links; the last one I'll leave for later. I guess I'm just not seeing the benefit of the decompressor hardware if it does apply to compressed textures. Right, I do recall reading about that format back before the 360 launched (I had access to the developer portal back then).

So based on those links, you wouldn't double compress a texture with DXT and then kraken on top of it. Perhaps I'm confused, but would this additional compression have an impact if you were using PRT?

The new decompressors sit in the SSD/I/O block - they don't have anything to do with how the GPU accesses the texels. Both companies made that part pretty clear - there were no allusions made to new GPU block formats.
Yeah, I'm confused now. GPUs are supposed to be capable of decompressing compressed textures at a texel level. But if you are decompressing textures as they come off the SSD, outside of saving space on the install and improving I/O throughput, are you saving bandwidth anywhere else?
 

Fafalada

Fafracer forever
Perhaps I'm confused, but would this additional compression have an impact if you were using PRT?
PRT requires some form of virtualization - I/O no longer treats textures as "files", since you're requesting parts of the image on-demand.
You'd be accessing some sort of block-layout on disc (maybe something hw-optimal like page-size 64KB - maybe a completely arbitrary block size that happens to work really well for your storage solution, who knows). Disc-compression is then used on those blocks the same way, not on entire images - so yes, it benefits just fine.
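A minimal sketch of what that can look like (my own toy layout and names, e.g. "assets.pak", not either console's actual format):

```python
import zlib

# The disc image is just a table of independently compressed tiles, so a PRT/SFS
# request pulls in and decompresses one tile, never the whole texture "file".
# tile_table maps (texture_id, mip, tile_x, tile_y) -> (byte_offset, compressed_len)
def load_tile(pak_file, tile_table, key):
    offset, clen = tile_table[key]
    pak_file.seek(offset)
    return zlib.decompress(pak_file.read(clen))  # stand-in for the HW decompressor

# e.g. load_tile(open("assets.pak", "rb"), tile_table, ("rock_albedo", 0, 12, 7))
```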

outside of saving space on the install, and improving I/O throughput, are you saving bandwidth anywhere else?
Yes that's all there is to it - get data faster into memory and save space on disc.
Once your I/O throughput is sufficiently high, the amount of physical memory is no longer a limitation (cue UE5 as an example of what that looks like).
 

KingT731

Member
Here is a video of RGT talking about the tweet

timestamped



If 4.8 GB/s is just the safe baseline, on top of this tweet



seems to align a lot with what thicc_girls_are_teh_best has been saying

This is basically the same thing Cerny said during the "Road To PS5": the Kraken decompression block, in ideal circumstances, can hit 22 GB/s, but typical usage is 8-9 GB/s.
 
Last edited:

Panajev2001a

GAF's Pleasant Genius
So the thing is, I think BCPack actually is a better version of block compression than the standard DXTC formats. The question has been asked, but MS said they would talk more about it at a later date, which is odd if it was anything but. I think the decompressor hardware MS has can decompress zlib, as they support that for regular files. All GPUs can decompress the textures as they read the texels. So why would you need hardware to decompress that data (outside of the space saved on the SSD)? They can be loaded into memory as is.


The thing is though, MS never called the hardware decompressor a BCPack decompressor, or did I miss that somewhere?



Right, that is what I'm thinking; however, we already know that Kraken is general-purpose compression. It treats all files the same way and isn't specific to textures.



That's right, so no need to decompress textures if they are loaded into memory.


Isn't that the case right now though, that you have to choose between using Kraken or standard texture compression formats? I kinda think that is where the mystery is: does Kraken support compression without affecting how the hardware can access the texels? We actually don't need PS5 to tell us this; the answer should be there in the Kraken documentation.
Here is a video of RGT talking about the tweet

timestamped



If 4.8 GB/s is just the safe baseline, on top of this tweet



seems to align a lot with what thicc_girls_are_teh_best has been saying


Which is what people have been saying for pages: virtual texturing (SFS for a more HW assisted implementation) works with BCPack/Kraken and not against it.
 

Ascend

Member
Here is a video of RGT talking about the tweet

timestamped



If 4.8 GB/s is just the safe baseline, on top of this tweet



seems to align a lot with what thicc_girls_are_teh_best has been saying

Of course they stack; there are no reasons why they wouldn't. Many people are doubting the 2x-3x advantage, and that is basically the main thing that determines whether the SSD of the XSX is just as good as, or inferior to, the PS5's. People are saying the PS5 can do the same thing. We have no choice but to wait at this moment. There are multiple things that still need to be revealed about the XSX; even though many are pretending we already know everything about it, we know more about the PS5's architecture than we do the XSX's.

I feel like I have to quote what I posted many pages back. It can still be wrong, but, everything is still indicating that it works in this direction.

Regarding sampler feedback streaming... I'm not sure people get what it actually does... So I'm going to try and explain things step by step...

First, the transfer value given for the I/O slash SSD is basically a bandwidth value. The 2.4 GB/s raw value means that at most, 2.4 GB of data can be transferred per second.
The compressed value does not magically increase the 2.4 GB/s. What it does is compress the files to make them smaller. The max amount transferred is still going to be 2.4 GB in a second. But when you decompress it again on the 'other side', the equivalent size of the data would have been 4.8 GB if you had transferred it as raw data. So effectively it's 4.8 GB/s, but in practice 2.4 GB/s is being transferred.
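Same point as a worked number (just the figures above, nothing new):

```python
# A 4.8 GB (uncompressed) chunk of assets that averages 2:1 compression ships as
# 2.4 GB of compressed bytes, so at 2.4 GB/s raw it still arrives in one second -
# hence the "4.8 GB/s effective" figure, with only 2.4 GB actually transferred.
uncompressed_gb, ratio, raw_rate_gbps = 4.8, 2.0, 2.4
print((uncompressed_gb / ratio) / raw_rate_gbps)  # 1.0 second
```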

Then we get to SFS. First, take a look at what MS themselves say on it;

Sampler Feedback Streaming (SFS) – A component of the Xbox Velocity Architecture, SFS is a feature of the Xbox Series X hardware that allows games to load into memory, with fine granularity, only the portions of textures that the GPU needs for a scene, as it needs it. This enables far better memory utilization for textures, which is important given that every 4K texture consumes 8MB of memory. Because it avoids the wastage of loading into memory the portions of textures that are never needed, it is an effective 2x or 3x (or higher) multiplier on both amount of physical memory and SSD performance.

That last sentence is important. It is an effective 2x or 3x (or higher) multiplier on both the amount of physical memory and SSD performance. Now what does that mean? If you want to stream parts of textures, you will inevitably need tiling. What is tiling? You basically divide the whole texture into equally sized tiles. Instead of having to load the entire texture, which is large, you load only the tiles that you need from that texture. You then don't have to spend time discarding the many parts of the texture that you don't need after you spent resources loading them. It basically increases transfer efficiency. Tiled resources is a hardware feature that has been present since the first GCN, but there are different tiers to it, the latest one being Tier 4, which no current market GPU supports. It is possible that the XSX is the first one to have this, but don't quote me on that. It might simply be Tier 3 still.

In any case, when tiling, the size of the tiles determines how efficient you can be. The smaller the tiles, the more accurate you can be when loading, and the less bandwidth you will need. Theoretically you could be bit-precise, so to speak, but that's unrealistic and requires an unrealistic amount of processing power. There is an optimum there, but we don't have enough information to determine where that point is on the XSX. Microsoft is claiming that with SFS the effective multiplier can be more than 3x. This means that, after compression (everything on the SSD will inevitably be compressed), you can achieve more than 3x the 4.8 GB/s in effective streaming. To put it another way, effectively, the XSX is capable of transferring 14.4 GB/s of data from the SSD. This does not mean that 14.4 GB/s is actually being transferred. Just like with compression, the amount of transferred data is still at most 2.4 GB/s. What it does mean is that if you compare the current data transfer with compressed tiles to loading the full raw uncompressed texture, you would need more than 14.4 GB/s of bandwidth to ultimately achieve the same result. This also helps RAM use, obviously, because you're loading everything from the SSD into RAM, and you would otherwise be occupying RAM space needlessly. Basically, it decreases the load on everything, including the already mentioned RAM, plus the I/O, CPU and GPU.
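As a toy calculation of how the multipliers combine (the one-third figure is just an assumption for illustration, not a published number):

```python
# If sampler feedback shows only a third of the texels you would otherwise load
# are actually sampled this frame, skipping the rest multiplies the effective
# rate again, on top of compression.
raw_rate_gbps, ratio = 2.4, 2.0
fraction_sampled = 1 / 3                   # assumed for the example
effective = raw_rate_gbps * ratio / fraction_sampled
print(round(effective, 1))                 # 14.4 - the figure mentioned above
```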

And here I'm going to speculate for a little bit, in comparison to the PS5. Tiled resources has been a feature in GPUs for a while. And the main part that allows this is sampler feedback. Now, you can have sampler feedback, but that does not mean that you necessarily have sampler feedback streaming. That would depend on the I/O. I recall Cerny mentioning that the GPU is custom built, and that they choose which features they wish to include and not include on the GPU. That implies they did not include everything that AMD has to offer in the GPUs. Most likely neither did MS. But if the PS5 still has this feature, then things regarding the SSDs remain proportionally the same between the compressed values in terms of performance difference, 9GB/s vs 4.8 GB/s. However, considering the beefy I/O of the PS5, it is actually quite possible that Sony ditched the tiled resources feature, and instead opted to beef up the I/O to allow the streaming of the full textures instead. If this is the case, then really, the difference in the SSD performance between the two consoles will be quite minimal. Why they would do that is beyond me though, so, most likely it's still in there. Whether they can stream it immediately is another story.

Obviously, some things are clear now after we discussed for so many pages, but it might bring the ones that didn't follow the thread sort of up to speed.
 
Last edited:

jimbojim

Banned
Many people are doubting the 2x-3x advantage, and that is basically the main thing that will make the SSD of the XSX just as good, or inferior to the PS5.

But when you just look at the pure numbers, the SSD in the XSX is already inferior to the PS5's SSD. And there is nothing in the XSX, with all the compression crap and other stuff included, that can close the gap or mitigate the inferior speed.
 

Bernkastel

Ask me about my fanboy energy!
But when you just look at the pure numbers, the SSD in the XSX is already inferior to the PS5's SSD. And there is nothing in the XSX, with all the compression crap and other stuff included, that can close the gap or mitigate the inferior speed.
XSX streams way less asset data than PS5, and this whole setup considerably reduces CPU/GPU overhead and RAM usage. By your logic we should go back to the days of Kutaragi and use FLOPS as the absolute gauge of performance comparison.
 
Last edited:

Thirty7ven

Banned
But when you just look at the pure numbers, the SSD in the XSX is already inferior to the PS5's SSD. And there is nothing in the XSX, with all the compression crap and other stuff included, that can close the gap or mitigate the inferior speed.

It would make this a happier forum and a more positive thread if we stopped having to compare these consoles in a negative light, as if they are going into the ring against each other.

It has been a couple of interesting pages about the hardware inside XSX. This thread doesn’t need to be about how it compares negatively or not to the PS5.

We’ve all been guilty of it but we can move past it.

Having people like Fafalada chiming in is great, so let's try and keep that up.
 
Last edited:

THE:MILKMAN

Member
They cost the same to manufacture

This is just a random chart the Bloomberg reporter put up. I seriously doubt it represents him digging up the real/actual costs.

The $450 BOM he had for the PS5 is probably a reasonable ballpark but it doesn't really jibe with the figures in the chart here IMO.
 

geordiemp

Member
Idk man, my cloud stocks have been on a tear. A lot more cloud hype than even with the lockdown future.
I feel that MS can offer a lot of xCloud things that leave Sony straggling. There is a reason why Sony jumped on Azure. 🤷‍♀️

Powa da cloud is back

 

Exodia

Banned
That's up to MS, and thankfully we won't have to wait any further than mid-August at latest!

I thought they got a new PR/Marketing team and strategy? I guess not, same ole Microsoft in action.
By August it will already be game over in terms of specs. Everyone will have pre-ordered and made up their mind. If you are getting a PC and the latest Nvidia/Intel/AMD you have already made up your mind; if you are getting an XSX or PS5, nothing will change it either.

While people criticise Cerny, he did a brilliant thing giving that talk and laying out what they did for their SSD and I/O solution and how complex it is.
Making everything plain. It's not surprising that the SSD has taken over the discourse. If that talk didn't exist, it wouldn't have.
Of course, MS being MS, they will take forever to respond and actually will never respond, knowing MS's past behavior in tech (not just consoles). The corporates are still running the show.

If a real down-to-earth small company were in charge, they would move up parts of that Hot Chips presentation like yesterday and do what Cerny did, including a tech demo of their own specifically to outline their SSD/I/O solution.

It's hilarious that people have to read between the lines on BCPack, SFS and, heck, what the hell is DirectStorage anyway? Sure, we know the general descriptions but nothing tangible. When is it coming to PC? How will the PC market utilize it, and how will it improve the current crop of NVMe SSDs in PCs? Simple stuff. But it will take months to get approval due to red tape. Just goes to show you that the same leadership is still in charge, unfortunately.
 
Last edited:

Ascend

Member
But it's not premature when some people are claiming that XSX SSD is just as good as PS5 SSD. Yeah, no.
I understand that. But let's take something else as a comparison...
Look at DLSS 2.0. Basically, it renders the game at a lower resolution and outputs an image practically indistinguishable from the native higher resolution. You can claim that graphics card X, which supports DLSS, is not as powerful as graphics card Y, which does not support it, and that might be true. However, if the end result is the same, the fact that graphics card Y is more powerful is moot. Graphics card X achieves the same end result despite its inferior raw capabilities.
And the same thing potentially applies for the PS5 vs XSX, depending on what the XSX hardware is capable of.

The SSD specs of the consoles are clear and don't need debating. Trying to figure out why the XSX claims 2x-3x multipliers on memory/bandwidth and how they achieve that should be welcomed, and not shut down, especially if the reason is that it makes certain people uncomfortable. Exploring the XSX has already been shut down on the PS5/XSX speculation thread, since that's basically PS5-only territory now. If you want to share how great the PS5 SSD is, go there, and leave this thread alone.

And that's the last thing I will say about this.
 
Last edited:

Thirty7ven

Banned
The 2x-3x is about sampler feedback, with vs. without. I thought that had been cleared up by the MS guy in the tweet. It's something that helps every GPU that supports it.

It doesn’t seem to make sense to try and attribute it to XSX specifically as it has those gains on the PC side too.

The only thing that was said to be custom to XSX was the custom texture filters that help avoid artifacts at the edges of those textures.

It doesn’t make much sense to hang on to that number. Seems pretty hard headed to do it. You’re going in circles with this number.
 
Last edited:

oldergamer

Member
I posted the patent info. There was more to the SFS implementation of custom hardware filters. Could what MS is doing be accomplished in software? Sure, but with a hit to performance somewhere along the line. The key is how this interacts with the 100 GB of SSD swap space that MS says can be instantly accessed. We still don't know much about it officially, and they are holding onto the info for a later date.
 

Ascend

Member
The 2x-3x is about sampler feedback, with vs. without. I thought that had been cleared up by the MS guy in the tweet. It's something that helps every GPU that supports it.

It doesn’t seem to make sense to try and attribute it to XSX specifically as it has those gains on the PC side too.
Leaving these here, once again.... So everyone is on the same page...



The last one is the most important. It basically means that if you do things the same way in software you will get pop ins.
 

Thirty7ven

Banned
The last one is the most important. It basically means that if you do things the same way in software you will get pop ins.

Exactly, it's then “you might get pop-in” at page boundaries. This isn't confirmation that without those custom texture filters you get 2x or 3x less bandwidth.

You’re hanging on to that number but it’s a bit like saying that when MS says they have a 40 times faster SSD, they mean 40 times faster than the other guy....

It doesn’t make sense and it’s not being supported by facts thus far. It’s being supported by information that is not there...

Edit: just to reaffirm that I’m not trying to compare. I’m saying your interpretation of the number just doesn’t seem to be supported by facts.
 
Last edited:

Bernkastel

Ask me about my fanboy energy!
I thought they got a new PR/Marketing team and strategy? I guess not, same ole Microsoft in action.
By August it will already be game over in terms of specs. Everyone will have pre-ordered and made up their mind. If you are getting a PC and the latest Nvidia/Intel/AMD you have already made up your mind; if you are getting an XSX or PS5, nothing will change it either.

While people criticise Cerny, he did a brilliant thing giving that talk and laying out what they did for their SSD and I/O solution and how complex it is.
Making everything plain. It's not surprising that the SSD has taken over the discourse. If that talk didn't exist, it wouldn't have.
Of course, MS being MS, they will take forever to respond and actually will never respond, knowing MS's past behavior in tech (not just consoles). The corporates are still running the show.

If a real down-to-earth small company were in charge, they would move up parts of that Hot Chips presentation like yesterday and do what Cerny did, including a tech demo of their own specifically to outline their SSD/I/O solution.

It's hilarious that people have to read between the lines on BCPack, SFS and, heck, what the hell is DirectStorage anyway? Sure, we know the general descriptions but nothing tangible. When is it coming to PC? How will the PC market utilize it, and how will it improve the current crop of NVMe SSDs in PCs? Simple stuff. But it will take months to get approval due to red tape. Just goes to show you that the same leadership is still in charge, unfortunately.
While people criticise Cerny, he did a brilliant thing giving that talk and laying out what they did for their SSD and I/O solution and how complex it is.
Making everything plain. It's not surprising that the SSD has taken over the discourse. If that talk didn't exist, it wouldn't have.
Of course, MS being MS, they will take forever to respond and actually will never respond, knowing MS's past behavior in tech (not just consoles). The corporates are still running the show.

If a real down-to-earth small company were in charge, they would move up parts of that Hot Chips presentation like yesterday and do what Cerny did, including a tech demo of their own specifically to outline their SSD/I/O solution.
There's no need to disguise PR as a GDC talk. Microsoft also had their GDC panel uploaded to the "Microsoft Game Stack" and "Microsoft DirectX 12 and Graphics Education" channels. Devs don't need to be spoon-fed what an SSD is and other basic stuff. We know a lot more about XVA, DirectStorage, BCPack and SFS than you think.


I thought they got a new PR/Marketing team and strategy? I guess not, same ole Microsoft in action.
By August it will already be game over in terms of specs. Everyone will have pre-ordered and made up their mind. If you are getting a PC and the latest Nvidia/Intel/AMD you have already made up your mind; if you are getting an XSX or PS5, nothing will change it either.
And Sony will show their box in August; what are you on about? So far Microsoft has been more open about the XSX than Sony has been about the PS5.
Also, I thought consumers didn't care about specs and only games mattered?
 
Last edited:

Ascend

Member
Exactly, it's then “you might get pop-in” at page boundaries. This isn't confirmation that without those custom texture filters you get 2x or 3x less bandwidth.

You’re hanging on to that number but it’s a bit like saying that when MS says they have a 40 times faster SSD, they mean 40 times faster than the other guy....

It doesn’t make sense and it’s not being supported by facts thus far. It’s being supported by information that is not there...

Edit: just to reaffirm that I’m not trying to compare. I’m saying your interpretation of the number just doesn’t seem to be supported by facts.
You know, I'm going to be honest. I'm getting tired of hearing the same thing over and over and over and over again. I don't go into threads talking about the PS5 saying that we've been streaming from HDDs for a long time, so streaming from storage on a PS5 is not a big deal.
So, please, do me a favor, don't tell me we've been having the same thing as SFS for ages and that it's not a big deal.
 

Thirty7ven

Banned
You know, I'm going to be honest. I'm getting tired of hearing the same thing over and over and over and over again. I don't go into threads talking about the PS5 saying that we've been streaming from HDDs for a long time, so streaming from storage on a PS5 is not a big deal.
So, please, do me a favor, don't tell me we've been having the same thing as SFS for ages and that it's not a big deal.

You’re deflecting. I said what the MS guy said: without the custom texture filters you might get texture pop-in at page boundaries. That’s what he says; he specifically addresses what the custom block (the texture filters) does. He doesn’t say what you say.

If the point isn’t to have discussion based on facts, and you just want to say whatever you want to say, then I misread the room.
 
Last edited:

Ascend

Member
You’re deflecting. I said what the MS guy said: without the custom texture filters you might get texture pop-in at page boundaries. That’s what he says; he specifically addresses what the custom block (the texture filters) does. He doesn’t say what you say.

If the point isn’t to have discussion based on facts, and you just want to say whatever you want to say, then I misread the room.
Ugh. Fine. Let's humor you.

Let me put it this way. If without the texture filters you get pop-in, would that not be equivalent to being unable to achieve the 2x-3x otherwise, keeping in mind that pop-in is not desired?
 

Thirty7ven

Banned
Ugh. Fine. Let's humor you.

Let me put it this way. If without the texture filters you get pop-in, would that not be equivalent to being unable to achieve the 2x-3x otherwise, keeping in mind that pop-in is not desired?

You’re not just humoring me, you’re finally talking about what he says and the implications of that.

When he says we might notice pop in at page boundaries what does it mean in practice, can you give me an example?

Is that the only way to prevent that pop in? Does it mean we will notice that pop in in Dx12 U games that use sampler feedback?

These are absolutely the questions that should be asked.

And no it’s not equivalent because that’s just you once again using information that doesn’t exist to confirm something you already decided to be true. But maybe if we find the answers to the real questions, we will find out if it’s true or not.
 
Last edited:

Bernkastel

Ask me about my fanboy energy!
There seem to be a lot of misconceptions about the Xbox Velocity Architecture. The goal of both the PS5's and the Series X's I/O implementations is to increase the complexity of the content presented on screen without a corresponding increase in load times/memory footprint, but they go about it in totally different ways. Since the end of the cartridge era, an increase in geometry/texture complexity was usually accompanied by an increase in load times. This was because while RAM bandwidth might be adequate, the throughput of the link feeding the RAM from the HDD was not. Hence, HDDs and the associated I/O architecture were the bottleneck.
One way to address this issue was to "cache" as much as possible in RAM so as to get around the aforementioned bottleneck. However, this solution comes with its own problem in that the memory footprint just keeps ballooning ("MOAR RAM"). This is brilliantly explained by Mark Cerny in his GDC presentation with the 30 s of gameplay paradigm. PlayStation's answer to this problem is to increase the throughput to the RAM in an unprecedented way. Thus, instead of caching for the next 30 s of gameplay, you might only need to cache for the next 1 s of gameplay, which results in a drastic reduction in memory footprint. Indeed, the point of it all is that for a system with the old HDD architecture to have the same jump in texture and geometry complexity, either the amount of RAM needed for caching would have to be exorbitant, or frametime would have to be increased to allow enough time for the textures to stream in (low framerates), or gameplay design would have to be changed to allow for texture loading (long load times). The PS5 supposedly will achieve all of this with none of those drawbacks thanks to alleviating the bottleneck between persistent memory and RAM (the bottleneck still exists because RAM is still quicker than the SSD, but it is good enough for the PS5's rendering capacity and hence doesn't matter anyway; you just don't load textures from the SSD straight to the screen).
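A back-of-envelope version of that caching argument (the 200 MB/s asset-demand figure is invented purely for the example):

```python
# The streaming buffer has to hold everything the player might need before the
# drive can deliver it, so a shorter look-ahead window means a smaller footprint.
def streaming_buffer_gb(asset_demand_gbps, lookahead_seconds):
    return asset_demand_gbps * lookahead_seconds

print(streaming_buffer_gb(0.2, 30))  # HDD era: ~6 GB cached "just in case"
print(streaming_buffer_gb(0.2, 1))   # SSD era: ~0.2 GB for the same content
```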

We can now see why the throughput from the SSD to RAM has become the one and only metric for judging the I/O capability of next-gen systems in the minds of gamers. After all, it does make perfect sense. BUT... is there an alternative way of doing things? Microsoft went in a completely different direction. Is the persistent memory to RAM throughput still the bottleneck? Yes! Why is more throughput needed? To stream more textures, evidently. The defining question is then: how much of it is actually needed? After careful research assessing how games actually utilise textures on a per-frame basis, MS seems to have come to a surprising answer: not that much, actually.

Indeed, by loading more detailed MIPs than necessary while keeping the persistent memory to RAM throughput constant, load times/memory footprint are increased. Let's quote Andrew Goossen in the Eurogamer deep dive for reference:

"We observed that typically, only a small percentage of memory loaded by games was ever accessed," reveals Goossen. "This wastage comes principally from the textures. Textures are universally the biggest consumers of memory for games. However, only a fraction of the memory for each texture is typically accessed by the GPU during the scene. For example, the largest mip of a 4K texture is eight megabytes and often more, but typically only a small portion of that mip is visible in the scene and so only that small portion really needs to be read by the GPU."

The upshot of it all is that by knowing which MIP levels are actually needed on a per-frame basis and loading only those, the amount that needs to be streamed is radically reduced, and so is the throughput requirement of the SSD-RAM link as well as the RAM footprint. Can this just-in-time streaming solution be implemented via software? MS indeed acknowledges that it is possible to do so, but concedes that it is very inaccurate and requires changes to shader/application code. The hardware implementation of determining residency maps associated with partially resident textures is sampler feedback.

While sampler feedback is great, it is not sampler feedback streaming. You now need a hardware implementation for:

(1) transitioning from a lower MIP level to a higher one seamlessly
(2) falling back to a lower MIP level if the requested one is not yet resident in memory, and blending back to the higher one when it becomes available after a few frames.
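A minimal sketch of what (1) and (2) amount to in practice (my own pseudocode, not MS's actual filter hardware; it assumes the coarsest mips are always resident):

```python
# Sample the finest mip that is actually resident, then fade toward the requested
# mip over a few frames once it streams in, instead of popping.
def choose_mip(requested_mip, resident_mips):
    mip = requested_mip
    while mip not in resident_mips:   # fall back to the next coarser level
        mip += 1
    return mip

def blend_weight(frames_since_loaded, blend_frames=4):
    return min(1.0, frames_since_loaded / blend_frames)  # 0 -> 1 over a few frames

print(choose_mip(0, resident_mips={2, 3, 4}))  # 2: coarser data shown until mip 0 arrives
```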

Microsoft claims to have devised a hardware implementation for doing just that. These are the so-called "texture filters" described by James Stanard. Do we have more information about Microsoft's implementation? Of course we do. SFS is patented hardware technology and is described in patent US10388058B2, titled "Texture residency hardware enhancements for graphics processors", with co-inventors Mark S. Grossman and... Andrew Goossen.

Combined with DirectStorage (presumably a new API that revamps the file system, though information about it is sparse) and the constant high throughput of the SSD, this is how Microsoft claims to achieve a 2x-3x increase in efficiency. Hence, the "brute force" meme about the Series X is wildly off-base.

As for which of the PS5 or Series X I/O system is better? I say let the DF face-offs begin.
 

Ascend

Member
You’re not just humoring me, you’re finally talking about what he says and the implications of that.

When he says we might notice pop in at page boundaries what does it mean in practice, can you give me an example?

Is that the only way to prevent that pop in? Does it mean we will notice that pop in in Dx12 U games that use sampler feedback?

These are absolutely the questions that should be asked.

And no it’s not equivalent because that’s just you once again using information that doesn’t exist to confirm something you already decided to be true. But maybe if we find the answers to the real questions, we will find out if it’s true or not.
Ok. We clearly need to take a step back. He mentions page boundaries; that means you have pages. What is a page? In short, memory is divided into pages (basically 'blocks' of memory that are all the same size), and everything that is loaded into memory is loaded in pages. At the same time, you create a page table to keep a record of where exactly everything is stored in memory.
Then we get to page boundaries. A page boundary is basically where the data of one page ends, and the data of the next page begins.

So if you're loading a texture and you need to load multiple pages of that texture, you will see each part of the texture popping in, as each page is finished loading. Also note how the texture filters work specifically when the texture is not in memory yet, meaning, it has to be loaded from the SSD.
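To put rough numbers on that (64 KB is an assumed page size, just for illustration):

```python
# A texture bigger than one page arrives as several pages, and every page boundary
# is a point where part of it can show up before the rest has loaded.
PAGE_BYTES = 64 * 1024

def pages_needed(texture_bytes):
    return (texture_bytes + PAGE_BYTES - 1) // PAGE_BYTES

mip0_4k = 8 * 1024 * 1024           # the "8 MB per 4K mip" figure quoted earlier
print(pages_needed(mip0_4k))        # 128 pages -> 127 boundaries inside one mip
```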

Is it the only way to prevent the pop-in? No. There are two ways.
The first one is a faster SSD transfer rate, which the PS5 has. So theoretically, the PS5 does not need the texture filters.
The second one is more RAM. So on PC, rather than having the texture filter, you're most likely going to require more than 16 GB to avoid the pop-ins in an XSX-optimized game.
 

Ascend

Member
Link to the patent;
 