
Nvidia shows neural texture compression cutting VRAM usage from 6.5GB to 970MB

BennyBlanco

aka IMurRIVAL69
DLSS 5 is only one part of neural rendering, and it sits on the side where machine learning is applied to the final rendered result. The GDC session instead focused on using small neural networks inside the rendering pipeline itself to decode textures, evaluate materials, and reduce memory traffic.
The easiest example was Neural Texture Compression (NTC). NVIDIA showed its Tuscan Wheels scene dropping from about 6.5GB of VRAM with traditional BCn-compressed textures to 970MB with NTC, while keeping image quality close to the original.
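For a sense of scale, here is the back-of-the-envelope math on the demo numbers above. It is a quick sketch that assumes both figures are decimal units (1 GB = 1000 MB) for the same scene.

```python
# Assumption: the quoted figures use decimal units (1 GB = 1000 MB).
BCN_MB = 6.5 * 1000   # Tuscan Wheels scene with traditional BCn textures
NTC_MB = 970.0        # same scene with Neural Texture Compression

def compression_summary(before_mb: float, after_mb: float) -> tuple:
    """Return (reduction factor, fraction of memory saved)."""
    return before_mb / after_mb, 1.0 - after_mb / before_mb

factor, saved = compression_summary(BCN_MB, NTC_MB)
print(f"{factor:.1f}x smaller, {saved:.0%} less VRAM")  # 6.7x smaller, 85% less VRAM
```

If the figures were actually binary GiB/MiB, the ratio would come out closer to 6.9x, so the unit assumption only shifts it slightly.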


[Image: NVIDIA neural rendering memory demo]
 
Seems like incredibly important tech considering the state of things right now
Watch Nvidia refuse to release this until the jig is up on the memory inflation caused by themselves and manufacturers hand in hand.

Call me a cynic, but I don't see Nvidia doing anything "pro consumer" until they're done leeching.
 
Yes, and that includes the PS6. AMD and Sony already announced it.
Are you referring to Universal Compression? I thought that was a different technology to neural texture compression, although logically it would fall under the Universal Compression umbrella now that I think about it.
 
Are you referring to Universal Compression? I thought that was a different technology to neural texture compression, although logically it would fall under the Universal Compression umbrella now that I think about it.

Both are using neural networks to compress data, though during the Sony/AMD presentation they only talked about reducing memory bandwidth.
I'm not sure if it keeps data compressed in VRAM.
 
Are u sure? Afaik only RTX5000 have neural shaders.
from reddit

GPU for NTC decompression on load and transcoding to BCn:

  • Minimum: Anything compatible with Shader Model 6
  • Recommended: NVIDIA Turing (RTX 2000 series) and newer.
  • Note: the oldest GPUs that the NTC SDK functionality has been validated on are NVIDIA GTX 1000 series, AMD Radeon RX 6000 series, Intel Arc A series.
GPU for NTC inference on sample:

  • Minimum: Anything compatible with Shader Model 6 (will be functional but very slow)
  • Recommended: NVIDIA Ada (RTX 4000 series) and newer.
If implemented well, being able to pack about 5.5x more texture data into the same footprint is going to be very big, especially for VRAM-constrained scenarios. And it looks like it will benefit disk sizes and PCIe traffic equally for most people, even those without the newest GPUs.
 
Are u sure? Afaik only RTX5000 have neural shaders.

For RTX Neural Texture Compression (NTC), there are 2 requirements. To decompress, meaning for users to use the tech, all it takes is an RTX 2000 series GPU.

GPU for NTC decompression on load and transcoding to BCn:
  • Minimum: Anything compatible with Shader Model 6
  • Recommended: NVIDIA Turing (RTX 2000 series) and newer.
GPU for NTC inference on sample:
  • Minimum: Anything compatible with Shader Model 6 (will be functional but very slow)
  • Recommended: NVIDIA Ada (RTX 4000 series) and newer.
GPU for NTC compression:
  • Minimum: NVIDIA Turing (RTX 2000 series).
  • Recommended: NVIDIA Ada (RTX 4000 series) and newer.
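To make the three tiers easier to read side by side, here is a hypothetical helper that encodes the requirement table above for NVIDIA generations only. The mode names, the generation ordering and the assumption that every listed generation meets the Shader Model 6 minimum are illustrative choices, not part of the NTC SDK.

```python
# Hypothetical helper encoding the NTC SDK requirement table quoted above.
# Only NVIDIA generations are modeled; names and ordering are assumptions.
GENERATIONS = ["Pascal", "Turing", "Ampere", "Ada", "Blackwell"]  # GTX 1000 .. RTX 5000

# mode -> (minimum generation, recommended generation); None = "any Shader Model 6 GPU"
REQUIREMENTS = {
    "decompress_on_load": (None, "Turing"),
    "inference_on_sample": (None, "Ada"),
    "compression": ("Turing", "Ada"),
}

def support_level(mode: str, generation: str) -> str:
    """Classify a GPU generation as 'recommended', 'minimum' or 'unsupported' for a mode."""
    minimum, recommended = REQUIREMENTS[mode]
    rank = GENERATIONS.index(generation)
    if rank >= GENERATIONS.index(recommended):
        return "recommended"
    # Assumption: every listed generation satisfies the Shader Model 6 minimum.
    if minimum is None or rank >= GENERATIONS.index(minimum):
        return "minimum"
    return "unsupported"

print(support_level("inference_on_sample", "Turing"))  # minimum (functional but very slow)
```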

RTX Neural Shading is for developers to create and train their own models and assets.
RTX Neural Shading (RTXNS) also known as RTX Neural Shaders, is intended as a starting point for developers interested in bringing Machine Learning (ML) to their graphics applications on Windows or Linux. It provides a number of examples to help the reader understand how to train their own neural networks and then use those models to perform inference alongside their normal graphics rendering.
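What "training" means for neural compression can be shown with a heavily simplified toy: instead of running a fixed encoder, you optimize a small set of stored values until they reproduce the texture. The sketch below fits 8 latent values, linearly interpolated, to a 64-sample 1D signal by gradient descent. As we understand Nvidia's NTC paper, the real method jointly optimizes quantized 2D latent grids and a small MLP decoder; the MLP, quantization and 2D grids are all omitted here, and every function name is hypothetical.

```python
import math
from typing import List, Tuple

def interp(latents: List[float], t: float) -> Tuple[float, int, float]:
    """Linearly interpolate a 1D latent array at t in [0, 1]; return (value, left index, weight)."""
    x = t * (len(latents) - 1)
    i = min(int(x), len(latents) - 2)
    f = x - i
    return latents[i] * (1 - f) + latents[i + 1] * f, i, f

def mse(signal: List[float], latents: List[float]) -> float:
    """Mean squared reconstruction error of the signal from the latents."""
    n = len(signal)
    return sum((interp(latents, k / (n - 1))[0] - s) ** 2 for k, s in enumerate(signal)) / n

def fit_latents(signal: List[float], num_latents: int, steps: int = 500, lr: float = 0.5) -> List[float]:
    """'Compress' a signal by gradient descent on latent values (toy stand-in for NTC training)."""
    latents = [0.0] * num_latents
    n = len(signal)
    for _ in range(steps):
        grads = [0.0] * num_latents
        for k, target in enumerate(signal):
            pred, i, f = interp(latents, k / (n - 1))
            err = pred - target              # gradient of 0.5 * err**2 w.r.t. pred
            grads[i] += err * (1 - f) / n    # chain rule through the interpolation weights
            grads[i + 1] += err * f / n
        latents = [z - lr * g for z, g in zip(latents, grads)]
    return latents

signal = [math.sin(2 * math.pi * k / 63) for k in range(64)]  # 64 "texels"
latents = fit_latents(signal, num_latents=8)                  # stored as 8 values
print(mse(signal, [0.0] * 8), "->", mse(signal, latents))
```

The point is that compression quality comes from running an optimizer per asset, which is part of why adopting it is more work for studios than flipping a driver switch.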
 
The tech looks promising, but considering game dev realities, we won't see it for years. This isn't just a driver-level optimization; dev studios actually have to do a bunch of work around training models for this.

So for any projects past the initial stages of the pipeline, I find it unlikely that the tech will be implemented, i.e. we won't see it in games for like 4-5 years. Hopefully I am wrong.
 
Watch Nvidia refuse to release this until the jig is up on the memory inflation caused by themselves and manufacturers hand in hand.

Call me a cynic, but I don't see Nvidia doing anything "pro consumer" until they're done leeching.

Not cynical but really uninformed on the state of this tech

Demo has been out for a long time for anyone to try


DirectX12 update this year will include the API calls for it



Intel will use it via DirectX



AMD will use it via DirectX



Nvidia has the dev kits updated and it's available now


Devs are toying with this right now. It'll be hardware agnostic via DirectX

Again, Nvidia dragging this industry forward kicking and screaming and they even handed the keys to API consortiums to use it so every manufacturers can use it. But fuck Nvidia right?
 
Not cynical but really uninformed on the state of this tech

Demo has been out for a long time for anyone to try

DirectX12 update this year will include the API calls for it

Intel will use it via DirectX

AMD will use it via DirectX

Nvidia has the dev kits updated and it's available now

Devs are toying with this right now. It'll be hardware-agnostic via DirectX

Again, Nvidia dragging this industry forward kicking and screaming, and they even handed the keys to API consortiums so every manufacturer can use it. But fuck Nvidia, right?
Fuck Nvidia.
 
[GIF: Charlie Day "Feeling It", by First We Feast]


But how good will it be in motion, and what's the catch?

Even better quality textures than BCn

There's no catch akin to the DLSS5 drama, this is per pixel and more akin to upscalers.

It can decompress in VRAM (no savings there, but it saves on SSD space and PCIe bandwidth), or it can run inference on sample, which means the decoding is done in the shader for every pixel and you save VRAM. This has a performance cost, of course, but we're talking about ~0.5 to 1 ms, akin to upscalers, and in a game pipeline this is easily hidden by running the tensor cores concurrently while another task is being done.
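To put the ~0.5 to 1 ms figure in context, this small sketch shows what share of the frame budget it represents at common frame rates. The `overlap` parameter is a hypothetical knob for how much of the cost concurrent work might hide; the thread gives no measured value for it.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def visible_cost_ms(cost_ms: float, overlap: float) -> float:
    """Cost left on the critical path if a fraction `overlap` runs concurrently with other work.

    `overlap` is a hypothetical parameter: 0 = fully serial, 1 = fully hidden.
    """
    return cost_ms * (1.0 - overlap)

for fps in (60, 120):
    for cost in (0.5, 1.0):
        share = cost / frame_budget_ms(fps)
        print(f"{fps} fps: {cost} ms is {share:.0%} of a {frame_budget_ms(fps):.2f} ms frame")
```

At 60 fps the worst case is about 6% of the frame, and at 120 fps about 12%, before any overlap with other work is counted.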


Is this for DLSS5 only?
Seems like it could be extremely useful for cheaper builds.

Won't be DLSS 5 only

Nvidia basically gave the keys to API consortiums to implement what's required for games to make hardware agnostic calls and do NTC on Nvidia / Intel / AMD
 
from reddit

GPU for NTC decompression on load and transcoding to BCn:

  • Minimum: Anything compatible with Shader Model 6
  • Recommended: NVIDIA Turing (RTX 2000 series) and newer.
  • Note: the oldest GPUs that the NTC SDK functionality has been validated on are NVIDIA GTX 1000 series, AMD Radeon RX 6000 series, Intel Arc A series.
GPU for NTC inference on sample:

  • Minimum: Anything compatible with Shader Model 6 (will be functional but very slow)
  • Recommended: NVIDIA Ada (RTX 4000 series) and newer.
If implemented well, being able to pack about 5.5x more texture data into the same footprint is going to be very big, especially for VRAM-constrained scenarios. And it looks like it will benefit disk sizes and PCIe traffic equally for most people, even those without the newest GPUs.

NTC on load means no VRAM savings though; it decompresses there. But that means huge game-size savings on SSDs, and PCIe bandwidth saved.

NTC inference on sample is where you save on VRAM. And yeah, I'm sure it can be tried on older cards that were starved of VRAM, and it may even be a better solution than the GPU choking on VRAM, but overall it's Ada and newer.
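Here is a sketch of where the bytes end up in each mode. BCn sizes follow the standard 4x4 block layout (8 bytes per block for BC1/BC4, 16 bytes for BC7). The single 4096x4096 texture and the 4x NTC-to-BCn ratio are purely hypothetical inputs, and the mode names are made up for illustration.

```python
import math

def bc_size_bytes(width: int, height: int, bytes_per_block: int, mips: bool = True) -> int:
    """Size of a block-compressed (BCn) texture; every BCn format uses 4x4 texel blocks."""
    total = 0
    w, h = width, height
    while True:
        total += math.ceil(w / 4) * math.ceil(h / 4) * bytes_per_block
        if not mips or (w == 1 and h == 1):
            return total
        w, h = max(1, w // 2), max(1, h // 2)  # next mip level

def footprint(mode: str, bcn_bytes: int, ntc_ratio: float) -> dict:
    """Bytes on disk, over PCIe and resident in VRAM; ntc_ratio (BCn size / NTC size) is assumed."""
    ntc_bytes = int(bcn_bytes / ntc_ratio)
    if mode == "bcn":            # traditional: BCn everywhere
        return {"disk": bcn_bytes, "pcie": bcn_bytes, "vram": bcn_bytes}
    if mode == "ntc_on_load":    # NTC on disk and bus, transcoded to BCn in VRAM
        return {"disk": ntc_bytes, "pcie": ntc_bytes, "vram": bcn_bytes}
    if mode == "ntc_on_sample":  # NTC stays compressed in VRAM, decoded per sample
        return {"disk": ntc_bytes, "pcie": ntc_bytes, "vram": ntc_bytes}
    raise ValueError(f"unknown mode: {mode}")

bc7 = bc_size_bytes(4096, 4096, bytes_per_block=16)  # BC7: 16 bytes per 4x4 block, full mip chain
for mode in ("bcn", "ntc_on_load", "ntc_on_sample"):
    sizes = footprint(mode, bc7, ntc_ratio=4.0)
    print(mode, {k: f"{v / 2**20:.1f} MiB" for k, v in sizes.items()})
```

Only the on-sample row shrinks the VRAM column; the on-load row saves disk and PCIe traffic but ends up with the same BCn footprint in memory.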

You won't see switch 2 suddenly use this tech to save VRAM. Could be really good though for cartridge size 🤔
 
Not cynical but really uninformed on the state of this tech

Demo has been out for a long time for anyone to try

DirectX12 update this year will include the API calls for it

Intel will use it via DirectX

AMD will use it via DirectX

Nvidia has the dev kits updated and it's available now

Devs are toying with this right now. It'll be hardware-agnostic via DirectX

Again, Nvidia dragging this industry forward kicking and screaming, and they even handed the keys to API consortiums so every manufacturer can use it. But fuck Nvidia, right?
It's quite neat, but is this real-time capable? I remember a few years back when the first papers about neural compression came out, and they were just not real-time capable. And if it is now, what are the limits there? I guess the bandwidth of the VRAM is going to be the main bottleneck? And how robust is this against compression artifacts? I assume they have a very specialized variant of an autoencoder in use for this? Or is this generation based on some initial patch (basically an inverted inpainting problem or guided diffusion problem), because that usually isn't real-time capable unless you would either have loading times to decompress what you need or have separate GPU kernels that are just there to decompress, manage and cache this stuff?

Microsoft seems to be the actual "good guy" here because they are implementing a unified API that game devs can target instead of letting Nvidia, AMD and Intel do their own thing.

In any case, nice technology (even back then) but it will not make the issue of Nvidia selling overpriced cards with not enough VRAM go away. And Nvidia weren't the ones who came up with the idea, but they made it popular at least (to sell their cards).
 
from reddit

GPU for NTC decompression on load and transcoding to BCn:

  • Minimum: Anything compatible with Shader Model 6
  • Recommended: NVIDIA Turing (RTX 2000 series) and newer.
  • Note: the oldest GPUs that the NTC SDK functionality has been validated on are NVIDIA GTX 1000 series, AMD Radeon RX 6000 series, Intel Arc A series.
GPU for NTC inference on sample:

  • Minimum: Anything compatible with Shader Model 6 (will be functional but very slow)
  • Recommended: NVIDIA Ada (RTX 4000 series) and newer.
If implemented well, being able to pack about 5.5x more texture data into the same footprint is going to be very big, especially for VRAM-constrained scenarios. And it looks like it will benefit disk sizes and PCIe traffic equally for most people, even those without the newest GPUs.
When is this going to release?

Nice that it works on all generations of RTX cards, but I guess we'll see about performance on the older cards.
 
The tech looks promising, but considering game dev realities, we won't see it for years. This isn't just a driver-level optimization; dev studios actually have to do a bunch of work around training models for this.

So for any projects past the initial stages of the pipeline, I find it unlikely that the tech will be implemented, i.e. we won't see it in games for like 4-5 years. Hopefully I am wrong.
Yeah, like DirectStorage.
 
I'm happy to see so much focus lately on making more efficient use of the hardware we already have, rather than just throwing more memory + computation at the problem.
 
Not cynical but really uninformed on the state of this tech

Demo has been out for a long time for anyone to try

DirectX12 update this year will include the API calls for it

Intel will use it via DirectX

AMD will use it via DirectX

Nvidia has the dev kits updated and it's available now

Devs are toying with this right now. It'll be hardware-agnostic via DirectX

Again, Nvidia dragging this industry forward kicking and screaming, and they even handed the keys to API consortiums so every manufacturer can use it. But fuck Nvidia, right?
Is your leather jacket in the mail?

By the by, you can enjoy nvidia the GPU maker but also say "fuck nvidia" as a whole. All my GPUs have been Nvidia and all my future GPUs will be Nvidia. Despite their GPU gaming tech, they are doing the world no favors right now.
 
Depends on the developers; it could be implemented in a big release next year, maybe?
That's a long ways away.

Oddly, I haven't run into any VRAM issues yet. Hard to believe but true. I have a 2080 and a 3080 Ti. Any game that runs with enough performance at whatever settings I can use doesn't benefit from extra memory.

In other words, if I'm prioritizing 60+ fps, I usually have to drop the resolution or shadows, or use DLSS, to the point where it's not maxing my VRAM, and whatever my texture setting is, it's usually either maxed out or negligibly different from the next step down.
 
It's quite neat but is this real-time capable? I remember a few years back when the first papers about neural compression came out and they were just not real-time capable.

Always been? Even in the papers' abstract

At the same time, our method allows on-demand, real-time decompression with random access similar to block texture compression on GPUs, enabling compression on disk and memory.
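"Random access similar to block texture compression" means any texel can be fetched without decoding its neighbours. For BCn that is just address arithmetic, shown below assuming a plain row-major block layout (real GPUs use tiled or swizzled layouts, and the function name is hypothetical). As we read the abstract, NTC aims to keep the same property by decoding only the data a given sample needs.

```python
def bc_block_offset(x: int, y: int, width: int, bytes_per_block: int) -> int:
    """Byte offset of the 4x4 block containing texel (x, y) in one mip level.

    Assumes blocks are stored row-major with no tiling, which real hardware does not guarantee.
    """
    blocks_per_row = (width + 3) // 4   # round up for widths that aren't a multiple of 4
    return ((y // 4) * blocks_per_row + (x // 4)) * bytes_per_block

# Texel (5, 0) in a 16-texel-wide BC7 texture lives in the second block of the first row.
print(bc_block_offset(5, 0, width=16, bytes_per_block=16))  # 16
```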

How best to hide the associated cost in the pipeline was left for further research, but that's usual for any new tech going into a game rendering pipeline. A 0.5-1 ms frametime cost is a lot easier to hide in a game pipeline than the performance cost in an NTC demo suggests.

They also refer to it in their first paper

Although NTC is more expensive than traditional hardware-accelerated texture filtering, our results demonstrate that our method achieves high performance and is practical for use in real-time rendering. Furthermore, when rendering a complex scene in a fully-featured renderer, we expect the cost of our method to be partially hidden by the execution of concurrent work (e.g., ray tracing) thanks to the GPU latency hiding capabilities. The potential for latency hiding depends on various factors, such as hardware architecture, the presence of dedicated matrix-multiplication units that are otherwise under-utilized, cache sizes, and register usage. We leave investigating this for future work.

And if it is now, what are the limits there? I guess the bandwidth of the VRAM is going to be the main bottleneck?

VRAM bandwidth is a limit for the GPUs that will be doing inference on load. In that case you get no VRAM savings, and the decompression uses more bandwidth.

With inference on sample, you absolutely save on everything memory-related: you save massively on bandwidth, and the textures are inferenced directly in the pipeline (thus likely at a performance cost).
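A toy illustration of what "inferenced directly in the pipeline" means: each sample interpolates a compact latent grid and runs a tiny MLP to produce all material channels, so no decompressed texture ever sits in memory. Everything here is an assumption for illustration: one latent grid, random untrained weights, made-up sizes and names. It is not Nvidia's implementation, which as we understand it uses quantized multi-resolution grids and runs the MLP on matrix hardware.

```python
import random
from typing import List

Vec = List[float]

def bilinear(grid: List[List[Vec]], u: float, v: float) -> Vec:
    """Bilinearly interpolate a 2D grid of latent vectors at normalized (u, v) in [0, 1]."""
    rows, cols = len(grid), len(grid[0])
    x = min(max(u * (cols - 1), 0.0), cols - 1)
    y = min(max(v * (rows - 1), 0.0), rows - 1)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, cols - 1), min(y0 + 1, rows - 1)
    fx, fy = x - x0, y - y0
    out = []
    for c in range(len(grid[0][0])):
        top = grid[y0][x0][c] * (1 - fx) + grid[y0][x1][c] * fx
        bottom = grid[y1][x0][c] * (1 - fx) + grid[y1][x1][c] * fx
        out.append(top * (1 - fy) + bottom * fy)
    return out

def dense(x: Vec, weights: List[Vec], bias: Vec, relu: bool) -> Vec:
    """One fully connected layer; weights are laid out as [out][in]."""
    out = []
    for row, b in zip(weights, bias):
        s = sum(w * xi for w, xi in zip(row, x)) + b
        out.append(max(s, 0.0) if relu else s)
    return out

class TinyNeuralTexture:
    """Hypothetical toy: a latent grid plus a 2-layer MLP decoder with random, untrained weights."""

    def __init__(self, grid_size: int, latent_dim: int, hidden_dim: int, out_channels: int, seed: int = 0):
        rng = random.Random(seed)

        def rand(n: int) -> Vec:
            return [rng.uniform(-1.0, 1.0) for _ in range(n)]

        self.grid = [[rand(latent_dim) for _ in range(grid_size)] for _ in range(grid_size)]
        self.w1, self.b1 = [rand(latent_dim) for _ in range(hidden_dim)], [0.0] * hidden_dim
        self.w2, self.b2 = [rand(hidden_dim) for _ in range(out_channels)], [0.0] * out_channels

    def decode_latent(self, z: Vec) -> Vec:
        return dense(dense(z, self.w1, self.b1, relu=True), self.w2, self.b2, relu=False)

    def sample(self, u: float, v: float) -> Vec:
        """Per-sample decode: interpolate latents, then run the MLP."""
        return self.decode_latent(bilinear(self.grid, u, v))

tex = TinyNeuralTexture(grid_size=8, latent_dim=4, hidden_dim=16, out_channels=5)
texel = tex.sample(0.25, 0.75)  # 5 channels, e.g. a hypothetical albedo RGB + 2 normal components
```

The per-sample MLP evaluation is the extra shader work that trades frametime for memory, which is why the cost discussion above matters.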

There's also an in-between solution, inference on feedback, which mixes inference on load and inference on sample. Finding the sweet spot for that would be interesting.
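The post doesn't spell out how inference on feedback works. One plausible reading, sketched here purely as an assumption: use sampler feedback to learn which texture tiles are actually being sampled, transcode only those from NTC to BCn, and keep them in a bounded cache. The class, method and callback names are hypothetical.

```python
from collections import OrderedDict
from typing import Callable, Hashable, Iterable

class FeedbackTileCache:
    """Toy 'inference on feedback': decode only tiles that sampler feedback reports as used,
    keep them in a fixed-size cache, and evict the least recently used."""

    def __init__(self, capacity_tiles: int, decode_tile: Callable[[Hashable], bytes]):
        self.capacity = capacity_tiles
        self.decode_tile = decode_tile  # hypothetical NTC -> BCn transcode of one tile
        self.tiles: "OrderedDict[Hashable, bytes]" = OrderedDict()
        self.decodes = 0

    def process_feedback(self, requested: Iterable[Hashable]) -> None:
        for tile in requested:
            if tile in self.tiles:
                self.tiles.move_to_end(tile)  # already resident: mark as recently used
                continue
            self.tiles[tile] = self.decode_tile(tile)  # pay the decode cost once per residency
            self.decodes += 1
            if len(self.tiles) > self.capacity:
                self.tiles.popitem(last=False)  # evict the least recently used tile

cache = FeedbackTileCache(capacity_tiles=2, decode_tile=lambda tile: b"bcn-tile")
cache.process_feedback([(0, 0), (0, 1)])  # two decodes
cache.process_feedback([(0, 0)])          # cache hit, no decode
cache.process_feedback([(1, 1)])          # decode, evicts (0, 1)
```

The capacity is the "sweet spot" knob: a cache big enough for everything behaves like inference on load, while a tiny one degenerates toward decoding on almost every request.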


And how robust is this against compression artifacts? I assume they have a very specialized variant of an autoencoder in use for this? Or is this generation based on some initial patch (basically an inverted inpainting problem or guided diffusion problem), because that usually isn't real-time capable unless you would either have loading times to decompress what you need or have separate GPU kernels that are just there to decompress, manage and cache this stuff?

You'll probably find most of your questions here


NTC is closer to reference than BCn

Microsoft seems to be the actual "good guy" here because they are implementing a unified API that game devs can target instead of letting Nvidia, AMD and Intel do their own thing.

Cooperative vectors and neural rendering in the upcoming DX12 is basically Nvidia's list of research

Just like ray tracing back then, DXR is basically Nvidia's homework
 
Is your leather jacket in the mail?

That's the best you can muster?

You know, it's really "hip" lately to just be a fucking negative black hole on the internet. You want to be a Debbie Downer? Fine, Debbie, but when all the facts accumulated over the years show the tech is ready and being toyed with by developers, then own up that your stupid cynical take was off, or if you don't want to, certainly don't fucking attack the people laying out on the table what's going on with the tech.
 
The tech looks promising, but considering game dev realities, we won't see it for years. This isn't just a driver-level optimization; dev studios actually have to do a bunch of work around training models for this.

So for any projects past the initial stages of the pipeline, I find it unlikely that the tech will be implemented, i.e. we won't see it in games for like 4-5 years. Hopefully I am wrong.

Next-gen consoles will use this, and I believe almost by default. Kepler said they would keep it to NTC on load (not on sample), so at the very, very least it will be mainstream by the time devs release games on consoles and have PC ports. As for when PC games will use it, at the least DX12 will have it API-side in late April. I wouldn't be surprised if we see some collaboration in the fall between Nvidia and some sponsored game release.

When people assess how much SSD space it takes, there will be a lot of pressure for devs to stop fucking around releasing >150GB games when some will be released at a fraction of that.
 
That's the best you can muster?

You know, it's really "hip" lately to just be a fucking negative black hole on the internet. You want to be a Debbie Downer? Fine, Debbie, but when all the facts accumulated over the years show the tech is ready and being toyed with by developers, then own up that your stupid cynical take was off, or if you don't want to, certainly don't fucking attack the people laying out on the table what's going on with the tech.
So you didn't read the rest of my post?

I'll say it again: it's perfectly fine to acknowledge Nvidia's good work in the GPU and gaming sphere but also say fuck nvidia as a whole for their participation in fucking up the world tech economy right now. So as someone who has always bought an Nvidia GPU and will continue to, I still reserve the right to say fuck nvidia whenever I think they're being a shitty company overall.

Humans are capable of nuanced thoughts and takes like that. Put me on ignore if you don't like that I can both consume nvidia GPUs and say Fuck nvidia in the same breath.
 
Just like ray tracing back then, DXR is basically Nvidia's homework
To be fair with RT, Nvidia's homework was heavily based on RT research (which existed before Nvidia even existed since RT predates rasterization).
 