
NVIDIA Volta Unveiled (GV100 new GPU Architecture)

ethomaz

Banned
I can't find a thread about that.

Yesterday nVidia announced the first Volta GPU, aimed at the very high end of the compute market. While gaming GPUs based on Volta will probably only come in 2018, it is good to have an idea of Pascal's successor.

I did a summary myself, but you can read the article linked after it.


  • The first Volta GPUs are focused on business, HPC, and deep learning.
  • The first chip is codenamed GV100 (successor of GP100).
  • GV100 has 84 SMs with 64 CUDA Cores each (5376 CUDA Cores total).
  • There is a new processing unit called Tensor Cores, 8 per SM, 672 Tensor Cores total.
  • FP16 2:1, FP64 1:2 (compared with FP32 performance).
  • 1455MHz Boost Clock, 30TFs FP16, 15TFs FP32, 7.5TFs FP64.
  • 336 TMUs.
  • 16GB HBM2, 900GB/s bandwidth.
  • 21.1B transistors, 815mm² die size in TSMC 12nm FFN (it is an improved 16nmFF+).
  • 128KB of L1 data cache/shared memory (the split between the two is now configurable).
  • 300W TDP.
  • Launch in Q3 2017.
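
For anyone wanting to check the TF figures in the summary, here is a rough sketch of the usual peak-throughput arithmetic: cores x 2 FLOPs per fused multiply-add x boost clock, scaled by the 2:1 and 1:2 ratios for FP16 and FP64. The function name `peak_tflops` is only illustrative. The full 84-SM die works out to about 15.6 TF FP32, while the quoted 15 TFs lines up with 80 enabled SMs. That 80-SM figure is an inference from the 640 Tensor Cores in NVIDIA's PR (640 / 8 per SM = 80), not something stated in the list.

```python
def peak_tflops(cuda_cores, boost_mhz, rate=1.0):
    """Theoretical peak TFLOPS: cores x 2 FLOPs per FMA x clock x precision ratio."""
    return cuda_cores * 2 * boost_mhz * 1e6 * rate / 1e12

BOOST_MHZ = 1455            # boost clock from the summary
full_die_cores = 84 * 64    # 5376 CUDA cores on the full GV100
enabled_cores = 80 * 64     # assumption: 80 SMs enabled (inferred from 640 Tensor Cores)

for label, cores in (("full die", full_die_cores), ("80 SMs", enabled_cores)):
    fp32 = peak_tflops(cores, BOOST_MHZ)
    fp16 = peak_tflops(cores, BOOST_MHZ, rate=2.0)  # FP16 runs at 2:1
    fp64 = peak_tflops(cores, BOOST_MHZ, rate=0.5)  # FP64 runs at 1:2
    print(f"{label}: FP16 {fp16:.1f} / FP32 {fp32:.1f} / FP64 {fp64:.1f} TF")
```

These are theoretical peaks at boost clock, so real workloads will land below them.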

More info here: http://www.anandtech.com/show/11367...v100-gpu-and-tesla-v100-accelerator-announced

nVidia showed a Kingsglaive: Final Fantasy XV demo running on a Volta GPU.

https://www.youtube.com/watch?v=TIgQQz5SNxs

nVidia PR.

http://nvidianews.nvidia.com/news/n...next-era-of-ai-and-high-performance-computing

The Tesla V100 GPU leapfrogs previous generations of NVIDIA GPUs with groundbreaking technologies that enable it to shatter the 100 teraflops barrier of deep learning performance. They include:

  • Tensor Cores designed to speed AI workloads. Equipped with 640 Tensor Cores, V100 delivers 120 teraflops of deep learning performance, equivalent to the performance of 100 CPUs.
  • New GPU architecture with over 21 billion transistors. It pairs CUDA cores and Tensor Cores within a unified architecture, providing the performance of an AI supercomputer in a single GPU.
  • NVLink™ provides the next generation of high-speed interconnect linking GPUs, and GPUs to CPUs, with up to 2x the throughput of the prior generation NVLink.
  • 900 GB/sec HBM2 DRAM, developed in collaboration with Samsung, achieves 50 percent more memory bandwidth than previous generation GPUs, essential to support the extraordinary computing throughput of Volta.
  • Volta-optimized software, including CUDA, cuDNN and TensorRT™ software, which leading frameworks and applications can easily tap into to accelerate AI and research.
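
The 120 teraflops deep learning figure can be roughly reproduced from the Tensor Core count. This is a back-of-the-envelope sketch, assuming each Tensor Core performs 64 fused multiply-adds per clock (a figure that is not in the PR text quoted here) and using the 1455MHz boost clock from the summary at the top of the post.

```python
TENSOR_CORES = 640            # from the PR text above
FMA_PER_CORE_PER_CLOCK = 64   # assumption: not stated in the quoted PR
FLOPS_PER_FMA = 2             # one multiply plus one add
BOOST_HZ = 1455e6             # boost clock from the summary

tensor_tflops = TENSOR_CORES * FMA_PER_CORE_PER_CLOCK * FLOPS_PER_FMA * BOOST_HZ / 1e12
print(f"Tensor Core peak: {tensor_tflops:.1f} TFLOPS")
```

Under those assumptions the result lands just under 120, which is consistent with the marketing number being a rounded peak.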
 

Gestault

Member
Processing FP64 data at only half of its FP32 rate sounds impressive to me. This may be a misunderstanding, but does that mean its efficiency is on the better side of the inverse-square law, or is it more of a linear scale-up?
 

ethomaz

Banned
Processing FP64 data at only half of its FP32 rate sounds impressive to me. This may be a misunderstanding, but does that mean its efficiency is on the better side of the inverse-square law, or is it more of a linear scale-up?
That is not new.

Pascal GP100 already does 1:2 FP64:FP32... it is just that the gaming GPUs were capped.
 

ethomaz

Banned
How strong was GP100 again? 20 (fp16) / 10 (fp32) / 5 (fp64) tflops?
FP16 21.2 TFs
FP32 10.6 TFs
FP64 5.3 TFs

While it has the same number of units as GP102... the clock is lower... so I expect GV100 to have a lower clock than GV104 and GV102 too.
 

ISee

Member
FP16 21.2 TFs
FP32 10.6 TFs
FP64 5.3 TFs

While it has the same number of units as GP102... the clock is lower... so I expect GV100 to have a lower clock than GV104 and GV102 too.

The consumer versions will most likely have a higher clock speed, but also fewer Tensor Cores and therefore fewer shaders. Still, ~15 tflops on a GTX 2070 with 8GB GDDR6 is a possibility, imo. But that's just pure speculation.
 

McHuj

Member
Can't wait to upgrade my 1070. It was sufficient for 1080p, but since I've upgraded to 1440p/144Hz, I've had to do the unthinkable and lower some settings.
 

ethomaz

Banned
The consumer versions will most likely have a higher clock speed, but also fewer Tensor Cores and therefore fewer shaders. Still, ~15 tflops on a GTX 2070 with 8GB GDDR6 is a possibility, imo. But that's just pure speculation.
Depends on the consumer version.

GV102 (TITANs) will have the same numbers.
GV104 (xx80) will have lower numbers.

Clocks will probably be around 1700-1800MHz while GV100 runs at 1455MHz.
 

10k

Banned
Oh shit glad I skipped Pascal.

Although I'll probably have to upgrade my motherboard to support DDR4 before buying this gpu (priorities)
 

McHuj

Member
The consumer versions will most likely have a higher clock speed, but also fewer Tensor Cores and therefore fewer shaders. Still, ~15 tflops on a GTX 2070 with 8GB GDDR6 is a possibility, imo. But that's just pure speculation.

That's probably too high for a 2070.

A 100% increase in flops over the 1070 (assuming 5.7 in the ref model) would increase the 2070's flops to ~11. I think that is way too aggressive. 60-75% is a more realistic ballpark, which would put it right under 10 TFLOPS. And given the new improved shader cores, it could probably perform within the ballpark of a 1080Ti.
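
The percentages above are easy to check with a few lines. This is only a sketch of the arithmetic in this post: the 5.7 TFLOPS baseline is the reference-model figure assumed here, and `scaled` is a made-up helper name.

```python
GTX_1070_TFLOPS = 5.7   # reference-model figure assumed in the post

def scaled(base_tflops, increase_pct):
    """TFLOPS after a given percentage increase over the base card."""
    return base_tflops * (1 + increase_pct / 100)

for pct in (60, 75, 100):
    print(f"+{pct}%: {scaled(GTX_1070_TFLOPS, pct):.1f} TFLOPS")
```

The 60-75% range comes out at roughly 9.1 to 10.0, and the 100% case at 11.4.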
 

McHuj

Member
2080 TI will arrive in 2019 then?

Assuming that the 2070 and 2080 cards will arrive a year from today.

2018 is more likely for the Ti with the 2070 and 2080 coming this year.

P100 was announced last April and the 1070/80 followed roughly 3 months later.
 

efyu_lemonardo

May I have a cookie?
Jesus these things are monsters. I don't keep up with current graphics card performance so I realize this isn't a quantum leap over the previous generation, but just looking at those numbers makes me gasp.
 

FingerBang

Member
I've been waiting for Vega for a year but at this point if it's not much more powerful than Pascal I'm going to get a cheap card in June (maybe a 1060) and then grab a 2070 when Volta comes out.
 

Jimrpg

Member
Well I was thinking of upgrading after Volta (I bought a 970 and 1070), so in terms of TFs for the 70 series (going by 70% increases each time)

Pascal - 6.5 TF
Volta - 11TF
After Volta - 18TF that's gotta be enough for 4k60 at that point.
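
As a quick sanity check on the 70%-per-generation guess, here is a small sketch that compounds the increase. The 6.5 TF starting point and the 70% step are the numbers from this post, and `project` is just an illustrative name.

```python
def project(start_tflops, growth_pct, generations):
    """Compound a fixed per-generation percentage increase."""
    values = [start_tflops]
    for _ in range(generations):
        values.append(values[-1] * (1 + growth_pct / 100))
    return values

pascal, volta, after_volta = project(6.5, 70, 2)
print(f"Pascal {pascal:.1f} TF, Volta {volta:.1f} TF, after Volta {after_volta:.1f} TF")
```

Two 70% steps from 6.5 give about 11.1 and then 18.8, so the 11TF and 18TF figures are slightly rounded down.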
 

Hux1ey

Banned
Well I was thinking of upgrading after Volta (I bought a 970 and 1070), so in terms of TFs for the 70 series (going by 70% increases each time)

Pascal - 6.5 TF
Volta - 11TF
After Volta - 18TF that's gotta be enough for 4k60 at that point.
1080 ti does 4k 60 fine tbh if you're smart with your settings.
 

ISee

Member
That's probably too high for a 2070.

A 100% increase in flops over the 1070 (assuming 5.7 in the ref model) would increase the 2070's flops to ~11. I think that is way too aggressive. 60-75% is a more realistic ballpark, which would put it right under 10 TFLOPS. And given the new improved shader cores, it could probably perform within the ballpark of a 1080Ti.

Yes, maybe I'm reaching too high for the 2070:

The 1080 has ~30% fewer shader units than a GP100. If they hold on to the ~30% reduction for the 2080 relative to the GV100, we would get a card with roughly 13.5 tflops on the 2080 FE (@1800MHz).

The next question is: how significant are the architectural improvements over Pascal?
The last huge improvement was going from Kepler to Maxwell. The 3.5 tflops 970 performed about as well as a 5 tflops 780Ti. That was an improvement of ~40%. But let's assume a much more careful +10% for going from Pascal to Volta.
With that, a 2080 FE could perform just as well as a 14.9 tflops Pascal card (or a 1080Ti running at 2080 MHz, which is rather unreachable).

Disclaimer: I'm aware that I'm pulling estimates out of nowhere and that I'm most probably way off. I just like speculating in pc hardware threads.
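
For what it's worth, the estimate above can be written out as a short sketch. The 5376-core GV100 figure is from the opening post; the 30% cut, the 1800MHz clock and the +10% architectural gain are the guesses made in this post; the 3584-core count for the 1080Ti is an assumption that is not stated anywhere in the thread.

```python
def peak_tflops(cores, clock_mhz):
    """Theoretical FP32 peak: cores x 2 FLOPs per FMA x clock."""
    return cores * 2 * clock_mhz * 1e6 / 1e12

GV100_CORES = 5376                         # full GV100, from the opening post
est_2080_cores = GV100_CORES * 0.70        # guess: ~30% fewer shader units
raw_2080 = peak_tflops(est_2080_cores, 1800)

kepler_to_maxwell = 5.0 / 3.5              # 780Ti tflops / 970 tflops at similar performance
pascal_equivalent = raw_2080 * 1.10        # guess: +10% per-flop gain for Volta

GTX_1080TI_CORES = 3584                    # assumption: not stated in the thread
overclocked_1080ti = peak_tflops(GTX_1080TI_CORES, 2080)

print(f"2080 FE estimate: {raw_2080:.1f} tflops")
print(f"Kepler to Maxwell gain: {(kepler_to_maxwell - 1) * 100:.0f}%")
print(f"Pascal-equivalent: {pascal_equivalent:.1f} tflops")
print(f"1080Ti @ 2080MHz: {overclocked_1080ti:.1f} tflops")
```

Under these assumptions the Pascal-equivalent figure and the overclocked 1080Ti both come out near 14.9, matching the numbers above.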

Depends on the consumer version.

GV102 (TITANs) will have the same numbers.
GV104 (xx80) will have lower numbers.

Clocks will probably be around 1700-1800MHz while GV100 runs at 1455MHz.

I normally draw the line after the xx80 Ti for consumer cards, but yes, the Pascal Titans are technically consumer cards. 1700-1800 MHz is a likely estimate for the 20X0 Founders Editions, I think.
 

Newboi

Member
What's up with 8-Hi HBM2 stacks? I thought they were arriving this year. It's weird to go over a year without 32GBs of HBM2 on an enterprise class card. I really want to see that 1TB/s bandwidth mark get hit!

Anyway, I am curious as to how much die space the Tensor Cores take up. I'm guessing consumer cards will have no need for those. The Tensor Cores do, however, pose an interesting conundrum for future Titan cards. If Nvidia believes their own explanation about how Titans aren't gaming cards, then it would make sense to put the Tensor Cores into them (especially since they are marketed as deep learning cards as well).


I am curious though about the overall capability of the Volta architecture. Anandtech says it's an entirely new architecture. I'm honestly surprised though that the card is only 15 Teraflops even though it's ridiculously big (815mm²)! 15 Teraflops is huge, but isn't the Titan Xp 11 Teraflops?
 

Newboi

Member
Is Volta a "workstation GPU first" and not good for gaming?

The GV100 is the workstation/HPC GPU. Volta is the name of the new GPU architecture it's using, which will trickle down to gaming cards later. All this being said, Nvidia will be selling PCI-E versions of GV100 (albeit slightly cut down) that I assume will have Windows drivers. If this is true, people could potentially game on those cards...if they wanted to spend over 10K to do so lol.
 

Gearless

Neo Member
I'm not even sure if there's proper driver support for workstation GPUs in the GeForce drivers...

That's weird, I would have thought that would be a basic thing. Does AMD have the same problems or do they not create workstation GPUs?
 

Newboi

Member
Nice, would it be smart of me to wait later for some Volta GPUs or is Pascal enough (I just want basic 1080p 60FPS with Ultra settings)?

You definitely wouldn't need anything close to Volta in order to achieve what you want. The Nvidia GTX1070 will be way more than enough for the needs you have. Depending on the games you want to play, the GTX1060 might be enough for you. If you want to game at 120fps plus at 1080p ultra settings in most games, I would go with the GTX1080. The 1080Ti is beyond overkill and you won't see any performance benefit over a GTX1080 really unless you game at extremely high resolutions (or downsample).
 

Gearless

Neo Member
You definitely wouldn't need anything close to Volta in order to achieve what you want. The Nvidia GTX1070 will be way more than enough for the needs you have. Depending on the games you want to play, the GTX1060 might be enough for you. If you want to game at 120fps plus at 1080p ultra settings in most games, I would go with the GTX1080. The 1080Ti is beyond overkill and you won't see any performance benefit over a GTX1080 really unless you game at extremely high resolutions (or downsample).

You make a compelling reason to get Pascal, I'm just worried that Volta could be something different.
 

Newboi

Member
You make a compelling reason to get Pascal, I'm just worried that Volta could be something different.

In the end, it's up to you. Pascal is going to fulfill the need you have right now (especially at resolutions below 4K). Volta is the shiny-looking object in the distance, but by the time consumer-grade graphics cards using the new architecture come out, there will already be talk of the next architecture that will be so much better than Volta.

This is generally the issue when you wait until near the end of a GPU generation to then decide to get a GPU. The next generation will look much more enticing even if you don't need it.

My general rule of thumb is that if there is a card available that fulfills the needs you have, go ahead and buy it. There will always be a next best thing, especially when it comes to GPUs.
 