
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

Henrar

Member
Last time I checked, most home PCs only have a single processor. In fact, if you look at the APU in the PS4 and XB1, they actually have two quad-core processors on the die, not a single 8-core CPU. So that must mean the PS4 is not the future of gaming! Seriously though, what the fuck are you on?

What he probably meant is that PCs are held back by their split RAM pool architecture and by having a separate CPU and GPU (which copy data back and forth), and that PCs should be closer to the PS4 architecture.

He is right on that; however, there are also huge drawbacks to that solution (thermals and die size when integrating a high-end CPU and GPU on the same die).
 

W!CK!D

Banned
Last time I checked, most home PCs only have a single processor. In fact, if you look at the APU in the PS4 and XB1, they actually have two quad-core processors on the die, not a single 8-core CPU. So that must mean the PS4 is not the future of gaming! Seriously though, what the fuck are you on?

First of all, there is no need for personal attacks.

A CPU is a processor that consists of a small number of big processing cores. A GPU is a processor that consists of a very large number of small processing cores. Therefore, most home PCs have multiple processors. "APU" is a marketing term for a single processor that consists of different kinds of processing cores. In the case of the PS4, the APU has two Jaguar modules with four x86 cores each and 18 GCN compute units with 64 shader cores each. Processors can be categorized as follows: single core (like an Intel Pentium), multi core (an Intel Core i7 or any GPU), hetero core (APUs like the one in the PS4) and cloud core (Microsoft Azure, for example).

If you take a look back, the evolution of computer technology has always been about maximum integration. The reason for that is that you want to minimize latency as much as possible. A couple of years ago, GPUs only had fixed-function hardware. That means that every core of the GPU was specialized for a certain task. That changed with the so-called unified shader model. Today, the shader cores of a modern GPU are freely programmable. Just think of them as extremely stupid CPU cores. The advantage of a freely programmable GPU, however, is that you have thousands of those cores. The PS4 has 1152 shader cores. That makes a GPU perfectly suited for tasks that benefit from mass parallelization, like graphics rendering. You can also utilize them for general purpose computations (GPGPU) which, in theory, opens up a whole new world of possibilities, since the brute force of a GPU is much higher than the computational power of a traditional CPU. In practice, however, the possibilities of GPGPU are limited by latency.

If you want to do GPGPU on a traditional gaming PC, you have to copy your data from your RAM pool over the PCIe bus to your VRAM pool. The process of copying costs latency. A roundtrip from CPU -> GPU -> CPU usually takes so long that the performance gain from utilizing the thousands of shader cores gets immediately eaten up by the additional latency: even if the GPU is much faster at solving the task than the CPU, the process of copying the data back and forth will make the GPGPU approach slower than letting the CPU do it on its own. That's the reason why GPGPU today is only used for things that don't need to be sent back to the CPU. The possibilities on a traditional PC are very limited.
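To put rough numbers on that roundtrip argument, here's a minimal back-of-the-envelope sketch in C++. Every constant (PCIe throughput, payload size, CPU/GPU timings) is a made-up illustration value, not a measurement:

```cpp
#include <cstdio>

// Rough model of offloading one task to a discrete GPU over PCIe.
// Every constant below is an assumption for illustration only.
int main() {
    const double pcie_bandwidth_gbs = 12.0;  // assumed effective PCIe 3.0 x16 throughput, GB/s
    const double payload_mb         = 64.0;  // assumed data copied to the GPU and back
    const double cpu_compute_ms     = 6.0;   // assumed time the CPU needs on its own
    const double gpu_compute_ms     = 1.0;   // assumed time the GPU needs for the math itself

    // One-way copy time in milliseconds: size / bandwidth.
    const double copy_ms = (payload_mb / 1024.0) / pcie_bandwidth_gbs * 1000.0;

    // Round trip: copy in, compute, copy out.
    const double gpu_total_ms = copy_ms + gpu_compute_ms + copy_ms;

    std::printf("one-way copy: %.2f ms\n", copy_ms);
    std::printf("GPU round trip: %.2f ms vs CPU alone: %.2f ms\n", gpu_total_ms, cpu_compute_ms);
    std::printf("%s\n", gpu_total_ms < cpu_compute_ms
                            ? "offload wins"
                            : "the copy overhead eats the GPU's advantage");
    return 0;
}
```

With these made-up numbers the two copies alone already cost more than the CPU doing the whole job, which is exactly the roundtrip problem described above.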

The next step in integration is the so-called hetero core processor. You integrate the CPU cores as well as the GPU shader cores on a single processor die and give them one unified RAM pool to work with. That will allow you to get rid of that nasty copy overhead. Till this day, the PS4 has the most powerful hetero core processor (2 TFLOPS @ 176GB/s) available. Not only that, since the APU in PS4 was built for async compute (see Cerny interviews), it can do GPGPU without negatively affecting graphics rendering performance. It's a pretty awesome system architecture, if you want my opinion.

The only problem is, that PC gamers don't have a unified system architecture. The developers of multiplatform engines have to consider that fact. 1st-party console devs can fully utilize the architecture, though.
 

dr_rus

Member
This situation is depressing.

You can see UE4 going out of its way for Gameworks, yet no sign of async compute on PC, which is part of DX12.
This situation isn't depressing at all as we don't know anything firm about this situation yet. But it's nice to see how some people here are already jumping to conclusions on a post made by a guy working on an AMD sponsored game running on beta drivers and all.

There should be the possibility to halt the workload and switch the context.
This would of course lead to worse performance.
Sure, there is such a possibility, but it would halt the workload on GCN as well. It's also somewhat of a bad idea if what you're trying to achieve is running several jobs in parallel.

No, only if the developer really screwed up with DX12.
Yes, as we're becoming GPU limited and it doesn't matter how much headroom DX12 provides. This may change with 16nm GPUs, though.

Of course not all features are equal and easy to implement.
Conservative Rasterization and Tiled Resources Tier 3 are definitely not straightforward to use without a clear target and use case in mind.
Hence why I've said that this blurb about how NV spent more time in emails during the last two months is straight-up misleading. You can't rebuild the engine in two months, and if it was built for GCN/Mantle in the first place then it will run like shit on other h/w.

The developer can choose to put every command in one queue, instead of dispatching additional compute queues along with some synchronization points.
I would guess that's what Oxide did for Nvidia GPUs.
How do you put graphics and compute jobs in one queue? The point of async compute is running compute jobs (which load the SPs almost exclusively) in parallel with a graphics job (which may spend most of its time in ROPs or memory fetches - thus the SPs are free to run the compute job in parallel). In DX12 this is completely transparent to the application, as all you need is to run 2+ jobs in parallel at some point - and the API+driver will launch them asynchronously or serialized, depending on the capabilities of the hardware. The only way to make them run in a serial fashion is to launch them one by one, checking whether the previous one has finished - but this is exactly what the driver must do; doing this in an application is a special case, and any special case is bad. So the only logical conclusion I see here at the moment is the quality of NV's DX12 driver.
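To make that concrete, here's a bare-bones sketch of the submission pattern just described: the application creates a DIRECT (graphics) queue and a COMPUTE queue, submits pre-recorded command lists to each, and expresses the one real dependency with a GPU-side fence wait. Whether the two queues actually overlap is up to the driver and hardware. Device, command list and fence creation are assumed to happen elsewhere; this is an illustrative sketch, not production code:

```cpp
#include <windows.h>
#include <d3d12.h>

// Submit a graphics workload and an independent compute workload on separate
// DX12 queues. The API only expresses ordering/dependencies; whether the two
// jobs run concurrently or serialized is decided by the driver and hardware.
void SubmitFrame(ID3D12Device* device,
                 ID3D12GraphicsCommandList* gfxList,      // recorded elsewhere (DIRECT work)
                 ID3D12GraphicsCommandList* computeList,  // recorded elsewhere (COMPUTE work)
                 ID3D12Fence* fence,
                 UINT64& fenceValue)
{
    static ID3D12CommandQueue* gfxQueue = nullptr;
    static ID3D12CommandQueue* computeQueue = nullptr;

    if (!gfxQueue) {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
    }

    // Kick off both workloads; the CPU does not wait in between.
    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);

    ID3D12CommandList* cmp[] = { computeList };
    computeQueue->ExecuteCommandLists(1, cmp);

    // Only where the graphics queue actually consumes the compute result do we
    // add a GPU-side wait on a fence. The CPU never blocks here.
    computeQueue->Signal(fence, ++fenceValue);
    gfxQueue->Wait(fence, fenceValue);
}
```

The serialized approach mentioned above (launch one job, poll for completion, launch the next) would amount to doing that wait on the CPU instead, which is exactly the special-case handling an application shouldn't need.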
 

KKRT00

Member
marketing blah.

Now show me a real-world example of the advantage of this.

Because I can show you a real-world example of non-APU, DX11 GPGPU working fine:
https://www.youtube.com/watch?v=sbrFIp73tbw

Also, there are no real performance benchmarks showing that having more ACEs increases or positively impacts async compute utilization.
There are a lot of games [in production or already released] using async, both on Xbone and PC; it's not exclusive to PS4, and especially not to 1st-party devs, in any way.

---
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...

Lol, no.
 

Tworak

Member
AMD AMD AMD! something something more foresight than NVIDIA.
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...
nah, there's another reason why you were blown away by that.
 

Tripolygon

Banned
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...
Nah, you would be more blown away if they had 3, 4 or 5 teraflops to work with. Those developers are awesome developers no matter what hardware you give them to work with.
 
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

Of course, a new architectural paradigm might change that, but not the current one.

Obviously, the cat in your avatar is yellow but, for some reason, I see it purplish. So I guess I'm high.
 

Alej

Banned
I don't know how we can have an educated discussion about this when some users constantly use fallacious authority arguments (without any explanation) while constantly bashing any other opinions by mocking the user behind it.

The "lol no" and other "marketing blah" are really cringeworthy. It's like no one really understand anything here or just simply look at it from an extremely tight POV (where everything exotic is bad or "marketing blah").

It's not like I want to defend why I bought a PS4 (I play on PC too and even make mods for PC games). But I am constantly amazed by what first-party devs can do on consoles, and precisely because some guys here have said that "optimization" doesn't really exist or that fixed hardware doesn't offer realistic advantages (see my post history on the subject and the answers I have received), I want to know what allows that old filthy hardware to blow me away.
 

Arguments from authority and derision are always painful, I agree.
I just think there is such a lack of data regarding the current subject, and a lack of transparency on all sides, so much so that making grand sweeping statements about AMD's and NV's architectures wholesale, or about the PC as a whole, is beyond ludicrous.
 

Alej

Banned
Arguments from authority and derision are always painful, I agree.
I just think there is such a lack of data regarding the current subject, and a lack of transparency on all sides, so much so that making grand sweeping statements about AMD's and NV's architectures wholesale, or about the PC as a whole, is beyond ludicrous.

But that's not what I'm saying. Every platform has some advantages, and PC is arguably the best place to play with an NV GPU. I agree with that, but why am I amazed by Sony's first party then? Marketing blah? Sheeeshh.
 
But that's not what I'm saying. Every platform has some advantages, and PC is arguably the best place to play with an NV GPU. I agree with that, but why am I amazed by Sony's first party then? Marketing blah? Sheeeshh.
Your amazement is your own, of course; I can't yell at you for that. Please take no offence at what I am about to type.
I think you would perhaps be less amazed if you put what you were seeing in perspective, if you looked at the tech presentations or viewed the visual content with an eye for what is happening. Then you can see the exact moments and reasons why said visuals are running on said hardware. Rarely is it then as amazing as it was without that perspective. An example would be the cool voxel GI in The Tomorrow Children. It is nice to see that happening on console (finally), but if you read up on it or compare it to VXGI, SVOTI, etc., it makes sense why it is limited the way it is and acts the way it does: all because of the hardware. It is awesome that it is happening, nonetheless.

An example of marketing vs. reality would be that The Order: 1886 video before it came out, saying the hardware offered them no limits and they could finally do everything they wanted. Then you read their tech presentations and you see a more nuanced opinion regarding their limitations from the hardware.

I tend to think the technology present in console games is rather ho-hum at times (is that a word? IDK), excluding some rare examples. The rest always seems to come from devs understanding their limits and hiding them so well with great art and solid performance. That is typically Sony's first-party studios, IMO.
 

laxu

Member
We're talking about an architectural limitation on Nvidia's side. No matter the API, Nvidia cards will suck at GPGPU. For example, if Nintendo released a new console with an Nvidia GPU, this console wouldn't be able to do async compute. The console makers' engineers know their stuff, however, and in contrast to most PC gamers, these people don't fall for Nvidia's PR lies. There is a reason why AMD gets all the console deals.

No. Please stop spreading FUD. The main reason AMD is in consoles is that they're the only manufacturer offering an integrated CPU+GPU solution with a GPU suited for gaming.

As for Nvidia, it is not yet known whether this is down to lacking Win10 drivers or an actual issue with async compute. If it is an architecture problem, then that would be a huge (albeit temporary) victory for AMD, as the next Nvidia cards would most likely fix the issue. If that is the case, then AMD has been rather forward-thinking but at the same time put the wrong cards on the market, as it's still going to take roughly a year before DX12 is the norm, and in the meantime Nvidia is performing better.


You really shouldn't be using multiple GPUs anyway. Do every video game programmer a favor and buy one single GPU. Optimizations for SLI and Crossfire take a disproportionate amount of time and resources that could be much better invested otherwise. Graphics cards have been ridiculously overpriced for years. No need to waste your money on multiple ridiculously overpriced cards.

Only this year have graphics cards improved to the point where single cards can handle 1440p and, to a degree, even 4K resolutions at high graphics settings. Before that you needed two cards for that. The 980 Ti and Fury X are still pretty expensive too, but hopefully in a few years there will be less need for dual-GPU setups for high-res gaming. Multiple GPUs work fine for the most part, and it's entirely up to the developer whether they want to support them or not.


It's definitely not worsening the situation. Multiplat engines are still held back by PC graphics cards, though. In a perfect world, every PC would be a unified system with a single processor and a single RAM pool. The concept behind the PS4 system architecture is the future of gaming PCs.

The PS4 needs to run significantly less stuff at the same time than PCs do, and GDDR5 is finally cheap enough to be fitted in suitable quantity in those. Multiplatform engines have never been held back by PC graphics cards. You're more likely to be limited by the horsepower of the Xbone/PS4 than by any separate memory pools and types PCs have.
 
Yes. The way asynchronous programming works is that you should be able to take any asynchronous call and execute it synchronously and the program will work exactly the same from a correctness standpoint.
Well, not necessarily. That would only be true if the native-async system was designed to twiddle its thumbs while it waited for returns. But that would be a pretty poor design. In a proper async system, the caller would move on to other tasks while waiting for returns (dispatching more async jobs, for example), so if it were suddenly forced to do nothing instead, it would mess up all of its timings.
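To make the point concrete, here's a toy C++ illustration with std::async standing in for handing a job off to another queue or device (the job and its timings are invented for the example). Both versions compute the same answer, but the caller that is forced to run the call synchronously loses the overlap, so its timing changes:

```cpp
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

using namespace std::chrono;

// Stand-in for a job we would normally hand off (e.g. to another queue/device).
int heavy_job() {
    std::this_thread::sleep_for(milliseconds(50));
    return 42;
}

// Stand-in for independent work the caller can do in the meantime.
void other_work() {
    std::this_thread::sleep_for(milliseconds(40));
}

int main() {
    // Asynchronous: dispatch, keep working, then collect the result.
    auto t0 = steady_clock::now();
    std::future<int> f = std::async(std::launch::async, heavy_job);
    other_work();                       // overlaps with heavy_job
    int a = f.get();
    auto asyncMs = duration_cast<milliseconds>(steady_clock::now() - t0).count();

    // Forced synchronous: same calls, same result, but no overlap.
    t0 = steady_clock::now();
    int b = heavy_job();
    other_work();
    auto syncMs = duration_cast<milliseconds>(steady_clock::now() - t0).count();

    // Same answers, different frame timings.
    std::printf("async: %lld ms, sync: %lld ms, results %d/%d\n",
                (long long)asyncMs, (long long)syncMs, a, b);
    return 0;
}
```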


It has yet to be proven that async compute is drastically important for something like the Maxwell 2 architecture, or even whether Maxwell 2 does not support it.
Why wouldn't it be important?


Now show me a real-world example of the advantage of this.
From the first page…
Don't mix up async timewarp [an engine rendering technique for sampling the latest user position as late as possible] with async compute [taking advantage of multiple hardware pipelines feeding the Compute Units and filling the "empty spots" in the rendering pipeline with more tasks]. AMD intentionally increased the number of ACE pipelines in GCN 1.2 so devs can extract full performance from their cards [essentially, all the DX9/11 cards we used for years have wasted a lot of their performance].

The first implementation of async compute in The Tomorrow Children on PS4 enabled bunching up of GPU tasks into a tighter schedule that was previously full of holes:
Dzm5roI.png


An 18% increase in performance just from going from a traditional rendering pipeline to taking advantage of the Radeon ACEs [and that's just a first implementation of the tech].

The version of GCN that the Xbone uses has 4x fewer ACEs, and I think that GeForce cards have a similarly sparse setup.


edit - video explanation of asynchronous compute:
https://www.youtube.com/watch?v=v3dUhep0rBs

Because I can show you a real-world example of non-APU, DX11 GPGPU working fine:
https://www.youtube.com/watch?v=sbrFIp73tbw
So a new technology offers no improvement unless it retroactively breaks everything that came before it? Da fuq?


As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.
Err, the AMD APUs use GDDR5 as system RAM.
 

bj00rn_

Banned
Sheeeshh.

I just wanted to say that, from the viewpoint of someone who's purely interested in reading the technical on-topic details of this thread, the juvenile whining about justice for a favorite platform is starting to become a little bit painful to watch.

Why wouldn't it be important?

Because DX11 doesn't support it, and because DX12 won't be relevant in the market for a couple of years yet? I don't know, just putting the question out there to those who do..
 

Alej

Banned
I just wanted to say that, from the viewpoint of someone who's purely interested in reading the technical on-topic details of this thread, the juvenile whining about justice for a favorite platform is starting to become a little bit painful to watch.

That's not how it works. I am perfectly within my place here, because I am discussing the subject and not whining about some guys I don't want to read here.
I want to know what this implies for the PS4, because that console is notoriously built with asynchronous compute in mind. That's why I'm here.

We obviously know that this doesn't mean anything on PC, because if 80% of the hardware out there isn't asynchronous-compute capable, no game will really use it for half a decade. We can even say that PC might not even need asynchronous compute for a long time to provide big improvements in the years to come.

But those PS4s on the market say "hi", because they are built around taking advantage of asynchronous compute. What advantages? That's the question, and the OP says there is double-digit percent more performance to be had via asynchronous compute. It may not be groundbreaking at all, but on a closed platform, where every advantage and bottleneck counts twice as much, I feel it's not "marketing blah" or boring tech-wise to discuss that.

I don't mind any platform being better than the other, and I've never been mad about guys saying the PS4 is "shit" or not for them. What annoys me is guys like you trying to shut down the discussion about this subject just because consoles are some pieces of old shit everyone already experienced a decade ago on PC.

That's not how it works.
 
I've been trying to follow this thread, but I'm having a bit of a meltdown.

I'm going to assume that, like with every new DirectX version, the video cards out at initial release aren't going to fully support it or really take advantage of it. So... when will we be able to buy video cards that fully support and are optimized for it?
 

bee

Member
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

Of course, a new architectural paradigm might change that, but not the current one.

Obviously, the cat in your avatar is yellow but, for some reason, I see it purplish. So I guess I'm high.

At low resolutions maybe that graph is true, but not at 4K and beyond.

http://forums.overclockers.co.uk/showthread.php?t=18609613

real users > benchmark sites

edit: bleh, I can't find the better comparison. He's testing PCIe 2.0 x16 SLI vs PCIe 3.0 x16 SLI there (high-end boards only); the comparison for dual cards in normal users' configurations, i.e. PCIe 2.0 x8 SLI vs PCIe 3.0 x8 SLI, is more interesting and is certainly a bottleneck at high resolutions.
 

KKRT00

Member
From the first page…

That's an example of async compute, which is not new. It was used before, even in multiplatform games like Thief. It also does not in any way show the difference in async utilization between 2x ACEs and 8x ACEs.

So a new technology offers no improvement unless it retroactively breaks everything that came before it? Da fuq?
I don't think you understand why I posted the Flex video. It's the prime example of a compute technology that should not be viable without a unified memory setup, because it requires CPU sync with the GPU. Here we see the technology on DX11 hardware with a split memory setup and API, and it's running just fine.


====
We obviously know that this doesn't mean anything on PC, because if 80% of the hardware out there isn't asynchronous-compute capable, no game will really use it for half a decade.
But tons of games are using it or are being developed with it: Deus Ex, Tomb Raider and all the Frostbite games like Battlefront, NFS or Mirror's Edge, just to name a few.
Async will be possible on Nvidia's cards, you can bet on that.

====
I don't know how we can have an educated discussion about this when some users constantly use fallacious authority arguments (without any explanation) while constantly bashing any other opinions by mocking the user behind it.

The "lol no" and other "marketing blah" are really cringeworthy. It's like no one really understand anything here or just simply look at it from an extremely tight POV (where everything exotic is bad or "marketing blah").
What educated discussion? What W!CK!D said is the same PR we had at the start of this gen. He said nothing new and havent provided even one practical example.
And You know why he didnt? Because they do not exist. Actually some examples contradict that whole 'theory', like a video i've posted.
 

aeolist

Banned
i don't see the gloom and doom here. the benchmarks show that current nvidia cards, which at the mid and high end are far more powerful than console GPUs, still perform perfectly well under dx12. they don't get the huge boost that AMD is seeing because AMD has been underperforming on dx11.

this isn't going to lead to a situation where the consoles "catch up" or anything like it. they started behind and are staying there.
 

Alej

Banned
But tons of games are using it or are being developed with it: Deus Ex, Tomb Raider and all the Frostbite games like Battlefront, NFS or Mirror's Edge, just to name a few.
Async will be possible on Nvidia's cards, you can bet on that.

And that is great then; it should benefit PC ports on consoles. That's nice for everyone.
 

VariantX

Member
i don't see the gloom and doom here. the benchmarks show that current nvidia cards, which at the mid and high end are far more powerful than console GPUs, still perform perfectly well under dx12. they don't get the huge boost that AMD is seeing because AMD has been underperforming on dx11.

this isn't going to lead to a situation where the consoles "catch up" or anything like it. they started behind and are staying there.

That's pretty much my uneducated guess too. Why would they "catch up" when the new consoles have been designed to take advantage of this from the beginning? As far as Nvidia and AMD performance in DX12 goes, I dunno. Not enough info for me, since it's just one game and it's not even out yet, and only in a single genre. I would think anyone would reasonably wait and see until more games in multiple genres, made fully with DX12 in mind, are out in the wild.
 
Please stop spreading FUD.
Err…
You're more likely to be limited by the horsepower of the Xbone/PS4 than by any separate memory pools and types PCs have.
Right, because PCs are even more powerful than physics. ><

Perhaps you should explain how you think hUMA works, and we can explain where you've got it wrong.


Because DX11 doesn't support it, and because DX12 won't be relevant in the market for a couple of years yet? I don't know, just putting the question out there to those who do..
So, async isn't really important because we're gonna keep ignoring it a while longer? lol

If Maxwell is already being well utilized. Obviously any bit helps though, as mentioned earlier in the thread.
Well, it's not being well utilized. That's the problem async compute aims to solve, actually. You know a GPU isn't homogeneous inside, right? Within the GPU there are various types of processors, specialized in different types of math. Throughout the rendering process, there will be different groups of processors sitting idle at various times, because the math they know how to do isn't needed on this particular cycle. They're waiting for their neighbors to finish preparing the data they need.

Async compute can improve utilization by assigning additional jobs to those idle processors. This is illustrated in the diagrams DieH@rd posted earlier, where anything black represents processors going unused.


Plus, the GPU is simply better at some stuff than the CPU is, so simply having the GPU do it instead is a win, especially if you can do it without interrupting the rendering. Maxwell 2 finally adds a 32-queue compute scheduler, but it's sounding like it's still not actually asynchronous, and the GPU actually rapidly switches contexts between render and compute. So you can quickly switch between job types, but you can't truly run them simultaneously using idle transistors while the rendering takes place around you. Compare this with the eight, 8-queue schedulers on GCN, which are fully independent of the render scheduling, and basically have free access to any transistors not actively being used for rendering.

Oh, and another advantage of a truly asynchronous system is that once the CPU dispatches a job to the GPU, it doesn't need to sit there doing nothing while it waits for the result. As a crude example, the CPU can ask the GPU what actor1 can see. Instead of simply waiting for the result, the CPU can then ask what actor2 is able to see, and actor3, and so on. As the results start coming back from the GPU, then the CPU can decide what action actor1 will take based on what he can see. GPUs are good at ray-casting, and CPUs are good at branchy decision making, so the flexibility provided by async again allows us to utilize resources more efficiently. Running strictly on a CPU, a typical AI routine would spend 90% of its cycles determining perception and pathfinding for the actor. So if you normally spend 10ms on AI, 9ms is spent on perception and pathfinding. The GPU may be able to do that part of the job in 2ms, so you get your results a lot faster this way, but perhaps more importantly, you've freed up 9ms on the CPU, giving it time to work on other tasks.
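Here's a tiny sketch of that dispatch pattern in plain C++, with std::async standing in for handing a visibility/ray-cast query to the GPU (the query itself and all timings are placeholders): fire off the query for every actor first, then consume the results and do the branchy decision making as they come back.

```cpp
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>
#include <vector>

// Placeholder for the GPU-friendly part of the job: "what can this actor see?"
// In the real case this would be dispatched to the GPU instead of a thread.
bool compute_visibility(int actorId) {
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    return (actorId % 2) == 0;   // dummy result
}

// Placeholder for the CPU-friendly, branchy decision making.
void decide_action(int actorId, bool canSeePlayer) {
    std::printf("actor %d %s the player -> %s\n",
                actorId, canSeePlayer ? "sees" : "can't see",
                canSeePlayer ? "attack" : "patrol");
}

int main() {
    const int actorCount = 8;
    std::vector<std::future<bool>> queries;

    // Dispatch all perception queries up front instead of blocking on each one.
    for (int id = 0; id < actorCount; ++id)
        queries.push_back(std::async(std::launch::async, compute_visibility, id));

    // The CPU is free to do other work here while the queries are in flight...

    // ...then consume the results as they come back and make the decisions.
    for (int id = 0; id < actorCount; ++id)
        decide_action(id, queries[id].get());

    return 0;
}
```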

So no matter how powerful your system is, async compute just makes it that much more powerful. hUMA makes it more powerful still, because the shared memory increases your opportunities to leverage GPGPU overall.
 

Arkanius

Member
Some users at Beyond3D have tested the Async performance of both the 980 Ti and the 7870XT

980 Ti

pJqBBDS.png


7870XT

nu2NDtM.png


The 980 Ti can't run both tasks in parallel; they are run in sequential order.
Apparently, this is a huge blow for VR performance (I'm not much into VR yet). It seems Nvidia has lag issues that create nausea due to the latency of having to run both tasks in sequence.

Something else that could be understood from this graph:

Nvidia has a lower latency despite doing it in a serial way if they batch the Compute + Graphics jobs in batches of 32?
 

prag16

Banned
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information. The people claiming the PS4's GPU will now somehow completely "catch up" to a 970 are batshit insane, even in the best case (for AMD, worst for nvidia).

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.
 

Arkanius

Member
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information.

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.

It's not jumping to conclusions if you are analyzing data.
We don't need an Nvidia PR statement to start drawing some lines though.
 

Kezen

Banned
Deus Ex: Mankind Divided and Rise of the Tomb Raider use async compute extensively; it's going to be interesting to see which class of hardware can offer a console-like experience then.

I don't believe much will change on the AMD side (Intel quad core + 2012 era GPU 7850/7870 to match the PS4) but on Nvidia it could lead to a rise.
 

Darg

Neo Member
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information. The people claiming the PS4's GPU will now somehow completely "catch up" to a 970 are batshit insane, even in the best case (for AMD, worst for nvidia).

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.

Yup, sorry to say this, but in no world will it become a 970. I mean, you have to realize just how much more powerful a 970 is compared to console GPUs; no magic is going to help.

Anyhow, I'll wait for more games; it's still quite early for anything, really.
 

Henrar

Member
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

I believe that graph doesn't tell the full story. I mean, when you're not VRAM limited on the GPU, everything should be fine, as everything sits in the GPU's high-bandwidth local memory. However, when you run out of memory, you should see the slower PCIe become a bottleneck.
Although in any such case you'll see stuttering, as PCIe bandwidth is way lower than GDDR5/HBM bandwidth.
 

Vinland

Banned
Having a feature and having a feature done correctly are two different things. This could be a huge error in marketing, or it may be something they can resolve in their driver. The problem with this thread is that there are people defending Nvidia with arguments that read like this to me:

"Dx11 is all that matters anyways so who cares."

"Nvidia is still faster serially so who cares."

"No, nvidia just have shit drivers for dx12 and no it's not ironic."

"Nvidia have proper asynchronous compute but I won't provide any proof other than marketing bullet points."

I mean, that is cute and all, but honestly it wears a bit thin. Nvidia could have bungled this, but people want to believe they didn't, like a cult. It is a product, and in the end, if they can fix it in software, good for them. If they can't, then hopefully their next product can do it. And if AMD actually does have better async, then so be it, because at the end of the day they aren't our friends; they are companies who drive each other to market innovations, something new and shiny to sell us.
 
That's an example of async compute, which is not new.
Yes, and async compute was what W!CK!D was discussing when you asked for a real-world example of the benefit, so I relinked the example that had already been provided on the first page.

It also does not in any way show the difference in async utilization between 2x ACEs and 8x ACEs.
What? You want an example that shows being able to dispatch more jobs is better than being able to dispatch fewer? You're asking for proof that 64 is greater than 16? Is this a "64KB should be enough for anyone" argument? =/

I don't think you understand why I posted the Flex video.
I really don't, no. It was pretty, but it didn't explain why hUMA isn't beneficial. It didn't explain anything at all.

It's the prime example of a compute technology that should not be viable without a unified memory setup, because it requires CPU sync with the GPU. Here we see the technology on DX11 hardware with a split memory setup and API, and it's running just fine.
You're saying it proves hUMA is unnecessary because they have time to copy results back to the CPU from the GPU? That doesn't prove anything of the sort. Being able to get some copy operations in under the wire doesn't prove there's no advantage to eliminating them entirely.

Your arguments don't even make sense. =/
 

KKRT00

Member
What? You want an example that shows being able to dispatch more jobs is better than being able to dispatch fewer? You're asking for proof that 64 is greater than 16? Is this a "64KB should be enough for anyone" argument? =/
Yes, I want a real-world example of it having a meaningful performance advantage in rendering in games.


I really don't, no. It was pretty, but it didn't explain why hUMA isn't beneficial. It didn't explain anything at all.
It shows that you don't need hUMA to have proper CPU and GPU sync in gaming-related scenarios. Multiple physics engines being synced together is the ultimate test for that.
The other thing hUMA is beneficial for is searching for patterns in large sets of data, à la databases, but this is not used in gaming.

You're saying it proves hUMA is unnecessary because they have time to copy results back to the CPU from the GPU? That doesn't prove anything of the sort. Being able to get some copy operations in under the wire doesn't prove there's no advantage to eliminating them entirely.
Yes, I'm saying that it's not necessary for gaming. At least it hasn't been proven to be necessary yet, by any dev.

Your arguments don't even make sense. =/
Sure, they don't; they are not supercharged enough.
 

aeolist

Banned
Having a feature and having a feature done correctly are two different things. This could be a huge error in marketing, or it may be something they can resolve in their driver. The problem with this thread is that there are people defending Nvidia with arguments that read like this to me:

"Dx11 is all that matters anyways so who cares."

"Nvidia is still faster serially so who cares."

"No, nvidia just have shit drivers for dx12 and no it's not ironic."

"Nvidia have proper asynchronous compute but I won't provide any proof other than marketing bullet points."

I mean, that is cute and all, but honestly it wears a bit thin. Nvidia could have bungled this, but people want to believe they didn't, like a cult. It is a product, and in the end, if they can fix it in software, good for them. If they can't, then hopefully their next product can do it. And if AMD actually does have better async, then so be it, because at the end of the day they aren't our friends; they are companies who drive each other to market innovations, something new and shiny to sell us.

it's not so much that nvidia bungled anything, they just designed their products with an eye on the short-term market and want to upsell people to more feature-rich GPUs down the line when those features become relevant. AMD has been trying to guess long-term trends for the last decade and it's done nothing but bite them in the ass and make their products worse to use right now. it may sound nice in a general way to want to design your products with an eye on future needs but ultimately it hasn't worked.

in the end current nvidia cards are still quite performant and will scale decently, and future ones will be better. computer parts are more easily replaceable than ever and people shouldn't be trying to make a PC that's future proofed because that's just not the economically smart thing to do.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Well, it's not being well utilized. That's the problem async compute aims to solve, actually. You know a GPU isn't homogeneous inside, right? Within the GPU there are various types of processors, specialized in different types of math. Throughout the rendering process, there will be different groups of processors sitting idle at various times, because the math they know how to do isn't needed on this particular cycle. They're waiting for their neighbors to finish preparing the data they need.

Async compute can improve utilization by assigning additional jobs to those idle processors. This is illustrated in the diagrams DieH@rd posted earlier, where anything black represents processors going unused.



Plus, the GPU is simply better at some stuff than the CPU is, so simply having the GPU do it instead is a win, especially if you can do it without interrupting the rendering. Maxwell 2 finally adds a 32-queue compute scheduler, but it's sounding like it's still not actually asynchronous, and the GPU actually rapidly switches contexts between render and compute. So you can quickly switch between job types, but you can't truly run them simultaneously using idle transistors while the rendering takes place around you. Compare this with the eight, 8-queue schedulers on GCN, which are fully independent of the render scheduling, and basically have free access to any transistors not actively being used for rendering.

Oh, and another advantage of a truly asynchronous system is that once the CPU dispatches a job to the GPU, it doesn't need to sit there doing nothing while it waits for the result. As a crude example, the CPU can ask the GPU what actor1 can see. Instead of simply waiting for the result, the CPU can then ask what actor2 is able to see, and actor3, and so on. As the results start coming back from the GPU, then the CPU can decide what action actor1 will take based on what he can see. GPUs are good at ray-casting, and CPUs are good at branchy decision making, so the flexibility provided by async again allows us to utilize resources more efficiently. Running strictly on a CPU, a typical AI routine would spend 90% of its cycles determining perception and pathfinding for the actor. So if you normally spend 10ms on AI, 9ms is spent on perception and pathfinding. The GPU may be able to do that part of the job in 2ms, so you get your results a lot faster this way, but perhaps more importantly, you've freed up 9ms on the CPU, giving it time to work on other tasks.

So no matter how powerful your system is, async compute just makes it that much more powerful. hUMA makes it more powerful still, because the shared memory increases your opportunities to leverage GPGPU overall.

That was a good read. Thanks.
 

Arulan

Member
The worst thing about consoles unanimously using AMD hardware now is that it's not just a petty "Green vs. Red" discussion, but console warriors pushing their thinly veiled agenda behind the AMD flag.

It's definitely not worsening the situation. Multiplat engines are still held back by PC graphics cards, though. In a perfect world, every PC would be a unified system with a single processor and a single RAM pool. The concept behind the PS4 system architecture is the future of gaming PCs.

Didn't your old account (W!CKED) get banned?

As for the topic at hand, it's nice to see AMD put some pressure on Nvidia. It's interesting to see AMD's architecture work well with DX12, but that doesn't make the years of relatively terrible DX11 performance pay off. We'll have to wait and see what Pascal offers, and DX12 benchmarks under real-world scenarios.
 

Irobot82

Member
Something else that could be understood from this graph:

Nvidia has a lower latency despite doing it in a serial way if they batch the Compute + Graphics jobs in batches of 32?

Are you seriously trying to compare a 980ti to a 7870XT? No shit it has lower latency.
 

tuxfool

Banned
If Maxwell is already being well utilized. Obviously any bit helps though, as mentioned earlier in the thread.

I don't think GPU utilization is as straightforward as it appears in most user tools. For example, do we know if ROP-heavy code reports the same utilization level as heavy shader/ALU code? These two cases use different parts of the GPU, and doing two tasks on the graphics pipe would involve a context switch to handle both.

The comparison to hyperthreading is somewhat apt, because that is one of the cases where async compute stands to provide the most benefit.
 

Renekton

Member
This situation isn't depressing at all as we don't know anything firm about this situation yet. But it's nice to see how some people here are already jumping to conclusions on a post made by a guy working on an AMD sponsored game running on beta drivers and all.
Regardless of what the Oxide guy says, I don't see PC GPU asynchronous compute support in a UE4 doc search (someone correct me, plox). The rash conclusion I'm jumping to is that Epic may have put this on the back burner in view of Nvidia's overwhelming market share.

Oh well, if by some off-chance Maxwell doesn't implement it, Pascal is more than guaranteed to have it, so it's just another year away.
 

frontieruk

Member
Why does the latency in Nvidia's cards get progressively worse? Or am I still interpreting that incorrectly?

The test increases the number of single-lane compute kernels from 1 to 128; as the batches get more complex, the latency increases, which is leading to the theory that Nvidia isn't doing async but serial processing.
 

Irobot82

Member
The test increases the number of single-lane compute kernels from 1 to 128; as the batches get more complex, the latency increases, which is leading to the theory that Nvidia isn't doing async but serial processing.

This is starting to get above me. So does this mean that if Nvidia's card has to do more and more compute, it gets worse in latency, while AMD's so far stays the same, but at a much higher latency?
 

dogen

Member
This is starting to get above me. So does this mean that if Nvidia's card has to do more and more compute, it gets worse in latency, while AMD's so far stays the same, but at a much higher latency?

I think it would be interesting to see the test done with a higher number of concurrent tasks. Seems like there's a startup overhead for AMD or something.
 

Daffy Duck

Member
Buys 980Ti Friday, this news breaks today....

gmGW5tf.gif


So it's time to bin it already. So the card will not be able to run anything on DX12 above 30FPS?
 

frontieruk

Member
I think it would be interesting to see the test done with a higher number of concurrent tasks. Seems like there's a startup overhead for AMD or something.

That's been put forward to the test's author, just to see where AMD starts hitting a pattern like NV's.

Buys 980Ti Friday, this news breaks today....

gmGW5tf.gif


So it's time to bin it already. So the card will not be able to run anything on DX12 above 30FPS?

This is a particularly limited case due to the style of game; Fable Legends and maybe Gears Ultimate will show a more everyday case. Shouldn't Ark have DX12 now? That'll be an interesting case; the devs said it offered a 20% perf increase.
 