
Oxide: Nvidia GPUs do not support DX12 Asynchronous Compute/Shaders.

Henrar

Member
Last time I checked, most home PCs only have a single processor. In fact, if you look at the APU in the PS4 and XB1, they actually have two quad-core processors on the die, not a single 8-core CPU. So that must mean the PS4 is not the future of gaming! Seriously though, what the fuck are you on?

What he probably meant is that PCs are held back by their split RAM pool architecture and by having a separate CPU and GPU (which copy data back and forth), and that PCs should be closer to the PS4 architecture.

He is right on that; however, there are also huge drawbacks to that solution (thermals and die size when integrating a high-end CPU and GPU on the same die).
 

W!CK!D

Banned
Last time I checked, most home PCs only have a single processor. In fact, if you look at the APU in the PS4 and XB1, they actually have two quad-core processors on the die, not a single 8-core CPU. So that must mean the PS4 is not the future of gaming! Seriously though, what the fuck are you on?

First of all, there is no need for personal attacks.

A CPU is a processor that consists of a small number of big processing cores. A GPU is a processor that consists of a very large number of small processing cores. Therefore, most home PCs have multiple processors. "APU" is a marketing term for a single processor that consists of different kinds of processing cores. In the case of the PS4, the APU has two Jaguar modules with four x86 cores each and 18 GCN compute units with 64 shader cores each. Processors can be categorized as follows: single core (like an Intel Pentium), multi core (an Intel Core i7 or any GPU), hetero core (APUs like the one in the PS4) and cloud core (Microsoft Azure, for example).

If you take a look back, the evolution of computer technology has always been about maximum integration. The reason for that is that you want to minimize latency as much as possible. A couple of years ago, GPUs only had fixed-function hardware. That means that every core of the GPU was specialized for a certain task. That changed with the so-called unified shader model. Today, the shader cores of a modern GPU are freely programmable. Just think of them as extremely stupid CPU cores. The advantage of a freely programmable GPU, however, is that you have thousands of those cores. The PS4 has 1152 shader cores. That makes a GPU perfectly suited for tasks that benefit from mass parallelization, like graphics rendering. You can also utilize them for general purpose computations (GPGPU) which, in theory, opens up a whole new world of possibilities, since the brute force of a GPU is much higher than the computational power of a traditional CPU. In practice, however, the possibilities of GPGPU are limited by latency.

If you want to do GPGPU on a traditional gaming PC, you have to copy your data from your RAM pool over the PCIe bus to your VRAM pool. The process of copying costs latency. A roundtrip from CPU -> GPU -> CPU usually takes so long that the performance gain from utilizing the thousands of shader cores gets immediately eaten up by the additional latency: even if the GPU is much faster at solving the task than the CPU, the process of copying the data back and forth will make the GPGPU approach slower than letting the CPU do it on its own. That's the reason why GPGPU today is only used for things that don't need to be sent back to the CPU. The possibilities on a traditional PC are very limited.
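To put rough numbers on that roundtrip argument, here's a minimal back-of-the-envelope sketch in C++. Every constant (PCIe throughput, payload size, CPU/GPU timings) is a made-up illustration value, not a measurement:

```cpp
#include <cstdio>

// Rough model of offloading one task to a discrete GPU over PCIe.
// Every constant below is an assumption for illustration only.
int main() {
    const double pcie_bandwidth_gbs = 12.0;  // assumed effective PCIe 3.0 x16 throughput, GB/s
    const double payload_mb         = 64.0;  // assumed data copied to the GPU and back
    const double cpu_compute_ms     = 6.0;   // assumed time the CPU needs on its own
    const double gpu_compute_ms     = 1.0;   // assumed time the GPU needs for the math itself

    // One-way copy time in milliseconds: size / bandwidth.
    const double copy_ms = (payload_mb / 1024.0) / pcie_bandwidth_gbs * 1000.0;

    // Round trip: copy in, compute, copy out.
    const double gpu_total_ms = copy_ms + gpu_compute_ms + copy_ms;

    std::printf("one-way copy: %.2f ms\n", copy_ms);
    std::printf("GPU round trip: %.2f ms vs CPU alone: %.2f ms\n", gpu_total_ms, cpu_compute_ms);
    std::printf("%s\n", gpu_total_ms < cpu_compute_ms
                            ? "offload wins"
                            : "the copy overhead eats the GPU's advantage");
    return 0;
}
```

With these made-up numbers the two copies alone already cost more than the CPU doing the whole job, which is exactly the roundtrip problem described above.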

The next step in integration is the so-called hetero core processor. You integrate the CPU cores as well as the GPU shader cores on a single processor die and give them one unified RAM pool to work with. That will allow you to get rid of that nasty copy overhead. Till this day, the PS4 has the most powerful hetero core processor (2 TFLOPS @ 176GB/s) available. Not only that, since the APU in PS4 was built for async compute (see Cerny interviews), it can do GPGPU without negatively affecting graphics rendering performance. It's a pretty awesome system architecture, if you want my opinion.

The only problem is, that PC gamers don't have a unified system architecture. The developers of multiplatform engines have to consider that fact. 1st-party console devs can fully utilize the architecture, though.
 

dr_rus

Member
This situation is depressing.

You can see UE4 going out of its way for Gameworks, yet no sign of async compute on PC, which is part of DX12.
This situation isn't depressing at all as we don't know anything firm about this situation yet. But it's nice to see how some people here are already jumping to conclusions on a post made by a guy working on an AMD sponsored game running on beta drivers and all.

There should be the possibility to halt the workload and switch the context.
This would of course lead to worse performance.
Sure, there is such a possibility, but it would halt the workload on GCN as well. It's also somewhat of a bad idea if what you're trying to achieve is running several jobs in parallel.

No, only if the developer really screwed up with DX12.
Yes, as we're becoming GPU limited and it doesn't matter how much headroom DX12 provides. This may change with 16nm GPUs, though.

Of course not all features are equal and easy to implement.
Conservative Rasterization and Tiled Resources Tier 3 are definitely not straightforward to use without a clear target and use case in mind.
Hence why I've said that this blurb about how NV spent more time in emails during the last two months is straight-up misleading. You can't rebuild the engine in two months, and if it was built for GCN/Mantle in the first place then it will run like shit on other h/w.

The developer can choose to put every command in one queue, instead of dispatching additional compute queues along with some synchronization points.
I would guess that's what Oxide did for Nvidia GPUs.
How do you put graphics and compute jobs in one queue? The point of async compute is running compute jobs (which load the SPs almost exclusively) in parallel with a graphics job (which may spend most of its time in ROPs or memory fetches - thus the SPs are free to run the compute job in parallel). In DX12 this is completely transparent to the application, as all you need is to run 2+ jobs in parallel at some point - and the API+driver will launch them asynchronously or serialized, depending on the capabilities of the hardware. The only way to make them run in a serial fashion is to launch them one by one, checking whether the previous one has finished - but this is exactly what the driver must do; doing this in an application is a special case, and any special case is bad. So the only logical conclusion I see here at the moment is the quality of NV's DX12 driver.
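To make that concrete, here's a bare-bones sketch of the submission pattern just described: the application creates a DIRECT (graphics) queue and a COMPUTE queue, submits pre-recorded command lists to each, and expresses the one real dependency with a GPU-side fence wait. Whether the two queues actually overlap is up to the driver and hardware. Device, command list and fence creation are assumed to happen elsewhere; this is an illustrative sketch, not production code:

```cpp
#include <windows.h>
#include <d3d12.h>

// Submit a graphics workload and an independent compute workload on separate
// DX12 queues. The API only expresses ordering/dependencies; whether the two
// jobs run concurrently or serialized is decided by the driver and hardware.
void SubmitFrame(ID3D12Device* device,
                 ID3D12GraphicsCommandList* gfxList,      // recorded elsewhere (DIRECT work)
                 ID3D12GraphicsCommandList* computeList,  // recorded elsewhere (COMPUTE work)
                 ID3D12Fence* fence,
                 UINT64& fenceValue)
{
    static ID3D12CommandQueue* gfxQueue = nullptr;
    static ID3D12CommandQueue* computeQueue = nullptr;

    if (!gfxQueue) {
        D3D12_COMMAND_QUEUE_DESC desc = {};
        desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;   // graphics + compute + copy
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&gfxQueue));

        desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;  // compute + copy only
        device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
    }

    // Kick off both workloads; the CPU does not wait in between.
    ID3D12CommandList* gfx[] = { gfxList };
    gfxQueue->ExecuteCommandLists(1, gfx);

    ID3D12CommandList* cmp[] = { computeList };
    computeQueue->ExecuteCommandLists(1, cmp);

    // Only where the graphics queue actually consumes the compute result do we
    // add a GPU-side wait on a fence. The CPU never blocks here.
    computeQueue->Signal(fence, ++fenceValue);
    gfxQueue->Wait(fence, fenceValue);
}
```

The serialized approach mentioned above (launch one job, poll for completion, launch the next) would amount to doing that wait on the CPU instead, which is exactly the special-case handling an application shouldn't need.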
 

KKRT00

Member
marketing blah.

Now show me a real-world example of the advantage of this.

Because I can show you a real-world example of non-APU, DX11 GPGPU working fine:
https://www.youtube.com/watch?v=sbrFIp73tbw

Also, there are no real performance benchmarks showing that having more ACEs increases or positively impacts async compute utilization.
There are a lot of games [in production or already released] using async, both on Xbone and PC; it's not exclusive to PS4, and especially not to 1st-party devs, in any way.

---
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...

Lol, no.
 

Tworak

Member
AMD AMD AMD! something something more foresight than NVIDIA.
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...
nah, there's another reason why you were blown away by that.
 

Tripolygon

Banned
Thanks for this, as usual.
That explains why I was blown away by The Tomorrow Children beta. Or why some upcoming games like UC4 or Horizon look that good...
Nah, you would be more blown away if they had 3, 4 or 5 teraflops to work with. Those developers are awesome developers no matter what hardware you give them to work with.
 
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

Of course, a new architectural paradigm might change that, but not the current one.

Obviously, the cat in your avatar is yellow but, for some reason, I see it purplish. So I guess I'm high.
 

Alej

Banned
I don't know how we can have an educated discussion about this when some users constantly use fallacious authority arguments (without any explanation) while constantly bashing any other opinions by mocking the user behind it.

The "lol no" and other "marketing blah" are really cringeworthy. It's like no one really understand anything here or just simply look at it from an extremely tight POV (where everything exotic is bad or "marketing blah").

It's not like I want to defend why I bought a PS4 (I play on PC too and even make mods for PC games). But I am constantly amazed by what first-party devs can do on consoles, and precisely because some guys here have said that "optimization" doesn't really exist or that fixed hardware doesn't offer realistic advantages (see my post history on the subject and the answers I have received), I want to know what allows that old filthy hardware to blow me away.
 

Arguments from authority and derision are always painful, I agree.
I just think there is such a lack of data regarding the current subject, and a lack of transparency on all sides, so much so that making grand sweeping statements about AMD's and NV's architectures wholesale, or about the PC as a whole, is beyond ludicrous.
 

Alej

Banned
Arguments from authority and derision are always painful, I agree.
I just think there is such a lack of data regarding the current subject, and a lack of transparency on all sides, so much so that making grand sweeping statements about AMD's and NV's architectures wholesale, or about the PC as a whole, is beyond ludicrous.

But that's not what I'm saying. Every platform has some advantages, and PC is arguably the best place to play with an NV GPU. I agree with that, but why am I amazed by Sony's first party then? Marketing blah? Sheeeshh.
 
But that's not what I'm saying. Every platform has some advantages, and PC is arguably the best place to play with an NV GPU. I agree with that, but why am I amazed by Sony's first party then? Marketing blah? Sheeeshh.
Your amazement is your own, of course; I can't yell at you for that. Please take no offence at what I am about to type.
I think you would perhaps be less amazed if you put what you were seeing in perspective, if you looked at the tech presentations or viewed the visual content with an eye for what is happening. Then you can see the exact moments and reasons why said visuals are running on said hardware. Rarely is it then as amazing as it was without that perspective. An example would be the cool voxel GI in The Tomorrow Children. It is nice to see that happening on console (finally), but if you read up on it or compare it to VXGI, SVOTI, etc., it makes sense why it is limited the way it is and acts the way it does: all because of the hardware. It is awesome that it is happening, nonetheless.

An example of marketing vs. reality would be that The Order: 1886 video before it came out, saying the hardware offered them no limits and they could finally do everything they wanted. Then you read their tech presentations and you see a more nuanced opinion regarding their limitations from the hardware.

I tend to think the technology present in console games is rather ho-hum at times (is that a word? IDK), excluding some rare examples. The rest always seems to come from devs understanding their limits and hiding them so well with great art and solid performance. That is typically Sony's first-party studios, IMO.
 

laxu

Member
We're talking about an architectural limitation on Nvidia's side. No matter the API, Nvidia cards will suck at GPGPU. For example, if Nintendo released a new console with an Nvidia GPU, this console wouldn't be able to do async compute. The console makers' engineers know their stuff, however, and in contrast to most PC gamers, these people don't fall for Nvidia's PR lies. There is a reason why AMD gets all the console deals.

No. Please stop spreading FUD. The main reason AMD is in consoles is that they're the only manufacturer offering an integrated CPU+GPU solution with a GPU suited for gaming.

As for Nvidia, it is not yet known whether this is down to lacking Win10 drivers or an actual issue with async compute. If it is an architecture problem, then that would be a huge (albeit temporary) victory for AMD, as the next Nvidia cards would most likely fix the issue. If that is the case, then AMD has been rather forward-thinking but at the same time put the wrong cards on the market, as it's still going to take roughly a year before DX12 is the norm, and in the meantime Nvidia is performing better.


You really shouldn't be using multiple GPUs anyway. Do every video game programmer a favor and buy one single GPU. Optimizations for SLI and Crossfire take a disproportionate amount of time and resources that could be much better invested otherwise. Graphics cards have been ridiculously overpriced for years. No need to waste your money on multiple ridiculously overpriced cards.

Only this year have graphics cards improved to the point where single cards can handle 1440p and, to a degree, even 4K resolutions at high graphics settings. Before that you needed two cards for that. The 980 Ti and Fury X are still pretty expensive too, but hopefully in a few years there will be less need for dual-GPU setups for high-res gaming. Multiple GPUs work fine for the most part, and it's entirely up to the developer whether they want to support them or not.


It's definitely not worsening the situation. Multiplat engines are still held back by PC graphics cards, though. In a perfect world, every PC would be a unified system with a single processor and a single RAM pool. The concept behind the PS4 system architecture is the future of gaming PCs.

The PS4 needs to run significantly less stuff at the same time than PCs do, and GDDR5 is finally cheap enough to be fitted in suitable quantity in those. Multiplatform engines have never been held back by PC graphics cards. You're more likely to be limited by the horsepower of the Xbone/PS4 than by any separate memory pools and types PCs have.
 
Yes. The way asynchronous programming works is that you should be able to take any asynchronous call and execute it synchronously and the program will work exactly the same from a correctness standpoint.
Well, not necessarily. That would only be true if the native-async system was designed to twiddle its thumbs while it waited for returns. But that would be a pretty poor design. In a proper async system, the caller would move on to other tasks while waiting for returns (dispatching more async jobs, for example), so if it were suddenly forced to do nothing instead, it would mess up all of its timings.
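To make the point concrete, here's a toy C++ illustration with std::async standing in for handing a job off to another queue or device (the job and its timings are invented for the example). Both versions compute the same answer, but the caller that is forced to run the call synchronously loses the overlap, so its timing changes:

```cpp
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>

using namespace std::chrono;

// Stand-in for a job we would normally hand off (e.g. to another queue/device).
int heavy_job() {
    std::this_thread::sleep_for(milliseconds(50));
    return 42;
}

// Stand-in for independent work the caller can do in the meantime.
void other_work() {
    std::this_thread::sleep_for(milliseconds(40));
}

int main() {
    // Asynchronous: dispatch, keep working, then collect the result.
    auto t0 = steady_clock::now();
    std::future<int> f = std::async(std::launch::async, heavy_job);
    other_work();                       // overlaps with heavy_job
    int a = f.get();
    auto asyncMs = duration_cast<milliseconds>(steady_clock::now() - t0).count();

    // Forced synchronous: same calls, same result, but no overlap.
    t0 = steady_clock::now();
    int b = heavy_job();
    other_work();
    auto syncMs = duration_cast<milliseconds>(steady_clock::now() - t0).count();

    // Same answers, different frame timings.
    std::printf("async: %lld ms, sync: %lld ms, results %d/%d\n",
                (long long)asyncMs, (long long)syncMs, a, b);
    return 0;
}
```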


It has yet to be proven that async compute is drastically important for something like the Maxwell 2 architecture, or even whether Maxwell 2 does not support it.
Why wouldn't it be important?


Now show me a real-world example of the advantage of this.
From the first page…
Don't mix up async timewarp [an engine rendering technique for sampling the latest user position as late as possible] with async compute [taking advantage of multiple hardware pipelines feeding the Compute Units and filling the "empty spots" in the rendering pipeline with more tasks]. AMD intentionally increased the number of ACE pipelines in GCN 1.2 so devs can extract full performance from their cards [essentially, all the DX9/11 cards we used for years have wasted a lot of their performance].

The first implementation of async compute in The Tomorrow Children on PS4 enabled bunching up of GPU tasks into a tighter schedule that was previously full of holes:
Dzm5roI.png


An 18% increase in performance just from going from a traditional rendering pipeline to taking advantage of the Radeon ACEs [and that's just a first implementation of the tech].

The version of GCN that the Xbone uses has 4x fewer ACEs, and I think that GeForce cards have a similarly sparse setup.


edit - video explanation of asynchronous compute:
https://www.youtube.com/watch?v=v3dUhep0rBs

Because I can show you a real-world example of non-APU, DX11 GPGPU working fine:
https://www.youtube.com/watch?v=sbrFIp73tbw
So a new technology offers no improvement unless it retroactively breaks everything that came before it? Da fuq?


As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.
Err, the AMD APUs use GDDR5 as system RAM.
 

bj00rn_

Banned
Sheeeshh.

I just wanted to say that, from the viewpoint of someone who's purely interested in reading the technical on-topic details of this thread, the juvenile whining about justice for a favorite platform is starting to become a little bit painful to watch.

Why wouldn't it be important?

Because DX11 doesn't support it, and because DX12 won't be relevant in the market for a couple of years yet? I don't know, just putting the question out there to those who do..
 

Alej

Banned
I just wanted to say that, from the viewpoint of someone who's purely interested in reading the technical on-topic details of this thread, the juvenile whining about justice for a favorite platform is starting to become a little bit painful to watch.

That's not how it works. I am perfectly within my place here, because I am discussing the subject and not whining about some guys I don't want to read here.
I want to know what this implies for the PS4, because that console is notoriously built with asynchronous compute in mind. That's why I'm here.

We obviously know that this doesn't mean anything on PC, because if 80% of the hardware out there isn't asynchronous-compute capable, no game will really use it for half a decade. We can even say that PC might not even need asynchronous compute for a long time to provide big improvements in the years to come.

But those PS4s on the market say "hi", because they are built around taking advantage of asynchronous compute. What advantages? That's the question, and the OP says there is double-digit percent more performance to be had via asynchronous compute. It may not be groundbreaking at all, but on a closed platform, where every advantage and bottleneck counts twice as much, I feel it's not "marketing blah" or boring tech-wise to discuss that.

I don't mind any platform being better than the other, and I've never been mad about guys saying the PS4 is "shit" or not for them. What annoys me is guys like you trying to shut down the discussion about this subject just because consoles are some pieces of old shit everyone already experienced a decade ago on PC.

That's not how it works.
 
I've been trying to follow this thread, but I'm having a bit of a meltdown.

I'm going to assume that, like with every new DirectX version, the video cards out at initial release aren't going to fully support it or really take advantage of it. So... when will we be able to buy video cards that fully support and are optimized for it?
 

bee

Member
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

Of course, a new architectural paradigm might change that, but not the current one.

Obviously, the cat in your avatar is yellow but, for some reason, I see it purplish. So I guess I'm high.

At low resolutions maybe that graph is true, but not at 4K and beyond.

http://forums.overclockers.co.uk/showthread.php?t=18609613

real users > benchmark sites

edit: bleh, I can't find the better comparison. He's testing PCIe 2.0 x16 SLI vs PCIe 3.0 x16 SLI there (high-end boards only); the comparison for dual cards in normal users' configurations, i.e. PCIe 2.0 x8 SLI vs PCIe 3.0 x8 SLI, is more interesting and is certainly a bottleneck at high resolutions.
 

KKRT00

Member
From the first page…

That's an example of async compute, which is not new. It was used before, even in multiplatform games like Thief. It also does not in any way show the difference in async utilization between 2x ACEs and 8x ACEs.

So a new technology offers no improvement unless it retroactively breaks everything that came before it? Da fuq?
I don't think you understand why I posted the Flex video. It's the prime example of a compute technology that should not be viable without a unified memory setup, because it requires CPU sync with the GPU. Here we see the technology on DX11 hardware with a split memory setup and API, and it's running just fine.


====
We obviously know that this doesn't mean anything on PC, because if 80% of the hardware out there isn't asynchronous-compute capable, no game will really use it for half a decade.
But tons of games are using it or are being developed with it: Deus Ex, Tomb Raider and all the Frostbite games like Battlefront, NFS or Mirror's Edge, just to name a few.
Async will be possible on Nvidia's cards, you can bet on that.

====
I don't know how we can have an educated discussion about this when some users constantly use fallacious authority arguments (without any explanation) while constantly bashing any other opinions by mocking the user behind it.

The "lol no" and other "marketing blah" are really cringeworthy. It's like no one really understand anything here or just simply look at it from an extremely tight POV (where everything exotic is bad or "marketing blah").
What educated discussion? What W!CK!D said is the same PR we had at the start of this gen. He said nothing new and havent provided even one practical example.
And You know why he didnt? Because they do not exist. Actually some examples contradict that whole 'theory', like a video i've posted.
 

aeolist

Banned
i don't see the gloom and doom here. the benchmarks show that current nvidia cards, which at the mid and high end are far more powerful than console GPUs, still perform perfectly well under dx12. they don't get the huge boost that AMD is seeing because AMD has been underperforming on dx11.

this isn't going to lead to a situation where the consoles "catch up" or anything like it. they started behind and are staying there.
 

Alej

Banned
But tons of games are using it or are being developed with it: Deus Ex, Tomb Raider and all the Frostbite games like Battlefront, NFS or Mirror's Edge, just to name a few.
Async will be possible on Nvidia's cards, you can bet on that.

And that is great then; it should benefit PC ports on consoles. That's nice for everyone.
 

VariantX

Member
i don't see the gloom and doom here. the benchmarks show that current nvidia cards, which at the mid and high end are far more powerful than console GPUs, still perform perfectly well under dx12. they don't get the huge boost that AMD is seeing because AMD has been underperforming on dx11.

this isn't going to lead to a situation where the consoles "catch up" or anything like it. they started behind and are staying there.

That's pretty much my uneducated guess too. Why would they "catch up" when the new consoles have been designed to take advantage of this from the beginning? As far as Nvidia and AMD performance in DX12 goes, I dunno. Not enough info for me, since it's just one game and it's not even out yet, and only in a single genre. I would think anyone would reasonably wait and see until more games in multiple genres, made fully with DX12 in mind, are out in the wild.
 
Please stop spreading FUD.
Err…
You're more likely to be limited by the horsepower of the Xbone/PS4 than by any separate memory pools and types PCs have.
Right, because PCs are even more powerful than physics. ><

Perhaps you should explain how you think hUMA works, and we can explain where you've got it wrong.


Because DX11 doesn't support it, and because DX12 won't be relevant in the market for a couple of years yet? I don't know, just putting the question out there to those who do..
So, async isn't really important because we're gonna keep ignoring it a while longer? lol

If Maxwell is already being well utilized. Obviously any bit helps though, as mentioned earlier in the thread.
Well, it's not being well utilized. That's the problem async compute aims to solve, actually. You know a GPU isn't homogeneous inside, right? Within the GPU there are various types of processors, specialized in different types of math. Throughout the rendering process, there will be different groups of processors sitting idle at various times, because the math they know how to do isn't needed on this particular cycle. They're waiting for their neighbors to finish preparing the data they need.

Async compute can improve utilization by assigning additional jobs to those idle processors. This is illustrated in the diagrams DieH@rd posted earlier, where anything black represents processors going unused.


Plus, the GPU is simply better at some stuff than the CPU is, so simply having the GPU do it instead is a win, especially if you can do it without interrupting the rendering. Maxwell 2 finally adds a 32-queue compute scheduler, but it's sounding like it's still not actually asynchronous, and the GPU actually rapidly switches contexts between render and compute. So you can quickly switch between job types, but you can't truly run them simultaneously using idle transistors while the rendering takes place around you. Compare this with the eight, 8-queue schedulers on GCN, which are fully independent of the render scheduling, and basically have free access to any transistors not actively being used for rendering.

Oh, and another advantage of a truly asynchronous system is that once the CPU dispatches a job to the GPU, it doesn't need to sit there doing nothing while it waits for the result. As a crude example, the CPU can ask the GPU what actor1 can see. Instead of simply waiting for the result, the CPU can then ask what actor2 is able to see, and actor3, and so on. As the results start coming back from the GPU, then the CPU can decide what action actor1 will take based on what he can see. GPUs are good at ray-casting, and CPUs are good at branchy decision making, so the flexibility provided by async again allows us to utilize resources more efficiently. Running strictly on a CPU, a typical AI routine would spend 90% of its cycles determining perception and pathfinding for the actor. So if you normally spend 10ms on AI, 9ms is spent on perception and pathfinding. The GPU may be able to do that part of the job in 2ms, so you get your results a lot faster this way, but perhaps more importantly, you've freed up 9ms on the CPU, giving it time to work on other tasks.
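Here's a tiny sketch of that dispatch pattern in plain C++, with std::async standing in for handing a visibility/ray-cast query to the GPU (the query itself and all timings are placeholders): fire off the query for every actor first, then consume the results and do the branchy decision making as they come back.

```cpp
#include <chrono>
#include <cstdio>
#include <future>
#include <thread>
#include <vector>

// Placeholder for the GPU-friendly part of the job: "what can this actor see?"
// In the real case this would be dispatched to the GPU instead of a thread.
bool compute_visibility(int actorId) {
    std::this_thread::sleep_for(std::chrono::milliseconds(2));
    return (actorId % 2) == 0;   // dummy result
}

// Placeholder for the CPU-friendly, branchy decision making.
void decide_action(int actorId, bool canSeePlayer) {
    std::printf("actor %d %s the player -> %s\n",
                actorId, canSeePlayer ? "sees" : "can't see",
                canSeePlayer ? "attack" : "patrol");
}

int main() {
    const int actorCount = 8;
    std::vector<std::future<bool>> queries;

    // Dispatch all perception queries up front instead of blocking on each one.
    for (int id = 0; id < actorCount; ++id)
        queries.push_back(std::async(std::launch::async, compute_visibility, id));

    // The CPU is free to do other work here while the queries are in flight...

    // ...then consume the results as they come back and make the decisions.
    for (int id = 0; id < actorCount; ++id)
        decide_action(id, queries[id].get());

    return 0;
}
```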

So no matter how powerful your system is, async compute just makes it that much more powerful. hUMA makes it more powerful still, because the shared memory increases your opportunities to leverage GPGPU overall.
 

Arkanius

Member
Some users at Beyond3D have tested the Async performance of both the 980 Ti and the 7870XT

980 Ti

pJqBBDS.png


7870XT

nu2NDtM.png


The 980 Ti can't run both tasks in parallel; they are run in sequential order.
Apparently, this is a huge blow for VR performance (I'm not much into VR yet). It seems Nvidia has lag issues that create nausea due to the latency of having to run both tasks in sequence.

Something else that could be understood from this graph:

Nvidia has a lower latency despite doing it in a serial way if they batch the Compute + Graphics jobs in batches of 32?
 

prag16

Banned
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information. The people claiming the PS4's GPU will now somehow completely "catch up" to a 970 are batshit insane, even in the best case (for AMD, worst for nvidia).

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.
 

Arkanius

Member
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information.

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.

It's not jumping to conclusions if you are analyzing data.
We don't need an Nvidia PR statement to start drawing some lines though.
 

Kezen

Banned
Deus Ex: Mankind Divided and Rise of the Tomb Raider use async compute extensively; it's going to be interesting to see which class of hardware can offer a console-like experience then.

I don't believe much will change on the AMD side (Intel quad core + 2012 era GPU 7850/7870 to match the PS4) but on Nvidia it could lead to a rise.
 

Darg

Neo Member
My gawd. The conclusions being jumped to in this topic... unbelievable. PS4/AMD fans are getting way too excited, and some nvidia/PC fans are getting way too downtrodden, based on almost no information. The people claiming the PS4's GPU will now somehow completely "catch up" to a 970 are batshit insane, even in the best case (for AMD, worst for nvidia).

Relax...

I mean, durante hasn't even weighed in yet. Way too soon for anybody to be panicking and/or celebrating.

Yup, sorry to say this, but in no world will it become a 970. I mean, you have to realize just how much more powerful a 970 is compared to console GPUs; no magic is going to help.

Anyhow, I'll wait for more games; it's still quite early for anything, really.
 

Henrar

Member
Could you explain this to me, then?

image003_w_600.png


With the current model, throwing more bandwidth at PCIe serves no purpose. Some things work better being shared, like coherent caches, but others work better with isolated resources. As convenient as shared pools of RAM are for integration and cost cutting, local VRAM > system RAM from a pure performance standpoint.

I believe that graph doesn't tell the full story. I mean, when you're not VRAM limited on the GPU, everything should be fine, as everything sits in the GPU's high-bandwidth local memory. However, when you run out of memory, you should see the slower PCIe become a bottleneck.
Although in any such case you'll see stuttering, as PCIe bandwidth is way lower than GDDR5/HBM bandwidth.
 

Vinland

Banned
Having a feature and having a feature done correctly are two different things. This could be a huge error in marketing, or it may be something they can resolve in their driver. The problem with this thread is that there are people defending Nvidia with arguments that read like this to me:

"Dx11 is all that matters anyways so who cares."

"Nvidia is still faster serially so who cares."

"No, nvidia just have shit drivers for dx12 and no it's not ironic."

"Nvidia have proper asynchronous compute but I won't provide any proof other than marketing bullet points."

I mean, that is cute and all, but honestly it wears a bit thin. Nvidia could have bungled this, but people want to believe they didn't, like a cult. It is a product, and in the end, if they can fix it in software, good for them. If they can't, then hopefully their next product can do it. And if AMD actually does have better async, then so be it, because at the end of the day they aren't our friends; they are companies who drive each other to market innovations, something new and shiny to sell us.
 
That's an example of async compute, which is not new.
Yes, and async compute was what W!CK!D was discussing when you asked for a real-world example of the benefit, so I relinked the example that had already been provided on the first page.

It also does not in any way show the difference in async utilization between 2x ACEs and 8x ACEs.
What? You want an example that shows being able to dispatch more jobs is better than being able to dispatch fewer? You're asking for proof that 64 is greater than 16? Is this a "64KB should be enough for anyone" argument? =/

I don't think you understand why I posted the Flex video.
I really don't, no. It was pretty, but it didn't explain why hUMA isn't beneficial. It didn't explain anything at all.

It's the prime example of a compute technology that should not be viable without a unified memory setup, because it requires CPU sync with the GPU. Here we see the technology on DX11 hardware with a split memory setup and API, and it's running just fine.
You're saying it proves hUMA is unnecessary because they have time to copy results back to the CPU from the GPU? That doesn't prove anything of the sort. Being able to get some copy operations in under the wire doesn't prove there's no advantage to eliminating them entirely.

Your arguments don't even make sense. =/
 

KKRT00

Member
What? You want an example that shows being able to dispatch more jobs is better than being able to dispatch fewer? You're asking for proof that 64 is greater than 16? Is this a "64KB should be enough for anyone" argument? =/
Yes, I want a real-world example of it having a meaningful performance advantage in rendering in games.


I really don't, no. It was pretty, but it didn't explain why hUMA isn't beneficial. It didn't explain anything at all.
It shows that you don't need hUMA to have proper CPU and GPU sync in gaming-related scenarios. Multiple physics engines being synced together is the ultimate test for that.
The other thing hUMA is beneficial for is searching for patterns in large sets of data, à la databases, but this is not used in gaming.

You're saying it proves hUMA is unnecessary because they have time to copy results back to the CPU from the GPU? That doesn't prove anything of the sort. Being able to get some copy operations in under the wire doesn't prove there's no advantage to eliminating them entirely.
Yes, I'm saying that it's not necessary for gaming. At least it hasn't been proven to be necessary yet, by any dev.

Your arguments don't even make sense. =/
Sure, they don't; they are not supercharged enough.
 

aeolist

Banned
Having a feature and having a feature done correctly are two different things. This could be a huge error in marketing, or it may be something they can resolve in their driver. The problem with this thread is that there are people defending Nvidia with arguments that read like this to me:

"Dx11 is all that matters anyways so who cares."

"Nvidia is still faster serially so who cares."

"No, nvidia just have shit drivers for dx12 and no it's not ironic."

"Nvidia have proper asynchronous compute but I won't provide any proof other than marketing bullet points."

I mean, that is cute and all, but honestly it wears a bit thin. Nvidia could have bungled this, but people want to believe they didn't, like a cult. It is a product, and in the end, if they can fix it in software, good for them. If they can't, then hopefully their next product can do it. And if AMD actually does have better async, then so be it, because at the end of the day they aren't our friends; they are companies who drive each other to market innovations, something new and shiny to sell us.

it's not so much that nvidia bungled anything, they just designed their products with an eye on the short-term market and want to upsell people to more feature-rich GPUs down the line when those features become relevant. AMD has been trying to guess long-term trends for the last decade and it's done nothing but bite them in the ass and make their products worse to use right now. it may sound nice in a general way to want to design your products with an eye on future needs but ultimately it hasn't worked.

in the end current nvidia cards are still quite performant and will scale decently, and future ones will be better. computer parts are more easily replaceable than ever and people shouldn't be trying to make a PC that's future proofed because that's just not the economically smart thing to do.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
Well, it's not being well utilized. That's the problem async compute aims to solve, actually. You know a GPU isn't homogeneous inside, right? Within the GPU there are various types of processors, specialized in different types of math. Throughout the rendering process, there will be different groups of processors sitting idle at various times, because the math they know how to do isn't needed on this particular cycle. They're waiting for their neighbors to finish preparing the data they need.

Async compute can improve utilization by assigning additional jobs to those idle processors. This is illustrated in the diagrams DieH@rd posted earlier, where anything black represents processors going unused.



Plus, the GPU is simply better at some stuff than the CPU is, so simply having the GPU do it instead is a win, especially if you can do it without interrupting the rendering. Maxwell 2 finally adds a 32-queue compute scheduler, but it's sounding like it's still not actually asynchronous, and the GPU actually rapidly switches contexts between render and compute. So you can quickly switch between job types, but you can't truly run them simultaneously using idle transistors while the rendering takes place around you. Compare this with the eight, 8-queue schedulers on GCN, which are fully independent of the render scheduling, and basically have free access to any transistors not actively being used for rendering.

Oh, and another advantage of a truly asynchronous system is that once the CPU dispatches a job to the GPU, it doesn't need to sit there doing nothing while it waits for the result. As a crude example, the CPU can ask the GPU what actor1 can see. Instead of simply waiting for the result, the CPU can then ask what actor2 is able to see, and actor3, and so on. As the results start coming back from the GPU, then the CPU can decide what action actor1 will take based on what he can see. GPUs are good at ray-casting, and CPUs are good at branchy decision making, so the flexibility provided by async again allows us to utilize resources more efficiently. Running strictly on a CPU, a typical AI routine would spend 90% of its cycles determining perception and pathfinding for the actor. So if you normally spend 10ms on AI, 9ms is spent on perception and pathfinding. The GPU may be able to do that part of the job in 2ms, so you get your results a lot faster this way, but perhaps more importantly, you've freed up 9ms on the CPU, giving it time to work on other tasks.

So no matter how powerful your system is, async compute just makes it that much more powerful. hUMA makes it more powerful still, because the shared memory increases your opportunities to leverage GPGPU overall.

That was a good read. Thanks.
 

Arulan

Member
The worst thing about consoles unanimously using AMD hardware now is that it's not just a petty "Green vs. Red" discussion, but console warriors pushing their thinly veiled agenda behind the AMD flag.

It's definitely not worsening the situation. Multiplat engines are still held back by PC graphics cards, though. In a perfect world, every PC would be a unified system with a single processor and a single RAM pool. The concept behind the PS4 system architecture is the future of gaming PCs.

Didn't your old account (W!CKED) get banned?

As for the topic at hand, it's nice to see AMD put some pressure on Nvidia. It's interesting to see AMD's architecture work well with DX12, but that doesn't make the years of relatively terrible DX11 performance pay off. We'll have to wait and see what Pascal offers, and DX12 benchmarks under real-world scenarios.
 

Irobot82

Member
Something else that could be understood from this graph:

Nvidia has a lower latency despite doing it in a serial way if they batch the Compute + Graphics jobs in batches of 32?

Are you seriously trying to compare a 980ti to a 7870XT? No shit it has lower latency.
 

tuxfool

Banned
If Maxwell is already being well utilized. Obviously any bit helps though, as mentioned earlier in the thread.

I don't think GPU utilization is as straightforward as it appears in most user tools. For example, do we know if ROP-heavy code reports the same utilization level as heavy shader/ALU code? These two cases use different parts of the GPU, and doing two tasks on the graphics pipe would involve a context switch to handle both.

The comparison to hyperthreading is somewhat apt, because that is one of the cases where async compute stands to provide the most benefit.
 

Renekton

Member
This situation isn't depressing at all as we don't know anything firm about this situation yet. But it's nice to see how some people here are already jumping to conclusions on a post made by a guy working on an AMD sponsored game running on beta drivers and all.
Regardless of what the Oxide guy says, I don't see PC GPU asynchronous compute support in a UE4 doc search (someone correct me, plox). The rash conclusion I'm jumping to is that Epic may have put this on the back burner in view of Nvidia's overwhelming market share.

Oh well, if by some off-chance Maxwell doesn't implement it, Pascal is more than guaranteed to have it, so it's just another year away.
 

frontieruk

Member
Why does the latency in Nvidia's cards get progressively worse? Or am I still interpreting that incorrectly?

The test increases the number of single-lane compute kernels from 1 to 128; as the batches get more complex, the latency increases, which is leading to the theory that Nvidia isn't doing async but serial processing.
 

Irobot82

Member
The test increases the number of single-lane compute kernels from 1 to 128; as the batches get more complex, the latency increases, which is leading to the theory that Nvidia isn't doing async but serial processing.

This is starting to get above me. So does this mean that if Nvidia's card has to do more and more compute, it gets worse in latency, while AMD's so far stays the same, but at a much higher latency?
 

dogen

Member
This is starting to get above me. So does this mean that if Nvidia's card has to do more and more compute, it gets worse in latency, while AMD's so far stays the same, but at a much higher latency?

I think it would be interesting to see the test done with a higher number of concurrent tasks. Seems like there's a startup overhead for AMD or something.
 

Daffy Duck

Member
Buys 980Ti Friday, this news breaks today....

gmGW5tf.gif


So it's time to bin it already. So the card will not be able to run anything on DX12 above 30FPS?
 

frontieruk

Member
I think it would be interesting to see the test done with a higher number of concurrent tasks. Seems like there's a startup overhead for AMD or something.

That's been put forward to the test's author, just to see where AMD starts hitting a pattern like NV's.

Buys 980Ti Friday, this news breaks today....

gmGW5tf.gif


So it's time to bin it already. So the card will not be able to run anything on DX12 above 30FPS?

This is a particularly limited case due to the style of game; Fable Legends and maybe Gears Ultimate will show a more everyday case. Shouldn't Ark have DX12 now? That'll be an interesting case; the devs said it offered a 20% perf increase.
 