
SIGGRAPH 2015 game graphics papers


It's feature complete, to the extent that it tracks features for OpenGL ES 3.0 and has its own means of handling GPU compute.

iOS doesn't support OpenGL ES 3.1 right now, and tessellation and geometry shaders are ES 3.2 core features that were previously part of the Android Extension Pack for ES 3.1. Maybe Metal will have its own implementations of these features when (if?) Apple supports them in a post-iOS 9.0 revision.
 

Durante

Member
When people talk about the additionally considered compute specs with the PS4, I'm guessing they're referring to the number of ACE's/queues.
Which is really just one aspect (work submission) of one aspect (scheduling) of compute performance, which is one aspect of GPU performance. Let's be brutally honest here: it is brought up about 50 times more than it should by rights be because it's one of the few easily quantified differences between the current consoles du jour.

Sorry, my knowledge on this particular section is rather limited.
Let's be more specific then: PC gaming does not exist in isolation. It stands to reason that AAA multiplatform devs (or more modest ones, it does not really matter) will build their engines around GCN's strengths (the GPU compute workloads at which it excels), so the question is: how will Kepler/Maxwell fare then? I assume, perhaps wrongly, that those multiplatform games will make heavy use of GCN-tailored compute workloads and leverage async compute and async shaders.

We might have gotten a glimpse of an answer with Ryse, which is a showcase for GCN: it did not run badly at all on Kepler/Maxwell, just not quite as well as on various GCN cards.
I think after years of "to the metal" propaganda, people are too eager to look for complex (and often unprovable, especially with the simplistic performance investigation tools generally employed) answers for performance discrepancies, when in fact the same explanations that have been applicable for a decade are still good indicators.

Let me give you an example of what I mean. Yes, Ryse runs better than the average game on GCN cards relative to NV cards. It could be that this is indicative of some deeply rooted algorithmic optimization for "asynchronous compute" or the game being highly tuned to specifics of the GCN architecture. But isn't it far more likely that it's simply a workload less dependent on texturing/sampling or raster ops and more dependent on raw floating point operation throughput? Remember that e.g. a 290X has 6TF of theoretical FP performance while a 980 has 4.6 TF.
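(For reference, those theoretical figures fall out of a simple shader count x clock x 2 FLOPs-per-FMA calculation; a minimal sketch follows, using approximate reference-card shader counts and clocks, since exact boost clocks vary by board.)

```python
# Sketch: theoretical FP32 throughput = shader ALUs x clock x 2 (an FMA counts as 2 FLOPs).
# Shader counts and clocks below are approximate reference-card figures, for illustration only.
def theoretical_tflops(shader_alus, clock_ghz, flops_per_alu_per_clock=2):
    return shader_alus * clock_ghz * flops_per_alu_per_clock / 1000.0

print(f"R9 290X: ~{theoretical_tflops(2816, 1.000):.1f} TFLOPS")  # ~5.6 TFLOPS
print(f"GTX 980: ~{theoretical_tflops(2048, 1.126):.1f} TFLOPS")  # ~4.6 TFLOPS
```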

Basically, my point is that it has been discussed that different GPU setups perform differently for different workloads at least since the Radeon shipped with 3-texture multitexturing capabilities -- and it's true! But it has also never influenced anything significantly in the long run. If one vendor goes a bit too far (or not far enough) in one direction (TMUs, FLOPs, Bandwidth, ROPs, ...) in a given architecture they'll just correct that in the next.
 

tuxfool

Banned
I really wish people wouldn't just say "compute" when they could possibly mean any one of 50 different things. It reminds me of late last decade when every GPU was suddenly a "GPGPU".

It's the nxgamerization of hardware discussion :p


Edit:
I think the fact that "compute" aptly describes basically everything that every component in a computer (other than memory) does is somewhat indicative of it not being very well suited to discussing specifics.

Yeah, this is true. Possibly one should use a capital C for Compute on a GPU. It's still bad, though; I wonder who is responsible for coining the term as it pertains to GPUs.
 

Kezen

Banned
Let me give you an example of what I mean. Yes, Ryse runs better than the average game on GCN cards relative to NV cards. It could be that this is indicative of some deeply rooted algorithmic optimization for "asynchronous compute" or the game being highly tuned to specifics of the GCN architecture. But isn't it far more likely that it's simply a workload less dependent on texturing/sampling or raster ops and more dependent on raw floating point operation throughput? Remember that e.g. a 290X has 6TF of theoretical FP performance while a 980 has 4.6 TF.
I thought TFLOPS between different architectures were not meaningful, but you've made your point. A GCN-dominated development environment does not condemn Nvidia; they just have to make the right adjustments in the future, and from what I gathered they have already done so with Maxwell.

Basically, my point is that it has been discussed that different GPU setups perform differently for different workloads at least since the Radeon shipped with 3-texture multitexturing capabilities -- and it's true! But it has also never influenced anything significantly in the long run. If one vendor goes a bit too far (or not far enough) in one direction (TMUs, FLOPs, Bandwidth, ROPs, ...) in a given architecture they'll just correct that in the next.
Alright, but the question pertaining to Kepler/Maxwell vs GCN is still open. It's going to be very interesting to see how 2015/2016 games perform on Nvidia vs AMD, especially those that will be endorsed by AMD: Rise of the Tomb Raider and Deus Ex: Mankind Divided.
 

nib95

Banned
Which is really just one aspect (work submission) of one aspect (scheduling) of compute performance, which is one aspect of GPU performance. Let's be brutally honest here: it is brought up about 50 times more than it should by rights be because it's one of the few easily quantified differences between the current consoles du jour.

That's exactly what it is. People often look to highlight the specifications that are readily available and most publicised, and the ones they can best compare as points of difference with the competition.
 
...I wonder who is responsible for coining the term as it pertains to GPUs.

I can't answer that specific bit, but an Intel-run course I attended considered "compute" to be an umbrella term for high level, defined interfaces to send workloads to the CPU, GPU or a kind of DSP on the hardware it's running on.

Which makes sense through the lens of OpenCL and friends. OpenCL can execute workloads on CPUs, GPUs, FPGAs and DSPs. RenderScript on Android can execute on the CPU or the GPU.
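To make that concrete, here is a minimal sketch of the "same kernel, whichever device" idea, assuming a Python environment with numpy and pyopencl installed (not something from the thread, purely an illustration):

```python
# Minimal sketch: the same "compute" kernel can be dispatched to a CPU, GPU or other
# accelerator, depending on which OpenCL device the context is created on.
import numpy as np
import pyopencl as cl

kernel_src = """
__kernel void square(__global const float *a, __global float *out) {
    int gid = get_global_id(0);
    out[gid] = a[gid] * a[gid];
}
"""

ctx = cl.create_some_context()   # may pick a CPU, a GPU, or another OpenCL device
queue = cl.CommandQueue(ctx)

a = np.arange(16, dtype=np.float32)
mf = cl.mem_flags
a_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=a)
out_buf = cl.Buffer(ctx, mf.WRITE_ONLY, a.nbytes)

prg = cl.Program(ctx, kernel_src).build()
prg.square(queue, a.shape, None, a_buf, out_buf)  # enqueue the kernel

result = np.empty_like(a)
cl.enqueue_copy(queue, result, out_buf)
print(result)
```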
 

Durante

Member
Yeah, this is true. Possibly one should use a capital C for Compute on a GPU. It's still bad, though; I wonder who is responsible for coining the term as it pertains to GPUs.
I don't know who coined it, but it has been around for a long time, first as an alternative to and then entirely replacing the earlier "GPGPU". Which was just slightly less generic in meaning, but still not a particularly great name.

I thought TFLOPS between different architectures were not meaningful
They aren't meaningful in the way they are often used, as a summary value describing the entirety of GPU performance. However, like other metrics such as fillrate, bandwidth and texture sampling rate, they are useful as indicators of where a given GPU architecture is focused relative to another.
 

tuxfool

Banned
Alright, but the question pertaining to Kepler/Maxwell vs GCN is still open. It's going to be very interesting to see how 2015/2016 games perform on Nvidia vs AMD, especially those that will be endorsed by AMD: Rise of the Tomb Raider and Deus Ex: Mankind Divided.

Look at it this way. The consoles, regardless of the number of ACEs, have weak Compute performance relative to mid-to-high-end desktop GPUs. Those games will be designed to scale to weak GPUs. A desktop Kepler should be fine in comparison to the consoles.

Note that fixed-function hardware and graphics shaders are still important here. AMD has weak geometry performance even compared to Kepler.

Nvidia will adapt, I have no doubt about that.

They already have. As I said, Maxwell for the most part seems to be fine.
 

tuxfool

Banned
They aren't meaningful in the way they are often used, as a summary value describing the entirety of GPU performance.

I should also point out that sometimes the IHVs use different formats when measuring TFLOPS. As an example, when Nvidia was promoting the Tegra X1 it was quoting TFLOPS for fp16 as opposed to fp32. It made their numbers look really good.

They weren't entirely wrong to do so, as fp16 is apparently fairly important for mobile graphics, but they weren't explicit about it.
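To illustrate how quoting fp16 changes the headline number (figures below are approximate Tegra X1 values and assume 2x-rate fp16; treat this as a sketch, not official specs):

```python
# Sketch: why quoting fp16 roughly doubles the headline number on hardware with
# 2x-rate fp16. Figures are approximate Tegra X1 values, for illustration only.
alus, clock_ghz = 256, 1.0
fp32_gflops = alus * clock_ghz * 2      # 2 FLOPs per FMA
fp16_gflops = fp32_gflops * 2           # 2x-rate fp16 packs two operations per lane
print(fp32_gflops, "GFLOPS fp32")       # ~512 GFLOPS
print(fp16_gflops, "GFLOPS fp16")       # ~1024 GFLOPS, i.e. the "1 TFLOPS" headline
```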
 

Kezen

Banned
Look at it this way. The consoles, regardless of the number of ACEs, have weak Compute performance relative to mid-to-high-end desktop GPUs. Those games will be designed to scale to weak GPUs. A desktop Kepler should be fine in comparison to the consoles.
By "compute" you really do mean "GPGPU", correct ? What is frustrating is that async shader/compute seems to give genuinely tangible boosts and we won't get that on PC if the game does not use D3D12. :(
AMD claim up to 46% performance boost :
[Image: Async_Perf_575px.jpg, AMD's asynchronous shading performance slide]

http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
I hope ROTTR will support D3D12.
Where do you draw the line between all the Kepler SKUs? I took it that Kepler relied on context switching; is that enough to guarantee good performance in games heavily tuned for GCN?

Note that fixed-function hardware and graphics shaders are still important here. AMD has weak geometry performance even compared to Kepler.
I know that, but considering those games will be tuned for GCN, is this going to matter all that much?

They already have. As I said, Maxwell for the most part seems to be fine.
I hope so; Ryse runs really well on Maxwell, for instance.
 

Datschge

Member
tri-Ace's Gotanda is giving a course again (together with Silicon Studio's Kawase, and former Silicon Studio's Kakimoto): http://research.tri-ace.com/s2015.html
The usual topics for Gotanda, i.e. real camera simulation with bokeh, lens effects and the like.
Anybody know when that's scheduled and if there is/was a stream? Wasn't able to find anything outside of that page.
 

tuxfool

Banned
By "compute" you really do mean "GPGPU", correct ? What is frustrating is that async shader/compute seems to give genuinely tangible boosts and we won't get that on PC if the game does not use D3D12. :(
AMD claim up to 46% performance boost :.

Most modern games already use Compute shaders. Also the key word to take from those slides is may. Not all tasks benefit from it.

I know that, but considering those games will be tuned for GCN, is this going to matter all that much?

It hasn't made much of a difference yet. Also developers are not going to ignore 70% of the PC market when it comes to performance.
 
tri-Ace's Gotanda is giving a course again (together with Silicon Studio's Kawase, and former Silicon Studio's Kakimoto): http://research.tri-ace.com/s2015.html
The usual topics for Gotanda, i.e. real camera simulation with bokeh, lens effects and the like.
Anybody know when that's scheduled and if there is/was a stream? Wasn't able to find anything outside of that page.

SIGGRAPH 2015 was last week; he might give the course again at SIGGRAPH Asia in November. I would write to them about that.

I ended up missing this course because it ran against a remarkably comprehensive talk on mobile graphics, covering Qualcomm's TBR architecture and the issues that can come up with thermal throttling on ARM's big.LITTLE designs.
 

dr_rus

Member
I hope so; Ryse runs really well on Maxwell, for instance.
Maxwell 2 has 32 queues; GCN 1.1/1.2 has up to 64. GCN has better flow control because it has up to 8 dedicated ACEs (flow-control processors), compared to Maxwell 2 which only has one (I think? The info is a bit hazy here; no idea if we can even compare them by numbers either) managing all queues.

However, it's actually up for discussion/testing how much benefit anything more than 1 graphics + 1 compute queue will bring to the table. A proper graphics program would load a GPU up to 100% even with 1 queue (if someone's unsure of this, he can go and check whether any Kepler card ever hits 99% load himself). Adding a compute queue can help in cases where the graphics workload has stalled for some reason (external bandwidth, most likely), but it's definitely a game of diminishing returns for a second, third, etc. compute queue.

A somewhat good example here is CPU HT, which is mostly good for two threads and gives almost no benefit if we move to 4, 8, etc. I think that 8 ACEs is a bit of a brute-force approach, to be honest, and a more interesting option here would be to keep them down to 2 but increase their queue widths from 8 to 32. This is what I fully expect NV to do in Pascal, which should theoretically bring them pretty much on par with, if not above, GCN 1.2 here.
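One toy way to picture those diminishing returns (purely an illustrative model, not a real GPU simulation: assume each queue independently has issuable work some fraction of the time, and the GPU only stalls when none of them do):

```python
# Toy model, not a real GPU simulation: assume each queue independently has issuable
# work a fraction p of the time, and the GPU is busy whenever at least one queue can issue.
def utilization(num_queues, p=0.8):
    return 1.0 - (1.0 - p) ** num_queues

for k in (1, 2, 4, 8):
    print(f"{k} queue(s): ~{utilization(k):.1%} busy")
# 1 -> 80.0%, 2 -> 96.0%, 4 -> 99.8%, 8 -> ~100.0%: almost all of the gain
# comes from the first extra queue, hence the diminishing returns.
```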

AMD claims up to a 46% performance boost

Yeah, what you should always remember with this is that this "boost" is highly dependent on a parallel workload being available. A 46% boost in performance with async compute basically means that the "primary" queue (be it graphics or another compute queue) is hitting only about 2/3 GPU utilization for some reason, which is quite far from ideal and not something you see often on any platform. It will be interesting to see how all of this plays out between AMD and NV, as NV's general GPU utilization seems to be higher on average anyway.
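The rough arithmetic behind that 2/3 figure, assuming the async work does nothing except fill otherwise idle cycles:

```python
# Sketch: if enabling async compute alone yields a 46% throughput boost by filling the
# GPU to ~100%, the primary queue must have been leaving roughly a third of it idle.
boost = 0.46
implied_baseline_utilization = 1.0 / (1.0 + boost)
print(f"implied baseline utilization: ~{implied_baseline_utilization:.0%}")  # ~68%, about 2/3
```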
 

tuxfool

Banned
A somewhat good example here is CPU HT, which is mostly good for two threads and gives almost no benefit if we move to 4, 8, etc. I think that 8 ACEs is a bit of a brute-force approach, to be honest, and a more interesting option here would be to keep them down to 2 but increase their queue widths from 8 to 32. This is what I fully expect NV to do in Pascal, which should theoretically bring them pretty much on par with, if not above, GCN 1.2 here.

In the article Kezen quoted, AMD themselves state that 8 ACEs is overkill for graphics, where the previous combination of 1 graphics command processor + 2 ACEs may be sufficient. They're only keeping the current setup for things like HSA and OpenCL.
 

KKRT00

Member
By "compute" you really do mean "GPGPU", correct ? What is frustrating is that async shader/compute seems to give genuinely tangible boosts and we won't get that on PC if the game does not use D3D12. :(
AMD claim up to 46% performance boost :
[Image: Async_Perf_575px.jpg, AMD's asynchronous shading performance slide]

http://www.anandtech.com/show/9124/amd-dives-deep-on-asynchronous-shading
I hope ROTTR will support D3D12.
Where do you draw the line between all the Kepler SKUs? I took it that Kepler relied on context switching; is that enough to guarantee good performance in games heavily tuned for GCN?


I know that, but considering those games will be tuned for GCN, is this going to matter all that much?
That's really a PR screenshot to show the performance difference, because most post-processing effects have an almost constant GPU-time cost per frame. So even though it's 46% here, due to the ~100 fps difference the actual difference is about 2.3 ms of GPU time, which for a game running at 30 Hz or even 60 Hz works out to roughly 7% and 14% of the frame budget for post-processing respectively.
Still, async is free GPU time if you have a good pipeline to use it, so it's definitely worth it.
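Working through that arithmetic (the fps pair below is illustrative, picked so it lines up with the ~46% and ~2.3 ms figures in the post rather than taken from AMD's actual slide):

```python
# Sketch of the frame-budget argument. The fps pair is illustrative, chosen so it matches
# the ~46% boost and ~2.3 ms figures in the post, not taken from AMD's actual slide.
def frame_ms(fps):
    return 1000.0 / fps

fps_without, fps_with = 137.0, 200.0                # ~46% faster on the isolated pass
saved_ms = frame_ms(fps_without) - frame_ms(fps_with)
print(f"GPU time saved: {saved_ms:.1f} ms")          # ~2.3 ms
for target_hz in (30, 60):
    budget_ms = 1000.0 / target_hz
    print(f"~{saved_ms / budget_ms:.0%} of a {target_hz} Hz frame budget")  # ~7% and ~14%
```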
 

Kezen

Banned
Most modern games already use Compute shaders. Also the key word to take from those slides is may. Not all tasks benefit from it.
Is it a stretch to assume devs will try to use compute shaders as much as possible?

It hasn't made much of a difference yet. Also developers are not going to ignore 70% of the PC market when it comes to performance.
Maybe they haven't taken much advantage of this tech yet, but the more ambitious devs become, the more attractive performance-enhancing features will be. They might not have needed it in the past because they could meet their performance targets, but they might in the future, or even in the fall 2015 games.

Maxwell 2 has 32 queues; GCN 1.1/1.2 has up to 64. GCN has better flow control because it has up to 8 dedicated ACEs (flow-control processors), compared to Maxwell 2 which only has one (I think? The info is a bit hazy here; no idea if we can even compare them by numbers either) managing all queues.
However, it's actually up for discussion/testing how much benefit anything more than 1 graphics + 1 compute queue will bring to the table. A proper graphics program would load a GPU up to 100% even with 1 queue (if someone's unsure of this, he can go and check whether any Kepler card ever hits 99% load himself). Adding a compute queue can help in cases where the graphics workload has stalled for some reason (external bandwidth, most likely), but it's definitely a game of diminishing returns for a second, third, etc. compute queue.
We will see how it pans out.
 

prudislav

Member
Seems like there was some cool stuff from Ready at Dawn
At this year’s SIGGRAPH, Ready At Dawn showcased The Order 1886 running on the PC at 60fps.

While this does not mean that the team is currently working on a port, it does show the potential of its engine.

As Ready At Dawn claimed during its presentation:

“Just to highlight the basic building blocks of this system, here you can see our scripting editor and the game running on PC. You’ll notice there is a light set block listed for each shot in the sequence. It is enabled for that duration, and then disabled.”


Ready At Dawn has listed PC and PS4 as its target platforms for its engine, so it remains to be seen whether: a) The Order 1886 will make the jump to the PC and b) its future games hit the PC or not.

http://www.dsogaming.com/news/the-order-1886-showcased-running-on-the-pc-at-60fps/
 

Hah, it will never be on PC (why would Sony do that, unless to somehow make back some of the money they lost?).

Rather, it just shows that their engine was made for PC/PS4 and that their next titles will be on PC; likewise, their lead graphics guy shares PC graphics code from their engine on his blog.

Do not read too much into it; rather, read it as a sign that their next game will have a PC release as well.

It did make my head turn sideways while reading the PPTs to see that they mentioned a number of hardware limitations that made them change some graphical features in the end, so perhaps their next project will do some things engine-wise that are different from what we saw in TO: 1886.
 

prudislav

Member
Yeah, it most likely stays on PS4 unless it flopped really hard, but it will be really nice to see what they can do with their engine on PC in the future.
 

Carn82

Member
I can't answer that specific bit, but an Intel-run course I attended considered "compute" to be an umbrella term for high level, defined interfaces to send workloads to the CPU, GPU or a kind of DSP on the hardware it's running on.

Which makes sense through the lens of OpenCL and friends. OpenCL can execute workloads on CPUs, GPUs, FPGAs and DSPs. RenderScript on Android can execute on the CPU or the GPU.

That's what I've understood as well. A dev friend of mine writes 'compute code' that works great on GPUs, but even better on CPUs.
 

KKRT00

Member
Presentations from DICE are finally up.

Stochastic Screen-Space Reflections

Interesting. The PS4 tests were made at 900p. As the algorithm takes 3-4 ms of GPU time, it's probably not for 60 Hz titles, but 30 Hz ones.
Will Mirror's Edge Catalyst be running at 900p? Or will it have lower-quality reflections?
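For context, the frame-budget share behind that guess (using the 3-4 ms cost quoted above):

```python
# Sketch: share of the frame budget a 3-4 ms SSR pass would take at 30 vs 60 Hz,
# using the cost range quoted in the post above.
ssr_cost_ms = (3.0, 4.0)
for target_hz in (30, 60):
    budget_ms = 1000.0 / target_hz
    low, high = (cost / budget_ms for cost in ssr_cost_ms)
    print(f"{target_hz} Hz: {low:.0%}-{high:.0%} of the frame budget")
# 30 Hz: ~9%-12%; 60 Hz: ~18%-24%, hence the guess that it's aimed at 30 Hz titles.
```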

---

WOW. OIT is not available with GCN. Thanks, consoles ;\ When will next gen be coming again?
 

Kezen

Banned
Interesting. The PS4 tests were made at 900p. As the algorithm takes 3-4 ms of GPU time, it's probably not for 60 Hz titles, but 30 Hz ones.
Will Mirror's Edge Catalyst be running at 900p? Or will it have lower-quality reflections?
Sounds very expensive.

WOW. OIT is not available with GCN. Thanks, consoles ;\ When will next gen be coming again?
I admit this elicited a double take. I thought all DX11 GPUs were capable of OIT. Intel has custom DX extensions used in GRID 2.
Didn't TressFX use OIT to increase hair credibility?

Don't you think Johan must be referring to ROVs?

Other interesting presentations about APIs:
http://nextgenapis.realtimerendering.com/
 

KKRT00

Member
I admit this elicited a double take. I thought all DX11 GPUs were capable of OIT. Intel has custom DX extensions used in GRID 2.
Didn't TressFX use OIT to increase hair credibility?

Don't you think Johan must be referring to ROVs?
It's probably possible to program a solution for OIT, but it's not a unified solution like ROVs, which are also hardware-based.
Repi talked a lot about a unified, consistent and future-proof approach, and that's probably only viable via ROVs.
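For context, here is a toy CPU-side sketch of what an OIT resolve fundamentally has to do (collect transparent fragments per pixel, sort by depth, blend back to front), which is what ROVs make cheap and correctly ordered in hardware. This is purely illustrative, not how Frostbite or any real engine implements it:

```python
# Toy sketch of an order-independent transparency resolve for ONE pixel: collect all
# transparent fragments, sort by depth, then blend back to front with the "over" operator.
# Real GPU implementations do this with per-pixel linked lists, k-buffers or ROVs.
def resolve_pixel(fragments, background):
    # fragments: list of (depth, (r, g, b), alpha); background: (r, g, b)
    color = background
    for depth, rgb, alpha in sorted(fragments, key=lambda f: f[0], reverse=True):
        color = tuple(alpha * src + (1.0 - alpha) * dst for src, dst in zip(rgb, color))
    return color

frags = [(0.3, (1.0, 0.0, 0.0), 0.5),   # near red glass
         (0.7, (0.0, 0.0, 1.0), 0.5)]   # far blue glass
print(resolve_pixel(frags, background=(0.0, 0.0, 0.0)))  # -> (0.5, 0.0, 0.25)
```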

---------
Yes, the reflections seem expensive; that would also explain why they were running at 30 Hz in the 60 Hz footage, which was very jarring.
I think reflections sampled at 30 Hz would benefit from the temporal reprojection tech that the LucasArts studio was developing.
It's strange they haven't talked about the character reflections from Mirror's Edge, but there should be another talk related just to the rendering techniques from ME, so I'm looking forward to it. I'm really interested in how they did those reflections.
 
Mirror's Edge to this day looks very pleasing even though everything is baked. Probably why I wasn't that impressed with the remake (not saying it looks bad or anything).
 

KKRT00

Member
So much awesome stuff in here. Damn.

Yep, Frostbite is so awesome.
I hope that DICE collaborates with BioWare on Mass Effect development and that we'll see most of those features included in the game.
I really want EA to go all out with the next Mass Effect; we deserve it after the UE3-based previous games :p
The highest-end version of Frostbite + BioWare's excellent art will be ultra gorgeous.
 

dr_rus

Member
I admit this elicited a double take. I thought all DX11 GPUs were capable of OIT. Intel has custom DX extensions used in GRID 2.
Didn't TressFX use OIT to increase hair credibility?

Don't you think Johan must be referring to ROVs?

Other interesting presentations about APIs:
http://nextgenapis.realtimerendering.com/

I think it's not so much that it's impossible on FL12_0 as that the performance cost is simply too high to use it. I don't think that TressFX even used transparency? They simply anti-alias the hair strands and that's all, no?
 

Javin98

Banned
I'd really rate the Frostbite engine if they could deliver graphics at 1080p on PS4.
Well, there are a few Frostbite games running at 1080p on PS4, but I'm sure you're referring to Battlefield 4 and Star Wars Battlefront. In the former's defense, Battlefield 4 was a launch title and ran at 60 FPS, so it has an excuse for running at 900p, I guess. Not to mention Battlefield 4 is pretty demanding on PC as well. Star Wars Battlefront, on the other hand, looks significantly improved visually and is targeting 60 FPS on consoles as well. I think this is a rare exception where I would give the devs a pass for a 900p game on PS4, just because of how amazing it looks while targeting 60 FPS to boot.

Edit: And I am still waiting for Guerrilla's presentation on the volumetric clouds in Horizon.
 