If anything hinders Scorpio from pulling off native 4K consistently across a wide range of games, it's probably going to be due to having only 32 ROPs. I was expecting 64.
Good article.
Oh no not the fp16 doubling power theory
Well it's not really blowing smoke, it gave DICE a 30% performance improvement.
On paper though, the technical accomplishment here is impressive. VooFoo has quadrupled resolution over the base PS4 version, and it has done this using a GPU that only has 2.3x the compute power of the older hardware and only 25 per cent more memory bandwidth. Either the base PS4 is being significantly underutilised (in which case we would expect an improvement on its 2x MSAA) or there's something more going on behind the scenes. The team joked about 'Mantis magic' before revealing that exploiting enhancements made to the PlayStation 4 Pro GPU have paid dividends.
Of course, we already knew that the Pro graphics core implements a range of new instructions - it was part of the initial leak - but we didn't really know exactly what they could actually do. As we understand it, with the new enhancements, it's possible to complete two 16-bit floating point operations in the time taken to complete one on the base PS4 hardware. The end result from the new Radeon technology is the additional throughput required to making Mantis Burn Racing hit its 4K performance target
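A rough way to sanity-check the double-rate FP16 idea is Amdahl's law: only the share of shader work that can actually run at 16-bit precision gets the 2x rate. The sketch below uses the 4x pixel count and 2.3x compute figures from the article; `fp16_fraction` is a hypothetical parameter, not a number anyone has published, and treating compute as the only bottleneck is an assumption.

```python
def fp16_speedup(fp16_fraction: float, fp16_rate: float = 2.0) -> float:
    """Amdahl-style speedup when only part of the work runs at the faster FP16 rate."""
    if not 0.0 <= fp16_fraction <= 1.0:
        raise ValueError("fp16_fraction must be between 0 and 1")
    return 1.0 / ((1.0 - fp16_fraction) + fp16_fraction / fp16_rate)


def fp16_fraction_needed(target_speedup: float, fp16_rate: float = 2.0) -> float:
    """Invert the formula: share of work that must be FP16 to reach target_speedup."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / fp16_rate)


PIXEL_RATIO = 4.0    # 4K vs 1080p pixel count, from the article
COMPUTE_RATIO = 2.3  # Pro vs base PS4 compute, from the article

# Extra per-pixel speedup needed if compute were the only bottleneck (assumption).
gap = PIXEL_RATIO / COMPUTE_RATIO

if __name__ == "__main__":
    print(f"gap to close: {gap:.2f}x")
    print(f"FP16 share needed: {fp16_fraction_needed(gap):.0%}")
    print(f"speedup at 50% FP16: {fp16_speedup(0.5):.2f}x")
```

Under those assumptions about 85% of the shader work would have to run in FP16 to close the gap through compute alone, so FP16 is probably only one part of the explanation.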
This isn't just a matter of DF vs Anand and what MS told them but they actually are publishing these specs on their page so I doubt they would be so confident if it were some bullshit
http://www.xbox.com/en-US/project-scorpio
Didn't they have the 204GB/s for the Xbox One too?
Well that didn't take long. He must want a trip to Redmond too. I'm sure Ars Technica won't be far behind with their blind hot take.
lol, you got me.
Yeah, but it's still a RX 480 with Vega enhancements[...]
Currently not one specific thing was mentioned in this regard, not even FP16.
But why state 326 if it can't be used?
That's a big gap.
They can be used if the game needs a certain format, and this topic applies to all other consoles and hardware as well:
Is this good or not?
In practice it might not matter much, but in theory it's suboptimal.
The L2$ slices are tied to the memory controllers, so for even load distribution you arrange L2$ slices with the same capacity across the memory controllers.
With 6 memory controllers you would expect 512KB L2$ per MC for 3MB in total (or 256KB per MC for 1.5MB), not 2MB.
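To make the arithmetic explicit, here is a small sketch. It assumes one L2$ slice per 64-bit memory controller and power-of-two slice sizes; both are assumptions based on the reasoning above, not confirmed Scorpio details, and the function names are made up for illustration.

```python
KB_PER_MB = 1024


def l2_total_mb(memory_controllers: int, slice_kb: int) -> float:
    """Total L2$ when every memory controller gets an equally sized slice."""
    return memory_controllers * slice_kb / KB_PER_MB


def even_slice_kb(memory_controllers: int, total_mb: float) -> float:
    """Slice size needed to spread total_mb evenly over the controllers."""
    return total_mb * KB_PER_MB / memory_controllers


def is_power_of_two_kb(size_kb: float) -> bool:
    """True if size_kb is a whole power-of-two number of KB."""
    whole = int(size_kb)
    return size_kb == whole and whole > 0 and (whole & (whole - 1)) == 0


if __name__ == "__main__":
    print(l2_total_mb(6, 512))       # six 512KB slices
    print(l2_total_mb(6, 256))       # six 256KB slices
    per_slice = even_slice_kb(6, 2.0)
    print(per_slice, is_power_of_two_kb(per_slice))
```

Splitting 2MB evenly over 6 controllers gives about 341KB per slice, which is not a power-of-two size, hence the surprise.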
Another thing which makes this interesting is Vega.
Currently the ROPs are the clients of the memory controllers but on GCN they have an extra interconnect which makes it possible to scale them independently of the number of memory controllers.
Vega will change this: with Vega the ROPs will be clients of the L2$, and for equal load balancing the ROPs will be partitioned to match the L2$ slices, and the L2$ slices in the same way to the MCs.
https://www.extremetech.com/wp-content/uploads/2017/01/RenderL2.jpg
Scorpio must have 6 L2$ slices, which would mean 48 ROPs (4 per slice) if the ROPs were tied to the L2$, but it has 32 ROPs, which means that Scorpio probably uses the current GCN back-end design up to Polaris and not Vega's.
Although MS could make something like this:
4 ROPs for 4 256KB L2$ slices to 4 memory controllers
8 ROPs for 2 512KB L2$ slices to 2 memory controllers
You would get 32 ROPs, 2MB L2$ and a 384-Bit wide bus.
But I wouldn't bet on this scenario.
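The mixed layout described above can be checked with a few lines. This is only a sketch of that hypothetical arrangement: the `SliceGroup` type and the 64-bit width per memory controller are assumptions for illustration, not a known Scorpio configuration.

```python
from dataclasses import dataclass

MC_WIDTH_BITS = 64  # assumption: one 64-bit memory controller per L2$ slice


@dataclass(frozen=True)
class SliceGroup:
    slices: int          # number of L2$ slices (and memory controllers)
    rops_per_slice: int  # ROPs attached to each slice
    slice_kb: int        # capacity of each slice in KB


def summarize(groups):
    """Return (total ROPs, total L2$ in MB, bus width in bits) for a layout."""
    rops = sum(g.slices * g.rops_per_slice for g in groups)
    l2_mb = sum(g.slices * g.slice_kb for g in groups) / 1024
    bus_bits = sum(g.slices for g in groups) * MC_WIDTH_BITS
    return rops, l2_mb, bus_bits


# 4 slices of 256KB with 4 ROPs each, plus 2 slices of 512KB with 8 ROPs each.
mixed_layout = [SliceGroup(4, 4, 256), SliceGroup(2, 8, 512)]

if __name__ == "__main__":
    print(summarize(mixed_layout))
```

Both groups end up with the same ROP-to-cache ratio (one ROP per 64KB), which is what would keep such an asymmetric layout balanced on paper.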
http://www.eurogamer.net/articles/digitalfoundry-2017-project-scorpio-tech-revealed
"As you can see, we doubled the amount of shader engines. That has the effect of improvement of boosting our triangle and vertex rate by 2.7x when you include the clock boost as well. We doubled the number of render back-ends, which has the effect of increasing our fill-rate by 2.7x. We quadrupled the GPU L2 cache size, again for targeting the 4K performance."
Wait, I thought Scorpio had 40 ROPs?
It does. I don't know where people are getting 32.
Compute Units are not ROPs.
And for those people who asked if 32 ROPs are confirmed:
I think my satire detection is broken now: is this the new "secret sauce"?
I got no issues with Scorpio. Xbox? Maybe. But that's cause I'm a long time Xbox fan from the OG days so I'm bound to have issues.
Here though, I was just making an observation on the prediction/posts.
They're guessing. They don't even have the full detailed specs for the chip
They're going off of a GPU that had a similar setup, it's not guessing. FWIW AMD dumped that design (decoupling ROPs from memory controllers) in their next architecture with Hawaii. It hasn't re-appeared in Fiji or Polaris either.
This article is just second hand analysis from the same DF article. Nothing but guesses. Was hoping they had detailed info on the actual chip and not trying to guess what's inside
They are taking the same information presented by DF and breaking it down without any superlatives or buzzwords. For the unknowns, they have raised the questions.
Wait, I thought Scorpio had 40 ROPs?
Also, PS4Pro using Vega enhancements while Scorpio is just a Polaris part (paraphrasing from the article)? Hmmm, I'm skeptical just a tad. Anandtech is very thorough in their observations and hardware breakdowns, but I dunno...skeptical.
The situation is more complicated, as the same bandwidth will be shared with the CPU cores in the case of Scorpio. The Anandtech article is asking the right questions.
Why? It is clear MS's goal was 6TF in the smallest package possible. So instead of making the die larger with Vega enhancements they went mean and lean. Also, keeping it smaller helps with thermals. It'll be interesting to see how this works out: 4.2TF with more advanced tech vs 6TF of older tech.
That doesn't confirm anything technically as it says double the rendering backends increased fill-rate by 2.7x. I'm thinking we should be taking that statement as >= 2x XBox One ROPs and not = 2x XBox One ROPs although the same statement is used for shaders. However, we got an official published number for the latter but not the former.
They literally said they doubled the render backend, coupled with the increased clock speeds of the ROPs you get ~2.7x more pixel fillrate.
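A quick check of the 2.7x figure, as a sketch. The Xbox One numbers (16 ROPs at 853MHz) and the 1172MHz Scorpio GPU clock are taken from the publicly listed specs and are not stated in this thread, so treat them as assumptions here.

```python
XBOX_ONE_ROPS = 16           # assumption: public Xbox One spec
XBOX_ONE_CLOCK_MHZ = 853.0   # assumption: public Xbox One spec
SCORPIO_CLOCK_MHZ = 1172.0   # assumption: clock given in the DF reveal


def pixel_fill_rate_gpix(rops: int, clock_mhz: float) -> float:
    """Peak pixel fill rate in Gpixels/s, at one pixel per ROP per clock."""
    return rops * clock_mhz / 1000.0


def fill_rate_ratio(scorpio_rops: int) -> float:
    """Scorpio peak fill rate relative to Xbox One for a given ROP count."""
    base = pixel_fill_rate_gpix(XBOX_ONE_ROPS, XBOX_ONE_CLOCK_MHZ)
    return pixel_fill_rate_gpix(scorpio_rops, SCORPIO_CLOCK_MHZ) / base


if __name__ == "__main__":
    for rops in (32, 40, 48, 64):
        print(rops, f"{fill_rate_ratio(rops):.2f}x")
```

With those inputs only 32 ROPs lands on roughly 2.7x; 48 or 64 would give a much larger multiplier than the one quoted.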
Who says what pro has isn't in Scorpio already?
Because MS has kept quiet on that front. If it had those advanced features we would have heard about them in the DF article. Instead it was a generic 60 custom features. Makes sense honestly, they hit their goals. The question now is whether designing around the 6TF number was worth it.
But we need actual chip design analysis to see what exactly has been done. It's good to replicate something from a desktop gpu but let's wait till everyone has their hands on actual chips to evaluate it. Right now it's still not a hard definitive on what's on the chip. Even says in the article mostly speculating
And you're the same person who's pushing the 1070 based off of one game. Mkay.
I dont understand?
GAF told me the scorpio was more powerful than a gtx 1070?
They're going off of a GPU that had a similar setup, it's not guessing. FWIW AMD dumped that design (decoupling ROPs from memory controllers) in their next architecture with Hawaii. It hasn't re-appeared in Fiji or Polaris either.
[...]
AMD used this design again with Tonga (GCN Gen 3):
It sounds like a lot of this extra memory bandwidth would only possibly be taken advantage of by first party developers.
No multiplat is going to go out of their way to code to the metal just for Scorpio.
Currently not one specific thing was mentioned in this regard, not even FP16.
They wouldn't state that memory speed if they can't use it. I hope this gets clarified.
It's no surprise that even the earliest leaks always pointed to an overclocked RX 480, and it still holds true today. I don't know why people expected Sony and MS to bump up specs to godly levels last minute. These consoles need to be cost efficient.
It's not overclocked. It's clocked lower actually. Just has 4 more CUs.
"For 4K assets, textures get larger and render targets get larger as well. This means a couple of things - you need more space, you need more bandwidth. The question, though, was how much?" asks Nick Baker, Distinguished Engineer, Silicon. "We'd hate to build this GPU and then end up having to be memory-starved. So all the analysis that Andrew was talking about, we were able to look at the effect of different memory bandwidths, and it quickly led us to needing more than 300GB/s memory bandwidth. So in the end we ended up choosing 326GB/s. On Scorpio we are using a 384-bit GDDR5 interface - that is 12 channels. Each channel is 32 bits."
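The quoted numbers can be turned into a per-pin data rate with simple arithmetic. A minimal sketch: the 326GB/s figure and the 12 x 32-bit channels come from the quote, while the per-pin rate is derived here rather than stated, and the 256-bit comparison is a hypothetical configuration.

```python
CHANNELS = 12        # from the quote
CHANNEL_BITS = 32    # from the quote
BUS_BITS = CHANNELS * CHANNEL_BITS


def bandwidth_gb_s(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak bandwidth in GB/s for a bus width and per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


def implied_pin_rate_gbps(bandwidth: float, bus_bits: int) -> float:
    """Per-pin data rate (Gbps) implied by a peak bandwidth figure in GB/s."""
    return bandwidth * 8 / bus_bits


pin_rate = implied_pin_rate_gbps(326.0, BUS_BITS)

if __name__ == "__main__":
    print(f"bus width: {BUS_BITS} bits")
    print(f"implied data rate: {pin_rate:.2f} Gbps per pin")
    print(f"per channel: {326.0 / CHANNELS:.1f} GB/s")
    # Hypothetical: the same memory on a 256-bit (8 channel) bus.
    print(f"256-bit at the same rate: {bandwidth_gb_s(256, pin_rate):.0f} GB/s")
```

At the same data rate a 256-bit bus would come out around 217GB/s, well short of the 300GB/s target mentioned in the quote.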
AMD used this design again with Tonga (GCN Gen 3):
http://neogaf.com/forum/showpost.php?p=233597567&postcount=107
AMD even built a card with the fully enabled chip but it never entered mass production:
http://www.overclock.net/t/1583638/amd-r9-285-380-380x-tonga-tonga-xt-owners-discussion-thread/440#post_25633363
+ his GPU-Z validation:
https://www.techpowerup.com/gpuz/details/anr8p
It always was a 256b chip, wasn't it? Even though built as 384b chip.
30% for a part of the checkerboarding, not total rendering time
It always was a 256b chip, wasn't it? Even though built as 384b chip.
It was only sold as a 256-Bit product but there are 6 MCs and 32 ROPs just like on Tahiti (GCN Gen 1).
Tonga's die-shot:
https://www.3dnews.ru/assets/external/illustrations/2015/06/08/915323/tonga-crys800.jpg
Yes, I think AMD also said something to the effect that there wasn't a payoff to releasing it as a 384b design.
I remember Tahiti had some issues hitting its peak memory bandwidth, it was like ~70% at most compared to the 80%+ of other GPUs. I can't find those results unfortunately since that chip is so old now.
They didn't see it in action either. So no benchmarks. I'm trusting DF's take on it for now.
lol yeah, this is exactly what DF predicted in one of their videos would have happened if the specs had just leaked the regular way and been analyzed on paper with no inside knowledge. DF might still hype certain things up a bit too much though.
It was only sold as a 256-Bit product but there are 6 MCs and 32 ROPs just like on Tahiti (GCN Gen 1).
So in the end, do you think Anandtech is right thinking Scorpio will struggle to use the full bandwidth in many cases?
It should not even be possible to fully max it with 32 ROPs.
I am starting to think Scorpio was initially designed as an 8-chip x 1GB GDDR5 machine that got bumped up to 12 chips at the last minute, because 32 ROPs seems clearly best suited to 8.
Well, the rendering from last June suggested 12 GB from the beginning, so I'm confused
Well maybe not that last min then.
But like the article said, we want the total width of the ROPs and the memory bus to be 1:1 for full utilization. 8 chips would have been 1:1.
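The 1:1 argument can be put in numbers. This sketch assumes four ROPs per partition (render back-end), as on Tahiti, and one 32-bit channel per GDDR5 chip; neither is confirmed for Scorpio, and the function name is made up for illustration.

```python
from fractions import Fraction

ROPS_PER_PARTITION = 4  # assumption: same grouping as Tahiti's render back-ends
CHANNELS_PER_CHIP = 1   # assumption: one 32-bit channel per GDDR5 chip


def partitions_per_channel(rops: int, gddr5_chips: int) -> Fraction:
    """Ratio of ROP partitions to 32-bit memory channels."""
    partitions = rops // ROPS_PER_PARTITION
    channels = gddr5_chips * CHANNELS_PER_CHIP
    return Fraction(partitions, channels)


if __name__ == "__main__":
    print(partitions_per_channel(32, 8))   # 8 chips, 256-bit bus
    print(partitions_per_channel(32, 12))  # 12 chips, 384-bit bus
```

With 8 chips the ratio is exactly 1, while 12 chips gives 2/3, so some form of crossbar between the ROP partitions and the channels would be needed to spread the load.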
[...]
I remember Tahiti had some issues hitting its peak memory bandwidth, it was like ~70% at most compared to the 80%+ of other GPUs. I can't find those results unfortunately since that chip is so old now.
I will throw two things in here:
http://techreport.com/review/22192/amd-radeon-hd-7970-graphics-processor/4
ROP rates, of course, include the pixel fill rate and, more crucially these days, the amount of blending power for multisampled antialiasing. The 7970 is barely faster than the 6970 on this front because it sports the same basic mix of hardware: eight ROP partitions, each capable of outputting four colored pixels or 16 Z/stencil pixels per clock. Rather than increasing the hardware counts here, AMD decided on a reorganization. In previous designs, two ROP partitions (or render back-ends) were associated with each memory controller, but AMD claims the memory controllers were "oversubscribed" in that setup, leaving the ROPs twiddling their thumbs at times. Tahiti's ROPs are no longer associated with a specific memory controller. Instead, the chip has a crossbar allowing direct, switched communication between each ROP partition and each memory controller. (The ROPs are not L2 cache clients, incidentally.) With this increased flexibility and the addition of two more memory controllers, AMD claims Tahiti's ROPs should achieve up to 50% higher utilization and thus efficiency. Higher efficiency is a good thing, but the big question is whether Tahiti's relatively low maximum ROP rates will be a limiting factor, even if the chip does approach its full potential more frequently.
So in the end, do you think Anandtech is right thinking Scorpio will struggle to use the full bandwidth in many cases?
Not in the way Anandtech speculates because the interconnection is solid and then you are simply ROP-Bound or BW-Bound.
Very interesting comments both here and in a couple of the other threads. Devs chiming in about not only Jaguar, but potential GPU limitations as well.
Once the dust settles I look forward to more in-depth analysis.
What I can't quite grasp is how Scorpio will render games at native 4k with this hardware. I mean the same DF has said in the past that a GTX 1080 is the absolute minimum for native 4k gaming and I quote:
http://www.eurogamer.net/articles/digitalfoundry-2016-4k-gaming-is-finally-viable-and-its-stunning
But somehow, they didn't even argue about Scorpio's 4k capabilities.
That is 4K 60fps Ultra, clearly stated in the article.