
Anandtech breaks down Scorpio specs + predictions

AmyS

Member
If anything hinders Scorpio from pulling off native 4K consistently across a wide range of games, it's probably going to be due to having only 32 ROPs. I was expecting 64.

Good article.
 

Leyasu

Banned
If anything hinders Scorpio from pulling off native 4K consistently across a wide range of games, it's probably going to be due to having only 32 ROPs. I was expecting 64.

Good article.

Yeah, 32 ROPs seems a little light. Has this been confirmed by MS?
 

timlot

Banned
Well that didn't take long. He must want a trip to Redmond too. I'm sure Ars Technica won't be far behind with their blind hot take.
 

STEaMkb

Member
Oh no not the fp16 doubling power theory

Well it's not really blowing smoke, it gave DICE a 30% performance improvement.

VooFoo Studios also:

On paper though, the technical accomplishment here is impressive. VooFoo has quadrupled resolution over the base PS4 version, and it has done this using a GPU that only has 2.3x the compute power of the older hardware and only 25 per cent more memory bandwidth. Either the base PS4 is being significantly underutilised (in which case we would expect an improvement on its 2x MSAA) or there's something more going on behind the scenes. The team joked about 'Mantis magic' before revealing that exploiting enhancements made to the PlayStation 4 Pro GPU have paid dividends.

Of course, we already knew that the Pro graphics core implements a range of new instructions - it was part of the initial leak - but we didn't really know exactly what they could actually do. As we understand it, with the new enhancements, it's possible to complete two 16-bit floating point operations in the time taken to complete one on the base PS4 hardware. The end result from the new Radeon technology is the additional throughput required to making Mantis Burn Racing hit its 4K performance target

It has limited applicability though so expectations must be tempered.
 

wachie

Member

Locuza

Member
Yeah, but it's still a RX 480 with Vega enhancements[...]
Currently not one specific thing was mentioned in this regard, not even FP16.

But why state 326 if it can't be used?

That's a big gap.
They can be used if the game needs a certain format and this topic applies to all other consoles and hardware as well:
3175612-0982097327-lowle.jpg


For the Scorpio:
1172 MHz x 32 ROPs x 4 Bytes = 150 GB/s
1172 MHz x 32 ROPs x 8 Bytes = 300 GB/s
1172 MHz x 32 ROPs x 16 Bytes = 600 GB/s

The CPU also needs a few GB/s and then you lose some because of memory contention and other inefficiencies.
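
For anyone who wants to check the three figures above, here is a minimal Python sketch. It assumes, as the post does, one pixel written per ROP per clock regardless of format; real hardware may not sustain full rate for every format, and the function name is just illustrative.

```python
# Peak ROP write traffic per pixel format, using the figures from the post.
ROP_CLOCK_MHZ = 1172   # Scorpio GPU clock quoted in the thread
ROP_COUNT = 32         # ROP count under discussion


def rop_write_rate_gbs(bytes_per_pixel, rops=ROP_COUNT, clock_mhz=ROP_CLOCK_MHZ):
    """Peak colour write rate in GB/s, assuming one pixel per ROP per clock."""
    return clock_mhz * 1e6 * rops * bytes_per_pixel / 1e9


rates = {bpp: rop_write_rate_gbs(bpp) for bpp in (4, 8, 16)}
for bpp, rate in rates.items():
    print(f"{bpp:2d} bytes/pixel: {rate:.1f} GB/s")
```

The results land at roughly 150, 300 and 600 GB/s, so only the widest formats could ask for more than the memory system offers.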

And Tahiti wasn't the last chip which used 384-Bit and 32 ROPs, there was also Tonga.
Tonga-XT-full-architecture.png


I wouldn't be too surprised about 32 ROPs being tied to a 384-Bit wide bus, but 2048 KB L2$ on it:
Is this good or not ?
In practice it might not matter much, but in theory it's suboptimal.
The L2$ slices are tied to the memory controllers, so for even load distribution you arrange L2$ slices with the same capacity across the memory controllers.
With 6 memory controllers you would expect 512 KB of L2$ per MC for 3 MB in total (or 256 KB per MC for 1.5 MB), not 2 MB.
Cache%20Hierarchy.png


Another thing which makes this interesting is Vega.
Currently the ROPs are clients of the memory controllers, but on GCN they have an extra interconnect which makes it possible to scale them independently of the number of memory controllers.
Vega will change this: the ROPs will be clients of the L2$, and for equal load balancing the ROPs will be partitioned across the L2$ slices in the same way the L2$ is partitioned across the memory controllers.
https://www.extremetech.com/wp-content/uploads/2017/01/RenderL2.jpg

Scorpio must have 6 L2$ slices, which would mean 48 ROPs (4 per slice) if the ROPs were tied to the L2$, but it has 32 ROPs, which means that Scorpio probably uses the current GCN backend design used up to Polaris and not Vega's.

Although MS could make something like this:
4 ROPs for 4 256KB L2$ slices to 4 memory controllers
8 ROPs for 2 512KB L2$ slices to 2 memory controllers

You would get 32 ROPs, 2MB L2$ and a 384-Bit wide bus.
But I wouldn't bet on this scenario.
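
A quick sanity check of that hypothetical mixed layout, as a rough Python sketch. It assumes each memory controller is 64 bits wide (384 bits spread over 6 controllers); the tuple layout and function name are made up for illustration and say nothing about what MS actually built.

```python
# Each group: (controllers, ROPs per controller, L2 KB per controller).
MC_WIDTH_BITS = 384 // 6   # assumption: 64 bits per memory controller


def layout_totals(groups):
    """Sum ROPs, L2 capacity (KB) and bus width (bits) over all groups."""
    controllers = sum(n for n, _, _ in groups)
    rops = sum(n * r for n, r, _ in groups)
    l2_kb = sum(n * kb for n, _, kb in groups)
    return rops, l2_kb, controllers * MC_WIDTH_BITS


# 4 controllers with 4 ROPs + 256 KB each, 2 controllers with 8 ROPs + 512 KB each
mixed = [(4, 4, 256), (2, 8, 512)]
rops, l2_kb, bus_bits = layout_totals(mixed)
print(rops, l2_kb, bus_bits)  # 32 2048 384
```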

And for those people who asked if 32 ROPs are confirmed:
"As you can see, we doubled the amount of shader engines. That has the effect of improvement of boosting our triangle and vertex rate by 2.7x when you include the clock boost as well. We doubled the number of render back-ends, which has the effect of increasing our fill-rate by 2.7x. We quadrupled the GPU L2 cache size, again for targeting the 4K performance."
http://www.eurogamer.net/articles/digitalfoundry-2017-project-scorpio-tech-revealed
 
Wait, I thought Scorpio had 40 ROPs?

Also, PS4Pro using Vega enhancements while Scorpio is just a Polaris part (paraphrasing from the article)? Hmmm, I'm skeptical just a tad. Anandtech is very thorough in their observations and hardware breakdowns, but I dunno...skeptical.
 
And for those people who asked if 32 ROPs are confirmed:

That doesn't confirm anything technically as it says double the rendering backends increased fill-rate by 2.7x. I'm thinking we should be taking that statement as >= 2x XBox One ROPs and not = 2x XBox One ROPs although the same statement is used for shaders. However, we got an official published number for the latter but not the former.
 

Space_nut

Member
This article is just second hand analysis from the same DF article. Nothing but guesses. Was hoping they had detailed info on the actual chip and not trying to guess what's inside :(
 

wachie

Member
They're guessing. They don't even have the full detailed specs for the chip
They're going off of a GPU that had a similar setup, it's not guessing. FWIW AMD dumped that design (decoupling ROPs from memory controllers) in their next architecture with Hawaii. It hasn't reappeared in Fiji or Polaris either.

The situation is more complicated as the same bandwidth will be shared with the CPU cores in the case of Scorpio. Anandtech article is asking the right questions.

This article is just second hand analysis from the same DF article. Nothing but guesses. Was hoping they had detailed info on the actual chip and not trying to guess what's inside :(
They are taking the same information presented by DF and breaking it down without any superlatives or buzzwords. For the unknowns, they have raised the questions.
 

quest

Not Banned from OT
Wait, I thought Scorpio had 40 ROPs?

Also, PS4Pro using Vega enhancements while Scorpio is just a Polaris part (paraphrasing from the article)? Hmmm, I'm skeptical just a tad. Anandtech is very thorough in their observations and hardware breakdowns, but I dunno...skeptical.

Why? It is clear MS's goal was 6 TF in the smallest package possible. So instead of making the die larger with Vega enhancements they went mean and lean. Keeping it smaller also helps with thermals. It will be interesting to see how this works out: 4.2 TF with more advanced tech vs 6 TF of older tech.
 

Space_nut

Member
They're going off of a GPU that had a similar setup, it's not guessing. FWIW AMD dumped that design (decoupling ROPs from memory controllers) in their next architecture with Hawaii. It hasn't reappeared in Fiji or Polaris either.

The situation is more complicated as the same bandwidth will be shared with the CPU cores in the case of Scorpio. Anandtech article is asking the right questions.

But we need actual chip design analysis to see what exactly has been done. It's fine to extrapolate from a desktop GPU, but let's wait till everyone has their hands on actual chips to evaluate it. Right now nothing is definitive about what's on the chip. The article even says it is mostly speculating.
 

Space_nut

Member
Why? It is clear MS's goal was 6 TF in the smallest package possible. So instead of making the die larger with Vega enhancements they went mean and lean. Keeping it smaller also helps with thermals. It will be interesting to see how this works out: 4.2 TF with more advanced tech vs 6 TF of older tech.

Who says what pro has isn't in Scorpio already?
 

Locuza

Member
That doesn't confirm anything technically as it says double the rendering backends increased fill-rate by 2.7x. I'm thinking we should be taking that statement as >= 2x XBox One ROPs and not = 2x XBox One ROPs although the same statement is used for shaders. However, we got an official published number for the latter but not the former.
They literally said they doubled the render backend, coupled with the increased clock speeds of the ROPs you get ~2.7x more pixel fillrate.

16 ROPs x 853 MHz = 13,648 MPix/s
32 ROPs x 1172 MHz = 37,504 MPix/s (~2.7x)
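
The same fill-rate arithmetic as a small Python sketch, assuming one pixel per ROP per clock (the function name is only illustrative):

```python
def pixel_fill_rate_mpix(rops, clock_mhz):
    """Peak pixel fill rate in MPix/s, assuming one pixel per ROP per clock."""
    return rops * clock_mhz


xbox_one = pixel_fill_rate_mpix(16, 853)   # 13648
scorpio = pixel_fill_rate_mpix(32, 1172)   # 37504
ratio = scorpio / xbox_one
print(xbox_one, scorpio, round(ratio, 2))  # 13648 37504 2.75
```

Doubling the ROPs gives 2x and the clock bump from 853 to 1172 MHz adds about 1.37x on top, which is where the quoted 2.7x comes from.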
 

quest

Not Banned from OT
Who says what pro has isn't in Scorpio already?

Because MS has kept quiet on that front. If it had those advanced features we would have heard about them in the DF article. Instead it was a generic "60 custom features". Makes sense, honestly: they hit their goals. The question now is whether designing around the 6 TF number was worth it.
 

Space_nut

Member
Because MS has kept quiet on that front. If it had those advanced features we would have heard about them in the DF article. Instead it was a generic "60 custom features". Makes sense, honestly: they hit their goals. The question now is whether designing around the 6 TF number was worth it.

I'm sure DF will be releasing more info on the 60 custom features. MS made all upgrades to Polaris tech before they built the chip to handle any previous bottlenecks. I'm sure they have the same tech as pro plus a few more. Whatever DF wrote isn't all there is. Let's wait till we get the full details on the chip design
 

wachie

Member
But we need actual chip design analysis to see what exactly has been done. It's fine to extrapolate from a desktop GPU, but let's wait till everyone has their hands on actual chips to evaluate it. Right now nothing is definitive about what's on the chip. The article even says it is mostly speculating.
And you're the same person who's pushing the 1070 based off of one game. Mkay.
 

Space_nut

Member
And you're the same person who's pushing the 1070 based off of one game. Mkay.

DF got hands-on with the tech. What they wrote in the initial article wasn't everything. The article you posted is basing its info on what DF disclosed. Again, if they had design docs and first-hand access then fine. I just see an article based on info from a DF article that didn't go into full details on the design.
 

Locuza

Member
They're going off of a GPU that had a similar setup, it's not guessing. FWIW AMD dumped that design (decoupling ROPs from memory controllers) in their next architecture with Hawaii. It hasn't reappeared in Fiji or Polaris either.
[...]
AMD used this design again with Tonga (GCN Gen 3):
http://neogaf.com/forum/showpost.php?p=233597567&postcount=107

AMD even built a card with the fully enabled chip but it never entered mass production:
http://www.overclock.net/t/1583638/amd-r9-285-380-380x-tonga-tonga-xt-owners-discussion-thread/440#post_25633363

+ his GPU-Z validation:
https://www.techpowerup.com/gpuz/details/anr8p
 
It's no surprise that even the far back leaks always lead to an overclocked RX 480 and it still holds true today. I don't know why people expected Sony and MS to bump up specs to godly levels last minute. These consoles need to be cost efficient.

It's not overclocked. It's clocked lower actually. Just has 4 more CUs.
 

timlot

Banned
They wouldn't state that memory speed if they can't use it. I hope this gets clarified.

Don't think there is much to clarify...
"For 4K assets, textures get larger and render targets get larger as well. This means a couple of things - you need more space, you need more bandwidth. The question, though, was how much?" asks Nick Baker, Distinguished Engineer, Silicon. "We'd hate to build this GPU and then end up having to be memory-starved. So all the analysis that Andrew was talking about, we were able to look at the effect of different memory bandwidths, and it quickly led us to needing more than 300GB/s memory bandwidth. So in the end we ended up choosing 326GB/s. On Scorpio we are using a 384-bit GDDR5 interface - that is 12 channels. Each channel is 32 bits."
 
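
As a rough check on the numbers in that quote, here is a short Python sketch. The 6.8 Gbps per-pin data rate is an assumption on my part (it is not stated in the quote); the channel count and width are from Nick Baker's statement.

```python
CHANNELS = 12                   # from the quote
BITS_PER_CHANNEL = 32           # from the quote
DATA_RATE_GBPS_PER_PIN = 6.8    # assumption: effective GDDR5 rate, not in the quote

bus_width_bits = CHANNELS * BITS_PER_CHANNEL                  # 384
bandwidth_gbs = bus_width_bits / 8 * DATA_RATE_GBPS_PER_PIN   # bytes per transfer x rate
print(bus_width_bits, round(bandwidth_gbs, 1))                # 384 326.4
```

That is the theoretical peak; the CPU's share and memory contention come out of it.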

wachie

Member

It was only sold as a 256-Bit product but there are 6 MCs and 32 ROPs just like on Tahiti (GCN Gen 1).

Tonga's die-shot:
https://www.3dnews.ru/assets/external/illustrations/2015/06/08/915323/tonga-crys800.jpg
Yes, I think AMD also said something to the effect that there wasn't a payoff in releasing it as a 384-bit design.

I remember Tahiti had some issues hitting its peak memory bandwidth; it was around 70% at most compared to the 80%+ of other GPUs. I can't find those results, unfortunately, since that chip is so old now.
 

Fredrik

Member
They didn't see it in action either. So no benchmarks. I'm trusting DF's take on it for now.
lol yeah this is exactly what DF predicted would've happened in one of their videos if the specs would've just leaked the regular way and analyzed on the paper with no inside knowledge. DF might still hype certain things up a bit too much though.
 
So in the end, do you think Anandtech is right thinking Scorpio will struggle to use the full bandwidth in many cases?

It should not even be possible to fully max it with 32 ROPs.

I am starting to think Scorpio was initially designed as an 8-chip x 1 GB GDDR5 machine that got bumped up to 12 chips at the last minute, because 32 ROPs seems clearly best suited to 8.
 
It should not even be possible to fully max it with 32 ROPs.

I am starting to think Scorpio was initially designed as an 8-chip x 1 GB GDDR5 machine that got bumped up to 12 chips at the last minute, because 32 ROPs seems clearly best suited to 8.

Well, the rendering from last June suggested 12 GB from the beginning, so I'm confused
 

quest

Not Banned from OT
It should not even be possible to fully max it with 32 ROPs.

I am starting to think Scorpio was initially designed as an 8-chip x 1 GB GDDR5 machine that got bumped up to 12 chips at the last minute, because 32 ROPs seems clearly best suited to 8.

No way. The 384-bit bus was there from the initial design. They wanted a ton of bandwidth and more than 8 gigs of RAM. 16 gigs of GDDR5X was never a realistic option.
 
Well, the rendering from last June suggested 12 GB from the beginning, so I'm confused

Well maybe not that last min then.

But like the article said. We want the width of the total ROPs and memory bus to be 1:1 for full utilization. 8 chips would have been 1:1
 

Proelite

Member
Well maybe not that last min then.

But like the article said. We want the width of the total ROPs and memory bus to be 1:1 for full utilization. 8 chips would have been 1:1

Sounds like they should have gone with 16 chips of GDDR5X in clamshell on a 256-bit bus for 16 GB, because I have 20:20 hindsight and know more than Nick Baker.
 

Locuza

Member
[...]
I remember Tahiti had some issues hitting its peak memory bandwidth; it was around 70% at most compared to the 80%+ of other GPUs. I can't find those results, unfortunately, since that chip is so old now.
I will throw two things in here:
ROP rates, of course, include the pixel fill rate and, more crucially these days, the amount of blending power for multisampled antialiasing. The 7970 is barely faster than the 6970 on the this front because it sports the same basic mix of hardware: eight ROP partitions, each capable of outputting four colored pixels or 16 Z/stencil pixels per clock. Rather than increasing the hardware counts here, AMD decided on a reorganization. In previous designs, two ROP partitions (or render back-ends) were associated with each memory controller, but AMD claims the memory controllers were "oversubscribed" in that setup, leaving the ROPs twiddling their thumbs at times. Tahiti's ROPs are no longer associated with a specific memory controller. Instead, the chip has a crossbar allowing direct, switched communication between each ROP partition and each memory controller. (The ROPs are not L2 cache clients, incidentally.) With this increased flexibility and the addition of two more memory controllers, AMD claims Tahiti's ROPs should achieve up to 50% higher utilization and thus efficiency. Higher efficiency is a good thing, but the big question is whether Tahiti's relatively low maximum ROP rates will be a limiting factor, even if the chip does approach its full potential more frequently.
http://techreport.com/review/22192/amd-radeon-hd-7970-graphics-processor/4

And here is a performance table with a fill rate test from 3D Mark Vantage:
3dm-color.gif

http://techreport.com/review/25509/amd-radeon-r9-290x-graphics-card-reviewed/6

Tahiti really pushes much more than Cayman (6970) and Pitcairn, which also have 32 ROPs but only a 256-bit interface.
Hawaii has 64 ROPs (+100%) but only 320 GB/s (+11%) vs. Tahiti's 288 GB/s, which results in only 21% more fill rate, at least in the 3DMark test.

Other tests and formats might show different results but the backend design with disproportionate ROP/MC distribution doesn't look problematic.

So in the end, do you think Anandtech is right thinking Scorpio will struggle to use the full bandwidth in many cases?
Not in the way Anandtech speculates, because the interconnect is solid and then you are simply ROP-bound or BW-bound.
In my earlier post you can see that the ROPs can push 150 GB/s worth of data, 300 GB/s or even 600 GB/s, depending on the format.
I don't know in which proportions games use certain formats, but either way there is no hardware issue or special bottleneck coming from the design.
As I mentioned before, I'm more curious about the L2$ distribution and how it works together with the ROPs.
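
To make the ROP-bound vs. BW-bound point concrete, here is a simplified Python sketch. It only counts colour writes at one pixel per ROP per clock, treats the full 326 GB/s as available to the ROPs (no CPU share, no contention, no blending reads), and the function name is illustrative.

```python
ROPS = 32
CLOCK_GHZ = 1.172
MEM_BW_GBS = 326.0   # simplification: total bandwidth, nothing reserved for the CPU


def bound(bytes_per_pixel):
    """Say which side limits pure colour writes at the given pixel size."""
    rop_demand = ROPS * CLOCK_GHZ * bytes_per_pixel   # GB/s the ROPs could write
    return "ROP-bound" if rop_demand <= MEM_BW_GBS else "BW-bound"


# Pixel size at which ROP output would exactly match memory bandwidth
crossover = MEM_BW_GBS / (ROPS * CLOCK_GHZ)

for bpp in (4, 8, 16):
    print(bpp, bound(bpp))
print(round(crossover, 2))
```

Under these assumptions 4- and 8-byte formats are limited by the ROPs and 16-byte formats by bandwidth, with the crossover a little under 9 bytes per pixel.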
 

belvedere

Junior Butler
Very interesting comments both here and in a couple of the other threads. Devs chiming in about not only Jaguar, but potential GPU limitations as well.

Once the dust settles I look forward to more in-depth analysis.
 

quest

Not Banned from OT
Very interesting comments both here and in a couple of the other threads. Devs chiming in about not only Jaguar, but potential GPU limitations as well.

Once the dust settles I look forward to more in-depth analysis.

I would love that; I just worry it won't happen. If Scorpio has fewer Vega features than the Pro, I can see MS being tight-lipped. They gave DF almost nothing of substance outside the basic specs. This article would explain why.
 
This article, although based on theory, is exactly what I wanted from Digital Foundry.

Instead we got buzzwords and clickbait.

It looks a good class above the PS4 Pro, but chasing native 4K will negate any power difference when Sony implements checkerboarding instead.

Exclusives notwithstanding, of course.

Which in Digital Foundry's own words is "a minimal difference".
 