
AMD's Zen CPUs to feature up to 32 cores and 8-channel DDR4

tokkun

Member
eDRAM definitely has higher latency at low capacities than SRAM, which is why you typically only see it used as an L3/L4 cache (or L2 in a less performance-sensitive case like the Wii U), but IBM's 22nm eDRAM L3 on the POWER8 achieves almost half the latency of Intel's 14nm SRAM L3, which shows the benefit once you get up to multi-megabyte capacities.

It's not really an apples-to-apples comparison, though, since the SRAM cache is set-associative. Associativity trades off latency at smaller working sets to improve effective capacity, and you see the SRAM L3 outperforming the eDRAM at larger sizes in that benchmark.

I think if you really want to draw this sort of conclusion about the technologies, you would need to compare a direct-mapped SRAM with an eDRAM of equal capacity.
 
And I don't understand why this didn't pop up more often in the Zen circle jerks in Scorpio threads, or when people were demanding Zen in Neo. It's a desktop-class CPU; it was never going to magically be 5W TDP. If you want low TDP, you get Core M performance, or maybe ultrabook dual-core CPUs at 15W.
FWIW, Wiki sez they've planned "mobile" Zen CPUs with 2-4 cores and 15W-35W TDP. The 4-core engineering sample is rated at 65W, but only draws 2.5W at 550MHz. So it sounds like dialing the clocks down gives you a disproportionate drop in power draw, but I'm not sure how to figure out what sort of speeds you could get at 30W or whatever.
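A toy model can show why dialing the clocks down drops power so disproportionately: dynamic power scales roughly with V²·f, DVFS lowers voltage along with frequency, and static leakage sets a floor. Every constant below is my own illustrative assumption, not an AMD figure.

```python
def power_watts(f_ghz, v_volts, p_static=2.0, k=11.5):
    """P = P_static + k * V^2 * f; k lumps switched capacitance (assumed)."""
    return p_static + k * v_volts**2 * f_ghz

# Assumed operating points: ~3.0 GHz at 1.35 V vs 550 MHz at 0.7 V
print(round(power_watts(3.0, 1.35), 1))   # ~64.9 -> in a 65W TDP ballpark
print(round(power_watts(0.55, 0.70), 1))  # ~5.1  -> a small fraction of TDP
```

Because both V and f fall together, power at the low clock lands far below a linear extrapolation would suggest.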

Btw, how is it that the 32-core is only a bit more than double the TDP of the 8-core? Maybe Scorpio could take four of those cores? (195W for 32 cores, 24W for 4 cores)
Similarly, I was wondering why the 8-core only draws ~50% more power than the 4-core. Is it actually 50% faster rather than 100% faster? Anyone? Oh, and is the 65W TDP of the 4-core at the base clock, or the boost clock? Is 65W the actual TDP of the ES, or target TDP for the final hardware?
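One plausible reason the scaling looks sub-linear: the uncore (memory controllers, fabric, IO) is a roughly fixed cost, so per-core power is only part of the budget. The watt figures below are assumptions for illustration, not AMD specs.

```python
def tdp_watts(cores, uncore_w=20.0, per_core_w=5.5):
    """Fixed uncore budget plus an assumed per-core cost."""
    return uncore_w + cores * per_core_w

print(tdp_watts(4), tdp_watts(8))  # 42.0 64.0 -> 8 cores only ~52% above 4
```

Under this sketch, doubling cores doubles only the per-core term, so total power grows by well under 2x even if per-core performance is unchanged.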


There is no "mobile Zen" versus "desktop Zen". AMD will use exactly the same core, and likely the same dies, across both desktop and mobile, just like Intel does. The reason mobile CPUs consume so much less power is that (a) they're clocked lower, with lower voltages, and are able to sustain maximum clock speeds for shorter periods of time, and (b) they're binned dies, which means that only the small proportion of chips coming off the production line that meet the power/thermal thresholds go into mobile parts, with the rest going into desktop chips. In a console environment you'd get the benefit of lower clock speeds (although they may end up so low that you defeat the purpose of using high-performance cores), but you're at very much the opposite end of the spectrum when it comes to binning.
Oh, so does that mean the 8-core is twice as fast as the 4-core, but only in short bursts? I don't really understand how or why it would work like that. Seems like you'd have "enough" power for 8 cores, or not. Can you possibly explain what's happening? <3

Even aside from all of this, there's the question of the die size (i.e. cost) of using an octo-core Zen CPU in a console. While we don't have confirmed die sizes for Zen yet, if they're attempting to compete with Intel on performance then the cores can't be a whole lot smaller than Broadwell/Skylake cores, which would put an 8-core Zen CPU somewhere around the 200mm² range at 14nm. Attach a 6 Tflop GPU on there and you have an absolutely monstrous die, probably dwarfing even the GP102 used in the $1200 Pascal Titan X. There's just no way you can squeeze that into a console without taking massive, crippling losses on each unit sold.
We have this shot of Summit Ridge at 14nm…
[Image: zen_summit_ridge_first.jpg]

Can you make some estimates based on the size of other features, like the interfaces or the caches? Speaking of caches, more than a third of each Zen quad is filled with 8MB of L3, but the current consoles have no L3 at all. Would it be possible/likely that they'd remove the L3 from the new consoles, reducing footprint and, I assume, power draw as well?
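One way to frame that kind of estimate is a back-of-envelope area budget. Every per-block size below is my own guess for 14nm, not a confirmed figure for Zen or any console chip:

```python
# Back-of-envelope SoC area at 14nm; all block sizes are assumptions.
zen_core_mm2 = 5.5        # one core + private L2 (guess)
l3_mm2_per_8mb = 16.0     # 8MB L3 slice per quad (guess)
gpu_mm2_per_tflop = 40.0  # loosely extrapolated from Polaris 10 (~45 mm²/TF)
uncore_mm2 = 50.0         # memory interfaces, IO, media blocks (guess)

cpu = 8 * zen_core_mm2 + 2 * l3_mm2_per_8mb   # 8 cores, two L3 slices
gpu = 6 * gpu_mm2_per_tflop                   # a 6 TF GPU
print(cpu, gpu, cpu + gpu + uncore_mm2)       # 76.0 240.0 366.0
```

Under these assumptions the total lands in Xbone territory (363mm²), and dropping the two L3 slices would save 32mm² of that.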
 

DonMigs85

Member
Even in the GPU space, sometimes the smaller chips are less efficient than the big ones. For example, the GTX 960 is basically a 980 chopped in half in terms of specs, but it's rated at 120W TDP versus, I believe, 180W for the 980. So yeah, another ~50% difference.
 
Are you comparing Skylake die size to Zen? Skylake CPUs have GPUs built in already, which take up a lot of space. AMD FX chips based on Zen will not have any GPU on the chip at all.

Skylake 14nm desktop quad-core = 122mm² die, and that includes the GPU, which alone takes up around a third of the die space

[Image: 77a.jpg]


I don't see any reason why AMD won't be able to put Zen + a 6 Tflop GPU together for MS for next year. The CPU cores themselves will be tiny. It's mainly just going to be a big-ass GPU + 8 small cores together.

The Xbone had a 363mm² die size, the PS4 328mm².

Nvidia's 1070/1080 chip is 314mm². That's an 8.2 TF chip at a high 1.6GHz when all cores are enabled.

AMD's 480 chip is 232mm². That's a 5.1 TF chip at 1.12GHz.

Just looking at the information provided, you can pretty much see that it's possible to put a high-powered GPU + 8 Zen CPU cores together and still have a die size you can go into production with at 14/16nm.

It's doable and will be done.
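Those TF figures follow from the standard peak-FP32 formula: 2 FLOPs (a fused multiply-add) per shader per clock.

```python
def peak_tflops(shaders, clock_ghz):
    # 2 FP32 ops per shader per cycle (fused multiply-add)
    return 2 * shaders * clock_ghz / 1000

print(round(peak_tflops(2560, 1.60), 1))  # GTX 1080: 8.2
print(round(peak_tflops(2304, 1.12), 2))  # RX 480 at 1.12 GHz: 5.16
```

The same formula gives the Xbone's 1.31 TF from 768 shaders at 853MHz, so it's an easy sanity check on any rumored console spec.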
 

Proelite

Member
Are you comparing Skylake die size to Zen? Skylake CPUs have GPUs built in already, which take up a lot of space. AMD FX chips based on Zen will not have any GPU on the chip at all.

Skylake 14nm desktop quad-core = 122mm² die, and that includes the GPU, which alone takes up around a third of the die space

[Image: 77a.jpg]


I don't see any reason why AMD won't be able to put Zen + a 6 Tflop GPU together for MS for next year. The CPU cores themselves will be tiny. It's mainly just going to be a big-ass GPU + 8 small cores together.

The Xbone had a 363mm² die size, the PS4 328mm².

Nvidia's 1070/1080 chip is 314mm². That's an 8.2 TF chip at a high 1.6GHz when all cores are enabled.

AMD's 480 chip is 232mm². That's a 5.1 TF chip at 1.12GHz.

Just looking at the information provided, you can pretty much see that it's possible to put a high-powered GPU + 8 Zen CPU cores together and still have a die size you can go into production with at 14/16nm.

It's doable and will be done.

Why does Scorpio need 8 cores? 4 cores with hyperthreading would be more than enough of a jump from the Xb1.

In addition, I was wondering if it's possible to turn the 32MB of ESRAM into a shared L3 cache between CPU and GPU, a la Sandy Bridge. In BC mode the ESRAM would be exclusively the GPU's; otherwise the CPU would make the most use of the cache.
 
I was just assuming they would go for 8 cores based on the Xbone having 8.

There's zero reason to waste die space on ESRAM for Xbone BC, or to use it as an L3 cache, since Zen already has a cache system designed specifically for it.

MS doesn't need hardware to do BC. This has been shown with Xbox BC on the 360 and now 360 BC on the Xbone.


In regards to PC, I'm hoping Zen will bring back some type of competition and put pressure on Intel to release mainstream CPUs with 6-8 cores instead of saving them for those willing to spend.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
It's not really an apples-to-apples comparison, though, since the SRAM cache is set-associative. Associativity trades off latency at smaller working sets to improve effective capacity, and you see the SRAM L3 outperforming the eDRAM at larger sizes in that benchmark.

I think if you really want to draw this sort of conclusion about the technologies, you would need to compare a direct-mapped SRAM with an eDRAM of equal capacity.
I don't know why you think an eDRAM cache must be direct-mapped. POWER8 uses set-associativity up to its L4 cache - the cache on board the Centaur memory controller is 16-way set-associative (and also uses a combination of eDRAM and SRAM, but that's another subject).
 

Agent_4Seven

Tears of Nintendo
In regards to PC, I'm hoping Zen will bring back some type of competition and put pressure on Intel to release mainstream CPUs with 6-8 cores instead of saving them for those willing to spend.
More cores does not mean more performance in games, especially when it comes to Intel CPUs. What is really needed is competition, yes, but also more powerful CPUs from generation to generation, with a lot more than the stupid and barely noticeable (if noticeable at all) 5-10 (15 at best) percent overall performance increase year after year.
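For scale, those yearly increments compound, which is worth putting numbers on:

```python
# Cumulative gain from n years of a fixed yearly performance uplift
def cumulative_gain(yearly_pct, years):
    return (1 + yearly_pct / 100) ** years - 1

print(round(cumulative_gain(5, 5) * 100))   # 28 -> five 5% bumps ≈ +28%
print(round(cumulative_gain(10, 5) * 100))  # 61 -> five 10% bumps ≈ +61%
```

So a whole console generation's worth of 5% steps still adds up to well under a 2x CPU, which is why the year-on-year cadence feels stagnant.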
 

Renekton

Member
Why does Scorpio need 8 core? 4 core with hyperthreading would be more than enough of a jump from Xb1.

In addition, I was wondering if it's possible to turn in 32mb esram into a shared L3 cache between CPU and GPU ala Sandy Bridge. In BC mode the esram will be exclusively GPU, otherwise the CPU will make most use of the cache.
A thread is not nearly a substitute for a physical core.
 

Avtomat

Member
More cores does not mean more performance in games, especially when it comes to Intel CPUs. What is really needed is competition, yes, but also more powerful CPUs from generation to generation, with a lot more than the stupid and barely noticeable (if noticeable at all) 5-10 (15 at best) percent overall performance increase year after year.
I think Intel is copping far too much blame in this barely-noticeable-performance-increase narrative. I mean, the first-gen i7 had a 128-entry reorder window; now we're at 224. Skylake has more execution units, increased throughput, etc. The fact of the matter is we appear to have hit a wall on exactly how much ILP we can extract. When you look at the technical papers, Intel has actually done a lot of stuff post-Sandy Bridge. In games especially, I don't think the CPU is the bottleneck anymore.

IMO the CPU industry needs to properly balance ILP gains with clock speed gains and improvements in the memory subsystem: larger, lower-latency caches, quicker DRAM access...

Maybe even look at specialised blocks on the CPU to accelerate certain tasks.
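The memory-subsystem point can be made concrete with the usual average-memory-access-time (AMAT) formula; the cycle counts below are illustrative, not measurements.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit cost plus expected miss cost."""
    return hit_time + miss_rate * miss_penalty

# Cutting the miss penalty from 200 to 150 cycles (quicker DRAM) helps
# every single memory access, even with only 2% of them missing:
print(amat(4, 0.02, 200))  # 8.0
print(amat(4, 0.02, 150))  # 7.0
```

That ~12% AMAT cut would come essentially for free in IPC terms, which is why memory latency is such a tempting lever once ILP is tapped out.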
 

DonMigs85

Member
Skylake does have dedicated fixed-function blocks that accelerate certain tasks. They give it a significant boost in Google Octane and Kraken over Haswell and Broadwell.
 

Avtomat

Member
I never realised there were gains in Octane; interesting, I wonder what kind of function block can accelerate web browsing. Skylake has an onboard DSP as well - it will be interesting to see what direction Intel goes in the future.

On a side note, I just recalled that in the Dolphin benchmark Skylake absolutely wrecks everything else; that's the sort of benchmark that shows the IPC potential of Intel's newer designs.
 

tokkun

Member
I don't know why you think an eDRAM cache must be direct-mapped. POWER8 uses set-associativity up to its L4 cache - the cache on board the Centaur memory controller is 16-way set-associative (and also uses a combination of eDRAM and SRAM, but that's another subject).

The reference I looked at yesterday suggested it wasn't, but I guess it was incomplete. It seems the L3 is 8-way, from:
https://books.google.com/books?id=o...ache associativity&pg=PP1#v=onepage&q&f=false

Whereas Intel's L3 is 16-20-way based on http://en.wikichip.org/wiki/intel/microarchitectures/broadwell

So it seems like the basic point about comparing caches with different levels of associativity still stands.
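The organizational effect is easy to see even in a toy LRU cache simulator: the same capacity with different associativity produces very different miss counts on a conflicting access pattern. Addresses are block-granular and purely illustrative.

```python
def misses(addresses, num_sets, ways):
    """Count misses for an LRU set-associative cache (toy model)."""
    sets = [[] for _ in range(num_sets)]  # each set holds tags, LRU-ordered
    miss = 0
    for addr in addresses:
        s, tag = addr % num_sets, addr // num_sets
        if tag in sets[s]:
            sets[s].remove(tag)     # hit: will re-append as most recent
        else:
            miss += 1
            if len(sets[s]) == ways:
                sets[s].pop(0)      # evict least recently used
        sets[s].append(tag)
    return miss

# Two blocks that collide direct-mapped but coexist in a 2-way cache
# of the same total capacity (8 lines):
trace = [0, 8, 0, 8, 0, 8]
print(misses(trace, 8, 1))  # direct-mapped, 8 sets: 6 misses (thrashing)
print(misses(trace, 4, 2))  # 2-way, 4 sets: 2 misses (cold only)
```

Same capacity, same "technology", wildly different miss behavior, which is exactly why associativity has to be controlled for before attributing latency differences to SRAM vs eDRAM.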
 

DonMigs85

Member
Some of you guys didn't seem to read the whole article. This is an engineering sample with a base speed of 2.8GHz. If they can get the launch version closer to 4GHz, it should rival the i7-4790.
 

chaosblade

Unconfirmed Member
Apparently the Ashes CPU framerate is not actually measuring the CPU alone. CPU ratings seem to be all over the place.

Here's a 4670K bench with the same GPU, falling slightly below Zen.


That's pretty close if not slightly higher than what I was expecting. Still doesn't necessarily mean anything though.
 

Duxxy3

Member
Interesting. It's going to depend on price though. If it's even with an i5 6500, nobody will give a crap.
 

Inuhanyou

Believes Dragon Quest is a franchise managed by Sony
They would have to clock it pretty low in a console, but even with these benchmarks, an i7 from 2011 clocked at the PS4's 1.6GHz would still be significantly faster than Jaguar.
 

Renekton

Member
The Anandtech thread does seem super weird if, e.g., you look at the 4790 vs the 6700 there. Also, clocks are all over the place - just no consistency.

I guess no need to freak out yet.
 
Too bad it wasn't done clock for clock...

It's easy to estimate clock-for-clock, and if you wanted to truly do clock-for-clock, it's pretty trivial to limit the multiplier on any Intel chip from the last decade to 2.8/3.2GHz and run benches on that Intel chip to get a true clock-for-clock comparison.

I'll leave this as an exercise for you guys.

As with all engineering samples, final clocks will be pending until much closer to launch. Let's all hope that GlobalFoundries' 14FF process has improved since Polaris; otherwise all hope will be lost.
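A minimal version of that estimate: divide a score by clock to get a per-clock figure. The linearity is itself an assumption (memory-bound workloads scale sub-linearly), and the scores here are hypothetical placeholders, not real benchmark results.

```python
def score_per_ghz(score, clock_ghz):
    """Naive per-clock throughput; assumes performance scales linearly with clock."""
    return score / clock_ghz

# Hypothetical scores for illustration only:
zen_es = score_per_ghz(60.0, 2.8)  # engineering sample at 2.8 GHz
intel = score_per_ghz(80.0, 3.6)   # comparison chip at 3.6 GHz
print(round(zen_es, 2), round(intel, 2))  # 21.43 22.22
```

Locking both chips to the same multiplier, as suggested above, is the honest version of this: it removes the linear-scaling assumption entirely.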

Interesting. It's going to depend on price though. If it's even with an i5 6500, nobody will give a crap.

What's ironic is that for all the screaming and kicking about Intel possibly having a monopoly if it had no competitors, the reality is that on the high end Intel has had a de facto monopoly for about a decade now. With the exception of the recent 10-core/20-thread Broadwell-E monster, the pricing of Intel CPUs has not budged during the last decade.

This puts any potential competitor in a huge dilemma, because an i5-6500 costs all of $200 and the i7-4790K costs $340. If that's the best your new CPU can match, you're going to have a hell of a time trying to recoup development costs.

This is the same crisis Polaris faces: the most that anyone could possibly want to pay for an RX 480 is $250, and that's not how you recoup your development costs either.

Zen and Vega desperately need to be able to compete with the $500+ products from Intel and Nvidia, because quite frankly you can't spend years and years and billions of dollars on R&D and then only be able to charge $200-300 for the resulting product. That's an inevitable death for a technology company by slow strangulation of revenue, without which further R&D cannot be funded - which is why Intel and Nvidia always seem to be perpetually 1-2 years ahead and there is no way to ever catch up.
 

chaosblade

Unconfirmed Member
It's easy to estimate clock-for-clock, and if you wanted to truly do clock-for-clock, it's pretty trivial to limit the multiplier on any Intel chip from the last decade to 2.8/3.2GHz and run benches on that Intel chip to get a true clock-for-clock comparison.

I'll leave this as an exercise for you guys.

As with all engineering samples, final clocks will be pending until much closer to launch. Let's all hope that GlobalFoundries' 14FF process has improved since Polaris; otherwise all hope will be lost.

I could have sworn I read they were using TSMC for Zen, but it looks like that was probably just some WCCF bullshit being spread around.

I'm not entirely convinced the issues Polaris has are on GF at this point, though; it seems more like it's just Polaris' design and last-minute decisions by AMD. If that's the case, hopefully GF will be good to go for Zen. Might mean bad things for Vega though.
 
I could have sworn I read they were using TSMC for Zen, but it looks like that was probably just some WCCF bullshit being spread around.

I'm not entirely convinced the issues Polaris has are on GF at this point, though; it seems more like it's just Polaris' design and last-minute decisions by AMD. If that's the case, hopefully GF will be good to go for Zen.

Zen is going to be a GloFo product. TSMC doesn't have nearly the capacity to make large x86 dies. Furthermore, TSMC is basically running at 100% fab capacity 24/7, 365, forever as it is, since they supply Apple, Qualcomm, Nvidia, MediaTek, and pretty much everyone else who is fabless and needs something made on 16FF.
 

rrs

Member
The benchmark seems like a shadow-covered tease of the bad guy from a bad '80s ninja film at this point. Also, it seems like Zen can really crank down the clocks if it can do ~500MHz on desktop; could be fun to undervolt.
 

Durante

Member
Based on that benchmark, and adjusting for clock/cores, IPC is really disappointing. Hope it's not true, but it probably is.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
The reference I looked at yesterday suggested it wasn't, but I guess it was incomplete. It seems the L3 is 8-way, from:
https://books.google.com/books?id=o...ache associativity&pg=PP1#v=onepage&q&f=false

Whereas Intel's L3 is 16-20-way based on http://en.wikichip.org/wiki/intel/microarchitectures/broadwell

So it seems like the basic point about comparing caches with different levels of associativity still stands.
OK, so how do you compare an 8-way L3 + 16-way L4 on one hand to a 16-20-way L3 on the other?
 

tokkun

Member
OK, so how do you compare an 8-way L3 + 16-way L4 on one hand to a 16-20-way L3 on the other?

Comparing those specific configurations using the benchmark and metrics is fine. But recognize that when you have multiple variables that you know are capable of impacting the outcome, it limits how much you can generalize the conclusions about any one variable. In this case, the cache organization (capacity, number, associativity) and the underlying technology (SRAM, eDRAM) varied. My complaint was that you can't take the data and make a broad statement solely about the technology, because it is difficult to tease out how much of the performance effect is attributed to technology vs organization.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Comparing those specific configurations using the benchmark and metrics is fine. But recognize that when you have multiple variables that you know are capable of impacting the outcome, it limits how much you can generalize the conclusions about any one variable. In this case, the cache organization (capacity, number, associativity) and the underlying technology (SRAM, eDRAM) varied. My complaint was that you can't take the data and make a broad statement solely about the technology, because it is difficult to tease out how much of the performance effect is attributed to technology vs organization.
But the apples-to-apples comparison you've been after is in favor of eDRAM from one size upwards, thanks to improved density and, ergo, improved wire latencies. That's not something new, and set associativity does not change it - it's been known for years. Here's an IBM presentation that discusses the subject at length: https://www.src.org/calendar/e003676/barth.pdf

And here's a paper co-authored by Intel, where you can see how an LLC made of the largest Gain Cell eDRAM (cell size 2x-3x larger than traditional 1T1C eDRAM; no explanation given for why they did not consider 1T1C) can still have practically identical performance to SRAM: http://terpconnect.umd.edu/~blj/papers/hpca2013.pdf - p. 6, Chapter 5 "Modeling and Methodology", Table 2 and onwards.

BTW, a short quote from the above (bolding mine):

Chang et al said:
As shown in Table 2, since the interconnections play a dominant role in access time and access energy for high capacity caches, the STT-RAM and the eDRAM caches have shorter read latencies and lower read energies compared
to the SRAM cache. This is due to their smaller cell sizes and shorter wires. In particular, the STT-RAM cache has the smallest cell size and correspondingly best read performance. However, although its retention time is sacrificed for better write performance, the STT-RAM cache still has the highest write latency and energy. Finally, when comparing standby power, SRAM is the leakiest technology among the three memory designs. Both STT-RAM and eDRAM dissipate low leakage power, but the eDRAM cache suffers short retention time and high refresh power.

Now, whether all that manifests in the Anand numbers we cannot know for sure, but the physics understanding of the subject was established years ago. And it boils down to 'smaller is faster'.
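A toy version of 'smaller is faster' for wire-dominated caches: access time tracks the array's linear dimension, i.e. the square root of its area, so a denser cell shortens the wires for the same capacity. The density ratio below is my own assumption, not a figure from the cited papers.

```python
import math

def rel_wire_latency(capacity_mb, density_mb_per_mm2):
    """Relative wire latency ~ sqrt(array area), in arbitrary units."""
    return math.sqrt(capacity_mb / density_mb_per_mm2)

sram = rel_wire_latency(32, 1.0)   # assumed SRAM density
edram = rel_wire_latency(32, 3.0)  # assume ~3x denser eDRAM
print(round(sram / edram, 2))      # 1.73 -> sqrt(3): shorter wires win
```

Under this model the eDRAM advantage grows with capacity, consistent with the crossover-at-larger-sizes behavior being discussed.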
 
D

Deleted member 465307

Unconfirmed Member
Is there any chance Zen could outperform something like Kaby Lake's hypothetical 7700 in general use cases? The new AMD benchmark is against Intel's Broadwell-E line in a specific use case, which seemed impressive to me at first, but then I wondered if those kinds of results will hold true in most settings and in cases where it's not 8 cores vs. 8 cores.

I was planning on buying Intel for my next computer, but it seems pointless to pay for integrated graphics if I'm buying a GPU and not in the market for their most expensive products.
 

DonMigs85

Member
Is there any chance Zen could outperform something like Kaby Lake's hypothetical 7700 in general use cases? The new AMD benchmark is against Intel's Broadwell-E line in a specific use case, which seemed impressive to me at first, but then I wondered if those kinds of results will hold true in most settings and in cases where it's not 8 cores vs. 8 cores.

I was planning on buying Intel for my next computer, but it seems pointless to pay for integrated graphics if I'm buying a GPU and not in the market for their most expensive products.

If the 7700 is 4 cores/8 threads again, then an 8-core/16-thread Zen will easily outperform it in anything multithreaded. You shouldn't feel much difference in regular Windows usage, web browsing, or gaming, though.
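Amdahl's law puts numbers on that: doubling cores only pays off in proportion to the parallel fraction of the workload. The fractions below are illustrative, not measurements of any real application.

```python
def speedup(parallel_frac, n_cores):
    """Amdahl's law: the serial part is untouched by extra cores."""
    return 1 / ((1 - parallel_frac) + parallel_frac / n_cores)

# 95%-parallel render vs 40%-parallel desktop workload, going 4 -> 8 cores:
print(round(speedup(0.95, 8) / speedup(0.95, 4), 2))  # 1.70
print(round(speedup(0.40, 8) / speedup(0.40, 4), 2))  # 1.08
```

So an 8c/16t chip wins big on encoding or rendering, while a mostly serial browsing or gaming workload barely notices the extra cores.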
 