
The NEW Xbox one thread of Hot Chips and No Pix

mrklaw

MrArseFace
Interesting, was the CPU pic posted?


What's this about improved compute processor? Is that a specific customisation or just the normal 2 ACEs in GCN?

And how do all the different processors interact? Do they have their own dedicated buses or share the main one?
 

vcc

Member
First of all, let me just state that I think hUMA is a stupid term and hope it dies a horrible death.

That said, it clearly has heterogeneous processors connected to the same pool of memory, and clearly offers some degree of cache coherence.

It is not clear (to me, maybe the information is out there and I haven't seen it), what the story is with the eSRAM.

Whether AMD marketing want to label this implementation as hUMA or not is, from my perspective, utterly meaningless. We can discuss the pros and cons of the implementation without acting as if its consonance with AMD jargon is actually of any importance.

The hUMA and UMA concept is that they unify the addressing systems so the GPU and CPU can work on the same data at reduced cost, and it's hardware-managed, so it has a reduced implementation cost. Wouldn't having an extra-large cache like the eSRAM basically negate most of the benefits? Either the Move Engines have to shift data back and forth to enable GPGPU, or the devs have to set aside an allotment of main memory and mark it as CPU/GPU shared data.
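To make the trade-off concrete, here's a rough C sketch of the two models (illustrative only -- buffer names and sizes are made up, and this isn't any console's actual API):

```c
/* Contrasts the two models discussed above: explicit staging copies (what
 * the Move Engines would do) versus one coherent allocation both
 * processors touch in place. */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define N (1024 * 1024) /* 4MB of floats; size chosen arbitrarily */

/* Model 1: split pools -- every CPU<->GPU handoff is an explicit copy
 * that burns bus bandwidth in both directions. */
static void handoff_by_copy(float *cpu_buf, float *gpu_buf) {
    memcpy(gpu_buf, cpu_buf, N * sizeof(float)); /* CPU -> GPU pool */
    gpu_buf[0] += 1.0f;                          /* stand-in for GPU work */
    memcpy(cpu_buf, gpu_buf, N * sizeof(float)); /* GPU -> CPU pool */
}

/* Model 2: coherent shared region -- both sides dereference the same
 * addresses, and hardware coherency replaces the copies entirely. */
static void handoff_shared(float *shared_buf) {
    shared_buf[0] += 1.0f; /* "GPU" works in place; CPU sees it directly */
}

int main(void) {
    float *cpu = calloc(N, sizeof(float));
    float *gpu = calloc(N, sizeof(float));
    handoff_by_copy(cpu, gpu);
    handoff_shared(cpu);
    printf("result: %f\n", cpu[0]); /* 2.0: one pass through each model */
    free(cpu); free(gpu);
    return 0;
}
```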
 

ekim

Member
What's this about improved compute processor? Is that a specific customisation or just the normal 2 ACEs in GCN?

And how do all the different processors interact? Do they have their own dedicated buses or share the main one?

I don't know if this is a standard ACE feature, but the ones in X1 support fine-grained pre-emption. (GPGPU doesn't need to be synchronized with rendering / context switching?)
 

c0de

Member
MS has to do an incredible job with their SDK to put everything together. I think it will take some time for devs to fully use the potential of the hardware.
 

mrklaw

MrArseFace
I don't know if this is a standard ACE feature, but the ones in X1 support fine-grained pre-emption. (GPGPU doesn't need to be synchronized with rendering / context switching?)

Sounds interesting - is there more info on this, and how it compares with PS4's implementation?
 

ekim

Member
Sounds interesting - is there more info on this, and how it compares with PS4's implementation?

Unfortunately not - but it's part of AMD's HSA definition.
I guess the PS4 has the same thing. Don't know.

http://www.anandtech.com/show/5847/...geneous-and-gpu-compute-with-amds-manju-hegde
6. GPU compute context switch and GPU graphics pre-emption: GPU tasks can be context switched, making the GPU in the APU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience. As UIs are becoming increasingly touch-focused, it is critical for applications trying to respond to touch input to get access to the GPU with the lowest latency possible to give users immediate feedback on their interactions. With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multiple users or multiple applications is either prioritized or equalized.

Sounds like something MS does want.
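As a toy model of what that buys you (my own hypothetical sketch, not AMD's actual hardware arbitration): a long compute job runs in slices so a latency-critical task can jump in between them instead of waiting for the whole job.

```c
/* Hypothetical slice-based pre-emption model: a background compute job is
 * chopped into slices; a higher-priority UI task arriving mid-job gets the
 * next slice instead of waiting for the job to finish. */
#include <stdio.h>

typedef struct { const char *name; int slices_left; int priority; } task_t;

static void run_slice(task_t *t) {
    t->slices_left--;
    printf("ran one slice of %s (%d left)\n", t->name, t->slices_left);
}

int main(void) {
    task_t compute = { "background compute", 8, 0 };
    task_t ui      = { "touch/UI feedback",  1, 1 };
    int ui_pending = 0;

    while (compute.slices_left > 0) {
        if (compute.slices_left == 5) ui_pending = 1; /* input arrives mid-job */
        if (ui_pending && ui.priority > compute.priority) {
            run_slice(&ui);      /* pre-empt: UI gets the GPU next slice */
            ui_pending = 0;
        }
        run_slice(&compute);
    }
    return 0;
}
```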
 

mrklaw

MrArseFace
Will game developers really care whether it is exactly hUMA? Won't they simply think 'oh, I can have a shared area of RAM for common CPU/GPU shared stuff, and I don't have to keep moving the data back and forth manually to keep it in sync. That'll be handy'?
 

mrklaw

MrArseFace
Only thing they care about is that they have 8GB coherent memory with 176GB/s in PS4 and 8GB coherent memory with 68GB/s in Xbox One. The eSRAM is non-coherent (at least it looks like it on the slides). So, Xbox One offers a slow memory pool with coherency or a tiny but fast memory pool without coherency. PS4 has one big and very fast coherent pool. PS4's power will be much easier to utilize to the full.

you wouldn't need a huge coherent memory pool. If you're just storing textures/models that can just be dumb memory. You'd only need to reserve a small amount that you need for shared data between GPU/CPU.

and the limited coherent bandwidth of both consoles looks to be around 30GB/s, so the 176 isn't relevant in that case.
 

TheKayle

Banned
The flash will be incredibly slow compared to the DDR3; it will not be used for anything which requires performance.




Which is very little (<10%) of a Jaguar core. MS needed the extra horsepower for Kinect, otherwise they could have saved the time, money and die space.


For audio work with optimized vector operations, the 360 and a Jaguar core are pretty much equal... and there were games that needed more than two cores of the old 360 (Forza).

SHAPE can't be emulated at 100% with 8 Jaguar cores.

This will end up just offloading the CPU work, and probably mean better audio in games than on other consoles, with fewer resources used.

That's something good for the XB1.
 

gofreak

GAF's Bob Woodward
you wouldn't need a huge coherent memory pool. If you're just storing textures/models that can just be dumb memory. You'd only need to reserve a small amount that you need for shared data between GPU/CPU.

and the limited coherent bandwidth of both consoles looks to be around 30GB/s, so the 176 isn't relevant in that case.

It kind of is.

Assuming each saturated its coherent bandwidth for mixed CPU/GPGPU tasks, 30 out of 68GB/s to DDR3 is going to hurt graphics tasks on Xbox One more than 30 out of 176GB/s to GDDR5. Yes, the eSRAM bandwidth is unaffected, but that only covers 32MB...

So it's something to bear in mind if you're doing GPU stuff that you want to be cache coherent with the CPU...you have to write out to DDR3, and watch your bandwidth.
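Back-of-envelope, using the figures quoted in this thread:

```c
/* If ~30GB/s of coherent traffic is in flight, what share of each
 * console's main-memory bandwidth does it consume? */
#include <stdio.h>

int main(void) {
    const double coherent  = 30.0;  /* GB/s, rough coherent-bus ceiling */
    const double xb1_ddr3  = 68.0;  /* GB/s */
    const double ps4_gddr5 = 176.0; /* GB/s */

    /* Prints roughly 44% vs 17% -- the same 30GB/s of coherent traffic
       eats a much bigger share of the smaller pool, which is the point. */
    printf("XB1: %.0f%% of DDR3 bandwidth consumed\n",  100.0 * coherent / xb1_ddr3);
    printf("PS4: %.0f%% of GDDR5 bandwidth consumed\n", 100.0 * coherent / ps4_gddr5);
    return 0;
}
```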
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Cool specs. I still think that both next gen systems will be fairly similar in terms of graphics.

Depends what you mean by similar, but I would be surprised if a GPU with (more or less) the same architecture but 41% more processing power (or 56% if you subtract the 10% most probably reserved for Snap), and double the ROPs, didn't show a difference. The 360 GPU's advantage was much smaller than that, and it still showed in practically every multi-platform title. The better memory setup won't hurt either.
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Could somebody simplify what a compute processor is? I hear them mentioned lots

It doesn't actually process application data but it buffers and schedules incoming GPU instructions onto available resources.

The GCN command processor is responsible for receiving high-level API commands from the driver and mapping them onto the different processing pipelines. There are two main pipelines in GCN. The Asynchronous Compute Engines (ACE) are responsible for managing compute shaders, while a graphics command processor handles graphics shaders and fixed function hardware. Each ACE can handle a parallel stream of commands, and the graphics command processor can have a separate command stream for each shader type, creating an abundance of work to take advantage of GCN's multi-tasking.

http://www.amd.com/us/Documents/GCN_Architecture_whitepaper.pdf p. 12
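To picture how those parallel command streams feed the same hardware, here's a loose toy model in C (my own sketch, not AMD code; queue count per the Hot Chips slides, depths invented):

```c
/* Several independent command queues -- one per ACE plus the graphics
 * queue -- all hand work to the same pool of compute units. */
#include <stdio.h>

#define NUM_ACES 2 /* Xbox One per the Hot Chips slides; PS4 is said to have 8 */

typedef struct { const char *source; int cmds_pending; } cmd_queue_t;

int main(void) {
    cmd_queue_t queues[NUM_ACES + 1] = {
        { "graphics command processor", 6 },
        { "ACE 0 (compute)",            3 },
        { "ACE 1 (compute)",            2 },
    };

    /* Round-robin drain: each queue independently dispatches to the CUs,
       which is what lets compute work overlap with rendering. */
    int work_left = 1;
    while (work_left) {
        work_left = 0;
        for (int i = 0; i < NUM_ACES + 1; i++) {
            if (queues[i].cmds_pending > 0) {
                queues[i].cmds_pending--;
                printf("dispatched 1 cmd from %s\n", queues[i].source);
                work_left = 1;
            }
        }
    }
    return 0;
}
```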
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
I obviously can't link it, because I don't want to get anyone in trouble, but it's a Durango Developer Summit powerpoint presentation called Graphics on Durango. It's an official MS document on the Xbox One from early 2012.

Assuming this is real, and the connection you're drawing for what MS knew about the eSRAM then and now was this:

Full speed FP16x4 writes are possible in limited circumstances.

Then I would say you also believe the eSRAM BW suddenly being reported as 204GB/s is not a general read/write operation but a specific operation, correct? You can't have it both ways...
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
For audio work with optimized vector operations, the 360 and a Jaguar core are pretty much equal... and there were games that needed more than two cores of the old 360 (Forza).

SHAPE can't be emulated at 100% with 8 Jaguar cores.

This will end up just offloading the CPU work, and probably mean better audio in games than on other consoles, with fewer resources used.

That's something good for the XB1.

Sorry, I can never take any technical info from you. Don't take it personally, but you just make shit up...
 

tokkun

Member
Only thing they care about is that they have 8GB coherent memory with 176GB/s in PS4 and 8GB coherent memory with 68GB/s in Xbox One. The eSRAM is non-coherent (at least it looks like it on the slides). So, Xbox One offers a slow memory pool with coherency or a tiny but fast memory pool without coherency. PS4 has one big and very fast coherent pool. PS4's power will be much easier to utilize to the full.

If the eSRAM really is a scratchpad with its own address space, then the question of whether it is coherent or not is meaningless.

The hUMA and UMA concept is that they unify the addressing systems so the GPU and CPU can work on the same data at reduced cost, and it's hardware-managed, so it has a reduced implementation cost. Wouldn't having an extra-large cache like the eSRAM basically negate most of the benefits?

Based on the information available, it appears both CPU and GPU have a shared, coherent view of main memory, so that memory model has the same benefits you mentioned.

As far as the eSRAM goes, I don't think it has been stated by an official source, but let's go with the idea that it is a typical scratchpad memory, as has been claimed in this thread.

If so, the CPU and GPU still have a shared, coherent view of memory in the scratchpad (as an exclusive address space, the scratchpad cannot possibly be non-coherent). The challenge is actually the reverse: it is more difficult to have the CPU and GPU both use the scratchpad without sharing data. The address space of a scratchpad is not virtualized, so you need some way of partitioning - either spatially or temporally - to prevent one processor from clobbering another's data.
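A minimal sketch of the spatial-partitioning option (the region sizes and layout here are placeholders I made up, not Durango's real memory map):

```c
/* With no virtual memory over the scratchpad, software must carve up the
 * 32MB range itself so CPU- and GPU-owned data never overlap. The
 * invariant is maintained by the programmer, not the hardware. */
#include <stdint.h>
#include <assert.h>

#define ESRAM_SIZE (32u * 1024 * 1024) /* 32MB */

typedef struct { uint32_t offset, size; } esram_region_t;

/* Static spatial partition: a render target vs. a shared compute area. */
static const esram_region_t gpu_rt = { 0,                 24u * 1024 * 1024 };
static const esram_region_t shared = { 24u * 1024 * 1024,  8u * 1024 * 1024 };

static int regions_disjoint(esram_region_t a, esram_region_t b) {
    return a.offset + a.size <= b.offset || b.offset + b.size <= a.offset;
}

int main(void) {
    /* These are the checks the hardware won't do for you: */
    assert(regions_disjoint(gpu_rt, shared));
    assert(shared.offset + shared.size <= ESRAM_SIZE);
    return 0;
}
```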

By the way, none of these systems are UMA. They are all NUMA, PS4 included.

Either the Move Engines have to shift data back and forth to enable GPGPU, or the devs have to set aside an allotment of main memory and mark it as CPU/GPU shared data.

This has been true of the GPU's internal scratchpad memory since the beginning, and presumably is still true with the GCN used here, with the exception that data sharing between cores is possible.
 
http://www.extremetech.com/gaming/164934-xbox-one-bus-bandwidths-graphics-capabilities-and-odd-soc-architecture-confirmed-by-microsoft
The big picture takeaway from this is that the Xbox One probably is HSA capable, and the underlying architecture is very similar to a super-charged APU with much higher internal bandwidth than a normal AMD chip. That’s a non-trivial difference — the 68GB/s of bandwidth devoted to Jaguar in the Xbox One dwarfs the quad-channel DDR3-1600 bandwidth that ships in an Intel X79 motherboard. For all the debates over the Xbox One’s competitive positioning against the PS4, this should be an interesting micro-architecture in its own right.
First, there’s the fact that while we’ve been calling this a 32MB ESRAM cache, Microsoft is representing it as a series of four 8MB caches. Bandwidth to this cache is apparently 109GB/s “minimum” but up to 204GB/s. The math on this is… odd. It’s not clear if the ESRAM cache is actually a group of 4x8MB caches that can be split into chunks for different purposes, or how it’s purposed. The implication is that the cache is a total of 1024 bits wide, running at the GPU’s clock speed of ~850MHz for 109GB/s in uni-directional mode — which would give us the “minimum” talked about. But that has implications for data storage — filling four blocks of 8MB each isn’t the same as addressing a contiguous block of 32MB. This is still unclear.
There are still questions regarding the ESRAM cache — breaking it into four 8MB chunks is interesting, but doesn’t tell us much about how those pieces will be used. If the cache really is 1024 bits wide, and the developers can make suitable use of it, then the Xbox One’s performance might surprise us.
Edit: nvm
 
Assuming this is real, and the connection you're drawing for what MS knew about the eSRAM then and now was this:



Then I would say you also believe the eSRAM BW suddenly being reported as 204GB/s is not a general read/write operation but a specific operation, correct? You can't have it both ways...

I've always said this from the very beginning, if I'm not mistaken. I've compared it to the 360's EDRAM situation a number of times. The Xbox 360 never had a true full read/write of 256GB/s. It only ever achieved such remarkable bandwidth numbers when a bunch of specific operations were being run together simultaneously. I think this is what Microsoft means when they say a 'minimum' of 109GB/s is possible on the ESRAM. I think they're saying that's the typical bandwidth for your average read/write operation, but that there are specific and limited situations in which 204GB/s is theoretically achievable, just like there was no way the Xbox 360's EDRAM was providing 256GB/s unless the game was using 4XMSAA and a few other techniques simultaneously. I think what prevented 360 devs from doing certain things was never an issue with the EDRAM's bandwidth, but simply the amount of memory space. 10MB wasn't enough space. However, it's been said that 32MB of ESRAM is just right for a part targeting 1080p with the intentions of using a decent AA solution, but of course 64MB would've been a whole lot better.
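For what it's worth, the two figures are self-consistent under the commonly reported explanation of the peak number (reads every cycle, writes squeezed into roughly 7 of every 8 cycles) -- a quick check in C:

```c
/* Checking the thread's eSRAM numbers. The 7/8 write-overlap factor is the
 * commonly reported explanation, not an official Microsoft formula. */
#include <stdio.h>

int main(void) {
    const double width_bits = 1024.0; /* 4 x 256-bit controllers */
    const double clock_hz   = 853e6;  /* GPU clock after the upclock */

    double min_gbs  = width_bits * clock_hz / 8.0 / 1e9; /* read OR write */
    double peak_gbs = min_gbs * (1.0 + 7.0 / 8.0);       /* read + 7/8 write */

    printf("minimum: %.1f GB/s\n", min_gbs);  /* ~109.2 */
    printf("peak:    %.1f GB/s\n", peak_gbs); /* ~204.7 */
    return 0;
}
```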
 
He got that information from Bkilian from B3D.

He actually worked on the Xbox One audio chip. I think I remember it being said that SHAPE's power couldn't be emulated even if you utilized every core on the Xbox 360 for audio, but I don't remember what was said regarding SHAPE vs Jaguar processors.

I do think, however, that it was incredibly hard to emulate the chip even with a Xeon in the development kits, or something like that.

Will game developers really care whether it is exactly hUMA? Won't they simply think 'oh, I can have a shared area of RAM for common CPU/GPU shared stuff, and I don't have to keep moving the data back and forth manually to keep it in sync. That'll be handy'?

Yea, I don't think so. The design of the Xbox One clearly has some implications that devs have to think about, but I think there will be a number of things that developers are pleased with. For example, the existence of the ESRAM, according to MS, means that there's no need to move data into system RAM in order to read it. That has to come in handy, since it cuts down on the need to copy to the slower system RAM. There seem to be a lot of cool little things about the platform that I really see devs taking advantage of in the coming years. One of the bigger feature additions to hUMA over AMD's previous solution was the ability to share pointers between the CPU and the GPU. The Xbox One architecture, according to early dev documentation, makes it so pointers can be passed between the CPU and GPU. So clearly some consideration has been given in the design to provide devs with new ways to address the available hardware.
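For illustration, here's the kind of thing pointer sharing enables (a hypothetical sketch, not the actual Durango API):

```c
/* With one unified address space, a structure built by the CPU can hand
 * raw pointers straight to GPU-side code; nothing has to be repacked into
 * GPU-local handles first. */
#include <stdio.h>

typedef struct node { int value; struct node *next; } node_t;

/* Stand-in for a GPU compute kernel walking a CPU-built linked list. In a
 * split address space this traversal would require flattening and copying
 * the whole list first. */
static int gpu_kernel_sum(const node_t *head) {
    int sum = 0;
    for (const node_t *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}

int main(void) {
    /* CPU builds pointer-rich data... */
    node_t c = { 3, NULL }, b = { 2, &c }, a = { 1, &b };
    /* ...and passes the raw pointer across, with no marshalling step. */
    printf("sum = %d\n", gpu_kernel_sum(&a)); /* prints 6 */
    return 0;
}
```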
 

velociraptor

Junior Member
It's the two ACEs of the Xbox One. PS4 has eight ACEs.

I get the feeling this presentation was a PR event and not for educational purposes. Microsoft just renamed every part of the GCN GPU to "specialized XY", but of course they didn't mention that these parts are pretty standard, or that they're pretty weak compared to PS4.

What we learned is that PS4 is much stronger than Xbox One:

Xbox One:
1.31 TFLOPS
40.9 GTex/s
13.6 GPix/s
68GB/s DDR3
109GB/s eSRAM

PS4:
1.84 TFLOPS (+40%)
57.6 GTex/s (+40%)
25.6 GPix/s (+90%)
176GB/s GDDR5

We also know that PS4 will be superior in GPGPU tasks, since Sony integrated four times as many compute command processors as Microsoft. And we can expect that Xbox One will not have hUMA, otherwise Microsoft would have said so explicitly.



No. The expensive HD7000 GPUs have two ACEs. HD8000 has four ACEs. PS4 will have eight ACEs.
How many compute units and ROPs does each of the consoles have?

Compute units are what will be used for GPGPU, right? i.e. the more compute units, the more specific tasks you can assign.

What is the primary function of 'ACE'? What benefit does 8 ACEs provide?
 

velociraptor

Junior Member
PS4: 18 CUs (1152 ALUs), 32 ROPs
XB1: 12 CUs (768 ALUs), 16 ROPs



Yes.
So if the XB1 has 12 CUs, does that mean it also can perform GPGPU, or is that limited to the PS4? Any other advantages of having a large number of CUs besides allowing a greater number of possible tasks executed in parallel to graphics processing?
 

Oppo

Member
Will game developers really care whether it is exactly hUMA? Won't they simply think 'oh I can have a shared area if ram for common CPU/GPU shared stuff and I don't have to keep moving the data back and forth manually to keep it in sync. That'll be handy'

Cerny pointed to this very specific point as the #1 request from developers. Unified, fast memory. He even talked about the eSRAM approach and why they decided against it (and strongly implied that "old Sony" would have gone for that approach, difficulty be damned).
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
So if the XB1 has 12 CUs, does that mean it also can perform GPGPU, or is that limited to the PS4? Any other advantages of having a large number of CUs besides allowing a greater number of possible tasks executed in parallel to graphics processing?

CUs also do the major part of graphics processing like execution of shaders and texturing. They are not only there for compute. They are pretty much the most important part of the entire GPU.

XB1 and PS4 are both based on the same GPU architecture, so both can perform GPGPU in pretty much the same way, although the PS4 has more modifications to make GPGPU more efficient. And of course, the more raw power (= CUs) you have the more you can afford to spend on such computations.
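That's also where the raw-power figures in this thread come from. A quick sanity check, assuming GCN's 64 ALUs per CU at 2 FLOPs (one fused multiply-add) per cycle, and the widely reported clocks of 853MHz (XB1) and 800MHz (PS4):

```c
/* Deriving the TFLOPS numbers quoted in this thread from the CU counts. */
#include <stdio.h>

static double tflops(int cus, double clock_ghz) {
    /* CUs * 64 ALUs * 2 FLOPs/cycle * GHz = GFLOPS; /1000 -> TFLOPS */
    return cus * 64 * 2 * clock_ghz / 1000.0;
}

int main(void) {
    printf("XB1: %.2f TFLOPS\n", tflops(12, 0.853)); /* ~1.31 */
    printf("PS4: %.2f TFLOPS\n", tflops(18, 0.800)); /* ~1.84 */
    return 0;
}
```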
 

astraycat

Member
We learned quite a bit that was new. We learned more about the coherency in the system, we learned how they implemented the ESRAM. It's 4 x 8MB slices of ESRAM. We were thinking of it before like one huge 32MB slice, or at least I was.

A little late, but just to clarify that this is more of an implementation detail than something developers worry about. It is a 32MB pool, but there are four memory controllers (4x 256-bit, as indicated in the diagram), each of which is responsible for 8MB of data. From a programming perspective you'd see it as a single pool. The 4x 256-bit memory controllers are how MS arrived at the 109GB/s number in the first place, since (1024 bits/cycle) * (853M cycles/s) * (1 GB / 8 Gbits) ~= 109.2 GB/s.
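Broken out per controller (same arithmetic, just showing where the 1024 bits come from):

```c
/* Per-controller view of the 109GB/s figure derived above. */
#include <stdio.h>

int main(void) {
    const int    controllers   = 4;
    const int    bits_per_ctrl = 256;
    const double clock_hz      = 853e6;

    double per_ctrl_gbs = bits_per_ctrl * clock_hz / 8.0 / 1e9; /* ~27.3 */
    double total_gbs    = controllers * per_ctrl_gbs;           /* ~109.2 */

    printf("per controller: %.1f GB/s\n", per_ctrl_gbs);
    printf("total:          %.1f GB/s\n", total_gbs);
    return 0;
}
```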
 

velociraptor

Junior Member
CUs also do the major part of graphics processing like execution of shaders and texturing. They are not only there for compute. They are pretty much the most important part of the entire GPU.

XB1 and PS4 are both based on the same GPU architecture, so both can perform GPGPU in pretty much the same way, although the PS4 has more modifications to make GPGPU more efficient. And of course, the more raw power (= CUs) you have the more you can afford to spend on such computations.
Interesting. I had no idea. And here I thought that they'd just chucked a 7850 into the PS4.
 

Metfanant

Member
Interesting. I had no idea. And here I thought that they'd just chucked a 7850 into the PS4.

This is why so many people were up in arms about that one DF article where Leadbetter took off-the-shelf PC parts that "match" the theoretical performance of the consoles and ran benchmarks... because BOTH of these consoles are much different from any off-the-shelf AMD SKUs...
 
I still think that both next gen systems will be fairly similar in terms of graphics.

They were both similar this gen, and PS3 lost most comparisons, which I'm sure didn't do it any favors. So "similar" doesn't count for much as an argument that the difference doesn't matter (it does, as the more powerful, easier-to-develop-for console will reap the multiplat benefits).
 

kmag

Member
What, if any, sacrifices did MS have to make to get the console into a configuration that supports a low power state?

Aside from the cost of power gating chips or portions of them, not much if anything in terms of performance. It adds to the complexity and the size of the chip marginally but not much else.
 
A little late, but just to clarify that this is more of an implementation detail than something developers worry about. It is a 32MB pool, but there are four memory controllers (4x 256-bit, as indicated in the diagram), each of which is responsible for 8MB of data. From a programming perspective you'd see it as a single pool. The 4x 256-bit memory controllers are how MS arrived at the 109GB/s number in the first place, since (1024 bits/cycle) * (853M cycles/s) * (1 GB / 8 Gbits) ~= 109.2 GB/s.

Thanks for the clarification. 1024 bits wide. That's pretty damn interesting. Love all the little pieces of info that we're finding out now about the system.
 

mrklaw

MrArseFace
So if the XB1 has 12 CUs, does that mean it also can perform GPGPU, or is that limited to the PS4? Any other advantages of having a large number of CUs besides allowing a greater number of possible tasks executed in parallel to graphics processing?

Fundamentally, more CUs means you can do more; more compute queues means you can more efficiently use each CU

PS4 has both
 

AmyS

Member
Didn't read entire thread so I'm adhering to the no pics thing.



This borders on the utterly ridiculous, yet at the same time was too interesting *not* to post here.



[RUMOR] Microsoft Xbox One Could Possibly Feature a Powerful Discrete GPU Core Stacked Inside the APU

At Hot Chips 2013, Microsoft unveiled an overview of the SOC architecture powering their next generation Xbox One console. The Xbox One's accelerated processing unit covers a die size of 363mm², which is huge considering AMD's flagship Tahiti chip comes with a die size of 389mm².

Microsoft Xbox One Could Possibly Feature a Powerful Discrete GPU
We received a tip from one of our sources (who work as a group) which seemed quite fascinating to us, so we thought of sharing it with our readers. For months, it has been known that Microsoft's Xbox One features a weak graphics core compared to its competitor, the PlayStation 4. Both consoles boast impressive specs, but Sony's console, with faster graphics and GDDR5 memory, turns out to be the winner in terms of performance. But to our surprise, if this rumor turns out to be true, Microsoft could have the fastest gaming console to date. While I was still working on this article, I found out that misterxmedia had already covered most of the details that I received from the anonymous source. Do read it, since it's more proof of what the Xbox One may be in reality.

What our source revealed is something that hasn't been mentioned before or revealed by any official Microsoft representative. It all starts with the following quoted text from Venturebeat:

"Microsoft disclosed some details but left many important pieces out. Evidently, Microsoft doesn't want to tell all of its competitors about how well designed its system is."

Now, we summed up almost every bit of detail from Hot Chips 2013 a few weeks back, which can be seen below:
Xbox One Architecture and Key Features:
8 Jaguar Cores
1.31 TFlops Performance
5 Billion Transistors
32 MB ESRAM
8 GB DDR3 Memory
8 GB Flash memory
15 special purpose processors
4 Command processors (2 compute, 2 graphics)
SHAPE offloads >1 CPU core
Memory coherency between CPU cores and GPU
Audio offload processor custom designed by Microsoft - 1 CPU core worth of processing.
68 GB/sec peak bandwidth to off-chip 8GB DDR3 memory.
204 GB/s peak bandwidth to 32MB of on-die storage.


So my question is, what do we still not know regarding the Xbox One? The answer is what could possibly be a super fast discrete graphics core stacked inside the main SOC itself. I know it sounds fishy when you first hear about this, and the first things to come to mind are: how did anyone not notice this in the first place? Or, does AMD or Microsoft even have the technology to manage something like a stacked wafer design? Well, the Xbox architecture hasn't been in the works just recently; it has been under consideration for some time.

The technology does exist and first appeared in an AMD slide back in 2009. AMD has bet everything on their APU design, and if they want to remain competitive in the x86, GPU and compute sectors, then their all-in-one SOCs are their best hope. Back in 2009, AMD revealed a slide that showcased their future potential designs for APUs. The slide starts off with a common PC setup we find these days, which includes a discrete GPU and an x86 CPU. Next up is the first APU design, which shows the GPU and CPU on a single die; after this we see optimization between both units, including a coherent memory structure and various other improvements referring to the Heterogeneous System Architecture we would see in the next generation of APUs. The last stage is the most critical, involving exploitation of what more could be done, and that is only possible through a full fusion of the CPU and GPU known as the APD (Accelerated Processing Device).

"APD refers to any cooperating collection of hardware and/or software that performs those functions and computations associated with accelerating graphics processing tasks, data parallel tasks, or nested data parallel tasks in an accelerated manner compared to conventional CPUs, conventional GPUs, software and/or combinations thereof"

According to the information, the APU used in the Xbox One is a W2W (wafer 2 wafer) multi-module design that allows stacking of one chip on another. The first wafer layer is dedicated to the 28nm HPM main SOC which was revealed during Hot Chips 2013, with 8 Jaguar cores and a modified Radeon HD 6670 GPU with DX11.1 and OpenGL 3.0 support. The layer beneath it is dedicated to the stacked discrete GPU, and the final wafer layer is dedicated to the 32 MB of ESRAM. During the development phase, it was revealed by an insider that the Microsoft Xbox One could possibly end up with 2 GB of GDDR5 memory in the stacked design, in addition to the 8 GB DDR3 that is available to the system. The other information revealed by him was that the Xbox One would have a power envelope of 300W (90W for the main SOC and 95W for the discrete GPU), but during Wired's teardown of the Xbox One, the power configuration revealed that the Xbox One's discrete GPU would be getting around 130-150W of power.

So what is this discrete GPU itself? The second wafer is supposedly a 22nm SOI design featuring 2304 to 2560 stream processors, as has been rumored, and based off the VI architecture. VI stands for Volcanic Islands, which could be another reason why the discrete GPU has been kept a secret, since the new Volcanic Islands based AMD Radeon family would be officially announced during the last week of September in Hawaii. If the Xbox One GPU is possibly based on a Volcanic Islands architecture, or even a Southern Islands part with around 2048 cores like the Tahiti die, then this would prove to be a massive powerhouse of graphics performance.

As soon as the NDA lifts on Volcanic Islands, we will know if this rumor is true or not; till then all we can do is speculate and wait for some official response. Lastly, there is one more proof mentioned by the source, which is a link that redirects to a LinkedIn profile of AMD GPU engineer Vinber Lei, who seems to have worked on "Graphics Core IP Design Verification, APU/DGPU Xbox One, Playstation 4". Notice the mention of dGPU again?

Note – This article is based on speculation from various parties!


http://www.info-pc.info/2013/09/rumormicrosoft-xbox-one-could-possibly.html


do not
Believe!?
 