
The NEW Xbox one thread of Hot Chips and No Pix

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Devs have been developing for the Xbox 360 for the past 8 years. I don't think the ESRAM will be that complicated.

Sure, but not having to think about that at all still makes life easier.
 
Sure, but not having to think about that at all still makes life easier.

I've said this before, but it's a good thing then that the PS4 and Xbox One share similar processor and GPU architectures. That should at least help take some of the stress out of the process. They're no longer dealing with drastically different architectures. There are differences, just not as significant as those between the PS3 and Xbox 360.

well yes, because without it, you're talking monstrous deficiencies in bandwidth...

Yep, that's about the gist of it.
 

Metfanant

Member
from the start, this upcoming generation is just some sort of bizarro world/twilight zone...where Sony is making things easy, and everyone else is overcomplicating things...it's just so fun to watch/be a part of...

really at the end of the day on a purely hardware front this generation hinged on Cerny's ballsy gamble to go with GDDR5 and being able to secure the right size chips to give them 8GB....
 
from the start, this upcoming generation is just some sort of bizarro world/twilight zone...where Sony is making things easy, and everyone else is overcomplicating things...it's just so fun to watch/be a part of...

really at the end of the day on a purely hardware front this generation hinged on Cerny's ballsy gamble to go with GDDR5 and being able to secure the right size chips to give them 8GB....

I think the extra GPU power is a bigger plus in the end.
 

Klocker

Member
from the start, this upcoming generation is just some sort of bizarro world/twilight zone...where Sony is making things easy, and everyone else is overcomplicating things...it's just so fun to watch/be a part of...

really at the end of the day on a purely hardware front this generation hinged on Cerny's ballsy gamble to go with GDDR5 and being able to secure the right size chips to give them 8GB....

Yep.

We could just as easily be dealing with one system with 4GB and one with 8GB.

It was ballsy indeed for Sony and MS certainly anticipated the possibility that Sony could pull out 8 but had long since set their own path and had to follow it.

I think they did pretty well considering those choices.
 
Just a thought, but the Xbone will support multiple Live account logins simultaneously. Kinect may be able to distinguish the different voices of each account holder and thus process and respond to each individual's voice commands appropriately. If this functionality is constantly available even during gameplay, could it possibly be the reason for all that power in the SHAPE processor? Or is that stuff not handled by the audio processor? I'm envisioning scenarios where players may queue up background tasks via voice command, without interrupting their play sessions.
 
Devs have been developing for the Xbox 360 for the past 8 years. I don't think the ESRAM will be that complicated.

The EDRAM in the 360 could only do one thing, and it was much faster relative to the GPU's power than the One's ESRAM is. And you still ended up with lots of sub-HD games from devs who just wanted their buffers to fit in 10MB.
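For a sense of why 10MB was so tight, here is a quick framebuffer-size check; the byte-per-pixel assumptions (32-bit color plus 32-bit depth/stencil per sample) are mine, for illustration:

```python
# Why 10MB of EDRAM pushed devs sub-HD: does the framebuffer fit?
# Assumes 32-bit color + 32-bit depth/stencil per sample.
EDRAM_MB = 10

def framebuffer_mb(width, height, msaa=1):
    bytes_per_pixel = (4 + 4) * msaa  # color + depth, per sample
    return width * height * bytes_per_pixel / 1024**2

print(f"{framebuffer_mb(1280, 720):.1f} MB")          # 7.0 MB: 720p fits
print(f"{framebuffer_mb(1280, 720, msaa=2):.1f} MB")  # 14.1 MB: 720p + 2xAA doesn't
print(f"{framebuffer_mb(1024, 600, msaa=2):.1f} MB")  # 9.4 MB: sub-HD + 2xAA fits
```

Tiling could split a too-large buffer across passes, but dropping resolution was the cheaper fix, hence all the sub-HD games.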
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
4 x 256bit Read/Write on the eSRAM according to MS.

So why isn't the theoretical peak 218GB/s?

The ESRAM bandwidth issue indeed remains weird, especially since they "discovered" the new peak bandwidth rather late in the development process.

Especially since 4 × 256 bit × 853 MHz = 873.472 Gbit/s = 109.184 GB/s

There must be some way to read/write on the rising and falling edges of the clock. However, it is still weird that the theoretical maximum BW is not doubled by that.
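The arithmetic in question, spelled out (clock and bus widths as quoted in the post above):

```python
# ESRAM bandwidth back-of-the-envelope: 4 lanes x 256 bits at 853 MHz.
LANES = 4
BUS_BITS = 256
CLOCK_HZ = 853e6

one_way = LANES * BUS_BITS * CLOCK_HZ / 8 / 1e9  # GB/s if read OR write per cycle
two_way = 2 * one_way                            # GB/s if read AND write per cycle

print(f"{one_way:.3f} GB/s")  # 109.184 GB/s
print(f"{two_way:.3f} GB/s")  # 218.368 GB/s, vs the 204 GB/s MS quotes
```

So the quoted 204GB/s sits between the one-way and two-way figures, which is exactly what makes the number look odd.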
 
The ESRAM bandwidth issue indeed remains weird, especially since they "discovered" the new peak bandwidth rather late in the development process.

Especially since 4 × 256 bit × 853 MHz = 873.472 Gbit/s = 109.184 GB/s

There must be some way to read/write on the rising and falling edges of the clock. However, it is still weird that the theoretical maximum BW is not doubled by that.

I don't think they found it very late, because the exact operation example suggested in the DF article about the increase in the XB1's memory performance was included right there in an Xbox One development document from back in early 2012.

It says word for word "Full speed FP16x4 writes are possible in limited circumstances."

This was from January or February 2012, and is a Microsoft document.

And now from the DF article.

Apparently, there are spare processing cycle "holes" that can be utilised for additional operations. Theoretical peak performance is one thing, but in real-life scenarios it's believed that 133GB/s throughput has been achieved with alpha transparency blending operations (FP16 x4).

The exact same thing is mentioned in Microsoft's own early-2012 document on the implications of having 32MB of ESRAM in the Xbox One. The DF article makes it sound like something that's only possible under certain conditions, and the 2012 MS document anticipates that full-speed writes of that nature would be possible in limited circumstances. So Microsoft knew about this as far back as early last year, which probably lends credibility to the increased ESRAM performance.
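For scale, here is how the three bandwidth numbers floating around relate to each other; the ratios are my own framing, not from either document:

```python
# Relating the three ESRAM bandwidth figures in circulation (GB/s).
theoretical = 218.368  # 4 x 256-bit read+write at 853 MHz
quoted = 204.0         # MS's stated peak
measured = 133.0       # DF's reported real-world FP16x4 blending figure

print(f"quoted peak = {quoted / theoretical:.0%} of theoretical")  # 93%
print(f"measured    = {measured / quoted:.0%} of quoted peak")     # 65%
```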
 
I can't shake the feeling it's just a bit too complicated, like they started out with an easy design and ended up adding more and more layers where they saw fit to achieve their goal.
Think about what the initial demands from higher ups meant for X1's design:

Must be virtually silent: Huge box with large heatsink & fan, limited power draw, external power brick, etc.

Must have lots of RAM to multitask: 8GB RAM from the start, which meant DDR3+ESRAM, limiting room on the SoC for better GPU.

Kinect mandatory: Extra budget/hardware resources dedicated to processing Kinect.

3 OSes, HDMI input overlay, etc.: Lots of budget poured into non-gaming software features.

It's a miracle it performs as well as it does given all the non-gaming restrictions and budget allocations placed on engineers.
 
Think about what the initial demands from higher ups meant for X1's design:

Must be virtually silent: Huge box with large heatsink & fan, limited power draw, external power brick, etc.

Must have lots of RAM to multitask: 8GB RAM from the start, which meant DDR3+ESRAM, limiting room on the SoC for better GPU.

Kinect mandatory: Extra budget/hardware resources dedicated to processing Kinect.

3 OSes, HDMI input overlay, etc.: Lots of budget poured into non-gaming software features.

It's a miracle it performs as well as it does given all the non-gaming restrictions and budget allocations.

One thing has certainly emerged from all this, however. That awesome audio chip would absolutely not even be in there without a big focus on Kinect. So for people who love them some incredible audio in their games, one of those decisions could be a blessing in disguise.
 
Sorry, but this is pretty horrible compared to the PS4 and smacks of the PS3. Sure, sure, these devs are going to port your game over to the Xbone... and then make some of these calls work on the audio chip because of flops. Sure, they won't just turn down the res, just like they did on the PS3.

What a bunch of jokers. All of this because they didn't want to have to source GDDR5 and were stuck on this multi-OS kick. Jokers, the lot of them.
 

KidBeta

Junior Member
I don't think they found it very late, because the exact operation example suggested in the DF article about the increase in the XB1's memory performance was included right there in an Xbox One development document from back in early 2012.

It says word for word "Full speed FP16x4 writes are possible in limited circumstances."

This was from January or February 2012, and is a Microsoft document.

And now from the DF article.



The exact same thing is mentioned in Microsoft's own early-2012 document on the implications of having 32MB of ESRAM in the Xbox One. The DF article makes it sound like something that's only possible under certain conditions, and the 2012 MS document anticipates that full-speed writes of that nature would be possible in limited circumstances. So Microsoft knew about this as far back as early last year, which probably lends credibility to the increased ESRAM performance.

But the full-speed write would be limited by the ROPs, so you would have a max of 108GB/s. You can't write faster than that, ever.
 
I don't think they found it very late, because the exact operation example suggested in the DF article about the increase in the XB1's memory performance was included right there in an Xbox One development document from back in early 2012.

It says word for word "Full speed FP16x4 writes are possible in limited circumstances."

This was from January or February 2012, and is a Microsoft document.

And now from the DF article.



The exact same thing is mentioned in Microsoft's own early-2012 document on the implications of having 32MB of ESRAM in the Xbox One. The DF article makes it sound like something that's only possible under certain conditions, and the 2012 MS document anticipates that full-speed writes of that nature would be possible in limited circumstances. So Microsoft knew about this as far back as early last year, which probably lends credibility to the increased ESRAM performance.

That would imply such a "hole" existed during almost every cycle.
 

artist

Banned
32 + 4MB of L2 cache on the CPU + L1 cache + whatever the fuck the rest of it is. ~1MB is on the audio chip... maybe 4 or so MB on the GPU... don't know where the rest is.

Also, not hUMA, but a solution to accommodate for it.
32MB eSRAM
4MB L2 (CPU)
512KB L1 (CPU)
1MB SHAPE
512KB L2 (GPU)
768KB LDS (GPU)
96KB Scalar Data cache (GPU)
3120KB (GPU - 260KB per CU)

~ 42MB + Buffers + 2 Geometry engine caches + Redundancy. Redundancy of ~10% for eSRAM and you arrive close to 47 ;)

The 47MB on-chip storage figure is as meaningless a figure as the 5 billion transistors.
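artist's tally checks out; adding up the listed pools (sizes in KB as given above):

```python
# On-chip SRAM pools listed in the post, in KB.
pools = {
    "eSRAM": 32 * 1024,
    "CPU L2": 4 * 1024,
    "CPU L1": 512,
    "SHAPE": 1024,
    "GPU L2": 512,
    "GPU LDS": 768,
    "GPU scalar data cache": 96,
    "GPU per-CU (12 x 260KB)": 3120,
}
total_kb = sum(pools.values())
print(f"{total_kb / 1024:.1f} MB")  # 41.9 MB before buffers and redundancy
```

Add buffers, the two geometry-engine caches, and ~10% ESRAM redundancy, and the marketed 47MB becomes plausible.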
 
32MB eSRAM
4MB L2 (CPU)
512KB L1 (CPU)
1MB SHAPE
512KB L2 (GPU)
768KB LDS (GPU)
96KB Scalar Data cache (GPU)
3120KB (GPU - 260KB per CU)

~ 42MB + Buffers + 2 Geometry engine caches + Redundancy. Redundancy of ~10% for eSRAM and you arrive close to 47 ;)

The 47MB on-chip storage figure is as meaningless a figure as the 5 billion transistors.
Seems like it. Thanks for counting though.
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
SenjutsuSage, if you are going to make some claims, at least link WTF you are talking about. Some vague reference to a 2012 doc doesn't help anyone.
 

chadskin

Member
32MB eSRAM
4MB L2 (CPU)
512KB L1 (CPU)
1MB SHAPE
512KB L2 (GPU)
768KB LDS (GPU)
96KB Scalar Data cache (GPU)
3120KB (GPU - 260KB per CU)

~ 42MB + Buffers + 2 Geometry engine caches + Redundancy. Redundancy of ~10% for eSRAM and you arrive close to 47 ;)

The 47MB on-chip storage figure is as meaningless a figure as the 5 billion transistors.

Thanks for the rundown. Quite a big L1 cache, it seems.
 
Sorry, but this is pretty horrible compared to the PS4 and smacks of the PS3. Sure, sure, these devs are going to port your game over to the Xbone... and then make some of these calls work on the audio chip because of flops. Sure, they won't just turn down the res, just like they did on the PS3.

What a bunch of jokers. All of this because they didn't want to have to source GDDR5 and were stuck on this multi-OS kick. Jokers, the lot of them.

If Sony hadn't gotten lucky with the 8GB of GDDR5 RAM they wanted, this would have been the next best thing.
 

szaromir

Banned
The EDRAM in the 360 could only do one thing, and it was much faster relative to the GPU's power than the One's ESRAM is. And you still ended up with lots of sub-HD games from devs who just wanted their buffers to fit in 10MB.
That wasn't the only limiting factor, i.e. the number of sub-HD games on the PS3 was even greater despite not having to deal with it. Same goes for sub-SD games on the OG Xbox (I wanted to add Wii/GC/PS2 but then realized they had embedded RAM).
 
Can a fellow Japanese Gaffer translate this article?

http://pc.watch.impress.co.jp/docs/column/kaigai/20130827_612762.html

Goto thinks or has seen (?) that the PS4 APU is 250-300mm2. That corresponds to what I heard a couple of months back - http://www.neogaf.com/forum/showpost.php?p=61566193&postcount=3668

Interesting, was the cpu pic posted?

[images: 06.jpg, 12.jpg]
 

daxter01

8/8/2010 Blackace was here
So the 15 special purpose processors are as follows?

1) av out / resize compositor
2) av in
3) video encode
4) video decode
5) swizzle copy lz encode
6) swizzle copy lz/mjpeg decode
7+8) swizzle copy x 2

audio :
9) C something something dsp (top blue box in audio processor)
10) scalar dsp core
11+12) vector dsp core x 2
13) sample rate converter
14) audio dma
15) the last orange box covered by the guy's head
Is this real?
 
I really like the information and architecture of the system. It looks like what I'd expect a console to look like. All these little tricks and extras will bring the playing field much closer at 1080p than the theoretical maxes on both consoles would suggest.
 
32 + 4MB of L2 cache on the CPU + L1 cache + whatever the fuck the rest of it is. ~1MB is on the audio chip... maybe 4 or so MB on the GPU... don't know where the rest is.

Also, not hUMA, but a solution to accommodate for it.

8MB is probably a spare ESRAM block added to improve yield (have 5; if 1 is broken, laser-cut it).
Oh, GPU caches too.
 
Can you ask your dev buddy why the bandwidth is 204GB/s instead of 218GB/s?

I fully intend to do just that, so I can get a better understanding of how they arrive at the figure, although I won't be the least bit surprised if we have our answer in the next 4 days or so.

SenjutsuSage, if you are going to make some claims, at least link WTF you are talking about. Some vague reference to a 202 doc doesn't help anyone.

I obviously can't link it, because I don't want to get anyone in trouble, but it's a Durango Developer Summit powerpoint presentation called Graphics on Durango. It's an official MS document on the Xbox One from early 2012.

There's a page on ESRAM. This is everything that it says on that particular page.

ESRAM

-- General parameters
32MB
102GB/s
lower latency access
No contention (Front Buffer is in DRAM)

And It's Generic

Durango

Rendering into ESRAM - YES | Rendering into DRAM - YES
Texturing from ESRAM - YES | Texturing from DRAM - YES
Resolving into ESRAM - YES | Resolving into DRAM - YES

Xbox 360

Rendering into EDRAM - YES | Rendering into DRAM - NO
Texturing from EDRAM - NO | Texturing from DRAM - YES
Resolving into EDRAM - NO | Resolving into DRAM - YES

-- Implications

No need to move data into System RAM in order to read.
Full speed FP16x4 writes are possible in limited circumstances.
And something pretty interesting that I've been skipping over this entire time, mostly because it didn't strike me as that big a deal: the page that comes right before the ESRAM page, the Memory Management page. The following is everything that's said on that page.

1TB of Virtual Address space via own MMU
--- A page in ESRAM, DRAM or unmapped

Some Implications
--- Well-defined page miss behavior
--- Memory fragmentation becomes less important
--- Architecturally possible to pass pointers between CPU and GPU
--- Portions of resources could be using ESRAM
Is that passing of pointers between the CPU and GPU a big deal? I think I remember reading that being one of the big things for hUMA.
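A toy sketch of what that Memory Management page describes: one virtual address space where each page is backed by ESRAM, DRAM, or nothing. The page size and names here are my own illustration, not from the document:

```python
# Toy MMU: every virtual page maps to ESRAM, DRAM, or is unmapped.
PAGE_SIZE = 64 * 1024  # illustrative page size

page_table = {}  # virtual page number -> (pool, physical page number)

def map_page(vpn, pool, ppn):
    page_table[vpn] = (pool, ppn)

def translate(vaddr):
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise MemoryError("page miss (well-defined behavior)")
    pool, ppn = page_table[vpn]
    return pool, ppn * PAGE_SIZE + offset

# One resource can straddle pools: its hot pages in ESRAM, the rest in DRAM.
map_page(0, "ESRAM", 0)
map_page(1, "DRAM", 100)
print(translate(0x10))              # ('ESRAM', 16)
print(translate(PAGE_SIZE + 0x10))  # ('DRAM', 6553616)
```

Passing a pointer between CPU and GPU then just means both sides run the same translation; it doesn't let more than 32MB live in ESRAM at once.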
 
I fully intend to do just that, so I can get a better understanding of how they arrive at the figure, although I won't be the least bit surprised if we have our answer in the next 4 days or so.



I obviously can't link it, because I don't want to get anyone in trouble, but it's a Durango Developer Summit powerpoint presentation called Graphics on Durango. It's an official MS document on the Xbox One from early 2012.

There's a page on ESRAM. This is everything that it says on that particular page.


And something pretty interesting that I've been skipping over this entire time, mostly because it didn't strike me as that big a deal: the page that comes right before the ESRAM page, the Memory Management page. The following is everything that's said on that page.


Is that passing of pointers between the CPU and GPU a big deal? I think I remember reading that being one of the big things for hUMA.

Passing pointers is useful if you achieve coherency between the CPU and GPU; otherwise you'll have to flush caches (and flushing to DDR3 hurts) to take advantage of that feature.
 
And now that I'm thinking about it, that "portions of resources could be using ESRAM" part is pretty interesting now, as it sounds like it's trying to say that, thanks to the 1TB of virtual address space via the MMU, a lot more of a game's resources could actually be using ESRAM than what the physical memory size (32MB) would suggest is possible.

Anyway, pretty tired. Good night all.
 

TheD

The Detective
That's the only way you can get 8 cores with Jaguar; the PS4 achieves it the same way.

I know the PS4 is also like that, I just do not know why.

If you go to all this trouble to make a custom SoC, why not make sure the internal bus that connects the cores is large enough to have 8 of them (if an 8-core crossbar bus is too complex, replace it with a ring bus)?
 

KidBeta

Junior Member
And now that I'm thinking about it, that "portions of resources could be using ESRAM" part is pretty interesting now, as it sounds like it's trying to say that, thanks to the 1TB of virtual address space via the MMU, a lot more of a game's resources could actually be using ESRAM than what the physical memory size (32MB) would suggest is possible.

Anyway, pretty tired. Good night all.

No, you can still only have 32MB in the eSRAM at any one time. All the unified pointers mean is that the GPU is using the same algorithm to translate virtual addresses to real addresses, and that the eSRAM is included in this real address space.
 
While googling around, I found a nice quote from 2009.



Seems like the tables have turned, with the PS4 being the (presumably) "easier to program for" console.
No.

Cerny designed the hardware so that developers could leverage the power from day one, but it has far more potential that they can unlock once they start to figure out what the capabilities of the console are.

All they have done is lower the barrier to entry. One of the biggest complaints about the PS3's Cell when it launched was that it was hard to leverage its power and had terrible documentation, usually in a manual written in Japanese.
 

cyberheater

PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 PS4 Xbone PS4 PS4
I know the PS4 is also like that, I just do not know why.

If you go to all this trouble to make a custom SoC, why not make sure the internal bus that connects the cores is large enough to have 8 of them?

Maybe from a manufacturing point of view it was safer, relative to fab yields, to duplicate the quad core rather than redesign the bus.
 

jaosobno

Member
It's based only on alpha transparency blending, so it's just theoretical math.

It's like saying my car can do 250 MPH*

*when dropped off a cliff.

It doesn't have real-world performance like that.

This just might be one of the greatest analogies ever!
 

ethomaz

Banned
So they did not even go for an 8 core CPU and instead got pretty much 2 quad core CPUs and connected them via the NB.
If I'm not wrong, AMD didn't have a native 8-core Jaguar CPU solution... so for 8 cores you need to go with 2 quad-core Jaguar modules.

PS4 uses the same solution.
 

vcc

Member
Looking at it, I can see they will treat it like a turbocharged 360: ESRAM used as a framebuffer, and for compute temp storage.

The one advantage the ESRAM has over the 360's EDRAM is that it's addressable, which means your post-processing is going to be slicker than the PS4's by about 30GB/s, without stalls. That's a fair boost.

Man, this is going to be so much more even than the raw numbers can show, I think.

That assumes 204GB/s is a general bandwidth number; DF suggests it's a special case and the general number is 102GB/s (maybe 6% more now if the GPU upclock also upclocks the ESRAM). DF implied they got 133GB/s real-life performance by playing instruction tetris with alpha transparency blending.
 

mrklaw

MrArseFace
I agree that the hUMA term is not very helpful, since none of us seem to know the exact feature set that constitutes a hUMA architecture. In general, the slides align nicely with the leaked documents (after adapting the numbers to the new 853MHz clock). My take is:

  • In both XB1/PS4 the GPU can access main memory by probing the CPU's L2 caches (cache-coherency)
  • In both XB1/PS4 the GPU can (and must to achieve peak bandwidth) bypass the CPU's caches at will (no cache-coherency)
  • ESRAM seems to be a GPU-only scratchpad with a dedicated address space and DMA support via the 4 move engines
  • The PS4 seems to have a fine-grained mechanism to bypass GPU cache lines (volatile tag) while the XB1 needs to flush the entire GPU cache
  • XB1 has 2 GFX and 2 ACE processors; the PS4 has 2 GFX and 8 ACE processors, which, in combination with the above-mentioned volatile tag, shows a bigger emphasis on GPGPU/HSA in general.


Big difference seems to be
- Xbox One has 30GB/s when probing caches, 68GB/s when not
- PS4 has 10GB/s when probing caches (using Onion and Onion+), 176GB/s when not

And the Xbox information doesn't cover a couple of things:
- how ESRAM affects coherency (my guess would be you just use main memory for a shared space, and any work using ESRAM has to be copied back to that main memory space)
- what about the GPU cache? The PS4 bus diagram specifically mentions Onion/Onion+ bypassing the GPU cache (and snooping the CPU cache), but the Xbox info only talks about CPU cache coherency. Should we assume it is bypassing the GPU cache too?
 

ElTorro

I wanted to dominate the living room. Then I took an ESRAM in the knee.
Big difference seems to be
- Xbox One has 30GB/s when probing caches, 68GB/s when not
- PS4 has 10GB/s when probing caches (using Onion and Onion+), 176GB/s when not.

That 30GB/s is the total bandwidth that can probe the L2 caches, and it might be higher than the amount the GPU alone can take. On the PS4 the CPU can have 20GB/s while Onion/Onion+ seem to have an additional 10GB/s, resulting in 30GB/s of total L2-coherent bandwidth as well.
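The coherent-bandwidth totals being compared, gathered in one place (figures from the posts above):

```python
# Cache-coherent (L2-probing) vs. full-rate bandwidth paths, in GB/s.
xb1 = {"coherent": 30.0, "full_rate": 68.0}          # DDR3 main memory path
ps4 = {"coherent": 20.0 + 10.0, "full_rate": 176.0}  # CPU share + Onion/Onion+

# Both designs end up with the same 30 GB/s of L2-coherent bandwidth.
print(xb1["coherent"], ps4["coherent"])  # 30.0 30.0
```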
 