
The NEW Xbox one thread of Hot Chips and No Pix

Klocker

Member
Can somebody explain what "power islands and clock gating to 2.5% of full power" means??

The spu gates off sections of the chip to reduce power use when not needed (streaming tv or watching a movie or in standby) to run in extremely low power mode but still remain "on"
 

Osiris

I permanently banned my 6 year old daughter from using the PS4 for mistakenly sending grief reports as it's too hard to watch or talk to her
You gotta remember there's 3GB of RAM dedicated to the OS. If the OS is on the flash portion, they'll just be loading off of that.

At 200MB/s (max eMMC 4.5 bandwidth) it's guaranteed to be long-term storage / caching, not "memory" as some here are seeing it. I think it'll be used to cache and speed up the OS. (Which on-disk will be larger than 8GB)
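For a sense of scale, here is a quick back-of-the-envelope sketch of what 200MB/s means for the sizes mentioned in this thread. It treats 200MB/s as a best-case ceiling and assumes decimal units (1 GB = 1000 MB); the function name is made up for illustration.

```python
# Rough transfer-time estimate at the 200 MB/s eMMC 4.5 ceiling quoted above.
# Decimal units (1 GB = 1000 MB) are an assumption made here for simplicity.

EMMC_MAX_MB_PER_S = 200  # peak figure from the post, not a measured rate


def seconds_to_transfer(size_gb: float, rate_mb_per_s: float = EMMC_MAX_MB_PER_S) -> float:
    """Return the best-case seconds needed to move size_gb at rate_mb_per_s."""
    return size_gb * 1000 / rate_mb_per_s


os_reservation_s = seconds_to_transfer(3)  # the 3GB OS RAM reservation
full_flash_s = seconds_to_transfer(8)      # the whole 8GB flash part

print(f"3 GB: {os_reservation_s:.0f} s, 8 GB: {full_flash_s:.0f} s")
```

Even in the best case that is many seconds per fill, which fits the reading of the flash as storage or cache rather than working memory.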
 
So the 15 special purpose processors are as follows?

1) av out / resize compositor
2) av in
3) video encode
4) video decode
5) swizzle copy lz encode
6) swizzle copy lz/mjpeg decode
7+8) swizzle copy x 2

audio :
9) C something something dsp (top blue box in audio processor)
10) scalar dsp core
11+12) vector dsp core x 2
13) sample rate converter
14) audio dma
15) the last orange box covered by the guy's head
 
Did we really learn anything new today, or is this just fleshing out a few details?

We learned quite a bit that was new. We learned more about the coherency in the system, we learned how they implemented the ESRAM. It's 4 x 8MB slices of ESRAM. We were thinking of it before like one huge 32MB slice, or at least I was.

We learned the max theoretical bandwidth of the ESRAM. But beyond the boring numbers, we learned a lot of interesting info that is finally being detailed in a way it hasn't been for the Xbox One, but already has been to some extent for the PS4 by Cerny. We learned quite a bit about the audio portion of the console, and how it has in excess of a full core's worth of performance. It's also a chip or DPU designed by Tensilica, which I'm learning is some impressive stuff.
 

EdgeXL

Member
At 200MB/s (max eMMC 4.5 bandwidth) it's guaranteed to be long-term storage / caching, not "memory" as some here are seeing it. I think it'll be used to cache and speed up the OS. (Which on-disk will be larger than 8GB)

Yes, my mistake referring to it as flash RAM, I meant to call it memory and failed.

Thanks for the responses. The information is clearer to me now.
 

Proelite

Member
So the 15 special purpose processors are as follows?

1) av out / resize compositor
2) av in not sure how this can be a processor
3) video encode
4) video decode
5) swizzle copy lz encode
6) swizzle copy lz/mjpeg decode
7+8) swizzle copy x 2

audio :
9) C something something dsp (top blue box in audio processor)
10) scalar dsp core
11+12) vector dsp core x 2
13) sample rate converter
14) audio dma
15) the last orange box covered by the guy's head That's the GPU

Yeah... I wouldn't bother trying.
 

Bsigg12

Member
At 200MB/s (max eMMC 4.5 bandwidth) it's guaranteed to be long-term storage / caching, not "memory" as some here are seeing it. I think it'll be used to cache and speed up the OS. (Which on-disk will be larger than 8GB)

Interesting so kinda like a hybrid hard drive?
 
http://www.eetimes.com/document.asp?doc_id=1319316&cid=SM_ELE_EET_Edit

"I'll provide more details about the paper and design later, including interviews with the two Microsoft engineers presenting the chips."

Yes, please! Can't wait to read that stuff.

Oh, and big thanks to the mods, so we can get some decent discussion going on this. Fantastic moderating.

That whatever SHAPE handles would usually require one CPU core.

It's cool to know this, since it was one of the bigger questions regarding the Xbox One. It was thought that the audio block could offload some work off of the CPU, but we were never really certain of just how much, and now we have an answer.
 

Klocker

Member
What does this mean?

It has the same processing power as a Jaguar core, so what would have been done on the CPU for sound on the 360, for example, can now be done (and more) on the audio chip without touching the CPU for audio tasks, essentially freeing up a CPU core.
 

tokkun

Member
I am curious about whether the eSRAM is in the same address space as the rest of the memory (i.e. is this a scratchpad or a cache)?

Devs must know the answer, but I'm not sure if it's been made public yet.
 
Do both next-gen consoles have flash memory on board? I never heard anything about this additional memory. Will it be used for gameplay recording? If yes, X1 should be able to record more than 7 minutes of video.
 

Phawx

Member
Interesting so kinda like a hybrid hard drive?

Yea, there are interesting ways to cache data. One nice easy way is to just link LBA sectors to do block-level cache instead of file-level cache.

Alternatively, you could let the developer have a portion of the flash for quickly jumping into a game and then pull larger blocks of assets from the hdd.

Does anyone know if they mentioned anything about over provisioning?
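To make the block-level idea above concrete, here is a toy sketch of a cache keyed by LBA rather than by file. It is only an illustration: the class, the LRU eviction policy, and the `fake_hdd` reader are all assumptions made for the example, not anything Microsoft has described.

```python
from collections import OrderedDict
from typing import Callable


class BlockCache:
    """Toy LRU cache keyed by LBA, standing in for flash in front of an HDD."""

    def __init__(self, capacity_blocks: int, read_from_hdd: Callable[[int], bytes]):
        self.capacity_blocks = capacity_blocks
        self.read_from_hdd = read_from_hdd  # hypothetical slow-path reader
        self.blocks: "OrderedDict[int, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def read(self, lba: int) -> bytes:
        if lba in self.blocks:
            self.hits += 1
            self.blocks.move_to_end(lba)  # mark as most recently used
            return self.blocks[lba]
        self.misses += 1
        data = self.read_from_hdd(lba)
        self.blocks[lba] = data
        if len(self.blocks) > self.capacity_blocks:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return data


def fake_hdd(lba: int) -> bytes:
    """Stand-in for a disk read: returns a 512-byte dummy sector."""
    return lba.to_bytes(4, "little") * 128


cache = BlockCache(capacity_blocks=2, read_from_hdd=fake_hdd)
for lba in (10, 11, 10, 12, 11):
    cache.read(lba)
print(cache.hits, cache.misses)
```

The cache never needs to know which file a sector belongs to, which is what makes the block-level approach the "nice easy way".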
 

pixlexic

Banned
Do both next-gen consoles have flash memory on board? I never heard anything about this additional memory. Will it be used for gameplay recording? If yes, X1 should be able to record more than 7 minutes of video.

ps4 does not. It will have to both stream data and cache on the same hdd, which can not be done at the same time.
 

Phawx

Member
Do both next-gen consoles have flash memory on board? I never heard anything about this additional memory. Will it be used for gameplay recording? If yes, X1 should be able to record more than 7 minutes of video.

I don't think they'd waste flash for video storage. I don't think you'll be getting raw video to tinker with before uploading somewhere.

Also I don't think the flash would last all that long used as a constant buffer for video.
 

Satchel

Banned
That 8 gig of flash memory is a big pro. I hope it is all usable for gaming and not just to pool data for background processes like video recording.

My hope is that they use it to enhance the game in any way possible. Caching, speed up load times somehow? Whatever they can come up with.
 

Chobel

Member
That whatever SHAPE handles would usually require one CPU core or more.

How much CPU power is needed to handle audio in games? I always assumed it doesn't take much and it's negligible in comparison to other tasks like physics, animation...
I think the better question would be: are there any audio tasks (in games) that take so much CPU power that they warrant reserving one CPU core or more?

My hope is that they use it to enhance the game in any way possible. Caching, speed up load times somehow? Whatever they can come up with.

I think 8GB of DDR3 is more than enough to do the trick, just load once into RAM.
 

Klocker

Member
How much CPU power is needed to handle audio in games? I always assumed it doesn't take much and it's negligible in comparison to other tasks like physics, animation...
I think the better question would be: are there any audio tasks (in games) that take so much CPU power that they warrant reserving one CPU core or more?

Some 360 games use up to an entire core on cpu for audio only
 
Credit from here

gqyaZAM.png

What's good about being able to get a better look at this specific diagram is that Microsoft point out quite clearly that the Non-CPU-Cache Coherent DRAM Access of 68GB/s also includes the 30GB/s CPU-Cache-Coherent bandwidth, showing that they aren't trying to confuse people into thinking that 68 + 30 would somehow give you 98GB/s. The extra clarification is helpful.

That's really important because when you consider that with the 204GB/sec peak of the ESRAM they don't state that it's combined with anything else, like they did for the 68GB/s, it further suggests that the 204GB/s really is all coming from the ESRAM.

Some 360 games use up to an entire core on cpu for audio only

Some used almost 2 cores entirely according to one of the engineers that worked on the Xbox One's audio chip.

My hope is that they use it to enhance the game in any way possible. Caching, speed up load times somehow? Whatever they can come up with.

I'm also hoping they use it this way. Seems pretty damn helpful if it's used like we think it is.
 
How much CPU power is needed to handle audio in games? I always assumed it doesn't take much and it's negligible in comparison to other tasks like physics, animation...
I think the better question would be: are there any audio tasks (in games) that take so much CPU power that they warrant reserving one CPU core or more?

I think 8GB of DDR3 is more than enough to do the trick, just load once into RAM.

Audio tasks can be very processor intensive. Some games had to reserve one of the xbox 360's cores entirely to audio.
 

chadskin

Member
Where do the 47 megabytes "of storage on chip" come from? If they're referring to the eSRAM, isn't that supposed to be 32 megabytes?

Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?
 

strata8

Member
Where do the 47 megabytes "of storage on chip" come from? If they're referring to the eSRAM, isn't that supposed to be 32 megabytes?

Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?

32MB eSRAM + 4MB CPU L2 + ???
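Putting numbers on the "???": a tiny sketch using only the figures quoted in this thread (47MB total, 32MB eSRAM, 4MB CPU L2). The variable names are just for illustration.

```python
# On-chip storage accounting, using only figures quoted in the thread.
total_on_chip_mb = 47
known_mb = {"eSRAM": 32, "CPU L2": 4}

unaccounted_mb = total_on_chip_mb - sum(known_mb.values())
print(f"Unaccounted for: {unaccounted_mb} MB")
```

So roughly 11MB has to come from other caches and buffers spread around the chip.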
 

Phawx

Member
How much CPU power is needed to handle audio in games? I always assumed it doesn't take much and it's negligible in comparison to other tasks like physics, animation...
I think the better question would be: are there any audio tasks (in games) that take so much CPU power that they warrant reserving one CPU core or more?

It's less about if it's negligible or not and more about exact specifications. Devs don't want to have to worry about random IO or spikes in some subsystem that could potentially affect performance.

So when thinking it's 'negligible' you need to think of worst case scenario. Hence dedicating an entire core, just to guarantee available resources.
 
Where do the 47 megabytes "of storage on chip" come from? If they're referring to the eSRAM, isn't that supposed to be 32 megabytes?

Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?

Definitely some form of it from the sounds of things, but it seems like AMD and Microsoft worked together to give the Xbox One its own unique implementation based on the desire to use the ESRAM in the design. It seems that the Host-Guest GPU MMU is the key piece in the puzzle that helps to route the ESRAM towards achieving coherency, because in one of the diagrams you see a blue coherency pathway leading directly from the GPU MMU, the very thing the ESRAM is piped into.
 

USC-fan

Banned
Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?

It's clear it doesn't support hUMA or HSA. Not that big of a deal right now. It would have been a nightmare and I think it would have hurt performance. The ESRAM is worth the trade-off.

I'll start: where does "204GB/s" come from? It is assumed that the bus from eSRAM to GPU is not dual, so how can it read and write simultaneously? And even if that's possible, why is it not 109*2 = 218GB/s?
It's based on alpha transparency blending only. So it's just contrived math.

It's like saying my car can do 250 MPH*

*When dropped off a cliff.

It doesn't have real-world performance like that.
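For reference, the gap being argued about here is small. A quick sketch using only the two figures from the question above (109GB/s per direction, 204GB/s quoted peak); it just restates the arithmetic and says nothing about why the peak falls short of a clean doubling.

```python
# Compare the quoted ESRAM peak against a naive read+write doubling.
single_direction_gb_s = 109  # per-direction figure quoted in the question
quoted_peak_gb_s = 204       # peak figure from the presentation

naive_double_gb_s = single_direction_gb_s * 2
shortfall_gb_s = naive_double_gb_s - quoted_peak_gb_s
fraction_of_double = quoted_peak_gb_s / naive_double_gb_s

print(naive_double_gb_s, shortfall_gb_s, round(fraction_of_double, 3))
```

In other words, the quoted peak is about 94% of a full simultaneous read and write, 14GB/s short of 218GB/s.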
 
I don't think they'd waste flash for video storage. I don't think you'll be getting raw video to tinker with before uploading somewhere.

Also I don't think the flash would last all that long used as a constant buffer for video.

Do you think it's possible that the X1 has a hibernation mode that transfers the full DDR3 contents into this flash memory, or would that operation cause the same problem?
 

Klocker

Member
Mind if you give an example?

He worked on the 360 and helped design the One's audio chip....

For audio work, which generally involves highly optimized vector operations, the 360 and the Jaguar CPU in the new consoles are roughly equal


On the 360, there is hardware for decoding XMA files, which is a much simpler subset of WMA. XAudio2 allows decoding of xWMA files too, but that's CPU side software only. The XMA decoder chip is rated at 320 channels, but in reality it generally maxes out lower than that. The 256 audio channels was calculated using a full core I believe, and that's using a very simple linear interpolation SRC, and possibly a filter and volume per channel.

All audio on the 360, other than XMA decompression, is software and uses the main CPU. Party chat, including codecs and mixing, happen in the system reservation. Game Chat, Kinect MEC and voice recognition, and all game audio happen in the game process and use game resources, including memory and CPU. Game audio frequently uses an entire hardware thread, and I've seen games where it uses 3 hardware threads. Car racing games, in particular, can use upwards of a hundred voices on a single car.
 

Sounddeli

Banned
PALO ALTO, Calif. — Microsoft described the SoC of its upcoming Xbox One and upgraded Kinect sensor at the annual Hot Chips conference here. The SoC is among the first to give CPU and GPU cores equal access to shared memory, while the sensor provides a new level of game play by using a homegrown time-of-flight sensor.

The SoC techniques are in some ways pioneering work for the AMD-led Heterogeneous Systems Architecture Foundation. The Kinect upgrade will create a new level of recognition and thus game play.

I'll provide more details about the paper and design later, including interviews with the two Microsoft engineers presenting the chips. For now, check out the slides.

174101_354259.jpg

XBox One sports four custom chips designed by Microsoft.

174052_708880.jpg

The main SoC by the numbers.

174106_969203.jpg

The main SoC puts CPUs and GPUs on a common coherent memory bus.

174115_349504.jpg

The graphics core is based on an AMD Radeon.

174121_514209.jpg

Microsoft used Tensilica for its audio and many other ancillary cores.

174130_010115.jpg

Microsoft developed its own time-of-flight sensor for a 1080p camera optimized for low light.
 

USC-fan

Banned
Really? Comments during the presentation seem to be implying quite the opposite. There was even a direct mention of what they're doing being similar to or exactly like HSA.
Sure, it has a bus at 30GB/s that acts like HSA, but it is not HSA. The ESRAM makes it a nightmare to put real HSA on there, but again the trade-off is worth it. The performance gains from ESRAM are worth way more than HSA.
 
Where do the 47 megabytes "of storage on chip" come from? If they're referring to the eSRAM, isn't that supposed to be 32 megabytes?

Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?

32 + 4MB of L2 cache on CPU + L1 cache + whatever the fuck the rest of it is. ~1mb is on the audio chip... Maybe 4 or so MBs on the GPU... don't know where the rest is.

Also, not hUMA, but a solution to accommodate for it.
 

Satchel

Banned
Could they "install" the OS onto the 8GB of flash and then allocate a tiny portion of RAM to "run" it?

Freeing up RAM to be used for games.
 

tokkun

Member
Also, can someone finally clear up the age-old question "hUMA" or "nO hUMA"?

First of all, let me just state that I think hUMA is a stupid term and hope it dies a horrible death.

That said, it clearly has heterogeneous processors connected to the same pool of memory, and clearly offers some degree of cache coherence.

It is not clear (to me, maybe the information is out there and I haven't seen it), what the story is with the eSRAM.

Whether AMD marketing want to label this implementation as hUMA or not is, from my perspective, utterly meaningless. We can discuss the pros and cons of the implementation without acting as if its consonance with AMD jargon is actually of any importance.

Definitely some form of it from the sounds of things, but it seems like AMD and Microsoft worked together to give the Xbox One its own unique implementation based on the desire to use the ESRAM in the design. It seems that the Host-Guest GPU MMU is the key piece in the puzzle that helps to route the ESRAM towards achieving coherency, because in one of the diagrams you see a blue coherency pathway leading directly from the GPU MMU, the very thing the ESRAM is piped into.

As posted above, I'm still interested in knowing if the eSRAM is in the same address space, because if it is a scratchpad with its own address space, then it doesn't need to have a coherence mechanism.
 
Sure, it has a bus at 30GB/s that acts like HSA, but it is not HSA. The ESRAM makes it a nightmare to put real HSA on there, but again the trade-off is worth it. The performance gains from ESRAM are worth way more than HSA.

Pretty much echoes what I've heard from a friend in development, not regarding HSA and ESRAM and the comparative benefits of one over the other, but regarding the performance implications of ESRAM.

As posted above, I'm still interested in knowing if the eSRAM is in the same address space, because if it is a scratchpad with its own address space, then it doesn't need to have a coherence mechanism.

Well, from what I know from an internal dev document that Microsoft gave to developers, the ESRAM is most definitely a generic scratchpad, but I don't know if that automatically means it has its own address space or not. Does being a scratchpad automatically suggest it must have its own address space?

From the document that I'm looking at currently, you can render into ESRAM
You can texture from ESRAM (You couldn't with EDRAM on the 360)
You can resolve into ESRAM (You couldn't with EDRAM on the 360)

And some implications of the choice of ESRAM:
No need to move data into System RAM in order to read.
Full speed FP16x4 writes are possible in limited circumstances (Those full speed writes are precisely the ones mentioned in the DF article about the increase in the Xbox One's memory performance, so even in this early 2012 document, it clearly implies that there are limited circumstances in which the ESRAM might be able to achieve higher levels of performance, indicating that the possibility of the ESRAM being more capable than expected isn't something that entirely caught Microsoft by surprise.)
 

USC-fan

Banned
One thing is very clear: they spent a ton of money on Kinect 2.0 in the Xbone. We'll see how much this pays off in the end. If they didn't have Kinect and the custom chips, I could see the Xbone being $350.

Seems like they doubled down on Kinect with the Xbone.
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
Could they "install" the OS onto the 8GB of flash and then allocate a tiny portion of RAM to "run" it?

Freeing up RAM to be used for games.

The flash will be incredibly slow compared to the DDR3, it will not be used for anything which requires performance.


Some 360 games use up to an entire core on cpu for audio only

Which is very little (<10%) of a Jaguar core. MS needed the extra horse power for Kinect otherwise they could have saved the time, money and die space.
 

M_A_C

Member
So I think the big question is, with all this new info, have we learned anything that would make us think the gap between the two consoles is lessened?
 

JaggedSac

Member
32 + 4MB of L2 cache on CPU + L1 cache + whatever the fuck the rest of it is. ~1mb is on the audio chip... Maybe 4 or so MBs on the GPU... don't know where the rest is.

Also, not hUMA, but a solution to accommodate for it.

Based on the slide, there is about 232 kb of cache on the audio chip.
 
I'm guessing the Xbox One doesn't have hUMA according to the second pic.

The second pic shows that the coherent memory is just "cache" (not really) for the CPU.
 