
Wii U CPU |Espresso| Die Photo - Courtesy of Chipworks

Thraktor

Member
Here's my read of what's going on inside Core 1:

espresso_core_layout.jpg


There's a lot of guesswork in there on my part, but the L1 cache and L2 tags are pretty straight-forward.

Glossary:

LSU - Load Store Unit
IUs - Integer Units
DCU - Dispatch and Completion Unit
BU - Branch Unit
FPU - Floating Point Unit
GPRs - General Purpose Registers
FPRs - Floating Point Registers
BHT - Branch History Table
BTIC - Branch Target Instruction Cache
 

Thraktor

Member
Question: Why do we have 4 memory blocks for L1?

Broadway had two, Gekko too, probably.

Well, there's separate instruction cache and data cache (I didn't separate them as I can't tell which is which). Each of them is divided into two 16KB blocks.

As to why it's two 16KB blocks as opposed to a single 32KB block, it's probably just the way that IBM decided to go with L1 on their 45nm process, perhaps as it makes it easier to also accommodate cores with 16KB instruction and data caches (eg the A2).
 

tipoo

Banned
I'm wondering how passing data between cores is done on the Espresso? From the rumors, each core has a fixed slice of the L2 cache (1x 2 MB, 2x 512 KB), rather than a shared cache of any form. In modern AMD and Intel processors, each core usually has its own L1 and L2 cache, and then a further large pool of shared L3 cache which all cores have equal access to with a unified address space (and as an aside, chips like Piledriver have 4×2 MB L2 cache plus 8 MB L3, totaling 16 MB of cache just on the chip for 8 cores, so 2 MB per core). So any information that one core brought into the cache can be accessed by any other core just as quickly from L3, without any time-consuming, performance-sapping memory moves.
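To make the cache arithmetic in that comparison explicit, here is a minimal sketch that just adds up the figures quoted above. The Espresso sizes are the rumored ones from this thread, not confirmed numbers, and the variable names are only for illustration.

```python
# Cache sizes in KB, taken from the figures quoted in the post above.
# The Espresso split is the rumored one (1x 2 MB + 2x 512 KB), not a confirmed spec.
espresso_l2_kb = [2048, 512, 512]
piledriver_l2_kb = [2048] * 4      # 4x 2 MB L2
piledriver_l3_kb = 8192            # 8 MB shared L3
piledriver_cores = 8

espresso_total_mb = sum(espresso_l2_kb) / 1024
piledriver_total_mb = (sum(piledriver_l2_kb) + piledriver_l3_kb) / 1024
piledriver_per_core_mb = piledriver_total_mb / piledriver_cores

print(f"Espresso L2 total:      {espresso_total_mb} MB")
print(f"Piledriver L2+L3 total: {piledriver_total_mb} MB")
print(f"Piledriver per core:    {piledriver_per_core_mb} MB")
```

The totals only compare capacity. They say nothing about how quickly one core can see data sitting in another core's private L2, which is the actual question here.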

With the Espresso there isn't a unified pool anywhere (afaik), some would say all the memory on the GPU could be used as a sort of L3 cache but that's still theoretical and it would still not quite be the same as on-chip cache.

Anyone wager a guess? For two cores to work on the same data set would they have to do costly swapping?
 
I see Thraktor has made his. I went back the drawing board to make a better one. I went overboard though and focused more on highlighting than labeling. I didn't do all of them as you'll see.

Espresso3.png
 

Orionas

Banned
Is there a higher-resolution version of it? I have lenses with 1:120 magnification and tubes for microphotography; I wish I had the chip...

The middle CPU core looks more beefy... Both the CPU and the GPU seem to be chips designed totally from scratch.
 

tipoo

Banned
Everyone keeps saying the middle core looks different, how so? Apart from the expected added cache access (so more L2 cache tags) it looks identical to me. Looking at bgassassin's pic there, the red squares look like the only difference to me, and those are cache-related L2 tags.
 
I'm not sure if I'm the only one who has said this, but that doesn't look very much like a PowerPC 750 core. In fact, I don't think that they look that similar to the Broadway core (but I could be missing something).
 

tipoo

Banned
So with Marcan's picture, the upper blocks being fuse banks is confirmed, there's a thermal sensor we didn't know about, and the PLL and boot ROM locations are spotted. Nice to know, if nothing unexpected.


He also added this?
Silicon-wise, it's all but efficient. There's a metric fuckload of empty space in that shot ;)
And
It's totally synthesized from scratch (though we *know* that they're 750s). There's a bunch of new stuff for the new L2/coherency subsys.
 

krizzx

Junior Member
marcan posted this on his Twitter; no idea if it gives those of you in the know any additional information.



https://twitter.com/marcan42/status/303157344429281281

Yes, I was correct. Those were the fuses. Not sure if it was any help though.


There are still 4 objects not labeled on the die: the thing to the left of the PLL, the 2 square objects under the fuse banks, and the rectangular object to the right of them. Though I figured a thermal sensor would only be one component. Are all three of them part of the thermal sensor?

Now all that is left is to label the cores.

Of course, that still wouldn't tell us much about the actual capability without being able to run the chip and test it ourselves.

Silicon-wise, it's all but efficient. There's a metric fuckload of empty space in that shot ;)
I do not understand the purpose of the ";)". Does this make him happy?
 

xemumanic

Member
I have a question, and I'd like some confirmation or clarification from those here in the know.

I'd like to first preface this by saying I hope this question doesn't come off like the whole "2 you know what duct taped together" type of thing (because I REALLY don't mean it in any negative tone, quite the opposite in fact) but.........

Is it accurate to say that Espresso is to Broadway as Broadway was to Gekko? Basically that for the Wii, they took the base custom PowerPC 750 derivative design of the Gekko, and iterated on that, and have done so again for the Wii U. Especially since the Espresso, to quote the OP, "offers hardware backwards compatibility with Gekko/Broadway, and is very likely based on the same PPC 750 core".

I know this is also a massive oversimplification of things, but I ask because it was always my opinion that GC/Wii, and now Wii U, are a part of a multi-generational 'family' of the overall custom PowerPC 750 & ATI/AMD GPU design Nintendo has used for 3 successive consoles now.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
I have a question, and I'd like some confirmation or clarification from those here in the know.

I'd like to first preface this by saying I hope this question doesn't come off like the whole "2 you know what duct taped together" type of thing (because I REALLY don't mean it in any negative tone, quite the opposite in fact) but.........

Is it accurate to say that Espresso is to Broadway as Broadway was to Gekko? Basically that for the Wii, they took the base custom PowerPC 750 derivative design of the Gekko, and iterated on that, and have done so again for the Wii U. Especially since the Espresso, to quote the OP, "offers hardware backwards compatibility with Gekko/Broadway, and is very likely based on the same PPC 750 core".

I know this is also a massive oversimplification of things, but I ask because it was always my opinion that GC/Wii, and now Wii U, are a part of a multi-generational 'family' of the overall custom PowerPC 750 & ATI/AMD GPU design Nintendo has used for 3 successive consoles now.
Yes, you can say that Gekko->Broadway->Espresso are evolutionary stages of the same design. Actually, you can trace that back even further to the ppc "G2" - ppc603->ppc750->Gekko->Broadway->Espresso. Of those stages, the Gekko->Broadway is allegedly the smallest transition, more like revisions of the same design. The rest of the steps are actually notable advancements, though (apparently Espresso still pending complete unveiling).
 
My biggest question about the CPU at this moment is:
If backwards compatibility uses one of the cores, presumably at the same speed as Broadway (729 MHz), and halves the L2 cache to match the amount of L2 found on Broadway... does this mean that the latency of the L2 eDRAM cache is as low as that of the SRAM used on the Wii (in terms of read/write cycles)?
If that's the case, then those 3 MB of L2 cache are a win-win scenario...
 

ozfunghi

Member
I have a question, and I'd like some confirmation or clarification from those here in the know.

I'd like to first preface this by saying I hope this question doesn't come off like the whole "2 you know what duct taped together" type of thing (because I REALLY don't mean it in any negative tone, quite the opposite in fact) but.........

Is it accurate to say that Espresso is to Broadway as Broadway was to Gekko? Basically that for the Wii, they took the base custom PowerPC 750 derivative design of the Gekko, and iterated on that, and have done so again for the Wii U. Especially since the Espresso, to quote the OP, "offers hardware backwards compatibility with Gekko/Broadway, and is very likely based on the same PPC 750 core".

I know this is also a massive oversimplification of things, but I ask because it was always my opinion that GC/Wii, and now Wii U, are a part of a multi-generational 'family' of the overall custom PowerPC 750 & ATI/AMD GPU design Nintendo has used for 3 successive consoles now.

GPU is completely different. CPU, see Blu's answer.
 

Xun

Member
Sorry for beating what is probably a dead horse, but the Wii U can technically play GameCube games, no?

Also I know there is much confusion with the Wii U's power, but it's looking like a pretty decent piece of kit now right? The comments from Criterion seem to back this up.
 

xemumanic

Member
Yes, you can say that Gekko->Broadway->Espresso are evolutionary stages of the same design. Actually, you can trace that back even further to the ppc "G2" - ppc603->ppc750->Gekko->Broadway->Espresso. Of those stages, the Gekko->Broadway is allegedly the smallest transition.

Thanks. I especially like your choice of words, I had trouble finding the right words to explain what I meant.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Like how the Intel 80386 and the i7 are both built on IA-32.
And yet the i7 is more capable.

It's witchcraft.
Actually, the i7 is of the Pentium Pro lineage. Atom is of the P5 lineage, and thus a closer relative to the 386 than the i7 is.
 

tipoo

Banned
Sorry for beating what is probably a dead horse, but the Wii U can technically play GameCube games, no?

Also I know there is much confusion with the Wii U's power, but it's looking like a pretty decent piece of kit now right? The comments from Criterion seem to back this up.


Not sure. The later Wiis were stripped of Gamecube backwards compatibility if I'm not mistaken? So being able to play Wii games doesn't necessitate it also being perfectly compatible with Gamecube games. I'm not completely sure though. Older console games will be available on virtual console (like N64 and prior stuff), but I'm not sure it can emulate GC.
 

Orayn

Member
Sorry for beating what is probably a dead horse, but the Wii U can technically play GameCube games, no?

Also I know there is much confusion with the Wii U's power, but it's looking like a pretty decent piece of kit now right? The comments from Criterion seem to back this up.

Apparently people used homebrew to make its Wii mode boot into GameCube mode, and it was only limited by the I/O problem of having no GameCube controller ports.
 

Clockwork

Member
Not sure. The later Wiis were stripped of Gamecube backwards compatibility if I'm not mistaken? So being able to play Wii games doesn't necessitate it also being perfectly compatible with Gamecube games. I'm not completely sure though. Older console games will be available on virtual console (like N64 and prior stuff), but I'm not sure it can emulate GC.

For the Wii, the core architecture didn't change. Removal of GC BC from the OS as well as taking out the controller ports and memory card slots was strictly a cost saving measure.
 

xemumanic

Member

If I had instead said "& *A* ATI/AMD GPU design", would you get what I was saying better? Because I'm not trying to imply there was any sort of generational relation between the GPUs Nintendo has used, just that ATI/AMD always provided the GPU. Nor did I make any mention of the GPU other than that part.
 

ozfunghi

Member
If I had instead said "& *A* ATI/AMD GPU design", would you get what I was saying better? Because I'm not trying to imply there was any sort of generational relation between the GPUs Nintendo has used, just that ATI/AMD always provided the GPU. Nor did I make any mention of the GPU other than that part.

Just pointing out why you got my reply. And no, it wouldn't have made a difference imo. Should've been something like "IBM CPU for 3 consecutive consoles and an AMD/ATI GPU" imo. But who cares. Only gave that answer because of your wording. No need to make a big deal out of it.
 
My biggest question about the CPU at this moment is:
If backwards compatibility uses one of the cores, presumably at the same speed as Broadway (729 MHz), and halves the L2 cache to match the amount of L2 found on Broadway... does this mean that the latency of the L2 eDRAM cache is as low as that of the SRAM used on the Wii (in terms of read/write cycles)?
If that's the case, then those 3 MB of L2 cache are a win-win scenario...

Actually... that is a good question. Going by the technical information that we have, eDRAM blocks smaller than 4 MB should have higher latency than SRAM. Perhaps it's close enough that it doesn't make much of a difference?
 
Actually... that is a good question. Going by the technical information that we have, eDRAM blocks smaller than 4 MB should have higher latency than SRAM. Perhaps it's close enough that it doesn't make much of a difference?
Well, I'm not an expert on the subject, but... even if the difference were as little as 1 clock cycle, wouldn't that completely mess up the performance of the Wii U in compatibility mode? This is of course supposing that the Wii U downclocks itself to Wii speeds.

If this is not the case, would it then be possible that the L2 cache memory runs at a higher clock than the rest of the CPU to compensate for this increase in latency?

Regards!
 

tipoo

Banned
The L2 SRAM on the Wii's Broadway processor would have run at most at the clock rate of the main core, possibly less (do we know? 729 MHz at most though). So even if eDRAM inherently has a longer latency than SRAM at sizes under 4 MB, the SRAM in the Broadway wouldn't have been particularly fast to start with, so maybe that's why eDRAM can emulate it.

If you think about it, SRAM latency in wall-clock terms is directly tied to clock speed: the more time each cycle takes, the higher the latency. Since the Wii only ran at 729 MHz, the latency of the eDRAM on the Wii U doesn't have to be crazy low to match it.
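A rough sketch of that cycles-versus-nanoseconds argument, in case it helps. The 10-cycle L2 hit latency is a made-up placeholder (nobody in this thread has the real number), and 1200 MHz is just the "around 1.2 GHz" figure quoted for the Wii U; only the 729 MHz Broadway clock comes from the discussion itself.

```python
import math

BROADWAY_MHZ = 729              # Wii CPU clock, as quoted in the thread
WII_U_MHZ = 1200                # "around 1.2 GHz", approximate
HYPOTHETICAL_L2_HIT_CYCLES = 10 # placeholder, NOT a measured Broadway figure

def cycle_time_ns(clock_mhz: float) -> float:
    """Duration of one clock cycle in nanoseconds."""
    return 1000.0 / clock_mhz

def latency_ns(cycles: int, clock_mhz: float) -> float:
    """Wall-clock latency of an access that takes `cycles` cycles."""
    return cycles * cycle_time_ns(clock_mhz)

def cycles_needed(access_ns: float, clock_mhz: float) -> int:
    """Whole cycles a core at `clock_mhz` waits for an access lasting `access_ns`.

    The small epsilon keeps floating-point noise from rounding an exact
    cycle count up by one.
    """
    return math.ceil(access_ns * clock_mhz / 1000.0 - 1e-9)

# Time budget the eDRAM has to meet to look like the SRAM at Wii speed.
budget_ns = latency_ns(HYPOTHETICAL_L2_HIT_CYCLES, BROADWAY_MHZ)

print(f"One cycle at {BROADWAY_MHZ} MHz: {cycle_time_ns(BROADWAY_MHZ):.2f} ns")
print(f"Budget for a {HYPOTHETICAL_L2_HIT_CYCLES}-cycle hit: {budget_ns:.2f} ns")
print(f"Same budget at {WII_U_MHZ} MHz: {cycles_needed(budget_ns, WII_U_MHZ)} cycles")
```

The point is only that a fixed access time in nanoseconds costs fewer cycles at a lower clock, so an array that is somewhat slower in absolute terms could still hit the same cycle count once the core is downclocked.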
 

tipoo

Banned
Did we know this before? IBM actually re-released a PPC750 variant called the 750CL fairly recently, in 2006. It only differed from the original in that the new design allowed for higher clock speeds, and IBM had models up to 1GHz.


http://en.wikipedia.org/wiki/Broadway_(microprocessor)
http://en.wikipedia.org/wiki/PowerPC_G3#PowerPC_750CL

And some IBM technical whitepapers

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/53757AB1709404FC8525762A0032A39F/$file/To%20CL%20-%20From%20750GX%209-10-09.pdf
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/2F33B5691BBB8769872571D10065F7D5/$file/750cldd2x_ds_v2.6_16Oct2009dft.pdf

The PowerPC 750CL and the PowerPC 750GX are architecturally identical; both processors are
PowerPC Architecture books 1, 2, & 3 compliant. The primary technical differences between the two
devices are in the area of special features and some minor electrical, timing, and pinout differences

This section summarizes the features of the 750CL implementation of the PowerPC Architecture™. Major features of the 750CL include the following:
•Branch processing unit
•Fetches four instructions per clock
•Processes one branch per cycle and can resolve two speculations
•Executes single speculative stream during fetch of another speculative stream
•Has a 512-entry branch history table (BHT) for dynamic prediction
•Dispatch unit
•Has full hardware detection of dependencies, which are resolved in the execution units
•Dispatches two instructions to six independent units (system, branch, load/store, fixed-point unit 1, fixed-point unit 2, or floating-point)
•Has serialization control (predispatch, postdispatch, execution, serialization)
•Decode
•Register file access
•Forwarding control
•Partial instruction decode
•Load/store unit
•Has single-cycle load or store cache access (byte, halfword, word, doubleword)
•Has effective address generation
•Allows hits under misses (one outstanding miss)
•Has single-cycle misaligned access within a doubleword boundary
•Has alignment, zero padding, sign extend for integer register file
•Converts floating-point internal format (using alignment and normalization)
•Sequences for load/store multiples and string operations
•Has store gathering
•Has cache and translation lookaside buffer (TLB) instructions
•Supports big-endian and little-endian byte addressing
•Supports misaligned little-endian in hardware
Datasheet
DD2.X
PowerPC 750CL Microprocessor
General Information
Page 10 of 70
Version 2.6
October 16, 2009
•Fixed-point units
•Fixed-point unit 1 (FXU1): multiply, divide, shift, rotate, arithmetic, logical
•Fixed-point unit 2 (FXU2): shift, rotate, arithmetic, logical
•Single-cycle arithmetic, shift, rotate, logical
•Multiply and divide support (multi-cycle)
•Early out multiply
•Floating-point unit
•Support for IEEE-754 standard single-precision and double-precision floating-point arithmetic
•3-cycle latency, 1-cycle throughput, single-precision multiply-add
•3-cycle latency, 1-cycle throughput, double-precision add
•4-cycle latency, 2-cycle throughput, double-precision multiply-add
•Hardware support for divide
•Hardware support for denormalized numbers
•Time deterministic non-IEEE mode
•System unit
•Executes Condition Register (CR) logical instructions and miscellaneous system instructions
•Has special register transfer instructions
•Level 1 (L1) cache structure
•32 KB, 32-byte line, 8-way set-associative instruction cache
•32 KB, 32-byte line, 8-way set-associative data cache
•Single-cycle cache access
•Pseudo least-recently-used (PLRU) replacement
•Copy-back or write-through data cache (on a page-per-page basis)
•Supports PowerPC memory coherency modes
•Nonblocking instruction and data cache (supports hits under one outstanding miss)
•No snooping of instruction cache
•Memory management unit
•128 entry, 2-way set-associative instruction TLB
•128 entry, 2-way set-associative data TLB
•Hardware reload for TLBs
•Eight instruction block address translation (BAT) arrays and eight data BATs
•Virtual memory support for up to 4 petabytes (2^52) of virtual memory
•Real memory support for up to 4 gigabytes (2^32) of physical memory
•Level 2 (L2) cache
•256 KB, 64-byte line, 2-way set-associative on-chip cache memory
•Internal L2 cache controller with 2 K-entry tag array
•Copy-back or write-through data cache (on a page basis, or for all L2)
•64-byte cache line organized as two 32-byte sectors
•L2 frequency at core speed
•Selectable 32-byte, 64-byte, or 128-byte L2 cache loads
•Error correction code (ECC) protection on cache array
•Bus interface
•Compatible with the 60x processor interface
•Has a 32-bit address bus
•Has a 64-bit data bus (also supports 32-bit data bus mode)
•Supports bus-to-core frequency multipliers of 2x, 2.5x, 3x, 3.5x, 4x, 4.5x, 5x, 5.5x, 6x, 6.5x, 7x, 7.5x, 8x, 8.5x, 9x, 9.5x, and 10x
•Bus transaction pipeline depth of 2, 3, or 4 transactions (selectable)
•Testability
•Level sensitive scan design (LSSD)
•JTAG interface
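As a sanity check on the cache and address-space figures in that feature list, here is a small sketch that derives the set counts and index bits from size, line length and associativity. It uses the standard set-associative formulas; the function name is only for illustration, and reading the "2 K-entry tag array" as one entry per L2 set is my assumption, not something the datasheet excerpt states.

```python
def cache_geometry(size_bytes: int, line_bytes: int, ways: int) -> dict:
    """Derive basic geometry of a set-associative cache (power-of-two sizes)."""
    lines = size_bytes // line_bytes         # total cache lines
    sets = lines // ways                     # lines are grouped into sets of `ways`
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return {"lines": lines, "sets": sets,
            "offset_bits": offset_bits, "index_bits": index_bits}

# Figures from the 750CL feature list quoted above.
l1 = cache_geometry(32 * 1024, 32, 8)    # 32 KB, 32-byte line, 8-way (I or D cache)
l2 = cache_geometry(256 * 1024, 64, 2)   # 256 KB, 64-byte line, 2-way

# Address-space sizes: 2^52 bytes virtual, 2^32 bytes physical.
virtual_pib = 2**52 / 2**50    # in units of 2^50 bytes
physical_gib = 2**32 / 2**30   # in units of 2^30 bytes

print("L1:", l1)
print("L2:", l2)
print(f"Virtual: {virtual_pib} PiB, physical: {physical_gib} GiB")
```

With these inputs the L2 works out to 2048 sets, which lines up with the 2 K-entry tag array if each entry covers one set.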


So there's that 60x interface Marcan was wondering about.
 
Is there a problem with the image for some people? I can try rehosting on a different site later today if people can't see the larger version.
 
Did we know this before? IBM actually re-released a PPC750 variant called the 750CL fairly recently, in 2006. It only differed from the original in that the new design allowed for higher clock speeds, and IBM had models up to 1GHz.

I thought the Wii U runs at around 1.2Ghz?
 
The L2 SRAM on the Wii's Broadway processor would have run at most at the clock rate of the main core, possibly less (do we know? 729 MHz at most though). So even if eDRAM inherently has a longer latency than SRAM at sizes under 4 MB, the SRAM in the Broadway wouldn't have been particularly fast to start with, so maybe that's why eDRAM can emulate it.

If you think about it, SRAM latency in wall-clock terms is directly tied to clock speed: the more time each cycle takes, the higher the latency. Since the Wii only ran at 729 MHz, the latency of the eDRAM on the Wii U doesn't have to be crazy low to match it.
But if the Wii U's CPU is also downclocked to 729 MHz when emulating the Wii, then the eDRAM cache must have the same latency as the SRAM cache on the Wii in cycles (not nanoseconds).

Or could it be that the eDRAM on the Wii U CPU is clocked higher (proportionally speaking) than on the Wii? For example, at 2x the CPU clock?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Did we know this before? IBM actually re-released a PPC750 variant called the 750CL fairly recently, in 2006. It only differed from the original in that the new design allowed for higher clock speeds, and IBM had models up to 1GHz.


http://en.wikipedia.org/wiki/Broadway_(microprocessor)
http://en.wikipedia.org/wiki/PowerPC_G3#PowerPC_750CL

And some IBM technical whitepapers

https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/53757AB1709404FC8525762A0032A39F/$file/To%20CL%20-%20From%20750GX%209-10-09.pdf
https://www-01.ibm.com/chips/techlib/techlib.nsf/techdocs/2F33B5691BBB8769872571D10065F7D5/$file/750cldd2x_ds_v2.6_16Oct2009dft.pdf


So there's that 60x interface Marcan was wondering about.
Tipoo, you haven't been really paying attention. Broadway's 'stock-ification' as 750CL has been known for, well, many years.
 
But if the Wii U's CPU is also downclocked to 729 MHz when emulating the Wii, then the eDRAM cache must have the same latency as the SRAM cache on the Wii in cycles (not nanoseconds).

Or could it be that the eDRAM on the Wii U CPU is clocked higher (proportionally speaking) than on the Wii? For example, at 2x the CPU clock?

Hmm. The difference in latency between the SRAM L2 of Broadway and eDRAM L2 of Espresso is something worth pondering. I'm pretty sure, however, that the eDRAM is not clocked higher than the CPU. AFAIK, the Broadway core supports L2 cache speed of 1/2 the CPU clock at the highest. This is pretty common in all IBM designs even today.

Do we have any speculation on the internal bus here? Is it the EIB "ring bus" we've heard rumors of? If so, on Cell, that also runs at 1/2 CPU core clock speed.
 

tipoo

Banned
I thought the Wii U runs at around 1.2Ghz?

It does, but I had missed the part where IBM retooled the 750 to run at higher clocks I guess. I thought this was the highest clocked one by a wider margin, but 1.0-1.2 isn't that big a leap compared to what the old 750s ran at.

Blu, I guess I missed it, I knew it was 750 based since Marcan originally tweeted that, but I must have missed the part where the 750 being re-released in 2006 was talked about.
 

wsippel

Banned
It does, but I had missed the part where IBM retooled the 750 to run at higher clocks I guess. I thought this was the highest clocked one by a wider margin, but 1.0-1.2 isn't that big a leap compared to what the old 750s ran at.

Blu, I guess I missed it, I knew it was 750 based since Marcan originally tweeted that, but I must have missed the part where the 750 being re-released in 2006 was talked about.
It wasn't "re-released". It was a new version, based on Broadway. All the unique features mentioned in the documentation are Gekko/ Broadway features. The whole 750 line was still very much active back then, with one new version released every two years: 750 in 1998, 750CX in 2000, 750FX in 2002, 750GX in 2004 and 750CL in 2006.


EDIT: This appears to be a "regular" 750 - no idea which one, though:


Found by Birdman on Twitter.
 
One without L2 on-die. So a PPC 750/755 (1997-1999).

Great find.

EDIT: according to Birdman's Die Shot Collection thread, it's 200nm, so it's a 1999 PPC750 core shrink just before PPC750 CX. At 40 mm² it's surely the one Nintendo used on their own Gekko promo photos instead of an actual Gekko.

That thread is beautiful, there should be more people preserving die shots like that.
 