
WiiU "Latte" GPU Die Photo - GPU Feature Set And Power Analysis

Xbone-r (I'm gonna coin that now) is using x86, Wii U is using PowerPC.
I don't think this matters much to the people programming the games. I mean, unless they're programming at an extremely low level, the compiler should do the work.

I think that when it comes to CPU architecture, what matters more for similarity is cache sizes, clock speed, number of cores, out-of-order vs. in-order execution, and some specific instructions (like 4-wide SIMD vs. the 2-wide SIMD on the Wii U).

In comparison to the current gen, the Wii U has a CPU that more closely resembles the ones found in the Xbone and PS4, but the fact that those CPUs have 8 cores vs. 3 on the Wii U can be kind of problematic (although 3 vs. 8 is still much better than the 1 vs. 3 the Wii had against the X360).
 

wsippel

Banned
They're not both using eDRAM. The One uses eSRAM, the U uses eDRAM. eDRAM is 3x as dense, but it has to refresh its cells, and below 4MB it has higher latency.
Read what I posted a couple hours ago. According to Marcan, Wii U's eDRAM looks like it's pseudostatic. It probably has to be, if only for backwards compatibility. So no refresh cycles here - same latency as real SRAM.
 
Your entire scenario is different than mine.

It seems you and I have very different concepts of what "maxing out" a system is. By your logic, when Insomniac said Resistance maxed out the capabilities of the PS3, it was true. Then they came back and said they got 40% more performance with Ratchet and Clank and something like 80% more with Resistance 2. So by your method, all of those games maxed out the system. By my concept, only the one that got the most out of it did.
Quite the opposite.

Factor 5 used everything they had there for graphics, and it constrained their games.

That's why, say, Zelda runs at 30 FPS and Mario games at 60 FPS; they're actually based on the same tech/engines, but Mario doesn't really bother with detailed shadows (other than Mario's own) and uses simple polygons for them. Zelda uses extruded shadow maps, saved in a buffer and reused for a lot of things, so it's more intensive. And that's not the only intensive thing it's doing on top, compared to a Mario game.

But I can push said system with a 30 fps game, and I can push it at 60 fps, depending on what I'm doing, of course.

The Mario Galaxy games do a lot, considering the reality of the system (and at 60 fps), to be dismissed like that in favour of Factor 5's "Phantom Menace".

Polygon count isn't the be-all-end-all either; the biggest obstacle to it is texture resolution (because of the hit texturing takes). The GC and Wii were very good at texturing, but then again, Factor 5's textures were never in your face, nor could you look at them up close; hence they aren't all that detailed. They probably managed to push more polygons thanks to that alone, and then they optimized on top.

Insomniac is besides the point.
Any game can reach "its limit" on a console. I can write 10 lines of code and use all the resources of the hardware. That is not maxing out the system, though; that is just using up all of the resources. Maxing out means optimizing to the point where there is nothing more you can get out of the hardware. It means using all of the hardware's most advanced features to their highest potential.
Of course, but that was not my point at all.

You're saying that to max out the system they have to pull off every effect it supports and then some (in software). I don't agree with that; code can be efficient, do only the tasks the game actually needs, and still max the system out that way.
Nothing Nintendo releases ever does this. Any game that runs at 60 FPS could get double the performance at 30. If you never hit an instance of slowdown, then you were nowhere near the limits of the hardware. Mario Galaxy left so much leeway that it could still maintain a solid 60 FPS with the screen split. There is a difference between a game being/looking good and a game pushing a system. I like MG 2, but I know the hardware can do far more than that.
So your issue is that there is no slowdown?

Mindblown.
Also, ATI said the chip was HD capable, but Nintendo had it locked in the firmware. Even the PS2 and the original Xbox could output in HD. The PS2 output at 1080i in actual games, and it only had 4 MB of eDRAM with lower texture bandwidth than the GC's embedded 1T-SRAM. Nintendo could have unlocked the higher resolutions if they wanted to. They just didn't.
Not to be an ass, but I've only heard you say that, and I've been reading about this for years. AFAIK blu already covered it: it's not blocked via firmware per se, but via the video encoder chip.
One of the homebrew apps for the Wii allowed you to play "some" videos at 720p: "Files with MPEG-2 TS video streams (typically HDTV recorded .ts files) with resolutions up to 1280x720 play correctly."
I'll look into that.

But I still don't see much use for 720p on the Wii.
 
Also, you are missing the entirety of what I'm saying. The main point was that when Mario Galaxy achieved 60 FPS, it did it with a ton of leeway. Nintendo didn't even try to "push" the system. People are letting their opinion of the game itself prevent them from seeing it technically. You are viewing it as fans rather than analysts. Also, that was not "all" that I looked at or pointed out.

Like lostinblue, your concept of maxing out is completely different than mine. Your view seems to be emotional, based on how the games appeal to you. What devs target is their business and completely irrelevant to what I'm talking about. How I feel about the game/hardware is irrelevant.

Speaking purely in terms of what the hardware can possibly achieve, no game pushed the Wii anywhere near its optimal limits, i.e. maxing it out by my concept. There were no games that used every stage of the TEV to its optimized limit or pushed out the highest-level texture effects it could achieve. There were no games that pushed the polygon potential of the Wii to the highest achievable in a real-world scenario like Rebel Strike (which did it at 60 FPS, by the way). There were no games that optimized the Wii CPU to do physics/A.I. to the "best" of its capability. There were games that had these things, but they were far from optimal. By that metric, no game maxed out the Wii's capabilities.
Or perhaps devs didn't brag as much as Factor 5. :p Nintendo never bragged about polygons per second or anything like that.

The Mario Galaxy engine could pull more than what it did (it was at 60 fps, after all); then again, no one else had an engine that good on the Wii, and that says a lot about Nintendo. They could have stressed it until it ran at 30 frames per second with dips, but that was not their goal. Yes, they probably could have built the levels, then tested and retested them, adding more and more polygons until the system couldn't handle any more, reaching the theoretical limit everywhere across the whole game; but what's the point if objects are already very high-polygon (to the point people say it could look very good on the HD twins) and rounded out? The only reason to do so would be bragging rights, risking dips and taking twice as long to ship the game.

But the thing is, you could only pull that off on their engine, or invest a metric ton to have tech that good. They weren't lazy; as you pointed out, the engine was feature-rich as f*ck. They just wanted it locked at 60 fps, and implemented variable LOD on top of it all, so keeping 60 fps with that detail became more doable.

I'm not saying it's the ultimate system destroyer, but it is a game that used the hardware properly. If someone calls it a system pusher, I have no problem agreeing, even if I'd add the same caveats you name: it would be possible to stress the system further, yes; if not at 60 fps, then at 30 fps.


The most WTF thing about Mario Galaxy, though, is that they use polygonal shadows on the coins, which take more polygons than a proper top-down shadow (or a recreation of one) would: 4 polygons versus the rounded polygon composition they actually use.
 

tipoo

Banned
Read what I posted a couple hours ago. According to Marcan, Wii U's eDRAM looks like it's pseudostatic. It probably has to be, if only for backwards compatibility. So no refresh cycles here - same latency as real SRAM.

I did, I don't see that in his recent tweets anywhere, and his earlier tweets did say eDRAM.

In either case, the statement I was correcting would still be wrong, the One at least does not use eDRAM. eSRAM is confirmed for that one. And whatever the Wii U uses, it can't be eSRAM due to density, eSRAM is 3x as large per memory unit.
 

AzaK

Member
I did, I don't see that in his recent tweets anywhere, and his earlier tweets did say eDRAM.

In either case, the statement I was correcting would still be wrong, the One at least does not use eDRAM. eSRAM is confirmed for that one. And whatever the Wii U uses, it can't be eSRAM due to density, eSRAM is 3x as large per memory unit.
It could be 1T-SRAM though could it not? That is similar in density to eDRAM and faster. XBO uses 6T-SRAM.
 

tipoo

Banned
It could be 1T-SRAM though could it not? That is similar in density to eDRAM and faster. XBO uses 6T-SRAM.

Some places do use 1T-SRAM and eDRAM interchangeably, but Nintendo used to say 1T-SRAM for the Wii and said eDRAM specifically for the Wii U. Why change the name if they were using the same thing? But more importantly, wouldn't we know by now, with everyone who has looked at the cells?
 
Some places do use 1T-SRAM and eDRAM interchangeably, but Nintendo used to say 1T-SRAM for the Wii and said eDRAM specifically for the Wii U. Why change the name if they were using the same thing? But more importantly, wouldn't we know by now, with everyone who has looked at the cells?
I reckon there have been some changes in MoSys and 1T-SRAM patents.

Here. They sold it to Invensas, and NEC (later renamed Renesas) produced it anyway, so chances are the conditions requiring it to be stated on the package have changed. If that's even the case.

Back in the GC days, IBM, MoSys and ATI had their names on the box and released specs; it was part of their deals. Panasonic/Matsushita and NEC didn't.
 

Meelow

Banned
So now that it's confirmed that the Xbox One will use 3GB for the OS and 5GB for games, I've noticed that Nintendo could have upgraded the Wii U's RAM to 6GB and used 5GB for games and 1GB for the OS, or possibly even 5.5GB for games and 500MB for the OS, like they will be doing in the future by cutting the OS's RAM and giving it to games.

I know this is the GPU thread but since there is no ram thread I might as well post this here.
 
So now that it's confirmed that the Xbox One will use 3GB for the OS and 5GB for games, I've noticed that Nintendo could have upgraded the Wii U's RAM to 6GB and used 5GB for games and 1GB for the OS, or possibly even 5.5GB for games and 500MB for the OS, like they will be doing in the future by cutting the OS's RAM and giving it to games.

I know this is the GPU thread but since there is no ram thread I might as well post this here.

They also could've put 8 GB of GDDR5 RAM in the Wii U. They didn't. Don't mean to come off as rude, but there really isn't much to discuss on that front.
 

Meelow

Banned
They also could've put 8 GB of GDDR5 RAM in the Wii U. They didn't. Don't mean to come off as rude, but there really isn't much to discuss on that front.

I mean, if they had known about Microsoft's spec plans for the Xbox One.

Not saying Nintendo would do it, but it's an interesting theory in my opinion.
 

krizzx

Junior Member
They also could've put 8 GB of GDDR5 RAM in the Wii U. They didn't. Don't mean to come off as rude, but there really isn't much to discuss on that front.

I would rather have 6GB of DDR3 than 8GB of GDDR5. It would give better all around performance for multiple tasks. GDDR is only "meant" for graphics.

If there was anything I would suggest Nintendo beefing up on the ram side, it would be the edram. If they had 100 MB of EDRAM, the rest would hardly even matter. Of course, that would put costs through the roof.
 

OryoN

Member
I'm still very intrigued by all those dual blocks in the GPU, and the fact that they're all congregated in the same general location. Seems like everyone just gave up on them.

What are some of the most common components in Radeon GPUs that would be most likely to occupy dual blocks? The only ones I'm aware of are geometry engines (this could allow Latte to achieve 2 polys/clk, right?).

What else? Customized CUs? Tessellators? (Is more than one even necessary?) Also, do their sizes offer any clues to their functions? What about location? Closer to the eDRAM, closer to the shader cores, etc.
 

disap.ed

Member
I would rather have 6GB of DDR3 than 8GB of GDDR5. It would give better all around performance for multiple tasks. GDDR is only "meant" for graphics.

If there was anything I would suggest Nintendo beefing up on the ram side, it would be the edram. If they had 100 MB of EDRAM, the rest would hardly even matter. Of course, that would put costs through the roof.

4GB of DDR3 @ 128bit (3GB for games) would have been good enough I guess keeping the rest of the system in mind.
 

krizzx

Junior Member
I'm still very intrigued by all those dual blocks in the GPU, and the fact that they're all congregated in the same general location. Seems like everyone just gave up on them.

What are some of the most common components in Radeon GPUs that would be most likely to occupy dual blocks? The only ones I'm aware of are geometry engines (this could allow Latte to achieve 2 polys/clk, right?).

What else? Customized CUs? Tessellators? (Is more than one even necessary?) Also, do their sizes offer any clues to their functions? What about location? Closer to the eDRAM, closer to the shader cores, etc.

It seems that a lot of AMD GPUs from 2011 onward used dual graphics engines. That, and its similar component design to Brazos, are the main reasons I now believe Latte is based on HD6xxx-series tech.

The duplicate components are undeniable. We can see them. The visual similarity to parts on Brazos is hard to deny as well.

[Images: Cayman block diagram, Barts block diagram]

Correct me if I'm wrong, but I believe AMD didn't start using dual graphics engines in single GPUs until the HD 6000 series. Latte clearly has one. There are five duplicate GPU components on Latte and, coincidentally, there are five components duplicated in all AMD dual graphics engines.

1. [Q1, Q2] = Vertex Assemblers
2. [S1, S2] = Tessellators
3. [T1, T2] = Geometry Assemblers
4. [U1, U2] = Hierarchical Zs
5. [W1, W2] = Rasterizers

 
But (a query for the technically inclined): if the difference between PPC and x86 were trivial, then wouldn't we see some form of BC on the XBO?
 
So now that it's confirmed that the Xbox One will use 3GB for the OS and 5GB for games, I've noticed that Nintendo could have upgraded the Wii U's RAM to 6GB and used 5GB for games and 1GB for the OS, or possibly even 5.5GB for games and 500MB for the OS, like they will be doing in the future by cutting the OS's RAM and giving it to games.

I know this is the GPU thread but since there is no ram thread I might as well post this here.
Nintendo could have done any number of things. What is there to discuss in what you posted?
 

A More Normal Bird

Unconfirmed Member
I mean, if they had known about Microsoft's spec plans for the Xbox One.

Not saying Nintendo would do it, but it's an interesting theory in my opinion.

What theory? Why would they put in so much more RAM whilst leaving the rest of the system untouched?

I would rather have 6GB of DDR3 than 8GB of GDDR5. It would give better all around performance for multiple tasks. GDDR is only "meant" for graphics.

If there was anything I would suggest Nintendo beefing up on the ram side, it would be the edram. If they had 100 MB of EDRAM, the rest would hardly even matter. Of course, that would put costs through the roof.

You'd rather have slower RAM and 2GB less of it due to a slight latency advantage? It might help the CPU out a bit, but what about the 4GB of data (or more) bound for the GPU? I think it's telling that devs were saying that a 4GB unified pool of GDDR5 would have been preferable or just as good as MS's solution. If DDR3 gave better performance for gaming purposes Sony would be using it and there wouldn't be move engines in the new Xbox.
 

disap.ed

Member
What theory? Why would they put in so much more RAM whilst leaving the rest of the system untouched?



You'd rather have slower RAM and 2GB less of it due to a slight latency advantage? It might help the CPU out a bit, but what about the 4GB of data (or more) bound for the GPU? I think it's telling that devs were saying that a 4GB unified pool of GDDR5 would have been preferable or just as good as MS's solution. If DDR3 gave better performance for gaming purposes Sony would be using it and there wouldn't be move engines in the new Xbox.

It's all a matter of costs and design paradigms. WiiU and XB1 have 32MB of eDRAM, so the use of relatively slow DDR3 makes sense for the main RAM. PS4 doesn't have eDRAM, so they have to use fast GDDR5 as main RAM.

Therefore eDRAM + GDDR5 would be overkill.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
But (a query for the technically inclined): if the difference between PPC and x86 were trivial, then wouldn't we see some form of BC on the XBO?
PPC and x86_64 are different ISAs. You need to either emulate one via the other (translate ops on the fly), or pre-translate binary code from one to the other (which is what the 360 did with OG Xbox emulation), but in either case you need to handle endianness (the ordering of bytes in longer types). Neither of those is trivial. Pre-translation would be more viable, but in the XB1's case, I'm highly skeptical whether Jaguar could quite hit the peak single-thread performance of Xenon's SIMD. So even if they did pre-translation, chances are they'd hit performance hurdles.
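For anyone unfamiliar with the endianness point, here's a minimal sketch (mine, not blu's; the helper name is purely illustrative) of why the same bytes mean different things to a PPC core and an x86-64 core, and where a translator has to step in:

```python
# Minimal sketch of the endianness problem: the same four bytes in RAM decode to
# different integers depending on byte order. PPC (Xenon/Espresso) is big-endian,
# x86-64 (Jaguar) is little-endian.
import struct

raw = bytes([0x12, 0x34, 0x56, 0x78])

big    = struct.unpack(">I", raw)[0]   # how a PPC core reads these bytes
little = struct.unpack("<I", raw)[0]   # how an x86-64 core reads the same bytes

print(hex(big), hex(little))           # 0x12345678 0x78563412

# Whether translating on the fly or ahead of time, every emulated load/store of a
# multi-byte value needs an explicit byte swap to keep the program's view consistent.
def emulated_load32(guest_memory: bytes, addr: int) -> int:
    return struct.unpack_from(">I", guest_memory, addr)[0]
```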
 

A More Normal Bird

Unconfirmed Member
It's all a matter of costs and design paradigms. WiiU and XB1 have 32MB of eDRAM, so the use of relatively slow DDR3 makes sense for the main RAM. PS4 doesn't have eDRAM, so they have to use fast GDDR5 as main RAM.

Therefore eDRAM + GDDR5 would be overkill.
Got it, thought krizz was just talking in general, like those people who claim that the use of GDDR5 in the PS4 will disadvantage it due to latency concerns. Although, I'd phrase it like this: the Wii-U and XB1 have slow main RAM, so they have to use eDRAM, whereas the PS4 and PC GPUs use GDDR5 and thus have no such concerns. That said, 6GB of any RAM in the Wii-U would be overkill and unbalanced from a system design perspective.
 
You'd rather have slower RAM and 2GB less of it due to a slight latency advantage? It might help the CPU out a bit, but what about the 4GB of data (or more) bound for the GPU? I think it's telling that devs were saying that a 4GB unified pool of GDDR5 would have been preferable or just as good as MS's solution. If DDR3 gave better performance for gaming purposes Sony would be using it and there wouldn't be move engines in the new Xbox.

Anand speculated that Sony went with the 8 GB unified pool of GDDR5 simply because, for them, it was a drop-in upgrade which required no redesign of the rest of the console. All they had to do was buy 512 MB modules instead of 256 MB modules and put those on the already finished PS4 motherboard.

The latency advantage of DDR3 over GDDR5 is vastly overrated. Let's not forget that DDR has superior latency over DDR2 which has superior latency over DDR3. Yet somehow PCs have gotten CPUs with ever higher clockspeeds and ever higher demands for rapid access to large pools of system memory and in the end the latency disadvantage hasn't meant jack shit because of the sheer bandwidth advantage DDR3 has over DDR2 has over DDR.

Depending on just how Sony and MS implement the memory controllers of the Xbone and PS4, the latency difference between the DDR3 in the Xbone and the GDDR5 in the PS4 might be as little as 2x. Which is basically nothing. Meanwhile, the PS4 can access the entire 8 GB pool of GDDR5 with a bandwidth of 170 GB/s. The Xbox accesses its 8 GB pool of DDR3 at 68 GB/s, and even the 32 MB of eSRAM can only be accessed at 106 GB/s (though at a ridiculously low latency, since it's on-die). Since 32 MB is only big enough to hold a framebuffer anyway, it's unlikely to make a big difference in overall performance. When it comes to sheer fillrate, the PS4 absolutely murders the Xbone because of raw bandwidth and also more ROPs. We're looking at a situation where the PS4 has enough sheer GPU grunt and bandwidth that some games might be forced to use FXAA on the Xbone while being able to use MSAA on the PS4. That's how big the performance difference is.
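As an aside, peak figures like these just fall out of bus width times data rate; a quick sketch using the commonly reported bus widths and transfer rates (my inputs, not numbers from this post):

```python
# Peak bandwidth = bus width (bytes) x transfers per second. The 256-bit widths and
# data rates below are the commonly reported figures, used here purely as an example.
def peak_gbs(bus_bits: int, transfers_per_sec: float) -> float:
    return bus_bits / 8 * transfers_per_sec / 1e9

print(peak_gbs(256, 5500e6))   # ~176 GB/s for 256-bit GDDR5 at 5.5 GT/s (the ~170 above is a rounded early figure)
print(peak_gbs(256, 2133e6))   # ~68.3 GB/s for 256-bit DDR3-2133, matching the 68 GB/s quoted
```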
 

disap.ed

Member
Got it, thought krizz was just talking in general, like those people who claim that the use of GDDR5 in the PS4 will disadvantage it due to latency concerns. Although, I'd phrase it like this: the Wii-U and XB1 have slow main RAM, so they have to use eDRAM, whereas the PS4 and PC GPUs use GDDR5 and thus have no such concerns. That said, 6GB of any RAM in the Wii-U would be overkill and unbalanced from a system design perspective.

Definitely. I wonder if Nintendo would have gone with 4GB had they known the PS4/XB1 would have 8GB. They really could have built a machine not too far from the XB1 with a little more money spent on every unit (for example, at least 2 more CPU cores). I hope they continue with this architecture next gen, just blown up; it would allow a mature OS too.
I imagine a CPU with 1+8 Enhanced Espresso cores (the one being the core with 2MB of L2 cache) at ~1.6GHz, a GX3 GPU with ~2TF and 128MB of eDRAM, built as an SoC on a 20nm process (or smaller if available in a mature state), plus 8GB of DDR4 RAM @ 128 bit. Release in 2016.
 
Definitely. I wonder if Nintendo would have gone with 4GB had they known the PS4/XB1 would have 8GB. They really could have built a machine not too far from the XB1 with a little more money spent on every unit (for example, at least 2 more CPU cores). I hope they continue with this architecture next gen, just blown up; it would allow a mature OS too.
I imagine a CPU with 1+8 Enhanced Espresso cores (the one being the core with 2MB of L2 cache) at ~1.6GHz, a GX3 GPU with ~2TF and 128MB of eDRAM, built as an SoC on a 20nm process (or smaller if available in a mature state), plus 8GB of DDR4 RAM @ 128 bit. Release in 2016.

I think it's pretty clear that when Nintendo were designing the Wii U they were expecting the other two to hit sooner and be lower-specced; even mid last year the general assumption was that the PS4 would have only 2GB of RAM.
 
Definitely. I wonder if Nintendo would have gone with 4GB had they known the PS4/XB1 would have 8GB. They really could have built a machine not too far from the XB1 with a little more money spent on every unit (for example, at least 2 more CPU cores). I hope they continue with this architecture next gen, just blown up; it would allow a mature OS too.
I imagine a CPU with 1+8 Enhanced Espresso cores (the one being the core with 2MB of L2 cache) at ~1.6GHz, a GX3 GPU with ~2TF and 128MB of eDRAM, built as an SoC on a 20nm process (or smaller if available in a mature state), plus 8GB of DDR4 RAM @ 128 bit. Release in 2016.

There is no scenario in which the Wii U2 would be performance-competitive with Xbone and PS4 where it still uses the ancient Gamecube -> Wii -> Wii U CPU design. It needs to be abandoned, period. There is no other alternative. Nintendo is using a super-clocked version of a CPU architecture which was contemporary with the Pentium II.

I think it's pretty clear that when Nintendo were designing the Wii U they were expecting the other two to hit sooner and be lower-specced; even mid last year the general assumption was that the PS4 would have only 2GB of RAM.

According to Anand the PS4's development started with the assumption it would have 2 GB of RAM, which was upgraded to 4 GB, and then upgraded again to 8 GB as DRAM module capacity grew and prices fell during the design process.
 

disap.ed

Member
There is no scenario in which the Wii U2 would be performance-competitive with Xbone and PS4 where it still uses the ancient Gamecube -> Wii -> Wii U CPU design. It needs to be abandoned, period. There is no other alternative. Nintendo is using a super-clocked version of a CPU architecture which was contemporary with the Pentium II.

You could say the same about the AMD CPU architecture, though. As blu mentioned earlier, the IPC isn't too far off and definitely not in a different league from the Jaguar cores.
There absolutely isn't a reason they HAVE TO abandon this architecture.
Of course, they could just use a die-shrunk PS4 chip and call it a day, but in that case they would have to alter engines, the OS, etc. too.
I guess they will weigh the alternatives too :)
 
You could say the same about the AMD CPU architecture, though. As blu mentioned earlier, the IPC isn't too far off and definitely not in a different league from the Jaguar cores.
There absolutely isn't a reason they HAVE TO abandon this architecture.
Of course, they could just use a die-shrunk PS4 chip and call it a day, but in that case they would have to alter engines, the OS, etc. too.
I guess they will weigh the alternatives too :)

I have poor knowledge of what AMD has been up to since the Athlon, their contemporary to the Pentium II. But I can tell you what Intel has done since the Pentium II:

Pentium III (Katmai -> Coppermine -> Tualatin)
Pentium 4 (Willamette -> Northwood -> Prescott)
Pentium M/Core (Banias -> Dothan -> Yonah)
Core 2 Duo (Conroe -> Allendale -> Wolfdale)
Core 2 Quad (Kentsfield -> Yorkfield)
Core iSeries Gen 1 (Bloomfield -> Lynnfield)
Core iSeries Gen 2 (Sandy Bridge -> Ivy Bridge)
Core iSeries Gen Next (Haswell)

Intel has released a total of 7 or 8 (depending on how you relate these CPU families) new CPU designs since the heyday of PowerPC 750. There were a total of 4 completely new architectures in that timeframe (P4, PM, Core 2, and Core iSeries). Haswell is again an entirely new architecture, so that makes 5 new architectures. Anyone who claims that the IPC of PII is similar to the IPC of Haswell is ignorant or delusional. I wouldn't hesitate to claim that Haswell probably has 10x IPC that PII has, nevermind that Haswell also has 10x the clockspeed. 10x IPC and 10x clockspeed. Is Haswell really 100 times faster than PII was? Without a single doubt, I could say yes and mean it truthfully on computations which didn't need to touch main memory too much. Moore's Law is a tough bitch on old CPU designs. Will a PC running Haswell actually be 100 times faster than an old Pentium II PC? No, of course not, because the whole rest of the PC is a factor in performance as well. But in terms of sheer computational grunt, Haswell being 100 times faster than PII is a good rough estimate.

As for the IPC of Jaguar vs. Espresso, I don't think 10x is too unreasonable a guess there either. AMD has not kept pace with Intel on sheer IPC, especially after the Core iSeries. So let's pretend that, clock for clock, Jaguar is about 75% of an Intel CPU. Jaguar is conservatively clocked at 1.6 GHz on the Xbone and PS4, compared to maybe 1.2 GHz on Espresso. There are 4 Jaguar cores vs. 3 Espresso cores. So doing some back-of-the-envelope math, it's reasonable to guess that Jaguar has roughly 13 times more sheer computational grunt than Espresso does, considering just the cores and ignoring everything else such as the different architectures (POWER vs. x86-64) and of course the rest of the machines.
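For clarity, the "13 times" figure is just the product of the assumptions in that paragraph; here's the arithmetic spelled out (every input is an assumption from the post, not a measurement):

```python
# Reproducing the back-of-the-envelope estimate above. All inputs are assumptions
# from the post itself, not benchmarks.
ipc_ratio     = 10.0       # assumed IPC gap between a modern x86 core and Espresso
jaguar_factor = 0.75       # assumed Jaguar IPC as a fraction of an Intel core
clock_ratio   = 1.6 / 1.2  # 1.6 GHz Jaguar vs. ~1.2 GHz Espresso
core_ratio    = 4 / 3      # 4 Jaguar cores vs. 3 Espresso cores, as stated above

print(ipc_ratio * jaguar_factor * clock_ratio * core_ratio)  # ~13.3 -> "roughly 13 times"
```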

I would not be surprised if Dolphin, the GC/Wii emulator, could be modified to run Wii U games at full speed on existing, current modern Core i7 PCs. Wii U's DRM has not been cracked yet so there are no games to try and emulate. But the gulf in computational performance is that wide, that Dolphin could probably do it.
 
But (a query for the technically inclined): if the difference between PPC and x86 were trivial, then wouldn't we see some form of BC on the XBO?
The architectural difference between the Xbone and the X360 goes far deeper than just x86 vs. PPC.
Its CPU at 3.2 GHz with an FPU and an AltiVec unit, its eDRAM configuration with more than 200GB/s between the ROPs and the 10MB of eDRAM (now it's 100GB/s between the eSRAM and the whole GPU); those are things that break emulation, because the Xbone can't match them (just like the PS3 couldn't match the eDRAM of the PS2, and the original models had to include a GS chip in order to do the emulation).

Emulation means grabbing Xbox 360 code and executing it as-is on the Xbone. When I said the difference between PPC and x86 was "trivial", I meant it in the scenario where you write a game in C++ and then compile that code for each specific platform.

But if you want to emulate, then apart from the differences I already pointed out (and those alone are enough to make emulation really difficult), you also need a "translator" from PPC to x86, and more precisely, from Xenon to Jaguar.
 
I have poor knowledge of what AMD has been up to since the Athlon, their contemporary to the Pentium II. But I can tell you what Intel has done since the Pentium II:

Pentium III (Katmai -> Coppermine -> Tualatin)
Pentium 4 (Willamette -> Northwood -> Prescott)
Pentium M/Core (Banias -> Dothan -> Yonah)
Core 2 Duo (Conroe -> Allendale -> Wolfdale)
Core 2 Quad (Kentsfield -> Yorkfield)
Core iSeries Gen 1 (Bloomfield -> Lynnfield)
Core iSeries Gen 2 (Sandy Bridge -> Ivy Bridge)
Core iSeries Gen Next (Haswell)

Intel has released a total of 7 or 8 (depending on how you relate these CPU families) new CPU designs since the heyday of PowerPC 750. There were a total of 4 completely new architectures in that timeframe (P4, PM, Core 2, and Core iSeries). Haswell is again an entirely new architecture, so that makes 5 new architectures. Anyone who claims that the IPC of PII is similar to the IPC of Haswell is ignorant or delusional. I wouldn't hesitate to claim that Haswell probably has 10x IPC that PII has, nevermind that Haswell also has 10x the clockspeed. 10x IPC and 10x clockspeed. Is Haswell really 100 times faster than PII was? Without a single doubt, I could say yes and mean it truthfully on computations which didn't need to touch main memory too much. Moore's Law is a tough bitch on old CPU designs.

good thing the wii u doesn't have a Pentium 2 then
 

StevieP

Banned
You know... Jaguar is based on the K5 if you follow the design path back. All current Intel designs stem from the P3. Being based on previous work doesn't mean they are inherently bad.
 

disap.ed

Member
Intel has released a total of 7 or 8 (depending on how you relate these CPU families) new CPU designs since the heyday of PowerPC 750. There were a total of 4 completely new architectures in that timeframe (P4, PM, Core 2, and Core iSeries). Haswell is again an entirely new architecture, so that makes 5 new architectures. Anyone who claims that the IPC of PII is similar to the IPC of Haswell is ignorant or delusional. I wouldn't hesitate to claim that Haswell probably has 10x IPC that PII has, nevermind that Haswell also has 10x the clockspeed. 10x IPC and 10x clockspeed. Is Haswell really 100 times faster than PII was? Without a single doubt, I could say yes and mean it truthfully on computations which didn't need to touch main memory too much. Moore's Law is a tough bitch on old CPU designs. Will a PC running Haswell actually be 100 times faster than an old Pentium II PC? No, of course not, because the whole rest of the PC is a factor in performance as well. But in terms of sheer computational grunt, Haswell being 100 times faster than PII is a good rough estimate.

I never considered Espresso to have Haswell-level IPC; neither does Jaguar, and you know it.

As for the IPC of Jaguar vs. Espresso, I don't think 10x is too unreasonable a guess there either. AMD has not kept pace with Intel on sheer IPC, especially after the Core iSeries. So let's pretend that, clock for clock, Jaguar is about 75% of an Intel CPU. Jaguar is conservatively clocked at 1.6 GHz on the Xbone and PS4, compared to maybe 1.2 GHz on Espresso. There are 4 Jaguar cores vs. 3 Espresso cores. So doing some back-of-the-envelope math, it's reasonable to guess that Jaguar has roughly 13 times more sheer computational grunt than Espresso does, considering just the cores and ignoring everything else such as main memory and the whole rest of the machine.

Xbox 1 XCPU: 951.64 DMIPS @ 733 MHz
Pentium III: 1124.311 DMIPS @ 866 MHz
GC Gekko: 1125 DMIPS @ 486 MHz
Wii Broadway: 1687.5 DMIPS @ 729 MHz
Pentium 4A: 1694.717 DMIPS @ 2 GHz
PS3 Cell PPE: 1879.630 DMIPS @ 3.2 GHz (sans SPEs; the SPEs are not meant for Dhrystone/general-purpose code)
X360 Xenon: 1879.630 DMIPS * 3 = 5638.90 DMIPS @ 3.2 GHz (each 3.2 GHz core performing the same as the PS3's PPE)
PowerPC G4: 2202.600 DMIPS @ 1.25 GHz
AMD Bobcat: 2662.5 DMIPS * 2 = 5325 DMIPS @ 1 GHz
Wii U Espresso: 2877.32 DMIPS * 3 = 8631.94 DMIPS @ 1.24 GHz (again, final performance taking into account 3 fully accessible cores)
Pentium 4 3.2GHz: 3258.068 DMIPS
8-core Bobcat: 4260 DMIPS * 8 = 34080 DMIPS @ 1.6 GHz (said CPU doesn't exist, but best case Jaguar is supposed to perform only 20% better; that would be 5112 DMIPS per core, 40896 DMIPS for 8 cores, though it's probably somewhere in between; also keep in mind only 6 cores may be fully accessible, as rumours suggest 2 cores are reserved for the OS)

More like 4-5x more raw power.
The IPC of Jaguar is probably about 30% higher than that of Espresso (we can't say for sure yet).
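The "4-5x" bracket follows directly from the per-core figures in the list above; a quick sketch (the Jaguar numbers are the same extrapolations used in the list, not measurements):

```python
# Ratio check using the per-core DMIPS figures quoted above.
espresso_total = 2877.32 * 3          # 3 accessible Espresso cores -> ~8632 DMIPS

bobcat_16ghz = 2662.5 * 1.6           # Bobcat scaled to 1.6 GHz -> 4260 DMIPS per core
jaguar_core  = bobcat_16ghz * 1.2     # assume Jaguar is ~20% faster per clock -> 5112

print(bobcat_16ghz * 8 / espresso_total)  # ~3.9x for eight Bobcat-class cores
print(jaguar_core  * 8 / espresso_total)  # ~4.7x with the assumed 20% Jaguar uplift
```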

Wrong thread for these discussions here though (we are in the GPU thread).
 

wsippel

Banned
Some places do use 1T-SRAM and eDRAM interchangeably, but Nintendo used to say 1T-SRAM for the Wii and said eDRAM specifically for the Wii U. Why change the name if they were using the same thing? But more importantly, wouldn't we know by now, with everyone who has looked at the cells?
1T-SRAM is a MoSys trademark; they're not using 1T-SRAM anymore. Renesas has its own proprietary solution these days. If it's standalone, Renesas apparently calls it LLDRAM.

Also, Marcan looked at the cells and wrote that he thinks it looks like pseudostatic RAM. It's indeed one of his older tweets, from February 5th:

I said 1T-SRAM because it'd match the Wii and it looks like what I'd expect 1T-SRAM to look like, but take it with a grain of salt.

Either way, it has to match or exceed the Wii's 1T-SRAM in timings and performance, or else Wii mode wouldn't perform properly.
 
1T-SRAM is a MoSys trademark; they're not using 1T-SRAM anymore. Renesas has its own proprietary solution these days. If it's standalone, Renesas apparently calls it LLDRAM.

Also, Marcan looked at the cells and wrote that he thinks it looks like pseudostatic RAM. It's indeed one of his older tweets, from February 5th:


Yeah, I'm now in complete agreement that it's basically 1T-SRAM; it didn't make much sense to drop it. It all makes sense now.
By the way, do we have bandwidth and other specs for the Wii's 1T-SRAM?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
There is no scenario in which the Wii U2 would be performance-competitive with Xbone and PS4 where it still uses the ancient Gamecube -> Wii -> Wii U CPU design. It needs to be abandoned, period. There is no other alternative. Nintendo is using a super-clocked version of a CPU architecture which was contemporary with the Pentium II.
Come on now, it's not like Bobcat is some killer performance design. IMO, had Nintendo widened the paired singles (unlikely, due to the level of register file alterations and the ISA implications) or just added a G4-era AltiVec block, things would have more than evened out with Jaguar. The rest is just a matter of number of cores.
 

ikioi

Banned
Yet somehow PCs have gotten CPUs with ever higher clockspeeds and ever higher demands for rapid access to large pools of system memory and in the end the latency disadvantage hasn't meant jack shit because of the sheer bandwidth advantage DDR3 has over DDR2 has over DDR.

But that's because other technologies have adapted to offset the ever-increasing latency of DDR memory. Why do you think modern CPUs have an absolute ton of cache? Why did AMD and Intel both get rid of the northbridge memory controller and integrate it into the CPU die? Why are PCI-E lanes now fed directly into CPUs? Then you have other technologies: DDR prefetch, interleaving by combining multiple memory channels, and unganged multi-channel setups.

You're making it sound as if DDR3's latency is basically a non-issue. Quite far from it. The industry has had to go balls-out implementing many of the above technologies to combat it, and almost all of them target latency and efficiency over bandwidth.

Depending on just how Sony and MS implement the memory controllers of Xbone and PS4, the latency difference between the DDR3 in Xbone and the GDDR5 in PS4 might be as little as 2x. Which is basically nothing.

Depends very much on what you're doing.

Sequential vs Random etc.

Meanwhile the PS4 can access the entire 8 GB pool of GDDR5 with a bandwidth of 170 GB/s. The Xbox accesses its 8 GB pool of DDR3 at 68 GB/s and even the 32 MB of eSRAM can only be accessed at 106 GB/s

But these figures are meaningless. They're both theoretical peak figures; real-world, the performance of both would be far more dependent on:

Memory architecture used
How developers optimise and code their games
How good Sony and Microsoft's bare metal and APIs are

Take for example:

Two graphics cards, both featuring 3GB of GDDR5 memory. The memory on both cards is 100% identical: twelve 256-megabyte GDDR5 chips, a 384-bit bus, and the same clocks. So both cards have a theoretical peak bandwidth of, let's say, 240GB/s. The only difference between the two is the memory controller on the respective card's GPU: the first card runs a memory controller featuring a clamshell bus, whereas the second card's GPU runs a single bus.

The first card with its clamshell bus would absolutely mop the floor with the second card. That's despite the physical memory on both cards being identical.

The above example just highlights how freaking important the memory architecture is, not just the chips that are running on it.

We don't yet know, and may never know, what memory architecture the Xbox One is running, but I wouldn't be as bold as you in making claims about how close to the PS4 it's going to be. Heck, we don't even know the full memory architecture of the PS4. While we know the PS4 features a clamshell bus for its RAM, that's not enough to paint any picture.

when it comes to sheer fillrate the PS4 absolutely murders the Xbone because of raw bandwidth and also more ROPs. We're looking at a situation where the PS4 has enough sheer GPU grunt and bandwidth that some games might be forced to use FXAA on the Xbone and be able to use MSAA on the PS4. That's how big the performance difference is.

I agree something like this is very likely to be the case. But Microsoft could pull a hell of a lot out of a DDR3 configuration if its memory architecture is efficient. A shared memory address space for CPU and GPU, the GPU having a direct bus to the CPU's cache, a large on-die cache for the CPU, customisations to the chips themselves like increased register sizes; we don't know. If Microsoft pulled out stops like those just mentioned, and all Sony did was go for clamshell GDDR5, Microsoft's system would punch far harder than its specs on paper show.
 
I never considered Espresso to have Haswell-level IPC; neither does Jaguar, and you know it.

Hey, I factored that into my really silly math!

More like 4-5x more raw power.
The IPC of Jaguar is probably about 30% higher than that of Espresso (we can't say for sure yet).

DMIPS is integer performance. Games are mostly FP-dependent, and I can tell you that increases in floating-point performance have dramatically outpaced integer performance across CPU generations.

Furthermore, because FP compute dominates CPUs in games, throwing more INT units at games serves no purpose. This is the primary reason AMD's "8-core" CPUs get absolutely murdered by Intel's 4-core CPUs in games, because AMD's "8-core" CPUs consist of 4 "modules" which have 8 INT units and 4 FP units, and Intel's 4-core CPUs have 4 INT units and 4 FP units and Intel's FP performance beats the ever-loving shit out of AMD's. AMD having effectively double the INT performance of Intel per-core (ignoring bottlenecking at the shared cache and the memory subsystem) doesn't do a thing for AMD for games.

So no, 4-5x more raw power on INT doesn't mean a whole lot. I'm willing to bet my guess of 13x more is in the right ballpark when you realize it's the FP units that will get pounded by games running on the Jaguar in Xbone/PS4. The FP units on a modern x86 design are another universe from the FP units on the PPC750.

Wrong thread for these discussions here though.

Yeah, I guess. As you were then.
 

v1oz

Member
The new Xbox is using MoSys 6T-SRAM for the embedded memory on the GPU, which is what Nintendo used to use on the GC, except there it was the one-transistor variant, 1T-SRAM.
 
The new Xbox is using MoSys 6T-SRAM for the embedded memory on the GPU, which is what Nintendo used to use on the GC, except there it was the one-transistor variant, 1T-SRAM.

That explains over a billion and a half of Microsoft's transistors then; no wonder it's such a huge chip.
 
But that's because other technologies have adapted to offset the ever increasing latency of DDR memory. Why do you think modern CPUs have an absolute ton of cache. Why AMD and Intel both got rid of Northbridge/Memory controllers and integrated these into the CPU's die. Why PCI-E lanes are now fed directly into CPUs. Then you have other technologies DDR's pre fetch, interleaving via combining multiple memory channels, and unganged multi channel setups.

You're making it sound as if DDR3's latency is basically a non issue. Quite far from it. The industry has had to go balls out implementing many of the above technologies to combat it. Almost all of the above target latency and efficiency > bandwidth.

I guess I wasn't very clear there. Yes, you're right that increasing latency going from DDR -> DDR2 -> DDR3 required both Intel and AMD to adapt their designs to account for it. That said, I should also mention that for a time in the mid-2000s AMD's CPUs had a cache size advantage over Intel's, AMD was using an on-die memory controller while Intel was still on a northbridge, and for a while there Intel was using MCMs of 2 modules with 2 cores each when AMD had all 4 cores on the same die! And yet, latency just wasn't that much of a killer in the days of Core 2 vs. Phenom. Latency ended up being really low on the list of problems that Intel cared about, and rightfully so; they were eating AMD alive in performance anyway.

But these figures are meaningless. They're both theoretical peak figures; real-world, the performance of both would be far more dependent on:

Memory architecture used
How developers optimise and code their games
How good Sony and Microsoft's bare metal and APIs are

Take for example:

Two graphics cards, both featuring 3GB of GDDR5 memory. The memory on both cards is 100% identical: twelve 256-megabyte GDDR5 chips, a 384-bit bus, and the same clocks. So both cards have a theoretical peak bandwidth of, let's say, 240GB/s. The only difference between the two is the memory controller on the respective card's GPU: the first card runs a memory controller featuring a clamshell bus, whereas the second card's GPU runs a single bus.

The first card with its clamshell bus would absolutely mop the floor with the second card. That's despite the physical memory on both cards being identical.

The above example just highlights how freaking important the memory architecture is, not just the chips that are running on it.

We don't yet know, and may never know, what memory architecture the Xbox One is running, but I wouldn't be as bold as you in making claims about how close to the PS4 it's going to be. Heck, we don't even know the full memory architecture of the PS4. While we know the PS4 features a clamshell bus for its RAM, that's not enough to paint any picture.

So far, we've heard more about the PS4's optimizations for GPGPU than anything else. Anyway, AMD designed both SoCs for the Xbone and PS4, so I'm finding it hard to believe that they could have dramatically different memory architectures. AMD has been to hell and back on the GPU side in terms of bad bus design; anybody remember the "ring bus" on the R520 series? They would certainly be using their best available designs for both MS and Sony.

I agree something like this is very likely to be the case. But Microsoft could pull a hell of a lot out of a DDR3 configuration if its memory architecture is efficient. A shared memory address space for CPU and GPU, the GPU having a direct bus to the CPU's cache, a large on-die cache for the CPU, customisations to the chips themselves like increased register sizes; we don't know. If Microsoft pulled out stops like those just mentioned, and all Sony did was go for clamshell GDDR5, Microsoft's system would punch far harder than its specs on paper show.

Judging from what has been said so far, it sounds like Sony did more customization than MS did. Neither you nor I can really speculate much on this not knowing what's inside, though I'm sure the 32 MB of eSRAM has all kinds of interesting connections with the rest of the SoC in Xbone. Whether the extremely low latency of the eSRAM allows operations which offset the bandwidth disadvantage, who knows. It's still a huge bandwidth difference no matter how you optimize memory architecture.
 

wsippel

Banned
Yeah, I'm now in complete agreement that it's basically 1T-SRAM; it didn't make much sense to drop it. It all makes sense now.
By the way, do we have bandwidth and other specs for the Wii's 1T-SRAM?
Flipper offered <6.2ns sustained latency at 162MHz, 10.4GB/s texture bandwidth (512bit bus), 7.6GB/s framebuffer bandwidth (384bit bus).
 
Flipper offered <6.2ns sustained latency at 162MHz, 10.4GB/s texture bandwidth, 7.6GB/s framebuffer bandwidth.
6.2ns at 162MHz is a one-cycle latency! For the Wii U to emulate this latency, it has to have 1T-SRAM or an equivalent memory.
About SRAM sustained latency... is it possible to go lower than 1 cycle?
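Those numbers check out against the 162MHz clock; a quick sanity check (only the clock and bus widths quoted above go in):

```python
# Sanity check of the Flipper embedded-memory figures quoted above.
clock_hz = 162e6

print(1 / clock_hz * 1e9)        # ~6.17 ns -> "<6.2ns" really is a single clock cycle
print(512 / 8 * clock_hz / 1e9)  # ~10.4 GB/s over the 512-bit texture bus
print(384 / 8 * clock_hz / 1e9)  # ~7.8 GB/s peak over the 384-bit framebuffer bus (7.6 is the usual quoted figure)
```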
 

wsippel

Banned
Looking through Renesas' docs again, Latte's MEM1 looks like it's using eight 128Kw * 256b macros. Shouldn't it therefore be on a 2048bit bus? 131GB/s? That doesn't seem right. Am I stupid or something?


Weren't the framebuffer and texture cache separate from the main 1T-SRAM, though?
Yes, three distinct pools on individual busses. MEM1 was on a 64bit bus I believe, so that part was pretty slow in comparison: 2.7GB/s at 324MHz. Still very low latency of course.


6.2ns at 162MHz is a one-cycle latency! For the Wii U to emulate this latency, it has to have 1T-SRAM or an equivalent memory.
About SRAM sustained latency... is it possible to go lower than 1 cycle?
I believe it just means that it'll never be more than 6.2ns - how could it be less? Still, it should be absolutely impossible to match this latency using anything other than SRAM or pseudostatic RAM. Also, at very low capacities, using PSRAM makes no sense anymore because of the complex logic required (which is quite a bit more complex than regular DRAM logic), so that would probably explain why the 1MB pool is SRAM now.
 

tipoo

Banned
That explains over a billion and a half of Microsoft's transistors then; no wonder it's such a huge chip.

Plus, the GCN GPU family got up to 4.6-ish billion transistors on its own; even with Microsoft's cut-down version it's probably around 4 billion, plus eight CPU cores. That is one big chip. I wonder how many transistors are in the Sony APU, but maybe it's not so different, since MS used some for eSRAM while Sony went for more GPU resources.
 

z0m3le

Banned
Plus, the GCN GPU family got up to 4.6-ish billion transistors on its own; even with Microsoft's cut-down version it's probably around 4 billion, plus eight CPU cores. That is one big chip. I wonder how many transistors are in the Sony APU, but maybe it's not so different, since MS used some for eSRAM while Sony went for more GPU resources.

4.3 billion transistors for the HD 7900 series (32 compute units, or 2048 ALUs), 2.8 billion for the HD 7800 series (20 compute units, or 1280 ALUs), and 1.5 billion for the HD 7700 series (10 compute units, or 640 ALUs).

Going by those families, the XB1 has 12 compute units but should otherwise fall into the HD 7700 series, with 768 ALUs, making the transistor count close to ~1.7 billion. The PS4 has 18 compute units and should fall into the HD 7800 series (between the 7850 and 7870), making the count ~2.5 billion transistors.

Microsoft's 5 billion transistor claim covers all the silicon, and in case anyone was wondering, the Wii U's GPU should be around ~750 million transistors if Thraktor's measurements from earlier in the year were correct, not including the eDRAM, which takes up around 220 million transistors: http://www.neogaf.com/forum/showpost.php?p=45095173&postcount=836

Altogether, the Wii U's MCM is around 1.2 billion transistors.
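One way to see where those ballpark figures come from (a sketch under the usual 64-ALUs-per-CU assumption and a straight-line fit between the 7700 and 7800 data points; not necessarily how z0m3le derived them):

```python
# GCN packs 64 ALUs per compute unit, so the ALU counts follow from the CU counts.
alus_per_cu = 64
print(12 * alus_per_cu, 18 * alus_per_cu)    # 768 (XB1), 1152 (PS4)

# Rough transistor estimate: linear interpolation between the HD 7700 and HD 7800 points.
cu_lo, t_lo = 10, 1.5e9                      # HD 7700 series
cu_hi, t_hi = 20, 2.8e9                      # HD 7800 series
per_cu = (t_hi - t_lo) / (cu_hi - cu_lo)     # ~0.13 billion transistors per CU

print((t_lo + (12 - cu_lo) * per_cu) / 1e9)  # ~1.76 billion for a 12-CU GPU
print((t_lo + (18 - cu_lo) * per_cu) / 1e9)  # ~2.54 billion for an 18-CU GPU
```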
 

krizzx

Junior Member
I guess I wasn't very clear there. Yes, you're right that increasing latency going from DDR -> DDR2 -> DDR3 required both Intel and AMD to adapt their designs to account for it. That said, I should also mention that for a time in the mid-2000s AMD's CPUs had a cache size advantage over Intel's, AMD was using an on-die memory controller while Intel was still on a northbridge, and for a while there Intel was using MCMs of 2 modules with 2 cores each when AMD had all 4 cores on the same die! And yet, latency just wasn't that much of a killer in the days of Core 2 vs. Phenom. Latency ended up being really low on the list of problems that Intel cared about, and rightfully so; they were eating AMD alive in performance anyway.



So far, we've heard more about the PS4's optimizations for GPGPU than anything else. Anyway, AMD designed both SoCs for the Xbone and PS4, so I'm finding it hard to believe that they could have dramatically different memory architectures. AMD has been to hell and back on the GPU side in terms of bad bus design; anybody remember the "ring bus" on the R520 series? They would certainly be using their best available designs for both MS and Sony.



Judging from what has been said so far, it sounds like Sony did more customization than MS did. Neither you nor I can really speculate much on this not knowing what's inside, though I'm sure the 32 MB of eSRAM has all kinds of interesting connections with the rest of the SoC in Xbone. Whether the extremely low latency of the eSRAM allows operations which offset the bandwidth disadvantage, who knows. It's still a huge bandwidth difference no matter how you optimize memory architecture.

Just when I thought we were making progress on the dual graphics engine front...

I guess this was expected, though. The same thing happened when Sony announced it was using GDDR5.
 

Hermii

Member
One thing struck me: if the main cause of the slow OS is the flash memory, would it be easy for Nintendo to release a version down the line with faster flash memory and a faster OS, without affecting the rest of the system?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Looking through Renesas' docs again, Latte's MEM1 looks like it's using eight 128Kw * 256b macros. Shouldn't it therefore be on a 2048bit bus? 131GB/s? That doesn't seem right. Am I stupid or something?
Shouldn't it be 137.5GB/s? And no, I don't think you are. But the macros don't necessarily need to be accessible all at once. There could be a DDR-style bus protocol which allows multiple slower macros to masquerade as a single fast device; say, if there were 8 macros at 225MHz each, you could have a 1024-bit bus running at 550MHz. As for the actual macro count, I've got no idea.
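For what it's worth, the 131 and 137.5 figures look like the same peak expressed in different unit conventions; a quick sketch, assuming the 550MHz GPU clock and all eight macros accessed per cycle:

```python
# Peak MEM1 bandwidth under the assumption of eight 256-bit macros read together at 550 MHz.
bus_bits = 8 * 256                  # 2048-bit combined width
clock_hz = 550e6                    # Latte GPU clock

bytes_per_sec = bus_bits / 8 * clock_hz
print(bytes_per_sec / 1e9)          # ~140.8 GB/s (decimal)
print(bytes_per_sec / 2**30)        # ~131.1 GiB/s -> wsippel's 131 figure
print(bytes_per_sec / 1e6 / 1024)   # ~137.5 -> blu's figure, mixing decimal MB with binary GB

# Capacity check: 8 macros x 128K words x 256 bits = 32 MiB, which matches MEM1.
print(8 * 128 * 1024 * 256 / 8 / 2**20)  # 32.0
```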
 
One thing struck me: if the main cause of the slow OS is the flash memory, would it be easy for Nintendo to release a version down the line with faster flash memory and a faster OS, without affecting the rest of the system?

It's possibly more to do with encryption/decryption; the flash itself shouldn't be as slow as it seems to be.
 