
WiiU technical discussion (serious discussions welcome)

Thraktor

Member
Now that we've got (some) hard numbers, I think it's time for a bit more speculation:

CPU

According to Anandtech's teardown, Wii U's CPU die is approximately 32.76mm². We know that it's manufactured at 45nm, and there's 3MB of cache on there. From my calculations based on the BlueGene/Q die shot on page 4 of this pdf, 2MB of IBM eDRAM cache on a 45nm process is 6.77mm², meaning we're looking at 10.16mm² for "Espresso"'s 3MB of eDRAM cache, leaving about 22.6mm² for the cores, etc.
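Spelling out that arithmetic (scaling the 2MB BlueGene/Q figure linearly, which is my own simplifying assumption):

6.77mm² / 2MB ≈ 3.39mm² per MB of eDRAM at 45nm
3MB × 3.39mm²/MB ≈ 10.16mm²
32.76mm² − 10.16mm² ≈ 22.6mm² left over for cores, interconnect and I/O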

Now, just for reference, from the same die shot I calculated the A2 core at 6.58mm² (again, 45nm). Of course Nintendo isn't using A2 cores, but you could fit three of them on Espresso, and still have just enough space left for inter-core communication, off-die interfaces, etc. Perhaps this says more about how small the A2 cores are than anything else, though.

Anyway, the Wii's CPU is apparently about 16mm² on a 90nm node (I couldn't find a better source, if anyone has one it'd be appreciated). An optimistic shrink to 45nm would put this at 4mm², probably closer to 5-6mm² in reality, but considering we're stripping off the SRAM cache, off-die interfaces, etc. from this number, I feel 4mm² is a rough guide to how big a Broadway core would be at 45nm. Hence, if we were to assume that the Wii U's CPU is just three Broadways bolted together with 3MB of cache, then we have to assume that the inter-core communication and off-die interfaces take up about 10.6mm², or about a third of the die, which I don't think is reasonable or necessary for a CPU with just three threads and an off-die memory controller. Then again, there isn't much room for three cores more than ~50% bigger than Broadways, either.
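For reference, the ideal-shrink and area-budget numbers above work out as follows (a full node shrink rarely scales this cleanly, hence the 5-6mm² caveat):

16mm² × (45/90)² = 16mm² × 0.25 = 4mm² per Broadway-class core at 45nm
3 cores × 4mm² = 12mm²
22.6mm² − 12mm² ≈ 10.6mm² left for inter-core communication, off-die interfaces, etc.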

This brings me back to the asymmetric cache, and a theory I had a good few months ago. The only reason that one core would have a larger cache than other cores is if it were simply chewing through more data than the other cores. Furthermore, if the cache were four times the size, you would have to assume that that's because it's chewing through four times as much data. What runs through four times as much data as a core that's performing 32-bit scalar maths? One that's performing 128-bit vector maths. I think that what we're looking at is two cores which are basically Broadways with minor improvements, and one core with a SIMD unit (possibly a combination FPU/SIMD as in the A2). From the size of the A2 core, we know such a unit can fit in the space provided (the A2 QFPU is actually a 256-bit wide SIMD unit), and it explains the asymmetric cache better than any other explanation we've heard. Furthermore, it makes sense from Nintendo's perspective. We know from Iwata's GPGPU comments, and the sheer size discrepancy between the CPU and GPU dies, that the GPU is intended to handle most of the computational grunt work, including I'd imagine most of the vector calculations. Hence, it doesn't make sense for all CPU cores to have dedicated SIMD units. Nonetheless the CPU is going to end up doing some amount of 3D maths, and therefore a SIMD unit on just one of the cores is the most practical approach.
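To make the "four times as much data" intuition concrete, here's a minimal C sketch (purely illustrative; no actual Espresso or PowerPC intrinsics): a scalar loop touches one 32-bit float per operation, while a 4-wide vector loop would touch four per instruction.

/* Illustrative only: per-operation data consumption, scalar vs. 4-wide SIMD. */
#include <stddef.h>

/* Scalar: one 32-bit float of src per multiply-add. */
void scale_add_scalar(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] += k * src[i];              /* 4 bytes of src per op */
}

/* "Vectorised" version: a real 128-bit SIMD unit would issue each group of
   four lanes as a single instruction, i.e. 16 bytes of src per op, so the
   same instruction stream pulls 4x the data through the cache. */
void scale_add_vec4(float *dst, const float *src, float k, size_t n)
{
    for (size_t i = 0; i + 4 <= n; i += 4) {
        dst[i + 0] += k * src[i + 0];
        dst[i + 1] += k * src[i + 1];
        dst[i + 2] += k * src[i + 2];
        dst[i + 3] += k * src[i + 3];
    }
}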

I'll have some speculation on the GPU a bit later.
 

1-D_FTW

Member
Has anyone done any power measurements in more demanding games than NSMBU?

Taking that measurement of 33 Watts for that (that was after the power supply, I believe), I'd estimate (roughly!) that the most the GPU alone will probably pull is 30W. The top end of FLOPS/W AMD provides in the Turks line is a bit below 15, so let's go with 15. That would put the best-case Wii U GPU GFLOPS at 450, which seems to match up pretty well with some of the (lower end) early rumours.

It may not even matter. I'm not sure how often you've plugged consoles into something like Kill-a-watt, but it's been my experience games don't have much variance. Historically, it's not like PCs where different CPU/GPU utilization will give you wide gaps in energy consumption. My measurements on consoles have always been fairly flat.
 

Durante

Member
This brings me back to the asymmetric cache, and a theory I had a good few months ago. The only reason that one core would have a larger cache than other cores is if it were simply chewing through more data than the other cores.
This premise is a bit too simplified IMHO. If "chewing through more data" just means doing more streaming computations, then in many cases more cache won't help you since there just isn't enough temporal reuse.

In fact, if you had 2 almost identical cores, the only difference being that one has a small cache and the other a large cache, you'd probably want to do streaming calculations with a low reuse factor on the former and things like latency-bound data structure traversal (which is usually not SIMD) on the latter.
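A minimal C sketch of the two workload types (function names are hypothetical): the first loop streams data with essentially no reuse, so extra cache capacity buys little beyond a bit of prefetch; the second chases pointers, where a cache big enough to hold the structure is exactly what hides the latency.

#include <stddef.h>

/* Streaming, low reuse: every element is touched exactly once. */
float sum_stream(const float *data, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += data[i];
    return acc;
}

/* Latency-bound traversal: each load depends on the previous one, so the
   time per node is dominated by whether it sits in cache or in DRAM. */
struct node { int value; struct node *next; };

int sum_list(const struct node *head)
{
    int acc = 0;
    for (const struct node *p = head; p != NULL; p = p->next)
        acc += p->value;
    return acc;
}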


It may not even matter. I'm not sure how often you've plugged consoles into something like Kill-a-watt, but it's been my experience games don't have much variance. Historically, it's not like PCs where different CPU/GPU utilization will give you wide gaps in energy consumption. My measurements on consoles have always been fairly flat.
This may be true for PS360, but -- as some people never tire of reminding us -- Wii U is a more modern architecture. Thus it might also feature a larger gap between idle, medium load and full load power. Of course, it might also be as you say, if they opted not to implement any complex power saving mechanism e.g. to guarantee sustained performance levels.
 

Thraktor

Member
This premise is a bit too simplified IMHO. If "chewing through more data" just means doing more streaming computations, then in many cases more cache won't help you since there just isn't enough temporal reuse.

In fact, if you had 2 almost identical cores, the only difference being that one has a small cache and the other a large cache, you'd probably want to do streaming calculations with a low reuse factor on the former and things like latency-bound data structure traversal (which is usually not SIMD) on the latter.

Yes, that's a fair point, I was working on the basis that the code run on each core is algorithmically equivalent bar the use of SIMD.

I still can't think of a better explanation for such a heavily asymmetric cache, though. The cores could be identical, and there could be a big warning on the front page of the documentation telling developers to make sure they run all pathfinding routines on the core with the extra cache, but it seems a rather bizarre complication to design into your CPU.

Edit: Technically, one possibility is that they simply threw together two Broadway cores and one A2 core, which would explain the cache (the A2 is four-way multithreaded, and the standard cache per core on the BlueGene/Q is 2MB), but it would be a seriously messy combination of architectures for developers to have to deal with.
 

dumbo

Member
It's not possible that the CPU is misunderstood - that it's 2 "new/different cores" + 1 original/enhanced Wii broadway core?
 
Yes, that's a fair point, I was working on the basis that the code run on each core is algorithmically equivalent bar the use of SIMD.

I still can't think of a better explanation for such a heavily asymmetric cache, though. The cores could be identical, and there could be a big warning on the front page of the documentation telling developers to make sure they run all pathfinding routines on the core with the extra cache, but it seems a rather bizarre complication to design into your CPU.

Edit: Technically, one possibility is that they simply threw together two Broadway cores and one A2 core, which would explain the cache (the A2 is four-way multithreaded, and the standard cache per core on the BlueGene/Q is 2MB), but it would be a seriously messy combination of architectures for developers to have to deal with.

Very interesting breakdown! To address the last part of your post first, I agree a single A2 core among Broadways seems unlikely and also contradicts the report of "less threads" than Xenon. Having an A2 core on there would theoretically bring the entire chip up to 6 threads.

At what point does adding more cache become useless? Also, if the core with 2 MB were acting as some sort of master core, then why is it labeled "Core 1" and not "Core 0"? Does it even matter? Or is this core not a master but one designed to bear the brunt of the graphical and physics code (with Core 0 running general game code and Core 2 perhaps AI)? We know they need some type of floating point functionality on there. It's possible that one core does have a beefy FPU, but compared to Xenon, one is simply not enough - at least to the devs who have been struggling with it.

One theory I proposed earlier is that Core 1 might have that extra cache simply because Wii U might shut down the other two cores when they are not needed (non-gaming applications). I don't know if the TDP analysis backs that up, though. But keeping some extra eDRAM running would probably cost a few watts less than two full cores.

Finally, there's the somewhat pessimistic hypothesis that ran across my mind that the IBM eDRAM is fast, but still not comparable to the SRAM used for the Wii's L2. Isn't it possible that the eDRAM's latency is higher? And perhaps they needed more (and only on one core) in order to support the Wii BC mode. That's sheer speculation, and I hope it's not the case. I'd love to see some numbers for comparison. (Edit: the more I think about this one, the more unlikely it seems)

This premise is a bit too simplified IMHO. If "chewing through more data" just means doing more streaming computations, then in many cases more cache won't help you since there just isn't enough temporal reuse.

In fact, if you had 2 almost identical cores, the only difference being that one has a small cache and the other a large cache, you'd probably want to do streaming calculations with a low reuse factor on the former and things like latency-bound data structure traversal (which is usually not SIMD) on the latter.

Didn't see this before, but it sounds like you have a point...latency-bound structure traversal. In terms of gaming, what kind of code would that be?
 

Thraktor

Member
At what point does adding more cache become useless?

Depends on what kind of code you're running. IBM's zEC12 mainframe processor has 384MB L4 cache, but that'd probably be a bit much for a 3 core console CPU.

Also, if the core with 2 MB were acting as some sort of master core, then why is it labeled "Core 1" and not "Core 0"? Does it even matter? Or is this core not a master but one designed to bear the brunt of the graphical and physics code (with Core 0 running general game code and Core 2 perhaps AI)? We know they need some type of floating point functionality on there. It's possible that one core does have a beefy FPU, but compared to Xenon, one is simply not enough - at least to the devs who have been struggling with it.

One theory I proposed earlier is that Core 1 might have that extra cache simply because Wii U might shut down the other two cores when they are not needed (non-gaming applications). I don't know if the TDP analysis backs that up, though. But keeping some extra eDRAM running would probably cost a few watts less than two full cores.

Finally, there's the somewhat pessimistic hypothesis that ran across my mind that the IBM eDRAM is fast, but still not comparable to the SRAM used for the Wii's L2. Isn't it possible that the eDRAM's latency is higher? And perhaps they needed more (and only on one core) in order to support the Wii BC mode. That's sheer speculation, and I hope it's not the case. I'd love to see some numbers for comparison. (Edit: the more I think about this one, the more unlikely it seems)

Personally, I think that the reason that the cores are numbered as they are is that Core 0 is the core which handles Wii BC. That is, when only one core's running, it's that one, so they call it Core 0. So, Core 0 is fully binary compatible with Broadway, and Core 1 probably isn't.

As far as cache when running in Wii mode is concerned, eDRAM does have slightly higher latency than SRAM, but it wouldn't be a major difference. Furthermore, the way IBM's eDRAM cache is designed, if you were to disable two of the cores, the other core would have access to the full 3MB of L2 cache, regardless of how it's originally configured.

Didn't see this before, but it sounds like you have a point...latency-bound structure traversal. In terms of gaming, what kind of code would that be?

Pathfinding/AI.
 

wsippel

Banned
Homogeneous coordinates are N-dimensional coordinates where one of the coordinates (read: normally the last one) is an 'out-of-this-world' component, figuratively speaking. It allows matrix transformations (among other things) to feature the entire set of spatial transformations you'd normally care about, as long as the matrices are NxN, or Nx(N-1). Basically, the extra component is an 'out-of-band' thing which carries extra information, which you cannot encode in a 'normal' vector of just the bare dimensionality.

Historically, the out-of-band component in 3D homogeneous space is called W, i.e. a 3D homogeneous space coordinate/vector is an <x, y, z, w> tuple. Setting W = 1 makes the tuple behave as a first-class-citizen coordinate, subject to translations and perspective transforms; setting W = 0 makes the tuple immune to translations, which is good for directional vectors. Yes, you can do all that manually, if you have the prior knowledge what type of tuple that is. But aside from a better abstraction mode, it can also be an efficiency gain if the hw supports 4-wide ops. For instance, imagine we have a vec3 coordinate we want to get the partial product of with a row/column from a homogeneous transform. We can do that as:

res = dot(vec4.xyz, vec3)
res += vec4.w

or as:

res = dot(vec4, vec4(vec3, 1.0))

If the hw does dot4 natively the latter case is preferable over the latter, which has a data dependency and is ergo not co-issuable in the given order - it would just stall the pipeline (the dot operation can have arbitrary high latency). Please note that the knowledge of the nature of the tuple still allows us to store our original argument as vec3 and not as vec4.

Last but not least, we have quaternions, which are inherently 4D.
Congratulations, you made my head spin! ;-)

So W is technically just a bool? Also, it's not really operated on when translating a coordinate? Isn't that a bit wasteful?
 

Ormberg

Member
Not sure this is the best thread to ask. As I've moved on from hardware speculation, my information in this field is way old, so I figure there are other folks here better suited for answering/speculating.

What I'm wondering is how good a system you can build given a budget of let's say $200 and a TDP of max 45W*. Perhaps WiiU is actually very power efficient, but given those limitations there's not a lot of headroom for power-hungry CPUs...?

In light of this, I suspect, as perhaps several others already have, that Nintendo is aiming for the indie community - though that is a topic for another thread.


*I assume that the WiFi needed for the GamePad is not very power hungry.
 

efyu_lemonardo

May I have a cookie?
Congratulations, you made my head spin! ;-)

So W is technically just a bool? Also, it's not really operated on when translating a coordinate? Isn't that a bit wasteful?

If some vectors don't transform like others then they must be uniquely specified, so this is not at all wasteful.

Also, I think there were some typos that made the post less coherent. For example, if I understand correctly, the last part should have been (corrections in bold):

We can do that as:

res = dot(vec4.xyz, vec3)
res += vec4.w

or as:

res = dot(vec4, (vec3, 1.0))

If the hw does dot4 natively the latter case is preferable over the former, which has a data dependency and is ergo not co-issuable in the given order - it would just stall the pipeline (the dot operation can have arbitrary high latency). Please note that the knowledge of the nature of the tuple still allows us to store our original argument as vec3 and not as vec4.
 

The Technomancer

card-carrying scientician
I have a question about the form factor of the U. Did it just turn out that way because of the parts they used, did they pick specific parts in order to get a small form factor, or is there a business case for it? I'm assuming a small form factor results in higher costs.

There was an Iwata asks talking about this stuff actually, it was really interesting. I was impressed with the attention they gave the heat transfer issues in a design that small.
 

mrklaw

MrArseFace
Not sure this is the best thread to ask. As I've moved on from hardware speculation, my information in this field is way old, so I figure there are other folks here better suited for answering/speculating.

What I'm wondering is how good a system you can build given a budget of let's say $200 and a TDP of max 45W*. Perhaps WiiU is actually very power efficient, but given those limitations there's not a lot of headroom for power-hungry CPUs...?

In light of this, I suspect, as perhaps several others already have, that Nintendo is aiming for the indie community - though that is a topic for another thread.


*I assume that the WiFi needed for the GamePad is not very power hungry.

It's probably a very nice system given that as a restriction. It just seems odd to have a super low TDP as the critical factor.
 
Seems like the Hynix memory used is gDDR like the Samsung memory. This is DDR3 that is specifically tweaked for graphics/desktop.
Also, if the X720 rumours are true then you will likely see a similar setup to the WiiU's. Considering console makers are desperately trying to keep costs down, using DDR3 is the way to go. It's still far cheaper than GDDR3 and especially GDDR5.

Yup, agreed. Feel free to call me an evil man but I'm quite looking forward to the Nerd-Rage when Sony and Microsoft fanboys realise that the next gen efforts from each platform holder are using DDR3 instead of GDDR5 lol. Especially after the recent hoo-hah over the U having DDR3.

Even if Microsoft in particular use GDDR5 and have 6GB of RAM then you're looking at 24 chips of memory...the motherboard is going to be monstrously complex and the case is going to be about the size of a small house before you take into consideration the cost.

We could end up seeing GDDR5 in the PS4 if they go for 2GB of RAM, that would be 8 chips which would be doable but I can't see them squeezing 16 chips in for 4GB of RAM.

What a great deal of people are forgetting is that Nintendo always produce balanced systems. People concerning themselves over the speed of the RAM and/or the power of the CPU are worrying about nothing, particularly when we've had plenty of developers and people here with contacts saying that the GPU has a fair bit of grunt.
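For reference, the chip-count arithmetic behind those figures (assuming 2Gb devices; GDDR5 parts typically present a 32-bit interface, hence the implied bus widths):

6GB = 48Gb → 48Gb / 2Gb per chip = 24 chips
4GB = 32Gb → 16 chips
2GB = 16Gb → 8 chips, i.e. 8 × 32-bit = a 256-bit bus if each chip gets its own channel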
 
Yup, agreed. Feel free to call me an evil man but I'm quite looking forward to the Nerd-Rage when Sony and Microsoft realise that the next gen efforts from each platform holder are using DDR3 instead of GDDR5 lol. Especially after the recent hoo-hah over the U having DDR3.

Even if Microsoft in particular use GDDR5 and have 6GB of RAM then you're looking at 24 chips of memory...the motherboard is going to be monstrously complex and the case is going to be about the size of a small house before you take into consideration the cost.

We could end up seeing GDDR5 in the PS4 if they go for 2GB of RAM, that would be 8 chips which would be doable but I can't see them squeezing 16 chips in for 4GB of RAM.

What a great deal of people are forgetting is that Nintendo always produce balanced systems. People concerning themselves over the speed of the RAM and/or the power of the CPU are worrying about nothing, particularly when we've had plenty of developers and people here with contacts saying that the GPU has a fair bit of grunt.

I've been saying this for a while, but to these Nintendo-haters any negative news is good news.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Congratulations, you made my head spin! ;-)

So W is technically just a bool? Also, it's not really operated on when translating a coordinate? Isn't that a bit wasteful?
No, it's not a flag - it's a scalar. During a perspective transform (which cannot be expressed in its entirety through a matrix transform alone - you also need some reciprocals) w becomes a completely full-fledged coordinate axis - you can think of a perspective projection transform as a 4D transform, which ultimately re-projects its 4D citizens onto a 3D hyperplane. If that sounds too abstract, here's the low-down math of a traditional projection & clipping portion of the pipeline:

Given a vertex V subject to a 4x4 projection transform M, the following operations send that vertex to screen space.

Vclip = M * V, where V = (x, y, z, 1)^T
(read: column-vectors, matrix operator sits on the left)

Vclip is subjected to clipping as:
-Vclip.w < Vclip.x < Vclip.w
-Vclip.w < Vclip.y < Vclip.w
-Vclip.w < Vclip.z < Vclip.w

Vndc = vec4(Vclip.xyz / Vclip.w, 1 / Vclip.w)
(read: ndc meaning Normalized Device Coordinates; also, a properly positioned front clipping plane in the projection transform guarantees the division by w would not end up as division-by-zero)

Vscreen = vec4((S * vec4(Vndc.xyz, 1)).xyz, Vndc.w), where S is a 4x4 transform accounting for the ranges of screen width, height and depth (z-buffer).

In practice the final step is never a full-blown matrix transform but instead is decomposed into trivial by-component linear transforms (ax + b) for the x, y and z component.

As you see, w is anything but a flag ; )
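For anyone who prefers code to notation, here's a minimal C sketch of that chain (struct and function names are mine, the clip test itself is omitted, and it assumes an OpenGL-style [-1, 1] NDC range on all three axes):

/* Illustrative only: the Vclip -> Vndc -> Vscreen chain described above. */
typedef struct { float x, y, z, w; } vec4;
typedef struct { float m[4][4]; } mat4;        /* column vectors, matrix on the left */

static vec4 mat4_mul_vec4(const mat4 *M, vec4 v)
{
    vec4 r;
    r.x = M->m[0][0]*v.x + M->m[0][1]*v.y + M->m[0][2]*v.z + M->m[0][3]*v.w;
    r.y = M->m[1][0]*v.x + M->m[1][1]*v.y + M->m[1][2]*v.z + M->m[1][3]*v.w;
    r.z = M->m[2][0]*v.x + M->m[2][1]*v.y + M->m[2][2]*v.z + M->m[2][3]*v.w;
    r.w = M->m[3][0]*v.x + M->m[3][1]*v.y + M->m[3][2]*v.z + M->m[3][3]*v.w;
    return r;
}

static vec4 project_to_screen(const mat4 *M, vec4 v /* w == 1 */,
                              float width, float height, float depth)
{
    vec4 clip = mat4_mul_vec4(M, v);           /* Vclip = M * V                     */
    float inv_w = 1.0f / clip.w;               /* front clip plane keeps clip.w > 0 */

    vec4 ndc = { clip.x * inv_w, clip.y * inv_w, clip.z * inv_w, inv_w };

    vec4 screen;                               /* per-component ax + b              */
    screen.x = (ndc.x * 0.5f + 0.5f) * width;
    screen.y = (ndc.y * 0.5f + 0.5f) * height;
    screen.z = (ndc.z * 0.5f + 0.5f) * depth;  /* z-buffer range                    */
    screen.w = ndc.w;                          /* carry 1/w along                   */
    return screen;
}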

If some vectors don't transform like others then they must be uniquely specified, so this is not at all wasteful.

Also, I think there were also some typos that made the post less coherent. For example if I understand correctly the last part should have been (corrections in bold):
Yes, indeed, thanks for pointing that out. Typos corrected.
 

Corky

Nine out of ten orphans can't tell the difference.
(Brace yourselves for a question asked by someone with 0 programming knowledge.) Is it known how much, if at all, the architecture differs between Wii U and Wii/GC? Asking this from an emulation perspective, e.g. Dolphin.
 

mrklaw

MrArseFace
Yup, agreed. Feel free to call me an evil man but I'm quite looking forward to the Nerd-Rage when Sony and Microsoft fanboys realise that the next gen efforts from each platform holder are using DDR3 instead of GDDR5 lol. Especially after the recent hoo-hah over the U having DDR3.

Even if Microsoft in particular use GDDR5 and have 6GB of RAM then you're looking at 24 chips of memory...the motherboard is going to be monstrously complex and the case is going to be about the size of a small house before you take into consideration the cost.

We could end up seeing GDDR5 in the PS4 if they go for 2GB of RAM, that would be 8 chips which would be doable but I can't see them squeezing 16 chips in for 4GB of RAM.

What a great deal of people are forgetting is that Nintendo always produce balanced systems. People concerning themselves over the speed of the RAM and/or the power of the CPU are worrying about nothing, particularly when we've had plenty of developers and people here with contacts saying that the GPU has a fair bit of grunt.

There is a critical difference. 720/PS4 will be lead development platforms and developers will learn to work and optimise around their architectures. With WiiU that isn't the case, and Nintendo should have considered more about how it would fit into existing workflows.
 
(Brace yourselves for a question asked by someone with 0 programming knowledge.) Is it known how much, if at all, the architecture differs between Wii U and Wii/GC? Asking this from an emulation perspective, e.g. Dolphin.

I don't think anyone here has enough information to be specific, all we know is there are rumours that the CPU design is based off of Broadway. I'm not sure it matters, hardware of this caliber won't be emulated any time soon and it won't be done by Dolphin.
 

Corky

Nine out of ten orphans can't tell the difference.
I don't think anyone here has enough information to be specific, all we know is there are rumours that the CPU design is based off of Broadway. I'm not sure it matters, hardware of this caliber won't be emulated any time soon and it won't be done by Dolphin.

Did GC/Wii use a Broadway chip design as well?
 
There is a critical difference. 720/PS4 will be lead development platforms and developers will learn to work and optimise around their architectures. With WiiU that isn't the case, and Nintendo should have considered more about how it would fit into existing workflows.

I still wonder if Nintendo are gambling that the 720/PS4 will be architecturally-similar, even if the difference in power is significant.

EDIT:

...and please note, this is *not* a "Wii U version will be the same just with a few effects turned off!" expectation; more wondering whether Nintendo have at least made the right hardware choices so as not to lock the Wii U out in the same way Wii was by virtue of its hardware setup.
 

lherre

Accurate
Depends on what kind of code you're running. IBM's zEC12 mainframe processor has 384MB L4 cache, but that'd probably be a bit much for a 3 core console CPU.



Personally, I think that the reason that the cores are numbered as they are is that Core 0 is the core which handles Wii BC. That is, when only one core's running, it's that one, so they call it Core 0. So, Core 0 is fully binary compatible with Broadway, and Core 1 probably isn't.

As far as cache when running in Wii mode is concerned, eDRAM does have slightly higher latency than SRAM, but it wouldn't be a major difference. Furthermore, the way IBM's eDRAM cache is designed, if you were to disable two of the cores, the other core would have access to the full 3MB of L2 cache, regardless of how it's originally configured.



Pathfinding/AI.

If I remember correctly "core 1" is the one with more memory.
 

wsippel

Banned
If I remember correctly "core 1" is the one with more memory.
Seems odd. Maybe they simply numbered them the way they are laid out on the die? Pretty sure the core with more cache would typically be the "master", the core running the main game logic?
 

The_Lump

Banned
There is a critical difference. 720/PS4 will be lead development platforms and developers will learn to work and optimise around their architectures. With WiiU that isn't the case, and Nintendo should have considered more about how it would fit into existing workflows.


Well we still don't know that it won't (for sure, anyways)

I'm sure they're privy to a lot more info on the other two systems than we are, and will hopefully have based their design around receiving ports from the other two next gen systems.

As others have said, it's probably not going to be a case of "the same game with effects/assets turned down", but it's possible the same scalable features will be in the WiiU - unlike with the Wii - which sounds like what Nintendo were aiming for with their design.

My take on it is that they would have been perfectly happy with the Wii up to now if only they'd thought to future proof it in terms of GPU feature set. They probably see that as the only reason Wii sales dropped (as 3rd parties found it uneconomical to spend time re-hashing their games to fit a totally different setup). I bet that's their reasoning for sticking with the same strategy this gen, and just ensuring (hopefully) it will be simpler to receive ports by future proofing the feature set.

Evidently that hasn't worked a treat so far, as it's already missed some games (Metro LL, Tomb Raider etc), but that was more likely down to meeting launch schedules/THQ not having the time/money, to be fair.

We'll see with the next round of 3rd party multiplatform titles if their gamble has paid off.
 

Durante

Member
Yup, agreed. Feel free to call me an evil man but I'm quite looking forward to the Nerd-Rage when Sony and Microsoft fanboys realise that the next gen efforts from each platform holder are using DDR3 instead of GDDR5 lol. Especially after the recent hoo-hah over the U having DDR3.
The "hoo-hah" isn't simply because it's DDR3. In combination with a good amount of fast eDRAM this can be a viable option for next-gen.

The "hoo-hah" is because it's 800 MHz DDR3 on a 64 bit bus. I'm almost certain that -- if they do use DDR3 -- it will be both higher clocked and at the very least on a 128 bit bus in PS4/720.
 

Thraktor

Member
If I remember correctly "core 1" is the one with more memory.

Yeah, that's what I was assuming (probably should have made myself more clear). My point was that if there are two "basic" cores and one "enhanced" core, the Wii BC mode is probably running on one of the "basic cores" (Core 0). The "enhanced" core, then, is not necessarily binary compatible with Wii code.

I still think they will go with 2GB fast GDDR5 VRAM and DDR3/4 for the rest.

Two gigs of GDDR5 means eight 2Gb chips and a 256 bit bus. That's a complex enough console motherboard as it is without adding another pool of RAM, and the extra chips and traces that entails.

They'll both go with DDR3/4, with a wide bus and/or an eDRAM framebuffer to provide the necessary bandwidth.
 

Diffense

Member
Homogeneous coordinates (I hated these things when I first came across them):
The main practical reason for the extra w coordinate is the ability to represent translations by matrix multiplication. If you use 3 coordinates for 3 dimensions (as would be logical lol) you have to add translation vectors. With homogeneous 3D (x, y, z, w) coordinates there are matrices that can represent rotations, scales, AND translations in 3D. Since you can multiply matrices together to get composite transformations it's all very convenient. You can compute a single matrix product that represents a combined translation, rotation, and scale then apply it to any number of vectors. Also, most architectures prefer to deal with sets of four floating point numbers anyway given the binary, power-of-two organisation of computer memory.
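As a tiny C illustration of that point (types and names are mine, and the translation matrix is specialised away rather than multiplied out in full): a 4x4 matrix with the translation in its last column moves a point with w = 1, but leaves a direction with w = 0 alone.

/* Illustrative only: why the w component lets one 4x4 matrix encode translation. */
#include <stdio.h>

typedef struct { float x, y, z, w; } vec4;

/* Applying an identity matrix whose last column is (tx, ty, tz, 1) to a
   column vector (x, y, z, w) gives (x + tx*w, y + ty*w, z + tz*w, w). */
static vec4 translate(vec4 v, float tx, float ty, float tz)
{
    vec4 r = { v.x + tx * v.w, v.y + ty * v.w, v.z + tz * v.w, v.w };
    return r;
}

int main(void)
{
    vec4 point     = { 1.0f, 2.0f, 3.0f, 1.0f };      /* w = 1: a position  */
    vec4 direction = { 0.0f, 0.0f, 1.0f, 0.0f };      /* w = 0: a direction */

    vec4 p = translate(point, 10.0f, 0.0f, 0.0f);     /* -> (11, 2, 3, 1)   */
    vec4 d = translate(direction, 10.0f, 0.0f, 0.0f); /* -> (0, 0, 1, 0)    */

    printf("point:     (%g, %g, %g, %g)\n", p.x, p.y, p.z, p.w);
    printf("direction: (%g, %g, %g, %g)\n", d.x, d.y, d.z, d.w);
    return 0;
}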

The mathematical motivation for homogeneous coordinates actually started with projective geometry in 2 dimensions. If you imagine homogeneous 2D coordinates (x, y, w), we'd have a bunch of planes stacked on each other for each w coordinate. We consider the plane at w=1 our "real" 2D world. If any transformation lifts us out of the w=1 plane we divide the x,y coordinates by w to project it back into our "real" 2D space. Each 2D point actually has an infinite number of representations in homogeneous coordinates since w*(x, y, 1) = (xw, yw, w) all generate the same 2D point after dividing by w (w not zero). It's sometimes called projective coordinates because all coordinate triples that would project to the same point on the w=1 plane are considered "the same" in a sense. These triples lie on lines fanning out from the origin like lines of sight of the vision cone of an eye sitting there. That's the "projective" part. But computer graphics just grabbed the 3D version of the concept for the representational advantages.

~~~~~~

So GPGPU was something Iwata said eh.
From the OP:
The shared access to MEM1 pool by the GPU and CPU alike indicated the two units are meant to interact at low latency, not normally seen in previous console generations.
Makes sense.
 

Linkup

Member
GPGPU is to Wii U what Soul Sacrifice is to Vita: the last beacon of hope so that the situation doesn't end up utterly pathetic.

I hope both pull through.
 

mrklaw

MrArseFace
How does stacked RAM compare to eDRAM? E.g. in the Vita, isn't 128MB of RAM stacked with the CPU? Is that a lot slower than eDRAM, or similar enough?

I think one of the big questions for the architects of these next systems will be that balance between overall size of memory and speed.
 

wsippel

Banned
So GPGPU was something Iwata said eh.
Yes, he specifically highlighted it in a Nintendo Direct. Was actually pretty much the only GPU feature he highlighted. It's apparently also explicitly mentioned in the technical documentation. Obviously something Nintendo is focussing on. Makes sense I guess, looking at how GPU centric the whole design seems to be and how Nintendo put an emphasis on latency optimizations in the Iwata Asks about the chipset...
 
Yes, he specifically highlighted it in a Nintendo Direct. Was actually pretty much the only GPU feature he highlighted. It's apparently also explicitly mentioned in the technical documentation. Obviously something Nintendo is focussing on. Makes sense I guess, looking at how GPU centric the whole design seems to be and how Nintendo put an emphasis on latency optimizations in the Iwata Asks about the chipset...

...which is why I think it's fair for people to flag this up as a potential wrinkle in the Wii U hardware design that may not be properly exploited yet, and why I find the sneering about GPGPU comments a bit irritating. I don't think anyone reasonable is suggesting that there's some magic element in the hardware setup, just that perhaps there's a bit more going on here than is immediately obvious and that the design may - when properly utilised (an issue in itself, of course) - be a fair bit better than it's being given credit for at the moment.
 

Thoraxes

Member
Besides the regular points discussed with the OS (which yes, it's unoptimized), has anyone looked into what the wifi connectivity of the system is doing when switching between Netflix, games, Miiverse?

I'd also be curious as to when it's actively sending and receiving data, and if it's always on standby or has a particular set of rules predetermined for D/U data.
 

gofreak

GAF's Bob Woodward
How does stacked RAM compare to eDRAM? E.g. in the Vita, isn't 128MB of RAM stacked with the CPU? Is that a lot slower than eDRAM, or similar enough?

I think one of the big questions for the architects of these next systems will be that balance between overall size of memory and speed.

All the RAM in Vita is actually stacked.

I think the stacking allowed them to use wide-IO VRAM for better bandwidth for the GPU - this is something they were already doing in PSP. It actually used to be called 'semi-embedded DRAM'.

Doing this for a portable chip vs a home console chip...it might be a different kettle of fish.
 

wsippel

Banned
...which is why I think it's fair for people to flag this up as a potential wrinkle in the Wii U hardware design that may not be properly exploited yet, and why I find the sneering about GPGPU comments a bit irritating. I don't think anyone reasonable is suggesting that there's some magic element in the hardware setup, just that perhaps there's a bit more going on here than is immediately obvious and that the design may - when properly utilised (an issue in itself, of course) - be a fair bit better than it's being given credit for at the moment.
I don't think any ports of PS360 games use GPGPU at all. Tecmo wouldn't have to bitch about the CPU being too weak to handle enemy AI if they used the GPU for crowd simulation and pathfinding (which is one area very well suited for GPUs, as demonstrated by AMD a few years ago - on an R700, incidentally), for example.
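To make the crowd-simulation point concrete, here's the rough shape of that work in plain C (hypothetical and heavily simplified): every agent's update reads only shared, read-only data, so the iterations are independent and map naturally onto a GPU's wide parallelism, unlike a branchy, serial A* search.

/* Illustrative only: per-agent crowd update, the kind of independent,
   data-parallel work that suits GPGPU. Names and structure are hypothetical. */
typedef struct { float x, y, vx, vy; } agent;

/* flow_x/flow_y: a precomputed steering direction per grid cell (read-only),
   one common way GPU crowd demos avoid running A* per agent. */
void update_agents(agent *agents, int count,
                   const float *flow_x, const float *flow_y,
                   int grid_w, int grid_h, float cell, float dt)
{
    for (int i = 0; i < count; i++) {        /* each iteration is independent, */
        int cx = (int)(agents[i].x / cell);  /* so it could be one GPU thread  */
        int cy = (int)(agents[i].y / cell);
        if (cx < 0) cx = 0; else if (cx >= grid_w) cx = grid_w - 1;
        if (cy < 0) cy = 0; else if (cy >= grid_h) cy = grid_h - 1;

        int idx = cy * grid_w + cx;
        agents[i].vx = flow_x[idx];          /* steer along the field */
        agents[i].vy = flow_y[idx];
        agents[i].x += agents[i].vx * dt;
        agents[i].y += agents[i].vy * dt;
    }
}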
 

mrklaw

MrArseFace
I don't think any ports of PS360 games use GPGPU at all. Tecmo wouldn't have to bitch about the CPU being too weak to handle enemy AI if they used the GPU for crowd simulation and pathfinding (which is one area very well suited for GPUs, as demonstrated by AMD a few years ago - on an R700, incidentally), for example.

you could argue that some PS3 games that heavily use the SPEs are already using 'GPGPU' style code.
 

IdeaMan

My source is my ass!
...which is why I think it's fair for people to flag this up as a potential wrinkle in the Wii U hardware design that may not be properly exploited yet, and why I find the sneering about GPGPU comments a bit irritating. I don't think anyone reasonable is suggesting that there's some magic element in the hardware setup, just that perhaps there's a bit more going on here than is immediately obvious and that the design may - when properly utilised (an issue in itself, of course) - be a fair bit better than it's being given credit for at the moment.

Yes, and it's the case for the DDR3 RAM as well. There is clearly more to it than "lulz it's 50% less fast and pawarful than Xbox360 ram", at least in real development conditions, according to my sources. The memory should be the least of our concerns. The real one is the accessibility/"easy-to-port-current-gen-titles-to" aspect. There is a problem when we see all the PR push from Nintendo around E3 2011 on this matter + comments from devs such as the Darksiders 2 ones, and then the witnessed results. The Wii U clearly needs games to be totally tailored for it to shine, and on this point I understand the disappointment. They should have planned a console with enough "raw grunt" that you don't have to min-max your code and apply programming wizardry, and/or created SDKs/documentation that would act more efficiently as a bridge between the "old ways of game development on current gen HD platforms" and the Wii U's. In other words, a console fitted less exclusively to their own needs or those of first parties, and more aware of the outside development landscape.

My fear is that we could only see the Wii U shine technically with dedicated projects. Let's hope the SDK and middleware keep being optimized (for the latter, there are more than 6 years of optimizations behind the Xbox360/PS3 versions vs. a few months on Wii U, according to changelogs), and that the programming mentality & habits shift toward this new/different paradigm (relying less on the CPU, more on the GPU and side chips like the DSP, importance of the eDRAM, etc.), yadda yadda, in order to get more satisfying visuals from multiplatform titles.
 
Has anyone done any power measurements in more demanding games than NSMBU?

Taking that measurement of 33 Watts for that (that was after the power supply, I believe), I'd estimate (roughly!) that the most the GPU alone will probably pull is 30W. The top end of FLOPS/W AMD provides in the Turks line is a bit below 15, so let's go with 15. That would put the best-case Wii U GPU GFLOPS at 450, which seems to match up pretty well with some of the (lower end) early rumours.


450 GFLOPS?
If the GPU alone pulls 30W, I can't see how it can reach 450 GFLOPS.
Best case, if it's based on RV740 - which is likely given it's a GPU based on the R700 line, 40nm, and the die size - at 30W it would reach 360 GFLOPS.
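For reference, that 360 figure presumably comes from scaling RV740's commonly quoted peak linearly by power (linear GFLOPS/W scaling is of course only a rough approximation):

640 ALUs × 2 ops × 0.75 GHz ≈ 960 GFLOPS at roughly 80W
960 GFLOPS × (30W / 80W) = 360 GFLOPS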
 

Thraktor

Member
All the RAM in Vita is actually stacked.

I think the stacking allowed them to use wide-IO VRAM for better bandwidth for the GPU - this is something they were already doing in PSP. It actually used to be called 'semi-embedded DRAM'.

Doing this for a portable chip vs a home console chip...it might be a different kettle of fish.

Yeah, it's quite a different thing to eDRAM. Vita's RAM is actually fairly standard (main RAM is LPDDR2, not sure about the VRAM), but it's on-chip with the CPU/GPU to allow them to use wider busses and probably achieve a slight reduction in latency. The eDRAM on the Wii U's GPU is a special kind of RAM designed specifically to be located on-die with components, to achieve extremely low latency (random access times are about 1-2ns vs about 50ns for DDR3) and very high bandwidth (up to a few hundred GB/s). Of course this means it's not cheap, which is why there's only 32MB of it.
 

Durante

Member
450 GFLOPS?
If the GPU alone pulls 30W, I can't see how it can reach 450 GFLOPS.
Best case, if it's based on RV740 - which is likely given it's a GPU based on the R700 line, 40nm, and the die size - at 30W it would reach 360 GFLOPS.
I went with 15 GFlops/Watt as the top-end of what I could imagine a customized and improved Turks GPU delivering, which got me to the 450 GFlops upper limit. The point was to show that the pre-launch "3x" rumours for the GPU aren't really viable even under ideal assumptions, given what we now know about die sizes and power consumption.
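Spelled out, against the commonly cited ~240 GFLOPS figure for Xenos:

30W × 15 GFLOPS/W = 450 GFLOPS
3 × 240 GFLOPS = 720 GFLOPS, well above that ceiling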
 
I went with 15 GFlops/Watt as the top-end of what I could imagine a customized and improved Turks GPU delivering, which got me to the 450 GFlops upper limit. The point was to show that the pre-launch "3x" rumours for the GPU aren't really viable even under ideal assumptions, given what we now know about die sizes and power consumption.




IMO, those "3x" or even "4x" rumors were based on what you could expect from those components.
I mean, take an RV740 GPU. I think it's a 4x jump over the 360's GPU. Now, when you underclock it to meet some weird low power consumption requirement, it's not 4x anymore, not even 3x.
I think if Nintendo weren't so obsessed with such low power consumption, the Wii U could have been more of an improvement.
I mean, would it really kill people if the console drew something more like 60-70W instead of that 30-40W?
 

Durante

Member
The only thing that still annoys me is that no one seems to have measured the power consumption with a larger variety of games. Since it's a more modern GPU than in the other consoles, the difference between low-load and high-load power consumption could also be greater.
 
The only thing that still annoys me is that no one seems to have measured the power consumption with a larger variety of games. Since it's a more modern GPU than in the other consoles, the difference between low-load and high-load power consumption could also be greater.



Sure! Also, this is rather annoying. I mean, measuring power consumption with NSMBU? I really doubt the game (while nice looking) is drawing much power.
 
The only thing that still annoys me is that no one seems to have measured the power consumption with a larger variety of games. Since it's a more modern GPU than in the other consoles, the difference between low-load and high-load power consumption could also be greater.

Worth an email to somewhere like Digital Foundry suggesting a load test for, say, NSMBU, Nintendoland, BLOPS2, Darksiders 2 and Arkham City? That would cover Nintendo's inhouse titles (both running on different engines, with different demands, I'd assume) and three of the more demanding third-party titles (two using inhouse engines, one on UE3).
 