
WiiU technical discussion (serious discussions welcome)

Awesome, thanks. This thread is my sole source of info; I don't have time to track down every tweet and news story. Thanks for the clarification. I still find it odd that in the quotes he just mentions a single core, though... that's what started me on that line of reasoning.
 

AlStrong

Member
The die size seems odd for 3x Broadway cores and some SMP circuitry (and slightly more memory transistors per core) on 45nm. But I suppose shrinking a CPU to a smaller process rarely works out close to how it should theoretically.

It's quite possible there's a lot of dead space for padding purposes.
 

ahm998

Member
Well, PayPal can take a while to process, and the tech guys will be doing comparisons once the image is purchased, so I don't think we'll know much for another couple of weeks.

I appreciate your work, members.

And is the teardown only for the GPU?
 

pestul

Member
I still can't believe Nintendo opted for only a 33W full load machine. It's actually remarkable what it is able to achieve with that spec.
 

ozfunghi

Member
Sooo... does Fourth Storm really have to wait for PayPal? Can't he just front the money as he is certain it will come anyway? WE'RE GETTING IMPATIENT HERE!

:)

Seriously though, how long will it take for PayPal to clear? And how long will it take him to do a count? Is this something that will take less than an hour, or will the man have to quit his day job for the foreseeable future?
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Sooo... does Fourth Storm really have to wait for PayPal? Can't he just front the money as he is certain it will come anyway? WE'RE GETTING IMPATIENT HERE!

:)

Seriously though, how long will it take for PayPal to clear? And how long will it take him to do a count? Is this something that will take less than an hour, or will the man have to quit his day job for the foreseeable future?
Patience, young padawan. The planets have been set in motion.
 

Durante

Member
Sooo... does Fourth Storm really have to wait for PayPal? Can't he just front the money as he is certain it will come anyway? WE'RE GETTING IMPATIENT HERE!
 

Gahiggidy

My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
Hopefully the numbers turn out promising and the press in turn picks up on this and it starts to boost sales of Wii U consoles. That's my hope.
 
Would it be farfetch'd (pun intended) to estimate that it has 486 shader units? I mean 32MB of eDRAM and a tiny leetel baybe ARM DSP can't take up THAT much space, can they (I'm being conservative in thinking that they only occupy about 30-40% combined chip real estate)?
 

pestul

Member
Would it be farfetch'd (pun intended) to estimate that it has 486 shader units? I mean 32MB of eDRAM and a tiny leetel baybe ARM DSP can't take up THAT much space, can they (I'm being conservative in thinking that they only occupy about 30-40% combined chip real estate)?

Same as the 4850/4870... dunno, but it would be nice. Clock speeds must be down a lot to keep that power in check.
 

wsippel

Banned
Would it be farfetch'd (pun intended) to estimate that it has 486 shader units? I mean 32MB of eDRAM and a tiny leetel baybe ARM DSP can't take up THAT much space, can they (I'm being conservative in thinking that they only occupy about 30-40% combined chip real estate)?
There's both an ARM and a DSP. There also has to be other stuff in there, like TEV and EMBM units and such.
 
I still can't believe Nintendo opted for only a 33W full load machine. It's actually remarkable what it is able to achieve with that spec.

It is, but it also kind of isn't when you think about how efficient these devices are getting. I mean, even the newest iPad can produce some great looking results at a very high resolution, and it's using far less power than that.
 

pestul

Member
It is, but it also kind of isn't when you think about how efficient these devices are getting. I mean, even the newest iPad can produce some great looking results at a very high resolution, and it's using far less power than that.

Point taken. Especially with the smaller process nodes being used these days. Looking at Anand's teardown and power tests, even the proprietary disc drive uses very little power.
 

QaaQer

Member
Hopefully the numbers turn out promising and the press in turn picks up on this and it starts to boost sales of Wii U consoles. That's my hope.

Mine too, it might even give me a reason to buy one sooner rather than later...but I'm not holding my breath.
 
Point taken. Especially with the smaller process nodes being used these days. Looking at Anand's teardown and power tests, even the proprietary disc drive uses very little power.
Could that be a reason it doesn't spin down... it takes less energy to keep a disc spinning than to constantly have to accelerate it? Or it could keep the disc spinning at an optimal speed in order to achieve full bandwidth with limited latency. Or a combination of both, and they just haven't coded a spin-down while playing off the HDD?
 
Has he seen a die shot though? Maybe I'm being pedantic but surely there can be optimisations to a core that couldn't be seen on the software side. Suppose you can benchmark and compare per mhz. But I doubt performance would be identical per mhz anyway due to the differing cache setup.

Certainly seems the cores are very similar anyway. The die size seems odd for 3x Broadway cores and some SMP circuitry (and slightly more memory transistors per core) on 45nm. But I suppose shrinking a CPU to a smaller process rarely works out close to how it should theoretically.
He didn't; he's going by code efficiency and compatibility, and how that doesn't seem to have changed. Everything is intact and in line with a regular PPC750, setting aside the SMP configuration and cache (and the SIMD features inherited from the Gekko). So if no alarm is going off in any way, it's safe for him to say it's just that; and I reckon it probably is.

Of course, he could be underestimating it; new features, especially new features built on top of a BC-compatible platform, often have to be called for specifically (otherwise the hardware will just perform as it's "supposed to"), and I'm not so sure he has access to any documentation, so there's that.

It might have more SIMD instructions, somehow backported from VMX128 or specifically implemented, or we might have what Thraktor suggested, like messing with the number of FPU units and supercharging it that way; except code not expecting it, or not calling for it specifically, wouldn't know that feature is there, right?

That's important because if the FPU performance was any better from the get go that could create some glitches on the BC.

It's the same principle as adding feature extensions like MMX to existing platforms/technologies: if you didn't know they were there, you'd never discover them by coding for the platform; you'd never call for them and you wouldn't know what to code for. Taking advantage of that doesn't come automagically.

On the other hand, we've heard no suggestion from anyone that actually has access to the documentation.
 

Durante

Member
What are the chances that the gpu is only something like 300 gflops? Because if it is, a new thread topic for this thing won't be pretty.
I'd say 300 or lower is extremely unlikely. IMHO, the range of likely values goes from ~350 at the low end to ~600 at the high end.
 

Schnozberry

Member
Would it be farfetch'd (pun intended) to estimate that it has 486 shader units? I mean 32MB of eDRAM and a tiny leetel baybe ARM DSP can't take up THAT much space, can they (I'm being conservative in thinking that they only occupy about 30-40% combined chip real estate)?

I think Thraktor pointed out earlier in the thread that since the GPU uses VLIW5, that the shader count would be a multiple of 5. I could have misinterpreted what he said, though.
 

Kenka

Member
By the way, talking about GFlops, are those in the Wii U comparable to the ones in the Orbis (and Durango)? I remember they were around 1.8 and 1.2 TFlops respectively.
 

chaosblade

Unconfirmed Member
I think Thraktor pointed out earlier in the thread that since the GPU uses VLIW5, that the shader count would be a multiple of 5. I could have misinterpreted what he said, though.

VLIW5 was already confirmed? Or is it just because it's based on r700? I don't remember if VLIW4 was in the mix at that point or not.

If so that would make counting pretty easy.
 

Thraktor

Member
It's quite possible there's a lot of dead space for padding purposes.

I think there are two things at play here. For one, Anandtech's die size measurements aren't going to be 100% accurate, so the die could be smaller than we think; and for another, die shrinks almost never scale perfectly, so I think the 750s + eDRAM probably fits what we know without any die padding. My main interest is whether there have been any changes to the cores themselves. I'm not expecting anything huge, but it would be something that could be figured out from a die shot easily enough (although figuring out what the changes are may be a different matter entirely).

I also don't see the logic behind padding from IBM's or Nintendo's perspectives. With the size of Broadway cores at 45nm, they could fill that space with extra cores for virtually no cost (even leaving the cache at 3MB).
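Thraktor's point about imperfect scaling can be put in rough numbers. This is a generic sketch with a purely illustrative starting area, not Broadway's actual die size:

```python
# Ideal die-area scaling goes with the square of the feature-size ratio;
# the 20 mm^2 starting area here is illustrative, not a measured figure.
def ideal_shrink(area_mm2, old_nm, new_nm):
    return area_mm2 * (new_nm / old_nm) ** 2

ideal = ideal_shrink(20.0, 90, 45)
print(ideal)  # 5.0; a 90nm -> 45nm shrink ideally quarters the area

# In practice, I/O pads, analog blocks and SRAM don't scale linearly,
# so real dies land somewhere between the ideal and the original area.
```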
 

Kimawolf

Member
600? I thought the general opinion was that that was out of the question? Even 500 was borderlining it.

No, that was two guys' opinion in the entire thread. We heard it could be in the mid 5s. I think 600 and above is definitely the borderline, but I'd not be surprised if it was around 500 or so.
 

Durante

Member
By the way, talking about GFlops, are those in the Wii U comparable to the ones in the Orbis (and Durango)? I remember they were around 1.8 and 1.2 TFlops respectively.
Orbis and Durango are supposed to be GCN based, and per-GFlop, GCN is basically more efficient across the board. How big the difference is depends on the workload.

I was curious how it pans out in practice on PC, and using computerbase.de's numbers for their default set of benchmarks at 1080p, a VLIW5 GPU achieves ~29 FPS/TFLOP, while a GCN GPU averages ~37 FPS/TFLOP. Obviously this is also impacted by tons of other architectural differences, so keep that in mind. But it seems obvious that there is a very significant efficiency jump (at least in terms of using the theoretical TFLOPs) from VLIW5 to GCN.
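Those benchmark averages can be inverted into a quick back-of-the-envelope comparison. The 60 FPS target below is arbitrary, and the per-TFLOP figures are just the rough averages quoted above:

```python
# Rough delivered-performance comparison from the quoted 1080p averages.
FPS_PER_TFLOP = {"VLIW5": 29.0, "GCN": 37.0}

def tflops_needed(target_fps, arch):
    """Theoretical TFLOPs this architecture needs to average target_fps."""
    return target_fps / FPS_PER_TFLOP[arch]

vliw5 = tflops_needed(60, "VLIW5")  # ~2.07 TFLOPs
gcn = tflops_needed(60, "GCN")      # ~1.62 TFLOPs
print(f"GCN advantage: {vliw5 / gcn:.2f}x")  # 1.28x
```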
 

Thraktor

Member
VLIW5 was already confirmed? Or is it just because it's based on r700? I don't remember if VLIW4 was in the mix at that point or not.

If so that would make counting pretty easy.

I'd be 95% confident it's VLIW5. There's a very slim chance they change the internal structure of the SIMD arrays, a la VLIW4, but it doesn't strike me as something you'd do in a game console. The reason that AMD switched to VLIW4 for some of their cards is that the fifth unit in VLIW5 arrays is specialised for complex mathematical operations, and they realised that PC games (running on highly abstracted APIs like Direct3D) don't actually make much use of these units, and they could cram more arrays on a die if they ditched them. The difference with a console is that you have much lower-level access to the hardware, and you'd expect developers to make good use of that complex math unit once they're able to access it directly.

Now, AMD and Nintendo worked on this chip for a couple of years, so you never know, but at this point I'd be surprised with anything but VLIW5.

Orbis and Durango are supposed to be GCN based, and per-GFlop, GCN is basically more efficient across the board. How big the difference is depends on the workload.

I was curious how it pans out in practice on PC, and using computerbase.de's numbers for their default set of benchmarks at 1080p, a VLIW5 GPU achieves ~29 FPS/TFLOP, while a GCN GPU averages ~37 FPS/TFLOP. Obviously this is also impacted by tons of other architectural differences, so keep that in mind. But it seems obvious that there is a very significant efficiency jump (at least in terms of using the theoretical TFLOPs) from VLIW5 to GCN.

I'd be interested to find out how much of GCN's improvement comes from the (significantly) increased amount of SRAM on-die (ie register files, caches, etc). Of course there are plenty of other architectural improvements in play, but one of the only things we've heard about the GPU is a potential increase in the size of register files, so it would be interesting to know how that might affect things compared to vanilla VLIW5.
 

ozfunghi

Member
600? I thought the general opinion was that that was out of the question? Even 500 was borderlining it.

Durante must be in a good mood; it's the most optimistic I've yet seen him in this topic.

At 480 SPUs, we get 528 GFlops. At 320, we get 352 GFlops.

A question for the immortals... how likely is it that the SPU count will stray from what we've come to expect... 320, 480, 640, 800...

Is it possible that the GPU turns out to have, say, 418 SPUs? Or is it always a multiple of 160?
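The arithmetic behind those two figures is simply ALUs x 2 ops per cycle x clock. A quick sketch, assuming the commonly cited 550 MHz Latte clock (an assumption, not a confirmed spec):

```python
# Peak shader throughput: each ALU can retire one fused multiply-add
# per cycle, i.e. 2 floating-point ops, so GFLOPS = ALUs * 2 * clock(GHz).
def peak_gflops(alus, clock_mhz=550):  # 550 MHz is the assumed clock
    return alus * 2 * clock_mhz / 1000.0

print(peak_gflops(480))  # 528.0
print(peak_gflops(320))  # 352.0
```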
 

chaosblade

Unconfirmed Member
I'd be 95% confident it's VLIW5. There's a very slim chance they change the internal structure of the SIMD arrays, a la VLIW4, but it doesn't strike me as something you'd do in a game console. The reason that AMD switched to VLIW4 for some of their cards is that the fifth unit in VLIW5 arrays is specialised for complex mathematical operations, and they realised that PC games (running on highly abstracted APIs like Direct3D) don't actually make much use of these units, and they could cram more arrays on a die if they ditched them. The difference with a console is that you have much lower-level access to the hardware, and you'd expect developers to make good use of that complex math unit once they're able to access it directly.

Now, AMD and Nintendo worked on this chip for a couple of years, so you never know, but at this point I'd be surprised with anything but VLIW5.

That kind of goes with something I had wondered about before too. It might have come up in this topic at some point already. GCN is definitely more efficient, but I'd wondered (a while before the WiiU even launched) how much (if any) benefit VLIW might see in a console. I can't imagine it would be enough to match GCN, of course.

This makes me think there should be at least a little benefit over PC cards.
 

Durante

Member
Durante must be in a good mood; it's the most optimistic I've yet seen him in this topic.
Not really, it's just that this time I wanted to provide a very broad range that covers everything anyone could ever reasonably expect -- and that's 320 to 560 ALUs or 350 to 600 GFlops.

As for the actual number, I'd be very surprised if it's not at least a multiple of 40.
 

Thraktor

Member
Durante must be in a good mood; it's the most optimistic I've yet seen him in this topic.

At 480 SPUs, we get 528 GFlops. At 320, we get 352 GFlops.

A question for the immortals... how likely is it that the SPU count will stray from what we've come to expect... 320, 480, 640, 800...

Is it possible that the GPU turns out to have, say, 418 SPUs? Or is it always a multiple of 160?

All (I think) VLIW5 GPUs are set up with either one texture unit per 80 ALUs or one per 40 (on the lower-end cards). There's nothing to say Latte has to stick to this setup, but there's also nothing to say it would deviate from it.
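Combining Durante's multiple-of-40 expectation with those TMU ratios, the plausible configurations can be enumerated. The 550 MHz clock is again the commonly assumed Latte figure, not a confirmed one:

```python
# Enumerate plausible shader configurations, assuming the ALU count is a
# multiple of 40 within Durante's 320-560 range, and one texture unit per
# 40 or 80 ALUs. The 550 MHz clock is an assumption, not a confirmed spec.
CLOCK_MHZ = 550

for alus in range(320, 561, 40):          # 320, 360, ..., 560
    gflops = alus * 2 * CLOCK_MHZ / 1000  # 2 FP ops (multiply-add) per ALU per cycle
    for alus_per_tmu in (40, 80):
        if alus % alus_per_tmu == 0:
            tmus = alus // alus_per_tmu
            print(f"{alus} ALUs / {tmus} TMUs -> {gflops:.0f} GFLOPS")
```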
 

Durante

Member
That kind of goes with something I had wondered about before too. It might have come up in this topic at some point already. GCN is definitely more efficient, but I'd wondered (a while before the WiiU even launched) how much (if any) benefit VLIW might see in a console. I can't imagine it would be enough to match GCN, of course.
I think it's always inconvenient to talk about "efficiency" in this context, since what we actually mean is how well the system can make use of its theoretical throughput in practice. We completely disregard the architectures' relative efficiency in providing that theoretical throughput.

Just looking at FPS/GFLOP, GCN is superior in almost every case, but for FPS/Watt, the picture may not be quite as clear. Of course, the latter isn't really applicable when we try to estimate console capabilities from their theoretical FLOP count.
 

USC-fan

Banned
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!
 
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!

Why can't we trust them?
 

tipoo

Banned
Hopefully the numbers turn out promising and the press in turn picks up on this and it starts to boost sales of Wii U consoles. That's my hope.

The Joe and Jane consumers of the world care nothing for the details we're looking for, and they'll be the largest portion of buyers. And enthusiasts who really care about the technical nitty gritty would probably want Durango or Orbis instead, no?

I want specs for the sake of knowing the specs, I don't think it will hinder or help it in any major way beyond maybe a few thousand sales.
 

tipoo

Banned
By the way, talking about GFlops, are those in the Wii U comparable to the ones in the Orbis (and Durango)? I remember they were around 1.8 and 1.2 TFlops respectively.

Depends; if the Wii U is based on the HD4000-HD6000 series, it may be different from the GCN-based PS4 and 720. I think GCN has a higher actual output relative to its theoretical max GFlops.
 

ozfunghi

Member
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!

Ugh. I'm sure they have been working for a couple of years on a customized GPU, just to have crap on it that doesn't work or is not being used. And why shouldn't you be able to trust the numbers counted? I'm sure 4thStorm will ask for feedback and a second opinion.

But whatever fits your asinine agenda, i guess.

Analyzing the photo will be a step closer in the right direction compared to any speculation in this thread. Unless you're more intrigued by the latter.

He is; that way he can keep bullshitting everyone without repercussion.
 

OryoN

Member
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!

Analyzing the photo will be a step closer in the right direction compared to any speculation in this thread. Unless you're more intrigued by the latter.
 

prag16

Banned
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!

Ah, pre-emptive damage control in case the results come out better than expected. Solid tactic.
 
I don't know if we can trust the numbers done by the couple of people that are looking at the image. Also, just because it's on the chip doesn't mean they're active.

Unless we get official specs, it's just a guess. If they can even read the photos correctly...

I can see this ending very badly!

Is your issue with the particular individuals who are doing the checking? If so, I'm sure you could have volunteered your services earlier...

If it's with the idea of people trying to figure something out based on a die shot, then I'm sure whatever assessment the people involved give will be appropriately presented - disclaimers etc. - right, guys?
 

chaosblade

Unconfirmed Member
I think it's always inconvenient to talk about "efficiency" in this context, since what we actually mean is how well the system can make use of its theoretical throughput in practice.

Well, that's what I meant. I know VLIW suffered on PC because games generally weren't optimized for it, but I have no idea how much optimization can be done - if any - to get more out of it. I didn't actually mean to bring GCN in as a comparison, only mentioned it so I wouldn't come off as trying to pass VLIW's higher theoretical throughput as a way it could "magically" match the other two next-gen consoles.

I'm just thinking about a comparison between VLIW in a PC where you're dealing with more software layers and not targeting specific hardware vs VLIW in a console. Apparently most optimizations would deal with scheduling and I don't know if that's even feasible in most situations.

And I'll admit I don't know much more about this stuff than the average GAFfer. Just enough to follow this thread without sitting at my desk slack-jawed and confused. So I could be totally mistaken on what I said.

Ugh. I'm sure they have been working for a couple of years on a customized GPU, just to have crap on it that doesn't work or is not being used. And why shouldn't you be able to trust the numbers counted? I'm sure 4thStorm will ask for feedback and a second opinion.

But whatever fits your asinine agenda, i guess.

Sony had an SPU disabled to improve yields. Nintendo could do something similar here, but I don't think they would.
 