
WiiU technical discussion (serious discussions welcome)

Earendil

Member
Here's my problem with the "3 Broadways glued together" theory. At 90nm, Broadway was 18.9mm² in size. Espresso is built on a 45nm process. A full node shrink would reduce the chip to 25% of that, or 4.725mm². Three of these together is 14.175mm². Espresso is 32.76mm² in size. That leaves 18.585mm². 3MB of eDRAM and silicon for SMP would not take this much space on the die. So if we are simply looking at higher-clocked, unmodified Broadways, what is using the rest of the die space?
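A quick back-of-the-envelope check of that arithmetic in Python (purely illustrative, and assuming the idealized 50% linear shrink per full node that real processes rarely achieve):
Code:
# Naive "3 shrunk Broadways" area budget, using the figures quoted above (mm^2)
broadway_90nm = 18.9                    # Broadway die at 90nm
per_core_45nm = broadway_90nm * 0.25    # perfect full-node shrink: 4.725
three_cores   = 3 * per_core_45nm       # 14.175
espresso      = 32.76                   # measured Espresso die
print(three_cores, espresso - three_cores)   # 14.175, 18.585 left unaccounted for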

Is there anything they can do? Cause it's sloooow. Too slow. It's a lot of waiting and that makes it less fun to use.
It gets annoying from the first moments you use it.

Short answer: Yes. Long answer: It depends on how the OS is designed. I don't have one yet, so I can't speak to the sluggishness with any experience. But from what I hear, the issue seems to only occur when loading apps or accessing the menu from a game. Once you are in Netflix, Hulu, etc... it runs fine. This leads me to believe that they are either reinitializing the entire OS each time, or they have not yet optimized the memory footprint and whatever system handles loading applications. My development experience is mostly with database applications, so I don't know entirely what is involved in building an OS. I know that in my case, database access is the single biggest bottleneck in any application I may develop. Hopefully, there is something similar in this OS that is the problem. I don't mean database access per se, but simply that there may be one primary bottleneck causing the sluggishness and it may or may not be something that is simple to fix. Really, only Nintendo knows for sure.
 

The_Lump

Banned
Question for those in the know: what does the information on the Wii U CPU mean for non-game performance? Could that be a reason for the slow OS, or is it more a software design issue?

I'm not "in the know" But it is almost certainly software related. Not least because I find it unlikely Nintendo would build there OS without the hardware its going to run on in mind.

Any truth to the OS running on the ARM chip rumour? Read that on B3D, wasn't sure if true and couldn't find it anywhere.
 

Earendil

Member
I'm not "in the know" But it is almost certainly software related. Not least because I find it unlikely Nintendo would build there OS without the hardware its going to run on in mind.

Any truth to the OS running on the ARM chip rumour? Read that on B3D, wasn't sure if true and couldn't find it anywhere.

Still just a rumor as far as I know. But even an ARM chip shouldn't have trouble with the OS.
 
Here's my problem with the "3 Broadways glued together" theory. At 90nm, Broadway was 18.9mm² in size. Espresso is built on a 45nm process. A full node shrink would reduce the chip to 25% of that, or 4.725mm². Three of these together is 14.175mm². Espresso is 32.76mm² in size. That leaves 18.585mm². 3MB of eDRAM and silicon for SMP would not take this much space on the die. So if we are simply looking at higher-clocked, unmodified Broadways, what is using the rest of the die space?

Others can explain far better than I can why that is the case, but Gekko is 43mm² on 180nm, so at 90nm Broadway should be 10.75mm², not 18.9mm². There were a few changes, I think, but nothing close to what slapping together even just 3 Broadways with extra cache and all the requisite additions would require. Even so, Broadway was 76% larger than a simple 1/4-size expectation.

Still just a rumor as far as I know. But even an ARM chip shouldn't have trouble with the OS.

Does anyone know if there has been more light shed on this? I thought I read the helper chips were a DSP, an ARM for I/O and security, and an ARM for the OS. Of course I could easily have mistaken I/O for OS, or could have missed the rumor tag.
 
Well, I've been saying that for months :p

And it's even more the case for the CPU. As I hinted a while ago, a big studio managed to hobble its own CPU throughput, by between 20 and 40% I'd say, to be more precise. Yes, it's huge. It comes down to the way their engine put the U-CPU to use. It has since been corrected, but very late in the development cycle. Now, that doesn't mean every studio encountered this issue, but at the very least it shows the architecture is "unique" enough to require some adaptation and learning. This studio also developed on Wii, so they already knew the Broadway CPU; I therefore doubt they would have met this difficulty with the U-CPU if it were just a three-core Broadway (although it's possible they simply messed up, that's all).

When you combine everything that has been revealed about the learning curve + the improvements of these past months (hardware with dev kit revisions, software with the SDK, etc.) + the fact that the system can run late-gen ports + plenty of other info (the fab process, the eDRAM, the fact that Marcan analyzed this CPU through hacking, in software, rather than by examining the circuitry directly, so it may only mean that it's compatible with Broadway, etc.) + reassuring comments from several sources (albeit countered by other negative ones, but that means there is a correct way to use it, in balance with the other components) = there's more to it than just "lol 1.2GHz 3-core Broadway", and above all, second-gen titles on Wii U will start from saner ground than the first ones, which were constrained by several unoptimized factors (although you could say that of most systems).

Now, I'm not defending Nintendo's hardware choices. I would gladly have traded 30 euros and a bigger casing for a CPU that directly delivers more grunt without having to deploy huge optimization efforts here and there (using the GPGPU side of the GPU if it really is a GPU-centric system, the DSP, the ARM chip, etc.). It would have warranted better launch-window ports, the (at least slightly) superior versions I expected.

I take it that was the Batman devs :p.
 

Gahiggidy

My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
Here's my problem with the "3 Broadways glued together" theory. At 90nm, Broadway was 18.9mm² in size. Espresso is built on a 45nm process. A full node shrink would reduce the chip to 25% of that, or 4.725mm². Three of these together is 14.175mm². Espresso is 32.76mm² in size. That leaves 18.585mm². 3MB of eDRAM and silicon for SMP would not take this much space on the die. So if we are simply looking at higher-clocked, unmodified Broadways, what is using the rest of the die space?


....

Pikmin dormitories?
 

Earendil

Member
Others can explain far better than I can why that is the case, but Gekko is 43mm² on 180nm, so at 90nm Broadway should be 10.75mm², not 18.9mm². There were a few changes, I think, but nothing close to what slapping together even just 3 Broadways with extra cache and all the requisite additions would require. Even so, Broadway was 76% larger than a simple 1/4-size expectation.

If we use that same 76% ratio, then it makes more sense. That would leave roughly 8mm², which could be the cache and controller.
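For reference, the same back-of-the-envelope math with that imperfect-shrink factor applied (again just a sketch using the numbers from the posts above):
Code:
# Broadway itself was ~76% larger than a perfect 1/4 shrink of Gekko (18.9 vs 43/4 = 10.75)
overhead      = 18.9 / (43.0 / 4)           # ~1.76
per_core_45nm = (18.9 * 0.25) * overhead    # ~8.3 mm^2 per core at 45nm
leftover      = 32.76 - 3 * per_core_45nm   # ~7.8 mm^2 left for cache, SMP logic, etc.
print(round(per_core_45nm, 2), round(leftover, 2))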

It just all seems so hard to believe that they would do this. There has to be more to it.

Pikmin dormitories?

Mystery solved!
 

MDX

Member
The TEV stages weren't exactly texture units; they're very similar to NVidia's register combiners, introduced in the TNT and the pre-shader GeForce GPUs. They are pixel shaders' direct ancestors, allowing games to "program" how multiple texture layers and other inputs were combined to produce the final pixel color.

So it's so old that it can't even be translated into today's terms.
No wonder developers didn't want to deal with it.
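For a rough sense of what those combiner stages did, here's a toy model in Python (purely illustrative pseudocode; the real TEV was configured through the GameCube/Wii fixed-function graphics API, not anything like this):
Code:
# Each TEV-style stage takes a few inputs (texture samples, vertex colour, the
# previous stage's output) and combines them with a fixed operation chosen by the game.
def modulate(a, b):   # stage op: component-wise multiply, e.g. base texture * lightmap
    return tuple(x * y for x, y in zip(a, b))

def add_sat(a, b):    # stage op: component-wise add, clamped to 1.0, e.g. + specular layer
    return tuple(min(1.0, x + y) for x, y in zip(a, b))

base_tex = (0.8, 0.6, 0.4)   # RGB from texture layer 0
lightmap = (0.5, 0.5, 0.5)   # RGB from texture layer 1
specular = (0.1, 0.1, 0.1)   # RGB from texture layer 2

# "Program" the combiner by choosing which inputs feed which op, in which order:
stage0 = modulate(base_tex, lightmap)  # stage 0 output feeds stage 1
stage1 = add_sat(stage0, specular)     # final pixel colour
print(stage1)                          # (0.5, 0.4, 0.3)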
 

MDX

Member
Well, I've been saying that for months :p

And it's even more the case for the CPU. As I hinted a while ago, a big studio managed to hobble its own CPU throughput, by between 20 and 40% I'd say, to be more precise. Yes, it's huge. It comes down to the way their engine put the U-CPU to use. It has since been corrected, but very late in the development cycle.


I don't know if somebody already asked this, but is this info from a launch game, or one that's still being worked on?
 

MDX

Member
Anyway, after a year and a half of speculation and discussion with you fine folks, I finally have a Wii U sitting in a box next to me, and I'm about to head home to plug it in and start playing. So you'll have to excuse me if I start posting a whole lot less for the next few days :)


Lucky U
 

Earendil

Member
I think the software is just bloated

Agreed. Case in point (from my perspective):

Several years ago, I worked as a web developer for a magazine. I was tasked with going through the site and cleaning up queries, and optimizing the code as much as possible considering it was a pile of crap to begin with. One of our brilliant (I use the term loosely) former developers had written a page with an alphabetical listing of all our authors. In his infinite wisdom, he had designed the page to run 1 query for each letter. So all in all, 26 queries on one page. This page took 45 seconds to load. After I picked myself up off the floor and waited for my sides to stop hurting, I changed the page to get all the authors in 1 query and the page then loaded in less than 2 seconds.

I'm not saying that this level of incompetence is happening with the Wii U OS, but you can see how a simple change made a dramatic difference in performance.
 

Gahiggidy

My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
Agreed. Case in point (from my perspective):

Several years ago, I worked as a web developer for a magazine. I was tasked with going through the site and cleaning up queries, and optimizing the code as much as possible considering it was a pile of crap to begin with. One of our brilliant (I use the term loosely) former developers had written a page with an alphabetical listing of all our authors. In his infinite wisdom, he had designed the page to run 1 query for each letter. So all in all, 26 queries on one page. This page took 45 seconds to load. After I picked myself up off the floor and waited for my sides to stop hurting, I changed the page to get all the authors in 1 query and the page then loaded in less than 2 seconds.

I'm not saying that this level of incompetence is happening with the Wii U OS, but you can see how a simple change made a dramatic difference in performance.
Code:
<cfloop list="A,B,C,D,E,F,G,H,I,J,K,L,M,N,O,P,Q,R,S,T,U,V,W,X,Y,Z" index="i">

	<cfquery datasource="mazagineDB" name="getAuthors">
	SELECT *
	FROM authors
	WHERE lastname LIKE '#i#%'
	</cfquery>
	
	<h2><cfoutput>#i#</cfoutput></h2>
	
	<ul>
		<cfoutput query="getAuthors">
			<li>#lastname#, #firstname#</li>
		</cfoutput>
	</ul>

</cfloop>
 

Earendil

Member
code snippet

Hahaha How did you know it was in CF?

Anyway, this illustrates my point. Like this, the page runs 26 queries. When instead it could be done like this:

Code:
<cfquery datasource="magazineDB" name="qryAuthors">
	SELECT
			a.lastName,
			a.firstName,
			LEFT(a.lastName, 1) AS lastInitial
	FROM
			authors a
</cfquery>

<cfoutput query="qryAuthors" group="lastInitial">
	<h2>#qryAuthors.lastInitial#</h2>
	
	<ul>
		<cfoutput>
			<li>#qryAuthors.lastName#, #qryAuthors.firstName#</li>
		</cfoutput>
	</ul>
	
</cfoutput>

Now, I don't know if the Wii U OS has issues like this page did, but if it does, then at some point, they should be able to improve the speed.
 

Jeffa

Neo Member
If we use that same 76% ratio, then it makes more sense. That would leave roughly 8mm², which could be the cache and controller.

It just all seems so hard to believe that they would do this. There has to be more to it.

Remember, the 18.9mm² die on the 90nm process includes 256KB of SRAM for L2 cache. The Wii U CPU now uses eDRAM instead. SRAM takes up roughly 6x the space of DRAM, so the 3MB of eDRAM would take up less space than 3x 256KB of SRAM.
 

Thraktor

Member
Here's my problem with the "3 Broadways glued together" theory. At 90nm, Broadway was 18.9mm² in size. Espresso is built on a 45nm process. A full node shrink would reduce the chip to 25% of that, or 4.725mm². Three of these together is 14.175mm². Espresso is 32.76mm² in size. That leaves 18.585mm². 3MB of eDRAM and silicon for SMP would not take this much space on the die. So if we are simply looking at higher-clocked, unmodified Broadways, what is using the rest of the die space?

In a previous post, I calculated the total eDRAM cache in the CPU to come to 10.16mm². If you consider that die shrinks don't always achieve perfect 50% size reductions per node, and that the measurement of the Espresso die might be off by a millimetre here or there, then three Broadways + 3MB eDRAM fits the info we have. There might be some minor adjustments to the cores (adding no more than maybe 30% or 40% to their size), but there isn't really a vast amount of space in there to play with.


Thanks. I'm playing the best game ever. It's called "Waiting for the software update to download" :p

Edit:

Remember, the 18.9mm² die on the 90nm process includes 256KB of SRAM for L2 cache. The Wii U CPU now uses eDRAM instead. SRAM takes up roughly 6x the space of DRAM, so the 3MB of eDRAM would take up less space than 3x 256KB of SRAM.

This is a valid point. I hadn't actually calculated the SRAM size originally, but you're quite right. On a technical point, if there's a 3x density increase when moving to eDRAM, then 3MB of eDRAM will actually take up 33% more space than 3 x 256KB of SRAM.
 

Earendil

Member
In a previous post, I calculated the total eDRAM cache in the CPU to come to 10.16mm². If you consider that die shrinks don't always achieve perfect 50% size reductions per node, and that the measurement of the Espresso die might be off by a millimetre here or there, then three Broadways + 3MB eDRAM fits the info we have. There might be some minor adjustments to the cores (adding no more than maybe 30% or 40% to their size), but there isn't really a vast amount of space in there to play with.



Thanks. I'm playing the best game ever. It's called "Waiting for the software update to download" :p

Thanks for clarifying. Though I like the "pikmin dormitories" idea better. :(
 
It depends on how the OS is designed. I don't have one yet, so I can't speak to the sluggishness with any experience. But from what I hear, the issue seems to only occur when loading apps or accessing the menu from a game. Once you are in Netflix, Hulu, etc... it runs fine. This leads me to believe that they are either reinitializing the entire OS each time,

It does look like the OS is reinitializing each time - from a user standpoint, I mean. It takes 20+ seconds to get back to the home screen from games and most apps, and even from the System Settings. But from Miiverse or the Internet browser, it doesn't do that reboot, and the home screen comes right up.

What I'd like to know is what's happening during the "splash screen" that comes up when you launch something. I think they're all the same duration, and nothing appears to be happening at all - not even game loading, which happens later. I've wondered if it's "un-loading" the OS and loading the in-game home button menu.
 

Thraktor

Member
Thanks for clarifying. Though I like the "pikmin dormitories" idea better. :(

Actually, after taking into account Jeffa's comment, we can easily calculate the extra space used up by the eDRAM cache over the SRAM cache as just 2.54mm². That brings us back to the point where things don't really add up.
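Putting Jeffa's density point and the 10.16mm² figure together (just a sketch of the arithmetic, taking those two numbers as given):
Code:
# 3MB of eDRAM on Espresso was estimated above at ~10.16 mm^2.
# At ~3x the density of SRAM, the 3 x 256KB of SRAM it replaces would have cost:
edram_3mb   = 10.16
sram_768kb  = edram_3mb * (0.75 / 3.0) * 3   # same 768KB, at 1/3 the density -> 7.62 mm^2
extra_space = edram_3mb - sram_768kb         # ~2.54 mm^2 net increase from the bigger cache
print(round(sram_768kb, 2), round(extra_space, 2))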
 

Earendil

Member
Actually, after taking into account Jeffa's comment, we can easily calculate the extra space used up by the eDRAM cache over the SRAM cache as just 2.54mm². That brings us back to the point where things don't really add up.

I knew it!

[image: Jeremy Brett as Sherlock Holmes]
 

Gahiggidy

My aunt & uncle run a Mom & Pop store, "The Gamecube Hut", and sold 80k WiiU within minutes of opening.
Hahaha How did you know it was in CF?

Anyway, this illustrates my point. Like this, the page runs 26 queries. When instead it could be done like this:

....

Now, I don't know if the Wii U OS has issues like this page did, but if it does, then at some point, they should be able to improve the speed.
Lucky coincidence, as that's the one language I happen to work with. Though the poor coding you described does seem like something we CF programmers might commit, with the easy, "block"-like way CF queries the database.

BTW, I'm not experiencing too bad load times on my Wii U. Really, it seems to be 5-8 seconds jumping from app to app. After Miiverse finally went live and got populated in those first two days after launch, the load times became tolerable.
 

Earendil

Member
Lucky coincidence, as that's the one language I happen to work with. Though the poor coding you described does seem like something we CF programmers might commit, with the easy, "block"-like way CF queries the database.

BTW, I'm not experiencing too bad load times on my Wii U. Really, it seems to be 5-8 seconds jumping from app to app. After Miiverse finally went live and got populated in those first two days after launch, the load times became tolerable.

We should talk later...

@Bolded: That's interesting. Was it just for Miiverse? I wonder what difference that would make. It might be coincidental though.
 

Thraktor

Member
This may have already been mentioned, but anyone think that the ri-goddamn-diculous slowness of the OS could be down to encryption?

If the designers had any sense, they'd encrypt the OS in the flash RAM to at least slow down the hackers; it worked great for Sony. They even dedicated an SPE in the Cell to OS/HDD real-time decryption, didn't they?
I'm wondering if the OS programmers did all their dev work on modern PCs, which all have hardware AES-NI support that descrambles that stuff 4-7 times faster than a chip without it.

Either they forgot, or they didn't realise, that neither the CPU in the final kit nor the security chip that Marcan found had sufficient juice to decrypt the OS as it comes off the internal storage as fast as it seemed to on the dev boxes.

Just my theory anyway; I literally can't think of any other reason for such a shockingly slow OS. It's not slow at running (a Raspberry Pi could probably run it), but re-initialising the whole OS footprint could take a while, and its insistence on reloading the whole thing when exiting from most apps could be another security "feature" to protect against in-memory attacks.

The thing is that the entire basis of their decision on "what security co-processor should we use?" would have been based on fairly detailed analysis of the expected encryption/decryption throughput that it would need to provide. They wouldn't just pick any old chip and then "not realise" that it didn't have the needed specs.
 

Earendil

Member
If I were designing an OS, here's how I would do it.

I would have a listener service running at all times. When a new app or game is "installed", it would receive an encrypted service key. Anytime it needed to communicate with another app, such as Miiverse, it would call the listener service, pass the service key and an API method name. For instance, you are playing NSMBU and you post to Miiverse. NSMBU would call the listener service with the following info:

ServiceKey: LA)k21%0ss0@23Sw2W01!
Action: system.miiverse.postmessage

The listener service decrypts the service key and checks to see if it is valid. Then it makes the call to the Miiverse service (which would be a lightweight service that simply handles posting and retrieving messages). The Miiverse service does its thing, and passes back a status code to the listener. This is then passed to NSMBU and you as the user get a message that your post has been added successfully.

This is lightweight, relatively secure (you could add a session key to the service call also) and it would be reasonably fast.
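A minimal sketch of that hypothetical design in Python (my own illustration of the post above, nothing to do with how the Wii U actually works; the Fernet token and the handler registry are stand-ins for whatever a real OS would use):
Code:
from cryptography.fernet import Fernet

class ListenerService:
    def __init__(self):
        self._fernet = Fernet(Fernet.generate_key())  # system-side secret
        self._handlers = {}                           # action name -> lightweight service

    def issue_service_key(self, app_id):
        # Encrypted key handed to an app/game at "install" time.
        return self._fernet.encrypt(app_id.encode())

    def register(self, action, handler):
        self._handlers[action] = handler

    def call(self, service_key, action, payload):
        app_id = self._fernet.decrypt(service_key).decode()  # validate the caller
        handler = self._handlers.get(action)
        return handler(app_id, payload) if handler else 404  # status code back to the app

# Usage: NSMBU posting to Miiverse through the listener.
def miiverse_post(app_id, payload):
    print(f"[miiverse] {app_id}: {payload['message']}")
    return 200

listener = ListenerService()
listener.register("system.miiverse.postmessage", miiverse_post)
key = listener.issue_service_key("NSMBU")
print(listener.call(key, "system.miiverse.postmessage", {"message": "Got all the star coins!"}))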
 
could the GPGPU make up for the lack of new SIMDs?

Okay. A couple of things here. There really isn't any such thing as a GPGPU. "GPGPU" is just a way of saying that a GPU can be used to offload tasks that are typically done by a CPU but can be done more efficiently by the GPU. So pretty much ANY GPU can be a GPGPU. This isn't new. The PS3 and Xbox 360 could offload stuff to the GPU if they wanted to. There's just been an increased software focus on the capability with the introduction of technologies like OpenCL, DirectX 11, and CUDA.

Second off, if you want the GPU to be doing ANY considerable lifting in addition to rendering games, you're also going to want a pretty hefty GPU. Others could speak on this better than I could, probably, but my takeaway, and somebody more knowledgeable on GAF should confirm this, is that unless the Wii U has a high-end older AMD GPU, or a modern mid-range-or-better GPU, offloading CPU tasks to the GPU is not going to magically save this system. It's just not going to have a ton of power left over to do that kind of stuff.

I can confirm, however, that up until basically the current AMD GPU architecture, AMD's GPGPU performance has been pretty terrible compared to the competition.
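For anyone unfamiliar with what "offloading a CPU task to the GPU" looks like in practice, here's a generic sketch using pyopencl (nothing Wii U-specific; the console exposes its own graphics API, not OpenCL), moving a trivially parallel particle-update loop onto whatever GPU the host machine has:
Code:
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()          # pick an available OpenCL device (ideally a GPU)
queue = cl.CommandQueue(ctx)

n = 1_000_000
pos = np.random.rand(n).astype(np.float32)
vel = np.random.rand(n).astype(np.float32)

mf = cl.mem_flags
pos_buf = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=pos)
vel_buf = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=vel)

# One kernel instance per particle: the sort of embarrassingly parallel
# physics-style work that suits a GPU far better than a narrow CPU core.
prg = cl.Program(ctx, """
__kernel void integrate(__global float *pos, __global const float *vel, const float dt) {
    int i = get_global_id(0);
    pos[i] += vel[i] * dt;
}
""").build()

prg.integrate(queue, (n,), None, pos_buf, vel_buf, np.float32(1.0 / 60.0))
cl.enqueue_copy(queue, pos, pos_buf)    # read the updated positions back
print(pos[:4])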
 

The_Lump

Banned
Still just a rumor as far as I know. But even an ARM chip shouldn't have trouble with the OS.


No I agree. I was just bringing it up as the OS talk reminded me of it.

There is no way Nintendo designed software which couldn't run on their hardware - it's just normal teething problems.
 

AzaK

Member
The hardware was reportedly in a state of flux until fairly recently, so the OS and security teams couldn't run on final hardware for a long time, and since then have been busy squashing bugs and sorting out that monster of a patch.

Based on teething problems, unacceptable load times and massive day-one updates, not to mention dubious build quality with a rattling GamePad (mine rattles, the buttons aren't seated properly, and two other units we checked in store were the same), it's clear Nintendo rushed this thing. Still, I'm having fun ;)
 
Okay. A couple of things here. There really isn't any such thing as a GPGPU. "GPGPU" is just a way of saying that a GPU can be used to offload tasks that are typically done by a CPU but can be done more efficiently by the GPU. So pretty much ANY GPU can be a GPGPU. This isn't new. The PS3 and Xbox 360 could offload stuff to the GPU if they wanted to. There's just been an increased software focus on the capability with the introduction of technologies like OpenCL, DirectX 11, and CUDA.

Second off, if you want the GPU to be doing ANY considerable lifting in addition to rendering games, you're also going to want a pretty hefty GPU. Others could speak on this better than I could, probably, but my takeaway, and somebody more knowledgeable on GAF should confirm this, is that unless the Wii U has a high-end older AMD GPU, or a modern mid-range-or-better GPU, offloading CPU tasks to the GPU is not going to magically save this system. It's just not going to have a ton of power left over to do that kind of stuff.

I can confirm, however, that up until basically the current AMD GPU architecture, AMD's GPGPU performance has been pretty terrible compared to the competition.
All very true, which is part of the reason "GPGPU" has been used to mock those who throw the term around without intricate knowledge of what it entails.
 

USC-fan

Banned
Okay. A couple of things here. There really isn't any such thing as a GPGPU. "GPGPU" is just a way of saying that a GPU can be used to offload tasks that are typically done by a CPU but can be done more efficiently by the GPU. So pretty much ANY GPU can be a GPGPU. This isn't new. The PS3 and Xbox 360 could offload stuff to the GPU if they wanted to. There's just been an increased software focus on the capability with the introduction of technologies like OpenCL, DirectX 11, and CUDA.

Second off, if you want the GPU to be doing ANY considerable lifting in addition to rendering games, you're also going to want a pretty hefty GPU. Others could speak on this better than I could, probably, but my takeaway, and somebody more knowledgeable on GAF should confirm this, is that unless the Wii U has a high-end older AMD GPU, or a modern mid-range-or-better GPU, offloading CPU tasks to the GPU is not going to magically save this system. It's just not going to have a ton of power left over to do that kind of stuff.

I can confirm, however, that up until basically the current AMD GPU architecture, AMD's GPGPU performance has been pretty terrible compared to the competition.

The r700 is "first gen" compute shaders support and performance is pretty poor. So every dx10+ gpu is a GPGPU but some are better at this than others.

At this point we dont have any info that anyone is using the GPGPU functions of the wiiu. Now you have forum poster that have stated it was design to be a "gpgpu to help the CPU" like its a known fact. One of the best uses is GPGPU-accelerated physic and Nintendo paid for the Havok physic engine to be ported to the wiiu and they have stated it run on the CPU. Unlike on a PC where this would run on GPU.
 

The_Lump

Banned
The r700 is "first gen" compute shaders support and performance is pretty poor. So every dx10+ gpu is a GPGPU but some are better at this than others.

At this point we dont have any info that anyone is using the GPGPU functions of the wiiu. Now you have forum poster that have stated it was design to be a "gpgpu to help the CPU" like its a known fact. One of the best uses is GPGPU-accelerated physic and Nintendo paid for the Havok physic engine to be ported to the wiiu and they have stated it run on the CPU. Unlike on a PC where this would run on GPU.

To be fair though, the "GPGPU" notion came directly from Nintendo's mouth. That's why people were quoting it.

But you're right, we don't know if any developers are currently using the GPU for non-graphical computational purposes. Nor do we know how effectively or ineffectively "Latte" can handle these tasks.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Okay. A couple of things here. There really isn't any such thing as a GPGPU. "GPGPU" is just a way of saying that a GPU can be used to offload tasks that are typically done by a CPU but can be done more efficiently by the GPU. So pretty much ANY GPU can be a GPGPU. This isn't new. The PS3 and Xbox 360 could offload stuff to the GPU if they wanted to. There's just been an increased software focus on the capability with the introduction of technologies like OpenCL, DirectX 11, and CUDA.
Most current GPUs (basically all PC-realm GPUs) are GPGPUs. And yes, you could run 'compute tasks' on an SST-1 back in '96, for some definition of compute task. That does not mean that any GPU can present a modern GPGPU pipeline to the developer. There's a very rudimentary and formal split, and it is the following: does the device have some GPGPU API support (OpenCL, CUDA, etc.)? That basically indicates whether the IHV deems said device a viable GPGPU contender, meeting the industry's interpretation of GPGPU. Now, while you surely could run compute tasks on an RSX (for a wider definition of the term compared to an SST-1), G70 is not supported by Nvidia in any of their GPGPU API releases. What does that tell you?

Second off, if you want the GPU to be doing ANY considerable lifting in addition to rendering games, you're also going to want a pretty hefty GPU.
That's absolutely not true - there are a bazillion configurations out there where even a modest GPU has something to offer over its companion CPU for a certain domain of tasks. I'm running LuxRays on my netbook's GPU, which beats the hell out of running Lux on the netbook's CPU. That's a typical GPGPU scenario.

Others could speak on this better than I could, probably, but my takeaway, and somebody more knowledgeable on GAF should confirm this, is that unless the Wii U has a high-end older AMD GPU, or a modern mid-range-or-better GPU, offloading CPU tasks to the GPU is not going to magically save this system. It's just not going to have a ton of power left over to do that kind of stuff.
It's not about what power would be left after setting the GPU on a compute task (every GPU could be choked with the 'right' compute workload). It's about what component in the system could do that task best.

I can confirm, however, that up until basically the current AMD GPU architecture, AMD's GPGPU performance has been pretty terrible compared to the competition.
GPGPU is a darn wide domain of tasks. Overgeneralizations are bad (tm).
 

Thraktor

Member
Okay. A couple of things here. There really isn't any such thing as a GPGPU. "GPGPU" is just a way of saying that a GPU can be used to offload tasks that are typically done by a CPU but can be done more efficiently by the GPU. So pretty much ANY GPU can be a GPGPU. This isn't new. The PS3 and Xbox 360 could offload stuff to the GPU if they wanted to. There's just been an increased software focus on the capability with the introduction of technologies like OpenCL, DirectX 11, and CUDA.

Second off, if you want the GPU to be doing ANY considerable lifting in addition to rendering games, you're also going to want a pretty hefty GPU. Others could speak on this better than I could, probably, but my takeaway, and somebody more knowledgeable on GAF should confirm this, is that unless the Wii U has a high-end older AMD GPU, or a modern mid-range-or-better GPU, offloading CPU tasks to the GPU is not going to magically save this system. It's just not going to have a ton of power left over to do that kind of stuff.

I can confirm, however, that up until basically the current AMD GPU architecture, AMD's GPGPU performance has been pretty terrible compared to the competition.

One of the main reasons (although certainly not the only one) that there was such a big improvement in compute performance for GCN cards over VLIW cards is that there's a hell of a lot more memory on the die. Bigger caches, more register memory, basically more SRAM wherever AMD could cram it. On-die memory is a big deal for compute performance on GPUs because the latency penalties for reading from GDDR5 are enormous. I don't have the numbers for AMD cards, but it's 200+ cycles for CUDA on Nvidia's GPUs. The compute units in GCN cards are getting much more bang for their Gflops largely because they're being fed with data much more efficiently than VLIW cards were, but that leads us to the biggest difference between the "Latte" GPU in Wii U and an off-the-shelf VLIW GPU; it's got 32MB of eDRAM right there on die which it can access at far higher bandwidth and far lower latency than any external memory. This isn't some magical feature that'll propel Latte's GPGPU functionality up to that of a 7970, but if used properly it should enable it to significantly outperform a similarly configured 4000/5000/6000 series card in many compute tasks.

Also, the notion of needing a "pretty hefty" GPU somewhat misses the mark. Of course, it's always better to have a more powerful GPU, and it's always better to have a more powerful CPU, but for a company like Nintendo, it's all about transistor efficiency. If they're deciding which component should be running physics code, they're choosing between adding X number of transistors to the CPU (let's say adding in some beefy SIMD units) and adding Y number of transistors to the GPU (let's say more SPUs) to get the same amount of performance. What they'll find is that X > Y, and that it costs more (probably a lot more) transistors to get the same performance out of the CPU as out of the GPU. Hence why they've combined a tiny CPU die with a far, far bigger GPU die, because the GPU's simply giving them more bang for their transistor buck. The same is true with heat/energy usage. They've got a finite (and pretty small) number of watts the system has to be able to run off, and they have to choose which gives them the better performance per watt, which is once again the GPU. Given the clock rates and die sizes, the GPU is obviously using up the vast majority of the system's power, and they've set it up like that because giving those precious watts to the GPU allows it to do more than the CPU would with them.

At this point we don't have any info that anyone is using the GPGPU functions of the Wii U. Yet you have forum posters stating it was designed as a "GPGPU to help the CPU" like it's a known fact.

Some numbers for you:

The Xenon CPU is a 165m transistor part and the Xenos GPU consists of two dies totalling 337m transistors. That's a GPU:CPU transistor ratio of 2.04:1.

The Cell CPU is a 241m transistor part and the RSX GPU is a 300 million transistor part. That's a GPU:CPU transistor ratio of 1.24:1.

Now, we don't have transistor counts for Espresso and Latte, but we can use die sizes instead (which will actually understate the ratio, as the CPU is made on a slightly bigger node). The Espresso CPU is 32.76mm² and the Latte GPU is 156.21mm². That gives a GPU:CPU die size ratio of 4.77:1.

Now, given the available evidence (and not even considering the clock speeds), would you honestly say that Nintendo intends on the exact same division of work between the CPU and GPU as on XBox360 or PS3?
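The ratios above are just straight division over the quoted figures; for reference:
Code:
print(round(337 / 165, 2))        # 360: Xenos / Xenon transistors    -> 2.04
print(round(300 / 241, 2))        # PS3: RSX / Cell transistors       -> 1.24
print(round(156.21 / 32.76, 2))   # Wii U: Latte / Espresso die area  -> 4.77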
 

IdeaMan

My source is my ass!
I don't know if somebody already asked this, but is this info from a launch game, or one that's still being worked on?

A launch game. They basically fixed this problem just weeks before the project went gold, if I'm recalling it right. Apparently it could have concerned several titles, although it wasn't witnessed in every game in development on Wii U that I was aware of. At the very least, it could support what I said here.
 

wsippel

Banned
The r700 is "first gen" compute shaders support and performance is pretty poor. So every dx10+ gpu is a GPGPU but some are better at this than others.
No. GPGPU is much, much older. Also, Nintendo itself made GPGPU a big deal. That's straight from the horses mouth. And many people will tell you that the biggest problem with GPGPU was always bandwidth and latency - areas Nintendo heavily focussed on. The Wii U isn't a system designed to support GPGPU. It's a system designed to use GPGPU.
 

MDX

Member
The Wii U's clocks don't seem to make any sense.

But after hearing that the GPU was increased from 400MHz to 550MHz, I do think the original clocks were balanced:

1200MHz CPU, 800MHz RAM, 400MHz GPU, 200MHz DSP

But when Nintendo slightly increased the speeds, to offer more performance for developers who might have complained, they didn't bother to keep their traditional multipliers. Maybe this was due to late, last-minute changes.

Seems like this console, with its OS, online network, GamePad, etc., has put a lot of stress on Nintendo as a company to bring everything to fruition for launch.
 

MDX

Member
A launch game. They basically fixed this problem just weeks before the project went gold, if I'm recalling it right. Apparently it could have concerned several titles, although it wasn't witnessed in every game in development on Wii U that I was aware of. At the very least, it could support what I said here.


So what you're saying is that there's a game out there that, if they hadn't fixed those problems, would have been performing very badly?
 

ozfunghi

Member
The Wii U's clocks don't seem to make any sense.

But after hearing that the GPU was increased from 400MHz to 550MHz, I do think the original clocks were balanced:

1200MHz CPU, 800MHz RAM, 400MHz GPU, 200MHz DSP

But when Nintendo slightly increased the speeds, to offer more performance for developers who might have complained, they didn't bother to keep their traditional multipliers. Maybe this was due to late, last-minute changes.

Seems like this console, with its OS, online network, GamePad, etc., has put a lot of stress on Nintendo as a company to bring everything to fruition for launch.

Except that the CPU was running at 1GHz before V4.
 

blu

Wants the largest console games publisher to avoid Nintendo's platforms.
Ok, while on the subject of devkit updates, here's a riddle from me. I just started Nintendo Land for the first time. Title auto-updated right away so I don't know how it may have looked before any updates, but my first reaction was to scrutinize the shadow quality (a pet peeve of mine). Conclusion: shadow quality in this launch-day updated title has absolutely nothing in common with the under-sampled (read: aliased) shadows in the shots/teasers we were seeing before launch - current shadows have literally an order of magnitude higher resolution, to the degree they have nearly-perfect screen-space resolution - something more characteristic of stencil-based shadow techniques than of depth-buffer-based ones. And yet, there are certain artifacts there (i.e. polygon-slope-related) which are characteristic of depth-buffer-based shadows, so these must still be depth-buffered shadows. I'm both confused and impressed - how the heck did they up the shadow buffer res so much?

Oh, and BTW, shadows are front-face-casted ; )
 

Thraktor

Member
Ok, while on the subject of devkit updates, here's a riddle from me. I just started Nintendo Land for the first time. Title auto-updated right away so I don't know how it may have looked before any updates, but my first reaction was to scrutinize the shadow quality (a pet peeve of mine). Conclusion: shadow quality in this launch-day updated title has absolutely nothing in common with the under-sampled (read: aliased) shadows in the shots/teasers we were seeing before launch - current shadows have literally an order of magnitude higher resolution, to the degree they have nearly-perfect screen-space resolution - something more characteristic of stencil-based shadow techniques than of depth-buffer-based ones. And yet, there are certain artifacts there (i.e. polygon-slope-related) which are characteristic of depth-buffer-based shadows, so these must still be depth-buffered shadows. I'm both confused and impressed (how the heck did they up the shadow buffer res so much?!).

Oh, and BTW, shadows are front-face-casted ; )

Perhaps they're using up half the eDRAM with one massive shadow map? Shadows do look very clean, though, and there's little noticeable aliasing on screen, which is nice for a game with bright, high-contrast visuals (and especially after having seen so many AA-less screenshots pre-launch). Good IQ points all round.

Also, Nano Assault is very nice looking indeed, and there's not much noticeable aliasing there either, even without the AA applied. Strange how arguably the best-looking launch game is a 50MB download. Plays well too :)

Slightly more on-topic:

The Wii U's clocks don't seem to make any sense.

But after hearing that the GPU was increased from 400MHz to 550MHz, I do think the original clocks were balanced:

1200MHz CPU, 800MHz RAM, 400MHz GPU, 200MHz DSP

But when Nintendo slightly increased the speeds, to offer more performance for developers who might have complained, they didn't bother to keep their traditional multipliers. Maybe this was due to late, last-minute changes.

Seems like this console, with its OS, online network, GamePad, etc., has put a lot of stress on Nintendo as a company to bring everything to fruition for launch.

Back when we thought multipliers were a thing, I was actually thinking a 400MHz GPU/800MHz RAM combo was a possibility. To be honest, it's quite possible that they were planning on a multiplier configuration, then in testing realised the current set-up gives such a performance boost that a harmonic set of clocks just isn't worthwhile.


Edit: Well here's something to counteract blu's "Hey look at this thing developers did well with a launch title!". I was just playing the demo of FIFA 13, and the default setting for the gamepad in FIFA 13 is to mirror the TV (you can also switch tabs to various other team-management things). When I tried out manager mode, I noticed something. Instead of just rendering the scene once at 720p and scaling it down to 480p for the gamepad, which I was sure any sane dev would do with a screen-mirroring game, I'm almost certain the game actually renders both screens individually, even though they're displaying exactly the same scene from exactly the same angle 99.9% of the time. I first noticed something weird when I saw very noticeable aliasing on an EA logo which swishes past the screen, even though I was looking at the gamepad, and it would have been much smoother scaled down from the 720p image it was mirroring. I then noticed that, for a brief second before kick-off when you're playing as a manager, you can tilt the gamepad to give a slightly different view than the TV. There doesn't seem to be any other ability to move the viewpoint of the gamepad to anything different from what the TV's showing, but for that brief second you can see that the system's actually rendering two distinct images.

Now, I suppose it's possible that they just render two screens for that second before kick-off, and then switch to a mirror of the main screen, but that's almost as puzzling itself (why bother implementing something like that?), and there doesn't seem to be any noticeable drop in performance or image quality for that brief moment either.

A rather spectacular waste of system resources if true.
 

MDX

Member
Except that the CPU was running at 1GHz before V4.


Well, I'm glad you brought this up. What if many games are not performing as well as they could because developers didn't get the latest dev kits, or didn't target game development at the latest dev kits?

"It's been a hell of a ride! I can't say it's easier or more difficult to develop on Wii U versus other platforms. It's just different. What was difficult was working on work-in-progress hardware, but Nintendo has been very helpful about that. I think the real strength of the Wii U, its large memory, has yet to be exploited." - Guillaume Brunier, producer, Ubisoft Montpellier


Thraktor:
Now, I suppose it's possible that they just render two screens for that second before kick-off, and then switch to a mirror of the main screen, but that's almost as puzzling itself (why bother implementing something like that?), and there doesn't seem to be any noticeable drop in performance or image quality for that brief moment either.

A rather spectacular waste of system resources if true.

Sloppy. Sounds like indie developers working on small games had an easier time with development than teams working on ports. Now I'm very curious how the games that missed launch will look and play, given they're using the extra time for polishing. Games like AvP and Pikmin 3.
 

ozfunghi

Member
Yeah, I had mentioned it before in this thread a couple of pages back. Actually... now I'm not sure anymore if it was lherre or Arkam. I assume I said it correctly in the original post. (lol, I'm having trouble finding my post again... maybe I just forgot to post it :/ )

The CPU had been increased by 25%.
The GPU was clocked at 400MHz.

Both pieces of info were given in different posts of the same thread (the Wii U clockspeed thread).

I also read that comment from the ZombiU (?) dev. There was also a comment from the guys responsible for the ME3 port, who said the process was rather easy and straightforward... makes you wonder whether all devs got the same info to work with, as ME3 is one of the better ports, running clearly better than the PS3 version.
 

z0m3le

Banned
Not entirely sure how helpful this will be, but GPGPU performance across R700, R800 and GCN is fairly easy to compare thanks to Bitcoin mining figures. Of course, this is just one simple OpenCL task that does nothing but simple math.

R700 with 2048 shaders @ 1100MHz = 339Mhash/s
R800 with 2048 shaders @ 1100MHz = 620Mhash/s
GCN with 2048 shaders @ 1100MHz = 650Mhash/s

Of course, to get these numbers some math was involved; luckily, with so many configurations and so much overclocking data available for these chips, it's fairly easy to work them out. A big difference between R700 and R800 was on-chip memory latency.

Even taking just the R700 numbers into account, without the improvements from memory, it's ~52% of GCN's GPGPU performance at the same shader count and clock speed. This obviously points to GCN being much faster, but it doesn't make R700 useless as a GPGPU chip.
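A sketch of the normalisation implied there: assume Mhash/s scales roughly linearly with shader count and clock, and rescale a measured figure to the common 2048-shader / 1100MHz reference point (the example card below is made up, not a real measurement):
Code:
def normalize_mhash(measured_mhash, shaders, clock_mhz,
                    ref_shaders=2048, ref_clock_mhz=1100):
    # Linear-scaling assumption: hash rate ~ shaders * clock
    return measured_mhash * (ref_shaders / shaders) * (ref_clock_mhz / clock_mhz)

# e.g. a hypothetical R700-class card: 800 shaders @ 750MHz measuring 97 Mhash/s
print(round(normalize_mhash(97, 800, 750)))   # ~364 Mhash/s at the reference config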
 