
Japanese Kutaragi Interview, on PS3, Nvidia, eDram etc.

ok so, chances are it's not really possible to answer these questions. but i'd like to get a good idea of how powerful cell is.

so yeah, any comparisons that can be made?

if cell was to be measured the same way pentium processors are, how many ghz would it be?
 

Shompola

Banned
Johnny Nighttrain said:
ok so, chances are it's not really possible to answer these questions. but i'd like to get a good idea of how powerful cell is.

so yeah, any comparisons that can be made?

if cell was to be measured the same way pentium processors are, how many ghz would it be?

Check out the quotes made by Steve Jobs, you'll have your answer right there.
 

Pug

Member
All you really need to ask is whether Nvidia will go with unified shaders further down the line with their PC GPUs. The answer is almost certainly yes. All ATI have done is move quicker to a unified system thanks to MS asking for the technology and being willing to fund it. I'm sure after the 520 all ATI cards will use the same technology as Xenos.
 

gofreak

GAF's Bob Woodward
Johnny Nighttrain said:
ok so, chances are it's not really possible to answer these questions. but i'd like to get a good idea of how powerful cell is.

so yeah, any comparisons that can be made?

if cell was to be measured the same way pentium processors are, how many ghz would it be?

You can't quantify the difference in terms of gigahertz or clockspeed.

There are no independent benchmarks yet.

Cell isn't as generally balanced as a P4 would be. It'll fare worse at some things, much better at others.

The only idea of relative performance we've been given is that

a) 6 SPEs running at an undisclosed clockspeed can decode 48 SDTV streams simultaneously (probably more if they were optimising for performance, but it seemed this demo was more aimed at showing off Toshiba's software platform).

b) PS3 can decode 12 HDTV streams simultaneously.

c) IBM presented a FFT (Fast Fourier Transform) implementation on Cell. One version of it running on one SPE was 2x the performance of an equivalently clocked P4, a larger version was 100x. (Note that IBM openly stated this was comparing an optimised implementation on Cell vs a relatively unoptimised library on the P4, but I don't think you'd come close to closing the gap with an optimised P4 version anyway. To illustrate, the same unoptimised library was slightly faster on one SPE vs a P4 at the same clock).

Of course, IBM is only going to showcase things Cell is good at. Again, it's not a generally balanced processor.


Pug said:
All you really need to ask is will Nvidia go with unified shader further down the line with their PC GPU's.

You also have to ask yourself, however, if now is the right time to move to unified shaders. NVidia thinks the SM3.0+ model isn't suited to it..pixel and vertex shaders are still too different, in their opinion (and remember, they're the ones with actual SM3.0 products under their belt). So just because they may move to the architecture later doesn't mean it would necessarily have been the right thing to do with these chips..

I also believe they have delayed the need to move to unified shaders in hardware with WGF2.0? They have to appear unified to the software, but on a hardware level they can still be separate, IIRC.
 

mrklaw

MrArseFace
Theoretically, per cycle, Xenos can do 96 pixel shader ops. Clearly, in a game, that won't happen every cycle, there'll need to be a balance with some vertex shading, but it's a number to work with.

Now, RSX says it can do 136 shader ops per cycle, but how many are vertex, and how many are pixel shaders? It's a fixed unit, but are its pixel shaders likely to approach the maximum possible on Xenos? If so, then surely RSX games will never have less pixel shader performance, and that is only when Xenos is pushed towards pixel shaders.

I guess we need to know more specifics about the layout of the RSX for that comparison?
 

Nostromo

Member
mrklaw said:
Theoretically, per cycle, Xenos can do 96 pixel shader ops. Clearly, in a game, that won't happen every cycle, there'll need to be a balance with some vertex shading, but it's a number to work with.

Now, RSX says it can do 136 shader ops per cycle, but how many are vertex, and how many are pixel shaders?
Sorry guys, but how many times does it have to be said that it's not right to compare different GPUs using shader op counts, since shader ops are NOT the same thing across different architectures?
You can't compare GPU A and GPU B's relative performance using shader ops, IT HAS NO MEANING AT ALL, please guys STOP this madness.
Use floating point ops instead!
Xenon does 480 floating point ops per cycle; RSX is still unknown at this time (IF it's just an NV40 with 8 VS and 24 PS it can do 328 floating point ops per cycle, but I think it will be a vastly improved NV40..)
 

mrklaw

MrArseFace
Nostromo said:
Sorry guys, but how many times does it have to be said that it's not right to compare different GPUs using shader op counts, since shader ops are NOT the same thing across different architectures?
You can't compare GPU A and GPU B's relative performance using shader ops, IT HAS NO MEANING AT ALL, please guys STOP this madness.
Use floating point ops instead!
Xenon does 480 floating point ops per cycle; RSX is still unknown at this time (IF it's just an NV40 with 8 VS and 24 PS it can do 328 floating point ops per cycle, but I think it will be a vastly improved NV40..)


OK. But I don't want to compare overall FLOPS, I want to compare pixel shader flops. Overall will vary depending on help from the CPUs, I'd like to know about relative pixel quality
 

Panajev2001a

GAF's Pleasant Genius
gofreak said:
You can't quantify the difference in terms of gigahertz or clockspeed.

There are no independent benchmarks yet.

Cell isn't as generally balanced as a P4 would be. It'll fare worse at some things, much better at others.

The only idea of relative performance we've been given is that

a) 6 SPEs running at an undisclosed clockspeed can decode 48 SDTV streams simultaneously (probably more if they were optimising for performance, but it seemed this demo was more aimed at showing off Toshiba's software platform).

b) PS3 can decode 12 HDTV streams simultaneously.

c) IBM presented a FFT (Fast Fourier Transform) implementation on Cell. One version of it running on one SPE was 2x the performance of an equivalently clocked P4, a larger version was 100x. (Note that IBM openly stated this was comparing an optimised implementation on Cell vs a relatively unoptimised library on the P4, but I don't think you'd come close to closing the gap with an optimised P4 version anyway. To illustrate, the same unoptimised library was slightly faster on one SPE vs a P4 at the same clock).

Of course, IBM is only going to showcase things Cell is good at. Again, it's not a generally balanced processor.




You also have to ask yourself, however, if now is the right time to move to unified shaders. NVidia thinks the SM3.0+ model isn't suited to it..pixel and vertex shaders are still too different, in their opinion (and remember, they're the ones with actual SM3.0 products under their belt). So just because they may move to the architecture later doesn't mean it would necessarily have been the right thing to do with these chips..

I also believe they have delayed the need to move to unified shaders in hardware with WGF2.0? They have to appear unified to the software, but on a hardware level they can still be separate, IIRC.


The 100x speed-up included all SPEs or at least almost all of them: it was mentioned how on smaller FFTs the SPEs' theoretical peak performance could almost be reached (while it was not the case on the MEGA FFTs) and also, if you noticed it, 38+ GFLOPS is much more than what a single SPE can do at 3.2 GHz. Still, it is about 18.5% of the total chip's peak performance, while the Pentium 4 CPU manages only about 3.125% of its total peak performance. The situation would be better for the Pentium 4 if you did DP FP calculations: as a CPU the CELL chip might still come out ahead, but the difference would not be anywhere near 100x.
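To make those percentages concrete, here's a rough sketch of the arithmetic in Python. The peak figures are my own assumptions (8 SPEs x 25.6 GFLOPS for Cell at 3.2 GHz, 12.8 GFLOPS single precision for a 3.2 GHz P4), not numbers from the IBM slides.

```python
# Rough sketch of the utilisation arithmetic above. Peak figures are assumptions.
CELL_PEAK_GFLOPS = 8 * 25.6   # assumed SP peak: 8 SPEs at 3.2 GHz
P4_PEAK_GFLOPS = 12.8         # assumed SP peak for a 3.2 GHz P4

cell_achieved = 38.4                      # the "38+ GFLOPS" large-FFT result
p4_achieved = P4_PEAK_GFLOPS * 0.03125    # the quoted ~3.125% utilisation

print(f"Cell utilisation: {cell_achieved / CELL_PEAK_GFLOPS:.1%}")  # ~18.8%
print(f"P4 achieved:      {p4_achieved:.2f} GFLOPS")                # ~0.40
print(f"Speed-up:         {cell_achieved / p4_achieved:.0f}x")      # ~96x
```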
 

Nostromo

Member
mrklaw said:
OK. But I don't want to compare overall FLOPS, I want to compare pixel shader flops. Overall will vary depending on help from the CPUs, I'd like to know about relative pixel quality
I'm not comparing overall flops, I'm comparing SHADER floating point ops.
Shader ops ARE NOT shader floating point operations!!
Shader ops are a meaningless marketing term that doesn't tell us how many shader floating point ops per cycle a GPU can execute.
Example: ATI/MS say a Xenon ALU can do 2 shader ops per cycle. Well..the first shader op is a 4-way fmadd (8 floating point ops) and the second shader op is a scalar fmadd (2 floating point ops).
How can you call 2 completely different things by the same name?
It doesn't make any sense at all!
Shader ops don't tell us how much work is being done or can be done for real in any given clock cycle.
Fuck shader ops (tm) :lol
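A quick back-of-the-envelope in Python to show what that means in flops. The 48 unified ALUs are my assumption for Xenos (it's consistent with the 480 flops/cycle figure above); the RSX layout is still unknown.

```python
# Illustration of why "shader ops" hide the real per-cycle work.
def flops_per_cycle(num_alus, flops_per_shader_op):
    """Total floating point ops per clock, given what each 'shader op' really is."""
    return num_alus * sum(flops_per_shader_op)

# Each Xenos ALU issues 2 "shader ops" per cycle:
#   a 4-way fmadd  -> 4 muls + 4 adds = 8 flops
#   a scalar fmadd -> 1 mul  + 1 add  = 2 flops
xenos = flops_per_cycle(48, [8, 2])   # 48 ALUs assumed
print(xenos)  # 480 flops/cycle -- yet the spec sheet just counts 96 "shader ops"
```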
 

Panajev2001a

GAF's Pleasant Genius
Nostromo said:
Sorry guys, but how many times does it have to be said that it's not right to compare different GPUs using shader op counts, since shader ops are NOT the same thing across different architectures?
You can't compare GPU A and GPU B's relative performance using shader ops, IT HAS NO MEANING AT ALL, please guys STOP this madness.
Use floating point ops instead!
Xenon does 480 floating point ops per cycle; RSX is still unknown at this time (IF it's just an NV40 with 8 VS and 24 PS it can do 328 floating point ops per cycle, but I think it will be a vastly improved NV40..)

I think that there are other places where the logic is going besides 128 bits FP blending (textures).

Xenos totals about 252 MTransistors for logic (caches, Shader ALU's, TMU's, ROP's, etc...).

RSX totals about 300+ MTransistors for logic (caches, Shader ALU's, TMU's, ROP's, etc...).

Somewhere we are missing almost 50 MTransistors ;).
 

Amir0x

Banned
Panajev2001a said:
I think that there are other places where the logic is going besides 128 bits FP blending (textures).

Xenos totals about 252 MTransistors for logic (caches, Shader ALU's, TMU's, ROP's, etc...).

RSX totals about 300+ MTransistors for logic (caches, Shader ALU's, TMU's, ROP's, etc...).

Somewhere we are missing almost 50 MTransistors ;).

Did you ever write that tech break down article you were into, or are you waiting for the G70 unveiling to complete it?
 

Nostromo

Member
Panajev2001a said:
Somewhere we are missing almost 50 MTransistors ;).
Yeah..that's right, but we still don't know how good NVIDIA and ATI are at 'saving' transistors without impacting performance, moreover we are comparing very different architectures.
ATI's approach is new and should use more die space to unify the shaders, so NVIDIA's approach should be more efficient in terms of die area per flop, but ATI's approach has the potential to be much more efficient than NVIDIA's at reducing ALU idle cycles.
 

mrklaw

MrArseFace
Nostromo said:
I'm not comparing overall flops, I'm comparing SHADER floating point ops.
Shader ops ARE NOT shader floating point operations!!
Shader ops are a meaningless marketing term that doesn't tell us how many shader floating point ops per cycle a GPU can execute.
Example: ATI/MS say a Xenon ALU can do 2 shader ops per cycle. Well..the first shader op is a 4-way fmadd (8 floating point ops) and the second shader op is a scalar fmadd (2 floating point ops).
How can you call 2 completely different things by the same name?
It doesn't make any sense at all!
Shader ops don't tell us how much work is being done or can be done for real in any given clock cycle.
Fuck shader ops (tm) :lol


OK.... (trying to get a response that doesn't give Nostromo an embolism)

I mean *pixel shader* flops. Flops that are only interested in making pixels look pretty. Not flops dedicated to transforming polys etc.

That's where we seem to know a fair bit about ATI's chip, but not RSX. Based on previous experience of the ratio of vertex:pixel shaders, what would a guesstimate be as to how much of the RSX will be focussing on pretty pixels? And from that, are we able to work out likely performance comparisons against Xenos' pretty pixels (as it's unified, work on the best case of "doing nothing but pixels" and perhaps a standard mix, as used by general fixed chips)?
 

Panajev2001a

GAF's Pleasant Genius
Amir0x said:
Did you ever write that tech break down article you were into, or are you waiting for the G70 unveiling to complete it?

Waiting for RSX info and being kinda busy at life, you know... sometimes it gets in the way of more important things like that ;).
 

Amir0x

Banned
Panajev2001a said:
Waiting for RSX info and being kinda busy at life, you know... sometimes it gets in the way of more important things like that ;).

Life, man. I hear some people have one, and it's an enviable thing for sure!
 

dorio

Banned
gofreak said:
You also have to ask yourself, however, if now is the right time to move to unified shaders. NVidia thinks the SM3.0+ model isn't suited to it..pixels and vertex shaders are still too different, in their opinion (and remember, they're the ones with actual SM3.0 products under their belt). So just because they may move to the architecture later doesn't mean it would necessarily have been the right thing to do with these chips..

I also believe they have delayed the need to move to unified shaders in hardware with WGF2.0? They have to appear unified to the software, but on a hardware level they can still be seperate, IIRC.
I don't understand this. Isn't that a software issue? Can't you just make the hardware unified and modify the software for the PS3, since it's a closed system in the first place?
 

gofreak

GAF's Bob Woodward
Panajev2001a said:
The 100x speed-up included all SPE's or at least almost all of them: it was mentioned how on smaller FFT's the SPE's theoretical peak performance could almost be reached (while it was not the case on the MEGA FFT's) and also, if you noticed it, 38+ GFLOPS is much more than what an SPE can do at 3.2 GHz. Still it is about 18.5% of total chip's peak performance while the Pentium 4 CPU manages only about 3.125% of its total peak performance. Situation would be better for the Pentium 4's if you did DP FP calculations: as a CPU the CELL chip might still come out ahead, but the difference would not be anywhere near 100x.

Thanks for the extra commentary, I wasn't entirely sure of everything being said about those results slides. I'll have to watch that presentation again ;)

dorio said:
I don't understand this. Isn't that a software issue? Can't you just make the hardware unified and modify the software for the PS3, since it's a closed system in the first place?

Not sure if I follow..
 

gofreak

GAF's Bob Woodward
midnightguy said:
can someone explain to me what FFT is?

It stands for Fast Fourier Transform, it's a very common algorithm in Digital Signal Processing. It takes a signal in the time domain and transforms it into the frequency domain. Used a lot in image/sound analysis/manipulation.
 

Vince

Banned
midnightguy said:
can someone explain to me what FFT is?

It's a computationally cheaper way of computing the discrete Fourier transform, which is pretty widely used for its ability to expose periodicities (repeating data structures) in larger data sets. The DFT itself scales like shit in terms of computation, O(n^2), and the FFT reduces it to O(n log n) -- if you want to see the difference, graph the two on a calculator, it's pretty significant.

It's used extensively as an operation, but I'm only familiar with it in practice due to its usage in contemporary pulsed NMR, where basically you excite all the molecules in a sample with an RF pulse (say, 100MHz) and, based on how long it takes the molecules to return to their previous state (Free Induction Decay), you get this huge amount of information that looks like nothing. So, you keep exciting the molecules extremely quickly and use an FFT to look for the underlying structures and data periodicities that exist, as the noise averages out the more you do this iteratively.
I'm sure Faf or nAo are more familiar with its usage in CS.

EDIT: Yeah, what he said :)
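If you don't have a calculator handy, a quick Python sketch of that growth gap -- just operation counts, not a real benchmark:

```python
# Naive DFT is O(n^2); FFT is O(n log n). Compare rough operation counts.
import math

for n in (1024, 65536, 1 << 20):
    dft = n * n
    fft = n * math.log2(n)
    print(f"n={n:>8}: DFT ~{dft:.2e} ops, FFT ~{fft:.2e} ops, ratio ~{dft / fft:,.0f}x")
```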
 

Lord Error

Insane For Sony
RSX is still unknown at this time (IF it's just a NV40 with 8 VS and 24 PS it can do 328 floating point ops per cycle, but I think it will be a vastly improved NV40..)
How did you get the 328 number?
 

gofreak

GAF's Bob Woodward
Elios83 said:

Thanks, I think they're missing some of it though. Hopefully someone @ B3D might do a job on the whole Watch Impress article again.

Impress PC Watch: Will the PS3's backward compatibility with the PlayStation and PlayStation 2 be done through hardware?

Ken Kutaragi: It will be done through a combination of hardware and software. We can do it with software alone, but it's important to make it as close to perfect as possible. Third-party developers sometimes do things that are unimaginable. For example, there are cases where their games run, but not according to the console's specifications. There are times when games pass through our tests, but are written in ways that make us say, "What in the world is this code?!" We need to support backward compatibility towards those kinds of games as well, so trying to create compatibility by software alone is difficult. There are things that will be required by hardware. However, with the powers of [a machine like] the PS3, some parts can be handled by hardware, and some parts by software.

IPW: What about the endian (byte order) when emulating CPU codes with software?

KK: The Cell is bi-endian (has the ability to switch between usage of big endian and little endian ordering), so there are no problems.

IPW: The Xbox 360's backward compatibility will be done by software, since [there is] no other choice, as they don't manufacture their own chips...

KK: The current Xbox will become antiquated once the new machine comes out this November. When that happens, the Xbox will be killing itself. The only way to avoid that is to support 100 percent compatibility from its [Xbox 360's] launch date, but Microsoft won't be able to commit to that. It's technically difficult.

IPW: The most surprising thing about the PS3's architecture is that its graphics are not processed by the Cell. Why didn't you make a Cell-based GPU?

KK: The Cell's seven Synergistic Processor Elements (SPE) can be used for graphics. In fact, some of the demos at E3 were running without a graphics processor, with all the renderings done with just the Cell. However, that kind of usage is a real waste. There are a lot of other things that should be done with the Cell. One of our ideas was to equip two Cell chips and to use one as a GPU, but we concluded that there were differences between the Cell to be used as a computer chip and as a shader, since a shader should be graphics-specific. The Cell has an architecture where it can do anything, although its SPE can be used to handle things such as displacement mapping. Prior to PS3, real-time rendered 3D graphics might have looked real, but they weren't actually calculated in a fully 3D environment. But that was OK for screen resolutions up until now. Even as of the current time, most of the games for the Xbox 360 use that kind of 3D. However, we want to realize fully calculated 3D graphics in fully 3D environments. In order to do that, we need to share the data between the CPU and GPU as much as possible. That's why we adopted this architecture. We want to make all the floating-point calculations including their rounded numbers the same, and we've been able to make it almost identical. So as a result, the CPU and GPU can use their calculated figures bidirectionally.

IPW: We were predicting that eDRAM was going to be used for the graphics memory, but after hearing that the PS3 will support the use of two HDTVs, we understood why it wasn't being used.

KK: Fundamentally, the GPU can run without graphics memory since it can use Redwood (the high-speed interface between Cell and the RSX GPU) and YDRAM (the code name for XDR DRAM). YDRAM is unified memory. However, there's still the question of whether the [bandwidth and cycle time] should be wasted by accessing the memory that's located far away when processing the graphics or using the shader. And there's also no reason to use up the Cell's memory bandwidth for normal graphics processes. The shader does a lot of calculations of its own, so it will require its own memory. A lot of VRAM will especially be required to control two HDTV screens in full resolution (1920x1080 pixels). For that, eDRAM is no good. eDRAM was good for the PS2, but for two HDTV screens, it's not enough. If we tried to fit enough volume of eDRAM [to support two HDTV screens] onto a 200-by-300-millimeter chip, there won't be enough room for the logics, and we'd have had to cut down on the number of shaders. It's better to use the logics in full, and to add on a lot of shaders.

IPW: First of all, why did you select Nvidia as your GPU vendor?

KK: Up until now, we've worked with Toshiba [for] our computer entertainment graphics. But this time, we've teamed with Nvidia, since we're making an actual computer. Nvidia has been thoroughly pursuing PC graphics, and with their programmable shader, they're even trying to do what Intel's processors have been doing. Nvidia keeps pursuing processor capabilities and functions because [Nvidia chief scientist] David Kirk and other developers come from all areas of the computer industry. They sometimes overdo things, but their corporate culture is very similar to ours. Sony and Nvidia have agreed that our goal will be to pursue [development of] a programmable processor as far as we can. I get a lot of opportunity to talk to Nvidia CEO Jen-Hsun [Huang] and David, and we talk about making the ideal GPU. When we say "ideal," we mean a processor that goes beyond any currently existing processor. Nvidia keeps on going in that direction, and in that sense, they share our vision. We share the same road map as well, as they are actually influenced by our [hardware] architecture. We know each other's spirits and we want to do the same thing, so that's why [Sony] teamed with Nvidia. The other reason is that consumers are starting to use fixed-pixel displays, such as LCD screens. When fixed-pixel devices become the default, it will be the age when TVs and PCs will merge, so we want to support everything perfectly. Aside from backward compatibility, we also want to support anything from legacy graphics to the latest shader. We want to do resolutions higher than WSXGA (1680x1050 pixels). In those kinds of cases, it's better to bring everything from Nvidia rather than for us to create [a build] from scratch.

IPW: Microsoft decided to use a unified-shader GPU by ATI for its Xbox 360. Isn't unified shader more cutting edge when it comes to programming?

KK: The vertex shader and pixel shader are unified in ATI's architecture, and it looks good at one glance, but I think it will have some difficulties. For example, some question where will the results from the vertex processing be placed, and how will it be sent to the shader for pixel processing. If one point gets clogged, everything is going to get stalled. Reality is different from what's painted on canvas. If we're taking a realistic look at efficiency, I think Nvidia's approach is superior.

edit - actually, the whole interview itself is there, but the Watch Impress commentary is missing, understandably enough.
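On the eDRAM sizing point in that answer, a rough back-of-the-envelope; the 4-byte colour + 4-byte Z per pixel and the 10MB reference point (the Xenos eDRAM figure mentioned elsewhere in the thread) are my assumptions, not anything from the interview.

```python
# How much framebuffer memory two full-resolution 1080p render targets would need.
W, H = 1920, 1080
BYTES_PER_PIXEL = 4 + 4   # 32-bit colour + 32-bit depth, assumed
SCREENS = 2

needed_mb = W * H * BYTES_PER_PIXEL * SCREENS / (1024 ** 2)
print(f"~{needed_mb:.1f} MB of render targets vs a 10 MB eDRAM budget")  # ~31.6 MB
```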
 

Forsete

Gold Member
Prior to PS3, real-time rendered 3D graphics might have looked real, but they weren't actually calculated in a fully 3D environment. But that was OK for screen resolutions up until now. Even as of the current time, most of the games for the Xbox 360 use that kind of 3D. However, we want to realize fully calculated 3D graphics in fully 3D environments. In order to do that, we need to share the data between the CPU and GPU as much as possible. That's why we adopted this architecture.

What does he mean by this?
 

gofreak

GAF's Bob Woodward
Forsete said:
What does he mean by this?



I've no idea. Maybe it's a poorly phrased and/or poorly translated reference to the level of simulation involved in game worlds or graphics up to this point or something.
 

Elios83

Member
I think he just means that with Cell they can finally simulate what happens in a world. It's the same thing Kojima said. The passage from a movie set to a 'real' world.
 

Pimpwerx

Member
Interesting read as usual, although he's clearly going to champion his machine over the competition.

I gathered most of this from the machine translation, but I have to say again that it's very interesting the crossover both MS and Sony are pulling with their hardware. For the 360, we know how Xenos is a fresh take on a general-purpose GPU. And we see how the GPU is really the heart of that system. With Cell, we're seeing a CPU that's kinda GPU-like (a nice VS stack sitting under the PPE) and apparently a GPU that's gonna be kinda CPU-like (sounds like highly programmable with memory handling like a CPU?). Different companies with different teams, but the industry tends to move along in the same directions.

KK: Fundamentally, the GPU can run without graphics memory since it can use Redwood (the high-speed interface between Cell and the RSX GPU) and YDRAM (the code name for XDR DRAM). YDRAM is unified memory. However, there's still the question of whether the [bandwidth and cycle time] should be wasted by accessing the memory that's located far away when processing the graphics or using the shader. And there's also no reason to use up the Cell's memory bandwidth for normal graphics processes. The shader does a lot of calculations of its own, so it will require its own memory. A lot of VRAM will especially be required to control two HDTV screens in full resolution (1920x1080 pixels). For that, eDRAM is no good. eDRAM was good for the PS2, but for two HDTV screens, it's not enough. If we tried to fit enough volume of eDRAM [to support two HDTV screens] onto a 200-by-300-millimeter chip, there won't be enough room for the logics, and we'd have had to cut down on the number of shaders. It's better to use the logics in full, and to add on a lot of shaders.

What I don't get is this. If you want to pile on more shaders, then where's the bandwidth coming from, Ken? You halved the bus to 128bit for VRAM, so the aggregate is gonna be below that of the 360. I can only assume that there's method to the madness. Either the pipes are so long and deep that external accesses aren't gonna swallow up bandwidth...or...there's gonna be a lot of shaders and not enough bandwidth to keep them fed. I assume they've balanced the system well enough here. But that comment gives me hope that they've solved the bandwidth conundrum that a number of us are puzzled over, b/c that would be the only reason to forgo eDRAM and pile on more bandwidth-sucking shaders. PEACE.
 

gofreak

GAF's Bob Woodward
Pimpwerx said:
You halved the bus to 128bit for VRAM, so the aggregate is gonna be below that of the 360.

Are you counting eDram bandwidth there, or..?

Main system memory bandwidth is 47GB/s on PS3 vs 22GB/s on X360 (RSX has 57GB/s of external bandwidth, but mem bandwidth would be capped at the 47GB/s).

If Cell can be used for scene postprocessing, it's also arguable that Cell's internal bandwidth may become countable :p ;) (I'm not sure if I'd argue that though..obviously the eDram HAS to be used for that kind of stuff, but Cell isn't necessarily going to be in every case..but it's an option I suppose).
 

lips

Member
OMG, some1 did not relay the telegram.

memory transistors are not computational transistors. stop.

232 trans at 500 mhz is still less than 300+ at 550 mhz. stop.

stop. stop.
 

thorns

Banned
lips said:
OMG, some1 did not relay the telegram.

memory transistors are not computational transistors. stop.

232 trans at 500 mhz is still less than 300+ at 550 mhz. stop.

stop. stop.

One element that has been reported on is the number of 150M transistors in relation to the graphics processing elements of Xenon, however according to ATI this is not correct as the shader core itself is comprised from in the order of 232M transistors. It may be that the 150M transistor figure pertains only to the eDRAM module as with 10MB of DRAM, requiring one transistor per bit, 80M transistors will be dedicated to just the memory; when we add the memory control logic, Render Output Controllers (ROP's) and FSAA logic on top of that it may be conceivable to see an extra 70M transistors of logic in the eDRAM module.

so it's 232M+70M of logic and ~80M transistors of memory. You can't just dismiss the memory as well, since it will be used for a lot of different stuff, freeing up the main core from a lot of tasks.
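The split in that quote, as a quick sketch (the 232M and ~70M figures are the ones quoted above; the ~80M memory figure treats 10MB as 10 million bytes at one transistor per bit):

```python
# Transistor budget described in the B3D quote above.
memory_transistors = 10e6 * 8   # 10MB eDRAM, 1 transistor per DRAM bit -> ~80M
shader_core = 232e6             # shader core logic (quoted)
edram_logic = 70e6              # ROPs, FSAA and memory control in the eDRAM module (quoted)

print(f"memory:      ~{memory_transistors / 1e6:.0f}M transistors")
print(f"total logic: ~{(shader_core + edram_logic) / 1e6:.0f}M transistors")  # ~302M
```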
 

MetalAlien

Banned
Forsete said:
What does he mean by this?

I think he means where before something like the shadow on a building might have been prerendered and displayed as a texture, now it will be realtime. Basically more realtime and less pre-processing of a scene. That's what I got.
 

MetalAlien

Banned
Pimpwerx said:
Interesting read as usual, although he's clearly going to champion his machine over the competition.

What I don't get is this. If you want to pile on more shaders, then where's the bandwidth coming from, Ken? You halved the bus to 128bit for VRAM, so the aggregate is gonna be below that of the 360. I can only assume that there's method to the madness. Either the pipes are so long and deep that external accesses aren't gonna swallow up bandwidth...or...there's gonna be a lot of shaders and not enough bandwidth to keep them fed. I assume they've balanced the system well enough here. But that comment gives me hope that they've solved the bandwidth conundrum that a number of us are puzzled over, b/c that would be the only reason to forgo eDRAM and pile on more bandwidth-sucking shaders. PEACE.

Well, even with a 128bit wide bus the spec'd bandwidth is pretty good. But I wouldn't be surprised if the machine was lacking in one area.. Japanese-designed consoles seem to be built with the intention of the developers finding the power, not just having it given out for free.
 

aaaaa0

Member
MetalAlien said:
Japanese-designed consoles seem to be built with the intention of the developers finding the power, not just having it given out for free.

:-|

"We're going to design this monsterously powerful machine. But we're not going to let you use it! Nope, we're going to MAKE YOU WORK FOR IT! Nyah nyah!"

No one in their right mind would intentionally design a system so the developers have to work hard to "find the power". The only valid reason someone would do such a thing is if an architectural tradeoff forced them to (trade programming difficulty for greater performance), or they simply made a colossal error in the system's design.
 

ourumov

Member
midnightguy said:
can someone explain to me what FFT is?

It stands for Fast Fourier Transform, which is an algorithm that provides a fast implementation of the Fourier transform. Basically the key is that we can simplify a lot of calculations due to the fact that we work in a discrete domain.

Applications ?

A lot. Especially in audio and video. In the audio domain it allows us to convert a time-domain signal to the frequency domain. Basically that's useful for doing filtering, for instance. When you are using Winamp, the equalizer is doing FFTs. You can see the representation of each frequency in the current sample and then you can easily modify the output by just playing in the frequency domain.
Basically you FFT your signal, modify it, and then you IFFT it and get a filtered signal.

Another application in the video field?
It allows you to equalize your images as well. In other words, to better distribute the color levels across an image (histogram equalization) so fewer harsh differences appear. Doing this on a final framebuffer could allow us to avoid a lot of aliasing artifacts.
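To make the FFT -> modify -> IFFT flow concrete, here's a minimal sketch with numpy; the signal and the 100 Hz cutoff are made up purely for illustration.

```python
# Minimal FFT-based filtering sketch: FFT, zero out unwanted frequencies, IFFT.
import numpy as np

fs = 1000                                    # sample rate in Hz (assumed)
t = np.arange(0, 1, 1 / fs)
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 300 * t)

spectrum = np.fft.rfft(signal)               # time domain -> frequency domain
freqs = np.fft.rfftfreq(len(signal), 1 / fs)
spectrum[freqs > 100] = 0                    # "play in the frequency domain": drop everything above 100 Hz
filtered = np.fft.irfft(spectrum)            # back to the time domain, 300 Hz tone removed

print(filtered.shape)                        # same length as the input
```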
 

mrklaw

MrArseFace
Amir0x said:
Life, man. I hear some people have one, and it's an enviable thing for sure!

Nah. I think I preferred it when I didn't have a life. Much simpler, and I wouldn't have such a backlog of games
 
MetalAlien said:
Japanese-designed consoles seem to be built with the intention of the developers finding the power, not just having it given out for free.


Yeah, and American consoles are designed to give power for free.
 

MetalAlien

Banned
aaaaa0 said:
:-|

"We're going to design this monsterously powerful machine. But we're not going to let you use it! Nope, we're going to MAKE YOU WORK FOR IT! Nyah nyah!"

No one in their right mind would intentionally design a system so the developers have to work hard to "find the power". The only valid reason someone would do such a thing is if an architectural tradeoff forced them to (trade programming difficulty for greater performance), or they simply made a colossal error in the system's design.

Alright, maybe I went too far, but they do have a different state of mind over there.. you're not supposed to complain about doing things the hard way...
 

Striek

Member
thorns said:
so it's 232M+70M of logic and 80MB of memory. You can't just dismiss the memory as well, since it will be used for a lot of different stuff, freeing up the main core from a lot of tasks.
Isn't that just supposition on his part? Might be true, but let's not tout it as fact yet. I thought I heard a 332M transistor number a while ago from somewhere....


I'm curious to see how the RSX can interact/work with CELL. nVidia are talented people, I hope to see something special.
 

gofreak

GAF's Bob Woodward
thorns said:
You can't just dismiss the memory as well, since it will be used for a lot of different stuff, freeing up the main core from a lot of tasks.

In one sense you can, actually, in that memory transistors do nothing computationally. They can't free other logic from computational tasks, which is all the other logic is doing.
 

teiresias

Member
The very notion that gofreak is some anti-x360 zealot is completely and utterly laughable, given his post history both here and at B3D.
 

gofreak

GAF's Bob Woodward
PhatSaqs said:
Hey gofreak, do you have anything at all positive to say about the 360 GPU? Just wondering....

Hey phatsaqs. Nothing much. It's a horrible design.

:roll

Can you point out where I'm being negative in the above post?

The X360 GPU is a lovely design, it looks very neat and clean and powerful. There's plenty positive to say about it, but forgive me if I won't ignore dubious statements that stretch things. The reality is good enough not to need that.
 

PhatSaqs

Banned
gofreak said:
Hey phatsaqs. Nothing much. It's a horrible design.

:roll

Can you point out where I'm being negative in the above post?
I apologize if it seems I'm implying you're being overly negative. I certainly didn't say you were, and it's not a loaded question at all. I was asking since you seem to be pretty knowledgeable on the subject.

If you've already done so (detail things you might like about the GPU) and i've missed it, my bad.
 

gofreak

GAF's Bob Woodward
PhatSaqs said:
If you've already done so (detail things you might like about the GPU) and i've missed it, my bad.

Things I like about Xenos:

- as above, the thought of getting as close as possible to full utilisation of the computational resources very much appeals. I'm an efficiency freak, to a fault perhaps, so this is probably numero uno.

- the eDram. Pretty obvious. I'd take this in any system easily. I still think it has value, despite increasing bandwidth.

- the memexport function looks tasty. I'm not sure how much use it'd get typically in a game, but I'd love to tinker with it.

- the decoupling of the texture units is nice!

They're the highlights for me.
 
lips said:
OMG, some1 did not relay the telegram.

memory transistors are not computational transistors. stop.

232 trans at 500 mhz is still less than 300+ at 550 mhz. stop.

stop. stop.


this comparison is just plain wrong, in light of very recent information. this Nvidiot comparison needs to be stopped :lol
 

Lord Error

Insane For Sony
this comparison is just plain wrong, in light of very recent information.
For shader ops it wouldn't be wrong. There definitely is value in having more logic transistors that give you more shader ops. ATI's R500 chip is using lots of transistors for a different purpose, which also has its own significant value, though.
 