
The great NeoGAF thread of understanding specs

Durante

Member


We are on the cusp of a new console generation, and many gamers want to talk about technical specifications. As we saw with the recent confirmation of 8GB of GDDR5 in PS4, not everyone understands what these specifications actually mean. Instead of complaining about that, I thought it would be a good idea to try and explain some basic concepts, hopefully in a manner that is easy to follow.

A few things first:
  • I've never shipped a commercial game, and I have no insider knowledge of any kind about existing or upcoming consoles
  • As per the above, some things I write here could well be incomplete or wrong; please point that out so I can keep improving this OP
  • I hope this thread can also serve as a central discussion point about what specs mean, without degenerating too much into console (or general platform) warfare
  • The explanations in this thread will focus on a gaming perspective, and what the individual components and their specifications mean for gaming
  • Due to the number of topics touched upon, this will by necessity be a very shallow overview. If you want slightly more detail and a more general view of these topics, Wikipedia is a good starting point



CPU

The CPU serves to perform general purpose processing in a console. What does this mean? It means performing tasks such as deciding for each non-player actor in an FPS what their next action will be, or packaging data to send over the network in any online game, or orchestrating (but not actually performing) the display of graphics and the playback of sound.

Specifications and terms which might turn up when discussing CPU performance include:
  • Cores: Most modern CPUs have multiple independent processing cores. This means that they can work on more than one program stream (often called a thread) at the same time. The number of cores determines the number of such threads that can be processed simultaneously. Cores can be symmetrical or asymmetrical: the former is the traditional and usual case, while the latter means that the individual cores differ from each other in architecture. For example, Xbox 360 has 3 symmetrical cores, PS4 will have 8 symmetrical cores, PS3 uses an asymmetrical design (one general-purpose PPE plus seven usable SPEs), and Wii has a single core.
  • Hardware threads: As explained above, cores execute program streams called threads. On some architectures, more than one thread may be active per core at a time in hardware, a capability widely referred to as SMT (simultaneous multithreading). This usually serves to achieve better utilization of that core. Note that it is not comparable in performance to having an additional full core, and also note that any number of software threads can run on any core by distributing time slices. Examples of hardware multithreading include the Pentium 4, the Xbox 360 cores and the PS3 PPE, which could all execute 2 threads per core, or IBM's Power7, which supports up to 4-way SMT.
  • Clock frequency: Perhaps the best-known factor in CPU performance. The clock rate describes the frequency (that is, cycles per second) at which instructions are moved through the processor pipeline. With all else being equal, a processor clocked at twice the frequency will be able to perform twice as much work per unit of time. The PS3 and 360 CPUs were both clocked at 3.2 GHz, while PS4 and 720 are rumored to clock at 1.6 GHz. Modern desktop PC processors clock anywhere from 2.5 to 4.2 GHz.
  • Instruction-level parallelism (ILP): Almost all modern processor architectures are superscalar. This means that they are capable of executing more than one instruction per clock cycle by distributing them to individual functional units on the processor. A closely related metric is IPC (instructions per clock), describing the number of instructions that can be completed in one clock cycle by one core. This is an often overlooked factor in discussions on message boards, and one of the reasons why e.g. a Sandy Bridge (Core i7) core running at half the frequency of an Atom or ARM core can still easily outperform it.
  • Pipeline stages: To enable high clock frequencies, instructions in modern CPUs are pipelined, which means that they are executed in small chunks over multiple clock cycles. This is generally useful, but can lead to problems e.g. when results need to be available before it can be decided how the program execution will continue. These are called pipeline stalls, and the magnitude of their impact on performance depends on the length of the pipeline.
  • Out-of-order execution: In the simple case, a processor will execute instructions in exactly the order provided by the instruction stream of the program -- this is known as in-order execution. However, it was discovered quite a while ago that it can often be advantageous to execute instructions out of order. This can lead to much better utilization of the available processor resources, especially in cases where compiler optimization fails. All modern desktop CPUs and the next-gen console CPUs support out-of-order execution, while 360 and PS3 were both in-order.
  • Single instruction multiple data (SIMD): SIMD instructions and units allow a processor to work on multiple pieces of data with one instruction and per clock cycle. SIMD units are usually characterized by their width in bits. For example, a 128 bit SIMD unit can work on 4 single-precision floating point values (32 bit each) at a time. By definition, this works well only when performing the same sequence of instructions on many data elements (a small sketch follows this list).
  • Cache size: CPUs run at multiple GHz, and it can take dozens or even hundreds of cycles to get data from main memory. Caches serve as intermediate storage for data that is accessed often and greatly reduce access latency. Often there are multiple levels of cache, with smaller, faster layers fed by larger, slower ones. One important factor is whether individual cache layers are shared by all cores on a chip or exclusive to a core. Shared caches can be used to speed up communication between cores and reduce the penalty for moving threads.
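To make the SIMD point a bit more concrete, here is a tiny Python/numpy sketch of my own (purely illustrative, nothing console-specific): a 128-bit SIMD unit operates on four 32-bit floats with a single instruction, much like numpy applying one operation across a whole array instead of looping element by element.

Code:
# Illustrative only: numpy applies one operation across a whole array,
# which is conceptually what a SIMD unit does 4 (or 8, 16, ...) lanes at a time.
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)  # four 32-bit floats = 128 bits
b = np.array([5.0, 6.0, 7.0, 8.0], dtype=np.float32)

# Scalar view: one multiply per element, i.e. four separate operations.
scalar = [float(x) * float(y) for x, y in zip(a, b)]

# SIMD view: one vector multiply covering all four lanes at once.
vector = a * b

print(scalar)  # [5.0, 12.0, 21.0, 32.0]
print(vector)  # [ 5. 12. 21. 32.]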

What does all of this mean? To understand how a processor will perform at a task, we first need to consider how that task is affected by the individual performance characteristics of the processor. Will there be a lot of SIMD-friendly number crunching? Can the task be distributed across multiple threads efficiently? Will there be lots of data-dependent, unpredictable branching?

Let's look at some examples to get an idea of how different architectures would fare:
  • Multiplying 2 dense matrices (e.g. some core graphics-, audio-, or physics-related engine code). Here, we know exactly what we are going to do, and the task is easily parallelized across cores and on SIMD units. On the other hand, a long pipeline or even a lack of out-of-order execution will not hurt much (given a competent compiler).
  • Interpreting a scripting language (e.g. what Skyrim does for much of its actor behavior). We may interpret multiple separate scripts on different cores, but we can't multi-thread a single script. SIMD is likely to be mostly useless, and long pipelines are likely to hurt our throughput. Out-of-order execution will improve performance by a significant degree. (A toy sketch of both workloads follows this list.)
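For intuition, here is a toy Python sketch of those two workloads (my own illustration, not real engine code): the matrix multiply maps cleanly onto SIMD units and multiple cores because every element follows the same multiply-add pattern, while the interpreter is a chain of data-dependent branches where each step depends on the previous one.

Code:
# Toy sketch of the two workloads above (illustrative only).
import numpy as np

# 1) Dense matrix multiply: regular, predictable, SIMD- and thread-friendly.
A = np.random.rand(256, 256).astype(np.float32)
B = np.random.rand(256, 256).astype(np.float32)
C = A @ B  # every output element is the same multiply-add pattern

# 2) A toy script interpreter: data-dependent branching on every step,
#    and each instruction depends on the previous result, so it is hard
#    to vectorize or split across cores.
def interpret(program, x=0):
    for op, arg in program:
        if op == "add":
            x += arg
        elif op == "mul":
            x *= arg
        elif op == "set_if_zero":
            if x == 0:
                x = arg
    return x

print(interpret([("add", 3), ("mul", 4), ("set_if_zero", 7)]))  # 12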



GPU

Traditionally, GPUs are meant to render graphics onto the screen and that's it. This is still their main purpose; however, since around 2006 the field of general-purpose computation on GPUs (GPGPU) has grown in importance. What this means is taking some tasks that would traditionally have been performed on the CPU and letting the GPU work on them instead.

The most important GPU specifications are:
  • Shader processors: The main processing elements on a GPU. You can think of them as simple CPU cores with low frequency, very wide SIMD units and high penalties for branching code. They execute all the pixel, vertex, geometry and hull shader code that a modern 3D engine throws at the GPU. Thus, they limit the computational complexity of these effects. Since different vendors count these differently, it makes most sense to me to just look at the number of floating point operations that can be performed per cycle on the whole GPU.
  • Clock frequency: Just like on CPUs, determines the number of hardware cycles per second. Usually around 1 GHz on high-end PC hardware now, ~500 MHz on PS3 and 360. Obviously, the frequency impacts the performance of all the other components of the GPU.
  • Render Output Units (ROPs): These take shader output and write/blend it to buffers in memory. The number of ROPs therefore limits the maximum number of pixels that can be rendered per unit of time. Modern ROPs can perform multiple Z operations for each color operation, which allows the GPU to more quickly discard pixels that will not be visible in the final rendered image (because they are behind some obstructing geometry). A back-of-the-envelope fill-rate sketch follows this list.
  • Texture Mapping Units (TMUs): TMUs gather and filter texture data which is used as one of the inputs to various shader programs. The number of these units available determines the detail and filtering quality of textures you can use in a 3D scene.
  • Caches and local storage: Just like CPUs, modern GPUs feature caches to alleviate external memory bandwidth and latency issues. Unlike most CPUs, they also reserve small local memory spaces for programmers to actively use for communication or caching. This is similar to the local store on each Cell SPE.
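As an illustration of how ROP count and clock combine into pixel fill rate, here is a small sketch. The ROP count and clock below are made-up example values, not the specs of any particular console, and real frames touch each pixel many times (overdraw, blending, multiple passes), so actual headroom is far smaller than the raw ratio suggests.

Code:
# Back-of-the-envelope fill-rate estimate (assumed example figures).
rops = 32                # hypothetical ROP count
clock_hz = 800e6         # hypothetical 800 MHz GPU clock
pixels_per_second = rops * clock_hz  # at most one colour write per ROP per cycle

frame_pixels = 1920 * 1080
fps = 60
required = frame_pixels * fps

print(pixels_per_second / required)  # ~206x a single 1080p60 colour pass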

Now let's again look at some examples and see how the individual specs impact them:
  • Increased geometric asset detail. This will put more vertex processing strain on our GPU, but will leave ROPs and TMUs entirely unaffected.
  • Increased rendering resolution. Here we will increase the amount of pixel processing required, while other shader processing should stay at similar levels. Texturing load will also increase, but we may get better texture caching. ROP performance requirements will increase significantly (see the sketch after this list).
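The pixel-count arithmetic behind that last point, as a quick sketch:

Code:
# How pixel counts scale with rendering resolution (simple arithmetic).
res_720p = 1280 * 720     #   921,600 pixels
res_1080p = 1920 * 1080   # 2,073,600 pixels

print(res_1080p / res_720p)  # 2.25 -> roughly 2.25x the pixel shading and ROP work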



Memory

Memory is used to store data needed by your game/OS. This may seem obvious, but it's clearly necessary to establish that memory, by itself, performs no computation. For games, the majority of memory is usually taken up by graphics asset data, such as textures, models and animation data. Audio and gameplay-related data is usually comparatively small. Of course, this also depends on the type of game. A corridor shooter will require less memory for gameplay data than an RTS or large-scale open world game with many active actors.

Memory is characterized by several distinct aspects, each of which is individually important:
  • Capacity: Very straightforward, this is the amount of data that can be stored in a given block of memory.
  • Bus width: The number of bits that can be transferred to/from memory per cycle. This is usually limited by (and limits) the number of memory chips needed to implement a given capacity.
  • Clock frequency: Just like CPU and GPU, memory will also operate at some clock rate. Together with the bus width, this determines the bandwidth of the memory, and together with the delay in clock cycles of various operations it determines the latency.
  • Bandwidth: The amount of data that can be transferred to and from memory in a given unit of time. In some cases this bandwidth is shared between directions (that is, at any given moment it can be used to transfer in one direction or the other), while in other cases it is fully bidirectional (a worked bandwidth example follows this list).
  • Latency: The time it takes to access any given location in memory. In practice this determines, once the CPU or GPU requests some value that is not in cache, how long it will take until this value is accessible to it.
  • Layout: Memory can be set up in any number of blocks of different types. If the main memory block all uses the same type of memory and is accessible by both the CPU and GPU, the layout is usually called unified. A unified layout is easier to program and more straightforward to implement, but restricts you to a single memory type and limits high-end performance. PS3 and PCs use a split memory layout with separate main and graphics memory, while PS4 uses a unified layout. Xbox 360 and Wii U are unified in terms of main memory, but with a separate embedded memory pool.
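As promised above, here is how bus width and data rate combine into peak bandwidth. The figures below match the widely reported 256-bit, 5.5 Gbps GDDR5 configuration of PS4, but treat them as an example rather than gospel.

Code:
# Peak bandwidth from bus width and per-pin data rate (example figures).
bus_width_bits = 256
data_rate = 5.5e9            # transfers per second per pin (5.5 Gbps effective)

bandwidth_bytes_per_s = bus_width_bits / 8 * data_rate
print(bandwidth_bytes_per_s / 1e9)  # 176.0 GB/s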

For memory, there are only a few types that are in general use, so let's go over them quickly:
  • DDR3: Largest capacity per chip, low bandwidth, medium latency. Main system memory on PC and Wii U, rumored to be used in the next Xbox.
  • GDDR5: Lower capacity per chip, higher bandwidth, higher per-clock latency (partially offset by higher clock). Used in all high-end GPUs on PC as well as in PS4.
  • eDRAM: Very low capacity due to being embedded on-chip, low latency and potentially high bandwidth (with a wide bus). Used for the 360 GPU framebuffer and on Wii U.
  • eSRAM: Even lower latency and capacity, used to implement caches.

Let's finish this section up by once more looking at a couple of use cases and how they impact memory:
  • Increasing the framerate. Going e.g. from 30 to 60 FPS will not require any additional memory capacity, but it will require significantly higher bandwidth for the GPU and potentially also lower latency (rough numbers in the sketch after this list).
  • Increasing level size. This will mostly impact capacity, since you need to keep a larger set of assets in memory. However, since the set of assets used in each individual frame is not likely to increase much in size, bandwidth requirements are mostly unaffected (and so is latency).
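To put the framerate point in numbers, a minimal sketch reusing the example bandwidth figure from above: doubling the framerate halves the amount of data you can move per frame at a given bandwidth.

Code:
# Per-frame memory traffic budget at different framerates (illustrative).
peak_bandwidth = 176e9   # bytes/s, the GDDR5 example from the previous sketch

for fps in (30, 60):
    per_frame = peak_bandwidth / fps
    print(fps, "FPS ->", per_frame / 1e9, "GB of peak traffic per frame")
# 30 FPS -> ~5.87 GB/frame, 60 FPS -> ~2.93 GB/frame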



Other hardware

Consoles used to include lots of special fixed-function hardware to perform a variety of tasks. While the general tendency in hardware has been towards programmability and more general-purpose computation, a few components that get mentioned often should be discussed:
  • Audio DSPs: Digital Signal Processors are very efficient at the kind of processing required for e.g. audio. This is particularly important when your main CPU is comparatively weak at these tasks. Wii U, PS4 and the next Xbox are all rumoured to feature some dedicated audio hardware, while on PCs audio processing is mostly done entirely on the CPU.
  • Video Encoding Hardware: Video encoding is a very performance intensive task, and one which can be accelerated significantly by dedicated hardware. Wii U uses dedicated video compression hardware for streaming to the gamepad, and PS4 also includes such hardware for its streaming and recording features. Nvidia plans to use hardware in the 600-series GPUs to enable streaming to Shield.



General Terms

Here I planned to list a few more terms that are used for different components, and often crop up in discussions, but there's really only one I can think of that isn't covered yet:
  • GFLOPS: Giga (billions of) floating point operations per second. In console discussions, this usually refers to the maximum number of single-precision floating point operations theoretically possible on some hardware per second. With the information about CPUs and GPUs outlined above, we can see that this number is roughly a function of (core count) * (ILP) * (SIMD width) * (clock frequency).
    An 8-core Jaguar CPU at 1.6 GHz performs ~100 GFLOPS, Cell in PS3 around 200, the Xenos GPU in Xbox 360 managed 240, and the GPU in PS4 does 1800. AMD's current high-end graphics chip does 4300 and Titan does 4500. (A quick sanity check follows below.)
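A quick sanity check of those numbers. The PS4 shader count and clock are the widely reported figures, and the per-cycle FLOP counts are my own assumptions (e.g. counting a fused multiply-add as two operations), so take this as a sketch rather than official math.

Code:
# Rough GFLOPS sanity checks; FLOPs-per-cycle values are assumptions.
def gflops(cores, flops_per_cycle_per_core, clock_hz):
    return cores * flops_per_cycle_per_core * clock_hz / 1e9

# 8 Jaguar cores, assumed 8 single-precision FLOPs per cycle each, at 1.6 GHz:
print(gflops(8, 8, 1.6e9))     # 102.4 -> "~100 GFLOPS"

# PS4 GPU: 1152 shader processors, 2 FLOPs per cycle (FMA), at 800 MHz:
print(gflops(1152, 2, 0.8e9))  # 1843.2 -> "~1800 GFLOPS"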



Congratulations if you read all of the above. It should now be clear to you why e.g. doubling the memory capacity of PS4 would not automatically increase the framerates of games running on it, or why the Wii U CPU at a much lower clock rate can still keep up with the Xbox 360 CPU in some tasks -- and why other tasks are problematic for it.

I geeked out a bit too much when I started writing this, especially on the CPU part. In the interest of focusing only on stuff useful to gamers, keeping this readable in some fashion at least, and keeping the time for writing it below 4 hours, I got somewhat more focused/shallow in the other parts.
 
Everyone link this thread when the next fanboy makes a thread about how many RAMs can he fit on his control pad/what would you do if the next xbox had 2 graphics cards?
 
You know this isn't going to stop all of the new threads about RAM right? It's the nextgen lunacy, it's best to embrace it and let the crazy engulf you
 

Durante

Member
You know this isn't going to stop all of the new threads about RAM right? It's the nextgen lunacy, it's best to embrace it and let the crazy engulf you
I think spending hours writing this thing shows that I already embraced the crazy :p

Actually, I just have a fierce cold and it was a nice distraction. Though I do have some hope that it might help a bit with the sheer volume of posts and threads we have.
 
I have absolutely no issue with your post, OP. Wonderful quality, and thank you for investing the 3+ hours into creating it. I agree, this post should be stickied.
 

lol51

Member
Nice thread. I was listening to the PS4 segment in the AnandTech podcast and they made it seem like the 2.4GHz WiFi in the PS4 can cause packet loss and connection issues? They also said it would be a cheap upgrade ($0.75-1 per unit) to go to 5GHz.

Can anyone explain 2.4ghz/5ghz wifi and would this really be noticeable to a non tech savvy individual? I try to keep my devices that have ethernet connected through ethernet, but those unable to connect that way - will they see problems?

Edit: Did some learning with Google~! Not all routers transmit a 5GHz signal and not all devices have WiFi compatible with a 5GHz signal. 5GHz has a higher transfer rate and handles streaming video better. It is also less likely to be congested by other WiFi signals or affected by other devices (e.g. your microwave). The trade-off is weaker signal strength, so it is more likely to be blocked by walls.
 

Chittagong

Gold Member
Fantastic thread. It's pretty spectacular how informed general gamers can be about the underpinnings of consoles these days, just by visiting a place like NeoGAF.
 

I'm an expert

Formerly worldrevolution. The only reason I am nice to anyone else is to avoid being banned.
Gaf can only blame themselves for the ram mess. That thread with everyone in the beginning like OHH SHIT 8GIGS DDR5 and the people who had no idea what ram even was like..oh..shit...yeah..RAM..GD..sumtin sumtin 5!! 8 GIGS!! (what's this mean, xbox has this too right)

Great op.
 
Gaf can only blame themselves for the ram mess. That thread with everyone in the beginning like OHH SHIT 8GIGS DDR5 and the people who had no idea what ram even was like..oh..shit...yeah..RAM..GD..sumtin sumtin 5!!

Great op.

It's a confirmation bias fallacy. "Everyone around me is saying how wonderful GDDR5 RAM is, so it must be really awesome!"
 

pfkas

Member
Bravo Sir. I'm a software guy and generally keep my nose out of hardware, but I read all that and you've explained it all very well.
 

Mogwai

Member
• Hardware threads: As explained above, cores execute program streams called threads. On some architectures, more than one thread may be active per core at a time in hardware, a capability widely referred to as SMT (simultaneous multithreading). This usually serves to achieve better utilization of that core. Note that it is not comparable in performance to having an additional full core, and also note that any number of software threads can run on any core by distributing time slices. Examples of hardware multithreading include the Pentium 4, the Xbox 360 cores and the PS3 PPE, which could all execute 2 threads per core, or IBM's Power7, which supports up to 4-way SMT.

"Simultaneous multithreading" is that the same as hyperthreading? I'm no CPU expert, just merely wondering.

Great overview :)
 
It seems like there are tradeoffs between ddr3 and gddr5 and that one isn't clearly superior to the other....? If so, why is GDDR5 used in high end video cards and why do you think sony picked it (considering it's a higher price)?

Thanks Durante, you had my favorite thread of last year and this is likely to be my favorite thread of this year. DS FIX fo lyfeeee
 
D

Deleted member 125677

Unconfirmed Member
First he fixed Dark Souls PC, then he fixed all the inaccurate specs debates of next gen consoles.
 

Durante

Member
Gaf can only blame themselves for the ram mess. That thread with everyone in the beginning like OHH SHIT 8GIGS DDR5 and the people who had no idea what ram even was like..oh..shit...yeah..RAM..GD..sumtin sumtin 5!! 8 GIGS!! (what's this mean, xbox has this too right)
To be fair, my post when the RAM announcement happened was

"HOLY CRAAAAAAAAAAAAAAP!"

It was very unexpected, and it is a big deal. Just not in all the ways people imagine.


"Simultaneous multithreading" is that the same as hyperthreading? I'm no CPU expert, just merely wondering.
Yes, "Hyperthreading" is Intel's marketing name for SMT.
 

Durante

Member
It seems like there are tradeoffs between ddr3 and gddr5 and that one isn't clearly superior to the other....? If so, why is GDDR5 used in high end video cards and why do you think sony picked it (considering it's a higher price)?
Because GPUs primarily want bandwidth and can deal with latency in a variety of ways, while CPUs don't really care about bandwidth beyond a certain point. When you have to choose one or the other for a unified memory system, you pretty much have to go with the higher bandwidth option; everything else will cripple your graphics performance. Unless of course you also invest in embedded memory to mitigate that, but doing so introduces its own problems (i.e. a larger die, capacity restrictions, more complexity for programmers to deal with).
 
I'm reading it and it's very heavy for someone who has minimal knowledge in this field.

I'm trying to understand this part:
DDR3 RAM has medium latency while GDDR5 has higher per-clock latency (I don't really understand the per-clock portion).

The lower the latency the better, as it reduces the time to read information, I'm assuming.

So what exactly is DDR3 better at doing with regard to latency when it comes to gaming, compared to GDDR5? Since OS processes will also be playing a big role in the upcoming gen (Gaikai, instant streaming/replay, and all the other OS stuff), in what ways would DDR3's lower latency benefit the system that has DDR3?

I don't expect a clear cut answer, as it seems there isn't, but maybe something you write will clear some of this fog in my head.

Thank you for your hard work.
 
I would add that clock speeds on different architectures don't mean much for comparisons. Just because something is clocked higher doesn't mean it is faster.

edit: i guess it was covered
 

Thraktor

Member
Great thread, Durante, you summarized the main points very well. This'll be very handy to have bookmarked for those times where I don't have the energy to explain some of these points for the 100th time myself.
 
Good idea for a thread. Hopefully this will clear up a lot of confusion. I generally consider myself pretty well versed in this stuff, but I learned a bit too!
 

Ahmed360

Member
Well done sir, your hard work is truly appreciated, I really can't thank you enough!

I have a couple of questions I would love you to answer:

1- I read/heard that GPUs have a huge number of cores, many more than CPUs (I think I read somewhere it's in the hundreds?), and that's because graphics processing is more of a multithreaded operation. Is that correct?

2- Could you explain more about why the lower-clocked Wii U CPU is able to match the higher-clocked CPU in the Xbox 360? Is it only because of the newer architecture used? Or is it something else?

I really hope you can answer my two questions, as I have been wondering about them for a while now.

Again, thank you so much for the very informative thread, it is simply great! :)
 

Perkel

Banned
It seems like there are tradeoffs between ddr3 and gddr5 and that one isn't clearly superior to the other....? If so, why is GDDR5 used in high end video cards and why do you think sony picked it (considering it's a higher price)?

Thanks Durante, you had my favorite thread of last year and this is likely to be my favorite thread of this year. DS FIX fo lyfeeee

Durante didn't say much about the actual use of memory in games besides the 30fps/60fps thing. This is what I posted in another thread as an example:

Perkel said:
Someone correct me if i am wrong.
Small example of bandwidth vs size of ram. We assume there are no bottlenecks (like inferior GPU CPU etc). We love to play 1FPS games to further simplify our test.

We have two memory pools:

M1 - 20MB capacity with 20MB/s bandwidth
M2 - 40MB capacity with 5MB/s bandwidth.

We want to make a forest in our game.

Now we load for example 1MB tree (mesh,textures, everything) into our M1 and M2.

M1- 1 out of 20MB is reserved
M2- 1 out of 40MB is reserved

Because developers reuse assets we will now make a forest from this 1MB tree.

M1 thanks to 20MB/s bandwidth can output 20 tree models a second.
M2 thanks to 5MB/s can output 5 tree models a second.

Now because our tree forest can't be just one tree "copypasta" we will add 4 other tree variations (each 1MB).

M1 - 5 of 20MB reserved for 5 tree variants
M2 - 5 of 40MB reserved for 5 tree variants

M1 output is a 20-tree forest consisting of 5 variations of tree model.
M2 output is a 5-tree forest consisting of 5 variations of tree model.

Now we add 5 more variations to our forest

M1 - 10MB/20MB reserved and can output 20 trees with 10 variations
M2 - 10MB/40MB reserved and can output 5 trees with 5 variations.

As you can see, we can add more variations and the slower bandwidth simply won't be able to show you the full forest. What size gives you is variety of assets.
What bandwidth gives you is how much of those assets can be output into the final image.

And there is another thing that you must understand: developers reuse assets ALL THE TIME.

This is why no one uses cheap DDR3 over expensive GDDR5 in GFX cards. DDR3 is good for things like browsing the internet, checking mail, writing, watching YouTube, and playing old games which were not created with high bandwidth in mind.

Someone said that games which take only 2GB of RAM don't need more than 2GB/s of bandwidth. That is completely untrue. Since developers reuse assets, you can have a game in which only 100MB is used but which still runs out of bandwidth because those assets are reused so much.

This is also why the eDRAM in X360 and PS2 was so important: it gave them the bandwidth to make better use of a small amount of memory, which did more for the final picture than just having more slow main memory.

This is also why 4GB of GDDR5 RAM was better than 8GB of DDR3 RAM and would still be better than 16GB or 32GB of DDR3 RAM, and this is why MS is rumored to use embedded SRAM to get better bandwidth.

And this is also why it is so important for next gen games because those games will be pushing a lot of things on screen.
 

Neo C.

Member
This thread is needed. It's always weird to see people heavily involved in graphics discussions who don't understand even the most basic things. "Did you read Durante's thread?" should be a standard answer in future tech threads.
 