
IGN rumour: PS4 to have '2 GPUs' - one APU based + one discrete

Hope Sony doesn't kill themselves chasing more power that nobody outside of hardcore tech guys is going to notice, and not just from a manufacturing-cost standpoint

Games should be about gameplay and art style from now on; we're not going to get a big leap like PS2 to PS3 anymore
 

Donnie

Member
From a cost standpoint, the larger the GPU size/power, the lower the yield and the higher the cost. Taking a mid-range GPU and giving it these efficiencies increases performance without increasing cost. My post above showed that using eventually-cheaper 3D stacked memory in the SOC can double performance without increasing cost. So take developer specs and double them: we are now above the 2.5 TFLOPS next-generation goal, provided the PS4 is a 2014 SOC design with 3D stacked memory and the same number of Compute Units and CPUs as in developer platforms.

There is the other possibility that final PS4 designs will have half the number of CPUs and GPU compute units and the same performance as developer platforms. The PS4 would be much less expensive. This is a Sony marketing decision but AMD will want a powerful platform if it's to point to it and boast "Made with AMD HSA SOC building blocks".

Am I right in thinking that you're talking about a GPU with theoretical performance of around 1.4 TFLOPS, and then going off the "more than halving program execution time" comment to suggest it could perform at or above the level of a 2.5 TFLOP GPU because of the improved efficiency afforded to it by 3D stacked memory?
 

i-Lo

Member
Hope Sony doesn't kill themselves chasing more power that nobody outside of hardcore tech guys is going to notice, and not just from a manufacturing-cost standpoint

Games should be about gameplay and art style from now on; we're not going to get a big leap like PS2 to PS3 anymore

You should know that greater power also allows for the simplification of certain time-consuming processes. Also, looking at UE4, the next generation will be about tools that not only evolve aspects that already exist today but also introduce new methods for saving time and costs. That in itself requires more advanced hardware.

More power means:

1. Advancements in things that are visually perceivable, &
2. Advancements that are not, but that affect the game world with similar significance

People have to realize that a powerful GPU can do more than just render prettier graphics compared to current gen systems.
 
Not sure if this is news-worthy or not, but...

Sony have just put a competition live where UK gamers can win a trip to E3 with the PlayStation crew to get some amateur coverage from a member of the community.

http://www.youtube.com/watch?v=JdYUW-2QftQ

The interesting thing is the tags for the video include "ps4" and "orbis".

Possibly going to be at E3 after all? :eek:


Is PlaystationAccess handled by Sony? If so, I say this deserves its own thread.
 
Imagine if they actually went with 16GB RAM lol. Crytek crying tears of joy, and Newegg making billions of dollars confirmed :p (I realize that's not how it really works).

Jeff, there is one thing I wonder about though. If the PS4 is to be AMD's HSA banner carrier, what do you think of that one poster's leak that AMD had made the next Xbox's components higher priority?
AMD needs at least one next-generation game platform to be full AMD HSA and powerful; it can be the next Xbox, it doesn't have to be the PS4. Edit: more than 2 3D wafers stacked on top of each other will wait for 2015-2016 due to the vertical height of the stack causing problems with voids needing fill. Logic/CPU and GPU, if on one layer, would require too much fill on top of them, and the glue/fill does not transfer heat as well as near-direct contact with the outside of the case.

Donnie said:
Am I right in thinking that you're talking about a GPU with theoretical performance of around 1.4 TFLOPS, and then going off the "more than halving program execution time" comment to suggest it could perform at or above the level of a 2.5 TFLOP GPU because of the improved efficiency afforded to it by 3D stacked memory?
Yes, for inside the SOC only, but it's not just the GPU; the example was for multiple CPUs and GPGPU, or Sony's first choice of 4PPU16SPU. Some processes overlap: for example, the CPU can be used to prefetch for the GPU with a common memory interface/cache and zero copy, and there are supposed to be 113% efficiency gains there. But some of the efficiencies are duplicates between 3D stacked memory and a common memory pool, common cache, etc., so the total is less than the sum of the efficiencies. Each of the studies did not take into account other processes that might duplicate efficiencies.

The second GPU outside of the SOC will not be able to take advantage of the 3D stacked ultra-wide I/O memory. It's probably going to be connected to the SOC main memory with AMD HyperTransport, PCIe, or a 64-bit-wide memory bus. That's one of the unanswered questions. It may have 1 gig of 3D stacked memory local to the second GPU, with a data/memory bottleneck between the SOC and second GPU. Still, the CPU in the SOC can prefetch for the second GPU with some efficiency gain for it, under 113% depending on the bottleneck.

All GPU elements inside the SOC give the best performance. Developer designs might be a way to duplicate what may eventually be inside one very large SOC.
 

Donnie

Member
AMD needs at least one next-generation game platform to be full AMD HSA and powerful; it can be the next Xbox, it doesn't have to be the PS4.

Yes, for inside the SOC only, but it's not just the GPU; the example was for multiple CPUs and GPGPU, or Sony's first choice of 4PPU16SPU. Some processes overlap: for example, the CPU can be used to prefetch for the GPU with a common memory interface/cache and zero copy, and there are supposed to be 113% efficiency gains there. But some of the efficiencies are duplicates between 3D stacked memory and a common memory pool, common cache, etc., so the total is less than the sum of the efficiencies. Each of the studies did not take into account other processes that might duplicate efficiencies.

The second GPU outside of the SOC will not be able to take advantage of the 3D stacked ultra-wide I/O memory. It's probably going to be connected to the SOC main memory with AMD HyperTransport, PCIe, or a 64-bit-wide memory bus. That's one of the unanswered questions. It may have 1 gig of 3D stacked memory local to the second GPU, with a data/memory bottleneck between the SOC and second GPU. Still, the CPU in the SOC can prefetch for the second GPU with some efficiency gain for it, under 113% depending on the bottleneck.

All GPU elements inside the SOC give the best performance. Developer designs might be a way to duplicate what may eventually be inside one very large SOC.

Eliminating bandwidth bottlenecks can certainly increase performance, but I don't think we should take the numbers they're giving quite so literally here. They're probably best-case examples (what are they comparing this supposed stacked 3D memory to?) and also based on massively parallel CPU processing with equally massive bandwidth requirements.

I just don't buy a 1.4Tflop GPU with 3D stacked memory on a SOC giving you performance equivalent to a 2.5Tflop+ GPU without it, not unless the 2.5Tflop+ GPU is amazingly poorly designed.
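For reference, here is the back-of-the-envelope arithmetic behind those headline TFLOPS figures, as a minimal sketch where the CU counts and clocks are assumptions chosen purely for illustration, not anything from the leaks:

```c
/* Peak (theoretical) FLOPS = compute units x lanes per CU x FLOPs per lane per clock x clock.
   The CU counts and clocks below are illustrative assumptions, not leaked specs. */
#include <stdio.h>

static double peak_tflops(int compute_units, int lanes_per_cu,
                          int flops_per_lane_per_clock, double clock_ghz)
{
    /* lanes * FLOPs * GHz gives GFLOPS; divide by 1000 for TFLOPS */
    return compute_units * lanes_per_cu * flops_per_lane_per_clock * clock_ghz / 1000.0;
}

int main(void)
{
    /* A GCN-style CU has 64 lanes; a fused multiply-add counts as 2 FLOPs per clock. */
    printf("14 CUs @ 0.80 GHz: %.2f TFLOPS\n", peak_tflops(14, 64, 2, 0.80)); /* ~1.43 */
    printf("20 CUs @ 1.00 GHz: %.2f TFLOPS\n", peak_tflops(20, 64, 2, 1.00)); /* ~2.56 */
    return 0;
}
```

The headline number says nothing about how often those ALUs are actually fed with data, which is exactly where the bandwidth/efficiency argument comes in.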
 

JABEE

Member
Using Orbis or PS4 on an official PlayStation video is confirmation. Orbis is a name only talked about in rumors? Why would Sony use that as a promotional device or tag if it wasn't coming to E3?
 
Using Orbis or PS4 on an official PlayStation video is confirmation. Orbis is a name only talked about in rumors? Why would Sony use that as a promotional device or tag if it wasn't coming to E3?

Because it's a way to get hits from those who are searching using said terms?
 
Unfortunately the guy who put up the video posted a comment a little later:

Don't read into that - it's just a few terms that are being searched very regularly :)

...but I really hope we will see the PS4 at E3.
 

JABEE

Member
Because it's a way to get hits from those who are searching using said terms?

Yes, but I'm almost positive that using an official codename for the system is not something Sony condones. Using an official codename to get more hits on a YouTube video would seem to not make any sense. Imagine Microsoft officially tossing around the word Durango. This is either a leak or Sony's European marketing people are not following NDAs.

Edit: It's also an E3 video, so even if SCE in the UK was just trying to get hits those tags are almost certainly saying that PS4/Orbis is an E3 2012 related concept.
 
Yes, but I'm almost positive that using an official codename for the system is not something Sony condones. Using an official codename to get more hits on a YouTube video would seem to not make any sense. Imagine Microsoft officially tossing around the word Durango. This is either a leak or Sony's European marketing people are not following NDAs.

Unless Orbis isn't the codename, at least not any longer.
 

JABEE

Member
Unless Orbis isn't the codename, at least not any longer.

Then why would they use a fake codename for a YouTube video that will likely only get a few thousand hits? Are there any other examples of that channel using non-related tags to draw in viewers? I'm almost certain this is a leak and that is their hasty response to the people who noticed it. The Orbis name is almost confirmed as being a codename at some point in the development, because there was an art website that used Orbis and showed people playing some motion control game.
 
Then why would they use a fake codename for a YouTube video that will likely only get a few thousand hits? Are there any other examples of that channel using non-related tags to draw in viewers? I'm almost certain this is a leak and that is their hasty response to the people who noticed it. The Orbis name is almost confirmed as being a codename at some point in the development, because there was an art website that used Orbis and showed people playing some motion control game.

I don't understand the big deal. If there was a rumor that Sony was releasing a machine called SweetsmotheredCornbreadandchili and people know about said name, why wouldn't Sony try to capitalize on it to get views? Regardless of whether it is or isn't a legitimate name?
 

Ashes

Banned
Yes, but I'm almost positive that using an official codename for the system is not something Sony condones. Using an official codename to get more hits on a YouTube video would seem to not make any sense. Imagine Microsoft officially tossing around the word Durango. This is either a leak or Sony's European marketing people are not following NDAs.

Edit: It's also an E3 video, so even if SCE in the UK was just trying to get hits those tags are almost certainly saying that PS4/Orbis is an E3 2012 related concept.

Sony is a behemoth in terms of marketing prowess. No YouTube-posting PR guy will know about secret product launches or E3 lineups.

It was 'hits' mongering. That's how these things work. Trust me on that. :p

But that doesn't mean that Sony won't show the PS4 off at E3. Outside chance, but there's that little bit of hope there to cling onto.*

*And that has nothing to do with this youtube vid. And everything to do with the things we've already been discussing in this thread. ;)
 
Eliminating bandwidth bottlenecks can certainly increase performance, but I don't think we should take the numbers they're giving quite so literally here. They're probably best-case examples (what are they comparing this supposed stacked 3D memory to?) and also based on massively parallel CPU processing with equally massive bandwidth requirements.

I just don't buy a 1.4Tflop GPU with 3D stacked memory on a SOC giving you performance equivalent to a 2.5Tflop+ GPU without it, not unless the 2.5Tflop+ GPU is amazingly poorly designed.
Yup, best-case examples not taking into account other efficiencies that overlap, or other efficiencies that might multiply, either. Edit: remember that AMD has Infinity view to support, as well as 4K and 8K (Sony CTO) with a total framerate of 300FPS.

Notice AMD was staging efficiencies across multiple years with economical and practical designs, getting about 40% better/faster each year. That was on same-die processes only, because having a separate GPU/CPU or DRAM would require an interposer or wiring in a package that would increase cost. That last step had to wait till this year, with "Process optimized building blocks" to be 2.5D assembled on a custom-designed substrate, which is a SOC.

So the biggest single change coming with SOCs should be ultrawide I/O 3D stacked memory in the SOC as well as CPU and GPU cache changes to fully use such a memory. Edit: 3D stacked memory has not been confirmed.

"Process optimized building blocks" Memory uses a different build process than GPUs or CPUs, Northbridge can be made at 22nm while GPU must be made at 28nm, power control circuits made at larger die sizes and included in the SOC, and more.

AMD has been planning for this for 5 years, and it's not just process-optimized building blocks, it's also 3D wafer stacking. Memory and FPGA 3D wafer stacking are in the PDF I cited, but GPUs can benefit from 3D wafer stacking too, by reducing 2000-Compute-unit GPUs to wafers of 300 or so, pre-testing them and stacking them. You get reduced cost as well as a bump in process speeds if you increase the data bus width completely through to memory access. This is not practical in conventional designs. Will a 3D stacked GPU be in the PS4 SOC? Is the 2nd GPU to be inside the SOC as part of a 3D stacked wafer, 2.5D-attached to the SOC substrate?

Several posters have asked how much faster the PS4 would be, given that several of my posts mention efficiencies, and I have not answered. Partly for the reason you mentioned, partly because we don't know if Sony will use them to reduce cost, and partly because there is so much overlap in these processes.


phosphor112 said:
Apparently I missed the post where they originally wanted a "Cell" design (4 PPU 16 SPU). I knew I wasn't crazy for thinking that.
It's just a few posts above on this page. Essentially I reread the entire Sony patent and noticed one of the methods to connect 1PPU4SPU building blocks was similar to the AMD descriptions of HSA, with a cache arrangement identical to the one used for the Trinity 4-CPU-to-DDR3 memory controllers. Second, 4 PPUs and 16 SPUs is approximately the same FLOP performance as the chosen AMD Fusion 4CPU-GPU, and finally, the ratio of PPUs to SPUs and the number of SPUs with their own cache is a best-case design according to the white papers I have read.

In other words, the 1PPU4SPU is the redesigned Cell (patent published Dec 2010) that is designed to be a building block for SOCs, which AMD/IBM have been working on for 5 years (since 2008), and IBM has been coordinating with Sony. It can use 3D stacked memory (any very fast memory), and does not have a dedicated Flex I/O or XDR RAM interface as part of the chip... it's designed for a SOC, while the Cell with Flex I/O and XDR memory interface is designed to attach to a motherboard!
 
From AMD:

http://www.anandtech.com/show/5847/answered-by-the-experts-heterogeneous-and-gpu-compute-with-amds-manju-hegde said:
In HSA we have taken a look at all the issues in programming GPUs that have hindered mainstream adoption of heterogeneous compute and changed the hardware architecture to address those. In fact the goal of HSA is to make the GPU in the APU a first class programmable processor as easy to program as today's CPUs. In particular, HSA incorporates critical hardware features which accomplish the following:

1. GPU Compute C++ support: This makes heterogeneous compute access a lot of the programming constructs that only CPU programmers can access today

2. HSA Memory Management Unit: This allows all system memory to be accessible by both CPU and GPU, depending on need. In today's world, only a subset of system memory can be used by the GPU.

3. Unified Address Space for CPU and GPU: The unified address space provides ease of programming for developers to create applications. By not requiring separate memory pointers for CPU and GPU, libraries can simplify their interfaces

4. GPU uses pageable system memory via CPU pointers: This is the first time the GPU can take advantage of the CPU virtual address space. With pageable system memory, the GPU can reference the data directly in the CPU domain. In all prior generations, data had to be copied between the two spaces or page-locked prior to use

5. Fully coherent memory between CPU & GPU: This allows for data to be cached in the CPU or the GPU, and referenced by either. In all previous generations GPU caches had to be flushed at command buffer boundaries prior to CPU access. And unlike discrete GPUs, the CPU and GPU share a high speed coherent bus

6. GPU compute context switch and GPU graphics pre-emption: GPU tasks can be context switched, making the GPU in the APU a multi-tasker. Context switching means faster application, graphics and compute interoperation. Users get a snappier, more interactive experience. As UI's are becoming increasing more touch focused, it is critical for applications trying to respond to touch input to get access to the GPU with the lowest latency possible to give users immediate feedback on their interactions. With context switching and pre-emption, time criticality is added to the tasks assigned to the processors. Direct access to the hardware for multi-users or multiple applications are either prioritized or equalized

As a result, HSA is a purpose designed architecture to enable the software ecosystem to combine and exploit the complementary capabilities of CPUs (sequential programming) and GPUs (parallel processing) to deliver new capabilities to users that go beyond the traditional usage scenarios. It may be the first time a processor company has made such significant investment primarily to improve ease of programming!

In addition on an HSA architecture the application codes to the hardware which enables user mode queueing, hardware scheduling and much lower dispatch times and reduced memory operations. We eliminate memory copies, reduce dispatch overhead, eliminate unnecessary driver code, eliminate cache flushes, and enable GPU to be applied to new workloads. We have done extensive analysis on several workloads and have obtained significant performance per joule savings for workloads such as face detection, image stabilization, gesture recognition etc…

Finally, AMD has stated from the beginning that our intention is to make HSA an open standard, and we have been working with several industry partners who share our vision for the industry and share our commitment to making this easy form of heterogeneous computing become prevalent in the industry. While I can't get into specifics at this time, expect to hear more about this in a few weeks at the AMD Fusion Developer Summit (AFDS).

So you see why HSA is different and why we are excited :)
Items 2, 3, 4 and 5 above are part of the efficiencies in the GA Tech paper; not mentioned is DRAM in the SOC, or the SOC at all. That is probably waiting for the Developer Summit. As mentioned before, E3 and the AMD Fusion Developer Summit occur at nearly the same time.
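To make items 2-5 a little more concrete, here is a minimal sketch of what the same idea already looks like in OpenCL on a shared-memory APU: instead of allocating a device buffer and copying into it, a buffer is created over host-visible memory and mapped, so the CPU and GPU work against the same allocation ("zero copy"). Whether the map is truly copy-free depends on the platform and driver, and none of this is PS4 code; it is only to illustrate what the HSA features buy you:

```c
/* Illustrative only: contrasts a copy-style buffer with a host-visible ("zero copy")
   buffer in plain OpenCL. Error handling is omitted for brevity. */
#include <CL/cl.h>
#include <stdio.h>

int main(void)
{
    cl_platform_id plat;  cl_device_id dev;  cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    const size_t n = 1024, bytes = n * sizeof(float);

    /* Discrete-GPU style: a device buffer initialised by an explicit host-to-device copy. */
    float host_copy[1024] = {0};
    cl_mem copied = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                   bytes, host_copy, &err);

    /* APU/HSA style: a host-visible buffer the CPU fills in place and the GPU reads directly. */
    cl_mem shared = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_ALLOC_HOST_PTR,
                                   bytes, NULL, &err);
    float *p = (float *)clEnqueueMapBuffer(q, shared, CL_TRUE, CL_MAP_WRITE,
                                           0, bytes, 0, NULL, NULL, &err);
    for (size_t i = 0; i < n; i++)
        p[i] = (float)i;                          /* CPU produces/prefetches the data */
    clEnqueueUnmapMemObject(q, shared, p, 0, NULL, NULL);
    clFinish(q);                                  /* GPU kernels would consume 'shared' here */

    printf("buffers ready\n");
    clReleaseMemObject(copied);  clReleaseMemObject(shared);
    clReleaseCommandQueue(q);    clReleaseContext(ctx);
    return 0;
}
```

HSA's pageable memory and full coherency (items 4 and 5) go further still: even the map/unmap bookkeeping and the cache flushes at command-buffer boundaries are meant to disappear.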

"Finally, AMD has stated from the beginning that our intention is to make HSA an open standard, and we have been working with several industry partners who share our vision for the industry and share our commitment to making this easy form of heterogeneous computing become prevalent in the industry." IBM, Samsung, Global Foundries and Khronos confirmed and Sony most likely (Cell was the first attempt at HSA). Sony is sharing technology with Samsung, both are using Gnome technology for their browser and Samsung has confirmed a browser desktop UI based on GTKwebkit for Tizen and my opinion is that Sony will do the same with the Vita and PS3.

"While I can't get into specifics at this time" why? does it give away NDA information about PS4 or next Xbox?

"it is critical for applications trying to respond to touch input to get access to the GPU with the lowest latency possible to give users immediate feedback on their interactions" Mentioned in a Samsung paper on Tizen and webkit2 is touchscreen response is slower. AMD to get into touchscreen devices in a big way? SOC the next step to handhelds, game controllers, CE controllers? Near zero power standby is an AMD feature.
 
It is entirely possible that 2 X86 CPUs were taken out of the PS4 AMD-only design and two 1PPU4SPU modules substituted. This is wild speculation and several things must also be speculated to support it. 1) A slimmer Slim is coming with a redesigned Cell built with two 1PPU4SPU modules in a SOC with 3D stacked memory, I/O and GPU. It will essentially be a complete PS3 in one SOC @ 28nm. It will allow for a major cost reduction as well as provide an economy of scale for the SPU module. 2) Sony has plans for the SPU building block in other platforms. 3) The slimmer Slim will be the PS3.5 we have been speculating about (Digitimes rumor) and should ship before the PS4, sometime late this year. Either this is all true or none of it is.

There are many rumors and they appear to contradict themselves. Understanding them requires a very wide view of what is possible. Without knowing about the coming AMD SOC, 3D stacking, HSA requirements, 3D stacked memory (faster than XDR2 even with standard I/O) being cheaper and all this coming on line in 2012 ramping up to full production in 2013, you can't make sense of the rumors.

The multiple methods in the Sony patent for configuring the 1PPU4SPU module might be for 1) the PS3.5, 2) the PS4, and other platforms. Mentioned earlier in the same year (2010) was that Sony had no plans to refresh the PS3 @ 32nm; they were waiting on something. Maybe they were waiting for the IBM/Global Foundries/Samsung consortium "building block" SOCs and 3D stacked memory to come on-line.

Were PS4 developers told to use only OpenCL and HSAIL (HSA IL) at this time? Only upper-level languages and APIs?

Edit: A PS3.5 built with the features I speculated is going to have to emulate a PS3; to do so, it's going to need more memory and be faster at some operations, or have unused processors that could help with emulation; 2 1PPU4SPU modules would have 1 PPU and 1 SPU free. I suspect that 3D memory wafers for the PS4 and next Xbox are being produced now; they will most likely be 1-gig wafers. Many are going to be partially defective and could be used in a PS3.5, which is going to need some amount of memory above 512MB.

Much of the same code to emulate a PS3 in a PS3.5 could emulate a PS3 in a PS4. Assumption is that some of the I/O and hardware in a PS4 will be the same as in a PS3.5. Think stacked memory and stacked GPU with partially defective subsets (wafers) used for the PS3.5.

Another really wild question: could a PS3.5 SOC also emulate an Xbox 360? Could the Oban (Japanese name) be such a SOC, made by IBM for both Microsoft and Sony, and according to rumors, being made now? It is way too early for a next Xbox console to be manufactured. Remember the domain name registrations Microsoft-Sony.com and Sony-Microsoft.com!

Edit: All the above hinges on 3D stacked memory in gigabyte quantities being available. DRAM in 80-megabyte quantities stacked on processors is now available.
 
Keep up the great work Mr. Rigby. Really enjoying your hard work and analysis.

+1 A really interesting read everyday.

I can only hope that the majority of those speculations and insights come true - for Sony. The MS possibility in this case makes me shiver since Sony really needs to put on their A game to succeed in the next generation.

With Wii-U power probably being in the region of PS3+/360+ (this is not to troll, but I simply doubt that a console which launches in 2012 is as powerful as a PS4 in 2013/14), a cheaper and slim PS3(.5) might indicate a 2014 launch for the PS4.
 
It is entirely possible that 2 X86 CPUs were taken out of the PS4 AMD-only design and two 1PPU4SPU modules substituted. This is wild speculation and several things must also be speculated to support it. 1) A slimmer Slim is coming with a redesigned Cell built with two 1PPU4SPU modules in a SOC with 3D stacked memory, I/O and GPU. It will essentially be a complete PS3 in one SOC @ 28nm. It will allow for a major cost reduction as well as provide an economy of scale for the SPU module. 2) Sony has plans for the SPU building block in other platforms. 3) The slimmer Slim will be the PS3.5 we have been speculating about (Digitimes rumor) and should ship before the PS4, sometime late this year. Either this is all true or none of it is.

There are many rumors and they appear to contradict themselves. Understanding them requires a very wide view of what is possible. Without knowing about the coming AMD SOC, 3D stacking, HSA requirements, 3D stacked memory (faster than XDR2 even with standard I/O) being cheaper and all this coming on line in 2012 ramping up to full production in 2013, you can't make sense of the rumors.

The multiple methods in the Sony patent for configuring the 1PPU4SPU module might be for 1) the PS3.5, 2) the PS4, and other platforms. Mentioned earlier in the same year (2010) was that Sony had no plans to refresh the PS3 @ 32nm; they were waiting on something. Maybe they were waiting for the IBM/Global Foundries/Samsung consortium "building block" SOCs and 3D stacked memory to come on-line.

Were PS4 developers told to use only OpenCL and HSAIL (HSA IL) at this time? Only upper-level languages and APIs?

The idea of a 1PPU 4SPU chip sounds highly probable. They aren't dropping Cell for their HDTVs, and neither is... Sharp? (don't remember which brands use the Cell in their newer HDTVs). The 2 modules, as you said, can be used to reduce the cost of the PS3 while still creating Cells for their other hardware. This also would remove the problem of having to get simulation/emulation of the PS3 hardware running on the stream processors of the AMD chips.

My question, and you can probably take a stab at this answer: how would 2 Cell modules work in tandem with the rest of the HSA model? Seems like coding would become a nightmare, unless OpenCL can unify that system or something.
 
The idea of a 1PPU 4SPU chip sounds highly probable. They aren't dropping Cell for their HDTVs, and neither is... Sharp? (don't remember which brands use the Cell in their newer HDTVs). The 2 modules, as you said, can be used to reduce the cost of the PS3 while still creating Cells for their other hardware. This also would remove the problem of having to get simulation/emulation of the PS3 hardware running on the stream processors of the AMD chips.

My question, and you can probably take a stab at this answer: how would 2 Cell modules work in tandem with the rest of the HSA model? Seems like coding would become a nightmare, unless OpenCL can unify that system or something.
Toshiba. And the idea of HSA IL and OpenCL is to make the various different CPUs easier to use, and in some cases totally platform/CPU independent. HSA is designed to support multiple different CPUs: X86, ARM, FPGA, PPC-SPU. The programmer will have to know about the different CPUs and be able to choose the best one for the job... or the OS can do so???

Beyond this I am out of my depth.
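As a rough illustration of that "platform/CPU independent" idea (a minimal sketch against today's OpenCL, nothing PS4-specific): the same kernel source is built and launched unchanged on whichever device type the runtime exposes; only the device-selection flag changes. HSA/HSAIL aims to push the same model further so the runtime or the OS can do the choosing.

```c
/* Illustrative only: one kernel source, device type picked at run time ("cpu" or default GPU). */
#include <CL/cl.h>
#include <stdio.h>
#include <string.h>

static const char *src =
    "__kernel void add1(__global float *a) {"
    "    size_t i = get_global_id(0);"
    "    a[i] += 1.0f;"
    "}";

int main(int argc, char **argv)
{
    cl_device_type type = (argc > 1 && strcmp(argv[1], "cpu") == 0)
                              ? CL_DEVICE_TYPE_CPU : CL_DEVICE_TYPE_GPU;
    cl_platform_id plat;  cl_device_id dev;  cl_int err;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, type, 1, &dev, NULL);

    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);
    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel k = clCreateKernel(prog, "add1", &err);

    float data[256] = {0};
    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, &err);
    size_t global = 256;
    clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
    clEnqueueNDRangeKernel(q, k, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);
    printf("data[0] = %.1f\n", data[0]);          /* 1.0 on either device type */

    clReleaseMemObject(buf); clReleaseKernel(k); clReleaseProgram(prog);
    clReleaseCommandQueue(q); clReleaseContext(ctx);
    return 0;
}
```

The kernel never changes; only the enqueue target does, which is the portability HSAIL/OpenCL are after, whatever mix of X86, ARM or PPC-SPU ends up behind it.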
 

onQ123

Member
The idea of a 1PPU 4SPU chip sounds highly probable. They aren't dropping Cell for their HDTVs, and neither is... Sharp? (don't remember which brands use the Cell in their newer HDTVs). The 2 modules, as you said, can be used to reduce the cost of the PS3 while still creating Cells for their other hardware. This also would remove the problem of having to get simulation/emulation of the PS3 hardware running on the stream processors of the AMD chips.

My question, and you can probably take a stab at this answer: how would 2 Cell modules work in tandem with the rest of the HSA model? Seems like coding would become a nightmare, unless OpenCL can unify that system or something.

Virtualization

DesignCon Keynote Speaker: AMD's Joe Macri on Heterogeneous Computing
 

i-Lo

Member
yep & that's why I hope the talk about FPGA is true , so they can use each part of the PS4 for running code that it's good at without having a hard time coding for the CPU, GPU & FPGA.

So what would it exactly mean in terms of practical performance (gains)?
 
The reference to gigabyte 3D stacked wafers being tested attached to quad-core CPUs was from 2008. IBM 3D stacking is going on-line in 2012, ramping up to full production in 2013, and one of the first products is the logic layer for the 3D stacked memory from Micron and Samsung. The Global Foundries "process optimized building blocks" and custom SOCs are going on-line in 2012 with a ramp up to full production in 2013. Global Foundries/AMD + IBM TSV 3D stacking going on-line at the same time MIGHT indicate something, but what?

http://chipdesignmag.com/lpd/blog/2011/10/06/samsung-micron-unveil-3d-stacked-memory-and-logic/ said:
What becomes particularly interesting with 3D memory is the possibility of using the memory much more judiciously with heterogeneous cores so only the resources that are needed are actually used. That can save on power while also reserving enough performance for those applications that require more memory and processing power. These memories can be used both in 3D stacks, as well as 2.5D stacked configurations where the memory is connected through an interposer layer.

Both Graham and Pablo Temprano, director of DRAM and graphics marketing at Samsung Semiconductor, acknowledged there are numerous possible scenarios for using this technology. They noted that some customers also are looking at using 3D stacked memory to replace some of the cache on a chip because moving data in and out of memory can be extremely fast.
http://www.infoneedle.com/posting/100175?snc=20641
(A.6) Wide I/O DRAM
Recently, the Hybrid Memory Cube (HMC) Consortium, which includes companies like Micron, Intel, Altera, Samsung, Open Silicon, Xilinx, and IBM, created an entirely new technology (http://www.micron.com/innovations/hmc.html). The end result could be a high bandwidth (15x more than DDR3), low power (70% less energy per bit than DDR3), and small form-factor (90% less space than RDIMMs) product, as shown schematically in Figure 2a, and it is planned to be fabricated with one of IBM's via-middle TSV technologies shown in Figure 2b [1]. The cross-section is shown in Figure 2c and the logic/memory interface (LMI) follows the new JEDEC Wide I/O SDR (JESD229) standard (http://www.jedec.org/), which is shown in Figure 2d.
Normally a chip or SOC design is a 2+ year effort, and there is no way a new memory with a new standard could be used without a large lead time. It's now possible to reduce time to market (from AMD) using process-optimized STANDARDized building blocks in a SOC. That's half the story; as seen in the picture, a SOC substrate with TSVs or bumps must be designed and manufactured. IBM probably has software to design and build a substrate for a SOC using "building blocks" built to known standards with little lead time.

The Hybrid Memory Cube appears to be a serial standard like SATA for Hard disks. As such it is probably not practical inside a SOC but the HMC without the logic layer is made up of stacked ultrawide I/O memory wafers built to a standard that might be used for other applications.

The PDF on the Hybrid Memory Cube confirms a serial interface, with the maximum transfer speed of 1Tb/sec only possible with an optical bus.
http://www.edn.com/article/521730-Microsoft_joins_Micron_memory_cube_effort.php said:
Micron says it will deliver early next year 2 and 4 Gbyte versions of the Cube providing aggregate bi-directional bandwidth of up to 160 Gbytes/second.

Separately, the Jedec standards group is working on a follow on to the 12.8 Gbit/second Wide I/O interface that targets mobile applications processors. The so-called HB-DRAM or HBM effort is said to target a 120-128 Gbyte/second interface and is led by the Jedec JC-42 committee including representatives from Hynix and other companies.
Micron is making 2-gig and 4-gig stacks of ultrawide I/O memory and IBM is making the logic layer; it's not known at this time who is assembling the HMC. The 2-gig and 4-gig Micron memory stacks without the logic layer could be used for other purposes, and IBM has the TSV connection template for the Micron memory because they make the logic layer.

Micron Stockholder meeting August 2011:

Graphics and consumer. Fair to say, a little bit of a slowdown here, specifically in the DTV segment. I'll speak more about what's happening in game consoles as well. A pretty good push for more memory coming up in the Game Console segment as a level of redesigns. We'll start to hit it over the next couple of years.

And talking about consumer again here. I thought it'd be beneficial to show you across a couple of key applications how this looks in terms of megabyte per system. On the left, what we have are game consoles. This is a space that's been pretty flat for a number of years in terms of the average shipped density per system. That's going to be changing here pretty quickly. I think everyone realizes that these systems are somewhat clumpy in their development. The next generation of system is under development now and that because of 3D and some of the bandwidth requirements, drives the megabyte per console up fairly quickly. So we're anticipating some good growth here.

We've worked with a number of these vendors specifically on both custom and semi-custom solutions in that space.

I hear that "target Specs" for the PS4 is for GDDR5 memory:

http://www.brightsideofnews.com/news/2011/11/30/radeon-hd-7000-revealed-amd-to-mix-gcn-with-vliw4--vliw5-architectures.aspx said:
A rumor recently exploded that HD 7900 Series will come with Rambus XDR2 memory. Given the fact that AMD has a memory development team and the company being the driving force behind creation of GDDR3, GDDR4 and GDDR5 memory standards - we were unsure of the rumors.

Bear in mind that going Rambus is not an easy decision, as a lot of engineers inside AMD flat out refuse to even consider the idea of using Rambus products due to the company's litigious behavior. However, our sources are telling us that AMD is frustrated that the DRAM industry didn't make good on the very large investment on AMD's part, creating two GDDR5 memory standards: Single Ended (S.E. GDDR5) and Differential GDDR5. Thus, the company applied pressure to the memory industry in bridging GDDR5 and the future memory standard with XDR2 memory. The production Tahiti part will utilize GDDR5 memory, though.

Is AMD going to continue investing in future memory standards? We would say yes, but with all the changes that have happened, it just might take the executive route to utilize available market technologies rather than spending time and money on future iterations of GDDR memory. After all, AMD recently reshuffled their memory design task force. In any case, Differential GDDR5 comes at very interesting bandwidth figures and those figures are something AMD wants to utilize "as soon as possible".
Product list of Hynix:

GDDR5
- 2Gb, x32/x16, 7.0Gbps: Game console, Desktop, Notebook, Workstation, HPC
- 1Gb, x32/x16, 6.0Gbps

So game console(s) are going to use GDDR5 (which one and how much?). Developer platforms using existing NON-SOC hardware have two 64-bit DDR3 interfaces for the Fusion APU, and if they are using a faster, more efficient 2nd GPU they could use GDDR5 for it. All I/O and memory transfer between the second GPU and the APU is through the PCIe bus; they don't have a common memory bus. This is a PC design, not a game console.

The final-design SOC could have the second GPU migrate into the SOC, and then it wouldn't use GDDR5 memory but some faster-than-DDR3 main memory, which should be 3D stacked memory; there is nothing faster than DDR3 until later this year when DDR4 is released (slower than GDDR5). But guys with access to developers are stating that documentation gives the target memory as 2 gigs of GDDR5, which does not make sense with current designs unless the second GPU and SOC share the same GDDR5 memory bus rather than using the PCIe bus like PCs do. That's a significant design change.
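For a sense of why the memory choice matters so much, here is the back-of-the-envelope peak-bandwidth arithmetic, a minimal sketch where the bus widths and per-pin rates are illustrative assumptions (the 7.0Gbps figure comes from the Hynix list above), not leaked PS4 numbers:

```c
/* Peak bandwidth (GB/s) = bus width in bits x per-pin data rate in Gbps / 8.
   Widths and rates below are illustrative assumptions, not console specs. */
#include <stdio.h>

static double peak_gb_per_s(int bus_width_bits, double gbps_per_pin)
{
    return bus_width_bits * gbps_per_pin / 8.0;   /* bits -> bytes */
}

int main(void)
{
    /* Developer-platform style: two 64-bit DDR3-1600 channels (1.6 Gbps per pin). */
    printf("DDR3-1600, 128-bit  : %6.1f GB/s\n", peak_gb_per_s(128, 1.6));  /* ~25.6 */
    /* The 7.0 Gbps GDDR5 parts from the Hynix list, on a 256-bit bus. */
    printf("GDDR5 7Gbps, 256-bit: %6.1f GB/s\n", peak_gb_per_s(256, 7.0));  /* ~224 */
    return 0;
}
```

These are peak numbers only; sustained bandwidth depends on access patterns, which is where the stacked/wide-I/O efficiency arguments come back in.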

Using key words in the AMD cite above (GDDR5 Differential) brings up a 2011 PDF from Hynix memory
3 Options after GDDR5:

• GDDR5 Single-ended I/O
- Max. 8Gbps with the same power

• GDDR5 Differential I/O
- Max. 14Gbps with much more power

• HBM* (Wide I/O with TSV), High Bandwidth Memory, migrating to mainstream 2-3 years after high-end segmentation
- Lower speed with many I/Os and low power => handheld low-power DRAM
- Upgradable DRAM speed => using higher-power, faster DRAM at higher clock speed; not for handheld
- Increasable # of I/Os
- Flexible # of stacks
- Graphics card, HPC, Workstation

Interim solution between DDR3 and HBM: GDDR5M
- Speed: Max. 4.0Gbps @ 1.35V
- I/O: x16/x8

Based on the strong partnership with AMD, Hynix navigates the best graphics solutions of each system for the future today.
"Differential GDDR5 comes at very interesting bandwidth figures and those figures are something AMD wants to utilize "as soon as possible". "the company applied pressure to the memory industry in bridging GDDR5 and the future memory standard with XDR2 memory" GDDR5 + some of the XDR2 memory features = differential GDDR5.

It's possible that the final design could have the second GPU and SOC share the same differential GDDR5 memory bus rather than using the PCIe bus like PCs do. That's a significant design change from developer platforms and not mentioned in any roadmaps. A differential "XDR"-style GDDR5 bus would be needed, just as the PS3 needed XDR1, because of the speed and length of the bus lines outside the GPU and SOC. This would allow efficiencies for the CPU to prefetch for the second GPU, as well as a common memory for zero copy.

I still think the SOC should have 100 megs or so of ultrawide I/O memory, which may or may not be 3D stacked. The best long-term, cheapest solution is to have the second GPU in the SOC with 2 gigs of 3D stacked ultrawide I/O memory, which appears to be on the roadmap and, as mentioned before, timing puts as a possibility for 2014. Developers were told the target spec is GDDR5, which with something like a differential bus or a wider-than-normal data bus can increase memory bandwidth. Also, with the custom packaging mentioned above, a reduction in the number of chips that have to be attached to the motherboard can reduce the drive voltage and current.

The tradeoffs for differential-bus memory are a more expensive motherboard (more traces) and memory that uses more power, but the memory bus drivers in the SOC and GPU would be driving a lower voltage and should run cooler.

AMD has access to wide I/O DRAM interface

EDIT:

It appears that it's possible to have 2 gigs of wide I/O inside the SOC now, but 4 gigs would have to wait and/or be more expensive. The following needs to be understood completely: quad-channel DDR3 or DDR4 ("don't think about it") is not going to be in a future design, for the same reason the PS3 Cell @ 40nm can't be easily scaled to 32nm: the XDR interface is too large, just like 4 DDR channels in an AMD Fusion would be. The next process-node shrink would have issues, and that should be part of Sony's and Microsoft's long-range plans. A custom memory interface is an absolute MUST!

http://www.amdzone.com/phpbb3/viewtopic.php?f=532&t=139005&start=50#p218132 said:
DDR4 is not much faster than DDR3 ... it all depends on latencies (timings). Also DDR4 tends to consume more relative power for the higher "speeds" ... better would be AMD to launch a LR (load reduced) non-registered non-ECC high speed standard!... just a crazy idea!... but since AMD is now in the DRAM business also, it could make better with what is already there.

About a quad channel APU, just DON'T think about it... DRAM DIMM interfaces are absolutely HUGE and scale terribly bad with smaller process nodes ... the necessary lane layout would mean a much larger chip than necessary, perhaps curtailing the possibility of ULV 17W bins. Also it defeats the purpose for mobile, entry to mainstream desktop markets...

The better way to deal with the memory bandwidth problem of APUs is to have TSV eDRAM... IBM could help in the design ( i think AMD already has a license for the macro designs) and also the Wide I/O DRAM interface of which consortium AMD is part, all fits like a charm.

So after Kaveri we can have an APU with 512MB to 1GB of TSV DRAM on the package, no POPs, no interposers; cooling solutions already exist, so the "execution" parts don't have to lose much (if anything).

heck! it can also have an interposer on package, like Haswell will have, with 512MB to 1GB of DRAM

Heck this is nothing new, quite before any Haswell, only it could be way much better... and instead of a Radeon it will be an APU on that package... and by the time it will be necessary it can have 2 DRAM chips instead of 4, cause TSV is starting to be big among DRAM IDMs, and those chips can have up to 8 stacked dies (4Gbit/die each means 4GB/chip or 8 GB for total of 2 DRAM chips on package -> meaning 1 to 2GB total will be relative inexpensive... heck! 4 DRAM dies + 1 controller (a LR type ) and it will be 2GB in one single chip (with up to 128bit interface on the Wide I/O standard)... for the GPGPU on the APU!..

[attached image: 17a.jpg]

Micron stated in a stockholder meeting that they are providing custom and semi-custom memory for next generation game consoles.

2 gigs of stacked RAM in the SOC (one to two layers high only; more and fill/heat issues crop up) would be an efficient and cost-effective design. More memory at this time would have to be outside the SOC (probably a Load Reduced DDR3, 128-bit-wide custom package) but could be attached to a package like in the above picture. These are the kinds of design changes that can take place; the SOC is probably locked at this point.

In this picture on the bottom right is a prototype SOC with 2 memory chips inside the SOC (two rectangles on the right).

[attached image: 15.jpg]


So either way, inside the SOC or outside on the package, is possible. Trace length is not an issue with either method, but the number of pins, i.e. how wide the memory bus (I/O) is, is an issue. Inside the SOC you could have a 512-bit bus; outside the SOC, on the package, a 256- or 128-bit bus.

Only if there is going to be a second GPU outside the SOC would GDDR5 memory be used. DDR3 memory as a second pool for the GPU is not going to happen if it's a HSA design. Rumors of split pool and odd memory sizes are probably from developer platforms which can only approximate a final design using current hardware made for PCs.

Over and over I see memory-wall issues mentioned (memory bandwidth has not kept up with CPU needs) and APU Fusion SOCs needing large memory bandwidth. Game consoles will push against the memory wall even harder.

This is the point of the SemiAccurate post on the PS4: "Stacked memory and lots of it". Stacked memory uses TSVs to stack like-on-like memory to increase density, which decreases the total trace length on motherboards. This is necessary to have faster memory and also reduces motherboard costs. Game console volumes justify/make practical a custom memory for a game console. The interface between the SOC and memory outside the SOC is another case where game console volume justifies/makes economically practical the cost of a custom interface.

Just to be clear about a custom memory interface: current DDR memory controllers are 64 bits wide, and to create a 256-bit bus you use 4 DDR memory controllers. A custom DDR memory controller connected to custom memory might be ONE 256-bit DDR memory controller connected to between 1 and 4 external stacked memory chips, which would be replaced in the next refresh with an HMC "copper" multi-serial interface or an optical (1Tb/sec) interface. Game consoles don't need expandable memory, so the most cost-effective design is to include the memory inside the SOC with a 256- or 512-bit interface. If this is not possible this generation, then a 256-bit external interface allows for an easy refresh to support 256-bit memory inside the SOC in 2 years.
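To put rough numbers on the wide-and-slow versus narrow-and-fast tradeoff described above, again a minimal sketch with assumed, illustrative per-pin rates rather than anything from the rumors:

```c
/* Same peak-bandwidth formula as before: bits x Gbps per pin / 8.
   All widths and per-pin rates here are assumptions for illustration. */
#include <stdio.h>

static double peak_gb_per_s(int bus_width_bits, double gbps_per_pin)
{
    return bus_width_bits * gbps_per_pin / 8.0;
}

int main(void)
{
    printf("256-bit  @ 6.0 Gbps/pin (external GDDR5)  : %6.1f GB/s\n", peak_gb_per_s(256, 6.0));   /* 192.0 */
    printf("512-bit  @ 3.2 Gbps/pin (in-SOC wide I/O) : %6.1f GB/s\n", peak_gb_per_s(512, 3.2));   /* 204.8 */
    printf("1024-bit @ 1.6 Gbps/pin (ultrawide I/O)   : %6.1f GB/s\n", peak_gb_per_s(1024, 1.6));  /* 204.8 */
    return 0;
}
```

The wide in-package options reach similar peak bandwidth at far lower per-pin speeds, which is why pin count and bus width, rather than trace length, become the limiting factor once the memory moves into the SOC or onto the package.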
 
Wait... so... is that last part saying that they're (allegedly) using XDR2 instead of GDDR5 for their latest GPU because it doesn't require them to spend a lot on R&D? Even though they aren't fans of Rambus?
 
Speaking of which, Rambus is hosting Barclays Global Technology, Media and Telecommunications Conference RIGHT NOW

http://investor.rambus.com/eventdetail.cfm?EventID=114245

Click to login, just type in name, student.. email

EDIT... just double posted... Derp.

-No idea who is talking right now...
-Just talked about ultra thin memory formats that they have.
-Currently talking about ...security business... by end of year they will be incorporated into many set top boxes and other DRM related products... "Cryptofirewall"... related press release from Rambus earlier this year.
-Nothing interesting yet...
-DPA Countermeasures mentioned for "game consoles"
-Implementation of... (I yawned... fuck) into SOCs... guessing Cryptofirewall (it has a hardware component). He said it's required to do this to prevent bypassing and things like counterfeiting. He said he mentioned hardware companies before... I joined this late... Samsung was just mentioned for "Vertical technology"...

Technical difficulties? =/...

They'll have it available afterward anyway...

GlobalFoundries is plastering on the walls of their website that they have already ramped up full production of "HKMG" products like Llano from AMD.
Also, Rambus and GlobalFoundries have some low power HKMG memory going on. GF also gave Rambus an award for Best Innovator last year... I'm not good at piecing things together like Jeff.. So I'm just posting what I've found.
 

StevieP

Banned
Find me some (any) orders of XDR2 in any company's production pipelines. I'm talking chips being stamped out somewhere. I'm not talking about those rumours of Tahiti shipping with XDR2, because its top-end model (7990) has 6GB of GDDR5
 
Find me some (any) orders of XDR2 in any company's production pipelines. I'm talking chips being stamped out somewhere. I'm not talking about those rumours of Tahiti shipping with XDR2, because its top-end model (7990) has 6GB of GDDR5
phosphor112 misread; it's something we have not heard about. "Differential GDDR5 comes at very interesting bandwidth figures and those figures are something AMD wants to utilize 'as soon as possible'." "The company applied pressure to the memory industry in bridging GDDR5 and the future memory standard with XDR2 memory." GDDR5 + some of the XDR2 memory features = differential GDDR5. Which is probably Micron's: "We've worked with a number of these vendors specifically on both custom and semi-custom solutions in that space."

Developer platforms using existing NON-SOC hardware have two 64-bit DDR3 interfaces for the Fusion APU, and if they are using a faster, more efficient 2nd GPU they could use GDDR5 for it. All I/O and memory transfer between the second GPU and the APU is through the PCIe bus; they don't have a common memory bus. This is a PC design, not a game console.

It's possible that the final design could have the second GPU and SOC share the same GDDR5 memory bus rather than using the PCIe bus like PCs do. That's a significant design change from developer platforms and not mentioned in any roadmaps. A differential "XDR"-style GDDR5 bus would be needed, just as the PS3 needed XDR1, because of the speed and length of the bus lines outside the GPU and SOC. This would allow efficiencies for the CPU to prefetch for the second GPU, as well as a common memory for zero copy.

The tradeoff is a more expensive motherboard and memory. I still think the SOC should have 100 megs or so of ultrawide I/O memory, which may or may not be 3D stacked. The best long-term, cheapest solution is to have the second GPU in the SOC with 2 gigs of 3D stacked ultrawide I/O memory.
 
Find me some (any) orders of XDR2 in any company's production pipelines. I'm talking chips being stamped out somewhere. I'm not talking about those rumours of Tahiti shipping with XDR2, because its top-end model (7990) has 6GB of GDDR5

Maybe it'll be used for the SOC? Smaller and faster than DDR3, DDR4, and GDDR5. GF and Rambus are working on 28nm designs for "advanced SOC development." The only company that I can think of that has extensively done SOC development is AMD.

EDIT: How would a "differential 'XDR' bus GDDR5" work?... Are you saying for example it would be like a GDDR5 with... lets say... flexIO? or some "XDR" feature to increase performance?... why wouldn't they just go with XDR2 at that point?
 

StevieP

Banned
Maybe it'll be used for the SOC?

Currently, XDR2 exists only in the form of engineering documents on computers inside Rambus - and it has been this way since 2005.

There were some hopeful rumours in regards to Tahiti sporting it, but then we had a 7990 with 6GB of GDDR5 show up
 
Interesting off-topic find here though: in 2008 Toshiba was saying Cell & 4 SPUs:

New applications by SpursEngineTM

Super-Real-Time Transcoding: Transcoding at faster than real-time
Indexing: Video categorizing during HDD storing, DVD burning
Gesture I/F: Control various devices in the living room by hand gestures
Super-Resolution: Picture resolution upconverting for HDTV
Face Tracking: Realtime 3D face tracking for communication tools
Interactive Gaming: New type of real-time game with Gesture I/F and Face Tracking
Editing: Video editing of consumer generated content

Sony says (but this is a 2008 PDF, and since then GPUs have improved, OpenMP and OpenCL have advanced, and there are now extensions to C++; a small sketch of the task-parallel vs. data-parallel distinction follows after the quoted list):

SPE
Very good performance of general instructions
if(), switch(), for(), while() are fast in C/C++ language
Capable for different processing in parallel (Task parallel model)
2 SPEs for Physics engine, 2 SPEs for vision recognition, 2 SPEs for codec

GPGPU
Limited performance on general instructions
Not good for different processing in parallel (Task parallel model)
Suitable for processing large data with the same calculation (Data parallel model)

SPE is better for general purpose processing to adopt wide range of programming

ftp://ftp.infradead.org/pub/Sony-PS3/mars/presentations/MARS-SIGGRAPH-2008.pdf
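As a plain-C illustration of the distinction those slides draw (hypothetical workloads, minimal sketch): task parallelism runs different jobs side by side, the way the SPE example splits physics, vision and codec work, while data parallelism applies the same calculation across a large array, which is the model GPGPU is built around.

```c
/* Illustrative only: the two parallelism models from the 2008 slides.
   Workload names are made up; compile with -lpthread. */
#include <pthread.h>
#include <stdio.h>

#define N 1024
static float frames[N];

/* Task parallel: each thread does a different kind of work (SPE-style split). */
static void *physics_step(void *arg) { (void)arg; /* ... simulate ... */ return NULL; }
static void *decode_audio(void *arg) { (void)arg; /* ... decode ...   */ return NULL; }

/* Data parallel: one operation applied to every element (what a GPU kernel does per work-item). */
static void scale_all(float *data, int n, float k)
{
    for (int i = 0; i < n; i++)
        data[i] *= k;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, physics_step, NULL);   /* different tasks... */
    pthread_create(&t2, NULL, decode_audio, NULL);   /* ...run concurrently */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    scale_all(frames, N, 0.5f);                      /* same op across all data */
    printf("done\n");
    return 0;
}
```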
 

AgentP

Thinks mods influence posters politics. Promoted to QAnon Editor.
Since 2008 Cell work has withered and died while GPGPU interest has skyrocketed, along with GPU performance. That was also cheer-leading from a Sony Cell engineer, not an unbiased source.
 

i-Lo

Member
All this tech talk is pretty much like pigs flying over the moon for me. At the end of the day, I want to know if any of these potential technological magnificences have a real chance of being inside the PS4, and if so, what it would culminate into on screen....
 
All this tech talk is pretty much like pigs flying over the moon for me. At the end of the day, I want to know if any of these potential technological magnificences have a real chance of being inside the PS4, and if so, what it would culminate into on screen....

more power and more efficiency and maybe (hopefully?) not costing $599 day one.
 

onQ123

Member
I just saw a picture of the leaked PS4 spec sheet: it's going to have 20GB of RAM (10GB of XDR2 + 10GB of GDDR6), a Cell with 16 PPEs & 128 SPEs, an Nvidia GPU, & all games must be at least 1080p 30fps with full BC


I want to believe lol
 
I just saw a picture of the leaked PS4 spec sheet: it's going to have 20GB of RAM (10GB of XDR2 + 10GB of GDDR6), a Cell with 16 PPEs & 128 SPEs, an Nvidia GPU, & all games must be at least 1080p 30fps with full BC


I want to believe lol

You know. I saw this on the front page thinking "new news?"

While there wasn't anything new...

I didn't leave empty-handed lol.
 

mhayze

Member
I am totally flummoxed by the FPGA suggestion. FPGAs are a cheap way to simulate something that real hardware (fixed ICs) can do much faster, except they are 'field reprogrammable'. You can come up with random stats like FPGAs may be 100X faster than a CPU, but I can't imagine any real-world task that a high-powered modern console would need to do that is worth the cost of FPGA-specific development within the context of a game. Maybe there are some left-field use cases, such as an HDMI controller implemented on an FPGA to future-proof it or something, but I just can't imagine Sony actually doing this. Stranger things have happened, I suppose.
 