Memory bandwidth, shared by processor cores, GPGPUs, and accelerators, is looming as a major bottleneck for scaling up the performance of modern applications. As more cores are integrated onto a single die [4, 14, 37, 51], the demand for memory bandwidth will grow to unprecedented levels. To alleviate this issue, architects have replaced the traditional, heavily contended and congested front-side bus with new interfaces such as integrated memory controllers, AMD’s HyperTransport, and Intel’s QuickPath Interconnect. In spite of these innovations, the bandwidth available to future processors will continue to be restricted by the limited number of pins in the processor’s package. According to the ITRS, the number of package pins will not grow much in the coming decade due to cost and power constraints, and most of the additional pins will deliver power, not data. Therefore, new architectural innovations are needed. One promising solution to this problem is 3D-stacked DRAM.
For single-threaded memory-intensive applications, the SMART-3D architecture achieves speedups of 1.53 to 2.14 over planar designs and of 1.27 to 1.72 over prior 3D designs. We achieve similar speedups for multi-program and multi-threaded workloads on multi-core and multi-socket processors. Furthermore, SMART-3D can even lower the energy consumption of the L2 cache and 3D DRAM because it reduces the total number of row buffer misses.
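To relate these speedups to execution time, the following is a short worked conversion. It assumes the conventional definition of speedup as the ratio of baseline to improved execution time; the symbols $S$, $T_{\text{base}}$, and $T_{\text{new}}$ are introduced here for illustration and are not taken from the text above.

\[
S = \frac{T_{\text{base}}}{T_{\text{new}}}
\quad\Longrightarrow\quad
\frac{T_{\text{base}} - T_{\text{new}}}{T_{\text{base}}} = 1 - \frac{1}{S}
\]

\[
\begin{aligned}
S = 1.53 &: \; 1 - \tfrac{1}{1.53} \approx 0.346 &\quad S = 2.14 &: \; 1 - \tfrac{1}{2.14} \approx 0.533 \\
S = 1.27 &: \; 1 - \tfrac{1}{1.27} \approx 0.213 &\quad S = 1.72 &: \; 1 - \tfrac{1}{1.72} \approx 0.419
\end{aligned}
\]

Under this assumption, the reported single-threaded range over planar designs corresponds to roughly 35% to 53% less execution time, and the range over prior 3D designs to roughly 21% to 42% less.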
Our 3D-stacked DRAM is also simpler to design than traditional DRAM. Overall, our technique is simple to design and highly effective, more than halving program execution time on average.