- Level 0: Legacy solutions
- Level 1: Software on traditional GPUs
- Level 2: Ray/box and ray/tri-testers in hardware
- Level 3: Bounding Volume Hierarchy (BVH) processing in hardware
- Level 4: BVH processing and coherency sorting in hardware
- Level 5: Coherent BVH processing with Scene Hierarchy Generation (SHG) in hardware
As has been pointed out before and elsewhere, ray tracing is neither a new subject nor a new computing technique. The following expands on the six levels proposed by Imagination.
Level 0: There have been many ambitious Level 0 attempts, but all unfortunately failed, and yet new designs with custom APIs continue to be announced. The biggest reason for failure was the discontinuity with how traditional GPUs process data; part of the problem was the attempt to create a new paradigm. Without continuity, a completely new and incompatible ecosystem is imposed, which rules out evolutionary adoption. Imagination Technologies’ OpenRL was the first attempt to provide a link with standard 3D APIs such as OpenGL.
Level 1: Ray tracing has been treated as an app that runs on conventional processors, x86 being the most common. Such a software solution ensures continuity with the existing ecosystem. Compute/shader paths are used to execute ray tracing functionality. However, because a scene can have so many rays in flight simultaneously, a 2-, 4-, or even 16-core CPU will struggle with the computational load. For a realtime experience, one must use many tricks, hacks, and shortcuts as well as limit the resolution.
An example is Adshir’s LocalRay, in which the secondary rays are handled a priori in coherent beams. This improves not only parallelism and performance but cache usage as well. It is not limited in resolution or usage and requires no tricks.
Level 2: Ray-box and ray-triangle tests can be performed with standard fused multiply-add operations on GPUs, but this repeated operation is expensive in cycles, power, and area. A Level 2 solution offloads a large part of the ray tracing job to dedicated ray-box and ray-triangle testers in hardware, improving efficiency.
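To make the cost concrete, here is a minimal software sketch of the two tests a Level 2 design would move into fixed-function hardware. It assumes the common slab method for the ray-box test and the Möller-Trumbore method for the ray-triangle test; the function names and tuple-based vectors are illustrative choices, not taken from any vendor's implementation. Note that nearly every line is a multiply, add, or subtract, and that these run once per ray per candidate box or triangle.

```python
def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def ray_box_hit(origin, direction, box_min, box_max):
    """Slab test (assumed method): does the ray hit the axis-aligned box at t >= 0?"""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this slab: it must start between the planes.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def ray_triangle_hit(origin, direction, v0, v1, v2, eps=1e-9):
    """Moller-Trumbore test (assumed method): hit distance t, or None on a miss."""
    e1, e2 = sub(v1, v0), sub(v2, v0)
    p = cross(direction, e2)
    det = dot(e1, p)
    if abs(det) < eps:            # ray is parallel to the triangle plane
        return None
    inv_det = 1.0 / det
    s = sub(origin, v0)
    u = dot(s, p) * inv_det       # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    q = cross(s, e1)
    v = dot(direction, q) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(e2, q) * inv_det
    return t if t > eps else None
```

For example, a ray starting at (0, 0, -1) and travelling along +z hits the triangle (-1, -1, 0), (1, -1, 0), (0, 1, 0) at distance 1, while the same ray shifted to start at (5, 5, -1) misses it.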
Level 3: Bounding volume hierarchy (BVH) processing provides a more extensive offloading of data flow management to hardware. A BVH cuts down the amount of ray testing needed through a hierarchical testing system, thus making realtime ray tracing possible. Tracing a ray through the acceleration structure is much more complicated than just ray-box and ray-triangle testing. It requires complex and dynamic data flow in which each box test decides what happens next, e.g., more hierarchical box tests and/or triangle tests. There are significant opportunities to streamline this process by moving the full walk of the BVH tree structure into hardware. Doing so can improve execution efficiency, bandwidth, and caching efficiency, enabling the next level of ray tracing acceleration.
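The data-dependent control flow described above can be sketched in a few lines of software. This is an illustrative stack-based traversal, not a description of any particular hardware unit: the `Node` layout, the `hits_box` helper, and the choice to return candidate primitive IDs (rather than run the triangle tests inline) are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    box_min: tuple
    box_max: tuple
    children: list = field(default_factory=list)  # inner node: child Nodes
    prims: list = field(default_factory=list)     # leaf node: primitive IDs

def hits_box(origin, direction, box_min, box_max):
    """Slab test against an axis-aligned box (hypothetical helper)."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True

def traverse(root, origin, direction):
    """Walk the BVH; return (candidate primitive IDs, number of box tests)."""
    candidates, box_tests = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        box_tests += 1
        if not hits_box(origin, direction, node.box_min, node.box_max):
            continue                        # whole subtree is skipped
        if node.children:
            stack.extend(node.children)     # outcome: more box tests
        else:
            candidates.extend(node.prims)   # outcome: triangle tests next
    return candidates, box_tests
```

Each loop iteration branches on the result of the previous box test, so the memory addresses touched next are not known in advance. That unpredictability is what makes the walk awkward for general-purpose shader cores and a natural candidate for dedicated hardware.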
Level 4: BVH processing with coherency sorting in hardware can increase the processing and bandwidth efficiency of ray tracing. Ray tracing struggles with coherency because bouncing rays generate ever more divergence in ray directions. Each ray needs to walk through the BVH structure, and if each ray follows a different path, the result is very poor memory access and caching efficiency. Because divergent rays also hit different objects, they clash with the SIMD nature of all modern GPU architectures: different ray hits mean different shaders. Adding coherency sorting across the rays in flight improves SIMD and BVH memory access efficiency, yielding higher real-world ray rate utilization. This type of hardware coherency engine is similar to Imagination Technologies’ tile-based deferred rendering (TBDR), which uses a unique sorting block to ensure coherent processing of pixels; the coherency engine does the same for rays. A hardware ray coherency sorting engine enables this fourth level of efficiency.
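As a toy illustration of the idea, the sketch below regroups rays so that each SIMD-width batch contains rays heading the same way. The sort key used here, the direction octant (the signs of the three direction components), is an assumption chosen for brevity; a real coherency engine would likely sort on richer information, and nothing here reflects how Imagination's hardware actually does it. Rays are represented as hypothetical `(origin, direction)` tuples.

```python
def octant(direction):
    """3-bit key built from the signs of the direction components."""
    return (int(direction[0] < 0)
            | (int(direction[1] < 0) << 1)
            | (int(direction[2] < 0) << 2))

def make_batches(rays, width=4):
    """Split rays into consecutive SIMD-style batches of `width` rays."""
    return [rays[i:i + width] for i in range(0, len(rays), width)]

def coherency_sort(rays):
    """Reorder rays so those with the same direction octant are adjacent."""
    return sorted(rays, key=lambda ray: octant(ray[1]))

def mixed_batches(batches):
    """Count batches whose rays do not all share one octant."""
    return sum(1 for batch in batches
               if len({octant(ray[1]) for ray in batch}) > 1)

# Eight rays alternating between two opposite directions, as might be
# produced by bounces off a rough surface.
rays = [((0.0, 0.0, 0.0), (1.0, 1.0, 1.0) if i % 2 == 0 else (-1.0, -1.0, -1.0))
        for i in range(8)]

unsorted_mixed = mixed_batches(make_batches(rays))
sorted_mixed = mixed_batches(make_batches(coherency_sort(rays)))
```

In arrival order both batches of four mix the two directions; after sorting, each batch holds rays from a single octant, so the rays in a batch are more likely to visit the same BVH nodes and run the same shader.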
Level 5: Full acceleration of ray tracing processing in hardware. Building an efficient BVH structure is complex and expensive. It can be done on the CPU and/or the GPU using a variety of algorithms and approaches. However, achieving optimal Level 5 efficiency calls for a dedicated hardware solution. A hardware BVH builder enables much higher performance with high efficiency for very detailed dynamic 3D scenes. When this capability is added to a hardware design below Level 4, it can be recognized as a plus level, e.g., a Level 2 Plus solution.
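For a sense of what a BVH builder does, here is a minimal top-down sketch in software. It assumes a simple median split along the longest axis, which is only one of the many approaches mentioned above (production builders often use costlier heuristics), and it is not meant to describe how a hardware builder works. The `(prim_id, box_min, box_max)` item format and the dictionary-based nodes are illustrative assumptions.

```python
def build_bvh(items, leaf_size=1):
    """Build a BVH from items of the form (prim_id, box_min, box_max).

    Median split on the longest axis (assumed strategy for this sketch).
    """
    # Bounding box that encloses every primitive in this subtree.
    box_min = tuple(min(it[1][a] for it in items) for a in range(3))
    box_max = tuple(max(it[2][a] for it in items) for a in range(3))
    node = {"min": box_min, "max": box_max}

    if len(items) <= leaf_size:
        node["prims"] = [it[0] for it in items]
        return node

    # Split along the axis where the box is longest.
    axis = max(range(3), key=lambda a: box_max[a] - box_min[a])
    # Sort by (twice the) box centre on that axis and cut at the median.
    ordered = sorted(items, key=lambda it: it[1][axis] + it[2][axis])
    mid = len(ordered) // 2
    node["children"] = [build_bvh(ordered[:mid], leaf_size),
                        build_bvh(ordered[mid:], leaf_size)]
    return node

def leaf_prims(node):
    """Collect primitive IDs from all leaves, left to right."""
    if "prims" in node:
        return list(node["prims"])
    return [p for child in node["children"] for p in leaf_prims(child)]

# Four unit boxes spread along x, supplied in shuffled order.
scene = [(0, (4, 0, 0), (5, 1, 1)),
         (1, (0, 0, 0), (1, 1, 1)),
         (2, (6, 0, 0), (7, 1, 1)),
         (3, (2, 0, 0), (3, 1, 1))]
bvh = build_bvh(scene)
```

Even this simple builder sorts the primitives at every level of the tree, and for a dynamic scene the work has to be repeated whenever geometry moves, which is why the build step is a significant cost in its own right.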