The one problem no one seems to have really tackled yet is focus, or depth of field to be more precise. While this tech simulates eye position and movement, coupled with visual cues, to make 3D immersion better than it has likely ever been, there's still the issue that you're looking at a screen on which all objects are constantly in focus. Real eyesight lets you selectively focus on objects based on physical distance, ocular convergence (both eyes rotating inward toward the target), and accommodation (the eye's lens changing shape so that light rays from a given distance converge on the retina). With this tech, you're still looking at a flat (though slightly curved) surface, so every object on screen sits at the same physical distance from your eyes. The brain mostly compensates by working out the difference between the two viewpoints to extrapolate perceived distance, but the focus cue itself is still missing. You can see some of the consequences of focus manipulation in tilt-shift photography, where selective blur alone makes real scenes look like miniatures. So while the Rift will be generally acceptable, and even spectacular, for moving through environments, the effect will likely diminish greatly in more complex close-up scenes with objects at varying distances layered on top of one another. Such a problem is far more important in cinema, which relies heavily on close-ups.
Half the equation is already solved. On the digital side, a virtual engine can easily be made to refocus on any particular object, and on the photographic side there's plenoptic imaging/light field photography, which is slowly gaining traction and allows refocusing after the image has been captured. But in both arenas, there's no way for the user to control focus naturally; it relies entirely on manual adjustment. I expect we'll need some kind of ocular/retinal tracking to simulate this in a more immersive way, but that seems immensely impractical. We'll likely need an entirely new display paradigm, either volumetric projection or, much, much further down the line, direct neural stimulation, both of which will likely come with their own unique issues to solve.
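To make the "engine refocuses, tracker picks the target" idea concrete, here's a rough sketch under stated assumptions, not how any particular engine or headset does it. It uses the standard thin-lens circle-of-confusion formula to compute how blurred each pixel should be, given a focus distance taken from wherever a hypothetical eye tracker says you're looking in the depth buffer. The eye parameters (~17 mm focal length, ~4 mm pupil) and the function names are my own illustrative assumptions. Note that this only fakes the blur cue: your eye's lens still focuses at the screen's fixed optical distance, which is exactly the limitation described above.

```python
from typing import List, Tuple

# Assumed, rough human-eye parameters (illustrative only, not from any headset spec):
EYE_FOCAL_LENGTH_M = 0.017  # ~17 mm
PUPIL_DIAMETER_M = 0.004    # ~4 mm, varies with light level


def circle_of_confusion(object_dist: float, focus_dist: float,
                        focal_len: float = EYE_FOCAL_LENGTH_M,
                        aperture: float = PUPIL_DIAMETER_M) -> float:
    """Thin-lens blur-circle diameter (metres on the image plane) for a point
    at object_dist when the lens is focused at focus_dist (both in metres)."""
    if focus_dist <= focal_len or object_dist <= 0:
        raise ValueError("distances must be positive and beyond the focal length")
    return (aperture * abs(object_dist - focus_dist) / object_dist
            * focal_len / (focus_dist - focal_len))


def gaze_contingent_blur(depth: List[List[float]],
                         gaze: Tuple[int, int]) -> List[List[float]]:
    """Hypothetical helper: focus at the depth under the gaze point reported
    by an (assumed) eye tracker, then return a per-pixel blur size that a
    renderer could feed into a depth-of-field filter."""
    row, col = gaze
    focus_dist = depth[row][col]
    return [[circle_of_confusion(d, focus_dist) for d in line] for line in depth]


if __name__ == "__main__":
    # A tiny 2x3 depth buffer in metres: a near object on the left, a far wall on the right.
    depth_buffer = [[0.5, 0.5, 3.0],
                    [0.5, 1.0, 3.0]]
    for gaze in [(0, 0), (0, 2)]:
        blur = gaze_contingent_blur(depth_buffer, gaze)
        # Print blur diameters in micrometres for readability.
        print(gaze, [[round(c * 1e6, 1) for c in line] for line in blur])
```

Looking at the near object leaves it sharp and blurs the far wall; shifting your gaze to the wall swaps them, which is the natural focus control the manual approaches lack.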
Unless I'm misinterpreting something in all of this *shrug*