I don't think people are aware of quite how effective proper 3D/binaural audio is, and unfortunately, much like VR, it's very hard to convey to people who haven't tried it.
Before I put on a VR headset my assumption was that it would just be like a big screen wrapped around in front of me, in 3D, and that it would simply be "more immersive". Even with all the hype and explanations out there, I wasn't prepared for "presence". When a character gets up in your face you really feel them stood there. I thought I'd still be looking into the world, not literally placed within it. The exact same thing can be achieved for audio.
Building a convincing virtual soundscape is a vast technological undertaking in itself, but it's just the beginning. You then have to convey it to the individual, and this appears to be the first solution that not only covers every link in the chain but goes balls to the wall with it, providing the means to do it properly.
You've got the soundscape itself and the audio properties of the assets, you've got the convolution and bouncing of the audio, and you have a hundred or so objects all emitting their own audio within that space, all of them dynamic, mobile and being updated hundreds of times a second... think bullets whizzing through the air, planes flying overhead, raindrops hitting the floor, team mates calling out to you, wind blowing through the trees. You then probe the sound at the player's location and have to run it through a transfer function (the HRTF) that mimics the acoustic properties of a human head and, most importantly, the structure of the outer ear and how sound bounces around it. The more personalised it is, the better.
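To make that last step a bit more concrete, here's a rough sketch in Python/NumPy of how a handful of mono sources could be mixed down to two ears by convolving each one with a left/right impulse response pair. To be clear about what's made up: `render_binaural` and `toy_hrir` are names I've invented for illustration, the "HRIR" is a crude delay-plus-attenuation stand-in rather than measured data, and the inverse-distance gain is just a simple assumption. It has nothing to do with how any console actually implements it.

```python
import numpy as np

def toy_hrir(azimuth_deg, length=64, max_delay=30):
    """Placeholder impulse-response pair (hypothetical, NOT measured HRTF data).

    It only mimics two cues: the far ear hears the sound later and quieter.
    azimuth_deg: 0 = straight ahead, +90 = hard right, -90 = hard left.
    """
    s = np.sin(np.radians(azimuth_deg))            # +1 right, -1 left
    delay_l = int(round(max_delay * max(s, 0.0)))  # left ear lags for right-hand sources
    delay_r = int(round(max_delay * max(-s, 0.0))) # right ear lags for left-hand sources
    left = np.zeros(length)
    right = np.zeros(length)
    left[delay_l] = 1.0 - 0.5 * max(s, 0.0)        # far ear is attenuated
    right[delay_r] = 1.0 - 0.5 * max(-s, 0.0)
    return left, right

def render_binaural(sources, hrir_lookup, n_out):
    """Mix mono sources into a (2, n_out) left/right buffer.

    sources: iterable of (signal, azimuth_deg, distance_m) tuples.
    hrir_lookup: function mapping an azimuth to a (left, right) impulse pair.
    """
    out = np.zeros((2, n_out))
    for signal, azimuth_deg, distance_m in sources:
        gain = 1.0 / max(distance_m, 1.0)          # assumed inverse-distance falloff
        for ch, hrir in enumerate(hrir_lookup(azimuth_deg)):
            wet = np.convolve(signal, hrir)[:n_out] * gain
            out[ch, :len(wet)] += wet
    return out

# A single click placed hard right, 1 m away.
click = np.zeros(100)
click[0] = 1.0
stereo = render_binaural([(click, 90.0, 1.0)], toy_hrir, 200)
```

In a real renderer the impulse responses would come from a measured (ideally personalised) HRTF set, and you'd be doing this for every source every time it moves, which is where the processing cost piles up.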
It's not just stereo but wider, it's not virtual surround, and it's not more fidelity in itself. Your brain should be able to pinpoint the location of everything around you: not just forwards/backwards and left/right, but also height. And not just direction, but how far away things are too. This has functional gameplay implications as well. Competitive online games will be great.
3D audio has appeared many times over the years, often with a lot of trickery, but never with such an expansive number of potentially dynamic audio sources, and never with such a pure path from those sources to the ear.
I was privileged to have a sound engineer create a personalised HRTF profile for me back in 2016, by putting probe mics in my ears and capturing impulse responses of my head/ears. The audio I got to listen to was recorded in New York on the usual Neumann dummy head with in-ear mics, before having additional algorithms applied to it. I listened back on some in-ear monitors worth about £40 and it was the most convincing audio I've ever heard in my life. The striking nature of it made me feel like I was more there than I would be if I was really there. "More human than human?"... "More there than there!".
If this can be achieved in a virtual world where imagination as opposed to reality is the driving force, then that's something truly special. And Cerny said all the right things to make me think it can be done. Even cooler is that they at least want to virtualise this to make the best of less optimal sound devices such as TVs and soundbars.
From what we've heard so far, MS/Ninja Theory have simply said XSX has "3D audio", whereas Cerny has detailed a promising and highly custom solution in hardware, software and potential approaches to custom HRTF. If it turns out that Sony does indeed have the stronger solution then that is honestly more important to me -- and I think in terms of moving the industry forward -- than, e.g., ~1980p vs ~2160p.
However, I don't mean to be fanboyish. I'm just stating what's important to me and how awesome this could be. If XSX comes up with something similar, that's great. The more standardised this is and the more people who at least get a chance to try it, the better.
Even 24 years later, the best quick and dirty example of binaural audio out there is probably the famous QSound Virtual Barbershop. By comparison, its soundscape is constrained and the fidelity relatively poor, and without a tailored HRTF profile it lacks that real sense of locality. Decent earphones are also more effective with it than headphones, and it does not work with speakers. It's still super cool though and a great little taster, though I'm sure half this forum have heard it by now: