
Microsoft Research: Acoustics

A couple of videos were released today from Microsoft Research:

"We present the first wave-based interactive system
for practical rendering of global sound propagation effects
including diffraction, in complex game and VR scenes
with moving directional sources and listeners."


This one is a demo of source directivity: acoustic loudness and direction vary as the source moves and turns.


*Notice that it runs inside Unreal Engine.



On their YouTube channel there are various things, including older presentations demonstrating scene-aware sound effects like ambience, attenuation, reverberation, etc.
Not bad to browse for anyone interested.
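
For anyone who wants a feel for the baseline these techniques improve on, here's a minimal sketch of plain inverse-distance attenuation, the simplest of those scene-aware effects (the textbook point-source model, not anything from MS's wave solver):

```python
import math

def attenuate(gain_db, source_pos, listener_pos, ref_dist=1.0):
    # Classic point-source falloff: -6 dB per doubling of distance.
    # No occlusion, no diffraction, no reverb -- exactly what the
    # wave-based approach in these videos goes beyond.
    d = max(math.dist(source_pos, listener_pos), ref_dist)
    return gain_db - 20.0 * math.log10(d / ref_dist)

# A source 8 m away plays ~18 dB quieter than at the 1 m reference.
print(attenuate(0.0, (8, 0, 0), (0, 0, 0)))  # -> -18.06...
```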
 

CamHostage

Member
Fun stuff, it'll make a difference. Developers will still need to keep dialog at a certain level for interactive cutscenes (they already do some leveling and sometimes some reverb effects for those forced-walk-and-talk sequences), but having sound behave as sound should is going to help convince your brain to accept the reality of 3D scenery more easily, just as having light behave as light should is a big part of what they're focused on with graphics.

By the way, this is generally what Sony's "3D Sound" is built around, correct? I know a lot of people jumped to the idea of virtualized modulation (like Dolby Headphone, with a directional 3D feel on 2 speakers), and Sony is including that too with their HRTF approach, but it sounds more like their system is just about getting the directions and the volume of sound sources right.
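
(For the curious, the HRTF step itself is conceptually simple: you filter the mono source with a per-ear impulse response measured for the sound's incoming direction. A toy sketch; the HRIR arrays below are made up, real ones come from measured datasets:)

```python
import numpy as np

def render_binaural(mono, hrir_left, hrir_right):
    # HRTF rendering in a nutshell: convolve the mono signal with the
    # head-related impulse response for each ear. The direction is
    # "baked into" whichever HRIR pair you pick for that azimuth/elevation.
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right], axis=1)  # (samples, 2) stereo buffer

mono = np.random.randn(48000)         # 1 second of noise at 48 kHz
hrir_l = np.array([0.9, 0.3, 0.1])    # made-up HRIR: source off to the left...
hrir_r = np.array([0.0, 0.4, 0.2])    # ...arrives later and duller at the right ear
stereo = render_binaural(mono, hrir_l, hrir_r)
```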

Do they comment on how computationally expensive this demo is?

I don't recall much mention about audio processing with Xbox Series X? (These Microsoft tech demonstrations are not specific to or using Series X; it's just a general talk about the next-gen sound features that MS has already talked about being part of its plans for the new console.) However, Mark Cerny's talk did go into power for Sony's console: the PS5's dedicated Tempest Engine performs that console's sound work on its SPU-like modules; they're dedicated to the task of sound processing, and audio alone has "roughly the same SIMD power and bandwidth as all 8 Jaguar cores in the PlayStation 4 combined". Although those Tempest Engine units can potentially be repurposed for other things in a game, Tempest Engine is its own hardware unit designed for specific, proprietary algorithms, and its job is primarily to crunch sounds to leave the CPU and GPU at full capacity in their jobs.


(Jump to 38:37 if that link doesn't go there right off)
 
Last edited:

Iced Arcade

Member
I don't recall much mention about audio processing with Xbox Series X?
Must be dark under that rock.

 

CamHostage

Member
Only four posts for a Microsoft thread to turn into a PlayStation thread. That's faster than an SSD!

Heh, sorry, I didn't mean to... I actually was slagging PS5 at first (before editing, I at first posted, "this is basically what Sony's "3D Sound" amounts to, correct?") but I ended up going back in to contextualize that (using PS-based source references I'm admittedly more familiar with), and it inadvertently became a PlayStation thread pivot. I figured somebody more familiar with Xbox would come around and show its side anyway.

(But BTW, it's also a lot friggin' easier to Google "Tempest Engine" than "MS Designed SOC HW Audio Units"...)

So yeah, both consoles, sort of the same thing:

[image: comparison of the two consoles' audio hardware]


Also, the "Planeverb" video from MS is a lot of fun. The walk-and-talk in the OP's video is a good example of the rooms, but hearing the sound muffle as you duck under a ledge or move boxes around or open/close doors is just something people will want to play with.

 
Last edited:

Reallink

Member
I don't recall much mention about audio processing with Xbox Series X? (...) the PS5's dedicated Tempest Engine performs that console's sound work on its SPU-like modules; they're dedicated to the task of sound processing, and audio alone has "roughly the same SIMD power and bandwidth as all 8 Jaguar cores in the PlayStation 4 combined" (...)


Yea, I'm aware both consoles have quite a bit of audio-specific processing power, but I'm curious how expensive something like this demo video is. They're literally doing audio raytracing here, which leads me to question whether they actually have the power to pull this off in games from multiple NPCs, or if it's just tech demos and marketing-puffery smoke and mirrors.
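
(Back-of-envelope on why I'm skeptical: per-source ray budgets multiply out fast. All the numbers here are made up for illustration:)

```python
def rays_per_second(sources, rays_per_source, bounces, updates_hz):
    # Rough cost model for geometric "audio raytracing": every source
    # shoots a fan of rays, each ray can bounce a few times, and the
    # whole thing re-runs at the audio update rate.
    return sources * rays_per_source * (bounces + 1) * updates_hz

# 50 NPC voices, 256 rays each, 3 bounces, 30 updates/s:
print(f"{rays_per_second(50, 256, 3, 30):,} ray casts per second")  # 1,536,000
```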
 

CamHostage

Member
Yea, I'm aware both consoles have quite a bit of audio-specific processing power, but I'm curious how expensive something like this demo video is. They're literally doing audio raytracing here, which leads me to question whether they actually have the power to pull this off in games from multiple NPCs, or if it's just tech demos and marketing-puffery smoke and mirrors.

I don't know what kind of stress-tests of next-gen audio hardware have been made available (maybe somebody can track down another PowerPoint deck from Hot Chips and learn me about that too...), but that's what these audio units do, and pretty much all they do. It looks like they will be fighting for RAM to hold the sound files off storage just like any other content the consoles are processing, but because they're dedicated to only thinking about sound, and because they are specialized for sound processing techniques and their own acoustic rendering, something like 5,000 traditional sound sources can run concurrently (albeit under traditional gaming sound processing techniques, not necessarily this advanced "audio raytracing".) So that doesn't really answer your question, but they're talking about applying presence and locality to "hundreds of advanced sound sources" in-game. (And also, that 5k figure was only mentioned by one of the two console manufacturers; it's not clear to me how similar the other is to the Son... er, to the other one.)

Microsoft Research's 2019 "Project Acoustics" is a familiar introduction to wave physics sound processing, and I don't know how fully relevant that clip is to console power (they talk about Triton technology and "the Power of the Cloud" and whatnot,) but in their demo, they didn't use some it-plays-on-even-a-toaster game like Minecraft to show this killer technology off; they used Gears.
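
(For anyone curious what "wave physics sound processing" means in practice: these systems numerically solve the acoustic wave equation over a voxelized version of the scene, typically offline. A minimal 2D finite-difference sketch of that idea, at toy scale and in no way the actual Triton pipeline:)

```python
import numpy as np

def fdtd_step(p, p_prev, c=343.0, dx=0.1, dt=1e-4):
    # One leapfrog step of the 2D acoustic wave equation
    # p_tt = c^2 * (p_xx + p_yy); c*dt/dx = 0.343 keeps it stable.
    lap = (np.roll(p, 1, 0) + np.roll(p, -1, 0) +
           np.roll(p, 1, 1) + np.roll(p, -1, 1) - 4.0 * p) / dx**2
    return 2.0 * p - p_prev + (c * dt) ** 2 * lap

# Toy run: a pressure pulse in the middle of a 10 m x 10 m grid spreads
# outward, diffracting around anything you carve into the grid.
p_prev = np.zeros((100, 100))
p = np.zeros((100, 100))
p[50, 50] = 1.0
for _ in range(200):
    p, p_prev = fdtd_step(p, p_prev), p
```

The catch is that real scenes need millions of cells and many probe positions, which is presumably why this gets baked offline rather than run live on console.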



BTW, I also don't know much about PC sound systems, but it looks like the NVIDIA RTX approach does not have dedicated sound units; however, they have been developing sound processes for their GPU cores that should put them in this same conversation. (AMD is the same for Zen 2 and probably Zen 3.) Side note, though: NVIDIA has some insanely cool sound features in RTX around reprocessing microphone sound via NVIDIA Broadcast, so look that stuff up too.
 
Last edited:
It looks like they will be fighting for RAM to hold the sound files off storage just like any other content the consoles are processing (...)

They probably won't need to hold any audio files in RAM. The SSDs in the next-gen consoles will be able to stream hundreds of audio files with no seek times and very little bandwidth. I seem to recall one of the Microsoft Studios devs saying they no longer have to worry about fighting with graphics designers for RAM.
 

Yoboman

Member
Platform wars aside, this sort of stuff is generational-leap quality to me

You hear it how it should be, and then the way audio is done currently just sounds so antiquated

Imagine a whole Witcher 3-like cityscape with proper directional and 3D audio sources

Imagine just a simple "follow my voice" gameplay dynamic
 

Kagey K

Banned
Imagine a whole Witcher 3-like cityscape with proper directional and 3D audio sources

Imagine just a simple "follow my voice" gameplay dynamic

That works until you realize that an entire swath of people wear their earphones on the wrong ear all the time and don’t understand how headphones work.
 

Yoboman

Member
That works until you realize that an entire swath of people wear their earphones on the wrong ear all the time and don’t understand how headphones work.
Maybe but you'd figure that out pretty quick when the people on your left have audio coming from your right. These audio advancements will be possible to hear even on TV speakers as well
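
(Even plain stereo gets you the left/right half of this; the standard constant-power pan law is all of a few lines. A generic sketch, not any console's actual mixer:)

```python
import math

def constant_power_pan(pan):
    # pan in [-1, 1]: -1 = hard left, 0 = center, +1 = hard right.
    # Constant-power panning keeps perceived loudness steady across
    # the arc -- left/right placement is all a stereo TV can convey.
    theta = (pan + 1.0) * math.pi / 4.0      # map to [0, pi/2]
    return math.cos(theta), math.sin(theta)  # (left gain, right gain)

print(constant_power_pan(-1.0))  # (1.0, 0.0): fully left
print(constant_power_pan(0.0))   # (~0.707, ~0.707): center, same power
```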
 

Kagey K

Banned
Maybe but you'd figure that out pretty quick when the people on your left have audio coming from your right. These audio advancements will be possible to hear even on TV speakers as well
We only have to wait two months to see, as both of these systems offer 3D raytraced audio.

Either they will figure it out or they will get behind.
 

Redlight

Member
Maybe but you'd figure that out pretty quick when the people on your left have audio coming from your right. These audio advancements will be possible to hear even on TV speakers as well
I think headphone wearers and surround sound system owners will benefit; however, stereo TV speakers and soundbars won't get any great benefit from improvements to directional audio.
 

Yoboman

Member
I think headphone wearers and surround sound system owners will benefit; however, stereo TV speakers and soundbars won't get any great benefit from improvements to directional audio.
I can literally hear the benefits from my phone, so I don't see why TV speakers wouldn't benefit
 

Yoboman

Member
You can tell if someone is walking up behind you from audio on your phone? Without headphones?

Yeah...nah.
You can definitely tell where the sounds are coming from in that environment even on shitty speakers

Obviously headphones will be way better but even regular speakers will get a benefit because the sound is relative to what's happening in the game world
 

Redlight

Member
You can definitely tell where the sounds are coming from in that environment even on shitty speakers

Obviously headphones will be way better but even regular speakers will get a benefit because the sound is relative to what's happening in the game world
I question your ability to tell a sound is coming from behind you by listening to audio coming from a phone and without using headphones.

Directional audio advances, apart from normal stereo staging, will not be there for people using stereo TV speakers or soundbars. You'll need to use headphones or have a surround setup to get a real benefit from this.
 
Last edited:

CamHostage

Member
I question your ability to tell a sound is coming from behind you by listening to audio coming from a phone and without using headphones.

Directional audio advances, apart from normal stereo staging, will not be there for people using stereo TV speakers or soundbars. You'll need to use headphones or have a surround setup to get a real benefit from this.

Maybe not behind you in the OP's first video (I believe the camera only turns its back on the singer, or lets the singer get behind it, for a split second.) That video tests sound directionality and room acoustics, but I don't know that it's using all the elements (obstruction, portaling, occlusion, reverberance, and decay time) that are being experimented with here. When running the sample, I couldn't tell if sound was coming from behind me in the tiny bit of the video where it actually was.

In the Planeverb video, though, there is a distinct difference between when you are facing the speakerbox and when you are looking away from it. I'm not sure how much of that qualifies as the "head-related transfer function" directional calculation that virtual surround sound technology employs; mostly it seems to be adding the muffle effect you'd get from being right near a sound source but having your ears pointed away from it. (Or maybe it's all the same thing?) This demo is not as clear as those special Dolby Headphone show-off test samples where the sound designers have crafted a whole soundspace of effects produced for the purpose of total directionality illusion, but if you close your eyes, I do think you can tell when the player is facing towards or away from the music or fire. And I felt I could hear it even on my little phone speakers. (Albeit I tested it first on a stereo PC monitor and so was listening for what I'd heard before; I'd like a more "scientific" test, but give it a try yourself and see if you can place the sounds in the space.)
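
(One cheap way to get that facing-toward/away difference, by the way: scale a low-pass cutoff by how much the listener's forward vector points at the source. My own illustration of the idea, not necessarily what the demo does:)

```python
import numpy as np

def facing_cutoff(listener_fwd, to_source, bright_hz=18000.0, dull_hz=2000.0):
    # Map listener orientation to a low-pass cutoff: facing the source
    # keeps it bright, turning away rolls the highs off. A crude
    # stand-in for the direction-dependent part of an HRTF.
    f = np.asarray(listener_fwd, float)
    s = np.asarray(to_source, float)
    facing = np.dot(f / np.linalg.norm(f), s / np.linalg.norm(s))  # in [-1, 1]
    t = (facing + 1.0) / 2.0  # 0 = back turned, 1 = facing the source
    return dull_hz + t * (bright_hz - dull_hz)

print(facing_cutoff((0, 0, 1), (0, 0, 1)))   # 18000.0: facing the music
print(facing_cutoff((0, 0, 1), (0, 0, -1)))  # 2000.0: back turned, muffled
```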

 
Last edited: