Spatial audio

🎻 September 2021 fiddle

👉 The sound on this one isn’t always great. You’ll have some echo artifacts in it. This happens due to the need to fiddle around with audio devices during the session itself. Can’t talk about audio without having some audio artifacts…

Want to learn more about WebRTC?

Look us up here for the WebRTC Fiddle of the Month, created… once a month.

Or just enroll in one of our excellent WebRTC training courses.

Resources

Fiddle: https://jsfiddle.net/fippo/aLf2yos3
WebAudio explainer: https://developer.mozilla.org/en-US/docs/Web/API/Web_Audio_API/Basic_concepts_behind_Web_Audio_API
StereoPannerNode: https://developer.mozilla.org/en-US/docs/Web/API/StereoPannerNode
Some context: https://twitter.com/juberti/status/1435668582105444352
Bluetooth limitations: https://clubhouseapp.zendesk.com/hc/en-us/articles/4406101390355-What-is-spatial-audio

Transcription

Tsahi: Hi, and welcome to WebRTC Fiddle of the Month. This time, because it is becoming more important these days, we’re going to talk about spatial audio, or 3D audio, or whatever you want to call it. What we mean by that is that instead of getting the same mono audio in both ears, what we get from our speakers, especially on headsets, is a different channel in each ear. With that we can provide 3D sound, or the impression of people speaking from different locations. Philipp, let’s go for that.

Philipp: Yes. Let me share my screen.

So we’re going to use the WebAudio API to implement that. And it’s quite a simple fiddle. What we have is:

a couple of buttons
an audio element, of course
a button to start the capture
a list of audio output devices, because we will need to select a device like this one to hear it
and we have a control which has a range from minus one to one for left to right.

We start with a WebAudio audio context, and we create 2 nodes.

One is a StereoPannerNode, which lets you add a weight to the stereo channels, so you can pan to the left or pan to the right. And we have a destination node, a MediaStreamAudioDestinationNode, whose stream you can then use again with an audio element. We connect the panner to the destination node and set the audio element’s srcObject to the destination node’s stream.

So let’s look at the stuff at the bottom first, which is very simple. We have a list of audio outputs, and if the selection changes, which we can do after we have permission, we change the audio output destination to the right device using the setSinkId API.
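The node setup described above could be sketched roughly as follows. This is a minimal sketch, not the code from the fiddle: the function names `buildPannerGraph` and `setPan` are made up for illustration, and the audio context and audio element are passed in as parameters so the wiring is easy to follow.

```javascript
// Sketch only: buildPannerGraph and setPan are hypothetical helper names.
// audioContext is expected to be an AudioContext, audioElement an <audio> element.
function buildPannerGraph(audioContext, audioElement) {
  // Node 1: weights the left and right channels.
  const panner = audioContext.createStereoPanner();
  // Node 2: exposes the processed audio as a MediaStream again.
  const destination = audioContext.createMediaStreamDestination();
  panner.connect(destination);
  // Play the processed stream through the audio element.
  audioElement.srcObject = destination.stream;
  return { panner, destination };
}

// Apply a slider value to the panner, clamped to the -1 (left) .. 1 (right) range.
function setPan(panner, value) {
  const numeric = Number(value);
  const clamped = Number.isFinite(numeric) ? Math.max(-1, Math.min(1, numeric)) : 0;
  panner.pan.value = clamped;
  return clamped;
}

// In a browser this might be used as:
//   const { panner } = buildPannerGraph(new AudioContext(), document.querySelector('audio'));
//   slider.oninput = () => setPan(panner, slider.value);
```

Range inputs report their value as a string, which is why `setPan` converts it to a number before assigning it to `pan.value`.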

Tsahi: Okay. So essentially, what you’re doing is taking the local microphone and then just shoving it to wherever you want in terms of making that stereo and deciding what weights to give to each ear.

Philipp: Yes. So let’s walk through that. We have the button, we disable it, we get a stream from getUserMedia, which is typically not what you would do because you would do this operation on the remote stream.

And we need to deal with autoplay, because browsers and WebAudio are quite sensitive to autoplay issues. Then we create a media stream source node, which is a way to get a MediaStream or MediaStreamTrack into WebAudio, and we connect that to the pan node.
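The capture step might look something like the sketch below. It assumes the click handler passes in `navigator.mediaDevices`, the audio context and the panner node; `connectMicrophone` is a hypothetical name, and resuming a suspended context is one common way to deal with autoplay policies, not necessarily what the fiddle does.

```javascript
// Sketch only: connectMicrophone is a hypothetical helper name.
// mediaDevices is navigator.mediaDevices when running in a browser.
async function connectMicrophone(mediaDevices, audioContext, panner) {
  // Local capture stands in for the remote stream you would normally use.
  const stream = await mediaDevices.getUserMedia({ audio: true });
  // Autoplay policies can leave the context suspended until a user gesture,
  // so resume it here, inside the button's click handler.
  if (audioContext.state === 'suspended') {
    await audioContext.resume();
  }
  // Bring the MediaStream into WebAudio and feed it to the panner.
  const source = audioContext.createMediaStreamSource(stream);
  source.connect(panner);
  return { stream, source };
}
```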

So we go from the source node to the pan node to the destination node. And then we get the devices and we’re done. If we try that… you need to try it yourself, because I can’t make you hear what I hear. I press that button and then very quickly need to select the right… So now it’s coming from my headphones, and it’s currently panned to the left. So I’m setting it to the middle, and now I’m hearing it on both earphones. And if I move it to the left, I will hear my voice mostly from the left, a bit from the right.
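Listing the outputs and switching to the selected one could be sketched like this. The helper names are hypothetical, and the feature check is there because `setSinkId` is not available in every browser; device labels are typically empty until the page has been granted media permission.

```javascript
// Sketch only: listAudioOutputs and selectAudioOutput are hypothetical names.
// mediaDevices is navigator.mediaDevices when running in a browser.
async function listAudioOutputs(mediaDevices) {
  const devices = await mediaDevices.enumerateDevices();
  return devices
    .filter((device) => device.kind === 'audiooutput')
    .map((device) => ({ id: device.deviceId, label: device.label || 'Unnamed output' }));
}

// Route the audio element to the chosen output device.
// Resolves to false when the browser does not support setSinkId.
async function selectAudioOutput(audioElement, deviceId) {
  if (typeof audioElement.setSinkId !== 'function') {
    return false;
  }
  await audioElement.setSinkId(deviceId);
  return true;
}
```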

But you can hear a different volume in each ear. And you can play around with that.
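To get a feel for the volume in each ear, the sketch below computes the per-channel gains with the equal-power formula the Web Audio specification gives for a mono input to a StereoPannerNode. It assumes the microphone delivers mono audio (stereo input is handled differently), and `stereoGainsForMono` is a hypothetical name used only for illustration.

```javascript
// Sketch only: equal-power panning gains for a mono input.
// pan is in the range -1 (full left) .. 1 (full right).
function stereoGainsForMono(pan) {
  const clamped = Math.max(-1, Math.min(1, pan));
  // Map -1..1 onto 0..1, then onto a quarter circle.
  const x = (clamped + 1) / 2;
  return {
    left: Math.cos((x * Math.PI) / 2),
    right: Math.sin((x * Math.PI) / 2),
  };
}

// stereoGainsForMono(-0.5) gives roughly { left: 0.92, right: 0.38 }:
// mostly from the left, a bit from the right.
```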

Tsahi: Okay. So this isn’t really how we would use it. And in real life, my guess is that what we’ll do is we will receive from the SFU multiple audio streams. And then instead of just letting WebRTC mix them on its own, we would use this approach and this type of code to mix these two or more streams and decide where they should come from.
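A rough sketch of that idea: each remote stream gets its own source and panner, and all panners feed the same destination node, which does the mixing. The function name `addRemoteStream` and the way the pan value is supplied are assumptions for illustration, not part of the fiddle.

```javascript
// Sketch only: addRemoteStream is a hypothetical helper name.
// destination is a MediaStreamAudioDestinationNode shared by all streams.
function addRemoteStream(audioContext, destination, stream, pan) {
  const source = audioContext.createMediaStreamSource(stream);
  const panner = audioContext.createStereoPanner();
  // Decide where this participant should appear between left (-1) and right (1).
  panner.pan.value = pan;
  source.connect(panner);
  // Connecting several panners to one destination mixes them together.
  panner.connect(destination);
  return { source, panner };
}

// In a browser this might be used as:
//   addRemoteStream(audioContext, destination, streamFromAlice, -0.7);
//   addRemoteStream(audioContext, destination, streamFromBob, 0.7);
```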

Philipp: Yes, it might depend on your video layout. For example, if you have a layout like we have here, you have one big speaker. But what if you have a grid view with three or four speakers side by side? Then you can make some come from the left, some from the right. That might feel more natural.
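One possible way to derive a pan value from a grid layout is to spread the columns evenly between left and right. This is only a sketch: `panForColumn` is a hypothetical helper, and the default `maxPan` of 0.8 is an arbitrary choice that keeps the outer speakers from sitting entirely in one ear.

```javascript
// Sketch only: map a zero-based grid column to a pan value.
// maxPan (0..1) limits how far to the side the outermost columns go.
function panForColumn(column, columnCount, maxPan = 0.8) {
  if (columnCount <= 1) {
    return 0; // a single speaker stays in the center
  }
  const position = column / (columnCount - 1); // 0 (leftmost) .. 1 (rightmost)
  return (position * 2 - 1) * maxPan;
}

// With three speakers side by side:
//   panForColumn(0, 3) → -0.8, panForColumn(1, 3) → 0, panForColumn(2, 3) → 0.8
```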

Tsahi: Okay. And again, I guess the biggest limitation with doing that, and with spatial audio as a whole, is that I actually need a headset for that.

Philipp: Yes.

And to make things more fun, Bluetooth headsets don’t work well with stereo audio, which is a limitation of the Bluetooth profiles.

Tsahi: Okay. So thank you for this and see you all in our next fiddle of the month.
