Multiple video tracks on a single peer connection

🎻 May 2021 fiddle


Tsahi: Hi and welcome to WebRTC Fiddle of the Month. This time, we’re going to look at multiple video tracks in a single peer connection.

Now that everyone is moving, or should be moving, towards Unified Plan, we might have a scenario where we want to use a single connection to send multiple video streams. That might be my webcam and screen sharing content, two separate screens that I want to share at the same time, or multiple cameras. And we’ve seen these kinds of use cases cropping up lately. So to you Philipp, how exactly do we get that done in WebRTC?

Philipp: OK, let’s take a look at the code.

So we have our usual jsFiddle way to get connected. We create the peer connections, wire up the onicecandidate handler (which is slightly wrong), and we have a mapping, id2content, which we’re going to use later on.

Second, on the receiving peer connection we have an ontrack handler. We look at the track and the stream we get from it, at the track ID on the track and the stream ID on the stream, and at our id2content mapping, because we want to know what kind of metadata is associated with that stream.
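A minimal sketch of such an ontrack handler, assuming id2content is a plain object keyed by stream ID. The helper names describeTrack and attachOntrack are hypothetical, and the fiddle’s actual code may differ:

```javascript
// Mapping from stream ID to application metadata. The sender fills it;
// in a real application it would arrive through the signaling channel.
const id2content = {};

// Hypothetical helper: accepts any object shaped like an RTCTrackEvent.
function describeTrack(event, mapping) {
  const stream = event.streams[0]; // undefined if the track has no stream
  const streamId = stream ? stream.id : undefined;
  return {
    trackId: event.track.id,
    streamId,
    // Look the content up by stream ID, not by track ID.
    content: streamId !== undefined ? mapping[streamId] : undefined,
  };
}

// Hypothetical wiring for the receiving peer connection.
function attachOntrack(pc, mapping, onDescribed) {
  pc.ontrack = (event) => onDescribed(describeTrack(event, mapping));
}
```

In the browser this could be used as attachOntrack(pc2, id2content, console.log), where pc2 stands for the receiving RTCPeerConnection.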

Tsahi: So let’s talk about that a bit. OK, we’ve created our peer connection. I’m going to send you now two video streams, let’s say two cameras. Camera A and camera B. And the question is, how exactly do you know which video you receive comes from which camera I’m sending?

Philipp: Exactly. And the SDP doesn’t tell you that. So we need our metadata signaling for that.

Tsahi: OK, so I’m going to do it out of band of WebRTC, so to speak.

Philipp: Yes, you can do it in the same messages. So if you send your offer, you can add a metadata object to it.
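One way this could look, as a sketch only: the message format below (an offer with an extra metadata field, serialized as JSON) and the function names are assumptions for illustration, not something WebRTC or the fiddle defines.

```javascript
// Hypothetical signaling message: the offer plus a metadata object
// that maps stream IDs to content descriptions.
function buildOfferMessage(offer, id2content) {
  return JSON.stringify({
    type: offer.type,
    sdp: offer.sdp,
    metadata: id2content,
  });
}

// Receiving side: split the message back into a session description
// and the metadata mapping.
function parseOfferMessage(text) {
  const message = JSON.parse(text);
  return {
    description: { type: message.type, sdp: message.sdp },
    metadata: message.metadata || {},
  };
}
```

The receiver would pass the description to setRemoteDescription and keep the metadata around for its ontrack handler.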

Then we have our negotiation function, which calls setLocalDescription without arguments. That implicitly creates an offer and sets it as the local description. We’re extracting the msid lines from the SDP, because msid is how the stream IDs and the track IDs are signaled.
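The msid extraction can be sketched as plain string handling. This assumes Unified Plan SDP, where each media section carries a line of the form a=msid:<stream id> <track id>; the function name extractMsids is hypothetical:

```javascript
// Collect { streamId, trackId } pairs from media-level "a=msid:" lines.
// Lines such as "a=ssrc:1234 msid:..." are deliberately ignored.
function extractMsids(sdp) {
  const prefix = "a=msid:";
  return sdp
    .split(/\r?\n/)
    .filter((line) => line.startsWith(prefix))
    .map((line) => {
      const [streamId, trackId] = line.slice(prefix.length).trim().split(" ");
      return { streamId, trackId };
    });
}
```

In the browser it could be called as extractMsids(pc.localDescription.sdp) once setLocalDescription has resolved.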

Then we do a setRemoteDescription, then setLocalDescription, again implicit, to create the answer, and call setRemoteDescription again. And we have two buttons here. One is to start and call getUserMedia from the camera, which gets us a local stream with a certain stream ID and a certain track ID. And we fill our id2content mapping and say this is our webcam. We can put any object in there.
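The negotiation described above might look roughly like this when both peer connections live on the same page. It is a sketch under that assumption; pc1 (sender) and pc2 (receiver) are placeholder names, and a real application would send the descriptions over a signaling channel instead:

```javascript
// Offer/answer exchange between two local peer connections.
async function negotiate(pc1, pc2) {
  // No argument: the browser creates the offer implicitly.
  await pc1.setLocalDescription();
  await pc2.setRemoteDescription(pc1.localDescription);
  // No argument again: the browser creates the answer implicitly.
  await pc2.setLocalDescription();
  await pc1.setRemoteDescription(pc2.localDescription);
}
```

The same function can be called again after more tracks are added, which triggers a renegotiation.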

Tsahi: So this happens on the local peer connection that is sending my camera information. And I simply know that this is the webcam, so I placed somewhere the identifier for that. OK, and in real life, I would take that data and then send it over the network. We don’t need to do it now because it’s a fiddle that runs on the same page.

Philipp: Yes. And then we add the tracks to the peer connection.

And we have a similar thing for screen sharing, which calls getDisplayMedia. We add the content mapping, add the tracks to the peer connection and negotiate again.
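Both buttons follow the same pattern, which could be sketched like this. The helper addLabeledStream and the two start functions are hypothetical names, and the constraints passed to getUserMedia and getDisplayMedia are assumptions; the fiddle’s own code may differ:

```javascript
// Record what a stream contains, then send its tracks.
function addLabeledStream(pc, stream, label, id2content) {
  id2content[stream.id] = label; // any object could be stored here
  for (const track of stream.getTracks()) {
    // Passing the stream associates the track with its stream ID.
    pc.addTrack(track, stream);
  }
}

// Browser-only: prompts the user for camera access.
async function startWebcam(pc, id2content) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  addLabeledStream(pc, stream, "webcam", id2content);
  return stream;
}

// Browser-only: prompts the user to pick a screen, window or tab.
async function startScreenShare(pc, id2content) {
  const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
  addLabeledStream(pc, stream, "screen", id2content);
  return stream;
}
```

After either call, the offer/answer exchange has to run again so the new track reaches the other side.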

OK, so let’s click the start button.

OK, so we got a local stream with a certain ID. And in the offer, we get the same stream ID here in the msid line. We also get a track ID, but the track ID is basically useless because it’s not guaranteed to be associated with a local track.

Tsahi: OK, so what we did was find out the msid for the stream that we’re sending out. And then you took that and gave it the name “webcam” in the id2content mapping that we’ve got.

Philipp: And now on the other end we got the track with an ID, which is actually the same one we see in the SDP, but that’s not guaranteed. So we can’t rely on it. We also get the stream, which again has the ID we see up here in the msid line of the SDP, and that is guaranteed to be the same.

So we can use that.

And then we have our content mapping, which says this is our webcam content and this is on the remote end now. And we can do the same if we have an actual signaling somewhere in between. So that works nicely.

Let’s close the console and add screen sharing. Let’s go with tab sharing. So we got the local stream, which has the ID here, and we see this is our old stream ID from the camera and this is our new stream ID from the screen sharing.

On the other end, we get the ontrack event with the track ID, which, again, is not reliable, and the stream ID. We use the stream ID to look up the content type and it’s “screen”.

Tsahi: OK, so what we’ve had here is a peer connection with two separate tracks. We’ve checked what the track ID is, or rather the stream ID, the msid for that. We should then pass that information, along with the context of what type of video we’re sending, over to the other end. There we can use it to look up and distinguish between these two tracks and decide which one is which.

OK, thank you for this and see you next time.