Measuring video codec performance

🎻 March 2021 fiddle



Transcription

Tsahi: Hi and welcome to another WebRTC fiddle of the month. This time, we’re going to measure video codec performance. Philipp and I decided to wear our corporate shirts today.

So let’s start, Philipp.

Philipp: Yes, let me share my screen, as usual.

So, the fiddle. This one is a bit larger, because measuring is not as easy as the small things we usually do, but it’s still fewer than one hundred lines of JavaScript code.

The HTML boilerplate is a bit bigger. We’re including a graph library from the WebRTC samples, which originally comes from Chrome.

We are going to paint the graph here. We are going to show the resolution we get and which codec implementation is used. And then, similar to the last fiddle we had about determining which codecs we have, we have a dropdown for selecting the actual codec to use without SDP munging. Then we have a stop button and a call button. We also have some CSS this time: the canvas is going to be 800 pixels wide, which makes it nice and big.

That is easy. We are combining two samples we have and basically putting their code together, so we’re creating two peer connections and wiring up their onicecandidate handlers so that each one adds the other’s candidates.
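A minimal sketch of that wiring, assuming two RTCPeerConnection objects living in the same page. The helper name `connectLoopback` is hypothetical, not taken from the fiddle; the pattern of handing each candidate straight to the other connection is the one the WebRTC samples use for in-page calls.

```javascript
// Hypothetical helper: wire two in-page peer connections together so that
// every ICE candidate gathered by one side is added to the other side.
// A null candidate (end of gathering) is forwarded as-is; current browsers
// treat it as end-of-candidates.
function connectLoopback(pc1, pc2) {
  pc1.onicecandidate = (event) => {
    pc2.addIceCandidate(event.candidate).catch((err) => console.error("pc2:", err));
  };
  pc2.onicecandidate = (event) => {
    pc1.addIceCandidate(event.candidate).catch((err) => console.error("pc1:", err));
  };
}

// In the browser:
//   const pc1 = new RTCPeerConnection();
//   const pc2 = new RTCPeerConnection();
//   connectLoopback(pc1, pc2);
```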

Tsahi: We’re going to send video from one connection to the other.

Philipp: Yes. We’re not going to show the video on the screen this time. We’re going to draw a graph. And that is what we do here.

We’re going to use 720p at 30 frames per second, but you can also use other resolutions, which is interesting to play with.

So in the start button handler, we get the capabilities and filter out the “not-really-codecs” like red (redundancy) for video, ulpfec and rtx. It doesn’t make sense to select them, because they’re just redundancy mechanisms, not codecs.

Then we add the remaining ones to the dropdown. So let me show that: if we start, the camera starts and we get a dropdown with VP8, two VP9 profiles, four H.264 variants and AV1.
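The filtering step can be sketched as follows. It assumes the capability entries have the shape returned by `RTCRtpSender.getCapabilities("video")` (`mimeType`, `clockRate`, optional `sdpFmtpLine`); the helper names and the `<select id="codecs">` element are hypothetical. Because several entries share a `mimeType`, the label includes the `sdpFmtpLine` so the variants stay distinguishable.

```javascript
// Mechanisms that show up in the capabilities list but are not codecs:
// RED (redundant encoding), ULPFEC (forward error correction) and RTX
// (retransmission). Compared case-insensitively.
const NOT_REALLY_CODECS = new Set(["video/red", "video/ulpfec", "video/rtx"]);

// Keep only the entries that make sense to offer in the dropdown.
function selectableCodecs(codecs) {
  return codecs.filter((codec) => !NOT_REALLY_CODECS.has(codec.mimeType.toLowerCase()));
}

// Several entries share a mimeType (e.g. the H.264 variants), so the
// sdpFmtpLine is what tells them apart in the dropdown.
function codecLabel(codec) {
  return codec.sdpFmtpLine ? `${codec.mimeType} ${codec.sdpFmtpLine}` : codec.mimeType;
}

// In the browser (hypothetical element id):
//   const { codecs } = RTCRtpSender.getCapabilities("video");
//   const select = document.getElementById("codecs");
//   selectableCodecs(codecs).forEach((codec) => select.add(new Option(codecLabel(codec))));
```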

Tsahi: And we’ve got AV1 because you’re using Chrome 90, the Canary version, right?

Philipp: Yes. That was enabled recently. And the interesting question, of course, is how AV1 performs.

Tsahi: OK, but let’s continue.

Philipp: Yes. Then we’re going to call getUserMedia() and add the track to the peer connection.
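A sketch of this step, assuming the 720p at 30 fps settings mentioned earlier. The helper name `startCamera` is hypothetical, and `mediaDevices` is passed in rather than read from `navigator` so the function does not depend on a browser global.

```javascript
// Hypothetical helper: open the camera at 720p30 (the values used in this
// fiddle) and add the video track to the sending peer connection.
async function startCamera(mediaDevices, pc, width = 1280, height = 720, frameRate = 30) {
  const stream = await mediaDevices.getUserMedia({
    // Plain numbers are treated as "ideal" values, so the browser may
    // deliver something else if the camera can't do exactly this.
    video: { width, height, frameRate },
  });
  const [track] = stream.getVideoTracks();
  pc.addTrack(track, stream);
  return stream;
}

// In the browser: await startCamera(navigator.mediaDevices, pc1);
```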

Then once we click the call button, we take the preferred codec from the dropdown, for example, VP8.

And we move it to the front of the codec list, and then we use setCodecPreferences() to actually apply it.
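A sketch of the reordering, assuming codecs are matched by `mimeType` and, optionally, an exact `sdpFmtpLine`. The helper name `preferCodec` is hypothetical, and using the receiver capabilities in the usage comment is an assumption. Everything else, including rtx, red and ulpfec, stays in its original order behind the preferred entry, so those mechanisms are still negotiated.

```javascript
// Hypothetical helper: return a copy of `codecs` with the entry matching
// mimeType (and sdpFmtpLine, if given) moved to the front. If nothing
// matches, an unchanged copy is returned.
function preferCodec(codecs, mimeType, sdpFmtpLine) {
  const index = codecs.findIndex(
    (codec) =>
      codec.mimeType.toLowerCase() === mimeType.toLowerCase() &&
      (sdpFmtpLine === undefined || codec.sdpFmtpLine === sdpFmtpLine)
  );
  const reordered = codecs.slice();
  if (index > 0) {
    const [preferred] = reordered.splice(index, 1);
    reordered.unshift(preferred);
  }
  return reordered;
}

// In the browser, before negotiating (the transceiver was created by addTrack):
//   const { codecs } = RTCRtpReceiver.getCapabilities("video");
//   const [transceiver] = pc1.getTransceivers();
//   transceiver.setCodecPreferences(preferCodec(codecs, "video/VP8"));
```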

Tsahi: So we’re not using SDP munging anymore. We’re using actual APIs for that.

Philipp: Yes. Which are not implemented in Firefox, I think.

But Firefox, well, they need to fix that. And then we’re calling setLocalDescription(). Note that we’re not calling createOffer() anymore, because this is the implicit setLocalDescription(): if you call it without any arguments, it figures out what you mean. That’s useful for small fiddles like this, because it saves you a line or two. OK, so that’s the easy part. Now here comes the big part, which is calling getStats().
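The offer/answer exchange with the implicit form can be sketched like this, assuming two in-page connections such as the ones wired up above. The function name `negotiate` is hypothetical.

```javascript
// Offer/answer between two in-page peer connections using the implicit
// form of setLocalDescription(): called with no arguments, it creates an
// offer or an answer depending on the signaling state.
async function negotiate(pc1, pc2) {
  await pc1.setLocalDescription(); // implicit createOffer()
  await pc2.setRemoteDescription(pc1.localDescription);
  await pc2.setLocalDescription(); // implicit createAnswer()
  await pc1.setRemoteDescription(pc2.localDescription);
}
```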

Tsahi: The other easy part.

Philipp: Yes, two easy parts.

So we get the sender and the receiver from the peer connections. If they haven’t been created yet, we don’t do anything.

Otherwise we query the sender stats, store them as the last stats, and then compare the total encode time: we take the difference between the current totalEncodeTime and the previous one.
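A sketch of that calculation on two snapshots of the sender’s `outbound-rtp` stats. The function name is hypothetical. Because `totalEncodeTime` (in seconds) and `framesEncoded` are both cumulative counters, the difference between two snapshots taken a few seconds apart is already the average over that interval.

```javascript
// Average encode time per frame, in milliseconds, between two snapshots of
// the same "outbound-rtp" stats object. totalEncodeTime is cumulative and in
// seconds; framesEncoded is a cumulative count.
function encodeMsPerFrame(previous, current) {
  if (!previous || !current) return null; // need two snapshots
  const frames = current.framesEncoded - previous.framesEncoded;
  if (frames <= 0) return null; // nothing encoded in between
  const seconds = current.totalEncodeTime - previous.totalEncodeTime;
  return (seconds / frames) * 1000;
}
```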

Tsahi: So what we’re trying to find out is how much time it takes us to encode a single frame. And if you scroll down, you do the same for the decoder.

Philipp: Yes. We always divide by the number of frames encoded or decoded in that interval.
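The decoder side works the same way on the receiver’s `inbound-rtp` stats. This sketch also shows one way to pick that entry out of the report returned by `getStats()`, which behaves like a read-only Map. Both helper names are hypothetical.

```javascript
// An RTCStatsReport behaves like a read-only Map of id -> stats object.
// Return the first entry of the given type (e.g. "inbound-rtp"), if any.
function findStats(report, type) {
  for (const stats of report.values()) {
    if (stats.type === type) return stats;
  }
  return undefined;
}

// Same delta calculation as for the encoder, using the decoder counters
// of "inbound-rtp": totalDecodeTime (seconds) and framesDecoded.
function decodeMsPerFrame(previous, current) {
  if (!previous || !current) return null;
  const frames = current.framesDecoded - previous.framesDecoded;
  if (frames <= 0) return null;
  return ((current.totalDecodeTime - previous.totalDecodeTime) / frames) * 1000;
}

// In the browser, e.g. every 3 seconds:
//   const inbound = findStats(await receiver.getStats(), "inbound-rtp");
//   const ms = decodeMsPerFrame(lastInbound, inbound);
//   lastInbound = inbound;
```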

And then we’re also looking at which decoder is used and what the resolution is, because sometimes the resolution ramp-up is slow, and that affects how much time you spend in the encoder or decoder.
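A sketch of reading those fields. `encoderImplementation` is reported on `outbound-rtp` and `decoderImplementation` on `inbound-rtp`, and both carry `frameWidth`/`frameHeight`. The function name is hypothetical, and some newer browser versions may only expose the implementation fields under certain privacy conditions, so missing values are shown as "?".

```javascript
// Summarize what the stats say about the pipeline: which implementation
// encoded/decoded the frames, and at what resolution.
function describePipeline(outbound, inbound) {
  const size = (s) => `${s.frameWidth ?? "?"}x${s.frameHeight ?? "?"}`;
  return {
    encoder: outbound.encoderImplementation ?? "?",
    decoder: inbound.decoderImplementation ?? "?",
    sentResolution: size(outbound),
    receivedResolution: size(inbound),
  };
}
```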

Tsahi: Click the Call button. This is what we’re here for.

Philipp: Yes. We’re doing a 3-second average, so it takes a bit, and there it goes.

The initial values are a bit lower than what you get later in the call, because it’s still ramping up from 320p to 720p.
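If you export the samples for offline comparison, one option is to discard the ramp-up phase before averaging. This is a hypothetical post-processing step, not part of the fiddle; it assumes each sample looks like `{ frameHeight, msPerFrame }` and keeps everything from the first sample that reaches the target height.

```javascript
// Hypothetical post-processing step: drop the samples taken while the
// resolution was still ramping up, then average the rest.
function steadyStateAverage(samples, targetHeight = 720) {
  const start = samples.findIndex((s) => s.frameHeight >= targetHeight);
  if (start === -1) return null; // never reached the target resolution
  const steady = samples.slice(start).filter((s) => s.msPerFrame !== null);
  if (steady.length === 0) return null;
  const total = steady.reduce((sum, s) => sum + s.msPerFrame, 0);
  return total / steady.length;
}
```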

And then we wait and see the results. And that is interesting, because we get something like 30.

Tsahi: Well, remember that we are on another call at the same time.

Philipp: Yes. Which probably means the other call is blocking our encoder.

Tsahi: And we don’t care about that, I guess. Now that we’ve seen that it’s working and we’ve got the graph, I think the best thing to do is to look at the screenshots of the graphs you took earlier and see what happens there.

WebRTC video codecs performance results

OK, so before we begin, here’s a kind of disclaimer, or EULA, for this fiddle:

  • There are no bandwidth restrictions, so the browser uses whatever bitrate it wants
  • We’re running at 720p resolution. You can play with that if you want
  • We’re doing 30 frames per second
  • We’re not doing simulcast. This is just peer-to-peer. If you are going to do simulcast, the results are going to be different
  • It’s Chrome Canary 90. My assumption is that 91, 92 and 93 are each going to improve performance, because video performance is now the most important thing
  • You used two machines: a Linux machine with an 8-core Core i7, and a Windows machine with 6 cores. And the reason for using a Windows machine was…

Philipp: Well, the hardware decoders work on that machine.

VP8 WebRTC performance

Tsahi: OK, so let’s go to the results. This one is VP8. Encoders are in red in all of them and decoders in green, and the first thing we see is that encoding time is higher than decoding time, which makes a lot of sense.

Philipp: Yes, the encoder has to make decisions about what to encode and how to encode it. The decoder just takes whatever it gets.

Tsahi: Yes. And this is the Linux machine, at 720p, and it uses the software decoder for video.

Philipp: Yes, we can see from the statistics that the decoder is libvpx.

Tsahi: For me, let’s remember that it’s 4 and 2: 4 milliseconds for the encoder and 2 for the decoder. If we do the math, we’ve got around 33 milliseconds of budget per frame if we want to do 30 frames per second, so these are nice round numbers that are small enough for us to use.

And just one thing: the fact that we’ve got 33 milliseconds doesn’t mean that we can use all 33 milliseconds. We can use maybe 30, or even 25 or 20, because we need the extra CPU for other things on the machine. We don’t want it to be overloaded just with the encoder and decoder.

Philipp: Yes. And sometimes you are decoding more than one video…

Tsahi: Or doing other things in the UI, or someone using the machine for processes other than your application.

Philipp: Yes. We also have audio to take into consideration.
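The budget arithmetic can be written down as a small sketch. The function names are hypothetical, and `usableMs` is an assumed parameter standing in for the 30, 25 or 20 ms mentioned in the discussion. Decoding several videos is modeled by multiplying the decode time; audio, UI work and other processes are not modeled.

```javascript
// Per-frame time budget at a given frame rate, in milliseconds.
function frameBudgetMs(fps) {
  return 1000 / fps;
}

// Share of the full frame budget spent on one encode plus one decode.
function budgetShare(encodeMs, decodeMs, fps) {
  return (encodeMs + decodeMs) / frameBudgetMs(fps);
}

// Rough check of whether one encoder plus N decoders fit into the part of
// the frame budget we allow ourselves to use (usableMs is an assumption).
function fitsBudget(encodeMs, decodeMs, decodedStreams, usableMs) {
  return encodeMs + decodeMs * decodedStreams <= usableMs;
}
```

With the VP8 numbers quoted above, `budgetShare(4, 2, 30)` comes out at about 0.18, i.e. roughly 18% of the 33 ms frame budget for one encode and one decode.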

VP9 WebRTC performance

Tsahi: Yes. OK, so that was VP8, and this one is VP9 profile 0.

Tsahi: Which is the most common profile, and the numbers are roughly the same, 4 and 2.

Philipp: Yes, the decoder even seems to be a bit below 2, but that depends on the machine.

Tsahi: OK, but there isn’t that much of a difference here for 720p at least.

Philipp: Yes. The encoder is slightly more expensive, but I wouldn’t say more than 10%, and the decoder is the same or better.

Tsahi: Mm hmm. OK. And then you switched to the Windows machine and got results from the hardware.

Philipp: Yes. You can see an external decoder, which is the NVIDIA GPU, and we’re way below 1 millisecond on average for decoding, which is much, much better than what we had before.

Tsahi: Why didn’t you get me a VP8 result with a hardware decoder on Windows?

Philipp: I didn’t get a hardware decoder for VP8 on Windows.

Tsahi: OK, and that raises the question: is that because the Intel machine doesn’t have the hardware for it, or because Google just didn’t do the hardware decoding implementation for it?

Philipp: Yes, there are some Kaby Lake processors which have a VP8 hardware decoder, but it’s not activated yet. There are some experiments, I think, but it’s hard to find out what the current state of these things is. So you measure it using the statistics and look at what results you get.
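Measuring it with the statistics can be as simple as grouping per-frame times by the implementation name the browser reports. This is a hypothetical aggregation step; each sample is assumed to be `{ implementation, ms }`, where `implementation` is whatever string appears in `encoderImplementation` or `decoderImplementation` (for example "libvpx").

```javascript
// Hypothetical aggregation step: average per-frame times grouped by the
// implementation name reported in the stats.
function averageByImplementation(samples) {
  const groups = new Map();
  for (const { implementation, ms } of samples) {
    const group = groups.get(implementation) ?? { total: 0, count: 0 };
    group.total += ms;
    group.count += 1;
    groups.set(implementation, group);
  }
  const averages = {};
  for (const [implementation, { total, count }] of groups) {
    averages[implementation] = total / count;
  }
  return averages;
}
```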

Tsahi: OK, so then you tried to play with other profiles for VP9, right?

Philipp: Yes. Profile 2, which I think is the more colorful profile. I wouldn’t use it for real time, because we’re going up to 10 milliseconds encode time and 3 milliseconds for decode.
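In the capabilities list, the VP9 profiles typically appear as separate entries that differ only in the `profile-id` parameter of `sdpFmtpLine` (in Chrome, `profile-id=0` and `profile-id=2`). A hypothetical lookup helper, matching one exact `;`-separated fmtp parameter:

```javascript
// Find a capability by mimeType and one exact fmtp parameter, e.g.
// ("video/VP9", "profile-id=2"). sdpFmtpLine is a ';'-separated list.
function findCodec(codecs, mimeType, fmtpParam) {
  return codecs.find(
    (codec) =>
      codec.mimeType.toLowerCase() === mimeType.toLowerCase() &&
      (codec.sdpFmtpLine ?? "")
        .split(";")
        .map((param) => param.trim())
        .includes(fmtpParam)
  );
}
```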

Tsahi: And I guess this is why they use it for things like Stadia and cloud gaming, but not for video conferencing.

H.264 WebRTC performance

Then you started playing with H.264.

Philipp: Yes, and in software we see 4 milliseconds for encoding and 1 millisecond for decoding. Compare that to VP8, where we had 4 and 2; now we have 4 and 1.

Tsahi: So H.264, at least on this machine, was performing better. Interesting. And if you move to the hardware one on Windows…

Philipp: We get below 1 millisecond.

Tsahi: But the encoder seems almost the same. Not much difference there. You had another profile there in software.

Philipp: Yes, it’s a slightly higher-quality profile, even though nobody really understands these profiles. You can see the results are roughly the same, and quality might be a bit better, but it’s hard to tell the difference.
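The transcript doesn’t say which H.264 profile was the “higher quality” one. As a reading aid, here is a simplified, hypothetical decoder for the H.264 `sdpFmtpLine`: `profile-level-id` is three hex bytes (profile_idc, constraint flags, level_idc, per RFC 6184), and `packetization-mode` defaults to 0 when absent. Only a few common profiles are named; other ways of signaling Constrained Baseline and level 1b are not handled.

```javascript
// Simplified decoding of an H.264 sdpFmtpLine, e.g.
// "level-asymmetry-allowed=1;packetization-mode=1;profile-level-id=42e01f".
function describeH264(sdpFmtpLine) {
  const plid = /profile-level-id=([0-9a-fA-F]{6})/.exec(sdpFmtpLine);
  if (!plid) return null;
  const profileIdc = parseInt(plid[1].slice(0, 2), 16);
  const profileIop = parseInt(plid[1].slice(2, 4), 16);
  const levelIdc = parseInt(plid[1].slice(4, 6), 16);
  const names = { 0x42: "Baseline", 0x4d: "Main", 0x64: "High" };
  let profile = names[profileIdc] ?? `profile_idc ${profileIdc}`;
  // constraint_set1_flag (0x40) on Baseline means Constrained Baseline.
  if (profileIdc === 0x42 && (profileIop & 0x40) !== 0) profile = "Constrained Baseline";
  const mode = /packetization-mode=(\d)/.exec(sdpFmtpLine);
  return {
    profile,
    level: levelIdc / 10, // e.g. 0x1f = 31 -> level 3.1
    packetizationMode: mode ? Number(mode[1]) : 0, // 0 when absent
  };
}
```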

Tsahi: And if you do that with hardware, you again get that half a millisecond or 1 millisecond for the decoder.

AV1 WebRTC performance

OK, now I think we’re getting to the most interesting one, and it’s why we’re using Canary here: Chrome 90.

Philipp: Yes. So now we are looking at AV1 and you see 10 milliseconds for encode and 4 milliseconds for decode roughly.

Tsahi: So in terms of performance, the AV1 decoder costs roughly as much as the encoder of any of the other codecs. And the encoder consumes twice as much CPU, or even more than that.
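To put the quoted software numbers side by side, a small sketch that expresses each codec’s cost as a multiple of a baseline codec. The values are the approximate figures mentioned in this fiddle (720p at 30 fps, Chrome Canary 90, VP9 profile 0), not new measurements; the names `quotedMs` and `relativeCost` are hypothetical.

```javascript
// Approximate software numbers quoted in this fiddle, in ms per frame.
const quotedMs = {
  VP8: { encode: 4, decode: 2 },
  VP9: { encode: 4, decode: 2 },
  H264: { encode: 4, decode: 1 },
  AV1: { encode: 10, decode: 4 },
};

// Express every codec's cost as a multiple of a baseline codec's cost.
function relativeCost(results, baseline) {
  const base = results[baseline];
  const out = {};
  for (const [name, { encode, decode }] of Object.entries(results)) {
    out[name] = { encode: encode / base.encode, decode: decode / base.decode };
  }
  return out;
}
```

Against VP8, `relativeCost(quotedMs, "VP8").AV1` gives 2.5 for encoding and 2 for decoding, which matches “twice as much CPU, or even more”.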

Which is why it doesn’t fit some of the use cases out there.

Today, at least.

Philipp: Yes, and I mean, it’s software. For H.264 you have lots of hardware encoders, and for VP9 you have some hardware decoders.

So if you compare the hardware results to this result, I would say it’s expensive.

Tsahi: Yes. I guess these numbers would be fine on a machine like this one, but not on a smaller one, because at the end of the day, most people have 2-4 cores today, at least on a desktop. And on a phone, looking at these numbers, it’s just useless, at least at this resolution.

Philipp: Yeah, I mean, it would be interesting to run this fiddle on a phone and see what you get.

Tsahi: OK, so this is an exercise for everyone here who wants to play with it.

So you’ve got the fiddle now. Go use it, play with it and check it based on the things you’re trying to achieve, or modify it to fit your application’s behavior, so that you can make educated decisions about which codecs to look at moving forward in your application. OK, so thank you for this one, Philipp, and see you next month in our next fiddle.

Philipp: Bye.
