Look us up here for the WebRTC Fiddle of the Month, created… once a month.
Or just enroll in one of our excellent WebRTC training courses.
Tsahi: Hi and welcome to another WebRTC fiddle of the month. This time, we’re going to measure video codec performance. Philipp and I decided to wear our corporate shirts today.
So let’s start, Philipp.
Philipp: Yes, let me share my screen, as usual.
The HTML boilerplate is a bit bigger. We’re including a graph library from the WebRTC samples, which comes from Chrome originally.
We are going to paint the graph here, where we’ll look at the resolution we get and which codec implementation we get. Then, similar to the last fiddle about determining which codecs are available, we have a dropdown for selecting the actual codec to use without SDP munging. And then we have a stop button and a call button. We also have some CSS this time: the canvas is going to be 800 pixels wide, which makes it nice and big.
That part is easy. We are combining two samples we have, basically putting their code together: we create two peer connections and wire up their onicecandidate handlers so each one adds the other’s candidates.
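The wiring Philipp describes can be sketched like this; the helper name is illustrative, not the fiddle’s actual code:

```javascript
// Sketch: forward each peer connection's ICE candidates to the other,
// so the two connections in the same page can talk to each other.
// wireIceExchange is an assumed helper name, not from the fiddle itself.
function wireIceExchange(pc1, pc2) {
  pc1.onicecandidate = ({ candidate }) => {
    if (candidate) pc2.addIceCandidate(candidate);
  };
  pc2.onicecandidate = ({ candidate }) => {
    if (candidate) pc1.addIceCandidate(candidate);
  };
}

// In the browser this would be used roughly like:
//   const pc1 = new RTCPeerConnection();
//   const pc2 = new RTCPeerConnection();
//   wireIceExchange(pc1, pc2);
```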
Tsahi: We’re going to send video from one connection to the other.
Philipp: Yes. We’re not going to show the video on the screen this time. We’re going to draw a graph. And that is what we do here.
So in the start button, we get our capabilities and filter out the “not-really-codecs” like red, ulpfec, and rtx. It doesn’t make sense to select them, because they’re redundancy mechanisms, not actual codecs.
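The filtering step might look roughly like this (realCodecs is an assumed helper name):

```javascript
// Drop the redundancy/retransmission entries (rtx, red, ulpfec) from the
// list returned by RTCRtpSender.getCapabilities('video').codecs, leaving
// only actual video codecs for the dropdown.
function realCodecs(codecs) {
  const notReallyCodecs = ['video/rtx', 'video/red', 'video/ulpfec'];
  return codecs.filter(
    ({ mimeType }) => !notReallyCodecs.includes(mimeType.toLowerCase()));
}
```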
Tsahi: And we’ve got one because you’re using Chrome 90, the Canary version, right?
Philipp: Yes. That became enabled recently. And the interesting question is, of course, how is AV1’s performance?
Tsahi: OK, but let’s continue.
Then once we click the call button, we take the preferred codec from the dropdown, for example, VP8.
And we push it. Yes, to the front of the codec list, and then we use setCodecPreferences() to actually enable it.
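Moving the selected codec to the front before calling setCodecPreferences() could be sketched as follows (preferCodec is an assumed helper name):

```javascript
// Reorder a codec list so entries matching the preferred mimeType come
// first; the result is what you would pass to
// transceiver.setCodecPreferences(...) in the browser.
function preferCodec(codecs, mimeType) {
  const wanted = mimeType.toLowerCase();
  const preferred = codecs.filter((c) => c.mimeType.toLowerCase() === wanted);
  const others = codecs.filter((c) => c.mimeType.toLowerCase() !== wanted);
  return preferred.concat(others);
}
```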
Tsahi: So we’re not using SDP munging anymore. We’re using actual APIs for that.
Philipp: Yes. Which are not implemented in Firefox, I think.
But Firefox, well, they need to fix that. And then we’re calling setLocalDescription(). Note that we’re not calling createOffer() anymore, because this is the implicit setLocalDescription: if you call it without any arguments, it will figure out what you mean, which is useful for small fiddles like this because it saves you a line or two. OK, so that’s the easy part. Now here comes the big part, which is calling getStats().
Tsahi: The other easy part.
Philipp: Yes, two easy parts.
So we have a sender and a receiver from the peer connections. And if they’re not created yet, we don’t do anything.
Otherwise, we query the sender stats, store the last stats, and then compare the total encode time: the difference between the current totalEncodeTime and the last one.
Tsahi: So what we’re trying to find out is how much time it takes us to encode a single frame. And if you scroll down, you do the same for the decoder.
Philipp: Yes. We always divide by frames encoded or frames decoded.
And then we also look at which decoder is used and what the resolution is, because sometimes the resolution ramp-up is slow, and that affects how much time you spend in the encoder or decoder.
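The per-frame encode-time math described here can be sketched as a pure function over two outbound-rtp stats snapshots (encodeMsPerFrame is an assumed name; the decoder side works the same way with totalDecodeTime and framesDecoded from the inbound-rtp stats):

```javascript
// Average encode time per frame, in milliseconds, between two snapshots
// of an outbound-rtp stats object. totalEncodeTime is in seconds,
// so multiply by 1000 to get milliseconds.
function encodeMsPerFrame(prev, curr) {
  const frames = curr.framesEncoded - prev.framesEncoded;
  if (frames <= 0) return 0; // nothing encoded since the last snapshot
  return (1000 * (curr.totalEncodeTime - prev.totalEncodeTime)) / frames;
}
```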
Tsahi: Let’s click the Call button. This is what we’re here for.
Philipp: Yes. So we’re doing a 3 second average, so it takes a bit, and there it goes.
And the initial values are a bit lower than what you get during the call, because it’s still ramping up from 320p to 720p.
And then we wait and we see results. And that is interesting because we get like 30.
Tsahi: Well, remember that we are on another call at the same time.
Philipp: Yes. Which probably means the other is blocking our encoder.
Tsahi: And we don’t care about that, I guess. I think the best thing to do now, after we’ve seen that it’s working and we’ve got the graph, is to go and look at the screenshots you took earlier and see what happens there.
OK, so before we begin, here is a kind of disclaimer, or EULA, for this fiddle:
Philipp: Well, it depends on whether the hardware decoder works on that machine.
Tsahi: OK. So let’s go to the results. This one is VP8. Encoders are going to be in red in all of them and decoders in green, and the first thing we see is that encoding time is higher than decoding time, which makes a lot of sense.
Philipp: Yes, the encoder has to make decisions about what to encode and how to encode it. The decoder just takes whatever it gets.
Tsahi: Yes. And the machine here is a Linux machine at 720p, and it uses the software decoder for video.
Philipp: Yes, we can see from the statistics that the decoder is libvpx.
Tsahi: For me, I’d say, let’s remember that it’s 4 and 2: 4 milliseconds for the encoder, 2 for the decoder. If we do the math, we’ve got around 33 milliseconds of budget per frame if we want to do 30 frames per second, so 4 is a nice round number that is small enough for us to use.
And just one thing: the fact that we’ve got 33 milliseconds doesn’t mean that we can use all 33 milliseconds. You should use 30, or even 25 or 20, because we need the extra CPU for other things on the machine. We don’t want to be overloaded just with the encoder and decoder.
Philipp: Yes. And sometimes you are decoding more than one video…
Tsahi: or doing other things in the UI, or someone else is using the machine for other processes than your application.
Philipp: Yes. We also have audio to take into consideration and add that in.
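The budget arithmetic above is simple enough to write down directly:

```javascript
// Per-frame time budget in milliseconds at a given frame rate.
const budgetMs = (fps) => 1000 / fps;

// At 30 fps the budget is about 33 ms per frame; the measured
// 4 ms encode + 2 ms decode fits easily, but audio, other videos,
// and the rest of the machine need CPU time too.
```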
Tsahi: Yes. And OK, so that was VP8; and this one is VP9 profile zero.
Tsahi: Which is the most common profile, and it’s roughly the same numbers, 4 and 2.
Philipp: Yes, the decoder seems to be a bit below 2 even, but that depends on the machine.
Tsahi: OK, but there isn’t that much of a difference here for 720p at least.
Philipp: Yes. The encoder is slightly more expensive. But I wouldn’t say more than 10% and the decoder is the same or better.
Tsahi: Mm hmm. OK. And then you switch to the Windows machine, and got results from hardware.
Philipp: Yes. And you can see the external decoder, the NVIDIA GPU, and you can see we’re way below 1 millisecond on average for decoding, which is much, much better than what we had before.
Tsahi: Why didn’t you get me a VP8 one on Windows for a hardware decoder?
Philipp: I didn’t get a hardware decoder on Windows.
Tsahi: OK, and that raises the question: is that because the Intel machine doesn’t have the hardware for it, or because Google just didn’t do the hardware decoding implementation for it?
Philipp: Yes, there are some Kaby Lake processors which have a VP8 hardware decoder, but it’s not activated yet. There are some experiments, I think, but it’s hard to find out what the current state of these things is. So you measure it using the statistics and look at what results you get.
Tsahi: OK, so then you tried to play with other profiles for VP9, right?
Philipp: Yes. Profile two, which is, I think, the more colorful profile, and I wouldn’t use it for real time, because we’re going up to 10 milliseconds encode time and 3 milliseconds for decode.
Tsahi: And I guess this is why they use it for things like Stadia and cloud gaming, but not for video conferencing.
Then you started playing with H.264
Philipp: Yes, and we see encoding in software 4 milliseconds, decoding 1 millisecond. Compare that to VP8, then we had 4 and 2; now we have 4 and 1.
Tsahi: So H.264, at least on this machine, was performing better. Interesting. And if you move to the hardware one on Windows…
Philipp: We get below 1 millisecond.
Tsahi: But the encoder seems almost the same. Not much difference there. You had another profile there in software.
Philipp: Yes, it’s a slightly higher-quality profile, even though nobody really understands these profiles. And you can see it’s roughly the same results; quality might be a bit better, but it’s hard to tell the difference.
Tsahi: And if you do that with hardware, you again get that half a millisecond or 1 millisecond for the decoder.
OK, now I think we’re going to the most interesting one, and it’s why we’re using Canary here: Chrome 90.
Philipp: Yes. So now we are looking at AV1 and you see 10 milliseconds for encode and 4 milliseconds for decode roughly.
Tsahi: So the AV1 decoder takes roughly as long as the encoder of any of the other codecs in terms of performance. And the encoder consumes twice as much CPU, or even more than that.
Which is why this doesn’t fit some of the use cases out there.
Today, at least.
Philipp: Yes, and I mean, it’s software. For H.264 you have lots of hardware encoders and decoders; for VP9, you have some hardware decoders.
So if you compare the hardware results to this result, I guess I would say it’s expensive.
Tsahi: Yes, and I’m not sure these numbers would be fine if the machine was smaller. Because at the end of the day, most people have 2-4 cores today, at least on a desktop, and on a phone it’s just useless, looking at these numbers, at least at this resolution.
Philipp: Yeah, I mean, it would be interesting to run this experiment, this fiddle, on a phone and see what you get.
Tsahi: OK, so this is an exercise for anyone here who wants to play with it.
So you’ve got the fiddle now. Go use it, play with it, and check it against the things that you’re trying to achieve, or modify it to fit the behavior of your application, so that you can make some educated decisions on which codecs you should look at moving forward. OK, so thank you for this one, Philipp, and see you next month in our next fiddle.