1. Introduction
Since the advent of digital video technology in the late 1980s, the amount of video data delivered over the global communication network has been increasing constantly. Recent data show that video accounted for 75% of the Internet Protocol traffic in 2017 and it is forecast to exceed 80% in 2022 [1].
Video compression schemes are thus essential for significantly reducing the storage capacity or the transmission bandwidth required by such large volumes of data. This is the motivation behind over 30 years of development of video coding standards in ITU (since H.261 in 1988) and in ISO (since MPEG-1 in 1993).
The current state-of-the-art video coding standards in the context of ITU, ISO and industrial fora are: High Efficiency Video Coding (HEVC/H.265), Essential Video Coding (EVC), Versatile Video Coding (VVC/H.266), and AV1. HEVC/H.265 is the joint ISO/ITU video coding standard finalized in April 2013 [2]. EVC is the ISO video coding standard at the stage of Final Draft International Standard and scheduled for finalization in 2020 [3]. VVC/H.266 is the joint ISO/ITU video coding standard at the stage of Final Draft International Standard and is scheduled for finalization in 2020 [4]. AOM AV1 is an open specification from the industrial forum Alliance for Open Media.
In 2012, Ohm et al. [5] published a study of the state-of-the-art video codecs, comparing MPEG-2/H.262, H.263, MPEG-4, and AVC/H.264 with HEVC/H.265, at the time the most recent standard, which was published in 2013. The main outcomes were that HEVC used 50% less bitrate than AVC for the same Peak Signal to Noise Ratio (PSNR), with an even higher improvement in subjective quality.
Similarly, Dissanayake and Abeyrathna [6] in 2015 compared the bitrate/quality performance of AVC and HEVC in a broadcast environment, also taking into account the increase in processing complexity needed to achieve the 50% bitrate saving of HEVC.
Gros et al. [7] in 2013 extended the comparison to AVC, HEVC and the proprietary codec VP9 published in 2013, the same year as HEVC. The paper reported bitrate savings for HEVC of 39% over AVC and 43% over VP9. The paper also reported a coding efficiency for VP9 inferior to both AVC (by 8.4%) and HEVC (by 79.4%) in terms of average bitrate overhead at the same objective quality, with a processing complexity greater by a factor of 100 when comparing VP9 and x264 (open source AVC), and lower by a factor of 7 when comparing VP9 and HM (reference HEVC).
Barman and Martini [8] in 2017 presented an objective evaluation of the eight most popular games encoded using AVC, HEVC, and VP9 encoders for live game video streaming applications. The results are reported in terms of three objective video quality metrics (PSNR, SSIM, VIFp), Bjontegaard-Delta Bit-Rate (BD-BR) analysis, and encoding time. HEVC provided the best compression efficiency in terms of BD-BR analysis, with an encoding time three times longer than that of AVC, while AVC provided better compression than VP9 with an encoding time four times shorter than that of VP9.
Gros et al. [9] in 2018 further extended the comparison to HEVC, JEM (preliminary model for VVC), VP9 (proprietary), and AV1 (AOM open evolution of VP9). The authors obtained the following results. AV1 achieved an average bitrate saving of 17% relative to VP9 at the cost of a factor of 117 in encoder run time. JEM (model for VVC) achieved an average bitrate saving of 30% relative to HEVC at the cost of a factor of 11 in encoder run time. AV1 produced an average bitrate overhead of more than 100% relative to JEM at the same objective reconstruction quality, in addition to a factor of three in encoder run time. Even in a two-pass rate-control mode, AV1 had an overhead of 55% relative to JEM (VVC) and 10% relative to HEVC.
In a different direction, Katsavounidis and Guo [10] in 2018 presented a new methodology for an objective comparison of video codecs. Using Video Multimethod Assessment Fusion (VMAF), an open-source perceptual video quality metric, the paper proposes a visual perceptual optimization of any video codec in terms of PSNR and VMAF. The methodology is applied to encoder implementations of AVC, HEVC, and VP9. The paper reports advantages and disadvantages of different encoders for different bitrate/quality ranges and for a variety of contents.
This work presents a methodology for modeling the performances of video codecs. The modeling and comparison are based on the estimation of the Rate Distortion (RD) curves for each of the video codecs using experimental data from a set of video sequences at the same spatial resolution, in terms of Bitrate (BR) and PSNR. The methodology was applied in order to compare the performances of three different video codecs: MPEG-H HEVC/H.265, MPEG-5 EVC, MPEG-I VVC/H.266, by using six video sequences at Ultra HD (UHD) resolution. The modeling and comparison of these state-of-the-art video codecs are achieved by first studying the behavior of the codecs in terms of BR and PSNR variations depending on the Quantization Parameter (QP) values and under similar encoding configurations, and then studying an averaged model for each video codec for a set of sequences at the same spatial resolution. Using such modeling of different video codecs in terms of RD curves, it is possible to compare their performances in terms of Bitrate difference at the same PSNR, or PSNR difference at the same BR.
The main novelty and advantage of the proposed modeling and comparison methodology lies in the possibility of comparing, both numerically and graphically, the trend of the RD curves in a range of BR and PSNR. The widely adopted BD-BR algorithm [11,12] provides a single numerical value for each sequence on a given range of bitrates, and a single numerical value for the average of the RD performance among several sequences. The proposed method, instead, provides an RD model for each sequence and an averaged RD model, derived from the set of sequence models: this allows a more significant analysis of the RD characteristics for each codec at different bitrates.
The same methodology can also be extended to modeling and comparing codec performance with respect to other metrics, such as VMAF or Mean Opinion Score (MOS).
The paper is organized as follows. Section 2 explains how the data set used for the estimation of the codec RD models was obtained from a chosen set of sequences. Section 3 introduces, in Section 3.1, the methodology for modeling the RD behavior of the codecs for each sequence, and the realization of the averaged RD codec model. In Section 3.2 the results of the model estimation are presented. Section 4 explains how to use the averaged codec model to compare the codecs in terms of PSNR and bitrate. A comparison with the well-established BD-BR algorithm is given in Section 5, and finally conclusions are drawn in Section 6.
2. Experimental Data Set for the Codec Modeling
With the aim of introducing a methodology for the comparison of the performances of video codecs, this Section presents the procedure followed to provide the data set used for the estimation of the codec models. Six sequences with different frame rates (50 and 60 frames per second) and the same resolution (UHD, $3840 \times 2160$) were chosen for producing the data set. These sequences are a subset of the MPEG test set:
DaylightRoad at $3840 \times 2160$ and 60 fps;
CatRobot at $3840 \times 2160$ and 60 fps;
Fortnite at $3840 \times 2160$ and 60 fps;
ParkRunning at $3840 \times 2160$ and 50 fps;
FlyingBirds at $3840 \times 2160$ and 60 fps;
SunsetBeach at $3840 \times 2160$ and 60 fps.
As an example of an application of the methodology, the following video codecs were considered:
HEVC, from MPEG, as implemented in the Test Model HM 19;
EVC, from MPEG, as implemented in the Test Model ETM 6;
VVC, from MPEG, as implemented in the Test Model VTM 9.
For the comparison of the three codecs, the PSNR as a function of BR has been chosen as RD curve. The main criterion is to operate at fixed QP (i.e., encoding the whole sequence with the same QP) from a set of four predefined QP values, applied to each sequence with each codec:
Of course, the resulting data can be extended to more sequences at the same resolution. For similar comparative analyses see [9,13,14]. By using such QP values, the corresponding $PSNR_{YUV}$ values were calculated according to expression (1). In the following Figures, to avoid subscripts in the labels, $PSNR_{YUV}$ will be indicated as YUV-PSNR. Each PSNR in the right part of (1) is calculated as:
$$PSNR = 10 \log_{10} \frac{(2^{B} - 1)^{2}}{MSE}$$
where, for a picture size of M rows by N columns of pixels,
$$MSE = \frac{1}{M N} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( I(i,j) - \hat{I}(i,j) \right)^{2}$$
where B is the number of bits per sample of Luminance (Y) and Chrominance (U, V), and $I(i,j)$ and $\hat{I}(i,j)$ are the Y, U and V values of the pixel at position $(i,j)$ in the original and coded pictures, respectively.
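The per-plane PSNR computation described above can be sketched as follows. This is a minimal illustration, assuming the standard definition of PSNR as 10·log10 of the squared peak value (2^B − 1)² over the mean squared error; the function names `mse` and `psnr` and the list-of-rows representation of a plane are assumptions made for this example, not part of the test software used in this work.

```python
import math


def mse(original, coded):
    """Mean squared error between two equally sized planes (lists of rows)."""
    rows = len(original)
    cols = len(original[0])
    total = 0.0
    for row_o, row_c in zip(original, coded):
        for o, c in zip(row_o, row_c):
            total += (o - c) ** 2
    return total / (rows * cols)


def psnr(original, coded, bit_depth=8):
    """PSNR in dB for one plane (Y, U or V); infinite for identical planes."""
    err = mse(original, coded)
    if err == 0:
        return math.inf
    peak = (1 << bit_depth) - 1  # 2^B - 1, e.g. 255 for 8-bit samples
    return 10.0 * math.log10(peak ** 2 / err)
```

The function would be applied separately to the Y, U and V planes of each picture; how the three values are combined into the YUV-PSNR follows expression (1) and is not reproduced in this sketch.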
As an example, Table 1 reports the data resulting from encoding the sequence DaylightRoad. The QP values and the respective BR and $PSNR_{YUV}$ values are shown in the table for the encodings with the three codecs: HEVC, EVC, and VVC. At first inspection, we can see that the experimental data in
Table 1 show different characteristics. For the same range of QP’s, the bitrates obtained are higher for HEVC with respect to EVC, and for EVC with respect to VVC. However, for the same QP, the PSNR values are almost always increasing, going from HEVC to VVC, with a maximum difference of
dB (QP = 42) and decreasing, going from HEVC to EVC, with a maximum difference of
dB (QP = 32). Furthermore, we can expect these results to change by using a different sequence.
These observations show the need for a consistent and repeatable methodology for a fair comparison of such data sets in a given range of BR and PSNR. The proposed methodology is described in the following section.
4. Comparison of the Codecs in Terms of PSNR and Bitrate
Let us suppose we want to compare the performance of two codecs in a given range of BR values. Identifying the reference codec as “H” and the tested codec as “K”, and considering a given interval of BR values (or, conversely, a given interval of PSNR values), the average difference in PSNR (or in BR) between the codecs can be computed with the following procedure.
For the two models, H and K, we apply the linear model (5) and define the difference in PSNR at the extrema of the BR range:
Thus, the average difference of PSNR over the given interval of BR results in:
Conversely, we can also apply the linear model (6) and define the difference in BR for a given value of PSNR:
Thus, the average difference of BR over a given interval of PSNR results in:
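The procedure above can be illustrated with a short sketch. It assumes that each codec model has the form PSNR = slope · log10(BR) + intercept, consistent with the semi-logarithmic linearity exploited in this paper. The function names are hypothetical, and two simplifications are assumptions of this example rather than the paper's exact expressions: the inverse model is obtained by inverting the direct one instead of being fitted separately, and the average BR difference is converted to a percentage from the mean difference of log10(BR).

```python
import math


def psnr_at(model, bitrate):
    """Evaluate PSNR = slope * log10(BR) + intercept for a (slope, intercept) model."""
    slope, intercept = model
    return slope * math.log10(bitrate) + intercept


def avg_psnr_gain(ref, test, br_low, br_high):
    """Average PSNR difference (test minus reference) over [br_low, br_high].

    For two models that are linear in log10(BR), the difference is itself
    linear, so its mean over the log-bitrate interval equals the mean of
    the differences at the two extrema.
    """
    d_low = psnr_at(test, br_low) - psnr_at(ref, br_low)
    d_high = psnr_at(test, br_high) - psnr_at(ref, br_high)
    return 0.5 * (d_low + d_high)


def log_br_at(model, psnr):
    """Inverse model: log10(BR) needed to reach a given PSNR."""
    slope, intercept = model
    return (psnr - intercept) / slope


def avg_br_change_percent(ref, test, psnr_low, psnr_high):
    """Average bitrate change in percent (negative means the test codec saves bitrate)."""
    d_low = log_br_at(test, psnr_low) - log_br_at(ref, psnr_low)
    d_high = log_br_at(test, psnr_high) - log_br_at(ref, psnr_high)
    return (10 ** (0.5 * (d_low + d_high)) - 1.0) * 100.0
```

With two hypothetical models of equal slope, for example `ref = (10.0, 30.0)` and `test = (10.0, 31.0)`, the average PSNR gain is 1 dB at any bitrate range, which corresponds to the case of parallel RD curves.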
Figure 4 shows graphically the comparison among the different models for the $3840 \times 2160$ resolution.
From
Figure 4 and
Table 2 it can be seen that VVC has an average gain of
dB in
over HEVC in the range of BR between 2 and 32 Mbps. This gain is practically constant over the whole range of considered BR values, as can be easily seen from
Table 2, which reports very similar coefficients for the two models (
,
). On the other hand, considering an interval of
, the same result can be seen as an average saving of 25.06% in BR of VVC over HEVC, in the range of PSNR between 30 dB and 46 dB.
Figure 4 and
Table 2 also show that EVC has an average gain of
dB over HEVC, for the same BR range of 2 Mbps to 32 Mbps. Comparing the angular coefficients, we can also note that the gain for EVC is practically constant with respect to HEVC, given that
. All data about the angular coefficients are reported in
Table 2.
As a summary of the analysis, Table 3 shows the comparisons both in terms of average PSNR difference and average BR percent difference for EVC and VVC, all with respect to HEVC used as a reference.
5. Comparison with the BD-BR Algorithm
This Section compares the results obtained by applying the proposed method with the cited BD-BR algorithm. The relationships (8) and (10) have the same purpose as the BD-BR algorithm [11,12]: estimating differences in PSNR or BR between two corresponding RD curves.
The most common implementation of the BD-BR algorithm, used to compute the average bitrate saving for a given PSNR range (BD-BR) or, conversely, the average PSNR gain for a given bitrate range (BD-PSNR), estimates the area between the two RD curves. This is achieved by interpolating the data with a piecewise cubic Hermite interpolating polynomial (PCHIP), with the bitrate measured on a log scale. The area between the two piecewise cubic curves, i.e., the integral of their difference, is then computed. As an example, the numerical integration can be performed with 1000 equal-sized subintervals, as described in [5].
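The structure of this computation can be sketched as follows. To keep the example self-contained, piecewise-linear interpolation is used in place of PCHIP, so the values it returns will generally differ slightly from those of a PCHIP-based implementation; the function names and the trapezoidal integration rule are likewise assumptions of this sketch. Only the overall scheme (log-scale bitrate, integration over the common PSNR range with 1000 subintervals, conversion of the mean log difference to a percentage) follows the description above.

```python
import math


def _interp(x, xs, ys):
    """Piecewise-linear interpolation; xs must be strictly increasing."""
    for k in range(len(xs) - 1):
        if xs[k] <= x <= xs[k + 1]:
            t = (x - xs[k]) / (xs[k + 1] - xs[k])
            return ys[k] + t * (ys[k + 1] - ys[k])
    raise ValueError("x outside the interpolation range")


def _as_curve(points):
    """Sort (bitrate, psnr) pairs by PSNR and return (psnr list, log10 bitrate list)."""
    pts = sorted(points, key=lambda p: p[1])
    return [p[1] for p in pts], [math.log10(p[0]) for p in pts]


def bd_rate_percent(ref_points, test_points, steps=1000):
    """Simplified BD-BR: average bitrate change (%) of test vs. reference.

    Each argument is a list of (bitrate, psnr) pairs with distinct PSNR values.
    """
    ref_psnr, ref_logbr = _as_curve(ref_points)
    test_psnr, test_logbr = _as_curve(test_points)
    lo = max(ref_psnr[0], test_psnr[0])    # common PSNR range
    hi = min(ref_psnr[-1], test_psnr[-1])
    if hi <= lo:
        raise ValueError("the two curves have no overlapping PSNR range")
    h = (hi - lo) / steps
    area = 0.0
    for n in range(steps + 1):
        x = min(lo + n * h, hi)            # guard against rounding past hi
        diff = _interp(x, test_psnr, test_logbr) - _interp(x, ref_psnr, ref_logbr)
        weight = 0.5 if n in (0, steps) else 1.0   # trapezoidal rule
        area += weight * diff * h
    return (10 ** (area / (hi - lo)) - 1.0) * 100.0
```

A negative result indicates that the tested codec needs less bitrate than the reference for the same PSNR.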
Table 4 reports the values of the BD-BR metric for the test set used in this work.
Comparing the results in Table 3, based on the proposed method, and in Table 4, based on the BD-BR algorithm, for the BR differences of EVC and VVC with respect to HEVC, the offset between the two estimations is 0.24% for EVC and 1.63% for VVC. These results confirm that, even using a simple linear method, reliable values are obtained, with a small difference with respect to the BD-BR method.
The difference in computational complexity between the linear model described in this paper and the piecewise cubic model is not so significant when compared to the complexity of the video coding. In any case, the proposed method simplifies the computation, since it does not require the numerical integration step after the approximation step.
The two main advantages of the proposed method lie in the availability of direct information, in both numerical and graphical terms, on the RD trend at different BR, at the single sequence level as well as at the codec level. For a single sequence, the slope parameter of the RD model provides an immediate estimate of the RD variations from lower to higher bitrates, when comparing such a slope with another sequence encoded with the same codec, or the same sequence encoded with a different codec. Furthermore, with the model averaging, the averaged RD models of the different codecs allow a comparison of the performance of different codecs (in our case HEVC, EVC, VVC) when operating at lower or higher bitrates. Since the slopes of the averaged RD models for HEVC, EVC, VVC are not the same, the RD curves are not parallel to each other, and the numerical or graphical analysis gives an estimation of how the relative RD performance changes when changing the bitrate (or conversely, changing the PSNR for the inverse model (6)).
Conversely, the typical BD-BR analysis gives only a numerical value for the differences in BR or PSNR between two different sets of encodings for a specific sequence. For the comparison between different codecs, the result is a single numerical value, obtained as an average over a number of sequences. Therefore, the single numerical value resulting from the BD-BR method does not allow the different behavior of a codec at lower and higher bitrates (or lower and higher PSNR) to be analyzed, either across different sequences or against a different codec.
6. Conclusions
This paper presents a simple but reliable procedure to model and compare different video codecs under as similar test conditions as possible. Since it is well known that the PSNR vs BR characteristic is linear in a semi-logarithmic scale, this property is exploited to model the behavior of the codec for a single sequence and to average the obtained models over a number of sequences to have a general model for the codec. Finally, such averaged codec models can be compared to estimate the gain or loss in terms either of delta PSNR for a given BR range, or delta BR for a given PSNR range.
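As an illustration of the modeling step summarized above, the following sketch fits the semi-logarithmic linear model PSNR = slope · log10(BR) + intercept to the (BR, PSNR) points of one sequence by ordinary least squares, and then averages the per-sequence models. The function names are hypothetical, and averaging the slope and intercept coefficients directly is an assumption of this example about how the averaged codec model is formed.

```python
import math


def fit_rd_model(points):
    """Least-squares fit of PSNR = slope * log10(BR) + intercept.

    points is a list of (bitrate, psnr) pairs, e.g. one pair per QP value.
    Returns the tuple (slope, intercept).
    """
    xs = [math.log10(br) for br, _ in points]
    ys = [p for _, p in points]
    n = len(points)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def average_models(models):
    """Average a list of (slope, intercept) models coefficient by coefficient."""
    n = len(models)
    return (sum(m[0] for m in models) / n, sum(m[1] for m in models) / n)
```

In use, `fit_rd_model` would be called once per sequence and codec, and `average_models` once per codec over the resulting per-sequence models.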
The main advantages of the proposed methodology in comparison to existing ones, and specifically the widely adopted BD-BR algorithm, are twofold. The BD-BR algorithm provides a single numerical value for the RD performance of a specific sequence encoded with a specific encoder. Such values are then averaged over several sequences to provide a single numerical value for the RD performance of a specific codec. With the proposed method, instead, we have a linear RD model for each sequence and an averaged RD model for a given codec. Such models allow the same comparison over the whole range of bitrates that can be done with the BD-BR algorithm. Furthermore, it is possible to study the behavior of the codecs at lower or higher bitrates, since the resulting RD curves can be more or less close to each other at different BR, depending on the slope coefficient. As an example, the RD curves of Figure 4 show graphically that the RD performance gain of VVC over EVC is larger at lower bitrates and smaller at higher bitrates. This comparison can also be done at the single sequence level, comparing the behavior of the same codec over the different sequences. Besides graphically, all such comparisons can also be performed numerically, using the models as defined in (7) to (10).
The approach is presented for the rate distortion curves in terms of PSNR vs BR, but can easily be extended to other metrics such as VMAF and MOS.