1. Introduction
A distributed video coding (DVC) scheme is based on two important theorems, Slepian-Wolf [1] and Wyner-Ziv [2]. This video coding paradigm follows the principle of distributed source coding (DSC) and is becoming prominent because it shifts the high computational complexity to the decoder. Moreover, it has emerged as a promising scheme for computationally lightweight, resource-limited applications [3,4] such as wireless video sensor networks, due to the independent encoding of video frames at the encoder and their joint decoding at the decoder [4]. However, poor rate-distortion (RD) performance and high delay due to feedback channels are major challenges in this paradigm. The channel decoding needs a feedback channel for error correction and exhibits high computational complexity. Even today, the RD performance offered by DVC still lags behind that of conventional motion-compensated predictive video codecs [4], such as H.264/AVC. One of the most widely adopted DVC frameworks in the literature is DISCOVER [5]. It outperforms the H.264/AVC intra-encoder in RD performance for simple-motion videos, while performing worse for complex and high-motion videos.
Different factors contribute to the coding performance gap between DVC and H.264/AVC, including the inferior performance of channel coding tools, inaccuracies in the correlation noise model (CNM), and inferior side information (SI) quality. Many techniques have been proposed to improve the RD performance in DVC at the expense of system complexity. These include hash methods used at the encoder, encoder-based CNM estimation, encoder-based quality control, and encoder-based motion estimation [6,7,8]. These methods increase the encoder’s complexity.
In DVC, SI has a significant impact on RD performance, because high-quality SI leads to higher compression efficiency and lower bit rates. Since the decoder does not have access to the current Wyner-Ziv (WZ) frame, a hash code is generated at the encoder at the cost of extra computation. Usually, a cyclic redundancy check (CRC) [9] is generated as the hash code and sent as auxiliary information to assist in the SI generation process [10,11,12,13,14,15,16,17,18]. Among these hash code-based encoders, some codecs use encoder-based adaptive hash generation strategies to obtain the optimal RD performance. In addition, other computationally expensive information, such as the entropy-encoded most significant bit-plane [19,20] or an intra-encoded down-sampled WZ frame [21], can be sent as hash information. In [17], the authors presented detailed analyses of hash-based motion estimation. Their experimental results show that the hash-based strategy assists in achieving high-quality SI for video sequences with medium to high bit rates (motion). However, this strategy is not beneficial for low-motion videos. Further, it is suggested that SI quality is directly proportional to the number of hash codes: a large number of hash codes generates high-quality SI, and vice versa. This hash-based codec therefore increases the encoder’s computational complexity by generating a large number of hash codes. The hash-based techniques did not focus on reducing the workload of the channel encoding process, which is a major element of the encoder’s computational complexity.
Quantization is another key technique used in the DVC encoder for rate control and coding efficiency. Due to its simplicity, scalar quantization is used in the majority of existing DVC schemes. Multiple DVC codecs have presented adaptive quantization techniques [22,23,24,25]. The authors in [24] explored three distinct types of adaptive quantization methods: adaptive sub-band level quantization, adaptive frame-level quantization, and overall adaptive quantization. For overall adaptive quantization, the perceptual distortion probability is established first to determine the target perceptual distortion of SI. This technique is somewhat complicated since SI must be created at the encoder for perceptual distortion probability estimation. The best quantization matrix is identified adaptively and iteratively: the SI quality, RD optimization, and perceptual features are combined with the estimated SI and target perceptual distortions to identify the quantization matrix. The authors in [23] propose a complicated encoder-based optimal entropy-constrained non-uniform scalar quantizer for pixel domain DVC (PDDVC). First, the encoder employs a conditional probability density function to estimate the rate and distortion model. Then, an optimization function for RD is developed. A modified Lloyd-Max technique with a novel quantization partition updating approach is applied to optimize the RD function. The experimental findings of [23,24] indicate that the suggested quantization techniques enhance RD performance. However, these algorithms increase the encoder’s computational complexity. Furthermore, no progress has been made in reducing the input to channel encoding to decrease its computational complexity.
Techniques other than adaptive quantization and hash methods are also deployed at the encoder to increase the coding efficiency. The work in [26] provided an encoder-based SI interpolation approach that obtains the global motion vector by using the SI frame interpolation algorithm. The suggested approach is capable of improving the quality of SI, but at the cost of increased encoder complexity due to the feature-point matching process deployed at the encoder. The work in [27] presented a Human Visual System (HVS) based DVC technique. Due to its underlying temporal and spatial sensitivity and masking properties, the HVS is rarely able to detect changes below the just noticeable difference (JND) distortion threshold. Therefore, correcting the unnoticeable signal difference between the original frame and SI is unnecessary. To deploy the JND model, a basic SI must be generated at the encoder. Experimental findings indicate that the bit rate is reduced significantly with the proposed model at the expense of the SI generation and JND calculation complexity.
In any DVC codec, one of the major components is the channel coding process used for error correction. The iterative decoding process associated with the error correction is a time-consuming task that increases the complexity of a WZ decoder. Currently, low-density parity-check accumulate (LDPCA) codes [28] are a popular choice for Wyner-Ziv (WZ) video coding and are considered to exhibit lower complexity than previous channel coding techniques. However, their complexity still dominates the overall DVC decoding process.
The most challenging issue in DVC is the trade-off between RD performance on the one hand and the complexity of the encoder and the channel coding process on the other. This research presents an encoder-based scheme for distributed residual video coding (DRVC) that limits the number of input bit planes passed through the channel coding process, thus keeping the encoder complexity low while improving the coding efficiency. The DRVC codec with the proposed attributes is named QUAntized Transform ResIdual Decision (QUATRID). The primary contributions of this research work are the following:
We propose a QUantized TrAnsform Decision Mode (QUAM) for DRVC that drops the zero quantized transform (QT) blocks of the residual frame. QUAM generates fewer bit planes to be channel encoded, thus reducing the complexity of the channel encoding. Similarly, the fewer channel-encoded bit planes reduce the complexity of channel decoding. Therefore, QUAM reduces the overall computational complexity while improving coding efficiency.
We propose and deploy an online correlation noise model (CNM) at the decoder to perform the error correction of the bit planes generated using a limited number of nonzero QT blocks. The decision mode information (DMI) is utilised to form the CNM.
We introduce an algorithm for reconstructing the final WZ frame. The algorithm combines the blocks taken from SI, based on the DMI provided by the encoder, with the decoded quantized blocks. The primary contribution is an algorithm for reconstructing residual frames from a set of combined SI blocks and decoded quantized blocks.
The rest of this work is organized as follows. Section 2 discusses the related research studies that have been conducted on DRVC. Section 3 presents the proposed QUATRID codec in detail. The experimental findings are presented and discussed in Section 4. Finally, the research findings are concluded in Section 5.
3. Proposed DRVC Scheme
In this section, we present the proposed Quantized Transform Residual Decision (QUATRID) scheme. The QUATRID scheme, illustrated in Figure 1, introduces a set of novel features deployed within a baseline DRVC codec:
Residual Frame and QUAM Implementation
Correlation Noise Modelling for a Designed Scheme
Residual Frame Reconstruction
The new features of QUATRID introduced to the DRVC codec are marked in Figure 1 (enclosed in dotted boxes 1, 2, and 3).
The QUATRID codec’s main encoder feature is the QUantized TrAnsform Decision Mode (QUAM). QUAM decides whether to skip or code each quantized transform 16 × 16 block (Bq16). If a Bq16 is to be coded, QUAM further processes it to extract the nonzero quantized transform 4 × 4 blocks (Bq4). It generates the decision mode information (DMI) and passes the nonzero Bq4 for channel encoding, while the DMI is sent to the decoder. The QUATRID decoder’s main feature is an online CNM, for which the residual error is computed first. Then, based on the DMI, the CNM is calculated for the corresponding nonzero blocks. The second important feature is the reconstruction of the R frame. First, based on the DMI, the full-length decoded quantized transform bands are generated by combining the decoded quantized bins with the blocks from SI that correspond to skipped blocks. Then, the reconstruction process for the transformed R frame is executed. The detailed implementation of each feature is discussed in the following subsections.
3.1. Residual Frame and QUAM Implementation
This subsection presents the details of the R frame calculation and the implementation of QUAM deployed at the encoder. The R frame calculation in the QUATRID scheme differs from most DRVC codecs. Instead of taking the difference between the current W frame and its estimated version, the R frame is computed by taking the difference between the current W frame and the previous frame. To derive the general mathematical notation, suppose that we have a video with N frames (I). If the current frame is the kth frame (Ik), then the previous frame is Ik−1. Mathematically, the R frame is given by Equation (1), where x and y determine the position of a pixel in a frame.

$$R(x, y) = I_k(x, y) - I_{k-1}(x, y) \tag{1}$$
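The frame difference described above can be sketched in a few lines. This is only an illustration under stated assumptions: frames are taken to be 8-bit luma arrays, and the function name `residual_frame` and the toy pixel values are hypothetical, not taken from the QUATRID implementation.

```python
import numpy as np

def residual_frame(current, previous):
    """Residual R(x, y) = I_k(x, y) - I_{k-1}(x, y).

    The frames are cast to a signed type first so that negative
    differences are not wrapped around by unsigned 8-bit arithmetic.
    """
    return current.astype(np.int16) - previous.astype(np.int16)

# Toy 2x2 "frames" (hypothetical values).
prev = np.array([[10, 200], [30, 40]], dtype=np.uint8)
curr = np.array([[12, 190], [30, 45]], dtype=np.uint8)
R = residual_frame(curr, prev)
```

Static regions produce zeros in R, which is what later allows whole blocks to be skipped.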
Afterwards, the R frame is decomposed into 16 × 16 blocks, which are then transformed and quantized in 4 × 4 blocks before the QUAM process. QUATRID makes use of the quantization metric deployed in [35] for R frame quantization. However, the quantization step (Wq) is computed differently and is presented in Equation (2). In Equation (2), the step is defined from the maximum absolute coefficient value of the corresponding transform band and the quantization level of that band. The band index varies according to the quantization metric and goes from 1 to 16. This quantization step calculation leads to coarse quantization.
The QUAM is applied on each Bq16, which determines whether it will be skipped or passed to the channel encoder. It generates the DMI and nonzero Bq4, which are converted into bands. Finally, the bit planes are generated from these bands and channel encoded.
The flowchart in Figure 2 presents the general working scenario of QUAM. Starting with the Bq16, at the first step, the Decision Mode (DM) checks whether the Bq16 should be skipped or encoded. Based on the sum of absolute values within block Bq16, as expressed in Equation (3), the decision information is passed to the DMI, where i and j determine the position of the quantized transform coefficient in Bq16.

$$\mathrm{DM} = \begin{cases} \text{skip}, & \sum_{i}\sum_{j} \left| B_{q16}(i, j) \right| = 0 \\ \text{code}, & \text{otherwise} \end{cases} \tag{3}$$
Equation (3) states that if the sum of the absolute values of all coefficients of the block Bq16 is zero, the block is skipped; otherwise, the coding mode is activated. The coding mode process starts with coding mode-I (CM-I). CM-I splits Bq16 into 8 × 8 quantized transform blocks (Bq8), and each Bq8 block is tested using Equation (4).

$$\text{CM-I} = \begin{cases} \text{code}, & \sum_{i}\sum_{j} \left| B_{q8}(i, j) \right| \neq 0 \\ \text{skip}, & \text{otherwise} \end{cases} \tag{4}$$
Equation (4) states that if the sum of the absolute values of all coefficients of Bq8 is not equal to zero, it is considered a nonzero block; otherwise, it is skipped. CM-I passes the nonzero Bq8 blocks to coding mode-II (CM-II) for further processing. It also passes the coded and skipped block information to the DMI. In the CM-II process, each nonzero Bq8 is first split into 4 × 4 quantized transform blocks (Bq4). CM-II then analyses each Bq4 to sort out the nonzero Bq4 and notifies the DMI about the coded and skipped blocks. Finally, the nonzero Bq4 blocks are passed to the channel encoding process. CM-II skips a Bq4 if the sum of the absolute values of all its coefficients is equal to zero; otherwise, it is coded. Mathematically, the CM-II conditions are presented in Equation (5), where m and n determine the position of the quantized transform coefficient in Bq4.

$$\text{CM-II} = \begin{cases} \text{skip}, & \sum_{m}\sum_{n} \left| B_{q4}(m, n) \right| = 0 \\ \text{code}, & \text{otherwise} \end{cases} \tag{5}$$
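The three-level decision described by Equations (3)–(5) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `quam` and the flat flag list used for the DMI are assumptions, since the text does not specify how the DMI is laid out or signalled.

```python
import numpy as np

def quam(bq16):
    """Apply the skip/code decisions to one 16x16 quantized transform block.

    Returns (dmi, coded), where dmi is a hypothetical flat list of flags
    (1 = coded, 0 = skipped): one flag for the Bq16, then one per Bq8,
    and, after each coded Bq8, one per Bq4. coded holds the nonzero Bq4
    blocks that would go on to channel encoding.
    """
    dmi, coded = [], []
    if np.abs(bq16).sum() == 0:                  # Equation (3): skip the Bq16
        return [0], coded
    dmi.append(1)
    for r8 in range(0, 16, 8):                   # CM-I: four Bq8 blocks
        for c8 in range(0, 16, 8):
            bq8 = bq16[r8:r8 + 8, c8:c8 + 8]
            if np.abs(bq8).sum() == 0:           # Equation (4): zero Bq8
                dmi.append(0)
                continue
            dmi.append(1)
            for r4 in range(0, 8, 4):            # CM-II: four Bq4 blocks
                for c4 in range(0, 8, 4):
                    bq4 = bq8[r4:r4 + 4, c4:c4 + 4]
                    nonzero = int(np.abs(bq4).sum() != 0)  # Equation (5)
                    dmi.append(nonzero)
                    if nonzero:
                        coded.append(bq4.copy())
    return dmi, coded

# Toy block with two nonzero coefficients (hypothetical values).
block = np.zeros((16, 16), dtype=np.int32)
block[0, 0] = 3
block[9, 13] = -1
dmi, coded = quam(block)
```

In this toy block only 2 of the 16 Bq4 blocks reach the channel encoder, which is the effect the text attributes to QUAM.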
3.2. Residual Error and Correlation Noise Model
An accurate online correlation noise model (CNM) improves error correction and coding efficiency. With an accurate online CNM, the error correction capability of the channel decoder improves while it demands fewer parity bits from the channel encoder. In conventional DVC, the residual error, also known as noise residue, between the actual W frame and its estimated replica, the SI frame, is required to calculate the online CNM, and the Laplacian distribution is normally used to model it. In the DRVC codec, accurately estimating the residual error between the actual R frame and its replica Ŕ frame generated at the decoder is quite challenging. Therefore, it is hard to establish an accurate online CNM to gain coding efficiency. In the proposed QUATRID scheme, the zero Bq16 and Bq4 blocks are skipped at the encoder; therefore, the online CNM must fit the decoding of only the nonzero coded Bq4. The following steps are taken to implement an online CNM suited to the QUATRID scheme. In the QUATRID decoder, the Ŕ frame is computed by subtracting the previously decoded frame from SI (x,y). To derive the mathematical notation, consider that the previous decoded frame is Ik−1 (x,y) and that the replica of the current frame Ik (x,y) generated at the decoder is SI (x,y). Then Ŕ (x,y) is given by Equation (6).

$$\acute{R}(x, y) = SI(x, y) - I_{k-1}(x, y) \tag{6}$$
Since the original R (x,y) frame is not available at the decoder, the model is required to define the variance between R (x,y) frame and corresponding estimated Ŕ (x,y) frame for estimation of Laplacian distribution parameter α. Therefore, we adopted the frame-level online α estimation, as described next.
First, the residual error frame ErrR (x,y) is computed. In an ideal case, when the original R (x,y) frame is available at the decoder, the ErrR (x,y) frame is computed by taking the difference between the original R (x,y) and Ŕ (x,y) frames. However, the R (x,y) frame is not available at the decoder; therefore, the ErrR (x,y) frame is computed from motion-compensated frames, as given by Equation (7). In Equation (7), the two frames are the forward and backward motion-compensated frames, (x,y) corresponds to a pixel location in the ErrR frame, and each of the two frames has its own horizontal and vertical motion vector components.
Based on the DMI, the corresponding blocks are extracted from ErrR (x,y) as Ȓ (u,v). The variance of Ȓ is then computed as given in Equation (8), where u and v determine the position of a pixel in Ȓ. This variance is a confidence measure of the Ŕ (x,y) frame creation process that indicates how good the outcome of the frame interpolation process is. Ideally, the variance calculated in Equation (8) should be close to the variance of the residual between the actual R (x,y) and Ŕ (x,y). Since the actual R (x,y) is not available at the decoder, the variance computed in Equation (8) is the proposed way to represent the variance between the original R (x,y) and Ŕ (x,y). The Laplacian parameter α is then computed as presented in Equation (9).
Finally, the Laplacian distribution model for QUATRID is given by Equation (10).
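The frame-level α estimation can be sketched as below. The exact forms of Equations (8)–(10) are not reproduced here, so this sketch assumes the relations commonly used for Laplacian correlation noise models in DVC: variance as E[Ȓ²] − (E[Ȓ])², α = sqrt(2/σ²), and density (α/2)·exp(−α|d|). The function names, the per-pixel boolean mask standing in for the DMI, and the `eps` guard are all hypothetical.

```python
import numpy as np

def laplacian_alpha(err_frame, coded_mask, eps=1e-6):
    """Estimate the Laplacian parameter from the DMI-selected pixels.

    err_frame  : residual error frame (ErrR in the text).
    coded_mask : boolean array, True where the pixel belongs to a coded
                 block (an assumed per-pixel view of the DMI).
    eps        : assumed floor on the variance to avoid division by zero.
    """
    samples = err_frame[coded_mask].astype(np.float64)
    if samples.size == 0:
        raise ValueError("no coded blocks: nothing to model")
    variance = np.mean(samples ** 2) - np.mean(samples) ** 2
    return float(np.sqrt(2.0 / max(variance, eps)))

def laplacian_pdf(d, alpha):
    """Laplacian density (alpha / 2) * exp(-alpha * |d|)."""
    return 0.5 * alpha * np.exp(-alpha * np.abs(d))

# Toy error frame: only the first row belongs to coded blocks.
err = np.array([[1.0, -1.0], [3.0, 100.0]])
mask = np.array([[True, True], [False, False]])
alpha = laplacian_alpha(err, mask)
```

Restricting the samples to coded blocks keeps the large error at a skipped position (the value 100 in the toy frame) from inflating the variance, which matches the stated goal of fitting the model to the nonzero coded Bq4 only.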
3.3. Residual Frame Reconstruction
The decoded quantized bin is formed after channel decoding. Then, the DMI sent by the encoder is used to create the final full-length quantized bin at the decoder for the reconstruction of the corresponding residual frame, which is then added to the previous frame to obtain the final decoded Wyner-Ziv frame Ŵ.
The channel decoded quantized bin is converted into decoded Bq4 blocks to generate the full length quantized bin. Based on the DMI, all decoded Bq16 are generated by combining these decoded Bq4 and zero 4 × 4 blocks (that represent skipped blocks). The decoded Bq16 blocks, which include both skipped and non-skipped Bq16, are converted into bands, and finally, the reconstruction of transform bands is performed. The reconstruction of each coefficient of the corresponding band is discussed in detail later in this section. After the successful reconstruction of all required bands, the 4 × 4 blocks are created. Finally, the decoded residual frame is formed after the inverse transformation of these blocks.
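The DMI-driven assembly of a decoded Bq16 can be sketched as follows. It follows the description above, in which skipped positions are filled with zero 4 × 4 blocks. The function name `assemble_bq16` and the 4 × 4 grid of flags used as the DMI for one Bq16 are assumptions made for illustration only.

```python
import numpy as np

def assemble_bq16(coded_flags, decoded_bq4):
    """Rebuild one 16x16 block from decoded Bq4 blocks and zero blocks.

    coded_flags : 4x4 grid of booleans, True where a Bq4 was coded
                  (a hypothetical layout of the DMI for one Bq16).
    decoded_bq4 : decoded 4x4 blocks, in raster order of the True flags.
    """
    out = np.zeros((16, 16), dtype=np.int32)     # skipped Bq4 stay zero
    blocks = iter(decoded_bq4)
    for r in range(4):
        for c in range(4):
            if coded_flags[r][c]:
                out[4 * r:4 * r + 4, 4 * c:4 * c + 4] = next(blocks)
    return out

# Two coded Bq4 blocks out of sixteen (hypothetical values).
flags = [[False] * 4 for _ in range(4)]
flags[0][0] = True
flags[2][3] = True
decoded = [np.full((4, 4), 5), np.full((4, 4), -2)]
bq16 = assemble_bq16(flags, decoded)
```

The assembled blocks would then be reorganized into bands before the coefficient-level reconstruction.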
The accurate reconstruction of every single coefficient of the residual band plays a key role in the final decoded W frame. Generally, the decoded quantized bin consists of intervals (q) that are either zero (q = 0) or above zero (positive interval range, q > 0), or under zero (negative interval range, q < 0). The quantization process at the encoder introduces the quantization error. The reconstruction process assists in reconstructing values close enough to actual values by minimizing the quantization errors and leads to a better W frame reconstruction.
As mentioned earlier, the quantized bin consists of three types of intervals: zero, positive, and negative. Each transformed coefficient lies between the lower and upper bounds of its interval; positive coefficients, negative coefficients (where q is negative), and coefficients in the zero interval each have their own pair of bounds. During quantization at the encoder, it is observed that positive transformed coefficients usually move toward the lower bound, whereas negative values move toward the upper bound. Therefore, the three quantized interval conditions mentioned earlier are reconstructed with different algorithms to improve reconstruction accuracy. The mathematical expressions are given in Equations (11)–(13).
If the decoded quantized bin value (interval) is q = 0, the actual transformed value was in the interval that quantizes to zero. The coefficient is then reconstructed using the boundary conditions given in Equation (11), where Wq is the quantization step and α is the Laplacian distribution parameter computed earlier.
In this case, the resultant interval after quantization is q = 0; therefore, the boundary conditions defined by Equation (11) suit the reconstruction process. Generally, when SI is of high quality, α is high, and vice versa. As α changes from frame to frame, it contributes accordingly: when α is high, it moves the resultant reconstruction value only slightly, and vice versa. Usually, when the actual transformed coefficient values lie within the bounds of this interval, the designed quantization maps them to zero. Therefore, the second condition of Equation (11) shifts y toward the lower bound and reconstructs an improved-quality coefficient. When y falls outside the lower bound, the first condition of Equation (11) improves the reconstruction by bringing it within the range. Similarly, when y is equal to or outside the upper bound, the third condition of Equation (11) improves the reconstruction by sliding it within the upper bound.
If q > 0, the reconstruction is performed by the boundary conditions given in Equation (12). A decoded quantized interval value greater than zero indicates that the actual transformed coefficient encoded at the encoder lies in the positive interval ranges.
Equation (12) is used to reconstruct the coefficients that belong to the positive intervals, between the lower and upper bounds of the decoded interval. When SI’s coefficient y lies within the interval or below the lower bound, the first condition of Equation (12) enhances the reconstruction quality by sliding it toward the upper bound side. When SI’s coefficient y lies outside the upper bound, the second condition of Equation (12) reconstructs an enhanced-quality coefficient by bringing it within the upper bound.
If q < 0, the reconstruction is performed by the boundary conditions given in Equation (13). A decoded quantized interval value less than zero indicates that the actual encoded transform coefficient lies in the negative interval ranges.
Equation (13) is used to reconstruct the coefficients that belong to the negative intervals, between the lower and upper bounds of the decoded interval. Usually, when the actual transformed coefficient values are within the interval, especially close to the lower bound, the designed quantization function slides them toward the upper bound side. Therefore, the second condition of Equation (13) shifts y toward the lower bound and reconstructs an improved-quality coefficient. When y falls outside the lower bound, the first condition of Equation (13) improves the reconstruction by bringing it within the range. Similarly, when y is equal to or outside the upper bound, the third condition of Equation (13) improves the reconstruction by sliding it below the upper bound.
To briefly conclude the proposed QUATRID, its primary attributes are:
QUAM for DRVC that drops the zero QT blocks of the residual frame. QUAM generates fewer bit planes to be channel encoded, thus reducing the complexity of the channel encoding. Similarly, the fewer channel-encoded bit planes reduce the complexity of channel decoding. Therefore, QUAM reduces the overall computational complexity while improving coding efficiency.
Online correlation noise model (CNM) at the decoder to perform the error correction of the bit planes generated using a limited number of nonzero QT blocks. The decision mode information (DMI) is utilised for CNM.
A reconstruction algorithm for the final WZ frame. The algorithm combines the blocks taken from SI, based on the decision mode information provided by the encoder, with the decoded quantized blocks. The primary contribution is an algorithm for reconstructing residual frames from a set of combined SI blocks and decoded quantized blocks.
4. Experimental Results and Analysis
The experiments are carried out in MATLAB 2018b on an Intel Core i7-7820HQ CPU at 2.90 GHz running Windows 10 (64-bit). The performance of the codec is reported in terms of the average computational time of partial WZ encoding (Tp), the average computational time of full WZ encoding (Tf), the average computational time of channel encoding (Tc), the average number of encoded bit planes (NBP) per frame, and the rate-distortion (RD) performance for each video sequence. The test video sequences are Hall, Foreman, and Coastguard, with a frame rate of 15 Hz and a video size of 176 × 144. For the DIS and QUATRID codecs, a group of pictures (GOP) size of 2 is adopted, meaning that one frame is a keyframe (KF) and the other is a Wyner-Ziv frame. The quantization parameter (Qp) is used for KF quantization. The quantization metric (Qm) is used for the quantization of the W frame in the DIS codec and of the residual frame in the QUATRID codec. The test conditions (RD points) provided by DIS are used for Hall, Foreman, and Coastguard. The RD performance of QUATRID is compared with the conventional DVC (DIS) codec and the conventional Intra H.264/AVC codec.
Figure 3 shows the computational times of different components of the WZ encoder: the average partial computational time of WZ encoding (Tp), the average full computational time of WZ encoding (Tf), and the average computational time of channel encoding (Tc). These computational times are measured for different videos with different motions.
Table 1 provides a detailed comparison of the Tp and Tf of the DIS and QUATRID codecs for the Hall, Foreman, and Coastguard video sequences with GOP size 2. The components whose Tp and Tf are measured for the QUATRID codec are shown in Figure 3. In Table 1, the partial Computational Time Improvement Ratio (CTIRp) and the full Computational Time Improvement Ratio (CTIRf) define the improvement ratio in average computational time achieved by the QUATRID codec compared to DIS and are calculated by Equations (14) and (15). Tp,DIS and Tp,QUATRID denote the Tp of DIS and QUATRID, respectively, and Tf,DIS and Tf,QUATRID denote the Tf of DIS and QUATRID, respectively.
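Both ratios can be read as simple quotients of the two encoding times. The following is a reconstruction under the assumption that the DIS time is divided by the QUATRID time, which is the orientation implied by the fold values reported in Table 1 (a value below 1 means that QUATRID is slower than DIS):

```latex
\mathrm{CTIR}_p = \frac{T_{p,\mathrm{DIS}}}{T_{p,\mathrm{QUATRID}}},
\qquad
\mathrm{CTIR}_f = \frac{T_{f,\mathrm{DIS}}}{T_{f,\mathrm{QUATRID}}}
```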
The CTIRp of Hall for low RD points is below 1 because the Tp of QUATRID is slightly higher than that of DIS, owing to the computation of the QUAM process. The overall Tf of QUATRID is nevertheless less than the Tf of DIS because fewer bit planes are channel encoded; therefore, CTIRf is high for these low RD points. Further, the CTIRp reaches up to 1.74-fold for high RD points, and the CTIRf is also high for these points, which indicates that QUATRID has low computational complexity. The CTIRf varies from 4.5- to 9.32-fold, which means that the computational complexity of the QUATRID encoder is 4.5 to 9.32 times lower than that of DIS. The CTIRp for high RD points ranges from 1.25- to 1.74-fold; that is, QUATRID performs partial WZ encoding 1.25 to 1.74 times faster than DIS. At these points, the number of bands to be quantized and further processed increases; therefore, DIS exhibits high computational complexity because it needs to perform band organization and bit-plane extraction for all the blocks. In the QUATRID codec, by contrast, most of the zero Bq16 and Bq4 are dropped during the QUAM process, so fewer bit planes are generated from the remaining nonzero Bq4. These fewer bit planes are encoded quickly, and the reduced channel coding time increases the CTIRf. Further, for the RD points where the Tp of QUATRID is high due to the QUAM process, the Tf of QUATRID remains less than that of DIS because few bit planes are channel encoded. Therefore, skipping the zero Bq16 and Bq4 blocks reduces the channel encoding workload. It also reduces the channel decoding workload because fewer bit planes are decoded. Thus, low-complexity channel coding (encoding and decoding) is achieved with the QUAM process deployed on the DRVC codec.
The Tp, Tf, CTIRp, and CTIRf values for both Foreman and Coastguard are also shown in Table 1. Due to the QUAM process, Tp,QUATRID is higher than Tp,DIS for low RD points, resulting in a decrease in CTIRp. However, at high RD points, Tp,QUATRID is smaller than Tp,DIS because DIS requires more time to generate bit planes, resulting in an increase in CTIRp for both sequences. The table shows that Foreman’s CTIRp ranges from 0.53- to 1.43-fold, whereas for Coastguard it ranges between 0.6- and 1.87-fold. Further study of the findings indicates that, for all RD points, Tf,QUATRID is smaller than Tf,DIS because QUATRID channel encodes fewer bit planes, increasing CTIRf. The high CTIRf indicates that the computational complexity is reduced by the corresponding factor. Based on the table, the CTIRf of Foreman and Coastguard ranges from 3.75- to 5.96-fold and from 5.38- to 7.59-fold, respectively.
The average computational time of channel encoding (TC) and the average number of bit planes (NBP) are provided in Table 2. The CTIRC defines the channel encoding time efficiency. It is defined as the percentage improvement in the TC of the QUATRID codec relative to the TC of DIS. The BPR defines the percentage bit plane reduction. The CTIRC and BPR are calculated by Equations (16) and (17), respectively. In Equation (16), TC,DIS and TC,QUATRID denote the TC of DIS and QUATRID, respectively. In Equation (17), NBPDIS and NBPQUATRID denote the NBP of the DIS and QUATRID codecs, respectively.
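The two measures can be illustrated with a short sketch. It assumes that CTIRC is the ratio of the DIS channel encoding time to the QUATRID channel encoding time (the values discussed for Table 2 are reported in folds) and that BPR is the relative reduction in the number of bit planes expressed as a percentage. The function names and the sample numbers are hypothetical and are not taken from the tables.

```python
def ctir(t_dis, t_quatrid):
    """Time improvement ratio in folds: DIS time over QUATRID time (assumed form)."""
    return t_dis / t_quatrid

def bpr(nbp_dis, nbp_quatrid):
    """Bit plane reduction in percent, relative to DIS (assumed form)."""
    return 100.0 * (nbp_dis - nbp_quatrid) / nbp_dis

# Hypothetical measurements for one RD point.
ctir_c = ctir(t_dis=14.0, t_quatrid=2.0)       # channel encoding times in seconds
reduction = bpr(nbp_dis=100, nbp_quatrid=16)   # average bit planes per frame
```

Under these assumed definitions, a larger BPR leaves fewer bit planes for the channel encoder, so it tends to go together with a larger CTIRC.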
Table 2 reports the average computational time of channel encoding (TC) and the average number of bit planes for the Hall, Foreman, and Coastguard video sequences. The results show that, for all RD points and all video sequences, the channel encoding time taken by DIS is higher than the time taken by the QUATRID codec. TC,DIS is high because many bit planes must be channel encoded, whereas the QUATRID codec has fewer bit planes to channel encode and therefore achieves a high CTIRC for all RD points. The per-video analysis of the table shows that Hall’s CTIRC ranges from 7- to 33-fold. The high CTIRC indicates that the computational cost of channel coding is reduced by the stated factor. The CTIRC is particularly high for low RD points because fewer bit planes are channel encoded. For high RD points, the CTIRC is slightly lower because the NBP increases. The bit plane reduction percentage is computed to determine the capability of QUATRID to reduce the number of bit planes. A high BPR indicates that a large number of bit planes are removed, and a small BPR indicates that fewer bit planes are removed. A high BPR also implies a smaller TC, and vice versa. It further shows that the channel decoding complexity is smaller because fewer bit planes are decoded. QUATRID encodes significantly fewer bit planes than DIS. Thus, the BPR ranges from 84% to 97% for Hall, which is a major advantage of incorporating QUAM.
In addition, the high BPR indicates low channel coding computational complexity. Further analysis of the table for Foreman shows that TC,QUATRID is smaller than TC,DIS because fewer bit planes are channel encoded. Consequently, QUATRID achieves a high CTIRC, which varies from 4.8- to 22.69-fold, and a BPR that ranges from 78% to 95.6%. QUATRID shows the same TC and NBP trends for Coastguard as for the other sequences: its CTIRC ranges between 6.7- and 34-fold, while its BPR ranges between 75% and 96.95%.
The analysis of Table 1 and Table 2 shows that, rather than adding overall computational complexity, the QUAM process improves the CTIRp, CTIRC, and CTIRf of QUATRID. In addition, QUAM helps reduce the channel decoding workload.
Table 3 shows the RD performance of the DIS, Intra, and QUATRID codecs for the Foreman video sequence with a GOP size of two. QUATRID is the most coding efficient of the three. Its coding efficiency gain over DIS ranges from 13.6 kbps to 42 kbps. At low RD points, the gain over DIS ranged from 19.32 kbps to 26 kbps while the QUATRID encoder computation was reduced by more than four times, at the cost of a PSNR degradation of up to 0.26 dB. At high RD points, the gain was up to 42.11 kbps with a slight PSNR degradation of up to 1.55 dB, and the computational complexity analysis shows that QUATRID remains low in complexity at these RD points. The NBP analysis shows that fewer channel encoded bit planes are used at all RD points while a comparable PSNR is still achieved. The BPR is up to 78% for the highest RD point; for the same RD point, the CTIR_f is more than 3.75 times, and it goes up to more than 5 times. Overall, the BPR ranges from 78% to more than 95%. Compared to Intra, the coding efficiency gain of QUATRID varied from 27.74 kbps to 30.22 kbps for low RD points, with a PSNR improvement of 0.81 dB. At high RD points, it ranged from 9 kbps to 30.66 kbps with a PSNR degradation of 1.33 dB. These results indicate that QUATRID obtained higher coding efficiency than DIS and Intra at all RD points. In contrast to DIS, QUATRID decreased the computational complexity of the encoder and of channel coding, at the cost of a modest PSNR degradation. The results also show that, at high RD points, the later bands have few nonzero values due to coarse quantization. The magnitudes of those values are also small, so they contribute little to reconstruction, and the reconstructed frame quality cannot improve much from these mostly zero bands. However, for frames with a considerable count of nonzero B_q4, the reconstruction process improves quality substantially.
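The per-point figures quoted above (a rate saving in kbps alongside a PSNR change in dB) can be read as simple differences between matching RD points of two codecs. The following is a minimal sketch under that assumption; the function name and the sample values are hypothetical and are not taken from Table 3.

```python
def rate_saving_per_point(ref_points, test_points):
    """Return (rate saving in kbps, PSNR change in dB) for each RD point.

    Each point is a (rate_kbps, psnr_db) tuple. Both lists are assumed to be
    ordered by the same quantization index, so points are compared pairwise.
    """
    if len(ref_points) != len(test_points):
        raise ValueError("both codecs need the same number of RD points")
    result = []
    for (ref_rate, ref_psnr), (test_rate, test_psnr) in zip(ref_points, test_points):
        # Positive saving: the test codec spends fewer kbps than the reference.
        # Negative PSNR change: the test codec loses quality at that point.
        result.append((ref_rate - test_rate, test_psnr - ref_psnr))
    return result


# Illustrative values only, not measurements from the paper.
dis = [(100.0, 30.0), (200.0, 34.0)]
quatrid = [(80.0, 29.9), (160.0, 33.0)]
savings = rate_saving_per_point(dis, quatrid)
```

Reading the two numbers together matters: a large rate saving paired with a PSNR loss is a different outcome from the same saving at equal quality, which is why the Bjøntegaard delta analysis later in this section compares the curves at matched PSNR or matched rate.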
Table 4 compares the RD performance of the DIS, Intra, and QUATRID codecs for the Coastguard video sequence with GOP 2. The coding efficiency gain of QUATRID varies from 12.01 kbps to 38.05 kbps overall, from 16.56 kbps to 22.08 kbps at low RD points, and between 12.01 kbps and 38.05 kbps at high RD points. Compared to the Intra codec, QUATRID remains more coding efficient at all RD points: it saves between 49.8 kbps and 55.34 kbps at low RD points and between 6.45 kbps and 68.86 kbps at high RD points. At low RD points, the RD performance is comparable with DIS since the PSNR is close to that of DIS; at the lowest RD point, QUATRID gains up to 0.07 dB over DIS. However, at intermediate and high RD points, it lags by 0.05 dB to 0.93 dB. Against Intra, QUATRID gains between 0.51 dB and 0.58 dB at low RD points and is close at intermediate RD points with a PSNR gain of 0.04 dB; it lags at high RD points, where the PSNR degradation ranges up to 0.26 dB. Its coding efficiency is nevertheless better than that of both Intra and DIS at all RD points. Inspection at the band level reveals that the later bands have very few nonzero values due to coarse quantization, and these values have small magnitude. Reconstructing these dead zone values is therefore challenging, and such mostly zero bands cannot improve the reconstructed frame quality by much. However, for frames with a considerable count of nonzero B_q4, the reconstruction process improves quality substantially. Although the QUATRID PSNR is slightly degraded at high RD points, its coding efficiency is high at all RD points. Other major performance advantages of QUATRID are the CTIR_C, CTIR_f, and BPR. The achieved CTIR_C varies between 6.7 and 34 times, the CTIR_f varies between 5.2 and 7.6 times, and the BPR is 85% to 97%.
Table 5 reports the rate-distortion (RD) performance of the DIS, Intra (conventional codec), and QUATRID codecs for the low motion Hall video sequence. QUATRID achieved high coding efficiency at low RD points, with a slight decrease in PSNR. This sequence has low motion and is coded at a low bit rate even at high RD points. Therefore, the DMI generated by QUATRID becomes an additional burden at high RD points and increases the coding rate. The coding efficiency gained by the QUATRID codec varies from 6.56 kbps to 10.65 kbps. However, the QUATRID codec lagged DIS in coding efficiency at high RD points, while remaining much more coding efficient than Intra. At the RD points where QUATRID is less coding efficient than DIS, the number of nonzero B_q4 per frame increased. Once the number of nonzero B_q4 in a frame increases, the DMI bits increase rapidly, which affects the overall coding efficiency at the corresponding RD points. Furthermore, after the QUAM process, most frames have only a small number of nonzero B_q4, so fewer bit planes are generated and encoded. Reconstruction improves the quality of such frames, but less than for frames with more nonzero B_q4. The band level evaluation of such frames shows that their later bands have few nonzero values due to coarse quantization, and the magnitudes of these values are small. Therefore, the reconstructed frame quality cannot improve much from these mostly zero bands. For frames with a considerable count of nonzero B_q4, the reconstruction process improves quality substantially. The PSNR analysis shows that QUATRID outperforms DIS by 0.06 dB at low RD points and is comparable with DIS at intermediate RD points, but lags by up to 0.82 dB at high RD points. Compared to Intra, QUATRID is 0.10 dB better at the lowest RD point and comparable at the other low RD points, but lags by up to 1.11 dB at high RD points. The major advantages of QUATRID at these high RD points are the CTIR_C, CTIR_f, and BPR: the CTIR_C and CTIR_f are 7.9 and 5.85 times, respectively, and the BPR is up to 84%. The maximum BPR achieved by QUATRID for Hall is 97% at low RD points.
Figure 4 shows the RD performance of different video sequences with GOP 2 for DIS, Intra, and QUATRID.
Figure 4a shows the RD performance of the Hall video sequence. QUATRID is comparable with DIS at low RD points and outperforms the Intra codec there. At intermediate RD points, QUATRID slightly lags DIS but slightly improves over the Intra codec. At high RD points, QUATRID lags both DIS and the Intra codec. However, the major advantages of QUATRID at all RD points are high coding efficiency, low encoder computational complexity, and less channel coding. The coding efficiency varies from 6.02 kbps to 7.65 kbps. In addition, the computational complexity analysis shows low computational complexity for QUATRID at all RD points. The NBP analysis shows that fewer channel encoded bit planes are used at all RD points while a comparable PSNR is still achieved. The BPR is up to 84% for the highest RD point. For the same RD point, the CTIR_f is more than 5.8 times, meaning that QUATRID performed 5.8 times faster than DIS. Although QUATRID slightly lagged in PSNR at some RD points, channel encoding computational complexity was reduced by more than fivefold because the BPR reached 84%. Further, QUATRID gains coding efficiency over Intra that varies from 64 kbps to 83 kbps. Despite its modest coding efficiency for the Hall video, the main benefit of QUATRID is its lower computational cost for channel encoding compared to the DIS codec.
Figure 4b shows the RD performance of the Foreman video sequence. At low rate points, the QUATRID codec shows better RD performance than DIS and Intra, achieving a PSNR gain. At intermediate rate points, the QUATRID codec lags DIS but outperforms Intra. At the high rate point, the QUATRID PSNR is marginally lower than that of both DIS and Intra. However, the BPR analysis shows that the QUATRID BPR is up to 78% for this high rate point. For the same RD point, the CTIR_f is more than 3.75 times, and it goes up to more than five times.
Figure 4c shows the RD performance of the Coastguard video sequence. QUATRID outperforms DIS at low rate points and gains up to 0.07 dB PSNR. At intermediate rate points, QUATRID is comparable to DIS and superior to Intra. At high rate points, the QUATRID RD performance slightly lags DIS. Nevertheless, QUATRID exhibits low computational complexity and comparable PSNR for Coastguard, as it does for the other sequences. The BPR is up to 85% for the highest RD point. Moreover, for the same RD point, the CTIR_f is more than fivefold, with QUATRID performing 5.38 to 7.59 times faster than the DIS codec.
The performance graphs show that QUATRID performs well in terms of rate for all video sequences. Its coding rate is lower than that of DIS and Intra at all RD points for all video sequences, which reflects the high coding efficiency of QUATRID. For the Hall video sequence, the coding efficiency achieved by QUATRID varies from 6.02 kbps to 7.65 kbps. Hall is a low motion video, and the DIS results show that its coding rate is lower than that of the other video sequences. Consequently, QUATRID achieved up to 10.65 kbps coding efficiency. The RD performance graph indicates that QUATRID is coding efficient at all RD points of the Foreman video sequence, with an improvement over DIS that varies from 13.6 kbps to 42.11 kbps. The coding rate of QUATRID for Coastguard is lower than that of DIS at all RD points, with a coding efficiency gain that varies from 12.01 kbps to 38.05 kbps.
Table 6 reports the average feedback channel requests per frame when channel coding (LDPCA) is deployed with DIS and with QUATRID. These averages are for the highest RD points of all video sequences. In terms of average feedback requests per frame, QUATRID performs far better than DIS for all video sequences. The reduction per frame occurs because QUATRID has fewer bit planes per frame than the DIS codec. However, the average feedback requests per bit plane of DIS for the Hall and Foreman sequences are slightly lower than those of QUATRID. For the Coastguard sequence, owing to the efficiency of the proposed CNM, the average feedback requests per bit plane of QUATRID are far lower than those of DIS. To compare QUATRID and DIS, the feedback channel requests reduction ratio is obtained by dividing the feedback channel requests of DIS by those of QUATRID. A ratio higher than 1 demonstrates that QUATRID reduces the feedback requests compared to DIS, and a high ratio indicates that latency drops proportionally. The QUATRID feedback requests reduction ratios for Hall, Foreman, and Coastguard are 2.8, 3.2, and 4 times, respectively, and the latency is reduced accordingly. The average feedback requests are reduced because of the CNM and because fewer bit planes are coded.
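The reduction ratio defined above, and the distinction between requests per frame and requests per bit plane, can be sketched as follows. The function names and the sample counts are hypothetical and are not values from Table 6; the sketch also assumes that requests per frame equal requests per bit plane multiplied by the number of bit planes per frame.

```python
def feedback_reduction_ratio(requests_dis, requests_quatrid):
    """Feedback requests of DIS divided by those of QUATRID (above 1 favours QUATRID)."""
    if requests_quatrid <= 0:
        raise ValueError("QUATRID request count must be positive")
    return requests_dis / requests_quatrid


def requests_per_bit_plane(requests_per_frame, bit_planes_per_frame):
    """Average feedback requests spent on each channel coded bit plane."""
    if bit_planes_per_frame <= 0:
        raise ValueError("bit plane count must be positive")
    return requests_per_frame / bit_planes_per_frame


# Illustrative numbers only: the reference codec codes many bit planes per
# frame, the other codec codes few but needs slightly more requests for each.
dis_per_frame, dis_nbp = 120.0, 40
quatrid_per_frame, quatrid_nbp = 30.0, 8

ratio = feedback_reduction_ratio(dis_per_frame, quatrid_per_frame)
dis_per_plane = requests_per_bit_plane(dis_per_frame, dis_nbp)
quatrid_per_plane = requests_per_bit_plane(quatrid_per_frame, quatrid_nbp)
```

With these made-up numbers the per-frame ratio is 4 even though the per-bit-plane cost is higher for the second codec, which mirrors the pattern described for Hall and Foreman: the per-frame reduction can come mainly from coding fewer bit planes.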
Table 7 summarizes the performance comparison of the DIS, QUATRID, and Intra codecs. The computational complexity of QUATRID is compared with that of DIS by estimating the computational time improvement ratios (CTIR_p, CTIR_f, CTIR_C) using Equations (14)–(16). Furthermore, codec performance is evaluated in terms of the average number of bit planes (NBP) per frame, and QUATRID and DIS are compared using the bit plane reduction percentage (BPR). In addition, the Bjøntegaard delta performance is computed to evaluate the coding efficiency and quality of QUATRID compared to the DIS and Intra codecs.
Further, in Table 7, the computational complexity of QUATRID and DIS is compared in terms of the average full encoding computational time improvement ratio (CTIR_f) and the average channel coding computational time improvement ratio (CTIR_C) for Hall, Foreman, and Coastguard. High CTIR_f and CTIR_C values show that QUATRID computes faster, indicating its low computational complexity compared to DIS. The CTIR_f of Hall, Foreman, and Coastguard varied between 4.5 and 9.3 times, 3.7 and 5.9 times, and 5.2 and 7.5 times, respectively; that is, the QUATRID encoder ran that many times faster than DIS for these videos. The CTIR_C of Hall, Foreman, and Coastguard varied between 7 and 33 times, 4.8 and 22.6 times, and 8.7 and 34 times, respectively; that is, the QUATRID channel encoder ran that many times faster than the DIS channel encoder. The QUATRID channel encoding process is fast because it has to encode fewer bit planes. The QUAM deployed with the QUATRID encoder dropped a large number of zero quantized transform blocks of sizes 16 × 16 and 4 × 4, leaving fewer bit planes from the remaining nonzero blocks. The bit plane reduction percentage (BPR), computed from the average number of bit planes (NBP) encoded by DIS and by QUATRID, gives the percentage reduction in channel encoded bit planes of QUATRID. A high BPR means that relatively few bit planes are channel encoded using QUATRID, which leads to a high CTIR_C. The BPR of Hall, Foreman, and Coastguard varied from 84% to 97%, 78% to 95%, and 86% to 96%, respectively. The CTIR_f, CTIR_C, and BPR analysis indicates that the computational complexity of QUATRID full encoding and channel encoding is greatly reduced, and the high BPR indicates that the overall channel coding (encoding and decoding) workload is reduced with the deployment of QUAM.
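As a rough illustration of how the two metrics discussed above relate to raw measurements, the sketch below computes a time improvement ratio and a bit plane reduction percentage. Equations (14)–(16) are not reproduced in this section, so the forms used here (DIS time divided by QUATRID time, and the relative drop in NBP expressed in percent) are assumptions consistent with the "times faster" and "percentage reduction" wording; the function names and sample values are hypothetical.

```python
def ctir(time_dis, time_quatrid):
    """Computational time improvement ratio: how many times faster QUATRID is.

    Assumed form: DIS time divided by QUATRID time, in the same time unit.
    """
    if time_quatrid <= 0:
        raise ValueError("QUATRID time must be positive")
    return time_dis / time_quatrid


def bpr(nbp_dis, nbp_quatrid):
    """Bit plane reduction percentage relative to the DIS bit plane count.

    Assumed form: 100 * (NBP_DIS - NBP_QUATRID) / NBP_DIS.
    """
    if nbp_dis <= 0:
        raise ValueError("DIS bit plane count must be positive")
    return 100.0 * (nbp_dis - nbp_quatrid) / nbp_dis


# Illustrative inputs only, not measurements from Table 7.
speedup = ctir(10.0, 2.0)   # full encoding or channel encoding times
reduction = bpr(50, 8)      # average bit planes per frame
```

Under these assumed definitions a BPR of 84% means only 16% of the DIS bit planes remain to be channel encoded, which is the mechanism the text gives for the high CTIR_C.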
Further, Table 7 reports the Bjøntegaard delta performance, used to measure the performance of QUATRID for different video sequences. RD performance describes a codec in terms of coding rate and PSNR at different RD points. The RD findings for Hall, Foreman, and Coastguard demonstrate that QUATRID achieved high coding efficiency at all RD points, with PSNR improved at certain RD points and slightly reduced at others. The coding efficiency gains are 6 to 7 kbps, 13 to 42 kbps, and 12 to 38 kbps for Hall, Foreman, and Coastguard, respectively. RD curves are a quality evaluation tool that shows how PSNR depends on bit rate, and they reveal which codec achieves the higher PSNR (or lower distortion) at a given rate point. The Bjøntegaard delta (BD) model, in contrast, computes the average PSNR and bit rate differences between the RD curves of two codecs, summarizing the gap between two RD plots in a single number. The Bjøntegaard delta metric has two parts: BD PSNR (also written as B-DSNR) and BD Rate (B-DBR). Together they offer a good comparison of the RD performance of two codecs, but each is interpreted separately. The BD Rate indicates the bit rate saved (coding efficiency) by the test codec in comparison to the reference codec at the same PSNR, whereas the BD PSNR indicates the average PSNR difference at the same bit rate.
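To make the BD Rate idea concrete, the following is a minimal sketch of the commonly used Bjøntegaard procedure: fit a cubic polynomial to log10(rate) as a function of PSNR for each codec, integrate both fits over the PSNR interval the two curves share, and convert the average log difference into a percentage. The paper does not state its fitting details, so this should be read as one standard way of computing the metric rather than the exact computation behind Table 7. The helper names and the RD points are hypothetical.

```python
import math


def fit_cubic(xs, ys):
    """Least-squares cubic fit; returns [c0, c1, c2, c3] for c0 + c1*x + c2*x^2 + c3*x^3."""
    n = 4
    # Normal equations: a[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coef = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - tail) / a[i][i]
    return coef


def integrate(coef, lo, hi):
    """Definite integral of the fitted polynomial over [lo, hi]."""
    def antiderivative(x):
        return sum(c * x ** (k + 1) / (k + 1) for k, c in enumerate(coef))
    return antiderivative(hi) - antiderivative(lo)


def bd_rate(ref_points, test_points):
    """Average bit rate difference in percent at equal PSNR (negative means saving).

    Each point is a (rate_kbps, psnr_db) tuple; at least four distinct PSNR
    values per curve are needed for the cubic fit.
    """
    ref_psnr = [p for _, p in ref_points]
    test_psnr = [p for _, p in test_points]
    # Shift PSNR towards zero so the normal equations stay well conditioned.
    shift = (sum(ref_psnr) + sum(test_psnr)) / (len(ref_psnr) + len(test_psnr))
    ref_fit = fit_cubic([p - shift for p in ref_psnr],
                        [math.log10(r) for r, _ in ref_points])
    test_fit = fit_cubic([p - shift for p in test_psnr],
                         [math.log10(r) for r, _ in test_points])
    # Compare only over the PSNR interval covered by both curves.
    lo = max(min(ref_psnr), min(test_psnr)) - shift
    hi = min(max(ref_psnr), max(test_psnr)) - shift
    avg_log_diff = (integrate(test_fit, lo, hi) - integrate(ref_fit, lo, hi)) / (hi - lo)
    return (10.0 ** avg_log_diff - 1.0) * 100.0


# Illustrative RD points only (not from the paper's tables): the test codec
# reaches each PSNR with 10% less rate than the reference.
reference = [(100.0, 30.0), (200.0, 33.0), (400.0, 36.0), (800.0, 39.0)]
test_codec = [(rate * 0.9, psnr) for rate, psnr in reference]
saving = bd_rate(reference, test_codec)
```

In this constructed example the result is approximately -10, that is, a 10% bit rate saving at equal PSNR. BD PSNR follows the same pattern with the axes swapped: PSNR is fitted as a function of log10(rate) and the average vertical gap is reported in dB.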
The BD Rate calculated from the DIS and QUATRID RD curves of Hall, Foreman, and Coastguard shows that QUATRID is coding efficient, saving between 5.4% and 10.48% of the bit rate at the same PSNR as DIS, except for the Hall video sequence. For this low motion video, QUATRID requires 4.2% more bit rate to achieve the same PSNR. The RD curves were therefore broken down into low and high RD points and analyzed separately. This breakdown shows that QUATRID saves more bit rate at low RD points for all video sequences than at high RD points, where it slightly lags in PSNR. In this breakdown, QUATRID saves 2.7%, 15.5%, and 18.4% of the bit rate for Hall, Foreman, and Coastguard, respectively, at the same PSNR. The BD Rate calculated from the Intra and QUATRID RD curves of the three sequences shows that QUATRID saves between 18.86% and 37.41% of the bit rate at the same PSNR as Intra. Finally, the BD PSNR calculated from the DIS and QUATRID RD curves shows that QUATRID gains between 0.061 dB and 0.32 dB at the same bit rate, except for the Hall video sequence, for which it loses 0.23 dB. As in the BD Rate analysis, the RD curves were broken down into low and high RD points and analyzed separately. This breakdown shows that, at low RD points, QUATRID gains PSNR for all video sequences: 0.08 dB, 0.44 dB, and 0.68 dB over DIS for Hall, Foreman, and Coastguard, respectively. Furthermore, the BD PSNR calculated from the Intra and QUATRID RD curves shows that QUATRID gains 2.31 dB, 0.69 dB, and 1.74 dB for Hall, Foreman, and Coastguard, respectively.
Table 8 illustrates the RD performance comparison of the Intra and QUATRID codecs for two additional test sequences, Akiyo and Salesman. For these sequences, QUATRID is compared only with the Intra codec. The test sequences exhibit medium to high motion in some parts of the frame, that is, motion changes from frame to frame at regular intervals with medium or high speed. For the Akiyo test sequence, the QUATRID codec achieves a coding efficiency gain of 80.15 kbps to 116.33 kbps over the Intra codec, with a PSNR gain of 0.26 dB to 1.33 dB. For Salesman, QUATRID achieves a coding efficiency gain of 123.33 kbps to 192.14 kbps over the Intra codec and gains 0.13 dB to 0.73 dB at low RD points. The PSNR performance at high RD points was slightly degraded due to the coarse quantization of the last few AC bands, which generates a small number of low-intensity nonzero values.
Table 9 reports the average computational time of full Wyner-Ziv encoding (T_f) and channel encoding (T_C), and the average number of bit planes (NBP), for the BUS and Coastguard video sequences at 352 × 288 resolution. The results, taken under random conditions, show that the average computational time of full Wyner-Ziv encoding (T_f) and channel encoding (T_C) of QUATRID is lower than that of DIS. The CTIR_f and CTIR_C of both sequences show that QUATRID performs 1.17 to 1.7 times faster than DIS. The bit plane results show that QUATRID reduces the channel encoded bit planes by 16.05% to 57.5%. It is therefore able to reduce the channel encoding computational complexity, which directly reduces the channel decoding workload as well.