1. Introduction
During video transmission, the channel introduces noise that degrades the transmitted information. To limit this degradation, systems employ a class of channel coding techniques known as forward error correction (FEC). FEC adds redundancy through error-correcting codes such as Reed–Solomon, convolutional, Bose–Chaudhuri–Hocquenghem (BCH), and low-density parity-check (LDPC) codes [
1]. This approach has demonstrated significant success in reducing transmission errors.
The main advantage of channel coding is improved system performance relative to uncoded transmission. Some digital systems use concatenated code pairs, which apply two levels of coding: an external code and an internal code. The digital video broadcasting terrestrial (DVB-T) system is one example. The growing demand for broadcasting services created the need for a new, more efficient standard [
2]. This necessity led to the inception of DVB-T2 (digital video broadcasting—second-generation terrestrial), which offers advanced technology and augmented capacity across diverse terrestrial domains. Several notable enhancements have been introduced, with the evolution in FEC codes standing out as one of the most prominent. Assessing this transition is imperative to gauge the efficacy of the implemented improvements and comprehend their ramifications.
Although the DVB-T and DVB-T2 systems prove effective in error correction, it is essential to acknowledge the gap in current research. Many studies comparing or evaluating their performances primarily concentrate on technical aspects such as bit error rate and error correction capability, often neglecting user experience quality, particularly video quality [
3]. Given the importance of video degradation in Quality of Experience (QoE), a methodology has been devised to evaluate video quality temporally.
2. Related Work
The performance and parameters of DVB-T/T2 systems have been evaluated extensively in recent years [4,5,6,7,8,9,10,11]. For instance, in [
4], evaluation of the DVB-T/T2 Lite system’s performance utilizing the multiple-input single-output (MISO) transmission technique was conducted. In [
5], an enhanced receiver for DVB-T systems was proposed to mitigate the impacts induced by channel imperfections, particularly in phase and quadrature. Similarly, Ref. [
6] delved into the effects of time and frequency deviation on radar performance within the DVB-T system. Moreover, Ref. [
7] simulated a multiple-input multiple-output (MIMO) scenario over DVB-T2 and LDPC channel coding, employing the maximum likelihood estimation technique. Additionally, Ref. [
8] focused on analyzing the performance degradation attributed to phase and quadrature (IQ) imperfections in the orthogonal frequency division multiplexing (OFDM) modulator/demodulator of DVB-T and DVB-T2 Lite systems.
The quality of DVB-T2 transmission was analyzed in [
9] under fixed reception conditions to monitor the transition process from analog to digital terrestrial TV in Indonesia. Furthermore, Refs. [
10,
11] proposed the adoption of flexible waveform techniques such as the universal filtered multicarrier (UFMC) and filter bank multicarrier (FBMC) techniques for 5G networks to enhance the spectral efficiency of DVB-T2. Despite these advancements, the extant literature evaluating the performance of DVB-T/T2 systems overlooks crucial aspects for end users, notably, video quality. With the scaling of new streaming platforms and ascending demand for high-quality video services, a pressing need arises to improve user experience, particularly by considering human perception [
12].
The introduction of DVB-T2 heralded critical technological changes compared to its predecessor, notably impacting system performance. One of the main disparities lies in the error correction coding schemes employed, with DVB-T standardizing the concatenated pair as Reed–Solomon and convolutional (RS-CONV) for external and internal error correction, respectively. Conversely, DVB-T2 incorporates the BCH code concatenated with LDPC.
Despite the advantages presented by DVB-T2 over DVB-T, both technologies persist in numerous countries. For instance, DVB-T2 is the dominant technology in Europe, while the RS-CONV pair remains in use in countries such as Brazil and Japan, which have adopted the integrated-services digital broadcasting terrestrial (ISDB-T) system (https://www.dibeg.org/world/ (accessed on 4 April 2024)). Evaluating the improvements gained by exchanging the concatenated code pair is therefore imperative. Consequently, several studies have explored the performance of FEC codes [13,14] and analyzed the performance gains of rotated constellation techniques within the DVB-T2 system [2]. The assessment of these studies often employs the bit error rate (BER) metric, widely utilized in digital systems [15,16,17,18].
In addition to system metrics, numerous studies have evaluated various user metrics concerning video quality within the realm of Quality of Experience (QoE) [
19]. Objective metrics have also been employed to analyze video quality in multimedia systems [20,21,22]. However, these studies typically overlook temporal variations influencing QoE, a critical gap identified in [23,24,25,26]. Temporal fluctuations in video quality can significantly impair QoE [26], necessitating a quantitative analysis of video quality related to QoE. Accordingly, this study proposes a novel methodology for temporal (frame-by-frame) analysis of 4K ultra-resolution video quality.
The evaluation of video quality employs the structural similarity index (SSIM) and the peak signal-to-noise ratio (PSNR). The SSIM serves as the primary quality metric in this study, and the PSNR is reported as an additional measure. The SSIM correlates closely with human perception because it assesses the quality of a digital image relative to the original image by comparing luminance, contrast, and structure [
27,
28].
The SSIM/PSNR values are obtained through a new methodology involving subjecting a set of frames representing the video to varying noise levels, simulating fluctuations in channel conditions during video transmission. This methodology yields the average percentage gain in the SSIM of one encoder relative to the other.
In summary, this article’s contributions include:
Utilization of the SSIM for temporal video quality evaluation, aligning closely with human perception;
Development of a novel methodology assessing SSIM/PSNR values through frame-by-frame analysis under varying noise levels, simulating channel condition fluctuations during video transmission;
Consideration of temporal variations enabling the generation of quantitative data for more accurate analysis of technology performance regarding video quality, aiding professionals and researchers in technology selection;
Identification of the most efficient techniques in reducing quality degradation, facilitating prediction and optimization of video quality, particularly for streaming ultra-resolution videos.
The subsequent sections of this document are structured as follows:
Section 3 elucidates the methodology and metrics employed to derive the results,
Section 4 showcases the outcomes attained in this investigation, and
Section 5 deliberates upon the findings. The final insights are encapsulated in
Section 6.
3. Methodology
In this section, we delineate the methodology employed for temporal evaluation, as depicted in
Figure 1. The approach encompasses several stages, each elucidated in detail below. Our objective is to provide a comprehensive explanation of our evaluation process and ensure methodological transparency.
Four selected frames from the Cross video are showcased in
Figure 2 to further explain the methodology. Each frame shows visibly more noise than its predecessor because the noise intensity is raised systematically as the video progresses. Varying the channel conditions in this way allows us to assess how each technique copes with fluctuating noise levels and thus provides insight into the robustness and adaptability of the video transmission system.
3.1. Video Encoding/Decoding
In the Video Encoding block, the original YUV file is encoded to the H.264 standard with a frame rate of 50 FPS and a maximum duration of 10 s using FFmpeg (FFmpeg is a command line tool used to convert multimedia files between formats [
29]). YUV is a raw format commonly used in video compression studies [
30]. The videos used in the simulations are Cross, Crowd, Duck, Tree, and Park, which were obtained from Xiph.org (
https://media.xiph.org/video/derf/ (accessed on 4 April 2024)). The resolution, frame rate, number of frames, length of GOP, B-frames per GOP, and Quantization Parameter (QP) are considered as video codification parameters, as presented in
Table 1. The Video Decoding block transforms the received H.264 video into YUV format, allowing for further processing and analysis.
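As an illustration only, the encoding step could be scripted as in the sketch below. The function name, the file names, the 4:2:0 pixel format, and the default GOP, B-frame, and QP values are assumptions made for the example; the values actually used are those listed in Table 1. Note also that FFmpeg's `-bf` option sets the maximum number of consecutive B-frames, which may not coincide with the "B-frames per GOP" parameter.

```python
def build_h264_encode_cmd(src_yuv, dst, width, height, fps=50,
                          max_seconds=10, gop=25, bframes=2, qp=28):
    """Return an FFmpeg argument list that encodes raw YUV video to H.264.

    All default values are placeholders, not the parameters of Table 1.
    """
    return [
        "ffmpeg", "-y",
        # Raw YUV files carry no header, so the demuxer needs the pixel
        # format, frame size, and frame rate spelled out (4:2:0 is assumed).
        "-f", "rawvideo", "-pix_fmt", "yuv420p",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", src_yuv,
        "-t", str(max_seconds),   # cap the duration of the output
        "-c:v", "libx264",        # H.264 encoder
        "-g", str(gop),           # GOP length in frames
        "-bf", str(bframes),      # maximum consecutive B-frames
        "-qp", str(qp),           # constant quantization parameter
        dst,
    ]


cmd = build_h264_encode_cmd("cross.yuv", "cross.h264", 3840, 2160)
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where FFmpeg is installed.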
3.2. Channel Coding/Decoding
In the Channel Coding block, the video is converted into binary data. Channel coding adds redundant bits to the original data to make the transmission more robust, and concatenating two codes strengthens this protection further. The RS-CONV and BCH-LDPC pairs are each applied sequentially: an external encoder (RS or BCH) first adds redundancy to the binary information, and an internal encoder (CONV or LDPC, respectively) then adds a second layer of redundancy, as presented in
Figure 3. The Channel Decoding block decodes the received information by performing the inverse operations of the RS-CONV and BCH-LDPC, thus obtaining the binary information corresponding to the received video.
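The cost of the two redundancy layers can be illustrated through the overall code rate, which is the product of the external and internal code rates. The sketch below uses example parameters taken from the DVB-T and DVB-T2 specifications (RS(204,188) with a rate-1/2 convolutional code, and the rate-1/2 normal FEC frame with a 32,208-bit BCH input and a 64,800-bit LDPC output); these are assumptions for illustration and are not necessarily the configurations of Table 2.

```python
from fractions import Fraction


def concatenated_rate(outer_k, outer_n, inner_k, inner_n):
    """Overall rate of an outer (n, k) code followed by an inner (n, k) code."""
    return Fraction(outer_k, outer_n) * Fraction(inner_k, inner_n)


# Illustrative DVB-T pair: RS(204,188) followed by a rate-1/2 convolutional code.
rs_conv = concatenated_rate(188, 204, 1, 2)

# Illustrative DVB-T2 pair: BCH(32400,32208) followed by LDPC(64800,32400).
bch_ldpc = concatenated_rate(32208, 32400, 32400, 64800)

print("RS-CONV  overall rate:", rs_conv, "=", round(float(rs_conv), 4))
print("BCH-LDPC overall rate:", bch_ldpc, "=", round(float(bch_ldpc), 4))
```

In this example the two pairs carry a comparable share of payload bits, so any difference in delivered quality would stem from the error-correcting capability of the codes rather than from the amount of redundancy.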
A brief description of the codes is presented below.
BCH (Bose–Chaudhuri–Hocquenghem): BCH codes are block codes that operate on blocks of bits rather than on individual bits. A BCH(n,k) code encodes k message bits into an n-bit codeword, where n is equal to $2^m - 1$ with $m \geq 3$ [31];
LDPC (low-density parity check): LDPC codes are defined by a sparse binary parity-check matrix, that is, a matrix in which most elements are 0 and only a few are 1. An LDPC code can be represented either by this matrix or by an equivalent graph [
32];
RS (Reed–Solomon): RS codes are systematic cyclic linear block codes that operate on symbols m bits wide. They are designed so that every possible m-bit word is a valid symbol [
33];
CONV (convolutional codes): Unlike a block encoder, a convolutional encoder does not produce its output block by block; it generates a continuous coded sequence from the input information sequence, creating redundancy through convolution operations. The decoder uses this redundancy to determine which message sequence was most likely sent and thereby corrects errors. Thus, in this type of error-correcting code, a set of
m symbols is transformed into a set of
n symbols [
34].
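To make the convolutional case concrete, the sketch below implements a small rate-1/2 encoder with constraint length 3 and generator polynomials 7 and 5 (octal), a common textbook configuration chosen here purely for illustration. It is not the encoder simulated in this work; DVB-T specifies a constraint-length-7 mother code with generators 171 and 133 (octal), which the same function should accept through its parameters under the bit-ordering convention stated in the comments.

```python
def conv_encode(bits, generators=(0b111, 0b101), constraint_length=3):
    """Rate-1/len(generators) convolutional encoder (no tail bits appended).

    Convention assumed here: the most significant bit of the register and of
    each generator corresponds to the current input bit.
    """
    register = 0
    encoded = []
    for bit in bits:
        # Shift the new bit in at the most significant position.
        register = (bit << (constraint_length - 1)) | (register >> 1)
        for g in generators:
            # Each output bit is the modulo-2 sum of the tapped register bits.
            encoded.append(bin(register & g).count("1") & 1)
    return encoded


message = [1, 0, 1, 1]
print(conv_encode(message))  # two coded bits per message bit
```

Each input bit influences several consecutive output bits, which is the redundancy that a decoder (for example, a Viterbi decoder) exploits to recover the message.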
Table 2 lists the parameters of the channel encoders and the transmission and reception process of the OFDM systems. These parameters are based on the standards defined in [
35,
36].
3.3. Orthogonal Frequency Division Multiplexing Symbol TX/RX
The OFDM symbol TX refers to transmitting symbols in an OFDM (orthogonal frequency division multiplexing) communication system.
Figure 4 presents a basic diagram with the necessary steps for creating OFDM symbols. The first step is to convert the binary information into complex symbols using a modulation scheme such as PSK (Phase Shift Keying) or QAM (Quadrature Amplitude Modulation). Next, the Serial–Parallel (S/P) block splits the serial stream of data symbols into parallel subgroups, and each subgroup is modulated onto the subcarriers.
OFDM symbol RX refers to the reception process of OFDM symbols in a communication system. After the received signal is separated into individual subcarriers through the FFT, channel estimation and equalization are performed, using the estimated characteristics of the channel to equalize the received signal. Next, the symbols on each subcarrier are demodulated with the scheme that matches the one used at the transmitter. Subsequently, the demodulated symbols are mapped back to their original bit sequences.
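The transmit and receive chains described above can be sketched in a few lines. The example below is a simplified illustration under several assumptions: QPSK with a Gray mapping, 8 subcarriers, a cyclic prefix of 2 samples, an ideal channel (so channel estimation and equalization are omitted), and a naive DFT in place of an FFT. None of these values are taken from Table 2.

```python
import cmath
import math

# Gray-mapped QPSK constellation (assumed mapping, for illustration).
QPSK = {(0, 0): 1 + 1j, (0, 1): -1 + 1j, (1, 1): -1 - 1j, (1, 0): 1 - 1j}


def idft(freq):
    n = len(freq)
    return [sum(freq[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def dft(time):
    n = len(time)
    return [sum(time[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def ofdm_tx(bits, n_sub=8, cp_len=2):
    """Map bits to QPSK, group them per OFDM symbol, apply the IDFT, add a cyclic prefix."""
    if len(bits) % (2 * n_sub) != 0:
        raise ValueError("bit count must fill a whole number of OFDM symbols")
    symbols = [QPSK[(bits[i], bits[i + 1])] for i in range(0, len(bits), 2)]
    signal = []
    for i in range(0, len(symbols), n_sub):      # serial-to-parallel grouping
        samples = idft(symbols[i:i + n_sub])     # one subcarrier per symbol
        signal.extend(samples[-cp_len:] + samples)
    return signal


def ofdm_rx(signal, n_sub=8, cp_len=2):
    """Remove the cyclic prefix, apply the DFT, and take hard QPSK decisions."""
    bits = []
    step = n_sub + cp_len
    for i in range(0, len(signal), step):
        for s in dft(signal[i + cp_len:i + step]):
            nearest = min(QPSK, key=lambda pair: abs(QPSK[pair] - s))
            bits.extend(nearest)
    return bits


tx_bits = [1, 0, 0, 1, 1, 1, 0, 0] * 4
assert ofdm_rx(ofdm_tx(tx_bits)) == tx_bits      # lossless over an ideal channel
```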
3.4. Channel Additive White Gaussian Noise/Rayleigh
For simplicity and to reduce processing cost, the simulations are carried out in baseband, since a passband signal can be represented by an equivalent complex baseband signal. Noise is applied through AWGN and Rayleigh channels so that different video segments experience different channel conditions. The noise level increases gradually as the video frame sequence progresses, which is achieved by adjusting the signal-to-noise ratio (SNR). The relationship between the SNR and the noise variance can be defined by Equation (
1) [
37].
where the following abbreviations apply:
The methodology employed results in a significant loss of quality in the information, as depicted in
Figure 5.
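The gradual increase in noise can be sketched as follows. The example assumes the common definition of the SNR as the ratio of average signal power to noise variance, so that the variance equals the signal power divided by 10^(SNR_dB/10), and it assumes a linear SNR ramp across the video segments. The function names, the ramp shape, and the SNR limits are illustrative assumptions and may differ from the exact form of Equation (1) and from the schedule used in the simulations.

```python
import math
import random


def noise_variance(snr_db, signal_power=1.0):
    """Noise variance for a given SNR in dB, assuming SNR = signal_power / variance."""
    return signal_power / (10 ** (snr_db / 10))


def snr_schedule(n_segments, snr_start_db, snr_end_db):
    """One SNR value per video segment, changing linearly (assumed ramp shape)."""
    if n_segments == 1:
        return [snr_start_db]
    step = (snr_end_db - snr_start_db) / (n_segments - 1)
    return [snr_start_db + i * step for i in range(n_segments)]


def awgn(samples, snr_db, rng):
    """Add complex white Gaussian noise to baseband samples at the requested SNR."""
    power = sum(abs(s) ** 2 for s in samples) / len(samples)
    # The total noise variance is split equally between real and imaginary parts.
    sigma = math.sqrt(noise_variance(snr_db, power) / 2)
    return [s + complex(rng.gauss(0, sigma), rng.gauss(0, sigma)) for s in samples]


rng = random.Random(0)
segment = [1 + 1j, -1 + 1j, -1 - 1j, 1 - 1j] * 4
for snr_db in snr_schedule(5, 20, 0):            # the SNR falls, so the noise rises
    noisy = awgn(segment, snr_db, rng)
    print(f"SNR {snr_db:5.1f} dB -> noise variance {noise_variance(snr_db, 2.0):.3f}")
```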
The simulations use the Rayleigh channel gain and delay values obtained from field tests performed by the Brazilian Association of Radio and Television Broadcasters (ABERT) and Mackenzie University [
38].
Table 3 presents the gains and delays.
3.5. Calculation of Structural Similarity Index/Peak Signal-to-Noise Ratio
Access to the original and received videos makes computing the SSIM/PSNR feasible. An objective evaluation of transmission quality can be obtained by computing the SSIM/PSNR between the original and received video.
3.5.1. Structural Similarity Index
The SSIM is responsible for comparing each frame of the original video and degraded video sequences to quantify the video quality. The SSIM is based on the idea that natural images are highly structured; their pixels have a strong dependency, particularly when they are spatially close. Thus, a strong dependency returns an index close to 1 (higher quality), whereas a weak dependency returns an index close to 0 (lower quality) [
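39]. The SSIM is given by Equation (2):
$$\mathrm{SSIM}(x, y) = \frac{(2\mu_x \mu_y + C_1)(2\sigma_{xy} + C_2)}{(\mu_x^2 + \mu_y^2 + C_1)(\sigma_x^2 + \sigma_y^2 + C_2)} \qquad (2)$$
where the following abbreviations apply:
$x$ and $y$ are the two frames being compared (original and received);
$\mu_x$ and $\mu_y$ are the means of x and y, respectively;
$\sigma_x^2$ and $\sigma_y^2$ are the variances of x and y, respectively;
$\sigma_{xy}$ is the covariance of x and y;
$C_1$ and $C_2$ are constants that stabilize the division when the denominator is close to zero.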
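A minimal sketch of the per-frame SSIM computation is given below. It assumes the standard form of the index evaluated once over the whole frame, with the customary stabilizing constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2 for a dynamic range L. Practical implementations usually average the index over local windows, so this global version is a simplification, and the function names are hypothetical.

```python
def ssim_global(x, y, max_value=255, k1=0.01, k2=0.03):
    """Single-window SSIM between two frames given as flat lists of luma samples."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (k1 * max_value) ** 2   # stabilizes the luminance term
    c2 = (k2 * max_value) ** 2   # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def ssim_per_frame(original_frames, received_frames):
    """Temporal (frame-by-frame) SSIM curve for two equally long videos."""
    return [ssim_global(o, r) for o, r in zip(original_frames, received_frames)]


original = [[10, 50, 90, 130], [20, 60, 100, 140]]
received = [[10, 50, 90, 130], [140, 20, 100, 60]]   # second frame is corrupted
print(ssim_per_frame(original, received))
```

An undamaged frame yields an index of 1, and the index drops as the received frame departs from the original in mean, variance, or structure.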
3.5.2. Peak Signal-to-Noise Ratio
The PSNR value is calculated as:
$$\mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$
where MAX is the maximum possible pixel intensity (which, in 8-bit images, is 255), and MSE is the mean square error between the reference image pixel value and the compressed image pixel value [
27].
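The PSNR computation can be sketched as follows, assuming 8-bit frames stored as flat lists of pixel values and the usual definition PSNR = 10 log10(MAX^2 / MSE). The function names are hypothetical.

```python
import math


def mse(reference, received):
    """Mean square error between two frames of equal size."""
    return sum((a - b) ** 2 for a, b in zip(reference, received)) / len(reference)


def psnr(reference, received, max_value=255):
    """Peak signal-to-noise ratio in dB; infinite when the frames are identical."""
    error = mse(reference, received)
    if error == 0:
        return math.inf
    return 10 * math.log10(max_value ** 2 / error)


frame = [10, 20, 30, 40]
print(psnr(frame, [11, 21, 31, 41]))  # every pixel off by one, so the MSE is 1
print(psnr(frame, frame))             # identical frames
```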
4. Results
This section presents the temporal analysis of the SSIM/PSNR performance of the RS-CONV and BCH-LDPC concatenated pairs over AWGN and Rayleigh channels with QPSK, 16-QAM, and 64-QAM modulation. The proposed methodology was used to calculate the SSIM/PSNR values of both the original videos and the videos resulting from the simulations. Each of the five videos was simulated over two channels and three modulation schemes, and each of these scenarios was repeated 33 times to account for statistical variability, totaling 990 simulations.
Table 4 provides comprehensive details of all performed simulations.
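One reading of the simulation count that reproduces the stated total is five videos, two channels, and three modulation schemes, with 33 repetitions per combination. The enumeration below is only a consistency check under that assumption; Table 4 remains the authoritative breakdown.

```python
from itertools import product

videos = ["Cross", "Crowd", "Duck", "Tree", "Park"]
channels = ["AWGN", "Rayleigh"]
modulations = ["QPSK", "16-QAM", "64-QAM"]
REPETITIONS = 33  # repetitions per scenario, as stated in the text

# One scenario per (video, channel, modulation) combination.
scenarios = list(product(videos, channels, modulations))
total_runs = len(scenarios) * REPETITIONS

print(len(scenarios), "scenarios,", total_runs, "simulation runs")
```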
Figure 6 and
Figure 7 serve as examples of how the data generated by the methodology behave. Due to space constraints, not all figures are shown. In
Figure 6 and
Figure 7, the colored curves are the SSIM values extracted from the videos resulting from the simulations, and the reference curves are the SSIM values from the original videos. The colored points represent the values where there are losses in quality; therefore, it is possible to observe that, as the sequence of frames advances, the loss of quality increases, being more critical for RS-CONV/64-QAM.
The increase in losses follows from the adopted methodology, in which the noise inserted into the fragments that compose the video is gradually increased over time, raising the BER and causing successive losses in the frames. In practice, this emulates the changes that may occur in the channel conditions during video transmission to the user. It can be concluded that techniques with fewer erroneous frames are more robust and tend to provide more consistent video quality.
5. Discussion
An important robustness parameter is the number of frames transmitted without error, which directly reflects on the execution time and video quality. From the extracted SSIM/PSNR values, it is possible to calculate, for all the videos and defined scenarios, the proportional values of the number of transmitted frames without losses, and the mean values obtained are presented in
Table 5. The frames lost during the transmission can result in visual artifacts such as blurs, jumps, or distortions in the image, as well as abrupt cuts in the audio and loss of synchronization between image and sound. These problems can impair the end user’s QoE and compromise the understanding of the transmitted content. It becomes extremely important to maintain a stable and high-speed connection during the transmission and to use appropriate and up-to-date equipment.
When we examine the values in
Table 5, it becomes evident that BCH-LDPC with the QPSK modulation scheme presents better performance, obtaining values above 70% for most videos. Considering the number of frames transmitted without error, the mean values indicated in
Table 6 and
Table 7 can be obtained. The results of the SSIM show that, for all the videos simulated in BCH-LDPC/64-QAM, the values are close to or higher than those simulated in RS-CONV/QPSK. This results in gains in SSIM values close to or above 78% on average for BCH-LDPC in relation to RS-CONV. As an additional metric, the PSNR reinforces the results found, exhibiting a similar behavior through which the BCH-LDPC method demonstrates greater gains as conditions become more severe and where the greatest gains are in Rayleigh/64-QAM.
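The two summary quantities discussed above can be sketched as follows. The definitions are assumptions made for illustration: a frame counts as transmitted without loss when its SSIM is within a small tolerance of the reference value, and the percentage gain is the relative difference between the mean SSIM values of the two code pairs. The tolerance, the function names, and the sample values are hypothetical and are not results from this study.

```python
def lossless_fraction(received_ssim, reference_ssim, tolerance=1e-3):
    """Share of frames whose SSIM does not fall below the reference curve."""
    intact = sum(1 for r, ref in zip(received_ssim, reference_ssim)
                 if ref - r <= tolerance)
    return intact / len(reference_ssim)


def mean_percentage_gain(ssim_a, ssim_b):
    """Average SSIM gain of scheme A over scheme B, in percent."""
    mean_a = sum(ssim_a) / len(ssim_a)
    mean_b = sum(ssim_b) / len(ssim_b)
    return 100 * (mean_a - mean_b) / mean_b


# Made-up per-frame SSIM curves for one scenario.
reference = [0.95, 0.95, 0.95, 0.95]
scheme_a = [0.95, 0.95, 0.90, 0.80]
scheme_b = [0.95, 0.60, 0.30, 0.15]

print(lossless_fraction(scheme_a, reference), lossless_fraction(scheme_b, reference))
print(round(mean_percentage_gain(scheme_a, scheme_b), 1), "% mean SSIM gain")
```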
The values obtained indicate that the BCH-LDPC/64-QAM system can achieve video quality levels close to or higher than those of RS-CONV/QPSK, thus enabling ultra-resolution video to be carried while maintaining acceptable quality levels. Thus, even in adverse channel conditions, the BCH-LDPC pair significantly improves the video quality delivered to the user. It also satisfactorily meets the current demand for videos with increasingly high resolutions, which underlines the value of adopting BCH-LDPC in systems that still use RS-CONV, such as the Japanese ISDB-T standard. These observations highlight the importance of the proposed methodology, which makes it possible to quantify the average performance gain between DVB-T2/BCH-LDPC and DVB-T/RS-CONV systems in terms of objective metrics of the video quality delivered to the end user. In contrast, metrics limited to the physical layer, such as BER, do not capture important QoE parameters.
In our future work, we plan to use the proposed methodology to assess other systems based on video quality, such as the techniques used by 5G, including FBMC and UFMC. These methods have been suggested as alternatives to the traditional OFDM transmission technique used in DVB-T2 systems, as mentioned in [
10,
11].
6. Conclusions
This paper presents a methodology for temporal analysis of 4K ultra-resolution video quality, allowing quantitative comparisons regarding the objective video quality metric of the SSIM. The results of using BCH-LDPC and RS-CONV encoders in different scenarios were analyzed and compared, considering the H.264 digital compression standard. These results contribute to research on variations in image quality during video transmission.
The simulation results indicate that the BCH-LDPC encoder performed better on both AWGN and Rayleigh channels, demonstrating greater robustness in multipath environments. The BCH-LDPC/64-QAM system, with its average gain of over 78% in the SSIM metric compared to that of RS-CONV, has proven its adaptability even under adverse channel conditions and with higher spectral efficiency. These values provide clear quantitative evidence that the BCH-LDPC/64-QAM system can achieve video quality levels that are comparable or even superior to those of RS-CONV/QPSK. This robust performance under challenging conditions further supports the argument for considering the BCH-LDPC/64-QAM system for adoption in other existing systems, such as the Japanese ISDB-T standard.
However, it is important to acknowledge that the proposed methodology has some disadvantages. For example, it may have limitations under certain channel conditions or in specific scenarios that were not addressed in this study. Additionally, implementing the BCH-LDPC system may involve additional costs compared to RS-CONV. Such considerations should be taken into account when evaluating the feasibility and practical applicability of this approach.
Future work should therefore address the disadvantages identified above. We recommend exploring the limitations of the proposed methodology in different application scenarios and investigating ways to mitigate or overcome them, either by adapting the methodology or by developing complementary techniques. These efforts can help enhance the applicability and performance of the proposed system in various practical situations.