Rapid CU Partitioning and Joint Intra-Frame Mode Decision Algorithm

Song, Wenjun; Li, Congxian; Zhang, Qiuwen

doi:10.3390/electronics13173465

Open Access Article


by Wenjun Song, Congxian Li * and Qiuwen Zhang

College of Computer Science and Technology, Zhengzhou University of Light Industry, Zhengzhou 450002, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(17), 3465; https://doi.org/10.3390/electronics13173465

Submission received: 8 July 2024 / Revised: 23 August 2024 / Accepted: 26 August 2024 / Published: 31 August 2024

(This article belongs to the Special Issue Image and Video Processing and Retrieval Based on Machine Learning and Deep Learning)


Abstract

H.266/Versatile Video Coding (VVC) builds on previous standards and introduces new techniques, including a quadtree with nested multi-type tree (QTMT) partitioning structure. This structure significantly improves coding efficiency; in addition, H.266 has 32 more directional intra modes than H.265, accommodating a greater variety of texture patterns. However, these changes also substantially increase encoding complexity. To address this problem, this paper proposes an algorithm that combines rapid coding unit (CU) partitioning with intra-frame mode decision. In the first phase, we extract different features for CU blocks of different sizes and feed them into a decision tree (DT) classifier, which determines the CU partitioning mode so that partitioning can be terminated early, reducing encoding complexity. In the second phase, we propose an intra-frame mode decision strategy based on gradient descent with a bidirectional search, which brings the search closer to the global optimum, obtains the optimal intra-frame mode, and further reduces encoding complexity. Experiments show that the algorithm reduces encoding time by 54.53% while the BD-BR (Bjøntegaard delta bit rate) increases by only 1.38%, achieving a good balance between video quality and encoding efficiency.

Keywords: DT; intra-frame mode; gradient descent

1. Introduction

With the widespread use of electronic products, consumers demand ever higher video quality, and the requirements on hardware resources keep growing. The HEVC (High Efficiency Video Coding) standard can no longer meet the current market's demands for efficient storage and rapid transmission of video content. In response, the Joint Video Exploration Team (JVET) proposed a new video coding standard, based on the proposals evaluated in response to its Call for Proposals (CfP). In the new Versatile Video Coding (VVC) standard [1], many coding tools have been improved: larger prediction blocks have been introduced; multiple reference line (MRL) prediction improves prediction accuracy; a filtering technique (MDIS) is used to better process reference pixels; the number of angular prediction modes has been increased to 65; and inter-component prediction as well as smaller prediction and transform blocks have been added. These new structures and techniques further eliminate spatial redundancy, but they also sharply increase VVC encoding complexity. In VVC intra-frame coding, the complexity of CU partitioning can be significantly reduced by various algorithms, and designing a rapid CU partitioning algorithm that predicts the optimal partitioning is the most efficient way to lower VVC encoding complexity. Meanwhile, intra-frame prediction, which effectively eliminates spatial redundancy, is an important part of the standard. The complexity of VVC intra-frame coding reaches 18 times that of HEVC, mostly due to the new tools and extended modes, so developing corresponding rapid encoding algorithms is necessary.

In past research, many researchers have studied how to reduce encoding complexity in the new standard. Because CU partitioning occupies a large share of encoding time, most of this research has focused on CU partitioning, while some studies address intra-frame mode decision. Early work on CU partitioning relied mainly on traditional methods and machine learning: Refs. [2,3,4] used traditional features such as gradients and variance to reduce complexity, whereas Refs. [5,6,7] mostly used machine learning to predict CU partitioning. In recent years, as deep learning has advanced, encoding complexity has been reduced further, and more researchers have integrated convolutional neural networks (CNNs) into the new standard to predict the entire CU partition structure. Deep learning research falls broadly into two categories. The first is end-to-end algorithms [8,9], which extract features from the input and pass them directly through a neural network. The second stacks multiple convolutional neural networks, most commonly several classifiers [10,11] that predict the best partition mode of the CUs at each level.

It is worth noting that most current research is theoretical, focusing on improving video encoding efficiency and quality. However, a more challenging task is to apply these algorithms to downstream video analysis tasks. Chen et al. [12] utilized a multi-attention network composed of a dual-path dual-attention module and a query-based cross-modal Transformer module. This approach eliminates the complex post-masking matching process found in existing methods, addressing the computational and storage challenges present in current video compression algorithms.

Additionally, the number of intra-frame modes in the new standard has increased from 35 in the previous standard to 67, comprising 65 directional modes plus the planar and DC modes. While this yields better compression, it also brings a large increase in complexity. The RD (rate-distortion) cost is a critical metric for making optimal coding decisions during encoding; it combines the bit rate (Rate) and the distortion (Distortion). In the rough intra-mode decision, it is approximated by the Hadamard cost:

$$J_{SATD} = D_{Had} + \lambda_{MODE} \times R_{MODE} \qquad (1)$$

where $D_{Had}$ is the distortion measured as the sum of absolute Hadamard-transformed differences (SATD) between the original and predicted blocks, $R_{MODE}$ is the number of bits required to encode the prediction mode, and $\lambda_{MODE}$ is the weighting factor (Lagrange multiplier) that balances bit rate against distortion. In terms of intra-frame mode selection, Zhang et al. [13] performed a progressive rough mode search that selectively examines and computes the Hadamard cost of candidate prediction modes.
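As an illustration of Eq. (1), the following minimal Python sketch computes the Hadamard cost of a single 4 × 4 block. The block size, the unnormalized SATD, and the helper names (`satd_4x4`, `hadamard_cost`) are assumptions made for illustration; practical encoders also use larger transforms and scaling, so this is not a reproduction of any encoder's implementation.

```python
"""Minimal sketch of the Hadamard (SATD) cost of Eq. (1) for one 4x4 block.

Assumptions (not from the paper): 4x4 transform, unnormalized SATD,
caller-supplied lambda and mode-bit estimate.
"""

# 4x4 Hadamard matrix: entries are +1/-1 and the rows are mutually orthogonal
H4 = [
    [1, 1, 1, 1],
    [1, -1, 1, -1],
    [1, 1, -1, -1],
    [1, -1, -1, 1],
]


def matmul(a, b):
    """Plain nested-list matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def satd_4x4(orig, pred):
    """D_Had: sum of absolute Hadamard-transformed residual coefficients."""
    resid = [[o - p for o, p in zip(ro, rp)] for ro, rp in zip(orig, pred)]
    coeffs = matmul(matmul(H4, resid), transpose(H4))  # H * R * H^T
    return sum(abs(c) for row in coeffs for c in row)


def hadamard_cost(orig, pred, lam_mode, mode_bits):
    """J_SATD = D_Had + lambda_MODE * R_MODE."""
    return satd_4x4(orig, pred) + lam_mode * mode_bits
```

For a flat residual of 2 the whole energy lands in the DC coefficient (value 32), so `hadamard_cost` with λ = 4 and 3 mode bits gives 44.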

The MTT partition structure added in VVC means that each CTU requires checking 5781 CU blocks during encoding, far more than under HEVC. Because our first-stage algorithm performs feature extraction and classification according to CU size, we analyzed the distribution of partition modes for each CU size. Figure 1 shows the ratio of partition modes for CUs ranging from 32 × 32 to 8 × 8 in VVC: as the CU size decreases, the proportion of NS (Not Split) gradually increases. By predicting in advance whether a CU is split, we can skip the RDO calculations for QT and MTT partitioning. MTT modes account for a high proportion of all partitioning modes, especially for 32 × 16 and 16 × 32 CUs; for these CUs, predicting the split direction allows the MTT modes in the other direction to be skipped. QT accounts for a low proportion, so predicting QT partitioning would reduce encoding complexity only to a limited extent and might cause unnecessary losses in coding performance.

In this paper, inspired by [14], we combine a CU partitioning method based on a DT (decision tree) classifier with a one-dimensional gradient descent algorithm for joint decision-making in CU partitioning and intra-frame mode selection. Based on the analysis of the mode ratios for CUs of different sizes in Figure 1, we select different features for CUs of different sizes as input to the DT classifier; this analysis underpins the design of our first-stage algorithm. In the second stage, we use a bidirectional search to select the optimal mode. The main contributions of this paper are as follows:

An algorithm for partitioning CU blocks according to CU size is proposed: different features are selected for different CU sizes and input into the DT classifier. These tailored features facilitate CU partitioning. In addition, a confidence level is computed at the terminal nodes of the decision tree; when it is below 0.9, the current CU is treated as unrecognized and is encoded with the full RDO search. This helps prevent model overfitting and improves the robustness of the model.

A gradient-descent-based intra-frame mode decision method is proposed, which uses a bidirectional search with a chosen step size and initial search point. This reduces the risk of getting trapped in a local optimum and brings the search as close as possible to the global optimum.

The remainder of this paper is structured as follows: Section 2 reviews related work on VVC, analyzing algorithms previously proposed to reduce encoding complexity in CU partitioning and intra-frame coding. Section 3 discusses the partitioning structure of the VVC standard and presents our rapid CU partitioning and rapid intra-frame mode decision algorithms. Section 4 presents the experimental results. Section 5 concludes the paper.

2. Background and Related Works

VVC introduces the MTT structure, which divides CTUs into various types of CUs and allows CUs to be rectangles with multiple aspect ratios. VVC can therefore perform QT, BT, and TT partitions, where BT and TT each have a horizontal and a vertical variant. Figure 2a shows the four split types of the MTT structure, and Figure 2b illustrates the partitioning rules of the CTU. As (b) shows, once BT or TT partitioning has been used, further QT partitioning is not allowed. Although the VVC standard is not yet widely deployed, it has already been studied extensively in theoretical research.
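The split rules above can be summarized in a short sketch that lists the sub-CU sizes produced by each split type and enforces the rule that QT may not follow BT or TT. Ternary splits divide the block in the 1:2:1 ratio used by VVC. The function names and mode labels are hypothetical, and the standard's minimum-size and other constraints are deliberately omitted.

```python
def split_cu(w, h, mode):
    """Return the (width, height) of each sub-CU for a QTMT split mode."""
    if mode == "QT":        # quadtree: four equal quadrants
        return [(w // 2, h // 2)] * 4
    if mode == "BT_H":      # horizontal binary: top and bottom halves
        return [(w, h // 2)] * 2
    if mode == "BT_V":      # vertical binary: left and right halves
        return [(w // 2, h)] * 2
    if mode == "TT_H":      # horizontal ternary: 1/4, 1/2, 1/4 of the height
        return [(w, h // 4), (w, h // 2), (w, h // 4)]
    if mode == "TT_V":      # vertical ternary: 1/4, 1/2, 1/4 of the width
        return [(w // 4, h), (w // 2, h), (w // 4, h)]
    raise ValueError(f"unknown split mode: {mode}")


def allowed_splits(parent_was_mtt):
    """Once a BT or TT split has been applied, QT is no longer allowed."""
    mtt = ["BT_H", "BT_V", "TT_H", "TT_V"]
    return mtt if parent_was_mtt else ["QT"] + mtt
```

For example, `split_cu(32, 32, "TT_V")` yields sub-CUs of 8 × 32, 16 × 32, and 8 × 32.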

2.1. Rapid CU Partitioning Algorithm

Among rapid algorithms, rapid CU partitioning is the most effective way to reduce encoding complexity. Research methods under the VVC standard can generally be divided into three categories. The first consists of traditional algorithms, similar to those used in HEVC. These studies mostly use different operators to compute gradient features and predict the CU partitioning mode in advance, thereby skipping unnecessary partition checks. Ref. [2] extracts features using the Sobel operator; Ref. [3] extracts the variance and gradient of CU blocks as features; and Ref. [4] applies gradients to split-mode decisions. In recent years, more innovative feature extraction methods have also appeared. Dong et al. [6] modeled the QTMT partition decision as multiple binary classification tasks and proposed a cascaded decision framework that simplifies CU partitioning. Wu et al. [15] proposed a framework based on hierarchical grids that efficiently retrieves partition information for a CU and its sub-CUs with only one inference step. The second category uses machine learning algorithms to reduce encoding complexity. Lei et al. [16] compared the Hadamard cost with the RD cost to decide on early termination of MTT partitioning. Yang et al. [14] proposed a cascaded decision structure for QT and MTT partitioning using a decision tree algorithm, and Zhang et al. [17] proposed a rapid CU partitioning method that combines a Bayesian algorithm with an improved deblocking filter (IDBF).

As deep learning evolves, more deep networks are being applied to VVC to further reduce encoding complexity, forming the third category. Feng et al. [18] proposed a Down–Up CNN structure to predict the partition map. Zhao et al. [19] extracted spatiotemporal correlation features, fed them into deep convolutional networks, and then used a probability-based model to choose the best candidate partition mode at the most favorable coding depth. Huang et al. [20] proposed a deep-learning-based rapid decision method for partition mode and direction in VVC intra-frame prediction. Peng et al. [21] used a single CNN to predict all blocks, proposing a block-related partition decision (BDPD) framework in which the QTMT-based block partitioning is represented by a partition homogeneity map (PHM). Wang et al. [22] used deep learning to predict the content type of all CUs in VVC SCC content and then used differential pulse code modulation to terminate CU splitting early, improving encoding speed. Tissier et al. [8] released a public dataset of VVC frame partitioning that can be used to train deep learning models on various types of image content. Although CNNs are widely used, their complexity has led many researchers to prune them, with promising results. Pakdaman et al. [23] used an LNN, adopted either jointly or individually, to simplify intra-frame encoding decisions. Park et al. [10] used an LNN on TT-related statistical features to predict in advance whether TT partitioning can be skipped, further reducing redundancy. Chen et al. [24] proposed an LNN with uneven kernels to extract features for partitioning and mode decisions.

2.2. Method for Intra-Frame Mode Selection

In intra-frame mode decision, most current improvements still target H.265/HEVC. The first type focuses on the RMD (rough mode decision) process. Zhang et al. [13] proposed a gradient-based method that reduces the number of modes checked in the rough mode decision and optimizes the candidate modes for RD (rate-distortion) evaluation. Jiang et al. [25] used gradient directions, generating a gradient-mode histogram for each coding unit to reduce computational complexity. Another type targets the RDO (rate-distortion optimization) process, which is extremely time-consuming; with the development of machine learning, machine learning models have also been used in intra-frame mode decisions. Hu et al. [26] formulated rapid mode decision as a Bayesian decision problem and solved it using discrete cosine transform coefficients. Dong et al. [6] proposed adaptive mode pruning during mode selection involving the IBC (intra block copy) and ISP (intra sub-partitions) tools, followed by a mode-dependent termination algorithm that reduces unnecessary prediction steps.

Because the number of intra-frame modes in VVC nearly doubles, algorithms from HEVC are difficult to port directly, and only a limited amount of research has been conducted under VVC. Yang et al. [14] designed the search starting points and step sizes of a gradient descent algorithm to find the optimal mode. Zhang et al. [27] designed a classifier that feeds texture-region features into a random forest model. Guo et al. [28] directly established the correlation between intra prediction modes and HOG bins, which handles the extended intra modes, and used the HOG operator for rapid partition decisions. Li et al. [29] proposed an ensemble-learning-based early termination method for intra-frame mode selection that computes the probability that the current candidate mode is the final mode and reorders the candidate mode list accordingly.

3. Proposed Algorithm

The new standard adopts a structure with up to six partitioning options for each CU (no split, QT, BT-H, BT-V, TT-H, and TT-V). These options diversify CU shapes to include rectangles as well as squares, so deciding the optimal partitioning at each depth requires determining not only whether to split but also which split type to use. At the same time, the number of angular prediction modes has increased by 32. Although these changes provide an efficient and flexible coding structure, they also result in significant encoding complexity. To reduce redundancy and encoding complexity, we designed a two-stage algorithm: rapid CU partitioning followed by intra-frame mode decision. In the first stage, informative features are selected to determine the CU split mode, and the corresponding algorithm is executed according to the CU block size. The second stage decides the intra-frame mode with a gradient descent search, using the Hadamard cost as the objective function.

3.1. Rapid CU Partitioning Based on DT Model

In this algorithm, we first extract features and select the effective ones. These features are then used to train DT (decision tree) classifier models, one per CU size, to determine CU partitioning modes. Because each of our classification tasks is binary, we choose DT, a tree-structured machine learning algorithm, as the classifier: it provides good classification performance without adding much complexity. We tried several classifier models, including SVM (support vector machine) and DT. DT requires only “yes or no” judgments at prediction time and can capture non-linear relationships between features. SVM, by contrast, is harder to use; although it generalizes well, in many practical classification tasks noise or outliers can distort its separating hyperplane.

When the CU size is 32 × 32, we compute a variance measure for the five partitioning schemes and adopt the scheme with the highest variance as the optimal one. For other CU sizes, we use DT classifier models with features chosen per CU size. For 128 × 128 and 64 × 64 CUs, the DT classifier decides between split and non-split; for 16 × 16 and the rectangular CU sizes, the corresponding features are used to classify horizontal versus vertical partitioning. Figure 3 shows the flowchart of the first-stage algorithm.
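The size-dependent dispatch of the first stage can be written as a small lookup. The CU sizes are those listed in this section; the strategy labels are hypothetical, and returning the encoder's default full RDO search for any other size is our assumption, as the text does not state how such CUs are handled.

```python
def first_stage_strategy(w, h):
    """Return the first-stage decision applied to a w x h CU (Section 3.1)."""
    if (w, h) in {(128, 128), (64, 64)}:
        return "dt_split_vs_nonsplit"       # DT on variance, EVP, and QP features
    if (w, h) == (32, 32):
        return "max_partition_variance"     # pick the split with the largest variance
    if (w, h) in {(32, 16), (16, 32), (16, 16), (16, 8), (8, 16)}:
        return "dt_horizontal_vs_vertical"  # DT on entropy and DCE features
    return "full_rdo"                       # assumption: other sizes use the default search
```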

First, we perform feature extraction, and our algorithm selects appropriate features based on different CU sizes. When the CU size is 128 × 128 and 64 × 64, we select the overall variance of the CU, the expected value and variance of the partition, and the QP value as features for training.

(1) Variance (ε): This feature is strongly correlated with the texture complexity of the CU. The higher the texture complexity, the larger the feature value, indicating that the current CU tends to be split into smaller CUs. Conversely, a smaller value indicates that the texture of the current CU is smoother, so larger CU blocks or no split are preferred. It is computed as

$$\varepsilon = \sqrt{\frac{1}{w \times h} \sum_{x=0}^{w-1} \sum_{y=0}^{h-1} b(x, y)^{2} - \left( \frac{1}{w \times h} \sum_{x=0}^{w-1} \sum_{y=0}^{h-1} b(x, y) \right)^{2}} \qquad (2)$$

where w and h are the width and height of the current CU block, and $b(x, y)$ is the luminance at coordinate $(x, y)$ of the current CU.
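A direct transcription of Eq. (2) is sketched below, assuming the CU is given as a list of rows of luma values. Because of the square root, Eq. (2) yields the standard deviation of the luma samples; the function name is hypothetical.

```python
import math


def cu_variance_feature(block):
    """Eq. (2): sqrt(E[b^2] - (E[b])^2) over all luma samples b(x, y) of the CU."""
    samples = [v for row in block for v in row]
    n = len(samples)                          # w * h
    mean = sum(samples) / n
    mean_sq = sum(v * v for v in samples) / n
    # max() guards against tiny negative values caused by floating-point rounding
    return math.sqrt(max(mean_sq - mean * mean, 0.0))
```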

(2) Expected Value and Variance of the Partition (EVP): The absolute differences between the expected values and between the variances of the two halves of the CU indicate whether its upper and lower parts, or its left and right parts, differ. Figure 4 shows these two representations. The absolute differences are computed as

$$E_{ver} = |E_{u} - E_{d}|, \quad E_{hor} = |E_{l} - E_{r}| \qquad (3)$$

$$V_{ver} = |V_{u} - V_{d}|, \quad V_{hor} = |V_{l} - V_{r}| \qquad (4)$$

where $E_{ver}$ and $E_{hor}$ are the vertical and horizontal absolute differences of the expected values, and $V_{ver}$ and $V_{hor}$ are the vertical and horizontal absolute differences of the variances. $E_{u}$, $E_{d}$, $E_{l}$, and $E_{r}$ are the expected values of the upper, lower, left, and right partitions, and $V_{u}$, $V_{d}$, $V_{l}$, and $V_{r}$ are their variances, respectively. When CU splitting terminates early, these differences tend to be small, indicating that the two quantities influence the splitting decision. We therefore define the sums of the absolute differences over the vertical and horizontal directions as features:

$$E_{sum} = E_{ver} + E_{hor} \qquad (5)$$

$$V_{sum} = V_{ver} + V_{hor} \qquad (6)$$
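The EVP features of Eqs. (3)–(6) can be sketched as follows, splitting the CU into upper/lower and left/right halves as described for Figure 4. Using population variances (dividing by the sample count) is our assumption, since the paper does not specify the normalization; the function names are hypothetical.

```python
def mean_var(samples):
    """Population mean and variance of a flat list of samples."""
    n = len(samples)
    mu = sum(samples) / n
    return mu, sum((v - mu) ** 2 for v in samples) / n


def evp_features(block):
    """Return (E_sum, V_sum) from the upper/lower and left/right halves."""
    h, w = len(block), len(block[0])
    upper = [v for row in block[: h // 2] for v in row]
    lower = [v for row in block[h // 2:] for v in row]
    left = [v for row in block for v in row[: w // 2]]
    right = [v for row in block for v in row[w // 2:]]

    e_u, v_u = mean_var(upper)
    e_d, v_d = mean_var(lower)
    e_l, v_l = mean_var(left)
    e_r, v_r = mean_var(right)

    e_ver, e_hor = abs(e_u - e_d), abs(e_l - e_r)  # Eq. (3)
    v_ver, v_hor = abs(v_u - v_d), abs(v_l - v_r)  # Eq. (4)
    return e_ver + e_hor, v_ver + v_hor            # Eqs. (5) and (6)
```

A block that is dark on top and bright below, such as `[[0, 0], [10, 10]]`, gives $E_{sum} = 10$ and $V_{sum} = 0$.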

(3) Quantization Parameter (QP) Value: The QP value also affects CU size. As QP increases, the number of larger CUs increases while the number of smaller CUs decreases. We use the all-intra configuration and the four commonly used QP values 22, 27, 32, and 37.

When the CU size is 32 × 16, 16 × 32, 16 × 16, 8 × 16, or 16 × 8, we adapt the entropy and directional complexity features to the MTT (Multi-Type Tree) structure and use them to train the classifiers for the corresponding CU sizes.

(1) Entropy (η): Entropy measures the randomness of the detail in a video frame and is computed as

$$\eta = -\sum_{k=0}^{255} l(k) \log l(k) \qquad (7)$$

where $l(k)$ is the probability (normalized histogram frequency) of gray level k.

From this, the entropy differences of the different partitions are obtained as follows:

$$\Delta\eta_{BT} = |\eta_{BTH0} - \eta_{BTH1}| - |\eta_{BTV0} - \eta_{BTV1}| \qquad (8)$$

$$\Delta\eta_{TT} = |\eta_{TTH0} - \eta_{TTH1}| + |\eta_{TTH1} - \eta_{TTH2}| - |\eta_{TTV0} - \eta_{TTV1}| - |\eta_{TTV1} - \eta_{TTV2}| \qquad (9)$$

where $|\eta_{BTH0} - \eta_{BTH1}|$ is the entropy difference between the two sub-CUs of the horizontal binary split (BTH) and $|\eta_{BTV0} - \eta_{BTV1}|$ is that of the vertical binary split (BTV). Likewise, $|\eta_{TTH0} - \eta_{TTH1}| + |\eta_{TTH1} - \eta_{TTH2}|$ is the entropy difference of the horizontal ternary split (TTH), and $|\eta_{TTV0} - \eta_{TTV1}| + |\eta_{TTV1} - \eta_{TTV2}|$ is that of the vertical ternary split (TTV).
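Eqs. (7)–(9) can be sketched as below for an 8-bit CU stored as a list of rows. The base-2 logarithm and the use of the 1:2:1 ternary split geometry of VVC for the TT sub-CUs are assumptions; the helper names are hypothetical.

```python
import math


def entropy(samples):
    """Eq. (7): Shannon entropy of the 8-bit gray-level histogram (base 2 assumed)."""
    counts = [0] * 256
    for v in samples:
        counts[v] += 1
    n = len(samples)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)


def region(block, top, bottom, left, right):
    """Flatten rows [top, bottom) and columns [left, right) of a block."""
    return [v for row in block[top:bottom] for v in row[left:right]]


def entropy_differences(block):
    """Return (delta_eta_BT, delta_eta_TT) per Eqs. (8) and (9)."""
    h, w = len(block), len(block[0])
    # Binary splits: two halves in each direction
    bth = [entropy(region(block, 0, h // 2, 0, w)),
           entropy(region(block, h // 2, h, 0, w))]
    btv = [entropy(region(block, 0, h, 0, w // 2)),
           entropy(region(block, 0, h, w // 2, w))]
    # Ternary splits: 1/4, 1/2, 1/4 along each direction
    r = (0, h // 4, 3 * h // 4, h)
    c = (0, w // 4, 3 * w // 4, w)
    tth = [entropy(region(block, r[k], r[k + 1], 0, w)) for k in range(3)]
    ttv = [entropy(region(block, 0, h, c[k], c[k + 1])) for k in range(3)]

    d_bt = abs(bth[0] - bth[1]) - abs(btv[0] - btv[1])
    d_tt = (abs(tth[0] - tth[1]) + abs(tth[1] - tth[2])
            - abs(ttv[0] - ttv[1]) - abs(ttv[1] - ttv[2]))
    return d_bt, d_tt
```

For a 4 × 4 block whose top half is flat and whose bottom half alternates between 0 and 255, the horizontal binary split separates the two textures while the vertical one does not, so $\Delta\eta_{BT} = 1$.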

(2) Directional Complexity Estimation (DCE): To estimate directional complexity, a gradient operator is applied to compute the gradients of the CU in four directions. Unlike other directional complexity methods, we use the Scharr operator instead of the Sobel operator; the Scharr operator can be considered an enhanced version of the Sobel operator. The directional complexity is computed as

$$DC = \frac{1}{w \times h} \sum_{m=0}^{w-1} \sum_{n=0}^{h-1} \left( |G_{0^{\circ}}| + |G_{45^{\circ}}| + |G_{90^{\circ}}| + |G_{135^{\circ}}| \right) \qquad (10)$$

where $G_{0^{\circ}}$, $G_{45^{\circ}}$, $G_{90^{\circ}}$, and $G_{135^{\circ}}$ are the gradients in the four selected directions, computed respectively as

$$G_{0^{\circ}} = \begin{bmatrix} -3 & -10 & -3 \\ 0 & 0 & 0 \\ 3 & 10 & 3 \end{bmatrix} * L, \quad G_{45^{\circ}} = \begin{bmatrix} -10 & -3 & 0 \\ -3 & 0 & 3 \\ 0 & 3 & 10 \end{bmatrix} * L,$$

$$G_{90^{\circ}} = \begin{bmatrix} -3 & 0 & 3 \\ -10 & 0 & 10 \\ -3 & 0 & 3 \end{bmatrix} * L, \quad G_{135^{\circ}} = \begin{bmatrix} 0 & -3 & -10 \\ 3 & 0 & -3 \\ 10 & 3 & 0 \end{bmatrix} * L \qquad (11)$$

where L is the 3 × 3 matrix of luminance samples centered at $(m, n)$:

$$L = \begin{bmatrix} p(m-1, n-1) & p(m-1, n) & p(m-1, n+1) \\ p(m, n-1) & p(m, n) & p(m, n+1) \\ p(m+1, n-1) & p(m+1, n) & p(m+1, n+1) \end{bmatrix} \qquad (12)$$
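The directional complexity of Eqs. (10)–(12) can be sketched as follows. Two details are assumptions not fixed by the text: each kernel is applied by element-wise multiplication with the 3 × 3 window L followed by summation, and samples outside the CU are replaced by the nearest edge sample. As in Eq. (12), m indexes rows and n indexes columns; the function name is hypothetical.

```python
# Scharr-type kernels for the four directions of Eq. (11)
KERNELS = {
    0: [[-3, -10, -3], [0, 0, 0], [3, 10, 3]],
    45: [[-10, -3, 0], [-3, 0, 3], [0, 3, 10]],
    90: [[-3, 0, 3], [-10, 0, 10], [-3, 0, 3]],
    135: [[0, -3, -10], [3, 0, -3], [10, 3, 0]],
}


def directional_complexity(block):
    """Eq. (10): mean over the CU of |G_0| + |G_45| + |G_90| + |G_135|."""
    h, w = len(block), len(block[0])

    def p(m, n):
        # Clamp row m and column n to the block (edge replication, an assumption)
        return block[min(max(m, 0), h - 1)][min(max(n, 0), w - 1)]

    total = 0
    for m in range(h):
        for n in range(w):
            for k in KERNELS.values():
                # Element-wise product of the kernel with the 3x3 window L, summed
                g = sum(k[a][b] * p(m - 1 + a, n - 1 + b)
                        for a in range(3) for b in range(3))
                total += abs(g)
    return total / (w * h)
```

A flat CU yields 0, and because the kernel set is closed under transposition (up to sign), transposing a CU leaves its directional complexity unchanged.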

For feature selection, we used the F-score method to measure how strongly each feature influences the classification, i.e., its contribution to the final classification results. At the same time, the computational cost of each feature must be considered to avoid excessive overhead. Given a set of training samples, each containing n feature values and a label that is either positive or negative, the F-score of the j-th feature is

$$F_{j} = \frac{\left(\bar{x}_{j}^{(+)} - \bar{x}_{j}\right)^{2} + \left(\bar{x}_{j}^{(-)} - \bar{x}_{j}\right)^{2}}{\frac{1}{P-1} \sum_{i=1}^{P} \left(x_{i,j}^{(+)} - \bar{x}_{j}^{(+)}\right)^{2} + \frac{1}{N-1} \sum_{i=1}^{N} \left(x_{i,j}^{(-)} - \bar{x}_{j}^{(-)}\right)^{2}} \qquad (13)$$

where $\bar{x}_{j}$ is the mean of the j-th feature over all samples, $\bar{x}_{j}^{(+)}$ and $\bar{x}_{j}^{(-)}$ are its means over the P positive and N negative samples, respectively, and $x_{i,j}^{(+)}$ and $x_{i,j}^{(-)}$ are the values of the j-th feature in the i-th positive and negative samples.

Our algorithm labels split/non-split and horizontal/vertical split as the positive/negative classes of the respective classifiers, and we select four sequences with different resolutions to compute the F-scores of the features used by the DT models. See Table 1.
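A minimal sketch of Eq. (13) for a single feature is given below; it assumes at least two samples in each class so that the $P-1$ and $N-1$ denominators are nonzero. The function name is hypothetical.

```python
def f_score(pos, neg):
    """Eq. (13): F-score of one feature given its values in the two classes."""
    P, N = len(pos), len(neg)
    mean_all = (sum(pos) + sum(neg)) / (P + N)
    mean_pos = sum(pos) / P
    mean_neg = sum(neg) / N
    # Numerator: separation of the class means from the overall mean
    between = (mean_pos - mean_all) ** 2 + (mean_neg - mean_all) ** 2
    # Denominator: sample variance within each class
    within = (sum((x - mean_pos) ** 2 for x in pos) / (P - 1)
              + sum((x - mean_neg) ** 2 for x in neg) / (N - 1))
    return between / within
```

For instance, values `[1, 2, 3]` in the positive class and `[5, 6, 7]` in the negative class give an F-score of 4, while identical class distributions give 0; features can then be ranked by their scores.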

When the CU size is 32 × 32, the CU can be partitioned with QT or MTT. As shown in the following formulas, we compute a partition variance for each of the five partition modes from the variances of their sub-CUs and take the mode with the largest value as the optimal mode:

$$V_{QT} = \frac{1}{4} \sum_{m=1}^{4} \left( \frac{1}{w_{m} \times h_{m}} \sum_{i=1}^{w_{m}} \sum_{j=1}^{h_{m}} \left(E(i, j) - E_{m}\right)^{2} - V_{Q} \right)^{2} \qquad (14)$$

$$V_{BTH} = \frac{1}{2} \sum_{m=1}^{2} \left( \frac{1}{w_{m} \times h_{m}} \sum_{i=1}^{w_{m}} \sum_{j=1}^{h_{m}} \left(E(i, j) - E_{m}\right)^{2} - V_{BH} \right)^{2} \qquad (15)$$

$$V_{BTV} = \frac{1}{2} \sum_{m=1}^{2} \left( \frac{1}{w_{m} \times h_{m}} \sum_{i=1}^{w_{m}} \sum_{j=1}^{h_{m}} \left(E(i, j) - E_{m}\right)^{2} - V_{BV} \right)^{2} \qquad (16)$$

$$V_{TTH} = \frac{1}{3} \sum_{m=1}^{3} \left( \frac{1}{w_{m} \times h_{m}} \sum_{i=1}^{w_{m}} \sum_{j=1}^{h_{m}} \left(E(i, j) - E_{m}\right)^{2} - V_{TH} \right)^{2} \qquad (17)$$

$$V_{TTV} = \frac{1}{3} \sum_{m=1}^{3} \left( \frac{1}{w_{m} \times h_{m}} \sum_{i=1}^{w_{m}} \sum_{j=1}^{h_{m}} \left(E(i, j) - E_{m}\right)^{2} - V_{TV} \right)^{2} \qquad (18)$$

$$V_{M} = \max\left(V_{QT}, V_{BTH}, V_{BTV}, V_{TTH}, V_{TTV}\right) \qquad (19)$$

where $V_{QT}$, $V_{BTH}$, $V_{BTV}$, $V_{TTH}$, and $V_{TTV}$ are the partition variances to be calculated; $w_{m}$ and $h_{m}$ are the width and height of the m-th sub-CU; $E(i, j)$ is the luminance of the pixel at $(i, j)$ in that sub-CU; and $E_{m}$ is its pixel mean. $V_{Q}$ is the average variance of all sub-CU blocks in the QT partition, and similarly, $V_{BH}$, $V_{BV}$, $V_{TH}$, and $V_{TV}$ are the average variances of all sub-CU blocks in the BH, BV, TH, and TV partitions, respectively. $V_{M}$ is the maximum value among the five partitions.
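The 32 × 32 rule of Eqs. (14)–(19) can be sketched as follows, reading each partition variance as the spread of the sub-CU variances around their average. The rectangle layout (row, column, height, width), the population variances, the helper names, and the tie-breaking order (first mode listed wins) are assumptions; the sketch works for any square block whose side is divisible by 4.

```python
def sub_cu_variance(block, top, left, h, w):
    """Variance of the luma samples E(i, j) inside one sub-CU."""
    vals = [block[i][j] for i in range(top, top + h) for j in range(left, left + w)]
    mean = sum(vals) / len(vals)                          # E_m
    return sum((v - mean) ** 2 for v in vals) / len(vals)


def partition_score(block, rects):
    """Eqs. (14)-(18): spread of the sub-CU variances around their average."""
    variances = [sub_cu_variance(block, *r) for r in rects]
    avg = sum(variances) / len(variances)                 # V_Q, V_BH, ...
    return sum((v - avg) ** 2 for v in variances) / len(variances)


def best_partition_32x32(block):
    """Eq. (19): return the split mode with the largest partition variance."""
    s = len(block)  # side length of the square CU, e.g. 32
    q, h = s // 4, s // 2
    modes = {  # each rectangle is (top, left, height, width)
        "QT": [(0, 0, h, h), (0, h, h, h), (h, 0, h, h), (h, h, h, h)],
        "BT_H": [(0, 0, h, s), (h, 0, h, s)],
        "BT_V": [(0, 0, s, h), (0, h, s, h)],
        "TT_H": [(0, 0, q, s), (q, 0, h, s), (q + h, 0, q, s)],
        "TT_V": [(0, 0, s, q), (0, q, s, h), (0, q + h, s, q)],
    }
    scores = {m: partition_score(block, r) for m, r in modes.items()}
    return max(scores, key=scores.get), scores
```

With texture confined to one quadrant of the block, only one QT sub-CU has a large variance, so QT obtains the largest score.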

To enhance the robustness of the model, while also avoiding model overfitting, our algorithm sets a confidence level for the well-trained decision tree model. When the confidence level is less than 0.9, we perform a traditional full RDO search on the current CU.
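The confidence test can be sketched as below. The paper does not define how the confidence is computed; here we assume it is the fraction of training samples at the reached leaf that belong to the majority class. The function name and the leaf representation are hypothetical, and a return value of `None` stands for falling back to the full RDO search.

```python
def decide_with_confidence(leaf_counts, threshold=0.9):
    """Return the leaf's majority label, or None if confidence < threshold.

    `leaf_counts` maps each class label to the number of training samples
    that reached this leaf; None means "run the full RDO search instead".
    """
    total = sum(leaf_counts.values())
    label, count = max(leaf_counts.items(), key=lambda kv: kv[1])
    confidence = count / total
    return label if confidence >= threshold else None
```

With scikit-learn, `DecisionTreeClassifier.predict_proba` returns these per-class leaf fractions directly, so its maximum could serve as the confidence.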

3.2. Intra-Frame Decision Method Based on Bidirectional Gradient Search

In terms of angular modes, VVC has 32 more than HEVC. To reduce the computational burden of mode prediction, the VTM encoder uses a sequential RMD + MPM + RDO decision process to obtain the final optimal prediction mode. However, dozens of modes still need to be checked during RMD. Considering that the prediction direction changes continuously across the angular modes, we propose a one-dimensional gradient descent search that minimizes the number of modes checked in RMD and thereby significantly reduces the complexity of intra-frame coding.

Gradient descent is one of the most frequently used methods for unconstrained optimization; it minimizes the objective function by repeatedly moving in the direction opposite to its derivative. It is the most widely used optimization algorithm in machine learning, and applying it requires choosing the objective function, the initial search point, the search pattern, the search step size, and the descent process. For the objective function, the natural choice here is the Hadamard cost. For the initial search point, we choose the mode that the vast majority of blocks tend to select, because such a point lies closer to the global minimum or, at least, leads to a better local minimum. The search step size strongly affects the result: a step that is too small tends to stop at a local minimum, whereas a step that is too large may overshoot the minimum and fail to converge. The following paragraphs discuss the four key factors separately.

Objective Function: Hadamard Cost.
The Hadamard cost $J_{SATD}$ defined in Eq. (1) approximates the full RD cost at a much lower computational cost, because the Hadamard transform requires only additions and subtractions. To reduce computational complexity further, we adopt it as the objective function of our algorithm.
Initial Search Point: Reorder the MPM list according to the Hadamard cost, and select the option with the lowest cost as the initial mode for the next phase.
Based on the analysis above, we choose as the initial search point the mode that most blocks tend to favor. Figure 5 shows the mode distribution of randomly selected sequences encoded at four QP values: 20, 25, 30, and 35; more blocks choose directional modes. Figure 6 shows the proportion of MPM modes among all selected modes: the vast majority of CUs select their optimal mode from the MPM list rather than from the remaining directional modes. We therefore select the initial search point from the MPM list.
Search Mode: Bidirectional Search.
The search mode is key to finding the global optimum. To avoid falling into a local optimum, we propose a bidirectional search: two gradient-descent searches are launched, one searching forward and the other backward along the mode indices. Each search yields a mode; the two modes are compared, and the one with the lower cost is selected. Because the two searches are independent, they lend themselves naturally to parallel processing, which lets the algorithm converge more quickly and further improves search efficiency.
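The sketch below shows one reading of the bidirectional search: starting from the seed mode, one descent walks toward lower angular indices and the other toward higher indices, each stopping when the cost no longer decreases, and the cheaper end point is kept as $M_{Best}$. The function names and the fixed `step` argument are assumptions; the adaptive step-size rule is not reproduced, and the two descents run sequentially here although they could run in parallel.

```python
ANGULAR_MIN, ANGULAR_MAX = 2, 66  # angular intra mode range in VVC


def descend(start, direction, cost, step=1):
    """Move from `start` by `step` in one direction (+1 or -1) while the cost keeps dropping.

    The discrete "gradient" is the cost difference between neighbouring modes;
    the walk stops at the first mode that does not improve on the current best.
    """
    best_mode, best_cost = start, cost(start)
    mode = start + direction * step
    while ANGULAR_MIN <= mode <= ANGULAR_MAX:
        c = cost(mode)
        if c >= best_cost:
            break  # cost stopped decreasing: local minimum on this side
        best_mode, best_cost = mode, c
        mode += direction * step
    return best_mode, best_cost


def bidirectional_search(start, cost, step=1):
    """Run a backward and a forward descent from `start`; keep the cheaper end point."""
    backward = descend(start, -1, cost, step)
    forward = descend(start, +1, cost, step)
    return min(backward, forward, key=lambda mc: mc[1])[0]
```

With a two-valley cost such as `min((m - 10)**2, (m - 40)**2 + 5)`, a forward-only descent from mode 25 stops at the local minimum 40, while the backward descent reaches the lower-cost mode 10, which is what `bidirectional_search(25, ...)` returns.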
Search Step Size: Adaptive Step Size Search.
In this paper, to achieve better RD performance, our algorithm performs a refinement search within the range $M_{Best} \pm 2$, where $M_{Best}$ is the lower-cost result of the left and right gradient descents. This search determines the current best mode, $M_{Current}$. A full RDO is then performed on DC, Planar, and $M_{Current}$ to obtain the final mode selected by our algorithm, $M_{Final}$.
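A minimal sketch of this refinement and final decision, assuming that `cost` returns the Hadamard cost of a mode and that `rd_cost` stands in for a full RDO evaluation; both are hypothetical callables, not VTM functions, and clipping the window to the angular range is also an assumption.

```python
PLANAR, DC = 0, 1
ANGULAR_MIN, ANGULAR_MAX = 2, 66


def refine(m_best, cost, radius=2):
    """Exhaustively check M_Best +/- radius (clipped to the angular range); return M_Current."""
    lo = max(ANGULAR_MIN, m_best - radius)
    hi = min(ANGULAR_MAX, m_best + radius)
    return min(range(lo, hi + 1), key=cost)


def final_mode(m_best, cost, rd_cost):
    """Full-RDO comparison among Planar, DC and M_Current; returns M_Final."""
    m_current = refine(m_best, cost)
    return min((PLANAR, DC, m_current), key=rd_cost)
```

Only three candidates reach full RDO, which is where most of the saving over an exhaustive RDO of all modes comes from.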

3.3. Overall Framework

This paper proposes an algorithm that combines two stages of decision-making. The first stage is a rapid CU partitioning decision: we build decision tree models and analyze and extract different features for different CU sizes. Classifying these features with the decision trees predicts the CU partition mode well, which curbs the sharp increase in computational complexity introduced by the new VVC standard. The second stage is a rapid intra mode decision that selects the best intra prediction mode. When all adjacent blocks of the current CU are available, our proposed algorithm is executed; otherwise, the default MPM+RMD mode search is used. The algorithm applies gradient descent, the method most commonly used to optimize models, with a left-right bidirectional search pattern, which increases the probability of finding the global optimum and helps avoid local optima. Finally, a full RDO is carried out among DC, Planar, and the best mode obtained by the search to select the optimal mode. Together, the two stages achieve a good trade-off between encoding complexity and coding efficiency; the overall flowchart is shown in Figure 7.

4. Model Training Process and Experimental Setup

In this section, we assess the efficacy of the proposed joint rapid CU partitioning and intra mode decision algorithm through experiments. Complexity reduction is measured by the time saving (TS), and RD performance is evaluated by the Bjøntegaard delta bit rate (BD-BR) [30]. In Section 4.1, we evaluate the DT-based rapid CU partitioning algorithm, focusing on the accuracy of the DT classifiers. In Section 4.2, we evaluate the recall of the intra mode decision, i.e., the accuracy of the bidirectional gradient-descent search. In Section 4.3, we compare our overall algorithm with three advanced methods drawn from two categories, artificial-intelligence-based methods and traditional methods, and then conduct ablation experiments to assess each of the two proposed algorithms separately. All experiments are carried out with the VTM-10.0 reference software [31] in the All-Intra configuration, using the DIV2K dataset, on an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz (2.71 GHz) platform.
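As a reference for how BD-BR is obtained, here is a minimal Python sketch of the classic Bjøntegaard calculation [30], assuming four (bit rate, PSNR) points per curve, one per QP, and that NumPy is available. It is illustrative only and not necessarily the tool used to produce the results reported here; later tools, such as the JVET common test conditions spreadsheet, use piecewise cubic interpolation instead of a single cubic polynomial.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Bjøntegaard delta bit rate (%) of the test curve relative to the anchor.

    Fit log-rate as a cubic polynomial of PSNR for each curve, integrate both
    fits over the overlapping PSNR interval, and convert the average log-rate
    difference back into a percentage.
    """
    fit_a = np.polyfit(psnr_anchor, np.log(rate_anchor), 3)
    fit_t = np.polyfit(psnr_test, np.log(rate_test), 3)
    lo = max(min(psnr_anchor), min(psnr_test))
    hi = min(max(psnr_anchor), max(psnr_test))
    int_a, int_t = np.polyint(fit_a), np.polyint(fit_t)
    avg_a = (np.polyval(int_a, hi) - np.polyval(int_a, lo)) / (hi - lo)
    avg_t = (np.polyval(int_t, hi) - np.polyval(int_t, lo)) / (hi - lo)
    return (np.exp(avg_t - avg_a) - 1.0) * 100.0
```

A test encoder that needs 5% more bits at every PSNR point yields a BD-BR of +5%, and a positive value means a loss in RD performance.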

4.1. Performance of the Rapid CU Partitioning Algorithm Based on DT

In our algorithm, once the texture features appropriate to each CU size have been extracted, the DT (decision tree) classifier analyzes them to identify the optimal partition mode for the CU. Because the classifiers are trained offline, training adds no overhead to the encoder; at encoding time, only a lightweight traversal of the tree is required.
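Because the trees are trained offline, the encoder only has to traverse them. The following Python sketch illustrates that inference step under stated assumptions: the `Node` structure, the `classify` helper, and `toy_tree` are hypothetical, the feature names merely echo those in Table 1 (whose definitions are given in Section 3.1), and the thresholds and labels are invented rather than taken from the trained models.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Node:
    """One node of an offline-trained binary decision tree."""
    feature: str = ""
    threshold: float = 0.0
    left: Optional["Node"] = None   # followed when feature value <= threshold
    right: Optional["Node"] = None  # followed when feature value > threshold
    label: Optional[str] = None     # set only on leaf nodes


def classify(node: Node, features: Dict[str, float]) -> str:
    """Walk from the root to a leaf: one comparison per tree level."""
    while node.label is None:
        node = node.left if features[node.feature] <= node.threshold else node.right
    return node.label


# Hypothetical toy tree: thresholds and labels are invented for illustration.
toy_tree = Node(
    feature="V_sum", threshold=50.0,
    left=Node(label="NO_SPLIT"),
    right=Node(feature="QP", threshold=27.0,
               left=Node(label="SPLIT"),
               right=Node(label="NO_SPLIT")),
)
```

A smooth block (low `V_sum`) ends at a no-split leaf after one comparison, so further partition checks for that CU can be skipped.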

We selected 6 sequences from different classes, covering different resolutions and frame rates; their details are listed in Table 2. The prediction accuracy of the DT (decision tree) classifiers on these 6 sequences is shown in Figure 8.

Figure 8 shows that most DT (decision tree) models achieve an average accuracy above 80%, and that the classifiers for smaller CUs are even more accurate, exceeding 90%. This indicates that the DT classifiers are effective for CU partition decisions.

4.2. Performance of the Gradient Descent Algorithm

The test sequences are encoded with both the VTM-10.0 anchor encoder and the proposed encoder. Figure 9 shows the recall of the proposed intra mode decision, with the QP value on the horizontal axis and the recall rate on the vertical axis; the recall reflects the accuracy of the second stage of the algorithm. It is calculated as follows:

\begin{matrix} R = \frac{TP}{TP + FN} \end{matrix},

(20)

where $TP$ is the number of blocks for which the intra mode predicted by our algorithm matches the mode chosen by VTM-10.0, $FN$ is the number of blocks for which the two differ, and $TP + FN$ is the total number of CU blocks. Therefore, the higher the recall, the more blocks are hit, which indicates better prediction accuracy of our algorithm.
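Equation (20) reduces to a per-block match count. A minimal sketch, assuming the intra modes chosen by the proposed encoder and by VTM-10.0 have already been collected per CU in the same order (the function name and inputs are hypothetical):

```python
def recall(predicted_modes, reference_modes):
    """R = TP / (TP + FN): fraction of blocks whose predicted mode matches the anchor."""
    if len(predicted_modes) != len(reference_modes):
        raise ValueError("both lists must cover the same blocks")
    if not reference_modes:
        raise ValueError("no blocks to evaluate")
    tp = sum(p == r for p, r in zip(predicted_modes, reference_modes))
    return tp / len(reference_modes)  # TP + FN equals the number of blocks
```

For four blocks of which three match, `recall([0, 1, 18, 50], [0, 1, 18, 34])` gives 0.75.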

We again use the 6 sequences from Section 4.1 and test the recall of our algorithm at 4 increasing QP values. The resulting recall rates are shown in Figure 9. The rising curves show that, for most sequences, recall increases slightly with QP, and that even at the smallest QP the recall of our algorithm exceeds 80%. In other words, for the large majority of blocks our algorithm selects the same mode as the exhaustive VTM-10.0 search, which suggests that it finds, or comes close to, the optimal mode we seek.

4.3. Overall Performance

Our proposed algorithm consists of two parts: a rapid CU partitioning algorithm based on DT (decision tree) and an intra mode decision algorithm based on gradient descent. To evaluate its encoding performance, we use TS and BD-BR as metrics: TS is the encoding time saving relative to the anchor, which measures the complexity reduction, and BD-BR measures the RD (rate-distortion) performance.

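\begin{matrix} TS = \frac{T_{VTM10.0} - T_{proposed}}{T_{VTM10.0}} \times 100\% \end{matrix}

(21)

where $T_{VTM10.0}$ and $T_{proposed}$ are the encoding times of the VTM-10.0 anchor and of the proposed encoder, so that a positive TS indicates a reduction in encoding time.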
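The computation is a one-liner. The sketch below assumes wall-clock encoding times for the same sequence and configuration, and uses the convention, followed by Tables 3 and 4, that a positive TS means a time reduction; the function name is illustrative.

```python
def time_saving(t_anchor, t_proposed):
    """TS in percent; positive when the proposed encoder is faster than the anchor."""
    return (t_anchor - t_proposed) / t_anchor * 100.0
```

For instance, if the anchor needs 1000 s and the proposed encoder 454.7 s, TS is 54.53%.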

First, as shown in Table 3, we compare our algorithm with three state-of-the-art algorithms. Table 3 shows that the proposed algorithm substantially reduces complexity, saving 54.53% of the encoding time on average at the cost of a 1.38% increase in BD-BR.
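The averages can be reproduced from the per-sequence values. The short Python check below recomputes the Proposed columns of Table 3 as arithmetic means over the 22 test sequences; this averaging convention is inferred from the table rather than stated in the text.

```python
from statistics import mean

# Proposed-method columns of Table 3, in table order (22 sequences).
proposed_bdbr = [1.76, 1.64, 1.77, 1.76, 1.67, 1.69, 1.75, 1.31, 1.36, 0.74, 0.89,
                 1.58, 0.81, 0.62, 0.75, 1.46, 1.23, 1.41, 1.09, 1.78, 1.73, 1.65]
proposed_ts = [60.68, 58.72, 59.21, 58.37, 57.12, 57.93, 54.26, 53.49, 56.91, 54.71, 55.85,
               50.24, 52.57, 47.68, 51.93, 50.32, 47.64, 49.91, 50.79, 56.37, 57.68, 57.26]

avg_bdbr = round(mean(proposed_bdbr), 2)  # matches the 1.38 in the Average row
avg_ts = round(mean(proposed_ts), 2)      # matches the 54.53 in the Average row
print(avg_bdbr, avg_ts)
```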

4.4. Discussion of Algorithm Applications

Our algorithm consists of two parts: the first part is a fast CU partitioning algorithm based on a DT model, and the second part is an intra-frame mode decision algorithm. In this section, we will discuss their applicability, particularly in downstream tasks, by analyzing the algorithms across the following three aspects.

Task-driven encoding strategy.
When the algorithm is applied to different downstream tasks, CU partitioning can be adjusted to the task requirements. For example, object detection may need finer spatial detail, so smaller CUs might be preferred to preserve spatial resolution. In action recognition, by contrast, temporal information may be more critical, so CU sizes could be chosen with inter-frame correlation in mind. The gradient descent-based intra mode decision can likewise be tuned to the demands of downstream tasks: tasks that require precise edge detection may favor intra modes with strong edge-preserving capabilities, whereas other tasks might prioritize texture information or other features.
Coordinating encoding decisions with downstream tasks.
If a downstream task is particularly sensitive to specific image features (such as texture, motion information, or edges), an encoding strategy that prioritizes the preservation of these features can be built on our algorithm. With the DT-based fast CU partitioning, the partitioning strategy can be adjusted dynamically based on the importance of the task, improving the retention of key features. Where multiple downstream tasks must be supported, a multi-version encoding strategy can be considered: different CU partitioning and intra mode decision strategies are applied according to the requirements of each task, producing several versions of the encoded video stream. Each downstream task can then select the version that best suits its needs, balancing computational efficiency and task performance.
Dynamic adjustment and adaptive encoding.
If the requirements of downstream tasks are dynamic (such as adjusting task priorities in real-time video analysis), the encoder can dynamically adjust CU partitioning and intra-frame mode selection based on downstream feedback. For example, the system can adjust the decision parameters in the DT algorithm in real time according to the feedback from downstream tasks, allowing the encoder to adaptively generate the optimal encoding results for the current task. In the second stage of the algorithm, gradient descent can be applied to frame-by-frame optimization, continuously refining the encoding strategy based on downstream task feedback. In this case, the encoding of each frame depends not only on the content of the current frame but also on the long-term needs of the task, thereby gradually improving encoding performance while maintaining real-time processing.

However, our algorithm is primarily designed to improve video encoding efficiency, specifically CU (Coding Unit) partitioning and intra mode decision. It aims to reduce encoding complexity while keeping the loss in coding quality small. It is therefore best suited to real-time video encoding systems and scenarios that require efficient encoding with limited resources, such as real-time video transmission and video conferencing. If, on the other hand, the primary focus is downstream video analysis, especially tasks that segment objects directly in compressed video streams (e.g., video surveillance, and object recognition and tracking in autonomous driving), the characteristics of compressed video should be exploited to improve task performance. A representative approach is the multi-attention network for compressed video referring object segmentation [12], which operates in the compressed domain and uses encoding information such as motion vectors and residuals to improve both segmentation accuracy and computational efficiency.


For video referring object segmentation in particular, Ref. [12] addresses the inherent difficulties of the task by extracting discriminative representations directly from compressed video, without increasing computational and storage requirements, which represents a significant advance for this application.

Overall, our algorithm focuses on improving encoding efficiency, whereas Ref. [12] focuses on exploiting the characteristics of compressed video to improve downstream tasks, particularly object segmentation. Combining the strengths of both approaches is therefore a promising direction for future research.

Figure 10 plots TS against BD-BR for the four algorithms in Table 3. Compared with Amna's algorithm [32], our average BD-BR is only 0.58 percentage points higher, while our average TS is 7.99 points higher. Compared with Zhao's method [33], our average TS is 8.22 points higher, and compared with Ni's algorithm [34], our algorithm saves an additional 5.47% of encoding time.
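These comparisons are percentage-point gaps between the Average rows of Table 3. A short Python check with the values copied from the table:

```python
# Average (BD-BR %, TS %) per method, copied from the last row of Table 3.
averages = {
    "Amna [32]": (0.80, 46.54),
    "Zhao [33]": (1.24, 46.31),
    "Ni [34]": (0.57, 49.06),
    "Proposed": (1.38, 54.53),
}

prop_bdbr, prop_ts = averages["Proposed"]
# Differences (proposed minus competitor), rounded to the table's two decimals.
deltas = {name: (round(prop_bdbr - b, 2), round(prop_ts - t, 2))
          for name, (b, t) in averages.items() if name != "Proposed"}
for name, (d_bdbr, d_ts) in deltas.items():
    print(f"vs {name}: BD-BR {d_bdbr:+.2f} pp, TS {d_ts:+.2f} pp")
```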

Table 4 presents the results of the ablation study on the two sub-methods. With the first-stage method alone, encoding time is reduced by 47.73% on average with a BD-BR increase of 1.22%, showing that the first stage performs CU partitioning effectively and removes redundant RDO checks. With the second-stage method alone, the average reduction in encoding time is 26.56%, with a BD-BR increase of only 0.54%, showing that the second stage efficiently finds the optimal intra prediction mode and thereby significantly reduces encoding complexity.

5. Conclusions

In this paper, we have proposed a two-stage algorithm. In the first stage, a rapid CU partitioning algorithm extracts different effective features according to CU size and trains a separate DT (decision tree) classifier for each size. In the second stage, an intra mode decision method based on gradient descent further reduces computational complexity while maintaining video quality. Experimental results show that, compared with VTM-10.0, the proposed algorithm reduces encoding time by about 54.53% with an RD performance loss of 1.38% in BD-BR, confirming its effectiveness. Reducing coding complexity significantly lowers the resources and time required, which is especially important for real-time applications, where it directly affects the user experience. It also reduces the energy consumption of computing devices, which is critical in scenarios such as mobile devices, edge computing devices, and data centers. As video coding technology continues to advance, new algorithms and optimization methods can reduce complexity while maintaining or even improving video quality, providing more efficient solutions for future developments.

Author Contributions

Conceptualization, W.S. and C.L.; methodology, W.S.; software, Q.Z.; validation, C.L. and Q.Z.; formal analysis, W.S.; investigation, W.S.; resources, Q.Z.; data curation, C.L.; writing—original draft preparation, C.L. and Q.Z.; writing—review and editing, W.S. and Q.Z.; visualization, W.S.; supervision, Q.Z.; project administration, Q.Z.; funding acquisition, Q.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China No. 61771432 and No. 61302118, the Basic Research Projects of Education Department of Henan No. 21zx003, the Key projects Natural Science Foundation of Henan 232300421150, Zhongyuan Science and Technology Innovation Leadership Program 244200510026, the Scientific and Technological Project of Henan Province 232102211014 and 232102211017, and the Postgraduate Education Reform and Quality Improvement Project of Henan Province YJS2023JC08.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bross, B.; Wang, Y.K.; Ye, Y.; Liu, S.; Chen, J.; Sullivan, G.J.; Ohm, J.R. Overview of the Versatile Video Coding (VVC) Standard and Its Applications. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 3736–3764.
2. Fan, Y.; Chen, J.; Sun, H.; Katto, J.; Jing, M. A Fast QTMT Partition Decision Strategy for VVC Intra Prediction. IEEE Access 2020, 8, 107900–107911.
3. Chen, J.; Sun, H.; Katto, J.; Zeng, X.; Fan, Y. Fast QTMT Partition Decision Algorithm in VVC Intra Coding Based on Variance and Gradient. In Proceedings of the 2019 IEEE Visual Communications and Image Processing (VCIP), Sydney, Australia, 1–4 December 2019; pp. 1–4.
4. Cui, J.; Zhang, T.; Gu, C.; Zhang, X.; Ma, S. Gradient-Based Early Termination of CU Partition in VVC Intra Coding. In Proceedings of the 2020 Data Compression Conference (DCC), Snowbird, UT, USA, 24–27 March 2020; pp. 103–112.
5. Amestoy, T.; Mercat, A.; Hamidouche, W.; Menard, D.; Bergeron, C. Tunable VVC Frame Partitioning Based on Lightweight Machine Learning. IEEE Trans. Image Process. 2020, 29, 1313–1328.
6. Dong, X.; Shen, L.; Yu, M.; Yang, H. Fast Intra Mode Decision Algorithm for Versatile Video Coding. IEEE Trans. Multimed. 2022, 24, 400–414.
7. Fu, T.; Zhang, H.; Mu, F.; Chen, H. Fast CU Partitioning Algorithm for H.266/VVC Intra-Frame Coding. In Proceedings of the 2019 IEEE International Conference on Multimedia and Expo (ICME), Shanghai, China, 8–12 July 2019; pp. 55–60.
8. Tissier, A.; Hamidouche, W.; Mdalsi, S.B.D.; Vanne, J.; Galpin, F.; Menard, D. Machine Learning Based Efficient QT-MTT Partitioning Scheme for VVC Intra Encoders. IEEE Trans. Circuits Syst. Video Technol. 2023, 33, 4279–4293.
9. Tech, G.; Pfaff, J.; Schwarz, H.; Helle, P.; Wieckowski, A.; Marpe, D.; Wiegand, T. Rate-Distortion-Time Cost Aware CNN Training for Fast VVC Intra-Picture Partitioning Decisions. In Proceedings of the 2021 Picture Coding Symposium (PCS), Bristol, UK, 29 June–2 July 2021; pp. 1–5.
10. Park, S.h.; Kang, J.W. Fast Multi-Type Tree Partitioning for Versatile Video Coding Using a Lightweight Neural Network. IEEE Trans. Multimed. 2021, 23, 4388–4399.
11. Li, T.; Xu, M.; Tang, R.; Chen, Y.; Xing, Q. DeepQTMT: A Deep Learning Approach for Fast QTMT-Based CU Partition of Intra-Mode VVC. IEEE Trans. Image Process. 2021, 30, 5377–5390.
12. Chen, W.; Hong, D.; Qi, Y.; Han, Z.; Wang, S.; Qing, L.; Huang, Q.; Li, G. Multi-Attention Network for Compressed Video Referring Object Segmentation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 4416–4425.
13. Zhang, T.; Sun, M.T.; Zhao, D.; Gao, W. Fast Intra-Mode and CU Size Decision for HEVC. IEEE Trans. Circuits Syst. Video Technol. 2017, 27, 1714–1726.
14. Yang, H.; Shen, L.; Dong, X.; Ding, Q.; An, P.; Jiang, G. Low-Complexity CTU Partition Structure Decision and Fast Intra Mode Decision for Versatile Video Coding. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 1668–1682.
15. Wu, S.; Shi, J.; Chen, Z. HG-FCN: Hierarchical Grid Fully Convolutional Network for Fast VVC Intra Coding. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 5638–5649.
16. Lei, M.; Luo, F.; Zhang, X.; Wang, S.; Ma, S. Look-Ahead Prediction Based Coding Unit Size Pruning for VVC Intra Coding. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 4120–4124.
17. Zhang, Q.; Zhao, Y.; Jiang, B.; Wu, Q. Fast CU Partition Decision Method Based on Bayes and Improved De-Blocking Filter for H.266/VVC. IEEE Access 2021, 9, 70382–70391.
18. Feng, A.; Liu, K.; Liu, D.; Li, L.; Wu, F. Partition Map Prediction for Fast Block Partitioning in VVC Intra-Frame Coding. IEEE Trans. Image Process. 2023, 32, 2237–2251.
19. Zhao, T.; Huang, Y.; Feng, W.; Xu, Y.; Kwong, S. Efficient VVC Intra Prediction Based on Deep Feature Fusion and Probability Estimation. IEEE Trans. Multimed. 2023, 25, 6411–6421.
20. Huang, Y.; Yu, J.; Wang, D.; Lu, X.; Dufaux, F.; Guo, H.; Zhu, C. Learning-Based Fast Splitting and Directional Mode Decision for VVC Intra Prediction. IEEE Trans. Broadcast. 2024, 70, 681–692.
21. Peng, Z.; Shen, L.; Ding, Q.; Dong, X.; Zheng, L. Block-Dependent Partition Decision for Fast Intra Coding of VVC. IEEE Trans. Consum. Electron. 2024, 70, 277–289.
22. Wang, D.; Yu, J.; Lu, X.; Dufaux, F.; Hang, B.; Guo, H.; Zhu, C. Fast Mode and CU Splitting Decision for Intra Prediction in VVC SCC. IEEE Trans. Broadcast. 2024, 1–12.
23. Pakdaman, F.; Adelimanesh, M.A.; Hashemi, M.R. BLINC: Lightweight Bimodal Learning for Low-Complexity VVC Intra-Coding. J. Real-Time Image Process. 2022, 19, 791–807.
24. Chen, Z.; Shi, J.; Li, W. Learned Fast HEVC Intra Coding. IEEE Trans. Image Process. 2020, 29, 5431–5446.
25. Jiang, W.; Ma, H.; Chen, Y. Gradient Based Fast Mode Decision Algorithm for Intra Prediction in HEVC. In Proceedings of the 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, China, 21–23 April 2012; pp. 1836–1840.
26. Hu, N.; Yang, E.H. Fast Mode Selection for HEVC Intra-Frame Coding With Entropy Coding Refinement Based on a Transparent Composite Model. IEEE Trans. Circuits Syst. Video Technol. 2015, 25, 1521–1532.
27. Zhang, Q.; Wang, Y.; Huang, L.; Jiang, B. Fast CU Partition and Intra Mode Decision Method for H.266/VVC. IEEE Access 2020, 8, 117539–117550.
28. Gou, A.; Sun, H.; Liu, C.; Zeng, X.; Fan, Y. A Novel Fast Intra Algorithm for VVC Based on Histogram of Oriented Gradient. J. Vis. Commun. Image Represent. 2023, 95, 103888.
29. Li, Y.; He, Z.; Zhang, Q. Fast Decision-Tree-Based Series Partitioning and Mode Prediction Termination Algorithm for H.266/VVC. Electronics 2024, 13, 1250.
30. Bjontegaard, G. Calculation of Average PSNR Differences between RD-Curves. ITU SG16 Doc. VCEG-M33. 2001. Available online: https://cir.nii.ac.jp/crid/1571980074917801984 (accessed on 20 August 2024).
31. VTM-10.0 · jvet/VVCSoftware_VTM · GitLab. Available online: https://vcgit.hhi.fraunhofer.de/jvet/VVCSoftwareVTM/-/releases/VTM-10.0 (accessed on 20 August 2024).
32. Amna, M.; Imen, W.; Fatma Ezahra, S. Fast Multi-Type Tree Partitioning for Versatile Video Coding Using Machine Learning. Signal, Image Video Process. 2023, 17, 67–74.
33. Zhao, J.; Wu, A.; Jiang, B.; Zhang, Q. ResNet-Based Fast CU Partition Decision Algorithm for VVC. IEEE Access 2022, 10, 100337–100347.
34. Ni, C.T.; Lin, S.H.; Chen, P.Y.; Chu, Y.T. High Efficiency Intra CU Partition and Mode Decision Method for VVC. IEEE Access 2022, 10, 77759–77771.

Figure 1. Proportion of different modes in CUs of various sizes.

Figure 2. CTU partition structure and partitioning rules. (a) CTU partition structure; (b) CTU partitioning rules.

Figure 3. Flowchart of the first-stage algorithm.

Figure 4. Examples of Up, Down, Left, and Right CU partitioning.

Figure 5. Proportion of different modes in the sequence.

Figure 6. Comparison between directional modes and MPM list modes.

Figure 7. Overall algorithm flowchart.

Figure 8. DT classifier accuracy chart for different CUs.

Figure 9. Intra-frame mode decision algorithm recall rate.

Figure 10. TS-BD-BR Comparison chart of various algorithms.

Table 1. F-score values of each feature for different sequences.

Feature	BQMall	Campfire	Johnny	RaceHorses
$\varepsilon$	0.102	0.133	0.084	0.124
$E_{sum}$	0.121	0.095	0.102	0.095
$V_{sum}$	0.127	0.103	0.095	0.098
QP	0.089	0.101	0.087	0.084
$\Delta\eta_{BT}$	0.103	0.064	0.078	0.052
$\Delta\eta_{TT}$	0.099	0.071	0.079	0.054
DCE	0.046	0.083	0.096	0.058

Table 2. Sequence Information.

Class	Test Sequence	Resolution	Frame Count	Frame Rate	Bit Depth
A1	Campfire	3840 × 2160	300	30 fps	10
A2	DaylightRoad2	3840 × 2160	300	60 fps	10
B	BasketballDrive	1920 × 1080	500	50 fps	8
C	BQMall	832 × 480	600	60 fps	8
D	RaceHorses	416 × 240	300	30 fps	8
E	Johnny	1280 × 720	600	60 fps	8

Table 3. Comparison of the proposed algorithm with three other algorithms.

Class	Test Sequence	Amna [32]		Zhao [33]		Ni [34]		Proposed
Class	Test Sequence	BD-BR (%)	TS (%)	BD-BR (%)	TS (%)	BD-BR (%)	TS (%)	BD-BR (%)	TS (%)
A1	Campfire	0.49	48.12	1.55	52.32	0.53	52.57	1.76	60.68
	Drums	—	—	—	—	—	—	1.64	58.72
	FoodMarket4	0.16	42.85	1.53	50.08	0.44	58.89	1.77	59.21
A2	TrafficFlow	—	—	—	—	—	—	1.76	58.37
	DaylightRoad	0.80	48.70	—	—	—	—	1.67	57.12
	ParkRunning3	0.54	50.28	1.42	47.62	0.21	46.07	1.69	57.93
B	BasketballDrive	0.67	47.12	1.58	43.31	0.55	47.36	1.75	54.26
	BQTerrace	0.89	44.65	0.84	47.58	0.46	48.03	1.31	53.49
	Cactus	0.79	47.19	1.28	44.03	0.55	46.12	1.36	56.91
	MarketPlace	—	—	—	—	0.30	46.15	0.74	54.71
	RitualDance	—	—	—	—	0.65	51.65	0.89	55.85
C	BasketballDrill	1.31	48.34	1.27	44.15	0.96	46.37	1.58	50.24
	BQMall	1.06	45.15	1.11	47.37	0.69	50.00	0.81	52.57
	PartyScene	0.62	45.11	0.78	46.37	0.42	46.01	0.62	47.68
	RaceHorsesC	0.79	47.55	0.84	47.58	0.47	48.66	0.75	51.93
D	BasketballPass	1.01	43.69	1.34	38.31	0.70	50.50	1.46	50.32
	BlowingBubbles	0.61	45.7	0.93	44.29	0.51	45.16	1.23	47.64
	BQSquare	0.53	43.72	0.82	46.65	0.67	49.10	1.41	49.91
	RaceHorses	—	—	1.12	39.46	0.47	48.66	1.09	50.79
E	FourPeople	1.2	47.91	1.35	48.15	0.82	49.96	1.78	56.37
	Johnny	1.06	47.68	1.67	51.60	0.76	52.42	1.73	57.68
	KristenAndSara	0.98	44.98	1.49	48.42	0.70	51.42	1.65	57.26
Average		0.80	46.54	1.24	46.31	0.57	49.06	1.38	54.53

Table 4. Ablation results for the two stages of the proposed algorithm.

Class	Test Sequence	Stage 1 only		Stage 2 only		Proposed
Class	Test Sequence	BD-BR (%)	TS (%)	BD-BR (%)	TS (%)	BD-BR (%)	TS (%)
A1	Campfire	1.67	44.29	0.38	26.82	1.76	60.68
	Drums	1.51	51.96	0.57	28.51	1.64	58.72
	FoodMarket4	1.69	46.67	0.29	21.73	1.77	59.21
A2	TrafficFlow	1.42	52.1	0.42	25.96	1.76	58.37
	DaylightRoad	1.29	54.09	0.25	29.87	1.67	57.12
	ParkRunning3	1.35	50.78	0.37	27.25	1.69	57.93
B	BasketballDrive	1.65	50.34	0.59	24.23	1.75	54.26
	BQTerrace	1.29	49.88	0.47	28.55	1.31	53.49
	Cactus	1.31	50.02	0.61	27.09	1.36	56.91
	MarketPlace	0.41	49.86	0.56	24.62	0.74	54.71
	RitualDance	0.72	50.96	0.39	20.17	0.89	55.85
C	BasketballDrill	1.02	47.93	0.65	29.76	1.58	50.24
	BQMall	0.75	43.21	0.73	27.93	0.81	52.57
	PartyScene	0.43	44.59	0.56	26.36	0.62	47.68
	RaceHorsesC	0.68	42.07	0.62	25.47	0.75	51.93
D	BasketballPass	1.41	41.35	0.52	25.61	1.46	50.32
	BlowingBubbles	1.13	40.28	0.66	29.32	1.23	47.64
	BQSquare	1.32	43.67	0.70	27.98	1.41	49.91
	RaceHorses	1.03	43.91	0.64	27.53	1.09	50.79
E	FourPeople	1.59	53.28	0.49	24.02	1.78	56.37
	Johnny	1.64	50.11	0.83	29.68	1.73	57.68
	KristenAndSara	1.52	48.62	0.67	25.90	1.65	57.26
Average		1.22	47.73	0.54	26.56	1.38	54.53

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

