1. Introduction
In recent years, significant breakthroughs have been made in 3D scene reconstruction, particularly in the area of radiance fields, with the Neural Radiance Field (NeRF) [1] being a pivotal development. NeRF implicitly represents and learns the radiance field of a 3D scene through a multilayer perceptron (MLP), achieving high-quality reconstruction from images captured at multiple viewpoints. Compared to traditional 3D scene reconstruction methods, the high-quality images rendered by NeRF highlight the advantages of differentiable radiance fields for scene representation. However, NeRF's slow training and rendering speed remain a significant bottleneck for practical applications of radiance field models. Although various methods [2,3,4,5,6] have been proposed to accelerate training and rendering, they often come at the cost of degraded reconstruction accuracy.
With advances in 3D reconstruction technology, the 3D Gaussian Splatting (3DGS) model [7] has been proposed. Unlike discrete point cloud models, the 3DGS model is differentiable across the scene owing to the properties of its 3D Gaussian points, allowing for higher-quality rendering after scene construction. Compared to NeRF, the 3DGS model avoids unnecessary computation in empty space and leverages rasterization to quickly project and render 3D Gaussians onto the 2D image plane, significantly improving training and rendering speed. Despite using a completely different data structure, the 3DGS model retains the properties of a differentiable radiance field, allowing for high-quality scene rendering akin to NeRF. Among NeRF methods, InstantNGP [8] achieves very high training speed, while Mip-NeRF 360 [9] attains outstanding scene reconstruction accuracy. As an innovative scene representation, however, the 3DGS model not only rivals state-of-the-art NeRF methods in speed and accuracy but also supports near real-time rendering. 3DGS has broad applicability across multiple research domains, including Simultaneous Localization and Mapping (SLAM) [10,11], dynamic scene reconstruction [12,13,14], Artificial Intelligence-Generated Content (AIGC) [15], autonomous driving [16], endoscopic scene reconstruction [17], and large-scale scene modeling [18]. Due to its excellent performance and extensibility, 3D Gaussian Splatting has quickly become a prominent branch of 3D reconstruction.
However, the 3D Gaussian Splatting method is not without flaws. Its model parameter count is significantly larger than that of NeRF, which allows for high-quality reconstruction but at the expense of storage requirements. This presents a considerable limitation for resource-constrained applications such as AR/VR headsets and IoT devices. Therefore, it is crucial to investigate methods for compressing 3DGS models to reduce storage requirements while maintaining rendering quality.
Before conducting compression studies, it is essential to understand the model's storage format. 3DGS models represent scenes explicitly by training a large number of parameterized 3D Gaussian points. Each Gaussian point consists of 59 parameters, including its position, the scale and rotation quaternion that parameterize its covariance, its opacity, and its spherical harmonic coefficients. During training, the 3DGS model first performs Structure from Motion (SfM) [19] on the training images to estimate camera poses and an initial point cloud, from which a small number of 3D Gaussians are initialized. Throughout training, the original Gaussian points are continuously cloned and split to generate new Gaussian points with numerous attribute parameters, and these parameters are updated to optimize the model's fit to the scene. However, this process inevitably generates redundant Gaussians. Although the pruning mechanism built into the 3D Gaussian Splatting model limits the number of Gaussians to some extent, our experiments show that this pruning is incomplete, leaving a substantial number of redundant Gaussian points.
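For concreteness, the 59 stored parameters per point break down as follows; this is a minimal sketch assuming the standard 3DGS point layout, and the field names are illustrative rather than taken from the official implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianPoint:
    """Illustrative per-point parameter layout: 3 + 3 + 4 + 1 + 48 = 59 floats."""
    position: np.ndarray   # 3: x, y, z
    scale: np.ndarray      # 3: ellipsoid semi-axes (together with rotation, defines the covariance)
    rotation: np.ndarray   # 4: unit quaternion
    opacity: float         # 1
    sh_coeffs: np.ndarray  # 48: 16 spherical harmonic coefficients per RGB channel (degree 3)
```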
Some existing works, such as Compact3DGS [20] and LightGaussian [21], have contributed to the compression of 3D Gaussian models through pruning and compression techniques. When handling under-reconstructed regions, these methods often rely on the cloning approach of the original 3D Gaussian Splatting to improve the efficiency of the Gaussian distribution. However, this full-parameter cloning does not specifically address redundancy in high-density areas with large local variations, nor does it preserve rendering quality there. When dealing with localized scene variations caused by scale or rotation properties, the traditional cloning method merely copies the existing parameters without adapting to these changes. According to the rendering principle of the splatting method, because the Gaussian points are copied verbatim, the cloned points overlap along a ray during the splatting stage, making the opacity weight of the earlier points too large and thus introducing color errors. In contrast, our opacity update technique dynamically adjusts the opacity of newly generated Gaussian points, preserving rendering quality by keeping the pixel-wise renderings consistent.
Based on the above characterization of the 3DGS model's storage structure, it is evident that, aside from a small amount of data describing the global scene, the storage size is primarily determined by the product of the number of Gaussian points and the number of parameters within each point. Therefore, to effectively reduce the model size, two approaches can be pursued: (a) prune redundant Gaussians to reduce the number of Gaussian points, and (b) compress the parameter representation of each Gaussian to shrink the per-point footprint. Inspired by existing work [20,21,22,23], we propose an improved 3DGS model compression method. The main contributions of this paper are as follows:
Propose a principled opacity update strategy to assist in pruning redundant Gaussian points;
Propose a contribution-based proportional pruning method combined with trainable Gaussian masks;
Validate the effectiveness of a method combining teacher–student models and vector quantization for compressing spherical harmonic coefficients.
By employing these effective 3D Gaussian model compression methods, the memory requirement of the Gaussian model can be significantly reduced, which is of great significance for the application of the 3D Gaussian model in real scenarios.
3. Methods
The work in this paper is divided into two parts: Gaussian quantity compression and Gaussian spherical harmonic coefficient compression. An overview is shown in Figure 1. First, a Structure from Motion (SfM) operation is performed on the training dataset to initialize the 3D Gaussian points. During the densification and pruning stage, unlike previous strategies that reset opacity at fixed intervals, our model applies a new opacity update strategy during Gaussian point cloning and splitting. This approach is more consistent with the rendering principles and assists the pruning of Gaussian points. In the pruning stage, redundant Gaussians are masked using a learnable Gaussian mask obtained from training. The contribution value of each Gaussian point is then calculated, and less influential redundant Gaussian points are pruned proportionally. Finally, during the spherical harmonic coefficient compression stage, knowledge distillation is employed to allow the student model to learn color features at a smaller spherical harmonic degree. In addition, novel-view renderings generated by the high-precision teacher model guide the training of the student model, enabling it to learn richer knowledge. The novel views are obtained by adding random uniform perturbations to the original views in the training dataset. Vector quantization is then performed on the compressed spherical harmonic coefficients to further compress the color features, yielding the final compressed model.
3.1. Gaussian Quantity Compression
3.1.1. Opacity Update Module
When the 3D Gaussian Splatting model performs cloning and splitting during the densification phase, the new Gaussian points inherit the opacity of the original Gaussian points. According to the alpha blending principle of rendering, this direct numerical replication introduces errors in opacity. In particular, when fitting a local area with complex colors, reconstruction often requires a large number of Gaussian points to participate in cloning and splitting, resulting in an excessive density of Gaussian points in that local space. This high-density region aggravates the opacity error over the training iterations. Inspired by the explanation of the Gaussian point cloning principle in [39], we introduce an updated opacity formula from a new perspective.
As shown in the left part of Figure 2, suppose a ray of light passes sequentially through two Gaussian points, $G_1$ and $G_2$, during rendering. The opacity of Gaussian point $G_1$ is $\alpha_1$ and its color is $c_1$, while the opacity of Gaussian point $G_2$ is $\alpha_2$ and its color is $c_2$. According to the alpha blending logic, the opacity parameter is multiplied by the color parameter to weight the color. A Gaussian point further along the ray is weighted not only by its own opacity but also by one minus the accumulated opacity of the preceding points. Therefore, before cloning, taking the left schematic in Figure 2 as an example, the color of this ray rendered to a pixel is calculated as follows:
$$C = \alpha_1 c_1 + (1 - \alpha_1)\,\alpha_2 c_2$$
Suppose a new Gaussian point $G_1'$ is cloned from Gaussian point $G_1$ along the direction of the ray, so that the ray now passes through three Gaussian points. Following the original cloning method, as illustrated in the upper-right schematic of Figure 2, all parameters of the new Gaussian are identical to those of the old one except for its position; that is, $\alpha_1' = \alpha_1$ and $c_1' = c_1$. Although Gaussian point $G_2$ itself remains unchanged, its color weight along the ray decreases because of the additional preceding Gaussian point. The color blending formula in this case is:
$$C' = \alpha_1 c_1 + (1 - \alpha_1)\,\alpha_1 c_1 + (1 - \alpha_1)^2\,\alpha_2 c_2$$
Since the opacity satisfies $0 < \alpha_1 < 1$ in general, under this primitive cloning method the coefficient of the color $c_2$ of the later Gaussian point satisfies:
$$(1 - \alpha_1)^2\,\alpha_2 < (1 - \alpha_1)\,\alpha_2$$
It can be seen that, although Gaussian point $G_2$ is not cloned, it is still affected by the other Gaussian points during rendering: after Gaussian point $G_1$ is cloned, $G_2$ carries less weight in the color calculation. From a theoretical perspective, the primary purpose of cloning new Gaussian points is to correct deviations in scale and position. However, simply copying the opacity leads to cumulative rendering errors along the ray. To address this issue, this paper introduces an opacity update strategy based on the alpha blending principle during the cloning process. This ensures that the combined opacity weight of the original Gaussian point and its clone matches the opacity weight of the original Gaussian point before cloning, thereby eliminating the adverse effect on the rendering of subsequent Gaussian points.
Taking the lower-right diagram of Figure 2 as an example, assume that the updated opacities of the original point and its clone are $\hat{\alpha}_1$ and $\hat{\alpha}_1'$, where $\hat{\alpha}_1' = \hat{\alpha}_1$ and the colors remain $c_1$. The rendering formula becomes:
$$\hat{C} = \hat{\alpha}_1 c_1 + (1 - \hat{\alpha}_1)\,\hat{\alpha}_1 c_1 + (1 - \hat{\alpha}_1)^2\,\alpha_2 c_2$$
To keep the color weight of Gaussian point $G_2$ unchanged, we require
$$(1 - \hat{\alpha}_1)^2 = 1 - \alpha_1 .$$
Since $\hat{\alpha}_1$ is non-negative, after simplification the updated opacity can be derived as:
$$\hat{\alpha}_1 = 1 - \sqrt{1 - \alpha_1} .$$
Because Gaussian points usually change their covariance parameters when split during reconstruction, the original method is still used for splitting.
Since model compression inevitably affects rendering quality, we apply this opacity update strategy in all processes involving Gaussian point cloning to reduce the resulting quality loss.
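As a minimal sketch of this update (assuming opacities are already activated into [0, 1); the function name is ours, not taken from the official 3DGS code), both the original point and its clone receive the derived value:

```python
import torch

def updated_opacity(alpha: torch.Tensor) -> torch.Tensor:
    """Opacity assigned to both the original Gaussian and its clone so that their
    combined alpha-blending weight equals the pre-clone weight of the original:
    (1 - a_hat)^2 = 1 - a  =>  a_hat = 1 - sqrt(1 - a)."""
    return 1.0 - torch.sqrt(torch.clamp(1.0 - alpha, min=0.0))
```

During densification, the cloned pair is assigned this updated opacity, while splitting keeps the original rule described above.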
3.1.2. Proportional Contribution Pruning with Learnable Masks
Redundant Gaussians fall mainly into two categories: (a) noisy Gaussian points that negatively impact scene reconstruction, and (b) Gaussian points with minimal opacity or scale that have little effect on the reconstruction. Although the pruning strategy of the original 3DGS model controls the number of Gaussian points to a certain extent, experiments show that this approach is incomplete, and redundant Gaussian points still account for a large proportion of the model. If only the Gaussian points that are favorable for reconstruction were retained, the model's storage footprint could be reduced drastically. Therefore, a learnable mask that masks the redundant Gaussians would greatly aid pruning. Masks are commonly binary, yet directly training a binary mask yields a gradient of zero almost everywhere, making training via gradient descent impossible. However, a large body of work has studied gradients for binary networks [40,41,42,43]. Inspired by these approaches, we design proportional contribution pruning with learnable masks to reduce the number of Gaussian points. The mask primarily serves to filter out redundant Gaussians so that they can be removed at the end of the process. Quantifying the contribution of each Gaussian point helps to retain the effective ones; to further reduce their number, we sort the contribution values and proportionally prune the low-contribution Gaussian points, reducing the point count while minimizing the impact on reconstruction quality. The flowchart combining the learnable mask and the opacity update strategy is shown in the right part of Figure 2. During training iterations, the opacity and mask parameters are trained together, producing a coupled effect on Gaussian pruning. When quantifying the contribution of Gaussian points, the updated opacity participates in the calculation, so this step also benefits from the opacity update strategy.
Initially, the Gaussian mask is stored in each Gaussian point alongside all other Gaussian parameters and is initialized as a tensor of zeros. During training, as densification and pruning occur, the Gaussian mask is copied when new Gaussian points are generated and removed when old points are pruned. Since binary masks cannot provide gradients during training, we employ a straight-through estimator, as suggested in previous work [20], to obtain gradients. The core idea of the straight-through estimator is that the model's output is computed normally during forward propagation, while the gradient is computed in a customized manner during backpropagation to bypass certain non-differentiable operations.
The learnable Gaussian mask is defined as follows:
$$M_n = \operatorname{sg}\!\big(\mathbb{1}\,[\,\sigma(m_n) > \epsilon\,] - \sigma(m_n)\big) + \sigma(m_n),$$
where the subscript $n$ indexes the Gaussian points, $M_n$ denotes the binary mask of the $n$-th Gaussian point, and $\operatorname{sg}(\cdot)$ is the stop-gradient function. The mask parameter stored in each Gaussian point, $m_n$, is passed through a sigmoid function $\sigma(\cdot)$ and thus constrained to the range between 0 and 1. $\epsilon$ is the Gaussian mask threshold, which was empirically set to 0.01 for Mip-NeRF 360 and Tanks & Temples. The indicator function $\mathbb{1}[\cdot]$ is such that when $\sigma(m_n)$ exceeds $\epsilon$, $M_n$ is set to 1; otherwise, $M_n$ is set to 0. Due to the presence of the stop-gradient function, the training process only computes the gradient of $\sigma(m_n)$. The mask $M_n$ is applied to the scale $s_n$ and opacity $\alpha_n$, effectively preventing masked Gaussian points from participating in the rendering computation while minimizing the amount of computation. Finally, the Gaussian model is pruned according to the mask to yield a preliminary de-redundant Gaussian model.
During the pruning of 3D Gaussian points, we found that, due to the phased pruning strategy of the original 3DGS model, the number of 3D Gaussian points fluctuates in the early stages of pruning, which interferes with reducing the count. The work in [39] proposed a method to eliminate such fluctuations, in which the opacity is reduced by a small fixed value after each cloning or splitting operation. To best trade off density against quality, this fixed value was empirically set to 0.01. We incorporate this method into our Gaussian point reduction module. It effectively eliminates fluctuations in the number of Gaussian points while enhancing the compression of the Gaussian model, with negligible loss in reconstruction quality.
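A minimal sketch of this decrement follows; it assumes opacities are stored in activated form (the reference 3DGS implementation stores logits, so real code would convert first):

```python
import torch

def post_densify_opacity_decay(opacity: torch.Tensor, decrement: float = 0.01) -> torch.Tensor:
    """Apply a small fixed opacity decrement after each clone/split step to damp
    fluctuations in the Gaussian point count."""
    with torch.no_grad():
        opacity.sub_(decrement).clamp_(min=0.0)
    return opacity
```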
Once training has proceeded far enough that the number of Gaussian points retained by the mask stabilizes, the contribution value of each retained Gaussian point is calculated according to the following formula:
$$C_j = \sum_{i=1}^{M \times H \times W} \mathbb{1}\big(G_j, r_i\big)\cdot \hat{\alpha}_j \cdot V_j, \qquad V_j = \tfrac{4}{3}\pi\, s_x s_y s_z,$$
where $C_j$ represents the contribution of Gaussian point $G_j$. Given that a Gaussian point is represented as an ellipsoid in three-dimensional space, its scale parameter comprises the three semi-axes $s_x$, $s_y$, and $s_z$, so the volume of a 3D Gaussian point can be calculated as $V_j = \frac{4}{3}\pi s_x s_y s_z$. The product $M \times H \times W$ is the total number of pixels in the training image set and $r_i$ refers to a pixel. The indicator function $\mathbb{1}(G_j, r_i)$ determines whether Gaussian point $G_j$ contributes to the rendered pixel $r_i$, counting 1 if they intersect during rendering. This process iterates over the entire training dataset to count how many times each Gaussian point is rendered onto any pixel. The count is then multiplied by the mask-processed opacity $\hat{\alpha}_j$ to obtain the contribution value of the Gaussian point. After quantifying and ranking the contribution of each Gaussian point, we proportionally prune the low-contribution points, completing the reduction in the number of Gaussian points in the Gaussian model.
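A sketch of the contribution score and the proportional pruning step is given below; the hit counts are assumed to be accumulated by the rasterizer during training, and the prune ratio is an illustrative hyperparameter:

```python
import torch

def contribution_scores(hit_counts: torch.Tensor, masked_opacity: torch.Tensor,
                        scale: torch.Tensor) -> torch.Tensor:
    """C_j = sum_i 1(G_j, r_i) * alpha_j * V_j with V_j = 4/3 * pi * sx * sy * sz."""
    volume = (4.0 / 3.0) * torch.pi * scale.prod(dim=-1)
    return hit_counts * masked_opacity.squeeze(-1) * volume

def proportional_prune_mask(scores: torch.Tensor, prune_ratio: float = 0.2) -> torch.Tensor:
    """Keep-mask that drops the lowest-contribution fraction of Gaussians."""
    k = int(scores.numel() * prune_ratio)
    if k == 0:
        return torch.ones_like(scores, dtype=torch.bool)
    threshold = torch.kthvalue(scores, k).values
    return scores > threshold
```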
The loss function of the contribution pruning module combined with the learnable mask is as follows:
$$L = L_{\mathrm{3DGS}} + \lambda_m \sum_{n=1}^{N} \sigma(m_n),$$
where $L_{\mathrm{3DGS}}$ is the original rendering loss of the 3D Gaussian Splatting model and $\lambda_m$ weights the mask loss term. Incorporating this mask loss term into the loss function promotes the minimization of the sum of $\sigma(m_n)$, thereby guiding the model to mask as many redundant Gaussian points as possible and helping to reduce the total number of Gaussian points.
3.2. Gaussian Point Spherical Harmonic Coefficient Compression
The previous sections addressed compressing the number of Gaussians, but there remains significant potential for further reducing the model's parameter size. The 3D Gaussian Splatting model consists primarily of the attribute parameters associated with each Gaussian point. In the traditional uncompressed model, the spherical harmonic degree is usually 3, and each Gaussian point contains a total of 59 parameters, of which 48 are spherical harmonic coefficients; these take up most of the storage space. The expressive capability of spherical harmonic functions increases with the degree, but so does the number of parameters that must be stored. Experiments [21] have demonstrated that a spherical harmonic degree of 2, when combined with knowledge distillation aided by novel views, can effectively represent scenes while reducing the number of spherical harmonic coefficients. Furthermore, many points within a scene exhibit similar color characteristics, so storing colors independently for each Gaussian point incurs substantial storage overhead. To mitigate this, we apply vector quantization to the spherical harmonic coefficients representing color, allowing multiple Gaussian points with similar colors to share a single color feature vector.
Firstly, the model compressed through the Gaussian Point Quantity Compression module is used as the teacher model to guide the student model, in which the spherical harmonic degree is set to 2. Given that the teacher model is capable of high-quality scene reconstruction, we introduce a random uniform perturbation to the poses of the old views to generate new view renderings. These are added to the training dataset for the student model. Compared to the limited dataset of old views, the enriched dataset with new perspective renderings enables the student model to learn more beneficial information.
With $\delta$ as the perturbation threshold, the formula for generating new views is:
$$P_{\mathrm{new}} = P_{\mathrm{old}} + \Delta, \qquad \Delta \sim \mathcal{U}(-\delta, \delta),$$
where $P_{\mathrm{old}}$ is an original training camera pose and $\Delta$ is a random uniform perturbation. The loss function shown in Equation (11) is calculated on a pixel-by-pixel basis, taking the sum of the Euclidean distances between the renderings of the teacher model and the student model as the loss:
$$L_{\mathrm{distill}} = \sum_{r} \big\lVert C_{\mathrm{teacher}}(r) - C_{\mathrm{student}}(r) \big\rVert_2 .$$
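A sketch of the pseudo-view generation and the pixel-wise distillation loss is shown below; the translation-only jitter and the threshold value are illustrative assumptions rather than the exact scheme used in our implementation:

```python
import torch

def perturb_pose(c2w: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    """Add a uniform random offset in [-delta, delta] to the camera translation
    of a 4x4 camera-to-world matrix to synthesize a novel training view."""
    jitter = (torch.rand(3, device=c2w.device) * 2.0 - 1.0) * delta
    perturbed = c2w.clone()
    perturbed[:3, 3] += jitter
    return perturbed

def distillation_loss(teacher_rgb: torch.Tensor, student_rgb: torch.Tensor) -> torch.Tensor:
    """Sum over pixels of the Euclidean distance between teacher and student renderings."""
    return torch.linalg.vector_norm(teacher_rgb - student_rgb, dim=-1).sum()
```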
After obtaining the student model through distillation, vector quantization is applied to the spherical harmonic coefficients. The process is shown in Figure 3. It begins with initialization using the K-means algorithm [44], clustering the spherical harmonic coefficient vectors into K clusters, where K is the size of the codebook. Within each cluster, the most representative SH coefficient vector is selected to serve as a codebook key, and each Gaussian point stores only the index of its assigned key. Since the number of Gaussian points far exceeds K, this method achieves a very high compression ratio.
This approach reduces the model’s overall storage requirements for color parameters. Although knowledge distillation and vector quantization lead to a slight decline in reconstruction quality, they result in a more compact color representation within the 3DGS model, significantly aiding model compression by reducing the number of parameters.
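A sketch of the codebook construction is given below; scikit-learn's KMeans stands in for the clustering step, and the codebook size is illustrative:

```python
import torch
from sklearn.cluster import KMeans

def quantize_sh(sh_coeffs: torch.Tensor, codebook_size: int = 4096):
    """Cluster per-Gaussian SH coefficient vectors into K centroids and store,
    for each Gaussian, only the index of its assigned centroid."""
    flat = sh_coeffs.reshape(sh_coeffs.shape[0], -1).detach().cpu().numpy()
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(flat)
    codebook = torch.tensor(km.cluster_centers_, dtype=sh_coeffs.dtype)
    indices = torch.tensor(km.labels_, dtype=torch.long)
    return codebook, indices  # reconstruct: codebook[indices].reshape(sh_coeffs.shape)
```

Storing one integer index per Gaussian plus a shared codebook replaces 48 floats per point, which is where the bulk of the color-parameter savings comes from.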
4. Experiments
Our experimental setup utilized an NVIDIA RTX 3090 GPU. The experiments were evaluated using peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS), and model size as metrics. These experiments were conducted on the Mip-NeRF 360 dataset [9], the Tanks & Temples dataset [45], and the nerf_LLFF_data dataset [46] to demonstrate the effectiveness of our model compression method. In comparative experiments, the original 3D Gaussian Splatting model served as the baseline to assess the performance and compression of our model. Additionally, we performed a detailed ablation study on each module to analyze the interdependencies among different modules and quantify their effectiveness. Besides quantitatively analyzing the compressed model through visual evaluation metrics, we rendered the model to visually demonstrate the reconstruction quality after compression.
4.1. Description of the Dataset
The diversity and complexity of realistic scenes have always been challenging difficulties for 3D reconstruction. To effectively validate our research from multiple views, we selected datasets that emphasize different aspects of scene representation to comprehensively evaluate our work. A brief overview of the experimental datasets is provided below:
Mip-NeRF360 Dataset: This dataset is used to assess the performance of scene models in complex environments, particularly focusing on handling high dynamic range and intricate lighting conditions.
Tanks & Temples Dataset: A commonly used benchmark for 3D reconstruction, this dataset primarily evaluates the accuracy and detail preservation capabilities of reconstruction algorithms.
nerf_LLFF_data Dataset: This dataset contains high-resolution images of real-world scenes, serving to evaluate the model’s performance in realistic scenarios.
4.2. Performance Evaluation
Table 1 reports the performance of the original 3DGS model and several NeRF models, all of which were executed on an NVIDIA A6000 GPU. Since our experiments were conducted on an NVIDIA RTX 3090 GPU, we re-evaluated the 3DGS model on that GPU (denoted 3DGS *) to ensure a fair comparison. In existing work [7,20,21,47], the commonly used Tanks & Temples evaluations cover only partial sub-datasets, such as Train and Truck. Therefore, we report only these commonly used sub-datasets in Table 1, while in Table 2 we test more sub-datasets of Tanks & Temples to verify the performance of our model. Additionally, we compare our model with more advanced 3DGS compression algorithms. From Table 1, it can be seen that while the rendering quality of our model is slightly lower than that of the advanced methods, it achieves the highest compression ratio. Model compression inherently leads to some degradation in reconstruction quality. However, considering the substantial compression achieved, these quality losses are acceptable within the scope of our research: the minor reduction in quality is a reasonable trade-off for the significant improvements in storage efficiency and computational resource savings.
Table 2 presents the experimental results for each sub-dataset of the Tanks & Temples dataset. The findings indicate that the compressed Gaussian model experiences a decline in visual metrics, with the most significant reductions observed in PSNR and LPIPS: PSNR decreased by at most 2.5 points and LPIPS by at most 0.12, while SSIM shows only a slight decrease. Despite this degradation in visual quality, the model size is reduced by nearly 30 times on this dataset. In practical applications, the ease of deploying the model through compression often takes precedence over achieving high-precision reconstruction quality, particularly in scenarios with limited storage space and processing power. The balance between compression and quality retention is critical for the scalability and practicality of 3D scene reconstruction technologies in real-world settings.
Table 3 presents the experimental performance of the Mip-NeRF360 dataset. Compared to the Tanks & Temples dataset, each sub-dataset of Mip-NeRF360 contains high-resolution images from a greater number of viewpoints, resulting in a larger overall dataset size. The inclusion of multiple viewpoints in this dataset facilitates improved model training. Despite the larger size of the dataset, our method achieves a higher compression ratio in terms of model size, and the decline in reconstruction accuracy is less pronounced than that observed with the Tanks & Temples dataset. This partially demonstrates the significant impact that the number of viewpoints has on the reconstruction accuracy of our model.
Visual perception serves as a critical measure of reconstruction quality.
Figure 4 presents a comparison of the original images, those rendered by the original 3DGS model, and the renderings generated by our model. Our compressed model does not exhibit significant distortion or deformation in the overall structure, and the renderings retain satisfactory visual quality. However, in areas with dense details and subtle color variations, the renderings from our model appear somewhat blurred, as demonstrated by the depiction of grass in
Figure 4. The results of our analysis suggest that this phenomenon can be attributed to the compression strategy employed by our model, which adheres to a pruning principle and exhibits low tolerance for small Gaussian points during training. As a result, Gaussian points corresponding to areas with rapid color changes may be excessively pruned. This over-pruning of Gaussian points leads to a substantial degradation in reconstruction quality, which is reflected in the blurring and distortion observed in the rendered images. The garden dataset shown in
Figure 4 further highlights that our compression model requires refinement, particularly in rendering regions with significant gloss changes. Nevertheless, in terms of overall image rendering, our model still achieves commendable visual expression effects.
Figure 5 compares our visualization results with those of the LightGaussian method.
To further observe the compression effect of the number of Gaussian points and validate the effectiveness of our Gaussian point reduction module, we set the scale to a minimal value to present the image in a point cloud-like effect. As shown in
Figure 6, the original 3DGS method generates relatively dense Gaussian points, while our method significantly reduces their number, making the overall representation sparser. Comparing the rendered results of the two models, some degree of degradation is inevitably observed, particularly in the imaging of trees, but it does not significantly affect the overall viewing experience. Therefore, our compression method proves to be an effective solution.
4.3. Ablation Study
4.3.1. Experiments of Main Process Module
To verify the effectiveness of each module, we conducted ablation experiments on four subsets of the Tanks & Temples dataset, with the average value of the results presented in
Table 4. Red denotes the Gaussian point number reduction module, which includes the opacity update strategy and proportional contribution pruning with learnable masks; KD represents the knowledge distillation module for reducing spherical harmonic coefficients; and VQ is the vector quantization module for compressing spherical harmonic coefficients. From the ablation experiments, we found that the Red module, which reduces the number of Gaussian points, resulted in the largest decrease in metrics, especially PSNR and LPIPS. Although this module caused the most significant drop in metrics, it also provided the most substantial compression, reducing the model size by nearly seven times compared to the baseline. The VQ module also achieved notable compression while causing minimal decline in visual metrics, making it a highly effective step. By applying these modules cumulatively, our compression method achieves an extremely high compression ratio for the 3DGS model.
Figure 7 visualizes the ablation experiment on the main process modules, using the Horse sub-dataset of Tanks & Temples. We mark two local areas on the complete original image with red boxes and zoom in on them in renderings from different stages of the training process. Using the size of the 3DGS model as the benchmark, our method compresses it to 65% at the 15,000th iteration, 25% at the 30,000th iteration, 15% after knowledge distillation, and 5% after vector quantization. From the first group of images, we can see that after compression, rendering of densely varying areas such as leaves shows obvious degradation at the edges. Zooming in further, the second group of images shows clear degradation in the high-frequency texture of the leaves. In addition, we find that in the first 30,000 iterations the main changes occur in the morphology of the Gaussian points, whereas in the knowledge distillation and vector quantization stages the main changes occur in their color attributes.
4.3.2. Experiments of Reduction Module
In the major module for reducing the number of Gaussian points, there are three submodules. To verify their effectiveness individually, we conducted a more detailed ablation study, with results shown in
Table 5. Here, OpaU represents the opacity update strategy, ConP stands for Gaussian proportional contribution pruning, and GMask denotes the learnable Gaussian mask. Interestingly, we observed that when these three submodules are combined, the model not only achieves effective compression but also significantly enhances the performance of scene representation. This improvement might indicate implicit positive coupling effects occurring between the submodules.
4.3.3. Experiments of Opacity Update Module
We visualized the trend in the number of Gaussian points before and after incorporating the opacity update strategy into the baseline 3D Gaussian Splatting model, as shown in
Figure 8. This figure presents the results from the flower subset of the nerf_LLFF_data dataset, demonstrating the positive impact of the opacity update strategy on pruning. In this experiment, we used contribution ratio pruning twice. With the addition of the opacity update strategy, the change in the number of Gaussian points becomes smoother.
Our analysis is that, while the opacity update is more consistent with the principle of Gaussian point cloning, it may result in a slightly lower overall opacity than in the original 3DGS model. The fixed decrement introduced to prevent fluctuations in the Gaussian point count also accelerates the change in opacity parameters across the entire model. As a result, redundant Gaussian points can be quickly captured and masked, achieving effective compression of the number of Gaussian points.
To further verify the opacity update strategy, we removed it (OpaU) from the final compressed model and compared the results when only the ConP and GMask modules were active. The results on the flower sub-dataset are shown in
Table 6. It can be seen from the table that the opacity update strategy has little impact on visual indicators but can effectively reduce the size of the model.
We conducted a series of comprehensive ablation experiments to demonstrate the impact of each module on visual metrics and its respective contribution to the compression effect.
4.4. More Technical Details
The attribute parameters of Gaussian points are updated during training; for example, the scale and rotation of the Gaussian points are adjusted. When an area is over-reconstructed or under-reconstructed, the Gaussian points are cloned or split, and as iterations progress, parameters such as scale and rotation evolve toward fitting the scene. However, this process is relatively slow and can generate more Gaussian points, resulting in excessive local Gaussian point density. The opacity parameter, which directly affects the rendering of the Gaussian points, plays a crucial role in controlling their impact on the scene: even if a Gaussian point is large, a low opacity minimizes its influence on the rendered scene. Pruning based on opacity is therefore an effective and efficient control mechanism, which is why we chose to manipulate opacity in our implementation.
By examining the 3DGS model section in
Figure 11, it is evident that, as training iterations progress, the visual metrics consistently improve and eventually stabilize, while the model size first grows rapidly and then stabilizes. With our compression method, the visual metrics also show an overall positive trend, although the curve is less smooth than that of the 3DGS method. Because a contribution-based pruning occurs at the 20,000th iteration, the PSNR metric temporarily halts its upward trend and the other two metrics experience slight declines; with further iterations, these metrics gradually optimize and stabilize. In addition, at the 20,000th iteration, the model size of our method is already much smaller than that of the 3DGS model. Finally, under the effect of contribution ratio pruning, the model is compressed to 1/4 of the original size.
Table 7 provides a comparison of effect sizes for different Gaussian compression methods across four metrics on the Mip-NeRF 360 dataset: PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), LPIPS (learned perceptual image patch similarity), and model size. The effect size quantifies the magnitude of the difference between two groups. Cohen's d indicates the standardized difference between two means, quantifying the magnitude of an effect independently of sample size, which makes it particularly useful for comparing results across different methods. In the context of comparing Gaussian compression methods, a large Cohen's d for size indicates a significant difference in the file sizes produced by the compared methods. Our method has a large positive effect size for file size compared to other methods (e.g., 3DGS [7] vs. Ours: Size = 2.27), indicating that it produces significantly smaller files, which is advantageous for storage and transmission efficiency.
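For reference, the standard form of Cohen's d with a pooled standard deviation is:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},$$
where $\bar{x}_1$ and $\bar{x}_2$ are the group means, $s_1$ and $s_2$ the group standard deviations, and $n_1$ and $n_2$ the group sizes.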
Figure 12 presents a comparative analysis of different Gaussian compression methods using ANOVA (analysis of variance). The box plots illustrate the performance metrics of four methods: 3DGS [7], LightGaussian [21], C3dgs [20], and ours. In the PSNR comparison, our method shows a slightly lower median than the other three methods, indicating a somewhat higher noise level in the reconstructed images; however, its range of PSNR values is relatively narrow, suggesting more consistent performance across datasets. Lastly, in terms of size, since the compression rates of LightGaussian [21], C3dgs [20], and our method are all high, plotting the 3DGS results (average: 756 MB) in the figure would distort the visual scale for the other three methods; therefore, we plot only those three. Our method has the smallest median size, indicating that it produces the most compact representations among the four methods, which is advantageous in scenarios where storage or transmission efficiency is critical.
Overall, while our method may not outperform the others in every metric, it offers a balance between performance and efficiency, making it a viable option depending on the specific requirements of the application.
4.5. Scalability Analysis for Real-World Applications
As a highly effective technique for scene reconstruction, 3DGS has great potential for various applications. If successfully implemented, it could have a significant impact in many fields, particularly in virtual reality (VR), augmented reality (AR), and digital scene modeling. However, many practical application scenarios, especially those involving resource-constrained platforms such as mobile devices (e.g., VR headsets), impose strict limitations on model size and computational resources. In these conditions, the large size and computational demands of the original 3DGS models often make them unsuitable for direct use on such devices.
To address this challenge, our proposed compression method effectively reduces the size of 3DGS models while preserving as much of their original quality and detail as possible. This not only mitigates the problem of excessive model size but also facilitates the broad application of 3DGS technology on mobile devices and other resource-constrained platforms, allowing more users across various fields to benefit from the technology and promoting its adoption in real-world scenarios.
To assess the scalability of practical applications, we employ training time, frames per second (FPS), compressed model size, and memory usage as key performance indicators for statistical analysis. The results for several sub-datasets from the Tanks & Temples and Mip-NeRF 360 datasets are presented in
Table 8. The uncompressed model sizes range from 180 MB to 1430 MB; however, the training time for our model remains consistently within 40 min, with significant improvements observed in FPS. Notably, there appears to be a correlation between model training time and the initial size of the model. For instance, models with relatively large base sizes, such as the Bicycle and Garden datasets, exhibit a notable increase in training time. Nevertheless, this relationship is not absolute, as demonstrated by the stump dataset, where, despite the large base model size, the training time remains at a moderate level.
Figure 13 visualizes the variation in memory usage during the training process. To investigate the performance trends associated with increasing dataset size or complexity, we selected the Lighthouse, Family, and Bicycle datasets, with approximate sizes of 200 MB, 500 MB, and 1500 MB, respectively, and conducted testing on an NVIDIA RTX 3090. Our experiments revealed that the memory usage during the vector quantization phase remains fixed across all three datasets, consistently at 16,054 MB (equivalent to 15.6 GB), and thus is not shown in the figure.
Figure 13 presents only the memory usage during the Gaussian quantity compression module and the knowledge distillation module. The data indicates that there is no absolute linear relationship between model size and memory usage. From the trends observed in the line chart, memory usage fluctuates significantly between 15,000 and 30,000 iterations, peaking during the knowledge distillation phase. Based on our experimental findings, we conclude that our method requires at least 16 GB of GPU memory to run efficiently, while 25 GB of GPU memory is sufficient for the vast majority of cases.
5. Conclusions
We propose a method that compresses 3D Gaussian models by a factor of 20× to 40×. To reduce the number of Gaussian points, we combine proportional contribution pruning with learnable masks to remove redundant Gaussian points, and we demonstrate the theoretical and compression advantages of the integrated opacity update strategy. To address the problem of spherical harmonic coefficients occupying too much storage space, we use a teacher–student framework and vector quantization to further compress the model's color representation parameters. Experiments show that, although the quality of the reconstructed scene is slightly reduced, our compression technique significantly reduces the model size and greatly reduces storage requirements. This work helps promote the development and application of 3D scene reconstruction technology.
In our comparative experiments using the Tanks & Temples dataset, the compression ratios achieved by LightGaussian, Compressed3D, and Compact3DGS are 17×, 12×, and 11×, respectively. While these methods all achieve a certain degree of efficient compression, our approach surpasses them with a compression ratio of 35×. Although the reconstruction quality is slightly compromised, the compression ratio is significantly higher than that of the other advanced methods. Through extensive experimentation, our model achieves a maximum compression ratio of 60×, with a minimum compression ratio consistently maintained at 20×, and the overall compression effect falls within the range of 20× to 40×.
Building on the contributions of this study, one promising direction for future research is the adaptation of the proposed high-fold Gaussian compression method to dynamic scenes. For instance, integrating motion-aware strategies, such as 4D Gaussian Splatting, could enable real-time rendering and reconstruction of dynamic scenes with high fidelity. However, addressing the challenges of occlusion and depth variation in dynamic scenes may require innovations in the pruning and compression processes. Adapting opacity optimization and contribution-based pruning methods to accommodate temporal dynamics could further enhance the applicability of the method in real-world scenarios.