1. Introduction
In recent years, significant breakthroughs have been made in 3D scene reconstruction, particularly in the area of radiance fields, with the Neural Radiance Field (NeRF) [1] being a pivotal development. NeRF implicitly represents and learns the radiance field of a 3D scene through a multilayer perceptron (MLP), achieving high-quality reconstruction from images captured at multiple viewpoints. Compared to traditional 3D scene reconstruction methods, the high-quality images rendered by NeRF highlight the advantages of differentiable radiance fields for scene representation. However, NeRF's slow training and rendering speed remain a significant bottleneck for practical applications of radiance field models. Although various methods [2,3,4,5,6] have been proposed to accelerate training and rendering, they often come at the cost of degraded reconstruction accuracy.
With advances in 3D reconstruction technology, the 3D Gaussian Splatting (3DGS) model [7] has been proposed. Unlike discrete point cloud models, the 3DGS model is differentiable across the scene owing to the properties of its 3D Gaussian points, allowing for higher-quality rendering after scene construction. Compared to NeRF, the 3DGS model avoids unnecessary computation in empty space and leverages rasterization to quickly project and render 3D Gaussians onto the 2D image plane, significantly improving training and rendering speed. Despite using a completely different data structure, the 3DGS model retains the properties of a differentiable radiance field, allowing for high-quality scene rendering akin to NeRF. Among NeRF methods, InstantNGP [8] achieves very high training speed, while Mip-NeRF 360 [9] attains outstanding scene reconstruction accuracy. As an innovative scene representation, however, the 3DGS model not only rivals state-of-the-art NeRF methods in speed and accuracy but also supports near real-time rendering. 3DGS has broad applicability across multiple research domains, including Simultaneous Localization and Mapping (SLAM) [10,11], dynamic scene reconstruction [12,13,14], Artificial Intelligence-Generated Content (AIGC) [15], autonomous driving [16], endoscopic scene reconstruction [17], and large-scale scene modeling [18]. Due to its excellent performance and extensibility, 3D Gaussian Splatting has quickly become a prominent branch of 3D reconstruction.
However, the 3D Gaussian Splatting method is not without flaws. Its model parameter count is significantly larger than that of NeRF, which allows for high-quality reconstruction but at the expense of storage requirements. This presents a considerable limitation for resource-constrained applications such as AR/VR headsets and IoT devices. Therefore, it is crucial to investigate methods for compressing 3DGS models to reduce storage requirements while maintaining rendering quality.
Before conducting compression studies, it is essential to understand the model's storage format. 3DGS models represent scenes explicitly by training a large number of parameterized 3D Gaussian points. Each Gaussian point consists of 59 parameters, including its position, the scale and rotation quaternion that parameterize its covariance, its opacity, and its spherical harmonic coefficients. During training, the 3DGS model first performs Structure from Motion (SfM) [19] on the training images to estimate camera poses and an initial point cloud, from which a small number of 3D Gaussians are initialized. Throughout training, the original Gaussian points are continuously cloned and split to generate new Gaussian points with numerous attribute parameters, and these parameters are updated to optimize the model's fit to the scene. However, this process inevitably generates redundant Gaussians. Although the pruning mechanism built into the 3D Gaussian Splatting model limits the number of Gaussians to some extent, our experiments show that this pruning is incomplete, leaving a substantial number of redundant Gaussian points.
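For concreteness, the 59 stored parameters per point break down as follows; this is a minimal sketch assuming the standard 3DGS point layout, and the field names are illustrative rather than taken from the official implementation:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianPoint:
    """Illustrative per-point parameter layout: 3 + 3 + 4 + 1 + 48 = 59 floats."""
    position: np.ndarray   # 3: x, y, z
    scale: np.ndarray      # 3: ellipsoid semi-axes (together with rotation, defines the covariance)
    rotation: np.ndarray   # 4: unit quaternion
    opacity: float         # 1
    sh_coeffs: np.ndarray  # 48: 16 spherical harmonic coefficients per RGB channel (degree 3)
```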
Some existing works, such as Compact3DGS [20] and LightGaussian [21], have contributed to the compression of 3D Gaussian models through pruning and compression techniques. When handling under-reconstructed regions, these methods often rely on the cloning approach of the original 3D Gaussian Splatting to improve the efficiency of the Gaussian distribution. However, this full-parameter cloning does not specifically address redundancy in high-density areas with large local variations, nor does it preserve rendering quality there. When dealing with localized scene variations caused by scale or rotation properties, the traditional cloning method merely copies the existing parameters without adapting to these changes. According to the rendering principle of the splatting method, because the Gaussian points are copied verbatim, the cloned points overlap along a ray during the splatting stage, making the opacity weight of the earlier points too large and thus introducing color errors. In contrast, our opacity update technique dynamically adjusts the opacity of newly generated Gaussian points, preserving rendering quality by keeping the pixel-wise renderings consistent.
Based on the above characterization of the 3DGS model's storage structure, it is evident that, aside from a small amount of data describing the global scene, the storage size is primarily determined by the product of the number of Gaussian points and the number of parameters within each point. Therefore, to effectively reduce the model size, two approaches can be pursued: (a) prune redundant Gaussians to reduce the number of Gaussian points, and (b) compress the parameter representation of each Gaussian to shrink the per-point footprint. Inspired by existing work [20,21,22,23], we propose an improved 3DGS model compression method. The main contributions of this paper are as follows:
Propose a principled opacity update strategy to assist in pruning redundant Gaussian points;
Propose a contribution-based proportional pruning method combined with trainable Gaussian masks;
Validate the effectiveness of a method combining teacher–student models and vector quantization for compressing spherical harmonic coefficients.
By employing these effective 3D Gaussian model compression methods, the memory requirement of the Gaussian model can be significantly reduced, which is of great significance for the application of the 3D Gaussian model in real scenarios.
3. Methods
The work in this paper is divided into two parts: Gaussian quantity compression and Gaussian spherical harmonic coefficient compression. An overview is shown in Figure 1. First, a Structure from Motion (SfM) operation is performed on the training dataset to initialize the 3D Gaussian points. During the densification and pruning stage, unlike previous strategies that reset opacity at fixed intervals, our model applies a new opacity update strategy during Gaussian point cloning and splitting. This approach is more consistent with the rendering principles and assists the pruning of Gaussian points. In the pruning stage, redundant Gaussians are masked using a learnable Gaussian mask obtained from training. The contribution value of each Gaussian point is then calculated, and less influential redundant Gaussian points are pruned proportionally. Finally, during the spherical harmonic coefficient compression stage, knowledge distillation is employed to allow the student model to learn color features at a smaller spherical harmonic degree. In addition, novel-view renderings generated by the high-precision teacher model guide the training of the student model, enabling it to learn richer knowledge. The novel views are obtained by adding random uniform perturbations to the original views in the training dataset. Vector quantization is then performed on the compressed spherical harmonic coefficients to further compress the color features, yielding the final compressed model.
3.1. Gaussian Quantity Compression
3.1.1. Opacity Update Module
When the 3D Gaussian Splatting model performs cloning and splitting during the densification phase, the new Gaussian points inherit the opacity of the original Gaussian points. According to the alpha blending principle of rendering, this direct numerical replication introduces errors in opacity. In particular, when fitting a local area with complex colors, reconstruction often requires a large number of Gaussian points to participate in cloning and splitting, resulting in an excessive density of Gaussian points in that local space. This high-density region aggravates the opacity error over the training iterations. Inspired by the explanation of the Gaussian point cloning principle in [39], we introduce an updated opacity formula from a new perspective.
As shown in the left part of Figure 2, suppose a ray of light passes sequentially through two Gaussian points, $G_1$ and $G_2$, during rendering. The opacity of Gaussian point $G_1$ is $\alpha_1$ and its color is $c_1$, while the opacity of Gaussian point $G_2$ is $\alpha_2$ and its color is $c_2$. According to the alpha blending logic, the opacity parameter is multiplied by the color parameter to weight the color. A Gaussian point further along the ray is weighted not only by its own opacity but also by one minus the accumulated opacity of the preceding points. Therefore, before cloning, taking the left schematic in Figure 2 as an example, the color of this ray rendered to a pixel is calculated as follows:
$$C = \alpha_1 c_1 + (1 - \alpha_1)\,\alpha_2 c_2$$
Suppose a new Gaussian point $G_1'$ is cloned from Gaussian point $G_1$ along the direction of the ray, so that the ray now passes through three Gaussian points. Following the original cloning method, as illustrated in the upper-right schematic of Figure 2, all parameters of the new Gaussian are identical to those of the old one except for its position; that is, $\alpha_1' = \alpha_1$ and $c_1' = c_1$. Although Gaussian point $G_2$ itself remains unchanged, its color weight along the ray decreases because of the additional preceding Gaussian point. The color blending formula in this case is:
$$C' = \alpha_1 c_1 + (1 - \alpha_1)\,\alpha_1 c_1 + (1 - \alpha_1)^2\,\alpha_2 c_2$$
Since the opacity satisfies $0 < \alpha_1 < 1$ in general, under this primitive cloning method the coefficient of the color $c_2$ of the later Gaussian point satisfies:
$$(1 - \alpha_1)^2\,\alpha_2 < (1 - \alpha_1)\,\alpha_2$$
It can be seen that, although Gaussian point $G_2$ is not cloned, it is still affected by the other Gaussian points during rendering: after Gaussian point $G_1$ is cloned, $G_2$ carries less weight in the color calculation. From a theoretical perspective, the primary purpose of cloning new Gaussian points is to correct deviations in scale and position. However, simply copying the opacity leads to cumulative rendering errors along the ray. To address this issue, this paper introduces an opacity update strategy based on the alpha blending principle during the cloning process. This ensures that the combined opacity weight of the original Gaussian point and its clone matches the opacity weight of the original Gaussian point before cloning, thereby eliminating the adverse effect on the rendering of subsequent Gaussian points.
Taking the lower-right diagram of Figure 2 as an example, assume that the updated opacities of the original point and its clone are $\hat{\alpha}_1$ and $\hat{\alpha}_1'$, where $\hat{\alpha}_1' = \hat{\alpha}_1$ and the colors remain $c_1$. The rendering formula becomes:
$$\hat{C} = \hat{\alpha}_1 c_1 + (1 - \hat{\alpha}_1)\,\hat{\alpha}_1 c_1 + (1 - \hat{\alpha}_1)^2\,\alpha_2 c_2$$
To keep the color weight of Gaussian point $G_2$ unchanged, we require
$$(1 - \hat{\alpha}_1)^2 = 1 - \alpha_1 .$$
Since $\hat{\alpha}_1$ is non-negative, after simplification the updated opacity can be derived as:
$$\hat{\alpha}_1 = 1 - \sqrt{1 - \alpha_1} .$$
Because Gaussian points usually change their covariance parameters when split during reconstruction, the original method is still used for splitting.
Since model compression inevitably affects rendering quality, we apply this opacity update strategy in all processes involving Gaussian point cloning to reduce the resulting quality loss.
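As a minimal sketch of this update (assuming opacities are already activated into [0, 1); the function name is ours, not taken from the official 3DGS code), both the original point and its clone receive the derived value:

```python
import torch

def updated_opacity(alpha: torch.Tensor) -> torch.Tensor:
    """Opacity assigned to both the original Gaussian and its clone so that their
    combined alpha-blending weight equals the pre-clone weight of the original:
    (1 - a_hat)^2 = 1 - a  =>  a_hat = 1 - sqrt(1 - a)."""
    return 1.0 - torch.sqrt(torch.clamp(1.0 - alpha, min=0.0))
```

During densification, the cloned pair is assigned this updated opacity, while splitting keeps the original rule described above.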
3.1.2. Proportional Contribution Pruning with Learnable Masks
Redundant Gaussians fall mainly into two categories: (a) noisy Gaussian points that negatively impact scene reconstruction, and (b) Gaussian points with minimal opacity or scale that have little effect on the reconstruction. Although the pruning strategy of the original 3DGS model controls the number of Gaussian points to a certain extent, experiments show that this approach is incomplete, and redundant Gaussian points still account for a large proportion of the model. If only the Gaussian points that are favorable for reconstruction were retained, the model's storage footprint could be reduced drastically. Therefore, a learnable mask that masks the redundant Gaussians would greatly aid pruning. Masks are commonly binary, yet directly training a binary mask yields a gradient of zero almost everywhere, making training via gradient descent impossible. However, a large body of work has studied gradients for binary networks [40,41,42,43]. Inspired by these approaches, we design proportional contribution pruning with learnable masks to reduce the number of Gaussian points. The mask primarily serves to filter out redundant Gaussians so that they can be removed at the end of the process. Quantifying the contribution of each Gaussian point helps to retain the effective ones; to further reduce their number, we sort the contribution values and proportionally prune the low-contribution Gaussian points, reducing the point count while minimizing the impact on reconstruction quality. The flowchart combining the learnable mask and the opacity update strategy is shown in the right part of Figure 2. During training iterations, the opacity and mask parameters are trained together, producing a coupled effect on Gaussian pruning. When quantifying the contribution of Gaussian points, the updated opacity participates in the calculation, so this step also benefits from the opacity update strategy.
Initially, the Gaussian mask is stored in each Gaussian point alongside all other Gaussian parameters and is initialized as a tensor of zeros. During training, as densification and pruning occur, the Gaussian mask is copied when new Gaussian points are generated and removed when old points are pruned. Since binary masks cannot provide gradients during training, we employ a straight-through estimator, as suggested in previous work [20], to obtain gradients. The core idea of the straight-through estimator is that the model's output is computed normally during forward propagation, while the gradient is computed in a customized manner during backpropagation to bypass certain non-differentiable operations.
The learnable Gaussian mask is defined as follows:
$$M_n = \operatorname{sg}\!\big(\mathbb{1}\,[\,\sigma(m_n) > \epsilon\,] - \sigma(m_n)\big) + \sigma(m_n),$$
where the subscript $n$ indexes the Gaussian points, $M_n$ denotes the binary mask of the $n$-th Gaussian point, and $\operatorname{sg}(\cdot)$ is the stop-gradient function. The mask parameter stored in each Gaussian point, $m_n$, is passed through a sigmoid function $\sigma(\cdot)$ and thus constrained to the range between 0 and 1. $\epsilon$ is the Gaussian mask threshold, which was empirically set to 0.01 for Mip-NeRF 360 and Tanks & Temples. The indicator function $\mathbb{1}[\cdot]$ is such that when $\sigma(m_n)$ exceeds $\epsilon$, $M_n$ is set to 1; otherwise, $M_n$ is set to 0. Due to the presence of the stop-gradient function, the training process only computes the gradient of $\sigma(m_n)$. The mask $M_n$ is applied to the scale $s_n$ and opacity $\alpha_n$, effectively preventing masked Gaussian points from participating in the rendering computation while minimizing the amount of computation. Finally, the Gaussian model is pruned according to the mask to yield a preliminary de-redundant Gaussian model.
During the pruning of 3D Gaussian points, we found that, due to the phased pruning strategy of the original 3DGS model, the number of 3D Gaussian points fluctuates in the early stages of pruning, which interferes with reducing the count. The work in [39] proposed a method to eliminate such fluctuations, in which the opacity is reduced by a small fixed value after each cloning or splitting operation. To best trade off density against quality, this fixed value was empirically set to 0.01. We incorporate this method into our Gaussian point reduction module. It effectively eliminates fluctuations in the number of Gaussian points while enhancing the compression of the Gaussian model, with negligible loss in reconstruction quality.
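A minimal sketch of this decrement follows; it assumes opacities are stored in activated form (the reference 3DGS implementation stores logits, so real code would convert first):

```python
import torch

def post_densify_opacity_decay(opacity: torch.Tensor, decrement: float = 0.01) -> torch.Tensor:
    """Apply a small fixed opacity decrement after each clone/split step to damp
    fluctuations in the Gaussian point count."""
    with torch.no_grad():
        opacity.sub_(decrement).clamp_(min=0.0)
    return opacity
```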
Once training has proceeded far enough that the number of Gaussian points retained by the mask stabilizes, the contribution value of each retained Gaussian point is calculated according to the following formula:
$$C_j = \sum_{i=1}^{M \times H \times W} \mathbb{1}\big(G_j, r_i\big)\cdot \hat{\alpha}_j \cdot V_j, \qquad V_j = \tfrac{4}{3}\pi\, s_x s_y s_z,$$
where $C_j$ represents the contribution of Gaussian point $G_j$. Given that a Gaussian point is represented as an ellipsoid in three-dimensional space, its scale parameter comprises the three semi-axes $s_x$, $s_y$, and $s_z$, so the volume of a 3D Gaussian point can be calculated as $V_j = \frac{4}{3}\pi s_x s_y s_z$. The product $M \times H \times W$ is the total number of pixels in the training image set and $r_i$ refers to a pixel. The indicator function $\mathbb{1}(G_j, r_i)$ determines whether Gaussian point $G_j$ contributes to the rendered pixel $r_i$, counting 1 if they intersect during rendering. This process iterates over the entire training dataset to count how many times each Gaussian point is rendered onto any pixel. The count is then multiplied by the mask-processed opacity $\hat{\alpha}_j$ to obtain the contribution value of the Gaussian point. After quantifying and ranking the contribution of each Gaussian point, we proportionally prune the low-contribution points, completing the reduction in the number of Gaussian points in the Gaussian model.
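A sketch of the contribution score and the proportional pruning step is given below; the hit counts are assumed to be accumulated by the rasterizer during training, and the prune ratio is an illustrative hyperparameter:

```python
import torch

def contribution_scores(hit_counts: torch.Tensor, masked_opacity: torch.Tensor,
                        scale: torch.Tensor) -> torch.Tensor:
    """C_j = sum_i 1(G_j, r_i) * alpha_j * V_j with V_j = 4/3 * pi * sx * sy * sz."""
    volume = (4.0 / 3.0) * torch.pi * scale.prod(dim=-1)
    return hit_counts * masked_opacity.squeeze(-1) * volume

def proportional_prune_mask(scores: torch.Tensor, prune_ratio: float = 0.2) -> torch.Tensor:
    """Keep-mask that drops the lowest-contribution fraction of Gaussians."""
    k = int(scores.numel() * prune_ratio)
    if k == 0:
        return torch.ones_like(scores, dtype=torch.bool)
    threshold = torch.kthvalue(scores, k).values
    return scores > threshold
```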
The loss function of the contribution pruning module combined with the learnable mask is as follows:
$$L = L_{\mathrm{3DGS}} + \lambda_m \sum_{n=1}^{N} \sigma(m_n),$$
where $L_{\mathrm{3DGS}}$ is the original rendering loss of the 3D Gaussian Splatting model and $\lambda_m$ weights the mask loss term. Incorporating this mask loss term into the loss function promotes the minimization of the sum of $\sigma(m_n)$, thereby guiding the model to mask as many redundant Gaussian points as possible and helping to reduce the total number of Gaussian points.
3.2. Gaussian Point Spherical Harmonic Coefficient Compression
The previous sections addressed compressing the number of Gaussians, but there remains significant potential for further reducing the model's parameter size. The 3D Gaussian Splatting model consists primarily of the attribute parameters associated with each Gaussian point. In the traditional uncompressed model, the spherical harmonic degree is usually 3, and each Gaussian point contains a total of 59 parameters, of which 48 are spherical harmonic coefficients; these take up most of the storage space. The expressive capability of spherical harmonic functions increases with the degree, but so does the number of parameters that must be stored. Experiments [21] have demonstrated that a spherical harmonic degree of 2, when combined with knowledge distillation aided by novel views, can effectively represent scenes while reducing the number of spherical harmonic coefficients. Furthermore, many points within a scene exhibit similar color characteristics, so storing colors independently for each Gaussian point incurs substantial storage overhead. To mitigate this, we apply vector quantization to the spherical harmonic coefficients representing color, allowing multiple Gaussian points with similar colors to share a single color feature vector.
Firstly, the model compressed through the Gaussian Point Quantity Compression module is used as the teacher model to guide the student model, in which the spherical harmonic degree is set to 2. Given that the teacher model is capable of high-quality scene reconstruction, we introduce a random uniform perturbation to the poses of the old views to generate new view renderings. These are added to the training dataset for the student model. Compared to the limited dataset of old views, the enriched dataset with new perspective renderings enables the student model to learn more beneficial information.
With $\delta$ as the perturbation threshold, the formula for generating new views is:
$$P_{\mathrm{new}} = P_{\mathrm{old}} + \Delta, \qquad \Delta \sim \mathcal{U}(-\delta, \delta),$$
where $P_{\mathrm{old}}$ is an original training camera pose and $\Delta$ is a random uniform perturbation. The loss function shown in Equation (11) is calculated on a pixel-by-pixel basis, taking the sum of the Euclidean distances between the renderings of the teacher model and the student model as the loss:
$$L_{\mathrm{distill}} = \sum_{r} \big\lVert C_{\mathrm{teacher}}(r) - C_{\mathrm{student}}(r) \big\rVert_2 .$$
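A sketch of the pseudo-view generation and the pixel-wise distillation loss is shown below; the translation-only jitter and the threshold value are illustrative assumptions rather than the exact scheme used in our implementation:

```python
import torch

def perturb_pose(c2w: torch.Tensor, delta: float = 0.05) -> torch.Tensor:
    """Add a uniform random offset in [-delta, delta] to the camera translation
    of a 4x4 camera-to-world matrix to synthesize a novel training view."""
    jitter = (torch.rand(3, device=c2w.device) * 2.0 - 1.0) * delta
    perturbed = c2w.clone()
    perturbed[:3, 3] += jitter
    return perturbed

def distillation_loss(teacher_rgb: torch.Tensor, student_rgb: torch.Tensor) -> torch.Tensor:
    """Sum over pixels of the Euclidean distance between teacher and student renderings."""
    return torch.linalg.vector_norm(teacher_rgb - student_rgb, dim=-1).sum()
```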
After obtaining the student model through distillation, vector quantization is applied to the spherical harmonic coefficients. The process is shown in Figure 3. It begins with initialization using the K-means algorithm [44], clustering the spherical harmonic coefficient vectors into K clusters, where K is the size of the codebook. Within each cluster, the most representative SH coefficient vector is selected to serve as a codebook key, and each Gaussian point stores only the index of its assigned key. Since the number of Gaussian points far exceeds K, this method achieves a very high compression ratio.
This approach reduces the model’s overall storage requirements for color parameters. Although knowledge distillation and vector quantization lead to a slight decline in reconstruction quality, they result in a more compact color representation within the 3DGS model, significantly aiding model compression by reducing the number of parameters.
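A sketch of the codebook construction is given below; scikit-learn's KMeans stands in for the clustering step, and the codebook size is illustrative:

```python
import torch
from sklearn.cluster import KMeans

def quantize_sh(sh_coeffs: torch.Tensor, codebook_size: int = 4096):
    """Cluster per-Gaussian SH coefficient vectors into K centroids and store,
    for each Gaussian, only the index of its assigned centroid."""
    flat = sh_coeffs.reshape(sh_coeffs.shape[0], -1).detach().cpu().numpy()
    km = KMeans(n_clusters=codebook_size, n_init=10, random_state=0).fit(flat)
    codebook = torch.tensor(km.cluster_centers_, dtype=sh_coeffs.dtype)
    indices = torch.tensor(km.labels_, dtype=torch.long)
    return codebook, indices  # reconstruct: codebook[indices].reshape(sh_coeffs.shape)
```

Storing one integer index per Gaussian plus a shared codebook replaces 48 floats per point, which is where the bulk of the color-parameter savings comes from.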
4. Experiments
Our experimental setup utilized an NVIDIA RTX 3090 GPU. The experiments were evaluated using peak signal-to-noise ratio (PSNR), structural similarity index (SSIM), learned perceptual image patch similarity (LPIPS), and model size as metrics. These experiments were conducted on the Mip-NeRF 360 dataset [9], the Tanks & Temples dataset [45], and the nerf_LLFF_data dataset [46] to demonstrate the effectiveness of our model compression method. In comparative experiments, the original 3D Gaussian Splatting model served as the baseline to assess the performance and compression of our model. Additionally, we performed a detailed ablation study on each module to analyze the interdependencies among different modules and quantify their effectiveness. Besides quantitatively analyzing the compressed model through visual evaluation metrics, we rendered the model to visually demonstrate the reconstruction quality after compression.
4.1. Description of the Dataset
The diversity and complexity of realistic scenes have always been challenging difficulties for 3D reconstruction. To effectively validate our research from multiple views, we selected datasets that emphasize different aspects of scene representation to comprehensively evaluate our work. A brief overview of the experimental datasets is provided below:
Mip-NeRF360 Dataset: This dataset is used to assess the performance of scene models in complex environments, particularly focusing on handling high dynamic range and intricate lighting conditions.
Tanks & Temples Dataset: A commonly used benchmark for 3D reconstruction, this dataset primarily evaluates the accuracy and detail preservation capabilities of reconstruction algorithms.
nerf_LLFF_data Dataset: This dataset contains high-resolution images of real-world scenes, serving to evaluate the model’s performance in realistic scenarios.
4.2. Performance Evaluation
Table 1 reports the performance of the original 3DGS model and several NeRF models, all of which were executed on an NVIDIA A6000 GPU. Since our experiments were conducted on an NVIDIA RTX 3090 GPU, we re-evaluated the 3DGS model on that GPU (denoted 3DGS *) to ensure a fair comparison. In existing work [7,20,21,47], the commonly used Tanks & Temples evaluations cover only partial sub-datasets, such as Train and Truck. Therefore, we report only these commonly used sub-datasets in Table 1, while in Table 2 we test more sub-datasets of Tanks & Temples to verify the performance of our model. Additionally, we compare our model with more advanced 3DGS compression algorithms. From Table 1, it can be seen that while the rendering quality of our model is slightly lower than that of the advanced methods, it achieves the highest compression ratio. Model compression inherently leads to some degradation in reconstruction quality. However, considering the substantial compression achieved, these quality losses are acceptable within the scope of our research: the minor reduction in quality is a reasonable trade-off for the significant improvements in storage efficiency and computational resource savings.
Table 2 presents the experimental results for each sub-dataset of the Tanks & Temples dataset. The findings indicate that the compressed Gaussian model experiences a decline in visual metrics, with the most significant reductions observed in PSNR and LPIPS: PSNR decreased by at most 2.5 points and LPIPS by at most 0.12, while SSIM shows only a slight decrease. Despite this degradation in visual quality, the model size is reduced by nearly 30 times on this dataset. In practical applications, the ease of deploying the model through compression often takes precedence over achieving high-precision reconstruction quality, particularly in scenarios with limited storage space and processing power. The balance between compression and quality retention is critical for the scalability and practicality of 3D scene reconstruction technologies in real-world settings.
Table 3 presents the experimental performance of the Mip-NeRF360 dataset. Compared to the Tanks & Temples dataset, each sub-dataset of Mip-NeRF360 contains high-resolution images from a greater number of viewpoints, resulting in a larger overall dataset size. The inclusion of multiple viewpoints in this dataset facilitates improved model training. Despite the larger size of the dataset, our method achieves a higher compression ratio in terms of model size, and the decline in reconstruction accuracy is less pronounced than that observed with the Tanks & Temples dataset. This partially demonstrates the significant impact that the number of viewpoints has on the reconstruction accuracy of our model.
Visual perception serves as a critical measure of reconstruction quality.
Figure 4 presents a comparison of the original images, those rendered by the original 3DGS model, and the renderings generated by our model. Our compressed model does not exhibit significant distortion or deformation in the overall structure, and the renderings retain satisfactory visual quality. However, in areas with dense details and subtle color variations, the renderings from our model appear somewhat blurred, as demonstrated by the depiction of grass in
Figure 4. The results of our analysis suggest that this phenomenon can be attributed to the compression strategy employed by our model, which adheres to a pruning principle and exhibits low tolerance for small Gaussian points during training. As a result, Gaussian points corresponding to areas with rapid color changes may be excessively pruned. This over-pruning of Gaussian points leads to a substantial degradation in reconstruction quality, which is reflected in the blurring and distortion observed in the rendered images. The garden dataset shown in
Figure 4 further highlights that our compression model requires refinement, particularly in rendering regions with significant gloss changes. Nevertheless, in terms of overall image rendering, our model still achieves commendable visual expression effects.
Figure 5 compares our visualization results with those of the LightGaussian method.
To further observe the compression effect of the number of Gaussian points and validate the effectiveness of our Gaussian point reduction module, we set the scale to a minimal value to present the image in a point cloud-like effect. As shown in
Figure 6, the original 3DGS method generates relatively dense Gaussian points, while our method significantly reduces their number, making the overall representation sparser. Comparing the rendered results of the two models, some degree of degradation is inevitably observed, particularly in the imaging of trees, but it does not significantly affect the overall viewing experience. Therefore, our compression method proves to be an effective solution.
4.3. Ablation Study
4.3.1. Experiments of Main Process Module
To verify the effectiveness of each module, we conducted ablation experiments on four subsets of the Tanks & Temples dataset, with the average value of the results presented in
Table 4. Red denotes the Gaussian point number reduction module, which includes the opacity update strategy and proportional contribution pruning with learnable masks; KD represents the knowledge distillation module for reducing spherical harmonic coefficients; and VQ is the vector quantization module for compressing spherical harmonic coefficients. From the ablation experiments, we found that the Red module, which reduces the number of Gaussian points, resulted in the largest decrease in metrics, especially PSNR and LPIPS. Although this module caused the most significant drop in metrics, it also provided the most substantial compression, reducing the model size by nearly seven times compared to the baseline. The VQ module also achieved notable compression while causing minimal decline in visual metrics, making it a highly effective step. By applying these modules cumulatively, our compression method achieves an extremely high compression ratio for the 3DGS model.
Figure 7 visualizes the ablation experiment on the main process modules, using the Horse sub-dataset of Tanks & Temples. We mark two local areas on the complete original image with red boxes and zoom in on them in renderings from different stages of the training process. Using the size of the 3DGS model as the benchmark, our method compresses it to 65% at the 15,000th iteration, 25% at the 30,000th iteration, 15% after knowledge distillation, and 5% after vector quantization. From the first group of images, we can see that after compression, rendering of densely varying areas such as leaves shows obvious degradation at the edges. Zooming in further, the second group of images shows clear degradation in the high-frequency texture of the leaves. In addition, we find that in the first 30,000 iterations the main changes occur in the morphology of the Gaussian points, whereas in the knowledge distillation and vector quantization stages the main changes occur in their color attributes.
4.3.2. Experiments of Reduction Module
In the major module for reducing the number of Gaussian points, there are three submodules. To verify their effectiveness individually, we conducted a more detailed ablation study, with results shown in
Table 5. Here, OpaU represents the opacity update strategy, ConP stands for Gaussian proportional contribution pruning, and GMask denotes the learnable Gaussian mask. Interestingly, we observed that when these three submodules are combined, the model not only achieves effective compression but also significantly enhances the performance of scene representation. This improvement might indicate implicit positive coupling effects occurring between the submodules.
4.3.3. Experiments of Opacity Update Module
We visualized the trend in the number of Gaussian points before and after incorporating the opacity update strategy into the baseline 3D Gaussian Splatting model, as shown in
Figure 8. This figure presents the results from the flower subset of the nerf_LLFF_data dataset, demonstrating the positive impact of the opacity update strategy on pruning. In this experiment, we used contribution ratio pruning twice. With the addition of the opacity update strategy, the change in the number of Gaussian points becomes smoother.
Our analysis is that, while the opacity update is more consistent with the principle of Gaussian point cloning, it may result in a slightly lower overall opacity than in the original 3DGS model. The fixed decrement introduced to prevent fluctuations in the Gaussian point count also accelerates the change in opacity parameters across the entire model. As a result, redundant Gaussian points can be quickly captured and masked, achieving effective compression of the number of Gaussian points.
To further verify the opacity update strategy, we removed it (OpaU) from the final compressed model and compared the results when only the ConP and GMask modules were active. The results on the flower sub-dataset are shown in
Table 6. It can be seen from the table that the opacity update strategy has little impact on visual indicators but can effectively reduce the size of the model.
We conducted a series of comprehensive ablation experiments to demonstrate the impact of each module on visual metrics and its respective contribution to the compression effect.
4.4. More Technical Details
The attribute parameters of Gaussian points are updated during training; for example, the scale and rotation of the Gaussian points are adjusted. When an area is over-reconstructed or under-reconstructed, the Gaussian points are cloned or split, and as iterations progress, parameters such as scale and rotation evolve toward fitting the scene. However, this process is relatively slow and can generate more Gaussian points, resulting in excessive local Gaussian point density. The opacity parameter, which directly affects the rendering of the Gaussian points, plays a crucial role in controlling their impact on the scene: even if a Gaussian point is large, a low opacity minimizes its influence on the rendered scene. Pruning based on opacity is therefore an effective and efficient control mechanism, which is why we chose to manipulate opacity in our implementation.
By examining the 3DGS model section in
Figure 11, it is evident that, as training iterations progress, the visual metrics consistently improve and eventually stabilize, while the model size first grows rapidly and then stabilizes. With our compression method, the visual metrics also show an overall positive trend, although the curve is less smooth than that of the 3DGS method. Because a contribution-based pruning occurs at the 20,000th iteration, the PSNR metric temporarily halts its upward trend and the other two metrics experience slight declines; with further iterations, these metrics gradually optimize and stabilize. In addition, at the 20,000th iteration, the model size of our method is already much smaller than that of the 3DGS model. Finally, under the effect of contribution ratio pruning, the model is compressed to 1/4 of the original size.
Table 7 provides a comparison of effect sizes for different Gaussian compression methods across four metrics on the Mip-NeRF 360 dataset: PSNR (peak signal-to-noise ratio), SSIM (structural similarity index), LPIPS (learned perceptual image patch similarity), and model size. The effect size quantifies the magnitude of the difference between two groups. Cohen's d indicates the standardized difference between two means, quantifying the magnitude of an effect independently of sample size, which makes it particularly useful for comparing results across different methods. In the context of comparing Gaussian compression methods, a large Cohen's d for size indicates a significant difference in the file sizes produced by the compared methods. Our method has a large positive effect size for file size compared to other methods (e.g., 3DGS [7] vs. Ours: Size = 2.27), indicating that it produces significantly smaller files, which is advantageous for storage and transmission efficiency.
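For reference, the standard form of Cohen's d with a pooled standard deviation is:
$$d = \frac{\bar{x}_1 - \bar{x}_2}{s_p}, \qquad s_p = \sqrt{\frac{(n_1 - 1)\,s_1^2 + (n_2 - 1)\,s_2^2}{n_1 + n_2 - 2}},$$
where $\bar{x}_1$ and $\bar{x}_2$ are the group means, $s_1$ and $s_2$ the group standard deviations, and $n_1$ and $n_2$ the group sizes.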
Figure 12 presents a comparative analysis of different Gaussian compression methods using ANOVA (analysis of variance). The box plots illustrate the performance metrics of four methods: 3DGS [7], LightGaussian [21], C3dgs [20], and ours. In the PSNR comparison, our method shows a slightly lower median than the other three methods, indicating a somewhat higher noise level in the reconstructed images; however, its range of PSNR values is relatively narrow, suggesting more consistent performance across datasets. Lastly, in terms of size, since the compression rates of LightGaussian [21], C3dgs [20], and our method are all high, plotting the 3DGS results (average: 756 MB) in the figure would distort the visual scale for the other three methods; therefore, we plot only those three. Our method has the smallest median size, indicating that it produces the most compact representations among the four methods, which is advantageous in scenarios where storage or transmission efficiency is critical.
Overall, while our method may not outperform the others in every metric, it offers a balance between performance and efficiency, making it a viable option depending on the specific requirements of the application.
4.5. Scalability Analysis for Real-World Applications
As a highly effective technique for scene reconstruction, 3DGS has great potential for various applications. If successfully implemented, it could have a significant impact in many fields, particularly in virtual reality (VR), augmented reality (AR), and digital scene modeling. However, many practical application scenarios, especially those involving resource-constrained platforms such as mobile devices (e.g., VR headsets), impose strict limitations on model size and computational resources. In these conditions, the large size and computational demands of the original 3DGS models often make them unsuitable for direct use on such devices.
To address this challenge, our proposed compression method effectively reduces the size of 3DGS models while preserving as much of their original quality and detail as possible. This not only mitigates the problem of excessive model size but also facilitates the broad application of 3DGS technology on mobile devices and other resource-constrained platforms, allowing more users across various fields to benefit from the technology and promoting its adoption in real-world scenarios.
To assess the scalability of practical applications, we employ training time, frames per second (FPS), compressed model size, and memory usage as key performance indicators for statistical analysis. The results for several sub-datasets from the Tanks & Temples and Mip-NeRF 360 datasets are presented in
Table 8. The uncompressed model sizes range from 180 MB to 1430 MB; however, the training time for our model remains consistently within 40 min, with significant improvements observed in FPS. Notably, there appears to be a correlation between model training time and the initial size of the model. For instance, models with relatively large base sizes, such as the Bicycle and Garden datasets, exhibit a notable increase in training time. Nevertheless, this relationship is not absolute, as demonstrated by the stump dataset, where, despite the large base model size, the training time remains at a moderate level.
Figure 13 visualizes the variation in memory usage during the training process. To investigate the performance trends associated with increasing dataset size or complexity, we selected the Lighthouse, Family, and Bicycle datasets, with approximate sizes of 200 MB, 500 MB, and 1500 MB, respectively, and conducted testing on an NVIDIA RTX 3090. Our experiments revealed that the memory usage during the vector quantization phase remains fixed across all three datasets, consistently at 16,054 MB (equivalent to 15.6 GB), and thus is not shown in the figure.
Figure 13 presents only the memory usage during the Gaussian quantity compression module and the knowledge distillation module. The data indicates that there is no absolute linear relationship between model size and memory usage. From the trends observed in the line chart, memory usage fluctuates significantly between 15,000 and 30,000 iterations, peaking during the knowledge distillation phase. Based on our experimental findings, we conclude that our method requires at least 16 GB of GPU memory to run efficiently, while 25 GB of GPU memory is sufficient for the vast majority of cases.
5. Conclusions
We propose a method that compresses 3D Gaussian models by a factor of 20× to 40×. To reduce the number of Gaussian points, we combine proportional contribution pruning with learnable masks to remove redundant Gaussian points, and we demonstrate the theoretical and compression advantages of the integrated opacity update strategy. To address the problem of spherical harmonic coefficients occupying too much storage space, we use a teacher–student framework and vector quantization to further compress the model's color representation parameters. Experiments show that, although the quality of the reconstructed scene is slightly reduced, our compression technique significantly reduces the model size and greatly reduces storage requirements. This work helps promote the development and application of 3D scene reconstruction technology.
In our comparative experiments using the Tanks & Temples dataset, the compression ratios achieved by LightGaussian, Compressed3D, and Compact3DGS are 17×, 12×, and 11×, respectively. While these methods all achieve a certain degree of efficient compression, our approach surpasses them with a compression ratio of 35×. Although the reconstruction quality is slightly compromised, the compression ratio is significantly higher than that of the other advanced methods. Through extensive experimentation, our model achieves a maximum compression ratio of 60×, with a minimum compression ratio consistently maintained at 20×, and the overall compression effect falls within the range of 20× to 40×.
Building on the contributions of this study, one promising direction for future research is the adaptation of the proposed high-fold Gaussian compression method to dynamic scenes. For instance, integrating motion-aware strategies, such as 4D Gaussian Splatting, could enable real-time rendering and reconstruction of dynamic scenes with high fidelity. However, addressing the challenges of occlusion and depth variation in dynamic scenes may require innovations in the pruning and compression processes. Adapting opacity optimization and contribution-based pruning methods to accommodate temporal dynamics could further enhance the applicability of the method in real-world scenarios.