4.1. Evaluation
We conducted a rigorous validation of our method through both qualitative visualizations and quantitative analyses, focusing on the claimed contributions of the proposed differentiable renderer and reconstruction framework. As gradient accuracy and computational performance are critical metrics for evaluating a differentiable renderer, we first validated these two aspects. We then conducted ablation studies to assess the reconstruction capabilities of our differentiable renderer and the efficacy of the transport constraints. Furthermore, given the significant impact of the estimated velocity on the reconstructed fluid motions, we compared our velocity estimates against ground-truth data. Finally, we compared our approach against previous methods through both qualitative visualizations and quantitative efficiency metrics.
Validation of gradient computation. To validate the effectiveness of our proposed differentiable renderer, we optimized a density field and analyzed the evolution of the calculated derivatives. The initial density configuration was in the shape of a rabbit, and the density field was iteratively updated to match a target smoke image. Figure 3 shows the images rendered throughout the optimization iterations.
Initially, the density inside the rabbit shape needed to decrease, while the density in the smoke region had to increase to match the target. Since the smoke density was thinner than the rabbit density, the derivatives for the smoke density diminished faster, disappearing after around 30 iterations. In contrast, significant derivatives for the rabbit density persisted from iterations 0 to 50. By iteration 99, the optimization had converged to the target image. Overall, Figure 3 demonstrates that the density field derivatives followed the expected variations throughout the optimization process.
Performance validation of our differentiable renderer. Since our differentiable renderer differs significantly in implementation from previous methods based on differentiable volumetric path tracing, different factors impact its performance. Previous physics-based differentiable renderers relied heavily on Monte Carlo estimation of the radiance and its derivatives, so their performance was governed primarily by the number of samples. Unlike these methods, our renderer's performance is dominated by the size of the volumetric data. To characterize this relationship, validations were performed on the rabbit scene in Figure 3 at four increasing volumetric resolutions (listed in Table 1).
Table 1 presents the average runtime and GPU memory usage across 100 optimization iterations at the different volumetric resolutions. As anticipated, the runtime of our renderer increased with resolution: the recorded runtimes at the four resolutions, from lowest to highest, were 9.4 ms, 17.6 ms, 82.4 ms, and 545.1 ms, respectively. Despite this increase, our method sustained real-time rates at all but the highest resolution, which is crucial for interactive applications and dynamic scenes.
The memory usage of our differentiable renderer was another essential aspect of its performance. As with the runtime, we observed a direct correlation between the resolution of the volumetric data and the GPU memory usage: at the four resolutions, from lowest to highest, the memory usage was 612 MB, 648 MB, 851 MB, and 2813 MB, respectively. This growth was consistent with the greater amount of data that must be processed and stored at higher resolutions. Nevertheless, our method remained memory-efficient even at the highest resolution, a testament to its practical viability for a wide range of applications.
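As a quick sanity check on the reported figures, the equivalent frame rates and step-to-step growth factors follow directly from the numbers in Table 1:

```python
# Reported per-iteration runtimes (ms) and GPU memory usage (MB) from
# Table 1, ordered from the lowest to the highest volumetric resolution.
runtimes_ms = [9.4, 17.6, 82.4, 545.1]
memory_mb = [612, 648, 851, 2813]

# Equivalent frame rates and step-to-step growth factors.
fps = [1000.0 / t for t in runtimes_ms]
runtime_growth = [b / a for a, b in zip(runtimes_ms, runtimes_ms[1:])]
memory_growth = [b / a for a, b in zip(memory_mb, memory_mb[1:])]
```

The runtime grows faster than the memory footprint at each resolution step, consistent with the renderer's cost being dominated by the volumetric data size.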
Evaluation of density reconstruction. To evaluate the reconstruction accuracy of our proposed method, we conducted an experiment using synthetic fluid motion data. The ground-truth fluid flows were generated with the Mantaflow simulator [38], and the density fields were then rendered into a monocular video sequence using Mitsuba 3 [39]. This synthetic video served as the target for guiding the reconstruction with our approach. To strengthen the shape constraints, we re-used the monocular video to constrain the reconstruction from orthogonal viewing angles. As shown in Figure 4, we conducted an ablation experiment to study the effectiveness of the proposed coupled density and velocity estimation component. Qualitatively, the physically constrained result in Figure 4c captured the true fluid evolution more faithfully than the unconstrained reconstruction in Figure 4b. The results demonstrated that our proposed framework could accurately reconstruct fluid motions from monocular video while maintaining physically consistent temporal evolutions.
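The role of the transport constraint in this ablation can be illustrated with a minimal 2D sketch. The semi-Lagrangian advection and quadratic residual below are illustrative assumptions rather than the paper's exact formulation; they show how a density frame can be penalized for being inconsistent with the estimated velocity:

```python
import numpy as np

def advect(density, vel, dt=1.0):
    """Semi-Lagrangian advection of a 2D density field.
    vel has shape (2, H, W): per-cell (vy, vx) in grid units per step."""
    h, w = density.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Backtrace each cell along the velocity and clamp to the grid.
    src_y = np.clip(ys - dt * vel[0], 0, h - 1)
    src_x = np.clip(xs - dt * vel[1], 0, w - 1)
    y0, x0 = src_y.astype(int), src_x.astype(int)
    y1, x1 = np.minimum(y0 + 1, h - 1), np.minimum(x0 + 1, w - 1)
    fy, fx = src_y - y0, src_x - x0
    # Bilinear interpolation at the backtraced positions.
    return ((1 - fy) * (1 - fx) * density[y0, x0]
            + (1 - fy) * fx * density[y0, x1]
            + fy * (1 - fx) * density[y1, x0]
            + fy * fx * density[y1, x1])

def transport_loss(d_prev, d_next, vel):
    # Penalize deviation of the next frame from the advected previous frame.
    return np.mean((d_next - advect(d_prev, vel)) ** 2)
```

A reconstruction that matches the rendered images but violates this residual corresponds to the physically implausible evolution seen in the unconstrained ablation.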
Evaluation of the velocity estimation. Evaluating the velocity estimation is of utmost importance in reconstructing fluid motions, as it significantly impacts the reconstruction quality and temporal consistency. We compared the velocity fields estimated while reconstructing Figure 4 against the ground truth. As Figure 5 shows, the estimated velocity was very small at the base because no specialized processing was implemented for the inflow zones. Despite the inherent challenges of precisely estimating the inflow velocity from only a monocular video and transport constraints, the overall velocity field could still be coarsely estimated to a reasonable degree. These findings validated the feasibility of estimating fluid velocities from limited visual information while preserving efficiency. In applications where computational efficiency is not the primary concern, devising techniques to better model the inflow velocity profile would likely improve the estimation accuracy.
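One common way to quantify such a comparison against ground-truth velocities is the average end-point error; the metric below is a standard choice and an assumption on our part, since the section does not name its error measure:

```python
import numpy as np

def average_endpoint_error(v_est, v_gt):
    # Mean Euclidean distance between per-cell velocity vectors;
    # both fields have shape (2, H, W) for the two velocity components.
    return float(np.mean(np.sqrt(np.sum((v_est - v_gt) ** 2, axis=0))))
```

Evaluating such a metric separately inside and outside the inflow region would make the underestimation at the base directly measurable.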
Comparison to previous work. While we demonstrated our method's fluid motion reconstruction capabilities above, further comparative evaluations were essential to fully validate it against previous approaches. To this end, we reconstructed fluid motion from the same initial configuration using both our method and Franz et al.'s approach [5]. Our method follows a similar overall workflow to Franz et al.'s but differs in two critical aspects: first, we proposed an efficient differentiable renderer for participating media, which was successfully integrated into our framework; second, our method utilized an improved joint density and velocity estimation strategy. These improvements aimed to enable efficient and temporally consistent fluid motion reconstruction. As depicted in Figure 6, both our method and Franz et al.'s achieved high similarity between the visualizations and the ground truth, demonstrating that our framework has a fluid motion reconstruction capability comparable to Franz et al.'s method.
Furthermore, run-time efficiency is an important indicator for evaluating a reconstruction framework. To validate the efficiency of our proposed framework, we compared the average runtime of our method against Franz et al.'s for reconstructing fluid densities from video inputs. As shown in Table 2, we evaluated three volumetric resolutions: 64 × 96 × 64, 128 × 192 × 128, and 256 × 288 × 256 voxels.
For the 64 × 96 × 64 resolution, our complete reconstruction time per frame was 6.13 s, over 59× faster than Franz et al.’s 363 s. More significantly, our differentiable rendering time was just 0.08 s, demonstrating a speedup of over 870× compared to their 69.6 s for this key computational stage.
For the higher resolutions of 128 × 192 × 128 and 256 × 288 × 256, our method outperformed Franz et al.’s by factors of approximately 172 and 173, respectively, in terms of average reconstruction time. Furthermore, our differentiable rendering stage exhibited speedup factors of around 2655 and 1676 compared to Franz et al.’s methodology.
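The quoted speedup factors at the lowest resolution follow directly from the reported per-frame timings:

```python
# Reported per-frame timings (s) at the 64 x 96 x 64 resolution.
ours = {"total": 6.13, "render": 0.08}
franz = {"total": 363.0, "render": 69.6}

speedup_total = franz["total"] / ours["total"]      # ~59x overall
speedup_render = franz["render"] / ours["render"]   # ~870x for rendering
```

The larger speedup in the rendering stage than in the overall pipeline indicates that, after our renderer's acceleration, the remaining reconstruction stages dominate the per-frame cost.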