Three-Dimensional-Consistent Scene Inpainting via Uncertainty-Aware Neural Radiance Field
Abstract
1. Introduction
- (1) A mask branch is introduced on top of the radiance field. By exploiting the strong generalizability of the field, 3D-consistent segmentation can be learned even from 2D mask annotations that contain errors. Rendering background views through this branch preserves more real background information during the 2D inpainting step (a minimal sketch of the branch follows this list);
- (2) An uncertainty branch based on the normal distribution is newly applied to the NeRF inpainting task. Each spatial point is modeled as a Gaussian whose variance is output by this branch; by minimizing the negative log-likelihood loss, the visibility of spatial points is learned in an unsupervised manner. The branch identifies the regions of the background views that still require inpainting, thereby refining the mask used for 2D manipulation (see the loss sketch after this list);
- (3) A new dynamic weight training strategy is proposed that exploits the uncertainty branch to further improve optimization. During inpainted NeRF training, the branch measures the 3D consistency of the 2D inpainted views: from its output variance, the confidence of each sampled ray's color is computed and used as a dynamic weight on both the color loss and the adversarial loss. This balances structure and texture in the inpainted regions of the 3D scene (a training-step sketch accompanies Algorithm 1 in Section 3.3).
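As a concrete illustration of contribution (1), the following minimal PyTorch-style sketch composites per-point mask logits along each ray using the same volume-rendering weights as the color branch, and supervises the result with the (possibly erroneous) 2D masks. Function names and tensor shapes are assumptions for illustration, not the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def render_mask(weights: torch.Tensor, point_logits: torch.Tensor) -> torch.Tensor:
    """Composite per-point mask logits into a per-ray object probability.

    weights:      (num_rays, num_samples) volume-rendering weights, the same
                  ones used to composite color along each ray.
    point_logits: (num_rays, num_samples) raw output of the mask branch.
    """
    point_prob = torch.sigmoid(point_logits)
    return (weights * point_prob).sum(dim=-1)  # (num_rays,)

def mask_loss(ray_prob: torch.Tensor, noisy_mask: torch.Tensor) -> torch.Tensor:
    """BCE against per-view 2D masks that may contain errors; the multi-view
    consistency of the shared field regularizes those errors away."""
    return F.binary_cross_entropy(ray_prob.clamp(1e-5, 1.0 - 1e-5), noisy_mask)
```

Because the same weights composite both color and mask, the mask prediction inherits the 3D geometry learned by the color branch, which is what makes the segmentation multi-view consistent.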
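For contribution (2), the Gaussian observation model leads to the standard per-ray negative log-likelihood L_u = (C − Ĉ)² / (2σ²) + ½ log σ², where Ĉ is the rendered color and σ² the composited variance from the uncertainty branch. A minimal sketch follows; the variance floor `eps` and the mean reduction are assumptions.

```python
import torch

def uncertainty_nll(pred_rgb: torch.Tensor, gt_rgb: torch.Tensor,
                    variance: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Per-ray Gaussian negative log-likelihood.

    pred_rgb, gt_rgb: (num_rays, 3) rendered and observed colors.
    variance:         (num_rays, 1) composited variance, broadcast over RGB.

    Rays whose color cannot be explained consistently across views are pushed
    toward high variance, which is how visibility is learned unsupervised.
    """
    var = variance + eps  # keep the variance strictly positive
    return ((gt_rgb - pred_rgb).pow(2) / (2.0 * var)
            + 0.5 * torch.log(var)).mean()
```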
2. Related Work
2.1. Image Inpainting
2.2. NeRF Editing
2.3. Uncertainty Estimation
3. Method
3.1. Stage 1: Initial NeRF Training
3.1.1. TensoRF
3.1.2. Mask Branch
3.1.3. Uncertainty Branch
3.2. Optimization and 2D Inpainting
3.2.1. Mask Optimization
3.2.2. 2D Inpainting
3.3. Stage 2: Inpainted NeRF Training
3.3.1. Adversarial Optimization
3.3.2. Dynamic Weight
Algorithm 1: Inpainted NeRF trained using the dynamic weight strategy
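Below is a minimal sketch of one training iteration under the dynamic weight strategy, assuming a confidence mapping exp(−β·σ²) from the uncertainty branch's variance and a hinge-style generator term in the spirit of geometric GAN [45]; all names (`model.render`, `disc`) are illustrative assumptions, not the authors' exact formulation.

```python
import torch

def inpainted_nerf_step(model, disc, optimizer, rays, inpainted_rgb, beta=1.0):
    """One training iteration of the inpainted NeRF (sketch).

    model.render(rays) is assumed to return the composited color
    (num_rays, 3) and the uncertainty-branch variance (num_rays, 1);
    disc is assumed to return a per-ray realism score (num_rays, 1).
    """
    rgb, var = model.render(rays)
    # Confidence of each ray's color, derived from the uncertainty branch.
    conf = torch.exp(-beta * var).detach()  # in (0, 1]
    # Confident (3D-consistent) rays follow the 2D inpainted color ...
    color_loss = (conf * (rgb - inpainted_rgb).pow(2)).mean()
    # ... while uncertain rays lean on the adversarial signal for texture.
    adv_loss = ((1.0 - conf) * (-disc(rgb))).mean()
    loss = color_loss + adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.detach()
```

Detaching the confidence weights keeps the uncertainty branch from being optimized through the weighting itself, so it continues to reflect 3D consistency rather than collapsing to trivially low variance.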
4. Experiments
4.1. Experimental Settings
4.1.1. Implementation Details
4.1.2. Datasets
4.1.3. Metrics
4.1.4. Baselines
4.2. Results and Discussion
4.2.1. Mask Optimization
4.2.2. 3D-Consistent Scene Inpainting
4.3. Ablation Studies
4.4. Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
NeRF | Neural Radiance Field |
3D | Three-Dimensional |
2D | Two-Dimensional |
References
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Levoy, M. Display of surfaces from volume data. IEEE Comput. Graph. Appl. 1988, 8, 29–37. [Google Scholar] [CrossRef]
- Müller, T.; Evans, A.; Schied, C.; Keller, A. Instant neural graphics primitives with a multiresolution hash encoding. ACM Trans. Graph. (ToG) 2022, 41, 1–15. [Google Scholar] [CrossRef]
- Fridovich-Keil, S.; Yu, A.; Tancik, M.; Chen, Q.; Recht, B.; Kanazawa, A. Plenoxels: Radiance fields without neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5501–5510. [Google Scholar]
- Chen, A.; Xu, Z.; Geiger, A.; Yu, J.; Su, H. Tensorf: Tensorial radiance fields. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 333–350. [Google Scholar]
- Yang, J.; Pavone, M.; Wang, Y. FreeNeRF: Improving Few-shot Neural Rendering with Free Frequency Regularization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8254–8263. [Google Scholar]
- Jain, A.; Tancik, M.; Abbeel, P. Putting nerf on a diet: Semantically consistent few-shot view synthesis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5885–5894. [Google Scholar]
- Kuang, Z.; Luan, F.; Bi, S.; Shu, Z.; Wetzstein, G.; Sunkavalli, K. Palettenerf: Palette-based appearance editing of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20691–20700. [Google Scholar]
- Bao, C.; Zhang, Y.; Yang, B.; Fan, T.; Yang, Z.; Bao, H.; Zhang, G.; Cui, Z. Sine: Semantic-driven image-based nerf editing with prior-guided editing field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20919–20929. [Google Scholar]
- Fridovich-Keil, S.; Meanti, G.; Warburg, F.R.; Recht, B.; Kanazawa, A. K-planes: Explicit radiance fields in space, time, and appearance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 12479–12488. [Google Scholar]
- Liu, Y.L.; Gao, C.; Meuleman, A.; Tseng, H.Y.; Saraf, A.; Kim, C.; Chuang, Y.Y.; Kopf, J.; Huang, J.B. Robust dynamic radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 13–23. [Google Scholar]
- Haque, A.; Tancik, M.; Efros, A.A.; Holynski, A.; Kanazawa, A. Instruct-nerf2nerf: Editing 3d scenes with instructions. arXiv 2023, arXiv:2303.12789. [Google Scholar]
- Zhang, K.; Kolkin, N.; Bi, S.; Luan, F.; Xu, Z.; Shechtman, E.; Snavely, N. Arf: Artistic radiance fields. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 717–733. [Google Scholar]
- Gong, B.; Wang, Y.; Han, X.; Dou, Q. RecolorNeRF: Layer Decomposed Radiance Field for Efficient Color Editing of 3D Scenes. arXiv 2023, arXiv:2301.07958. [Google Scholar]
- Yuan, Y.J.; Sun, Y.T.; Lai, Y.K.; Ma, Y.; Jia, R.; Gao, L. Nerf-editing: Geometry editing of neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 18353–18364. [Google Scholar]
- Goel, R.; Sirikonda, D.; Saini, S.; Narayanan, P. Interactive segmentation of radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 4201–4211. [Google Scholar]
- Liu, H.K.; Shen, I.; Chen, B.Y. NeRF-In: Free-form NeRF inpainting with RGB-D priors. arXiv 2022, arXiv:2206.04901. [Google Scholar]
- Mirzaei, A.; Aumentado-Armstrong, T.; Derpanis, K.G.; Kelly, J.; Brubaker, M.A.; Gilitschenski, I.; Levinshtein, A. SPIn-NeRF: Multiview segmentation and perceptual inpainting with neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 20669–20679. [Google Scholar]
- Weder, S.; Garcia-Hernando, G.; Monszpart, A.; Pollefeys, M.; Brostow, G.J.; Firman, M.; Vicente, S. Removing objects from neural radiance fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 16528–16538. [Google Scholar]
- Ballester, C.; Bertalmio, M.; Caselles, V.; Sapiro, G.; Verdera, J. Filling-in by joint interpolation of vector fields and gray levels. IEEE Trans. Image Process. 2001, 10, 1200–1211. [Google Scholar] [CrossRef]
- Li, K.; Wei, Y.; Yang, Z.; Wei, W. Image inpainting algorithm based on TV model and evolutionary algorithm. Soft Comput. 2016, 20, 885–893. [Google Scholar] [CrossRef]
- Elad, M.; Starck, J.L.; Querre, P.; Donoho, D.L. Simultaneous cartoon and texture image inpainting using morphological component analysis (MCA). Appl. Comput. Harmon. Anal. 2005, 19, 340–358. [Google Scholar] [CrossRef]
- Criminisi, A.; Pérez, P.; Toyama, K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 2004, 13, 1200–1212. [Google Scholar] [CrossRef] [PubMed]
- Wang, W.; Niu, L.; Zhang, J.; Yang, X.; Zhang, L. Dual-path image inpainting with auxiliary gan inversion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11421–11430. [Google Scholar]
- Liu, H.; Wan, Z.; Huang, W.; Song, Y.; Han, X.; Liao, J. Pd-gan: Probabilistic diverse gan for image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9371–9381. [Google Scholar]
- Cao, C.; Fu, Y. Learning a sketch tensor space for image inpainting of man-made scenes. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14509–14518. [Google Scholar]
- Dong, Q.; Cao, C.; Fu, Y. Incremental transformer structure enhanced image inpainting with masking positional encoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11358–11368. [Google Scholar]
- Yu, Y.; Zhan, F.; Lu, S.; Pan, J.; Ma, F.; Xie, X.; Miao, C. Wavefill: A wavelet-based generation network for image inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 14114–14123. [Google Scholar]
- Suvorov, R.; Logacheva, E.; Mashikhin, A.; Remizova, A.; Ashukha, A.; Silvestrov, A.; Kong, N.; Goka, H.; Park, K.; Lempitsky, V. Resolution-robust large mask inpainting with fourier convolutions. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022; pp. 2149–2159. [Google Scholar]
- Lugmayr, A.; Danelljan, M.; Romero, A.; Yu, F.; Timofte, R.; Van Gool, L. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 11461–11471. [Google Scholar]
- Yen-Chen, L.; Florence, P.; Barron, J.T.; Lin, T.Y.; Rodriguez, A.; Isola, P. Nerf-supervision: Learning dense object descriptors from neural radiance fields. In Proceedings of the 2022 International Conference on Robotics and Automation (ICRA), Philadelphia, PA, USA, 23–27 May 2022; pp. 6496–6503. [Google Scholar]
- Viazovetskyi, Y.; Ivashkin, V.; Kashin, E. Stylegan2 distillation for feed-forward image manipulation. In Proceedings of the Computer Vision—ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XXII 16. Springer: Berlin/Heidelberg, Germany, 2020; pp. 170–186. [Google Scholar]
- Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
- Wang, P.; Liu, L.; Liu, Y.; Theobalt, C.; Komura, T.; Wang, W. Neus: Learning neural implicit surfaces by volume rendering for multi-view reconstruction. arXiv 2021, arXiv:2106.10689. [Google Scholar]
- Oechsle, M.; Peng, S.; Geiger, A. Unisurf: Unifying neural implicit surfaces and radiance fields for multi-view reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 5589–5599. [Google Scholar]
- Zhi, S.; Laidlow, T.; Leutenegger, S.; Davison, A.J. In-place scene labelling and understanding with implicit scene representation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 15838–15847. [Google Scholar]
- Denker, J.; LeCun, Y. Transforming neural-net output levels to probability distributions. Adv. Neural Inf. Process. Syst. 1990, 3. Available online: https://proceedings.neurips.cc/paper_files/paper/1990/hash/7eacb532570ff6858afd2723755ff790-Abstract.html (accessed on 15 January 2024).
- MacKay, D.J. A practical Bayesian framework for backpropagation networks. Neural Comput. 1992, 4, 448–472. [Google Scholar] [CrossRef]
- Graves, A. Practical variational inference for neural networks. Adv. Neural Inf. Process. Syst. 2011, 24. Available online: https://proceedings.neurips.cc/paper_files/paper/2011/hash/7eb3c8be3d411e8ebfab08eba5f49632-Abstract.html (accessed on 15 January 2024).
- Kendall, A.; Gal, Y. What uncertainties do we need in bayesian deep learning for computer vision? Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/2650d6089a6d640c5e85b2b88265dc2b-Abstract.html (accessed on 15 January 2024).
- Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. Nerf in the wild: Neural radiance fields for unconstrained photo collections. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 7210–7219. [Google Scholar]
- Pan, X.; Lai, Z.; Song, S.; Huang, G. Activenerf: Learning where to see with uncertainty estimation. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 230–246. [Google Scholar]
- Roessle, B.; Barron, J.T.; Mildenhall, B.; Srinivasan, P.P.; Nießner, M. Dense depth priors for neural radiance fields from sparse input views. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 12892–12901. [Google Scholar]
- Cheng, H.K.; Tai, Y.W.; Tang, C.K. Rethinking space-time networks with improved memory coverage for efficient video object segmentation. Adv. Neural Inf. Process. Syst. 2021, 34, 11781–11794. [Google Scholar]
- Lim, J.H.; Ye, J.C. Geometric gan. arXiv 2017, arXiv:1705.02894. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Ortiz-Cayon, R.; Kalantari, N.K.; Ramamoorthi, R.; Ng, R.; Kar, A. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. ACM Trans. Graph. (TOG) 2019, 38, 1–14. [Google Scholar] [CrossRef]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30. Available online: https://proceedings.neurips.cc/paper_files/paper/2017/hash/8a1d694707eb0fefe65871369074926d-Abstract.html (accessed on 15 January 2024).
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
- Xue, W.; Zhang, L.; Mou, X.; Bovik, A.C. Gradient magnitude similarity deviation: A highly efficient perceptual image quality index. IEEE Trans. Image Process. 2013, 23, 684–695. [Google Scholar] [CrossRef] [PubMed]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25. Available online: https://proceedings.neurips.cc/paper/2012/hash/c399862d3b9d6b76c8436e924a68c45b-Abstract.html (accessed on 15 January 2024). [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Rother, C.; Kolmogorov, V.; Blake, A. “GrabCut”: Interactive foreground extraction using iterated graph cuts. ACM Trans. Graph. 2004, 23, 309–314. [Google Scholar] [CrossRef]
- Hao, Y.; Liu, Y.; Wu, Z.; Han, L.; Chen, Y.; Chen, G.; Chu, L.; Tang, S.; Yu, Z.; Chen, Z.; et al. Edgeflow: Achieving practical interactive segmentation with edge-guided flow. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 1551–1560. [Google Scholar]
- Kobayashi, S.; Matsumoto, E.; Sitzmann, V. Decomposing nerf for editing via feature field distillation. Adv. Neural Inf. Process. Syst. 2022, 35, 23311–23330. [Google Scholar]
- Liang, Z.; Zhang, Q.; Feng, Y.; Shan, Y.; Jia, K. GS-IR: 3D Gaussian Splatting for Inverse Rendering. arXiv 2023, arXiv:2311.16473. [Google Scholar]
Method | Acc↑ | IoU↑ |
---|---|---|
Grabcut [53] | 91.45 | 48.51 |
Edgeflow [54] | 97.23 | 84.96 |
FFD [55] | 97.76 | 86.46 |
STCN [44] | 98.55 | 91.30 |
Ours (with single mask) | 98.36 | 98.17 |
Ours (with STCN) | 99.21 | 93.66 |
| SPIn-NeRF Dataset | | | | NeRF Object Removal Dataset | | | |
---|---|---|---|---|---|---|---|---|
Method | LPIPS↓ | FID↓ | SSIM↑ | GMSD↓ | LPIPS↓ | FID↓ | SSIM↑ | GMSD↓ |
LaMa [29] | 0.0362 | 98.4 | 0.9452 | 0.0717 | 0.0483 | 90.3 | 0.9235 | 0.0837 |
MST-Inpaint [26] | 0.0549 | 147.7 | 0.9440 | 0.0791 | 0.0775 | 118.6 | 0.9250 | 0.1012 |
Masked NeRF [1] | 0.0612 | 210.2 | 0.9477 | 0.1084 | 0.0815 | 159.7 | 0.9341 | 0.1092 |
Inpainted NeRF [1] | 0.0554 | 141.8 | 0.9475 | 0.0947 | 0.0743 | 167.0 | 0.9268 | 0.1118 |
NeRF-In [17] | 0.0566 | 122.8 | 0.9481 | 0.0869 | 0.0727 | 103.5 | 0.9335 | 0.0933 |
SPIn-NeRF [18] | 0.0365 | 118.9 | 0.9451 | 0.0770 | 0.0543 | 114.4 | 0.9269 | 0.1012 |
Ours | 0.0351 | 99.2 | 0.9480 | 0.0701 | 0.0480 | 81.7 | 0.9336 | 0.0788 |
Method | LPIPS↓ | FID↓ | SSIM↑ | GMSD↓ |
---|---|---|---|---|
Ours | 0.0351 | 99.2 | 0.9480 | 0.0701 |
Ours (w/o Mask Optimization) | 0.0363 | 103.3 | 0.9469 | 0.0730 |
Ours (w/o ) | 0.0532 | 138.3 | 0.9478 | 0.0932 |
Ours (w/o ) | 0.0426 | 110.6 | 0.9464 | 0.0756 |
Ours (with Static Weights) | 0.0382 | 114.7 | 0.9477 | 0.0752 |