Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction
Abstract
:1. Introduction
2. Related Work
2.1. Multi-View Surface Reconstruction
2.2. Surface Rendering and Volume Rendering
2.3. Neural Implicit Surface Reconstruction
2.4. Improvements in and Drawbacks of Neural Implicit Surface Reconstruction
3. Background
3.1. NeRF and NeuS Preliminaries
3.2. View Dependent on Sparse Feature Bias
3.3. Color Weight Bias
3.4. Geometric Bias
4. Method
4.1. Appearance Embedding
4.2. Volume Rendering Interpolation and Color Weight Regularization
4.3. Geometric Constraints
4.3.1. Photometric Consistency Constraints
4.3.2. Point Constraints
4.4. Point Cloud Coarse Sampling
4.5. Loss Function
5. Experiments
5.1. Exp Setting
5.1.1. Dataset
5.1.2. Evaluation Metrics
5.1.3. Baselines
5.1.4. Implementation Details
5.2. Experimental Results
5.3. Ablation Study
6. Conclusions
7. Patents
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Hou, M.; Yang, S.; Hu, Y.; Wu, Y.; Jiang, L.; Zhao, S.; Wei, P. Novel Method for Virtual Restoration of Cultural Relics with Complex Geometric Structure Based on Multiscale Spatial Geometry. ISPRS Int. J. Geo-Inf. 2018, 7, 353. [Google Scholar] [CrossRef]
- Inzerillo, L.; Di Mino, G.; Roberts, R. Image-based 3D reconstruction using traditional and UAV datasets for analysis of road pavement distress. Autom. Constr. 2018, 96, 457–469. [Google Scholar] [CrossRef]
- Yu, D.; Ji, S.; Liu, J.; Wei, S. Automatic 3D building reconstruction from multi-view aerial images with deep learning. ISPRS J. Photogramm. Remote Sens. 2021, 171, 155–170. [Google Scholar] [CrossRef]
- Schönberger, J.L.; Frahm, J.M. Structure-from-Motion Revisited. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 26 June–1 July 2016; pp. 4104–4113. [Google Scholar] [CrossRef]
- Furukawa, Y.; Ponce, J. Accurate, Dense, and Robust Multiview Stereopsis. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 32, 1362–1376. [Google Scholar] [CrossRef] [PubMed]
- Kutulakos, K.N.; Seitz, S.M. A theory of shape by space carving. In Proceedings of the Seventh IEEE International Conference on Computer Vision, Kerkyra, Greece, 20–27 September 1999; Volume 1, pp. 307–314. [Google Scholar] [CrossRef]
- Seitz, S.M.; Dyer, C.R. Photorealistic scene reconstruction by voxel coloring. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Juan, PR, USA, 17–19 June 1997; pp. 1067–1073. [Google Scholar] [CrossRef]
- Tola, E.; Strecha, C.; Fua, P. Efficient large-scale multi-view stereo for ultra high-resolution image sets. Mach. Vis. Appl. 2012, 23, 903–920. [Google Scholar] [CrossRef]
- Aanæs, H.; Jensen, R.R.; Vogiatzis, G.; Tola, E.; Dahl, A.B. Large-scale data for multiple-view stereopsis. Int. J. Comput. Vis. 2016, 120, 153–168. [Google Scholar] [CrossRef]
- Yao, Y.; Luo, Z.; Li, S.; Zhang, J.; Ren, Y.; Zhou, L.; Fang, T.; Quan, L. BlendedMVS: A large-scale dataset for generalized multi-view stereo networks. In Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1787–1796. [Google Scholar] [CrossRef]
- Park, J.J.; Florence, P.; Straub, J.; Newcombe, R.; Lovegrove, S. DeepSDF: Learning Continuous Signed Distance Functions for Shape Representation. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 165–174. [Google Scholar] [CrossRef]
- Yariv, L.; Kasten, Y.; Moran, D.; Galun, M.; Atzmon, M.; Ronen, B.; Lipman, Y. Multiview neural surface reconstruction by disentangling geometry and appearance. Adv. Neural Inf. Process. Syst. 2020, 33, 2492–2502. [Google Scholar] [CrossRef]
- Zhang, K.; Luan, F.; Wang, Q.; Bala, K.; Snavely, N. PhySG: Inverse Rendering with Spherical Gaussians for Physics-based Material Editing and Relighting. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 5449–5458. [Google Scholar] [CrossRef]
- Kellnhofer, P.; Jebe, L.C.; Jones, A.; Spicer, R.; Pulli, K.; Wetzstein, G. Neural Lumigraph Rendering. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 4285–4295. [Google Scholar] [CrossRef]
- Liu, S.; Zhang, Y.; Peng, S.; Shi, B.; Pollefeys, M.; Cui, Z. DIST: Rendering Deep Implicit Signed Distance Function with Differentiable Sphere Tracing. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2016–2025. [Google Scholar] [CrossRef]
- Tancik, M.; Casser, V.; Yan, X.; Pradhan, S.; Mildenhall, B.P.; Srinivasan, P.; Barron, J.T.; Kretzschmar, H. Block-NeRF: Scalable Large Scene Neural View Synthesis. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 8238–8248. [Google Scholar] [CrossRef]
- Sun, C.; Sun, M.; Chen, H.T. Direct Voxel Grid Optimization: Super-fast Convergence for Radiance Fields Reconstruction. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 5449–5459. [Google Scholar] [CrossRef]
- Dong, Y.; Che, H.; Leung, M.-F.; Liu, C.; Yan, Z. Centric graph regularized log-norm sparse non-negative matrix factorization for multi-view clustering. Signal Process. 2024, 217, 109341. [Google Scholar] [CrossRef]
- Liu, C.; Li, R.; Wu, S.; Che, H.; Jiang, D.; Yu, Z.; Wong, H.S. Self-Guided Partial Graph Propagation for Incomplete Multiview Clustering. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–14. [Google Scholar] [CrossRef] [PubMed]
- Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision; Cambridge University Press: Cambridge, UK, 2003. [Google Scholar] [CrossRef]
- Zhang, J.; Yao, Y.; Quan, L. Learning Signed Distance Field for Multi-view Surface Reconstruction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Nashville, TN, USA, 20–25 June 2021; pp. 6505–6514. [Google Scholar] [CrossRef]
- Zhang, J.; Yao, Y.; Li, S.; Fang, T.; McKinnon, D.; Tsin, Y.; Quan, L. Critical Regularizations for Neural Surface Reconstruction in the Wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 6260–6269. [Google Scholar] [CrossRef]
- Niemeyer, M.; Mescheder, L.; Oechsle, M.; Geiger, A. Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 3501–3512. [Google Scholar] [CrossRef]
- Wang, P.; Liu, L.; Liu, Y.; Theobalt, C.; Komura, T.; Wang, W. NeuS: Learning Neural Implicit Surfaces by Volume Rendering for Multi-view Reconstruction. arXiv 2021, arXiv:2106.10689. [Google Scholar] [CrossRef]
- Sun, J.; Chen, X.; Wang, Q.; Li, Z.; Averbuch-Elor, H.; Zhou, X.; Snavely, N. Neural 3d reconstruction in the wild. In Proceedings of the ACM SIGGRAPH 2022 Conference Proceedings, Vancouver, BC, Canada, 8–11 August 2022; pp. 1–9. [Google Scholar] [CrossRef]
- Long, X.; Lin, C.; Wang, P.; Komura, T.; Wang, W. Sparseneus: Fast generalizable neural surface reconstruction from sparse views. Eur. Conf. Comput. Vis. 2022, 210–227. [Google Scholar] [CrossRef]
- Fu, Q.; Xu, Q.; Ong, Y.S.; Tao, W. Geo-neus: Geometry-consistent neural implicit surfaces learning for multi-view reconstruction. Adv. Neural Inf. Process. Syst. 2022, 35, 3403–3416. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. Commun. ACM 2021, 65, 99–106. [Google Scholar] [CrossRef]
- Li, Z.; Müller, T.; Evans, A.; Taylor, R.H.; Unberath, M.; Liu, M.-Y.; Lin, C.-H. Neuralangelo: High-Fidelity Neural Surface Reconstruction. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8456–8465. [Google Scholar]
- Mescheder, L.; Oechsle, M.; Niemeyer, M.; Nowozin, S.; Geiger, A. Occupancy Networks: Learning 3D Reconstruction in Function Space. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4455–4465. [Google Scholar] [CrossRef]
- Michalkiewicz, M.; Pontes, J.K.; Jack, D.; Baktashmotlagh, M.; Eriksson, A. Implicit Surface Representations as Layers in Neural Networks. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Long Beach, CA, USA, 15–20 June 2019; pp. 4742–4751. [Google Scholar] [CrossRef]
- Chen, Z.; Zhang, H. Learning Implicit Fields for Generative Shape Modeling. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5932–5941. [Google Scholar] [CrossRef]
- Atzmon, M.; Lipman, Y. SAL: Sign Agnostic Learning of Shapes from Raw Data. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 2562–2571. [Google Scholar] [CrossRef]
- Gropp, A.; Yariv, L.; Haim, N.; Atzmon, M.; Lipman, Y. Implicit geometric regularization for learning shapes. arXiv 2020, arXiv:2002.10099. [Google Scholar] [CrossRef]
- Yifan, W.; Wu, S.; Öztireli, C.; Sorkine-Hornung, O. Iso-Points: Optimizing Neural Implicit Surfaces with Hybrid Representations. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 374–383. [Google Scholar] [CrossRef]
- Peng, S.; Niemeyer, M.; Mescheder, L.; Pollefeys, M.; Geiger, A. Convolutional occupancy networks. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part III 16; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 523–540. [Google Scholar] [CrossRef]
- Oechsle, M.; Peng, S.; Geiger, A. UNISURF: Unifying Neural Implicit Surfaces and Radiance Fields for Multi-View Reconstruction. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Nashville, TN, USA, 20–25 June 2021; pp. 5569–5579. [Google Scholar] [CrossRef]
- Yariv, L.; Gu, J.; Kasten, Y.; Lipman, Y. Volume rendering of neural implicit surfaces. Adv. Neural Inf. Process. Syst. 2021, 34, 4805–4815. [Google Scholar] [CrossRef]
- Wu, Y.; Zou, Z.; Shi, Z. Remote Sensing Novel View Synthesis With Implicit Multiplane Representations. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–13. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Martin-Brualla, R.; Radwan, N.; Sajjadi, M.S.M.; Barron, J.T.; Dosovitskiy, A.; Duckworth, D. NeRF in the Wild: Neural Radiance Fields for Unconstrained Photo Collections. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 7206–7215. [Google Scholar] [CrossRef]
- Shamsipour, G.; Fekri-Ershad, S.; Sharifi, M.; Alaei, A. Improve the efficiency of handcrafted features in image retrieval by adding selected feature generating layers of deep convolutional neural networks. Signal Image Video Process. 2024, 1–14. [Google Scholar] [CrossRef]
- Huang, G.; Liu, Z.; Laurens, V.D.M.; Weinberger, K.Q. Densely Connected Convolutional Networks. IEEE Comput. Soc. 2016, 4700–4708. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.-C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, CA, USA, 15–20 June 2019; pp. 1314–1324. [Google Scholar]
- Galliani, S.; Lasinger, K.; Schindler, K. Massively Parallel Multiview Stereopsis by Surface Normal Diffusion. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 873–881. [Google Scholar] [CrossRef]
Multi-View 3D Reconstruction | Merit | Limitation | ||
---|---|---|---|---|
Explicit representation | Point clouds | Handles arbitrary topologies | Discrete, limited resolution | |
Depth map | ||||
Mesh | ||||
Implicit representation | Signed Distance function | Continuity; high spatial resolution | Three-dimensional structures are obtained by rendering | |
Occupancy field | ||||
Neural networks encode implicit functions | Surface rendering | Continuity; high spatial resolution | With mask supervision | |
Volume rendering | Continuity, high spatial resolution; without mask supervision | Large-scale scene geometry blurring, color, and weight bias |
Layer Name | Output Size | ResNet-50 (50-Layer) |
---|---|---|
Conv1 | 112 × 112 | 7 × 7, 64, stride 2 |
Conv2_x | 56 × 56 | 3 × 3 max pool, stride 2 |
Conv3_x | 28 × 28 | |
Conv4_x | 14 × 14 | |
Conv5_x | 7 × 7 | |
1 × 1 | Average pool, 256-d fc, SoftMax |
Method | Distance | ||
---|---|---|---|
Acc↓ | Comp↓ | Overall↓ | |
NeuS | 0.215 | 0.00128 | 0.108 |
Appearance Embedding | 0.176 | 0.00108 | 0.089 |
Method | PSNR↑ | SSIM↑ | LPIPS↓ |
---|---|---|---|
NeuS | 28.24 | 0.909 | 0.077 |
Appearance Embedding | 33.99 | 0.982 | 0.039 |
Method | NeuS | COLMAP | Ours |
---|---|---|---|
Scene | |||
scene 3 | 0.125 | 0.119 | 0.117 |
scene 4 | 0.00128 | 0.0425 | 0.000810 |
scene 5 | 0.297 | 1.75 | 0.0877 |
scene 6 | 1.18 | 0.165 | 0.581 |
scene 7 | 3.475 | 0.385 | 0.469 |
scene 8 | 0.285 | 0.242 | 0.237 |
scene 9 | 2.11 | 1.89 | 1.84 |
scene 10 | 0.425 | 0.87 | 0.421 |
scene 11 | 1.32 | 0.692 | 0.510 |
scene 12 | 0.990 | 0.921 | 0.102 |
scene 13 | 0.121 | 0.234 | 0.061 |
scene 14 | 0.540 | 0.370 | 0.319 |
scene 15 | 1.306 | 0.454 | 0.311 |
scene 16 | 0.213 | 0.232 | 0.120 |
mean | 0.884 | 0.597 | 0.369 |
Method | Appearance | Weight Constraints | Geometric Constraints | Comp↓ |
---|---|---|---|---|
Baseline | 0.2134 | |||
Model-A | ✓ | ✓ | 0.1725 | |
Model-B | ✓ | ✓ | 0.1289 | |
Model-C | ✓ | ✓ | 0.1275 | |
Ours | ✓ | ✓ | ✓ | 0.1203 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Wang, W.; Gao, F.; Shen, Y. Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction. Sensors 2024, 24, 881. https://doi.org/10.3390/s24030881
Wang W, Gao F, Shen Y. Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction. Sensors. 2024; 24(3):881. https://doi.org/10.3390/s24030881
Chicago/Turabian StyleWang, Wei, Fengjiao Gao, and Yongliang Shen. 2024. "Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction" Sensors 24, no. 3: 881. https://doi.org/10.3390/s24030881
APA StyleWang, W., Gao, F., & Shen, Y. (2024). Res-NeuS: Deep Residuals and Neural Implicit Surface Learning for Multi-View Reconstruction. Sensors, 24(3), 881. https://doi.org/10.3390/s24030881