Neural Rendering-Based 3D Scene Style Transfer Method via Semantic Understanding Using a Single Style Image
Abstract
1. Introduction
- (1) The proposed method performs 3D scene style transfer that accounts for semantic style using only a single style image. It first learns the 3D geometry of the scene with neural radiance fields and then learns the characteristics of the single style image. This strategy preserves structural alignment within the 3D scene while respecting the semantic style.
- (2) The proposed method incorporates a semantic nearest-neighbor feature-matching technique that enables semantic style transfer. A semantic color guidance image is generated by a k-means-based module, and the characteristics of this guidance image and the style image are learned jointly, so that the style of each object can be matched individually.
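The k-means-based semantic color guidance described in contribution (2) can be sketched as follows. This is a minimal illustration, not the paper's implementation: it clusters the style image's palette with k-means (one cluster per semantic class) and paints each semantic region of the content view with one cluster color. The function names and the one-cluster-per-class assignment are assumptions for the sketch.

```python
import numpy as np

def kmeans_colors(pixels, k, iters=10, seed=0):
    """Plain k-means over RGB pixels; returns (k, 3) cluster centers."""
    rng = np.random.default_rng(seed)
    centers = pixels[rng.choice(len(pixels), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign every pixel to its nearest center, then recompute centers.
        d = ((pixels[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = pixels[labels == j].mean(0)
    return centers

def semantic_color_guidance(style_img, semantic_mask):
    """Build a guidance image: each semantic region in the mask receives
    one dominant color from the style image's k-means palette."""
    classes = np.unique(semantic_mask)
    pixels = style_img.reshape(-1, 3)
    centers = kmeans_colors(pixels, k=len(classes))
    guidance = np.zeros(style_img.shape, dtype=float)
    for cls, color in zip(classes, centers):
        guidance[semantic_mask == cls] = color
    return guidance
```

In the actual method, the guidance image and the style image are then learned jointly during the style NeRF training stage (Section 3.2); how cluster colors are assigned to classes is a detail not reproduced here.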
2. Related Work
2.1. Style Transfer of 2D Images
2.2. Style Transfer of Videos
2.3. 3D Style Transfer
3. Proposed Method
3.1. Geometry Neural Radiance Fields Training Stage
3.2. Style Neural Radiance Fields Training Stage
3.2.1. Semantic Color Guidance Image Generation
3.2.2. Semantic Nearest-Neighbor Feature Matching
4. Experiments
4.1. Experimental Setup
4.1.1. Datasets
4.1.2. Evaluation Methods
- (1) 3D consistency evaluation:
  - (a) Short-range consistency: the consistency between nearby novel views, computed at 1-frame intervals.
  - (b) Long-range consistency: the consistency between distant novel views, computed at 5-frame intervals.
- (2) Style consistency evaluation.
- (3) Semantic consistency evaluation.
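The short- and long-range consistency scores above amount to averaging a pairwise distance over frame pairs at a fixed offset. The sketch below uses per-pixel RMSE as a stand-in distance; the paper's actual metric (which may involve warping and a perceptual distance) is not reproduced in this excerpt.

```python
import numpy as np

def pairwise_consistency(frames, interval, dist=None):
    """Average distance between rendered novel views `interval` frames
    apart. interval=1 gives short-range and interval=5 long-range
    consistency; lower is better. `dist` defaults to per-pixel RMSE."""
    if dist is None:
        dist = lambda a, b: float(np.sqrt(((a - b) ** 2).mean()))
    scores = [dist(frames[i], frames[i + interval])
              for i in range(len(frames) - interval)]
    return float(np.mean(scores))
```

Any image distance can be plugged in via `dist`, e.g. a perceptual metric, without changing the frame-pairing logic.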
4.2. Experimental Results
4.2.1. 3D-View Consistency Results and Comparison
4.2.2. Style Consistency Results and Comparison
4.2.3. Semantic Consistency Results and Comparison
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Tewari, A.; Thies, J.; Mildenhall, B.; Srinivasan, P.; Tretschk, E.; Yifan, W.; Lassner, C.; Sitzmann, V.; Martin-Brualla, R.; Lombardi, S.; et al. Advances in neural rendering. Comput. Graph. Forum 2022, 41, 703–735. [Google Scholar]
- Xie, Y.; Takikawa, T.; Saito, S.; Litany, O.; Yan, S.; Khan, N.; Tombari, F.; Tompkin, J.; Sitzmann, V.; Sridhar, S. Neural Fields in Visual Computing and Beyond. Comput. Graph. Forum 2022, 41, 641–676. [Google Scholar] [CrossRef]
- Huang, X.; Liu, M.-Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 172–189. [Google Scholar]
- Park, T.; Efros, A.A.; Zhang, R.; Zhu, J.-Y. Contrastive learning for unpaired image-to-image translation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020. [Google Scholar]
- Pizzati, F.; Cerri, P.; de Charette, R. CoMoGAN: Continuous model-guided image-to-image translation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar]
- Richter, S.R.; Abu AlHaija, H.; Koltun, V. Enhancing photorealism enhancement. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 1700–1715. [Google Scholar] [CrossRef] [PubMed]
- Park, J.; Choi, T.H.; Cho, K. Horizon Targeted Loss-Based Diverse Realistic Marine Image Generation Method Using a Multimodal Style Transfer Network for Training Autonomous Vessels. Appl. Sci. 2022, 12, 1253. [Google Scholar] [CrossRef]
- Huang, X.; Belongie, S. Arbitrary style transfer in real-time with adaptive instance normalization. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.-H. Universal style transfer via feature transforms. In Advances in Neural Information Processing Systems (NeurIPS); The MIT Press: Long Beach, CA, USA, 2017. [Google Scholar]
- Li, X.; Liu, S.; Kautz, J.; Yang, M.-H. Learning linear transformations for fast arbitrary style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Sturm, P.; Triggs, B. A factorization based algorithm for multi-image projective structure and motion. In European Conference on Computer Vision (ECCV); Springer: Berlin/Heidelberg, Germany, 1996. [Google Scholar]
- Wang, W.; Yang, S.; Xu, J.; Liu, J. Consistent video style transfer via relaxation and regularization. IEEE Trans. Image Process. 2020, 29, 9125–9139. [Google Scholar] [CrossRef] [PubMed]
- Deng, Y.; Tang, F.; Dong, W.; Huang, H.; Xu, C. Arbitrary video style transfer via multi-channel correlation. AAAI 2021, 35, 1210–1217. [Google Scholar] [CrossRef]
- Nguyen-Phuoc, T.; Liu, F.; Xiao, L. SNeRF: Stylized neural implicit representations for 3D scenes. arXiv 2022, arXiv:2207.02363. [Google Scholar] [CrossRef]
- Chiang, P.Z.; Tsai, M.S.; Tseng, H.Y.; Lai, W.S.; Chiu, W.C. Stylizing 3d scene via implicit representation and hypernetwork. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–8 January 2022. [Google Scholar]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2414–2423. [Google Scholar]
- Chen, T.Q.; Schmidt, M.W. Fast patch-based style transfer of arbitrary style. arXiv 2016, arXiv:1612.04337. [Google Scholar]
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711. [Google Scholar]
- Karras, T.; Laine, S.; Aila, T. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410. [Google Scholar]
- Kolkin, N.; Kucera, M.; Paris, S.; Sykora, D.; Shechtman, E.; Shakhnarovich, G. Neural neighbor style transfer. arXiv 2022, arXiv:2203.13215. [Google Scholar]
- Li, Y.; Liu, M.-Y.; Li, X.; Yang, M.-H.; Kautz, J. A closed-form solution to photorealistic image stylization. In ECCV; Springer: Berlin/Heidelberg, Germany, 2018; pp. 453–468. [Google Scholar]
- Xia, X.; Zhang, M.; Xue, T.; Sun, Z.; Fang, H.; Kulis, B.; Chen, J. Joint bilateral learning for real-time universal photorealistic style transfer. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 327–342. [Google Scholar]
- Xia, X.; Xue, T.; Lai, W.S.; Sun, Z.; Chang, A.; Kulis, B.; Chen, J. Real-time localized photorealistic video style transfer. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 1089–1098. [Google Scholar]
- Luan, F.; Paris, S.; Shechtman, E.; Bala, K. Deep photo style transfer. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4990–4998. [Google Scholar]
- Risser, E.; Wilmot, P.; Barnes, C. Stable and controllable neural texture synthesis and style transfer using histogram losses. arXiv 2017, arXiv:1701.08893. [Google Scholar]
- Li, Y.; Fang, C.; Yang, J.; Wang, Z.; Lu, X.; Yang, M.H. Diversified texture synthesis with feed-forward networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3920–3928. [Google Scholar]
- Heitz, E.; Vanhoey, K.; Chambon, T.; Belcour, L. A sliced Wasserstein loss for neural texture synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 9412–9420. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Chen, D.; Liao, J.; Yuan, L.; Yu, N.; Hua, G. Coherent online video style transfer. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 1105–1114. [Google Scholar]
- Ruder, M.; Dosovitskiy, A.; Brox, T. Artistic Style Transfer for Videos and Spherical Images. Int. J. Comput. Vis. 2018, 126, 1199–1219. [Google Scholar] [CrossRef]
- Wu, Z.; Zhu, Z.; Du, J.; Bai, X. Ccpl: Contrastive coherence preserving loss for versatile style transfer. arXiv 2022, arXiv:2207.04808. [Google Scholar]
- Huang, H.; Wang, H.; Luo, W.; Ma, L.; Jiang, W.; Zhu, X.; Li, Z.; Liu, W. Realtime neural style transfer for videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 783–791. [Google Scholar]
- Liu, S.; Lin, T.; He, D.; Li, F.; Wang, M.; Li, X.; Sun, Z.; Li, Q.; Ding, E. Adaattn: Revisit attention mechanism in arbitrary neural style transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6649–6658. [Google Scholar]
- Yin, K.; Gao, J.; Shugrina, M.; Khamis, S.; Fidler, S. 3dstylenet: Creating 3d shapes with geometric and texture style variations. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 12456–12465. [Google Scholar]
- Michel, O.; Bar-On, R.; Liu, R.; Benaim, S.; Hanocka, R. Text2mesh: Text-driven neural stylization for meshes. arXiv 2021, arXiv:2112.03221. [Google Scholar]
- Huang, H.P.; Tseng, H.Y.; Saini, S.; Singh, M.; Yang, M.H. Learning to stylize novel views. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 13869–13878. [Google Scholar]
- Mu, F.; Wang, J.; Wu, Y.; Li, Y. 3d photo stylization: Learning to generate stylized novel views from a single image. arXiv 2021, arXiv:2112.00169. [Google Scholar]
- Mildenhall, B.; Srinivasan, P.P.; Tancik, M.; Barron, J.T.; Ramamoorthi, R.; Ng, R. Nerf: Representing scenes as neural radiance fields for view synthesis. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2020; pp. 405–421. [Google Scholar]
- Liu, L.; Gu, J.; Zaw Lin, K.; Chua, T.S.; Theobalt, C. Neural sparse voxel fields. Adv. Neural Inf. Process. Syst. 2020, 33, 15651–15663. [Google Scholar]
- Yu, A.; Ye, V.; Tancik, M.; Kanazawa, A. pixelnerf: Neural radiance fields from one or few images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 4578–4587. [Google Scholar]
- Zhang, K.; Riegler, G.; Snavely, N.; Koltun, V. Nerf++: Analyzing and improving neural radiance fields. arXiv 2020, arXiv:2010.07492. [Google Scholar]
- Chen, A.; Xu, Z.; Zhao, F.; Zhang, X.; Xiang, F.; Yu, J.; Su, H. MVSNeRF: Fast generalizable radiance field reconstruction from multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 14124–14133. [Google Scholar]
- Zhang, K.; Kolkin, N.; Bi, S.; Luan, F.; Xu, Z.; Shechtman, E.; Snavely, N. Arf: Artistic radiance fields. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer Nature: Cham, Switzerland, 2022; Proceedings, Part XXXI. [Google Scholar]
- Liu, K.; Zhan, F.; Chen, Y.; Zhang, J.; Yu, Y.; El Saddik, A.; Xing, E.P. StyleRF: Zero-shot 3D Style Transfer of Neural Radiance Fields. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Zhang, Y.; He, Z.; Xing, J.; Yao, X.; Jia, J. Ref-NPR: Reference-Based Non-Photorealistic Radiance Fields for Controllable Scene Stylization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Zhang, S.; Peng, S.; Chen, T.; Mou, L.; Lin, H.; Yu, K.; Zhou, X. Painting 3D Nature in 2D: View Synthesis of Natural Scenes from a Single Semantic Mask. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Xu, S.; Li, L.; Shen, L.; Lian, Z. DeSRF: Deformable Stylized Radiance Field. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023. [Google Scholar]
- Jain, J.; Li, J.; Chiu, M.; Hassani, A.; Orlov, N.; Shi, H. OneFormer: One Transformer to Rule Universal Image Segmentation. arXiv 2022, arXiv:2211.06220. [Google Scholar]
- Porter, T.; Duff, T. Compositing digital images. In Proceedings of the 11th Annual Conference on Computer Graphics and Interactive Techniques, Minneapolis, MN, USA, 23–27 July 1984; pp. 253–259. [Google Scholar]
- Straub, J.; Whelan, T.; Ma, L.; Chen, Y.; Wijmans, E.; Green, S.; Newcombe, R. The Replica dataset: A digital replica of indoor spaces. arXiv 2019, arXiv:1906.05797. [Google Scholar]
- Fu, H.; Cai, B.; Gao, L.; Zhang, L.X.; Wang, J.; Li, C.; Zhang, H. 3d-front: 3d furnished rooms with layouts and semantics. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021. [Google Scholar]
- Knapitsch, A.; Park, J.; Zhou, Q.Y.; Koltun, V. Tanks and temples: Benchmarking large-scale scene reconstruction. ACM Trans. Graph. (ToG) 2017, 36, 1–13. [Google Scholar] [CrossRef]
- Lai, W.S.; Huang, J.B.; Wang, O.; Shechtman, E.; Yumer, E.; Yang, M.H. Learning blind video temporal consistency. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
- Park, T.; Liu, M.Y.; Wang, T.C.; Zhu, J.Y. Semantic image synthesis with spatially-adaptive normalization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
- Wang, Z.; Zhang, Z.; Zhao, L.; Zuo, Z.; Li, A.; Xing, W.; Lu, D. AesUST: Towards aesthetic-enhanced universal style transfer. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022. [Google Scholar]
Short-range consistency comparison (1-frame interval; lower is better):

Data | AdaIN [8] | WCT [9] | AesUST [56] | ReReVST [12] | MCCNet [13] | ARF [43] | Ours |
---|---|---|---|---|---|---|---|
Room 0 | 1.3610 | 2.6471 | 1.5472 | 1.4704 | 1.4590 | 0.2054 | 0.2697 |
Room 1 | 1.7866 | 2.6240 | 1.8340 | 1.5739 | 1.2554 | 0.3720 | 0.2224 |
Office 2 | 1.5438 | 2.6627 | 1.6636 | 1.4346 | 1.2954 | 0.2832 | 0.2802 |
Room 0044 | 1.7568 | 2.5653 | 1.6444 | 1.5665 | 1.3563 | 0.3254 | 0.2658 |
Room 1013 | 1.5669 | 2.3547 | 1.5335 | 1.4899 | 1.3464 | 0.3865 | 0.2654 |
Playground | 1.8864 | 2.5182 | 1.9192 | 1.8338 | 1.1646 | 0.3011 | 0.2379 |
Average | 1.6562 | 2.5104 | 1.6396 | 1.5448 | 1.3363 | 0.3341 | 0.2613 |
Long-range consistency comparison (5-frame interval; lower is better):

Data | AdaIN [8] | WCT [9] | AesUST [56] | ReReVST [12] | MCCNet [13] | ARF [43] | Ours |
---|---|---|---|---|---|---|---|
Room 0 | 5.3132 | 6.8596 | 4.5472 | 4.8139 | 4.9938 | 2.8125 | 1.934 |
Room 1 | 7.9342 | 6.0772 | 4.8340 | 5.6226 | 2.3598 | 3.9416 | 1.7545 |
Office 2 | 5.4719 | 6.4132 | 4.6636 | 4.3751 | 2.8393 | 2.4181 | 2.7884 |
Room 0044 | 5.8358 | 5.6587 | 4.3654 | 5.6654 | 3.6547 | 3.2587 | 2.6987 |
Room 1013 | 6.3548 | 5.7556 | 4.3549 | 6.3254 | 2.6587 | 3.2989 | 2.3654 |
Playground | 8.8055 | 5.8038 | 4.9192 | 8.6928 | 2.2472 | 3.7675 | 1.9527 |
Average | 6.3572 | 5.9009 | 4.6140 | 5.9556 | 3.1411 | 3.2641 | 2.3905 |
Style consistency comparison (lower is better):

Style | AdaIN [8] | WCT [9] | AesUST [56] | ReReVST [12] | MCCNet [13] | ARF [43] | Ours |
---|---|---|---|---|---|---|---|
Style 1 | 0.7650 | 0.7183 | 0.7535 | 0.8686 | 0.8917 | 0.5854 | 0.5232 |
Style 2 | 0.6953 | 0.7897 | 0.7316 | 0.7929 | 0.8850 | 0.6098 | 0.5003 |
Style 3 | 0.6651 | 0.6912 | 0.7268 | 0.7775 | 0.8194 | 0.5862 | 0.5874 |
Style 4 | 0.8017 | 0.7187 | 0.7330 | 0.7484 | 0.7810 | 0.5663 | 0.5262 |
Style 5 | 0.5988 | 0.7889 | 0.7392 | 0.7747 | 0.7063 | 0.5636 | 0.5164 |
Style 6 | 0.7692 | 0.7752 | 0.7532 | 0.8173 | 0.7298 | 0.5829 | 0.5062 |
Style 7 | 0.6495 | 0.6923 | 0.7125 | 0.8469 | 0.7148 | 0.5669 | 0.5813 |
Style 8 | 0.6339 | 0.7739 | 0.7248 | 0.7648 | 0.7435 | 0.5723 | 0.5116 |
Style 9 | 0.6741 | 0.7918 | 0.7410 | 0.8371 | 0.8925 | 0.5611 | 0.5637 |
Style 10 | 0.6811 | 0.7307 | 0.8016 | 0.7065 | 0.8065 | 0.5609 | 0.5942 |
Style 11 | 0.6834 | 0.6923 | 0.6980 | 0.8040 | 0.7309 | 0.5810 | 0.5785 |
Style 12 | 0.7828 | 0.7825 | 0.7948 | 0.7593 | 0.7962 | 0.6019 | 0.5057 |
Style 13 | 0.7494 | 0.7281 | 0.7783 | 0.7331 | 0.8679 | 0.5919 | 0.5568 |
Average | 0.7038 | 0.7441 | 0.7453 | 0.7870 | 0.7973 | 0.5792 | 0.5424 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Park, J.; Cho, K. Neural Rendering-Based 3D Scene Style Transfer Method via Semantic Understanding Using a Single Style Image. Mathematics 2023, 11, 3243. https://doi.org/10.3390/math11143243