Controllable Unsupervised Snow Synthesis by Latent Style Space Manipulation
Abstract
1. Introduction
- Content and style disentanglement. The CUSS model adopts an architecture comprising a content encoder and a style encoder. To separate the content and style latent spaces, we introduce an auxiliary content discriminator that distinguishes the content codes of clear images from those of snow images.
- Multimodal controllable output generation. The CUSS model generates multiple, diverse outputs from a single input image by sampling distinct style codes. Moreover, a self-regression module enables linear interpolation of the style code, allowing the size of the generated snow to be adjusted manually (see the sketch after this list).
- Evaluations on public datasets. The model is evaluated on the Cityscapes and EuroCity Persons datasets. Several image quality metrics, including PSNR, SSIM, and the VGG-based perceptual distance, are used to validate its effectiveness. The source code will be available at https://github.com/HantingYang/Controllable-Snow-Synthesize (accessed on 13 September 2023).
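As a concrete illustration of the multimodal and controllable generation described above, the following minimal PyTorch-style sketch shows how a snow style code could be sampled or taken from a reference image and then linearly scaled before decoding. The function and module names (`content_encoder`, `style_encoder`, `decoder`) and the style dimension of 8 are illustrative assumptions, not the released implementation.

```python
import torch

def synthesize_snow(x_clear, content_encoder, style_encoder, decoder,
                    x_snow_ref=None, alpha=1.0, style_dim=8):
    """Hedged sketch: render snow onto a clear driving image.

    alpha in [0, 1] linearly scales the snow style code, mimicking the
    manual control enabled by the self-regression module.
    """
    # The scene structure is carried by the content code of the clear image.
    content = content_encoder(x_clear)

    if x_snow_ref is not None:
        # Take the style of a reference snow image...
        style_snow = style_encoder(x_snow_ref)
    else:
        # ...or sample a random style code to obtain diverse outputs.
        style_snow = torch.randn(x_clear.size(0), style_dim,
                                 device=x_clear.device)

    # Linear interpolation between a neutral (zero) style and the snow
    # style controls how heavy the synthesized snow appears.
    style = alpha * style_snow

    return decoder(content, style)
```

Sweeping `alpha` from 0 to 1 for a fixed content code would then produce a sequence of outputs of the same scene with progressively heavier snow.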
2. Related Work
2.1. Weather Generation
2.1.1. Fog and Rain
2.1.2. Snow
2.2. Unpaired Image-to-Image Translation
Latent Space Constraint
3. Controllable Unsupervised Snow Synthesis
3.1. Fundamental Basis
Algorithm 1: Controllable Unsupervised Snow Synthesis (CUSS)
Input: Training data pairs ▹ in order of clear and snow images
Output: Encoders and decoders ▹ the decoders generate clear images and snow images, respectively
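Although the full pseudocode is not reproduced above, a rough, hedged sketch of how one generator/encoder update could combine the losses introduced in Sections 3.3 and 3.4 (adversarial, identity, latent-space reconstruction, and cross-cycle consistency) is given below. All module names (E_c, E_s, G_clear, G_snow, the discriminators), the least-squares GAN formulation, and the loss weights are illustrative assumptions, not the authors' exact implementation.

```python
import torch.nn.functional as F

def cuss_training_step(x_clear, x_snow, E_c, E_s, G_clear, G_snow,
                       D_clear, D_snow, D_content, w):
    """Hedged sketch of one generator/encoder update in a CUSS-like model."""
    # Encode content and style codes for both domains.
    c_clear, s_clear = E_c(x_clear), E_s(x_clear)
    c_snow, s_snow = E_c(x_snow), E_s(x_snow)

    # Cross-domain translation: clear content + snow style -> synthetic snow.
    fake_snow = G_snow(c_clear, s_snow)
    fake_clear = G_clear(c_snow, s_clear)

    # Adversarial loss (least-squares form) on the translated images.
    loss_adv = ((1 - D_snow(fake_snow)).pow(2).mean() +
                (1 - D_clear(fake_clear)).pow(2).mean())

    # Content adversarial loss: push the content discriminator towards
    # chance level so that content codes become domain-invariant.
    loss_content = ((D_content(c_clear) - 0.5).pow(2).mean() +
                    (D_content(c_snow) - 0.5).pow(2).mean())

    # Identity loss: decoding an image with its own content and style
    # should reproduce the input.
    loss_id = (F.l1_loss(G_clear(c_clear, s_clear), x_clear) +
               F.l1_loss(G_snow(c_snow, s_snow), x_snow))

    # Latent-space reconstruction loss: re-encode the translated image.
    loss_latent = (F.l1_loss(E_c(fake_snow), c_clear) +
                   F.l1_loss(E_s(fake_snow), s_snow))

    # Cross-cycle consistency loss: a second translation returns each
    # image to its original domain.
    cyc_clear = G_clear(E_c(fake_snow), E_s(fake_clear))
    cyc_snow = G_snow(E_c(fake_clear), E_s(fake_snow))
    loss_cc = F.l1_loss(cyc_clear, x_clear) + F.l1_loss(cyc_snow, x_snow)

    return (w["adv"] * loss_adv + w["content"] * loss_content +
            w["id"] * loss_id + w["latent"] * loss_latent +
            w["cc"] * loss_cc)
```

A complementary discriminator update (real versus translated images, and clear versus snow content codes for the content discriminator) would accompany this step.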
3.2. Disentanglement of Content and Style
3.3. Loss Function
3.3.1. Adversarial Loss
3.3.2. Identity Loss and Latent Space Reconstruction Loss
3.4. Cross-Cycle Consistency Loss
4. Experiments
4.1. Datasets
4.1.1. Urban Sceneries: Cityscapes
4.1.2. European Urban Scenes: EuroCity Persons
4.1.3. Snow Condition Driving Dataset
4.2. Implementation Details
- The content encoder has five convolutional layers.
- The style encoder includes an initial residual layer, two downsampling layers, and one adaptive average pooling layer.
- The content encoder features an initial residual layer, two downsampling layers, and four residual blocks.
- Each decoder is made up of four residual blocks and two upsampling layers. It employs adaptive instance normalization, while the encoders use standard instance normalization.
- All the discriminators take fixed-resolution image patches as input, a patch-based design inspired by Demir and Unal [36]; each discriminator consists of five convolutional layers. A sketch of these modules is given after this list.
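A compact PyTorch sketch matching the layer counts in this list is shown below. Channel widths, kernel sizes, and activation choices are assumptions made for illustration; the sketch should not be read as the released code.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch, stride=1):
    # Convolution + instance normalization, as used in the encoders.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, stride=stride, padding=1),
        nn.InstanceNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(conv_block(ch, ch), conv_block(ch, ch))

    def forward(self, x):
        return x + self.body(x)

class StyleEncoder(nn.Module):
    """Initial residual layer, two downsampling layers, adaptive pooling."""

    def __init__(self, in_ch=3, ch=64, style_dim=8):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(in_ch, ch), ResBlock(ch),
            conv_block(ch, ch * 2, stride=2),       # downsampling x2
            conv_block(ch * 2, ch * 4, stride=2),   # downsampling x2
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(ch * 4, style_dim, kernel_size=1),
        )

    def forward(self, x):
        return self.net(x).flatten(1)               # (B, style_dim)

class ContentEncoder(nn.Module):
    """Initial residual layer, two downsampling layers, four residual blocks."""

    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        self.net = nn.Sequential(
            conv_block(in_ch, ch), ResBlock(ch),
            conv_block(ch, ch * 2, stride=2),
            conv_block(ch * 2, ch * 4, stride=2),
            *[ResBlock(ch * 4) for _ in range(4)],
        )

    def forward(self, x):
        return self.net(x)

class PatchDiscriminator(nn.Module):
    """Five convolutional layers producing per-patch real/fake scores."""

    def __init__(self, in_ch=3, ch=64):
        super().__init__()
        layers, c = [], in_ch
        for i in range(4):
            layers += [nn.Conv2d(c, ch * 2 ** i, 4, stride=2, padding=1),
                       nn.LeakyReLU(0.2, inplace=True)]
            c = ch * 2 ** i
        layers.append(nn.Conv2d(c, 1, 4, padding=1))  # fifth convolution
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)
```

In this reading, each decoder would mirror the content encoder in reverse order (four residual blocks followed by two upsampling layers), with the style code injected through adaptive instance normalization as stated above; that conditioning path is omitted here for brevity.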
4.3. Performance Assessment
4.3.1. Assessment Criteria
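The metrics named in the contributions (PSNR, SSIM, and a VGG-based perceptual distance in the spirit of [37]) could be computed with standard libraries as in the hedged sketch below, which uses scikit-image and the lpips package; it is not taken from the paper's evaluation code.

```python
import torch
import lpips  # pip install lpips
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# VGG-based perceptual distance network, loaded once.
lpips_vgg = lpips.LPIPS(net="vgg")

def evaluate_pair(generated, reference):
    """Compare a generated snow image with a reference image.

    Both inputs are float32 numpy arrays of shape (H, W, 3) in [0, 1].
    """
    psnr = peak_signal_noise_ratio(reference, generated, data_range=1.0)
    ssim = structural_similarity(reference, generated,
                                 channel_axis=-1, data_range=1.0)

    # LPIPS expects torch tensors of shape (N, 3, H, W) scaled to [-1, 1].
    to_tensor = lambda a: torch.from_numpy(a).permute(2, 0, 1)[None] * 2 - 1
    vgg_dist = lpips_vgg(to_tensor(generated), to_tensor(reference)).item()

    return {"PSNR": psnr, "SSIM": ssim, "VGG distance": vgg_dist}
```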
4.3.2. Qualitative Results
4.3.3. Quantitative Results
4.3.4. Discussion
5. Conclusions
- Expanding the proposed technique to tackle generation tasks in more demanding driving conditions, such as heavy rain, dense fog, nighttime, and strong light;
- Delving deeper into the relationship between generative methods and latent space manipulation for I2I translation tasks by integrating existing insights from self-supervised and contrastive learning methodologies.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zhang, Y.; Carballo, A.; Yang, H.; Takeda, K. Perception and sensing for autonomous vehicles under adverse weather conditions: A survey. ISPRS J. Photogramm. Remote Sens. 2023, 196, 146–177. [Google Scholar] [CrossRef]
- Ding, Q.; Li, P.; Yan, X.; Shi, D.; Liang, L.; Wang, W.; Xie, H.; Li, J.; Wei, M. CF-YOLO: Cross Fusion YOLO for Object Detection in Adverse Weather With a High-Quality Real Snow Dataset. IEEE Trans. Intell. Transp. Syst. 2023, 24, 10749–10759. [Google Scholar] [CrossRef]
- Zhang, W.; Yao, R.; Du, X.; Liu, Y.; Wang, R.; Wang, L. Traffic flow prediction under multiple adverse weather based on self-attention mechanism and deep learning models. Phys. A Stat. Mech. Its Appl. 2023, 625, 128988. [Google Scholar] [CrossRef]
- Qin, Q.; Chang, K.; Huang, M.; Li, G. DENet: Detection-driven Enhancement Network for Object Detection Under Adverse Weather Conditions. In Proceedings of the Asian Conference on Computer Vision, Macao, China, 4–8 December 2022; pp. 2813–2829. [Google Scholar]
- Rothmeier, T.; Huber, W. Let it snow: On the synthesis of adverse weather image data. In Proceedings of the 2021 IEEE International Intelligent Transportation Systems Conference (ITSC), Indianapolis, IN, USA, 19–22 September 2021; IEEE: New York, NY, USA, 2021; pp. 3300–3306. [Google Scholar]
- Sakaridis, C.; Dai, D.; Van Gool, L. Semantic foggy scene understanding with synthetic data. Int. J. Comput. Vis. 2018, 126, 973–992. [Google Scholar] [CrossRef]
- Garg, K.; Nayar, S.K. Detection and removal of rain from videos. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, CVPR 2004, Washington, DC, USA, 27 June–2 July 2004; IEEE: New York, NY, USA, 2004; Volume 1, p. I. [Google Scholar]
- Liu, Y.F.; Jaw, D.W.; Huang, S.C.; Hwang, J.N. DesnowNet: Context-aware deep network for snow removal. IEEE Trans. Image Process. 2018, 27, 3064–3073. [Google Scholar] [CrossRef] [PubMed]
- Zhang, K.; Li, R.; Yu, Y.; Luo, W.; Li, C. Deep dense multi-scale network for snow removal using semantic and depth priors. IEEE Trans. Image Process. 2021, 30, 7419–7431. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Sun, J.; Tang, X. Single image haze removal using dark channel prior. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 2341–2353. [Google Scholar] [PubMed]
- Engin, D.; Genç, A.; Kemal Ekenel, H. Cycle-dehaze: Enhanced cyclegan for single image dehazing. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 825–833. [Google Scholar]
- Guo, Y.; Ma, Z.; Song, Z.; Tang, R.; Liu, L. Cycle-Derain: Enhanced CycleGAN for Single Image Deraining. In Proceedings of the Big Data and Security: Second International Conference, ICBDS 2020, Singapore, 20–22 December 2020; Revised Selected Papers 2. Springer: Berlin/Heidelberg, Germany, 2021; pp. 497–509. [Google Scholar]
- Kim, T.; Cha, M.; Kim, H.; Lee, J.K.; Kim, J. Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the International Conference on Machine Learning, PMLR, Sydney, Australia, 6–11 August 2017; pp. 1857–1865. [Google Scholar]
- Zhu, J.Y.; Park, T.; Isola, P.; Efros, A.A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Yi, Z.; Zhang, H.; Tan, P.; Gong, M. Dualgan: Unsupervised dual learning for image-to-image translation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2849–2857. [Google Scholar]
- Yang, H.; Ding, M.; Carballo, A.; Zhang, Y.; Ohtani, K.; Niu, Y.; Ge, M.; Yan, F.; Takeda, K. Synthesizing Realistic Snow Effects in Driving Images Using GANs and Real Data with Semantic Guidance. In Proceedings of the IEEE Intelligent Vehicles Symposium, Anchorage, AK, USA, 4–7 June 2023; p. 34. [Google Scholar]
- Zhang, C.; Lin, Z.; Xu, L.; Li, Z.; Tang, W.; Liu, Y.; Meng, G.; Wang, L.; Li, L. Density-aware haze image synthesis by self-supervised content-style disentanglement. IEEE Trans. Circuits Syst. Video Technol. 2021, 32, 4552–4572. [Google Scholar] [CrossRef]
- Biri, V.; Michelin, S. Real Time Animation of Realistic Fog. In Proceedings of the Poster Session of 13th Eurographic Workshop on Rendering, Berlin, Germany, 24–28 August 2002; pp. 9–16. [Google Scholar]
- Ohlsson, P.; Seipel, S. Real-time rendering of accumulated snow. In Proceedings of the Sigrad Conference, Citeseer, Online, 24–25 November 2004; pp. 25–32. [Google Scholar]
- Stomakhin, A.; Schroeder, C.; Chai, L.; Teran, J.; Selle, A. A material point method for snow simulation. ACM Trans. Graph. (TOG) 2013, 32, 1–10. [Google Scholar] [CrossRef]
- Liu, M.Y.; Breuel, T.; Kautz, J. Unsupervised image-to-image translation networks. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Huang, X.; Liu, M.Y.; Belongie, S.; Kautz, J. Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 172–189. [Google Scholar]
- Pang, Y.; Lin, J.; Qin, T.; Chen, Z. Image-to-image translation: Methods and applications. IEEE Trans. Multimed. 2021, 24, 3859–3881. [Google Scholar] [CrossRef]
- Isola, P.; Zhu, J.Y.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134. [Google Scholar]
- Zhu, J.Y.; Zhang, R.; Pathak, D.; Darrell, T.; Efros, A.A.; Wang, O.; Shechtman, E. Toward multimodal image-to-image translation. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Lee, H.Y.; Tseng, H.Y.; Huang, J.B.; Singh, M.; Yang, M.H. Diverse image-to-image translation via disentangled representations. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 35–51. [Google Scholar]
- Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. Stargan: Unified generative adversarial networks for multi-domain image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8789–8797. [Google Scholar]
- Liu, A.H.; Liu, Y.C.; Yeh, Y.Y.; Wang, Y.C.F. A unified feature disentangler for multi-domain image translation and manipulation. Adv. Neural Inf. Process. Syst. 2018, 31. [Google Scholar]
- Yang, H.; Carballo, A.; Takeda, K. Disentangled Bad Weather Removal GAN for Pedestrian Detection. In Proceedings of the 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring), Helsinki, Finland, 19–22 June 2022; pp. 1–6. [Google Scholar]
- Yang, H.; Carballo, A.; Zhang, Y.; Takeda, K. Framework for generation and removal of multiple types of adverse weather from driving scene images. Sensors 2023, 23, 1548. [Google Scholar] [CrossRef] [PubMed]
- Wang, M.; Su, T.; Chen, S.; Yang, W.; Liu, J.; Wang, Z. Automatic Model-Based Dataset Generation for High-Level Vision Tasks of Autonomous Driving in Haze Weather. IEEE Trans. Ind. Inform. 2022, 19, 9071–9081. [Google Scholar] [CrossRef]
- Ni, S.; Cao, X.; Yue, T.; Hu, X. Controlling the rain: From removal to rendering. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6328–6337. [Google Scholar]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
- Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The cityscapes dataset for semantic urban scene understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3213–3223. [Google Scholar]
- Braun, M.; Krebs, S.; Flohr, F.; Gavrila, D.M. The eurocity persons dataset: A novel benchmark for object detection. arXiv 2018, arXiv:1805.07193. [Google Scholar]
- Demir, U.; Unal, G. Patch-based image inpainting with generative adversarial networks. arXiv 2018, arXiv:1803.07422. [Google Scholar]
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826. [Google Scholar]
- Wu, H.; Qu, Y.; Lin, S.; Zhou, J.; Qiao, R.; Zhang, Z.; Xie, Y.; Ma, L. Contrastive learning for compact single image dehazing. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 10551–10560. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Higgins, I.; Amos, D.; Pfau, D.; Racaniere, S.; Matthey, L.; Rezende, D.; Lerchner, A. Towards a definition of disentangled representations. arXiv 2018, arXiv:1812.02230. [Google Scholar]
| Methods |  |  |  |  |  |
|---|---|---|---|---|---|
| CUT | 0.392 | 15.853 | 6.063 | 26.156 | 0.047 |
| CycleGAN |  | 16.072 | 5.892 | 26.342 | 0.048 |
| MUNIT | 0.452 | 16.118 | 5.923 | 25.447 | 0.049 |
| DRIT | 0.446 | 16.432 | 5.426 | 25.874 | 0.049 |
| CUSS | 0.465 |  |  |  |  |
| Methods |  |  |  |  |  |
|---|---|---|---|---|---|
| CUT | 0.396 | 15.912 | 6.123 | 26.245 | 0.048 |
| CycleGAN | 0.501 | 16.233 | 5.927 | 26.581 | 0.049 |
| MUNIT | 0.479 | 16.483 | 6.012 | 25.847 | 0.048 |
| DRIT | 0.469 | 16.741 | 5.756 | 26.007 | 0.048 |
| CUSS | 0.494 | 16.983 | 5.386 | 25.402 | 0.051 |
| Module |  |  |  |  |  |
|---|---|---|---|---|---|
| w/o | 0.459 | 16.868 | 6.044 | 26.156 | 0.046 |
| w/o | 0.461 | 16.831 | 6.052 | 26.133 | 0.045 |
| w/o | 0.466 | 16.843 | 6.032 | 26.092 | 0.045 |
| w/o | 0.462 | 16.857 | 6.063 | 26.112 | 0.046 |
| w/o | 0.468 | 16.801 | 6.015 | 26.127 | 0.046 |
| CUSS |  |  |  |  |  |
| CUSS |  |  |  |  |  |
| CUSS |  |  |  |  |  |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).