Bootstrapped SSL CycleGAN for Asymmetric Domain Transfer
Abstract
1. Introduction
2. Preliminaries
2.1. Generative Adversarial Networks
2.2. CycleGAN Networks
2.3. Unsupervised Training Procedure
Algorithm 1: CycleGAN training procedure
3. Proposed Approach
3.1. Avoiding Overfitting by the SSL Strategy
3.2. Improving the Learning Process by the BTS Strategy
3.3. Proposed SSL+BTS Training Procedure
Algorithm 2: BTS-SSL CycleGAN training
4. Elements of the Adopted Experimental Setup
4.1. Network Architecture and Training Details
4.2. Considered Domain Translation Tasks
4.3. Considered Baseline Domain Translation Methods
4.4. Utilized Measures for Results Comparison
5. Experimental Results
5.1. Quantitative Evaluation of the Conducted Experiments
5.2. Visual Comparison between Analyzed Methods
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
| Dataset | Training data [%] | pix2pix PSNR | pix2pix SSIM | CycleGAN PSNR | CycleGAN SSIM | SSL PSNR | SSL SSIM | BTS PSNR | BTS SSIM | SSL+BTS PSNR | SSL+BTS SSIM | PSNR impr. over CycleGAN [%] | SSIM impr. over CycleGAN [%] |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CityScapes | 25 | 19.98 | 0.60 | 17.20 | 0.56 | 18.30 | 0.59 | 17.30 | 0.58 | 18.77 | 0.61 | 9.1 | 8.9 |
| CityScapes | 50 | 20.45 | 0.64 | 17.00 | 0.55 | 18.95 | 0.61 | 17.18 | 0.59 | 19.04 | 0.64 | 12.0 | 16.4 |
| CityScapes | 100 | 19.51 | 0.59 | 17.12 | 0.54 | 20.03 | 0.58 | 17.75 | 0.63 | 20.47 | 0.65 | 19.6 | 20.4 |
| Facade dataset | 25 | 13.78 | 0.35 | 10.93 | 0.25 | 11.78 | 0.31 | 10.81 | 0.27 | 11.83 | 0.32 | 8.2 | 28.0 |
| Facade dataset | 50 | 14.24 | 0.40 | 11.00 | 0.25 | 13.22 | 0.37 | 11.92 | 0.28 | 13.75 | 0.40 | 25.0 | 60.0 |
| Facade dataset | 100 | 14.25 | 0.42 | 10.98 | 0.27 | 12.88 | 0.33 | 11.52 | 0.35 | 13.21 | 0.41 | 20.3 | 51.8 |
| Google Maps | 25 | 30.35 | 0.67 | 30.47 | 0.71 | 30.62 | 0.73 | 30.55 | 0.75 | 31.20 | 0.77 | 2.4 | 8.4 |
| Google Maps | 50 | 30.55 | 0.68 | 29.78 | 0.72 | 30.68 | 0.75 | 29.81 | 0.76 | 30.88 | 0.79 | 3.7 | 9.7 |
| Google Maps | 100 | 30.01 | 0.69 | 30.24 | 0.73 | 30.92 | 0.75 | 30.27 | 0.77 | 31.23 | 0.81 | 3.3 | 11.0 |
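As a rough illustration of the two full-reference measures reported above, PSNR and a simplified SSIM can be sketched as follows. This is a hedged sketch in NumPy: `psnr` follows the standard definition, while `ssim_global` uses global image statistics rather than the sliding-window SSIM of Wang et al., so it is only an approximation of the measure used for evaluation.

```python
import numpy as np

def psnr(x, y, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((x - y) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

def ssim_global(x, y, max_val=1.0, k1=0.01, k2=0.03):
    """Simplified SSIM computed from global statistics (no sliding window)."""
    c1, c2 = (k1 * max_val) ** 2, (k2 * max_val) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))
```

For instance, a uniform error of 0.1 on a [0, 1] image gives an MSE of 0.01 and hence a PSNR of exactly 20 dB, while `ssim_global` of an image with itself is 1.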
| Task / Dataset | Measure | Training data [%] | pix2pix | CycleGAN | SSL | BTS | SSL+BTS | Impr. over CycleGAN [%] |
|---|---|---|---|---|---|---|---|---|
| Photo → Map (Google Maps) | PRM | 25 | 18.8 | 17.5 | 18.0 | 18.1 | 18.4 | 5.1 |
| | PRM | 50 | 19.8 | 18.7 | 19.3 | 19.2 | 19.5 | 4.3 |
| | PRM | 100 | 21.7 | 20.4 | 20.8 | 20.7 | 21.1 | 3.4 |
| Photo → Semantic label (CityScapes) | FCN per-pixel acc. | 25 | 0.57 | 0.43 | 0.45 | 0.46 | 0.50 | 16.3 |
| | FCN per-pixel acc. | 50 | 0.63 | 0.49 | 0.52 | 0.51 | 0.54 | 10.2 |
| | FCN per-pixel acc. | 100 | 0.71 | 0.52 | 0.56 | 0.55 | 0.58 | 11.5 |
| | FCN mean acc. | 25 | 0.16 | 0.13 | 0.15 | 0.16 | 0.18 | 38.5 |
| | FCN mean acc. | 50 | 0.20 | 0.15 | 0.17 | 0.18 | 0.19 | 26.7 |
| | FCN mean acc. | 100 | 0.25 | 0.17 | 0.19 | 0.20 | 0.22 | 29.4 |
| | FCN mean IoU | 25 | 0.13 | 0.09 | 0.10 | 0.10 | 0.12 | 33.3 |
| | FCN mean IoU | 50 | 0.16 | 0.10 | 0.11 | 0.12 | 0.13 | 30.0 |
| | FCN mean IoU | 100 | 0.18 | 0.11 | 0.13 | 0.13 | 0.15 | 36.4 |
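The FCN-based scores in the table (per-pixel accuracy, mean class accuracy, mean IoU) are standard semantic segmentation metrics derived from the confusion matrix between predicted and ground-truth label maps. A minimal sketch of how they are typically computed, assuming integer label maps with classes 0..n_classes−1 (classes absent from both maps would yield NaN entries, which `nanmean` skips):

```python
import numpy as np

def segmentation_scores(pred, gt, n_classes):
    """Per-pixel accuracy, mean class accuracy, and mean IoU from label maps."""
    # Confusion matrix: rows = ground-truth class, columns = predicted class.
    cm = np.bincount(n_classes * gt.ravel() + pred.ravel(),
                     minlength=n_classes ** 2).reshape(n_classes, n_classes)
    tp = np.diag(cm).astype(float)
    per_pixel = tp.sum() / cm.sum()
    mean_acc = np.nanmean(tp / cm.sum(axis=1))           # mean per-class recall
    iou = tp / (cm.sum(axis=1) + cm.sum(axis=0) - tp)    # TP / (TP + FP + FN)
    mean_iou = np.nanmean(iou)
    return per_pixel, mean_acc, mean_iou
```

For example, a 2×2 map with one misclassified pixel out of four gives a per-pixel accuracy of 0.75, while mean accuracy and mean IoU weight both classes equally regardless of their pixel counts.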
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Krstanović, L.; Popović, B.; Janev, M.; Brkljač, B. Bootstrapped SSL CycleGAN for Asymmetric Domain Transfer. Appl. Sci. 2022, 12, 3411. https://doi.org/10.3390/app12073411