Symmetric Connected U-Net with Multi-Head Self Attention (MHSA) and WGAN for Image Inpainting
Abstract
1. Introduction
2. Related Work
3. Approach
3.1. The Principle of the SC-Unet Model
3.2. DCNN in Convolutional Block of SC-Unet
3.3. MHSA in Convolutional Block of SC-Unet
3.4. The Loss Functions
3.5. The Adversarial Loss
3.6. The Main Framework
4. Experiments
4.1. Experimental Settings
4.2. Quantitative Evaluation
- CA is the contextual-attention inpainting method proposed by Yu et al. It uses a contextual attention mechanism to build long-range dependencies across the feature map, so that distant known regions can guide reconstruction of the missing area (a minimal attention sketch follows this list).
- GLCIC is the globally and locally consistent image completion method proposed by Iizuka et al. It uses a single completion network to fill the hole, together with global and local context discriminators that push completions to be realistic at both the whole-image and local-patch level.
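To make the attention idea concrete, below is a minimal PyTorch sketch of multi-head self-attention applied to a convolutional feature map, in the spirit of CA's long-range dependency modelling and of the MHSA block described in Section 3.3. The module name `FeatureMapSelfAttention` and all hyperparameters are illustrative assumptions, not the paper's exact SC-Unet implementation.

```python
# Illustrative sketch only: MHSA over a CNN feature map (not the exact SC-Unet block).
import torch
import torch.nn as nn


class FeatureMapSelfAttention(nn.Module):
    """Flattens a (B, C, H, W) feature map into H*W tokens of width C and
    lets every spatial position attend to every other position."""

    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(embed_dim=channels,
                                          num_heads=num_heads,
                                          batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        tokens = x.flatten(2).transpose(1, 2)            # (B, H*W, C)
        attended, _ = self.attn(tokens, tokens, tokens)  # long-range mixing
        tokens = self.norm(tokens + attended)            # residual + LayerNorm
        return tokens.transpose(1, 2).reshape(b, c, h, w)


if __name__ == "__main__":
    feats = torch.randn(2, 64, 16, 16)   # e.g. a mid-level encoder/decoder feature map
    out = FeatureMapSelfAttention(channels=64, num_heads=4)(feats)
    print(out.shape)                      # torch.Size([2, 64, 16, 16])
```

Because every one of the H×W positions can attend to every other, undamaged regions far from the hole can contribute to its reconstruction, which is the long-range behaviour that purely local convolutions lack. The adversarial side of such models (GLCIC's discriminators, or the WGAN objective named in the title and in Section 3.5) reduces to two simple loss terms; the following is a generic WGAN-style sketch with hypothetical `critic`, `real`, and `fake` tensors, omitting the Lipschitz constraint (weight clipping or gradient penalty) for brevity.

```python
from torch import nn, Tensor


def critic_loss(critic: nn.Module, real: Tensor, fake: Tensor) -> Tensor:
    """WGAN critic objective: score real completions higher than generated ones."""
    return critic(fake).mean() - critic(real).mean()


def generator_adv_loss(critic: nn.Module, fake: Tensor) -> Tensor:
    """Adversarial term for the generator: raise the critic's score on its outputs."""
    return -critic(fake).mean()
```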
4.3. Qualitative Evaluation
4.4. Ablation Study
5. Discussions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Sun, J.; Yuan, L.; Jia, J. Image completion with structure propagation. In ACM SIGGRAPH 2005 Papers; Association for Computing Machinery: New York, NY, USA, 2005; pp. 861–868. [Google Scholar]
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440. [Google Scholar]
- Xu, Y.; Gu, T.; Chen, W. Ootdiffusion: Outfitting fusion based latent diffusion for controllable virtual try-on. arXiv 2024, arXiv:2403.01779. [Google Scholar]
- Chunqi, F.; Kun, R.; Lisha, M. Advances in digital image inpainting algorithms based on deep learning. J. Signal Process 2020, 36, 102–109. [Google Scholar]
- Qin, Z.; Zeng, Q.; Zong, Y. Image inpainting based on deep learning: A review. Displays 2021, 69, 102028. [Google Scholar] [CrossRef]
- Mardieva, S.; Ahmad, S.; Umirzakova, S.; Rasool, M.A.; Whangbo, T.K. Lightweight image super-resolution for IoT devices using deep residual feature distillation network. Knowl.-Based Syst. 2024, 285, 111343. [Google Scholar] [CrossRef]
- Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. arXiv 2014, arXiv:1406.2661. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
- Jiao, L.; Wu, H.; Wang, H.; Bie, R. Multi-scale semantic image inpainting with residual learning and GAN. Neurocomputing 2019, 331, 199–212. [Google Scholar] [CrossRef]
- Araujo, A.; Norris, W.; Sim, J. Computing receptive fields of convolutional neural networks. Distill 2019, 4, e21. [Google Scholar] [CrossRef]
- Phutke, S.S.; Murala, S. Diverse receptive field based adversarial concurrent encoder network for image inpainting. IEEE Signal Process. Lett. 2021, 28, 1873–1877. [Google Scholar] [CrossRef]
- Yu, F. Multi-scale context aggregation by dilated convolutions. arXiv 2015, arXiv:1511.07122. [Google Scholar]
- Yan, Z.; Li, X.; Li, M. Shift-net: Image inpainting via deep feature rearrangement. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 1–17. [Google Scholar]
- Yu, J.; Lin, Z. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5505–5514. [Google Scholar]
- Liu, W.; Shi, Y.; Li, J. Multi-stage Progressive Reasoning for Dunhuang Murals Inpainting. In Proceedings of the 2023 IEEE 4th International Conference on Pattern Recognition and Machine Learning (PRML), Urumqi, China, 4–6 August 2023; pp. 211–217. [Google Scholar]
- Liu, G.; Reda, F.A.; Shih, K.J. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100. [Google Scholar]
- Pathak, D.; Krahenbuhl, P.; Donahue, J. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2536–2544. [Google Scholar]
- Im, D.J.; Kim, C.D.; Jiang, H. Generating images with recurrent adversarial networks. arXiv 2016, arXiv:1602.05110. [Google Scholar]
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph 2017, 36, 1–14. [Google Scholar] [CrossRef]
- Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434. [Google Scholar]
- Arjovsky, M.; Chintala, S. Wasserstein generative adversarial networks. In Proceedings of the International Conference on Machine Learning, Sydney, NSW, Australia, 6–11 August 2017; pp. 214–223. [Google Scholar]
- Mirza, M. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784. [Google Scholar]
- Zhu, J.-Y.; Park, T.; Isola, P. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2223–2232. [Google Scholar]
- Lou, S.; Fan, Q.; Chen, F. Preliminary investigation on single remote sensing image inpainting through a modified GAN. In Proceedings of the 2018 10th IAPR Workshop on Pattern Recognition in Remote Sensing, PRRS, Beijing, China, 19–20 August 2018; pp. 1–6. [Google Scholar]
- Deng, Y.; Hui, S.; Zhou, S. Learning contextual transformer network for image inpainting. In Proceedings of the 29th ACM International Conference on Multimedia, Chengdu, China, 20–24 October 2021; pp. 2529–2538. [Google Scholar]
- Wang, N.; Ma, S.; Li, J. Multistage attention network for image inpainting. Pattern Recognit. 2020, 106, 107448. [Google Scholar] [CrossRef]
- Li, J.; Wang, N.; Zhang, L. Recurrent feature reasoning for image inpainting. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7757–7765. [Google Scholar]
- Liu, H.; Jiang, B. Coherent semantic attention for image inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4170–4179. [Google Scholar]
- Wang, W.; Xie, E.; Li, X. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 568–578. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
- Wang, Q.; He, S.; Su, M. Context-Encoder-Based Image Inpainting for Ancient Chinese Silk. Appl. Sci. 2024, 14, 6607. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Russakovsky, O.; Deng, J.; Su, H. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252. [Google Scholar] [CrossRef]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. A neural algorithm of artistic style. arXiv 2015, arXiv:1508.06576. [Google Scholar] [CrossRef]
- Gatys, L.A.; Ecker, A.S.; Bethge, M. Image style transfer using convolutional neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2414–2423. [Google Scholar]
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 11–14 October 2016; pp. 694–711. [Google Scholar]
- Karras, T.; Aila, T.; Laine, S. Progressive growing of gans for improved quality, stability, and variation. arXiv 2017, arXiv:1710.10196. [Google Scholar]
- Hore, A.; Ziou, D. Image quality metrics: PSNR vs. SSIM. In Proceedings of the 2010 20th International Conference on Pattern Recognition (ICPR), IEEE, Istanbul, Turkey, 23–26 August 2010; pp. 2366–2369. [Google Scholar]
- Wang, Z.; Bovik, A.C.; Sheikh, H.R. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Heusel, M.; Ramsauer, H.; Unterthiner, T.; Nessler, B.; Hochreiter, S. Gans trained by a two time-scale update rule converge to a local nash equilibrium. arXiv 2017, arXiv:1706.08500. [Google Scholar]
PSNR (dB, higher is better) compared with CA and GLCIC under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | CA | 32.45 | 28.13 | 24.97 | 22.74 | 20.18
ImageNet | GLCIC | 30.09 | 26.75 | 23.88 | 21.61 | 19.55
ImageNet | Ours | 35.31 | 31.12 | 28.09 | 24.56 | 22.39
CelebA-HQ | CA | 32.36 | 27.89 | 25.86 | 23.73 | 20.12
CelebA-HQ | GLCIC | 30.13 | 25.91 | 22.02 | 19.60 | 17.37
CelebA-HQ | Ours | 36.19 | 33.86 | 30.55 | 26.67 | 23.39
SSIM (higher is better) compared with CA and GLCIC under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | CA | 0.937 | 0.875 | 0.801 | 0.720 | 0.650
ImageNet | GLCIC | 0.910 | 0.855 | 0.754 | 0.660 | 0.609
ImageNet | Ours | 0.960 | 0.910 | 0.870 | 0.818 | 0.784
CelebA-HQ | CA | 0.959 | 0.909 | 0.832 | 0.750 | 0.721
CelebA-HQ | GLCIC | 0.936 | 0.860 | 0.799 | 0.711 | 0.689
CelebA-HQ | Ours | 0.975 | 0.932 | 0.899 | 0.851 | 0.794
FID (lower is better) compared with CA and GLCIC under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | CA | 12.84 | 23.13 | 35.69 | 44.72 | 56.19
ImageNet | GLCIC | 15.09 | 32.75 | 46.88 | 55.61 | 69.55
ImageNet | Ours | 8.15 | 15.26 | 26.12 | 33.26 | 45.88
CelebA-HQ | CA | 3.24 | 6.87 | 12.83 | 20.01 | 38.14
CelebA-HQ | GLCIC | 4.24 | 10.91 | 18.76 | 29.33 | 47.37
CelebA-HQ | Ours | 1.69 | 2.36 | 4.58 | 6.12 | 8.19
Ablation study: PSNR (dB, higher is better) for Model-1 through Model-5 under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | Model-1 | 23.89 | 22.35 | 19.26 | 15.39 | 14.17
ImageNet | Model-2 | 26.61 | 24.49 | 21.13 | 18.44 | 15.39
ImageNet | Model-3 | 30.06 | 27.55 | 23.67 | 21.56 | 19.38
ImageNet | Model-4 | 33.15 | 28.69 | 26.44 | 22.99 | 20.18
ImageNet | Model-5 | 35.31 | 31.12 | 28.09 | 24.56 | 22.39
CelebA-HQ | Model-1 | 24.86 | 23.25 | 19.76 | 17.19 | 15.13
CelebA-HQ | Model-2 | 27.96 | 25.06 | 21.73 | 19.04 | 16.19
CelebA-HQ | Model-3 | 30.18 | 29.60 | 25.85 | 22.76 | 19.33
CelebA-HQ | Model-4 | 32.64 | 31.53 | 28.04 | 24.23 | 22.86
CelebA-HQ | Model-5 | 36.19 | 33.86 | 30.55 | 26.67 | 23.39
Ablation study: SSIM (higher is better) for Model-1 through Model-5 under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | Model-1 | 0.809 | 0.742 | 0.633 | 0.609 | 0.585
ImageNet | Model-2 | 0.850 | 0.793 | 0.704 | 0.654 | 0.629
ImageNet | Model-3 | 0.899 | 0.842 | 0.755 | 0.702 | 0.684
ImageNet | Model-4 | 0.933 | 0.899 | 0.811 | 0.758 | 0.735
ImageNet | Model-5 | 0.960 | 0.910 | 0.870 | 0.818 | 0.784
CelebA-HQ | Model-1 | 0.813 | 0.748 | 0.694 | 0.621 | 0.592
CelebA-HQ | Model-2 | 0.862 | 0.801 | 0.755 | 0.684 | 0.643
CelebA-HQ | Model-3 | 0.895 | 0.842 | 0.796 | 0.743 | 0.691
CelebA-HQ | Model-4 | 0.923 | 0.901 | 0.851 | 0.796 | 0.746
CelebA-HQ | Model-5 | 0.975 | 0.932 | 0.899 | 0.851 | 0.794
Ablation study: FID (lower is better) for Model-1 through Model-5 under different mask ratios.

Dataset | Method | 1–10% | 10–20% | 20–30% | 30–40% | 40–50%
---|---|---|---|---|---|---
ImageNet | Model-1 | 54.27 | 68.29 | 72.35 | 86.05 | 112.87
ImageNet | Model-2 | 39.27 | 52.33 | 66.19 | 74.77 | 93.15
ImageNet | Model-3 | 28.47 | 44.59 | 57.22 | 69.23 | 82.00
ImageNet | Model-4 | 15.14 | 31.66 | 41.69 | 58.19 | 73.61
ImageNet | Model-5 | 12.15 | 23.26 | 35.12 | 51.26 | 66.88
CelebA-HQ | Model-1 | 21.86 | 38.11 | 51.28 | 72.39 | 93.16
CelebA-HQ | Model-2 | 13.25 | 26.76 | 39.46 | 55.84 | 64.27
CelebA-HQ | Model-3 | 7.60 | 11.29 | 20.85 | 32.76 | 37.37
CelebA-HQ | Model-4 | 3.96 | 6.13 | 9.04 | 13.23 | 19.86
CelebA-HQ | Model-5 | 1.69 | 2.36 | 4.58 | 6.12 | 8.19
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).