Multi-Step Structure Image Inpainting Model with Attention Mechanism
Abstract
1. Introduction
- (1) We propose a coarse-to-fine inpainting network in which the first stage reconstructs the structural information of the image and the second stage colors the reconstructed area. To address the difficulty of recovering badly damaged regions, we further propose a multi-step structure inpainting scheme.
- (2) We introduce a structural information attention module that improves the first stage's ability to reconstruct structural information.
- (3) Our proposed method outperforms existing methods on the benchmark datasets. It resolves the instability of two-stage models such as the gated convolution model [13] and the EdgeConnect model [8] in image structure reconstruction, yielding improved restoration results, and it also produces better inpainting results than MED [6], a single-stage restoration model.
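The paper summary above does not include implementation details of the structural information attention module from contribution (2). As a hedged sketch, such a module can be approximated by standard scaled dot-product self-attention over a flattened structure-feature map; every name and shape below is an illustrative assumption, not the authors' actual code.

```python
import numpy as np

def structural_attention(features):
    """Illustrative scaled dot-product self-attention over a feature map.

    features: array of shape (H*W, C) -- a flattened structure-feature map.
    Returns context-reweighted features of the same shape. This is a
    generic sketch of an attention mechanism, not the authors' module.
    """
    q = k = v = features                              # shared projection for simplicity
    d = features.shape[-1]
    scores = q @ k.T / np.sqrt(d)                     # pairwise similarities, (H*W, H*W)
    scores -= scores.max(axis=-1, keepdims=True)      # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)    # each row sums to 1
    return weights @ v                                # aggregate context per position

# Toy usage: a 4x4 feature map with 8 channels.
feat = np.random.default_rng(0).standard_normal((16, 8))
out = structural_attention(feat)
```

In a real inpainting network the query/key/value projections would be learned convolutions and the attention would typically be restricted to, or guided by, the masked region; the sketch only shows the core weighting step.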
2. Related Work
2.1. Traditional Mathematical Method
2.2. Image Inpainting Based on Deep Learning
3. Approach
3.1. Model
3.2. Multi-Step Structure Inpainting
3.3. Loss Function
3.3.1. Feature-Matching Loss
3.3.2. Reconstruction Loss
3.3.3. Adversarial Loss
3.3.4. Perceptual Loss
3.3.5. Style Loss
3.3.6. Joint Loss
3.3.7. Comparison of Loss Function
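Section 3.3 is listed here without formulas. In EdgeConnect-style two-stage inpainting models [8], the joint objective of Section 3.3.6 is conventionally a weighted sum of the terms in Sections 3.3.1–3.3.5, with the weights λ chosen as hyperparameters; this is the standard form of such a loss, not necessarily the authors' exact weighting:

```latex
\mathcal{L}_{joint} = \lambda_{rec}\,\mathcal{L}_{rec}
                    + \lambda_{adv}\,\mathcal{L}_{adv}
                    + \lambda_{perc}\,\mathcal{L}_{perc}
                    + \lambda_{style}\,\mathcal{L}_{style}
                    + \lambda_{fm}\,\mathcal{L}_{fm}
```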
4. Experiments
4.1. Visual Evaluations
4.2. Numerical Evaluations
5. Ablation Study
5.1. Multi-Step Structure Inpainting Model
5.2. Structural Attention Mechanism
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 2014.
- Yang, J.; Kannan, A.; Batra, D.; Parikh, D. Lr-gan: Layered recursive generative adversarial networks for image generation. arXiv 2017, arXiv:1703.01560.
- Xu, T.; Zhang, P.; Huang, Q.; Han, Z.; He, X. Attngan: Fine-grained text to image generation with attentional generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1316–1324.
- Johnson, J.; Gupta, A.; Li, F.-F. Image generation from scene graphs. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1219–1228.
- Yan, Z.; Li, X.; Li, M.; Zuo, W.; Shan, S. Shift-net: Image inpainting via deep feature rearrangement. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 1–17.
- Liu, H.; Jiang, B.; Song, Y.; Huang, W.; Yang, C. Rethinking image inpainting via a mutual encoder-decoder with feature equalizations. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 725–741.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T.S. Generative image inpainting with contextual attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 5505–5514.
- Nazeri, K.; Ng, E.; Joseph, T.; Qureshi, F.Z.; Ebrahimi, M. Edgeconnect: Generative image inpainting with adversarial edge learning. arXiv 2019, arXiv:1901.00212.
- Ribeiro, H.D.M.; Arnold, A.; Howard, J.P.; Shun-Shin, M.J.; Zhang, Y.; Francis, D.P.; Lim, P.B.; Whinnett, Z.; Zolgharni, M. ECG-based real-time arrhythmia monitoring using quantized deep neural networks: A feasibility study. Comput. Biol. Med. 2022, 143, 105249.
- Liu, Z.; Chen, Y.; Zhang, Y.; Ran, S.; Cheng, C.; Yang, G. Diagnosis of arrhythmias with few abnormal ECG samples using metric-based meta learning. Comput. Biol. Med. 2023, 153, 106465.
- Hu, R.; Chen, J.; Zhou, L. A transformer-based deep neural network for arrhythmia detection using continuous ECG signals. Comput. Biol. Med. 2022, 144, 105325.
- Chen, H.; Das, S.; Morgan, J.M.; Maharatna, K. Prediction and classification of ventricular arrhythmia based on phase-space reconstruction and fuzzy c-means clustering. Comput. Biol. Med. 2022, 142, 105180.
- Yu, J.; Lin, Z.; Yang, J.; Shen, X.; Lu, X.; Huang, T. Free-form image inpainting with gated convolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4471–4480.
- Wu, Z.; Shen, S.; Lian, X.; Su, X.; Chen, E. A dummy-based user privacy protection approach for text information retrieval. Knowl.-Based Syst. 2020, 195, 105679.
- Wu, Z.; Xuan, S.; Xie, J.; Lin, C.; Lu, C. How to ensure the confidentiality of electronic medical records on the cloud: A technical perspective. Comput. Biol. Med. 2022, 147, 105726.
- Wu, Z.; Shen, S.; Li, H.; Zhou, H.; Lu, C. A basic framework for privacy protection in personalized information retrieval: An effective framework for user privacy protection. J. Organ. End User Comput. (JOEUC) 2021, 33, 1–26.
- Wu, Z.; Li, G.; Shen, S.; Lian, X.; Chen, E.; Xu, G. Constructing dummy query sequences to protect location privacy and query privacy in location-based services. World Wide Web 2021, 24, 25–49.
- Wu, Z.; Shen, S.; Zhou, H.; Li, H.; Lu, C.; Zou, D. An effective approach for the protection of user commodity viewing privacy in e-commerce website. Knowl.-Based Syst. 2021, 220, 106952.
- Bertalmio, M.; Sapiro, G.; Caselles, V.; Ballester, C. Image inpainting. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, New Orleans, LA, USA, 23–28 July 2000; pp. 417–424.
- Shen, J.; Chan, T.F. Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math. 2002, 62, 1019–1043.
- Shen, J.; Kang, S.H.; Chan, T.F. Euler’s elastica and curvature-based inpainting. SIAM J. Appl. Math. 2003, 63, 564–592.
- Tsai, A.; Yezzi, A.; Willsky, A.S. Curve evolution implementation of the Mumford-Shah functional for image segmentation, denoising, interpolation, and magnification. IEEE Trans. Image Process. 2001, 10, 1169–1186.
- Drori, I.; Cohen-Or, D.; Yeshurun, H. Fragment-based image completion. In ACM SIGGRAPH 2003 Papers; Assoc Computing Machinery: New York, NY, USA, 2003; pp. 303–312.
- Criminisi, A.; Pérez, P.; Toyama, K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 2004, 13, 1200–1212.
- Wilczkowiak, M.; Brostow, G.J.; Tordoff, B.; Cipolla, R. Hole filling through photomontage. In Proceedings of the British Machine Vision Conference, BMVC 2005, Oxford, UK, 5–8 September 2005.
- Pathak, D.; Krahenbuhl, P.; Donahue, J.; Darrell, T.; Efros, A.A. Context encoders: Feature learning by inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2536–2544.
- Iizuka, S.; Simo-Serra, E.; Ishikawa, H. Globally and locally consistent image completion. ACM Trans. Graph. (ToG) 2017, 36, 1–14.
- Yang, C.; Lu, X.; Lin, Z.; Shechtman, E.; Wang, O.; Li, H. High-resolution image inpainting using multi-scale neural patch synthesis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 6721–6729.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein generative adversarial networks. In International Conference on Machine Learning; PMLR: London, UK, 2017; pp. 214–223.
- Liu, G.; Reda, F.A.; Shih, K.J.; Wang, T.C.; Tao, A.; Catanzaro, B. Image inpainting for irregular holes using partial convolutions. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 85–100.
- Liu, H.; Jiang, B.; Xiao, Y.; Yang, C. Coherent semantic attention for image inpainting. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4170–4179.
- Zhu, M.; He, D.; Li, X.; Chao, L.; Zhang, Z. Image inpainting by end-to-end cascaded refinement with mask awareness. IEEE Trans. Image Process. 2021, 30, 4855–4866.
- Ren, Y.; Ren, H.; Shi, C.; Zhang, X.; Wu, X.; Li, X.; Mumtaz, I. Multistage semantic-aware image inpainting with stacked generator networks. Int. J. Intell. Syst. 2022, 37, 1599–1617.
- Gilbert, A.; Collomosse, J.; Jin, H.; Price, B. Disentangling structure and aesthetics for style-aware image completion. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 1848–1856.
- Wang, Y.; Tao, X.; Qi, X.; Shen, X.; Jia, J. Image inpainting via generative multi-column convolutional neural networks. arXiv 2018, arXiv:1810.08771.
- Vo, H.V.; Duong, N.Q.K.; Pérez, P. Structural inpainting. In Proceedings of the 26th ACM International Conference on Multimedia, Seoul, Republic of Korea, 22–26 October 2018; pp. 1948–1956.
- Abbas Hedjazi, M.; Genc, Y. Learning to inpaint by progressively growing the mask regions. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019.
- Yu, T.; Guo, Z.; Jin, X.; Wu, S.; Chen, Z.; Li, W. Region normalization for image inpainting. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 12733–12740.
- Wang, Y.; Chen, Y.C.; Tao, X.; Jia, J. Vcnet: A robust approach to blind image inpainting. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 752–768.
- Sun, Q.; Ma, L.; Oh, S.J.; Van Gool, L. Natural and effective obfuscation by head inpainting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 5050–5059.
- Song, Y.; Yang, C.; Lin, Z.; Liu, X.; Huang, Q.; Li, H.; Kuo, C. Contextual-based image inpainting: Infer, match, and translate. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19.
- Wang, T.; Ouyang, H.; Chen, Q. Image inpainting with external-internal learning and monochromic bottleneck. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 5120–5129.
- Zeng, Y.; Lin, Z.; Lu, H.; Patel, V.M. Cr-fill: Generative image inpainting with auxiliary contextual reconstruction. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 19–25 June 2021; pp. 14164–14173.
- Guo, X.; Yang, H.; Huang, D. Image inpainting via conditional texture and structure dual generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 19–25 June 2021; pp. 14134–14143.
- Yu, J.; Li, K.; Peng, J. Reference-guided face inpainting with reference attention network. Neural Comput. Appl. 2022, 34, 9717–9731.
- Li, L.; Chen, M.; Shi, H.; Duan, Z.; Xiong, X. Multiscale Structure and Texture Feature Fusion for Image Inpainting. IEEE Access 2022, 10, 82668–82679.
- Johnson, J.; Alahi, A.; Li, F.-F. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 694–711.
- Zhou, B.; Lapedriza, A.; Khosla, A.; Oliva, A.; Torralba, A. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 40, 1452–1464.
- Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
| Metric | Mask | GC | EC | MED | Ours |
|---|---|---|---|---|---|
| PSNR↑ | 10∼20 | 26.580 | 28.230 | 28.909 | 29.269 |
| PSNR↑ | 20∼30 | 22.551 | 25.115 | 25.235 | 26.314 |
| PSNR↑ | 30∼40 | 16.485 | 18.520 | 19.936 | 20.260 |
| PSNR↑ | 40∼50 | 15.130 | 17.433 | 17.852 | 18.312 |
| SSIM↑ | 10∼20 | 0.909 | 0.934 | 0.941 | 0.946 |
| SSIM↑ | 20∼30 | 0.754 | 0.881 | 0.893 | 0.903 |
| SSIM↑ | 30∼40 | 0.652 | 0.706 | 0.719 | 0.731 |
| SSIM↑ | 40∼50 | 0.560 | 0.679 | 0.698 | 0.708 |
| FID↓ | 10∼20 | 24.751 | 22.947 | 22.022 | 18.465 |
| FID↓ | 20∼30 | 32.259 | 31.518 | 29.063 | 26.973 |
| FID↓ | 30∼40 | 46.207 | 45.170 | 42.570 | 38.585 |
| FID↓ | 40∼50 | 62.524 | 59.960 | 59.179 | 54.080 |
| Metric | Mask | GC | EC | MED | Ours |
|---|---|---|---|---|---|
| PSNR↑ | 10∼20 | 27.979 | 30.473 | 30.585 | 30.946 |
| PSNR↑ | 20∼30 | 20.459 | 23.179 | 27.579 | 27.614 |
| PSNR↑ | 30∼40 | 17.658 | 19.973 | 20.467 | 21.056 |
| PSNR↑ | 40∼50 | 16.209 | 18.216 | 19.243 | 19.420 |
| SSIM↑ | 10∼20 | 0.742 | 0.907 | 0.926 | 0.934 |
| SSIM↑ | 20∼30 | 0.694 | 0.862 | 0.883 | 0.892 |
| SSIM↑ | 30∼40 | 0.607 | 0.744 | 0.765 | 0.774 |
| SSIM↑ | 40∼50 | 0.554 | 0.628 | 0.733 | 0.739 |
| FID↓ | 10∼20 | 20.580 | 19.159 | 19.491 | 16.264 |
| FID↓ | 20∼30 | 31.472 | 29.738 | 27.071 | 26.207 |
| FID↓ | 30∼40 | 50.485 | 47.652 | 43.357 | 43.067 |
| FID↓ | 40∼50 | 62.975 | 62.721 | 58.946 | 57.700 |
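For reference, the PSNR↑ values reported in the tables above follow the standard peak signal-to-noise ratio definition (higher is better). The helper below is a generic sketch of that definition, not the authors' evaluation code; the 8-bit peak value of 255 is the usual assumption.

```python
import numpy as np

def psnr(img_a, img_b, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((img_a.astype(np.float64) - img_b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy example: a constant error of 10 gray levels gives MSE = 100.
a = np.zeros((32, 32))
b = np.full((32, 32), 10.0)
print(round(psnr(a, b), 2))  # prints 28.13
```

SSIM and FID are computed from local image statistics and deep-feature distributions respectively and are usually taken from library implementations rather than reimplemented.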
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Ran, C.; Li, X.; Yang, F. Multi-Step Structure Image Inpainting Model with Attention Mechanism. Sensors 2023, 23, 2316. https://doi.org/10.3390/s23042316