GRPAFusion: A Gradient Residual and Pyramid Attention-Based Multiscale Network for Multimodal Image Fusion
Abstract
1. Introduction
- We propose an end-to-end multimodal image fusion framework that adaptively fuses information from different modalities without human intervention. Training of the fusion network is constrained by an adaptive pixel intensity loss and a maximum gradient loss, so that the fused image highlights the thermal-radiation information of the infrared image while effectively retaining the texture details of the visible image (a sketch of such losses is given after this list).
- The proposed fusion framework extracts multiscale features from the source images at low computational cost and without downsampling operations. Detail residuals at multiple granularities are combined in the encoder to represent the joint residual information, which effectively prevents the loss of detail features and vanishing gradients. A pyramid split attention module is introduced in the feature-fusion stage so that the network attends more closely to the thermal-radiation information and texture details of the source images, giving the fused image higher contrast (a simplified attention block is sketched after this list).
- Extensive ablation and comparative experiments were conducted on the TNO dataset. The ablation results show that the proposed fusion framework and training loss function are effective. The subjective and objective results of the comparative experiments show that the proposed multimodal image fusion framework achieves superior fusion performance compared with current state-of-the-art image fusion methods.
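To make the first contribution concrete, the following is a minimal PyTorch sketch of intensity and gradient losses of the kind described above. The adaptive weighting and exact formulation used by GRPAFusion are not reproduced here; pulling the fused image toward the per-pixel maximum intensity and maximum Sobel gradient is an assumed, commonly used formulation in infrared-visible fusion, and the weight `alpha` is only a placeholder.

```python
# Hedged sketch of an intensity + maximum-gradient training loss (not the paper's exact losses).
import torch
import torch.nn.functional as F

_SOBEL_X = torch.tensor([[-1., 0., 1.],
                         [-2., 0., 2.],
                         [-1., 0., 1.]]).view(1, 1, 3, 3)
_SOBEL_Y = _SOBEL_X.transpose(2, 3)

def gradient_magnitude(img):
    """L1 gradient magnitude from Sobel responses; img is N x 1 x H x W."""
    gx = F.conv2d(img, _SOBEL_X.to(img.device), padding=1)
    gy = F.conv2d(img, _SOBEL_Y.to(img.device), padding=1)
    return gx.abs() + gy.abs()

def intensity_loss(fused, ir, vis):
    """Pull the fused image toward the brighter source at each pixel,
    which tends to preserve infrared thermal targets (assumed formulation)."""
    return F.l1_loss(fused, torch.maximum(ir, vis))

def max_gradient_loss(fused, ir, vis):
    """Match the fused gradients to the stronger source gradient at each pixel,
    which tends to preserve visible-light texture (assumed formulation)."""
    target = torch.maximum(gradient_magnitude(ir), gradient_magnitude(vis))
    return F.l1_loss(gradient_magnitude(fused), target)

def total_loss(fused, ir, vis, alpha=10.0):
    # alpha is a placeholder trade-off weight, not the value used in the paper.
    return intensity_loss(fused, ir, vis) + alpha * max_gradient_loss(fused, ir, vis)
```

The second contribution relies on a pyramid split attention module (in the spirit of EPSANet) during feature fusion. The block below is a simplified sketch of such a module; the kernel sizes, reduction ratio, and channel split are illustrative and are not the exact GRPAFusion configuration.

```python
# Hedged sketch of a pyramid split attention block (simplified, illustrative sizes).
import torch
import torch.nn as nn

class PyramidSplitAttention(nn.Module):
    def __init__(self, channels, kernel_sizes=(3, 5, 7, 9), reduction=4):
        super().__init__()
        assert channels % len(kernel_sizes) == 0, "channels must split evenly across branches"
        self.split = channels // len(kernel_sizes)
        # One convolution per branch, each with a different receptive field.
        self.convs = nn.ModuleList([
            nn.Conv2d(self.split, self.split, k, padding=k // 2) for k in kernel_sizes
        ])
        # SE-style channel descriptors, one per branch.
        self.se = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(self.split, max(self.split // reduction, 1), 1),
                nn.ReLU(inplace=True),
                nn.Conv2d(max(self.split // reduction, 1), self.split, 1),
            ) for _ in kernel_sizes
        ])
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        groups = torch.split(x, self.split, dim=1)
        feats = [conv(g) for conv, g in zip(self.convs, groups)]  # multiscale branch features
        attns = [se(f) for se, f in zip(self.se, feats)]          # per-branch channel weights
        feats = torch.stack(feats, dim=1)                         # N x S x C/S x H x W
        attns = self.softmax(torch.stack(attns, dim=1))           # normalize attention across branches
        return (feats * attns).flatten(1, 2)                      # reweight and merge back to N x C x H x W
```

For example, `PyramidSplitAttention(64)(torch.randn(1, 64, 120, 160))` returns a tensor of the same shape, with each 16-channel branch reweighted by its softmax-normalized attention.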
2. Proposed Method
2.1. Network Architecture
2.2. Loss Function
2.3. Evaluation of the Fusion Results
3. Experiments and Discussion
3.1. Experimental Settings
3.1.1. Dataset
3.1.2. Training Details
3.1.3. Testing Details
3.2. Ablation Study
3.2.1. Experimental Validation of Parameter Sensitivity Analysis
3.2.2. Experimental Validation of Network Architecture
3.3. Comparative Experiment
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Han, J.; Bhanu, B. Fusion of Color and Infrared Video for Moving Human Detection. Pattern Recognit. 2007, 40, 1771–1784.
- Zhang, H.; Xu, H.; Tian, X.; Jiang, J.; Ma, J. Image Fusion Meets Deep Learning: A Survey and Perspective. Inf. Fusion 2021, 76, 323–336.
- Tu, Z.; Pan, W.; Duan, Y.; Tang, J.; Li, C. RGBT Tracking via Reliable Feature Configuration. Sci. China Inf. Sci. 2022, 65, 142101.
- Tang, L.; Yuan, J.; Ma, J. Image Fusion in the Loop of High-Level Vision Tasks: A Semantic-Aware Real-Time Infrared and Visible Image Fusion Network. Inf. Fusion 2022, 82, 28–42.
- Ha, Q.; Watanabe, K.; Karasawa, T.; Ushiku, Y.; Harada, T. MFNet: Towards Real-Time Semantic Segmentation for Autonomous Vehicles with Multi-Spectral Scenes. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 5108–5115.
- Bavirisetti, D.P.; Dhuli, R. Fusion of infrared and visible sensor images based on anisotropic diffusion and Karhunen-Loeve transform. IEEE Sensors J. 2015, 16, 203–209.
- Naidu, V. Image fusion technique using multi-resolution singular value decomposition. Def. Sci. J. 2011, 61, 479.
- Shreyamsha Kumar, B. Image fusion based on pixel significance using cross bilateral filter. Signal Image Video Process. 2015, 9, 1193–1204.
- Zhou, Z.; Dong, M.; Xie, X.; Gao, Z. Fusion of infrared and visible images for night-vision context enhancement. Appl. Opt. 2016, 55, 6480–6490.
- Li, S.; Kang, X.; Hu, J. Image fusion with guided filtering. IEEE Trans. Image Process. 2013, 22, 2864–2875.
- Liu, Y.; Liu, S.; Wang, Z. A General Framework for Image Fusion Based on Multi-Scale Transform and Sparse Representation. Inf. Fusion 2015, 24, 147–164.
- Liu, Y.; Chen, X.; Ward, R.K.; Wang, Z.J. Image Fusion With Convolutional Sparse Representation. IEEE Signal Process. Lett. 2016, 23, 1882–1886.
- Liu, C.H.; Qi, Y.; Ding, W.R. Infrared and Visible Image Fusion Method Based on Saliency Detection in Sparse Domain. Infrared Phys. Technol. 2017, 83, 94–102.
- Bavirisetti, D.P.; Dhuli, R. Two-scale image fusion of visible and infrared images using saliency detection. Infrared Phys. Technol. 2016, 76, 52–64.
- Ma, J.; Zhou, Z.; Wang, B.; Zong, H. Infrared and visible image fusion based on visual saliency map and weighted least square optimization. Infrared Phys. Technol. 2017, 82, 8–17.
- Han, J.; Pauwels, E.J.; De Zeeuw, P. Fast saliency-aware multi-modality image fusion. Neurocomputing 2013, 111, 70–80.
- Liu, Y.; Chen, X.; Cheng, J.; Peng, H.; Wang, Z. Infrared and Visible Image Fusion with Convolutional Neural Networks. Int. J. Wavelets Multiresolution Inf. Process. 2018, 16, 1850018.
- Li, H.; Wu, X.J.; Kittler, J. Infrared and Visible Image Fusion Using a Deep Learning Framework. In Proceedings of the 2018 24th International Conference on Pattern Recognition (ICPR), Beijing, China, 20–24 August 2018; pp. 2705–2710.
- Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A Generative Adversarial Network for Infrared and Visible Image Fusion. Inf. Fusion 2019, 48, 11–26.
- Li, H.; Wu, X.J. DenseFuse: A Fusion Approach to Infrared and Visible Images. IEEE Trans. Image Process. 2019, 28, 2614–2623.
- Li, J.; Liu, J.; Zhou, S.; Zhang, Q.; Kasabov, N.K. Infrared and Visible Image Fusion Based on Residual Dense Network and Gradient Loss. Infrared Phys. Technol. 2023, 128, 104486.
- Li, J.; Huo, H.; Li, C.; Wang, R.; Feng, Q. AttentionFGAN: Infrared and Visible Image Fusion Using Attention-Based Generative Adversarial Networks. IEEE Trans. Multimed. 2021, 23, 1383–1396.
- Ma, J.; Tang, L.; Fan, F.; Huang, J.; Mei, X.; Ma, Y. SwinFusion: Cross-Domain Long-Range Learning for General Image Fusion via Swin Transformer. IEEE/CAA J. Autom. Sin. 2022, 9, 1200–1217.
- Tang, L.; Deng, Y.; Ma, Y.; Huang, J.; Ma, J. SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness. IEEE/CAA J. Autom. Sin. 2022, 9, 2121–2137.
- Zhang, H.; Zu, K.; Lu, J.; Zou, Y.; Meng, D. EPSANet: An efficient pyramid squeeze attention block on convolutional neural network. arXiv 2021, arXiv:2105.14447.
- Gao, S.H.; Cheng, M.M.; Zhao, K.; Zhang, X.Y.; Yang, M.H.; Torr, P. Res2Net: A New Multi-Scale Backbone Architecture. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 652–662.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Roberts, J.W.; Van Aardt, J.A.; Ahmed, F.B. Assessment of image fusion procedures using entropy, image quality, and multispectral classification. J. Appl. Remote Sens. 2008, 2, 023522.
- Cui, G.; Feng, H.; Xu, Z.; Li, Q.; Chen, Y. Detail preserved fusion of visible and infrared images using regional saliency extraction and multi-scale image decomposition. Opt. Commun. 2015, 341, 199–209.
- Piella, G.; Heijmans, H. A New Quality Metric for Image Fusion. In Proceedings of the 2003 International Conference on Image Processing (Cat. No.03CH37429), Barcelona, Spain, 14–17 September 2003; Volume 3, p. III-173.
- Haghighat, M.; Razian, M.A. Fast-FMI: Non-reference image fusion metric. In Proceedings of the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT), Astana, Kazakhstan, 15–17 October 2014; pp. 1–3.
- Ma, K.; Zeng, K.; Wang, Z. Perceptual Quality Assessment for Multi-Exposure Image Fusion. IEEE Trans. Image Process. 2015, 24, 3345–3356.
- Toet, A. The TNO Multiband Image Data Collection. Data Brief 2017, 15, 249–251.
- Jia, X.; Zhu, C.; Li, M.; Tang, W.; Zhou, W. LLVIP: A visible-infrared paired dataset for low-light vision. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 19–25 June 2021; pp. 3496–3504.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A Unified Unsupervised Image Fusion Network. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 502–518.
Model | EN | SF | AG | | | | | MS-SSIM
---|---|---|---|---|---|---|---|---
Global Average Pool | 6.6457 | 0.0470 | 4.6667 | 0.4778 | 0.9061 | 0.2208 | 0.2597 | 0.9289
Global Level Map | 6.8160 | 0.0464 | 4.6735 | 0.4818 | 0.9080 | 0.2204 | 0.2658 | 0.8880
Model | EN | SF | AG | | | | | MS-SSIM
---|---|---|---|---|---|---|---|---
No-GradRes | 6.8677 | 0.0453 | 4.3747 | 0.4669 | 0.9048 | 0.2171 | 0.2744 | 0.8649
No-PSA | 6.8430 | 0.0460 | 4.5856 | 0.4756 | 0.9060 | 0.2153 | 0.2622 | 0.8730
Ours | 6.8160 | 0.0464 | 4.6735 | 0.4818 | 0.9080 | 0.2204 | 0.2658 | 0.8880
Method | EN | SF | AG | | | | | MS-SSIM
---|---|---|---|---|---|---|---|---
ADF | 6.2691 | 0.0345 | 3.5217 | 0.4127 | 0.8829 | 0.2275 | 0.2595 | 0.8760
MSVD | 6.6088 | 0.0370 | 3.2704 | 0.3061 | 0.8960 | 0.1750 | 0.2191 | 0.8933
DLF | 6.1855 | 0.0244 | 2.3587 | 0.2965 | 0.9033 | 0.1975 | 0.2251 | 0.8655
DenseFuse | 6.1700 | 0.0219 | 2.2100 | 0.2627 | 0.8965 | 0.1767 | 0.2155 | 0.8603
FusionGAN | 6.3600 | 0.0215 | 2.1173 | 0.1905 | 0.8889 | 0.1727 | 0.1957 | 0.7245
U2Fusion | 6.2493 | 0.0313 | 3.3398 | 0.3655 | 0.8896 | 0.2054 | 0.2373 | 0.8931
SuperFusion | 6.6180 | 0.0316 | 3.0343 | 0.4138 | 0.8961 | 0.2701 | 0.3031 | 0.8566
SwinFusion | 6.7007 | 0.0408 | 3.9628 | 0.4987 | 0.9102 | 0.2945 | 0.3267 | 0.8844
Ours | 6.8160 | 0.0464 | 4.6735 | 0.4818 | 0.9080 | 0.2204 | 0.2658 | 0.8880
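For reference, the EN (entropy), SF (spatial frequency), and AG (average gradient) columns reported in the tables above can be computed from a fused image as in the NumPy sketch below. These follow the standard definitions and assume 8-bit grey-scale inputs; the paper's exact normalization may differ (the small SF values in the tables, for instance, suggest intensities scaled to [0, 1] before measurement).

```python
# Standard-definition implementations of three fusion metrics (assumes 8-bit grey-scale arrays).
import numpy as np

def entropy(img, bins=256):
    """Shannon entropy (EN) of the grey-level histogram."""
    hist, _ = np.histogram(img, bins=bins, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img):
    """Spatial frequency (SF): RMS of vertical and horizontal first differences."""
    img = img.astype(np.float64)
    rf = np.mean(np.diff(img, axis=0) ** 2)
    cf = np.mean(np.diff(img, axis=1) ** 2)
    return float(np.sqrt(rf + cf))

def average_gradient(img):
    """Average gradient (AG): mean magnitude of local intensity change."""
    img = img.astype(np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]
    gy = img[1:, :-1] - img[:-1, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))
```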
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).