DPACFuse: Dual-Branch Progressive Learning for Infrared and Visible Image Fusion with Complementary Self-Attention and Convolution
Abstract
1. Introduction
- We propose a dual-branch progressive image fusion framework for infrared and visible image fusion (IVF), based on complementary self-attention and convolution, which takes both the global and local information of the source images into account to achieve better feature fusion; a simplified sketch of such a dual-branch block follows this list.
- We propose a Cross-Modality Feature Extraction (CMFE) module to enhance information interaction between modalities while suppressing the loss of high-frequency information.
- Extensive experiments on the public image fusion datasets MSRS, RoadScene, and TNO demonstrate that our method outperforms current general fusion frameworks. We also investigate how our fused images facilitate object detection and semantic segmentation.
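To make the dual-branch idea concrete, below is a minimal, illustrative PyTorch sketch of a block that runs a convolution branch (local detail) and a self-attention branch (global context) in parallel and sums them with learnable weights. It is a simplification in the spirit of ACmix, not the authors' actual DPACFuse implementation; the `ConvAttnBlock` name, layer sizes, and the mixing parameters `alpha`/`beta` are assumptions for illustration only.

```python
# Illustrative sketch only: a simplified dual-branch block that combines a
# convolution branch and a self-attention branch. NOT the authors' exact
# DPACFuse/ACmix code; sizes and mixing weights are assumptions.
import torch
import torch.nn as nn

class ConvAttnBlock(nn.Module):
    def __init__(self, channels: int, num_heads: int = 4):
        super().__init__()
        # Local branch: depth-wise + point-wise convolution for local detail.
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
        )
        # Global branch: multi-head self-attention over flattened spatial positions.
        self.attn = nn.MultiheadAttention(channels, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)
        # Learnable scalars weighting the two branches before summation.
        self.alpha = nn.Parameter(torch.ones(1))
        self.beta = nn.Parameter(torch.ones(1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        local = self.conv(x)
        tokens = self.norm(x.flatten(2).transpose(1, 2))   # (B, H*W, C)
        global_, _ = self.attn(tokens, tokens, tokens)      # self-attention
        global_ = global_.transpose(1, 2).reshape(b, c, h, w)
        return self.alpha * local + self.beta * global_

# Toy usage on infrared/visible feature maps (naive sum purely for illustration).
if __name__ == "__main__":
    ir = torch.randn(1, 32, 64, 64)
    vi = torch.randn(1, 32, 64, 64)
    fused = ConvAttnBlock(32)(ir + vi)
    print(fused.shape)  # torch.Size([1, 32, 64, 64])
```

In a framework like DPACFuse, blocks of this kind would typically be applied progressively to features extracted from both modalities before the fused image is decoded.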
2. Related Work
2.1. CNN-Based and AE-Based Fusion Methods
2.2. GAN-Based Fusion Methods
2.3. Transformer-Based Fusion Methods
3. Methodology
3.1. Network Architecture
3.2. Specific Framework of ACmix
3.3. Specific Framework of CMFE
3.4. Loss Function
4. Experiments
4.1. Experimental Configurations
4.2. Comparative Experiment
4.2.1. Qualitative Results
4.2.2. Quantitative Results
4.3. Generalization Experiment
4.3.1. Results of RoadScene
4.3.2. Results of TNO
4.4. Ablation Study
4.5. Downstream IVF Applications
4.5.1. Object Detection Performance
4.5.2. Semantic Segmentation Performance
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Tang, L.; Zhang, H.; Xu, H.; Ma, J. Deep learning-based image fusion: A survey. J. Image Graph. 2023, 28, 3–36.
2. Wang, J.; Liu, A.; Yin, Z.; Liu, S.; Tang, S.; Liu, X. Dual Attention Suppression Attack: Generate Adversarial Camouflage in Physical World. arXiv 2021, arXiv:2103.01050.
3. Liu, A.; Liu, X.; Yu, H.; Zhang, C.; Liu, Q.; Tao, D. Training Robust Deep Neural Networks via Adversarial Noise Propagation. IEEE Trans. Image Process. 2021, 30, 5769–5781.
4. Zeng, Y.; Zhang, D.; Wang, C.; Miao, Z.; Liu, T.; Zhan, X.; Hao, D.; Ma, C. LIFT: Learning 4D LiDAR Image Fusion Transformer for 3D Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 17172–17181.
5. Pan, X.; Cheng, J.; Hou, F.; Lan, R.; Lu, C.; Li, L.; Feng, Z.; Wang, H.; Liang, C.; Liu, Z.; et al. SMILE: Cost-sensitive multi-task learning for nuclear segmentation and classification with imbalanced annotations. Med. Image Anal. 2023, 88, 102867.
6. Jin, C.; Luo, C.; Yan, M.; Zhao, G.; Zhang, G.; Zhang, S. Weakening the Dominant Role of Text: CMOSI Dataset and Multimodal Semantic Enhancement Network. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15.
7. Qin, H.; Ding, Y.; Zhang, M.; Yan, Q.; Liu, A.; Dang, Q.; Liu, Z.; Liu, X. BiBERT: Accurate Fully Binarized BERT. arXiv 2022, arXiv:2203.06390.
8. Qin, H.; Zhang, X.; Gong, R.; Ding, Y.; Xu, Y.; Liu, X. Distribution-sensitive Information Retention for Accurate Binary Neural Network. arXiv 2022, arXiv:2109.12338.
9. Yan, M.; Lou, X.; Chan, C.A.; Wang, Y.; Jiang, W. A semantic and emotion-based dual latent variable generation model for a dialogue system. CAAI Trans. Intell. Technol. 2023, 8, 319–330.
10. Wang, Z.; Feng, J.; Zhang, Y. Pedestrian detection in infrared image based on depth transfer learning. Multimed. Tools Appl. 2022, 81, 39655–39674.
11. Zhang, J.; Liu, C.; Wang, B.; Chen, C.; He, J.; Zhou, Y.; Li, J. An infrared pedestrian detection method based on segmentation and domain adaptation learning. Comput. Electr. Eng. 2022, 99, 107781.
12. Liu, J.; Fan, X.; Huang, Z.; Wu, G.; Liu, R.; Zhong, W.; Luo, Z. Target-aware Dual Adversarial Learning and a Multi-scenario Multi-Modality Benchmark to Fuse Infrared and Visible for Object Detection. arXiv 2022, arXiv:2203.16220.
13. Ma, J.; Tang, L.; Fan, F.; Huang, J.; Mei, X.; Ma, Y. SwinFusion: Cross-domain Long-range Learning for General Image Fusion via Swin Transformer. IEEE/CAA J. Autom. Sin. 2022, 9, 1200–1217.
14. Tang, L.; Deng, Y.; Ma, Y.; Huang, J.; Ma, J. SuperFusion: A Versatile Image Registration and Fusion Network with Semantic Awareness. IEEE/CAA J. Autom. Sin. 2022, 9, 2121–2137.
15. Wang, Z.; Chen, Y.; Shao, W.; Li, H.; Zhang, L. SwinFuse: A Residual Swin Transformer Fusion Network for Infrared and Visible Images. arXiv 2022, arXiv:2204.11436.
16. Zhang, H.; Ma, J. SDNet: A Versatile Squeeze-and-Decomposition Network for Real-Time Image Fusion. Int. J. Comput. Vis. 2021, 129, 2761–2785.
17. Rao, Y.; Wu, D.; Han, M.; Wang, T.; Yang, Y.; Lei, T.; Zhou, C.; Bai, H.; Xing, L. AT-GAN: A generative adversarial network with attention and transition for infrared and visible image fusion. Inf. Fusion 2023, 92, 336–349.
18. Li, H.; Wu, X.J. DenseFuse: A Fusion Approach to Infrared and Visible Images. IEEE Trans. Image Process. 2019, 28, 2614–2623.
19. Li, H.; Wu, X.J.; Kittler, J. RFN-Nest: An end-to-end residual fusion network for infrared and visible images. Inf. Fusion 2021, 73, 72–86.
20. Li, H.; Wu, X.J.; Durrani, T. NestFuse: An Infrared and Visible Image Fusion Architecture Based on Nest Connection and Spatial/Channel Attention Models. IEEE Trans. Instrum. Meas. 2020, 69, 9645–9656.
21. Tang, L.; Yuan, J.; Ma, J. Image fusion in the loop of high-level vision tasks: A semantic-aware real-time infrared and visible image fusion network. Inf. Fusion 2022, 82, 28–42.
22. Zhang, Y.; Liu, Y.; Sun, P.; Yan, H.; Zhao, X.; Zhang, L. IFCNN: A general image fusion framework based on convolutional neural network. Inf. Fusion 2020, 54, 99–118.
23. Ma, J.; Tang, L.; Xu, M.; Zhang, H.; Xiao, G. STDFusionNet: An Infrared and Visible Image Fusion Network Based on Salient Target Detection. IEEE Trans. Instrum. Meas. 2021, 70, 5009513.
24. Zhao, Z.; Xu, S.; Zhang, J.; Liang, C.; Zhang, C.; Liu, J. Efficient and Model-Based Infrared and Visible Image Fusion via Algorithm Unrolling. IEEE Trans. Circuits Syst. Video Technol. 2022, 32, 1186–1196.
25. Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Gool, L.V.; Timofte, R. SwinIR: Image Restoration Using Swin Transformer. arXiv 2021, arXiv:2108.10257.
26. Qu, L.; Liu, S.; Wang, M.; Song, Z. TransMEF: A Transformer-Based Multi-Exposure Image Fusion Framework using Self-Supervised Multi-Task Learning. arXiv 2021, arXiv:2112.01030.
27. Pan, X.; Ge, C.; Lu, R.; Song, S.; Chen, G.; Huang, Z.; Huang, G. On the Integration of Self-Attention and Convolution. arXiv 2022, arXiv:2111.14556.
28. Tang, L.; Yuan, J.; Zhang, H.; Jiang, X.; Ma, J. PIAFusion: A progressive infrared and visible image fusion network based on illumination aware. Inf. Fusion 2022, 83–84, 79–92.
29. Wang, Z.; Wu, Y.; Wang, J.; Xu, J.; Shao, W. Res2Fusion: Infrared and Visible Image Fusion Based on Dense Res2net and Double Nonlocal Attention Models. IEEE Trans. Instrum. Meas. 2022, 71, 5005012.
30. Zhang, H.; Xu, H.; Xiao, Y.; Guo, X.; Ma, J. Rethinking the Image Fusion: A Fast Unified Image Fusion Network based on Proportional Maintenance of Gradient and Intensity. Proc. AAAI Conf. Artif. Intell. 2020, 34, 12797–12804.
31. Xu, H.; Ma, J.; Jiang, J.; Guo, X.; Ling, H. U2Fusion: A Unified Unsupervised Image Fusion Network. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 502–518.
32. Ma, J.; Yu, W.; Liang, P.; Li, C.; Jiang, J. FusionGAN: A generative adversarial network for infrared and visible image fusion. Inf. Fusion 2019, 48, 11–26.
33. Ma, J.; Xu, H.; Jiang, J.; Mei, X.; Zhang, X.P. DDcGAN: A Dual-Discriminator Conditional Generative Adversarial Network for Multi-Resolution Image Fusion. IEEE Trans. Image Process. 2020, 29, 4980–4995.
34. Park, S.; Choi, D.H.; Kim, J.U.; Ro, Y.M. Robust thermal infrared pedestrian detection by associating visible pedestrian knowledge. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 4468–4472.
35. Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for Infrared Small Object Detection. IEEE Trans. Image Process. 2023, 32, 364–376.
36. Wang, A.; Li, W.; Wu, X.; Huang, Z.; Tao, R. Mpanet: Multi-Patch Attention for Infrared Small Target Object Detection. In Proceedings of the IGARSS 2022—2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; pp. 3095–3098.
37. Sun, Y.; Cao, B.; Zhu, P.; Hu, Q. DetFusion: A Detection-Driven Infrared and Visible Image Fusion Network. In Proceedings of the MM’22: 30th ACM International Conference on Multimedia, New York, NY, USA, 4–7 July 2022; pp. 4003–4011.
38. Zhao, W.; Xie, S.; Zhao, F.; He, Y.; Lu, H. MetaFusion: Infrared and Visible Image Fusion via Meta-Feature Embedding From Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 13955–13965.
39. Wang, D.; Liu, J.; Liu, R.; Fan, X. An interactively reinforced paradigm for joint infrared-visible image fusion and saliency object detection. Inf. Fusion 2023, 98, 101828.
40. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
41. Zhu, X.; Su, W.; Lu, L.; Li, B.; Wang, X.; Dai, J. Deformable DETR: Deformable Transformers for End-to-End Object Detection. arXiv 2021, arXiv:2010.04159.
42. Zhou, M.; Yan, K.; Huang, J.; Yang, Z.; Fu, X.; Zhao, F. Mutual Information-Driven Pan-Sharpening. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 1798–1808.
43. Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Lu, T.; Luo, P.; Shao, L. Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction Without Convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 568–578.
44. Zhao, H.; Nie, R. DNDT: Infrared and Visible Image Fusion Via DenseNet and Dual-Transformer. In Proceedings of the 2021 International Conference on Information Technology and Biomedical Engineering (ICITBE), Nanchang, China, 24–26 December 2021; pp. 71–75.
45. Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin Transformer: Hierarchical Vision Transformer using Shifted Windows. arXiv 2021, arXiv:2103.14030.
46. Rao, D.; Wu, X.; Xu, T. TGFuse: An Infrared and Visible Image Fusion Approach Based on Transformer and Generative Adversarial Network. arXiv 2022, arXiv:2201.10147.
47. Li, J.; Zhu, J.; Li, C.; Chen, X.; Yang, B. CGTF: Convolution-Guided Transformer for Infrared and Visible Image Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 5012314.
48. Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018.
49. Toet, A. TNO Image Fusion Dataset. Figshare Data. 2014. Available online: https://figshare.com/articles/dataset/TNO_Image_Fusion_Dataset/1008029 (accessed on 1 January 2023).
50. Ma, J.; Ma, Y.; Li, C. Infrared and visible image fusion methods and applications: A survey. Inf. Fusion 2019, 45, 153–178.
51. Ma, J.; Zhang, H.; Shao, Z.; Liang, P.; Xu, H. GANMcC: A Generative Adversarial Network with Multiclassification Constraints for Infrared and Visible Image Fusion. IEEE Trans. Instrum. Meas. 2021, 70, 5005014.
52. Reis, D.; Kupec, J.; Hong, J.; Daoudi, A. Real-Time Flying Object Detection with YOLOv8. arXiv 2023, arXiv:2305.09972.
53. Padilla, R.; Passos, W.L.; Dias, T.L.B.; Netto, S.L.; da Silva, E.A.B. A Comparative Analysis of Object Detection Metrics with a Companion Open-Source Toolkit. Electronics 2021, 10, 279.
54. Peng, C.; Tian, T.; Chen, C.; Guo, X.; Ma, J. Bilateral attention decoder: A lightweight decoder for real-time semantic segmentation. Neural Netw. 2021, 137, 188–199.
| Metric | M1 | M2 | M3 | M4 | Ours |
|---|---|---|---|---|---|
| EN | 6.6125 | 6.6146 | 6.6437 | 6.6017 | 6.6571 |
| Qabf | 0.6912 | 0.6820 | 0.6635 | 0.6507 | 0.7005 |
| SF | 11.2660 | 11.1202 | 10.9875 | 11.0457 | 11.3704 |
| MI | 4.5540 | 4.6525 | 4.4937 | 4.2439 | 4.7669 |
| AG | 3.6181 | 3.5809 | 3.3374 | 3.5419 | 3.7201 |
| FMI_pixel | 0.9289 | 0.9291 | 0.9277 | 0.9262 | 0.9314 |
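For readers unfamiliar with the metrics above, the following NumPy sketch shows one common way to compute EN (entropy), SF (spatial frequency), and AG (average gradient); exact definitions and normalizations vary slightly across papers, so this is a reference approximation rather than the evaluation code used here.

```python
# Rough reference implementations of three fusion metrics (common conventions,
# not necessarily identical to the paper's evaluation scripts).
import numpy as np

def entropy(img: np.ndarray) -> float:
    """EN: Shannon entropy of an 8-bit grayscale image."""
    hist, _ = np.histogram(img, bins=256, range=(0, 256))
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

def spatial_frequency(img: np.ndarray) -> float:
    """SF: sqrt(row-frequency^2 + column-frequency^2) from neighbor differences."""
    img = img.astype(np.float64)
    rf = np.sqrt(np.mean((img[:, 1:] - img[:, :-1]) ** 2))
    cf = np.sqrt(np.mean((img[1:, :] - img[:-1, :]) ** 2))
    return float(np.sqrt(rf ** 2 + cf ** 2))

def average_gradient(img: np.ndarray) -> float:
    """AG: mean magnitude of local horizontal/vertical intensity gradients."""
    img = img.astype(np.float64)
    gx = img[:-1, 1:] - img[:-1, :-1]
    gy = img[1:, :-1] - img[:-1, :-1]
    return float(np.mean(np.sqrt((gx ** 2 + gy ** 2) / 2)))

fused = (np.random.rand(256, 256) * 255).astype(np.uint8)  # placeholder image
print(entropy(fused), spatial_frequency(fused), average_gradient(fused))
```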
| Method | [email protected] | | | [email protected] | | | [email protected] | | |
|---|---|---|---|---|---|---|---|---|---|
| | Person | Car | All | Person | Car | All | Person | Car | All |
| IR | 0.7891 | 0.5491 | 0.6691 | 0.7394 | 0.4787 | 0.6091 | 0.1961 | 0.1946 | 0.1953 |
| VI | 0.5478 | 0.7660 | 0.6569 | 0.3873 | 0.7108 | 0.5491 | 0.0359 | 0.3462 | 0.1911 |
| DenseFuse [18] | 0.7731 | 0.8079 | 0.7905 | 0.7296 | 0.7713 | 0.7505 | 0.1578 | 0.4583 | 0.3081 |
| IFCNN [22] | 0.7862 | 0.7768 | 0.7815 | 0.7316 | 0.7246 | 0.7281 | 0.1620 | 0.4118 | 0.2869 |
| U2Fusion [31] | 0.7823 | 0.7950 | 0.7937 | 0.7347 | 0.7724 | 0.7536 | 0.1599 | 0.4053 | 0.2826 |
| SDNet [16] | 0.7523 | 0.7649 | 0.7586 | 0.6519 | 0.7315 | 0.6917 | 0.1043 | 0.3826 | 0.2434 |
| GANMcC [51] | 0.7657 | 0.8132 | 0.7895 | 0.7250 | 0.7658 | 0.7454 | 0.1834 | 0.4415 | 0.3129 |
| SwinFusion [13] | 0.7699 | 0.7980 | 0.7840 | 0.6969 | 0.7499 | 0.7234 | 0.1183 | 0.3867 | 0.2525 |
| TarDAL [12] | 0.7897 | 0.7746 | 0.7822 | 0.7083 | 0.7246 | 0.7165 | 0.1646 | 0.4070 | 0.2858 |
| DPACFuse | 0.8034 | 0.8210 | 0.8122 | 0.7462 | 0.7753 | 0.7607 | 0.1862 | 0.4312 | 0.3087 |
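The detection results above are per-class average precision (AP) scores at fixed IoU thresholds, with class-averaged values in the "All" columns. As a reference, a single-class AP at one threshold can be computed from scored, matched detections roughly as follows (all-point interpolation; the function name and the assumption that detections are already matched to ground truth are illustrative, not the paper's evaluation pipeline).

```python
# Sketch of VOC-style average precision from scored detections.
import numpy as np

def average_precision(scores, is_tp, num_gt):
    """scores: confidence per detection; is_tp: 1 if the detection matches an
    unused ground-truth box at the chosen IoU threshold, else 0; num_gt:
    number of ground-truth boxes for this class."""
    order = np.argsort(-np.asarray(scores))
    tp = np.asarray(is_tp, dtype=np.float64)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(num_gt, 1)
    precision = tp_cum / np.maximum(tp_cum + fp_cum, 1e-12)
    # All-point interpolation: make precision monotonically non-increasing,
    # then integrate precision over recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))

# mAP is then the mean of per-class APs (e.g., over Person and Car).
print(average_precision([0.9, 0.8, 0.6], [1, 0, 1], num_gt=2))
```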
| Method | Background | Car | Person | Bike | Curve | Car Stop | Guardrail | Color Cone | Bump | mIoU |
|---|---|---|---|---|---|---|---|---|---|---|
| IR | 97.96 | 85.69 | 71.27 | 65.46 | 52.52 | 54.48 | 27.59 | 54.92 | 60.11 | 63.33 |
| VI | 97.69 | 83.07 | 54.67 | 66.27 | 51.97 | 54.94 | 59.69 | 50.35 | 66.18 | 64.98 |
| DenseFuse [18] | 98.36 | 89.48 | 73.47 | 69.84 | 57.76 | 63.56 | 65.07 | 62.34 | 66.00 | 71.76 |
| IFCNN [22] | 98.38 | 89.54 | 72.25 | 70.15 | 56.85 | 64.08 | 54.43 | 63.35 | 71.92 | 71.22 |
| U2Fusion [31] | 98.27 | 88.08 | 73.42 | 69.39 | 57.85 | 62.76 | 53.03 | 59.70 | 69.75 | 69.75 |
| SDNet [16] | 98.35 | 89.39 | 74.31 | 69.31 | 57.56 | 61.63 | 49.53 | 60.76 | 71.46 | 70.26 |
| GANMcC [51] | 98.34 | 88.85 | 73.68 | 69.67 | 56.75 | 65.17 | 57.06 | 61.50 | 71.72 | 71.41 |
| SwinFusion [13] | 98.25 | 88.08 | 70.74 | 68.66 | 74.31 | 61.70 | 67.34 | 64.06 | 67.86 | 71.67 |
| TarDAL [12] | 98.30 | 89.11 | 72.67 | 68.96 | 57.00 | 62.34 | 52.43 | 60.79 | 61.41 | 69.22 |
| DPACFuse | 98.60 | 90.38 | 74.58 | 71.94 | 65.39 | 74.44 | 84.66 | 66.03 | 77.18 | 78.13 |
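The segmentation scores above are per-class IoU values (%) with the mean IoU (mIoU) in the last column. A common way to compute them from predicted and ground-truth label maps is via a confusion matrix, sketched below under assumed conventions (class ordering and ignore-label handling may differ from the paper's evaluation code).

```python
# Sketch of per-class IoU and mIoU from a confusion matrix.
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def iou_per_class(cm):
    tp = np.diag(cm)
    union = cm.sum(0) + cm.sum(1) - tp
    return tp / np.maximum(union, 1)

# Example with 9 classes (Background, Car, Person, Bike, Curve, Car Stop,
# Guardrail, Color Cone, Bump) on randomly generated label maps.
pred = np.random.randint(0, 9, size=(480, 640))
gt = np.random.randint(0, 9, size=(480, 640))
cm = confusion_matrix(pred, gt, 9)
print(iou_per_class(cm), iou_per_class(cm).mean())  # per-class IoU and mIoU
```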
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).