Improved Thermal Infrared Image Super-Resolution Reconstruction Method Based on Multimodal Sensor Fusion
Abstract
1. Introduction
2. Related Work
2.1. Image Super-Resolution Reconstruction Based on Neural Networks
2.2. Multimodal Reference-Based Super-Resolution Reconstruction
3. Proposed Method
3.1. Network Architecture
3.1.1. Primary Feature Encoding Subnetwork
3.1.2. Super-Resolution Reconstruction Subnetwork
- Feature Extraction Module (FEM): We employed the same FEM structure to process both the infrared features and the visible light features, since the visible light features had already been adjusted to a feature space matching the infrared features during primary feature encoding. According to existing research, batch normalization layers can destroy the original contrast of images in reconstruction tasks, so we removed all batch normalization layers from the network to improve reconstruction performance, eliminate redundant operations, and increase training and inference speed. Our FEM consists of a series of improved Hierarchical Dilated Distillation Modules (HDDM), as presented in Figure 3a. We designed a multi-scale distillation fusion mechanism in which the visible or infrared input feature map passes sequentially through filters with different dilation rates, separating different frequency components of the image features and enhancing the representational capacity of the network. After each filter, the feature map is split into two equal parts along the channel dimension: one part is passed directly to the subsequent fusion step, while the other part continues through further feature extraction. We also utilized a 1-D convolution to process the compressed channel information, inspired by previous literature [38]; this reduces computational and parameter complexity. We not only avoided the dimensionality-reduction operation of conventional channel attention, but also expanded the channel dimension to form multiple subspaces carrying different aspects of information, and by combining features across these subspaces we achieved more flexible information interaction between channels. Inspired by previous work such as EDSR, we introduced local residual learning into the FEM, which effectively alleviates the potential gradient-vanishing problem during parameter optimization and makes it possible to construct a deeper super-resolution reconstruction network. To allow a point-wise addition between the output and input feature maps, a filter with a 1 × 1 convolution kernel is placed at the end of the module; it fuses the previously extracted multi-scale information and matches the number of channels to the input feature map. A code-level sketch of this split-and-distill design is given after this list.
- Cross-Attention Transformation Module (CATM): To guide infrared image SR with visible features, we constructed a Cross-Attention Transformation Module that estimates an attention map of relevant information from the input visible light features and transfers the useful information. The structure of the CATM is shown in Figure 4. The module takes as input the infrared and visible feature maps obtained by the feature extraction modules from the primary infrared and visible light features, respectively. The two feature maps are concatenated into a single tensor and fed into the attention branch. Unlike previous attention mechanisms, we did not restrict the attention estimate to the channel or spatial dimension, but instead constructed a pixel-level attention mechanism. First, the concatenated tensor is filtered by a 3 × 3 convolution kernel to extract the effective features, and the channel number of the feature map is compressed by a compression ratio (set to 4 because of the performance limitations of our server) to improve the computational efficiency of attention-map estimation. The number of channels is then restored by another 3 × 3 convolution kernel, and the attention map is reconstructed from the effective features. Finally, the visible feature map and the attention map are multiplied point by point and added to the infrared feature map, which introduces the structural features of the visible light image and yields the updated infrared features. A minimal sketch of this module follows after the list.
- Global Residual Connection and Hierarchical Feature Fusion: In this task there is a strong correlation between the input features and the output image, and shallow features typically retain a significant amount of low-frequency information. In addition, as the network becomes deeper, an optimization challenge known as gradient vanishing appears. A global residual connection is a simple yet effective solution to both issues: it allows the network to concentrate on reconstructing the high-frequency information of the image, reduces wasted computation, and mitigates the gradient-vanishing problem. However, relying solely on global residual connections during inference cannot fully exploit the abundant image features the network generates, which results in information redundancy. As the network depth increases, the spatial representation capacity of the features gradually decreases while their semantic representation capacity increases, so fully exploiting features from all depths can enhance the quality of the reconstructed image. To address this, we adopted a hierarchical feature fusion mechanism that sends the output of each CATM to a collection point just before upsampling. Because these features contain a significant amount of redundancy, we added a feature fusion layer that acts as a bottleneck and selectively extracts the relevant information from the hierarchical features; this layer is crucial for improving network efficiency and performance. This fusion step appears, together with the upsample module, in the last sketch after this list.
- Upsample Module (UM): Upsampling methods have been studied extensively in super-resolution networks. Some approaches process the feature maps at low resolution and then directly upsample and reconstruct them to the target scale, which reduces computational cost; however, such methods are not conducive to high magnification ratios or to convenient interaction of multimodal information. Our network instead performs feature extraction and information fusion progressively while the feature map is upsampled by a factor of 2 at each stage, so that the rich texture details of the visible light images can be introduced at different scales. Inside the UM, the feature map is upsampled by 2× through bilinear interpolation, and the updated features are then filtered with a 3 × 3 convolution kernel to reduce the block effect in the feature maps. This step is included in the last sketch after this list.
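To make the split-and-distill idea of the HDDM concrete, the following is a minimal PyTorch sketch of one possible block. It is an illustration only: the dilation rates, channel counts, ECA-style 1-D convolution gate, and all names (e.g., `HDDMBlock`, `eca_kernel`) are our own assumptions rather than the exact configuration of the paper; the sketch only requires the channel count to be divisible by 2 raised to the number of dilation stages.

```python
import torch
import torch.nn as nn


class HDDMBlock(nn.Module):
    """Illustrative hierarchical dilated distillation block (a sketch, not the authors' exact design)."""

    def __init__(self, channels: int = 64, dilations=(1, 2, 4), eca_kernel: int = 3):
        super().__init__()
        self.dilated_convs = nn.ModuleList()
        c = channels  # must be divisible by 2 ** len(dilations)
        for d in dilations:
            # Dilated 3x3 convolution; padding = dilation keeps the spatial size unchanged.
            self.dilated_convs.append(nn.Conv2d(c, c, kernel_size=3, padding=d, dilation=d))
            c //= 2  # after each stage, half of the channels are distilled (kept aside)
        self.act = nn.LeakyReLU(0.1, inplace=True)
        # ECA-style channel interaction: a 1-D convolution over the channel axis,
        # avoiding the dimensionality reduction of squeeze-and-excitation attention.
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.eca = nn.Conv1d(1, 1, kernel_size=eca_kernel, padding=eca_kernel // 2, bias=False)
        # 1x1 fusion convolution: merges the multi-scale pieces and matches the input channels.
        self.fuse = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        distilled, cur = [], x
        for conv in self.dilated_convs:
            cur = self.act(conv(cur))
            keep, cur = torch.chunk(cur, 2, dim=1)  # channel split: keep half, refine the rest
            distilled.append(keep)
        distilled.append(cur)
        fused = torch.cat(distilled, dim=1)                           # same channel count as the input
        w = self.pool(fused).squeeze(-1).transpose(1, 2)              # (B, 1, C)
        w = torch.sigmoid(self.eca(w)).transpose(1, 2).unsqueeze(-1)  # (B, C, 1, 1)
        return x + self.fuse(fused * w)                               # local residual learning


# Quick shape check: the block preserves both resolution and channel count.
y = HDDMBlock(64)(torch.randn(1, 64, 48, 48))
assert y.shape == (1, 64, 48, 48)
```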
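The pixel-level cross-attention transformation of the CATM could be sketched as follows. The sigmoid gating, the exact convolution ordering, and the names `CATMSketch` and `r` are assumptions made for illustration; only the concatenate, compress, restore, multiply, and add flow follows the description above.

```python
import torch
import torch.nn as nn


class CATMSketch(nn.Module):
    """Illustrative pixel-level cross-attention transformation (compression ratio r = 4)."""

    def __init__(self, channels: int = 64, r: int = 4):
        super().__init__()
        self.attn = nn.Sequential(
            # Compress the concatenated IR/visible tensor to cut the cost of attention estimation.
            nn.Conv2d(2 * channels, (2 * channels) // r, kernel_size=3, padding=1),
            nn.LeakyReLU(0.1, inplace=True),
            # Restore the channel count and squash values into a per-pixel attention map.
            nn.Conv2d((2 * channels) // r, channels, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )

    def forward(self, f_ir, f_vis):
        a = self.attn(torch.cat([f_ir, f_vis], dim=1))  # pixel-level attention map
        return f_ir + a * f_vis                         # inject gated visible structure into the IR features


f_ir, f_vis = torch.randn(1, 64, 48, 48), torch.randn(1, 64, 48, 48)
assert CATMSketch(64)(f_ir, f_vis).shape == f_ir.shape
```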
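Finally, a hedged sketch of the hierarchical feature fusion bottleneck combined with the upsample module. Treating all CATM outputs as sharing one resolution, using a single 1 × 1 bottleneck, and the name `FuseAndUpsample` are simplifying assumptions; the actual network repeats fusion and 2× upsampling progressively across stages.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FuseAndUpsample(nn.Module):
    """Illustrative hierarchical feature fusion (1x1 bottleneck) followed by the 2x upsample step."""

    def __init__(self, channels: int = 64, num_stages: int = 3):
        super().__init__()
        # Bottleneck layer: selectively merges the stacked CATM outputs and removes redundancy.
        self.bottleneck = nn.Conv2d(num_stages * channels, channels, kernel_size=1)
        # 3x3 convolution applied after bilinear interpolation to reduce blocking artifacts.
        self.smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, catm_outputs, shallow_feat):
        fused = self.bottleneck(torch.cat(catm_outputs, dim=1))  # hierarchical feature fusion
        fused = fused + shallow_feat                             # global residual connection
        up = F.interpolate(fused, scale_factor=2, mode="bilinear", align_corners=False)
        return self.smooth(up)                                   # upsample module (UM)


feats = [torch.randn(1, 64, 48, 48) for _ in range(3)]
assert FuseAndUpsample(64, 3)(feats, feats[0]).shape == (1, 64, 96, 96)
```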
3.1.3. High-Frequency Detail Fusion Subnetwork
3.2. Loss Function
3.3. Training Strategies
4. Experiment
4.1. Experimental Environment and Dataset Settings
4.2. Comparative Experiments
4.3. Ablation Study
4.3.1. Ablation Studies of Network Structure
4.3.2. Ablation Study of Loss Function
4.3.3. Ablation Study of Training Strategy
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Zhu, D.; Zhan, W.; Jiang, Y.; Xu, X.; Guo, R. IPLF: A novel image pair learning fusion network for infrared and visible image. IEEE Sens. J. 2022, 22, 8808–8817.
- Yu, Q.; Zhu, M.; Zeng, Q.; Wang, H.; Chen, Q.; Fu, X.; Qing, Z. Weather Radar Super-Resolution Reconstruction Based on Residual Attention Back-Projection Network. Remote Sens. 2023, 15, 1999.
- Zhao, J.; Ma, Y.; Chen, F.; Shang, E.; Yao, W.; Zhang, S.; Yang, J. SA-GAN: A Second Order Attention Generator Adversarial Network with Region Aware Strategy for Real Satellite Images Super Resolution Reconstruction. Remote Sens. 2023, 15, 1391.
- Wang, J.; Shao, Z.; Huang, X.; Lu, T.; Zhang, R.; Li, Y. From artifact removal to super-resolution. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–15.
- Fang, H.; Ding, L.; Wang, L.; Chang, Y.; Yan, L.; Han, J. Infrared Small UAV Target Detection Based on Depthwise Separable Residual Dense Network and Multiscale Feature Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 1–20.
- Liu, Q.; Yuan, D.; Fan, N.; Gao, P.; Li, X.; He, Z. Learning dual-level deep representation for thermal infrared tracking. IEEE Trans. Multimed. 2022, 25, 1269–1281.
- Yuan, D.; Shu, X.; Liu, Q.; He, Z. Structural target-aware model for thermal infrared tracking. Neurocomputing 2022, 491, 44–56.
- Rivera Velázquez, J.M.; Khoudour, L.; Saint Pierre, G.; Duthon, P.; Liandrat, S.; Bernardin, F.; Fiss, S.; Ivanov, I.; Peleg, R. Analysis of thermal imaging performance under extreme foggy conditions: Applications to autonomous driving. J. Imaging 2022, 8, 306.
- Munir, F.; Azam, S.; Rafique, M.A.; Sheri, A.M.; Jeon, M.; Pedrycz, W. Exploring thermal images for object detection in underexposure regions for autonomous driving. Appl. Soft Comput. 2022, 121, 108793.
- Keys, R. Cubic convolution interpolation for digital image processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 1153–1160.
- Sun, J.; Zhu, J.; Tappen, M.F. Context-constrained hallucination for image super-resolution. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 231–238.
- Yang, J.; Wright, J.; Huang, T.; Ma, Y. Image super-resolution as sparse representation of raw image patches. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13; Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199.
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
- Li, J.; Fang, F.; Mei, K.; Zhang, G. Multi-scale residual network for image super-resolution. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 517–532.
- Anwar, S.; Barnes, N. Densely residual laplacian super-resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 1192–1204.
- Li, B.; Wang, B.; Liu, J.; Qi, Z.; Shi, Y. s-lwsr: Super lightweight super-resolution network. IEEE Trans. Image Process. 2020, 29, 8368–8380.
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2472–2481.
- Mehta, N.; Murala, S. MSAR-Net: Multi-scale attention based light-weight image super-resolution. Pattern Recognit. Lett. 2021, 151, 215–221.
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
- Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient image super-resolution using pixel attention. In Proceedings of the Computer Vision—ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16; Springer: Berlin/Heidelberg, Germany, 2020; pp. 56–72.
- Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for image restoration with neural networks. IEEE Trans. Comput. Imaging 2016, 3, 47–57.
- Liang, J.; Zeng, H.; Zhang, L. Details or artifacts: A locally discriminative learning approach to realistic image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–20 June 2022; pp. 5657–5666.
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. Esrgan: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018.
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-esrgan: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914.
- Zhou, Y.; Wu, G.; Fu, Y.; Li, K.; Liu, Y. Cross-mpi: Cross-scale stereo for image super-resolution using multiplane images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 14842–14851.
- Zhang, J.; Zhang, W.; Jiang, B.; Tong, X.; Chai, K.; Yin, Y.; Chen, X. Reference-Based Super-Resolution Method for Remote Sensing Images with Feature Compression Module. Remote Sens. 2023, 15, 1103.
- Yang, Q.; Yang, R.; Davis, J.; Nistér, D. Spatial-depth super resolution for range images. In Proceedings of the 2007 IEEE Conference on Computer Vision and Pattern Recognition, Minneapolis, MN, USA, 17–22 June 2007; pp. 1–8.
- He, K.; Sun, J.; Tang, X. Guided image filtering. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 1397–1409.
- Sun, K.; Tran, T.H.; Krawtschenko, R.; Simon, S. Multi-frame super-resolution reconstruction based on mixed Poisson–Gaussian noise. Signal Process. Image Commun. 2020, 82, 115736.
- Liu, H.C.; Li, S.T.; Yin, H.T. Infrared surveillance image super resolution via group sparse representation. Opt. Commun. 2013, 289, 45–52.
- Dong, X.; Yokoya, N.; Wang, L.; Uezato, T. Learning Mutual Modulation for Self-supervised Cross-Modal Super-Resolution. In Proceedings of the Computer Vision—ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Proceedings, Part XIX; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–18.
- Zhang, W.; Sui, X.; Gu, G.; Chen, Q.; Cao, H. Infrared thermal imaging super-resolution via multiscale spatio-temporal feature fusion network. IEEE Sens. J. 2021, 21, 19176–19185.
- Du, J.; Zhou, H.; Qian, K.; Tan, W.; Zhang, Z.; Gu, L.; Yu, Y. RGB-IR cross input and sub-pixel upsampling network for infrared image super-resolution. Sensors 2020, 20, 281.
- Wang, B.; Zou, Y.; Zhang, L.; Li, Y.; Chen, Q.; Zuo, C. Multimodal super-resolution reconstruction of infrared and visible images via deep learning. Opt. Lasers Eng. 2022, 156, 107078.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 11534–11542.
- Ge, L.; Dou, L. G-Loss: A loss function with gradient information for super-resolution. Optik 2023, 280, 170750.
- Isola, P.; Zhu, J.; Zhou, T.; Efros, A.A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1125–1134.
- Ha, Q.; Watanabe, K.; Karasawa, T.; Ushiku, Y.; Harada, T. MFNet: Towards real-time semantic segmentation for autonomous vehicles with multi-spectral scenes. In Proceedings of the 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Vancouver, BC, Canada, 24–28 September 2017; pp. 5108–5115.
- Rivadeneira, R.E.; Sappa, A.D.; Vintimilla, B.X.; Kim, J.; Kim, D.; Li, Z.; Jian, Y.; Yan, B.; Cao, L.; Qi, F.; et al. Thermal image super-resolution challenge results-PBVS 2022. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 418–426.
- Zou, Y.; Zhang, L.; Liu, C.; Wang, B.; Hu, Y.; Chen, Q. Super-resolution reconstruction of infrared images based on a convolutional neural network with skip connections. Opt. Lasers Eng. 2021, 146, 106717.
- Shacht, G.; Danon, D.; Fogel, S.; Cohen-Or, D. Single pair cross-modality super resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 6378–6387.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 586–595.
| Methods | Parameters | PSNR (4×) | SSIM (4×) | PSNR (8×) | SSIM (8×) |
|---|---|---|---|---|---|
| RCAN | 16 M | 31.66 | 0.8724 | 28.20 | 0.7107 |
| s-LWSR64 | 2.27 M | 31.67 | 0.8894 | 28.18 | 0.7098 |
| Zou et al. | 3.73 M | 31.84 | 0.8863 | 28.40 | 0.7121 |
| EDSR | 43.09 M | 32.21 | 0.8913 | 28.38 | 0.7166 |
| Wang et al. | 573.6 K | 21.33 | 0.6042 | - | - |
| Ours | 698.2 K | 32.15 | 0.8921 | 28.41 | 0.7283 |
| Methods | Ref. | PSNR | SSIM | LPIPS | MI | EN | AG | EI | SF |
|---|---|---|---|---|---|---|---|---|---|
| Origin TIR | - | - | - | - | - | 7.1152 | 2.3342 | 30.6074 | 14.1549 |
| Real-ESRGAN | × | 30.23 | 0.7130 | 0.1728 | 3.0812 | 7.1295 | 1.6568 | 20.8512 | 6.3397 |
| CMSR | ✓ | 29.52 | 0.6623 | 0.2213 | 1.7308 | 6.8232 | 3.4218 | 32.1385 | 12.2585 |
| Wang et al. | ✓ | 29.38 | 0.6570 | 0.2513 | 1.7603 | 6.9754 | 3.0916 | 37.8797 | 13.0632 |
| Ours | ✓ | 30.04 | 0.7041 | 0.1869 | 2.8672 | 7.6473 | 4.2393 | 49.9074 | 18.9905 |
| MS | ID | CEAM | Param | PSNR | SSIM |
|---|---|---|---|---|---|
| ✓ | × | × | 14.5 K | 31.84 | 0.6997 |
| ✓ | ✓ | × | 14.5 K | 32.02 | 0.7002 |
| ✓ | ✓ | ✓ | 15.4 K | 32.15 | 0.7041 |
| Transformation Type | PSNR | SSIM | EN | AG | EI | SF |
|---|---|---|---|---|---|---|
| Point-wise add | 23.2113 | 0.6377 | 4.6732 | 2.9369 | 21.1218 | 9.5865 |
| Channel Attention | 25.8466 | 0.6902 | 6.6181 | 3.6187 | 31.7487 | 12.3742 |
| Spatial Attention | 28.7214 | 0.6911 | 7.2657 | 4.1258 | 42.0281 | 15.8656 |
| Ours | 30.0418 | 0.7041 | 7.6473 | 4.2393 | 49.9074 | 18.9905 |
|  |  |  | PSNR | SSIM | EN | AG | EI | SF |
|---|---|---|---|---|---|---|---|---|
| ✓ | × | × | 33.6149 | 0.9012 | 7.0281 | 2.2627 | 30.6074 | 14.1833 |
| ✓ | ✓ | × | 28.8282 | 0.6702 | 7.6657 | 4.4251 | 52.0372 | 16.7221 |
| ✓ | ✓ | ✓ | 30.0418 | 0.7041 | 7.2657 | 4.1258 | 42.0281 | 15.8656 |
| Training Strategy | PSNR (SISR) | SSIM (SISR) | EN (Multimodal SR) | AG (Multimodal SR) | EI (Multimodal SR) | SF (Multimodal SR) |
|---|---|---|---|---|---|---|
| × | 25.64 | 0.6130 | 7.1545 | 4.2258 | 40.8564 | 14.9313 |
| ✓ | 32.15 | 0.7041 | 7.2657 | 4.1258 | 42.0281 | 15.8656 |