USES-Net: An Infrared Dim and Small Target Detection Network with Embedded Knowledge Priors
Abstract
1. Introduction
- (1) An infrared dim and small target detection and recognition method jointly driven by knowledge and data is designed and implemented. By embedding the local contrast distribution of dim, small infrared targets into the network model as a physical prior, the purely data-driven deep learning approach is extended, which makes network training more targeted and improves the generalization and interpretability of the model (a minimal sketch of this contrast prior follows this list).
- (2) The Swin Transformer attention module is innovatively introduced into the U-Net structure to replace traditional convolution kernels for target feature extraction, which effectively overcomes the limited receptive field of convolution kernels. Supervised learning is used during training to extract richer global semantic features of targets, fully exploiting the intrinsic information of dim, small infrared targets.
- (3) A bottom-up cross-layer feature fusion module (AFM) is designed as the decoder of the proposed network. It reconstructs the target feature information obtained at different scales and fully retains both the low-level local spatial features and the high-level global semantic features of small infrared targets. In addition, a slicing-aided enhancement and inference scheme based on SAHI further enhances the feature saliency of dim and small infrared targets, ultimately achieving more accurate detection and recognition.
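As noted in contribution (1), the knowledge prior is a patch-based local contrast measure. The following is a minimal, hedged PyTorch sketch of one plausible formulation: a center patch is compared against the mean of its surrounding patches. The function name and window sizes are illustrative assumptions, not the exact EPCLM definition given in Section 3.2.2.

```python
import torch
import torch.nn.functional as F

def patch_local_contrast(feat: torch.Tensor, cell: int = 3) -> torch.Tensor:
    """Patch-based local contrast map (illustrative sketch).

    For every cell-sized patch, compares its mean intensity against the
    mean of its 8 surrounding patches; small bright targets stand out
    because their center-versus-background contrast is high.
    feat: (B, C, H, W) feature map; cell must be odd.
    """
    pad = cell // 2
    # Mean of each cell-sized patch, computed densely (stride 1).
    center = F.avg_pool2d(feat, cell, stride=1, padding=pad)
    # Mean over the full 3x3 neighborhood of patches (a 3*cell window).
    neigh = F.avg_pool2d(feat, 3 * cell, stride=1, padding=3 * cell // 2)
    # Background mean = neighborhood mean with the center patch removed:
    # (81*neigh - 9*center) / 72 simplifies to (9*neigh - center) / 8.
    background = (9.0 * neigh - center) / 8.0
    # Contrast prior: positive where the center patch is brighter than
    # its local background, as expected for a dim small target.
    return torch.relu(center - background)
```

Because this measure is built from pooling operations only, it is differentiable and cheap to evaluate at every scale of the encoder, so it can be embedded directly into the training graph.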
2. Related Work
3. Proposed Method
3.1. Overview
3.2. The Proposed Network Structure
3.2.1. Feature Extraction Module
- (1) First, a layer normalization (LN) operation is applied to the input feature map I to standardize the data along the channel dimension, producing $I_{LN}$. The corresponding formula is $I_{LN} = \mathrm{LN}(I)$.
- (2) For the normalized feature map $I_{LN}$, the feature weights are calculated with the multi-head self-attention (MSA) mechanism to obtain $I_{Attention}$. The corresponding formula is $I_{Attention} = \mathrm{MSA}(I_{LN})$.
- (3) To obtain the intermediate feature F, the original input feature map I is added to the MSA output $I_{Attention}$ through a residual connection. The corresponding formula is $F = I + I_{Attention}$.
- (4) To obtain the output S, the intermediate feature F is normalized with an LN operation and adjusted with a multilayer perceptron (MLP); the adjusted result is then added back to F through a residual connection. The corresponding formula is $S = F + \mathrm{MLP}(\mathrm{LN}(F))$.
- (5) Finally, the output S is subjected to a patch (image block) merging operation, which halves its spatial size and doubles its number of channels through image block stitching, layer normalization, and a channel-wise linear mapping. For an input feature map of size $H \times W$ with C channels, the final output feature map Z has a size of $\frac{H}{2} \times \frac{W}{2}$ and 2C channels (a compact PyTorch sketch of steps (1)–(5) is given after this list).
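Taken together, steps (1)–(5) describe a standard Swin-style transformer block followed by patch merging. The sketch below mirrors that data flow; note that it substitutes plain global multi-head self-attention for the windowed/shifted-window attention (W-MSA/SW-MSA) of the actual Swin Transformer, so it is an illustrative simplification rather than the paper's implementation.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Steps (1)-(4): LN -> MSA -> residual, then LN -> MLP -> residual."""
    def __init__(self, dim: int, heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.ln1 = nn.LayerNorm(dim)
        self.msa = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ln2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x):                     # x: (B, N, C) tokens
        x_ln = self.ln1(x)                    # step (1): I_LN = LN(I)
        attn, _ = self.msa(x_ln, x_ln, x_ln)  # step (2): I_Attention
        f = x + attn                          # step (3): F = I + I_Attention
        return f + self.mlp(self.ln2(f))      # step (4): S = F + MLP(LN(F))

class PatchMerging(nn.Module):
    """Step (5): halve spatial size, double channels (2x2 patch concat)."""
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(4 * dim)
        self.reduce = nn.Linear(4 * dim, 2 * dim)

    def forward(self, x, h, w):               # x: (B, h*w, C); h, w even
        b, _, c = x.shape
        x = x.view(b, h, w, c)
        # Stitch each 2x2 neighborhood of tokens along the channel axis.
        x = torch.cat([x[:, 0::2, 0::2], x[:, 1::2, 0::2],
                       x[:, 0::2, 1::2], x[:, 1::2, 1::2]], dim=-1)
        x = x.view(b, (h // 2) * (w // 2), 4 * c)
        return self.reduce(self.norm(x))      # Z: (B, h*w/4, 2C)
```

With input tokens of dimension C, PatchMerging returns one quarter as many tokens with 2C channels, matching the halved spatial size and doubled channel count of the output feature map Z.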
3.2.2. Local Contrast Learning Module
3.2.3. Multi-Scale Feature Fusion Module
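As a hedged illustration of the bottom-up cross-layer fusion performed by the AFM decoder described in contribution (3), the sketch below upsamples a deeper (semantic) map to a shallower (spatial) map's resolution and mixes the two with a convolution, so both kinds of information are retained. The module names and the concatenation-plus-convolution design are assumptions for illustration; the actual AFM follows Equation (9).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusionStep(nn.Module):
    """One cross-layer fusion step of an AFM-style decoder (sketch)."""
    def __init__(self, deep_ch: int, shallow_ch: int):
        super().__init__()
        self.mix = nn.Sequential(
            nn.Conv2d(deep_ch + shallow_ch, shallow_ch, 3, padding=1),
            nn.BatchNorm2d(shallow_ch), nn.ReLU(inplace=True))

    def forward(self, deep, shallow):
        # Bring the deep semantic map up to the shallow map's resolution.
        deep = F.interpolate(deep, size=shallow.shape[-2:],
                             mode="bilinear", align_corners=False)
        # Concatenate and mix so spatial detail and semantics are both kept.
        return self.mix(torch.cat([deep, shallow], dim=1))

def fuse_pyramid(features, steps):
    """Sequentially fuse a deep-to-shallow feature list, e.g. [f5, ..., f1]."""
    out = features[0]
    for feat, step in zip(features[1:], steps):
        out = step(out, feat)
    return out
```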
3.2.4. Loss Function and Slicing-Aided Inference
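Since the testing stage matches predictions by their Soft-IoU ratio (Equation (10)), a Soft-IoU style segmentation loss is a natural fit for the training loss of Equation (11). The sketch below shows one common Soft-IoU formulation; it is an assumption for illustration, not necessarily the paper's exact Equation (11).

```python
import torch

def soft_iou_loss(pred: torch.Tensor, target: torch.Tensor,
                  eps: float = 1e-6) -> torch.Tensor:
    """Soft-IoU loss (sketch): 1 - |P*T| / (|P| + |T| - |P*T|).

    pred:   (B, 1, H, W) sigmoid probabilities.
    target: (B, 1, H, W) binary ground-truth mask.
    """
    inter = (pred * target).sum(dim=(1, 2, 3))
    union = pred.sum(dim=(1, 2, 3)) + target.sum(dim=(1, 2, 3)) - inter
    # eps guards against empty masks; loss is minimized when P == T.
    return (1.0 - (inter + eps) / (union + eps)).mean()
```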
Algorithm 1. The workflow of USES-Net: training and testing (one image as a batch).

The Training Process:
- Initialize: Input the infrared image I and its label m; set the learning rate; define the model detection function, including the encoding convolutions, the decoding convolutions, the Swin Transformer fitting function, and the EPCLM fitting function.
- Step 1: Perform slicing-aided enhancement on the training set to enhance the images, and extract feature maps at different hierarchical levels from I.
- Step 2: Perform the EPCLM operation on the feature maps to calculate the embedded patch-based local contrast feature maps at each level.
- Step 3: Perform the AFM operation sequentially from top to bottom according to Equation (9), and output a multi-scale contrast feature fusion map.
- Step 4: Apply the decoding convolution to the fusion map and output the predicted mask through a sigmoid activation function.
- Step 5: Calculate the loss value between the predicted mask and the label m according to Equation (11).
- Step 6: Iteratively update the model detection function by gradient descent.
- Return the final parameters of the model detection function.

The Testing Process:
- Step 1: Input the test infrared image set and perform slicing-aided hyper inference to divide each image into overlapping patches.
- Step 2: Resize each patch while preserving its aspect ratio, and then apply the detection model obtained from training independently to each overlapping patch.
- Step 3: Merge the overlapping predictions and the full-image inference results back to the original size using NMS: according to Equation (10), predictions whose Soft-IoU ratio is higher than a predefined matching threshold are matched, while detections with a detection probability lower than the threshold are removed.
- Step 4: Finally, output the infrared small target detection results (a sketch of this sliced-inference procedure is given after the algorithm).
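As a hedged sketch of the testing procedure above, the function below slices a full-resolution image into overlapping tiles, runs the trained detector on each tile, and merges the per-tile masks back at the original size. The tile size, overlap ratio, and the pixel-wise max merge are illustrative assumptions; the paper instead matches and suppresses overlapping detections with Soft-IoU-based NMS per Equation (10).

```python
import torch

@torch.no_grad()
def sliced_inference(model, image: torch.Tensor, tile: int = 256,
                     overlap: float = 0.25) -> torch.Tensor:
    """SAHI-style sliced inference (sketch).

    image: (1, C, H, W) full-resolution input; model returns mask logits.
    Returns a (1, 1, H, W) probability mask; overlapping tile predictions
    are merged with a pixel-wise max as a simple stand-in for the
    Soft-IoU/NMS matching of Equation (10).
    """
    _, _, h, w = image.shape
    stride = max(1, int(tile * (1.0 - overlap)))

    def starts(size):
        # Tile start offsets along one axis, always covering the border.
        last = max(size - tile, 0)
        s = list(range(0, last + 1, stride))
        return s if s[-1] == last else s + [last]

    merged = torch.zeros(1, 1, h, w, device=image.device)
    for top in starts(h):
        for left in starts(w):
            patch = image[..., top:top + tile, left:left + tile]
            prob = torch.sigmoid(model(patch))   # per-tile prediction
            region = merged[..., top:top + prob.shape[-2],
                            left:left + prob.shape[-1]]
            region.copy_(torch.maximum(region, prob))  # keep stronger response
    return merged
```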
4. Experimental Analysis
4.1. Dataset Description
4.2. Experimental Setup
4.3. Comparison with Some State-of-the-Art Methods on the NUAA-SIRST Dataset
4.4. Comparison with Some State-of-the-Art Methods on the NUDT-SIRST Dataset
4.5. Comparison with Some State-of-the-Art Methods on the IRSTD-1K Dataset
4.6. Ablation Study
4.7. Computational Efficiency Analysis
4.8. Error Diagnosis and Limitations
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhao, M.; Li, W.; Li, L.; Hu, J.; Ma, P.; Tao, R. Single-frame infrared small-target detection: A survey. IEEE Geosci. Remote Sens. Mag. 2022, 10, 87–119.
- Wu, L.; Fang, S.; Ma, Y.; Fan, F.; Huang, J. Infrared small target detection based on gray intensity descent and local gradient watershed. Infrared Phys. Technol. 2022, 123, 104171.
- Rawat, S.S.; Verma, S.K.; Kumar, Y. Review on recent development in infrared small target detection algorithms. Procedia Comput. Sci. 2020, 167, 2496–2505.
- Zhao, B.; Wang, C.; Fu, Q.; Han, Z. A novel pattern for infrared small target detection with generative adversarial network. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4481–4492.
- Han, J.; Liang, K.; Zhou, B.; Zhu, X.; Zhao, J.; Zhao, L. Infrared small target detection utilizing the multiscale relative local contrast measure. IEEE Geosci. Remote Sens. Lett. 2018, 15, 612–616.
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Asymmetric contextual modulation for infrared small target detection. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, Waikoloa, HI, USA, 3–7 January 2021; pp. 950–959.
- Long, J.; Shelhamer, E.; Darrell, T. Fully convolutional networks for semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 3431–3440.
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; pp. 21–37.
- Fan, Z.; Bi, D.; Xiong, L.; Ma, S.; He, L.; Ding, W. Dim infrared image enhancement based on convolutional neural network. Neurocomputing 2018, 272, 396–404.
- Zhang, S.; Huang, X.; Wang, M. Background suppression algorithm for infrared images based on Robinson guard filter. In Proceedings of the 2017 2nd International Conference on Multimedia and Image Processing (ICMIP), Wuhan, China, 17–19 March 2017; pp. 250–254.
- Pan, S.; Zhang, S.; Zhao, M.; An, B. Infrared small target detection based on double-layer local contrast measure. Acta Photonica Sin. 2020, 49, 0110003.
- Zhang, X.; Ding, Q.; Luo, H.; Hui, B.; Chang, Z.; Zhang, J. Infrared small target detection based on an image-patch tensor model. Infrared Phys. Technol. 2019, 99, 55–63.
- Jun, C.; Yuanyuan, H.; Pengze, L. Infrared small target detection algorithm using visual contrast mechanism. Syst. Eng. Electron. 2019, 41, 2416–2423.
- Wei, Y.; You, X.; Li, H. Multiscale patch-based contrast measure for small infrared target detection. Pattern Recognit. 2016, 58, 216–226.
- Zhang, L.; Peng, L.; Zhang, T.; Cao, S.; Peng, Z. Infrared small target detection via non-convex rank approximation minimization joint l2,1 norm. Remote Sens. 2018, 10, 1821.
- Zhou, F.; Wu, Y.; Dai, Y.; Wang, P. Detection of small target using schatten 1/2 quasi-norm regularization with reweighted sparse enhancement in complex infrared scenes. Remote Sens. 2019, 11, 2058.
- Vaishnavi, R.; Unnikrishnan, G.; Raj, A.B. Implementation of algorithms for point target detection and tracking in infrared image sequences. In Proceedings of the 2019 4th International Conference on Recent Trends on Electronics, Information, Communication & Technology (RTEICT), Bangalore, India, 17–18 May 2019; pp. 904–909.
- Yi, W.; Fang, Z.; Li, W.; Hoseinnezhad, R.; Kong, L. Multi-frame track-before-detect algorithm for maneuvering target tracking. IEEE Trans. Veh. Technol. 2020, 69, 4104–4118.
- Wang, J.; Yi, W.; Kirubarajan, T.; Kong, L. An efficient recursive multiframe track-before-detect algorithm. IEEE Trans. Aerosp. Electron. Syst. 2017, 54, 190–204.
- Lee, J.-Y. A study of CR-DuNN based on the LSTM and Du-CNN to predict infrared target feature and classify targets from the clutters. Trans. Korean Inst. Electr. Eng. 2019, 68, 153–158.
- Qili, Y.; Binghong, Z.; Wei, Z. Trajectory detection of small targets based on convolutional long short-term memory with attention mechanisms. Opt. Precis. Eng. 2020, 28, 2535–2548.
- Zhao, M.; Cheng, L.; Yang, X.; Feng, P.; Liu, L.; Wu, N. TBC-Net: A real-time detector for infrared small target detection using semantic constraint. arXiv 2019, arXiv:2001.05852.
- Wang, H.; Zhou, L.; Wang, L. Miss detection vs. false alarm: Adversarial learning for small object segmentation in infrared images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8509–8518.
- Huang, L.; Dai, S.; Huang, T.; Huang, X.; Wang, H. Infrared small target segmentation with multiscale feature representation. Infrared Phys. Technol. 2021, 116, 103755.
- Dai, Y.; Wu, Y.; Zhou, F.; Barnard, K. Attentional local contrast networks for infrared small target detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9813–9824.
- Zhang, M.; Zhang, R.; Yang, Y.; Bai, H.; Zhang, J.; Guo, J. ISNet: Shape matters for infrared small target detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 877–886.
- Li, B.; Xiao, C.; Wang, L.; Wang, Y.; Lin, Z.; Li, M.; An, W.; Guo, Y. Dense nested attention network for infrared small target detection. IEEE Trans. Image Process. 2022, 32, 1745–1758.
- Sun, H.; Bai, J.; Yang, F.; Bai, X. Receptive-field and direction induced attention network for infrared dim small target detection with a large-scale dataset IRDST. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–13.
- Hou, Q.; Zhang, L.; Tan, F.; Xi, Y.; Zheng, H.; Li, N. ISTDU-Net: Infrared small-target detection U-Net. IEEE Geosci. Remote Sens. Lett. 2022, 19, 7506205.
- Wu, X.; Hong, D.; Chanussot, J. UIU-Net: U-Net in U-Net for infrared small object detection. IEEE Trans. Image Process. 2022, 32, 364–376.
- Akyon, F.C.; Altinuc, S.O.; Temizel, A. Slicing aided hyper inference and fine-tuning for small object detection. In Proceedings of the 2022 IEEE International Conference on Image Processing (ICIP), Bordeaux, France, 16–19 October 2022; pp. 966–970.
- Lin, J.; Zhang, K.; Yang, X.; Cheng, X.; Li, C. Infrared dim and small target detection based on U-Transformer. J. Vis. Commun. Image Represent. 2022, 89, 103684.
- Ju, M.; Luo, J.; Liu, G.; Luo, H. ISTDet: An efficient end-to-end neural network for infrared small target detection. Infrared Phys. Technol. 2021, 114, 103659.
- Ryu, J.; Kim, S. Small infrared target detection by data-driven proposal and deep learning-based classification. In Proceedings of the Infrared Technology and Applications XLIV, Orlando, FL, USA, 16–19 April 2018; pp. 134–143.
- Fan, M.; Tian, S.; Liu, K.; Zhao, J.; Li, Y. Infrared small target detection based on region proposal and CNN classifier. Signal Image Video Process. 2021, 15, 1927–1936.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 10012–10022.
- Yu, C.; Liu, Y.; Wu, S.; Hu, Z.; Xia, X.; Lan, D.; Liu, X. Infrared small target detection based on multiscale local contrast learning networks. Infrared Phys. Technol. 2022, 123, 104107.
Structure | Input Shape | Output Shape |
---|---|---|
Enhancement | (256, 256, 3) | (256, 256, 1) |
Encode1 | (256, 256, 1) | (128, 128, 32) |
Encode2 | (128, 128, 32) | (64, 64, 64) |
Swin Transformer Module1 | (64, 64, 64) | (32, 32, 128) |
Swin Transformer Module2 | (32, 32, 128) | (16, 16, 256) |
Swin Transformer Module3 | (16, 16, 256) | (8, 8, 512) |
Decode1 | (8, 8, 512) | (16, 16, 256) |
Decode2 | (16, 16, 256) | (32, 32, 128) |
Decode3 | (32, 32, 128) | (64, 64, 64) |
Decode4 | (64, 64, 64) | (128, 128, 32) |
Decode5 | (128, 128, 32) | (256, 256, 1) |
Types and Hyperparameters | Details and Values |
---|---|
CPU | 12th Gen Intel(R) Core(TM) i5-12600K |
GPU | NVIDIA GeForce RTX 3080 Ti |
Memory Size | 12 GB |
PyTorch Version | 1.10.0 |
Acceleration Environment | CUDA 11.3 |
Learning Rate | 1 × 10⁻³ |
Batch Size | 16 |
Epochs | 400 |
Optimizer | Adam |
Methods | PA | mIoU | Pd | Fa |
---|---|---|---|---|
ACM | 0.835 | 0.694 | 0.920 | 2.271 × 10⁻⁵ |
ALC-Net | 0.755 | 0.610 | 0.871 | 5.600 × 10⁻⁵ |
DNA-Net | 0.833 | 0.748 | 0.935 | 3.828 × 10⁻⁵ |
ISNet | 0.891 | 0.705 | 0.951 | 6.798 × 10⁻⁵ |
ISTDU-Net | 0.866 | 0.759 | 0.962 | 3.890 × 10⁻⁵ |
RDIAN | 0.832 | 0.707 | 0.951 | 4.733 × 10⁻⁵ |
UIU-Net | 0.840 | 0.775 | 0.924 | 9.330 × 10⁻⁶ |
Proposed method | 0.902 | 0.763 | 0.965 | 8.724 × 10⁻⁶ |
Methods | PA | mIoU | Pd | Fa |
---|---|---|---|---|
ACM | 0.864 | 0.649 | 0.967 | 2.859 × 10⁻⁵ |
ALC-Net | 0.926 | 0.611 | 0.972 | 2.909 × 10⁻⁵ |
DNA-Net | 0.963 | 0.942 | 0.993 | 2.390 × 10⁻⁶ |
ISNet | 0.922 | 0.812 | 0.978 | 6.343 × 10⁻⁶ |
ISTDU-Net | 0.945 | 0.918 | 0.985 | 3.769 × 10⁻⁶ |
RDIAN | 0.908 | 0.824 | 0.988 | 1.360 × 10⁻⁵ |
UIU-Net | 0.948 | 0.905 | 0.988 | 8.342 × 10⁻⁶ |
Proposed method | 0.971 | 0.948 | 0.991 | 2.146 × 10⁻⁶ |
Methods | PA | mIoU | Pd | Fa |
---|---|---|---|---|
ACM | 0.852 | 0.603 | 0.933 | 6.802 × 10⁻⁵ |
ALC-Net | 0.796 | 0.581 | 0.929 | 7.411 × 10⁻⁵ |
DNA-Net | 0.766 | 0.657 | 0.896 | 1.234 × 10⁻⁵ |
ISNet | 0.776 | 0.619 | 0.902 | 3.156 × 10⁻⁵ |
ISTDU-Net | 0.802 | 0.650 | 0.939 | 2.644 × 10⁻⁵ |
RDIAN | 0.735 | 0.599 | 0.872 | 3.321 × 10⁻⁵ |
UIU-Net | 0.779 | 0.657 | 0.912 | 1.342 × 10⁻⁵ |
Proposed method | 0.873 | 0.692 | 0.951 | 1.148 × 10⁻⁵ |
Experiment | Swin Transformer | EPCLM | SAHI | PA | mIoU | Pd | Fa |
---|---|---|---|---|---|---|---|
Exp1 | N | N | N | 0.723 | 0.516 | 0.862 | 8.963 × 10⁻⁵ |
Exp2 | Y | N | N | 0.808 | 0.663 | 0.903 | 5.288 × 10⁻⁵ |
Exp3 | Y | Y | N | 0.864 | 0.712 | 0.951 | 2.075 × 10⁻⁵ |
Exp4 | Y | Y | Y | 0.902 | 0.763 | 0.965 | 8.724 × 10⁻⁶ |
Experiment | Swin Transformer | EPCLM | SAHI | PA | mIoU | Pd | Fa |
---|---|---|---|---|---|---|---|
Exp1 | N | N | N | 0.851 | 0.624 | 0.942 | 1.323 × 10⁻⁵ |
Exp2 | Y | N | N | 0.903 | 0.787 | 0.975 | 8.753 × 10⁻⁶ |
Exp3 | Y | Y | N | 0.946 | 0.859 | 0.988 | 6.582 × 10⁻⁶ |
Exp4 | Y | Y | Y | 0.971 | 0.948 | 0.991 | 2.146 × 10⁻⁶ |
Experiment | Swin Transformer | EPCLM | SAHI | PA | mIoU | Pd | Fa |
---|---|---|---|---|---|---|---|
Exp1 | N | N | N | 0.757 | 0.527 | 0.861 | 8.352 × 10⁻⁵ |
Exp2 | Y | N | N | 0.823 | 0.604 | 0.913 | 6.096 × 10⁻⁵ |
Exp3 | Y | Y | N | 0.844 | 0.657 | 0.929 | 4.358 × 10⁻⁵ |
Exp4 | Y | Y | Y | 0.873 | 0.692 | 0.951 | 1.148 × 10⁻⁵ |
Methods | Parameters (M) | FLOPs (G) | FPS | Platform |
---|---|---|---|---|
ACM | 0.398 | 0.402 | 113 | RTX 3080 Ti |
ALC-Net | 0.427 | 0.378 | 85 | RTX 3080 Ti |
DNA-Net | 4.697 | 14.261 | 32 | RTX 3080 Ti |
ISNet | 0.966 | 30.618 | 49 | RTX 3080 Ti |
ISTDU-Net | 2.752 | 7.944 | 37 | RTX 3080 Ti |
RDIAN | 0.217 | 3.718 | 56 | RTX 3080 Ti |
UIU-Net | 50.540 | 54.425 | 21 | RTX 3080 Ti |
Proposed method | 0.872 | 1.155 | 67 | RTX 3080 Ti |