Lightweight Neural Network for Centroid Detection of Weak, Small Infrared Targets via Background Matching in Complex Scenes
Abstract
1. Introduction
1.1. Challenges
1.2. Existing State-of-the-Art Methods
1.3. Our Contributions
2. Materials and Methods
2.1. Data and Label Preprocessing
2.1.1. Data Preprocessing
2.1.2. Label Preprocessing
- Directly predicting centroid coordinates from the input images is a regression task, which is more difficult than predicting a centroid map. When a neural network is trained to regress coordinate values, it essentially learns to map visual features in the images onto numerical outputs; the network may treat these outputs merely as numbers with magnitudes, without understanding the positional information they represent. This weakens the model’s generalization ability: the network can fit the coordinates of the training set well yet fail on the test set, i.e., it overfits. This is precisely why detectors such as YOLOv5 do not directly output target coordinates but instead produce feature maps at multiple scales.
- Directly predicting centroid coordinates may also accumulate errors. Because the network must output accurate coordinates in a single step, even a small prediction error can cause a large deviation in the final result. Predicting a centroid map instead allows errors to be reduced gradually through pixel-level predictions, since the prediction at each pixel is relatively independent.
- The centroid map encodes detailed spatial information about the target centroid through pixel intensities: the intensity of each pixel indicates how close that pixel is to the target centroid. This spatial distribution is highly informative for the network, helping it localize the target in the image, whereas directly outputting a single coordinate pair discards it. (A brief sketch of how such a centroid-map label can be constructed follows this list.)
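As a concrete illustration of this label-preprocessing strategy, the following minimal sketch builds a centroid map from ground-truth centroid coordinates. It assumes the per-pixel value is a Gaussian of the distance to the nearest centroid; the Gaussian form, the kernel width `sigma`, the image size, and the function name are illustrative assumptions rather than the paper's exact Equation (3).

```python
import numpy as np

def make_centroid_map(centroids, height, width, sigma=1.0):
    """Build a centroid-map label: each pixel's intensity encodes its
    proximity to the nearest ground-truth centroid (1 at the centroid,
    decaying with distance). The Gaussian form and sigma are assumptions;
    the paper's Equation (3) may use a different decay function."""
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    cmap = np.zeros((height, width), dtype=np.float32)
    for cx, cy in centroids:  # sub-pixel centroid coordinates (x, y)
        d2 = (xs - cx) ** 2 + (ys - cy) ** 2   # squared distance to this centroid
        cmap = np.maximum(cmap, np.exp(-d2 / (2.0 * sigma ** 2)))
    return cmap

# Example: two targets with sub-pixel centroids in a 64x64 label map.
label = make_centroid_map([(20.3, 41.7), (50.5, 10.5)], 64, 64)
print(label.shape, label.max())
```

A pixel-wise label of this kind lets the network be supervised with a dense segmentation-style loss while still carrying sub-pixel position information in the intensity pattern.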
2.2. Network Architecture
2.3. Local Feature Aggregation Module
2.4. Post-Processing for Centroid Computation
- Firstly, we determine the number of high-energy regions (depicted as red regions in Figure 4) in each centroid map, denoted as n; n also represents the number of targets in the centroid map. When the centroid lies exactly halfway between pixel centers, i.e., at coordinates of the form (x.5, y.5), the distance between the centroid and the nearest pixel center reaches its maximum, so the peak value within the high-energy region is at its minimum. According to Equation (3), this minimum peak value is approximately 0.8. Therefore, two peaks in high-energy regions are considered to represent two distinct targets when both peak values exceed 0.8 and their positions are separated by more than 16 pixels.
- For each high-energy region in the centroid map, we locate the position (x0, y0) of the point P0 with the highest value within this region and denote its value by V0. If the predicted centroid map is accurate enough, the value at each position in the centroid map obeys Equation (3), and V0 is given by Equation (6).
- Within the 4-neighborhood N4(P0) of P0, we find the positions (x1, y1) and (x2, y2) of the points P1 and P2 adjacent to P0 in the vertical direction, with their values denoted as V1 and V2. Similarly, V1 and V2 are given by Equations (7) and (8), respectively.
- Since P0, P1, and P2 lie in the same column of the image, x0 = x1 = x2. From Equations (6) and (7) we obtain one estimate yc1 of the centroid’s vertical coordinate, and from Equations (6) and (8) a second estimate yc2. We take their average as the final vertical coordinate yc of the centroid.
- Within the 4-neighborhood N4(P0) of P0, we find the positions (x3, y3) and (x4, y4) of the points P3 and P4 adjacent to P0 in the horizontal direction, with their values denoted as V3 and V4. Similarly, V3 and V4 are given by Equations (9) and (10), respectively.
- Since P0, P3, and P4 lie in the same row of the image, y0 = y3 = y4. From Equations (6) and (9) we obtain one estimate xc1 of the centroid’s horizontal coordinate, and from Equations (6) and (10) a second estimate xc2. We take their average as the final horizontal coordinate xc of the centroid.
- Finally, we repeat steps 2–6 until all n high-energy regions in the centroid map have been traversed; a code sketch of this procedure is given below.
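The post-processing steps above can be summarized in the short sketch below. It assumes Equation (3) assigns each pixel a Gaussian of its distance to the centroid (with σ = 1, the worst-case half-pixel offset gives ≈ 0.78, consistent with the 0.8 threshold in step 1), so that the peak of each high-energy region and its four neighbors yield two vertical and two horizontal sub-pixel estimates that are averaged. The Gaussian form, the 0.5 region threshold, and the use of scipy.ndimage for region labeling are assumptions for illustration, not the authors' exact implementation.

```python
import numpy as np
from scipy import ndimage

def extract_centroids(cmap, peak_thresh=0.8, region_thresh=0.5, sigma=1.0):
    """Recover sub-pixel centroids from a predicted centroid map.

    Assumes the map value decays as a Gaussian of the distance to the
    centroid (standing in for the paper's Equation (3)); region_thresh
    and the labeling step are illustrative. Peaks are assumed to lie
    away from the image border and neighbor values to be positive."""
    # Step 1: connected high-energy regions; their count n is the target count.
    labels, n = ndimage.label(cmap > region_thresh)
    centroids = []
    for k in range(1, n + 1):
        # Step 2: peak position (x0, y0) and value V0 within the k-th region.
        y0, x0 = ndimage.maximum_position(cmap, labels, k)
        v0 = cmap[y0, x0]
        if v0 < peak_thresh:          # reject weak peaks (cf. the 0.8 criterion)
            continue
        # Steps 3-4: vertical 4-neighbors P1, P2 give two estimates of yc.
        v1, v2 = cmap[y0 - 1, x0], cmap[y0 + 1, x0]
        yc1 = y0 - (2 * sigma**2 * (np.log(v1) - np.log(v0)) + 1) / 2
        yc2 = y0 + (2 * sigma**2 * (np.log(v2) - np.log(v0)) + 1) / 2
        yc = (yc1 + yc2) / 2
        # Steps 5-6: horizontal 4-neighbors P3, P4 give two estimates of xc.
        v3, v4 = cmap[y0, x0 - 1], cmap[y0, x0 + 1]
        xc1 = x0 - (2 * sigma**2 * (np.log(v3) - np.log(v0)) + 1) / 2
        xc2 = x0 + (2 * sigma**2 * (np.log(v4) - np.log(v0)) + 1) / 2
        xc = (xc1 + xc2) / 2
        centroids.append((xc, yc))
    return centroids

# Quick check on a synthetic 32x32 map with a centroid at (10.3, 17.7).
ys, xs = np.mgrid[0:32, 0:32].astype(np.float32)
cmap = np.exp(-((xs - 10.3) ** 2 + (ys - 17.7) ** 2) / 2.0)
print(extract_centroids(cmap))   # ≈ [(10.3, 17.7)]
```

Averaging the estimates from both neighbors mirrors the averaging in steps 4 and 6 and makes the recovered coordinate less sensitive to noise in any single neighboring pixel.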
3. Results
3.1. Implementation Details
3.1.1. Training Settings
3.1.2. Evaluation Metrics
3.2. Comparison with Other State-of-the-Art Methods
3.3. Ablation Experiments
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
Appendix A
References
- Lin, B.; Yang, X.; Wang, J.; Wang, Y.Y.; Wang, K.P.; Zhang, X.H. A Robust Space Target Detection Algorithm Based on Target Characteristics. IEEE Geosci. Remote Sens. Lett. 2022, 19, 8012405.
- Wang, J.W.; Li, G.; Zhao, Z.C.; Jiao, J.; Ding, S.; Wang, K.P.; Duan, M.Y. Space Target Anomaly Detection Based on Gaussian Mixture Model and Micro-Doppler Features. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5118411.
- Fang, H.Z.; Ding, L.; Wang, L.M.; Chang, Y.; Yan, L.X.; Han, J.H. Infrared Small UAV Target Detection Based on Depthwise Separable Residual Dense Network and Multiscale Feature Fusion. IEEE Trans. Instrum. Meas. 2022, 71, 5019120.
- Ma, X.Y.; Li, Y. Edge-Aided Multiscale Context Network for Infrared Small Target Detection. IEEE Geosci. Remote Sens. Lett. 2023, 20, 7001405.
- Wenchao, Z.; Yanfei, W.; Hexin, C. Moving Point Target Detection in Complex Background Based on Tophat Transform. J. Image Graph. 2007, 12, 871–874.
- Qingboa, J.I.; Xingzhou, Z.; Xuezhi, X. A Detection Method for Small Targets Based on Wavelet Transform and Data Fusion. J. Proj. Rocket. Missiles Guid. 2008, 28, 234.
- Li, Y.S.; Li, Z.Z.; Shen, Y.; Li, J. Infrared Small Target Detection Based on 1-D Difference of Guided Filtering. IEEE Geosci. Remote Sens. Lett. 2023, 20, 7000105.
- Luo, J.; Yu, H. Research of Infrared Dim and Small Target Detection Algorithms Based on Low-Rank and Sparse Decomposition. Laser Optoelectron. Prog. 2023, 60, 1600004.
- Hao, S.; Ma, X.; Fu, Z.X.; Wang, Q.L.; Li, H.A. Landing Cooperative Target Robust Detection via Low Rank and Sparse Matrix Decomposition. In Proceedings of the 3rd International Symposium on Computer, Consumer and Control (IS3C), Xi’an, China, 4–6 July 2016; pp. 172–175.
- Zhou, W.N.; Xue, X.Y.; Chen, Y. Low-Rank and Sparse Decomposition Based Frame Difference Method for Small Infrared Target Detection in Coastal Surveillance. IEICE Trans. Inf. Syst. 2016, 99, 554–557.
- Chen, C.L.P.; Li, H.; Wei, Y.T.; Xia, T.; Tang, Y.Y. A Local Contrast Method for Small Infrared Target Detection. IEEE Trans. Geosci. Remote Sens. 2014, 52, 574–581.
- Qin, Y.; Li, B. Effective Infrared Small Target Detection Utilizing a Novel Local Contrast Method. IEEE Geosci. Remote Sens. Lett. 2016, 13, 1890–1894.
- He, Y.J.; Li, M.; Wei, Z.H.; Cai, Y.C. Infrared Small Target Detection Based on Weighted Variation Coefficient Local Contrast Measure. In Proceedings of the 4th Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Beijing, China, 19–21 December 2021; pp. 117–127.
- Xu, D.Q.; Wu, Y.Q. MRFF-YOLO: A Multi-Receptive Fields Fusion Network for Remote Sensing Target Detection. Remote Sens. 2020, 12, 3118.
- Liu, H.; Ding, M.; Li, S.; Xu, Y.B.; Gong, S.L.; Kasule, A.N. Small-Target Detection Based on an Attention Mechanism for Apron-Monitoring Systems. Appl. Sci. 2023, 13, 5231.
- Xu, H.; Zhong, S.; Zhang, T.X.; Zou, X. Multiscale Multilevel Residual Feature Fusion for Real-Time Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5002116.
- Chen, Q.; Wang, Y.M.; Yang, T.; Zhang, X.Y.; Cheng, J.; Sun, J. You Only Look One-Level Feature. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 19–25 June 2021; pp. 13034–13043.
- Xiong, J.; Wu, J.; Tang, M.; Xiong, P.W.; Huang, Y.S.; Guo, H. Combining YOLO and background subtraction for small dynamic target detection. Visual Comput. 2024.
- Lin, J.; Zhang, K.; Yang, X.; Cheng, X.Z.; Li, C.H. Infrared dim and small target detection based on U-Transformer. J. Vis. Commun. Image Represent. 2022, 89, 103684.
- Tong, X.Z.; Zuo, Z.; Su, S.J.; Wei, J.Y.; Sun, X.Y.; Wu, P.; Zhao, Z.Q. ST-Trans: Spatial-Temporal Transformer for Infrared Small Target Detection in Sequential Images. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5001819.
- Zhang, F.; Lin, S.L.; Xiao, X.Y.; Wang, Y.; Zhao, Y.Q. Global attention network with multiscale feature fusion for infrared small target detection. Opt. Laser Technol. 2024, 168, 110012.
- Chen, G.; Wang, W.H.; Li, X.J. Designing and learning a lightweight network for infrared small target detection via dilated pyramid and semantic distillation. Infrared Phys. Technol. 2023, 131, 104671.
- Wang, K.W.; Du, S.Y.; Liu, C.X.; Cao, Z.G. Interior Attention-Aware Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5002013.
- Dai, Y.M.; Wu, Y.Q.; Zhou, F.; Barnard, K. Attentional Local Contrast Networks for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2021, 59, 9813–9824.
- Yuan, S.; Qin, H.L.; Yan, X.; Akhtar, N.; Mian, A. SCTransNet: Spatial-Channel Cross Transformer Network for Infrared Small Target Detection. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5002615.
- Kou, R.K.; Wang, C.P.; Yu, Y.; Peng, Z.M.; Huang, F.Y.; Fu, Q. Infrared Small Target Tracking Algorithm via Segmentation Network and Multistrategy Fusion. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5612912.
- Li, B.Y.; Xiao, C.; Wang, L.G.; Wang, Y.Q.; Lin, Z.P.; Li, M.; An, W.; Guo, Y.L. Dense Nested Attention Network for Infrared Small Target Detection. IEEE Trans. Image Process. 2023, 32, 1745–1758.
- Chen, Y.H.; Li, L.Y.; Liu, X.; Su, X.F. A Multi-Task Framework for Infrared Small Target Detection and Segmentation. IEEE Trans. Geosci. Remote Sens. 2022, 60, 5003109.
- Hui, B.; Song, Z.; Fan, H.; Zhong, P.; Hu, W.; Zhang, X.; Ling, J.; Su, H.; Jin, W.; Zhang, Y.; et al. A dataset for infrared image dim-small aircraft target detection and tracking under ground/air background. Sci. Data Bank 2019, 5, 291–302.
- Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the Inception Architecture for Computer Vision. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 2818–2826.
- Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 7464–7475.
- Wang, C.; Yeh, I.; Liao, H.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616.
- Wang, A.; Chen, H.; Liu, L.; Chen, K.; Lin, Z.; Han, J.; Ding, G. YOLOv10: Real-Time End-to-End Object Detection. arXiv 2024, arXiv:2405.14458.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Proceedings of the 14th European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Zhou, X.; Wang, D.; Krähenbühl, P. Objects as Points. arXiv 2019, arXiv:1904.07850.
- Wu, T.H.; Li, B.Y.; Luo, Y.H.; Wang, Y.Q.; Xiao, C.; Liu, T.; Yang, J.G.; An, W.; Guo, Y.L. MTU-Net: Multilevel TransUNet for Space-Based Infrared Tiny Ship Detection. IEEE Trans. Geosci. Remote Sens. 2023, 61, 5601015.
- Xu, X.D.; Wang, J.R.; Zhu, M.; Sun, H.J.; Wu, Z.Y.; Wang, Y.; Cao, S.Y.; Liu, S.Z. UCDnet: Double U-Shaped Segmentation Network Cascade Centroid Map Prediction for Infrared Weak Small Target Detection. Remote Sens. 2023, 15, 3736.
Scene | Description | Quantity | Train | Test |
---|---|---|---|---|
Scene 1 | Background changes slowly, targets appear as points, targets are small with little variation in size, the smallest target size is just one pixel, and targets move slowly. | 3000 | √ | |
Scene 4 | | 399 | √ | |
Scene 8 | | 1500 | √ | |
Scene 11 | | 751 | | √ |
Scene 2 | The targets have very low contrast and move slowly, and the high complexity of the background causes some targets to be submerged in the background in certain images. | 399 | √ | |
Scene 6 | | 401 | | √ |
Scene 7 | | 745 | √ | |
Scene 9 | | 763 | | √ |
Scene 10 | | 1426 | √ | |
Scene 3 | High background complexity, uneven lighting in some images, relatively large and continuously changing target sizes, and fast target movement. | 399 | √ | |
Scene 5 | | 399 | √ | |
Scene 12 | | 500 | √ | |
Scene 13 | | 500 | | √ |
Scene 14 | | 499 | √ | |
| | mDis | Dr |
|---|---|---|
0.5 | 0.5205 | 79.66% |
1 | 0.4224 | 86.83% |
2 | 0.2577 | 87.37% |
4 | 0.3008 | 85.66% |
Method | mDis | Dr | Fr | FLOPs (G) | Params (M) | FPS
---|---|---|---|---|---|---|
YOLOv5s | 0.5226 | 42.07% | 0.3573 | 1.2641 | 7.0128 | 42 |
YOLOv7 [31] | 0.5981 | 70.02% | 0.1867 | 8.2657 | 36.4799 | 45 |
YOLOv8n | 0 | 0 | 0 | 0.6511 | 3.0110 | 164 |
YOLOv9 [32] | 0 | 0 | 0 | 3.0972 | 9.5980 | 36 |
YOLOv10s [33] | 0 | 0 | 0 | 1.9817 | 8.0671 | 102 |
SSD [34] | 0.8270 | 68.20% | 0.3238 | 30.4530 | 23.7454 | 49 |
CenterNet [35] | 1.2932 | 63.73% | 0.3072 | 8.7420 | 32.6642 | 32 |
DNAnet [27] | 0.0082 | 57.35% | 0.1818 | 14.2479 | 4.6969 | 35 |
MTU-Net [36] | 0.0779 | 60.25% | 0.3337 | 6.2123 | 8.2202 | 78 |
UCDnet [37] | 0.0401 | 58.67% | 0.3313 | 14.7372 | 9.2762 | 36 |
Ours | 0.2577 | 87.37% | 0.1263 | 1.1871 | 0.1574 | 40 |
Input | mDis | Dr | Fr |
---|---|---|---|
single-frame images | 0.7107 | 69.07% | 0.3093 |
sequences of 5 unmatched images | 0.2185 | 77.49% | 0.2251 |
sequences of 5 matched images | 0.2577 | 87.37% | 0.1263 |
sequences of 10 matched images | 0.4142 | 89.11% | 0.1089 |
sequences of 20 matched images | 0.3500 | 91.79% | 0.0821 |
Method | mDis | Dr | Fr | FLOPs (G) | Params (M) |
---|---|---|---|---|---|
CNN | 0.2737 | 84.95% | 0.1505 | 0.8427 | 0.1231 |
LFAM | 0.2577 | 87.37% | 0.1263 | 1.1871 | 0.1574 |
Method | Scene 6 mDis | Scene 6 Dr | Scene 6 Fr | Scene 9 mDis | Scene 9 Dr | Scene 9 Fr | Scene 11 mDis | Scene 11 Dr | Scene 11 Fr | Scene 13 mDis | Scene 13 Dr | Scene 13 Fr |
---|---|---|---|---|---|---|---|---|---|---|---|---|
YOLOv5s | 0.46 | 42% | 0.07 | 0.60 | 59% | 0.20 | 0.43 | 38% | 0.70 | 0.54 | 22% | 0.31 |
YOLOv7 | 0.54 | 41% | 0.06 | 0.60 | 63% | 0.04 | 0.60 | 96% | 0.51 | 0.63 | 65% | 0.02 |
SSD | 0.79 | 53% | 0.23 | 0.98 | 72% | 0.07 | 0.63 | 78% | 0.61 | 0.96 | 60% | 0.36 |
CenterNet | 1.08 | 67% | 0.04 | 1.28 | 71% | 0.14 | 1.40 | 78% | 0.57 | 1.29 | 28% | 0.38 |
DNAnet | 0.00 | 0 | 0.00 | 0.01 | 72% | 0.27 | 0.00 | 89% | 0.22 | 0.01 | 34% | 0.14 |
MTU-Net | 0.00 | 0 | 0.17 | 0.09 | 76% | 0.09 | 0.08 | 92% | 0.57 | 0.03 | 36% | 0.47 |
UCDnet | 0.03 | 30% | 0.04 | 0.05 | 67% | 0.20 | 0.03 | 88% | 0.30 | 0.02 | 25% | 0.80 |
Ours | 0.33 | 51% | 0.49 | 0.34 | 95% | 0.05 | 0.20 | 98% | 0.02 | 0.36 | 90% | 0.10 |
Method | mDis | Dr | Fr | Method | mDis | Dr | Fr |
---|---|---|---|---|---|---|---|
YOLOv5s | 0.5226 | 42.07% | 0.3573 | YOLOv7 | 0.5981 | 70.02% | 0.1867 |
YOLOv5s + GCM | 0.5195 | 42.07% | 0.3573 | YOLOv7 + GCM | 0.5881 | 70.02% | 0.1867 |
YOLOv5s + WCM | 0.4995 | 42.07% | 0.3573 | YOLOv7 + WCM | 0.5646 | 70.02% | 0.1867 |
SSD | 0.8270 | 68.20% | 0.3238 | CenterNet | 1.2932 | 63.73% | 0.3072 |
SSD + GCM | 0.8073 | 68.45% | 0.3213 | CenterNet + GCM | 1.2525 | 63.81% | 0.3064 |
SSD + WCM | 0.7726 | 68.45% | 0.3213 | CenterNet + WCM | 1.2226 | 64.51% | 0.2994 |
DNAnet | 0.0082 | 57.35% | 0.1818 | MTU-Net | 0.0779 | 60.25% | 0.3337 |
DNAnet + GCM | 0.0423 | 57.35% | 0.1818 | MTU-Net + GCM | 0.1094 | 60.25% | 0.3337 |
DNAnet + WCM | 0.0720 | 57.35% | 0.1818 | MTU-Net + WCM | 0.1297 | 60.25% | 0.3337 |
UCDnet | 0.0401 | 58.67% | 0.3313 | Ours | 0.2577 | 87.37% | 0.1263 |
UCDnet + GCM | 0.0734 | 58.67% | 0.3313 | Ours + GCM | 0.2562 | 87.41% | 0.1259 |
UCDnet + WCM | 0.1015 | 58.67% | 0.3313 | Ours + WCM | 0.2443 | 87.41% | 0.1259 |