Improved Feature Extraction and Similarity Algorithm for Video Object Detection
Abstract
1. Introduction
2. Faster RCNN
3. Improved Faster RCNN
4. Motivation and Method
4.1. Motivation
4.2. Improved Similarity Algorithm
4.3. A Spectral Clustering Viewpoint
5. Experiments
5.1. Datasets
5.2. Index of Evaluation
5.3. Implementation Details and Sampling Strategies for Feature Aggregation
5.4. Ablation Study
5.5. Comparison with other Popular Methods
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
S-SELSA | Structural Similarity SELSA |
DFF | Deep Feature Flow for Video Recognition |
TCN | Temporal Convolutional Network |
TPN | Tubelet Proposal Network |
LSTM | Long Short Term Memory Network |
FGFA | Flow-Guided Feature Aggregation for Video Object Detection |
D&T | Detect to Track and Track to Detect |
SSD | Single Shot MultiBox Detector |
DRConv | Dynamic Region Aware Convolution |
FPN | Feature Pyramid Network |
RPN | Region Proposal Network |
SGD | Stochastic Gradient Descent |
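Models | mAP (%) at Positive Overlap Threshold 50% | mAP (%) at Positive Overlap Threshold 60% | mAP (%) at Positive Overlap Threshold 70% | mAP (%) at Positive Overlap Threshold 80% |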
---|---|---|---|---|
ResNet101 | 73.62 | 71.58 | 68.64 | 65.69 |
ResNet101 + FPN | 73.88 | 71.52 | 67.59 | 64.33 |
ResNet101 + DRConv | 73.71 | 70.11 | 67.58 | 63.49 |
ResNet101 + DRConv + FPN | 74.21 | 71.31 | 69.39 | 64.98 |
ResNet101 + SELSA | 80.25 | 77.89 | 72.54 | 70.12 |
ResNet101 + DRConv + FPN + SELSA | 81.22 | 76.99 | 72.55 | 70.11 |
ResNet101 + DRConv + FPN + S-SELSA | 83.55 | 80.08 | 75.37 | 71.04 |
Models | mAP (%) | mAP (%) (Slow) | mAP (%) (Medium) | mAP (%) (Fast) |
---|---|---|---|---|
ResNet101 | 73.62 | 82.12 | 70.96 | 51.53 |
ResNet101 + FPN | 73.88 | 82.52 | 71.41 | 52.07 |
ResNet101 + DRConv | 73.71 | 82.58 | 71.63 | 52.15 |
ResNet101 + DRConv + FPN | 74.21 | 82.94 | 72.05 | 52.68 |
ResNet101 + SELSA | 80.25 | 86.91 | 78.94 | 61.38 |
ResNet101 + DRConv + FPN + SELSA | 81.22 | 87.78 | 79.76 | 62.15 |
ResNet101 + DRConv + FPN + S-SELSA | 83.55 | 90.17 | 82.39 | 64.78 |
Object | AP of TCN (%) | AP of TCN + LSTM (%) | AP of FGFA (%) | AP of D (&T Loss) (%) | AP of Our Method (%) |
---|---|---|---|---|---|
airplane | 72.7 | 84.6 | 88.1 | 89.4 | 90.3 |
antelope | 75.5 | 78.1 | 85.0 | 80.4 | 88.4 |
bear | 42.2 | 72.0 | 82.5 | 83.8 | 89.8 |
bicycle | 39.5 | 67.2 | 68.1 | 70.0 | 76.6 |
bird | 25.0 | 68.0 | 72.8 | 71.8 | 73.8 |
bus | 64.1 | 80.1 | 82.3 | 82.6 | 84.2 |
car | 36.3 | 54.7 | 58.6 | 56.8 | 75.8 |
cattle | 51.1 | 61.2 | 71.7 | 71.0 | 84.1 |
dog | 24.4 | 61.6 | 73.3 | 71.8 | 81.2 |
domestic cat | 48.6 | 78.9 | 81.5 | 76.6 | 83.5 |
elephant | 65.6 | 71.6 | 78.0 | 79.3 | 82.1 |
fox | 73.9 | 83.2 | 90.6 | 89.9 | 92.1 |
giant panda | 61.7 | 78.1 | 82.3 | 83.3 | 91.2 |
hamster | 82.4 | 91.5 | 92.4 | 91.9 | 93.8 |
horse | 30.8 | 66.8 | 70.3 | 76.8 | 85.2 |
lion | 34.4 | 21.6 | 66.9 | 57.3 | 79.2 |
lizard | 54.2 | 74.4 | 79.3 | 79.0 | 82.3 |
monkey | 1.6 | 36.6 | 53.9 | 54.1 | 69.6 |
motorbike | 61.0 | 76.3 | 84.3 | 80.3 | 85.1 |
rabbit | 36.6 | 51.4 | 66.7 | 65.3 | 78.1 |
red panda | 19.7 | 70.6 | 82.2 | 85.3 | 87.3 |
sheep | 55.0 | 64.2 | 57.2 | 56.9 | 74.8 |
snake | 38.9 | 61.2 | 74.7 | 74.1 | 82.1 |
squirrel | 2.6 | 42.3 | 56.5 | 59.9 | 76.5 |
tiger | 42.8 | 84.8 | 91.0 | 91.3 | 92.3 |
train | 54.6 | 78.1 | 82.4 | 84.9 | 85.7 |
turtle | 66.1 | 77.2 | 80.2 | 81.9 | 83.2 |
watercraft | 69.2 | 61.5 | 65.7 | 68.3 | 82.3 |
whale | 26.5 | 66.9 | 75.6 | 68.9 | 82.2 |
zebra | 68.6 | 88.5 | 91.3 | 90.9 | 93.7 |
mAP | 47.5 | 68.4 | 76.3 | 75.8 | 83.55 |
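The mAP row in the table above can be checked against the per-class values. The following is a minimal sketch, assuming mAP here is the unweighted mean of the 30 per-class AP values in the "Our Method" column; the names `ap_ours` and `mean_average_precision` are illustrative and are not taken from the paper.

```python
from statistics import mean

# Per-class AP (%) for "Our Method", copied from the last column of the table.
ap_ours = {
    "airplane": 90.3, "antelope": 88.4, "bear": 89.8, "bicycle": 76.6,
    "bird": 73.8, "bus": 84.2, "car": 75.8, "cattle": 84.1, "dog": 81.2,
    "domestic cat": 83.5, "elephant": 82.1, "fox": 92.1, "giant panda": 91.2,
    "hamster": 93.8, "horse": 85.2, "lion": 79.2, "lizard": 82.3,
    "monkey": 69.6, "motorbike": 85.1, "rabbit": 78.1, "red panda": 87.3,
    "sheep": 74.8, "snake": 82.1, "squirrel": 76.5, "tiger": 92.3,
    "train": 85.7, "turtle": 83.2, "watercraft": 82.3, "whale": 82.2,
    "zebra": 93.7,
}


def mean_average_precision(ap_by_class):
    """Return the unweighted mean of the per-class AP values (assumed definition)."""
    return mean(ap_by_class.values())


map_ours = mean_average_precision(ap_ours)
print(f"{len(ap_ours)} classes, mAP = {map_ours:.2f}%")  # 30 classes, mAP = 83.55%
```

Under this assumption the computed mean reproduces the 83.55 reported in the mAP row, which also explains why that cell carries two decimals while the per-class cells carry one.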
Method | mAP (%) (Slow) | mAP (%) (Medium) | mAP (%) (Fast) |
---|---|---|---|
FGFA | 85.31 | 75.86 | 55.72 |
Our method | 90.17 | 82.39 | 64.78 |
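The gap between the two rows of the table above can be made explicit with a small calculation. This is only an illustrative sketch: the dictionary keys and the helper name `absolute_gain` are assumptions introduced here, and the numbers are copied from the table.

```python
# mAP (%) per speed category, copied from the table above.
fgfa = {"slow": 85.31, "medium": 75.86, "fast": 55.72}
ours = {"slow": 90.17, "medium": 82.39, "fast": 64.78}


def absolute_gain(method, baseline):
    """Return the difference in percentage points for each shared category."""
    return {key: round(method[key] - baseline[key], 2) for key in baseline}


gains = absolute_gain(ours, fgfa)
for category, gain in gains.items():
    print(f"{category}: +{gain:.2f} points")
# slow: +4.86 points
# medium: +6.53 points
# fast: +9.06 points
```

Read this way, the difference over FGFA grows from the slow category to the fast category.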
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
You, H.; Lu, Y.; Tang, H. Improved Feature Extraction and Similarity Algorithm for Video Object Detection. Information 2023, 14, 115. https://doi.org/10.3390/info14020115