Object Detection Using Multi-Scale Balanced Sampling
Abstract
1. Introduction
- The proportion of small objects in an image is usually low, so small objects may be excluded from the training process under improper network hyperparameter settings. Moreover, small objects have low resolution and carry less image information, which makes them harder to train than general objects; insufficient sampling further degrades the quality of the features extracted by the deep neural network.
- In natural scenes, object scales are distributed stochastically, so a single image may contain objects with large scale variation; training can then easily be dominated by the majority of samples while the other objects are ignored.
- The sample matching condition is adjusted according to object scale so that small objects are easier to match, which increases the number of training samples for small objects.
- The sample set is divided into multiple intervals according to sample scale and discrimination difficulty, and each interval is sampled in a balanced fashion. This preserves the diversity of sample types while training the classification and localization networks and further ensures that the algorithm does not tend to detect one type of sample while ignoring the others.
- The proposed approach is evaluated against other methods on several benchmarks; the experimental results show that its detection performance is better than that of similar approaches.
2. Related Works
3. Proposed Method
3.1. Framework Overview
- Take a digital image as input and feed it into a pre-trained ResNet-101 network, so that image features are extracted from shallow to deep layers using Conv1∼Conv5 [14].
- By adopting MB-RPN, samples are dynamically selected according to scale and proportion. MB-RPN can be decomposed into two parts: multi-scale discrimination and balanced sampling.
- The MB-RPN outputs are then transformed to the same size through ROI Pooling.
- The candidate boxes are then sent to the classification and regression networks to obtain each object's category and location.
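The ROI Pooling step above resizes feature crops of different scales to a common grid. The following is a minimal NumPy sketch of naive ROI max-pooling; the channel count, crop coordinates, and 7×7 output grid are illustrative assumptions, not values from the paper:

```python
import numpy as np

def roi_pool(feature_map, roi, output_size=7):
    """Crop roi = (x1, y1, x2, y2) from a (C, H, W) feature map and
    max-pool it onto a fixed output_size x output_size grid, so that
    proposals of different scales yield same-size features."""
    c = feature_map.shape[0]
    x1, y1, x2, y2 = roi
    crop = feature_map[:, y1:y2, x1:x2]
    h, w = crop.shape[1], crop.shape[2]
    # Split the crop into an output_size x output_size grid of cells.
    ys = np.linspace(0, h, output_size + 1).astype(int)
    xs = np.linspace(0, w, output_size + 1).astype(int)
    out = np.zeros((c, output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            # Guard against empty cells when the crop is small.
            cell = crop[:, ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
            out[:, i, j] = cell.max(axis=(1, 2))
    return out
```

In the full pipeline, a bilinear variant (ROI Align) is often preferred over hard max-pooling, but the fixed-size output idea is the same.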
3.2. Multi-Scale Sample Discrimination
- For a large object it is much easier to meet the IOU-threshold discrimination condition. As seen in Figure 3, the large object in the image matches a larger number of anchor boxes even before the threshold is decreased, which suggests that further reducing the threshold has only a limited effect on increasing the number of positive samples.
- Since a large object area contains rich image feature information, it is easier, compared with small objects, to obtain valid discrimination and bounding-box regression parameters during network training. Therefore, reducing the IOU discrimination threshold yields only a limited improvement in detection performance for large objects.
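One way to realize scale-dependent matching as described above is to relax the positive-sample IOU threshold for small objects while keeping it strict for large ones. The sketch below interpolates the threshold linearly with box area; the specific threshold values and area breakpoints are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def adaptive_iou_threshold(box_area, base_thresh=0.7, min_thresh=0.5,
                           small_area=32 ** 2, large_area=96 ** 2):
    """Scale-adaptive positive-sample IOU threshold.

    Returns min_thresh for areas <= small_area, base_thresh for areas
    >= large_area, and a linear interpolation in between, so small
    objects match more anchors while large objects keep a strict test.
    """
    t = np.clip((box_area - small_area) / (large_area - small_area), 0.0, 1.0)
    return min_thresh + t * (base_thresh - min_thresh)
```

An anchor would then count as positive for a ground-truth box when its IOU exceeds `adaptive_iou_threshold(area_of_that_box)` rather than a single fixed constant.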
3.3. Balanced Sampling
Algorithm 1 Balanced Sample Algorithm.
Inputs:
    Positive/Negative Sample Sets;
    Number of Selected Samples N;
Outputs:
    Sample Set U;
1: U = ∅
2: sort(Sets)
3: for set in Sets:
4:     if …:
5:         U.append(sample)
6:     else:
7:         U.append(…)
8: reshape(U)
9: return U
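The balanced sampling idea of Algorithm 1 can be sketched concretely as follows: group the candidate samples by interval and draw roughly the same number from each interval, topping up from the remainder when an interval runs short. The bin assignment, number of intervals, and helper names here are assumptions for illustration, not the paper's exact procedure:

```python
import random
from collections import defaultdict

def balanced_sample(samples, n_select, seed=0):
    """Select n_select samples, drawn roughly equally per interval.

    `samples` is a list of (sample, bin_index) pairs, where bin_index
    encodes the scale/difficulty interval the sample falls into.
    """
    rng = random.Random(seed)
    bins = defaultdict(list)
    for sample, b in samples:
        bins[b].append(sample)

    per_bin = max(1, n_select // max(len(bins), 1))
    selected = []
    # First pass: take up to per_bin samples from each interval.
    for b in sorted(bins):
        pool = bins[b]
        rng.shuffle(pool)
        selected.extend(pool[:per_bin])
    # Second pass: if some intervals were short, top up from the rest.
    if len(selected) < n_select:
        leftover = [s for b in sorted(bins) for s in bins[b][per_bin:]]
        rng.shuffle(leftover)
        selected.extend(leftover[:n_select - len(selected)])
    return selected[:n_select]
```

This keeps every interval represented in the training batch, so no single scale or difficulty level dominates the classification and regression loss.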
3.4. Loss Function
3.5. Discussion
4. Experiments
4.1. Benchmarks
- DOTA contains over 2000 remote sensing images, all of large size. The images are annotated by experts in aerial and remote sensing image interpretation using 15 common object categories, such as plane, ship, and harbor. The object distribution of each category is shown in Figure 5a, and the abbreviation of each category is given in Table 2.
- UAVB is an unmanned aerial vehicle dataset; each frame is of large size and contains high-density small objects. Vehicle categories include car, truck, and bus. The object distribution of each category is shown in Figure 5b.
4.2. Implementation Detail
4.3. Effectiveness of Multi-Scale Sampling on Positive Sample
4.4. Effectiveness of Multi-Scale Balanced Sampling on Negative Samples
4.5. Performance Comparison with Other Methods
4.6. Discussion
5. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
- Pang, Y.; Yuan, Y.; Li, X.; Pan, J. Efficient hog human detection. Signal Process. 2011, 91, 773–781. [Google Scholar] [CrossRef]
- Felzenszwalb, P.F.; Girshick, R.B.; McAllester, D. Cascade object detection with deformable part models. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010; pp. 2241–2248. [Google Scholar]
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. Ssd: Single shot multibox detector. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft coco: Common objects in context. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2014; pp. 740–755. [Google Scholar]
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems; Montreal, QC, Canada, 2015; pp. 91–99. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
- Girshick, R. Fast R-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 11–18 December 2015; pp. 1440–1448. [Google Scholar]
- Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
- Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving Into High Quality Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 6154–6162. [Google Scholar]
- Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988. [Google Scholar]
- Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 821–830. [Google Scholar]
- Ahmad, M.; Abdullah, M.; Han, D. Small Object Detection in Aerial Imagery using RetinaNet with Anchor Optimization. In Proceedings of the International Conference on Electronics, Information, and Communication, Barcelona, Spain, 19–22 January 2020; pp. 1–3. [Google Scholar]
- Wang, J.; Chen, K.; Yang, S.; Loy, C.C.; Lin, D. Region proposal by guided anchoring. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 2965–2974. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Xia, G.S.; Bai, X.; Ding, J.; Zhu, Z.; Belongie, S.; Luo, J.; Datcu, M.; Pelillo, M.; Zhang, L. DOTA: A large-scale dataset for object detection in aerial images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3974–3983. [Google Scholar]
- Du, D.; Qi, Y.; Yu, H.; Yang, Y.; Duan, K.; Li, G.; Zhang, W.; Huang, Q.; Tian, Q. The Unmanned Aerial Vehicle Benchmark: Object Detection, Tracking and Baseline. Int. J. Comput. Vis. 2020, 128, 1141–1159. [Google Scholar]
- Girija, S.S. Tensorflow: Large-scale machine learning on heterogeneous distributed systems. 2016; Software available from tensorflow.org. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
- Vicente, S.; Carreira, J.; Agapito, L.; Batista, J. Reconstructing pascal voc. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 41–48. [Google Scholar]
| Stage | Input | Shape | Layer | Output |
|---|---|---|---|---|
| ResNet-101 | Image | | Conv1 | |
| | Conv1 | | Pool1 | |
| | Pool1 | | Conv2 | |
| | Conv2 | | Conv3 | |
| | Conv3 | | Conv4 | |
| | Conv4 | | Conv5 | |
| MB-RPN | Pool1 | | | |
| | Conv2 | | | |
| | Conv3 | | | |
| | Conv4 | | | |
| | Conv5 | | | |
| FineTune | Pool1 | | | |
| | Conv2 | | | |
| | Conv3 | | | |
| | Conv4 | | | |
| | Conv5 | | | |
Category | SSD | RetinaNet | SRetinaNet | FPN | GA-RPN | Libra RCNN | MB-RPN
---|---|---|---|---|---|---|---|
plane (pl) | 81.3 | 88.7 | 86.2 | 88.9 | 88.4 | 89.7 | 90.2 |
baseball-diamond (bd) | 50.6 | 64.5 | 57.8 | 75.8 | 75.2 | 73.6 | 77.0 |
bridge (bg) | 39.1 | 47.5 | 21.4 | 48.6 | 43.1 | 49.7 | 53.0 |
ground-track-field (gtf) | 44.2 | 49.0 | 49.5 | 53.9 | 53.5 | 54.8 | 53.3 |
small-vehicle (sv) | 56.0 | 58.1 | 61.1 | 63.8 | 67.7 | 70.3 | 70.4 |
large-vehicle (lv) | 54.2 | 57.8 | 59.0 | 63.6 | 66.3 | 65.0 | 66.0 |
ship (sh) | 61.1 | 69.2 | 74.4 | 76.7 | 77.0 | 77.1 | 76.8 |
tennis-court (tc) | 84.3 | 88.2 | 70.3 | 90.7 | 90.1 | 90.8 | 90.8 |
basketball-court (bc) | 68.6 | 73.5 | 58.1 | 78.2 | 78.4 | 75.0 | 78.9 |
storage-tank (st) | 61.5 | 75.6 | 80.5 | 81.0 | 84.1 | 83.0 | 83.7 |
soccer-ball-field (sbf) | 21.3 | 28.4 | 16.7 | 36.5 | 36.8 | 37.2 | 41.2 |
roundabout (ra) | 43.6 | 49.1 | 55.2 | 56.8 | 58.9 | 60.0 | 59.4 |
harbor (hb) | 51.6 | 56.7 | 35.9 | 67.3 | 66.7 | 67.6 | 68.3 |
swimming-pool (sp) | 47.4 | 61.1 | 64.3 | 71.1 | 72.0 | 72.6 | 70.9 |
helicopter (hl) | 28.0 | 38.6 | 57.1 | 44.7 | 50.3 | 62.6 | 56.5 |
mAP | 56.9 | 60.6 | 58.1 | 65.5 | 66.8 | 67.6 | 68.5 |
Category | FPN | GA-RPN | Libra RCNN | MB-RPN
---|---|---|---|---|
car | 45.1 | 47.2 | 48.8 | 51.2 |
bus | 5.3 | 8.6 | 7.2 | 21.7 |
truck | 28.2 | 31.5 | 31.1 | 33.2 |
mAP | 21.1 | 21.7 | 23.0 | 29.4 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Yu, H.; Gong, J.; Chen, D. Object Detection Using Multi-Scale Balanced Sampling. Appl. Sci. 2020, 10, 6053. https://doi.org/10.3390/app10176053