Multiple Instance Learning Convolutional Neural Networks for Fine-Grained Aircraft Recognition
Abstract
1. Introduction
- (1) We propose a generalized MIL method for fine-grained aircraft recognition that focuses on discriminative regions and avoids the high computational cost of extracting fine-grained part features, without requiring part annotations. In generalized MIL, a fine-grained aircraft is treated as a combination of several component concepts, for which only the aircraft type is known. This formulation effectively highlights part regions and suppresses background responses.
- (2) The patch-level output of the MIL backbone network cannot directly represent aircraft part semantics. We therefore design an instance conversion part (instance loss function) that transforms patch-level information into instance-level fine-grained semantic representations while adding few model parameters and little testing time.
- (3) We evaluate our method on CAIs, a self-built benchmark dataset of fine-grained remote sensing images. Comprehensive experiments with basic DCNNs verify its effectiveness and generality.
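
The MIL view in contribution (1) can be illustrated with a minimal sketch of how per-instance class scores might be aggregated into one bag-level (image-level) prediction. This is not the network proposed in this paper: the function `mil_pool`, its three pooling modes, the sharpness parameter `r`, and the toy scores are all assumptions chosen for illustration.

```python
import math

def mil_pool(instance_scores, mode="max", r=5.0):
    """Aggregate per-instance scores for one class into a single bag-level score."""
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    if mode == "max":
        return max(instance_scores)
    if mode == "mean":
        return sum(instance_scores) / len(instance_scores)
    if mode == "lse":
        # Log-sum-exp pooling: a smooth stand-in for max; larger r is closer to max.
        m = max(instance_scores)
        total = sum(math.exp(r * (s - m)) for s in instance_scores)
        return m + math.log(total / len(instance_scores)) / r
    raise ValueError(f"unknown pooling mode: {mode}")

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical bag: 4 instances (image patches), 3 aircraft types.
bag = [
    [0.1, 2.0, 0.3],
    [0.2, 0.5, 0.1],
    [0.0, 3.0, 0.2],
    [0.4, 0.1, 0.6],
]
num_classes = len(bag[0])

# Pool each class column over the instances, then normalize to probabilities.
bag_logits = [mil_pool([inst[c] for inst in bag], mode="max") for c in range(num_classes)]
bag_probs = softmax(bag_logits)
predicted_type = max(range(num_classes), key=lambda c: bag_probs[c])
```

With max pooling, a single strongly responding instance decides the bag label, which matches the intuition that one discriminative part can identify the aircraft type. Mean and log-sum-exp pooling spread the decision over more instances.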
2. Materials and Methods
2.1. Problem Statement
2.2. Generalized MIL Fine-Grained Recognition Network
2.3. Instance Conversion Part (Instance Loss)
2.4. MIL Pooling Part
3. Experiments and Results
3.1. Dataset
3.2. Implementation Details
3.2.1. Data Preprocessing
3.2.2. Parameter Settings
3.2.3. Evaluation Metrics and Experimental Platforms
3.3. Comparative Experiment of Baseline Networks and Standard MIL Networks
3.4. Comparative Experiment of the Standard MIL Networks and Generalized MIL Networks
4. Discussion
4.1. Number of Instances
4.2. Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Types | Images | Types | Images | Types | Images |
---|---|---|---|---|---|
Type 1 | 500 | Type 2 | 480 | Type 3 | 480 |
Type 4 | 374 | Type 5 | 16 | Type 6 | 500 |
Type 7 | 594 | Type 8 | 263 | Type 9 | 570 |
Type 10 | 500 | Type 11 | 500 | Type 12 | 500 |
Type 13 | 370 | Type 14 | 493 | Type 15 | 500 |
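
The per-type counts in the table above are unbalanced, so a training/testing split is easier to reason about when it is taken per type. The sketch below assumes that a ratio such as Tr = 50% denotes the fraction of each type used for training and that the split is stratified; neither assumption is confirmed here. The function `stratified_split`, the seed, and the `(type, index)` placeholders standing in for image files are hypothetical, and the counts are copied as listed.

```python
import random

# Per-type image counts, copied as listed in the table above.
counts = {1: 500, 2: 480, 3: 480, 4: 374, 5: 16, 6: 500, 7: 594, 8: 263,
          9: 570, 10: 500, 11: 500, 12: 500, 13: 370, 14: 493, 15: 500}

def stratified_split(counts, train_ratio, seed=0):
    """Split every type separately so each keeps roughly the same train ratio."""
    rng = random.Random(seed)
    train, test = [], []
    for type_id, n in counts.items():
        indices = list(range(n))      # placeholder ids for the images of this type
        rng.shuffle(indices)
        n_train = int(n * train_ratio)  # floor, so the test set is never empty
        train += [(type_id, i) for i in indices[:n_train]]
        test += [(type_id, i) for i in indices[n_train:]]
    return train, test

total = sum(counts.values())
train, test = stratified_split(counts, train_ratio=0.5)
```

Splitting per type keeps even the smallest class represented on both sides, which a single global shuffle does not guarantee.
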
Baseline | Loss Function | Accuracy/% (Tr = 50%) | Accuracy/% (Tr = 60%) | Accuracy/% (Tr = 80%) | Parameters/MB | Training Time/ms | Testing Time (ms/Image) |
---|---|---|---|---|---|---|---|
VGGNet | CE Loss | 91.0000 | 91.5030 | 92.6038 | 134.32 | 5368.96 | 6.15 |
VGGNet | CE Loss and MCL | 90.8981 | 90.4834 | 92.2264 | 134.32 | 5694.91 | 6.20 |
VGGNet | CE Loss and Instance Loss | 90.5666 | 91.0498 | 92.9811 | 134.32 | 5791.13 | 6.10 |
VGGMIL | CE Loss | 91.7119 | 92.7115 | 93.5849 | 56.91 | 3216.95 | 3.80 |
VGGMIL | CE Loss and MCL | 92.8270 | 93.8444 | 93.3585 | 56.91 | 3270.45 | 3.95 |
VGGMIL | CE Loss and Instance Loss | 93.0380 | 94.0710 | 94.0377 | 56.91 | 3337.66 | 3.90 |
ResNet | CE Loss | 91.3201 | 92.7492 | 94.1887 | 89.84 | 2066.15 | 2.25 |
ResNet | CE Loss and MCL | 92.3749 | 92.8625 | 94.2143 | 89.84 | 2099.63 | 2.15 |
ResNet | CE Loss and Instance Loss | 91.1995 | 92.8625 | 94.3571 | 89.84 | 2438.67 | 2.25 |
ResMIL | CE Loss | 92.0133 | 92.9381 | 94.2143 | 33.89 | 1981.13 | 1.85 |
ResMIL | CE Loss and MCL | 92.3448 | 92.9381 | 94.4906 | 33.89 | 2077.18 | 1.85 |
ResMIL | CE Loss and Instance Loss | 92.4653 | 93.2024 | 94.6415 | 33.89 | 2253.53 | 1.75 |
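
The cost columns of the table above can be summarized as relative savings of each MIL network over its baseline. The sketch below uses only the CE Loss rows, with values copied from the table; the helper `relative_reduction` and the dictionary layout are assumptions made for illustration.

```python
# (parameters in MB, testing time in ms/image) for the CE Loss rows of the table.
rows = {
    "VGGNet": (134.32, 6.15),
    "VGGMIL": (56.91, 3.80),
    "ResNet": (89.84, 2.25),
    "ResMIL": (33.89, 1.85),
}

def relative_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before

savings = {}
for base, mil in [("VGGNet", "VGGMIL"), ("ResNet", "ResMIL")]:
    params_base, time_base = rows[base]
    params_mil, time_mil = rows[mil]
    savings[mil] = (
        relative_reduction(params_base, params_mil),
        relative_reduction(time_base, time_mil),
    )

for name, (params_pct, time_pct) in savings.items():
    print(f"{name}: {params_pct:.1f}% fewer parameters, {time_pct:.1f}% less testing time")
```

On these figures, both MIL variants cut the parameter size by more than half relative to their baselines.
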
Baseline | Number of Instances m | Loss Function | Accuracy/% (Tr = 50%) | Parameters/MB | Training Time/ms | Testing Time/ms |
---|---|---|---|---|---|---|
VGGMIL | 1 | CE Loss and Instance Loss | 92.9475 | 56.91 | 3451.21 | 7.65 |
VGGMIL | 2 | CE Loss and Instance Loss | 92.8270 | 56.91 | 3340.35 | 7.04 |
VGGMIL | 3 | CE Loss and Instance Loss | 93.0380 | 56.91 | 3337.66 | 3.90 |
VGGMIL | 4 | CE Loss and Instance Loss | 92.6160 | 56.91 | 3402.68 | 7.44 |
ResMIL | 1 | CE Loss and Instance Loss | 92.4051 | 33.89 | 2253.53 | 5.18 |
ResMIL | 2 | CE Loss and Instance Loss | 92.2242 | 33.89 | 2341.67 | 4.72 |
ResMIL | 3 | CE Loss and Instance Loss | 92.4653 | 33.89 | 2181.48 | 1.75 |
ResMIL | 4 | CE Loss and Instance Loss | 92.0434 | 33.89 | 2310.24 | 4.49 |
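
As a quick reading aid for the table above, the following sketch picks the number of instances m with the highest accuracy for each backbone and reports how much accuracy varies across m. The values are copied from the table; the variable names are assumptions made for illustration.

```python
# (backbone, m) -> accuracy in % at Tr = 50%, copied from the table above.
accuracy = {
    ("VGGMIL", 1): 92.9475, ("VGGMIL", 2): 92.8270,
    ("VGGMIL", 3): 93.0380, ("VGGMIL", 4): 92.6160,
    ("ResMIL", 1): 92.4051, ("ResMIL", 2): 92.2242,
    ("ResMIL", 3): 92.4653, ("ResMIL", 4): 92.0434,
}

# Best m per backbone, stored as (m, accuracy).
best = {}
for (backbone, m), acc in accuracy.items():
    if backbone not in best or acc > best[backbone][1]:
        best[backbone] = (m, acc)

# Gap between the best and worst accuracy over m, per backbone.
spread = {}
for backbone in best:
    accs = [a for (b, _), a in accuracy.items() if b == backbone]
    spread[backbone] = max(accs) - min(accs)
```

For both backbones the highest accuracy is at m = 3, and the gap between the best and worst setting is under half a percentage point.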
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Huang, X.; Xu, K.; Huang, C.; Wang, C.; Qin, K. Multiple Instance Learning Convolutional Neural Networks for Fine-Grained Aircraft Recognition. Remote Sens. 2021, 13, 5132. https://doi.org/10.3390/rs13245132