Fine-Grained Recognition of Surface Targets with Limited Data
Abstract
1. Introduction
- Establishing a dataset of surface targets, including different categories.
- Introducing a multi-attention mechanism based on residual networks, which fuses a channel attention module and a spatial attention module to focus on discriminative regions and further improve classification accuracy.
- Adopting transfer learning and the N-pair loss function so that the network expresses richer features and converges better and faster.
2. Literature Review
2.1. CNN and Ensemble of CNN
2.2. Methods Based on Location Detection and Alignment
2.3. Methods Based on Visual Attention
3. Methodology
3.1. Surface Target Dataset
3.2. Proposed Model
3.2.1. ResNet-50
3.2.2. Multi-Attention Model
- Channel attention module
- Take the convolutional feature map generated by the residual network as the original input F ∈ ℝ^(H×W×C), where H×W represents the spatial dimension and C represents the number of channels. To extract channel attention effectively, F is compressed in the spatial dimension so that the features of each channel are squeezed into a single real number. This step can be achieved by a pooling operation.
- Adopt a multi-scale pooling method: apply the max pooling function and the average pooling function separately to reduce the dimensionality, obtaining two feature vectors of size 1×1×C. Then input the two vectors into the same shared network to obtain the attention weight distribution over the channel dimension. The shared network is a multi-layer perceptron with one hidden layer.
- Perform element-wise summation on the two output vectors after the attention weights have been redistributed, and apply the Sigmoid activation function to the merged feature vector to generate the channel attention weight M_c ∈ ℝ^(1×1×C).
- Finally, the attention weight and the original feature map are fused. Here, element-wise multiplication is used to obtain the fused attention feature map F_c = M_c ⊗ F. Replacing the original input feature with F_c completes the attention extraction in the channel dimension.
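The channel attention steps above can be sketched in pure Python. This is a minimal illustration only: the MLP weights are random stand-ins for learned parameters, and the reduction ratio is an assumption (the paper does not state one here).

```python
import math
import random

random.seed(0)

def channel_attention(feat, reduction=2):
    """CBAM-style channel attention sketch.
    feat: nested lists of shape C x H x W; returns the reweighted feature map.
    Weights are random placeholders, not the trained model."""
    C = len(feat)
    # Squeeze the spatial dimension per channel: average pool and max pool.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(max(row) for row in ch) for ch in feat]
    # Shared MLP with one hidden layer (assumed reduction ratio).
    hid = max(C // reduction, 1)
    W1 = [[random.uniform(-1, 1) for _ in range(C)] for _ in range(hid)]
    W2 = [[random.uniform(-1, 1) for _ in range(hid)] for _ in range(C)]

    def mlp(v):
        h = [max(0.0, sum(w * x for w, x in zip(row, v))) for row in W1]  # ReLU
        return [sum(w * x for w, x in zip(row, h)) for row in W2]

    # Element-wise sum of the two MLP outputs, then Sigmoid -> weights M_c.
    Mc = [1.0 / (1.0 + math.exp(-(a + b))) for a, b in zip(mlp(avg), mlp(mx))]
    # Broadcast multiplication: F_c = M_c * F, one weight per channel.
    return [[[w * x for x in row] for row in ch] for ch, w in zip(feat, Mc)]
```

Since the Sigmoid outputs lie in (0, 1), each channel of the input is scaled down by its learned importance.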
- Spatial attention module
- The convolutional feature map generated by the residual network is used as the original input F ∈ ℝ^(H×W×C), where H×W represents the spatial dimension and C is the number of channels. F is compressed along the channel axis to obtain the spatial attention information, through which the C channel values at each position are squeezed into a single channel. This step can be achieved by pooling over the channel dimension.
- The same multi-scale pooling method is adopted: the max pooling function and the average pooling function are used for dimensionality reduction, yielding two feature maps of size H×W×1. The two feature maps are then concatenated along the channel axis, giving a new feature map of size H×W×2.
- A convolution kernel is used to convolve the concatenated feature map, compressing it again to H×W×1, and the Sigmoid activation function maps the convolved feature map to the spatial attention map M_s ∈ ℝ^(H×W×1).
- Finally, the spatial attention map and the original feature map are fused by element-wise multiplication, and the fused spatial attention feature map F_s = M_s ⊗ F is obtained. The original input feature is replaced with F_s to complete the attention extraction over the spatial dimension.
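The spatial attention steps can be sketched the same way. Again a minimal illustration under stated assumptions: the convolution kernel weights are random stand-ins, and the kernel size (3 here) is an assumption, since the paper's setting is not given in this outline.

```python
import math
import random

random.seed(1)

def spatial_attention(feat, k=3):
    """CBAM-style spatial attention sketch.
    feat: nested lists of shape C x H x W; k: assumed conv kernel size."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    # Squeeze the channel axis: average and max over channels at each pixel.
    avg = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feat[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    # Random placeholder kernel over the 2-channel concatenated input.
    kern = [[[random.uniform(-1, 1) for _ in range(k)] for _ in range(k)]
            for _ in range(2)]
    pad = k // 2

    def conv_at(i, j):
        # Same-padding convolution of (avg, mx) down to a single channel.
        s = 0.0
        for ch, plane in enumerate((avg, mx)):
            for di in range(k):
                for dj in range(k):
                    ii, jj = i + di - pad, j + dj - pad
                    if 0 <= ii < H and 0 <= jj < W:
                        s += kern[ch][di][dj] * plane[ii][jj]
        return s

    # Sigmoid -> spatial attention map M_s, then F_s = M_s * F element-wise.
    Ms = [[1.0 / (1.0 + math.exp(-conv_at(i, j))) for j in range(W)]
          for i in range(H)]
    return [[[Ms[i][j] * feat[c][i][j] for j in range(W)] for i in range(H)]
            for c in range(C)]
```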
3.3. Transfer Learning
- When the amount of data is small and the datasets are highly similar, only the linear classifier needs to be trained.
- When the amount of data is large and the datasets are highly similar, multiple layers need to be fine-tuned; that is, pre-trained network weights are used to initialize the network.
- When the amount of data is small and the datasets are very different, most of the network needs to be reinitialized and retrained.
- When the amount of data is very large and the datasets are very different, multiple layers still need to be fine-tuned, again starting from pre-trained weights.
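The four scenarios above form a simple decision rule, restated here as a small helper (a plain paraphrase of the list, not code from the paper):

```python
def transfer_strategy(large_data: bool, similar_domain: bool) -> str:
    """Map (dataset size, domain similarity) to a transfer-learning strategy."""
    if not large_data and similar_domain:
        return "train linear classifier only"
    if large_data and similar_domain:
        return "fine-tune multiple layers from pretrained weights"
    if not large_data and not similar_domain:
        return "reinitialize most of the network"
    # large data, very different domain
    return "fine-tune multiple layers from pretrained weights"
```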
3.4. Loss Function Based on Metric Learning
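The contributions name the N-pair loss as the metric-learning objective. A minimal sketch of its standard form, L = log(1 + Σᵢ exp(f·fᵢ⁻ − f·f⁺)), for a single anchor with one positive and several negatives (the paper's exact batching is not shown here):

```python
import math

def n_pair_loss(anchor, positive, negatives):
    """Standard N-pair loss for one anchor embedding.
    anchor, positive: embedding vectors; negatives: list of embedding vectors.
    Pulls the positive similarity above every negative similarity."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    pos = dot(anchor, positive)
    # log(1 + sum_i exp(anchor.neg_i - anchor.pos))
    return math.log(1.0 + sum(math.exp(dot(anchor, n) - pos) for n in negatives))
```

The loss shrinks toward zero as negatives become less similar to the anchor than the positive, and grows as they become more similar.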
4. Experiments and Analysis
4.1. Parameters Setting and Data Enhancement
4.2. Network Visualization and Display of Experiments
4.3. Experiments and Results Analysis
4.3.1. Comparison Before and After Transfer Learning
4.3.2. Comparison of Different Sample Sizes
4.3.3. Comparison with Other Weak Supervision Methods
5. Discussion
Author Contributions
Funding
Conflicts of Interest
References
- He, J.; Guo, Y.; Yuan, H. Ship Target Automatic Detection Based on Hypercomplex Flourier Transform Saliency Model in High Spatial Resolution Remote-Sensing Images. Sensors 2020, 20, 2536.
- Rajasekaran, S.; Raj, R.A. Image recognition using analog-ART1 architecture augmented with moment-based feature extractor. Neurocomputing 2004, 56, 61–77.
- Susaki, J. Knowledge-Based Modeling of Buildings in Dense Urban Areas by Combining Airborne LiDAR Data and Aerial Images. Remote Sens. 2013, 5, 5944–5968.
- Chang, K.; Ghosh, J. Three-dimensional model-based object recognition and pose estimation using probabilistic principal surfaces. Electron. Imaging 2000, 3962, 192–204.
- Khellal, A.; Ma, H.-B.; Fei, Q. Convolutional Neural Network Based on Extreme Learning Machine for Maritime Ships Recognition in Infrared Images. Sensors 2018, 18, 1490.
- Lin, C.-J.; Lin, C.-H.; Sun, C.-C.; Wang, S.-H. Evolutionary-Fuzzy-Integral-Based Convolutional Neural Networks for Facial Image Classification. Electronics 2019, 8, 997.
- Guo, W.; Xia, X.; Wang, X. A remote sensing ship recognition method of entropy-based hierarchical discriminant regression. Optik 2015, 126, 2300–2307.
- Alzubaidi, L.; Al-Shamma, O.; Fadhel, M.A.; Farhan, L.; Zhang, J.; Duan, Y. Optimizing the Performance of Breast Cancer Classification by Employing the Same Domain Transfer Learning from Hybrid Deep Convolutional Neural Network Model. Electronics 2020, 9, 445.
- Hua, Y.; Yang, Y.; Du, J. Deep Multi-Modal Metric Learning with Multi-Scale Correlation for Image-Text Retrieval. Electronics 2020, 9, 466.
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2014, arXiv:1409.1556.
- LeCun, Y.; Boser, B.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.; Jackel, L.D. Backpropagation Applied to Handwritten Zip Code Recognition. Neural Comput. 1989, 1, 541–551.
- Krizhevsky, A.; Sutskever, I.; Hinton, G. ImageNet Classification with Deep Convolutional Neural Networks. Neural Inf. Process. Syst. 2012, 25.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv 2014, arXiv:1409.4842.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016), Las Vegas, NV, USA, 26 June–1 July 2016.
- Ge, Z.; McCool, C.; Sanderson, C.; Corke, P. Subset feature learning for fine-grained category classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW 2015), Boston, MA, USA, 8–10 June 2015.
- Ge, Z.; Bewley, A.; McCool, C.; Corke, P.; Upcroft, B.; Sanderson, C. Fine-grained classification via mixture of deep convolutional neural networks. In Proceedings of the IEEE Winter Conference on Applications of Computer Vision (WACV 2016), Lake Placid, NY, USA, 7–10 March 2016.
- Lin, T.-Y.; Roychowdhury, A.; Maji, S. Bilinear CNN Models for Fine-Grained Visual Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2015), Santiago, Chile, 7–13 December 2015.
- Zhang, N.; Donahue, J.; Girshick, R.; Darrell, T. Part-Based R-CNNs for Fine-Grained Category Detection. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany, 2014.
- Zeiler, M.D.; Fergus, R. Visualizing and Understanding Convolutional Networks. In Lecture Notes in Computer Science, Proceedings of the European Conference on Computer Vision (ECCV 2014), Zurich, Switzerland, 6–12 September 2014; Springer: Berlin, Germany, 2014.
- Branson, S.; Van Horn, G.; Perona, P.; Belongie, S. Improved Bird Species Recognition Using Pose Normalized Deep Convolutional Nets. In Proceedings of the British Machine Vision Conference (BMVC 2014), Nottingham, UK, 1–5 September 2014.
- Bentaieb, A.; Hamarneh, G. Topology Aware Fully Convolutional Networks for Histology Gland Segmentation. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI 2016), Athens, Greece, 17–21 October 2016; Springer: Berlin, Germany, 2016.
- He, K.; Georgia, G.; Piotr, D.; Ross, G. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017.
- Xiaohua, W.; Muzi, P.; Lijuan, P.; Hu, M.; Chunhua, J.; Fuji, R. Two-level attention with two-stage multi-task learning for facial emotion recognition. J. Vis. Commun. Image Represent. 2019, 62, 217–225.
- Fu, J.; Zheng, H.; Mei, T. Look Closer to See Better: Recurrent Attention Convolutional Neural Network for Fine-Grained Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2017), Honolulu, HI, USA, 21–26 July 2017.
- Chen, X.; Gupta, A. An Implementation of Faster RCNN with Study for Region Sampling. arXiv 2017, arXiv:1702.02138.
- Zheng, H.; Fu, J.; Zha, Z.-J.; Luo, J. Looking for the Devil in the Details: Learning Trilinear Attention Sampling Network for Fine-Grained Image Recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2019), Long Beach, CA, USA, 16–20 June 2019.
- Sun, M.; Yuan, Y.; Zhou, F.; Ding, E. Multi-Attention Multi-Class Constraint for Fine-grained Image Recognition. In Proceedings of the European Conference on Computer Vision (ECCV 2018), Munich, Germany, 8–14 September 2018.
- Krause, J.; Stark, M.; Deng, J.; Fei-Fei, L. 3D Object Representations for Fine-Grained Categorization. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCV 2013), Sydney, Australia, 3–6 December 2013.
- Maji, S.; Rahtu, E.; Kannala, J.; Blaschko, M.; Vedaldi, A. Fine-Grained Visual Classification of Aircraft. arXiv 2013, arXiv:1306.5151.
- Shelhamer, E.; Long, J.; Darrell, T. Fully Convolutional Networks for Semantic Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 640–651.
- Lin, T.-Y.; Roychowdhury, A.; Maji, S. Bilinear CNNs for Fine-grained Visual Recognition. arXiv 2015, arXiv:1504.07889.
- Zheng, H.; Fu, J.; Mei, T.; Luo, J. Learning Multi-attention Convolutional Neural Network for Fine-Grained Image Recognition. In Proceedings of the IEEE International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017.
- Wang, Y.; Morariu, V.I.; Davis, L.S. Learning a Discriminative Filter Bank within a CNN for Fine-grained Recognition. arXiv 2016, arXiv:1611.09932.
Method | Accuracy (%) |
---|---|
No transfer learning | 81.3 |
Freeze the first block | 87.5 |
Freeze the first two blocks | 89.0 |
Freeze the first three blocks | 90.5 |
Freeze the first four blocks | 89.3 |
Freeze all convolutional blocks | 87.9 |
Approach | Backbone | Self-Built Acc. (%) | Cars Acc. (%) | Aircrafts Acc. (%) |
---|---|---|---|---|
R-CNN | AlexNet | 81.8 | 88.4 | 86.6 |
FCAN | VGG-16 | 82.5 | 89.1 | / |
B-CNN | VGG-M+VGG-D | 83.3 | 91.3 | 84.1 |
RA-CNN | VGG-19 × 3 | / | 92.5 | 88.2 |
MA-CNN | VGG-19 × 3 | 86.1 | 92.8 | 89.9 |
MAMC | ResNet-101 | 88.3 | 93.0 | / |
DFL-CNN | ResNet-50 | / | 93.1 | 91.7 |
TASN | ResNet-50 | 89.3 | 93.8 | / |
CA-RES | ResNet-50 | 87.6 | 92.5 | 90.6 |
SA-RES | ResNet-50 | 88.2 | 91.6 | 89.5 |
Proposed model | ResNet-50 × 2 | 90.5 | 93.0 | 92.1 |
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Guo, R.; Sun, B.; Qiu, X.; Su, S.; Zuo, Z.; Wu, P. Fine-Grained Recognition of Surface Targets with Limited Data. Electronics 2020, 9, 2044. https://doi.org/10.3390/electronics9122044