When Mobilenetv2 Meets Transformer: A Balanced Sheep Face Recognition Model
Abstract
1. Introduction
- (1) We propose MobileViTFace, a sheep face recognition model built on CNN and Transformer structures. MobileViTFace combines the local feature extraction of convolutions with the global modeling capability of the Transformer to extract more discriminative features for sheep face recognition, and it offers a general form of combining convolutional and attention structures (a minimal structural sketch of such a hybrid block follows this list).
- (2) The effectiveness of sheep face recognition is improved. MobileViTFace achieves high recognition accuracy while its lightweight design significantly reduces the number of parameters and FLOPs.
- (3) The application of deep learning-based sheep face recognition in practical production is promoted. MobileViTFace is deployed on a Jetson Nano edge computing platform to build a sheep face recognition system, which raises the level of information management on sheep farms and can serve as a reference for other farms.
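To make contribution (1) concrete, the sketch below shows one way a convolutional stage and a Transformer stage can be combined in a MobileViT-style hybrid block. It is an illustration only, written in PyTorch with assumed layer names, channel widths, and patch size; it is not the authors' implementation of MobileViTFace.

```python
import torch
import torch.nn as nn

class HybridConvTransformerBlock(nn.Module):
    """Illustrative CNN + Transformer block (MobileViT-style), not the paper's exact code.

    Local features are extracted with convolutions, non-overlapping patches are then
    processed by a small Transformer encoder to model global relations, and the result
    is fused back with the convolutional input.
    """

    def __init__(self, channels: int, dim: int = 96, depth: int = 2, patch: int = 2):
        super().__init__()
        self.patch = patch
        # Local representation: depthwise 3x3 conv followed by a 1x1 projection.
        self.local_rep = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.SiLU(),
            nn.Conv2d(channels, dim, 1, bias=False),
        )
        # Global representation: a lightweight Transformer encoder over patch tokens.
        encoder_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=2 * dim, batch_first=True
        )
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=depth)
        # Fusion: project back to the input channel count and fuse with a 3x3 conv.
        self.proj = nn.Conv2d(dim, channels, 1, bias=False)
        self.fuse = nn.Conv2d(2 * channels, channels, 3, padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        y = self.local_rep(x)                                        # (B, dim, H, W)
        p = self.patch
        # Unfold the feature map into tokens (assumes H and W are divisible by patch).
        tokens = y.unfold(2, p, p).unfold(3, p, p)                   # (B, dim, H/p, W/p, p, p)
        tokens = tokens.reshape(b, y.shape[1], -1).transpose(1, 2)   # (B, N, dim)
        tokens = self.transformer(tokens)
        # Fold the tokens back into a feature map, inverting the unfold above.
        hp, wp = h // p, w // p
        y = tokens.transpose(1, 2).reshape(b, -1, hp, wp, p, p)
        y = y.permute(0, 1, 2, 4, 3, 5).reshape(b, -1, h, w)
        y = self.proj(y)
        return self.fuse(torch.cat([x, y], dim=1))                   # residual-style fusion
```

For a feature map whose height and width are divisible by the patch size, e.g. `x = torch.randn(1, 160, 14, 14)`, calling `HybridConvTransformerBlock(160)(x)` returns a tensor of the same shape, so such a block can be dropped into a convolutional backbone without changing its spatial layout.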
2. Materials and Methods
2.1. Self-Built Dataset
2.2. Model
2.2.1. Overall Flow Chart
2.2.2. Sheep Face Detection Module
2.2.3. Improved Sheep Face Recognition Module
2.2.4. The Design of the Bottleneck
2.2.5. The Design of the LinearViT
2.3. Evaluation Indicators and Experimental Environment
2.3.1. Evaluation Indicators
2.3.2. Experimental Parameter Setting
3. Results
3.1. Comparison of Sheep Face Detection Results
3.2. Comparison of Recognition Results Using MobileViTFace and Other Methods
3.3. How to Deal with Newly Added Sheep?
3.4. Visualization of Recognition Results
3.5. Failure Case
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ait-Saidi, A.; Caja, G.; Salama, A.A.K.; Carné, S. Implementing electronic identification for performance recording in sheep: I. Manual versus semiautomatic and automatic recording systems in dairy and meat farms. J. Dairy Sci. 2014, 97, 7505–7514.
- Kumar, S.; Tiwari, S.; Singh, S.K. Face recognition for cattle. In Proceedings of the 2015 Third International Conference on Image Information Processing (ICIIP), Waknaghat, India, 21–23 December 2015; pp. 65–72.
- Yan, H.; Cui, Q.; Liu, Z. Pig face identification based on improved AlexNet model. INMATEH-Agric. Eng. 2020, 61, 97–104.
- Gaber, T.; Tharwat, A.; Hassanien, A.E.; Snasel, V. Biometric cattle identification approach based on Weber’s Local Descriptor and AdaBoost classifier. Comput. Electron. Agric. 2016, 122, 55–66.
- Zaorálek, L.; Prilepok, M.; Snášel, V. Cattle identification using muzzle images. In Proceedings of the Second International Afro-European Conference for Industrial Advancement AECIA 2015, Villejuif, France, 9–11 September 2015; pp. 105–115.
- Hou, J.; He, Y.; Yang, H.; Connor, T.; Gao, J.; Wang, Y.; Zhou, S. Identification of animal individuals using deep learning: A case study of giant panda. Biol. Conserv. 2020, 242, 108414.
- Salama, A.Y.A.; Hassanien, A.E.; Fahmy, A. Sheep identification using a hybrid deep learning and Bayesian optimization approach. IEEE Access 2019, 7, 31681–31687.
- Khaldi, Y.; Benzaoui, A.; Ouahabi, A.; Jacques, S.; Taleb-Ahmed, A. Ear recognition based on deep unsupervised active learning. IEEE Sens. J. 2021, 21, 20704–20713.
- Gadekallu, T.R.; Rajput, D.S.; Reddy, M.; Lakshmanna, K.; Bhattacharya, S.; Singh, S.; Alazab, M. A novel PCA–whale optimization-based deep neural network model for classification of tomato plant diseases using GPU. J. Real-Time Image Process. 2021, 18, 1383–1396.
- Yang, H.; He, X.; Jia, X.; Patras, I. Robust face alignment under occlusion via regional predictive power estimation. IEEE Trans. Image Process. 2015, 24, 2393–2403.
- Wang, N.; Gao, X.; Tao, D.; Yang, H.; Li, X. Facial feature point detection: A comprehensive survey. Neurocomputing 2018, 275, 50–65.
- Yang, H.; Carlone, L. In perfect shape: Certifiably optimal 3D shape reconstruction from 2D landmarks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 621–630.
- Hansen, M.F.; Smith, M.L.; Smith, L.N.; Salter, M.G.; Baxter, E.M.; Farish, M.; Grieve, B. Towards on-farm pig face recognition using convolutional neural networks. Comput. Ind. 2018, 98, 145–152.
- Hitelman, A.; Edan, Y.; Godo, A.; Berenstein, R.; Lepar, J.; Halachmi, I. Biometric identification of sheep via a machine-vision system. Comput. Electron. Agric. 2022, 194, 106713.
- Wang, Z.; Liu, T. Two-stage method based on triplet margin loss for pig face recognition. Comput. Electron. Agric. 2022, 194, 106737.
- Meng, X.; Tao, P.; Han, L.; CaiRang, D. Sheep Identification with Distance Balance in Two Stages Deep Learning. In Proceedings of the 2022 IEEE 6th Information Technology and Mechatronics Engineering Conference (ITOEC), Chongqing, China, 4–6 March 2022; pp. 1308–1313.
- Billah, M.; Wang, X.; Yu, J.; Jiang, Y. Real-time goat face recognition using convolutional neural network. Comput. Electron. Agric. 2022, 194, 106730.
- Xu, F.; Gao, J.; Pan, X. Cow face recognition for a small sample based on Siamese DB Capsule Network. IEEE Access 2022, 10, 63189–63198.
- Marsot, M.; Mei, J.; Shan, X.; Ye, L.; Feng, P.; Yan, X.; Zhao, Y. An adaptive pig face recognition approach using Convolutional Neural Networks. Comput. Electron. Agric. 2020, 173, 105386.
- Xu, B.; Wang, W.; Guo, L.; Chen, G.; Li, Y.; Cao, Z.; Wu, S. CattleFaceNet: A cattle face identification approach based on RetinaFace and ArcFace loss. Comput. Electron. Agric. 2022, 193, 106675.
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520.
- Adjabi, I.; Ouahabi, A.; Benzaoui, A.; Taleb-Ahmed, A. Past, present, and future of face recognition: A review. Electronics 2020, 9, 1188.
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805.
- Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 5791–5800.
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022.
- Dong, X.; Bao, J.; Chen, D.; Zhang, W.; Yu, N.; Yuan, L.; Guo, B. Cswin transformer: A general vision transformer backbone with cross-shaped windows. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 12124–12134.
- Liu, Z.; Hu, H.; Lin, Y.; Yao, Z.; Xie, Z.; Wei, Y.; Guo, B. Swin transformer v2: Scaling up capacity and resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 12009–12019.
- Tan, M.; Pang, R.; Le, Q.V. Efficientdet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 16–18 June 2020; pp. 10781–10790.
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
- Girshick, R. Fast r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448.
- Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 91–99.
- Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
- Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
- Xia, X.; Li, J.; Wu, J.; Wang, X.; Wang, M.; Xiao, X.; Wang, R. TRT-ViT: TensorRT-oriented Vision Transformer. arXiv 2022, arXiv:2205.09579.
- Mehta, S.; Rastegari, M. Mobilevit: Light-weight, general-purpose, and mobile-friendly vision transformer. arXiv 2021, arXiv:2110.02178.
- Li, X.; Li, S. Transformer Help CNN See Better: A Lightweight Hybrid Apple Disease Identification Model Based on Transformers. Agriculture 2022, 12, 884.
- Wang, W.; Xie, E.; Li, X.; Fan, D.P.; Song, K.; Liang, D.; Shao, L. Pvt v2: Improved baselines with pyramid vision transformer. Comput. Vis. Media 2022, 8, 415–424.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Zhong, Y.; Deng, W. Face transformer for recognition. arXiv 2021, arXiv:2103.14803.
- Li, Z.; Lei, X.; Liu, S. A lightweight deep learning model for cattle face recognition. Comput. Electron. Agric. 2022, 195, 106848.
Reference | Dataset | Method | Accuracy (%) |
---|---|---|---|
[14] | 81 sheep | CNN | 95.00 |
[15] | 28 pigs | CNN | 96.80 |
[16] | over 5000 images of 547 sheep | CNN | 85.00 |
[17] | 3278 pictures of goats | CNN | 96.40 |
[18] | 945 images of cow faces | CNN | 91.67 |
[13] | 1553 images of 10 pigs | CNN | 96.70 |
[19] | 2364 images of pigs | CNN | 83.00 |
[20] | 2318 images of 90 cows | CNN | 91.30 |
| Input | Operator | n | Stride | Output Channels |
|---|---|---|---|---|
|  | Conv2d | 1 | 2 | 16 |
|  | bottleneck | 1 | 1 | 32 |
|  | bottleneck | 3 | 2 | 64 |
|  | bottleneck | 1 | 2 | 96 |
|  | LinearViT | 2 | - | 96 |
|  | bottleneck | 1 | 2 | 128 |
|  | LinearViT | 4 | - | 128 |
|  | bottleneck | 1 | 2 | 160 |
|  | LinearViT | 3 | - | 160 |
|  |  | 1 | 1 | 640 |
|  |  | 1 | 7 | 1000 |
|  | FC | - | - | 128 |
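The sketch below shows one way the stage configuration in the table might be assembled into a backbone. Everything beyond the table is an assumption, not the paper's code: the bottleneck is taken to be a MobileNetV2-style inverted-residual block supplied by the caller, LinearViT is taken to be a Transformer stage such as the hybrid block sketched in the Introduction, the two unnamed rows are interpreted as a 1×1 convolution head plus global pooling preceding the 128-d FC embedding, and only the first repeat of each stage downsamples.

```python
import torch.nn as nn

# Stage configuration following the table above: (operator, n, stride, output channels).
# "bottleneck" = inverted-residual block (assumed), "linear_vit" = Transformer stage (assumed).
STAGES = [
    ("conv2d",     1, 2, 16),
    ("bottleneck", 1, 1, 32),
    ("bottleneck", 3, 2, 64),
    ("bottleneck", 1, 2, 96),
    ("linear_vit", 2, 1, 96),
    ("bottleneck", 1, 2, 128),
    ("linear_vit", 4, 1, 128),
    ("bottleneck", 1, 2, 160),
    ("linear_vit", 3, 1, 160),
]

def build_backbone(make_bottleneck, make_linear_vit, in_ch: int = 3) -> nn.Sequential:
    """Assemble the backbone from the stage table; the two block factories are supplied by the caller."""
    layers = []
    for op, n, stride, out_ch in STAGES:
        for i in range(n):
            s = stride if i == 0 else 1            # only the first repeat of a stage downsamples
            if op == "conv2d":
                layers.append(nn.Sequential(
                    nn.Conv2d(in_ch, out_ch, 3, stride=s, padding=1, bias=False),
                    nn.BatchNorm2d(out_ch),
                    nn.SiLU(),
                ))
            elif op == "bottleneck":
                layers.append(make_bottleneck(in_ch, out_ch, s))
            else:  # "linear_vit": channel-preserving Transformer stage
                layers.append(make_linear_vit(out_ch))
            in_ch = out_ch
    # Head (one interpretation of the last table rows): 1x1 conv to 640 channels,
    # global average pooling, and a fully connected layer producing a 128-d face embedding.
    layers += [
        nn.Conv2d(in_ch, 640, 1, bias=False),
        nn.BatchNorm2d(640),
        nn.SiLU(),
        nn.AdaptiveAvgPool2d(1),
        nn.Flatten(),
        nn.Linear(640, 128),
    ]
    return nn.Sequential(*layers)
```

As one possible pairing, torchvision's `InvertedResidual` (with an assumed expansion ratio) could serve as the bottleneck factory and the hybrid block sketched earlier as the LinearViT stand-in.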
Model | AP (%) | Model Size (MB) | FLOPs (G) |
---|---|---|---|
SSD | 99.90 | 52.20 | 15.00 |
Faster RCNN | 99.90 | 158.80 | 135.30 |
RetinaNet [36] | 99.00 | 140.00 | 20.60 |
YOLO V3 | 91.40 | 234.00 | 155.20 |
YOLO V4 [38] | 98.90 | 244.00 | 25.40 |
YOLO V5 (S) | 98.70 | 27.00 | 16.40 |
EfficientDet-D1 | 98.90 | 25.00 | 5.60 |
Model | Precision (%) | Recall (%) | F1 (%) | Accuracy (%) | Params (MB) | FLOPs (G) |
---|---|---|---|---|---|---|
VGG16 | 95.87 | 95.66 | 95.76 | 95.32 | 138.30 | 15.50 |
Resnet18 | 97.64 | 97.19 | 97.42 | 96.61 | 11.60 | 1.80 |
DenseNet121 | 97.80 | 97.18 | 97.49 | 96.84 | 7.90 | 2.80 |
EfficientNet-b0 | 92.03 | 91.25 | 91.64 | 90.88 | 5.30 | 0.39 |
MobilenetV2 | 89.72 | 87.83 | 88.76 | 87.15 | 3.50 | 0.31 |
MobilenetV3 | 90.84 | 89.58 | 90.21 | 89.83 | 5.40 | 0.22 |
ViT-small | 96.00 | 95.44 | 95.72 | 95.21 | 22.00 | 4.24 |
DeiT-small | 95.26 | 94.39 | 94.82 | 94.39 | 22.00 | 4.24 |
Swin-small | 98.78 | 98.57 | 98.68 | 97.58 | 49.60 | 8.50 |
Ours | 97.85 | 97.05 | 96.55 | 96.94 | 4.80 | 0.90 |
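For reference, the Precision, Recall, F1, and Accuracy columns above are presumably computed with the standard classification definitions (averaged over sheep identities in the multi-class setting):

```latex
\mathrm{Precision} = \frac{TP}{TP+FP}, \qquad
\mathrm{Recall} = \frac{TP}{TP+FN}, \qquad
F_1 = \frac{2\,\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}}, \qquad
\mathrm{Accuracy} = \frac{TP+TN}{TP+TN+FP+FN}
```

where TP, FP, TN, and FN denote true positives, false positives, true negatives, and false negatives, respectively.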
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Li, X.; Du, J.; Yang, J.; Li, S. When Mobilenetv2 Meets Transformer: A Balanced Sheep Face Recognition Model. Agriculture 2022, 12, 1126. https://doi.org/10.3390/agriculture12081126