Crop Guidance Photography Algorithm for Mobile Terminals
Abstract
1. Introduction
2. Approach and Dataset Construction
2.1. Guidance Methods and Classification Reduction
2.2. Data Collection and Preprocessing
- Step 1.
- Resizing the image resolution to 224 × 224 pixels. The advantage of this step is that it reduces hardware load and improves network training speed without significantly reducing accuracy [24].
- Step 2.
- Using Equation (1), individual pixels’ RGB channels are separately processed to randomly alter the brightness, contrast, and saturation of an image, thereby augmenting the dataset. This process enables the trained model to adapt to varying lighting conditions and color biases. The ‘rand’ function generates floating-point random numbers within a specific range.
- Step 3.
- Performing normalization on the images, following Equation (2), to ensure that the pixel values are within the range of −1 to 1. Here, for an individual pixel, the input grayscale value is denoted gray_input and the output grayscale value gray_output. A minimal code sketch of Steps 1–3 follows this list.
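To make Steps 1–3 concrete, the following is a minimal sketch of the preprocessing pipeline, assuming a PyTorch/torchvision implementation; the jitter factors are illustrative assumptions rather than the exact parameters of Equation (1), and the normalization maps pixel values from [0, 1] to [−1, 1] as described in Step 3.

```python
from torchvision import transforms

# Illustrative preprocessing pipeline for Steps 1-3 (jitter factors are assumed,
# not taken from the paper).
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),              # Step 1: fixed 224 x 224 input resolution
    transforms.ColorJitter(brightness=0.2,      # Step 2: random brightness,
                           contrast=0.2,        # contrast, and saturation
                           saturation=0.2),     # perturbation per image
    transforms.ToTensor(),                      # pixel values scaled to [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # Step 3: (x - 0.5) / 0.5
                         std=[0.5, 0.5, 0.5]),  # maps [0, 1] to [-1, 1]
])
```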
3. Orientation Discrimination Model
3.1. Model Selection
- Linear Bottleneck: A 1 × 1 convolution compresses (projects) the features back down to a small number of channels, and this projection layer uses a linear activation rather than ReLU6; the ReLU6 non-linearity is applied only to the expanded, high-dimensional features, since applying it at the narrow bottleneck would discard information. This design keeps the number of model parameters low while maintaining good feature representation capability.
- Inverted Residuals: Traditional residual bottleneck blocks first compress the features and then expand them again. Inverted residuals reverse this order: a 1 × 1 convolution first expands the number of channels, a 3 × 3 depthwise (separable) convolution then filters the expanded features, and a final 1 × 1 convolution projects them back to a narrow bottleneck. This design improves the model’s non-linear expressive power while reducing computational complexity. A minimal code sketch of such a block follows this list.
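As an illustration of these two ideas, below is a minimal PyTorch sketch of a MobileNetV2-style inverted residual block with a linear bottleneck; the expansion factor of 6 and the channel arguments are illustrative, not values taken from the paper.

```python
import torch.nn as nn

class InvertedResidual(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expand: int = 6):
        super().__init__()
        hidden = in_ch * expand
        self.use_residual = stride == 1 and in_ch == out_ch
        self.block = nn.Sequential(
            # 1x1 expansion: narrow -> wide, followed by ReLU6
            nn.Conv2d(in_ch, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 3x3 depthwise convolution on the expanded features
            nn.Conv2d(hidden, hidden, 3, stride, 1, groups=hidden, bias=False),
            nn.BatchNorm2d(hidden), nn.ReLU6(inplace=True),
            # 1x1 linear projection back to a narrow bottleneck (no ReLU6)
            nn.Conv2d(hidden, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        out = self.block(x)
        return x + out if self.use_residual else out
```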
3.2. Model Architecture and Optimization
3.2.1. Increasing Sample Randomness
3.2.2. Model Pruning
- Conduct training on the training set and induce sparsity by adding an L1 penalty (applied through its sub-gradient) to the scale factors of the Batch Normalization (BN) layers.
- Compute the absolute values of the scale factors of all BN layers, take the average magnitude per channel, and use it as the metric of each channel’s importance.
- According to a predetermined proportion, prune the weights associated with the channels of lowest importance to obtain the pruned network.
- Because the network structure has changed, the new network may be underfitted, so perform a second round of training to recover accuracy. A minimal code sketch of these steps follows this list.
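A minimal PyTorch sketch of these pruning steps is given below; the L1 penalty coefficient and the pruning ratio are illustrative assumptions, and the actual rebuilding of the slimmed network after channel removal is omitted for brevity.

```python
import torch
import torch.nn as nn

def add_bn_l1_grad(model: nn.Module, lam: float = 1e-4) -> None:
    """Sparsity training step: call after loss.backward() to add the sub-gradient
    of an L1 penalty on every BatchNorm scale factor (gamma)."""
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d) and m.weight.grad is not None:
            m.weight.grad.add_(lam * torch.sign(m.weight.data))

def bn_channel_scores(model: nn.Module) -> torch.Tensor:
    """Per-channel importance: the absolute value of each BN scale factor."""
    scores = [m.weight.data.abs() for m in model.modules()
              if isinstance(m, nn.BatchNorm2d)]
    return torch.cat(scores)

def pruning_threshold(model: nn.Module, prune_ratio: float = 0.5) -> float:
    """Channels whose score falls below this global threshold are pruned;
    the slimmed network is then retrained to recover accuracy."""
    scores, _ = torch.sort(bn_channel_scores(model))
    return scores[int(len(scores) * prune_ratio)].item()
```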
3.2.3. Knowledge Distillation
- Training the large pre-trained model with the training set and retaining the best-performing model as the teacher model.
- Fixing the teacher model and conducting one round of training for the student model (as in step 3), while performing a grid search over the hyperparameters. The loss calculation is shown in Equation (3) (a minimal code sketch of the loss and one training step follows this list), where the parameters are defined as follows:
- (a)
- “real” represents the actual one-hot label.
- (b)
- “pred” represents the predicted one-hot label.
- (c)
- “CE” denotes the cross-entropy loss function.
- (d)
- “KL” denotes the Kullback–Leibler divergence loss function.
- (e)
- The weighting hyperparameter adjusts how strongly the student model’s learning is directed toward the teacher model versus the real labels.
- (f)
- The hyperparameter “temperature” can soften the probability distribution of the model output labels. A larger temperature value leads to a more softened distribution, while a smaller temperature value may amplify the probability of misclassification and introduce unnecessary noise.
- Utilizing the optimal hyperparameters obtained from grid search for offline distillation (as shown in Figure 6).
- (a)
- Making predictions using the teacher model to obtain the soft targets.
- (b)
- Making predictions using the student model to obtain the outputs to be optimized.
- (c)
- Computing the loss using the soft targets, hard targets (actual labels), and the outputs to be optimized.
- (d)
- Performing backpropagation of the loss and updating the student model.
- (e)
- Returning to step “a” until the model converges and the training is completed.
- Conducting a second round of training for the student model directly using the training set to enhance the model’s learning of the original labels.
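To make the distillation procedure concrete, the following is a minimal PyTorch sketch of one offline distillation step. Equation (3) is not reproduced verbatim here, so the loss below uses a common weighted combination of cross-entropy on the real labels and a temperature-softened KL term consistent with definitions (a)–(f) above; the default alpha and temperature values are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, real_labels,
                      alpha: float = 0.5, temperature: float = 4.0):
    # Hard-target term: cross-entropy (CE) against the actual labels.
    hard = F.cross_entropy(student_logits, real_labels)
    # Soft-target term: KL divergence between the temperature-softened
    # student and teacher distributions (scaled by T^2, as is customary).
    soft = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * hard + (1.0 - alpha) * soft

def distill_step(teacher, student, optimizer, images, labels,
                 alpha=0.5, temperature=4.0):
    teacher.eval()
    with torch.no_grad():                          # (a) teacher forward pass -> soft targets
        teacher_logits = teacher(images)
    student_logits = student(images)               # (b) student outputs to be optimized
    loss = distillation_loss(student_logits, teacher_logits,
                             labels, alpha, temperature)  # (c) combined loss
    optimizer.zero_grad()
    loss.backward()                                # (d) backpropagate and update the student
    optimizer.step()
    return loss.item()
```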
4. Experimental Analysis and Testing
4.1. Overview
4.2. Concrete Analysis
4.2.1. Traditional Model Training Results
4.2.2. Increased Randomness
4.2.3. Model Pruning
4.2.4. Knowledge Distillation
4.3. Real Machine Testing
4.4. Effect Inspection
5. Summary and Outlook
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Conflicts of Interest
References
- Lv, Z.; Zhang, F.; Wei, X.; Huang, Y.; Li, J.; Zhang, Z. Tomato Flower and Fruit Recognition in Greenhouse Using Enhanced YOLOX-ViT Collaboration. Trans. Chin. Soc. Agric. Eng. 2023, 39, 124–134. [Google Scholar]
- Li, Z.; Jiang, H.; Yang, X.; Cao, Z. Detection Method of Mung Bean Seedling and Weed Based on Lightweight Deep Learning Model. J. Agric. Equip. Veh. Eng. 2022, 60, 98–102. [Google Scholar]
- Han, H.; Zhang, Y.; Qi, L. Review of Crop Disease and Pest Detection Based on Convolutional Neural Networks. Smart Agric. Guide 2023, 3, 6–9. [Google Scholar]
- Song, H.; Jiao, Y.; Hua, Z.; Li, R.; Xu, X. Detection of Embryo Crack in Soaked Corn Based on YOLO v5-OBB and CT. Trans. Chin. Soc. Agric. Mach. 2023, 54, 394–401. [Google Scholar]
- Li, H. Statistical Learning Methods; Tsinghua University Press: Beijing, China, 2012. [Google Scholar]
- Zhang, Y. Research on Detection of Tomato Growth Status in Sunlight Greenhouse Based on Digital Image Technology. Ph.D. Thesis, Northwest A&F University, Xianyang, China, 2022. [Google Scholar]
- Zhong, L.; Hu, L.; Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 2019, 221, 430–443. [Google Scholar] [CrossRef]
- Aslan, M.F. Comparative Analysis of CNN Models and Bayesian Optimization-Based Machine Learning Algorithms in Leaf Type Classification. Balk. J. Electr. Comput. Eng. 2021, 11, 13–24. [Google Scholar] [CrossRef]
- Lu, J.; Tan, L.; Jiang, H. Review on Convolutional Neural Network (CNN) Applied to Plant Leaf Disease Classification. Agriculture 2021, 11, 707. [Google Scholar] [CrossRef]
- Chen, D.; Neumann, K.; Friedel, S.; Kilian, B.; Chen, M.; Altmann, T.; Klukas, C. Dissecting the phenotypic components of crop plant growth and drought responses based on high-throughput image analysis. Plant Cell 2014, 26, 4636–4655. [Google Scholar] [CrossRef]
- Haug, S.; Ostermann, J. A crop/weed field image dataset for the evaluation of computer vision based precision agriculture tasks. Comput. Vis. ECCV 2014 Work. 2014, 9, 105–116. [Google Scholar]
- Wang, S.; Hu, D.; Kou, D. A Shooting Method and Device. CN Patent CN110445978A[P], 15 December 2020. [Google Scholar]
- Xie, Y.; Wu, K.; Liu, H. Control Method and Device for Aircraft, and Aircraft. CN Patent CN106125767B[P], 17 March 2020. [Google Scholar]
- Wang, D.; Xie, F.; Yang, J.; Liu, Y. Industry Robotic Motion and Pose Recognition Method Based on Camera Pose Estimation and Neural Network. Int. J. Adv. Robot. Syst. 2021, 18, 17298814211018549. [Google Scholar] [CrossRef]
- Wang, H.; Su, B.; Han, J. A Visual-Based Dynamic Object Tracking and Localization Method for Unmanned Aerial Vehicles. CN Patent CN103149939B[P], 21 October 2015. [Google Scholar]
- Xie, K.; Yang, H.; Huang, S.; Lischinski, D.; Christie, M.; Xu, K.; Gong, M.; Cohen-Or, D.; Huang, H. Creating and Chaining Camera Moves for Quadrotor Videography. ACM Trans. Graph. (TOG) 2018, 37, 1–13. [Google Scholar] [CrossRef]
- How Robots Can Pick Unknown Objects. Available online: https://sereact.ai/posts/how-robots-can-pick-unknown-objects (accessed on 5 April 2023).
- Zhu, H.; Peng, X.; Wang, H. Selfie Guidance Method and Device for Selfie Terminals. CN Patent CN106911886A[P], 30 June 2017. [Google Scholar]
- Feng, J.; Shu, P.; Denglapu, W.; Gamell, J. Video Conferencing Endpoint with Multiple Voice Tracking Cameras. CN Patent CN102256098B[P], 4 June 2014. [Google Scholar]
- Yamanaka, N.; Yamamura, Y.; Mitsuzuka, K. An intelligent robotic camera system. SMPTE J. 1995, 104, 23–25. [Google Scholar] [CrossRef]
- McKenna, S.J.; Gong, S. Real-time face pose estimation. Real-Time Imaging 1998, 4, 333–347. [Google Scholar] [CrossRef]
- Breitenstein, M.D.; Daniel, K.; Thibaut, W.; Luc, V.G.; Hanspeter, P. Real-time face pose estimation from single range images. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 24–26 June 2008; pp. 1–8. [Google Scholar]
- Hjelmås, E.; Low, B.K. Face Detection: A Survey. Comput. Vis. Image Underst. 2001, 83, 236–274. [Google Scholar]
- Tan, M.; Le, Q. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. CoRR 2019, 1, 6105–6114. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 4510–4520. [Google Scholar]
- Cohen, R.A.; Choi, H.; Bajić, I.V. Lightweight compression of intermediate neural network features for collaborative intelligence. IEEE Open J. Circuits Syst. 2021, 2, 350–362. [Google Scholar] [CrossRef]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the knowledge in a neural network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.; Sun, J. Shufflenet v2: Practical guidelines for efficient cnn architecture design. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 116–131. [Google Scholar]
- Zhu, L.; Li, Z.; Li, C.; Wu, J.; Yue, J. High performance vegetable classification from images based on alexnet deep learning model. Int. J. Agric. Biol. Eng. 2018, 11, 217–223. [Google Scholar] [CrossRef]
- Jiang, B.; He, J.; Yang, S.; Fu, H.; Li, T.; Song, H.; He, D. Fusion of machine vision technology and AlexNet-CNNs deep learning network for the detection of postharvest apple pesticide residues. Artif. Intell. Agric. 2019, 1, 1–8. [Google Scholar]
- Paymode, A.S.; Malode, V.B. Transfer learning for multi-crop leaf disease image classification using convolutional neural network VGG. Artif. Intell. Agric. 2022, 6, 23–33. [Google Scholar] [CrossRef]
- Kumar, V.; Arora, H.; Sisodia, J. Resnet-based approach for detection and classification of plant leaf diseases. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; IEEE: Delhi, India, 2020; pp. 495–502. [Google Scholar]
- Bi, C.; Wang, J.; Duan, Y.; Fu, B.; Kang, J.-R.; Shi, Y. MobileNet based apple leaf diseases identification. In Mobile Networks and Applications; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–9. [Google Scholar]
- Hidayatuloh, A.; Nursalman, M.; Nugraha, E. Identification of tomato plant diseases by Leaf image using squeezenet model. In Proceedings of the 2018 International Conference on Information Technology Systems and Innovation (ICITSI), Bandung, Padang, 22–26 October 2018; pp. 199–204. [Google Scholar]
- Sun, W.; Fu, B.; Zhang, Z. Maize Nitrogen Grading Estimation Method Based on UAV Images and an Improved Shufflenet Network. Agronomy 2023, 13, 1974. [Google Scholar] [CrossRef]
Direction | Meaning | Guidance |
---|---|---|
Center | The object is in the center of the image frame, with some distance between the two frames. | None |
Oversized | The object box encompasses the image frame, with a length ratio > 2. | Move away from the object |
Undersized | The image frame encompasses the object box, with a length ratio > 2. | Move closer to the object |
Up | The center point of the object box is located in the upper area of the image frame, with the two boxes intersecting. | Rotate or move upwards |
Down | The center point of the object box is located in the lower area of the image frame, with the two boxes intersecting. | Rotate or move downwards |
Right | The center point of the object box is located in the right area of the image frame, with the two boxes intersecting. | Rotate or move towards the right |
Upper Right | The center point of the object box is located in the upper-right area of the image frame, with the two boxes intersecting. | Rotate or move towards the upper-right |
Lower Right | The center point of the object box is located in the lower-right area of the image frame, with the two boxes intersecting. | Rotate or move towards the lower-right |
Left | The center point of the object box is located in the left area of the image frame, with the two boxes intersecting. | Rotate or move towards the left |
Upper Left | The center point of the object box is located in the upper-left area of the image frame, with the two boxes intersecting. | Rotate or move towards the upper-left |
Lower Left | The center point of the object box is located in the lower-left area of the image frame, with the two boxes intersecting. | Rotate or move towards the lower-left |
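The following is a minimal Python sketch of how a guidance category could be derived from the object box and the image frame according to the rules in the table above; the 3 × 3 partition of the frame and the length-ratio threshold of 2 are assumptions used only for illustration, not the paper's exact rule set.

```python
def guidance_category(obj, frame, ratio_thresh: float = 2.0) -> str:
    """obj and frame are (x_min, y_min, x_max, y_max) boxes in image coordinates
    (y grows downwards). Returns one of the orientation categories in the table."""
    ox0, oy0, ox1, oy1 = obj
    fx0, fy0, fx1, fy1 = frame
    obj_len = max(ox1 - ox0, oy1 - oy0)
    frame_len = max(fx1 - fx0, fy1 - fy0)

    # Oversized / Undersized: one box encompasses the other with length ratio > 2.
    if ox0 <= fx0 and oy0 <= fy0 and ox1 >= fx1 and oy1 >= fy1 \
            and obj_len > ratio_thresh * frame_len:
        return "Oversized"      # guidance: move away from the object
    if fx0 <= ox0 and fy0 <= oy0 and fx1 >= ox1 and fy1 >= oy1 \
            and frame_len > ratio_thresh * obj_len:
        return "Undersized"     # guidance: move closer to the object

    # Locate the object-box centre within an assumed 3 x 3 partition of the frame.
    cx, cy = (ox0 + ox1) / 2, (oy0 + oy1) / 2
    fw, fh = fx1 - fx0, fy1 - fy0
    horiz = "Left" if cx < fx0 + fw / 3 else ("Right" if cx > fx1 - fw / 3 else "")
    vert = "Up" if cy < fy0 + fh / 3 else ("Down" if cy > fy1 - fh / 3 else "")

    if not horiz and not vert:
        return "Center"         # guidance: none
    if horiz and vert:
        return f"{'Upper' if vert == 'Up' else 'Lower'} {horiz}"
    return vert or horiz        # Up, Down, Left, or Right
```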
Category Number | Orientation Category | Number of Images |
---|---|---|
0 | Center | 800 |
1 | Oversized | 725 |
2 | Undersized | 505 |
3 | Up | 695 |
4 | Down | 925 |
5 | Right | 505 |
6 | Upper Right | 765 |
7 | Lower Right | 820 |
8 | Left | 445 |
9 | Upper Left | 935 |
10 | Lower Left | 780 |
 | Total | 7900
Stage | Model Name | Accuracy | Precision | Recall | F1 score | FLOPs/M |
---|---|---|---|---|---|---|
Normal | AlexNet | 94.98% | 94.97% | 94.59% | 94.74% | 710.15 |
 | VGG16 | 95.82% | 95.22% | 94.75% | 94.92% | 1044.45
 | ResNet18 | 9.83% | 0.59% | 9.09% | 1.11% | 1826.01
 | SqueezeNet 1.0 | 21.13% | 22.09% | 20.59% | 19.82% | 153.65
 | ShuffleNet V2 | 12.89% | 1.03% | 9.09% | 1.85% | 733.35
 | MobileNet V2 | 12.09% | 17.10% | 9.39% | 2.45% | 332.96
Random | AlexNet | 97.07% | 96.95% | 96.67% | 96.78% | 710.15 |
 | VGG16 | 96.95% | 95.95% | 95.62% | 95.63% | 1044.45
 | ResNet18 | 92.97% | 91.87% | 91.33% | 91.48% | 1826.01
 | SqueezeNet 1.0 | 33.72% | 29.34% | 29.99% | 29.15% | 153.65
 | ShuffleNet V2 | 92.43% | 91.68% | 91.23% | 91.28% | 733.35
 | MobileNet V2 | 95.10% | 95.12% | 94.53% | 94.77% | 332.96
Sparse | MobileNet V2 | 95.77% | 95.58% | 95.46% | 95.38% | 332.96
Pruning | MobileNet V2 | 6.49% | 0.59% | 9.09% | 1.11% | 160.35
Re-train | MobileNet V2 | 95.65% | 95.46% | 95.56% | 95.46% | 160.35
Distillation | MobileNet V2 | 96.36% | 96.32% | 95.99% | 96.11% | 160.35
Re-train | MobileNet V2 | 96.78% | 96.44% | 96.39% | 96.38% | 160.35
State | FLOPs/M | Param/M | Accuracy/% |
---|---|---|---|
Before Pruning | 332.9616 | 2.238 | 95.77 |
After Pruning | 160.3491 | 1.117 | 6.49 |
Device | Android | Processor | RAM | Camera | Announced |
---|---|---|---|---|---|
Xiaomi Redmi Note 9 4G | 12 | Qualcomm SM6115 Snapdragon 662 (11 nm/2.0 GHz) | 6 GB | 48 MP, f/1.8, 26 mm (wide), 1/2.0″, 0.8 µm, PDAF | 26 November 2020
Xiaomi Redmi Note 12T Pro | 13 | MediaTek Dimensity 8200 Ultra (4 nm/3.1 GHz) | 8 GB | 64 MP, f/1.8, 23 mm (wide), 1/2″, 0.7 µm, PDAF | 29 May 2023
Huawei Enjoy 20 SE | 10 | Kirin 710A (14 nm/2.0 GHz) | 4 GB | 13 MP, f/1.8, 26 mm (wide), PDAF | 23 December 2020
Vivo iQOO Z5 | 13 | Qualcomm SM7325 Snapdragon 778G 5G (6 nm/2.4 GHz) | 8 GB | 64 MP, f/1.8, 26 mm (wide), 1/1.97″, 0.7 µm, PDAF | 23 September 2021
Vivo Pad | 13 | Qualcomm SM8250-AC Snapdragon 870 5G (7 nm/3.2 GHz) | 8 GB | 13 MP, f/2.2, 112° (ultrawide), 1.12 µm, AF | 11 April 2022
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).