1. Introduction
Tea is an important cash crop in traditional agriculture, and market demand for tea is vast. Global tea production is over USD 1.7 billion annually, while the world tea trade is valued at about USD 9.5 billion [1]. Protecting the healthy growth of tea trees and the quality of tea is essential to the development of the tea industry. Tea pests and diseases have a significant impact on tea production: they not only lower tea yields but also reduce tea quality, causing huge economic losses to tea farmers and tea producers [2]. Traditional detection methods include expert identification, molecular biology, and spectroscopy. Manual identification by experts is time-consuming, subjective, and of limited accuracy, and inviting experts to conduct field visits is costly and labor-intensive [3]. Molecular biology and spectroscopy tests are more accurate, but they require specialized training and expensive instrumentation [4]. Therefore, research on the accurate and efficient detection of tea pests and diseases is of great significance.
As computer vision technology has advanced, research on the identification and detection of tea pests and diseases has also progressed, and image processing and machine learning methods have begun to be applied to this task. These methods mainly segment disease spots and extract features for classification, and they have achieved considerable results. In 2018, Md. Selim Hossain [5] proposed a disease recognition system based on a support vector machine (SVM). He and his co-authors extracted eleven discriminative features of two common tea diseases in Bangladesh and used them with an SVM classifier for detection and identification. The algorithm not only processes images quickly but also maintains high accuracy. In 2019, Yunyun Sun [6] proposed an algorithm that combines SLIC (Simple Linear Iterative Clustering) with the SVM; the results show that this method is effective at extracting tea leaf disease feature maps from complex backgrounds and can better identify tea diseases and pests. In 2020, Xiuguo Zou [7] proposed a spectral reflectance-based method for tea disease and pest identification, which includes a feature selector based on a decision tree and a tea disease identifier based on a random forest. The experiments show that this method improves both precision and recall. In 2022, S. Prabu detected tea diseases using an SVM approach from traditional machine learning. The watershed transform algorithm was used to segment color-transformed images clearly, and gradient feature values of tea images were fed to multi-class SVM classifiers to classify tea diseases. Performance evaluation showed that the model was effective and performed well [8]. Considering that the life cycle of plant diseases is closely related to environmental conditions, Zhiyan Liu proposed in the same year to predict tea crop diseases with multivariate linear regression on the environmental conditions in the crop field, reporting a disease prediction accuracy of up to 91% [9].
In machine learning methods for plant disease identification and detection, features must be extracted manually from diseased leaves. Samples often contain many diseased tea leaves that are similar in color, texture, and other characteristics, so the accuracy of manual feature extraction strongly influences the performance of traditional machine learning methods for diseased leaf identification and detection. Moreover, manual feature extraction is both time-consuming and subjective [10]. In recent years, deep learning methods such as convolutional neural networks have developed rapidly. They are more direct and faster and are free from the constraints of manual feature extraction, which significantly improves the model’s learning ability and identification accuracy. In 2019, Gensheng Hu [11] proposed an improved deep convolutional neural network (CNN)-based method for tea disease recognition, adding a multi-scale feature extraction module to the CIFAR10-quick model to improve its ability to automatically extract features from images of different tea diseases. In 2020, Shyamtanu Bhowmik [12] used a CNN model to detect and recognize black rot and rust diseases of tea trees, achieving high accuracy with minimal computational complexity and resources. In the same year, Shenghung Lee [13] trained a faster region-based convolutional neural network (Faster R-CNN) to locate disease spots on leaves and identify their causes, obtaining relatively high overall accuracy for seven classes of tea diseases and pests. In 2021, Gensheng Hu [14] used the Faster R-CNN framework to detect leaves infected with tea leaf blight (TLB), improving the detection of blurred, occluded, and small diseased leaves. The detected TLB leaves were then fed into a trained VGG16 network to grade disease severity. In 2022, He Li [15] proposed a framework for recognizing pest and disease symptoms in tea based on Mask R-CNN, wavelet transform, and F-RNet. The Mask R-CNN model was used to segment disease and insect spots from tea leaves, a two-dimensional discrete wavelet transform was then used to enhance the features of the spot images, and finally the feature images were input into F-RNet for recognition. The experimental results show that the framework can accurately segment and recognize pest and disease symptoms on tea leaves.
However, in tea leaf pest and disease detection, all the studies mentioned above focus on identifying multiple pests and diseases. This approach yields somewhat larger training sets, which matters because deep learning detection models often require many samples to avoid overfitting. However, it can also lead to unsatisfactory detection accuracy, because the numbers of samples differ across pest and disease classes and their target characteristics vary greatly. Moreover, in practice, the occurrence of tea leaf pests and diseases depends on climate, temperature, and geographic location, so only a limited number of pest and disease types commonly occur in a given region and period. However, building tea leaf pest and disease datasets for a specific region and period is very demanding, and few high-quality datasets meet the requirements because of the effects of shooting angles, lighting conditions, backgrounds, and other factors. As a result, it is difficult to obtain a large number of high-quality samples. Tea diseases vary in shape and in how leaves become infected, but they share a common characteristic: their targets are relatively small, which usually leads to poor detection. There are three main reasons for this: first, existing datasets pay little attention to small targets; second, small targets tend to cluster; and third, few features are available for small targets. Beyond meeting the definition of small targets, the diseased regions of tea leaves pose a further problem: diseased and healthy tea leaves are densely distributed, so they easily occlude each other and are difficult to distinguish. This further increases the difficulty of accurate detection.
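The paragraph above refers to the definition of small targets without giving a threshold. One widely used convention is the COCO area rule, under which an object with an area below 32 × 32 pixels is small and one below 96 × 96 pixels is medium. The sketch below bins boxes by that rule; it is an assumption for illustration only, since this paper does not state which definition it adopts, and the function and constant names are hypothetical.

```python
# Area thresholds from the COCO evaluation convention (an assumption here;
# the paper does not say which small-target definition it follows).
SMALL_MAX_AREA = 32 ** 2   # 1024 px^2
MEDIUM_MAX_AREA = 96 ** 2  # 9216 px^2


def size_category(box):
    """Classify an (x_min, y_min, x_max, y_max) box in pixels by its area."""
    x_min, y_min, x_max, y_max = box
    area = max(0.0, x_max - x_min) * max(0.0, y_max - y_min)
    if area < SMALL_MAX_AREA:
        return "small"
    if area < MEDIUM_MAX_AREA:
        return "medium"
    return "large"


# Hypothetical lesion boxes on a 640 x 640 image.
for b in [(100, 100, 120, 118), (50, 50, 110, 120), (0, 0, 200, 200)]:
    print(b, size_category(b))
```

A 20 × 18 pixel lesion (area 360 px²) falls well inside the small range, which is typical of the TLB spots described in this paper.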
For target detection in the specific field of tea leaf diseases and pests, how to make effective use of small-sample datasets and solve the small target detection problem is therefore an urgent research question.
Tea leaf blight (TLB) is a prevalent disease of tea plants. It is caused by Colletotrichum camelliae Massee, an imperfect fungus. TLB first appears as yellow-brown, water-soaked lesions on the leaves. As the disease progresses, these lesions develop into irregular light brown and gray patches. If left unchecked, severe cases can cause leaf drop and the visible death of new shoots, reducing the overall vitality of the plant. Our in-depth study of TLB found that it is a small-sample problem in terms of available data, because TLB occurs relatively infrequently in a given region and period and little data on the disease is available. Furthermore, like tea diseases in general, TLB tends to present small detection targets that overlap and cluster. Based on these considerations, we selected a single disease, TLB, for study and propose a small target detection model for TLB combined with transfer learning. We use data augmentation and transfer learning to address the small-sample problem of TLB: models pre-trained on a large-scale source domain are transferred to the TLB dataset and then further optimized for small targets. Because diseased and healthy tea leaves overlap heavily during detection and some TLB targets are very small, we aim to address these problems and reduce false and missed detections. We use the decoupled TSCODE detection head, add a Triplet Attention mechanism to the E-ELAN (extended efficient layer aggregation network) structure, and introduce a Wasserstein distance-based metric for small target detection.
Each of these components effectively improved the model’s ability to recognize small targets of tea diseases. Experimental results on the small-sample TLB dataset show that, compared with the original YOLOv7 tiny model, our model better handles the small-sample and small target detection problems, with higher accuracy and robustness.
The rest of the paper is organized as follows. Section 2 first gives a comprehensive introduction to our dataset and then describes transfer learning, model construction, and the model evaluation methods in three subsections. Section 3 consists of four parts: first, we describe the environment and parameters for model training; second, we select the source domain for transfer learning and compare the results; third, we present a series of comparative experiments on transfer learning and on the impact of several modules on the model; and finally, we present visual comparisons of the detection results. Section 4 discusses the research ideas in detail in light of our results. Section 5 briefly summarizes the work.
3. Results
3.1. Training Environment and Parameters
In this section, Table 1 lists the programming language and the hardware and software environments used in the experiments. Table 2 shows the numbers of images in the training, validation, and test sets for the source and target domains; both datasets were divided in a 7:1.5:1.5 ratio and then augmented. Table 3 and Figure 12 show results for parameter selection. Table 4 and Table 5 show the training parameters of the model before and after transfer learning, respectively.
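The 7:1.5:1.5 division described above can be sketched as a seeded shuffle followed by slicing. This is a minimal illustration, assuming a flat list of image identifiers; the function name and seed are hypothetical and not taken from the paper. Splitting before augmenting, as done here, is one common way to keep augmented copies of the same image from appearing in more than one subset.

```python
import random


def split_dataset(items, ratios=(0.7, 0.15, 0.15), seed=0):
    """Shuffle items reproducibly and split them into train/val/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> same split every run
    n = len(shuffled)
    n_train = int(round(n * ratios[0]))
    n_val = int(round(n * ratios[1]))
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

Augmentation would then be applied to each subset separately.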
The augmented tea leaf blight dataset does not include the original, pre-augmentation samples; that is, the original tea leaf blight samples are never seen by the model trained on the augmented data. Therefore, to match real-world conditions when visualizing detection results in this paper, we use all of the original tea leaf blight sample images (182 images) as the test set, so that the inference results reflect the model’s generalization ability in real scenes.
In deep learning, the choice of optimizer and learning rate affects model performance. SGD and Adam are classical optimizers for model training. SGD iteratively adjusts the model parameters by gradient descent to minimize the loss function. Adam adjusts the parameters using exponential moving averages of the gradient (first moment) and of the squared gradient (second moment), which give each parameter an adaptive step size. To obtain better model performance, we therefore compared detection performance under different learning rates with the SGD optimizer and the Adam optimizer, respectively. The candidate learning rates were chosen empirically. The results of the comparison experiments are shown in Table 3. In addition, we set the number of epochs to 400 in the comparison experiments to examine the influence of training length, as presented in Figure 12.
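The two update rules described above can be contrasted on a one-dimensional toy loss f(x) = (x - 3)². The sketch below implements plain SGD and the standard Adam update with bias-corrected moment estimates and the usual defaults β₁ = 0.9, β₂ = 0.999. It is a didactic illustration only: the toy loss, learning rate, and step counts are arbitrary choices, not the optimizer configuration used in this paper.

```python
import math


def grad(x):
    """Gradient of the toy loss f(x) = (x - 3)^2."""
    return 2.0 * (x - 3.0)


def sgd(x0, lr=0.1, steps=100):
    """Plain SGD: move against the gradient with a fixed learning rate."""
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x


def adam(x0, lr=0.1, steps=1000, beta1=0.9, beta2=0.999, eps=1e-8):
    """Adam: scale each step by bias-corrected moving averages of the
    gradient (first moment) and the squared gradient (second moment)."""
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g        # first-moment estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias correction
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


print(sgd(0.0), adam(0.0))
```

Both converge to the minimum at x = 3 here; on real detection losses their behavior differs, which is why the paper compares them empirically.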
Due to hardware limitations, we set the batch size to 16 and the image size to 640 × 640. The initial learning rate and optimizer were selected based on Table 3: the model performs relatively best with the SGD optimizer and an initial learning rate of 0.01. As indicated in Figure 12, with SGD and a learning rate of 0.01, further performance gains become very marginal after 300 epochs. To save computational resources, we therefore set the number of epochs to 300 for subsequent experiments. The specific experimental parameters used during transfer learning are shown in Table 4 and Table 5.
3.2. Transfer Learning Source Domain Selection
The core problem of transfer learning is to find the similarity between the source and target domains. Once this similarity is found, it can be exploited to perform the transfer learning task well. However, if the identified similarity is unreasonable or does not exist, the transfer cannot help the target task, resulting in negative transfer. Existing methods for selecting the source domain, such as the Selective Adversarial Network [26] based on adversarial learning, are generally complicated. In our study, we propose a hypothesis: a pre-trained model can already recognize TLB to some extent, i.e., the knowledge drawn from the source domain is helpful for recognizing TLB even without transfer, and the source domain whose model recognizes TLB better is more suitable for transfer learning on TLB. Since no TLB labels are seen during source-domain training, each source-domain model is applied directly to the TLB test set. The detection results in Table 6 indicate that the apple leaf disease detection model can already recognize a certain number of TLB targets, i.e., the knowledge acquired on its original task is helpful for detecting TLB to some extent. In contrast, the knowledge from the tea shoot detection task is of almost no help for TLB detection. We then used the pre-trained weights from each of the two candidate source domains to initialize the model and retrained it. The resulting metrics and recognition results, shown in Table 7, also support the reasonableness of this hypothesis. We also observed that although the two source-domain models differ greatly when tested directly, the gap between the results obtained when their weights are used for pre-training is not very large. Figure 13 shows detections on the same images after pre-training on the apple leaf disease dataset and on the tea shoot dataset, respectively, to visualize the difference between the source domains. Although the apple leaf model detects TLB better than the tea shoot model, missed and false detections still occur, which motivates the further optimization of the TLB detection model in the next step. In summary, we chose the apple leaf disease dataset as the source domain for this study.
3.3. Ablation Experiments
To test our proposed model’s performance and verify the necessity of each improvement, we conducted ablation experiments. Based on transfer learning, i.e., using the weights obtained by training the model on the source domain as the initial weights, we added one improvement at each step of the ablation experiments to validate the effect of each improvement method and of combinations of methods.
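Initializing from source-domain weights usually means copying every parameter whose name and shape match and leaving the rest, such as a classification layer sized for a different number of classes, at its fresh initialization. The sketch below shows that filtering step with NumPy arrays standing in for tensors. The parameter names and shapes are hypothetical, and this is not necessarily how the paper's training code performs the transfer.

```python
import numpy as np


def transfer_weights(source_state, target_state):
    """Copy parameters from source_state into a copy of target_state when the
    name and shape match; return the merged state and the skipped names."""
    merged = dict(target_state)
    skipped = []
    for name, target_param in target_state.items():
        source_param = source_state.get(name)
        if source_param is not None and source_param.shape == target_param.shape:
            merged[name] = source_param.copy()  # reuse source-domain knowledge
        else:
            skipped.append(name)  # keep fresh initialization
    return merged, skipped


# Hypothetical state dicts: the backbone matches, but the classification layer
# differs because the (hypothetical) source task has four classes and TLB has one.
rng = np.random.default_rng(0)
source = {
    "backbone.conv1.weight": rng.normal(size=(32, 3, 3, 3)),
    "head.cls.weight": rng.normal(size=(4, 128)),
}
target = {
    "backbone.conv1.weight": np.zeros((32, 3, 3, 3)),
    "head.cls.weight": np.zeros((1, 128)),
}
merged, skipped = transfer_weights(source, target)
print(skipped)
```

Training then continues on the TLB data from these merged weights.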
The results of the ablation experiments are given in Table 8. TL denotes the introduction of transfer learning, TS the incorporated TSCODE detection head, TA the incorporated Triplet Attention module, and NWD the use of a loss metric based on the Normalized Gaussian Wasserstein Distance. Baseline refers to experiments using the YOLOv7 tiny model directly after dataset augmentation.
Although YOLOv7 tiny is among the lightweight models with the best detection and recognition results, its precision and recall are still relatively low. After transfer learning was introduced, the model improved to some degree compared to the baseline: precision increased by 1.1%, recall by 2.1%, AP by 2.2%, and F1-score by 2.5%. This indicates that the knowledge drawn from the apple leaf disease source domain helped in recognizing and detecting TLB and that introducing transfer learning is effective.
Replacing the original coupled detection head with the decoupled TSCODE detection head, which performs recognition with features tailored to each task, improved the model’s classification and localization ability. Compared to the baseline, precision improved by 1.9%, recall by 3.7%, AP by 3.4%, and F1-score by 3.3%. Because TSCODE obtains feature maps suited to each specific task, it eases the conflict between the classification and localization tasks and reduces the missed and false detections common in TLB small target detection.
Adding the Triplet Attention mechanism to the E-ELAN structure of the backbone improves the model’s precision by 2.1%, recall by 2.7%, AP by 3.0%, and F1-score by 3.3%. Because Triplet Attention captures weight information across multiple dimensions, it strengthens the focus on small TLB targets and lets the network obtain more detailed information about target regions, improving the overall detection performance of the model.
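To make the multi-dimensional weighting concrete, the following NumPy sketch follows the published Triplet Attention design of Misra et al. (2021): each of three branches rotates the feature map so that a different axis is pooled away, applies Z-pool (concatenated max and mean), a 7 × 7 convolution, and a sigmoid gate, then rotates back, and the three branch outputs are averaged. It is an illustrative forward pass under stated assumptions: the kernels are random rather than learned, batch normalization is omitted, and how the module is wired into E-ELAN in this paper is not reproduced.

```python
import numpy as np


def conv2d_same(z, kernel):
    """Zero-padded 'same' cross-correlation of a (C, H, W) input with a
    (C, k, k) kernel, producing a single (H, W) output channel."""
    _, h, w = z.shape
    k = kernel.shape[-1]
    p = k // 2
    padded = np.pad(z, ((0, 0), (p, p), (p, p)))
    out = np.zeros((h, w))
    for i in range(k):
        for j in range(k):
            out += np.einsum("c,chw->hw", kernel[:, i, j], padded[:, i:i + h, j:j + w])
    return out


def attention_gate(t, kernel):
    """Z-pool over axis 0 (max and mean), 7x7 conv, sigmoid, then rescale t.
    BatchNorm from the original module is omitted in this sketch."""
    z_pool = np.stack([t.max(axis=0), t.mean(axis=0)])          # (2, D1, D2)
    scale = 1.0 / (1.0 + np.exp(-conv2d_same(z_pool, kernel)))  # values in (0, 1)
    return t * scale[np.newaxis, :, :]


def triplet_attention(x, kernels):
    """x: (C, H, W) feature map; kernels: three (2, 7, 7) arrays, one per branch."""
    # Branch 1: pool over H, attention weights on the (C, W) plane.
    b1 = attention_gate(x.transpose(1, 0, 2), kernels[0]).transpose(1, 0, 2)
    # Branch 2: pool over W, attention weights on the (H, C) plane.
    b2 = attention_gate(x.transpose(2, 1, 0), kernels[1]).transpose(2, 1, 0)
    # Branch 3: pool over C, ordinary spatial attention on the (H, W) plane.
    b3 = attention_gate(x, kernels[2])
    return (b1 + b2 + b3) / 3.0


rng = np.random.default_rng(0)
features = rng.normal(size=(8, 12, 10))  # hypothetical (C, H, W) feature map
branch_kernels = [rng.normal(scale=0.1, size=(2, 7, 7)) for _ in range(3)]
out = triplet_attention(features, branch_kernels)
print(out.shape)
```

The module preserves the input shape, so it can be inserted into an existing block without changing the surrounding layers.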
The new NWD-based loss metric improves precision by 5.2%, recall by 2.1%, AP by 4.5%, and F1-score by 4.2%. This suggests that the new metric effectively reduces sensitivity to small positional deviations when localizing small targets while also improving classification ability to some extent. By reducing this positional sensitivity for small TLB targets, the Wasserstein distance metric better guides the model’s learning of TLB small target detection during training.
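The NWD metric, following the formulation of Wang et al. (2021), models each box (cx, cy, w, h) as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). For such Gaussians the 2-Wasserstein distance reduces to the Euclidean distance between the vectors (cx, cy, w/2, h/2), and NWD = exp(-W₂ / C). A minimal sketch is shown below; the constant C is dataset-dependent, and the value 12.8 used here is an assumption, not this paper's setting.

```python
import math


def nwd(box_a, box_b, c=12.8):
    """Normalized Gaussian Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is modelled as a 2D Gaussian with mean (cx, cy) and covariance
    diag(w^2 / 4, h^2 / 4); the 2-Wasserstein distance between such Gaussians
    is the Euclidean distance between (cx, cy, w / 2, h / 2) vectors.
    """
    (xa, ya, wa, ha), (xb, yb, wb, hb) = box_a, box_b
    w2_sq = (xa - xb) ** 2 + (ya - yb) ** 2 + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2
    return math.exp(-math.sqrt(w2_sq) / c)


def nwd_loss(pred, target, c=12.8):
    """Box regression loss in the NWD formulation: 1 - NWD."""
    return 1.0 - nwd(pred, target, c)


# A 6x6 box shifted by 2 px still overlaps; shifted by 6 px its IoU is 0,
# yet NWD still returns a smooth, distance-dependent similarity.
print(nwd((10, 10, 6, 6), (12, 10, 6, 6)), nwd((10, 10, 6, 6), (16, 10, 6, 6)))
```

Because NWD decays smoothly with distance rather than dropping to zero once boxes stop overlapping, it still provides a useful training signal for tiny targets.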
Next, we tested different combinations of the three improvements. The ablation results show that combining these methods improves AP to some degree compared with any single improvement, reaching up to 89.9%. However, some combinations slightly decrease precision while increasing recall, whereas others have the opposite effect, increasing precision and decreasing recall, which suggests that the three methods enhance different aspects of model performance. Finally, we integrated all three improvements into our final model, which achieves 92.2% precision, 86.6% recall, 91.5% AP, and 89.3% F1-score, that is, 6.5%, 4.5%, 5.8%, and 7.1% higher than the baseline, respectively. This shows that our improvements are necessary for enhancing TLB detection capability and can effectively reduce false and missed detections.
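The F1-score is the harmonic mean of precision and recall, F1 = 2PR / (P + R), with P = TP / (TP + FP) and R = TP / (TP + FN). As a quick arithmetic check, the final model's reported precision (92.2%) and recall (86.6%) give the reported F1-score of 89.3%. The helper names below are illustrative.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total > 0 else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and F1 from detection counts (0.0 where undefined)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Reported precision and recall of the final model.
print(round(100 * f1_from_pr(0.922, 0.866), 1))  # 89.3
```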
3.4. Comparison
We randomly selected a few representative samples from the dataset, compared the detection results of our proposed model and YOLOv7 tiny on them, and visualized the results to show the differences between the two. In each of the following figures, the left image shows the detection result of the baseline and the right image shows that of our model.
Figure 14 shows a typical TLB detection scenario. The recognition results show that our model outperforms the baseline: both models produce false and missed detections in this sample, but our model has higher recognition confidence.
Figure 15 shows a scene in which the shot is out of focus. The baseline model misses a target that our model detects; for the same small TLB target, our model produces a tighter bounding box with more accurate localization, and its detection confidence is significantly higher than that of the baseline model.
Figure 16 shows a detection scenario with a background of dead leaves, in which the baseline model produces several false detections. Although none of them has a high confidence level, they still disturb the detection results. Our model not only recognizes the two TLBs in the sample with high confidence but also produces no false detections. With a dead leaf background, the baseline model is significantly worse than our model.
In scenarios where TLB is dense, the raw detection output is cluttered and complex, and the performance difference between the models is hard to compare. We therefore compare the predicted boxes with the ground-truth boxes and re-visualize the results: green boxes mark correctly detected targets (TP), blue boxes mark missed targets (FN), and red boxes mark wrongly detected targets (FP). The visualization for a randomly selected sample at a confidence threshold of 0.45 is shown in Figure 17. In this scenario, our model has fewer false detections and fewer missed detections than the baseline model, indicating better TLB detection performance.
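The green/blue/red visualization depends on labeling each box as TP, FN, or FP. A minimal sketch of that labeling is shown below, assuming greedy matching in descending confidence order and an IoU threshold of 0.5 (the paper states the 0.45 confidence threshold but not the IoU threshold or matching rule). Boxes are (x_min, y_min, x_max, y_max), and the example boxes are hypothetical.

```python
def iou(a, b):
    """IoU of two (x_min, y_min, x_max, y_max) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def classify_detections(preds, gts, conf_thr=0.45, iou_thr=0.5):
    """preds: list of (box, confidence); gts: list of ground-truth boxes.
    Returns (tp, fn, fp): green, blue, and red boxes respectively."""
    kept = sorted((p for p in preds if p[1] >= conf_thr), key=lambda p: p[1], reverse=True)
    matched, tp, fp = set(), [], []
    for box, _conf in kept:
        best_iou, best_j = 0.0, None
        for j, gt in enumerate(gts):
            if j not in matched and iou(box, gt) > best_iou:
                best_iou, best_j = iou(box, gt), j
        if best_j is not None and best_iou >= iou_thr:
            matched.add(best_j)  # each ground truth can be matched only once
            tp.append(box)
        else:
            fp.append(box)
    fn = [gt for j, gt in enumerate(gts) if j not in matched]
    return tp, fn, fp


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [((1, 1, 10, 10), 0.9), ((50, 50, 60, 60), 0.8), ((20, 20, 30, 30), 0.3)]
tp, fn, fp = classify_detections(preds, gts)
print(tp, fn, fp)
```

In this toy case the low-confidence prediction is discarded by the 0.45 threshold, so its ground-truth box is reported as missed.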
Finally, after introducing the three improvements, we use Grad-CAM to visualize the feature maps output by the first E-ELAN module in the head of the baseline model and in the head of our model. Because this layer is among the shallower layers of the head, it retains more information about small target features, which allows a more direct comparison of how the models behave in small target detection.
Figure 18 shows the visualization for a randomly selected sample. In this layer’s feature map, our model focuses more strongly on the relatively larger targets among those to be identified; its attention is concentrated, with higher weights toward the target center. Because TLB targets are themselves very small, this concentration helps maintain a certain level of localization and classification accuracy and makes the model less susceptible to environmental interference. The baseline model appears to treat all TLB targets equally, but the heat-map regions around each target are relatively small, which makes localization and classification harder and leaves the model more susceptible to interference. This is consistent with the experimental results, in which the baseline model’s robustness and generalization ability are somewhat poorer.
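Grad-CAM (Selvaraju et al.) weights each channel of the chosen layer's activation map by the spatially averaged gradient of the target score with respect to that channel, sums the weighted channels, and applies ReLU. The NumPy sketch below shows only that final combination step, assuming the activations and gradients of the selected layer have already been extracted from the detector; how the detection score used for backpropagation is chosen is not specified in the paper.

```python
import numpy as np


def grad_cam(activations, gradients):
    """activations, gradients: (C, H, W) arrays for one image at one layer.
    Returns an (H, W) heat map scaled to [0, 1]."""
    weights = gradients.mean(axis=(1, 2))                # per-channel importance
    cam = np.einsum("c,chw->hw", weights, activations)   # weighted sum of channels
    cam = np.maximum(cam, 0.0)                           # ReLU keeps positive evidence
    peak = cam.max()
    return cam / peak if peak > 0 else cam


# Toy check: one active channel whose gradient is uniformly positive.
acts = np.zeros((2, 3, 3))
acts[0, 1, 1] = 2.0
grads = np.zeros((2, 3, 3))
grads[0] = 1.0
heat = grad_cam(acts, grads)
print(heat)
```

In practice the resulting map is upsampled to the input resolution and overlaid on the image to produce heat maps like those in Figure 18.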
We also compared our model with several mainstream detection models to better demonstrate its advantages. These include SSD, RetinaNet, EfficientDet, and YOLOv5s, which, like our model, are single-stage algorithms. In addition, we included comparisons with two-stage algorithms such as Faster R-CNN. The comparison of model performance on the validation set is shown in Figure 19.
The recall values of SSD, RetinaNet, and EfficientDet show that localizing small TLB targets is very difficult, which underscores the significance of the TLB small target recognition addressed in this paper. Although Faster R-CNN and YOLOv5s perform relatively well, they still do not reach a satisfactory level. Overall, our model shows advantages in recall, precision, and AP0.5, indicating that it can recognize small TLB targets well even when the dataset is small.
4. Discussion
The research in this paper addresses two main issues: the small-sample problem arising from the scarcity of tea pest and disease datasets, and the problem of relatively tiny TLB targets.
The small-sample problem is relatively common in tea leaf disease detection; it greatly limits the generalization ability of deep learning models and makes them prone to overfitting. Therefore, starting from the original dataset, we first combine offline and online data augmentation to enrich the TLB dataset as much as possible. In the training strategy, we introduce transfer learning. Because the choice of source domain strongly affects the model’s performance after transfer, we compared two source domains before transfer: apple leaf disease detection, which has different targets but a similar task, and tea shoot detection, which has similar targets but a different task. Source domain selection in general transfer learning tends to be complex and is a difficult challenge in its own right. This paper proposes a hypothesis: the model weights obtained by training on each source domain are validated directly on the TLB data, and the source domain that performs better is the more suitable one. This was also confirmed by the learning results after transfer. The apple leaf source domain gave better results, but we also observed an interesting phenomenon: the tea shoot source domain, which performed poorly in direct validation, produced much improved results after transfer, possibly because the distributions of tea shoots and TLB are similar. That is, although little of the information drawn from the tea shoot source domain can directly identify TLB, its model still learns this type of distribution to some extent, which suggests a new direction for future source domain selection strategies. Based on this finding, we looked for relevant transfer learning studies.
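Offline augmentation of a detection dataset must transform the bounding boxes together with the image. As one illustrative example (the specific augmentations used in this paper are not listed in this section), a horizontal flip of an image of width W maps a box (x_min, y_min, x_max, y_max) to (W - x_max, y_min, W - x_min, y_max). The function name below is hypothetical.

```python
import numpy as np


def hflip_with_boxes(image, boxes):
    """Flip an (H, W, C) image left-right and mirror its boxes.

    boxes: (N, 4) array-like of (x_min, y_min, x_max, y_max) in pixel units,
    with x running over [0, W].
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    boxes = np.asarray(boxes, dtype=float)
    mirrored = boxes.copy()
    mirrored[:, 0] = width - boxes[:, 2]  # new x_min comes from old x_max
    mirrored[:, 2] = width - boxes[:, 0]  # new x_max comes from old x_min
    return flipped, mirrored


img = np.zeros((4, 10, 3))
img[0, 0, 0] = 1.0
flipped_img, flipped_boxes = hflip_with_boxes(img, [(1, 0, 3, 2)])
print(flipped_boxes)
```

Vertical flips, crops, and scaling follow the same pattern: apply the geometric transform to the image and the identical transform to each box.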
We found that in this situation, partial domain adaptation can be considered, i.e., all source domains that can help the target domain are included in the scope of transfer learning. One example is the Selective Adversarial Network (SAN) [33] proposed by Zhangjie Cao, which avoids negative transfer by separating out source samples from non-shared categories while promoting positive transfer by maximally matching the sample distributions in the shared category space. With this transfer approach, we could try to learn all the knowledge embedded in both the tea shoot and apple leaf source domains.
For the small target problem, we mainly optimize the network structure. YOLO turns target detection into a regression problem, directly predicting bounding boxes and class probabilities after feature extraction and feature fusion. This keeps the model lightweight to some extent, but for problems such as small targets, its detection performance is often insufficient for practical applications. We therefore use the TSCODE (task-specific context decoupling) detection head for TLB detection, and the experimental results show that a detection head that fuses context and feeds feature maps of different levels to different tasks substantially improves model performance. To further improve performance, we optimize feature extraction for small targets by adding Triplet Attention to the E-ELAN structure of the model’s backbone. Small targets carry relatively little information in the samples; Triplet Attention extracts information from three dimensions jointly, making the most of the limited information about small targets, focusing the network on their key regions, and improving small target detection accuracy. Because the healthy tea leaves in the background of TLB are also dense and affect TLB detection, an effective attention mechanism helps suppress background interference with small targets, making the network attend more to small target information and thus reducing false detections; this improvement is evident in the visualized results. The results also show that detection recall is relatively low compared to precision.
The reason may be that TLB targets occupy few pixels in the sample, so they are more sensitive to localization errors and it is difficult to regress an accurate bounding box. Therefore, we add a new Wasserstein distance-based metric to the loss function to measure the positional difference between bounding boxes. After incorporating the three improvements, the model’s ability to locate and identify small targets is greatly improved. The heat maps of the feature maps clearly show that the model prefers to focus on the larger TLB targets among small targets, which also reflects the imbalance between classification and localization in the detection task. This suggests a new idea for TLB small target detection: small targets, which are already difficult to detect, could be further divided into harder and easier ones, with detection prioritized accordingly to improve model performance. While reviewing papers on small object detection, we found that a strategy proposed by Chang Xu [31] aligns closely with this idea. Their research revealed that IoU threshold-based and center sampling strategies often bias learning towards larger objects, similar to the tendency we observed to focus on relatively larger objects within small object detection. This constitutes a scale imbalance issue. To mitigate it, the authors introduced a receptive field-based label assignment (RFLA) strategy to achieve balanced learning for small objects. This finding is consistent with our observations above.
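The localization sensitivity of small targets discussed above can be illustrated numerically. For a square of side s shifted horizontally by d < s pixels, IoU = (s - d) / (s + d), so the same small offset costs a tiny box far more IoU than a large one. The sketch below computes this for a few hypothetical box sizes.

```python
def iou_cxcywh(a, b):
    """IoU of two axis-aligned boxes given as (cx, cy, w, h)."""
    ax0, ax1 = a[0] - a[2] / 2, a[0] + a[2] / 2
    ay0, ay1 = a[1] - a[3] / 2, a[1] + a[3] / 2
    bx0, bx1 = b[0] - b[2] / 2, b[0] + b[2] / 2
    by0, by1 = b[1] - b[3] / 2, b[1] + b[3] / 2
    inter = max(0.0, min(ax1, bx1) - max(ax0, bx0)) * max(0.0, min(ay1, by1) - max(ay0, by0))
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


# The same 3-pixel horizontal offset applied to squares of different sizes.
for side in (6, 12, 48):
    print(side, round(iou_cxcywh((0, 0, side, side), (3, 0, side, side)), 3))
```

A 3-pixel error drops IoU to one third for a 6-pixel target but only to about 0.88 for a 48-pixel target, which is why IoU-based assignment and losses are harsh on tiny lesions such as TLB.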
Of course, this research has some limitations. First, the dataset in this paper involves only one tea disease. When multiple diseases or pests are involved, different pests and diseases may share similar recognition features, which poses a greater challenge to the model and is likely to introduce problems such as sample imbalance. In addition, tea samples may vary over time, which may lead to false and missed detections. Lighting is also an important factor: the photos in this paper were mostly taken under normal light, which cannot be guaranteed in every case. We have also not examined in depth how dataset bias may affect the model’s generalization ability; this could serve as an entry point for our subsequent research on the small-sample problem. Second, this paper focuses on improving model performance and does not consider making the model lightweight; that is, it improves classification and localization purely from the model perspective without considering deployment. Therefore, subsequent design and optimization of the model should address these issues to further improve its performance and practicality. Our future research will continue along the lines of this paper, namely the small-sample problem and the small target problem, and will also cover the practical deployment and application of the model.
Regarding the small sample problem, and given the limitations of our study, different transfer learning approaches and the effect of the choice of transferred layers on the model could serve as entry points for future research; different types of crops may also require different transfer methods. Since some studies have shown that meta-learning can maintain good recognition accuracy with small samples [34], we may introduce meta-learning to optimize the tea leaf disease detection model. To address the scarcity of tea leaf disease and pest datasets, oversampling or replicating samples multiple times [35] could be used to further augment the sample images. In addition, we are considering using generative adversarial networks (GANs) [36] to further expand the dataset and improve the generalization ability of the model. GANs can also effectively mitigate dataset bias when models are trained on low-quality samples, and this idea may likewise be considered in subsequent research [37].
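The replication-based oversampling mentioned above can be sketched as follows. This is a minimal illustration, not the method of [35]. It assumes a dataset represented as a list of (image_id, set_of_class_names) records. The function name `oversample`, the `min_count` parameter, and the class names are hypothetical.

```python
from collections import Counter


def oversample(records, min_count):
    """Replicate whole images so that every class appears in at least `min_count` images.

    records: list of (image_id, set_of_class_names). Counts are taken from the
    original records; copies are exact duplicates, so in practice they would be
    paired with random augmentation to avoid showing the model identical pixels.
    """
    counts = Counter(c for _, classes in records for c in classes)
    out = list(records)
    for cls, n in counts.items():
        if n >= min_count:
            continue
        pool = [r for r in records if cls in r[1]]
        needed = min_count - n
        # Cycle through this class's images until the deficit is filled.
        out.extend(pool[i % len(pool)] for i in range(needed))
    return out


# Hypothetical example: one rare class and one common class.
records = [("img1", {"disease_a"})] + [(f"img{i}", {"disease_b"}) for i in range(2, 6)]
balanced = oversample(records, 4)
print(len(balanced))  # 8
```

Replication alone adds no new visual information. This is why the text above pairs it with further enhancement of the sample images and with GAN-based expansion.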
For the small target problem, reinforcement learning, which has been widely used in many other fields, could be borrowed to address small target detection [38]. It could also be combined with traditional image processing methods based on the distribution patterns of tea diseases in infrared grayscale images [39] for model training. In addition, we are interested in research on hyperspectral technology in tea science. At present, hyperspectral technology is more often combined with traditional machine learning [40] or with traditional image processing techniques [41] for various classification or quality detection tasks. Combining it with deep learning may handle small target problems more effectively.
Regarding practical deployment, YOLO-series algorithms have shown relatively good deployment results in multiple fields [42,43,44], which is what we aim to achieve in further research. Because this paper borrows YOLO's modular design, our algorithm is also highly extensible, since some of its modules are plug-and-play. In subsequent deployments, we will try to replace the backbone with lightweight, less computationally demanding network structures such as ShuffleNetV2, MobileNetV2, and GhostNet; these approaches have been applied in other fields [45,46]. We may also consider deploying the model on a UAV and using optimization algorithms for route planning to better monitor tea health [47]. In addition, with the rise of the Internet of Things (IoT) and edge computing, we hope to build a smart IoT hardware system in which high-definition zoom cameras arranged in a cluster structure collect tea leaf images and a detection model deployed on the edge computing node at the head of each cluster detects tea leaf diseases [48]. As deployment and practical application are explored further, the deployability and on-device performance of the model become unavoidable considerations. The choice of training methods and model usage may then be constrained, and research may shift toward lightweight design and recognition speed. This is one of the reasons why these two aspects were not included in the evaluation metrics of this paper.
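The following sketch illustrates why the lightweight backbones listed above reduce computation. It counts multiply-accumulate operations (MACs) for a standard k×k convolution and for the depthwise separable factorization used in MobileNet-style blocks. MobileNetV2 additionally uses inverted residuals and ShuffleNetV2 and GhostNet use other tricks, none of which this count models. The layer sizes are arbitrary illustrative values, not taken from our model.

```python
def standard_conv_macs(h, w, c_in, c_out, k):
    """MACs of a k x k convolution (stride 1, 'same' padding) on an h x w feature map."""
    return h * w * c_in * c_out * k * k


def depthwise_separable_macs(h, w, c_in, c_out, k):
    """MACs of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = h * w * c_in * k * k   # one k x k filter per input channel
    pointwise = h * w * c_in * c_out   # 1 x 1 conv mixes channels
    return depthwise + pointwise


h, w, c_in, c_out, k = 40, 40, 128, 128, 3
std = standard_conv_macs(h, w, c_in, c_out, k)
sep = depthwise_separable_macs(h, w, c_in, c_out, k)
print(std, sep, round(std / sep, 2))  # 235929600 28057600 8.41
```

The reduction factor is c_out·k² / (k² + c_out), here 1152/137 ≈ 8.4. For 3×3 kernels it approaches 9 as the channel count grows, which is the main source of savings when a backbone is swapped for a lighter one.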
The ideas of this research can also be applied to the detection of other individual tea leaf pests and diseases, as well as to the simultaneous detection of multiple pests and diseases. With an appropriate choice of source domain and problem-specific structural optimization, the model could be made to perform better. Our ultimate goal is a comprehensive tea leaf pest and disease detection model with better adaptability.
5. Conclusions
Tea is an important cash crop in traditional agriculture. Protecting the growth of tea trees and the quality of tea leaves is crucial to the development of the tea industry. However, tea pests and diseases have a significant impact on tea yield and quality, leading to an average annual reduction in production of about 20%. Therefore, studying the accurate detection of tea pests and diseases is of great significance.
We selected a single disease, tea leaf blight (TLB), for our study and, after examining it in depth, found that the TLB dataset suffers from a small sample problem. To address this issue, we employed data augmentation and transfer learning: models pre-trained on a large-scale source domain were transferred to the TLB dataset, and the model was then further optimized for small targets.
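Geometric data augmentation for detection must transform the box labels together with the pixels. The sketch below shows this for a horizontal flip. The specific augmentations used in this study are not listed here, so the flip is only an illustrative example. It assumes images as NumPy arrays of shape (H, W, C) and boxes as pixel (x1, y1, x2, y2) coordinates. The function name `hflip` is hypothetical.

```python
import numpy as np


def hflip(image, boxes):
    """Horizontally flip an image and its bounding boxes.

    image: array of shape (H, W, C); boxes: array of shape (N, 4) as (x1, y1, x2, y2) in pixels.
    """
    width = image.shape[1]
    flipped = image[:, ::-1, :].copy()
    new_boxes = boxes.astype(float).copy()
    # After mirroring, the old right edge becomes the new left edge and vice versa.
    new_boxes[:, 0] = width - boxes[:, 2]
    new_boxes[:, 2] = width - boxes[:, 0]
    return flipped, new_boxes


img = np.zeros((4, 10, 3), dtype=np.uint8)
img[:, 1, :] = 255                      # bright column at x = 1
boxes = np.array([[1, 0, 3, 4]])        # box covering columns 1 and 2
f_img, f_boxes = hflip(img, boxes)
print(f_boxes)  # [[7. 0. 9. 4.]]
```

The bright column moves from x = 1 to x = 8, which still lies inside the flipped box [7, 9). This keeps image and label consistent.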
Because diseased and healthy tea leaves overlap heavily and some TLB targets are small, we focused on these small target problems to reduce false and missed detections. We adopted the decoupled detection head TSCODE in the detection head and introduced Triplet Attention into the original E-ELAN structure. In addition, we introduced a small target detection evaluation metric based on the Wasserstein distance, which significantly improved the model's ability to recognize small tea disease targets.
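The sketch below assumes that the Wasserstein distance-based metric follows the widely used normalized Gaussian Wasserstein distance (NWD) formulation. In that formulation, each box (cx, cy, w, h) is modeled as a 2D Gaussian with mean (cx, cy) and covariance diag(w²/4, h²/4). For such Gaussians, the squared 2-Wasserstein distance reduces to a Euclidean distance between (cx, cy, w/2, h/2) vectors, which is then mapped into (0, 1] by exp(−W₂/C). The constant C is dataset-dependent, and C = 8.0 below is purely a placeholder assumption.

```python
import math


def nwd(a, b, c=8.0):
    """Normalized Gaussian Wasserstein distance between boxes given as (cx, cy, w, h).

    Each box is treated as N((cx, cy), diag(w^2/4, h^2/4)); the 2-Wasserstein
    distance between two such Gaussians equals the Euclidean distance between
    their (cx, cy, w/2, h/2) vectors. `c` is a dataset-dependent constant (placeholder).
    """
    va = (a[0], a[1], a[2] / 2, a[3] / 2)
    vb = (b[0], b[1], b[2] / 2, b[3] / 2)
    w2 = math.sqrt(sum((p - q) ** 2 for p, q in zip(va, vb)))
    return math.exp(-w2 / c)


# Two pairs of 8x8 boxes that do not overlap at all: IoU is 0 for both pairs,
# but NWD still distinguishes a near miss from a distant box.
near = nwd((10, 10, 8, 8), (19, 10, 8, 8))
far = nwd((10, 10, 8, 8), (40, 10, 8, 8))
print(round(near, 3), round(far, 3))  # 0.325 0.024
```

Because the measure varies smoothly with position and size even when boxes do not overlap, it is less sensitive than IoU to the pixel-level offsets that dominate small targets.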
Experimental results on the small-sample TLB dataset show that, compared with the baseline YOLOv7-tiny model, our model copes better with small sample and small target detection problems, achieving higher accuracy and robustness. Overall, this study optimizes the detection model, providing an effective solution for the accurate identification of tea tree leaf blight and an approach for building effective detection models for tea tree leaf pests and diseases.