1. Introduction
Protecting agricultural crops is essential for preserving food sources. Plant health strongly affects agricultural yield, and plant disease can result in significant economic loss [1]. Plant diseases can be caused by viruses, bacteria, fungi, or even microscopic insects [2]. Extreme weather conditions and unseasonable temperatures can spur the growth and spread of harmful organisms, which may destroy entire crops or drastically reduce yield. Furthermore, global travel and globalization are increasing the risk of infection with transboundary diseases. The Food and Agriculture Organization (FAO) estimates that plant diseases cost the global economy about USD 220 billion in damage per year [3]. Thus, effective disease detection and control mechanisms are required to counter these factors [4].
Grapes are an important and widely cultivated fruit consumed as food or beverage. The total global production of fresh grapes (i.e., combined table grapes and wine grapes) was 77.13 million metric tons in 2019, with a market value of USD 189.19 billion [5]. However, grapes are susceptible to several diseases that affect their growth, production, and quality. Grape diseases can cause from 5% to 80% crop loss depending on the severity and spread of the disease [6]. Thus, they are a major production risk with profound economic ramifications. Early and accurate detection and identification of a disease is of great importance in reducing disease progression and economic cost [7]. Grapes are prone to a number of diseases, including black measles, black rot, and isariopsis leaf spot. These diseases cause significant damage and may require plant pathologists to correctly diagnose them [8,9]. The next few paragraphs provide more details about these diseases.
Grape black measles (Esca) is one of the oldest known diseases of plants, caused by the fungi Phaeomoniella chlamydospora and Phaeoacremonium aleophilum. The disease affects the grape plant and appears on its leaves in the form of irregular yellow and circular spots. It is contagious and can spread through grape fields, destroying vast portions of the crop. Therefore, it must be discovered early to reduce its rapid spread and the resulting damage [10].
Grape black rot is a severe disease that attacks the grape plant and appears clearly on its leaves. It is caused by the fungus Guignardia bidwellii. Black rot is a severe threat to grape production, with losses in the range of 5–80% [11]. Its control requires the use of fungicides, which raises the financial cost of farming.
Grape isariopsis leaf spot is a rapidly spreading disease that appears on leaves in the form of pale red to brown lesions. It is caused by the Pseudocercospora vitis fungus. Special fungicides are used as a remedy to limit the spread of the disease, so it must be detected early [12].
Technological advances have powered many agricultural innovations. More specifically, artificial-intelligence (AI) and deep-learning algorithms can use plant images to drive farming control applications. Deep learning is based on neural networks comprising many more layers than the classic three-layer (input, hidden, output) architecture. Convolutional neural networks (CNNs) are a type of deep-learning network that finds features and relationships in images through a sequence of convolutional, pooling, and rectified linear unit (ReLU) layers that terminate in a fully connected layer. This layer aggregates the various features discovered by the former layers. CNNs are effective in discerning features despite the many variations that can affect input images [13]. In designing CNN-based systems, researchers can build the network structure from scratch. Alternatively, existing reliable and well-established models can be reused, and their learned knowledge can be transferred to other applications via transfer learning. Deep transfer learning adapts existing models with partial or full retraining in a manner that fits the new application. It has the advantage of detecting generic features (e.g., colors, borders) with earlier layers while customizing later layers for the specific application.
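For concreteness, the sketch below illustrates this transfer-learning workflow in MATLAB (the environment used later in this paper). It is a minimal example rather than the exact code of this study: the folder name grapeLeaves and the training settings are illustrative assumptions, and GoogLeNet stands in for any of the pretrained models discussed.

```matlab
% Minimal transfer-learning sketch using the MATLAB Deep Learning Toolbox.
% Assumptions: leaf images live in 'grapeLeaves' with one subfolder per
% class; hyperparameters are placeholders, not the values of Section 2.
imds = imageDatastore('grapeLeaves', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsVal] = splitEachLabel(imds, 0.6, 'randomized');

% Reuse a pretrained model; replace only its task-specific final layers.
net = googlenet;
lgraph = layerGraph(net);
lgraph = replaceLayer(lgraph, 'loss3-classifier', ...
    fullyConnectedLayer(4, 'Name', 'fc_grape'));   % 4 output classes
lgraph = replaceLayer(lgraph, 'output', ...
    classificationLayer('Name', 'out_grape'));

% Resize images on the fly to the network's expected input size.
inputSize = net.Layers(1).InputSize;
augTrain = augmentedImageDatastore(inputSize(1:2), imdsTrain);
augVal   = augmentedImageDatastore(inputSize(1:2), imdsVal);

% Retrain: early layers retain generic features (colors, borders),
% while the replaced later layers specialize to grape diseases.
opts = trainingOptions('sgdm', 'InitialLearnRate', 1e-4, ...
    'MaxEpochs', 10, 'MiniBatchSize', 32, ...
    'ValidationData', augVal, 'Verbose', false);
trainedNet = trainNetwork(augTrain, lgraph, opts);
```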
The related literature includes several studies in the multiclass classification of grape diseases; see Table 1. Huang et al. [14] considered one healthy class and four grape diseases: black rot, black measles, phylloxera, and leaf blight. Classification was performed using one custom baseline model called Vanilla CNN, composed of 7 layers, and transfer learning using modified VGG16, AlexNet, and MobileNet. Furthermore, a fifth ensemble model was developed by combining the predictions from the four aforementioned models. The authors reported an accuracy range of 77–100%. Thet et al. [15] used global average pooling to fine-tune the VGG16 model to perform six-way classification (one healthy + five diseases). Their dataset included the diseases of anthracnose, nutrient insufficiency, downy mildew, black measles, and isariopsis leaf spot. They reported 98.4% accuracy. Lauguico et al. [16] compared the performance of three pretrained models: AlexNet, GoogLeNet, and ResNet-18. The dataset was composed of healthy leaf images and three disease types (i.e., black rot, black measles, and isariopsis leaf spot). In their work, the highest accuracy was achieved by the AlexNet model (i.e., 95.65%). Similarly, Ji et al. [17] aimed to classify the same set of diseases as that of Lauguico et al. They developed a CNN model called UnitedModel that combines features from the Inceptionv3 and ResNet50 models using global average pooling. They achieved an F1 score of 98.96%. Likewise, Lin et al. [18] designed a custom CNN called GrapeNet that contains a convolutional block attention module, which, they claim, emphasizes disease features and suppresses irrelevant information. They compared its performance to that of nine other models: GoogLeNet, VGG16, ResNet34, DenseNet121, MobileNetv2, MobileNetv3_large, ShuffleNetV2, ShuffleNetV1, and EfficientNetV2_s. GrapeNet achieved the highest accuracy value of 86.29%. Liu et al. [19] proposed a dense inception convolutional neural network that surpassed the performance of ResNet-34 and GoogLeNet with 97.22% accuracy. Tang et al. [20] modified the design of ShuffleNetV1 using channelwise attention and achieved 99.14% accuracy. Andrushia et al. [21] used capsules to represent spatial disease information in the design of a convolutional capsule network and reported an accuracy of 99.12%. Hasan et al. [28] designed a simple CNN that achieved 91.37% accuracy. Goncharov et al. [22] collected a special dataset of grape leaf images representing healthy, black measles, black rot, and chlorosis classes. They performed classification using four models (i.e., VGG19, Inceptionv3, ResNet50, and Xception) and reported a highest accuracy of 90%.
In spite of the great advances in deep learning, traditional machine-learning and classification techniques are still proposed in the literature. Waghmare et al. [23] used segmentation to extract the relevant parts of leaf images. After that, fractal features were extracted and fed to an SVM classifier, which achieved 96.6% accuracy. Jaisakthi et al. [24] and Ansari et al. [25] employed the same classifier as that of Waghmare et al. However, the former used global thresholding and semisupervised techniques to achieve 93% accuracy, and the latter used image enhancements and the Haar wavelet transform to achieve 97% precision. Such non-deep-learning methods are susceptible to changes in image quality [26,27], require preprocessing steps that may introduce more errors or slow processing, and do not achieve better performance than that of deep-learning methods.
Table 1. The state of the art in classifying grape diseases. Each study included one class for the healthy state.
Study | No. of Classes | Dataset | Approach
---|---|---|---
Huang et al. [14] | Five | 5937 leaf images | Custom CNN, VGG16, AlexNet, MobileNet, and an ensemble.
Thet et al. [15] | Six | 6000 leaf images | Fine-tuned VGG16.
Lauguico et al. [16] | Four | 4062 leaf images | AlexNet, GoogLeNet, and ResNet-18.
Ji et al. [17] | Four | 1619 leaf images | UnitedModel.
Lin et al. [18] | Seven | 2850 leaf images | GrapeNet custom CNN.
Liu et al. [19] | Six | 7669 leaf images | Dense inception convolutional neural network.
Tang et al. [20] | Four | 4062 leaf images | Improved ShuffleNetV1.
Andrushia et al. [21] | Four | 11,300 leaf images | Convolutional capsule network.
Goncharov et al. [22] | Four | 3200 leaf images | VGG19, Inceptionv3, ResNet50, and Xception.
Waghmare et al. [23] | Three | 450 leaf images | Fractal features + SVM.
Jaisakthi et al. [24] | Four | 5675 leaf images | Segmentation + colour features + SVM.
Ansari et al. [25] | Two | 400 leaf images | Segmentation + Haar wavelet transform + SVM.
Hasan et al. [28] | Seven | 1000 leaf images | Custom CNN.
The research landscape on the use of AI in agriculture, and specifically in disease identification, is ripe for more innovation and further confirmatory studies. The work in this paper evaluates transfer learning using a wide range of CNN models to classify grape diseases. Instead of designing a CNN model from scratch, the work employs efficient, well-known models that have undergone extensive evaluation to earn their place in the literature. Furthermore, the approach facilitates deployment and implementation by not requiring explicit feature extraction or elaborate image preprocessing. This work contributes the following:
Using leaf images as input, CNN models are implemented to classify grape diseases. Three such diseases were considered in this study: black measles, black rot, and isariopsis leaf spot. In addition, a fourth healthy class was included.
Using transfer learning, 11 CNN models were implemented to classify the input into one of the four classes.
The performance of the 11 models was measured and compared from various angles of classification capability using a wide range of metrics. Moreover, the training and validation times were recorded. The results show that wrapping such models in mobile and smartphone applications can aid farmers in quickly and correctly identifying diseases.
In the next section, the input and dataset are described in detail, more information is provided about the CNN models, the hyperparameters and computing environment are specified, and the measures of performance are defined. This is followed by the results and discussion in Section 3, and the conclusion is presented in Section 4.
3. Results and Discussion
The performance was evaluated with the goal of comparing the effectiveness of the CNN models in identifying the correct disease or health class of the input images. The results represent the mean overall values from ten random runs of the model building, training, and validation code. Moreover, the minimum, maximum, and standard deviation of accuracy over the 10 runs were reported.
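As a sketch of this protocol, the per-run accuracies can be collected and summarized as follows (trainAndValidate is a hypothetical wrapper around the training and validation steps of Section 2, not a function from this paper's code):

```matlab
% Repeat model building, training, and validation over ten random runs,
% then summarize the accuracy metric as reported in Tables 3 and 5.
numRuns = 10;
acc = zeros(numRuns, 1);
for r = 1:numRuns
    rng(r);                       % vary the random split/initialization
    acc(r) = trainAndValidate();  % hypothetical: returns run accuracy in %
end
fprintf('mean %.2f, min %.2f, max %.2f, SD %.2f\n', ...
    mean(acc), min(acc), max(acc), std(acc));
```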
The first of the data-split strategies was 60/40. The results for the performance evaluation metrics are shown in Table 2. It was immediately evident that all models were able to easily discern the different types of diseases. The highest precision (i.e., 99.9%) was achieved by GoogLeNet, which is a relatively small and fast model. The other models achieved very close performance, with the lowest precision being 99.1%. Further insight into the performance is provided by the confusion matrices in Figure 2. They show that, in most cases, images were classified perfectly; misclassification between black rot and black measles was the source of all errors.
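For reference, the reported per-class metrics can be derived directly from such a confusion matrix, as in this minimal sketch (trueLabels and predictedLabels are assumed to hold the validation labels and model predictions; they are not defined in this paper's text):

```matlab
% Per-class precision, recall, specificity, and F1 from a confusion
% matrix C, where C(i,j) counts class-i images predicted as class j.
C  = confusionmat(trueLabels, predictedLabels);  % 4x4 for this task
tp = diag(C);
fp = sum(C, 1)' - tp;   % predicted as class i but belonging elsewhere
fn = sum(C, 2)  - tp;   % class-i images predicted as another class
tn = sum(C(:))  - tp - fp - fn;
precision   = tp ./ (tp + fp);
recall      = tp ./ (tp + fn);   % also called sensitivity
specificity = tn ./ (tn + fp);
f1 = 2 * (precision .* recall) ./ (precision + recall);
```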
Table 3 shows the mean, minimum, maximum, and standard deviation of the accuracy over the 10 runs using 60% of the data for training. Some models achieved a maximum accuracy of 100% in some runs. The SqueezeNet and ResNet-18 models exhibited the most fluctuation, with 1% and 0.7% standard deviation (SD), respectively.
Given the high performance of the deep-learning models using 60% of the data for training, there was a limited margin for improvement with the current dataset. Nonetheless, a further increase in the training subset to 80% of the dataset was examined. The mean overall F1 score, precision, recall, and specificity for all models and the 80/20 data split are shown in Table 4. All models improved their performance within the small remaining margin, with some of them achieving perfect classification scores. Sample confusion matrices, as generated by MATLAB, for the ResNet101 and DarkNet-53 models using 80% of the data for training are shown in Figure 3; they corroborate the numbers in the previous tables. Moreover, Table 5 shows the mean, minimum, maximum, and standard deviation of the accuracy over the 10 runs using 80% of the data for training. In comparison to Table 3, more models were able to achieve perfect accuracy in some of the runs, and the minimum accuracy increased for all models. Furthermore, the previously reported highest standard deviation values were lower when using 80% of the data for training.
The time results in the various performance evaluation scenarios are shown in Table 6. The training and validation times differed greatly between the models, even though their classification abilities were very similar. The gap between the fastest and slowest models was large (i.e., 150.6 vs. 3185.4 s using the 60/40 split, and 169.5 vs. 4038 s using the 80/20 split). In addition, increasing the size of the training set increased the training times by widely varying factors across the models (e.g., compare the scalability of SqueezeNet vs. Xception).
The research landscape on the use of AI in agriculture, and specifically in disease identification, is receiving increased focus and effort. Table 7 shows the state-of-the-art results in classifying grape diseases. Huang et al. [14] achieved an accuracy range that could reach 100%. However, the highest results were produced using an ensemble model that combines features from multiple individual models. Such an approach involves high overhead, as each constituent deep-learning model imposes extensive computational requirements. Moreover, although they used the same dataset as the work in this paper, the dataset size was manipulated by augmentation to balance the number of images across all classes. This artificial inflation of a dataset can bias the results through data leakage, as slightly modified copies of the same image are easily detected by deep-learning algorithms; in essence, the algorithm was trained and tested on the same images. Thet et al. [15] repeated their training several times and kept the results from the best-performing model, which may not reflect the stable performance of the model. Lauguico et al. [16] worked in a different way compared to the literature: by montaging multiple disease images (i.e., 9 images in a 3 × 3 matrix) into a single image. After that, object recognition algorithms rather than classification methods were used to detect the various diseases. However, such an approach seems to needlessly complicate the problem in order to apply the object detection algorithms. Furthermore, it does not correspond to real-life application scenarios (i.e., why and how would a user combine multiple diseases into a single image?). Ji et al. [17] employed a similar approach to that of Huang et al. [14] by combining multiple CNN models. However, the main critique of their work lies in the small number of images in their dataset (1619 images in total), with two classes tested with far fewer than 100 images. Waghmare et al. [23] and Ansari et al. [25] used even fewer total images, namely, 450 and 400, respectively. Deep-learning algorithms learn better, and achieve a more stable and less variable performance, with a large number of images [47]. Goncharov et al. [22] expanded their dataset by dividing each image into at least five new images, which is the opposite of the approach employed by Lauguico et al. [16]. However, this may skew the results, as this manual data manipulation artificially aids the deep-learning algorithms by pinpointing and segmenting important features. A similar duplication that leads to data leakage and inflated performance results was employed by Liu et al. [19] and Andrushia et al. [21].
The present study has some limitations. First, more grape diseases (e.g., chlorosis and nutritional deficiency) need to be included in order to develop a truly comprehensive grape-dedicated application. Moreover, disease types from other plants could be collectively included in the dataset, although such diversification may not be necessary; customizing applications to specific types of plants may be more accurate and useful (i.e., multiple deeply specialized applications versus one holistic model). Second, the input images were taken against a standard background that does not reflect real-life scenarios and eliminates many sources of classification error. Third, in order to add more images to the dataset and improve the classification accuracy and robustness of the models, mobile applications need to be developed and deployed.