1. Introduction
Plant diseases affect the leaves, stems, roots and fruits; they also reduce crop quality and quantity, causing food deprivation and insecurity throughout the world [1]. The estimated annual crop yield loss due to crop disease is about 16% globally, which is a primary cause of food shortage and increased food production costs [2]. According to a report of the Food and Agriculture Organisation (FAO), the world’s population will reach approximately 9.1 billion by 2050; to maintain a steady food supply, food production must grow by about 70% [3]. The factors affecting plants and their products are categorised as diseases and disorders. Biotic factors are the diseases caused by algae, fungi or bacteria, whereas abiotic factors inducing disorders are rainfall, moisture, temperature and nutrient deficiency [4].
There exist many methods to diagnose plant diseases; one of the primary and most straightforward approaches is visual estimation. Traditional plant disease diagnosis depends on the farmer’s experience, which is uncertain and unreliable. As an alternative to these conventional techniques, researchers have introduced spectrometry to classify plant leaves as healthy or infected [5]. Another method is to analyse the leaves’ DNA using the polymerase chain reaction [6] or real-time polymerase chain reaction [7]. Such techniques are challenging, expensive and time-consuming; they require highly professional operation, controlled experimental conditions and massive use of crop protection products. Recent advancements in Artificial Intelligence (AI), Machine Learning (ML) and Computer Vision (CV) allow the development of automated plant leaf disease detection techniques that can detect plant leaf diseases efficiently and accurately in a brief time without human intervention. It has been observed that Deep Learning (DL) is the most prominent of these approaches in agriculture [8], where it helps to develop, control and enhance agricultural production.
Deep learning is the core of smart farming, adopting new devices, technologies and algorithms in agriculture [9]. It is widely used to solve complex problems such as feature extraction, transformation, pattern analysis and image classification [10]. Many researchers have used deep learning for crop disease diagnosis [11,12,13]. Chen et al. [11] proposed a deep learning model that counts apples and oranges in real-time images. Dias et al. [12] presented apple flower semantic segmentation using a convolutional neural network (CNN) to count the number of flowers on the plants. Ubbens et al. [13] conducted a study to count plant leaves using a CNN model.
Recently, numerous types of deep learning architecture have been proposed for plant disease classification; the most prominent is the convolutional neural network (CNN). A CNN is a supervised deep learning model inspired by the biological nervous and vision systems, with significant performance compared to other models. Compared to an Artificial Neural Network (ANN), a CNN requires fewer neurons and uses multiple convolution layers to learn features, but it requires an extensive dataset for training [3,10].
In the last few decades, several techniques have been developed to detect leaf diseases in various crops [8,14,15]. In most of these techniques, features were extracted using image processing and then fed to a classifier. Deepa and Nagarajan [16] proposed a plant leaf disease detection technique in which a Kuan filter was first applied for noise removal, followed by a Hough transformation to extract colour, shape and texture features; a reweighted linear program boost classifier then classified the plant leaf disease, and the PlantVillage dataset was used to evaluate the technique’s performance. Karthik et al. [17] proposed a two-level deep learning technique for tomato leaf disease detection: the first model learned significant features using residual learning, and the second model acted as an attention mechanism on top of the first. The authors used the PlantVillage dataset to identify the late blight, early blight and leaf mold diseases of the tomato crop. Zhang et al. [18] proposed an improved Faster RCNN method to detect the ToMV, leaf mold fungus, blight and powdery mildew diseases of the tomato crop; they replaced the VGG16 feature extractor with a deep residual network and used a k-means clustering algorithm for the bounding boxes. Sambasivam and Opiyo [19] studied cassava mosaic disease and cassava brown streak virus disease using convolutional neural networks; their dataset was imbalanced, which made training the model challenging.
Several problems exist in the literature on deep learning approaches. First, the existing methods do not correctly identify potato leaf diseases in the Pakistani region because all current methods were trained on the PlantVillage dataset only. Potato diseases vary across different parts of the world owing to differences in leaf shape, varieties and environmental factors; therefore, the existing systems have a high false rate when recognising potato diseases in the Pakistani region. Second, the PlantVillage dataset has too few images, whereas training any CNN model requires a huge dataset: it contains only 152 healthy potato leaf images, and if the data are split into training, validation and testing at 80%, 10% and 10% ratios, respectively, the healthy class available for training shrinks further, leaving the existing methods inadequately trained on that class. The PlantVillage dataset is also imbalanced: the late blight and early blight classes have 1000 images each, while the healthy class has only 152, so there is a risk of over-fitting during training. Consequently, such methods fail to achieve high accuracy in other regions of the world, such as Pakistan, and there is a dire need to develop a new dataset for Pakistani-region potato leaf diseases so that farmers in Pakistan can detect potato diseases at an early stage, enhance their income and boost the country’s economy. Another problem is that most methods were not evaluated on unseen images because the dataset was already minimal, yet a model can only be judged good when it is tested on unseen data. A further problem is that current methods have a low convergence speed due to the vast number of trainable parameters, and their accuracy needs to be improved. The last gap in the literature is the non-availability of a potato leaf segmentation technique. This research is conducted to resolve the above research gaps.
The present research proposes a multi-level deep learning model for potato leaf disease recognition. At the first level, it extracts the potato leaves from the potato plant image using the YOLOv5 image segmentation technique. At the second level, a novel potato leaf disease detection convolutional neural network (PDDCNN) detects early blight and late blight from the potato leaf images. The performance of the proposed PDDCNN was then evaluated on the Potato Leaf Dataset (PLD), which was developed by capturing images of potato leaves across various areas of Pakistan’s Central Punjab region; the images were cropped and labelled with the help of plant pathologists.
The following are the main contributions of this research:
A novel real-time Potato Leaf Segmentation and Extraction Technique using YOLOv5 has been developed to segment and extract potato leaves from the images.
A novel deep learning technique called Potato Leaf Disease Detection using Convolutional Neural Network (PDDCNN) has been developed to detect the early blight and late blight diseases from potato leaf images.
The proposed method has an optimal number of parameters as compared to state-of-the-art models.
The development of a potato leaf disease dataset from the Central Punjab region of Pakistan by capturing three types of potato leaf images: early blight, late blight and healthy.
The rest of the article is organised as follows: related work is presented in Section 2, materials and methods in Section 3, and results and discussion in Section 4, while Section 5 presents the conclusion and future work, followed by the references.
2. Related Work
In the last few decades, many researchers have worked on multiple crops, including potatoes, but their focus was not on the diseases of the potato crop alone [20,21,22]. The models were trained on a specific-region dataset (PlantVillage [23]), which was developed in the USA and Switzerland. Potato diseases vary from region to region due to differences in leaf shapes, varieties and environmental factors [24]. Geetharamani and Pandian [20] proposed a deep CNN model to differentiate between healthy and unhealthy leaves of multiple crops. The model was trained using the PlantVillage dataset, with 38 different classes of diseased leaf images, healthy leaf images and background images; its focus was not on the diseases of the potato crop alone, and because it was trained on a dataset from a specific region (USA and Switzerland), it fails to detect Pakistani-region potato leaf diseases. Kamal et al. [21] developed plant leaf disease identification models named Modified MobileNet and Reduced MobileNet by replacing the convolution layers of MobileNet [25] with depthwise separable convolutions. The models were trained on multiple crops of the PlantVillage dataset, whose plant leaf images were collected from a specific region of the world. Khamparia et al. [22] proposed a hybrid approach to detect crop leaf disease using a combination of CNNs and autoencoders; this model, too, was trained on the PlantVillage dataset for multiple crops and region-specific diseases. In [26], Liang et al. proposed a plant disease diagnosis and severity estimation network based on the residual structure and shuffle units of the ResNet50 architecture [27]; the PlantVillage dataset was again used to detect the multiple crop diseases of a specific region. Ferentinos [28] investigated the AlexNet [28], Overfeat [29], AlexNetOWTBn [30], VGG [31] and GoogLeNet [32] deep learning-based architectures to identify normal and abnormal plants from plant leaf images, applying transfer learning on the PlantVillage dataset to detect the multiple crop diseases of a specific region.
Many researchers have worked specifically on potato crop diseases but still trained their models on the PlantVillage dataset. Khalifa et al. [33] proposed a CNN model to detect early blight and late blight diseases along with a healthy class; the model was trained on the PlantVillage dataset, which covers the crops of specific regions only. Rozaqi and Sunyoto [34] proposed a CNN model to detect early blight, late blight and healthy potato leaves, again trained on the PlantVillage dataset. Sanjeev et al. [35] proposed a Feed-Forward Neural Network (FFNN) to detect early blight and late blight diseases along with healthy leaves; the method was trained and tested on the PlantVillage dataset. Barman et al. [36] proposed a self-build CNN (SBCNN) model to detect early blight, late blight and healthy potato leaves; the PlantVillage dataset was also used to train the model, and it was not validated on unseen test data. Tiwari et al. [37] used the pre-trained VGG19 model to extract features and multiple classifiers (KNN, SVM and a neural network) for classification; the model was likewise trained on the PlantVillage dataset to detect early blight and late blight in potato leaves and was not tested on unseen data. Lee et al. [38] developed a CNN model to detect early blight, late blight and healthy potato leaves, also using the region-specific PlantVillage dataset and without testing on unseen data. Islam et al. [39] proposed a segmentation- and multi-SVM-based model to detect potato conditions such as early blight, late blight and healthy leaves; their method also used the PlantVillage dataset and needs improvement in accuracy. A summary of these techniques is shown in Table 1.
3. Materials and Methods
Several problems exist in the literature on deep learning approaches, including incorrect identification of potato leaf diseases and variation in potato diseases due to differences in varieties and environmental factors. The existing systems have a high false rate when recognising potato diseases in the Pakistani region, the existing potato leaf disease datasets contain inadequate training samples with imbalanced classes, current methods have a low convergence speed due to the vast number of trainable parameters with accuracy that needs to be improved, and no potato leaf segmentation technique is available. In this research, a multi-level deep learning model for potato leaf disease recognition is proposed to classify potato leaf diseases. At the first level, it extracts the potato leaves from the potato plant image using the YOLOv5 image segmentation technique. At the second level, a novel potato leaf disease detection convolutional neural network (PDDCNN) detects early blight and late blight from potato leaf images. The flow chart of the proposed method is shown in Figure 1, the algorithm is described in Algorithm 1, and the overall architecture of the proposed methodology is shown in Figure 2.
Algorithm 1: Multi-Level Deep Learning Model for Potato Leaf Disease Recognition
1. Capture real-time videos and images of potato plants in lab and field environments.
2. Convert the videos to frames (images).
3. Annotate the potato leaf images (single class) and save the annotations in YOLOv5 and XML formats.
4. Divide the potato leaf images dataset into training, validation and testing sets.
5. Apply pre-processing (auto-orient and resize) to the annotated images.
6. Apply pre-processing (data augmentation) to the training set.
7. Save the potato leaf images dataset in YOLOv5 PyTorch format.
8. Upload the dataset to Google Drive.
9. Train and validate the custom YOLOv5s model on Google Colab using the potato leaf images dataset.
10. Using the classification output of the YOLOv5s model and the dataset annotations, extract/segment the potato leaves to form the Potato Leaf Disease dataset (PLD).
11. Label the PLD images with their respective classes with the help of plant pathologists.
12. Pre-process all the images by applying data augmentation.
13. Divide the dataset into training, validation and testing sets at 80%, 10% and 10% ratios, respectively.
14. Train the CNN model on the training images.
15. Validate the CNN model on the validation images at the end of each epoch.
16. Save the trained PDDCNN model.
17. Test the trained PDDCNN model on the testing images.
3.1. Dataset
The performance of deep learning models heavily depends upon an appropriate and valid dataset. In this research, the following datasets are used.
3.1.1. PlantVillage Dataset
The PDDCNN method’s performance is assessed using the potato leaf images of a publicly available dataset called PlantVillage [23]. The PlantVillage dataset is a non-profit project developed by Penn State University (USA) and EPFL (Switzerland). The database consists of JPG colour images with 256 × 256 dimensions and has 38 classes of diseased and healthy leaves of 14 plants. Since the focus of this research is the potato crop, 1000 late blight, 1000 early blight and 152 healthy leaf images were selected for the experiments, as shown in Table 2.
3.1.2. Potato Disease Leaf Dataset
In the literature, the models have been developed using only the PlantVillage dataset, since it is the only publicly available dataset for potato leaf diseases, yet many research gaps remain. The PlantVillage dataset was developed in a specific region under particular geographic and environmental conditions, and potato diseases vary across different parts of the world due to differences in leaf shape, varieties and environmental factors. Therefore, the existing systems have a high false rate when recognising potato diseases in Pakistani-region potato leaf images, as shown in Table 3. The PlantVillage dataset also has few images and an imbalanced class distribution. There is therefore a dire need to develop a new potato leaf dataset collected from Pakistani areas; it will help researchers train models that identify Pakistan’s potato leaf diseases and will be useful for Pakistani farmers in detecting potato diseases at an early stage.
Thus, a new Potato Leaf Dataset (PLD) has been developed from Pakistan’s Central Punjab region. We collected our real-time dataset in the form of videos and pictures. Different capturing devices, such as mobile phone cameras, digital cameras and drones, were used to introduce variation into the dataset. The capturing distance for the mobile phone and digital cameras was 1–2 feet, whereas the capturing distance for the drone was set at 5–10 feet. The drone’s downwash distorted the videos and images by moving the plant leaves; therefore, we maximised the distance between the plant and the drone as much as possible. We selected the district of Okara in the Central Punjab region of Pakistan due to its high potato cultivation and focused on the potato varieties found there: Coroda, Mozika and Sante. Potatoes of different varieties acclimatised to the native environment were sown in agricultural land exposed to sunny conditions during November 2020. Potatoes were grown in rows segregated at a distance of 3 feet from each other. Seeds were planted in pit holes dug to a depth of 6–8 inches with a 5-inch width, covered with manure-mixed soil and irrigated with canal water. We captured the images and videos under varying conditions, i.e., morning, noon, evening, cloudy, sunny, rainy, etc. The healthy and infected leaves were annotated with the LabelMe tool in YOLOv5 PyTorch and XML formats. For segmentation and leaf extraction, the YOLOv5s model was trained from scratch, and with the help of the model output and the annotations, the potato leaves were extracted using Python code. With the help of plant pathologists, a total of 4062 healthy and diseased potato leaf images were selected for the PLD dataset and labelled into early blight, late blight and healthy classes.
The PLD dataset consists of 1628, 1414 and 1020 potato leaf images for the early blight, late blight and healthy classes, respectively, as described in Table 4. Sample images of the PLD dataset are shown in Figure 3. The PLD dataset can be accessed at https://drive.google.com/drive/folders/1FpcQA66pEg0XR8y5uEzWU__REPpqSAPD?usp=sharing (accessed on 20 June 2021).
3.2. Image Pre-Processing
To achieve more consistent classification results and better feature extraction, pre-processing was applied to the final images of the PLD. The CNN method requires extensive iterative training, and for this purpose a large-scale image dataset is required to reduce the chance of overfitting.
3.2.1. Data Augmentation
Different data augmentation techniques were applied to the training set using the ImageDataGenerator class of the Keras library in Python to overcome overfitting and enhance the dataset’s diversity. A rescale transformation (factor 1./255) brought every pixel value into the range 0 to 1, reducing the computational cost by keeping pixel values small and on the same scale. A rotation range of 25 degrees was employed to rotate the images by random angles. Images were shifted randomly to the left or right using a width shift range of 0.1 and vertically using a height shift range of 0.1. In the shear transformation, one axis of the image is fixed while the other is stretched to a specific angle known as the shear angle; a shear range of 0.2 was applied. The zoom range argument performs random zooming (values above 1.0 magnify the image, values below 1.0 zoom out); a zoom range of 0.2 was employed. Horizontal flipping was enabled, and a brightness range of 0.5–1.0 was applied (where 0.0 means no brightness and 1.0 means maximum brightness). Finally, the channel shift transformation shifts the channel values by a random amount selected from a specified range; a channel shift range of 0.05 was applied, with the fill mode set to nearest.
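As a minimal sketch, the settings above map onto the Keras ImageDataGenerator API as follows; the parameter values are taken from the text, while the variable names are illustrative and the generator construction is guarded because the original training code is not given:

```python
# Augmentation settings described in the text, expressed as keyword
# arguments for Keras's ImageDataGenerator (names are the real API
# parameters; the dict itself is an illustrative convenience).
aug_params = dict(
    rescale=1.0 / 255,            # scale pixel values into [0, 1]
    rotation_range=25,            # random rotations up to 25 degrees
    width_shift_range=0.1,        # random horizontal shifts
    height_shift_range=0.1,       # random vertical shifts
    shear_range=0.2,              # shear angle
    zoom_range=0.2,               # random zoom in/out
    horizontal_flip=True,         # random horizontal flips
    brightness_range=[0.5, 1.0],  # random brightness scaling
    channel_shift_range=0.05,     # random channel-value shifts
    fill_mode="nearest",          # fill newly created pixels
)

try:
    from tensorflow.keras.preprocessing.image import ImageDataGenerator
    train_datagen = ImageDataGenerator(**aug_params)
except ImportError:
    train_datagen = None  # TensorFlow/Keras not installed in this environment
```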
3.3. Training, Validation and Testing
The entire PLD dataset was divided into three parts: training, validation and testing. The training set was used to train the PDDCNN model, while the validation and test sets were used to evaluate the proposed model’s performance. The training, validation and testing splits were 80%, 10% and 10%, respectively, giving 3257, 403 and 403 images. Different data augmentation techniques were performed on the training set, i.e., rescaling, rotation, width shift, height shift, shear, zoom, horizontal flip, brightness and channel shift with nearest fill mode, to increase the diversity of the dataset; this counteracts the overfitting problem and thus helps the model generalise.
In the CNN model, a forward pass was performed on the training samples from the input layer to the output layer to make a prediction, and the error was computed. In the case of a wrong prediction, back-propagation was performed in reverse order; in the current research, the back-propagation algorithm adjusted the model weights accordingly for a better prediction. One complete forward and backward pass over the training data is known as one epoch. The model used the Adam optimisation algorithm. The training images were drawn from the early blight, healthy and late blight class labels while maintaining the 80% ratio; the remaining 20% of untouched images were further split into validation and testing at 10% each. The proposed PDDCNN model was trained on the training set to classify and predict every training image’s class label.
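The 80%/10%/10% split described above can be sketched as follows; split_dataset is a hypothetical helper, not the authors’ code (the paper reports 3257/403/403 images for the PLD):

```python
import random

def split_dataset(items, train=0.8, val=0.1, seed=42):
    """Shuffle a dataset and split it into train/validation/test subsets."""
    items = list(items)
    random.Random(seed).shuffle(items)  # deterministic shuffle for reproducibility
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

train_set, val_set, test_set = split_dataset(range(1000))
print(len(train_set), len(val_set), len(test_set))  # 800 100 100
```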
3.4. Potato Leaf Segmentation and Extraction Technique Using YOLOv5
The latest product of the YOLO architecture series is the YOLOv5 network [40,41]. The recognition accuracy of this network model is high, and the inference speed is fast, with the quickest detection speed being 140 frames per second. Moreover, the weight file of the YOLOv5 target detection network model is small, almost 90% smaller than that of YOLOv4, demonstrating that the YOLOv5 model is suitable for deployment on embedded devices to perform real-time detection. Hence, the benefits of the YOLOv5 network are its high detection accuracy, lightweight nature and fast recognition speed. The YOLOv5 family comprises four architectures, named YOLOv5l [41], YOLOv5x [41], YOLOv5m [41] and YOLOv5s [41]. The key difference between them is the number of feature extraction modules and convolution kernels at specific locations in the network; the number of model parameters and the model size increase in turn across the four architectures. In this research, we used the YOLOv5s architecture, as shown in Figure 4.
The YOLOv5s [41] framework primarily comprises three elements: the backbone network, the neck network and the detect network. The backbone is a convolutional neural network (CNN) that aggregates diverse fine-grained image information and forms image features. The first layer of the backbone is intended to reduce the computation of the model and speed up training. It works as follows: the input 3-channel image (the default input size of the YOLOv5s architecture is 3 × 640 × 640) is segmented into four portions of size 3 × 320 × 320 each using a slicing operation; a concat operation then connects the four portions in depth, giving an output feature map of size 12 × 320 × 320, which passes through a convolutional layer composed of 32 convolution kernels to produce an output feature map of size 32 × 320 × 320. The result is passed to the next layer through a BN (batch normalisation) layer and the Hardswish activation function. The BottleneckCSP module is the third layer of the backbone network and is intended to better extract the deep features of the image. It is primarily composed of Bottleneck modules: residual structures that join a convolutional layer (Conv2d + BN + Hardswish activation function) with a 1 × 1 convolution kernel to one with a 3 × 3 kernel, the final output of the Bottleneck module being the addition of this part’s output and the initial input through the residual connection. The BottleneckCSP input is split into two branches, and the number of feature map channels is halved by the convolution operation in each branch; after passing through the Conv2d layer and the Bottleneck modules in branch two, the output feature maps of branches one and two are linked in depth by a concat operation. Finally, the output feature map of the module is obtained after passing sequentially through a Conv2d layer and a Batch Normalisation (BN) layer; this feature map has the same size as the input of the BottleneckCSP module.
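The slicing operation of the first backbone layer described above can be sketched in NumPy; the dummy input and variable names are illustrative:

```python
import numpy as np

# A 3x640x640 image (C, H, W) is cut into four 3x320x320 slices by taking
# every second pixel, then the slices are concatenated along the channel
# axis, giving the 12x320x320 feature map described in the text.
x = np.zeros((3, 640, 640), dtype=np.float32)  # dummy input image
slices = [
    x[:, 0::2, 0::2],  # even rows, even columns
    x[:, 1::2, 0::2],  # odd rows, even columns
    x[:, 0::2, 1::2],  # even rows, odd columns
    x[:, 1::2, 1::2],  # odd rows, odd columns
]
focused = np.concatenate(slices, axis=0)
print(focused.shape)  # (12, 320, 320)
```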
The SPP (spatial pyramid pooling) module is the ninth layer of the backbone network and is intended to enlarge the receptive field of the network by converting a feature map of any size into a fixed-size feature vector. In YOLOv5s, the input feature map of the SPP module has a size of 512 × 20 × 20. Initially, a feature map of size 256 × 20 × 20 is output after passing through a convolutional layer with a 1 × 1 kernel. This feature map and the output feature maps subsampled through three parallel max-pooling layers are then concatenated in depth, giving an output feature map of size 1024 × 20 × 20. Lastly, the final output feature map of size 512 × 20 × 20 is obtained after passing through a convolutional layer with 512 convolution kernels. The neck network is a series of feature aggregation layers that mix and combine image features, primarily used to build FPNs (feature pyramid networks); its output feature maps are conveyed to the detect network (prediction network). The feature extractor of this network adopts a new FPN structure that improves the bottom-up path, the transmission of low-level features and the recognition of objects at different scales, so the same target object can be accurately identified at different sizes and scales.
The detect network is primarily used for the final recognition stage of the model: it applies anchor boxes to the feature maps output from the previous layers and outputs a vector with the category probability of the target object, the objectness score and the position of the bounding box surrounding the object. The detect network of the YOLOv5s architecture comprises three detect layers, whose input feature maps have dimensions of 80 × 80, 40 × 40 and 20 × 20, respectively, to identify image objects of different sizes. Each detect layer finally outputs a 21-channel vector ((2 classes + 1 objectness score + 4 bounding-box position coordinates) × 3 anchor boxes). The predicted bounding boxes and classes of the targets in the original image are then produced and labelled, completing the recognition of the leaves in the image.
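The 21-channel figure follows directly from the standard YOLO output layout; as a quick check (the function name is illustrative):

```python
# Per-cell output channels of a YOLO detect layer:
# (num_classes + 1 objectness score + 4 box coordinates) per anchor box.
def detect_channels(num_classes, num_anchors=3):
    return (num_classes + 1 + 4) * num_anchors

print(detect_channels(2))   # 21 channels for the two-class leaf detector
print(detect_channels(80))  # 255 channels for a standard 80-class model
```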
The YOLOv5s model was trained from scratch with default hyperparameters for 100 epochs, with an image size of 416 × 416 and a batch size of 32. The output of the trained model was first stored in a YOLOv5-format text file, and this file, together with the stored annotations, was written to a CSV file. Using Python code, the annotated leaves were then cropped and stored in a folder in JPG image format.
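The cropping step can be sketched as follows: YOLO-format labels store each box as normalised centre coordinates, so every box must be scaled back to pixels before the leaf region is cropped. The function and values below are illustrative, not the authors’ code:

```python
# Convert a YOLO-format box (x_center, y_center, width, height, all
# normalised to [0, 1]) into pixel corner coordinates for cropping.
def yolo_box_to_pixels(xc, yc, w, h, img_w, img_h):
    """Return (left, top, right, bottom) in pixels for a normalised box."""
    left = int((xc - w / 2) * img_w)
    top = int((yc - h / 2) * img_h)
    right = int((xc + w / 2) * img_w)
    bottom = int((yc + h / 2) * img_h)
    return left, top, right, bottom

# A centred box covering half the width and height of a 416x416 image:
print(yolo_box_to_pixels(0.5, 0.5, 0.5, 0.5, 416, 416))  # (104, 104, 312, 312)
```

With an image library such as Pillow, the crop itself would then be a call like `image.crop(box)` on the returned coordinates.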
3.5. Potato Leaf Disease Detection Using Convolutional Neural Network (PDDCNN)
Applications related to deep learning (DL) have emerged with the technological advancement of efficient computational devices, such as the Graphics Processing Unit (GPU). The concept of DL was motivated by the conventional artificial neural network. Within deep learning, the CNN has played a vital role: many processing layers are stacked to extract the essential features, which are fed into fully connected layers for the final decision. DL models developed massively after Krizhevsky et al. [42] achieved tremendous image classification accuracy with a CNN in 2012; since then, CNNs have been applied in many DL applications, i.e., pattern recognition, image classification, object detection, voice recognition, etc. [43,44].
Figure 5 exhibits the architecture of the proposed PDDCNN used to classify the potato leaf diseases along with healthy leaves. The model consists of three convolutional layers, each followed by a Rectified Linear Unit (ReLU) activation and a max-pooling layer. A flatten layer converts the convolved matrix into a 1D array, after which the model uses four dense (fully connected) layers. The first three fully connected layers use the ReLU activation function; the last fully connected layer, or output layer, uses the softmax activation function because this is a multiclass model. In this research, we used the Adam optimiser and the categorical cross-entropy loss function.
In the convolutional process, the input volume is convolved with the weights; the convolved matrix may shrink or expand depending on the stride and padding. The convolutional layers reduce the spatial height and width while increasing the depth. Each convolutional layer applies the ReLU nonlinear activation function, which converts negative values to zero and reduces the probability of vanishing gradients. Pooling reduces the computational cost and spatial size: max pooling down-samples the images, which reduces overfitting and improves the activation function’s performance, thus improving the convergence speed. The final fully connected (dense) layer is the output layer, responsible for predicting the class of a potato leaf image.
Further details of the PDDCNN are given below:
The sequential model uses a series of layers to extract the input image's essential features for further processing.
The first convolutional layer takes an input image of shape 256 × 256 × 3 and uses 16 filters, a 3 × 3 kernel, padding with stride 1 and the ReLU activation function.
The image size is reduced using a max-pooling layer with pool size (2, 2) after the first convolutional layer.
The second convolutional layer uses 32 filters with a 3 × 3 kernel, a stride of 1 and the ReLU nonlinear activation function.
After the second convolutional layer, max pooling is applied with pool size (2, 2).
The third and last convolutional layer uses 64 filters with a 3 × 3 kernel, a stride of 1 and, again, padding and the ReLU activation function.
A flatten layer then converts the convolved feature maps into a 1D vector.
Four hidden, or fully connected, layers perform the classification, or decision-making, based on the generated features.
The first fully connected (dense) layer has 512 neurons, followed by the ReLU activation function.
The second fully connected (dense) layer has 256 neurons, followed by the ReLU activation function.
The third hidden layer has 128 neurons and uses the ReLU activation function.
The number of output neurons always depends on the number of classes. The current research is a multi-classification problem with three classes; therefore, the last, or output, layer uses three neurons and a Softmax activation function.
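Assuming "same" padding (stride 1) for every convolution and a 2 × 2 max-pooling layer after each of the three convolutional blocks, as Figure 5 describes, the layer-by-layer output shapes can be traced with a short sketch (illustrative, not the authors' implementation):

```python
# Shape trace of the PDDCNN stack under the stated assumptions:
# 3x3 convolutions with "same" padding preserve spatial size;
# 2x2 max pooling halves the spatial dimensions.

def conv_same(h, w, c_in, filters):
    # "same" padding, stride 1: only the channel depth changes
    return h, w, filters

def max_pool(h, w, c):
    # 2x2 pooling halves height and width
    return h // 2, w // 2, c

shape = (256, 256, 3)               # input RGB potato leaf image
for filters in (16, 32, 64):        # the three convolutional blocks
    shape = conv_same(shape[0], shape[1], shape[2], filters)
    shape = max_pool(*shape)

flat = shape[0] * shape[1] * shape[2]  # size of the flattened 1D vector
print(shape, flat)                     # (32, 32, 64) 65536
dense_units = [512, 256, 128, 3]       # the four fully connected layers
```

The flattened vector of 65,536 values then feeds the 512-, 256- and 128-neuron dense layers before the three-neuron Softmax output.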
The overall accuracy of the model was evaluated by predicting the class label in the output layer.
The configuration details and various parameters of the proposed PDDCNN are given in
Table 5.
3.6. Evaluation Measures
3.6.1. Classification Accuracy
Classification accuracy is calculated as the number of correct predictions divided by the total number of predictions made.
3.6.2. Precision
There are numerous cases in which classification accuracy is not a good indicator of a model's performance. One such scenario is when the class distribution is imbalanced: if the model predicts every sample as the majority class, it achieves a high accuracy rate even though it has learned nothing. Precision addresses this by measuring how many of the samples predicted as a given class actually belong to that class. It is characterised as:
Precision = TP / (TP + FP)
3.6.3. Recall
The recall is another critical metric, characterised as the fraction of input samples from a class that are correctly predicted by the model. The recall is calculated as:
Recall = TP / (TP + FN)
3.6.4. F1 Score
One well-known metric that combines precision and recall is the F1-score, which is defined as:
F1 = 2 × (Precision × Recall) / (Precision + Recall)
3.6.5. ROC Curve
The receiver operating characteristic (ROC) curve is a plot that shows the performance of a classifier as a function of its decision threshold. It plots the true positive rate (TPR) against the false positive rate (FPR) for different threshold values. The ROC curve is a well-known tool for demonstrating performance and for choosing a good cutoff threshold for a model.
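These definitions can be checked with a small NumPy sketch on a hypothetical three-class confusion matrix (the numbers are illustrative, not results from this study):

```python
import numpy as np

# Hypothetical confusion matrix: rows = true class, columns = predicted class
cm = np.array([[50,  2,  3],
               [ 4, 45,  1],
               [ 1,  2, 47]])

tp = np.diag(cm)          # correctly predicted samples per class
fp = cm.sum(axis=0) - tp  # predicted as the class, but actually another class
fn = cm.sum(axis=1) - tp  # belonging to the class, but predicted otherwise

accuracy = tp.sum() / cm.sum()                    # correct / total predictions
precision = tp / (tp + fp)                        # per-class precision
recall = tp / (tp + fn)                           # per-class recall
f1 = 2 * precision * recall / (precision + recall)
```

For multiclass problems such as this one, the per-class scores are typically averaged (macro or weighted) to give a single figure.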
5. Conclusions and Future Work
Deep learning techniques perform significantly well in plant leaf disease detection, improving crop productivity and quality by controlling the biotic factors that cause severe crop yield losses. In this study, a fast and straightforward multi-level deep learning model for potato leaf disease recognition was proposed. At the first level, it extracted the potato leaves from the potato plant image using the YOLOv5 image segmentation technique; at the second level, a novel potato leaf disease detection convolutional neural network (PDDCNN) was developed to classify early blight and late blight from potato leaf images, while also considering the effect of environmental factors on potato leaf diseases. The proposed PDDCNN method performed significantly well on the potato leaf images collected from Central Punjab, Pakistan. Experimental studies were conducted on two different datasets, PlantVillage and PLD, with and without augmentation. The performance of the proposed PDDCNN technique was also evaluated in a cross-dataset setting, where it outperformed the other methods. The proposed technique was compared with state-of-the-art techniques and existing studies used for potato leaf disease detection. The state-of-the-art and existing techniques had a high false rate in detecting potato leaf disease on the PLD dataset, which underscores the effect of environmental factors and the variation in disease symptoms between the PlantVillage and PLD datasets. The proposed method was trained on the PLD dataset with and without data augmentation techniques, achieving 99.75% accuracy along with high precision, recall, F1-score and ROC curve results on the PLD dataset. It had a minimal number of parameters and was simpler than the state-of-the-art methods, substantially reducing computational cost and improving speed.
In future, this research will be extended to detect multiple diseases on a single leaf, localise the diseases, estimate disease severity, enhance the PLD dataset, develop an IoT-based real-time monitoring system, develop a website and launch a mobile application.