1. Introduction
The spread of crop diseases can severely harm the economy, and manual diagnosis of crop diseases is time-consuming and error-prone. The digital revolution is reinventing agriculture by integrating advanced technologies, digital tools, and information and communication technologies to enhance agricultural performance [1]. Digital agriculture is currently emerging as a consequence of several technological developments in artificial intelligence [2], remote sensing [3], and robotic systems [4]. Such systems allow farmers to produce agricultural products broadly, precisely, and accessibly at the national and regional levels, and to boost yield and quality while limiting environmental impact. They can also help farmers detect plant diseases [5,6,7], pests [8,9], and weeds [10].
Ginger is a medicinal herb commonly used in Pakistan and across the world to treat a broad range of disorders such as rheumatism, arthritis, sprains, and muscular aches and pains [11]. However, ginger is prone to various kinds of diseases, such as bacterial [12] and fungal [13] infections. It is also affected by different pests such as the leafhopper, Chinese rose beetle, ants, and caterpillars. In the scientific literature, several techniques have been proposed to tackle complex challenges in agriculture, such as decision support systems [14], plant disease detection [15], and other artificial intelligence-based techniques. Deep learning has shown the most promising results for agricultural image processing tasks such as plant disease detection, pesticide detection, and plant type classification. For instance, the study [16] proposed the detection of fusarium head blight disease in wheat crops, employing a deep convolutional neural network (CNN) and image processing techniques to detect the diseased parts of wheat leaf images. The authors in [17] exploit Bayesian deep learning to approximate the probability density for crop disease detection problems. Another deep CNN-based work [18] suggests taking a model pre-trained on a massive general-purpose dataset, such as a VGGNet trained on ImageNet, and transferring it to a specific task by training with their own data.
An automated wheat disease diagnostic system [19], implementable on mobile devices for real-time diagnosis, is based on deep learning and multiple instance learning (MIL). The method applies four deep learning models, VGG-FCN-VD16, VGG-FCN-S, VGG-CNN-S, and VGG-CNN-VD16, to a leaf image dataset. The accuracies of VGG-CNN-VD16 and VGG-CNN-S are 73.00% and 93.27%, respectively. However, the suggested model cannot detect the last stage of plant disease. Authors of [20] applied neural networks, support vector machines, and fuzzy classifiers to plant disease detection problems. They suggested that further work is needed on disease stage identification and quantification, real-world applications, and the reliability of fully automatic systems in the agricultural sector. Ref. [16] proposes the detection of fusarium head blight, a wheat crop disease. They developed a deep convolutional neural network (DCNN) capable of extracting distinct wheat stems from a single image with a complicated background, and they also suggest a new method for identifying fusarium head blight-infected regions in each spike. In training, the model accurately detects the crop's diseased parts, with a mean average precision of 0.9201, outperforming k-means and Otsu's methods. However, this model requires a large dataset to detect the diseased parts more accurately.
A novel plant leaf disease detection model based on a deep CNN is proposed in [21]. Transfer learning and a deep CNN are used for the leaf disease detection problem; the model could accurately differentiate 38 different groups of diseased and healthy plants from leaf images with 96.46% accuracy. Authors of [22] used texture-based segmentation and simple linear iterative clustering (SLIC) to capture and recognize diseases and pests at early stages in corn crops. Classification is done through a binary support vector machine (BSVM) and a multi-class support vector machine (MSVM). The accuracy achieved for pest detection is 52%, which leaves considerable room for improvement. Authors in [23] used ResNet-101, VGG-16, ResNet-50, and YOLOv3 for pest and disease detection in rice crops, handling blurred images, blurred boundaries, and irregular shapes. However, the model showed poor performance when few features were present in the image frames.
The study presented in [24] suggested the use of image acquisition, image pre-processing, image segmentation, feature extraction, and classification techniques for the ginger plant disease detection problem. The system is linked with a digital/web camera, allowing farmers to take images of plant leaves. The collected images are processed using image processing techniques to identify disease symptoms and disease type, and the farmers are notified of the disease type through a global system for mobile communications (GSM) interface. A relay then turns on a pump installed in the device to release medicine to the infected plant according to the identified disease. However, the study does not consider a standard dataset of ginger plant leaf images, and some diseases and pests of the ginger plant are not discussed. A summary of the literature review is provided in Table 1.
From our literature review, we have found that ginger plant diseases, nutrient deficiencies, and pest patterns remain understudied. This area needs considerable research effort on detecting ginger plant diseases, pest patterns, and nutrient deficiencies at early and multiple stages. It is necessary to recognize diseases, nutrient deficiencies, and pests at early and multiple phases and to recommend treatments for the causative agents of ginger plant diseases. To the best of our knowledge, deep learning approaches have not been used so far for ginger plant disease, nutrient deficiency, and pest pattern detection. Furthermore, there is no publicly available dataset of ginger plant images with which to test the available deep learning techniques on the problem of interest.
This study focuses on building an autonomous system that detects ginger plant diseases, pest patterns, and nutrient deficiencies in real-time circumstances using deep learning techniques, namely ANN, CNN, VGG-16, and MobileNetV2. The study also involves developing a large-scale ginger plant dataset covering different growth stages. We present the classification of various diseases and nutrient deficiencies, investigate pest patterns in leaf images, and demonstrate the ability of the models to predict diseases with high accuracy. This study presents a first step towards deep learning-based detection of ginger diseases, pest patterns, and nutrient deficiencies. This research study presents the following key contributions:
To develop a standard dataset of ginger plant leaf images at early and multiple stages.
To classify pest patterns, nutrient deficiencies, and soft rot disease from ginger plant leaf and rhizome images.
To apply advanced deep learning-based methods and perform a comparative analysis to identify which model works best.
The rest of the paper is structured as follows: Section 2 explains the materials and methods, Section 3 presents the results and discussion, and Section 4 concludes the article.
2. Materials and Methods
In general, the work starts from field data collection, as shown in Figure 1. We collected ginger plant leaf images, both healthy and affected, at multiple stages (early and later). After data collection, the next step is image augmentation, including rotation, re-scaling, zooming, horizontal flipping, width shifting, and height shifting. Further, we performed the image processing steps of renaming and resizing, and new sample images were generated to enrich the dataset. Next, the sample data are labeled with the help of expert knowledge. The labeled sample data are then used during the training phase of the selected deep learning algorithms. Finally, we identify and classify the problems under study, i.e., disease, pest, and deficiency detection, and the results are compared and evaluated to identify the appropriate algorithm for a given situation. In the following, we describe each phase one by one.
2.1. Image Acquisition and Description
A total of 4396 images were acquired from the orchard of PMAS-Arid Agriculture University Rawalpindi, as shown in Figure 2, and from the market. All images were taken with an Infinix Hot 9 mobile phone, which has a 720 × 1600 pixel resolution and a 16-megapixel camera; each image is 6 to 7 MB in size. Image acquisition, which obtains images from an external source for further processing, is an essential step, as the system's performance is highly dependent on the captured images used for training the model. Each plant is sown at a distance of 50 cm. Images were collected with heterogeneous backgrounds, a critical factor when collecting real field images, whereas most publicly available datasets contain simple backgrounds. This enables our model to adapt to changes in a real-time environment.
There are 1801 images of nutrient-deficient and healthy plants; 1440 images are used for training and 361 for testing, with the same classes in both splits. Similarly, there are 2275 images of pest patterns and healthy plants, of which 1820 are used for training and 455 for testing. The ginger soft rot disease subset consists of 320 images, of which 256 and 64 are used for training and testing, respectively. Ginger plant leaf images of pest patterns, healthy plants, and nutrient deficiencies were gathered at multiple stages, while images of soft rot disease were collected at the last stage of the ginger rhizome. A detailed description of the dataset distribution is provided in Table 2.
2.2. Data Augmentation and Processing
Data augmentation is a method of creating new training data from existing training data. We apply domain-specific techniques to samples from the training data to generate unique and distinct training instances. In this study, we augment the images by rescaling, rotating, shifting the width and height, zooming, and applying horizontal flips. The obtained results are shown in
Figure 3.
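The following is a minimal sketch of this augmentation step, assuming the Keras ImageDataGenerator API; the directory path and parameter values are illustrative choices, not taken from the study.

```python
# Hedged sketch: Keras-based augmentation with the transformations
# listed above; parameter values are illustrative assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rescale=1.0 / 255,       # re-scaling pixel values to [0, 1]
    rotation_range=40,       # random rotation
    width_shift_range=0.2,   # width shift
    height_shift_range=0.2,  # height shift
    zoom_range=0.2,          # random zoom
    horizontal_flip=True,    # horizontal flip
)

# Generates augmented batches directly from labeled image folders.
train_batches = augmenter.flow_from_directory(
    "dataset/train",         # hypothetical directory layout
    target_size=(150, 150),
    batch_size=32,
    class_mode="binary",     # binary labels, e.g., healthy vs. affected
)
```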
All images are renamed by a Python script, resized using the cv2 library, and converted into RGB for further processing. The dimensions of the images are 150 × 150 × 3: the height and width are both 150 pixels, and 3 represents the RGB (red, green, blue) channels.
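A minimal sketch of this renaming and resizing step is given below, assuming OpenCV (cv2); the folder names and naming scheme are illustrative, not taken from the study.

```python
# Hedged sketch: rename, resize to 150 x 150, and convert BGR -> RGB.
import os
import cv2

SRC, DST = "raw_images", "processed_images"  # hypothetical folders
os.makedirs(DST, exist_ok=True)

for i, name in enumerate(sorted(os.listdir(SRC))):
    img = cv2.imread(os.path.join(SRC, name))   # cv2 loads images as BGR
    if img is None:
        continue                                 # skip unreadable files
    img = cv2.resize(img, (150, 150))            # resize to 150 x 150
    rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)   # RGB array for the model
    cv2.imwrite(os.path.join(DST, f"ginger_{i:04d}.jpg"), img)  # renamed copy
```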
2.3. Classification
This step trains on images of ginger plant diseases, pest patterns, and nutrient deficiencies. We use 80% of the data for training and the remaining 20% for testing. A detailed description of the deep learning algorithms is provided in the following subsections.
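A minimal sketch of the 80/20 split, assuming scikit-learn; the placeholder arrays stand in for the soft rot subset described in Section 2.1.

```python
# Hedged sketch: stratified 80/20 train/test split.
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data: 320 images of 150 x 150 x 3 with binary labels.
images = np.zeros((320, 150, 150, 3), dtype=np.float32)
labels = np.random.randint(0, 2, size=320)

X_train, X_test, y_train, y_test = train_test_split(
    images, labels,
    test_size=0.2,       # 20% held out for testing
    stratify=labels,     # keep class proportions in both splits
    random_state=42,
)
print(len(X_train), len(X_test))  # 256 train / 64 test, matching Table 2
```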
2.3.1. ANN Model
The artificial neural network (ANN) is a very powerful tool for non-linear statistical modelling. The model is a multi-layer, fully connected neural network, made up of an input layer, several hidden layers, and an output layer. Every node in one layer is linked to every node in the following layer.
An artificial neural network receives inputs and computes the weighted sum of the inputs plus a bias. This calculation is represented by the transfer function

$z = WX + b,$

where $W$ represents the weights, $X$ the inputs, and $b$ the bias. The model is sequential because all of its layers are arranged in sequence. The ReLU activation function and a dropout rate of 0.2 are used, which reduces overfitting of the model in this research. The sigmoid activation function is used in the last layer for classification, as given by

$\sigma(z) = \dfrac{1}{1 + e^{-z}}.$
A flatten layer transforms the 2D feature matrices into a vector in a format the dense layer can process. A dense layer is fully connected to the layer before it, meaning that each of its neurons is connected to every neuron of the preceding layer; it is the most commonly used layer. The hyperparameters used for the ANN are given in
Table 3.
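A minimal sketch of this ANN, assuming the Keras Sequential API; the hidden-layer widths are illustrative, as Table 3 lists only the training hyperparameters.

```python
# Hedged sketch: sequential, fully connected ANN with flatten, dense,
# dropout (0.2), and a sigmoid output, as described in the text.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Flatten, Dense, Dropout

ann = Sequential([
    Flatten(input_shape=(150, 150, 3)),  # 150 x 150 x 3 image -> vector
    Dense(128, activation="relu"),       # fully connected hidden layer
    Dropout(0.2),                        # dropout rate from the text
    Dense(64, activation="relu"),
    Dense(1, activation="sigmoid"),      # binary classification output
])
ann.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```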
2.3.2. CNN Model
CNNs are widely employed in this field of study. The image, represented as a three-dimensional matrix, is presented to the CNN, and the convolutional layer extracts the characteristics of the image. The convolutional layer also includes ReLU activation, which sets all negative values to zero. After convolution, a pooling layer is used to reduce the spatial volume of the input; for example, 2 × 2 max pooling reduces a 4 × 4 input to 2 × 2. A fully connected layer follows, and the last is the logistic layer; the output layer contains the one-hot-encoded label. A sequential model is used with the ReLU activation function, a dropout rate of 0.2 to reduce over-fitting, and a sigmoid in the last layer. The hyperparameters used for the CNN are given in
Table 4.
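A minimal sketch of this CNN pipeline, again assuming Keras; the filter counts are illustrative assumptions.

```python
# Hedged sketch: convolution + ReLU, max pooling, fully connected
# layer, dropout (0.2), and a sigmoid output, as described above.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                     Dense, Dropout)

cnn = Sequential([
    Conv2D(32, (3, 3), activation="relu",  # convolution + ReLU
           input_shape=(150, 150, 3)),
    MaxPooling2D((2, 2)),                  # halves each spatial dimension
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(128, activation="relu"),         # fully connected layer
    Dropout(0.2),                          # reduces over-fitting
    Dense(1, activation="sigmoid"),        # logistic output layer
])
cnn.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```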
2.3.3. VGG-16 Model
The VGG-16 algorithm is used in various deep learning-based applications; it is straightforward to implement and well suited to image classification.
Figure 4 depicts the architecture diagram of the VGG-16 algorithm.
The input during training consists of fixed-size 150 × 150 RGB (red, green, blue) images. The pre-processing performed here subtracts the mean RGB value, computed on the training set, from each pixel. The image is passed through a stack of convolutional layers, which employ filters with a small 3 × 3 receptive field; this adds non-linearity while keeping the number of parameters low. In one of the configurations, 1 × 1 convolution filters are used, which may be thought of as a linear transformation of the input channels. For the 3 × 3 convolution layers, the convolution stride and the spatial padding of the input are both set to 1 pixel, ensuring that the spatial resolution is preserved after convolution. Spatial pooling is carried out by five max-pooling layers that follow some of the convolution layers; max pooling is performed over a 2 × 2-pixel window with stride 2. After the stack of convolution layers, there are three fully connected (FC) layers: the first two have 4096 channels each, whereas the third performs 1000-way ILSVRC classification and hence has 1000 channels, one for each class. The softmax layer is the last layer [29]. However, because this study involves binary classification, the last layer used here is a sigmoid layer. In all networks, the configuration of the fully connected layers is the same. The hyperparameters used for VGG-16 are given in
Table 5.
The first layer uses the rectified linear activation function (ReLU), the most widely used activation function in CNNs and deep learning. It can be calculated as follows:

$\mathrm{ReLU}(z) = \max(0, z).$
The ReLU activation function is added to each layer, so negative values are not passed to the next layer. A flatten layer passes the data to the dropout layer, which helps to overcome over-fitting. Finally, the last layer is a sigmoid layer.
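A minimal sketch of this VGG-16 variant, assuming the Keras applications API with ImageNet weights; freezing the convolutional base is our assumption for the transfer-learning setup.

```python
# Hedged sketch: pre-trained VGG-16 base with a flatten layer,
# dropout (0.2), and a sigmoid head replacing the 1000-way softmax.
from tensorflow.keras import Model
from tensorflow.keras.applications import VGG16
from tensorflow.keras.layers import Flatten, Dropout, Dense

base = VGG16(weights="imagenet", include_top=False,
             input_shape=(150, 150, 3))
base.trainable = False                   # assumption: keep pre-trained features

x = Flatten()(base.output)               # flatten layer
x = Dropout(0.2)(x)                      # dropout against over-fitting
out = Dense(1, activation="sigmoid")(x)  # sigmoid replaces softmax

vgg = Model(base.input, out)
vgg.compile(optimizer="adam", loss="binary_crossentropy",
            metrics=["accuracy"])
```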
2.3.4. MobileNetV2 Model
MobileNetV2 is a 53-layer deep neural network. A version of the network pre-trained on over a million images from the ImageNet database can be loaded; this pre-trained model can categorize images into 1000 object categories. It enables real-time classification despite the computational restrictions of devices such as mobile phones. The architecture introduces a new CNN building block, the inverted residual with linear bottleneck, which enables better performance on mobile and embedded vision devices. We use MobileNetV2 in our research with ReLU activations, a dropout layer, a flatten layer, and the sigmoid activation function. Dropout operates by randomly setting the outgoing edges of hidden units to zero at every training iteration. The hyperparameters used for MobileNetV2 are given in
Table 6.
The output of the convolutional layers is flattened into a single continuous feature vector and connected to the final classification model, referred to as the fully connected layer; that is, all of the feature data are combined into a single vector and connected to the final layer.
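A minimal sketch of this MobileNetV2 variant, assuming the Keras applications API with ImageNet weights; the frozen base and the head layout are our assumptions.

```python
# Hedged sketch: pre-trained MobileNetV2 base, flattened features,
# dropout, and a sigmoid head for binary classification.
from tensorflow.keras import Model
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import Flatten, Dropout, Dense

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_shape=(150, 150, 3))
base.trainable = False                   # assumption: reuse ImageNet features

x = Flatten()(base.output)               # flatten the feature maps
x = Dropout(0.2)(x)                      # randomly zeroes activations
out = Dense(1, activation="sigmoid")(x)  # binary classification head

mnet = Model(base.input, out)
mnet.compile(optimizer="adam", loss="binary_crossentropy",
             metrics=["accuracy"])
```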