1. Introduction
Rice is one of the most important staple foods in China. Over the past millennium, rice has grown from an indispensable food source into a pillar of national economic development and food security. China has a huge demand for rice, so the risk of disease in the supply chain must be minimized. The occurrence of rice diseases and pests has always been a key factor in reduced output [1,2]. According to statistics, an outbreak of rice bacterial blight reduces yield by 20–30% and, in serious cases, can lead to total crop failure. Allowing rice diseases to spread unchecked harms China's food security, and even public health and the national economy. In rice cultivation, accurate identification of diseases, successful disease control, and early treatment of diseases and pests play decisive roles in the final grain yield. The most commonly used methods of rice disease identification fall into two types: identification of disease samples by experts, and diagnosis by growers themselves using plant disease diagnostic guides or relevant books. However, rice disease identification presents certain difficulties. Human judgment requires extensive theoretical knowledge and practical experience, and is not only time-consuming and labor-intensive but also prone to misjudgment. Therefore, research should focus on efficient, high-accuracy automatic identification methods. Deep learning can simplify the identification process, improve the accuracy and efficiency of disease identification, and provide great help for the precise treatment and subsequent prevention and control of diseases.
In the field of crop disease image recognition based on deep learning, Fuentes et al. used tomato disease and insect pest data as experimental samples; their analysis showed that a Faster Region-based Convolutional Neural Network (Faster R-CNN) combined with VGG-16 achieved the highest recognition accuracy [3]. Baranwal et al. designed a six-layer apple leaf disease identification structure based on the LeNet model, selected DenseNet and Inception modules pre-trained on ImageNet, and incorporated them into the network; compared with other state-of-the-art methods, this approach performed better, with an average prediction accuracy of no less than 98.54% on a public data set [4]. Priyadharshini et al. proposed a disease classification method based on the LeNet architecture, including a Bayesian classifier and a support vector machine classifier; in their research, an accuracy of 68.1% was obtained across combinations of 10 different training and test data sets, and the optimized LeNet model increased the probability of successful disease identification by 30%, greatly improving the effectiveness of inspection. Many scholars in China have also applied deep learning methods in agriculture; most existing studies focus on image recognition of magnified crop leaf images [5]. Li Jing et al. established a deep learning model for identifying 14 crop varieties and 26 crop diseases, explored the influence of image resolution and the number of iterations, and achieved 94.1% accuracy on a test set of tobacco diseases [6]. Sun Jun et al. proposed an optimized recognition model that also uses a convolutional neural network structure but upgrades the model parameters and training process step by step. Multiple convolution and rectified linear unit (ReLU) layers are used to locate features, pooling layers with various filters identify specific parts of the image, and the combined feature map is flattened and fed to the fully connected layer to obtain the final output; the recognition accuracy for many common diseases is close to 100% [7]. Xu Dong et al. used a convolutional neural network to classify images of soybean diseases. Compared with traditional models and other classifiers (such as support vector machines), their strategy for evaluating sparse data requires neither large amounts of data nor a deeper, saturated network to obtain good results; the recognition rate for soybean diseases reached 96.7% [8]. Yang Jindan et al. organized a model framework for identifying rice leaf diseases, covering three kinds of rice leaf diseases, namely bacterial leaf blight, brown spot, and leaf smut; to extract features accurately, the AlexNet deep convolutional neural network model was used, achieving an accuracy of 98.61% [9]. Lin Zhongqi et al. focused on wheat leaf diseases; their method magnifies the differences between leaves through image recognition and integrates the difference features into local support vector machines, combined with a CNN network [10]. Guo Xiaoqing et al. proposed a multi-scale AlexNet structure to optimize the image diagnosis model [11].
To sum up, many technical methods exist for plant disease identification, but some problems remain: (1) complex characteristic parameters and poor model generalization. Plant disease recognition involves many characteristic parameters, such as lesion shape, color, and spot size, so the calculation process is very complex. Each type of feature requires a suitable extraction method, which makes the experimental process more difficult. (2) High requirements on the neural network model and poor experimental results. As neural network models continue to develop, the number of network layers and convolution kernels keeps increasing, so a large number of data samples is needed for training. However, some experiments have few samples owing to environmental constraints, resulting in poor recognition performance with a deep convolutional network structure.
In this study, a convolutional neural network is used to recognize rice disease images, and a database containing 2000 images of each of three common rice diseases (rice blast, rice false smut, and bacterial leaf blight) is established. This addresses the large sample requirement of complex models, and a new deep learning network model is built with a parameter initialization design. Because the accuracy of the initially built rice disease identification model does not meet practical requirements, this experiment upgrades the model in depth by jointly tuning four parameters: the number of iterations, batch size, learning rate, and optimization algorithm. The confusion matrix is selected as the evaluation standard, and experimental results with greater objectivity and reference value are obtained through horizontal comparison with two widely used network models, VGG and ResNet. The results show that the recognition accuracy of the optimized model is 98.64%, achieving the goal of accurately identifying diseases. This study improves the accuracy and efficiency of rice disease identification through improvements to a convolutional neural network and provides assistance for the precise treatment and subsequent prevention and control of rice diseases.
2. Materials and Methods
2.1. Dataset
Given the complexity of current convolutional neural networks, too little sample data causes the model to over-fit and training to stop early, so a large number of training images is required. The experimental data are image samples obtained from the website of the National Plant Pathology Association of America, open-source image samples obtained from Kaggle (https://www.kaggle.com/datasets), and image samples taken in actual rice fields. They mainly cover rice blast, rice false smut, and rice bacterial blight. The tools for collecting data in actual rice fields were HD cameras and smartphones. The original data comprise 555 images of rice blast, 646 of rice false smut, and 345 of bacterial leaf blight, as shown in Figure 1. Through the data enhancement methods of cropping, multi-angle rotation, and vertical mirroring, the sample library of disease images was expanded, finally yielding 2000 images of each of the three diseases.
2.1.1. Image Normalization
In deep learning, image preprocessing is very important. Some of the experimental data were taken from the internet, and many factors led to images in different states. This paper uses bicubic interpolation to resize each image to match the model input size of 224 × 224 × 3. Assume the size of the original image A is m × n and the size of the proportionally scaled image B is M × N. Based on the size ratio of the two images, the coordinates on A corresponding to a point B(x, y) can be obtained, and the 16 pixels closest to that point are used as parameters to calculate the pixel value B(x, y) of the target image. The calculation of the entire process is shown in Formula (1).
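As an illustration of this preprocessing step, the following is a minimal sketch using Pillow's bicubic resampling to bring an image to the 224 × 224 × 3 input size; the file names are placeholders.

```python
from PIL import Image

def resize_bicubic(path, size=(224, 224)):
    """Resize a disease image to the 224 x 224 x 3 model input size using
    bicubic interpolation (each output pixel is computed from the 16
    nearest source pixels)."""
    img = Image.open(path).convert("RGB")              # force 3 channels
    return img.resize(size, resample=Image.BICUBIC)    # bicubic resampling

# Usage with a placeholder file name
resize_bicubic("rice_blast_001.jpg").save("rice_blast_001_224.jpg")
```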
2.1.2. Image Enhancement
The number of samples affects the recognition performance of the trained convolutional model. When there are few samples, the model suffers from over-fitting [12,13]. To improve the generalization ability of the model, the amount of data and its variability should be increased as much as possible. Common data enhancement methods include cropping, rotation, mirroring, adding noise, and color jitter [14]. The data augmentation methods used in this experiment are cropping, multi-angle rotation, and vertical mirroring. The results are shown in Figure 2. By expanding the disease image sample database, 2000 images of each of the three diseases were ultimately obtained. The expanded data are divided into training, validation, and test sets at a ratio of 6:3:1, namely 3600 training images, 1800 validation images, and 600 test images.
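The following sketch shows how such an augmentation pipeline and the 6:3:1 split could be set up with torchvision; the folder name, crop size, and rotation range are illustrative assumptions rather than the exact settings used in the experiment.

```python
import torch
from torchvision import datasets, transforms

# Augmentation: cropping, multi-angle rotation, and vertical mirroring,
# followed by conversion to a tensor. Crop size and rotation range are
# illustrative assumptions.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=180),   # multi-angle rotation
    transforms.RandomVerticalFlip(p=0.5),     # vertical mirroring
    transforms.RandomResizedCrop(224),        # cropping back to 224 x 224
    transforms.ToTensor(),
])

# Assumed layout: one sub-folder per disease class (6000 images in total)
dataset = datasets.ImageFolder("rice_disease_images", transform=augment)

# 6:3:1 split into 3600 training, 1800 validation, and 600 test images
train_set, val_set, test_set = torch.utils.data.random_split(
    dataset, [3600, 1800, 600], generator=torch.Generator().manual_seed(0))
```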
2.2. Models
In recent years, the ability to use computation to process problems has gradually improved, the complexity of the problems that can be solved has increased, and neural network models are used ever more frequently [15,16]. Different network models have been established to solve different problems, such as the VGG model, the RNN model, and the ResNet model. However, owing to the limitations of RNNs, such as the difficulty of training them [17], the RNN is not suitable for the model constructed in this article. Therefore, the model design in this article draws on the VGG model and the ResNet model.
The VGG network architecture is mainly composed of four parts: convolution layers, the ReLU activation function, maximum pooling layers, and fully connected layers. The input images are processed through 13 convolution layers, 5 pooling layers, and the fully connected layers, and finally the images are classified.
In the ResNet model, the output of each convolution layer is not passed directly to the activation function; it first goes through a standardization (batch normalization) layer and only then reaches the activation function, so each block consists of a convolution layer, a standardization layer, and an activation layer.
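The ordering described above can be expressed as a small PyTorch building block; this is only a sketch of the convolution, batch normalization, and activation sequence, not a full residual block, and the channel count is arbitrary.

```python
import torch.nn as nn

# ResNet-style ordering: the convolution output passes through batch
# normalization before the activation function is applied.
conv_bn_relu = nn.Sequential(
    nn.Conv2d(64, 64, kernel_size=3, padding=1),  # convolution layer
    nn.BatchNorm2d(64),                           # standardization (batch norm) layer
    nn.ReLU(inplace=True),                        # activation layer
)
```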
By analyzing the above two models, a neural network model can be constructed to identify rice diseases. However, because of gradient problems, the design of the network structure needs to be optimized. Therefore, we designed our own network model; its structure is shown in Figure 3.
The improved network model in this article is based on VGG and ResNet, consisting of 5 convolutional layers, 5 pooling layers, and an activation layer. The pooling layers and convolutional layers are interleaved.
2.2.1. Convolution Layer Design
The convolutional layer design in this article mainly refers to the VGG network. The convolution kernel defines the size range of the convolution and represents the size of the receptive field in the network. The most common two-dimensional convolution kernel is 3 × 3. In general, the larger the convolution kernel, the larger the receptive field, the more image information is seen, and the better the global features obtained. However, large convolution kernels lead to a significant increase in computational complexity, a decrease in computational performance, and an inability to accurately capture disease information. Therefore, this article chose a relatively suitable convolution kernel size of 3 × 3. In this experiment, five convolutional layers were designed, namely C1, C2, C3, C4, and C5. Convolution kernels were used to extract lesion features, and zero padding was used to keep the feature map size consistent. The details are as follows:
First, 3 × 3 convolution kernels were used in all layers from C1 to C5 to convolve the input image. The sliding stride of all convolution layers was set to 1, and the padding mode was set to "same". After the 5 convolutions, the numbers of feature maps obtained from C1 to C5 are 64, 128, 256, 512, and 512, respectively.
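A minimal sketch of the five convolution layers under these settings is given below; the input-channel values simply follow the output of the preceding layer, and the exact wiring with the pooling layers is described in the next subsection.

```python
import torch.nn as nn

# C1-C5: 3 x 3 kernels, stride 1, "same" (zero) padding so the spatial size
# is preserved; output feature maps are 64, 128, 256, 512, and 512.
conv_layers = nn.ModuleList([
    nn.Conv2d(3,   64,  kernel_size=3, stride=1, padding=1),  # C1
    nn.Conv2d(64,  128, kernel_size=3, stride=1, padding=1),  # C2
    nn.Conv2d(128, 256, kernel_size=3, stride=1, padding=1),  # C3
    nn.Conv2d(256, 512, kernel_size=3, stride=1, padding=1),  # C4
    nn.Conv2d(512, 512, kernel_size=3, stride=1, padding=1),  # C5
])
```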
2.2.2. Pool Layer Design
In model design, a pooling layer is usually set after the convolution operation for dimensionality reduction. The pooling method selected in this article is maximum pooling.
This experiment designs 5 pooling layers, namely, P1, P2, P3, P4, and P5. The filter size was set to 2 × 2, the sliding stride was set to 2, and no padding was used. The results are as follows:
The P1 layer mainly down-samples the features processed by the C2 layer. Because the convolutional layers use zero padding, the input feature map size of the P1 layer is 224 × 224, and the P1 layer output size is (224 − 2)/2 + 1 = 112;
The output sizes from the P2 layer to the P5 layer are (112 − 2)/2 + 1 = 56, (56 − 2)/2 + 1 = 28, (28 − 2)/2 + 1 = 14, and (14 − 2)/2 + 1 = 7, respectively.
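The halving of the feature map size by each pooling layer can be verified with the pooling output formula; this short snippet reproduces the sizes listed above.

```python
# Max pooling with a 2 x 2 window, stride 2, and no padding:
# output size = (input size - 2) // 2 + 1, i.e., the size is halved.
size = 224
for layer in ["P1", "P2", "P3", "P4", "P5"]:
    size = (size - 2) // 2 + 1
    print(layer, size)   # 112, 56, 28, 14, 7
```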
2.2.3. Full Connection Layer Design
The main function of the fully connected layer is to perform classification [18]. The fully connected layers contain relatively many parameters. Therefore, this paper also adopts the idea of three fully connected layers.
The number of nodes in a fully connected layer is usually set to 2^n (n = 1, 2, 3, …, N). Therefore, the first two fully connected layers use 4096 and 1024 nodes, respectively. The third layer differs from the first two: its number of nodes is 3, matching the number of disease classes.
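A sketch of this classifier head is shown below, taking the flattened 512 × 7 × 7 feature map from the last pooling layer; the ReLU activations between the fully connected layers are an assumption, as the paper does not state them explicitly.

```python
import torch.nn as nn

# Three fully connected layers: 4096 -> 1024 -> 3 disease classes,
# fed by the flattened 512 x 7 x 7 map from the last pooling layer.
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(512 * 7 * 7, 4096), nn.ReLU(),   # first fully connected layer
    nn.Linear(4096, 1024),        nn.ReLU(),   # second fully connected layer
    nn.Linear(1024, 3),                        # output layer, one node per class
)
```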
2.3. Model Training
2.3.1. Solution of Over-Fitting Problem
Over-fitting refers to over-modeling of the training data [19]. When a machine learning (ML) model relies too heavily on the training data, over-fitting usually occurs. To effectively avoid over-fitting, this experiment optimizes the fully connected layers by adding regularization.
In mathematics, statistics, finance, and computer science, regularization is used to solve ill-conditioned problems in information processing or to prevent over-fitting. The additional term controls excessive fluctuation of the function so that the coefficients do not take extreme values. L1 regularization (lasso regression) and L2 regularization (ridge regression) are expressed mathematically in Formulas (2) and (3), respectively:

L1 = Σ_i |y_i − f(x_i)|,  (2)

L2 = Σ_i (y_i − f(x_i))²,  (3)

where y_i represents the target value and f(x_i) is the estimated value. The L1 penalty term is the sum of the absolute differences between the target value y_i and the estimated value f(x_i), while the L2 penalty term is the sum of the squares of the differences between the target value and the estimated value, so the two regularizations produce different penalty effects. L2 differs from L1 in that it has a single best prediction line, whereas L1 may have multiple optimal solutions. If the regularization term is 0, the minimum value is obtained; however, if the regularization term is very large, the weights are penalized too strongly, leading to under-fitting. L2 regularization can avoid the over-fitting problem.
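In PyTorch, an L2 penalty on the network weights is commonly applied through the optimizer's weight decay term; the sketch below assumes `model` is the rice disease network assembled from the earlier sketches, and the decay coefficient is an illustrative value rather than the one used in the experiment.

```python
import torch

# `model` is assumed to be the rice disease network assembled from the
# sketches above. weight_decay adds an L2 penalty on the weights; the
# value 1e-4 is an illustrative assumption.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
```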
2.3.2. Cross-Entropy Loss Calculation
After the model is built, it is necessary to check whether its performance can guarantee the accuracy of the calculation results. The performance of a model is generally expressed by the error, that is, the difference between the predicted value and the true value, which is computed with a loss function. The greater the error value, the worse the performance of the model, and the more the model needs to be improved. The loss function used here is the softmax cross-entropy loss function. The softmax function, also known as the exponential normalization function, is a normalized form of the logistic function. In general, when the range of the output values is uncertain, it is difficult to interpret these values intuitively and difficult to measure the error between the discrete true labels and outputs of uncertain range; softmax normalization is used to solve this problem. The softmax function compresses a K-dimensional real vector into a real vector whose components lie in the range [0, 1], as shown in Formula (4):

S_i = e^(z_i) / Σ_{k=1}^{K} e^(z_k), i = 1, 2, …, K,  (4)

where S_i represents the output probability, K represents the total number of categories, and z_i represents the output of the previous output unit.
Formula (4) shows that the outputs of the original neural network are used as confidences to generate new outputs that satisfy all the requirements of a probability distribution, thus transforming the output of the neural network into a probability distribution.
After the probability distribution is obtained from the softmax output, the distance between the predicted probability distribution and the probability distribution of the real answer, i.e., the error, can be calculated through cross-entropy. The calculation of cross-entropy is shown in Formula (5):

E = −Σ_i t_i log(y_i),  (5)

where E represents the cross-entropy between the predicted value and the real value, y represents the predicted output value, and t represents the real output value.
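In PyTorch, `nn.CrossEntropyLoss` combines the softmax of Formula (4) with the cross-entropy of Formula (5); the sketch below applies it to dummy logits for the three disease classes.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()      # softmax + cross-entropy in one step

logits = torch.randn(8, 3)             # raw network outputs for a batch of 8 images
labels = torch.randint(0, 3, (8,))     # true class indices for the three diseases

probs = torch.softmax(logits, dim=1)   # Formula (4): probabilities in [0, 1] summing to 1
loss = criterion(logits, labels)       # Formula (5): error between prediction and truth
print(probs.sum(dim=1), loss.item())
```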
2.4. Development Environment
This experiment uses the GPU version of PyTorch as the platform for building and training the convolutional neural network model. The hardware environment comprises an Intel Xeon E5-2680 v4 CPU, a 512 GB Samsung SSD 860, 64 GB of Kingston DDR4 memory, and an NVIDIA Titan Xp graphics card with 12 GB of video memory. The operating system is Windows 10, and programs are written in Python 3.6 in the PyCharm integrated development environment.
2.5. Evaluation Indices
In this study, accuracy, precision, recall, and F1 score are selected to evaluate the model, as defined in Formulas (6)–(9):

Accuracy = (TP + TN) / (TP + TN + FP + FN),  (6)

Precision = TP / (TP + FP),  (7)

Recall = TP / (TP + FN),  (8)

F1 = 2 × Precision × Recall / (Precision + Recall),  (9)

where true positive (TP) means the prediction matches the actual value: the actual value is positive and the model predicts positive. True negative (TN) means the prediction matches the actual value: the actual value is negative and the model predicts negative. False positive (FP) means the prediction is incorrect: the actual value is negative but the model predicts positive, also known as a type 1 error. False negative (FN) means the prediction is incorrect: the actual value is positive but the model predicts negative, also known as a type 2 error.
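These indices can be computed directly from the confusion matrix counts; the sketch below uses scikit-learn with placeholder labels, and macro averaging over the three classes is an assumption.

```python
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Placeholder labels: 0 = rice blast, 1 = rice false smut, 2 = bacterial blight
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

print(confusion_matrix(y_true, y_pred))
print("accuracy :", accuracy_score(y_true, y_pred))                    # (TP + TN) / all
print("precision:", precision_score(y_true, y_pred, average="macro"))  # TP / (TP + FP)
print("recall   :", recall_score(y_true, y_pred, average="macro"))     # TP / (TP + FN)
print("F1 score :", f1_score(y_true, y_pred, average="macro"))         # 2PR / (P + R)
```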
4. Conclusions
This study designs a rice disease recognition model based on a convolutional neural network. To reduce the risk of over-fitting from the model parameters, L2 regularization is used in this experiment, the loss of the model is calculated with the cross-entropy loss function, and the Adam algorithm is used to reduce the loss. First, the effect of the number of iterations on the results is analyzed; the optimal number of iterations is 800, and the accuracy of the initial model on the training set is 90.32%. Second, further optimization experiments are carried out by comparing batch size, optimization algorithm, and learning rate. The final results show that when the learning rate for identifying rice diseases is set to 0.001 and the batch size is 64, the maximum accuracy of the model is 98.64%.
In this paper, the images of three diseases are analyzed, and good results are achieved by improving the model parameters, raising the recognition accuracy for the three diseases to 98.64%. Although the recognition rate has improved greatly, many aspects still need further research, such as expanding the rice disease data set, improving image preprocessing methods, and further optimizing the neural network model.