1. Introduction
Panax notoginseng (Burk) F. H. Chen usually grows in cool and humid environments, and its roots can grow for more than three years [1]. However, this years-long cultivation period and the humid climatic conditions make P. notoginseng susceptible to various soil-borne diseases [2,3]. Widespread disease outbreaks can cause severe economic losses, and even small-scale outbreaks reduce yield and quality. The frequent occurrence and diversity of P. notoginseng diseases have seriously hindered the sustainable and healthy development of the P. notoginseng industry. Traditional disease identification relies mainly on manual observation and experience, which is slow, laborious, and highly subjective [4,5]. Therefore, an intelligent method for identifying P. notoginseng diseases is needed.
In recent years, advances in deep learning have provided a potential solution for the automated, nondestructive diagnosis of crop diseases. Image classification algorithms have surpassed human-level performance on specific benchmarks and can accurately diagnose and categorize plant diseases at an early stage [6]. Many researchers have applied deep learning to identify plant diseases quickly at the onset of symptoms, with great success [7,8]. For example, Amara et al. used the LeNet architecture to classify three states of banana leaves (healthy, black streak wilt, and black spot) on tens of thousands of banana disease images from PlantVillage, achieving a classification accuracy of 97.57% [9]. Chen et al. proposed a deep learning architecture called INC-VGGN, which achieved 91.83% accuracy on the public PlantVillage dataset [10]. Mohanty et al. trained feature networks on tens of thousands of healthy plant leaf images and publicly available plant disease datasets, reaching a plant disease classification accuracy of 98.21% [11]. These studies provide a wealth of experience for building an intelligent classification algorithm for P. notoginseng diseases.
Table 1 compares recent related work. However, to capture enough features for visual tasks, these algorithms are trained on tens of thousands of images, and the number of samples in the training dataset strongly affects the effectiveness of deep learning training [12].
Although large public plant disease databases such as PlantVillage, Crop/Weed Field Image, MalayaKew, and Leafsnap are available to researchers, none of them includes P. notoginseng images [17]. Moreover, during cultivation, the probability of all P. notoginseng diseases occurring at the same time is very low. It is therefore challenging to collect a large and diverse set of P. notoginseng leaf disease images for disease classification. To achieve high-precision classification of P. notoginseng leaf diseases, this collection challenge needs to be overcome. One effective strategy is data augmentation, which artificially expands a dataset by creating varied and diverse images from existing images of P. notoginseng diseases, thereby enriching the dataset without additional physical data collection [18]. Data augmentation methods such as geometric transformations (e.g., rotation, scaling, translation, flipping) and color transformations [19] can improve model performance, but they tend to produce highly correlated samples, from which a model cannot learn the variable and invariant features that distinguish samples in the training data. A minimal example of such transformations is sketched below.
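The following is a minimal sketch of the geometric and color augmentations discussed above, using torchvision; the specific parameter values are illustrative assumptions, not settings from this study.

```python
from torchvision import transforms

# Geometric and color augmentations of the kind discussed above.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),                       # rotation
    transforms.RandomResizedCrop(256, scale=(0.8, 1.0)),         # scaling
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),    # translation
    transforms.RandomHorizontalFlip(p=0.5),                      # flipping
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color
    transforms.ToTensor(),
])
# Applying `augment` repeatedly to the same leaf image yields many highly
# correlated variants, which is why augmentation alone cannot substitute
# for genuinely diverse samples.
```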
It has been found that image classification models trained with generated data can outperform models trained only on real data [20]. Generative Adversarial Networks (GANs) have gained increasing attention in agriculture due to their ability to generate natural-looking images [21]. For example, Espejo-Garcia et al. [22] utilized deep convolutional GANs [23] to generate new images for weed identification; an Xception network trained on the synthetic weed image dataset achieved an accuracy of 99.07% [24]. Abbas et al. used the conditional GAN proposed by Mirza and Osindero to generate synthetic images of tomato plant leaves [25,26]. These synthetic images were used to classify ten categories of plant diseases, improving classification accuracy by 1–4% over the original data. However, GANs face training instability and mode collapse [27,28] because a balanced interaction between the generator and the discriminator is difficult to achieve [29]. Recently, diffusion models have gained popularity for generating high-quality images [30] and have been applied to a variety of fields, such as video generation [31], medicine [32], and image-to-image translation [33]. Inspired by nonequilibrium thermodynamics [34], a diffusion model progressively adds noise to the data and then generates samples by reversing this Markov chain, starting from white noise. Recent studies have shown that diffusion models outperform GANs in generating high-fidelity images [28]. The forward noising process is sketched below.
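The following is a minimal PyTorch sketch of the DDPM-style forward (noising) process described above. The 1000 steps mirror the setting reported in Section 3.1, while the linear beta schedule and all names here are illustrative assumptions.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Generation reverses this Markov chain: starting from pure white noise x_T,
# a denoising network predicts and removes the noise step by step back to x_0.
```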
To achieve high-precision classification of P. notoginseng leaf diseases, we improve both the dataset generation method and the classification model. First, we propose an attention-based image generation model, ADM-ECA, which uses the lightweight ECA attention mechanism to capture channel dependencies and thereby improve the quality and efficiency of image generation. The model was evaluated on datasets of six major P. notoginseng leaf diseases: gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot. However, experiments showed that mainstream classification models lose detail when extracting high-dimensional features. This leads to inaccurate recognition of anthracnose, whose spots are small, and to difficulty in distinguishing round spot from black spot, two diseases with similar features. We therefore developed Inception-SSNet, a hybrid classification model with skip connections, attentional feature fusion, and self-calibrated convolution, to better extract high-dimensional features. Skip connections and attentional feature fusion let the model effectively combine local and global information to better identify diseases with similar features and small targets, while self-calibrated convolution strengthens feature learning, enabling the model to extract and distinguish the various disease features more accurately. Inception-SSNet provides competitive results, and the main contributions of this study are as follows:
An image generation scheme is proposed that uses the lightweight ECA attention mechanism to capture dependencies between channels, improving both the quantity and quality of the dataset (see the sketch after this list).
To improve the performance of P. notoginseng disease classification, we develop Inception-SSNet, a hybrid model with skip connections, attentional feature fusion, and self-calibrated convolution, to classify six diseases: gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot.
Experiments show that our proposed models for P. notoginseng leaf disease image generation and disease classification achieve competitive performance and consistent improvements over the baselines.
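As referenced in the first contribution, the following is a minimal PyTorch sketch of an ECA (Efficient Channel Attention) block; the adaptive kernel-size rule follows the original ECA-Net paper, and this is an illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: local cross-channel interaction
    via a 1D convolution, without channel dimensionality reduction."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive odd kernel size derived from the channel count.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel descriptor of shape (B, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        # Sigmoid-gated channel weights, broadcast back over H and W.
        w = torch.sigmoid(self.conv(y)).squeeze(1)[..., None, None]
        return x * w
```

Because the 1D convolution has only a handful of parameters, inserting this block into a U-Net attention module adds negligible cost, which is why ECA is described as lightweight.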
3. Results
3.1. Experimental Setup
The experiments were conducted on a 64-bit Ubuntu 22.04.2 LTS system (Canonical Ltd., London, UK) equipped with an Intel Core i9-12900K CPU (Intel Corporation, Santa Clara, CA, USA) and two NVIDIA GeForce RTX 4090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) with the CUDA 11.8.89 parallel computing platform, using Python 3.9.16 and the PyTorch 2.1.2 + cu118 deep learning framework. The image generation model was trained in a distributed manner on the two RTX 4090 graphics cards, with a batch size of 4 per card and 250,000 iterations. The Adam optimizer was used with a learning rate of 2 × 10−4, the global random seed was set to 42, and the number of diffusion steps in the diffusion model was set to 1000. The image classification model was trained for 80 epochs with a batch size of 8, again using the Adam optimizer with a learning-rate decay strategy: the initial learning rate of 1 × 10−3 was multiplied by a decay factor of 0.93 every 2 epochs. The global random seed was set to 42. The classifier's optimizer configuration is sketched below.
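The following is a minimal sketch of the classification training configuration reported above (Adam, initial rate 1 × 10−3, decayed by 0.93 every 2 epochs); the placeholder model is an assumption for illustration.

```python
import torch

torch.manual_seed(42)                     # global random seed, as reported
model = torch.nn.Linear(10, 6)            # placeholder for the classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the learning rate by 0.93 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.93)

for epoch in range(80):
    # ... one training epoch with batch size 8 ...
    scheduler.step()
```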
To address the poor training performance of classification algorithms caused by insufficient training data and the difficulty of collecting large amounts of data manually, we expanded the dataset using the generation method based on the ADM-ECA model. Following the training and testing requirements of the classification task, we built a generated dataset for P. notoginseng leaf disease classification. The procedure is shown in Figure 10. First, the original small-sample dataset was rotated, and the original and rotated images were divided into a training set and a test set at a ratio of 2:1, forming the non-generated dataset shown in Table 2. Second, the original dataset was used to train the ADM-ECA generation model; after training, samples were generated and then rotated using the same rotation augmentation. To keep the test results comparable with those on the non-generated dataset, all generated and rotated images were added only to the training set, and the test set was left unchanged. This yields the generated P. notoginseng leaf disease dataset shown in Table 2. Because the original data are imbalanced, the generative model is affected by the distribution of the training data and produces more images for categories with larger original sample sizes. Moreover, during training of the generative model, small sample sizes, noise, or low-quality inputs can lead to generated images that do not conform to real disease features; we removed such images. As a result, the number of generated images per class varies relative to the original dataset. A sketch of this workflow follows.
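The following is a minimal sketch of the dataset-construction workflow described above; the function and argument names are assumptions for illustration only.

```python
import random

def build_split(original_images, generated_images, rotate):
    # 1) Rotate the original small-sample set, then split originals plus
    #    rotations into training and test sets at a 2:1 ratio.
    pool = original_images + [rotate(img) for img in original_images]
    random.shuffle(pool)
    cut = len(pool) * 2 // 3
    train, test = pool[:cut], pool[cut:]      # the non-generated dataset
    # 2) Add generated samples (and their rotations) to the TRAINING set
    #    only, so the test set stays identical and results stay comparable.
    train += generated_images + [rotate(img) for img in generated_images]
    return train, test
```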
3.2. Comparative Experiments on Image Generation Model
To test whether current mainstream image generation models can handle the task of generating a P. notoginseng leaf disease image dataset, five mainstream models (WGAN, StyleGAN2, IDDPM, ADM, and DiT) were trained on the original image dataset. After training, each of the five models generated 1530 images of diseased P. notoginseng leaves. The fidelity and diversity of the generated samples were quantitatively evaluated by calculating each model's FID, and the generated images were also evaluated subjectively through visualization.
The FID results for each model are shown in Table 3. The ADM model generated the highest-quality image samples, with an FID of 168.94; the FID values of StyleGAN2 (205.83), DiT (266.21), WGAN (293.32), and IDDPM (387.66) were 21.84%, 57.58%, 73.62%, and 129.47% higher, respectively. A sketch of the FID computation follows.
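FID is conventionally computed by summarizing Inception-v3 features of the real and generated sets by their means and covariances and taking the Fréchet distance between the two Gaussians. The following helper is an illustrative sketch, not the evaluation code used in this study.

```python
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet Inception Distance between two (N, D) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):      # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

# Lower is better: the ADM score of 168.94 means its feature statistics are
# closest to those of the real P. notoginseng leaf images.
```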
The visualization results for the images generated by each model are shown in Figure 11. The samples generated by the IDDPM model appeared as uniformly green images with irregular veins and no meaningful features. The images generated by the WGAN model appeared as blurred noise; only the color composition resembled real images, and no meaningful features were formed. The images generated by the DiT model appeared as cluttered, distorted color blocks without meaningful features. The images generated by the StyleGAN2 model showed a distinguishable leaf-like body with vein-like lines against a blurred background, but the leaf contour was seriously distorted and no valid lesions were generated. The images generated by the ADM model were the closest to real images: the P. notoginseng leaf, the lesions, and the background were all distinguishable. However, the details still differed considerably from real images, with slightly distorted leaf contours, incorrect vein textures, and incorrect lesion characteristics that do not match real diseased P. notoginseng leaves. These results show that the fidelity of the images generated by the five mainstream models is insufficient. The literature on these models suggests that a small number of training samples can lead to poor learning of the data feature distribution [28]. To generate a P. notoginseng disease leaf image dataset and improve the model's ability to learn disease features from small samples of P. notoginseng leaves, further improvement is needed to produce high-fidelity images. We therefore propose the ADM-ECA model; the corresponding experimental results are presented in Section 3.5.1.
3.3. Validation of the Effectiveness of Image Generation
To visually demonstrate the effectiveness of the dataset generation method based on the ADM-ECA model, the RegNet model was used to extract semantic features of the six diseases from the non-generated and generated datasets, and the two datasets were then visualized using t-SNE [43]. The results are shown in Figure 12.
Figure 12 shows the latent-space feature distributions of the six diseases in the non-generated and generated datasets. In (a), before dataset generation, the feature distributions of the six disease classes overlap significantly, making them easy for the model to confuse; this also reflects how round spot and black spot are easily mistaken for each other due to their similar features. In (b), after dataset generation, samples of the same class cluster well together, and the boundaries between classes are easier to distinguish than before. Generation effectively reduces the within-class distribution distance and increases the between-class distance, avoiding confusion between classes and thereby improving the accuracy and robustness of the classifier. These results demonstrate the authenticity of the samples generated by the proposed ADM-ECA model and provide preliminary evidence for the effectiveness of the proposed generation method. A sketch of the visualization step follows.
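The following is a minimal sketch of the t-SNE projection behind Figure 12: penultimate-layer features are embedded in 2-D and colored by disease class. The array and label names are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels):
    # features: (N, D) array of penultimate-layer RegNet activations
    # labels:   (N,) integer disease-class labels
    emb = TSNE(n_components=2, random_state=42).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
    plt.title("t-SNE of P. notoginseng disease features")
    plt.show()
```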
3.4. Comparative Experiments on Classification Methods
To test the classification performance of the proposed Inception-SSNet model and its effectiveness on the individual diseases, we compared Inception-SSNet with 11 mainstream classification models on both the generated and non-generated datasets.
The experimental results in Table 4 and Figure 13 demonstrate that the proposed image generation method improves the accuracy of the Inception-SSNet model by 2.40%. This improvement is attributed to the larger training set created by the generated data, which enables the models to learn richer feature representations, generalize better, and mitigate overfitting. EfficientNet-b0, ShuffleNet v2, RegNet, and ConvNeXt show notable gains of over 15%, led by EfficientNet-b0 with improvements in accuracy, precision, recall, and F1 score of 17.84%, 18.5%, 18.62%, and 18.92%, respectively. MobileNet v2 and Inception-v4 also improve by more than 10%, while Inception-v3 shows the smallest improvement at 1.48%. Overall, the results underscore the practical and general effectiveness of the proposed data generation method across the 11 mainstream image classification models.
Compared with the baseline model, our proposed classification model, Inception-SSNet, performs excellently on both the non-generated and generated datasets. On the non-generated dataset, its classification accuracy, precision, recall, and F1 score reach 97.04%, 97.48%, 98.37%, and 97.87%, respectively. On the generated dataset, these four metrics increase by a further 2.4%, 1.82%, 1.25%, and 2.21%, respectively. Our model is also the best on every metric among the 11 mainstream models, proving the effectiveness of the proposed method for classifying P. notoginseng leaf diseases. Although the classification accuracy of our model reaches 99.44% after image generation, the flexibility of data collection and the privacy of the data can still be improved. For example, future work could apply flexible Vis/NIR sensing technology [15] to analyze multi-band data, and data privacy could be protected by combining the method with federated learning to ensure data security while maintaining model efficiency [16].
Table 5 shows the accuracy of the 12 tested models and the corresponding 95% confidence intervals. Our proposed model achieves a very high accuracy of 99.44%, with a confidence interval of (0.9819–0.9931), indicating high credibility of the accuracy estimate. The narrowness of the interval further supports that our improvement is meaningful and reliable. A sketch of one standard interval construction follows.
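The following is a minimal sketch of a 95% confidence interval for classification accuracy, assuming a normal approximation to the binomial proportion; the paper does not state which interval construction was actually used.

```python
import math

def accuracy_ci(correct: int, total: int, z: float = 1.96):
    """95% normal-approximation CI for an accuracy estimate p = correct/total."""
    p = correct / total
    half = z * math.sqrt(p * (1.0 - p) / total)   # half-width shrinks with n
    return max(0.0, p - half), min(1.0, p + half)
```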
To further explore the effectiveness of the classification models for P. notoginseng leaf diseases, we compared the per-disease accuracy of the different models on the non-generated and generated datasets. The results are shown in Table 6. On the non-generated dataset, round spot and black spot are the disease types with the lowest classification accuracy. After applying the data generation method, the accuracy gap between round spot, black spot, and the other four diseases narrows. On both datasets, however, every model classifies round spot and black spot with significantly lower accuracy than the other four diseases. This is because the lesion features of round spot and black spot are very similar, and because the high-dimensional feature maps that the model finally extracts and passes to the classifier are usually small, which causes a loss of detailed information and ultimately makes the two diseases difficult to distinguish. As shown in Figure 14, taking Inception-v3 as an example, the final extracted feature map is only 8 × 8 pixels. The figure shows that the actual lesion morphologies of round spot and black spot, as well as the high-dimensional feature maps Inception-v3 extracts from these two disease images, are highly similar.
To solve this problem, we add skip connections, attentional feature fusion, and self-calibrated convolution to the Inception-v3 model, yielding the Inception-SSNet hybrid classification model. With these modules, Inception-SSNet makes better use of local and global information, improves feature learning, and better distinguishes the features of round spot and black spot. The experimental results show that, on the non-generated dataset, Inception-SSNet classifies black spot and round spot with accuracies of 98.48% and 97.6%, respectively, exceeding the baseline model by 1.51% and 4.81%. On the generated dataset, its accuracies for black spot and round spot reach 99.99% and 99.04%, respectively, exceeding the baseline by 4.54% and 1.44%. These experiments prove the effectiveness of the proposed classification model and again confirm the effectiveness of the proposed data generation method. A sketch of the attentional feature fusion mechanism follows.
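The following is a minimal PyTorch sketch of attentional feature fusion in the spirit of Dai et al.'s AFF/MS-CAM: local and global channel contexts gate a soft, per-location blend of two feature maps. This illustrates the idea, not the authors' exact module.

```python
import torch
import torch.nn as nn

class AFF(nn.Module):
    """Attentional feature fusion of two same-shape feature maps x and y."""
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = channels // r
        def branch(pool: bool) -> nn.Sequential:
            layers = [nn.AdaptiveAvgPool2d(1)] if pool else []
            layers += [nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid),
                       nn.ReLU(inplace=True),
                       nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels)]
            return nn.Sequential(*layers)
        self.local_att = branch(pool=False)    # pointwise convs, per location
        self.global_att = branch(pool=True)    # squeezed to 1x1, per channel

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        s = x + y
        # Fusion weights combine local (spatial) and global (channel) context.
        m = torch.sigmoid(self.local_att(s) + self.global_att(s))
        return m * x + (1.0 - m) * y
```

Fusing local and global context in the gate is what lets the combined model keep small-lesion detail (local) while still using leaf-level cues (global).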
3.5. Ablation Experiment
3.5.1. Different Attention Mechanisms in the Generation Model
To determine which attention mechanism is most suitable for this architecture and for the task of P. notoginseng leaf disease image generation, four attention mechanisms (SE, CBAM, CA, and ECA) were applied in turn to the U-Net attention module of the ADM-AT architecture, yielding the ADM-SE, ADM-CBAM, ADM-CA, and ADM-ECA models. Image generation experiments were conducted with these four models, and the results are compared in Table 7. The ADM-ECA model with the ECA attention mechanism was the most effective, with an FID of 42.73, a performance improvement of 74.71% over the unimproved ADM model. It is followed by the CA (48.61), CBAM (55.09) [44], and SE (60.73) [45] attention mechanisms, with performance improvements of 71.23%, 67.39%, and 64.05%, respectively. The visualization results of the images generated by each model are shown in Figure 15. All four models produce realistic leaf contours, backgrounds, and vein textures with correct lesion characteristics. All six diseases (gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot) are visually readable, achieving realistic image generation. We therefore apply the best-performing ECA attention mechanism to the ADM-AT architecture to construct the ADM-ECA P. notoginseng leaf disease image generation model and use it for the dataset expansion task.
3.5.2. Effectiveness of the Proposed Module on the Inception-SSNet Model
In this section, we validate the effectiveness of the proposed modules for Inception-SSNet in terms of overall classification metrics and per-disease classification performance. The dataset used here is the dataset augmented with generated images. The ablation results for the classification metrics are shown in Table 8. All four metrics improve significantly over the baseline after adding skip connections: accuracy, precision, recall, and F1 score improve by 0.46%, 0.54%, 0.66%, and 0.6%, respectively. Skip connections make better use of local and global information, allowing more disease features to be learned. The classification metrics also improve after adding the other two modules. The experimental results show that adding the skip connection, the AFF module, and the ScFeature module each effectively improves the performance of the baseline model. A sketch of a self-calibrated convolution, the mechanism underlying ScFeature, follows.
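The following is a minimal PyTorch sketch of a self-calibrated convolution in the spirit of SCNet (Liu et al., 2020): half of the channels are calibrated by a gating signal computed in a downsampled space, while the other half pass through a plain convolution. This illustrates the general mechanism under those assumptions, not the authors' ScFeature module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCConv(nn.Module):
    """Self-calibrated convolution over an even number of channels."""
    def __init__(self, channels: int, pooling_r: int = 4):
        super().__init__()
        c = channels // 2
        self.k1 = nn.Conv2d(c, c, 3, padding=1)               # plain branch
        self.k2 = nn.Sequential(nn.AvgPool2d(pooling_r),      # downsampled
                                nn.Conv2d(c, c, 3, padding=1))  # context conv
        self.k3 = nn.Conv2d(c, c, 3, padding=1)
        self.k4 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.chunk(x, 2, dim=1)
        # Calibration: gate k3(x1) with attention from the low-resolution context.
        gate = torch.sigmoid(x1 + F.interpolate(self.k2(x1), x1.shape[2:]))
        y1 = self.k4(self.k3(x1) * gate)
        y2 = self.k1(x2)                                      # uncalibrated half
        return torch.cat([y1, y2], dim=1)
```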
The ablation results for the classification of the different diseases are shown in Table 9. The skip connection reduces the model's classification accuracy for powdery mildew by 1.14%, whereas adding the AFF module improves powdery mildew accuracy by 1.14%. Similarly, adding the AFF module reduces anthracnose accuracy by 3.57% and increases gray mold accuracy by 0.33%, while adding the ScFeature module does the opposite. Because the classification weaknesses of the three modules across the six diseases are complementary, the Inception-SSNet model proposed in this paper (i.e., baseline + skip connection + AFF + ScFeature) shows no reduction in classification accuracy for any disease compared to the baseline, and classifies black spot, round spot, virus disease, and anthracnose with accuracies that are 4.55%, 1.44%, 0.83%, and 0.71% higher, respectively.
To demonstrate more intuitively the influence of the skip connection, AFF, and ScFeature modules on the models, the final feature maps of the intermediate layers of each ablation model were visualized using the Grad-CAM algorithm, as shown in Figure 16. The redder an area, the more critical its features are to the category decision. A minimal Grad-CAM sketch follows.
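The following is a minimal Grad-CAM sketch matching the visualization idea in Figure 16: gradients of the predicted class score weight the last convolutional feature maps, and the weighted sum is rectified into a heatmap. The `layer` argument is an assumption standing in for the final mixed block of Inception-SSNet.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image: torch.Tensor) -> torch.Tensor:
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    score = model(image.unsqueeze(0)).max(dim=1).values   # top-class logit
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # channel importances
    cam = F.relu((weights * feats["a"]).sum(dim=1))       # (1, H, W) heatmap
    return cam / (cam.max() + 1e-8)                       # redder = more critical
```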
The figure shows that, compared with the baseline, the model correctly locates the lesions of round spot after the skip connection is added, and the extracted black spot and anthracnose features align with the contours of the actual lesion areas, indicating that the model learns more detailed information; in contrast, feature extraction for powdery mildew lesions deteriorates. After the AFF module is added, the contours of the powdery mildew, virus disease, and black spot features extracted by the model gain detail and conform to the actual lesion shapes, and the extracted round spot features discard irrelevant noise while retaining the focal information; however, anthracnose feature extraction becomes incomplete, focusing on only part of the lesion. After the ScFeature module is added, the extraction of anthracnose, gray mold, and black spot features along the central stem is significantly enhanced, matching the actual lesion boundaries more closely and adding more detail; notably, the ScFeature successfully fills in the previously omitted central-stem areas for these three diseases. Meanwhile, the lesion features of powdery mildew, virus disease, and round spot are essentially unchanged by the ScFeature, since their extraction was already very good once the AFF module had been added.