1. Introduction
Panax notoginseng (Burk) F. H. Chen usually grows in cool and humid environments, and its roots can grow for more than three years [1]. However, this years-long cultivation period and the humid climatic conditions make P. notoginseng susceptible to various soil-borne diseases [2,3]. Widespread disease outbreaks can cause severe economic losses, and even small-scale outbreaks reduce yield and quality. The frequent occurrence and diversity of P. notoginseng diseases have seriously hindered the sustainable and healthy development of the P. notoginseng industry. Traditional disease identification relies mainly on manual observation and experience, which is slow, laborious, and highly subjective [4,5]. Therefore, an intelligent method for identifying P. notoginseng diseases is needed.
In recent years, advances in deep learning have provided a potential solution for the automated, nondestructive diagnosis of crop diseases. Image classification algorithms have surpassed human-level performance on specific benchmarks and can accurately diagnose and categorize plant diseases at an early stage [6]. Many researchers have applied deep learning to identify plant diseases quickly at the onset of symptoms, with great success [7,8]. For example, Amara et al. used the LeNet architecture to classify three states of banana leaves (healthy, black streak wilt, and black spot) on tens of thousands of banana disease images from PlantVillage, achieving a classification accuracy of 97.57% [9]. Chen et al. proposed a deep learning architecture called INC-VGGN, which achieved 91.83% accuracy on the public PlantVillage dataset [10]. Mohanty et al. trained feature networks on tens of thousands of healthy plant leaf images and publicly available plant disease datasets, reaching a plant disease classification accuracy of 98.21% [11]. These studies provide a wealth of experience for building an intelligent classification algorithm for P. notoginseng diseases.
Table 1 compares recent related work. However, to capture enough features for visual tasks, these algorithms are trained on tens of thousands of images, and the number of samples in the training dataset strongly affects the effectiveness of deep learning training [12].
Although large public plant disease databases such as PlantVillage, Crop/Weed Field Image, MalayaKew, and Leafsnap are available to researchers, none of them includes P. notoginseng images [17]. Moreover, during cultivation, the probability of all P. notoginseng diseases occurring at the same time is very low. It is therefore challenging to collect a large and diverse set of P. notoginseng leaf disease images for disease classification. To achieve high-precision classification of P. notoginseng leaf diseases, this collection challenge needs to be overcome. One effective strategy is data augmentation, which artificially expands a dataset by creating varied and diverse images from existing images of P. notoginseng diseases, thereby enriching the dataset without additional physical data collection [18]. Data augmentation methods such as geometric transformations (e.g., rotation, scaling, translation, flipping) and color transformations [19] can improve model performance, but they tend to produce highly correlated samples, from which a model cannot learn the variable and invariant features that distinguish samples in the training data. A minimal example of such transformations is sketched below.
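The following is a minimal sketch of the geometric and color augmentations discussed above, using torchvision; the specific parameter values are illustrative assumptions, not settings from this study.

```python
from torchvision import transforms

# Geometric and color augmentations of the kind discussed above.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=30),                       # rotation
    transforms.RandomResizedCrop(256, scale=(0.8, 1.0)),         # scaling
    transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),    # translation
    transforms.RandomHorizontalFlip(p=0.5),                      # flipping
    transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),  # color
    transforms.ToTensor(),
])
# Applying `augment` repeatedly to the same leaf image yields many highly
# correlated variants, which is why augmentation alone cannot substitute
# for genuinely diverse samples.
```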
It has been found that image classification models trained with generated data can outperform models trained only on real data [20]. Generative Adversarial Networks (GANs) have gained increasing attention in agriculture due to their ability to generate natural-looking images [21]. For example, Espejo-Garcia et al. [22] utilized deep convolutional GANs [23] to generate new images for weed identification; an Xception network trained on the synthetic weed image dataset achieved an accuracy of 99.07% [24]. Abbas et al. used the conditional GAN proposed by Mirza and Osindero to generate synthetic images of tomato plant leaves [25,26]. These synthetic images were used to classify ten categories of plant diseases, improving classification accuracy by 1–4% over the original data. However, GANs face training instability and mode collapse [27,28] because a balanced interaction between the generator and the discriminator is difficult to achieve [29]. Recently, diffusion models have gained popularity for generating high-quality images [30] and have been applied to a variety of fields, such as video generation [31], medicine [32], and image-to-image translation [33]. Inspired by nonequilibrium thermodynamics [34], a diffusion model progressively adds noise to the data and then generates samples by reversing this Markov chain, starting from white noise. Recent studies have shown that diffusion models outperform GANs in generating high-fidelity images [28]. The forward noising process is sketched below.
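The following is a minimal PyTorch sketch of the DDPM-style forward (noising) process described above. The 1000 steps mirror the setting reported in Section 3.1, while the linear beta schedule and all names here are illustrative assumptions.

```python
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)           # assumed linear noise schedule
alphas_bar = torch.cumprod(1.0 - betas, dim=0)  # cumulative signal retention

def q_sample(x0: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
    """Sample x_t ~ q(x_t | x_0) = N(sqrt(a_bar_t) * x_0, (1 - a_bar_t) * I)."""
    noise = torch.randn_like(x0)
    a_bar = alphas_bar[t].view(-1, 1, 1, 1)
    return a_bar.sqrt() * x0 + (1.0 - a_bar).sqrt() * noise

# Generation reverses this Markov chain: starting from pure white noise x_T,
# a denoising network predicts and removes the noise step by step back to x_0.
```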
To achieve high-precision classification of P. notoginseng leaf diseases, we improve both the dataset generation method and the classification model. First, we propose an attention-based image generation model, ADM-ECA, which uses the lightweight ECA attention mechanism to capture channel dependencies and thereby improve the quality and efficiency of image generation. The model was evaluated on datasets of six major P. notoginseng leaf diseases: gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot. However, experiments showed that mainstream classification models lose detail when extracting high-dimensional features. This leads to inaccurate recognition of anthracnose, whose spots are small, and to difficulty in distinguishing round spot from black spot, two diseases with similar features. We therefore developed Inception-SSNet, a hybrid classification model with skip connections, attentional feature fusion, and self-calibrated convolution, to better extract high-dimensional features. Skip connections and attentional feature fusion let the model effectively combine local and global information to better identify diseases with similar features and small targets, while self-calibrated convolution strengthens feature learning, enabling the model to extract and distinguish the various disease features more accurately. Inception-SSNet provides competitive results, and the main contributions of this study are as follows:
An image generation scheme is proposed that uses the lightweight ECA attention mechanism to capture dependencies between channels, improving both the quantity and quality of the dataset (see the sketch after this list).
To improve the performance of P. notoginseng disease classification, we develop Inception-SSNet, a hybrid model with skip connections, attentional feature fusion, and self-calibrated convolution, to classify six diseases: gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot.
Experiments show that our proposed models for P. notoginseng leaf disease image generation and disease classification achieve competitive performance and consistent improvements over the baselines.
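As referenced in the first contribution, the following is a minimal PyTorch sketch of an ECA (Efficient Channel Attention) block; the adaptive kernel-size rule follows the original ECA-Net paper, and this is an illustration, not the authors' implementation.

```python
import math
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: local cross-channel interaction
    via a 1D convolution, without channel dimensionality reduction."""
    def __init__(self, channels: int, gamma: int = 2, b: int = 1):
        super().__init__()
        # Adaptive odd kernel size derived from the channel count.
        t = int(abs((math.log2(channels) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> per-channel descriptor of shape (B, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        # Sigmoid-gated channel weights, broadcast back over H and W.
        w = torch.sigmoid(self.conv(y)).squeeze(1)[..., None, None]
        return x * w
```

Because the 1D convolution has only a handful of parameters, inserting this block into a U-Net attention module adds negligible cost, which is why ECA is described as lightweight.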
3. Results
3.1. Experimental Setup
The experiments were conducted on a 64-bit Ubuntu 22.04.2 LTS system (Canonical Ltd., London, UK) equipped with an Intel Core i9-12900K CPU (Intel Corporation, Santa Clara, CA, USA) and two NVIDIA GeForce RTX 4090 GPUs (NVIDIA Corporation, Santa Clara, CA, USA) with the CUDA 11.8.89 parallel computing platform, using Python 3.9.16 and the PyTorch 2.1.2 + cu118 deep learning framework. The image generation model was trained in a distributed manner on the two RTX 4090 graphics cards, with a batch size of 4 per card and 250,000 iterations. The Adam optimizer was used with a learning rate of 2 × 10−4, the global random seed was set to 42, and the number of diffusion steps in the diffusion model was set to 1000. The image classification model was trained for 80 epochs with a batch size of 8, again using the Adam optimizer with a learning-rate decay strategy: the initial learning rate of 1 × 10−3 was multiplied by a decay factor of 0.93 every 2 epochs. The global random seed was set to 42. The classifier's optimizer configuration is sketched below.
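The following is a minimal sketch of the classification training configuration reported above (Adam, initial rate 1 × 10−3, decayed by 0.93 every 2 epochs); the placeholder model is an assumption for illustration.

```python
import torch

torch.manual_seed(42)                     # global random seed, as reported
model = torch.nn.Linear(10, 6)            # placeholder for the classifier
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
# Multiply the learning rate by 0.93 every 2 epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.93)

for epoch in range(80):
    # ... one training epoch with batch size 8 ...
    scheduler.step()
```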
To address the poor training performance of classification algorithms caused by insufficient training data and the difficulty of collecting large amounts of data manually, we expanded the dataset using the generation method based on the ADM-ECA model. Following the training and testing requirements of the classification task, we built a generated dataset for P. notoginseng leaf disease classification. The procedure is shown in Figure 10. First, the original small-sample dataset was rotated, and the original and rotated images were divided into a training set and a test set at a ratio of 2:1, forming the non-generated dataset shown in Table 2. Second, the original dataset was used to train the ADM-ECA generation model; after training, samples were generated and then rotated using the same rotation augmentation. To keep the test results comparable with those on the non-generated dataset, all generated and rotated images were added only to the training set, and the test set was left unchanged. This yields the generated P. notoginseng leaf disease dataset shown in Table 2. Because the original data are imbalanced, the generative model is affected by the distribution of the training data and produces more images for categories with larger original sample sizes. Moreover, during training of the generative model, small sample sizes, noise, or low-quality inputs can lead to generated images that do not conform to real disease features; we removed such images. As a result, the number of generated images per class varies relative to the original dataset. A sketch of this workflow follows.
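The following is a minimal sketch of the dataset-construction workflow described above; the function and argument names are assumptions for illustration only.

```python
import random

def build_split(original_images, generated_images, rotate):
    # 1) Rotate the original small-sample set, then split originals plus
    #    rotations into training and test sets at a 2:1 ratio.
    pool = original_images + [rotate(img) for img in original_images]
    random.shuffle(pool)
    cut = len(pool) * 2 // 3
    train, test = pool[:cut], pool[cut:]      # the non-generated dataset
    # 2) Add generated samples (and their rotations) to the TRAINING set
    #    only, so the test set stays identical and results stay comparable.
    train += generated_images + [rotate(img) for img in generated_images]
    return train, test
```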
3.2. Comparative Experiments on Image Generation Model
To test whether current mainstream image generation models can handle the task of generating a P. notoginseng leaf disease image dataset, five mainstream models (WGAN, StyleGAN2, IDDPM, ADM, and DiT) were trained on the original image dataset. After training, each of the five models generated 1530 images of diseased P. notoginseng leaves. The fidelity and diversity of the generated samples were quantitatively evaluated by calculating each model's FID, and the generated images were also evaluated subjectively through visualization.
The FID results for each model are shown in Table 3. The ADM model generated the highest-quality image samples, with an FID of 168.94; the FID values of StyleGAN2 (205.83), DiT (266.21), WGAN (293.32), and IDDPM (387.66) were 21.84%, 57.58%, 73.62%, and 129.47% higher, respectively. A sketch of the FID computation follows.
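FID is conventionally computed by summarizing Inception-v3 features of the real and generated sets by their means and covariances and taking the Fréchet distance between the two Gaussians. The following helper is an illustrative sketch, not the evaluation code used in this study.

```python
import numpy as np
from scipy import linalg

def fid(feats_real: np.ndarray, feats_fake: np.ndarray) -> float:
    """Frechet Inception Distance between two (N, D) feature arrays."""
    mu1, mu2 = feats_real.mean(axis=0), feats_fake.mean(axis=0)
    s1 = np.cov(feats_real, rowvar=False)
    s2 = np.cov(feats_fake, rowvar=False)
    covmean = linalg.sqrtm(s1 @ s2)
    if np.iscomplexobj(covmean):      # discard tiny numerical imaginary parts
        covmean = covmean.real
    diff = mu1 - mu2
    return float(diff @ diff + np.trace(s1 + s2 - 2.0 * covmean))

# Lower is better: the ADM score of 168.94 means its feature statistics are
# closest to those of the real P. notoginseng leaf images.
```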
The visualization results for the images generated by each model are shown in Figure 11. The samples generated by the IDDPM model appeared as uniformly green images with irregular veins and no meaningful features. The images generated by the WGAN model appeared as blurred noise; only the color composition resembled real images, and no meaningful features were formed. The images generated by the DiT model appeared as cluttered, distorted color blocks without meaningful features. The images generated by the StyleGAN2 model showed a distinguishable leaf-like body with vein-like lines against a blurred background, but the leaf contour was seriously distorted and no valid lesions were generated. The images generated by the ADM model were the closest to real images: the P. notoginseng leaf, the lesions, and the background were all distinguishable. However, the details still differed considerably from real images, with slightly distorted leaf contours, incorrect vein textures, and incorrect lesion characteristics that do not match real diseased P. notoginseng leaves. These results show that the fidelity of the images generated by the five mainstream models is insufficient. The literature on these models suggests that a small number of training samples can lead to poor learning of the data feature distribution [28]. To generate a P. notoginseng disease leaf image dataset and improve the model's ability to learn disease features from small samples of P. notoginseng leaves, further improvement is needed to produce high-fidelity images. We therefore propose the ADM-ECA model; the corresponding experimental results are presented in Section 3.5.1.
3.3. Validation of the Effectiveness of Image Generation
To visually demonstrate the effectiveness of the dataset generation method based on the ADM-ECA model, the RegNet model was used to extract semantic features of the six diseases from the non-generated and generated datasets, and the two datasets were then visualized using t-SNE [43]. The results are shown in Figure 12.
Figure 12 shows the latent-space feature distributions of the six diseases in the non-generated and generated datasets. In (a), before dataset generation, the feature distributions of the six disease classes overlap significantly, making them easy for the model to confuse; this also reflects how round spot and black spot are easily mistaken for each other due to their similar features. In (b), after dataset generation, samples of the same class cluster well together, and the boundaries between classes are easier to distinguish than before. Generation effectively reduces the within-class distribution distance and increases the between-class distance, avoiding confusion between classes and thereby improving the accuracy and robustness of the classifier. These results demonstrate the authenticity of the samples generated by the proposed ADM-ECA model and provide preliminary evidence for the effectiveness of the proposed generation method. A sketch of the visualization step follows.
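The following is a minimal sketch of the t-SNE projection behind Figure 12: penultimate-layer features are embedded in 2-D and colored by disease class. The array and label names are illustrative assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(features, labels):
    # features: (N, D) array of penultimate-layer RegNet activations
    # labels:   (N,) integer disease-class labels
    emb = TSNE(n_components=2, random_state=42).fit_transform(features)
    plt.scatter(emb[:, 0], emb[:, 1], c=labels, cmap="tab10", s=8)
    plt.title("t-SNE of P. notoginseng disease features")
    plt.show()
```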
3.4. Comparative Experiments on Classification Methods
To test the classification performance of the proposed Inception-SSNet model and its effectiveness on the individual diseases, we compared Inception-SSNet with 11 mainstream classification models on both the generated and non-generated datasets.
The experimental results in Table 4 and Figure 13 demonstrate that the proposed image generation method improves the accuracy of the Inception-SSNet model by 2.40%. This improvement is attributed to the larger training set created by the generated data, which enables the models to learn richer feature representations, generalize better, and mitigate overfitting. EfficientNet-b0, ShuffleNet v2, RegNet, and ConvNeXt show notable gains of over 15%, led by EfficientNet-b0 with improvements in accuracy, precision, recall, and F1 score of 17.84%, 18.5%, 18.62%, and 18.92%, respectively. MobileNet v2 and Inception-v4 also improve by more than 10%, while Inception-v3 shows the smallest improvement at 1.48%. Overall, the results underscore the practical and general effectiveness of the proposed data generation method across the 11 mainstream image classification models.
Compared with the baseline model, our proposed classification model, Inception-SSNet, performs excellently on both the non-generated and generated datasets. On the non-generated dataset, its classification accuracy, precision, recall, and F1 score reach 97.04%, 97.48%, 98.37%, and 97.87%, respectively. On the generated dataset, these four metrics increase by a further 2.4%, 1.82%, 1.25%, and 2.21%, respectively. Our model is also the best on every metric among the 11 mainstream models, proving the effectiveness of the proposed method for classifying P. notoginseng leaf diseases. Although the classification accuracy of our model reaches 99.44% after image generation, the flexibility of data collection and the privacy of the data can still be improved. For example, future work could apply flexible Vis/NIR sensing technology [15] to analyze multi-band data, and data privacy could be protected by combining the method with federated learning to ensure data security while maintaining model efficiency [16].
Table 5 shows the accuracy of the 12 tested models and the corresponding 95% confidence intervals. Our proposed model achieves a very high accuracy of 99.44%, with a confidence interval of (0.9819–0.9931), indicating high credibility of the accuracy estimate. The narrowness of the interval further supports that our improvement is meaningful and reliable. A sketch of one standard interval construction follows.
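The following is a minimal sketch of a 95% confidence interval for classification accuracy, assuming a normal approximation to the binomial proportion; the paper does not state which interval construction was actually used.

```python
import math

def accuracy_ci(correct: int, total: int, z: float = 1.96):
    """95% normal-approximation CI for an accuracy estimate p = correct/total."""
    p = correct / total
    half = z * math.sqrt(p * (1.0 - p) / total)   # half-width shrinks with n
    return max(0.0, p - half), min(1.0, p + half)
```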
To further explore the effectiveness of the classification models for P. notoginseng leaf diseases, we compared the per-disease accuracy of the different models on the non-generated and generated datasets. The results are shown in Table 6. On the non-generated dataset, round spot and black spot are the disease types with the lowest classification accuracy. After applying the data generation method, the accuracy gap between round spot, black spot, and the other four diseases narrows. On both datasets, however, every model classifies round spot and black spot with significantly lower accuracy than the other four diseases. This is because the lesion features of round spot and black spot are very similar, and because the high-dimensional feature maps that the model finally extracts and passes to the classifier are usually small, which causes a loss of detailed information and ultimately makes the two diseases difficult to distinguish. As shown in Figure 14, taking Inception-v3 as an example, the final extracted feature map is only 8 × 8 pixels. The figure shows that the actual lesion morphologies of round spot and black spot, as well as the high-dimensional feature maps Inception-v3 extracts from these two disease images, are highly similar.
To solve this problem, we add skip connections, attentional feature fusion, and self-calibrated convolution to the Inception-v3 model, yielding the Inception-SSNet hybrid classification model. With these modules, Inception-SSNet makes better use of local and global information, improves feature learning, and better distinguishes the features of round spot and black spot. The experimental results show that, on the non-generated dataset, Inception-SSNet classifies black spot and round spot with accuracies of 98.48% and 97.6%, respectively, exceeding the baseline model by 1.51% and 4.81%. On the generated dataset, its accuracies for black spot and round spot reach 99.99% and 99.04%, respectively, exceeding the baseline by 4.54% and 1.44%. These experiments prove the effectiveness of the proposed classification model and again confirm the effectiveness of the proposed data generation method. A sketch of the attentional feature fusion mechanism follows.
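The following is a minimal PyTorch sketch of attentional feature fusion in the spirit of Dai et al.'s AFF/MS-CAM: local and global channel contexts gate a soft, per-location blend of two feature maps. This illustrates the idea, not the authors' exact module.

```python
import torch
import torch.nn as nn

class AFF(nn.Module):
    """Attentional feature fusion of two same-shape feature maps x and y."""
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        mid = channels // r
        def branch(pool: bool) -> nn.Sequential:
            layers = [nn.AdaptiveAvgPool2d(1)] if pool else []
            layers += [nn.Conv2d(channels, mid, 1), nn.BatchNorm2d(mid),
                       nn.ReLU(inplace=True),
                       nn.Conv2d(mid, channels, 1), nn.BatchNorm2d(channels)]
            return nn.Sequential(*layers)
        self.local_att = branch(pool=False)    # pointwise convs, per location
        self.global_att = branch(pool=True)    # squeezed to 1x1, per channel

    def forward(self, x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        s = x + y
        # Fusion weights combine local (spatial) and global (channel) context.
        m = torch.sigmoid(self.local_att(s) + self.global_att(s))
        return m * x + (1.0 - m) * y
```

Fusing local and global context in the gate is what lets the combined model keep small-lesion detail (local) while still using leaf-level cues (global).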
3.5. Ablation Experiment
3.5.1. Different Attention Mechanisms in the Generation Model
To determine which attention mechanism is most suitable for this architecture and for the task of P. notoginseng leaf disease image generation, four attention mechanisms (SE, CBAM, CA, and ECA) were applied in turn to the U-Net attention module of the ADM-AT architecture, yielding the ADM-SE, ADM-CBAM, ADM-CA, and ADM-ECA models. Image generation experiments were conducted with these four models, and the results are compared in Table 7. The ADM-ECA model with the ECA attention mechanism was the most effective, with an FID of 42.73, a performance improvement of 74.71% over the unimproved ADM model. It is followed by the CA (48.61), CBAM (55.09) [44], and SE (60.73) [45] attention mechanisms, with performance improvements of 71.23%, 67.39%, and 64.05%, respectively. The visualization results of the images generated by each model are shown in Figure 15. All four models produce realistic leaf contours, backgrounds, and vein textures with correct lesion characteristics. All six diseases (gray mold, powdery mildew, virus disease, anthracnose, black spot, and round spot) are visually readable, achieving realistic image generation. We therefore apply the best-performing ECA attention mechanism to the ADM-AT architecture to construct the ADM-ECA P. notoginseng leaf disease image generation model and use it for the dataset expansion task.
3.5.2. Effectiveness of the Proposed Module on the Inception-SSNet Model
In this section, we validate the effectiveness of the proposed modules for Inception-SSNet in terms of overall classification metrics and per-disease classification performance. The dataset used here is the dataset augmented with generated images. The ablation results for the classification metrics are shown in Table 8. All four metrics improve significantly over the baseline after adding skip connections: accuracy, precision, recall, and F1 score improve by 0.46%, 0.54%, 0.66%, and 0.6%, respectively. Skip connections make better use of local and global information, allowing more disease features to be learned. The classification metrics also improve after adding the other two modules. The experimental results show that adding the skip connection, the AFF module, and the ScFeature module each effectively improves the performance of the baseline model. A sketch of a self-calibrated convolution, the mechanism underlying ScFeature, follows.
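The following is a minimal PyTorch sketch of a self-calibrated convolution in the spirit of SCNet (Liu et al., 2020): half of the channels are calibrated by a gating signal computed in a downsampled space, while the other half pass through a plain convolution. This illustrates the general mechanism under those assumptions, not the authors' ScFeature module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SCConv(nn.Module):
    """Self-calibrated convolution over an even number of channels."""
    def __init__(self, channels: int, pooling_r: int = 4):
        super().__init__()
        c = channels // 2
        self.k1 = nn.Conv2d(c, c, 3, padding=1)               # plain branch
        self.k2 = nn.Sequential(nn.AvgPool2d(pooling_r),      # downsampled
                                nn.Conv2d(c, c, 3, padding=1))  # context conv
        self.k3 = nn.Conv2d(c, c, 3, padding=1)
        self.k4 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x1, x2 = torch.chunk(x, 2, dim=1)
        # Calibration: gate k3(x1) with attention from the low-resolution context.
        gate = torch.sigmoid(x1 + F.interpolate(self.k2(x1), x1.shape[2:]))
        y1 = self.k4(self.k3(x1) * gate)
        y2 = self.k1(x2)                                      # uncalibrated half
        return torch.cat([y1, y2], dim=1)
```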
The ablation results for the classification of the different diseases are shown in Table 9. The skip connection reduces the model's classification accuracy for powdery mildew by 1.14%, whereas adding the AFF module improves powdery mildew accuracy by 1.14%. Similarly, adding the AFF module reduces anthracnose accuracy by 3.57% and increases gray mold accuracy by 0.33%, while adding the ScFeature module does the opposite. Because the classification weaknesses of the three modules across the six diseases are complementary, the Inception-SSNet model proposed in this paper (i.e., baseline + skip connection + AFF + ScFeature) shows no reduction in classification accuracy for any disease compared to the baseline, and classifies black spot, round spot, virus disease, and anthracnose with accuracies that are 4.55%, 1.44%, 0.83%, and 0.71% higher, respectively.
To demonstrate more intuitively the influence of the skip connection, AFF, and ScFeature modules on the models, the final feature maps of the intermediate layers of each ablation model were visualized using the Grad-CAM algorithm, as shown in Figure 16. The redder an area, the more critical its features are to the category decision. A minimal Grad-CAM sketch follows.
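The following is a minimal Grad-CAM sketch matching the visualization idea in Figure 16: gradients of the predicted class score weight the last convolutional feature maps, and the weighted sum is rectified into a heatmap. The `layer` argument is an assumption standing in for the final mixed block of Inception-SSNet.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, layer, image: torch.Tensor) -> torch.Tensor:
    feats, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: feats.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0]))
    score = model(image.unsqueeze(0)).max(dim=1).values   # top-class logit
    score.backward()
    h1.remove(); h2.remove()
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)   # channel importances
    cam = F.relu((weights * feats["a"]).sum(dim=1))       # (1, H, W) heatmap
    return cam / (cam.max() + 1e-8)                       # redder = more critical
```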
The figure shows that, compared with the baseline, the model correctly locates the lesions of round spot after the skip connection is added, and the extracted black spot and anthracnose features align with the contours of the actual lesion areas, indicating that the model learns more detailed information; in contrast, feature extraction for powdery mildew lesions deteriorates. After the AFF module is added, the contours of the powdery mildew, virus disease, and black spot features extracted by the model gain detail and conform to the actual lesion shapes, and the extracted round spot features discard irrelevant noise while retaining the focal information; however, anthracnose feature extraction becomes incomplete, focusing on only part of the lesion. After the ScFeature module is added, the extraction of anthracnose, gray mold, and black spot features along the central stem is significantly enhanced, matching the actual lesion boundaries more closely and adding more detail; notably, the ScFeature successfully fills in the previously omitted central-stem areas for these three diseases. Meanwhile, the lesion features of powdery mildew, virus disease, and round spot are essentially unchanged by the ScFeature, since their extraction was already very good once the AFF module had been added.