Article

Research on the Quality Grading Method of Ginseng with Improved DenseNet121 Model

1 College of Internet of Things Engineering, Wuxi University, Wuxi 214105, China
2 Institute of Information Technology, Jilin Agricultural University, Changchun 130118, China
3 School of Instrument Science and Electrical Engineering, Jilin University, Changchun 130015, China
* Authors to whom correspondence should be addressed.
Electronics 2024, 13(22), 4504; https://doi.org/10.3390/electronics13224504
Submission received: 21 October 2024 / Revised: 12 November 2024 / Accepted: 14 November 2024 / Published: 16 November 2024
(This article belongs to the Special Issue Applications of Artificial Intelligence in Image and Video Processing)

Abstract

Ginseng is an important medicinal plant widely used in traditional Chinese medicine. Traditional methods for evaluating the appearance quality of ginseng are labor-intensive, slow, and dependent on specialized expertise. This study presents a new method for grading ginseng's appearance quality using an improved DenseNet121 model. We enhance the network's capability to recognize various channel features by integrating a CA (Coordinate Attention) mechanism. We also use grouped convolution instead of standard convolution in dense layers to lower the number of model parameters and improve efficiency. Additionally, we substitute the ReLU (Rectified Linear Unit) activation function with the ELU (Exponential Linear Unit) activation function, which mitigates the neuron-death problem associated with ReLU and increases the number of active neurons. We compared several network models, including DenseNet121, ResNet50, ResNet101, GoogleNet, and InceptionV3, to evaluate their performance against our method. The improved DenseNet121 model reached an accuracy of 95.5% on the test set, demonstrating high reliability. This finding provides valuable support for the field of ginseng grading.

1. Introduction

Ginseng (Panax ginseng C. A. Meyer) is a significant member of the Araliaceae family. The medicinal use of ginseng primarily involves its dried roots and rhizomes, which are renowned for their diverse pharmacological effects. In traditional Chinese medicine, ginseng holds a revered status, being utilized for a variety of health benefits, including tonifying the kidneys, calming the mind, enhancing cognitive function, improving eyesight, and promoting overall intellectual development [1]. This esteemed plant is often dubbed the "King of Herbs", reflecting its unique medicinal properties and nourishing functions [2]. Ginseng's classification is largely based on characteristics such as color, texture, and age, which can be quite complex. Accurate differentiation among grades is crucial, as misidentification can lead to high-quality ginseng being inadvertently mixed with inferior products [3]. This not only undermines market fairness and transparency but also poses significant risks to consumers regarding drug efficacy and safety. As a result, accurate identification of ginseng grades is essential to ensure that consumers receive authentic, high-quality products. Ongoing research into ginseng further highlights its importance in both traditional and modern medicine, making it a valuable subject of study in academia and beyond.
Traditionally, ginseng classification has utilized methods such as trait identification [4], chromatographic identification [5], and molecular identification [6]. However, these methods demand substantial human resources and rely on highly specialized personnel for implementation. Additionally, the procedures are cumbersome, leading to slow identification speeds and reduced accuracy. Therefore, developing an accurate, efficient, and reliable detection method is imperative.
Deep learning is a machine learning technique that uses multi-layer neural networks to mimic human brain functions. Its core principle involves automatic learning and analysis of data through hierarchical abstraction and representation [7]. Deep learning processes high-dimensional, nonlinear, and large-scale data, demonstrating adaptability and generalization, making it highly promising for plant classification and recognition. Huang et al. [8] utilized the AlexNet [9] model to classify images of five traditional Chinese herbs, achieving an average accuracy of 87.5%. Li et al. [10] integrated the ResNet101 [11] architecture with transfer learning to conduct recognition tasks on a large wild plant dataset, implementing dropout regularization and batch normalization and successfully attaining an accuracy of 85.6%. Pereira et al. [12] enhanced the AlexNet network by eliminating its last three classification layers and adding a new feature classification layer, achieving 89.75% accuracy in grape leaf recognition. Gui [13] combined the ResNet50 [14] model with attention mechanisms and cross-layer bilinear pooling for plant image classification, achieving an impressive 95.62% accuracy rate across 12 plant image datasets. Chen et al. [15] enhanced the traditional VGG [16] architecture by extending the final convolutional layer and incorporating batch normalization, achieving over 90% accuracy on the commonly used rice leaf dataset. Kadir et al. [17] proposed a plant recognition method utilizing morphological open operations, combining features such as leaf texture, shape, and color, and employed probabilistic neural networks for classification, achieving an average accuracy of 93.75%. Li et al. [18] introduced an enhanced ConvNeXt method for ginseng classification, embedding a channel shuffle [19] module in the backbone network after downsampling to integrate channel features fully; the original GELU activation function was replaced with PReLU [20], significantly enhancing model accuracy and efficiency, with experimental results demonstrating up to 94.44% accuracy on a white ginseng dataset. Li et al. [21] developed a ginseng appearance grading method based on an improved ResNet-50, which showed excellent performance on a raw sun-dried ginseng dataset with an accuracy of up to 97%. Kim et al. [22] showed that a deep learning method using CLAHE preprocessing and the DenseNet121 model achieved a high accuracy of 95.11% in red ginseng grading, verifying its effectiveness for red ginseng grading without internal quality inspection. Chen et al. [23] introduced a reference-free image quality assessment method based on feature-level pseudo-reference hallucination, which provides accurate visual quality prediction by learning perceptually meaningful features from distorted images and exploiting natural image statistical behaviors. Wu et al. [24] built Co-Instruct, an open-source, open-ended visual quality comparator, which achieved higher accuracy and more detailed quality comparison reasoning than existing models by collecting large-scale datasets and proposing new evaluation benchmarks. Kong et al. [25] designed a general and robust digital image forensics model that locates image manipulation by analyzing pixel inconsistency artifacts, optimizing local and global pixel dependencies, and using a learned weighting module to improve forgery localization performance. Zhu et al. [26] developed a spatiotemporal interactive video quality assessment model that infers video distortion by integrating spatial features, temporal motion, and temporal flow, uses transformer networks to achieve motion-aware interactive learning, and demonstrates excellent performance on UGC videos.
This study presents an enhanced DenseNet121 framework designed specifically for the assessment of the appearance quality of ginseng, a crucial element in both its market value and medicinal efficacy. Section 2 provides a comprehensive overview of the dataset utilized in this research, detailing the sources, characteristics, and variety of ginseng samples included. It also elaborates on the data preprocessing techniques employed to ensure high-quality input for the model, including normalization, augmentation, and any relevant transformations necessary to enhance the robustness of the analysis.
In Section 3, a thorough examination of the enhanced DenseNet121 classification model is provided, highlighting the modifications made to the original architecture to optimize its performance for ginseng classification tasks. This includes discussions on feature extraction, layer adjustments, and the incorporation of additional training strategies aimed at improving the model’s accuracy and reliability.
Section 4 focuses on experimental validation, where various metrics are employed to evaluate the model’s performance. This section includes a detailed analysis of the results obtained, comparing them against baseline models to illustrate the effectiveness of the proposed framework.
Finally, Section 5 summarizes the key findings of the study, reflecting on the implications of the results for both academic research and practical applications in the ginseng industry. It also offers prospects for future research directions, emphasizing the potential for further enhancements in classification techniques and the exploration of additional quality attributes.

2. Dataset

2.1. Dataset Composition

In this study, we focused on building a high-quality ginseng image dataset to accurately evaluate ginseng quality. We selected forest ginseng as the research object, and its original data was collected in June 2023 from the Changbai Mountain ginseng planting base in Tonghua City, Jilin Province. To ensure the professionalism and accuracy of the evaluation, we hired senior experts to conduct a detailed quality evaluation and annotation of the collected forest ginseng samples according to the official ginseng grading criteria of Jilin Province (see Table 1), and divided them into three grades: special grade, first grade, and second grade. In the image acquisition process, we used a standardized small high-definition camera box and a Canon R10 digital camera to vertically shoot ginseng images against three different backgrounds of white, blue, and red to ensure the clarity and resolution of the images. In addition, we implemented a strict image quality control process to exclude low-quality images caused by poor shooting conditions. Based on the expert evaluation results, we accurately annotated and classified the ginseng images to ensure the accuracy and consistency of the annotations. Finally, we constructed a dataset containing 1123 photos of forest ginseng taken from multiple angles, including 357 special-grade samples, 391 first-grade samples, and 375 second-grade samples, providing detailed visual data for ginseng quality assessment.

2.2. Dataset Preprocessing

Addressing the issue of imbalanced data categories is essential for effective model training. This article presents two preprocessing methods, offline enhancement and online enhancement, designed to mitigate dataset imbalance. As shown in Figure 1, offline enhancement techniques are first applied to the ginseng images. These techniques involve operations such as random rotation, vertical flipping, contrast adjustment, and the addition of noise, effectively expanding the dataset. As a result of offline enhancement, the number of images quadrupled, improving category representation and dataset diversity. Second, the experiment employs an online enhancement method using the Python Imaging Library (PIL) for image processing. Before each training iteration, images are resized to a uniform 256 × 256 pixels, followed by center cropping and normalization. These operations further enhance the diversity and balance of the dataset. The integration of offline and online enhancement addresses dataset imbalance and improves the model's robustness and generalization. With more samples, the model can more effectively learn the distinguishing features among categories, and the added diversity helps reduce overfitting during training.
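To make this pipeline concrete, a minimal sketch is given below using PIL and torchvision. The rotation range, contrast factor, noise level, and normalization statistics are our assumptions for illustration; the paper does not specify them.

```python
import random

import numpy as np
from PIL import Image, ImageEnhance
from torchvision import transforms

def offline_augment(img: Image.Image) -> list:
    """Offline enhancement: one source image yields four extra variants
    (rotation, vertical flip, contrast change, Gaussian noise)."""
    rotated = img.rotate(random.uniform(-30, 30))       # assumed rotation range
    flipped = img.transpose(Image.FLIP_TOP_BOTTOM)      # vertical flip
    contrast = ImageEnhance.Contrast(img).enhance(1.5)  # assumed contrast factor
    arr = np.asarray(img).astype(np.float32)
    arr += np.random.normal(0.0, 10.0, arr.shape)       # assumed noise level
    noisy = Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))
    return [rotated, flipped, contrast, noisy]

# Online enhancement applied before every training iteration (Section 2.2).
online_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(256),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],    # ImageNet statistics,
                         std=[0.229, 0.224, 0.225]),    # assumed here
])
```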

2.3. Dataset Partitioning

To minimize variability in model evaluation, this experiment employs five-fold cross-validation for all model training. The dataset is divided into five equal parts; in each fold, four parts (80%) are used for training and the remaining part (20%) for validation, giving an 8:2 split. Five experiments are performed in sequence, and the final result reported in this article is the average over the five folds. Given the limited number of original samples, the validation set serves as a substitute for the test set in this experiment. Details can be found in Table 2.
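A minimal sketch of this protocol, assuming scikit-learn's StratifiedKFold and random placeholder labels standing in for the 1123 expert annotations (the paper does not publish its splitting code), might look as follows:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder labels standing in for the 1123 grade annotations (0/1/2);
# the real labels come from the expert-annotated dataset in Section 2.1.
rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=1123)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for fold, (train_idx, val_idx) in enumerate(
        skf.split(np.zeros((len(labels), 1)), labels)):
    # Four parts (~80%) train the model, one part (~20%) validates it;
    # the reported metric is the average over the five folds.
    print(f"fold {fold}: train={len(train_idx)}, val={len(val_idx)}")
```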

3. Building the Network Model

3.1. Original Network Model

DenseNet121, proposed by Gao Huang et al. [27] in 2017, enhances traditional network architectures. Unlike traditional architectures, DenseNet121 employs dense connections, allowing for better extraction of rich, high-level features and significantly improving network performance. The dense connection design ensures that each layer receives the feature maps of all preceding layers as input, enabling direct access to earlier features. DenseNet121 fully utilizes information from previous layers, minimizing feature loss and enhancing feature reuse compared to traditional structures. This connection method effectively reduces computational redundancy and enhances the network's efficiency. Consequently, DenseNet121 is selected as the backbone network for the task of understory ginseng grade classification. Its ability to extract rich features while reducing computational burden significantly enhances model performance. Given the structural similarities of understory ginseng across different grades, DenseNet121's robust feature extraction capabilities effectively minimize information loss, enhancing classification and recognition accuracy.
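To make the dense-connection pattern concrete, the following minimal PyTorch sketch illustrates DenseNet-style connectivity, with each layer consuming the concatenation of all earlier feature maps. Channel counts are arbitrary; this is an illustration of the idea, not the authors' implementation.

```python
import torch
import torch.nn as nn

class TinyDenseBlock(nn.Module):
    """Illustrative dense block: each layer receives the concatenation
    of all previous feature maps (DenseNet connectivity)."""
    def __init__(self, in_channels: int, growth_rate: int, num_layers: int):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(in_channels + i * growth_rate),
                nn.ReLU(inplace=True),
                nn.Conv2d(in_channels + i * growth_rate, growth_rate,
                          kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = [x]
        for layer in self.layers:
            out = layer(torch.cat(features, dim=1))  # reuse all earlier features
            features.append(out)
        return torch.cat(features, dim=1)

block = TinyDenseBlock(in_channels=64, growth_rate=32, num_layers=4)
print(block(torch.randn(1, 64, 56, 56)).shape)  # -> (1, 64 + 4*32, 56, 56)
```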
When establishing the benchmark, we compared a variety of backbone networks including ResNet, VGG, and Inception. Experimental results show that DenseNet121 outperforms other networks in key performance indicators such as accuracy, recall, and F1 score. We believe that the dense connection mode and feature reuse mechanism of DenseNet121 enable it to capture the characteristics of ginseng images more effectively, especially when dealing with small sample datasets.
The images of the ginseng dataset are of high resolution and have complex backgrounds, requiring the model to have strong feature extraction capabilities. The deep structure and feature reuse mechanism of DenseNet121 enable it to perform well in dealing with these challenges. We also performed meticulous hyperparameter tuning and model training on DenseNet121 to ensure its optimal performance on the ginseng dataset.

3.2. Ginseng Network Classification Model

Our improvements primarily target three aspects of the original network. First, we integrated the CA [28] mechanism before the normalization layer of each dense block to enhance focus on key sample information. The CA mechanism allows the network to establish weights based on the local adaptability of input features, enabling targeted capture of features from different regions. This enhancement increases the model’s sensitivity and adaptability to varying feature distributions across samples. Second, we employed grouped convolution to replace the 1 × 1 convolution operation in dense layers. Traditional convolution operations incur high computational costs; however, grouped convolution allows us to break them down into smaller operations, reducing model parameters and enhancing computational speed. This enhancement reduces the model’s computational complexity while improving its operational efficiency. Lastly, we replaced all ReLU activation functions in the dense layer with ELU activation functions [29]. During training, the ELU activation function effectively mitigates the issue of neuron death, accelerates convergence, and enhances the model’s generalization capability. This improvement enables the model to learn nonlinear features more effectively and enhances its expressive power. The enhanced model is illustrated in Figure 2.
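The sketch below reconstructs one modified dense layer from this description. It is an illustration under stated assumptions, not the authors' released code: the group count of 4 is assumed (the paper does not state G), and nn.Identity stands in for the CA module, a sketch of which appears in Section 3.3.

```python
import torch
import torch.nn as nn

class ImprovedDenseLayer(nn.Module):
    """Sketch of one modified dense layer: CA before the first normalization,
    a grouped 1x1 bottleneck convolution, and ELU in place of ReLU."""
    def __init__(self, in_channels: int, growth_rate: int,
                 groups: int = 4, ca_module: nn.Module = None):
        super().__init__()
        self.ca = ca_module or nn.Identity()   # Coordinate Attention (Section 3.3)
        self.norm1 = nn.BatchNorm2d(in_channels)
        self.act = nn.ELU(inplace=True)        # ELU replaces ReLU
        # Grouped 1x1 bottleneck replaces the standard 1x1 convolution.
        self.conv1 = nn.Conv2d(in_channels, 4 * growth_rate, kernel_size=1,
                               groups=groups, bias=False)
        self.norm2 = nn.BatchNorm2d(4 * growth_rate)
        self.conv2 = nn.Conv2d(4 * growth_rate, growth_rate, kernel_size=3,
                               padding=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.ca(x)                       # attention before normalization
        out = self.conv1(self.act(self.norm1(out)))
        out = self.conv2(self.act(self.norm2(out)))
        return out

layer = ImprovedDenseLayer(in_channels=64, growth_rate=32)
print(layer(torch.randn(1, 64, 56, 56)).shape)  # -> (1, 32, 56, 56)
```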

3.3. CA Attention Mechanism

In the field of computer vision, attention mechanisms are crucial to improving the performance of image classification and object detection tasks. They enable models to focus on key areas in images and enhance the accuracy and robustness of predictions. This paper introduces the CA mechanism [28], which aims to improve feature extraction, especially the ability to locate object structures in image classification and object detection, by strategically enhancing the model's attention to spatial information. The CA mechanism generates attention weights by aggregating horizontal and vertical features, thereby directing the model's attention to key regions of ginseng images and improving classification performance. In addition, the neural attention mechanism [30] generates attention maps through the partial derivatives of the classification output to optimize object detection; the multi-attention mechanism [31] uses dual-path, dual-attention modules and cross-modal transformer modules to solve the problem of referring object segmentation in compressed video streams; and the structural attention mechanism [32] enhances the ability of transformer models to synthesize structural details in unpaired medical image synthesis by integrating structural prior knowledge. These attention mechanisms have a wide range of applications, from general image processing to domain-specific compressed video segmentation and medical image synthesis, involving different types of data, including regular images and medical scans. Through the integration and optimization of these mechanisms, the limitations of the original network architecture in identifying important features are significantly alleviated. Specifically:
First, for an input $x$, each channel is encoded along the horizontal and vertical directions using pooling kernels of size $(H, 1)$ and $(1, W)$:

$$z_c^h(h) = \frac{1}{W} \sum_{0 \le i < W} x_c(h, i)$$

$$z_c^w(w) = \frac{1}{H} \sum_{0 \le j < H} x_c(j, w)$$

In these formulas, $w$ denotes the width, $h$ the height, and $c$ the channel index.

The two feature maps generated by the module are concatenated and transformed using a shared 1 × 1 convolution $F_1$:

$$f = \delta\left(F_1\left(\left[z^h, z^w\right]\right)\right)$$

where $\delta$ is a nonlinear activation function and $f \in \mathbb{R}^{C/r \times (H+W)}$ is the intermediate feature map.

Next, $f$ is split along the spatial dimension into two separate tensors $f^h \in \mathbb{R}^{C/r \times H}$ and $f^w \in \mathbb{R}^{C/r \times W}$. Two 1 × 1 convolutions $F_h$ and $F_w$ then transform $f^h$ and $f^w$ back to the same number of channels as the input $x$:

$$g^h = \sigma\left(F_h\left(f^h\right)\right)$$

$$g^w = \sigma\left(F_w\left(f^w\right)\right)$$

Finally, $g^h$ and $g^w$ are expanded and applied as attention weights, giving the output:

$$y_c(i, j) = x_c(i, j) \times g_c^h(i) \times g_c^w(j)$$
Figure 3 illustrates the CA mechanism. By incorporating this coordinate attention mechanism, our model is empowered to automatically learn the significance and relationships of pixels across various positions in ginseng images. The CA mechanism enables the model to focus on critical areas within these images, such as stems, roots, and leaves, while effectively disregarding irrelevant regions that may introduce noise. This targeted attention enhances the model's feature extraction capabilities, allowing it to accurately discern the distinctive characteristics of ginseng. As a result, the model becomes more robust against interference from other plants or distracting backgrounds, ultimately leading to improved classification accuracy and reliability in assessing ginseng quality.
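A compact PyTorch sketch of the CA module following the equations above is given below. The reduction ratio r = 32 and the Hardswish nonlinearity for δ follow the original CA paper [28]; with respect to this article they are assumptions.

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Sketch of Coordinate Attention (Hou et al., CVPR 2021): pool along
    H and W, share a 1x1 transform, split, and re-weight the input."""
    def __init__(self, channels: int, reduction: int = 32):
        super().__init__()
        mid = max(8, channels // reduction)
        self.pool_h = nn.AdaptiveAvgPool2d((None, 1))  # (B, C, H, 1): mean over W
        self.pool_w = nn.AdaptiveAvgPool2d((1, None))  # (B, C, 1, W): mean over H
        self.f1 = nn.Sequential(                       # shared 1x1 transform F1
            nn.Conv2d(channels, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid),
            nn.Hardswish(),                            # nonlinearity delta
        )
        self.f_h = nn.Conv2d(mid, channels, kernel_size=1)  # F_h
        self.f_w = nn.Conv2d(mid, channels, kernel_size=1)  # F_w

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        z_h = self.pool_h(x)                           # z^h: (B, C, H, 1)
        z_w = self.pool_w(x).permute(0, 1, 3, 2)       # z^w rotated to (B, C, W, 1)
        f = self.f1(torch.cat([z_h, z_w], dim=2))      # concatenate along space
        f_h, f_w = torch.split(f, [h, w], dim=2)       # split back into two tensors
        g_h = torch.sigmoid(self.f_h(f_h))                     # g^h: (B, C, H, 1)
        g_w = torch.sigmoid(self.f_w(f_w.permute(0, 1, 3, 2)))  # g^w: (B, C, 1, W)
        return x * g_h * g_w                # y_c(i,j) = x_c(i,j) * g^h_c(i) * g^w_c(j)

ca = CoordinateAttention(64)
print(ca(torch.randn(1, 64, 32, 32)).shape)  # -> (1, 64, 32, 32)
```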

3.4. Group Convolution

Group convolution involves partitioning the input feature maps into groups and convolving each group independently. Assuming an input feature map of size C × H × W and N output feature maps, dividing the channels into G groups yields C/G input feature maps per group, N/G output feature maps per group, and convolution kernels of size (C/G) × K × K. The total number of convolution kernels remains N, with N/G kernels per group, each convolving only with its group's input feature maps. The total number of kernel parameters is therefore N × (C/G) × K × K, reducing the overall parameter count to 1/G of the original. The structure is illustrated in Figure 4. Replacing standard convolutions with group convolutions is an enhancement designed to tackle the high computational complexity and parameter count of standard convolutions, reducing both while preserving model accuracy. In ginseng recognition tasks, group convolution facilitates quicker feature extraction, improving both recognition accuracy and model speed. Given the high resolution and intricate texture features of ginseng images, group convolution also effectively mitigates overfitting and boosts the model's generalization capability, while improving computational speed and memory efficiency.
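The parameter saving can be verified directly. The sketch below compares a standard and a grouped convolution; the channel counts (C = N = 128, K = 3, G = 4) are arbitrary illustrative values.

```python
import torch.nn as nn

def n_params(m: nn.Module) -> int:
    return sum(p.numel() for p in m.parameters())

# Standard vs. grouped 3x3 convolution with C = 128 in, N = 128 out, G = 4.
standard = nn.Conv2d(128, 128, kernel_size=3, padding=1, bias=False)
grouped  = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=4, bias=False)

print(n_params(standard))                      # 128 * 128 * 3 * 3     = 147456
print(n_params(grouped))                       # 128 * (128/4) * 3 * 3 =  36864
print(n_params(grouped) / n_params(standard))  # = 1/G = 0.25
```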
In addition, grouped convolution can be applied to efficient action recognition in spatiotemporal networks, and high-performance gesture recognition can be achieved through effective and efficient temporal modeling, as the following examples show.
(1) Combination of grouped convolution and spatiotemporal interleaved networks [33] with TCNs (Temporal Convolutional Networks): grouped convolution is applied to spatiotemporal interleaved networks to expand the receptive field of temporal modeling while keeping the model lightweight and capturing more contextual information. This shows that grouped convolution can be combined with a TCN to improve the efficiency and performance of action recognition.
(2) Application of grouped convolution in action recognition [34]: grouped convolution can improve the efficiency of action recognition by reducing the number of parameters and the amount of computation. In some implementations, two convolutional layers first reduce the dimension of the input feature sequence, which is then fed into an encoder–decoder structure that uses grouped convolution and channel shuffle operations.
(3) Application of grouped convolution in gesture recognition [33]: in gesture recognition, a two-stream 3D darknet network fused with a TCN has been proposed for dynamic gesture recognition in videos. This method combines the powerful image feature extraction of the darknet network with a TCN that extracts short-term spatiotemporal features; through an adaptive weight fusion strategy, the short-term spatiotemporal features are fused with long-term temporal features to achieve video gesture recognition.

3.5. ELU Activation Function

The original DenseNet121 network employs ReLU as its activation function, which effectively captures the nonlinear relationships in the data and enables the neural network to learn complex patterns and features. The mathematical expression is as follows:

$$\mathrm{ReLU}(x) = \begin{cases} 0, & x < 0 \\ x, & x \ge 0 \end{cases}$$
Figure 5 illustrates the limitations of ReLU. For negative input values, ReLU truncates the output to 0, potentially causing neurons to die. In contrast to the traditional ReLU function, the ELU [29] activation function introduces a parameter $\alpha$ and is defined as follows:

$$f(x) = \begin{cases} x, & x \ge 0 \\ \alpha\left(e^x - 1\right), & x < 0 \end{cases}$$
The ELU function generates non-zero output for negative input values, thus preventing neuron death and allowing the neural network to utilize the outputs of all neurons more effectively. By maintaining a response for negative input values, ELU can extract richer information from the data. Compared to ReLU, ELU captures complex nonlinear relationships more effectively and enhances the model’s fitting ability. Therefore, we select ELU as the activation function for the model’s convolutional layer.
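The difference is easy to see numerically. In the short sketch below (illustrative input values only), ReLU zeroes all negative inputs while ELU keeps a smooth, bounded negative response, so gradients still flow through those units.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])
print(F.relu(x))            # tensor([0., 0., 0., 1., 3.]) — negatives truncated
print(F.elu(x, alpha=1.0))  # tensor([-0.9502, -0.6321, 0.0000, 1.0000, 3.0000])
# ELU's non-zero response for negative inputs avoids "dead" neurons.
```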

4. Experiment and Result Analysis

4.1. Experimental Environment

The experimental configuration for ginseng appearance identification includes the following hardware: an Intel Core i7-11700K CPU (8 cores, 16 threads, 3.60 GHz; Intel, Santa Clara, CA, USA) and an NVIDIA GeForce GTX 1080 Ti GPU (NVIDIA, Santa Clara, CA, USA) with 11 GB of memory. The deep learning framework is PyTorch 1.8.2 (GPU), running on the Ubuntu 18.04.6 operating system with Python 3.8.16. Details of this configuration are presented in Table 3.

4.2. Experiments on Attention Mechanisms

4.2.1. The Performance of Attention Mechanisms in Different Positions

This study investigates the impact of inserting the CA mechanism at various positions in the enhanced DenseNet121 model. DenseNet121 comprises four dense blocks, each containing multiple dense layers. We introduce the CA mechanism at several points: before and after the normalization layer, after the first convolution layer, and before the second convolution layer, and evaluate its effectiveness experimentally (see Table 4). Results indicate that the CA mechanism's effectiveness varies depending on its position within the model. Specifically, applying the attention mechanism before the normalization layer yielded the best results, achieving a ginseng recognition rate of 94.4%. However, when the attention mechanism was applied before the second convolution layer, model performance did not exceed that of the original model. The complex structure of ginseng, with its intricately layered roots, stems, and leaves, influences feature extraction. Thus, incorporating CA early in the dense layers effectively extracts relevant features, aiding ginseng recognition, whereas applying CA at the end of the dense layer fails to capture key features, negatively impacting recognition. In summary, this experiment identifies introducing the attention mechanism before the normalization layer as the optimal strategy.

4.2.2. Comparison Between Different Attention Mechanisms

This experiment evaluates the impact of various attention mechanisms on ginseng grading model performance. We selected four attention mechanisms for fusion: ECA (efficient channel attention) [35], SK (selective kernel) [36], CBAM (convolutional block attention module) [37], and CA (the attention mechanism used in this article) [28]. Experimental results in Table 5 demonstrate that integrating the ECA mechanism significantly enhances model performance. The ECA mechanism dynamically adjusts weights based on local channel characteristics without relying on global information. Given the structural complexity and heterogeneity of ginseng data, ECA effectively emphasizes local information, significantly improving classification performance. Figure 6 shows the heat map generated by the model.
In contrast, integrating the SK mechanism leads to negative outcomes. The SK mechanism’s effectiveness largely depends on dataset characteristics and specific tasks. With limited ginseng sample data, the SK mechanism may struggle to learn effective strategies for selecting convolution kernel sizes, resulting in decreased classification performance. Integrating the CBAM mechanism has a positive impact. CBAM effectively combines channel and spatial attention mechanisms, enhancing the information extraction capability of input feature maps. However, the effectiveness of CBAM is limited by its reliance on existing CNN structures, preventing it from achieving expected results in specific ginseng classification models. The model that integrates the CA mechanism achieves optimal performance. Ginseng data exhibit multidimensional and complex feature structures, and the CA mechanism effectively learns correlations between different dimensions and modalities, enhancing the model’s ability to process complex data. The CA mechanism dynamically learns channel correlations to effectively extract and express key features, addressing the challenges posed by the diverse features of ginseng data. Therefore, we select the CA mechanism as a key strategy to optimize model performance.

4.3. Experimental Study on Grouped Convolution

The Effect of Grouped Convolution at Different Positions

This study investigates how different combinations of grouped convolutions affect the performance of deep learning models. We replaced the 1 × 1 and 3 × 3 convolutions in dense layers with grouped convolutions, forming three distinct structures. As shown in Table 6, substituting the 1 × 1 convolution with grouped convolution (G1D2) improves model accuracy by 0.004 and reduces the number of parameters by 29%, yielding the best overall performance. Replacing the 3 × 3 convolution (D1G2) yields a 0.001 increase in accuracy and a 15% reduction in parameters; however, in this configuration the attention mechanism placed before the normalization layer provides only a limited benefit, resulting in suboptimal performance. Finally, replacing both convolutions (G1G2) reduces the number of parameters by 44%, but accuracy declines significantly because the remaining parameters are insufficient, limiting overall model performance. Considering both performance and parameter count, we selected the G1D2 scheme as the final network structure for the ginseng grade classification model.

4.4. Experiment on Activation Function

Comparison Between Different Activation Functions

This study investigates model overfitting due to the small sample characteristics and deep features of DenseNet121, proposing an optimization solution using various activation functions. We selected four activation functions, ReLU [9], PReLU (Parametric Rectified Linear Unit) [20], SiLU (Sigmoid Linear Unit) [38], and ELU [29], and compared their performance. The experimental results are presented in Table 7. Although ReLU is the default activation function for DenseNet121 and offers fast computation, it failed to achieve the expected convergence within 40 epochs with unevenly distributed ginseng sample data. PReLU addresses the dead ReLU problem and enhances the model’s generalization ability by introducing a learnable slope parameter, allowing some negative inputs. Compared to standard ReLU, PReLU converged faster within 37 epochs, yet it still struggled with the complexities of ginseng data. SiLU merges the smoothing properties of the sigmoid function with the nonlinearity of ReLU, maintaining stable gradients during backpropagation. While SiLU converged by the 38th epoch, its performance was slightly inadequate for handling the uneven distribution of ginseng samples. ELU, like PReLU, accommodates negative inputs and possesses the smoothness and nonlinearity of SiLU, effectively addressing the complexities and uneven distributions in ginseng data. It achieved rapid convergence and the highest recognition rate by the 35th epoch. Therefore, we ultimately selected ELU as the activation function for the ginseng classification network.

4.5. Performance Comparison of Different Network Models

DenseNet121 [27], ResNet50 [14], ResNet101 [11], GoogleNet [39], and InceptionV3 [40] are all classic deep learning models with numerous architectural variations. We therefore compare these architectures directly on ginseng classification to verify the improvements made to DenseNet121. The results are shown in Figure 7, Figure 8, and Table 8. The improved DenseNet121 outperforms all other models on the four evaluation metrics, with an average accuracy of 95.5%, at least 1.20 percentage points higher than the other networks, and an average precision of 95.4%. InceptionV3 has the second-highest accuracy, and ResNet101 has the lowest, with a precision of 85.1% and a recall of 84.9%, which is 9.9 percentage points lower than the recall of our model.
ResNet50 comprises multiple residual blocks, each containing several convolutional layers. It introduces skip connections and residual mapping, effectively addressing the issues of gradient vanishing and explosion during deep neural network training. ResNet101 is deeper than ResNet50, featuring a total of 101 layers, which enables it to capture more complex and abstract features, enhancing its feature expression capability for intricate tasks. GoogleNet employs the Inception module, utilizing multiple convolutional kernels of varying sizes to capture features at different scales simultaneously. Each Inception module comprises several parallel convolutional branches whose outputs are concatenated along the channel dimension, thereby improving the network’s feature representation capability. InceptionV3 builds upon the core principles of InceptionV1, employing more complex and sophisticated Inception modules to capture features through multi-scale convolution kernels and pooling operations. The accuracies of ResNet50, ResNet101, GoogleNet, and InceptionV3 are lower than that of the improved DenseNet121 model. DenseNet121 features dense connections, allowing each layer’s feature map to connect directly to the inputs of all subsequent layers, facilitating efficient feature reuse.
In contrast, while ResNet50 utilizes residual connections, its feature transfer mechanism is less dense than that of DenseNet121, potentially leading to some inefficiencies in feature utilization. The branching structure of GoogleNet and InceptionV3 enhances the network’s width and representation capability; however, the efficiency of information exchange between branches does not match the dense connections of DenseNet. Therefore, for ginseng classification tasks, the improved DenseNet121 model is more appropriate.
Figure 9 shows the confusion matrix of the original and improved DenseNet121 models. In this matrix, “Label 0” denotes premium ginseng, “Label 1” denotes first-level ginseng, and “Label 2” denotes second-level ginseng. The horizontal axis indicates true labels, while the vertical axis shows predicted labels. The model achieved over 95% accuracy for premium, first-level, and second-level ginseng. However, prediction errors primarily involved misclassifying special-grade ginseng as second grade and misclassifying second-grade ginseng as either special grade or first grade. While first-level ginseng exhibited the highest correct recognition rate, its error rate was also notable, suggesting that further optimization is needed for this category. Future efforts will concentrate on a detailed analysis of the structural characteristics of premium and secondary ginseng to enhance the model’s recognition performance in these categories, thus improving overall accuracy.
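For readers reproducing Figure 9, a confusion matrix of this kind can be computed as sketched below. The predictions here are random placeholders, not the paper's results; only the three-grade label scheme is taken from the text.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, classification_report

# Placeholder predictions; labels: 0 = special, 1 = first, 2 = second grade.
rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=200)
# Simulate a classifier that is right ~95% of the time, for illustration.
y_pred = np.where(rng.random(200) < 0.95, y_true, rng.integers(0, 3, size=200))

print(confusion_matrix(y_true, y_pred))  # rows: true grade, cols: predicted
print(classification_report(y_true, y_pred,
                            target_names=["special", "first", "second"]))
```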

5. Conclusions

Recent advancements in deep learning have yielded significant improvements in ginseng recognition. The automation and efficiency of deep learning have enhanced both the speed and accuracy of ginseng classification, positively impacting the market. This study presents an enhanced DenseNet121 model aimed at tackling the challenges of ginseng classification to achieve accurate and efficient recognition. We employed several techniques to enhance model performance in this study.
We first incorporated the CA attention mechanism into the dense blocks, significantly enhancing the model’s accuracy. Results indicated a 1.3% increase in accuracy, confirming the effectiveness of this approach for ginseng classification. Next, we substituted the initial 1 × 1 convolution in the dense block with grouped convolution, which reduced the model’s parameter count by 29% while enhancing accuracy through improved cross-channel information exchange. This refined convolutional structure allows the model to capture feature correlations more effectively, thereby improving classification accuracy.
Additionally, we replaced the traditional ReLU activation function with the ELU activation function. Results showed a 0.7% accuracy improvement with the ELU activation function, highlighting its benefits in addressing gradient vanishing issues and enhancing detailed feature representation. Ultimately, our enhancements to the DenseNet121 model yielded significant results on the test set. This model attained an accuracy of 95.5% in ginseng grade classification tasks, surpassing other well-known classification models such as DenseNet121, ResNet50, ResNet101, GoogleNet, and InceptionV3. It showcases superior generalization ability and robustness for accurate recognition.
Future research can explore various avenues. First, new evaluation metrics could be developed to encompass comprehensive factors affecting ginseng quality, including growth environment, soil quality, and pesticide residues, for a more thorough classification. Second, optimizing data collection and processing through automation technology and machine vision could enhance accuracy and efficiency. Furthermore, integrating big data with artificial intelligence algorithms for data analysis and mining can further optimize the ginseng grade classification algorithm, enhancing accuracy and effectiveness. These advancements will contribute to more scientific and accurate ginseng grade classification, promoting development and application in this field.

Author Contributions

Conceptualization, D.L. and L.Z.; methodology, Z.L.; software, Y.Y. (Yingying Yin) and D.L.; validation, Z.L. and Y.Y. (Yue Yu); formal analysis, Y.Y. (Yue Yu) and L.Z.; investigation, J.G. and Y.L.; resources, Y.Y. (Yingying Yin) and Z.L.; data curation, D.L.; writing—original draft, J.G. and Y.Y. (Yingying Yin); writing—review and editing, J.G. and Y.Y. (Yue Yu); visualization, J.G. and L.Z.; supervision, D.L. and L.Z.; project administration, Z.L. and Y.L.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (61806024, 62206257); Jilin Province Science and Technology Development Plan Key Research and Development Project (20210204050YY); Wuxi University Research Start-up Fund for Introduced Talents (2023r004, 2023r006); Jiangsu Engineering Research Center of Hyperconvergence Application and Security of IoT Devices; Wuxi City Internet of Vehicles Key Laboratory.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The datasets generated for this study are available upon request from the corresponding author.

Acknowledgments

We thank the anonymous reviewers for their helpful and constructive comments.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, F.; Bao, H. Herbal Textual Research and progress on pharmacological actions of Ginseng Radix et Rhizoma. Ginseng Res. 2017, 29, 43–46. [Google Scholar]
  2. Liu, W.; Li, W. Review on industrialization development status and prospect of panax ginseng processing. J. Jilin Agric. Univ. 2023, 45, 639–648. [Google Scholar]
  3. Chen, J.; Yang, L.; Li, R.; Zhang, J. Identification of Panax japonicus and its related species or adulterants using ITS2 sequence. Chin. Tradit. Herb. Drugs 2018, 49, 9. [Google Scholar]
  4. Chen, K.; Huang, L.; Liu, Y. Development history of methodology of Chinese Medicines’ Authentication. China J. Chin. Mater. Med. 2014, 39, 1203–1208. [Google Scholar]
  5. Xu, S.; Sun, G.; Mu, S.; Sun, Q. Fingerprint Comparison of Mountain Cultivated Ginseng and Wild Ginseng by HPLC. J. Chin. Med. Mater. 2013, 36, 213–216. [Google Scholar]
  6. Hua, Y.; Geng, C.; Wang, S.; Liu, X. Analysis of Gene Expression of Pseudostellariae Radix from Different Provenances and Habitats Based on cDNA-AFLP. Nat. Prod. Res. Dev. 2016, 28, 188. [Google Scholar] [CrossRef]
  7. Geng, L.; Huang, Y.; Guo, Y. Apple variety classification method based on fusion attention mechanism. Trans. Chin. Soc. Agric. Mach. 2022, 53, 304–310. [Google Scholar]
  8. Huang, F.; Yu, L.; Shen, T.; Xu, H. Research and Implementation of Chinese Herbal Medicine Plant Image Classification Based on AlexNet Deep Learning Mode. J. Qilu Univ. Technol. 2020, 34, 44–49. [Google Scholar]
  9. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  10. Li, L.P.; Shi, F.P.; Tian, W.B.; Chen, L. Wild plant image recognition method based on residual network and transfer learning. Radio Eng 2021, 51, 857–863. [Google Scholar]
  11. Ghosal, P.; Nandanwar, L.; Kanchan, S.; Bhadra, A.; Chakraborty, J.; Nandi, D. Brain Tumor Classification Using ResNet-101 Based Squeeze and Excitation Deep Neural Network. In Proceedings of the 2019 Second International Conference on Advanced Computational and Communication Paradigms (ICACCP), Gangtok, India, 25–28 February 2019; pp. 1–6. [Google Scholar]
  12. Pereira, C.S.; Morais, R.; Reis, M.J. Deep learning techniques for grape plant species identification in natural images. Sensors 2019, 19, 4850. [Google Scholar] [CrossRef] [PubMed]
  13. Gui, Y. Classification and Recognition of Crop Seedings and Weeds Based on Attention Mechanism. Master’s Thesis, Anhui Agricultural University, Hefei, China, 2020. [Google Scholar]
  14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  15. Chen, J.; Chen, J.; Zhang, D.; Sun, Y.; Nanehkaran, Y.A. Using deep transfer learning for image-based plant disease identification. Comput. Electron. Agr. 2020, 173, 105393. [Google Scholar] [CrossRef]
  16. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
  17. Kadir, A.; Nugroho, L.E.; Susanto, A.; Santosa, P.I. Leaf classification using shape, color, and texture features. arXiv 2013, arXiv:1401.4447. [Google Scholar]
  18. Li, D.; Zhai, M.; Piao, X.; Li, W.; Zhang, L. A Ginseng Appearance Quality Grading Method Based on an Improved ConvNeXt Model. Agronomy 2023, 13, 1770. [Google Scholar] [CrossRef]
  19. Ding, X.; Chen, H.; Zhang, X.; Han, J. Repmlpnet: Hierarchical vision mlp with re-parameterized locality. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 578–587. [Google Scholar]
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In Proceedings of the 2015 IEEE International Conference On Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1026–1034. [Google Scholar]
  21. Li, D.; Piao, X.; Lei, Y.; Li, W.; Zhang, L.; Ma, L. A Grading Method of Ginseng (Panax ginseng C. A. Meyer) Appearance Quality Based on an Improved ResNet50 Model. Agronomy 2022, 12, 2925. [Google Scholar] [CrossRef]
  22. Kim, M.; Kim, J.; Kim, J.S.; Lim, J.; Moon, K. Automated Grading of Red Ginseng Using DenseNet121 and Image Preprocessing Techniques. Agronomy 2023, 13, 2943. [Google Scholar] [CrossRef]
  23. Chen, B.; Zhu, L.; Kong, C.; Zhu, H.; Wang, S.; Li, Z. No-reference image quality assessment by hallucinating pristine features. IEEE Trans. Image Process. 2022, 31, 6139–6151. [Google Scholar] [CrossRef]
  24. Wu, H.; Zhu, H.; Zhang, Z.; Zhang, E.; Chen, C.; Liao, L.; Li, C.; Wang, A.; Sun, W.; Yan, Q. Towards open-ended visual quality comparison. arXiv 2024, arXiv:2402.16641. [Google Scholar]
  25. Kong, C.; Luo, A.; Wang, S.; Li, H.; Rocha, A.; Kot, A.C. Pixel-inconsistency modeling for image manipulation localization. arXiv 2023, arXiv:2310.00234. [Google Scholar]
  26. Zhu, H.; Chen, B.; Zhu, L.; Wang, S. Learning spatiotemporal interactions for user-generated video quality assessment. IEEE Trans. Circuits Syst. Video Technol. 2022, 33, 1031–1042. [Google Scholar] [CrossRef]
  27. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  28. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 13708–13717. [Google Scholar]
  29. Clevert, D.-A.; Unterthiner, T.; Hochreiter, S. Fast and accurate deep network learning by exponential linear units (ELUs). arXiv 2015, arXiv:1511.07289. [Google Scholar]
  30. Ge, C.; Song, Y.; Ma, C.; Qi, Y.; Luo, P. Rethinking attentive object detection via neural attention learning. IEEE Trans. Image Process. 2023, 33, 1726–1739. [Google Scholar] [CrossRef] [PubMed]
  31. Chen, W.; Hong, D.; Qi, Y.; Han, Z.; Wang, S.; Qing, L.; Huang, Q.; Li, G. Multi-attention network for compressed video referring object segmentation. In Proceedings of the 30th ACM International Conference on Multimedia, Lisboa, Portugal, 10–14 October 2022; pp. 4416–4425. [Google Scholar]
  32. Phan, V.M.H.; Xie, Y.; Zhang, B.; Qi, Y.; Liao, Z.; Perperidis, A.; Phung, S.L.; Verjans, J.W.; To, M. Structural Attention: Rethinking Transformer for Unpaired Medical Image Synthesis. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Marrakesh, Morocco, 6–10 October 2024; Springer: Berlin/Heidelberg, Germany, 2024; pp. 690–700. [Google Scholar]
  33. Yi, Y.; Ni, F.; Ma, Y.; Zhu, X.; Qi, Y.; Qiu, R.; Zhao, S.; Li, F.; Wang, Y. High Performance Gesture Recognition via Effective and Efficient Temporal Modeling. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 1003–1009. [Google Scholar]
  34. Jiang, S.; Zhang, H.; Qi, Y.; Liu, Q. Spatial-Temporal Interleaved Network for Efficient Action Recognition. IEEE Trans. Ind. Inform. 2024, 1–10. [Google Scholar] [CrossRef]
  35. Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient Channel Attention for Deep Convolutional Neural Networks. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 11531–11539. [Google Scholar]
  36. Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
  37. Woo, S.; Park, J.; Lee, J.; Kweon, I. CBAM: Convolutional Block Attention Module. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  38. Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. Neural Netw. Off. J. Int. Neural Netw. Soc. 2017, 107, 3–11. [Google Scholar] [CrossRef]
  39. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  40. Liu, Z.; Yang, C.; Huang, J.; Liu, S.; Zhuo, Y.; Lu, X. Deep learning framework based on integration of S-Mask R-CNN and Inception-v3 for ultrasound image-aided diagnosis of prostate cancer. Future Gener. Comput. Syst. 2021, 114, 358–367. [Google Scholar] [CrossRef]
Figure 1. Ginseng dataset. (a) Original dataset; (b) dataset after data enhancement.
Figure 2. Improved network model. In the diagram, the symbol * denotes the convolution operation. Specifically, group convolution divides the input channels into several groups and then performs the convolution operation on each group using distinct convolutional kernels.
Figure 3. Coordinate Attention mechanism.
Figure 4. Schematic diagram of group convolution.
Figure 5. Activation function comparison chart.
Figure 6. Visualization results of the feature heat map of the new network before and after adding the CA module. (a) Input image; (b) image before adding the CA module; (c) image after adding the CA module.
Figure 7. Accuracy of different models.
Figure 8. (a) Loss of different models; (b) the loss between training and validation.
Figure 9. (a) Original confusion matrix; (b) improved confusion matrix.
Table 1. Ginseng Grading Criteria.

| Project | Special Grade | First-Class | Second-Class |
| Main root | Cylindrical | Cylindrical | Cylindrical |
| Branch root | 2~3 obvious branch roots of relatively uniform thickness | One to four branches, varying in thickness | |
| Rutabaga | Complete, with reed head and ginseng fibrous roots | Reed head and ginseng fibrous roots relatively complete | Reed head and ginseng fibrous roots incomplete |
| Groove | Clear and obvious grooves | Grooves distinct but not obvious | Without grooves |
| Section | Sections neat and clear | Segments obvious | Segments not obvious |
| Surface | Yellowish-white or grayish-yellow, no water rust, no draw grooves | Yellowish-white or grayish-yellow, light water rust or with pumping grooves | Yellowish-white or grayish-yellow, slightly more water rust, with pumping grooves |
| Texture | Harder, powdery, non-hollow | Harder, powdery, non-hollow | Harder, powdery, non-hollow |
| Springtails | Square or rectangular | Conical or cylindrical | Irregular shape |
| Insects, mildew, impurities | None | Mild | Present |
Table 2. Dataset Partitioning.

| Level | Train | Val |
| Special grade | 1428 | 357 |
| First-class | 1564 | 391 |
| Second-class | 1500 | 375 |
| Total | 4492 | 1123 |
Table 3. Model Parameter Setting.

| Parameter | Setting |
| Optimizer | Adam |
| Learning rate | 0.0001 |
| Weight decay | 0.0001 |
| Batch size | 32 |
| Epochs | 100 |
| Loss function | CrossEntropyLoss |
Table 4. Comparison of attention mechanisms in different positions.

| Num | Location | Accuracy | AUC | Loss |
| 1 | No-Attention | 0.931 | 0.942 | 0.029 |
| 2 | BN-Before | 0.944 | 0.953 | 0.028 |
| 3 | BN-After | 0.936 | 0.941 | 0.034 |
| 4 | Conv1-After | 0.942 | 0.951 | 0.031 |
| 5 | Conv2-Before | 0.929 | 0.938 | 0.036 |
Table 5. Comparison of different attention mechanisms.

| Num | Module | Accuracy | AUC | Loss |
| 1 | ECA | 0.941 | 0.949 | 0.028 |
| 2 | SK | 0.928 | 0.932 | 0.034 |
| 3 | CBAM | 0.935 | 0.939 | 0.031 |
| 4 | CA | 0.944 | 0.953 | 0.029 |
Table 6. The impact of grouping convolutions at different positions.

| Num | Location | Accuracy | Loss | AUC | Params |
| A | D1D2 | 0.944 | 0.029 | 0.953 | 6.95 M |
| B | G1D2 | 0.948 | 0.027 | 0.959 | 4.93 M |
| C | D1G2 | 0.945 | 0.031 | 0.951 | 5.88 M |
| D | G1G2 | 0.931 | 0.035 | 0.944 | 3.87 M |
Note: “D1D2” represents traditional convolution, and “G1D2” represents grouped convolution replacing 1 × 1 convolution in dense layers. “D1G2” represents the replacement of 3 × 3 convolutions in dense layers with grouped convolutions. “G1G2” represents the replacement of 1 × 1 convolution and 3 × 3 convolution in dense layers with grouped convolution.
Table 7. Comparative experiments on different activation functions.

| Activation Function | Accuracy | Loss | AUC | Epochs |
| ReLU | 0.948 | 0.028 | 0.959 | 40 |
| PReLU | 0.953 | 0.026 | 0.951 | 37 |
| SiLU | 0.951 | 0.027 | 0.957 | 38 |
| ELU | 0.955 | 0.025 | 0.961 | 35 |
Table 8. Comparison between different models.

| Model | Accuracy | Precision | Recall | F1-Score | AUC |
| DenseNet121 | 0.931 | 0.935 | 0.926 | 0.928 | 0.941 |
| ResNet50 | 0.885 | 0.869 | 0.862 | 0.864 | 0.882 |
| ResNet101 | 0.861 | 0.851 | 0.849 | 0.867 | 0.869 |
| GoogleNet | 0.924 | 0.933 | 0.925 | 0.927 | 0.931 |
| InceptionV3 | 0.943 | 0.944 | 0.939 | 0.941 | 0.952 |
| Our Model | 0.955 | 0.954 | 0.948 | 0.949 | 0.963 |