Article

A Recognition Model Based on Multiscale Feature Fusion for Needle-Shaped Bidens L. Seeds

1 College of Mechanical Engineering, Guangxi University, Nanning 530004, China
2 Shenzhen Branch, Guangdong Laboratory of Lingnan Modern Agriculture, Genome Analysis Laboratory of the Ministry of Agriculture and Rural Affairs, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen 518120, China
3 College of Mechanical and Electronic Engineering, Northwest A&F University, Xianyang 712100, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Agronomy 2024, 14(11), 2675; https://doi.org/10.3390/agronomy14112675
Submission received: 24 September 2024 / Revised: 3 November 2024 / Accepted: 12 November 2024 / Published: 14 November 2024
(This article belongs to the Special Issue In-Field Detection and Monitoring Technology in Precision Agriculture)

Abstract

To address the problem that traditional seed recognition methods are not completely suitable for needle-shaped seeds, such as those of Bidens L., in agricultural production, this paper proposes a model construction idea that combines the advantages of deep residual models in extracting high-level abstract features with multiscale feature extraction and fusion, taking into account both the depth and the width of the network. Based on this, a multiscale feature fusion deep residual network (MSFF-ResNet) is proposed, and image segmentation is performed before classification using U2Net, a popular semantic segmentation method that accurately separates seeds from the background. The multiscale feature fusion network is a deep residual model based on a 34-layer residual network (ResNet34) and contains a multiscale feature fusion module and an attention mechanism. The multiscale feature fusion module is designed to extract features of needle-shaped seeds at different scales, while the attention mechanism improves the feature selection ability of the model so that it can pay more attention to key features. The results show that the average accuracy and average F1-score of the multiscale feature fusion deep residual network on the test set are 93.81% and 94.44%, respectively, and the number of floating-point operations (FLOPs) and the number of parameters are 5.95 G and 6.15 M, respectively. Compared to other deep residual networks, the multiscale feature fusion deep residual network achieves the highest classification accuracy. Therefore, the network proposed in this paper can classify needle-shaped seeds efficiently and provide a reference for seed recognition in agriculture.

1. Introduction

Plant seeds are the basis of agricultural production. As a genus of Compositae, Bidens L. can be subdivided into many species, such as Bidens pilosa L., Bidens pilosa L. var. radiata, Bidens bipinnata L., and Bidens biternata (Lour.) Merr. & Sherff. Among them, B. bipinnata and B. biternata have important production and medical value. However, as invasive species, B. pilosa and B. pilosa var. radiata will multiply rapidly if they are planted together in the field, robbing other crops of nutrition and living space, thereby posing a serious threat to the survival of other crops or even leading to their extinction. Moreover, as different branches under the same genus, their seeds are not only highly similar in color and shape but are also easily mixed together because they grow in the same habitat. Therefore, it is necessary to classify the seeds of different species of Bidens L.
Traditionally, seed varieties are distinguished by visual inspection [1,2]. However, high error rate, low accuracy rate, large time consumption, and labor intensiveness are obvious disadvantages of this method. Especially when the seeds in the same genus exhibit only slight differences in morphological characteristics, substantial challenges are faced in successfully identifying them without the assistance of specialists [3]. Some biochemical identification methods, such as seed protein electrophoresis [4], high-performance liquid chromatography [5], and DNA molecular markers [6], have also been applied to seed identification. Despite high accuracy rates, these methods are irreversibly destructive to seeds [7], and the identification cost is high [8,9,10,11]. Therefore, employing automated methods for nondestructive, accurate, and efficient detection and identification of seeds is vitally important.
With advances in electronic and information technologies, computer vision methods combined with image processing techniques have become promising tools for precise real-time weed seed detection [12]. Several methods in the field of computer vision have been transformed from statistical methods to deep learning methods because they offer greater accuracy for tasks such as image segmentation and image classification [13]. Image segmentation and image classification are two important components of image processing tasks, and convolutional neural networks (CNNs) are widely used in image processing tasks in various scenarios [14,15,16]. CNNs support feature learning and classification together, consisting of several layers, which makes them more accurate in feature extraction than any other traditional AI-based algorithm [17].
Loddo et al. [18] used the gray histogram method and automatic threshold segmentation to segment the images of a Canadian dataset and a local dataset, respectively, with the seed scale indicators removed. A new CNN seed classification model called SeedNet was proposed, which achieved 95.24% and 97.47% accuracy on the two datasets, outperforming other models. Javanmardi et al. [19] selected the Lab color space with the strongest contrast between the seeds and the background, used a multi-threshold method to remove the background and segment the corn seeds, and extracted features from the corn images by combining handcrafted feature extraction and CNN feature extraction. Finally, these features were passed into a machine-learning classifier to complete the classification of nine different varieties of corn. Sabanci et al. [20] used manual cropping to segment images of pepper seeds and used two methods to classify four kinds of pepper seeds. The first was to train the CNN models ResNet18 and ResNet50 for classification, and the second was to fuse the extracted features through the CNN feature extraction function for feature selection and then use a support vector machine (SVM) for classification. Classification accuracies of 98.05% and 97.07% were obtained with the first method, and a classification accuracy of 99.02% was achieved with the second method. Lin et al. [21] segmented images of soybean seeds based on multiscale retinex with color restoration (MSRCR). They then used the recognition model SoyNet, with appropriate parameters, to perform four-class classification of soybean seeds. The F-scores for normal, damaged, abnormal, and non-classifiable soybeans reached approximately 95.97%, 97.41%, 97.25%, and 96.14%, respectively. In summary, previous research has used various segmentation methods for image preprocessing and has combined machine learning and deep learning for classification, which demonstrates the feasibility of detecting various seeds.
Although previous research on seed classification has achieved high accuracy, there are still problems in some aspects. First, the methods used to segment seeds in most of the previous studies were threshold segmentation and manual segmentation. They both need the segmentation threshold and morphological operation parameters to be set, and the requirements for the parameters are different for eliminating noise points of different sizes. Improper setting of parameters may lead to incomplete noise removal and may increase the complexity of subsequent work, or the pixels of a seed may be deleted by mistake, which will affect the accuracy of subsequent classification. Therefore, the high requirements for parameter settings mean that these approaches cannot achieve high-precision automatic segmentation. Second, most previous studies on seed classification considered round seeds, and few studies have been conducted on the classification of other shapes of seeds. The characteristics of Bidens L., as needle-shaped seeds, are quite different from those of round seeds. Therefore, it is not completely suitable to apply the feature extraction methods from previous research to the classification of Bidens L. Finally, according to the requirements of the network, each image must be modified to a fixed size, such as 224 × 224, when input into the network. Therefore, most studies preprocessed images before image input, such as by cropping and resizing, causing changes in or loss of seed information. Especially for the seeds of Bidens L., the length, number, and size of the needles and the texture of the trunk part are very important information for classification; hence, the impact of information loss on the classification of Bidens L. seeds is more serious. To solve the problems above, an automatic classification method based on automatic segmentation and multiscale feature fusion with preserved scale information is proposed for the efficient detection and classification of seeds with high appearance similarity.
To improve the generality, robustness, and accuracy of needle-shaped seed recognition, we designed a new model, MSFF-ResNet, which uses ResNet34 as the backbone network and contains a multiscale feature fusion module and an attention module. The local information of Bidens L. seeds, such as the needle-shaped part of the head, and the global information, such as the length-to-width ratio information, are all captured by the different receptive fields of the multiscale feature fusion module, and the attention module makes a decision between them, finds features that contribute greatly to classification, and ignores useless and redundant features. Meanwhile, depthwise separable convolution is used to replace some of the original convolutions to reduce the number of calculations and parameters. The performance of ablation experiments and comparative experiments for the task shows the effectiveness of our designed model. This method provides a new strategy for the precise classification of needle-shaped seeds.

2. Materials and Methods

2.1. Image Acquisition

In this study, 4 different species of Bidens L. were collected in Shenzhen, Guangdong, China and selected as classification objects, namely, B. pilosa, B. bipinnata, B. pilosa var. radiata, and B. biternata, as shown in Figure 1. When selecting samples, seeds with complete and uniform shapes were manually screened as experimental samples, and then image acquisition was performed. After adjusting the camera to a suitable height, the seeds were evenly spread on the experimental bench in batches, and seed images were collected under indoor natural light conditions.

2.2. Selection of the Segmentation Method

Image segmentation is the technique and process of partitioning an image into a number of uniform, nonoverlapping, and homogeneous regions [22]. Since there are multiple seeds in each acquired image, each seed must be segmented individually before classification. This study compares two different segmentation methods in the segmentation stage: threshold segmentation and semantic segmentation. After the experiments, it was concluded that semantic segmentation has better performance and applicability than threshold segmentation in terms of robustness, automation, and segmentation accuracy.
Threshold-based segmentation has attracted growing interest among many technologies for image segmentation [23]. Threshold segmentation is a segmentation method that sets a threshold according to the difference in pixel gray value between different objects and divides pixels based on this threshold. When the threshold is set properly, the segmentation performance is better, but the requirements for the threshold setting are relatively high. Moreover, threshold segmentation is very sensitive to noise, and denoising is often required through morphological operations after segmentation. For Bidens L., the seed head is thinner, and its pixels are easily removed as noise during the morphological operations. However, if the head pixels are to be retained, the noise removal may not be complete. Therefore, different morphological parameters need to be set to adapt to various noise situations, a fact that indicates that the robustness and segmentation accuracy of threshold segmentation are not high enough.
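To make the parameter sensitivity described above concrete, the following minimal sketch shows a typical threshold segmentation pipeline in OpenCV (not the exact procedure used in this study); the Otsu threshold and the morphological kernel size are illustrative choices that would need retuning for different noise levels.

```python
import cv2

def threshold_segment(image_path, kernel_size=5):
    """Segment dark seeds from a lighter background by global thresholding.

    kernel_size controls the morphological opening used for denoising;
    larger kernels remove more noise but also erode thin seed heads.
    """
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    # Otsu's method picks a global threshold from the gray-level histogram.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Morphological opening removes small noise blobs, at the risk of
    # deleting the thin needle-shaped heads of Bidens L. seeds.
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    denoised = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    return denoised
```

With a small kernel, noise blobs survive; with a larger kernel, the thin needle heads begin to disappear, which is exactly the trade-off discussed above.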
Semantic segmentation makes up for the disadvantages of threshold segmentation. Through a large number of training samples, the model can finally be equipped with the ability to discriminate each pixel in the image, and the classification prediction probability of each pixel is given so that pixels of each type are merged together and the category is labeled. The automatic learning ability of semantic segmentation ensures the robustness of the segmentation, and multiple training iterations ensure the accuracy of the segmentation; hence, it is more suitable for the segmentation of Bidens L. seeds than threshold segmentation.
A performance comparison of the two segmentation methods is shown in Figure 2. Threshold segmentation performs poorly in preserving the head pixels of Bidens L. seeds, and some noise remains. If morphological operations are used to achieve high-precision denoising, more Bidens L. seed pixels will inevitably be lost, and different amounts of noise in different images impose different requirements on the size of the morphological operation kernel. Semantic segmentation achieves high-precision preservation of seed pixels and high-efficiency denoising and has high segmentation accuracy and stability across different images. Based on this comprehensive comparison, semantic segmentation was selected as the preprocessing segmentation method.

2.3. Image Preprocessing

The preprocessing stages for Bidens L. seeds are shown in Figure 3. The original image was semantically segmented to obtain a segmentation mask, which was ANDed with the original image to obtain a seed mask. Then, the segmentation mask was inverted, and an OR operation was performed with the seed mask to obtain the final image with a pure white background, achieving background purification. The purpose of purifying the background is to eliminate the influence of background noise on classification, and the reason for converting the black background to white is that the contrast between the black Bidens L. seeds and a black background is weak, which is not conducive to distinguishing them. The contrast with a white background is strong, which is more conducive to the identification and classification of seeds. Then, the obtained image was processed through a contour extraction algorithm, the contour of each seed was drawn, and a bounding box for each seed was drawn according to the contour. Finally, the image was cropped according to the bounding box to obtain an image of a single seed. To avoid the appearance of other seeds in the bounding box, the contour extraction method retained only the largest contour. After segmentation, cropping, and screening, a total of 970 single-seed images were obtained for subsequent seed classification.
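The masking and cropping pipeline above can be sketched with OpenCV as follows; this is a minimal illustration rather than the authors' exact code, and the function name and the minimum-area filter are assumptions, with the semantic segmentation mask assumed to already exist as a binary image.

```python
import cv2

def crop_single_seeds(original, seg_mask, min_area=200):
    """Whiten the background using a binary segmentation mask and
    return one cropped image per detected seed contour.

    original: BGR image; seg_mask: uint8 mask (255 = seed, 0 = background).
    min_area is an illustrative filter against residual noise blobs.
    """
    # Keep seed pixels only (AND with the mask).
    seed_only = cv2.bitwise_and(original, original, mask=seg_mask)
    # Invert the mask and OR it in, turning the background pure white.
    inverted = cv2.bitwise_not(seg_mask)
    white_bg = cv2.bitwise_or(seed_only, cv2.cvtColor(inverted, cv2.COLOR_GRAY2BGR))
    # Extract one bounding box per external contour and crop.
    contours, _ = cv2.findContours(seg_mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    crops = []
    for c in contours:
        if cv2.contourArea(c) < min_area:
            continue
        x, y, w, h = cv2.boundingRect(c)
        crops.append(white_bg[y:y + h, x:x + w])
    return crops
```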

2.4. Image Dataset

Due to the insufficient total number of samples, the dataset needed to be expanded via data augmentation. To prevent data leakage from affecting the results on the test set, data augmentation was performed after the dataset was divided. First, the dataset was randomly divided into a training set, validation set, and test set at a ratio of 6:2:2, and then the training set data were expanded.
The sample distribution of the training set is shown in Figure 4. Figure 4 shows that the total number of samples in the dataset was insufficient and that the distribution of the classes was imbalanced. Directly using such a dataset for training has a negative impact on the classification performance of the model [24]. To balance the dataset and thereby improve the generalization ability and robustness of the model, this study aimed to increase data diversity and equalize the numbers of seed samples of each species through data augmentation. To avoid losing size and ratio information, this study does not use random cropping or resizing [25] but instead applies multiangle rotations, X-axis flips, and Y-axis flips in random combinations to expand the dataset.
In the process of data augmentation, random rotations of 90°, 180°, and 270°, random horizontal flips, and random vertical flips were performed to increase the numbers of samples of the four types of seeds in the training set from 224, 75, 153, and 131 to 896, 750, 756, and 786, respectively. The amount of data before and after augmentation is shown in Table 1.
Table 1 shows that after data augmentation, all kinds of samples in the training set were balanced, and the total number of samples in the training set was 3188.
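A minimal sketch of this augmentation scheme using torchvision is given below; the transform mirrors the 90°/180°/270° rotations and flips described above, while the flip probabilities and the oversampling strategy used to balance the classes are assumptions.

```python
import random
from torchvision.transforms import functional as TF

def augment(img):
    """Apply a random combination of 90-degree rotations and flips.

    Unlike random cropping or resizing, these transforms preserve the
    size and aspect-ratio information of the needle-shaped seeds.
    """
    angle = random.choice([0, 90, 180, 270])
    if angle:
        img = TF.rotate(img, angle, expand=True)  # expand keeps the whole seed visible
    if random.random() < 0.5:
        img = TF.hflip(img)  # X-axis flip
    if random.random() < 0.5:
        img = TF.vflip(img)  # Y-axis flip
    return img
```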

2.5. Model Building

This paper preserves the size and ratio information of the seeds and uses convolution kernels of different sizes in parallel to fuse features of multiple scales. ResNet34 [26] is used as the backbone network so that the network can be widened while its depth is preserved, ensuring that each newly added network layer is at least an identity mapping [27] and thus does not degrade the original network [28]. At the same time, an attention mechanism is used to strengthen important features, finally realizing the complete extraction of the feature information of Bidens L. seeds.
Therefore, this study establishes a residual network based on multiscale feature fusion and an attention mechanism named MSFF-ResNet to classify 4 different species of Bidens L. seeds. The specific structure is shown in Figure 5. The figure shows that MSFF-ResNet is composed mainly of the backbone network ResNet34, a padding–resizing module, a multiscale feature fusion module, and CBAM.
Before an image is input into the network, it first undergoes unified padding processing through the padding–resizing module, and pixels with a pixel value of 255 are added around each image to convert the image to a fixed size with equal length and width. Since the background was uniformly whitened in the preprocessing stage, the operation of adding white pixels does not affect the background. After that, the image is proportionally resized to a size of 224 × 224. This operation not only avoids the seed scale distortion caused by nonproportional scaling when images of different sizes are resized to 224 × 224 but also ensures that the scaling ratio of different seeds is the same; that is, the size relationship between different seeds does not change due to resizing. This not only preserves the ratio information and size information of the seeds and greatly reduces the distortion of the preprocessing process but also adjusts the image to the specifications required by the network.
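The padding–resizing step can be sketched as follows, assuming PIL images with a whitened background; the common canvas side of 1024 pixels is an assumed value standing in for the fixed square size described above.

```python
from PIL import Image

def pad_and_resize(img: Image.Image, canvas: int = 1024, out: int = 224) -> Image.Image:
    """Paste the image onto a fixed white square canvas, then resize to the network input.

    Because every image is padded to the same canvas size before resizing,
    all seeds are scaled by the same factor and their relative sizes and
    aspect ratios are preserved.
    """
    w, h = img.size
    board = Image.new("RGB", (canvas, canvas), (255, 255, 255))
    board.paste(img, ((canvas - w) // 2, (canvas - h) // 2))  # center the seed
    return board.resize((out, out), Image.BILINEAR)
```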
The multiscale feature fusion module consists of two inception layers nested in the first three residual blocks of the ResNet34 backbone network. The inception module is composed mainly of three convolution kernels of sizes 1 × 1, 3 × 3, and 5 × 5 and a 3 × 3 pooling layer [29]. Two inception layers with different parameters are used to replace the two 3 × 3 convolutions in the original residual block to enlarge the receptive field for shallow network feature extraction and to fuse features extracted from multiple scales. In the first inception module, four feature maps with 64, 128, 32, and 32 channels are extracted with 4 scales and spliced, while the feature maps of 4 scales spliced by the second module have 128, 192, 96, and 64 channels. Finally, the number of channels is reduced through a 1 × 1 convolution. The reason why the inception module is added to the first three residual blocks is that the shallow network extracts mainly shallow details, such as image textures, edges, and corners. Adding the inception module to the shallow layer is conducive to obtaining richer detail features.
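A PyTorch sketch of an inception-style block consistent with the channel splits described above (64 + 128 + 32 + 32 channels for the first module) is shown below; the widths of the 1 × 1 reduction layers inside the 3 × 3 and 5 × 5 branches are assumptions.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Four parallel branches (1x1, 3x3, 5x5, 3x3 max-pool) whose outputs
    are concatenated along the channel dimension for multiscale fusion."""

    def __init__(self, in_ch, c1, c3, c5, cp, reduce3=96, reduce5=16):
        super().__init__()
        self.b1 = nn.Sequential(nn.Conv2d(in_ch, c1, 1), nn.BatchNorm2d(c1), nn.ReLU(inplace=True))
        self.b3 = nn.Sequential(
            nn.Conv2d(in_ch, reduce3, 1), nn.ReLU(inplace=True),
            nn.Conv2d(reduce3, c3, 3, padding=1), nn.BatchNorm2d(c3), nn.ReLU(inplace=True))
        self.b5 = nn.Sequential(
            nn.Conv2d(in_ch, reduce5, 1), nn.ReLU(inplace=True),
            nn.Conv2d(reduce5, c5, 5, padding=2), nn.BatchNorm2d(c5), nn.ReLU(inplace=True))
        self.bp = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, cp, 1), nn.BatchNorm2d(cp), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate the four scales: the output has c1 + c3 + c5 + cp channels.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x), self.bp(x)], dim=1)

# Example: the first module's splits (64 + 128 + 32 + 32 = 256 channels).
block = InceptionBlock(in_ch=64, c1=64, c3=128, c5=32, cp=32)
```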
To reduce the number of calculations and parameters of the model, depthwise separable convolution [30] was used to replace the two 3 × 3 ordinary convolutions in the last 13 residual blocks in the original network, and a 1 × 1 convolution was added before the depthwise separable convolution to reduce the number of channels of the feature map to further reduce the amount of calculation. Then, a 1 × 1 convolution was added to adjust the number of channels to match that of the subsequent network layer.
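The replacement described here can be sketched as follows; the channel reduction ratio of the leading 1 × 1 convolution is an assumption.

```python
import torch.nn as nn

class DWSeparableBlock(nn.Module):
    """1x1 reduce -> depthwise 3x3 -> pointwise 1x1 expand.

    The depthwise convolution applies one filter per channel (groups = channels),
    cutting FLOPs and parameters relative to a full 3x3 convolution.
    """

    def __init__(self, channels, reduction=2):
        super().__init__()
        mid = channels // reduction  # assumed reduction ratio
        self.reduce = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.depthwise = nn.Conv2d(mid, mid, kernel_size=3, padding=1,
                                   groups=mid, bias=False)
        self.pointwise = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.reduce(x)      # shrink channels before the spatial convolution
        x = self.depthwise(x)   # one 3x3 filter per channel
        x = self.pointwise(x)   # restore the channel count for the next layer
        return self.relu(self.bn(x))
```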
The convolutional block attention module (CBAM) is a lightweight attention module that combines a spatial attention mechanism (SAM) and a channel attention mechanism (CAM) [31]. CBAM was added at the end of the last 13 residual blocks. The reason for not adding it after the multiscale feature fusion module is to prevent the large increase in the amounts of calculation and parameters caused by the stacking of the two modules. After each feature extraction, the feature map was weighted in the space and channel dimensions, giving the network the ability to learn important features and ignore useless features, thereby improving the efficiency of feature extraction.
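A compact sketch of CBAM following the published formulation [31] is given below; the reduction ratio of 16 and the 7 × 7 spatial kernel are the commonly used defaults and are assumed here rather than taken from the paper.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, applied as
    multiplicative re-weighting of the feature map."""

    def __init__(self, channels, reduction=16, kernel_size=7):
        super().__init__()
        # Channel attention: shared MLP over global average- and max-pooled vectors.
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
        # Spatial attention: 7x7 convolution over channel-wise average and max maps.
        self.spatial = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca  # re-weight channels
        sa = torch.sigmoid(self.spatial(
            torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa  # re-weight spatial positions
```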

2.6. Experimental Environment and Hyperparameters

The classification model and comparative experiments proposed in this study were all carried out in the Windows 11 operating environment, and the experimental model was implemented using the PyTorch deep learning framework. The specific experimental environment parameters are shown in Table 2.
To train the best model, this study used a variety of models to conduct a series of pre-experiments, comparing and analyzing the experimental results to determine the hyperparameters of the model. The size of the input image was set to 224 × 224 to fit most network structures. The learning rate is an important parameter in deep learning, which determines whether and how quickly the objective function converges to a local minimum. Four learning rates were tested in the pre-experiments: 0.1, 0.01, 0.001, and 0.0001. By comparison, it was found that the convergence of the network was best when the learning rate was set to 0.001. The batch size is the number of samples processed by the network model during one training step. In the pre-experiments, 4 batch sizes were tested, namely, 8, 16, 32, and 64, and the batch size was finally set to 32 after comparing the results. According to the convergence of the model, the number of epochs was set to 200. Finally, the adaptive moment estimation (Adam) optimizer was selected as the network optimizer, and the cross-entropy loss function was selected as the loss function of the model [32]. The hyperparameters of the specific models are listed in Table 3.
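These settings translate into a training setup like the following sketch; the model and data loaders are placeholders, and only the hyperparameters listed in Table 3 are taken from the paper.

```python
import torch
import torch.nn as nn

def train(model, train_loader, val_loader, device="cuda"):
    """Training loop with the reported hyperparameters: Adam, learning rate 0.001,
    batch size 32 (set in the loaders), 200 epochs, cross-entropy loss."""
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(200):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        # Validation accuracy per epoch.
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for images, labels in val_loader:
                preds = model(images.to(device)).argmax(dim=1).cpu()
                correct += (preds == labels).sum().item()
                total += labels.size(0)
        print(f"epoch {epoch + 1}: val accuracy {correct / total:.4f}")
```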

2.7. Evaluation Indicators

In the field of machine learning, confusion matrices are often used to summarize the results of model classification in supervised learning. In the confusion matrix used here, each row represents the predicted class, and each column represents the actual class. Taking a binary classification problem as an example, if the actual result and the predicted result are both positive, the prediction is recorded as TP; if the actual result is negative and the predicted result is positive, it is recorded as FP; if the actual result is positive and the predicted result is negative, it is recorded as FN; and if the actual result and the predicted result are both negative, it is recorded as TN [33]. The specific structure of the confusion matrix is shown in Table 4.
In this study, the confusion matrix was used to calculate the accuracy, precision, recall, and F1-score of each network as indicators to evaluate the performance of the model in classifying the species of Bidens L. In this multiclass classification task, each species was in turn considered positive, while all the others were considered negative. To measure the performance of the entire network, we calculated the average precision and recall over the 4 varieties. Precision and recall are usually a pair of conflicting indicators; hence, this paper uses the F1-score, the harmonic mean of precision and recall. The higher the F1-score, the higher the prediction accuracy of the model and the better its performance.
The calculation process for each evaluation index is shown in Table 5.
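The macro-averaged indicators can be computed from the predictions as in the sketch below, assuming integer class labels 0–3 as in Table 1; scikit-learn is used here purely for illustration.

```python
from sklearn.metrics import confusion_matrix, precision_recall_fscore_support

def evaluate(y_true, y_pred):
    """Macro-averaged precision, recall, and F1 over the four seed classes,
    treating each class in turn as positive and the others as negative."""
    cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, labels=[0, 1, 2, 3], average="macro", zero_division=0)
    accuracy = cm.trace() / cm.sum()  # overall fraction of correct predictions
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```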

3. Results

This study uses a combination of evaluation indicators and attention visualization heatmaps to compare, evaluate, and analyze different modules and networks.

3.1. Multiscale Feature Fusion Validation Experiment

The inception module uses a parallel structure to convolve or pool the input feature maps and then splices the features extracted by four different convolution kernels in the channel dimension to achieve multiscale feature fusion so that more information can be fused, the width of the network can be expanded, and richer features can be learned. This part explains the effect of the inception module.
Since the inception module is nested in the first three layers of the network to extract and fuse detailed features of different scales, this part extracts the feature maps of the four types of Bidens L. seeds in the network before and after adding the inception module for attention visualization to compare and evaluate the effect of multiscale feature extraction, as shown in Figure 6.

3.2. Attention Module Validation Experiment

As an attention module, CBAM can help the model generate a weighted map, which is multiplied by the input original image to finally generate a feature map of channel and spatial attention so that the model can focus on important information with high weight and ignore irrelevant information with low weight, realizing efficient allocation of processing resources.
To explore the role of CBAM in the feature selection process, this study designed a comparative experiment to evaluate the module. We trained the MSFF model with and without CBAM, evaluated both models on the test set, generated the Grad-CAM images of the two models, and compared the differences in how the two models focused on different features, as shown in Figure 7.
Table 6 compares the roles of the multiscale feature fusion module and CBAM in a quantitative way. After adding the feature fusion module to the original network, the accuracy rate of the network increased from 86.60% to 89.69%. After adding CBAM, the accuracy rate of the network increased from 89.69% to 93.81%.

3.3. Network Performance Comparison and Analysis

To evaluate the classification performance of the MSFF model, it was quantitatively and qualitatively evaluated and compared with the ResNet34, ResNet50, and ResNet101 deep residual models.
Figure 8 shows the relationship between the loss value of each model and the number of training iterations. The loss curve represents the deviation between the predicted value of the network and the actual value after different numbers of iterations. The smaller the loss value is, the stronger the classification ability of the network.
Figure 9 shows the relationship between the accuracy of each model and the number of iterations on the validation set. The accuracy curve describes the change in network classification accuracy as the number of iterations increases.
Table 7 compares several deep networks in terms of classification accuracy. The classification accuracies of ResNet34, ResNet50, and ResNet101 for Bidens L. seeds were similar, while the average classification accuracy, average precision, average recall, and average F1-score of MSFF-ResNet reached 93.81%, 94.74%, 94.21%, and 94.44%, respectively.
Table 8 shows the numbers of FLOPs and parameters of several deep networks.
To more comprehensively evaluate the classification ability of the network, this study constructed the confusion matrix of the classification results of four kinds of Bidens L. seeds on the test set to reflect the actual recognition of four similar Bidens L. seeds by MSFF-ResNet, as shown in Figure 10.

3.4. K-Fold Cross-Validation Experiment

In order to verify the rationality of the data distribution and the effectiveness of the model, this study designed a K-fold cross-validation experiment. K was set to 10: the entire dataset was evenly divided into 10 equal parts, and a total of 10 folds of training were performed. In each fold, a different part was used as the validation set, and the remaining data were used as the training set. The hyperparameters of the model during training were the same as before. The model accuracy for each fold is shown in Table 9 and Figure 11, and the accuracies of all 10 folds were averaged to evaluate the performance of the model.
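A sketch of the 10-fold procedure with scikit-learn is shown below; the build_model and train_and_score callables are placeholders standing in for the construction and training of MSFF-ResNet described earlier.

```python
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(dataset_indices, build_model, train_and_score, k=10, seed=0):
    """Run k-fold cross-validation and report per-fold and mean accuracy.

    dataset_indices: array of sample indices; build_model / train_and_score
    are placeholder callables for constructing and training the classifier.
    """
    kf = KFold(n_splits=k, shuffle=True, random_state=seed)
    accuracies = []
    for fold, (train_idx, val_idx) in enumerate(kf.split(dataset_indices), start=1):
        model = build_model()
        acc = train_and_score(model, train_idx, val_idx)  # returns validation accuracy
        accuracies.append(acc)
        print(f"fold {fold}: accuracy {acc:.4f}")
    print(f"mean accuracy: {np.mean(accuracies):.4f}")
    return accuracies
```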
The accuracy of each fold remained above 90% without large fluctuations, indicating that the data distribution was relatively balanced and stable. At the same time, the average accuracy reached 92.39%, which is very close to the classification accuracy obtained in the previous experiment, further verifying the validity of the experiment and the performance of the model.

4. Discussion

According to Figure 6, when the inception module was not added, the model paid limited and sparse attention to the features of Bidens L. seeds. Although certain attention was given to the seed features, the weight was not high, and the attention to detail was not sufficient. After the inception module was added, the model gave more attention to different features and became more sensitive to the extraction of details of different scales. It added more features to the model classification and gave a certain attention weight around the seed outline to capture the edge features of Bidens L. seeds. Therefore, the comparison clearly shows that the inception module with multiscale feature fusion played a significant role in feature extraction in the model.
Figure 7a shows an original image of each of the four types of seeds, and Figure 7b,c shows the images obtained using Grad-CAM of the feature extraction layer without and with the attention module added. The figure shows that after adding CBAM, the model's attention to different features of the image changed. For B. pilosa, the addition of CBAM made the model give more attention to the head and tail of the seed. The addition of the attention mechanism also helped the model eliminate useless features and supplement key features, as shown in the class heatmap of B. bipinnata. As a variant of B. pilosa, B. pilosa var. radiata has a very similar appearance and shape. Therefore, before adding CBAM, the model's attention to features was not ideal. After adding CBAM, the shape and number of seed head needles received attention, and additional spatial scale information was extracted, as shown in the rectangle in the figure. B. biternata differs from the other types of seeds in terms of length and aspect ratio, and after the addition of CBAM, the model not only gave attention to the seed head but also gave increased attention to the size of the seed and extracted the spatial feature information, as shown in the bright rectangular box in the figure.
According to a comparative analysis of the Grad-CAM images, the addition of CBAM provided significant help in the feature selection process of the model. CBAM not only reduced the model’s attention to irrelevant features, thereby saving more resources to focus on other features, but also made up for some detailed texture feature information that the model omitted. At the same time, the global semantic features extracted by the deep network were given a higher attention weight, and the global information was also used as an important basis for classification.
In addition, according to Table 6, the multiscale feature fusion module and CBAM significantly improved the recognition accuracy of the network, which once again verifies that they make important contributions to the improvement of network performance.
In Figure 8, in the initial stage of training, the loss values of all the networks constantly decreased and eventually stabilized. Although the convergence speed of MSFF-ResNet was lower than those of several other networks, after the 50th epoch, while the loss curves of the other networks fluctuated greatly, the loss curve of MSFF-ResNet still maintained a stable downward trend, obviously outperforming the other networks in terms of fluctuation range. In Figure 9, in the initial stage of training, although the validation accuracy of MSFF-ResNet was lower than those of the other networks, it caught up with a growth rate significantly higher than those of the other networks. At the 25th epoch, the validation accuracy of the MSFF model surpassed those of the other networks and continued to grow, widening the gap with the other networks. At approximately epoch 60, the validation accuracy of MSFF-ResNet reached 90%. In the middle and late stages of training, the verification accuracy of MSFF-ResNet was significantly higher than those of the other networks, and there was no overfitting situation where the accuracy rate suddenly and continuously dropped. Therefore, the results show that the network has high recognition accuracy and strong generalization ability.
According to Table 7, compared with other models, MSFF-ResNet had better performance and accuracy. ResNet34, ResNet50, and ResNet101 had similar classification accuracies; that is, as the number of network layers increased, the classification accuracy of the model no longer significantly improved, which shows that the deep residual network reached a bottleneck in the recognition of Bidens L. seeds. With the addition of the multiscale feature fusion module and attention mechanism, MSFF-ResNet broke through this bottleneck and realized significantly improved classification accuracy.
According to Table 8, ResNet34, as the network with the fewest layers among the three deep residual networks, had the fewest FLOPs and parameters among them. As the number of network layers increased, the numbers of FLOPs and parameters of the network also increased. MSFF-ResNet uses depthwise separable convolution instead of ordinary convolution, successfully reducing the number of parameters to 6.15 M and the number of FLOPs to 5.95 G. Among all the networks, MSFF-ResNet had the fewest parameters by far. Since the addition of the multiscale feature fusion module increased the number of channels of the feature map, resulting in a larger amount of calculation, the number of FLOPs of MSFF-ResNet was higher than those of ResNet34 and ResNet50, but the differences were not large. It is acceptable to incur FLOPs increases of 2.27 G and 1.83 G in exchange for accuracy improvements of 7.21% and 8.24%, respectively. According to our analysis, the comparison above is sufficient to illustrate the feasibility of MSFF-ResNet.
In Figure 10, according to the confusion matrix, due to the high similarity between the seeds of different species of Bidens L., some species were misclassified, such as B. pilosa. However, compared with manual identification, the network still had higher accuracy and identification efficiency, and most of the seed samples were correctly identified. In conclusion, through the analysis of the confusion matrix, classification accuracy, and model performance, it was found that the model has important practical significance for the identification and classification of Bidens L. seeds.

5. Conclusions

In this study, a nondestructive classification method for Bidens L. seeds was proposed that can automatically identify different varieties of Bidens L. seeds from images, thereby overcoming the problem that the traditional method is not fully applicable to the classification of Bidens L. seeds and making up for the lack of needle-shaped seed classification in the field of seed image recognition.
The main contribution of this study is a model construction idea. On the basis of using a deep residual structure to ensure the depth of the model, a multiscale feature fusion module is used to make up for its shortcomings, expanding the width of the model while preserving its depth and achieving a balance between depth and width. This enables the model to capture features of different scales while retaining the ability to extract high-level abstract information. The effect of the model was verified in subsequent experiments. Through the experiments, we found that the multiscale feature fusion module and the attention module are a very good combination because the attention mechanism can effectively screen the large number of different types of features captured through multiple receptive fields. When increasing the number of layers of the deep residual network could no longer improve the classification accuracy for Bidens L. seeds, the addition of the multiscale fusion module and the attention module broke through this bottleneck; focusing on and extracting features of Bidens L. seeds at different scales is effective for improving classification accuracy. These results show that the method has clear advantages and potential in the field of seed classification and identification and can greatly facilitate scientific research and agricultural production.
However, this method still has various problems. Through the experiments, it was found that the model misclassifies highly similar seeds. In future research, we will continue to optimize and improve the model so that it can achieve higher accuracy when classifying highly similar seeds.
In addition to the improvement and optimization of the model, this study still has room for further development. In future research, software can be developed based on the model proposed in this study so that the model can be called upon for classification in agricultural work by operating the software. In this way, relevant agricultural technicians can classify the various mixed Bidens L. seeds collected, distinguish the subclasses with medical value and plant them separately, and eliminate the other harmful subclasses to prevent them from robbing crops of nutrition and living space. At the same time, the GPU and other hardware requirements of the models and methods proposed in this study are not high, and the model reduces the number of parameters by introducing depthwise separable convolution. In addition, the computing power of various mobile devices is constantly improving, which makes it possible to port this method to embedded devices and even smartphones. With the continuous improvement of this study, it will become more convenient to distinguish highly similar subclasses of Bidens L. seeds, and it will also become possible to break the space and hardware limitations of classification and recognition.

Author Contributions

Conceptualization, Z.Z., Y.H., Y.C., Z.L., B.L., C.L., C.H., W.Q., S.Z. and X.Q.; data curation, Z.Z., Y.C. and X.Q.; formal analysis, Z.Z.; funding acquisition, Y.H., W.Q. and X.Q.; investigation, Y.H. and X.Q.; methodology, Z.Z. and Y.C.; project administration, B.L., C.L., C.H., W.Q., S.Z. and X.Q.; resources, B.L., C.L., C.H., W.Q., S.Z. and X.Q.; software, Z.Z. and Z.L.; supervision, Y.H., W.Q., S.Z. and X.Q.; validation, Z.Z., Y.C. and Z.L.; visualization, Z.Z.; writing—original draft, Z.Z.; writing—review and editing, Z.Z., Y.H., Y.C., Z.L., B.L., C.L., C.H., W.Q., S.Z. and X.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (2021YFD1400100, 2021YFD1400102, and 2021YFD1400101), the Guangxi Natural Science Foundation of China (2021JJA130221), and the Agricultural Science and Technology Innovation Program.

Data Availability Statement

All the data mentioned in the paper are available from the corresponding author.

Acknowledgments

The authors would like to thank the National Key Research and Development Program of China, the Guangxi Natural Science Foundation of China and the Agricultural Science and Technology Innovation Program.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ansari, N.; Ratri, S.S.; Jahan, A.; Ashik-E-Rabbani, M.; Rahman, A. Inspection of paddy seed varietal purity using machine vision and multivariate analysis. J. Agric. Food Res. 2021, 3, 100109.
2. Qadri, S.; Furqan Qadri, S.; Razzaq, A.; Ul Rehman, M.; Ahmad, N.; Nawaz, S.A.; Saher, N.; Akhtar, N.; Khan, D.M. Classification of canola seed varieties based on multi-feature analysis using computer vision approach. Int. J. Food Prop. 2021, 24, 493–504.
3. Yang, L.; Yan, J.; Li, H.; Cao, X.; Ge, B.; Qi, Z.; Yan, X. Real-Time Classification of Invasive Plant Seeds Based on Improved YOLOv5 with Attention Mechanism. Diversity 2022, 14, 254.
4. El-Hakim, A.F.A.; Mady, E.; Tahoun, A.M.A.; Ghaly, M.S.; Eissa, M.A. Seed Quality and Protein Classification of Some Quinoa Varieties. J. Ecol. Eng. 2022, 23, 24–33.
5. Li, G.; Chen, M.; Chen, J.; Shang, Y.; Lian, X.; Wang, P.; Lei, H.; Ma, Q. Chemical composition analysis of pomegranate seeds based on ultra-high-performance liquid chromatography coupled with quadrupole-Orbitrap high-resolution mass spectrometry. J. Pharm. Biomed. Anal. 2020, 187, 113357.
6. Kumar, R.; Janila, P.; Vishwakarma, M.K.; Khan, A.W.; Manohar, S.S.; Gangurde, S.S.; Variath, M.T.; Shasidhar, Y.; Pandey, M.K.; Varshney, R.K. Whole-genome resequencing-based QTL-seq identified candidate genes and molecular markers for fresh seed dormancy in groundnut. Plant Biotechnol. J. 2020, 18, 992–1003.
7. de Medeiros, A.D.; Capobiango, N.P.; da Silva, J.M.; da Silva, L.J.; da Silva, C.B.; Dos Santos Dias, D.C.F. Interactive machine learning for soybean seed and seedling quality classification. Sci. Rep. 2020, 10, 11267.
8. Luo, T.; Zhao, J.; Gu, Y.; Zhang, S.; Qiao, X.; Tian, W.; Han, Y. Classification of weed seeds based on visual images and deep learning. Inf. Process. Agric. 2023, 10, 40–51.
9. Xu, P.; Tan, Q.; Zhang, Y.; Zha, X.; Yang, S.; Yang, R. Research on Maize Seed Classification and Recognition Based on Machine Vision and Deep Learning. Agriculture 2022, 12, 232.
10. Bai, X.; Zhang, C.; Xiao, Q.; He, Y.; Bao, Y. Application of near-infrared hyperspectral imaging to identify a variety of silage maize seeds and common maize seeds. RSC Adv. 2020, 10, 11707–11715.
11. Xia, C.; Yang, S.; Huang, M.; Zhu, Q.; Guo, Y.; Qin, J. Maize seed classification using hyperspectral image coupled with multi-linear discriminant analysis. Infrared Phys. Technol. 2019, 103, 103077.
12. Wang, A.; Zhang, W.; Wei, X. A review on weed detection using ground-based machine vision and image processing techniques. Comput. Electron. Agric. 2019, 158, 226–240.
13. Kiratiratanapruk, K.; Temniranrat, P.; Sinthupinyo, W.; Prempree, P.; Chaitavon, K.; Porntheeraphat, S.; Prasertsak, A. Development of Paddy Rice Seed Classification Process using Machine Learning Techniques for Automatic Grading Machine. J. Sens. 2020, 2020, 7041310.
14. Huang, Z.; Wang, R.; Cao, Y.; Zheng, S.; Teng, Y.; Wang, F.; Wang, L.; Du, J. Deep learning based soybean seed classification. Comput. Electron. Agric. 2022, 202, 107393.
15. Chen, Y.; Huang, Y.; Zhang, Z.; Wang, Z.; Liu, B.; Liu, C.; Huang, C.; Dong, S.; Pu, X.; Wan, F.; et al. Plant image recognition with deep learning: A review. Comput. Electron. Agric. 2023, 212, 108072.
16. Zhao, G.; Yang, R.; Jing, X.; Zhang, H.; Wu, Z.; Sun, X.; Jiang, H.; Li, R.; Wei, X.; Fountas, S.; et al. Phenotyping of individual apple tree in modern orchard with novel smartphone-based heterogeneous binocular vision and YOLOv5s. Comput. Electron. Agric. 2023, 209, 107814.
17. Hamid, Y.; Wani, S.; Soomro, A.B.; Alwan, A.A.; Gulzar, Y. Smart Seed Classification System based on MobileNetV2 Architecture. In Proceedings of the 2022 2nd International Conference on Computing and Information Technology (ICCIT), Tabuk, Saudi Arabia, 25–27 January 2022; pp. 217–222.
18. Loddo, A.; Loddo, M.; Di Ruberto, C. A novel deep learning based approach for seed image classification and retrieval. Comput. Electron. Agric. 2021, 187, 106269.
19. Javanmardi, S.; Miraei Ashtiani, S.-H.; Verbeek, F.J.; Martynenko, A. Computer-vision classification of corn seed varieties using deep convolutional neural network. J. Stored Prod. Res. 2021, 92, 101800.
20. Sabanci, K.; Aslan, M.F.; Ropelewska, E.; Unlersen, M.F. A convolutional neural network-based comparative study for pepper seed classification: Analysis of selected deep features with support vector machine. J. Food Process Eng. 2021, 45, e13955.
21. Lin, W.; Shu, L.; Zhong, W.; Lu, W.; Ma, D.; Meng, Y. Online classification of soybean seeds based on deep learning. Eng. Appl. Artif. Intell. 2023, 123, 106434.
22. Lei, B.; Fan, J. Image thresholding segmentation method based on minimum square rough entropy. Appl. Soft Comput. 2019, 84, 105687.
23. Chen, Y.; Wang, M.; Heidari, A.A.; Shi, B.; Hu, Z.; Zhang, Q.; Chen, H.; Mafarja, M.; Turabieh, H. Multi-threshold image segmentation using a multi-strategy shuffled frog leaping algorithm. Expert Syst. Appl. 2022, 194, 116511.
24. Liu, H.; Zhou, M.; Liu, Q. An embedded feature selection method for imbalanced data classification. IEEE/CAA J. Autom. Sin. 2019, 6, 703–715.
25. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60.
26. Wang, F.; Qiu, J.; Wang, Z.; Li, W. Intelligent recognition of surface defects of parts by Resnet. J. Phys. Conf. Ser. 2021, 1883, 012178.
27. Zhang, K.; Tang, B.; Deng, L.; Liu, X. A hybrid attention improved ResNet based fault diagnosis method of wind turbines gearbox. Measurement 2021, 179, 109491.
28. Kumar, V.; Arora, H.; Harsh; Sisodia, J. ResNet-based approach for Detection and Classification of Plant Leaf Diseases. In Proceedings of the 2020 International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 2–4 July 2020; pp. 495–502.
29. Dong, N.; Zhao, L.; Wu, C.H.; Chang, J.F. Inception v3 based cervical cell classification combined with artificially extracted features. Appl. Soft Comput. 2020, 93, 106311.
30. Kc, K.; Yin, Z.; Wu, M.; Wu, Z. Depthwise separable convolution architectures for plant disease classification. Comput. Electron. Agric. 2019, 165, 104948.
31. Su, H.; Wang, X.; Han, T.; Wang, Z.; Zhao, Z.; Zhang, P. Research on a U-Net Bridge Crack Identification and Feature-Calculation Methods Based on a CBAM Attention Mechanism. Buildings 2022, 12, 1561.
32. Noor, T.H.; Noor, A.; Alharbi, A.F.; Faisal, A.; Alrashidi, R.; Alsaedi, A.S.; Alharbi, G.; Alsanoosy, T.; Alsaeedi, A. Real-Time Arabic Sign Language Recognition Using a Hybrid Deep Learning Model. Sensors 2024, 24, 3683.
33. Bi, C.; Hu, N.; Zou, Y.; Zhang, S.; Xu, S.; Yu, H. Development of Deep Learning Methodology for Maize Seed Variety Recognition Based on Improved Swin Transformer. Agronomy 2022, 12, 1843.
Figure 1. Four different seeds of Bidens L. (a) B. pilosa. (b) B. bipinnata. (c) B. pilosa var. radiata. (d) B. biternata.
Figure 2. Comparison of segmentation results. (a) Original image. (b) Threshold segmentation result. (c) Semantic segmentation result.
Figure 3. Bidens L. seed image segmentation and separation processing.
Figure 4. Dataset sample distribution.
Figure 5. MSFF-ResNet structure.
Figure 6. Impact of the multiscale feature fusion module on feature extraction. (a) Original images. (b) Images obtained using the Grad-CAM of the feature extraction layer without the multiscale feature fusion block. (c) Images obtained using the Grad-CAM of the feature extraction layer with the multiscale feature fusion block.
Figure 7. Impact of CBAM on feature extraction. (a) Original images. (b) Images obtained using Grad-CAM of the feature extraction layer without CBAM. (c) Images obtained using Grad-CAM of the feature extraction layer with CBAM.
Figure 8. Training loss of different models.
Figure 9. Validation accuracy of different models.
Figure 10. Confusion matrix of MSFF-ResNet.
Figure 11. Results of K-fold cross-validation experiment.
Table 1. Changes in the numbers of samples in the training dataset with the data augmentation process.

Seed Category | Number of Original Samples | Number of Enhanced Samples | Label
B. pilosa | 224 | 896 | 0
B. bipinnata | 75 | 750 | 1
B. pilosa var. radiata | 153 | 756 | 2
B. biternata | 131 | 786 | 3
Total | 583 | 3188 | /
Table 2. Experimental environment configuration.

Parameter | Configuration
Operating system | Windows 11
Development framework | PyTorch 1.12.1
Development language | Python 3.9.2
CUDA | 11.7
GPU | GeForce RTX 2060
RAM | 32 GB
Table 3. Hyperparameters for training.

Hyperparameter | Value
Learning rate | 0.001
Batch size | 32
Epochs | 200
Optimizer | Adam
Loss function | Cross-entropy loss
Table 4. Confusion matrix of binary classification.

Confusion Matrix | Actual Positive | Actual Negative
Predicted Positive | TP | FP
Predicted Negative | FN | TN
Table 5. Evaluation indicators and formulas.

Indicator | Formula | Significance
m-Accuracy | $mAccuracy = \frac{TP + TN}{TP + TN + FP + FN}$ | Proportion of correct predictions by the model for class m
m-Precision | $mPrecision = \frac{TP}{TP + FP}$ | Proportion of samples predicted as positive that are correctly predicted
m-Recall | $mRecall = \frac{TP}{TP + FN}$ | Proportion of positive samples that are correctly predicted
Accuracy | $Accuracy = \frac{1}{4}\sum_{m=1}^{4} mAccuracy$ | Average accuracy over the 4 varieties
Precision | $Precision = \frac{1}{4}\sum_{m=1}^{4} mPrecision$ | Average precision over the 4 varieties
Recall | $Recall = \frac{1}{4}\sum_{m=1}^{4} mRecall$ | Average recall over the 4 varieties
F1-score | $F1 = \frac{2 \times Recall \times Precision}{Recall + Precision}$ | Harmonic mean of precision and recall
Table 6. Classification and comparison results of different networks.

Evaluation Indicator | Baseline | Multiscale Feature | Multiscale Feature + CBAM
Average accuracy (%) | 86.60 | 89.69 | 93.81
Average precision (%) | 88.30 | 91.67 | 94.74
Average recall (%) | 89.45 | 90.47 | 94.21
Average F1-score (%) | 88.87 | 91.07 | 94.44
Table 7. Classification and comparison results of different networks.

Evaluation Indicator | MSFF-ResNet | ResNet34 | ResNet50 | ResNet101
Average accuracy (%) | 93.81 | 86.60 | 85.57 | 86.60
Average precision (%) | 94.74 | 88.30 | 87.28 | 88.35
Average recall (%) | 94.21 | 89.45 | 87.35 | 88.25
Average F1-score (%) | 94.44 | 88.87 | 87.31 | 88.30
Table 8. FLOPs and parameter comparison results of different networks.

Network | MSFF-ResNet | ResNet34 | ResNet50 | ResNet101
FLOPs (G) | 5.95 | 3.68 | 4.12 | 7.85
Parameters (M) | 6.15 | 21.8 | 25.56 | 44.55
Table 9. Results of K-fold cross-validation experiment.

Fold | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Average
Accuracy | 93.17% | 91.69% | 92.36% | 90.87% | 92.43% | 92.54% | 90.29% | 94.13% | 93.78% | 92.64% | 92.39%