1. Introduction
Plant seeds are the basis of agricultural production. As a genus of the Compositae family, Bidens L. comprises many species, such as Bidens pilosa L., Bidens pilosa L. var. radiata, Bidens bipinnata L., and Bidens biternata (Lour.) Merr. & Sherff. Among them, B. bipinnata and B. biternata have important production and medical value. However, B. pilosa and B. pilosa var. radiata are invasive species that multiply rapidly when they grow alongside crops in the field, robbing other plants of nutrients and living space and thereby posing a serious threat to their survival, even to the point of local extinction. Moreover, as closely related species within the same genus, their seeds are not only highly similar in color and shape but are also easily mixed together because they grow in the same habitats. Therefore, it is necessary to classify the seeds of the different species of Bidens L.
Traditionally, seed varieties are distinguished by visual inspection [1,2]. However, this method suffers from high error rates, large time consumption, and labor intensiveness. Especially when seeds of the same genus exhibit only slight morphological differences, identifying them without the assistance of specialists is very challenging [3]. Biochemical identification methods, such as seed protein electrophoresis [4], high-performance liquid chromatography [5], and DNA molecular markers [6], have also been applied to seed identification. Despite their high accuracy, these methods are irreversibly destructive to the seeds [7], and the identification cost is high [8,9,10,11]. Therefore, automated methods for the nondestructive, accurate, and efficient detection and identification of seeds are vitally important.
With advances in electronic and information technologies, computer vision methods combined with image processing techniques have become promising tools for precise real-time weed seed detection [12]. Many computer vision methods have shifted from statistical approaches to deep learning because the latter offers greater accuracy for tasks such as image segmentation and image classification [13]. These two tasks are important components of image processing, and convolutional neural networks (CNNs) are widely used for them in various scenarios [14,15,16]. CNNs combine feature learning and classification within a stack of layers, which makes them more accurate at feature extraction than traditional AI-based algorithms [17].
Loddo et al. [18] used the gray histogram method and automatic threshold segmentation to segment the images of a Canadian dataset and a local dataset, respectively, with the seed scale indicators removed. They proposed a new CNN seed classification model called SeedNet, which achieved 95.24% and 97.47% accuracy on the two datasets, outperforming other models. Javanmardi et al. [19] selected the L*a*b* color space, which offers the strongest contrast between seeds and background, used a multi-threshold method to remove the background and segment corn seeds, and extracted features from the corn images by combining handcrafted feature extraction with CNN feature extraction. These features were then passed to a machine learning classifier to distinguish nine varieties of corn. Sabanci et al. [20] used manual cropping to segment images of pepper seeds and applied two methods to classify four kinds of pepper seeds: the first trained the CNN models ResNet18 and ResNet50 for classification, and the second fused the features extracted by the CNNs, performed feature selection, and classified with a support vector machine (SVM). The first method achieved classification accuracies of 98.05% and 97.07%, and the second achieved 99.02%. Lin et al. [21] segmented images of soybean seeds using multiscale retinex with color restoration (MSRCR) and then used the recognition model SoyNet, with appropriate parameters, for four-class soybean classification. The F-scores for normal, damaged, abnormal, and non-classifiable soybeans reached about 95.97%, 97.41%, 97.25%, and 96.14%, respectively. In summary, previous research has used various segmentation methods for image preprocessing and has combined machine learning and deep learning for classification, demonstrating the feasibility of detecting various kinds of seeds.
Although previous research on seed classification has achieved high accuracy, several problems remain. First, most previous studies segmented seeds by threshold segmentation or manual segmentation. Both require the segmentation threshold and morphological operation parameters to be set, and the appropriate parameters differ depending on the size of the noise to be eliminated. Improper settings may leave noise incompletely removed, increasing the complexity of subsequent work, or may mistakenly delete seed pixels, which degrades the accuracy of subsequent classification. These demanding parameter requirements mean that such approaches cannot achieve high-precision automatic segmentation. Second, most previous studies on seed classification considered round seeds, and few have addressed seeds of other shapes. The characteristics of the needle-shaped seeds of Bidens L. differ considerably from those of round seeds, so the feature extraction methods of previous research are not fully suitable for classifying them. Finally, networks require each input image to be resized to a fixed resolution, such as 224 × 224, so most studies preprocessed images by cropping and resizing before input, causing changes in or loss of seed information. For the seeds of Bidens L. in particular, the length, number, and size of the needles and the texture of the trunk part are very important cues for classification; hence, the impact of information loss on their classification is more serious.
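The parameter sensitivity of global thresholding described above can be illustrated with a minimal sketch. The toy image, threshold values, and the "noise"/"seed-edge" labels below are hypothetical and are not taken from any of the cited studies; the point is only that no single threshold simultaneously removes bright noise and preserves dim seed pixels.

```python
# Illustrative sketch: why global threshold segmentation is
# parameter-sensitive. All values below are hypothetical.

def threshold_mask(img, t):
    """Binarize a grayscale image: pixel is foreground if intensity > t."""
    return [[1 if px > t else 0 for px in row] for row in img]

# 5x5 toy image: a bright seed blob (~200), a dim seed-edge pixel (90),
# and an isolated bright noise speck (120).
img = [
    [10, 10, 10, 10, 120],   # 120 = noise speck
    [10, 200, 200, 10, 10],
    [10, 200, 200, 90, 10],  # 90 = dim seed-edge pixel
    [10, 200, 200, 10, 10],
    [10, 10, 10, 10, 10],
]

low = threshold_mask(img, 80)    # permissive: keeps the edge pixel AND the noise
high = threshold_mask(img, 150)  # strict: drops the noise but loses the edge

print(low[0][4], low[2][3])    # 1 1  -> noise retained, edge retained
print(high[0][4], high[2][3])  # 0 0  -> noise removed, edge lost too
```

Either choice forces a follow-up morphological cleanup whose kernel size must again be tuned to the noise scale, which is exactly the dependence on hand-set parameters criticized above.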
To solve the problems above, an automatic classification method based on automatic segmentation and multiscale feature fusion with preserved scale information is proposed for the efficient detection and classification of seeds with high appearance similarity.
To improve the generality, robustness, and accuracy of needle-shaped seed recognition, we designed a new model, MSFF-ResNet, which uses ResNet34 as the backbone network and contains a multiscale feature fusion module and an attention module. Local information about Bidens L. seeds, such as the needle-shaped parts of the head, and global information, such as the length-to-width ratio, are captured by the different receptive fields of the multiscale feature fusion module, while the attention module weighs these features, emphasizing those that contribute most to classification and suppressing useless or redundant ones. Meanwhile, depthwise separable convolution replaces some of the original convolutions to reduce the numbers of calculations and parameters. Ablation and comparative experiments demonstrate the effectiveness of the designed model. This method provides a new strategy for the precise classification of needle-shaped seeds.
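The core idea of the multiscale feature fusion module, running the same input through parallel branches with different receptive fields and concatenating the results, can be sketched in one dimension. This is a simplified illustration only: the input signal and averaging kernels below are hypothetical, and the actual module uses learned 2-D convolutions inside the ResNet34 backbone.

```python
# Minimal 1-D sketch of multiscale feature fusion: the same input is
# filtered by kernels with different receptive fields and the branch
# outputs are kept side by side, as in an inception-style block.
# Kernels and input are hypothetical stand-ins for learned 2-D filters.

def conv1d_same(signal, kernel):
    """'Same'-padded 1-D convolution (zero padding at the borders)."""
    k = len(kernel)
    pad = k // 2
    padded = [0.0] * pad + signal + [0.0] * pad
    return [sum(padded[i + j] * kernel[j] for j in range(k))
            for i in range(len(signal))]

def multiscale_fusion(signal, kernels):
    """Run each kernel over the signal and collect the branch outputs."""
    return [conv1d_same(signal, k) for k in kernels]

signal = [0.0, 1.0, 3.0, 1.0, 0.0, 0.0]        # toy "seed profile"
kernels = [
    [1.0],                                      # 1-tap: local intensity
    [1/3, 1/3, 1/3],                            # 3-tap: small neighborhood
    [1/5, 1/5, 1/5, 1/5, 1/5],                  # 5-tap: wider context
]
branches = multiscale_fusion(signal, kernels)

# Each branch responds to the same peak at index 2, but wider kernels
# smooth it over a larger context: fine detail vs. global shape.
print([round(b[2], 2) for b in branches])
```

In the full model, such concatenated branches feed the attention module, which decides which scale's response matters for a given seed.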
4. Discussion
According to Figure 6, when the inception module was not added, the model paid limited and sparse attention to the features of Bidens L. seeds: although some attention was given to seed features, the weights were not high, and attention to detail was insufficient. After the inception module was added, the model attended to more features and became more sensitive to details at different scales. It contributed more features to the classification and assigned attention weight around the seed outline, capturing the edge features of Bidens L. seeds. The comparison therefore clearly shows that the inception module with multiscale feature fusion played a significant role in the model's feature extraction.
Figure 7a shows an original image of each of the four types of seeds, and Figure 7b,c shows the Grad-CAM images of the feature extraction layer without and with the attention module, respectively. The figure shows that adding CBAM changed the model's attention to different image features. For B. pilosa, CBAM led the model to attend more to the head and tail of the seed. The attention mechanism also helped the model discard useless features and supplement key features, as shown in the class heatmap of B. bipinnata. As a variant of B. pilosa, B. pilosa var. radiata is very similar in appearance and shape; before CBAM was added, the model's attention to its features was therefore not ideal, whereas afterward the shape and number of the seed head needles received attention and additional spatial scale information was extracted, as shown by the rectangle in the figure. B. biternata differs from the other seed types in length and aspect ratio, and after CBAM was added, the model attended not only to the seed head but also to the size of the seed, extracting spatial feature information, as shown by the bright rectangular box in the figure.
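For readers unfamiliar with how such class heatmaps are produced, the core Grad-CAM computation is small: each channel of the last convolutional layer is weighted by the spatial average of the class-score gradient, the weighted maps are summed, and a ReLU keeps only positively contributing regions. The toy activations and gradients below are hypothetical values for illustration.

```python
# Minimal sketch of the Grad-CAM computation behind class heatmaps:
# channel weight = global-average-pooled gradient; heatmap = ReLU of the
# weighted activation sum. Toy CxHxW values below are hypothetical.

def grad_cam(activations, gradients):
    """activations, gradients: lists of C feature maps, each H x W."""
    h, w = len(activations[0]), len(activations[0][0])
    heatmap = [[0.0] * w for _ in range(h)]
    for a_map, g_map in zip(activations, gradients):
        # alpha_c: spatial average of the gradient for this channel
        alpha = sum(sum(row) for row in g_map) / (h * w)
        for i in range(h):
            for j in range(w):
                heatmap[i][j] += alpha * a_map[i][j]
    # ReLU: keep only regions that push the class score up
    return [[max(0.0, v) for v in row] for row in heatmap]

# Two 2x2 channels: channel 0 fires on the "seed head" (top-left) with a
# positive gradient; channel 1 has a negative gradient, so its region is
# suppressed by the final ReLU.
acts  = [[[4.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 2.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
print(grad_cam(acts, grads))  # top-left highlighted, bottom-right zeroed
```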
According to a comparative analysis of the Grad-CAM images, the addition of CBAM provided significant help in the feature selection process of the model. CBAM not only reduced the model’s attention to irrelevant features, thereby saving more resources to focus on other features, but also made up for some detailed texture feature information that the model omitted. At the same time, the global semantic features extracted by the deep network were given a higher attention weight, and the global information was also used as an important basis for classification.
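The two-stage reweighting that CBAM performs, channel attention from pooled per-channel statistics followed by spatial attention from per-pixel statistics across channels, can be sketched in pure Python. This shows the data flow only: the shared MLP and the 7×7 convolution of the real CBAM module are replaced here by identity mappings, and the toy input is hypothetical.

```python
import math

# Simplified CBAM-style data flow: channel attention, then spatial
# attention. The learned MLP and 7x7 conv of real CBAM are omitted
# (identity mappings), so this is a structural sketch only.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_attention(fmaps):
    """Scale each C x H x W channel by sigmoid(avg-pool + max-pool)."""
    scaled = []
    for ch in fmaps:
        flat = [v for row in ch for v in row]
        w = sigmoid(sum(flat) / len(flat) + max(flat))
        scaled.append([[v * w for v in row] for row in ch])
    return scaled

def spatial_attention(fmaps):
    """Scale each pixel by sigmoid(mean + max across channels)."""
    c, h, w = len(fmaps), len(fmaps[0]), len(fmaps[0][0])
    out = []
    for ch in fmaps:
        out_ch = []
        for i in range(h):
            row = []
            for j in range(w):
                col = [fmaps[k][i][j] for k in range(c)]
                s = sigmoid(sum(col) / c + max(col))
                row.append(ch[i][j] * s)
            out_ch.append(row)
        out.append(out_ch)
    return out

def cbam(fmaps):
    return spatial_attention(channel_attention(fmaps))

# Toy 2-channel 2x2 input: the strong positive response is largely
# preserved, while the weak/negative response is attenuated toward zero.
x = [[[3.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, -3.0]]]
y = cbam(x)
print(round(y[0][0][0], 3), round(y[1][1][1], 3))
```

Even this stripped-down version exhibits the behavior discussed above: strongly responding channels and locations retain most of their magnitude, while weak ones are damped, which is how CBAM "screens" the many features produced by the multiscale branches.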
In addition, according to Table 6, the multiscale feature fusion module and CBAM significantly improved the recognition accuracy of the network, which again verifies their important contributions to the improvement of network performance.
In Figure 8, in the initial stage of training, the loss values of all the networks decreased continuously and eventually stabilized. Although MSFF-ResNet converged more slowly than several of the other networks, after the 50th epoch the loss curves of the other networks fluctuated considerably, while that of MSFF-ResNet maintained a stable downward trend, clearly outperforming them in terms of fluctuation range. In Figure 9, although the validation accuracy of MSFF-ResNet was initially lower than those of the other networks, it caught up at a significantly higher growth rate. At the 25th epoch, the validation accuracy of MSFF-ResNet surpassed those of the other networks and continued to grow, widening the gap. At approximately epoch 60, it reached 90%. In the middle and late stages of training, the validation accuracy of MSFF-ResNet remained significantly higher than those of the other networks, with no sign of overfitting such as a sudden, sustained drop in accuracy. These results show that the network has high recognition accuracy and strong generalization ability.
According to Table 7, MSFF-ResNet outperformed the other models in both performance and accuracy. ResNet34, ResNet50, and ResNet101 had similar classification accuracies; that is, increasing the number of network layers no longer significantly improved accuracy, which shows that deep residual networks reached a bottleneck in the recognition of Bidens L. seeds. With the addition of the multiscale feature fusion module and the attention mechanism, MSFF-ResNet broke through this bottleneck and achieved significantly higher classification accuracy.
According to Table 8, ResNet34, the network with the fewest layers among the three deep networks, had the fewest FLOPs and parameters, and both quantities grew as the number of layers increased. MSFF-ResNet uses depthwise separable convolution instead of ordinary convolution, successfully reducing the parameter count to 6.15 M and the FLOPs to 5.95 G; among all the networks, it had by far the fewest parameters. Because the multiscale feature fusion module increases the number of feature map channels, and hence the amount of computation, the FLOPs of MSFF-ResNet were higher than those of ResNet34 and ResNet50, but not by much. Incurring FLOPs increases of 2.27 G and 1.83 G in exchange for accuracy improvements of 7.21% and 8.24%, respectively, is acceptable. This comparison is sufficient to illustrate the feasibility of MSFF-ResNet.
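The parameter savings from depthwise separable convolution can be checked with back-of-the-envelope arithmetic: a standard k×k convolution with c_in input and c_out output channels needs k·k·c_in·c_out weights, while its depthwise separable replacement needs k·k·c_in (depthwise) plus c_in·c_out (pointwise) weights. The layer sizes below are illustrative, not the exact MSFF-ResNet layers.

```python
# Parameter counts for standard vs. depthwise separable convolution
# (bias terms omitted). The 3x3, 256->256 layer is an illustrative
# example, not a specific layer of MSFF-ResNet.

def standard_conv_params(k, c_in, c_out):
    # One k x k filter per (input channel, output channel) pair.
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    # Depthwise: one k x k filter per input channel.
    # Pointwise: a 1 x 1 convolution mixing channels.
    return k * k * c_in + c_in * c_out

k, c_in, c_out = 3, 256, 256
std = standard_conv_params(k, c_in, c_out)    # 589,824 weights
sep = separable_conv_params(k, c_in, c_out)   # 67,840 weights
print(std, sep, round(std / sep, 1))          # roughly 8.7x fewer weights
```

This order-of-magnitude reduction per replaced layer is consistent with MSFF-ResNet having the fewest parameters in Table 8 despite its extra fusion branches.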
In Figure 10, the confusion matrix shows that, owing to the high similarity between the seeds of different Bidens L. species, some species, such as B. pilosa, were misclassified. Nevertheless, compared with manual identification, the network had higher accuracy and identification efficiency, and most seed samples were correctly identified. In conclusion, the analysis of the confusion matrix, classification accuracy, and model performance shows that the model has important practical significance for the identification and classification of Bidens L. seeds.
5. Conclusions
In this study, a nondestructive classification method for Bidens L. seeds was proposed that automatically identifies different varieties of Bidens L. seeds from images, thereby overcoming the limited applicability of traditional methods to Bidens L. seeds and filling the gap in needle-shaped seed classification within the field of seed image recognition.
The main contribution of this study is a model construction strategy: on the basis of a deep residual structure that ensures model depth, a multiscale feature fusion module is used to compensate for its shortcomings, expanding the width of the model while preserving its depth and achieving a balance between the two. This enables the model to capture features at different scales while retaining the ability to extract high-level abstract information, as verified in the experiments. Through the experiments, we also found that the multiscale feature fusion module and the attention module form a very effective combination, because the attention mechanism can efficiently screen the large number of diverse features captured through the multiple receptive fields. When increasing the depth of a deep residual network can no longer improve the classification accuracy for Bidens L. seeds, adding a multiscale fusion module and an attention module breaks this bottleneck: focusing on and extracting features of Bidens L. seeds at different scales effectively improves classification accuracy. These results show that the method has clear advantages and potential in the field of seed classification and identification and can greatly facilitate scientific research and agricultural production.
However, this method still has limitations. The experiments showed that the model misclassifies highly similar seeds. In future research, we will continue to optimize and improve the model so that it achieves higher accuracy on highly similar seeds.
Beyond improving and optimizing the model, this study leaves room for further development. In future research, software can be built around the proposed model so that it can be invoked for classification in agricultural work. In this way, agricultural technicians can classify the mixed Bidens L. seeds they collect, separate the subclasses with medical value for dedicated planting, and eliminate the harmful subclasses to prevent them from robbing crops of nutrients and living space. Moreover, the hardware requirements (GPU and otherwise) of the proposed model and method are modest, and the model reduces its parameter count by introducing depthwise separable convolution. Combined with the steadily improving computing power of mobile devices, this makes it feasible to port the method to embedded devices and even smartphones. As this work continues to improve, distinguishing highly similar subclasses of Bidens L. seeds will become more convenient, and classification and recognition may break free of spatial and hardware limitations.