1. Introduction
Soybean, a crop rich in high-quality plant protein and beneficial to human health, is widely cultivated around the world. In soybean production, the emergence rate at the seedling stage is an essential decision indicator for subsequent production management and a key reference for yield prediction. Traditionally, the soybean emergence rate has been evaluated through a combination of manual counting and sampling. This approach is labor-intensive and prone to inaccuracies stemming from factors such as plant density, the limitations of human visual perception, the representativeness of the samples taken, and the sampling methodology [1,2,3]. In addition, such methods cannot meet the needs of continuous spatiotemporal monitoring of large-scale fields. It is therefore necessary to find a rapid, highly accurate method for detecting the soybean seedling emergence rate that is suitable for large areas. In modern precision agriculture, computer vision and UAV remote sensing are becoming increasingly important tools for monitoring the soybean seedling emergence rate, especially for early breeding decisions and the implementation of reseeding work [4].
Concerning identification and counting problems in agricultural production, researchers have begun applying advanced technologies such as machine vision and airborne remote sensing to monitor the phenotypic information of field crops. These technologies provide powerful tools that are expected to improve production efficiency and support scientific decision-making. Bawa et al. proposed a method for cotton boll counting and lint cotton yield estimation from UAV imagery based on a support vector machine and image processing techniques [5]. Rahimi et al. performed threshold segmentation and pre-harvest crop counting of Ananas comosus crowns in UAV images using HSV and LAB color space transformations [6]. Valente et al. combined machine learning with transfer learning based on AlexNet to detect the number of plants in the field after seeding from high-resolution RGB aerial images taken by a UAV [7]. Compared with traditional statistical learning methods, machine learning methods have advantages in handling complex problems, adaptability, and interpretability. However, they also have drawbacks, such as high data requirements, sensitivity to data quality, and reliance on human-defined features. When analyzing different crop phenotypes, multiple shallow features such as color and texture must be integrated through trial-and-error design, which is a complex and time-consuming process. Moreover, performance tends to saturate as the amount of data increases, so these methods cannot keep pace with growing data-processing demands. While foundational, they struggle with the dynamic and complex nature of aerial imagery, exhibiting inherent limitations in adaptability, processing speed, and accuracy under varying field conditions.
The processing of low-altitude visible-light remote sensing images captured by UAVs is a research hotspot in the precision agriculture aviation field. With the rise and development of precision agriculture aviation, UAV remote sensing technology has made precise field operations and dynamic, continuous monitoring possible [8]. Scholars have begun combining deep learning networks with UAV remote sensing images to advance information extraction and decision monitoring in agricultural production [9]. Classical deep-learning-based object detection methods include Faster R-CNN [10,11], SSD [12], the YOLO series [13,14,15], and other network models. Researchers have applied different network models to the identification of crops such as maize [16,17,18], cassava [19], wheat [20,21], cotton [22,23], and peanuts [24], as well as other scenarios. For crop identification and counting, Jiang et al. developed an algorithm based on Faster R-CNN for detecting and counting field plant seedlings [25]; compared to manual counting, it achieved a correlation coefficient of 0.98. Li et al. developed an identification counter for wheat ear images [26], using the Faster R-CNN model for image identification and genetic research on the number of ears per unit area. Recent research has focused on developing lightweight network models that are better suited to practical scenarios [27]. Wang et al. proposed a maize field image counting method based on YOLOv3 and Kalman filtering, achieving an accuracy of over 98% in counting maize seedlings [28]. Yang et al. used the YOLOv4 model with the CBAM attention module to rapidly detect and count wheat ears in the field, with average accuracies of 94%, 96.04%, and 93.11% on three different test sets [29]. Bao et al. combined YOLO with transformer prediction heads to design a wheat count detection model and used transfer learning strategies to enhance wheat counting accuracy in UAV images [30]. Owing to its robust performance and easily modified structure, the YOLOv5 model outperforms traditional SVM-based methods and its YOLO predecessors in detection accuracy and processing speed, making it well suited to real-time applications on agricultural UAVs.
Research on soybean crop identification has mainly focused on disease identification [31,32,33], plant phenotype extraction [34], canopy analysis [35,36], yield prediction [37], seed counting [38], and variety identification [39,40]. However, few studies have addressed how to count soybean seedlings in high-resolution UAV images. Drones typically fly at high altitude, which can make dense, small soybean seedlings difficult or even impossible to identify. Although the deep learning algorithms in the existing literature offer higher detection accuracy, they also face challenges such as high Giga Floating Point Operations Per Second (GFLOPs), large parameter volumes, and large model sizes, which make real-time inference difficult on edge devices in agricultural environments. It is therefore necessary to develop a soybean seedling detection method suitable for edge devices using advanced computer vision technology. The model improvements aim to ensure that it operates on edge devices with the fastest possible inference speed and the smallest possible model size.
In response to the limitations of traditional evaluation methods in large-scale scenarios, the main scope of this study is to use UAV remote sensing and computer vision to address the challenge of monitoring the soybean seedling emergence rate. Based on the above analysis, the research objective of this article is to propose a detection method for dense soybean seedlings in agricultural images, suitable for airborne edge devices, using an improved YOLOv5s model. Specifically, the study addresses the following problems: (1) improving the YOLOv5 model to reduce the complexity and time consumption of soybean seedling emergence rate detection; (2) adopting the GhostNetV2 network as the backbone of the improved model to reduce its parameters and make it more suitable for inference on edge devices; (3) introducing the ECA attention module and the BiFPN module to improve the model's performance and feature representation ability; (4) adjusting the input image size to enhance feature extraction for dense soybean seedling stage images; and (5) employing a pruning algorithm to remove redundant structures and reduce the model size, accelerating inference on edge or embedded devices.
The article is structured as follows. Section 2 details the materials and methods: the materials part emphasizes the process of acquiring image data and creating the soybean seedling dataset, while the methodology part covers the YOLOv5 algorithm, the improvement process, and the specifics of model training. Section 3 presents the results and discussion, meticulously evaluating the model's performance on a series of experiments. Section 4 concludes and summarizes the paper.
3. Results and Discussion
3.1. Model Evaluation Indicators
To objectively evaluate the identification performance of the improved model on soybean seedlings, this study employed a range of evaluation indicators, including Precision (P), Recall (R), Average Precision (AP), network parameters, model size, and detection speed. The IoU threshold was set to 0.5 during the experiments. Precision, Recall, and AP are calculated as shown in Formulas (1), (2), and (3), respectively:

$$P = \frac{TP}{TP + FP} \quad (1)$$
$$R = \frac{TP}{TP + FN} \quad (2)$$
$$AP = \int_{0}^{1} P(R)\, dR \quad (3)$$

In the formulas, TP is the number of correctly detected targets, FP is the number of incorrectly labeled targets, and FN is the number of missed detections in the images. AP is the area under the precision–recall curve; a higher AP value indicates better algorithm performance. In this study, since only one class (soybean seedling) is detected, the mAP value equals the AP value.
Detection speed refers to the model's inference time and is used to evaluate its real-time performance. It is typically measured in frames per second (FPS); a higher FPS indicates faster detection. For image data, it represents the number of images that can be processed per second. The training loss is typically the primary indicator of a neural network's training quality: as training epochs increase, the training set loss (Tra loss) and validation set loss (Val loss) gradually converge to a certain value and stabilize. This study also treats the training time of the neural network and the size of the generated weights as evaluation criteria for the training results.
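As a concrete illustration of how per-image FPS can be estimated, the following is a minimal sketch; `model` and the list of preprocessed `images` are assumptions, and the paper's actual benchmarking code is not shown.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, device="cuda", warmup=10):
    """Estimate detection speed (FPS) as images processed per second."""
    model.eval().to(device)
    for img in images[:warmup]:          # warm-up passes so GPU init does not skew timing
        model(img.to(device))
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for img in images:
        model(img.to(device))
    if device == "cuda":
        torch.cuda.synchronize()         # wait for all kernels before stopping the clock
    return len(images) / (time.time() - start)
```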
3.2. Ablation Experiments
To validate the performance of the improved YOLOv5s model, this study conducted ablation experiments on the model improvement process. The experimental results on the same test set are shown in Table 2. For convenience, the original YOLOv5s model is referred to as M0, and the models improved from M0 are referred to as M1, M2, M3, M4, and M5. The third to sixth rows of Table 2 correspond to the detection results of models M1 to M4 with an input size of 640 × 640 pixels; the last row gives the detection results of model M5 with an input size of 1280 × 1280 pixels.
By comparing the M0 and M1 models, it was found that the mAP of the M1 model increased by 3.0 percentage points after the backbone network was replaced, while the model parameters and weights were reduced by 45.89% and 43.58%, respectively. This indicates that adopting the lightweight convolution approach of GhostNetV2 as the backbone network extracts trainable features more effectively than CSPDarknet53, and that this improvement strategy was successful. The M2 model, which added the ECA attention module, achieved a 1.3 percentage point improvement in mAP without significantly increasing the model parameters, suggesting that the attention module improves the ability to extract spatial position information and regions of interest from images. By adding the BiFPN structure on top of the PANet network to fuse bi-directional feature information, the M3 model improved mAP by a further 1.1 percentage points over the M2 model. Combining the BiFPN structure and the ECA module, the average accuracy of the M3 model improved by 2.4 percentage points over the M1 model, showing that the combination of the two modules is effective. Although the additional network layers reduce detection speed, they significantly improve the model's detection accuracy. The M4 model was obtained by pruning the M3 model. The results show that at a pruning rate of 37.15%, the mAP of the M4 model decreased by only 0.8 percentage points, while the model size shrank to just 3.08 MB and the FPS reached 85.03 frames/s.
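For reference, the ECA mechanism added in M2 is compact enough to show in full. The following is an illustrative PyTorch sketch of ECA-Net's published design, not the authors' exact module; the kernel size `k_size` is an assumption.

```python
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1D convolution over channel descriptors."""
    def __init__(self, channels, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel descriptor (B, C, 1, 1)
        y = self.avg_pool(x)
        # reshape to (B, 1, C) so the 1D conv mixes neighboring channels
        y = self.conv(y.squeeze(-1).transpose(-1, -2))
        # back to (B, C, 1, 1) and rescale the input channel-wise
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)
```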
When the input size of the model was increased to 1280 × 1280, the receptive field of the feature map increased, and the identification accuracy for soybean images rose from 90.8% to 92.1%. Compared with the baseline model, the M5 model's size and total parameters were reduced by 76.65% and 79.55%, respectively. Overall, although the detection speed of the M5 model slowed relative to the baseline, the ablation experiments show that it performs strongly in all other respects.
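To make the pruning step used to derive M4 more tangible, the sketch below shows structured channel pruning with PyTorch's built-in utilities. This is a simplified local L2 criterion for illustration only: PAGCP, the method used in this paper, selects channels globally and in a performance-aware manner, and the pruned network is rebuilt so its size actually shrinks, whereas this sketch merely zeroes channels in place. The `amount` value echoes the 37.15% pruning rate reported above but is otherwise an assumption.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_channels(model, amount=0.37):
    """Zero out the lowest-L2-norm output channels of every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # n=2 -> L2 norm; dim=0 -> prune whole output channels
            prune.ln_structured(module, name="weight",
                                amount=amount, n=2, dim=0)
            prune.remove(module, "weight")   # fold the pruning mask into the weights
    return model
```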
3.3. Visualization of Identification Process
In this paper, heat maps are drawn with the visualization tool Gradient-weighted Class Activation Mapping (Grad-CAM). A Grad-CAM heat map reveals the areas and features that a neural network focuses on in computer vision tasks [49]. Brighter (deeper) colors in the heat map indicate higher attention from the network towards those areas, signifying their greater importance in the judgment or decision-making process for the target class. A series of tests found that the improved YOLOv5s model yields better heat map results under the parameters listed in Table 3. The heat map results of YOLOv5s before and after improvement are shown in Figure 8.
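A minimal sketch of producing such an overlay with the pytorch-grad-cam library follows; the library choice is an assumption (the paper does not name its tooling beyond Grad-CAM), and `model`, `target_layers`, `input_tensor`, and `rgb_img` all depend on the actual network. Note that this is the classification-style call; applying Grad-CAM to a detection head generally requires a custom target, as the library's detection examples show.

```python
import numpy as np
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

def draw_heatmap(model, target_layers, input_tensor, rgb_img):
    """Overlay a Grad-CAM heat map on an RGB image scaled to [0, 1]."""
    cam = GradCAM(model=model, target_layers=target_layers)
    grayscale_cam = cam(input_tensor=input_tensor)[0]   # (H, W) activation map
    return show_cam_on_image(rgb_img.astype(np.float32),
                             grayscale_cam, use_rgb=True)
```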
Figure 8a,b show an image of the maize–soybean strip composite planting and the detection result of the M5 model, respectively. The heat map in Figure 8c shows that the original YOLOv5s model attends to the soybean seedlings and nearly all of the maize seedlings. Although the model attends more to the features of soybean seedlings than to those of maize seedlings, the excessive focus on maize-related regions can lead to misclassifications. With the addition of the ECA and BiFPN modules, the improved model in Figure 8d emphasizes the focal regions of soybean seedlings and only a small number of maize seedlings during the target identification stage. Soybean seedlings are ultimately identified from these focal regions by effectively filtering out unimportant or non-target feature information, thereby improving identification accuracy.
3.4. Comparative Experimental Analysis of Different Models
To evaluate the detection performance of the proposed improved model (M5) on soybean seedlings, current mainstream single-stage YOLO series models, the classical two-stage Faster R-CNN model, and the M5 model were selected for performance comparison. Each object detection model was trained for 200 epochs on the dataset created in this study, and precision, recall, mAP, weight size, and detection speed were then evaluated on the test set. The performance comparison results of the six detection models are shown in Table 4.
As can be seen from Table 4, the P, R, and mAP values of the YOLO series models all exceed those of the classic Faster R-CNN model. This verifies that the backbone networks of the YOLO series have more powerful feature extraction capabilities than the VGG16 backbone of Faster R-CNN. Comparing the M5 model with the latest YOLO series detectors across multiple evaluation results shows that M5 ranks highly, indicating the superiority of the improved model. Specifically, the precision, recall, and mAP of the M5 model all rank second among the six detection models in Table 4. Although the mAP of the M5 model is 0.5 percentage points lower than that of YOLOv7, M5 has the smallest model size of all the models. Comparison with the lightweight YOLOXs model shows that although M5's detection speed is slightly slower (by 4.77 FPS), its mAP is 0.4 percentage points higher.
Although newer models such as YOLOv7 and YOLOv8 have been proposed, the YOLOv5 model remains easier to train and deploy and has a simple structure, and the improved model's performance is not inferior to the others, so YOLOv5 retains high practical application value. The improved model is lightweight and conducive to deployment and migration, and it can provide a valuable vision-technology reference for automating soybean reseeding at the seedling stage.
3.5. Field Experimentation and Analysis
3.5.1. Model Deployment and Field Testing
As shown in Figure 9, the proposed model was first deployed on an NVIDIA Jetson Xavier NX, which was then mounted beneath the laboratory-developed UAV. The data acquisition device used in the field experiment was a Hikrobot MV-CB060-10UC industrial camera with a 6-megapixel sensor, a maximum resolution of 3072 × 2048 pixels, and a maximum frame rate of 60.9 fps. The camera module was connected to the edge device via a USB interface, and the UAV battery powered the edge device and the drive motors through a power cable. Testing showed that the M4 model proposed in this article ran at 34.83 frames/s on the Jetson Xavier NX, demonstrating that the model runs smoothly on low-power edge devices.
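A minimal sketch of the kind of on-device capture-and-detect loop this setup implies is given below; `detect_fn` is a hypothetical wrapper around the deployed model's inference call, and the camera index and any preprocessing are assumptions rather than the authors' deployment code.

```python
import time
import cv2

def stream_and_detect(detect_fn, cam_index=0):
    """Read frames from a USB camera and report average detection FPS."""
    cap = cv2.VideoCapture(cam_index)
    frames, start = 0, time.time()
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            _detections = detect_fn(frame)   # run the model on the BGR frame
            frames += 1
            if frames % 100 == 0:
                print(f"average FPS: {frames / (time.time() - start):.1f}")
    finally:
        cap.release()
```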
3.5.2. Analysis of Soybean Image Detection at Different Flight Heights
To explore the detection performance of the improved model on soybean images at different flight heights, a field experiment was conducted with fixed-height flights at three levels (3 m, 5 m, and 7 m). The identification results for the captured soybean images were output from the edge device and are displayed in Figure 10.
As shown in Figure 10a, at a flight height of 3 m, the model's accuracy in identifying soybean images is 98.43%, with only one adhesion-related identification error (marked by the yellow circle in Figure 10a). At a flight height of 5 m, the model achieves 100% precision in soybean image identification. At an altitude of 7 m, the accuracy is 98.97%, with only two instances of misidentification due to adhesion (marked by the yellow circles in Figure 10c). Comparison between the proposed model and manual counting shows that the improved model has high identification performance for UAV images at all three height levels, with soybean image identification performing best at a flight altitude of 5 m. Identification results above 95% at all three heights also show that the improved model generalizes stably to soybean images captured at different heights.
3.5.3. Soybean Seedling Detection Results in Different Scenarios
In this study, different scenarios, including wind or airflow disturbances, planting density, and seedling growth stages, were randomly selected to test the performance of the improved model. The detection results are shown in Figure 11.
The test images were manually divided into mild jitter (30%), lower-moderate jitter (40%), moderate jitter (50%), and upper-moderate jitter (60%) according to the degree of jitter blur of the soybean seedlings in the image. Figure 11a–d show the detection results of soybean seedling images under these jitter levels. According to the statistics in Table 5, the differences between the maximum and minimum soybean seedling detection results in the images are 24%, 22%, 26%, and 41%, respectively. Although the average detection accuracy gradually decreases as jitter increases, the overall detection accuracy remains above 88%. Thus, although the detection results are slightly affected by environmental wind or UAV airflow, the proposed model still demonstrates strong detection capability.
Figure 11e,f show the detection performance of the improved model on soybean seedling images with different sparsity levels. In the sparse image in Figure 11e, the model detected 50 soybean seedlings, with a false negative rate of 3.85%. In the dense image in Figure 11f, the model detected 168 soybean seedlings, with a false negative rate of 1.18%. It can be concluded that the proposed model is robust to the sparsity of soybean images.
Additionally, two growth stages of soybean seedlings were selected for detection and comparison. As shown in Figure 11g, the rate of missing seedlings is higher during the cotyledon stage and lower during the second-node stage. It can therefore be inferred that the proposed model can be used to accurately monitor soybean seedling deficiency, facilitating agricultural activities such as reseeding or seedling supplementation.
3.5.4. Analysis and Reflection on Different Planting Modes
To explore the generalization of the improved model to different planting methods, this study carried out image identification experiments on two different soybean planting patterns in the field: soybean monoculture and soybean–maize strip intercropping, the latter planted as alternating strips of two rows of maize and four rows of soybean. Twenty images from each planting mode were selected for detection experiments and comparative analysis.
From the detection results for soybean seedlings under the two planting modes in Table 6, the average identification accuracy in the soybean monoculture mode reached 90.6%, while that in the soybean–maize strip intercropping mode was 88.25%, for an overall average of 89.5% across both modes. Although the model's identification performance decreased slightly in the intercropping mode, it still maintained relatively high accuracy.
The fitting curves for the two planting modes shown in Figure 12 also confirm this conclusion. For the monoculture planting method, the correlation coefficient $R^2$ of the fitted line is 0.9979, indicating a high degree of agreement between the predicted values and the ground truth. For the intercropping planting method, $R^2$ is 0.9376, meaning there may be some difference between predicted and actual values, although the fit still effectively captures the trend of the data. The difference may be due to the similarity in color between maize and soybean seedlings, which can lead to false detections by the improved model. In the future, compound planting of soybean and maize will become an emerging planting trend. This study provides novel insights for exploring and monitoring planting modes between crops and contributes fundamental data for governmental grain planning and decisions [50].
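For completeness, the statistics behind such fitting curves are straightforward to compute. The sketch below shows one way to obtain the least-squares line and $R^2$ between predicted and manual counts; the function name and inputs are illustrative, not the paper's analysis code.

```python
import numpy as np

def fit_predicted_vs_manual(manual, predicted):
    """Least-squares line y = a*x + b and R^2 between predicted and manual counts."""
    manual = np.asarray(manual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    a, b = np.polyfit(manual, predicted, 1)          # slope and intercept
    residuals = predicted - (a * manual + b)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((predicted - predicted.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot
```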
3.5.5. Failure Case Analysis
There are still cases of missed and false detections when the improved model is applied to the test set images. As shown by the blue boxes in Figure 13a,b, some weeds or maize seedlings whose leaf color and shape resemble soybean seedlings were mistakenly detected, which makes the detection count slightly higher than the number of soybeans planted. When two soybean seedlings with large physical differences grow close together, the vigorously growing seedling may partially obscure the shorter one (blue box in Figure 13c); in this scenario, the model identified only the vigorous seedling. This problem can be mitigated by improving seeding uniformity. In Figure 13d, undersized soybean seedlings appear; the missed detections may occur because the shape and texture of the soybean leaves are insufficiently exposed in the image, owing to the high shooting height or to seedlings that have only just sprouted. To address such cases in the future, algorithmic improvements, higher-resolution cameras, or the incorporation of super-resolution methods could be adopted to reduce the likelihood of missed detections.
3.6. Discussion
This study proposes a rapid detection method suitable for airborne edge devices and large-scale images of dense soybean fields. Although the high-performing YOLOv5s model was selected, certain limitations were still encountered when processing the target data. To address them, the GhostNetV2 network was adopted to make the model lightweight in size and parameter count; attention mechanisms and feature fusion modules were incorporated to improve accuracy given the characteristics of the target dataset; and, to facilitate more effective deployment on edge devices, a pruning algorithm was employed to further reduce the model's size and parameters. Through this optimization strategy, the improved model demonstrated enhanced adaptability and efficiency on the soybean seedling dataset.
Compared to the original YOLOv5s model, the M4 model proposed in this study demonstrates substantial improvements. When processing images at a resolution of 640 × 640, the M4 model increases mAP by 4.6 percentage points while reducing model weight by over 70%, and its inference speed improves by 25%. The efficacy of these improvements is further validated through the ablation experiments. To compare the computational resources required before and after improvement when deploying the YOLOv5s model on edge devices, the M0 and M4 models were both deployed on the drone platform shown in Figure 9. In testing, the M4 model achieved an inference speed of 34.83 FPS, a 22.12% increase over the M0 model (28.52 FPS). The models' parameter counts and required computational resources were unchanged by deployment on the airborne edge device; according to the data in Table 2, the M4 model requires only 20.45% and 22.48% of the M0 model's resources in terms of parameter count and weight size, respectively.
Although this research used the NVIDIA Jetson Xavier NX as the airborne edge computing device, the concept of an edge device is not limited to this particular model or brand. Edge computing encompasses a range of devices capable of performing data processing tasks efficiently in the field; these devices, varying in computational power, play a crucial role in on-site data processing across applications. Considering the limited computational capability of edge devices carried by agricultural UAVs, this paper focuses on soybean seedling images taken by UAVs in specific application scenarios. Specifically, our efforts are dedicated to adjusting the model structure and optimizing parameters to reduce model size, enhance recognition accuracy, and improve detection speed on embedded devices. Through these efforts, we hope to achieve significant breakthroughs in processing soybean seedling images in practical agricultural environments. As the performance of edge computing devices advances, even more efficient and accurate real-time data processing can be expected.
To obtain as much soybean yield as possible from limited land, researchers have developed various soybean–maize strip intercropping patterns. This study collected data from only two soybean planting methods; other strip intercropping patterns, such as four rows of maize with four rows of soybean, or four rows of maize with six rows of soybean, remain to be explored. Another important point is that obtaining sufficiently rich data is very time-consuming and requires exploration. In future work, more abundant data collection can be carried out as the strip compound planting model is promoted, enabling wider applications.
Through the rapid detection method for dense soybean seedling field images proposed in this study, the development of seedlings in different scenarios can be understood to help farmers take corresponding cultivation and management measures in a timely manner. For example, monitoring the growth status of seedlings under wind or airflow disturbances can help optimize planting density and layout to reduce mutual competition among seedlings. Additionally, monitoring the changes in the growth stages of seedlings can assist in scheduling agricultural activities such as fertilization, irrigation, and weed control, thus providing optimal growth conditions. Overall, this study offers new technical support for precision agriculture, which is of significant importance for the sustainable development of agriculture.
4. Conclusions
To quickly and accurately obtain the number of soybean seedlings in the field, this study proposes a fast detection method based on an improved YOLOv5s model. First, GhostNetV2 replaced CSPDarknet53 as the backbone feature extraction network, and the ECA and BiFPN modules were introduced to improve the model's identification accuracy and feature representation capability. An input size of 1280 × 1280 pixels was adopted to solve the problem of insufficient feature extraction for small-scale soybean seedlings. Moreover, the PAGCP pruning algorithm was employed to streamline the model structure and boost inference speed. Experimental results show that the improved model achieves an identification accuracy of 92.1% for soybean seedlings, 5.9 percentage points higher than the baseline model, while the model size is compressed to 23.35% of the original and the parameter count is reduced by 79.55%. Compared with other classic models, the improved model has clear advantages in comprehensive performance. In addition, detection performance was examined under different scenarios, including flight heights, degrees of sparsity, seedling growth stages, and planting modes, and several failure cases were discussed and analyzed. The experimental results show that the improved model has excellent robustness and generalization performance.
In summary, the proposed method offers a novel technological approach for the fast detection of dense soybean seedlings in field environments, with positive significance for research areas such as rapid assessment of the soybean emergence rate and yield prediction. In future work, object tracking algorithms will be integrated to further improve the model's performance in real-time statistics of the soybean emergence rate and in field management.