1. Introduction
Soybean, a crop rich in high-quality plant protein and beneficial to human health, is widely cultivated around the world. In soybean production, the emergence rate at the seedling stage is an essential decision indicator for subsequent production management and a key reference for yield prediction. Traditionally, the soybean emergence rate has been evaluated through a combination of manual counting and sampling. This approach is labor-intensive and prone to inaccuracies stemming from factors such as plant density, the limitations of human visual perception, the representativeness of the samples taken, and the sampling methodology [1,2,3]. In addition, such methods cannot meet the needs of continuous spatiotemporal monitoring of large-scale fields. It is therefore necessary to find a rapid, highly accurate method for detecting the soybean seedling emergence rate that is suitable for large areas. In modern precision agriculture, computer vision and UAV remote sensing are becoming increasingly important tools for monitoring the soybean seedling emergence rate, especially for early breeding decisions and the implementation of reseeding work [4].
Concerning identification and counting problems in agricultural production, researchers have begun applying advanced technologies such as machine vision and airborne remote sensing to monitor the phenotypic information of field crops. These technologies provide powerful tools that are expected to improve production efficiency and support scientific decision-making. Bawa et al. proposed a method for cotton boll counting and lint cotton yield estimation from UAV imagery based on a support vector machine and image processing techniques [5]. Rahimi et al. performed threshold segmentation and pre-harvest crop counting of Ananas comosus crowns in UAV images using HSV and LAB color space transformations [6]. Valente et al. combined machine learning with transfer learning based on AlexNet to detect the number of plants in the field after seeding from high-resolution RGB aerial images taken by a UAV [7]. Compared with traditional statistical learning methods, machine learning methods have advantages in handling complex problems, adaptability, and interpretability. However, they also have drawbacks, such as high data requirements, sensitivity to data quality, and reliance on human-defined features. When analyzing different crop phenotypes, multiple shallow features such as color and texture must be integrated through trial-and-error design, which is a complex and time-consuming process. Moreover, performance tends to saturate as the amount of data increases, so these methods cannot keep pace with growing data-processing demands. While foundational, they struggle with the dynamic and complex nature of aerial imagery, exhibiting inherent limitations in adaptability, processing speed, and accuracy under varying field conditions.
The processing of low-altitude visible-light remote sensing images captured by UAVs is a research hotspot in the precision agriculture aviation field. With the rise and development of precision agriculture aviation, UAV remote sensing technology has made precise field operations and dynamic, continuous monitoring possible [8]. Scholars have begun combining deep learning networks with UAV remote sensing images to advance information extraction and decision monitoring in agricultural production [9]. Classical deep-learning-based object detection methods include Faster R-CNN [10,11], SSD [12], the YOLO series [13,14,15], and other network models. Researchers have applied different network models to the identification of crops such as maize [16,17,18], cassava [19], wheat [20,21], cotton [22,23], and peanuts [24], as well as other scenarios. For crop identification and counting, Jiang et al. developed an algorithm based on Faster R-CNN for detecting and counting field plant seedlings [25]; compared to manual counting, it achieved a correlation coefficient of 0.98. Li et al. developed an identification counter for wheat ear images [26], using the Faster R-CNN model for image identification and genetic research on the number of ears per unit area. Recent research has focused on developing lightweight network models that are better suited to practical scenarios [27]. Wang et al. proposed a maize field image counting method based on YOLOv3 and Kalman filtering, achieving an accuracy of over 98% in counting maize seedlings [28]. Yang et al. used the YOLOv4 model with the CBAM attention module to rapidly detect and count wheat ears in the field, with average accuracies of 94%, 96.04%, and 93.11% on three different test sets [29]. Bao et al. combined YOLO with transformer prediction heads to design a wheat count detection model and used transfer learning strategies to enhance wheat counting accuracy in UAV images [30]. Owing to its robust performance and easily modified structure, the YOLOv5 model outperforms traditional SVM-based methods and its YOLO predecessors in detection accuracy and processing speed, making it well suited to real-time applications on agricultural UAVs.
Research on soybean crop identification has mainly focused on disease identification [31,32,33], plant phenotype extraction [34], canopy analysis [35,36], yield prediction [37], seed counting [38], and variety identification [39,40]. However, few studies have addressed how to count soybean seedlings in high-resolution UAV images. Drones typically fly at high altitude, which can make dense, small soybean seedlings difficult or even impossible to identify. Although the deep learning algorithms in the existing literature offer higher detection accuracy, they also face challenges such as high Giga Floating Point Operations Per Second (GFLOPs), large parameter volumes, and large model sizes, which make real-time inference difficult on edge devices in agricultural environments. It is therefore necessary to develop a soybean seedling detection method suitable for edge devices using advanced computer vision technology. The model improvements aim to ensure that it operates on edge devices with the fastest possible inference speed and the smallest possible model size.
In response to the limitations of traditional evaluation methods in large-scale scenarios, the main scope of this study is to use UAV remote sensing and computer vision to address the challenge of monitoring the soybean seedling emergence rate. Based on the above analysis, the research objective of this article is to propose a detection method for dense soybean seedlings in agricultural images, suitable for airborne edge devices, using an improved YOLOv5s model. Specifically, the study addresses the following problems: (1) improving the YOLOv5 model to reduce the complexity and time consumption of soybean seedling emergence rate detection; (2) adopting the GhostNetV2 network as the backbone of the improved model to reduce its parameters and make it more suitable for inference on edge devices; (3) introducing the ECA attention module and the BiFPN module to improve the model's performance and feature representation ability; (4) adjusting the input image size to enhance feature extraction for dense soybean seedling stage images; and (5) employing a pruning algorithm to remove redundant structures and reduce the model size, accelerating inference on edge or embedded devices.
The article is structured as follows. Section 2 details the materials and methods: the materials part emphasizes the process of acquiring image data and creating the soybean seedling dataset, while the methodology part covers the YOLOv5 algorithm, the improvement process, and the specifics of model training. Section 3 presents the results and discussion, meticulously evaluating the model's performance on a series of experiments. Section 4 concludes and summarizes the paper.
3. Results and Discussion
3.1. Model Evaluation Indicators
To objectively evaluate the identification performance of the improved model on soybean seedlings, this study employed a range of evaluation indicators, including Precision (P), Recall (R), Average Precision (AP), network parameters, model size, and detection speed. The IoU threshold was set to 0.5 during the experiments. Precision, Recall, and AP are calculated as shown in Formulas (1), (2), and (3), respectively:

$$P = \frac{TP}{TP + FP} \quad (1)$$
$$R = \frac{TP}{TP + FN} \quad (2)$$
$$AP = \int_{0}^{1} P(R)\, dR \quad (3)$$

In the formulas, TP is the number of correctly detected targets, FP is the number of incorrectly labeled targets, and FN is the number of missed detections in the images. AP is the area under the precision–recall curve; a higher AP value indicates better algorithm performance. In this study, since only one class (soybean seedling) is detected, the mAP value equals the AP value.
Detection speed refers to the model's inference time and is used to evaluate its real-time performance. It is typically measured in frames per second (FPS); a higher FPS indicates faster detection. For image data, it represents the number of images that can be processed per second. The training loss is typically the primary indicator of a neural network's training quality: as training epochs increase, the training set loss (Tra loss) and validation set loss (Val loss) gradually converge to a certain value and stabilize. This study also treats the training time of the neural network and the size of the generated weights as evaluation criteria for the training results.
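As a concrete illustration of how per-image FPS can be estimated, the following is a minimal sketch; `model` and the list of preprocessed `images` are assumptions, and the paper's actual benchmarking code is not shown.

```python
import time
import torch

@torch.no_grad()
def measure_fps(model, images, device="cuda", warmup=10):
    """Estimate detection speed (FPS) as images processed per second."""
    model.eval().to(device)
    for img in images[:warmup]:          # warm-up passes so GPU init does not skew timing
        model(img.to(device))
    if device == "cuda":
        torch.cuda.synchronize()
    start = time.time()
    for img in images:
        model(img.to(device))
    if device == "cuda":
        torch.cuda.synchronize()         # wait for all kernels before stopping the clock
    return len(images) / (time.time() - start)
```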
3.2. Ablation Experiments
To validate the performance of the improved YOLOv5s model, this study conducted ablation experiments on the model improvement process. The experimental results on the same test set are shown in Table 2. For convenience, the original YOLOv5s model is referred to as M0, and the models improved from M0 are referred to as M1, M2, M3, M4, and M5. The third to sixth rows of Table 2 correspond to the detection results of models M1 to M4 with an input size of 640 × 640 pixels; the last row gives the detection results of model M5 with an input size of 1280 × 1280 pixels.
By comparing the M0 and M1 models, it was found that the mAP of the M1 model increased by 3.0 percentage points after the backbone network was replaced, while the model parameters and weights were reduced by 45.89% and 43.58%, respectively. This indicates that adopting the lightweight convolution approach of GhostNetV2 as the backbone network extracts trainable features more effectively than CSPDarknet53, and that this improvement strategy was successful. The M2 model, which added the ECA attention module, achieved a 1.3 percentage point improvement in mAP without significantly increasing the model parameters, suggesting that the attention module improves the ability to extract spatial position information and regions of interest from images. By adding the BiFPN structure on top of the PANet network to fuse bi-directional feature information, the M3 model improved mAP by a further 1.1 percentage points over the M2 model. Combining the BiFPN structure and the ECA module, the average accuracy of the M3 model improved by 2.4 percentage points over the M1 model, showing that the combination of the two modules is effective. Although the additional network layers reduce detection speed, they significantly improve the model's detection accuracy. The M4 model was obtained by pruning the M3 model. The results show that at a pruning rate of 37.15%, the mAP of the M4 model decreased by only 0.8 percentage points, while the model size shrank to just 3.08 MB and the FPS reached 85.03 frames/s.
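For reference, the ECA mechanism added in M2 is compact enough to show in full. The following is an illustrative PyTorch sketch of ECA-Net's published design, not the authors' exact module; the kernel size `k_size` is an assumption.

```python
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: a 1D convolution over channel descriptors."""
    def __init__(self, channels, k_size=3):
        super().__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv = nn.Conv1d(1, 1, kernel_size=k_size,
                              padding=k_size // 2, bias=False)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # x: (B, C, H, W) -> per-channel descriptor (B, C, 1, 1)
        y = self.avg_pool(x)
        # reshape to (B, 1, C) so the 1D conv mixes neighboring channels
        y = self.conv(y.squeeze(-1).transpose(-1, -2))
        # back to (B, C, 1, 1) and rescale the input channel-wise
        y = self.sigmoid(y.transpose(-1, -2).unsqueeze(-1))
        return x * y.expand_as(x)
```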
When the input size of the model was increased to 1280 × 1280, the receptive field of the feature map increased, and the identification accuracy for soybean images rose from 90.8% to 92.1%. Compared with the baseline model, the M5 model's size and total parameters were reduced by 76.65% and 79.55%, respectively. Overall, although the detection speed of the M5 model slowed relative to the baseline, the ablation experiments show that it performs strongly in all other respects.
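To make the pruning step used to derive M4 more tangible, the sketch below shows structured channel pruning with PyTorch's built-in utilities. This is a simplified local L2 criterion for illustration only: PAGCP, the method used in this paper, selects channels globally and in a performance-aware manner, and the pruned network is rebuilt so its size actually shrinks, whereas this sketch merely zeroes channels in place. The `amount` value echoes the 37.15% pruning rate reported above but is otherwise an assumption.

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

def prune_conv_channels(model, amount=0.37):
    """Zero out the lowest-L2-norm output channels of every Conv2d layer."""
    for module in model.modules():
        if isinstance(module, nn.Conv2d):
            # n=2 -> L2 norm; dim=0 -> prune whole output channels
            prune.ln_structured(module, name="weight",
                                amount=amount, n=2, dim=0)
            prune.remove(module, "weight")   # fold the pruning mask into the weights
    return model
```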
3.3. Visualization of Identification Process
In this paper, heat maps are drawn with the visualization tool Gradient-weighted Class Activation Mapping (Grad-CAM). A Grad-CAM heat map reveals the areas and features that a neural network focuses on in computer vision tasks [49]. Brighter (deeper) colors in the heat map indicate higher attention from the network towards those areas, signifying their greater importance in the judgment or decision-making process for the target class. A series of tests found that the improved YOLOv5s model yields better heat map results under the parameters listed in Table 3. The heat map results of YOLOv5s before and after improvement are shown in Figure 8.
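A minimal sketch of producing such an overlay with the pytorch-grad-cam library follows; the library choice is an assumption (the paper does not name its tooling beyond Grad-CAM), and `model`, `target_layers`, `input_tensor`, and `rgb_img` all depend on the actual network. Note that this is the classification-style call; applying Grad-CAM to a detection head generally requires a custom target, as the library's detection examples show.

```python
import numpy as np
from pytorch_grad_cam import GradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image

def draw_heatmap(model, target_layers, input_tensor, rgb_img):
    """Overlay a Grad-CAM heat map on an RGB image scaled to [0, 1]."""
    cam = GradCAM(model=model, target_layers=target_layers)
    grayscale_cam = cam(input_tensor=input_tensor)[0]   # (H, W) activation map
    return show_cam_on_image(rgb_img.astype(np.float32),
                             grayscale_cam, use_rgb=True)
```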
Figure 8a,b show an image of the maize–soybean strip composite planting and the detection result of the M5 model, respectively. The heat map in Figure 8c shows that the original YOLOv5s model attends to the soybean seedlings and nearly all of the maize seedlings. Although the model attends more to the features of soybean seedlings than to those of maize seedlings, the excessive focus on maize-related regions can lead to misclassifications. With the addition of the ECA and BiFPN modules, the improved model in Figure 8d emphasizes the focal regions of soybean seedlings and only a small number of maize seedlings during the target identification stage. Soybean seedlings are ultimately identified from these focal regions by effectively filtering out unimportant or non-target feature information, thereby improving identification accuracy.
3.4. Comparative Experimental Analysis of Different Models
To evaluate the detection performance of the proposed improved model (M5) on soybean seedlings, current mainstream single-stage YOLO series models, the classical two-stage Faster R-CNN model, and the M5 model were selected for performance comparison. Each object detection model was trained for 200 epochs on the dataset created in this study, and precision, recall, mAP, weight size, and detection speed were then evaluated on the test set. The performance comparison results of the six detection models are shown in Table 4.
As can be seen from Table 4, the P, R, and mAP values of the YOLO series models all exceed those of the classic Faster R-CNN model. This verifies that the backbone networks of the YOLO series have more powerful feature extraction capabilities than the VGG16 backbone of Faster R-CNN. Comparing the M5 model with the latest YOLO series detectors across multiple evaluation results shows that M5 ranks highly, indicating the superiority of the improved model. Specifically, the precision, recall, and mAP of the M5 model all rank second among the six detection models in Table 4. Although the mAP of the M5 model is 0.5 percentage points lower than that of YOLOv7, M5 has the smallest model size of all the models. Comparison with the lightweight YOLOXs model shows that although M5's detection speed is slightly slower (by 4.77 FPS), its mAP is 0.4 percentage points higher.
Although newer models such as YOLOv7 and YOLOv8 have been proposed, the YOLOv5 model remains easier to train and deploy and has a simple structure, and the improved model's performance is not inferior to the others, so YOLOv5 retains high practical application value. The improved model is lightweight and conducive to deployment and migration, and it can provide a valuable vision-technology reference for automating soybean reseeding at the seedling stage.
3.5. Field Experimentation and Analysis
3.5.1. Model Deployment and Field Testing
As shown in Figure 9, the proposed model was first deployed on an NVIDIA Jetson Xavier NX, which was then mounted beneath the laboratory-developed UAV. The data acquisition device used in the field experiment was a Hikrobot MV-CB060-10UC industrial camera with a 6-megapixel sensor, a maximum resolution of 3072 × 2048 pixels, and a maximum frame rate of 60.9 fps. The camera module was connected to the edge device via a USB interface, and the UAV battery powered the edge device and the drive motors through a power cable. Testing showed that the M4 model proposed in this article ran at 34.83 frames/s on the Jetson Xavier NX, demonstrating that the model runs smoothly on low-power edge devices.
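A minimal sketch of the kind of on-device capture-and-detect loop this setup implies is given below; `detect_fn` is a hypothetical wrapper around the deployed model's inference call, and the camera index and any preprocessing are assumptions rather than the authors' deployment code.

```python
import time
import cv2

def stream_and_detect(detect_fn, cam_index=0):
    """Read frames from a USB camera and report average detection FPS."""
    cap = cv2.VideoCapture(cam_index)
    frames, start = 0, time.time()
    try:
        while cap.isOpened():
            ok, frame = cap.read()
            if not ok:
                break
            _detections = detect_fn(frame)   # run the model on the BGR frame
            frames += 1
            if frames % 100 == 0:
                print(f"average FPS: {frames / (time.time() - start):.1f}")
    finally:
        cap.release()
```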
3.5.2. Analysis of Soybean Image Detection at Different Flight Heights
To explore the detection performance of the improved model on soybean images at different flight heights, a field experiment was conducted with fixed-height flights at three levels (3 m, 5 m, and 7 m). The identification results for the captured soybean images were output from the edge device and are displayed in Figure 10.
As shown in Figure 10a, at a flight height of 3 m, the model's accuracy in identifying soybean images is 98.43%, with only one adhesion-related identification error (marked by the yellow circle in Figure 10a). At a flight height of 5 m, the model achieves 100% precision in soybean image identification. At an altitude of 7 m, the accuracy is 98.97%, with only two instances of misidentification due to adhesion (marked by the yellow circles in Figure 10c). Comparison between the proposed model and manual counting shows that the improved model has high identification performance for UAV images at all three height levels, with soybean image identification performing best at a flight altitude of 5 m. Identification results above 95% at all three heights also show that the improved model generalizes stably to soybean images captured at different heights.
3.5.3. Soybean Seedling Detection Results in Different Scenarios
In this study, different scenarios, including wind or airflow disturbances, planting density, and seedling growth stages, were randomly selected to test the performance of the improved model. The detection results are shown in Figure 11.
The test images were manually divided into mild jitter (30%), lower-moderate jitter (40%), moderate jitter (50%), and upper-moderate jitter (60%) according to the degree of jitter blur of the soybean seedlings in the image. Figure 11a–d show the detection results of soybean seedling images under these jitter levels. According to the statistics in Table 5, the differences between the maximum and minimum soybean seedling detection results in the images are 24%, 22%, 26%, and 41%, respectively. Although the average detection accuracy gradually decreases as jitter increases, the overall detection accuracy remains above 88%. Thus, although the detection results are slightly affected by environmental wind or UAV airflow, the proposed model still demonstrates strong detection capability.
Figure 11e,f show the detection performance of the improved model on soybean seedling images with different sparsity levels. In the sparse image in Figure 11e, the model detected 50 soybean seedlings, with a false negative rate of 3.85%. In the dense image in Figure 11f, the model detected 168 soybean seedlings, with a false negative rate of 1.18%. It can be concluded that the proposed model is robust to the sparsity of soybean images.
Additionally, two growth stages of soybean seedlings were selected for detection and comparison. As shown in Figure 11g, the rate of missing seedlings is higher during the cotyledon stage and lower during the second-node stage. It can therefore be inferred that the proposed model can be used to accurately monitor soybean seedling deficiency, facilitating agricultural activities such as reseeding or seedling supplementation.
3.5.4. Analysis and Reflection on Different Planting Modes
To explore the generalization of the improved model to different planting methods, this study carried out image identification experiments on two different soybean planting patterns in the field: soybean monoculture and soybean–maize strip intercropping, the latter planted as alternating strips of two rows of maize and four rows of soybean. Twenty images from each planting mode were selected for detection experiments and comparative analysis.
From the detection results for soybean seedlings under the two planting modes in Table 6, the average identification accuracy in the soybean monoculture mode reached 90.6%, while that in the soybean–maize strip intercropping mode was 88.25%, for an overall average of 89.5% across both modes. Although the model's identification performance decreased slightly in the intercropping mode, it still maintained relatively high accuracy.
The fitting curves for the two planting modes shown in Figure 12 also confirm this conclusion. For the monoculture planting method, the correlation coefficient $R^2$ of the fitted line is 0.9979, indicating a high degree of agreement between the predicted values and the ground truth. For the intercropping planting method, $R^2$ is 0.9376, meaning there may be some difference between predicted and actual values, although the fit still effectively captures the trend of the data. The difference may be due to the similarity in color between maize and soybean seedlings, which can lead to false detections by the improved model. In the future, compound planting of soybean and maize will become an emerging planting trend. This study provides novel insights for exploring and monitoring planting modes between crops and contributes fundamental data for governmental grain planning and decisions [50].
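For completeness, the statistics behind such fitting curves are straightforward to compute. The sketch below shows one way to obtain the least-squares line and $R^2$ between predicted and manual counts; the function name and inputs are illustrative, not the paper's analysis code.

```python
import numpy as np

def fit_predicted_vs_manual(manual, predicted):
    """Least-squares line y = a*x + b and R^2 between predicted and manual counts."""
    manual = np.asarray(manual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    a, b = np.polyfit(manual, predicted, 1)          # slope and intercept
    residuals = predicted - (a * manual + b)
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((predicted - predicted.mean()) ** 2)
    return a, b, 1.0 - ss_res / ss_tot
```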
3.5.5. Failure Case Analysis
There are still cases of missed and false detections when the improved model is applied to the test set images. As shown by the blue boxes in Figure 13a,b, some weeds or maize seedlings whose leaf color and shape resemble soybean seedlings were mistakenly detected, which makes the detection count slightly higher than the number of soybeans planted. When two soybean seedlings with large physical differences grow close together, the vigorously growing seedling may partially obscure the shorter one (blue box in Figure 13c); in this scenario, the model identified only the vigorous seedling. This problem can be mitigated by improving seeding uniformity. In Figure 13d, undersized soybean seedlings appear; the missed detections may occur because the shape and texture of the soybean leaves are insufficiently exposed in the image, owing to the high shooting height or to seedlings that have only just sprouted. To address such cases in the future, algorithmic improvements, higher-resolution cameras, or the incorporation of super-resolution methods could be adopted to reduce the likelihood of missed detections.
3.6. Discussion
This study proposes a rapid detection method suitable for airborne edge devices and large-scale images of dense soybean fields. Although the high-performing YOLOv5s model was selected, certain limitations were still encountered when processing the target data. To address them, the GhostNetV2 network was adopted to make the model lightweight in size and parameter count; attention mechanisms and feature fusion modules were incorporated to improve accuracy given the characteristics of the target dataset; and, to facilitate more effective deployment on edge devices, a pruning algorithm was employed to further reduce the model's size and parameters. Through this optimization strategy, the improved model demonstrated enhanced adaptability and efficiency on the soybean seedling dataset.
Compared to the original YOLOv5s model, the M4 model proposed in this study demonstrates substantial improvements. When processing images at a resolution of 640 × 640, the M4 model increases mAP by 4.6 percentage points while reducing model weight by over 70%, and its inference speed improves by 25%. The efficacy of these improvements is further validated through the ablation experiments. To compare the computational resources required before and after improvement when deploying the YOLOv5s model on edge devices, the M0 and M4 models were both deployed on the drone platform shown in Figure 9. In testing, the M4 model achieved an inference speed of 34.83 FPS, a 22.12% increase over the M0 model (28.52 FPS). The models' parameter counts and required computational resources were unchanged by deployment on the airborne edge device; according to the data in Table 2, the M4 model requires only 20.45% and 22.48% of the M0 model's resources in terms of parameter count and weight size, respectively.
Although this research used the NVIDIA Jetson Xavier NX as the airborne edge computing device, the concept of an edge device is not limited to this particular model or brand. Edge computing encompasses a range of devices capable of performing data processing tasks efficiently in the field; these devices, varying in computational power, play a crucial role in on-site data processing across applications. Considering the limited computational capability of edge devices carried by agricultural UAVs, this paper focuses on soybean seedling images taken by UAVs in specific application scenarios. Specifically, our efforts are dedicated to adjusting the model structure and optimizing parameters to reduce model size, enhance recognition accuracy, and improve detection speed on embedded devices. Through these efforts, we hope to achieve significant breakthroughs in processing soybean seedling images in practical agricultural environments. As the performance of edge computing devices advances, even more efficient and accurate real-time data processing can be expected.
To obtain as much soybean yield as possible from limited land, researchers have developed various soybean–maize strip intercropping patterns. This study collected data from only two soybean planting methods; other strip intercropping patterns, such as four rows of maize with four rows of soybean, or four rows of maize with six rows of soybean, remain to be explored. Another important point is that obtaining sufficiently rich data is very time-consuming and requires exploration. In future work, more abundant data collection can be carried out as the strip compound planting model is promoted, enabling wider applications.
Through the rapid detection method for dense soybean seedling field images proposed in this study, the development of seedlings in different scenarios can be understood to help farmers take corresponding cultivation and management measures in a timely manner. For example, monitoring the growth status of seedlings under wind or airflow disturbances can help optimize planting density and layout to reduce mutual competition among seedlings. Additionally, monitoring the changes in the growth stages of seedlings can assist in scheduling agricultural activities such as fertilization, irrigation, and weed control, thus providing optimal growth conditions. Overall, this study offers new technical support for precision agriculture, which is of significant importance for the sustainable development of agriculture.
4. Conclusions
To quickly and accurately obtain the number of soybean seedlings in the field, this study proposes a fast detection method based on an improved YOLOv5s model. First, GhostNetV2 replaced CSPDarknet53 as the backbone feature extraction network, and the ECA and BiFPN modules were introduced to improve the model's identification accuracy and feature representation capability. An input size of 1280 × 1280 pixels was adopted to solve the problem of insufficient feature extraction for small-scale soybean seedlings. Moreover, the PAGCP pruning algorithm was employed to streamline the model structure and boost inference speed. Experimental results show that the improved model achieves an identification accuracy of 92.1% for soybean seedlings, 5.9 percentage points higher than the baseline model, while the model size is compressed to 23.35% of the original and the parameter count is reduced by 79.55%. Compared with other classic models, the improved model has clear advantages in comprehensive performance. In addition, detection performance was examined under different scenarios, including flight heights, degrees of sparsity, seedling growth stages, and planting modes, and several failure cases were discussed and analyzed. The experimental results show that the improved model has excellent robustness and generalization performance.
In summary, the proposed method offers a novel technological approach for the fast detection of dense soybean seedlings in field environments, with positive significance for research areas such as rapid assessment of the soybean emergence rate and yield prediction. In future work, object tracking algorithms will be integrated to further improve the model's performance in real-time statistics of the soybean emergence rate and in field management.