1. Introduction
Rice, one of China’s most vital food crops, plays a crucial role in ensuring food security through its high quality and yield [1,2,3,4,5]. During the reproductive period of rice, paddy weeds compete with the crop for soil nutrients and water, depleting water and fertilizer resources. Weed infestations can also promote the development and spread of pests and diseases [6]. Paddy weeds have therefore become a significant biological hazard that reduces both the yield and quality of rice [7,8,9]. The effectiveness of weed control is thus closely tied to rice yield.
At present, rice management still follows the mode of “two closures and one extermination”. The first closure refers to the initial control of weeds by spraying herbicide before transplanting; the second closure refers to a further herbicide application at the regreening stage of the rice; the “one extermination” refers to treating the remaining weeds at the tillering stage to eliminate them [10]. During the “one extermination” period, indiscriminate pesticide spraying is usually used. At this time, Sagittaria trifolia is present in large numbers as plants with small leaves, constituting small detection targets. Moreover, rice is at the tillering stage, and weeds and rice typically obscure one another in an irregular, dense distribution. Some Sagittaria trifolia plants are also highly similar in appearance to rice at this stage, resulting in low weed detection accuracy against complex backgrounds [11]. In addition, the large exposed water surface during this period requires the model to remain accurate and efficient while resisting interference from sunlight reflected off the water.
Crude, large-area spraying of herbicides is costly and has low pesticide utilization. The resulting herbicide residues can contaminate the paddy field aquatic environment, harming ecosystems [12,13], and can affect human health through other media [14]. Variable-rate application by plant protection UAVs can reduce pesticide usage while maintaining effective weed control, in line with the development of green agriculture [15,16]; accurate weed detection is the first step toward accurate application. In variable-rate application, a UAV first detects weeds from a high altitude, the weed distribution in each plot is then obtained from the detection results to formulate the application strategy and draw variable-rate prescription maps, and finally variable spraying is performed by a low-altitude UAV flight according to the prescription maps, i.e., the mode of “high-altitude accurate detection—prescription map generation—low-altitude variable spraying” [17]. Efficient and accurate detection of weeds in rice paddies, as the first link in this chain, is therefore crucial.
At the tillering stage, the exposed water area of the rice field is still large, and sunlight reflects off the water while the UAV captures images. A large proportion of the Sagittaria trifolia at this time consists of plants with small leaves. In addition, Sagittaria trifolia in this period is often obscured by rice leaves, and some plants are similar in appearance to rice. All of these factors make weed detection difficult. UAVs offer high maneuverability and flexibility and collect high-resolution, high-quality images, which supports tasks such as weed species detection. Suitable types of UAVs can be selected according to the actual application scenario, providing strong support for weed control and agricultural production.
In recent years, UAV remote sensing has been used for weed detection. According to the characteristics of the onboard sensors and the data processing methods, weed detection methods can be divided into those based on multispectral imaging [18] and those based on RGB digital imaging [19,20,21]. Spectral imaging combines spectral and image techniques to acquire spectral feature information and map it to spatially located pixel points [22]. The WSRI index was proposed to construct a deep convolutional neural network evaluation model, achieving detection accuracies of 81.1% for barnyard grass and 92.4% for downy leaves in paddy fields [23]. However, multispectral sensors lack application feasibility due to their high cost, complex feature extraction, and complicated data processing for plants against cluttered backgrounds [24].
RGB images acquired by UAVs have high spatial resolution [25] and can be used for accurate detection of weed targets by traditional machine vision-based methods and by deep convolutional neural network-based methods [26]. Huang extracted the color and texture features of paddy field weeds and obtained the best results with a BP classifier at a scale parameter of 100, with an average intersection-over-union of 68.7% and an accuracy of 83.6% [27]. Sheng Zhu et al. used AdaBoost and achieved a combined detection accuracy of 90.25% on 100 × 100 pixel rice weed images [28]. However, these algorithms suffer from low detection accuracy and efficiency, require repeated manual trials for feature selection, and their classification results depend on how well the chosen features perform [29].
Deep learning models can extract key features from complex data through multi-level nonlinear transformations, which greatly improves the accuracy of image recognition [30,31]. A UNet with ResNet34 as the feature extractor was used to detect weed images with motion blur in sorghum fields, achieving an accuracy of 93.01% [32]. Cai et al. improved PSPNet by inserting an ECA module into its SPP layer, and the network achieved an accuracy of 86.18% for detecting weeds in pineapple fields [33]. Jie Kang et al. proposed a weed detection algorithm combining feature enhancement and multi-scale fusion modules, reaching an accuracy of 88.4% for detecting weeds in sugar beet fields [34].
Although the detection accuracy in the above studies can meet the needs of variable-rate spraying, model efficiency has not been sufficiently considered, and weed spraying must often be completed in a timely manner, especially for large paddy field areas [20]. Given this, some scholars have investigated lightweight models. Qingkuan Meng et al. lightened and improved the SSD model, achieving a precision of 88.27% and a detection speed of 32.26 FPS on a 480 × 720 pixel corn weed dataset [35]. These researchers enhanced the model structure to balance detection accuracy and speed, resulting in notable performance improvements. However, such studies have targeted the detection of single weed species with little background interference; in actual farmland environments, the appearance of the same weed class can vary considerably with the growing environment.
In recent years, one-stage models, represented by the YOLO series of algorithms, have extracted features directly through a convolutional network and generated bounding boxes on the predicted feature map while classifying and regressing them, satisfying both accuracy and efficiency requirements from the input image to the final prediction [36]. Guzel et al. used YOLOv5-small to detect weeds in wheat fields at the anthesis stage with 81.0% accuracy [37]. Tetila et al. applied YOLOv5 and YOLOv4 to soybean weed data collected by UAVs; the YOLOv5 model was only about one-tenth the size of YOLOv4 while its accuracy was on par with YOLOv4 [38].
In actual agricultural production, weed control demands both timeliness and accuracy to achieve good control effects [39]. The single-stage structure of this series of models avoids the complex region proposal generation step and significantly reduces computation, making it possible to deploy them on embedded devices with limited computing power [40].
YOLOv10 is currently the latest model in the YOLO series. Some scholars have used YOLOv10n, YOLOv10s, and YOLOv10m to grade yellow cauliflower, achieving mAP50 values of 80.0%, 83.3%, and 83.8%, respectively, with computational costs of 8.2, 24.5, and 63.4 GFLOPs [41]. Although the mAP50 of YOLOv10s is 3.3 percentage points higher than that of YOLOv10n, its computational cost is nearly three times as large, which is unsuitable for embedded devices. This indicates that YOLOv10n is better suited to agricultural equipment with limited computing power. Therefore, this study applies the YOLOv10n model to weed detection in UAV rice field images.
Many excellent YOLO-based improved models have emerged. For example, integrating a genetic algorithm into the YOLO model to optimize its hyperparameters showed strong stability and effectiveness in real-time detection of distribution networks, with a mAP50 of 92.2% and an F1 score of 0.867 [42]. YOLOu-Quasi-ProtoPNet was proposed, combining YOLOv5 for object detection with Quasi-ProtoPNet for classification; various deep learning structures were tried to optimize the classification part, and the model outperformed similar approaches [43].
To improve the detection of small underwater targets, FasterNet was integrated into YOLOv8; evaluation on the URPC2021 dataset showed that the improved model achieved an average accuracy of 84.7%, with FasterNet effectively enhancing the model’s ability to capture detailed features of small targets [44]. The novel semantic segmentation model EGCN proposed by Yang et al. uses CGBlock as the base module and U-Net as the baseline, integrating contextual features and local spatial features at both high and low levels and improving the accuracy of landslide identification [45]. Lu et al. used DySample to replace the up-sampling module in YOLOv8 to address low image quality; experimental results showed that the enhanced algorithm achieved significant improvements in precision, recall, and mAP50 compared to the original model [46]. Zhou et al. used shared convolution to drastically reduce the number of parameters in deep learning models while maintaining model performance [47].
To address the difficulties of weed detection in paddy fields, this study first replaces the YOLOv10n backbone network to improve the model’s feature extraction capability. Second, the up-sampling and down-sampling modules and the conventional convolution module in the neck are improved to enhance feature representation. Then, a lightweight detection head is proposed to reduce the number of parameters and the computational cost, thereby improving inference speed. Finally, based on the detection results of the improved YOLOv10n for weeds in the paddy field, a variable-rate herbicide spraying strategy is developed, and UAV variable-rate prescription maps are generated and discussed.
2. Materials and Methods
2.1. Image Acquisition
The field trial site is located in the rice experimental field at Shengli Village, Aji Town, Tieling City, Liaoning Province, as shown in Figure 1. The altitude is 55 m, and the area has a temperate continental monsoon climate with an average annual precipitation of 678 mm. The soil is nutrient-rich with high fertility and a pH between 6 and 7, and the field is flood-irrigated. The paddy field is planted with rice varieties including Fuhe 258 and Meifeng 336. Sagittaria trifolia is the dominant weed in the area, with a large and uneven distribution.
The experiment was conducted in this demonstration area on 24 June 2024, when most of the Sagittaria trifolia plants were at the 1–3 leaf stage. A DJI M300 drone (manufactured by DJI, Shenzhen, China) equipped with a high-resolution digital camera with a Zenith P1 lens (manufactured by DJI, Shenzhen, China) was used as the remote sensing platform. The drone’s horizontal hovering accuracy was ±0.1 m, its vertical hovering accuracy was ±0.3 m, and it could withstand winds of up to force 4. The digital camera had 45 million effective pixels and a resolution of 8192 × 5640 pixels. Images were acquired from 11:00 to 13:00, and the weather during the test period was clear with weak or light winds.
Orthophoto images of the test area were collected with the UAV flying at a height of 22 m. Digital images taken at this height not only avoid airflow disturbance and meet the requirements of image registration and fusion, but also provide the resolution needed to capture Sagittaria trifolia; in short, this height balances flight efficiency and detection accuracy. The total area of the demonstration area was 300 acres, and two representative fields were randomly selected as Field 1 and Field 2. Both experimental fields were 12 acres in size, totaling 24 acres. To ensure that the UAV images could be successfully registered and fused, the UAV and camera parameters were set so that adjacent photos had an 80% overlap. Image alignment and fusion of the collected paddy field remote sensing images were performed using DJI Terra (v4.3.0, manufactured by DJI, Shenzhen, China).
2.2. Image Labeling and Dataset Construction
When the UAV captures images of a rice field, adjacent images share an 80 percent overlap area. If data annotation were performed directly on the original images, the large overlapping regions between adjacent images would not only cause data redundancy but could also lead to inconsistent or omitted annotations due to human error, and such inconsistencies and omissions would seriously affect the accuracy of the algorithm. To prevent this, the study first used DJI Terra for image alignment and fusion of the weed UAV remote sensing data collected in 2024.
To match the computer’s processing capacity and to speed up model training convergence, the aligned and fused image was cut into non-overlapping 640 × 640 pixel sub-images. A total of 1000 remote sensing images were obtained for Field 1 after cutting and 1008 for Field 2, for a final total of 2008 images. The aligned, fused, and cropped images are shown in Figure 2. At the tillering stage, weeds are not evenly distributed, growing contiguously in dense areas and singly in sparse areas. Therefore, during manual labeling, Sagittaria trifolia was classified into two categories, single plants and contiguous patches, as shown in Figure 3.
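The tiling step above can be reproduced with a few lines of Python. The sketch below is a minimal illustration, assuming the fused orthophoto is available as a single large image file; the file names and the tile-naming scheme are illustrative and not those used in this study.

```python
from pathlib import Path
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # the fused orthophoto far exceeds PIL's default pixel limit

def tile_orthophoto(src_path, out_dir, tile=640):
    """Cut a fused orthophoto into non-overlapping tile x tile sub-images.

    Edge regions smaller than a full tile are discarded so every sub-image
    fed to the network has the same 640 x 640 size.
    """
    img = Image.open(src_path).convert("RGB")
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    w, h = img.size
    count = 0
    for top in range(0, h - tile + 1, tile):
        for left in range(0, w - tile + 1, tile):
            crop = img.crop((left, top, left + tile, top + tile))
            crop.save(out / f"{Path(src_path).stem}_{top}_{left}.jpg")
            count += 1
    return count

# Example (illustrative file names):
# n = tile_orthophoto("field1_orthophoto.tif", "field1_tiles")
```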
Manual annotation was performed using the image annotation tool LabelImg (v1.8.6). There were three label categories: ridge (ridge), single Sagittaria trifolia (single), and contiguous Sagittaria trifolia (multiple). The .txt files storing the labeled box information were used to produce the YOLO-format dataset, as shown in Figure 2.
Field 1 and Field 2 yielded 503 and 614 images containing Sagittaria trifolia or ridges, respectively, together with the corresponding labels, for a total of 1117 images. To prevent data leakage, the dataset was divided into training, validation, and test sets at a 7:2:1 ratio: 782 images for training, 223 for validation, and 112 for testing. To avoid overfitting caused by the small number of samples and to enhance the model’s generalization ability while keeping the split ratio unchanged, the images and labels of the training, validation, and test sets were each subjected to data augmentation operations such as random translation and rotation, flipping, shear transformation, random noise, random brightness, random cropping, and simulated sun flare. The dataset was thus enlarged to three times its original size, yielding a final dataset of 2346 training images, 670 validation images, and 336 test images. The number of samples in the dataset is shown in Table 1.
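As an illustration of the augmentation step, the following sketch uses the albumentations library to apply bounding-box-aware versions of the listed operations; the specific transforms, parameters, and probabilities are assumptions for demonstration, not the exact settings used in this study.

```python
import albumentations as A

# One augmentation pass per image; running the pipeline three times per image
# triples the dataset while keeping the YOLO-format boxes consistent.
augment = A.Compose(
    [
        A.Affine(translate_percent=0.1, rotate=(-15, 15), shear=(-10, 10), p=0.7),  # translation / rotation / shear
        A.HorizontalFlip(p=0.5),                                                    # flipping
        A.GaussNoise(p=0.3),                                                        # random noise
        A.RandomBrightnessContrast(brightness_limit=0.2, p=0.5),                    # random brightness
        A.RandomSizedBBoxSafeCrop(height=640, width=640, p=0.3),                    # random cropping that keeps boxes
        A.RandomSunFlare(flare_roi=(0, 0, 1, 0.5), p=0.2),                          # simulated sun flare on the water
    ],
    bbox_params=A.BboxParams(format="yolo", label_fields=["class_ids"]),
)

# boxes are [x_center, y_center, w, h] normalized to [0, 1], as in YOLO .txt labels
# out = augment(image=image, bboxes=boxes, class_ids=class_ids)
```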
2.3. YOLOv10n Baseline Model and Pre-Test
The paddy field environment is complex, so a pre-test was first conducted to identify the shortcomings of the model before making targeted improvements.
YOLOv10 currently comes in n, s, m, l, and x versions, with increasing size and parameter count in that order. The overall design is divided into three parts: the backbone network (Backbone), the neck network (Neck), and the detection head (Head). The structure of the YOLOv10n network is shown in Figure 4. The Backbone is responsible for feature extraction and consists of three main modules: Conv, C2f, and SPPF. The Neck fuses the feature maps of different sizes extracted by the Backbone so that shallow features are combined with deeper ones. The Head uses a decoupled head to process the classification, regression, and confidence tasks separately and then merges the outputs.
Due to the limited computing power of the equipment in the agricultural environment, a pre-test was conducted on the field weed dataset using the YOLOv10n base model.
Figure 5 presents the test results. The model failed to accurately detect small target weeds when the weed target was too small. Weeds obscured by rice were also not accurately detected, nor were weeds that looked too similar to rice. In addition, the limited resources of the UAV equipment constrain the allowable model size, parameter count, and computation.
2.4. Construction of the Improved Weed Detection Model YOLOv10n-FCDS
The pre-test revealed that although YOLOv10n showed strong adaptability, it did not perform well on problems such as large numbers of small-target weeds, weeds and rice obscuring each other, and high similarity between weeds and rice. Therefore, the following improvements were made to YOLOv10n:
(1) FasterNet replaces the backbone network to reduce redundant computation and improve computational efficiency while improving the model’s feature extraction for small target weeds. (2) CGBlock replaces the regular convolution and SCDown down-sampling modules in the neck network to improve the detection of weeds that overlap with or are obscured by rice by focusing on surrounding and global contextual information. (3) The DySample up-sampling module is introduced; its point-based sampling strategy improves the algorithm’s resistance to interference and its ability to identify weeds that are similar in appearance to rice. (4) A lightweight detection head, SCSD-Head, relying on shared convolution and scale scaling, is designed to further reduce the parameter count and computation, reduce the model size, and improve inference speed. The revised model architecture is displayed in Figure 6.
2.4.1. Backbone Feature Extraction Network Design
The backbone, serving as the model’s feature extraction module, plays a crucial role in the detection results. Therefore, the first improvements were made to the backbone network.
The need to increase UAV operating efficiency requires weeds to be detected from higher altitudes. Because of the flight height and the weeding period, most weeds are small in size, as shown in Figure 5d. A large amount of irrelevant information may therefore be captured during feature extraction, and as the network deepens, the number of feature map channels increases, attaching more redundant information to the multi-channel feature maps. This not only affects the detection speed of the model but also reduces its detection accuracy.
To solve this problem, this study replaces the backbone with the FasterNet network. The FasterNet structure effectively reduces redundant computation and memory access while fully extracting spatial features. The replacement backbone uses PConv, which differs from conventional convolution by applying standard convolution for feature extraction to only a subset of the input channels while keeping the remaining feature map channels intact. This approach exploits the redundancy in the feature map and further reduces the computational cost. The FasterNet backbone built on PConv is shown in Figure 7; it contains four hierarchical stages. Before each stage, there is either an embedding layer (a regular 4 × 4 convolution with a stride of 4) or a merging layer (a regular 2 × 2 convolution with a stride of 2), used for spatial down-sampling and channel expansion. The PConv layers in the FasterNet modules effectively reduce the redundant information generated by the smaller Sagittaria trifolia targets during feature extraction.
The FasterNet Block consists of one 3 × 3 PConv module and two 1 × 1 Conv modules. The initial target features are first extracted by PConv, feature extraction is then enhanced by the two convolutions, and finally the deep-level features are combined with the input features through a residual connection. The working principle of PConv is shown in Figure 7.
PConv [48] takes a feature map of dimension h × w × c as input and extracts features by applying a filter only to c_p of the channels. The remaining channels are kept unchanged, the processed channels are concatenated with the unprocessed channels, and the output is mapped to the same dimensions as the input, so the channel count is preserved. PConv substantially reduces redundant computation while preserving the original number of channels, resulting in higher computational speed than conventional convolution. Therefore, FasterNet was used as the primary feature extraction backbone to make the algorithm more accurate for detecting smaller Sagittaria trifolia.
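A minimal PyTorch sketch of PConv and the FasterNet Block described above is given below; the partial-channel ratio and the expansion factor of the two 1 × 1 convolutions are illustrative assumptions rather than the exact values used in this study.

```python
import torch
import torch.nn as nn

class PConv(nn.Module):
    """Partial convolution: a 3x3 conv is applied only to the first c_p = c / ratio
    channels; the remaining channels pass through untouched."""
    def __init__(self, channels, ratio=4):
        super().__init__()
        self.c_p = channels // ratio
        self.conv = nn.Conv2d(self.c_p, self.c_p, kernel_size=3, padding=1, bias=False)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.c_p, x.size(1) - self.c_p], dim=1)
        return torch.cat((self.conv(x1), x2), dim=1)  # same channel count as the input

class FasterNetBlock(nn.Module):
    """PConv 3x3 followed by two 1x1 convs, with a residual connection."""
    def __init__(self, channels, expansion=2):
        super().__init__()
        hidden = channels * expansion
        self.pconv = PConv(channels)
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, hidden, 1, bias=False),
            nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1, bias=False),
        )

    def forward(self, x):
        return x + self.mlp(self.pconv(x))  # fuse deep features with the input features

# x = torch.randn(1, 64, 80, 80); FasterNetBlock(64)(x).shape -> torch.Size([1, 64, 80, 80])
```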
2.4.2. Neck Down-Sampling Module Design
The neck of the model is mainly responsible for the fusion of features at different scales, and through better feature fusion, the model can detect and localize objects of different sizes more accurately. Therefore, the down-sampling module of the neck, which is responsible for filtering important features, was improved.
In the complex paddy field environment, weeds and rice often shade each other, as shown in Figure 5e. If the model only focuses on the feature information within the labeled box, detections are often missed or the detection boxes have low confidence. The model is therefore required to observe the surrounding context and the global context while focusing on local information. Accordingly, in this study, the CGBlock from CGNet was used to replace the regular convolution and the SCDown down-sampling module in the YOLOv10n neck (Neck).
The CGBlock includes a local feature extractor f_loc(∗), a surrounding context extractor f_sur(∗), a joint feature extractor f_joi(∗), and a global context extractor f_glo(∗), as depicted in Figure 8.
CGBlock operates in two main stages. First, local features and the corresponding surrounding context are learned by f_loc(∗) and f_sur(∗), respectively. f_loc(∗) is instantiated as a standard 3 × 3 convolutional layer that learns local features from the 8 neighboring feature vectors. Because dilated convolution has a comparatively large receptive field and can efficiently capture the surrounding context, f_sur(∗) is instantiated as a 3 × 3 dilated convolutional layer. f_joi(∗) then obtains the joint features from the outputs of f_loc(∗) and f_sur(∗); it is designed as a concatenation layer followed by Batch Normalization (BN) and Parametric ReLU (PReLU) operators. In the second stage, f_glo(∗) extracts the global context to refine the joint features. The global context serves as a weighting vector and is used for channel-wise refinement of the joint features, emphasizing useful components and suppressing useless ones. f_glo(∗) is instantiated as a global average pooling layer that aggregates the global context, which is then further processed by a multilayer perceptron. Finally, a scale layer re-weights the joint features with the extracted global context.
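The following PyTorch sketch illustrates the CGBlock structure described above (f_loc, f_sur, f_joi, and f_glo); the channel layout, dilation rate, and reduction ratio are assumptions for demonstration and may differ from the original CGNet implementation.

```python
import torch
import torch.nn as nn

class CGBlock(nn.Module):
    """Context Guided block: local 3x3 conv (f_loc), 3x3 dilated conv (f_sur),
    concatenation + BN + PReLU (f_joi), and a global-context channel
    re-weighting branch (f_glo)."""
    def __init__(self, in_ch, out_ch, dilation=2, reduction=16):
        super().__init__()
        half = out_ch // 2
        self.reduce = nn.Sequential(nn.Conv2d(in_ch, half, 1, bias=False),
                                    nn.BatchNorm2d(half), nn.PReLU(half))
        self.f_loc = nn.Conv2d(half, half, 3, padding=1, groups=half, bias=False)
        self.f_sur = nn.Conv2d(half, half, 3, padding=dilation, dilation=dilation,
                               groups=half, bias=False)
        self.f_joi = nn.Sequential(nn.BatchNorm2d(out_ch), nn.PReLU(out_ch))
        self.f_glo = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                   nn.Conv2d(out_ch, out_ch // reduction, 1),
                                   nn.ReLU(inplace=True),
                                   nn.Conv2d(out_ch // reduction, out_ch, 1),
                                   nn.Sigmoid())

    def forward(self, x):
        x = self.reduce(x)
        joint = self.f_joi(torch.cat((self.f_loc(x), self.f_sur(x)), dim=1))
        return joint * self.f_glo(joint)  # channel-wise re-weighting by the global context

# x = torch.randn(1, 64, 40, 40); CGBlock(64, 64)(x).shape -> torch.Size([1, 64, 40, 40])
```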
2.4.3. Neck Up-Sampling Module Design
The up-sampling module of the neck is mainly responsible for recovering the detailed information to better capture the boundaries and shapes of the target, which facilitates the detection of Sagittaria trifolia, and therefore an improvement of the up-sampling is necessary.
In the rice field weed detection task, weeds and rice are similar to some extent due to the complexity of the environment, as shown in Figure 5f. Therefore, this paper introduces the DySample lightweight dynamic up-sampling module to address the heavy workload and strong dependence on feature guidance of kernel-based dynamic up-samplers (e.g., CARAFE, FADE, and SAPA). DySample bypasses dynamic convolution and formulates up-sampling from a point-sampling perspective, which not only effectively enhances the model’s resistance to interference but also improves the detection of weeds that are highly similar to rice. It has fewer parameters than kernel-based dynamic up-sampling modules, saving computational resources and making it more suitable for real-time industrial detection.
The main flow of the DySample up-sampling module is shown in Figure 9, where X is a feature map of size C × H₁ × W₁, δ is a point sampling set of size 2g × H₂ × W₂, and 2g represents the x and y coordinates in the first dimension. The grid_sample function uses the location information in the point sampling set δ to resample the feature map X into X′ of size C × H₂ × W₂, as demonstrated in Equation (1):
X′ = grid_sample(X, δ)   (1)
Figure 10 shows the generation process of the point sampling set δ based on the dynamic range factor. First, the up-sampling scale factor s and the feature map X of size C × H × W are given. Next, a linear layer is used to generate the offset O; the number of input channels of this linear layer is C and the number of output channels is 2gs². The generated offset O therefore has a size of 2gs² × H × W, as shown in Equation (2):
O = linear(X)   (2)
The offset is subsequently reshaped through pixel shuffling (pixel shuffle) into a high-resolution raw sampling grid of size 2g × sH × sW. Finally, the point sampling set δ is obtained by combining the offset O with the original sampling grid G, as shown in Equation (3):
δ = G + O   (3)
2.4.4. Detection Head Design
The accuracy of the model’s detection is closely tied to the detection head, and for different detection tasks, the detection head can be designed to better match the characteristics of the target so that it can ensure accuracy while reducing redundant operations.
The original YOLOv10n detection head is computationally intensive in training and application, and when deployed on resource-constrained platforms such as UAVs, its larger number of parameters may increase inference time and thus affect the responsiveness of the device. Therefore, to improve computational efficiency while affecting accuracy as little as possible, this research introduces a lightweight detection head, SCSD-Head, relying on shared convolution and scale scaling, whose structure is shown in Figure 11.
To minimize the impact of the lightweight design on accuracy, GNConv (Group Normalization convolution) was used. The three GNConv 1 × 1 modules do not share parameters, whereas the six GNConv 3 × 3 modules shown in Figure 11 share parameters. The six red modules in the figure, including RegConv and ClsConv, also share parameters. Using parameter-sharing convolution modules significantly reduces the parameter count, making the model lighter, especially on resource-constrained devices. To cope with the inconsistent target scales handled by each detection head (for example, the scales of field ridges and weeds differ greatly), three Scale layers with unshared parameters are used together with the shared convolutions to scale the features of the different levels separately, ensuring accuracy.
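A simplified PyTorch sketch of the shared-convolution, scale-scaling head described above is shown below; the hidden channel width, the number of shared 3 × 3 GNConv layers, and the regression output format are illustrative assumptions rather than the exact SCSD-Head configuration.

```python
import torch
import torch.nn as nn

class Scale(nn.Module):
    """Learnable per-level scalar used to rescale the shared-branch output."""
    def __init__(self, init=1.0):
        super().__init__()
        self.scale = nn.Parameter(torch.tensor(init))

    def forward(self, x):
        return x * self.scale

def gn_conv(in_ch, out_ch, k=3):
    """Conv + GroupNorm + SiLU (GNConv)."""
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, padding=k // 2, bias=False),
                         nn.GroupNorm(16, out_ch), nn.SiLU(inplace=True))

class SCSDHead(nn.Module):
    """Shared-convolution, scale-scaling head (sketch): per-level 1x1 GNConvs are
    unshared, the 3x3 GNConv stack and the final Reg/Cls convs are shared across
    levels, and per-level Scale layers compensate for scale differences."""
    def __init__(self, in_channels=(64, 128, 256), hidden=64, num_classes=3, reg_max=16):
        super().__init__()
        self.align = nn.ModuleList(gn_conv(c, hidden, k=1) for c in in_channels)      # unshared 1x1
        self.shared = nn.Sequential(gn_conv(hidden, hidden), gn_conv(hidden, hidden))  # shared 3x3 stack
        self.reg_conv = nn.Conv2d(hidden, 4 * reg_max, 1)   # shared RegConv
        self.cls_conv = nn.Conv2d(hidden, num_classes, 1)   # shared ClsConv
        self.scales = nn.ModuleList(Scale() for _ in in_channels)                      # unshared Scale layers

    def forward(self, feats):
        outputs = []
        for f, align, scale in zip(feats, self.align, self.scales):
            x = self.shared(align(f))
            outputs.append((scale(self.reg_conv(x)), self.cls_conv(x)))
        return outputs
```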
2.5. Test Environment Configuration and Parameter Setting
The experimental processing platform comprised a 10th Gen Intel(R) Core(TM) i9-10980XE CPU at 3.00 GHz (manufactured by Intel, Santa Clara, CA, USA), 64 GB of RAM, and an NVIDIA Quadro RTX 5000 GPU with 64 G of video memory (manufactured by NVIDIA, Santa Clara, CA, USA). The software environment consisted of CUDA 12.0 + cuDNN 8.7.0 + conda 22.9.0 + Python 3.8.16, and the operating system was Windows 10 Professional 64-bit. During model training, the initial learning rate was set to 0.01, the batch size to 16, and the maximum number of iterations to 300; early stopping was not used. A Stochastic Gradient Descent (SGD) optimizer was used to update the weight parameters, and the decay strategy for the learning rate (lr) can be characterized as follows:
lr = basic_lr × (1 − iter_index / max_iter)^p
where basic_lr represents the basic learning rate, max_iter represents the maximum iteration count, iter_index is the iteration index, and p is the polynomial decay exponent. In this paper, the basic learning rate was set to 0.001, the momentum to 0.9, the weight decay to 1 × 10⁻⁴, and the lower limit for learning rate updates to 0. All models were trained using these settings.
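The polynomial decay schedule above can be expressed directly in code; the value of the exponent p used in the example call below is an assumption, since it is not stated in the text.

```python
def poly_lr(basic_lr, iter_index, max_iter, p=0.9):
    """Polynomial learning-rate decay: lr = basic_lr * (1 - iter_index / max_iter) ** p."""
    return basic_lr * (1.0 - iter_index / max_iter) ** p

# e.g., with basic_lr = 0.001, max_iter = 300, and p = 0.9 (assumed):
# poly_lr(0.001, 0, 300)   -> 0.001
# poly_lr(0.001, 150, 300) -> ~0.00054
# poly_lr(0.001, 300, 300) -> 0.0  (the stated lower limit)
```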
During training, the distance between the probability distribution predicted by the algorithm for each pixel class and the true label class probability distribution was measured by the Cross-Entropy Loss function, which can be calculated as follows:
Loss = −(1/M) ∑_{i=1}^{M} ∑_{c=1}^{N} h(b_i, c) · log(p_ic)
where M refers to the number of pixels, N refers to the number of categories, i refers to the pixel being processed, c indicates the category being considered, b_i indicates the correct label category for pixel i, h is an indicator function that takes the value 1 when b_i equals c and 0 otherwise, and p_ic denotes the probability that pixel i belongs to category c, calculated by applying the Sigmoid function to the predicted category score. During iterative training, the loss value was used to measure the training effect of the model, and backpropagation was then used to optimize the weight parameters so that the loss decreased. The loss finally converged to a relatively stable value, completing model training.
2.6. Evaluation Indicators
To objectively evaluate the detection effect of the algorithm on weed images in rice fields and quantitatively assess its performance, this research used the number of parameters, the number of floating-point operations (GFLOPs), the inference speed (FPS), the model size, the precision (Precision), the recall (Recall), and the mean Average Precision (mAP) to assess the improved algorithm. These metrics are defined as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
mAP = (1/N) ∑_{i=1}^{N} AP_i
where N indicates the total number of detection categories and AP_i is the average precision of category i. Precision and recall are computed at an IoU threshold of 0.5. After the model tests the samples, there are four possible outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN).
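For reference, these metrics reduce to the following simple computations; the example AP values are hypothetical and serve only to illustrate the mAP calculation over the three label categories.

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); Recall = TP / (TP + FN), at IoU = 0.5."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def mean_average_precision(ap_per_class):
    """mAP = mean of the per-class average precisions over the N categories."""
    return sum(ap_per_class) / len(ap_per_class)

# e.g., for the three categories (ridge, single, multiple) with hypothetical AP values:
# mean_average_precision([0.90, 0.87, 0.85]) -> ~0.873
```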
2.7. The Principle of Grad-CAM
Grad-CAM (Gradient-weighted Class Activation Mapping) is a method for interpreting the decisions of convolutional neural networks. Its key idea is to weight the feature maps of a convolutional layer by the gradients of the target class score with respect to that layer, averaged per channel, to obtain a coarse heat map. This heat map can be enlarged and overlaid on the original image to show which areas the model focuses on most when making its prediction.
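A minimal PyTorch sketch of the Grad-CAM procedure is given below; it assumes the model exposes per-class scores for a chosen target (for a YOLO-style detector, the class score of a selected detection is typically used instead), and the hook-based implementation is illustrative.

```python
import torch
import torch.nn.functional as F

def grad_cam(model, image, target_layer, class_idx):
    """Minimal Grad-CAM sketch: hook a convolutional layer, back-propagate the target
    class score, average the gradients per channel to obtain weights, and form a
    weighted sum of the activations as a coarse heat map."""
    activations, gradients = {}, {}

    def fwd_hook(_, __, out):
        activations["v"] = out

    def bwd_hook(_, __, grad_out):
        gradients["v"] = grad_out[0]

    h1 = target_layer.register_forward_hook(fwd_hook)
    h2 = target_layer.register_full_backward_hook(bwd_hook)

    score = model(image)[0, class_idx]        # assumes the model returns per-class scores
    model.zero_grad()
    score.backward()

    weights = gradients["v"].mean(dim=(2, 3), keepdim=True)           # channel-wise gradient average
    cam = F.relu((weights * activations["v"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=image.shape[2:], mode="bilinear", align_corners=False)
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)          # normalize to [0, 1] for overlay

    h1.remove(); h2.remove()
    return cam
```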
2.8. Test Results and Analyses
The improved model requires an ablation test to verify whether each improved module contributes positively, and the regions the model attends to are observed through heat maps. For the deficiencies found in the pre-test, the model’s ability to detect these special cases is then compared before and after the improvement. A targeted test with water-surface reflections is also performed to check whether the model resists interference from sunlight reflected off the water. The proposed model is further compared with classical models to confirm that its performance is adequate. Finally, the spraying strategy is formulated and the variable-rate spraying prescription maps are drawn.
3. Results
3.1. Results of the Ablation Test
The loss values of the algorithm during training, before and after improvement, are shown in Figure 12. In the early stage of training, the loss value decreases rapidly; that is, the model converges quickly. This shows that the model learns the data features well in the first 200 training rounds and can capture the important features in the data. As the epochs increase, the rate of decline of the loss slows and stabilizes after about 200 rounds. Finally, the loss of YOLOv10n-FCDS drops to 1.921. The model weights at the lowest loss value are taken as the optimal weights obtained by training. The loss value of YOLOv10n-FCDS is always lower than that of YOLOv10n throughout training, which shows that the overall improvements enhance model performance.
To verify the effectiveness of each modification, an ablation test was designed against YOLOv10n, with the unimproved model denoted as A. The different modifications to the model are shown in Table 2. The same dataset was used for model training, validation, and testing, and the results are displayed in Table 3.
As can be seen from Table 3, YOLOv10n-FCDS substantially improves the detection accuracy with only a small increase in computation and parameter count.
Using FasterNet as the backbone network improved model accuracy (mAP50) by 1.4 percentage points, owing to its more efficient operators, fewer memory accesses, and in particular its use of PConv, while also increasing inference speed. After further replacing the conventional convolution and SCDown down-sampling in the neck with CGBlock, the surrounding and global feature extractors improve the model’s ability to attend to surrounding and global contextual information, allowing it to better detect weeds intersecting with rice and raising accuracy by a further 0.7 percentage points. After additionally replacing the original neck up-sampling module with DySample, the point-based sampling strategy makes the model more resistant to interference and better able to detect weeds similar in appearance to rice, improving accuracy by a further 0.6 percentage points. With the lightweight detection head SCSD-Head, the extensive use of parameter-sharing convolutions reduced the parameter count by 16% and the computation by 18%, increased the computational speed by 5%, and reduced the model size by 16%, while the use of Group Norm convolutions limited the accuracy drop to only 0.01 percentage points. The performance of SCSD-Head shows that the detection head achieves its original design intention of effectively reducing the parameter count and computation, improving inference FPS, and reducing model size.
The modified model extracts the significant features of paddy weeds more proficiently while having little effect on operating efficiency, and can therefore better complete the weed detection task in rice fields.
3.2. Heat Map Analysis of the Model Before and After Improvement
To observe more intuitively the regions YOLOv10n-FCDS attends to when detecting weeds in paddy fields, and to compare the internal working mechanisms of the model before and after the improvement of YOLOv10n, the feature extraction for paddy field weeds was compared using the Grad-CAM visualization technique. All visualizations were generated from the last encoder layer of each model. Figure 13 displays the results.
As can be seen from the YOLOv10n and YOLOv10n-FCDS ridge sections in Figure 13, YOLOv10n’s attention to the ridge is incomplete. In YOLOv10n-FCDS, thanks to the use of CGBlock for down-sampling, the model fully takes into account both the surrounding and global contextual features; this enlarges the model’s receptive field and thus improves target detection. In addition, as can be seen from the rice region in the ridge section of Figure 13, YOLOv10n focuses too strongly on rice features due to the high similarity between rice and weeds, whereas YOLOv10n-FCDS, using DySample for up-sampling with its point-based sampling strategy, is better able to distinguish weeds with a high similarity to rice.
As can be seen from the smaller weed targets at the bottom of the YOLOv10n and YOLOv10n-FCDS heat maps in the weed section of Figure 13, YOLOv10n’s feature extraction is insufficient for smaller single-plant targets, whereas YOLOv10n-FCDS, using FasterNet as the main feature extraction network, improves feature extraction for small targets relative to the original model. It can also be seen from these heat maps that the area of interest of YOLOv10n-FCDS is larger; owing to the CGBlock, YOLOv10n-FCDS more often attends to the outskirts of contiguous weeds that overlap with or are shaded by rice.
3.3. Comparison of Model Performance Before and After Improvement
To quantitatively describe the change in detection performance for specific types of weeds before and after model improvement, 30 images of small target weeds, 30 images of weeds obscured by rice, and 30 images of weeds similar in appearance to rice were selected from the test set. The mAP50 on these three datasets, before and after model improvement, is displayed in Table 4. It is evident from Table 4 that YOLOv10n-FCDS increased the detection precision, compared to YOLOv10n, by 2.5, 2.8, and 3.0 percentage points for small target weeds, weeds obscured by rice, and weeds similar to rice, respectively.
The detection results of YOLOv10n-FCDS are shown in Figure 14. It can be seen from Figure 14d that the problem of missed recognition of small targets has been alleviated. As can be seen in Figure 14e, weeds obscured by rice can also be accurately detected. It is evident from Figure 14f that the missed detection of weeds similar in appearance to rice has also been resolved. The confidence of the detections has also improved significantly overall, providing ample evidence of the effectiveness of the improvements.
3.4. Tests of Model Immunity Under Different Intensities of Water Reflections
UAVs are susceptible to lighting conditions, and during the rice tillering period, the exposed water surface is still relatively large and prone to sunlight reflection, which may lead to misidentification of rice or missed identification of weeds. To verify the performance of the model under different water reflection intensities, samples under strong and weak light were selected from the test set for further testing. The experimental results are shown in Figure 15 and indicate that the improved model can detect weeds under different intensities of water reflection.
3.5. Performance Comparison with Classical Models
YOLOv10n-FCDS was compared with several mainstream object detection models, Faster R-CNN, SSD, YOLOv8n, and YOLOv9t, all trained with the same training parameters and training set. The parameter counts, GFLOPs, FPS, Recall, and mAP50 values of the different models are compared in Table 5.
As can be seen from Table 5, the parameter counts and GFLOPs of Faster R-CNN and SSD are significantly higher than those of the other models. Faster R-CNN is a two-stage detection algorithm that first uses an RPN (Region Proposal Network) to obtain candidate regions and then classifies them, while SSD relies on a heavy backbone and dense multi-scale prior boxes; the parameter counts and computation of these models are therefore significantly higher than those of the lightweight single-stage YOLO models.
The YOLO series models are single-stage models, omitting the complicated candidate region computation, which reduces model complexity and improves efficiency. The algorithm proposed in this study, YOLOv10n-FCDS, maintains a strong balance between processing speed and precision: the number of parameters is 3,989,749, GFLOPs is 9.7, FPS is 424.0, Recall is 0.806, and mAP50 reaches 0.874. Compared to YOLOv8n, although the parameter count and computation rise slightly, inference speed and accuracy are significantly higher.
In addition, YOLOv10n-FCDS is only 0.1 ms slower than YOLOv9t, while its accuracy is 6.5 percentage points higher. These advantages stem from the adoption of FasterNet as the backbone feature extraction network, whose extensive use of PConv enables the model to better extract features of small target weeds and reduce redundant computation, improving accuracy with only small changes in parameter count and computation. Moreover, the CGBlock down-sampling module allows the model to attend to surrounding and global contextual features, boosting its ability to detect obscured weeds. The point-based sampling strategy in the DySample up-sampling module improves discrimination between rice and weeds and enhances the model’s resistance to interference. Finally, the lightweight detection head based on shared convolution and scale scaling achieves a noteworthy reduction in parameters and operations with almost no loss of accuracy.
All in all, YOLOv10n-FCDS achieves a high detection accuracy (mAP50) while maintaining low latency and a small number of parameters, making it suitable for applications such as rice field weed detection, where efficiency is required alongside the accuracy needed for target detection.
3.6. Development of Spraying Strategies and Mapping of Prescriptions
To show how weeds are distributed in the field, all the non-overlapping sub-images of Fields 1 and 2 were input into the improved model YOLOv10n-FCDS for automatic detection of weeds in the paddy fields. Figure 16 shows the complete weed detection results for Fields 1 and 2, composed of the non-overlapping sub-images detected by YOLOv10n-FCDS and then stitched together. In this figure, single Sagittaria trifolia targets are shown in yellow detection boxes, contiguous Sagittaria trifolia targets in red detection boxes, and ridge targets in white detection boxes.
Both fields selected for this study were 12 acres in size, so each field was divided into 12 plots of one acre each. The division scheme and corresponding numbers are shown by the orange dashed grid in Figure 16. The amount of herbicide needed in each plot was then determined from the number of weeds in that plot. In this investigation, contiguous weed patches mostly contain 3 to 5 plants, so the mean value was taken and each contiguous patch was counted as 4 weeds; the total weed count for each plot is the sum of the single weeds and the weeds converted from contiguous patches. The statistical results are provided in Figure 17.
The herbicide application rates in the experimental plots were adjusted according to the weed populations in Figure 18. Specifically, the conventional application rate for farmers in the Aji Township area is 1.5 L/acre; plots with weed counts above 600 were treated at this rate, corresponding to the red areas in Figure 18a. Plots with weed counts of 450 to 600 were treated at 85 percent of the conventional rate, i.e., 1.27 L/acre, corresponding to the yellow areas in Figure 18a,b. When the weed count was in the range of 300 to 450, the application rate was adjusted to 70% of the conventional rate, i.e., 1.05 L/acre, corresponding to the green areas in Figure 18a,b. If the weed count was in the range of 150 to 300, the rate was reduced to 50 percent of the conventional rate, i.e., 0.75 L/acre, corresponding to the blue areas in Figure 18a,b.
If all plots were treated at the conventional application rate, a total of 36.00 L of pesticide would be consumed, whereas applying pesticide according to the spraying strategy of this study requires a total of 28.53 L for the two test fields. Compared with conventional application, the method of this study saves about 20.75 percent of the herbicide volume.
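The weed-count-to-rate rule described above can be written as a small function, as sketched below; the handling of plots with fewer than 150 weeds is an assumption, since this case is not specified in the text.

```python
def plot_weed_count(single, contiguous):
    """Total weed count for a plot: each contiguous patch is counted as 4 plants."""
    return single + 4 * contiguous

def application_rate(weeds, conventional=1.5):
    """Map a plot's weed count to a herbicide rate (L/acre) using the rules of Section 3.6."""
    if weeds > 600:
        return conventional              # 1.50 L/acre (conventional rate)
    if weeds > 450:
        return 0.85 * conventional       # ~1.27 L/acre
    if weeds > 300:
        return 0.70 * conventional       # 1.05 L/acre
    if weeds > 150:
        return 0.50 * conventional       # 0.75 L/acre
    return 0.50 * conventional           # plots below 150 weeds: assumption, not specified in the text

# Total herbicide for a field = sum of application_rate(count) over its 12 one-acre plots.
```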
4. Discussion
This study aimed to maximize the effectiveness of weed control while reducing the amount of chemicals used. Employing a deep learning algorithm, we detected weeds in rice fields and developed variable-rate application prescription maps based on the results. The experimental findings indicate that the improved algorithm YOLOv10n-FCDS raises mAP50 by 2.5 percentage points over the pre-improvement model YOLOv10n, to 87.4%. Application strategies were developed and variable-rate spraying prescription maps were drawn based on the weed identification results. The application method of this study was calculated to save 20.75 percent of herbicide compared to the traditional crude method of spraying herbicide over a large area.
The large number of plants with small leaves (small target weeds) at the rice tillering stage, the fact that some of the weeds will be shaded by the rice, and that some of the weeds are similar in appearance to the rice, makes the identification of Sagittaria trifolia in paddy fields via UAV-based remote sensing challenging. To tackle the aforementioned issues, this study introduces specific enhancements and creates a rice field weed detection algorithm that optimizes both detection performance and efficiency, enabling effective weed detection in challenging real-world conditions.
Firstly, the use of FasterNet to replace the backbone feature extraction network improved the algorithm’s capacity for detecting small target weeds. Secondly, replacing the regular convolution and down-sampling module in the neck with CGBlock improved the algorithm’s capacity to detect obscured weeds. The model’s ability to detect weeds highly similar to rice was then improved by replacing the neck up-sampling module with DySample. Finally, a lightweight detection head was proposed that drastically reduces the number of model parameters and the computational requirements while hardly affecting accuracy. However, the dataset constructed in this study contained weed samples from fields of only two rice varieties; because different rice varieties also differ in appearance, the generalization ability of the identification model may be limited. Therefore, future studies should collect rice field weed samples from a wider range of rice varieties to improve the generalization ability of the algorithm.
In this research, the two experimental fields were each divided into 12 plots, variable-rate application strategies were developed, and variable-rate prescription maps were drawn based on the weed identification results. The whole process, from collecting rice field weed data to producing the variable-rate spraying prescription maps, took several days of data collection and processing. The complexity and non-real-time nature of this process present challenges to widespread implementation. To address them, our subsequent studies will create a lighter version of the model and incorporate the rice field weed detection algorithm into the UAV’s embedded system, with the aim of significantly improving the efficiency of weed recognition and reducing human intervention. We will also aim to simplify the procedure for generating variable-rate prescription maps. Overall, our approach offers a potential solution for precision agriculture to reduce pesticide use and the risk of environmental pollution by tailoring the application program to the number of weeds in each plot of the rice field.
5. Conclusions
In this study, a weed detection algorithm for rice fields (YOLOv10n-FCDS) was designed for the purpose of detecting Sagittaria trifolia, a major weed in rice fields. A variable spray prescription map was produced based on the detection results. The main conclusions are summarized below:
(1) The weed detection model YOLOv10n-FCDS was constructed. The weed detection accuracy (mAP50) of the baseline model YOLOv10n was 84.8%; the low accuracy was due to the presence of many small target weeds, obscured weeds, and weeds similar to rice. Improvements were made to address these three issues: Firstly, for small target weeds, the original backbone network was replaced by FasterNet, improving accuracy (mAP50) by 1.4%. Secondly, for obscured weeds, the mAP50 was further improved by 0.7% using CGBlock as the neck down-sampling module. Thirdly, for weeds similar to rice, the mAP50 was further improved by 0.6% using DySample as the neck up-sampling module. Finally, a lightweight detection head, SCSD-Head, was proposed, with a 0.1% decrease in accuracy (mAP50), a 16.0% decrease in parameter count, and a 17.8% decrease in computation. YOLOv10n-FCDS achieved a mAP50 of 87.4%, a 2.6% improvement relative to the baseline model.
(2) The weed detection ability and generality of YOLOv10n-FCDS were tested in detail under a complex rice field environment. Small target weeds, obscured weeds, and weeds similar to rice were screened from the weed database to construct separate test datasets, and the model was tested on each. The results showed that, compared to the baseline model, the mAP50 of YOLOv10n-FCDS improved by 2.5% to 87.2% on the small target dataset, by 2.8% to 87.1% on the obscured dataset, and by 3.0% to 86.9% on the similar-weed dataset. These results fully demonstrate that the proposed weed detection model YOLOv10n-FCDS can adapt to complex rice field environments and can meet the needs of variable-rate herbicide UAV spraying and precise management.
(3) This study explored an “efficient weed detection + prescription map generation + near-ground variable spraying” weed control model. Based on rice orthophotos collected efficiently by a UAV at 22 m, the weed detection model YOLOv10n-FCDS was used to achieve effective detection of weed populations. A rule converting weed counts to application rates was developed to generate variable-rate spraying prescription maps to guide near-ground UAV variable spraying operations.