1. Introduction
Forest plantations play an important role in conserving Brazil's natural ecosystems, including the Amazon and Atlantic forests, which collectively cover 59% of the country's land area. The industrial forest plantation sector surged in 2017, expanding by 13.1% and generating total revenues of USD 22 billion [1]. The tree-based industry was a key driver of this growth, outperforming major sectors such as manufacturing and agriculture. Plantations supply the raw material for tree-based industries such as wood, pulp, paper, and charcoal production. Among the cultivated genera, eucalyptus stands out, occupying more than 77% of planted forest areas [2]. This dominance is driven by the rapid growth of the trees, which yields high wood productivity and allows the main cultivation activities to be readily mechanized.
Eucalyptus is propagated primarily through clonal seedlings, which enables intensive commercial planting. However, as agricultural land becomes increasingly valuable, eucalyptus cultivation is expanding into marginal areas with low rainfall and poor soil water retention, so newly transplanted plants are routinely exposed to water deficits [3], a leading cause of post-transplanting mortality [4].
To ensure the survival and robust growth of young eucalyptus plants in regions with low rainfall, especially during the critical initial weeks after planting, irrigation is indispensable [5,6]. Localized irrigation within the planting holes is one alternative [7]. However, irrigating newly transplanted eucalyptus seedlings in this way can impose a significant financial burden on forestry enterprises because it is labor-intensive and requires water to be transported by trucks or adapted tractors [8]. Automation thus becomes imperative to reduce costs, ensure viability, and save water [9,10].
Currently, the irrigation automation systems used by Brazilian companies for eucalyptus seedlings rely on an RGB camera and a color filter that triggers irrigation whenever any “green” object enters the camera’s field of view. Green was chosen because it is one of the most common colors in a eucalyptus plantation and, above all, the color of the seedlings that are the irrigation targets. However, this approach leads to problems such as the unnecessary irrigation of weeds, resulting in inefficiency and wasted resources.
Automating localized irrigation for young eucalyptus plants using adapted tractors requires effective real-time detection of objects in the images captured during operation. Artificial vision techniques enable objects to be identified and localized through a technique called object detection [11], offering the potential to streamline and optimize irrigation, thereby cutting manual intervention and resource expenditure.
The main methods for object detection rely on machine learning, particularly deep artificial neural networks (ANNs) [12], and have been used successfully to detect and classify objects [13]. Two-stage detectors, such as the faster region-based convolutional neural network (Faster R-CNN) [14], are widely recognized for their high accuracy in various computer vision applications. However, these models have significant drawbacks in tasks that demand real-time processing, such as plant detection in agricultural scenarios, owing to their longer processing times and computational complexity [15].
Among the available ANNs, you only look once (YOLO) stands out for balancing detection speed and quality, establishing itself as a cutting-edge technology for real-time object recognition [16,17]. With its high detection capability, YOLO has been used successfully in various agricultural applications, including the detection of fruits [18,19], animals [20,21], plant diseases [22,23], weeds [24,25], and roots [26]. Given these characteristics, YOLO has the potential for fast and precise real-time detection of newly transplanted eucalyptus plants. It could therefore identify young eucalyptus plants in images acquired by proximal cameras, allowing real-time identification in automated irrigation systems.
The evolution of YOLO, from earlier versions like YOLOv4 to more advanced models such as YOLOv5 [27] and YOLOv8 [28], highlights significant improvements in efficiency and accuracy in complex agricultural environments. While earlier versions faced challenges in terms of accuracy and robustness, especially under field conditions, the more recent versions have overcome these limitations, maintaining the speed required for real-time detection and allowing for more effective integration with embedded systems [29,30,31].
The objective of this study is to use images of newly transplanted eucalyptus plants under field conditions to build and train two real-time detection models using YOLOv8 and YOLOv5 neural networks. The performance of these models will be compared using key evaluation metrics to select the best option as a support tool for automating localized irrigation of young eucalyptus plants through mechanized systems.
2. Materials and Methods
This study used a dataset of top-view images of young eucalyptus plants obtained from commercial plantations at the Gramadão Farm in Selvíria, Mato Grosso do Sul, Brazil. This site was chosen because the region faces water scarcity and irregular rainfall, which heightens the demand for automated mechanized irrigation systems to ensure the survival and efficient development of eucalyptus seedlings after transplantation. The location of the study site is indicated on the map in Figure 1.
Figure 2 shows the steps used to develop the young eucalyptus plant detection models using YOLOv8 and YOLOv5, providing a structured overview of the methodologies and strategies employed for detecting eucalyptus plants under field conditions.
Images obtained under field conditions were used to build and train two customized models capable of real-time detection of young eucalyptus plants using the YOLOv8 and YOLOv5 neural networks. The dataset consisted of 10,000 images of newly transplanted eucalyptus plants with three or more pairs of leaves. The images were captured by an operator using a Samsung A20s smartphone with a 13 MP rear camera across five different fields. They were obtained within the first week after transplanting, during localized irrigation of the planting holes, representing a critical stage in the plants’ development.
Figure 3 illustrates the varied conditions observed among the eucalyptus plants and clones during image acquisition in the field.
Figure 3a–h present images captured at a height of 1 m and an angle of 90 degrees, showing different eucalyptus clones and the plant conditions observed in the field. These images were used for the model training process.
Figure 3g,h show eucalyptus plants treated with sunscreen, which helps mitigate the effects of excessive sun exposure on leaf tissues.
The dataset was divided into two databases: database 1 comprised 5000 images of eucalyptus plants captured under cloudy conditions (Figure 4a–d), and database 2 comprised 5000 images captured under sunny conditions (Figure 4e–h). The images were then combined into a single database and divided into training and testing sets at an 80:20 ratio, yielding a training set of 8000 images and a testing set of 2000 images. The testing set was sampled at random while maintaining consistent proportions of images captured under the two lighting conditions, as shown in Figure 4.
Figure 4a–d shows eucalyptus plants captured on cloudy days with overcast skies, whereas Figure 4e–h shows eucalyptus plants captured on sunny days with clear skies.
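As an illustration of this procedure, the following is a minimal Python sketch of the 80:20 split described above, assuming the images are stored in two folders by lighting condition (folder names and file extension are hypothetical):

```python
import random
from pathlib import Path

random.seed(42)  # fixed seed for a reproducible split

def split_80_20(image_dir: str):
    """Shuffle one lighting-condition folder and split it 80/20."""
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.shuffle(images)
    cut = int(0.8 * len(images))
    return images[:cut], images[cut:]

# Splitting each condition separately keeps the 50/50 cloudy/sunny
# proportion consistent in both the training and testing sets.
train_cloudy, test_cloudy = split_80_20("images/cloudy")  # 5000 images
train_sunny, test_sunny = split_80_20("images/sunny")     # 5000 images

train_set = train_cloudy + train_sunny  # 8000 training images
test_set = test_cloudy + test_sunny     # 2000 testing images
```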
In the next step, the images of young eucalyptus plants were labeled using the open-source software LabelImg version 1.4.0 [32], a graphical annotation tool that lets the user demarcate the regions of interest for YOLO and other convolutional neural networks. The models developed using YOLOv8 and YOLOv5 were trained on an NVIDIA A100-SXM4-40GB graphics card within the Google Colab Pro environment [33].
The hyperparameters of the YOLOv8 and YOLOv5 models were configured with the default values recommended by Ultralytics [34]. To ensure a fair comparison, both models used 600 training epochs, an input image size of 640 pixels, and a batch size of 16. The image size and batch size were chosen to balance detection accuracy and computational efficiency, following Ultralytics guidelines. The number of epochs was set to 600 based on preliminary tests: beyond 600 epochs, the gains became marginal, indicating that the model’s learning limit had been reached with the available data.
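A minimal training sketch with these hyperparameters, using the Ultralytics Python API for YOLOv8 (the dataset configuration file eucalyptus.yaml is a placeholder, not the authors’ actual file):

```python
from ultralytics import YOLO

# YOLOv8 nano, initialized from pre-trained weights (transfer learning).
model = YOLO("yolov8n.pt")
model.train(
    data="eucalyptus.yaml",  # hypothetical dataset config: image paths + class name
    epochs=600,              # training epochs, as set in this study
    imgsz=640,               # input image size in pixels
    batch=16,                # batch size
)

# YOLOv5 is trained analogously from its own repository, e.g.:
#   python train.py --weights yolov5n.pt --data eucalyptus.yaml \
#       --epochs 600 --img 640 --batch-size 16
```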
The hyperparameters were kept identical across both versions of YOLO to ensure a fair and accurate comparison. This minimizes the impact of hyperparameter values on model performance, ensuring that any observed performance differences stem from intrinsic differences between the models rather than from their configurations. After configuration, the models were trained individually. They were first trained from their initial weights, which had been pre-trained on a separate dataset, and were then fine-tuned so that the adjusted weights better fit the specific characteristics of the eucalyptus plant dataset.
Initializing training with pre-trained weights, a technique known as transfer learning, is common practice for YOLO networks. It yields performance gains by leveraging the knowledge of models previously trained on large volumes of data [35]. Using initial weights from pre-trained models, such as YOLOv8n.pt for YOLOv8 and YOLOv5n.pt for YOLOv5, exemplifies this approach: the weights are initialized with values that already encode knowledge from large datasets, which facilitates convergence and can improve final performance. This accelerates training and can result in higher detection and classification accuracy owing to the information already contained in the transferred weights [36].
For the initial weights, the YOLOv8n.pt (YOLOv8 nano) model was used for YOLOv8 training and the YOLOv5n.pt (YOLOv5 nano) model for YOLOv5 training. These nano models are well suited to onboard systems, offering significant advantages over the full-size models for real-time detection on mobile platforms. Optimized for resource-constrained devices, such as the onboard systems of agricultural machinery, they balance computational requirements against the precision and speed of object detection [37].
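As a side note, architecture figures such as the layer counts and GFLOPs reported later in Table 2 can be inspected directly through the Ultralytics API; a brief sketch:

```python
from ultralytics import YOLO

# Prints the layer count, parameter count, and GFLOPs of the nano model.
YOLO("yolov8n.pt").info()
```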
Training was conducted individually for each model, with partial weights saved and reused whenever training was resumed. To evaluate and compare the performance of the models, the following metrics were used: precision (P), recall (R), mAP50, and mAP50-95 [38].
Precision (P), as described in Equation (1), represents the proportion of true positives (TP) relative to the total number of positive predictions made by the model. In other words, precision indicates how effective the model is at avoiding false positives (FP). A high precision value means that most of the detections made by the model are correct.
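From this definition, Equation (1) takes the standard form:

P = TP / (TP + FP)    (1)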
Recall (R), as described in Equation (2), represents the proportion of true positives (TP) relative to the total number of actual positive instances. Recall is a crucial metric for measuring the model’s ability to detect all positive instances and is sensitive to false negatives (FN). High recall means that the model can identify most of the true instances.
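Likewise, Equation (2) takes the standard form:

R = TP / (TP + FN)    (2)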
In Equations (1) and (2), true positive (TP) corresponds to the number of positive instances that were correctly predicted. False negative (FN) represents the number of positive instances that were incorrectly predicted as negative, and false positive (FP) denotes the number of negative instances that were incorrectly predicted as positive.
The mean average precision at 50 (mAP50) is a metric that calculates the average precision at an intersection over union (IoU) threshold of 0.5. An object is considered correctly detected if the overlap between the predicted bounding box and the true bounding box is greater than or equal to 50% [39]. A high mAP50 value indicates that the model performs well in object detection, even when an exact match between the prediction and the ground truth is not required.
The mean average precision at 50 to 95 (mAP50-95), in turn, averages the precision over IoU thresholds from 0.5 to 0.95 in steps of 0.05. This metric is more stringent: for an object to be considered correctly detected, the required overlap ranges from 50% up to 95%. It therefore evaluates not only the model’s ability to detect objects but also its ability to locate them in the image with high precision [40].
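In formula form (standard definitions, stated here for clarity), with AP_t denoting the area under the precision–recall curve p_t(r) at IoU threshold t:

AP_t = ∫₀¹ p_t(r) dr

mAP50-95 = (1/10) · Σ mAP_t, for t ∈ {0.50, 0.55, …, 0.95}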
In addition to these metrics, the confidence score was used to evaluate the performance of the models in classifying the images. This score is expressed as a numerical value, typically between 0 and 1, where values closer to 1 indicate greater confidence in the detection. In essence, the confidence score measures the model’s certainty that an object was found and classified correctly [41]. The higher the confidence score, the higher the probability of the detection being correct, making this an additional tool for evaluating the robustness of the model’s predictions [42].
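In practice, per-detection confidence scores can be read directly from a trained model’s output; a minimal Ultralytics sketch (the weights path and image name are placeholders):

```python
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")   # hypothetical path to trained weights
results = model.predict("eucalyptus_seedling.jpg")  # placeholder test image

# Each detected bounding box carries a confidence score in [0, 1].
for box in results[0].boxes:
    print(f"class={int(box.cls)}, confidence={float(box.conf):.2f}")
```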
3. Results
The results revealed that both models performed well in detecting newly transplanted eucalyptus plants, with precision and recall above 90%. The performance difference between the two models was below 4.5% across the evaluated metrics. The differences were most noticeable at the beginning of training, with the YOLOv5 model initially holding an advantage, although this gap narrowed as training advanced.
Figure 5 compares the precision (P), recall (R), mAP50, and mAP50-95 metrics of the YOLOv8 and YOLOv5 models, both trained for 600 epochs on the 8000-image eucalyptus plant dataset.
The results for the precision metric (Figure 5a) reveal a clear distinction between the YOLOv8 and YOLOv5 models. YOLOv5 consistently showed higher precision than YOLOv8 throughout much of the training process. In particular, YOLOv5 stood out during specific intervals, such as between epochs 210 and 405 and between epochs 460 and 590, when its precision was up to three percentage points higher than that of YOLOv8. Despite these initial differences, both models eventually stabilized, maintaining precision between 94% and 95% after 550 epochs.
To complement this analysis, Figure 5b compares the recall (R) metric between the same models over the 600 training epochs on the 8000-image dataset, providing a broader perspective on their performance. In terms of recall, the YOLOv5 model consistently outperformed YOLOv8 across most training epochs, with the differences being most pronounced in the initial epochs. In addition, from epoch 150 onward, YOLOv5 showed greater stability during training than YOLOv8. Even so, both models reached similar recall rates of between 93% and 95% after 550 epochs.
For the mAP50 metric (Figure 5c), the YOLOv5 model performed better than YOLOv8 for most of the training duration, with the largest difference in the first epochs. As the number of epochs increased, however, the performance gap narrowed. YOLOv5 shows a smoother and more consistent curve over the epochs, while YOLOv8’s curve is more unstable. Nevertheless, both models maintained high performance, with mAP50 scores of 97% upon reaching 495 epochs. For the mAP50-95 metric (Figure 5d), the comparison shows a slightly different trend. YOLOv5 held a slight advantage over YOLOv8 in the first training epochs, but this advantage shrank as training progressed. Interestingly, from epoch 105 onward, YOLOv8 performed better, reaching a difference of up to four percentage points over YOLOv5 at 600 epochs. This suggests that, while YOLOv5 had an initial advantage in some respects, YOLOv8’s performance gradually improved and eventually exceeded that of YOLOv5 on the mAP50-95 metric over extended training.
Table 1 shows the numbers of true positives and false negatives in the confusion matrices of the models developed using YOLOv8 and YOLOv5, with a focus on the predicted class “eucalyptus plant”. These values were obtained on the testing set of 2000 images after 600 training epochs.
Both the YOLOv8 and YOLOv5 models showed satisfactory performance in identifying the “eucalyptus plant” class. YOLOv8 correctly identified 96% of the positive samples but misclassified the remaining 4% as negative. The confusion matrix contained no negative samples incorrectly classified as positive.
In comparison, the YOLOv5 model correctly classified 95% of the positive samples, with the remaining 5% incorrectly classified as negative. As with YOLOv8, no negative samples were incorrectly classified as positive by YOLOv5.
Furthermore, the confidence scores of the YOLOv8 and YOLOv5 models were compared in the classification of eight images of eucalyptus plants; the higher the confidence score, the greater the model’s confidence in the correctness of its detection.
Figure 6 shows the confidence scores obtained by both models on the same set of eight images after 600 training epochs. The YOLOv8 model classified seven of the eight images with a confidence score of 90% and one image with a confidence score of 80%. In contrast, the YOLOv5 model classified two of the eight images with a confidence score of 100% and the remaining six with a confidence score of 90%.
The results presented in Table 2 refer to the performance evaluation of the YOLOv8 and YOLOv5 object detection models after 600 training epochs on the 8000-image eucalyptus plant dataset. YOLOv8 had an architecture with 168 layers and 8.1 giga floating-point operations (GFLOPs). During testing with 2000 images, the model detected 2018 instances of eucalyptus plants with a precision (P) of 0.958, meaning that 95.8% of all positive detections were correct. The detection of 2018 instances in 2000 images indicates that some images contained more than one eucalyptus plant. The recall (R) was 0.935, denoting that the model correctly detected 93.5% of the eucalyptus plant instances in the images. The mAP50 value was 0.974, indicating an average precision of 97.4% at an IoU threshold of 0.5, while the mAP50-95 value was 0.836, representing the average precision over IoU thresholds from 50% to 95%. During validation, the model processed 3.68 iterations per second.
In contrast, YOLOv5 had an architecture consisting of 206 layers and 4.2 GFLOPs. During testing with the same 2000 images, the model detected 2018 instances of eucalyptus plants with a precision of 0.951 and a recall of 0.944. The mAP50 value was 0.972, and the mAP50-95 value was 0.791. The model processed 4.10 iterations per second during validation.
Figure 7 shows the results of confidence assessments for detecting eucalyptus plants conducted on validation videos captured in different plots. These videos were obtained by moving the camera perpendicularly to the planting rows, positioned at a height of 1 m, simulating the movement of a tractor at low speed in third gear.
When comparing the performance of both the YOLOv8 and the YOLOv5 models on the same set of images featuring eucalyptus plants, it was observed that their performances varied. In some instances, both models achieved similar confidence scores; in other cases, YOLOv8 scored higher than YOLOv5. This variability indicates how model performance can fluctuate depending on each plant’s specific positions, orientations, and field conditions. It reflects the inherent complexity and challenges associated with image recognition in machine learning applications.
4. Discussion
In the comparison of the precision (P) metric between the YOLOv8 and YOLOv5 models, it is apparent that, despite initial differences, both models achieved a similar level of precision, consistently between 94% and 95% after 600 epochs (Figure 5a). This precision range is satisfactory across various application contexts, indicating that both models can precisely detect eucalyptus plants under field conditions.
Similarly, the recall rates of between 93% and 95% achieved by both models over the 600 training epochs represent satisfactory performance in most object detection scenarios. This indicates their proficiency in correctly detecting most of the eucalyptus plants in the test data, which is important because recall measures the model’s ability to identify all positive examples of a given class.
The mAP50 of 97% achieved by both the YOLOv8 and YOLOv5 models after 600 training epochs indicates good performance in detecting eucalyptus plants. A mAP50 close to 100% is desirable in object detection tasks, affirming the effectiveness of the models in the proposed task.
Based on the observed values, an mAP50-95 of 79% for YOLOv5 and 83% for YOLOv8 after 600 training epochs is a satisfactory performance for object detection, implying that the models correctly locate and identify eucalyptus plants even amidst bounding box overlap. While mAP50-95 values above 70% generally indicate good detection capability, the performance of the YOLOv5 model decreases considerably under the more stringent mAP50-95 criterion compared with YOLOv8, echoing the findings of a previous study by Mercaldo et al. [43].
Based on the values of true positives and false negatives in the confusion matrix (Table 1), the YOLOv8 model performed slightly better than YOLOv5 in correctly identifying instances of the “eucalyptus plant” class, achieving 96% classification accuracy compared with YOLOv5’s 95%. This slight advantage may be related to differences in network architecture and final weight configurations. While both models are based on the YOLO architecture, minor differences in their network configurations lead to subtle differences in performance. Moreover, adjustments to a network’s weights can have a significant impact on model performance; the final weight adjustments of YOLOv8 were likely better suited to the specific task of detecting eucalyptus seedlings, resulting in its superior performance.
Figure 6 shows the satisfactory performance of both the YOLOv8 and YOLOv5 models in image classification, with high confidence scores for most images. YOLOv8 classified seven of the eight images with a robust 90% confidence score, indicating a high degree of confidence in its predictions. The single image classified with an 80% confidence score represents only a slight decrease, with the model still maintaining a relatively high level of certainty.
On the other hand, YOLOv5 obtained perfect confidence scores (100%) for two of the eight images, demonstrating maximum confidence in those predictions, and classified the remaining six images with a solid 90% confidence score, also indicating high confidence. These results are consistent with a study by Rahman et al. [44], who reported the considerable potential of YOLO models, especially YOLOv5n, for real-time weed detection.
The specific reasons for the lower confidence scores of the newer YOLOv8 compared with YOLOv5 remain to be investigated. Although YOLOv8 performs noticeably better during the training and validation stages, YOLOv5 predicted a larger number of images at a higher confidence level. This detection behavior of YOLOv5 relative to YOLOv8 aligns with the findings of Yang et al. [45]. However, comparing results based on only a limited number of images may not be sufficient to determine the best model; a comprehensive evaluation across the entire set of 2000 test images is needed to draw reliable conclusions about the superiority of one model over the other.
Referring to Table 2, YOLOv8 achieves a precision of 95.8%, a recall of 93.5%, and an mAP50 of 97.4%, while YOLOv5 attains a precision of 95.1%, a recall of 94.4%, and an mAP50 of 97.2%. These results compare favorably with those reported by Santana et al. [46], who obtained a precision of 93.3%, a recall of 55.5%, and an mAP50 of 77.7% in coffee plant detection using YOLOv3 after 1000 training epochs. In another study, Wiggers et al. [47] obtained a precision of 84.8%, a recall of 89.0%, and an mAP50 of 65.2% when detecting bean plants using the YOLOv4 model. These comparisons show the better performance of both YOLOv8 and YOLOv5 over previous versions, as evidenced by their higher precision, recall, and mAP50.
When comparing YOLOv8 and YOLOv5, a possible reason for the difference in results is their architecture. YOLOv8 has a more complex architecture than YOLOv5, allowing it to capture finer details relevant to detecting the “eucalyptus plant” class. In addition, YOLOv8’s greater complexity entails longer training, which potentially contributed to its slightly superior performance in metrics such as precision, mAP50, and mAP50-95. This suggests that YOLOv8 may be the more effective choice for this specific task when hardware limitations are not a concern.
More complex models require greater processing power and computational resources for real-time inference. If the hardware used is not powerful enough, it may impose constraints on speed and performance, resulting in slower inference times [48,49].
In the performance analysis, YOLOv8 has fewer layers and more GFLOPs than YOLOv5. This indicates that YOLOv8 demands more processing: the higher the GFLOP count, the more floating-point operations are performed per inference, and consequently the heavier the computational load [50].
YOLOv5 showed faster processing, performing 4.10 iterations per second, while YOLOv8 processed at 3.68 iterations per second. In addition, YOLOv5 has a lower GFLOP value than YOLOv8. These differences indicate that YOLOv5 potentially has an advantage in computational efficiency, performing more tasks per second with a lesser processing load.
In accordance with the results of Miglionico et al. [51], the YOLOv5n model is the fastest compared with other versions such as YOLOv5l and YOLOv5x, and it also outpaces the YOLOv4 series. These findings align with the present study, which similarly indicates that YOLOv5n is faster than YOLOv8. The simpler, more processing-efficient architecture of YOLOv5n contributes to its faster processing. This means that YOLOv5 can process information faster with fewer computational resources, which is particularly advantageous when real-time detection is essential or when operating on resource-constrained devices, such as onboard systems in agricultural machinery. In addition, the lower power consumption of YOLOv5 is an additional advantage for mobile device applications. Its design prioritizes speed, making YOLOv5 a faster option for real-time object detection tasks or analyses of large datasets [52].
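To illustrate how such throughput comparisons can be made, a simple timing sketch is shown below (the weights path and frame files are placeholders; this is not the authors’ exact benchmarking procedure):

```python
import time
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # for a like-for-like test, run yolov5n from its own repo
frames = [f"frame_{i:04d}.jpg" for i in range(100)]  # hypothetical video frames

start = time.perf_counter()
for frame in frames:
    model.predict(frame, imgsz=640, verbose=False)
elapsed = time.perf_counter() - start

# Iterations per second, comparable to the 3.68 vs. 4.10 it/s reported above.
print(f"{len(frames) / elapsed:.2f} iterations per second")
```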
Analyzing the results in Figure 7, it is evident that plant positioning significantly influences the confidence scores of both the YOLOv8 and YOLOv5 models in detecting eucalyptus plants. Well-positioned plants generally produced higher confidence scores than plants in non-ideal positions, indicating a need to improve the models’ robustness in such scenarios. For example, the well-positioned plants 1 and 3 produced high scores, with YOLOv8 achieving 93% and 94% confidence and YOLOv5 obtaining 93% and 88%, respectively. In contrast, the lying or crooked plants 2 and 4 produced considerably lower scores in both models, with YOLOv8 achieving 89% and 77% confidence and YOLOv5 obtaining 85% and 67%, respectively.
These observed differences in the detections performed by the YOLOv8 and YOLOv5 models on the same set of images illustrate the diversity in their representations. YOLOv8, being a newer version, uses more advanced feature extraction techniques, which results in more robust and consistent predictions, even under variable conditions of shade, illumination, or positioning of the eucalyptus seedlings. This suggests that YOLOv8 is more efficient in capturing relevant visual features, reflecting its superiority in certain contexts. Conversely, YOLOv5, while effective, shows more significant fluctuations in confidence scores, indicating a more conservative approach and sensitivity to small variations in the appearance of plants. Such differences have direct implications for the practical application of the models, especially in automated mechanized irrigation systems, in which the accuracy and consistency of detections are crucial for plant survival. Therefore, the choice of model should consider the balance between robustness and risk minimization according to the specific needs of the application.
To address this problem and improve the detection of plants in non-ideal positions, it is recommended that the dataset be augmented with additional training images representing these specific situations [53]. This augmentation would allow both the YOLOv8 and YOLOv5 models to correctly recognize and classify lying or crooked plants, resulting in higher confidence scores for such instances.
In addition to performance metrics, such as precision and recall, the comparative analysis between YOLOv8 and YOLOv5 revealed significant differences in the models’ behavior during training and inference. YOLOv8, with a more complex architecture and greater processing capacity, demonstrated better performance in metrics such as mAP50-95, suggesting a superior ability to handle bounding box overlap and more rigorous detection conditions. However, YOLOv5 showed greater computational efficiency, performing more iterations per second, which can be advantageous in applications in which processing time is critical, such as embedded systems in the field. This highlights that while YOLOv8 has an advantage in precision, YOLOv5 may be more suitable for scenarios with hardware constraints.
Additionally, the results indicated that although both models were effective in detecting eucalyptus seedlings, there is room for improvement, particularly in robustness in more complex field situations, such as detecting plants in irregular positions. These irregular positions refer to seedlings that may be tilted, lying down, partially covered by soil or vegetation, or located on uneven terrain, which can complicate detection. A possible solution would be to increase the diversity of training images representing these conditions, thereby enhancing the models’ accuracy in more challenging scenarios.
In summary, both models performed well in terms of the precision, recall, and mAP50 metrics, although there was room for improvement in the mAP50-95 metric. YOLOv8 showed a slight advantage over YOLOv5 in most of the evaluated metrics; nonetheless, YOLOv5 had faster processing times, a notable advantage over YOLOv8.
The YOLOv8 and YOLOv5 models studied show significant potential for practical application in tractor-based systems for the automated detection and irrigation of eucalyptus seedlings after transplantation. The implementation of these models can revolutionize plantation management, enabling the accurate real-time identification of newly transplanted seedlings. This allows for targeted and efficient irrigation, reducing the need for labor and saving water. Additionally, by ensuring that each seedling receives the necessary irrigation, these systems contribute to increasing the survival rate of plants, minimizing losses that would occur due to a lack of adequate irrigation. This approach not only improves operational efficiency but also promotes sustainability in forest management, aligning with current demands for smarter and more economical agricultural practices.