Article

YOLOv8-GABNet: An Enhanced Lightweight Network for the High-Precision Recognition of Citrus Diseases and Nutrient Deficiencies

1 College of Electronic Engineering (College of Artificial Intelligence), South China Agricultural University, Guangzhou 510642, China
2 Division of Citrus Machinery, China Agriculture Research System, Guangzhou 510642, China
3 Guangdong Engineering Research Center for Monitoring Agricultural Information, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2024, 14(11), 1964; https://doi.org/10.3390/agriculture14111964
Submission received: 14 September 2024 / Revised: 30 October 2024 / Accepted: 31 October 2024 / Published: 1 November 2024
(This article belongs to the Special Issue Machine Vision Solutions and AI-Driven Systems in Agriculture)

Abstract
Existing deep learning models for detecting citrus diseases and nutritional deficiencies grapple with issues related to recognition accuracy, complex backgrounds, occlusions, and the need for lightweight architecture. In response, we developed an improved YOLOv8-GABNet model designed specifically for citrus disease and nutritional deficiency detection, which effectively addresses these challenges. This model incorporates several key enhancements: A lightweight ADown subsampled convolutional block is utilized to reduce both the model’s parameter count and its computational demands, replacing the traditional convolutional module. Additionally, a weighted Bidirectional Feature Pyramid Network (BiFPN) supersedes the original feature fusion network, enhancing the model’s ability to manage complex backgrounds and achieve multiscale feature extraction and integration. Furthermore, we introduced important features through the Global to Local Spatial Aggregation module (GLSA), focusing on crucial image details to enhance both the accuracy and robustness of the model. This study processed the collected images, resulting in a dataset of 1102 images. Using LabelImg, bounding boxes were applied to annotate leaves affected by diseases. The dataset was constructed to include three types of citrus diseases—anthracnose, canker, and yellow vein disease—as well as two types of nutritional deficiencies, namely magnesium deficiency and manganese deficiency. This dataset was expanded to 9918 images through data augmentation and was used for experimental validation. The results show that, compared to the original YOLOv8, our YOLOv8-GABNet model reduces the parameter count by 43.6% and increases the mean Average Precision (mAP50) by 4.3%. Moreover, the model size was reduced from 50.1 MB to 30.2 MB, facilitating deployment on mobile devices. When compared with mainstream models like YOLOv5s, Faster R-CNN, SSD, YOLOv9t, and YOLOv10n, the YOLOv8-GABNet model demonstrates superior performance in terms of size and accuracy, offering an optimal balance between performance, size, and speed. This study confirms that the model effectively identifies the common diseases and nutritional deficiencies of citrus from Conghua’s “Citrus Planet”. Future deployment to mobile devices will provide farmers with instant and precise support.

1. Introduction

Citrus diseases pose a significant threat to the growth of citrus, affecting not only yield and quality but also potentially leading to total crop failure in entire citrus regions. Misdiagnosis or untimely pest control measures after an outbreak can lead to the spread of diseases, a significant reduction in yield, and the degradation of quality. Moreover, the indiscriminate use of pesticides not only causes pollution of water bodies, soil, and air, but may also reduce biodiversity, affect the food chain, increase human health risks, and cause substantial economic losses [1]. On the other hand, nutritional deficiencies in citrus also jeopardize healthy growth, directly affecting the physiological functions of the plants and thereby reducing the commercial value of the fruits. Therefore, it is of great practical significance to develop an intelligent, cost-effective, and highly accurate method for identifying citrus diseases [2].
Traditional methods for identifying citrus diseases and nutritional deficiencies predominantly involve on-site assessments by agricultural and forestry experts or rely on farmers’ experiential judgments. These approaches are inherently subjective, time-consuming, and inefficient. Recently, the application of computer vision technology in disease detection has gained popularity [3]. Its capacity for precise disease-type identification lays the groundwork for automatic and accurate disease detection [4]. Traditional machine learning techniques manually design features such as color, texture, and shape for crop disease classification [5,6,7]. For instance, Zhang et al. [8] extracted 38 features using HIS (Hue–Saturation–Intensity), YUV (“Y” represents luma, and “U” and “V” represent chroma), and grayscale models, employing a support vector machine (SVM) classifier to identify three apple leaf diseases with over 90% accuracy. Similarly, Chakraborty et al. [9] utilized Otsu threshold segmentation and histogram equalization for image processing, achieving a 96% accuracy rate in apple leaf disease identification with an SVM. Zhang et al. [10] applied the K-means clustering algorithm to segment images and extract shape and color features, effectively recognizing seven major cucumber diseases with an overall accuracy of 85.7%. However, traditional machine learning methods lack robustness in complex scenarios and fail to meet the demands of such environments.
With advancements in deep learning technology, numerous network models have been progressively applied to disease and nutritional deficiency detection. Object detection algorithms based on deep learning are categorized into one-stage and two-stage methods. Two-stage algorithms, such as the R-CNN (Region-based Convolutional Neural Network) [11], Fast R-CNN [12], and Faster R-CNN [13], first generate candidate boxes and then classify them. These models have demonstrated impressive results on soybean leaves [14], apple leaves [15], and tea leaves [16]. However, their detection speed is relatively slow. In contrast, one-stage algorithms like the SSD (Single Shot MultiBox Detector) [17] and the YOLO (You Only Look Once) series [18] predict over the entire image without generating candidate boxes. The YOLO series, noted for its efficiency, real-time processing, and robustness, has gained significant attention. Unlike the Faster R-CNN, YOLO models unify localization and classification in a single network, significantly enhancing inference speed and suitability for complex environments.
Although YOLO models have introduced effective solutions to several issues [19,20], existing deep learning-based models for citrus disease and nutritional deficiency detection still face challenges in recognition accuracy, complex backgrounds, occlusion, and lightweight design. Extensive experimentation has been conducted to optimize YOLO models. Liu et al. [21] enhanced the image pyramid’s feature layer in the YOLOv3 model, improving both detection accuracy and speed and enabling the rapid and precise detection of tomato diseases. Additionally, Wang et al. [22] proposed a novel YOLO-Dense method, incorporating densely connected modules, the K-means algorithm, and altered training strategies, which further improved accuracy and speed, successfully addressing tomato anomaly detection with 96.41% accuracy and a 20.28 ms processing time. Despite the relative maturity of the YOLOv3 detection scheme, its computational complexity remains a limitation. Li et al. [23] proposed a fast and lightweight passion fruit pest detection algorithm based on an improved YOLOv5 model, enhancing model robustness through a hybrid data input method and adding a CBAM module to the network’s Neck section along with a PLDIoU loss function on the output side, thereby boosting detection speed and capability for small target pests and achieving 96.51% accuracy. Despite these advancements, YOLOv5 has shown limitations in detecting small targets and generalizing from Mosaic data enhancements, requiring extensive training data to achieve high accuracy. Soeb et al. [24] developed a tea pest and disease dataset, noting that YOLOv7 performed optimally in target detection and recognition. Deng et al. [25] enhanced YOLOv5 and YOLOv7-tiny for mobile terminals, enabling the rapid and efficient field diagnosis of six common pests and diseases. Recent research has demonstrated that the YOLOv8 model excels in object detection tasks within complex agricultural settings. A study by Solimani et al. [26] on tomato plants utilized YOLOv8 to address the challenge of imbalanced sample distribution, effectively enhancing detection accuracy for tomato flowers, fruits, and nodes. Additionally, Xu et al. [27] employed YOLOv8 for the real-time analysis of diseased leaf images at a rate of 70 FPS. In summary, numerous excellent modules and network structures have emerged for the identification of various crop diseases, pests, and nutritional deficiencies. However, these methods often suffer from issues such as insufficient accuracy, overly simplistic backgrounds, and excessively large model sizes. Thus, there is a need to further refine these methods to accurately identify citrus diseases and nutritional deficiencies in complex settings.
This study, located at “Citrus Planet” in Conghua District, Guangzhou, Guangdong Province, China, proposes a lightweight model based on an improved YOLOv8n—YOLOv8-GABNet model, focused on enhancing the detection performance of field citrus diseases and deficiencies. The model improves the intelligence level of citrus disease and deficiency recognition by optimizing accuracy and processing speeds. The research results are not only applicable to “Citrus Planet” but can also serve as a reference for disease and deficiency management in similar agricultural environments globally.

2. Materials and Methods

2.1. Materials

2.1.1. Data Acquisition

Diseased plants typically exhibit visible marks or damage on their leaves, stems, flowers, or fruits. Each disease condition often presents a distinct visible pattern, which can be used to diagnose abnormalities uniquely. Notably, plant leaves serve as the primary source for identifying diseases, as they are the first to manifest most symptoms. Consequently, this paper focuses on citrus-diseased leaves as the research subject [28].
The diversity of citrus diseases and the high randomness of their occurrence present significant challenges, as there are currently few open-source citrus disease databases, making it difficult to collect extensive data. Although Dai et al. proposed an improved model, FastGAN2, based on FastGAN for generating images of citrus diseases [29], the backgrounds of these generated images lack complexity. The data collection site is located at No. 888, Lutian Village, Xitang Fairy Tale Town, Aotou Town, Conghua District, Guangzhou, Guangdong Province, known as “Citrus Planet”. Research on citrus magnesium and manganese deficiencies is of significant importance in Guangdong. The acidic soil prevalent in the Guangdong region can lead to manganese deficiency, affecting the growth and health of citrus [30]. Simultaneously, the problem of magnesium deficiency is becoming increasingly severe globally, especially in China’s major citrus-producing areas like Guangdong, where imbalanced fertilization practices (such as the excessive use of nitrogen, phosphorus, and potassium) exacerbate this issue [31]. Therefore, this paper studies three common citrus diseases—anthracnose, canker, and yellow vein disease—as well as two common deficiencies: magnesium and manganese deficiencies. Based on disease characteristics, we visually classified the diseases in images taken with mobile phones and used the annotation tool LabelImg to assign each disease its corresponding category label. Within LabelImg, the rectangular box tool was used to draw a bounding box around each diseased leaf, keeping the boundaries as tight as possible while covering the entire leaf. The information for each annotated bounding box (the position coordinates and category label) was saved in the standard VOC format, facilitating its use in subsequent model training. The dataset encompasses various scenes and diverse shooting angles and distances, providing a comprehensive representation of each disease type, as illustrated in Figure 1.
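To make the annotation format concrete, the following is a minimal sketch (standard-library Python only) of converting one LabelImg-produced Pascal VOC XML file into the normalized text format commonly used for YOLO training; the class names and file paths are illustrative assumptions, not the study’s actual code.

```python
# Hypothetical sketch: convert a Pascal VOC XML annotation from LabelImg
# into YOLO-style "class x_center y_center width height" lines (normalized).
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumed class names; the study's actual label strings are not specified.
CLASSES = ["anthracnose", "canker", "yellow_vein_disease",
           "mg_deficiency", "mn_deficiency"]

def voc_to_yolo(xml_path: str, out_dir: str) -> None:
    root = ET.parse(xml_path).getroot()
    w = float(root.find("size/width").text)
    h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = CLASSES.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin, xmax, ymax = (float(box.find(k).text)
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
        # Box center and size, normalized to [0, 1] by the image dimensions.
        lines.append(f"{cls} {(xmin + xmax) / 2 / w:.6f} "
                     f"{(ymin + ymax) / 2 / h:.6f} "
                     f"{(xmax - xmin) / w:.6f} {(ymax - ymin) / h:.6f}")
    out_file = Path(out_dir) / (Path(xml_path).stem + ".txt")
    out_file.write_text("\n".join(lines))
```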
The images, captured with an iPhone 14 (Apple, Cupertino, CA, USA), have a resolution of 1920 × 1080 pixels and were taken with a camera mode that supports up to 8 megapixels. The dataset comprises 1102 citrus images featuring leaves affected by various diseases amidst complex background elements such as occlusion, overlap, blurring, and small targets. The dataset was partitioned into a training set, a test set, and a validation set in an 8:1:1 ratio. All images were stored in JPG format. The results of the dataset partitioning are detailed in Table 1.
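As a rough illustration of the 8:1:1 partition described above, a simple random split with a fixed seed might look like the following sketch; the directory names are assumptions.

```python
# Hypothetical sketch: random 8:1:1 train/validation/test split of the images.
import random
import shutil
from pathlib import Path

def split_dataset(img_dir: str, out_dir: str, seed: int = 0) -> None:
    images = sorted(Path(img_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)            # fixed seed for reproducibility
    n = len(images)
    subsets = {"train": images[:int(0.8 * n)],
               "val":   images[int(0.8 * n):int(0.9 * n)],
               "test":  images[int(0.9 * n):]}
    for name, files in subsets.items():
        dst = Path(out_dir) / name
        dst.mkdir(parents=True, exist_ok=True)
        for f in files:
            shutil.copy(f, dst / f.name)           # copy, keeping originals intact
```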

2.1.2. Data Enhancement

To enhance the generalization and robustness of our model, we employed various image enhancement methods to expand the citrus disease image dataset. This approach allowed for more effective training, enabling the model to adeptly handle diverse and complex scenes during testing. The dataset was augmented using eight different techniques: random adjustment of contrast [32], clipping [33], random rotation, Gaussian blur [34], salt-and-pepper noise [35], scaling [36], random cropping [37], and Mosaic data enhancement. Random contrast adjustment was utilized to reduce brightness inconsistencies that may arise from changes in ambient lighting or differences in camera sensors. Clipping, also known as cutout, simulates the occlusion of objects by randomly selecting several square areas of fixed size and setting their values to zero, which helped the model adapt to partial object visibility. Random rotation enhanced the direction variability of the images, improving the model’s adaptability to varying image orientations. Additionally, Gaussian blur and salt-and-pepper noise were applied to imitate image degradation, thus enhancing the model’s ability to deal with background blur and fluctuations in image quality. Scaling and random cropping are techniques that enhance the model’s capability to recognize objects of small sizes or those that overlap, by altering the size and scale of the images. Lastly, Mosaic data enhancement diversifies the backgrounds of the detected objects, allowing the model to focus on generalized scenes and enhancing its ability to generalize to situations where citrus leaves or fruits may appear on branches, the ground, or on benches. Through the integrated application of these techniques, we significantly improved the model’s ability to generalize and maintain accuracy in a variety of conditions.
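For concreteness, the sketch below reimplements several of the listed augmentations (random contrast, cutout-style clipping, random rotation, and salt-and-pepper noise) with OpenCV and NumPy; the parameter ranges are illustrative assumptions rather than the exact values used in this study.

```python
# Hypothetical augmentation sketch; parameter ranges are assumptions.
import cv2
import numpy as np

rng = np.random.default_rng(0)

def random_contrast(img: np.ndarray) -> np.ndarray:
    # Scale pixel intensities by a random gain to vary contrast.
    return cv2.convertScaleAbs(img, alpha=float(rng.uniform(0.6, 1.4)), beta=0)

def cutout(img: np.ndarray, n_holes: int = 3, size: int = 40) -> np.ndarray:
    out = img.copy()
    h, w = out.shape[:2]
    for _ in range(n_holes):                     # zero out fixed-size squares
        y, x = int(rng.integers(0, h)), int(rng.integers(0, w))
        out[max(0, y - size // 2):y + size // 2,
            max(0, x - size // 2):x + size // 2] = 0
    return out

def random_rotation(img: np.ndarray) -> np.ndarray:
    h, w = img.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2, h / 2), float(rng.uniform(-30, 30)), 1.0)
    return cv2.warpAffine(img, m, (w, h))

def salt_and_pepper(img: np.ndarray, amount: float = 0.01) -> np.ndarray:
    out = img.copy()
    mask = rng.random(out.shape[:2])
    out[mask < amount / 2] = 0                   # pepper
    out[mask > 1 - amount / 2] = 255             # salt
    return out

# Example: chain two augmentations on an input image (path is an assumption).
img = cv2.imread("citrus_leaf.jpg")
aug = salt_and_pepper(random_rotation(img))
```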
The effects of the data enhancement techniques are illustrated in Figure 2. The total number of images in the dataset was expanded to 9918, which included 8816 augmented images and 1102 original images. The number of labels after data augmentation is detailed in Table 2. The image samples were divided into training, testing, and validation sets, completing the overall construction of the dataset.

2.2. YOLOv8-GABNet Improvement

2.2.1. New Object Detection Algorithm: YOLOv8

YOLOv8 builds on the design strengths of its predecessors, including YOLOv5, YOLOv6, and YOLOX, culminating in an advanced object detection algorithm that offers high speed, accuracy, and scalability. The model is available in five scale versions—N, S, M, L, and X—tailored to different deployment platforms and application requirements. The network structure of YOLOv8 is depicted in Figure 3. A key innovation in its architecture is the replacement of the C3 module with the lightweight C2f module in the Backbone while retaining the SPPF module, which has been meticulously adjusted across different scale models to enhance performance. In the Neck section, the model continues to utilize the PAN design but removes the 1 × 1 downsampling layer. The Head section introduces a decoupled head structure to separate the classification and detection tasks, transitioning from an anchor-based to an anchor-free mode. Moreover, the model employs VFL Loss and DFL Loss + CIoU Loss for classification and regression, respectively, alongside the Task-Aligned Assigner strategy for label allocation. Collectively, these innovations enable YOLOv8 to perform real-time object detection in dynamic environments while maintaining high accuracy across datasets of various sizes, making it ideally suited for scenarios that demand rapid responses. Consequently, we selected YOLOv8 as the benchmark model.
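Using YOLOv8n as the baseline is straightforward with the Ultralytics Python API. The following minimal sketch mirrors the training budget reported in Section 3.1 (300 epochs with pretrained weights); the dataset YAML path and input size are assumptions.

```python
# Minimal sketch of fine-tuning the YOLOv8n baseline with Ultralytics.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")            # load COCO-pretrained weights
model.train(
    data="citrus_disease.yaml",       # assumed dataset configuration file
    epochs=300,                       # training budget used in this study
    imgsz=640,                        # a common default input size (assumption)
)
metrics = model.val()                 # reports mAP50, precision, recall, etc.
```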

2.2.2. Global to Local Space Aggregation Module: GLSA

The Global to Local Spatial Aggregation module (GLSA), comprising Global Spatial Attention (GSA) and Local Spatial Attention (LSA), integrates both global and local spatial features through distinct attention units. This dual attention mechanism focuses on reinforcing information pertinent to the optimization goal while suppressing extraneous details. The GLSA’s architecture, illustrated in Figure 4, employs a dual-flow design that effectively retains both local and non-local modeling capabilities, ensuring comprehensive spatial feature integration [38].
Additionally, we split the channels to balance precision with computational resources. Specifically, Equation (1) partitions the 64-channel feature map $\{F_i \mid i \in \{2,3,4\}\}$ into two sets of feature maps $\{F_i^1, F_i^2 \mid i \in \{2,3,4\}\}$. These sets are independently fed into the Global Spatial Attention (GSA) unit and the Local Spatial Attention (LSA) unit, respectively. The outputs of the two attention units are subsequently merged using a 1 × 1 convolution layer.
$$F_i^1, F_i^2 = \mathrm{Split}(F_i) \tag{1}$$
$$F_i' = C_{1 \times 1}\left(\mathrm{Concat}\left(G_{sa}(F_i^1), L_{sa}(F_i^2)\right)\right) \tag{2}$$
In Equation (2), $G_{sa}$ represents the Global Spatial Attention and $L_{sa}$ denotes the Local Spatial Attention. The output feature $F_i'$ lies in $\mathbb{R}^{\frac{H}{8} \times \frac{W}{8} \times 32}$. The GLSA module effectively extracts both global and local spatial features from the encoder, facilitating the precise localization of both large and small targets.
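A simplified PyTorch reconstruction of this dual-branch design is sketched below: the channels are split in half, one half passes through a global attention unit and the other through a local convolutional gating unit, and the results are merged by a 1 × 1 convolution as in Equations (1) and (2). The internal details (multi-head attention in GSA, depthwise gating in LSA) are illustrative stand-ins, not the authors’ exact implementation [38].

```python
# Illustrative GLSA sketch in PyTorch; internals are simplified stand-ins.
import torch
import torch.nn as nn

class GSA(nn.Module):
    """Global branch: self-attention across all spatial positions."""
    def __init__(self, c: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, x):
        b, c, h, w = x.shape
        seq = x.flatten(2).transpose(1, 2)       # (B, H*W, C)
        out, _ = self.attn(seq, seq, seq)
        return out.transpose(1, 2).reshape(b, c, h, w)

class LSA(nn.Module):
    """Local branch: convolutional spatial gating."""
    def __init__(self, c: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(c, c, 3, padding=1, groups=c),  # depthwise 3x3
            nn.Conv2d(c, c, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.gate(x)

class GLSA(nn.Module):
    def __init__(self, c: int = 64):
        super().__init__()
        self.gsa, self.lsa = GSA(c // 2), LSA(c // 2)
        self.fuse = nn.Conv2d(c, c // 2, 1)      # merge down to 32 channels, Eq. (2)

    def forward(self, x):
        x1, x2 = torch.chunk(x, 2, dim=1)        # Split(F_i), Eq. (1)
        return self.fuse(torch.cat([self.gsa(x1), self.lsa(x2)], dim=1))

# Shape check: a 64-channel map comes out with 32 channels.
y = GLSA(64)(torch.randn(1, 64, 40, 40))         # -> (1, 32, 40, 40)
```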

2.2.3. Lightweight Subsampling Module: ADown

The ADown module, an innovative subsampling convolution block featured in YOLOv9 [39], is specifically designed for object detection tasks. Its network structure is illustrated in Figure 5. This module’s lightweight design significantly reduces the model’s parameter count, thereby enhancing operational efficiency in resource-constrained environments without compromising performance. While it reduces the spatial resolution of the feature map, the ADown module is engineered to retain as much image information as possible, ensuring accurate target detection. Moreover, the adaptability of ADown enables it to adjust to various data environments, optimizing performance, particularly in improving object detection accuracy. The flexibility and configurability of the module facilitate its integration into both the Backbone and Head sections of YOLOv9, allowing for synergistic enhancements with other technologies such as wavelet undersampling modules. Consequently, the ADown module not only optimizes computational efficiency but also augments the overall functionality of the model, establishing its prominence in the realm of real-time target detection.
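Based on the public YOLOv9 description [39], the ADown block can be sketched in PyTorch roughly as follows: the input is lightly average-pooled, split channel-wise into two halves, one half downsampled by a stride-2 3 × 3 convolution and the other by max pooling plus a 1 × 1 convolution, and the halves concatenated. This is a reconstruction for illustration; consult the YOLOv9 reference implementation for the authoritative version.

```python
# Illustrative ADown sketch following the YOLOv9 design [39].
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_act(c_in: int, c_out: int, k: int, s: int, p: int) -> nn.Sequential:
    return nn.Sequential(nn.Conv2d(c_in, c_out, k, s, p, bias=False),
                         nn.BatchNorm2d(c_out), nn.SiLU())

class ADown(nn.Module):
    def __init__(self, c1: int, c2: int):
        super().__init__()
        self.c = c2 // 2
        self.cv1 = conv_bn_act(c1 // 2, self.c, 3, 2, 1)  # strided-conv path
        self.cv2 = conv_bn_act(c1 // 2, self.c, 1, 1, 0)  # pooling path

    def forward(self, x):
        x = F.avg_pool2d(x, 2, 1, 0, False, True)   # light smoothing first
        x1, x2 = x.chunk(2, dim=1)                  # split channels in half
        x1 = self.cv1(x1)                           # downsample via conv
        x2 = self.cv2(F.max_pool2d(x2, 3, 2, 1))    # downsample via pooling
        return torch.cat((x1, x2), dim=1)           # half resolution, c2 channels

# Shape check: spatial resolution halves, channels go from 64 to 128.
out = ADown(64, 128)(torch.randn(1, 64, 80, 80))    # -> (1, 128, 40, 40)
```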

2.2.4. Weighted Bidirectional Feature Pyramid Network: BiFPN

The Bidirectional Feature Pyramid Network (BiFPN [40]), first introduced in the EfficientDet model, enhances object detection feature fusion networks. By refining the traditional Feature Pyramid Network (FPN) and incorporating a bidirectional fusion path, BiFPN effectively processes multi-scale features and prevents the loss of high-level semantic information typically associated with unidirectional top–down fusion. Each feature layer within the BiFPN receives information from all other layers, fostering a more comprehensive and robust representation of features. Moreover, the BiFPN optimizes fusion efficiency by implementing a weighted feature fusion mechanism, which simplifies connections and minimizes the use of fusion nodes, thereby reducing both network complexity and computational demands. These enhancements not only bolster feature fusion performance but also significantly boost computational efficiency, rendering the BiFPN an essential component in object detection tasks, particularly in complex scenes with diverse object sizes.
The architecture of the BiFPN network is depicted in Figure 6. This network comprises multiple layers of feature maps, labeled P3 to P7, each representing features at various scales. Central to its design is the bidirectional fusion path, indicated by arrows of different colors in the diagram. The top–down path, illustrated by blue arrows, facilitates the fusion of upper features with lower features through upsampling, thereby enriching the lower layers. Conversely, the bottom–up path, shown with red arrows, enhances the features of the upper layers through downsampling. Additionally, purple arrows directly connect feature maps at the same level to maintain spatial coherence and contextual information, crucial for addressing complex scenes. Fusion points employ learned weights to balance the features from different paths and levels, optimizing fusion efficiency. This structure can be replicated as needed to amplify feature representation and enhance detection accuracy. The innovative design of the BiFPN significantly elevates the efficiency and detection capabilities of the Feature Pyramid Network by adeptly utilizing multi-scale information.
Given that input features at various resolutions contribute differently to output features, the BiFPN incorporates extra weights for each input to signify the importance of each feature in the network. This approach employs fast normalized fusion, a method highly akin to Softmax-based fusion in terms of learning dynamics and accuracy. Specifically, as outlined in Equation (3), the fusion strategy introduces learnable weights $w_i \geq 0$, each passed through the ReLU activation function to ensure numerical stability during fusion. The weights are then normalized so that each value lies between 0 and 1, maintaining the standardization and consistency of the output. The efficacy of this method lies in its ability to deliver performance comparable to that of traditional Softmax while simplifying the computation, thus enhancing the practical applicability of the algorithm.
$$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i \tag{3}$$
Taking the P6 output in Figure 6 as an example, Equations (4) and (5) show how this weighted feature fusion is applied.
$$P_6^{td} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot \mathrm{Resize}(P_7^{in})}{w_1 + w_2 + \epsilon}\right) \tag{4}$$
$$P_6^{out} = \mathrm{Conv}\left(\frac{w_1 \cdot P_6^{in} + w_2 \cdot P_6^{td} + w_3 \cdot \mathrm{Resize}(P_5^{out})}{w_1 + w_2 + w_3 + \epsilon}\right) \tag{5}$$
In Equation (4), $P_6^{td}$ represents the intermediate feature of level 6 on the top–down path, while $P_6^{out}$ in Equation (5) denotes the output feature of level 6 on the bottom–up path. This method of feature construction is applied consistently across the other layers. To enhance efficiency, depthwise separable convolutions [41,42] are utilized for feature fusion, and batch normalization and activation are applied after each convolutional layer to ensure the performance and stability of the network.
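The fast normalized fusion of Equations (3)–(5) reduces to a small learnable weighting step. A minimal PyTorch sketch is given below, assuming the input feature maps have already been resized to a common shape; the depthwise separable convolution that follows each fusion in the BiFPN is omitted for brevity.

```python
# Minimal sketch of fast normalized feature fusion (Equation (3)).
import torch
import torch.nn as nn

class FastNormalizedFusion(nn.Module):
    def __init__(self, n_inputs: int, eps: float = 1e-4):
        super().__init__()
        self.w = nn.Parameter(torch.ones(n_inputs))  # one learnable weight per input
        self.eps = eps

    def forward(self, inputs):
        w = torch.relu(self.w)                       # clamp weights to be >= 0
        w = w / (w.sum() + self.eps)                 # normalize into [0, 1]
        return sum(wi * x for wi, x in zip(w, inputs))

# Example: blend P6_in with the resized P7_in on the top-down path (Eq. (4)).
fuse = FastNormalizedFusion(2)
p6_td = fuse([torch.randn(1, 64, 20, 20), torch.randn(1, 64, 20, 20)])
```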

2.2.5. YOLOv8-GABNet Model

This paper presents YOLOv8-GABNet, a lightweight, high-performance model for citrus disease detection based on the YOLOv8n architecture. The model incorporates ADown lightweight subsampling convolutional blocks within the Backbone network, a Global to Local Spatial Aggregation (GLSA) module, and a weighted Bidirectional Feature Pyramid Network (BiFPN) in the Neck. As illustrated in Figure 7, the YOLOv8-GABNet model is composed of five key components: the Input, Backbone, Neck, Head, and Output.
In the input layer, various data enhancement techniques, such as random cropping and color jittering, are employed to improve the model’s ability to recognize disease characteristics under varying lighting and environmental conditions. These techniques help to emphasize disease-related features in the input images while suppressing irrelevant information, optimizing subsequent feature extraction and object detection. The Backbone, crucial for feature extraction, is enhanced by integrating the ADown module into the original YOLOv8 architecture. The ADown module reduces the number of parameters by using lightweight subsampled convolution blocks while preserving the richness of image information, ultimately shortening the model’s inference time. In the Neck, the GLSA module is introduced first. This module enhances the network’s ability to perceive diseases of varying sizes and forms in citrus images through its global and local attention mechanisms. Following this, the BiFPN structure is employed to efficiently process multi-scale features via bidirectional information flow and a weighted feature fusion mechanism. This approach prevents the loss of high-level semantic information and significantly improves feature utilization and detection accuracy. Each feature layer in the BiFPN can receive information from all other layers, leading to a more comprehensive and robust feature representation. Finally, the Head uses the fused feature maps to accurately localize and classify targets.

3. Results and Discussion

3.1. Model Training Result

When using pretrained weights, the YOLOv8n model reaches optimal accuracy within 300 epochs. Similarly, the YOLOv8-GABNet model is set to undergo 300 epochs with the use of pretrained weights for a direct comparison against the original YOLOv8n model. The training results, including the Precision–Recall (PR) curves for both the YOLOv8-GABNet and YOLOv8 models, are displayed in Figure 8.
Figure 8 illustrates the outstanding performance of the YOLOv8-GABNet model across various categories. For manganese deficiency and magnesium deficiency, the model achieved accuracies of 0.953 and 0.937, respectively, demonstrating its high precision in identifying these deficiencies with minimal false positives. The overall mAP@0.5 value was 0.867, indicating that while the model excels in individual categories, its overall performance is balanced and reliable. Additionally, the model performed notably well in other categories, such as “yellow vein disease” and “anthracnose”, showcasing its broad adaptability and efficiency in identifying diverse diseases.
Figure 9 shows the labeled results (a,d,g), the detection results of the YOLOv8 model (b,e,h), and the detection results of the improved YOLOv8-GABNet model (c,f,i). In comparison, the YOLOv8 model in Figure 9b exhibits some missed detections relative to the labeled results and the improved model. Figure 9d highlights the detection of small targets in a complex environment, and Figure 9e reveals missed detections compared to Figure 9f. Additionally, Figure 9h shows a false detection case when compared with Figure 9g,i. Despite some detection inconsistencies, the improved YOLOv8-GABNet model exhibits strong performance in handling small targets and reducing missed detections in various complex scenarios. Figure 9c,f,i further validate the YOLOv8-GABNet model’s excellent performance on the test set, particularly in terms of accurate identification and the reduction in false detections. Overall, the YOLOv8-GABNet model demonstrates significant improvements in detection accuracy.

3.2. Ablation Experiment

This section examines the impact of the three improvement methods on the network model. A total of eight experiments were conducted, with different modules added and compared to the original YOLOv8 model across various metrics, including the mAP50, model size, number of parameters, computational cost, speed, precision, and recall. The YOLOv8 model with the GLSA module is referred to as YOLOv8 + GLSA, the model incorporating the ADown module as YOLOv8 + ADown, and the model using the BiFPN structure as YOLOv8 + BiFPN. The experimental results are presented in Table 3.
The YOLOv8-GABNet model integrates these efficient modules and demonstrates strong performance in object detection tasks. The model achieved a mean Average Precision (mAP50) of 0.867, the highest among all the models listed, showcasing its exceptional detection accuracy. With a size of just 3.8 MB, the model is particularly suitable for deployment on resource-constrained devices, such as mobile or embedded platforms, addressing the need for lightweight solutions. Additionally, it can process 93.3 frames per second, striking an excellent balance between speed and precision, making it ideal for applications that require both real-time detection and high accuracy. Key performance indicators also highlight the model’s strengths. The precision (P) of YOLOv8-GABNet was 0.897, the highest among all configurations, and the recall (R) was 0.770, among the highest, indicating its superior ability in accurate target recognition and its comprehensive coverage of real targets. This performance is attributed to several key modules: the GLSA module effectively combines local and global modeling, enhancing object perception at various scales by aggregating spatial information across both levels, thereby improving the P and R values; the ADown module, an innovative design initially introduced in YOLOv9, significantly reduces the number of parameters through its lightweight design, allowing the model to operate efficiently in resource-constrained environments; and the BiFPN network enhances the traditional Feature Pyramid Network (FPN), improving the efficiency and effectiveness of multi-scale feature fusion.
In summary, by integrating GLSA, ADown, and BiFPN modules, YOLOv8-GABNet ensures high detection accuracy while also enhancing operational efficiency and resource utilization. This makes it an outstanding object detection model across multiple dimensions. The model maintains excellent performance even in resource-limited environments or when there are high real-time performance demands, particularly when dealing with complex or challenging scenes.
Using the mAP as the primary evaluation metric, line charts and loss curves were plotted for the ablation experiments. Figure 10 compares the performance of the YOLOv8 and YOLOv8-GABNet models. Figure 10a shows how the mAP changes over training epochs at an intersection over union (IoU) threshold of 0.5. Both models’ mAP50 values rise sharply from low starting points and stabilize after approximately 50 epochs, with YOLOv8-GABNet achieving a relatively higher mAP, indicating improved performance. Figure 10b illustrates the training loss of both models, which starts high and declines sharply during the initial stages of training. The loss stabilizes after about 100 epochs, showing that both models converge, achieving high accuracy and stability in citrus disease detection without compromising convergence speed.
Figure 11 compares the predictions of the original YOLOv8 model and the improved YOLOv8-GABNet model across five categories. While the detection accuracy of most labels remains largely unchanged, there is a significant improvement in the detection accuracy of the small target “yellow vein disease” label. This demonstrates the feasibility and superiority of the improved YOLOv8-GABNet model.
Overall, our method significantly enhances the YOLOv8 model by accelerating the detection speed and reducing computational parameters, while simultaneously improving accuracy. These improvements make the model well suited for the demands of disease monitoring in agricultural production, representing a valuable advancement in the field.

3.3. Comparative Experiment of Different Network Models

In this study, the same citrus disease dataset was used to evaluate the YOLOv8-GABNet model against several mainstream models: Faster R-CNN, SSD, YOLOv5s, YOLOv8n, YOLOv8s, YOLOv9t, and YOLOv10n. The mAP, model size, precision, recall, FPS, and FLOPs of these eight models are summarized in Table 4. The YOLOv8-GABNet model outperforms the other models across key performance metrics. It achieved a mean Average Precision (mAP@50) of 86.7%, higher than that of YOLOv8n (82.4%), YOLOv8s (82.3%), YOLOv5s (82.0%), Faster R-CNN (71.8%), SSD (67.2%), YOLOv9t (81.8%), and YOLOv10n (81.7%). Additionally, the precision and recall of the YOLOv8-GABNet model, at 89.7% and 77.0%, respectively, also surpass those of the other models.
The integration of the ADown module, BiFPN structure, and GLSA module optimizes the original YOLOv8 model by improving speed while maintaining a high mAP. Among all the comparison models, the YOLOv8-GABNet model has the smallest model size and FLOPs yet continues to perform exceptionally well despite a relatively modest FPS of 93.3. As previously discussed, YOLOv8-GABNet exhibits superior detection performance and differentiation capability compared to other models, particularly in handling occlusion between citrus leaves and distinguishing various disease spots across different backgrounds. Consequently, in complex environments, the model efficiently, quickly, and accurately identifies citrus diseases while maintaining a lightweight design, making it well suited for the agricultural disease recognition field.

4. Discussion

In this study, the introduction of ADown lightweight convolutional blocks into the Backbone and Neck sections significantly reduced the model’s parameters and computational demands, making it more suitable for resource-constrained devices and shortening the inference time to enable real-time detection. Additionally, the GLSA module enhanced the model’s flexibility in recognizing different targets and improved its accuracy in complex backgrounds. The multi-scale feature fusion provided by the weighted Bidirectional Feature Pyramid Network (BiFPN) further enhanced feature utilization, leading to improvements in both detection accuracy and robustness. In the domain of citrus disease and nutritional deficiency detection, the YOLOv8-GABNet model demonstrated excellent performance across key metrics.
Firstly, the YOLOv8-GABNet model achieved a mean Average Precision (mAP50) of 0.867, demonstrating improvements of 5.2%, 5.3%, 5.7%, 20.7%, 29.0%, 6.0%, and 6.1% over the popular models YOLOv8n, YOLOv8s, YOLOv5s, Faster R-CNN, SSD, YOLOv9t, and YOLOv10n, respectively. These results highlight YOLOv8-GABNet’s superiority in accurately identifying citrus diseases. Moreover, while the frames per second (FPS) rate for YOLOv8-GABNet is 93.3, considering its excellent balance between high accuracy and low resource consumption, this rate is sufficient for most real-time monitoring applications. The superiority of these performance metrics is supported by research from Saleem et al. [1], who also emphasize the importance of efficiency and accuracy in agricultural disease detection.
Secondly, the recall rate of the YOLOv8-GABNet model also exhibited exceptional performance, showing improvements of 2.7%, 0.7%, 8.3%, 32.5%, 31.6%, 8.8%, and 7.1% compared to YOLOv8n, YOLOv8s, YOLOv5s, Faster R-CNN, SSD, YOLOv9t, and YOLOv10n, respectively. These data not only demonstrate the reliability of YOLOv8-GABNet in detecting all relevant diseases but also highlight its enhanced capability for detecting small targets, due to the multi-scale feature fusion provided by the BiFPN architecture, which significantly minimizes missed detections.
Finally, in terms of the model’s economic efficiency and applicability, the use of the ADown module significantly reduces the storage requirements. The YOLOv8-GABNet model’s size is only 3.8 MB, representing reductions of 38.71%, 83.11%, 79.46%, 98.82%, 98.0%, 37.7%, and 34.48% compared to YOLOv8n, YOLOv8s, YOLOv5s, Faster R-CNN, SSD, YOLOv9t, and YOLOv10n, respectively. These results are consistent with the research by Wang et al. [39], demonstrating the efficiency of lightweight convolutional blocks in constructing lightweight networks. This foundation of theoretical and empirical evidence supports our model design. The lightweight nature of the model makes YOLOv8-GABNet particularly suitable for deployment on devices with limited computing resources, such as smartphones and embedded systems, enhancing its practicality in real-world agricultural applications.
Compared to Zhu et al. [43], although their average accuracy for citrus disease identification reached 98.68%, their images were sourced from a public dataset and featured relatively simplistic backgrounds. A similar situation was noted in the work of Dhiman et al. [44]. Similarly, Yang et al. [45] conducted their research in natural environments, and the backgrounds were relatively simple. Apacionado et al. [46], although dealing with complex backgrounds, only identified one type of disease—citrus sooty mold—with an accuracy of merely 74.4%. In contrast, this paper considers both the complexity of the background and accuracy, with data sourced from actual field photography.
Although YOLOv8-GABNet has shown excellent performance across multiple metrics, there are certain limitations. Firstly, this study primarily developed and tested the model for common diseases and deficiencies at “Citrus Planet” in Conghua District, Guangzhou, Guangdong Province; thus, the applicability of the model may be limited by geographical and environmental conditions. Secondly, since the research focuses on diseases and deficiencies in this specific area, its direct applicability to other global regions may require further verification. To enhance the model’s performance under extreme environmental conditions and its generalizability across different regions, the dataset can be expanded and the model architecture further optimized. Related research, such as the work by Ferentinos [47], emphasizes the importance of data diversity in enhancing model generalization. Additionally, Qin et al. [48], in their study of MobileNetV4, explored strategies for optimizing models on edge devices to improve computational efficiency, providing further direction for our research. Despite these limitations, the results of this study can still provide valuable references and insights for agricultural disease and nutrient management in other regions.

5. Conclusions

The main contributions of this study are as follows:
(1)
Data Collection and Augmentation: This study focused on collecting images of three common diseases and two nutrient deficiencies from “Citrus Planet” in Conghua District, Guangzhou, Guangdong Province. All images were taken with smartphones and annotated using the LabelImg tool. Data augmentation techniques were applied to enhance data diversity and improve the generalization ability of the detection model.
(2)
Technical Optimization and Model Improvement: The introduction of ADown lightweight subsampling convolution blocks to replace traditional convolution modules significantly reduced the model’s parameter count and computational demand, increasing the detection speed and reducing computational costs while maintaining high performance. Additionally, to address the challenges of complex backgrounds and multi-object detection, a weighted Bidirectional Feature Pyramid Network (BiFPN) was integrated as the feature fusion network, enhancing information flow between different feature levels and improving detection accuracy and model robustness. The introduced Global-to-Local Spatial Aggregation (GLSA) module improved the model’s flexibility and accuracy in handling objects of various sizes and shapes in complex backgrounds by aggregating information across different spatial levels.
(3)
Reference for Other Regions: Although the research was concentrated in “Citrus Planet” in Conghua District, Guangzhou, Guangdong Province, the optimized techniques and model improvements not only enhanced the efficiency and accuracy of citrus disease and nutritional deficiency detection but also served as a robust case for agricultural disease and nutritional deficiency detection and management in other regions, providing feasible technical references and insights for global agricultural disease and nutritional deficiency monitoring.

Author Contributions

Conceptualization, Q.D. and Y.X.; methodology, Y.X.; software, S.L. (Shiyao Liang); validation, Y.H.; formal analysis, S.L. (Shiyao Liang); investigation, Y.H.; resources, Z.L.; writing—original draft preparation, Y.X.; writing—review and editing, Y.X. and Q.D.; supervision, S.S. and X.X.; project administration, Z.L. and S.L. (Shilei Lv); funding acquisition, Z.L., S.L. (Shilei Lv) and S.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Earmarked Fund for CARS, grant no. CARS-26. It was also partly supported by the National Natural Science Foundation of China, grant no. 32271997; the National Natural Science Foundation of China, grant no. 32472020; and the Science and Technology Projects in Guangzhou, grant no. 2024B03J1309. The recipients of these four funds are Zhen Li, Shilei Lv, and Shuran Song.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Saleem, M.H.; Potgieter, J.; Arif, K.M. Automation in Agriculture by Machine and Deep Learning Techniques: A Review of Recent Developments. Precis. Agric. 2021, 22, 2053–2091. [Google Scholar] [CrossRef]
  2. Iqbal, Z.; Khan, M.A.; Sharif, M.; Shah, J.H.; Rehman, M.H.U.; Javed, K. An automated detection and classification of citrus plant diseases using image processing techniques. Comput. Electron. Agric. 2018, 153, 12–32. [Google Scholar] [CrossRef]
  3. Deng, L.; Wang, Z.; Zhou, H. Application of Image Segmentation Technology in Crop Disease Detection and Recognition. In Computer and Computing Technologies in Agriculture XI; IFIP Advances in Information and Communication Technology; Springer: Cham, Switzerland, 2019; Volume 545, pp. 365–374. [Google Scholar]
  4. Dimililer, K.; Zarrouk, S. ICSPI: Intelligent Classification System of Pest Insects Based on Image Processing and Neural Arbitration. Appl. Eng. Agric. 2017, 33, 453–460. [Google Scholar] [CrossRef]
  5. Wen, C.; Guyer, D.; Li, W. Local feature-based identification and classification for orchard insects. Biosyst. Eng. 2009, 104, 299–307. [Google Scholar] [CrossRef]
  6. Wen, C.; Guyer, D. Image-based orchard insect automated identification and classification method. Comput. Electron. Agric. 2012, 89, 110–115. [Google Scholar] [CrossRef]
  7. Liu, T.; Chen, W.; Wu, W.; Sun, C.; Guo, W.; Zhu, X. Detection of aphids in wheat fields using a computer vision technique. Biosyst. Eng. 2016, 141, 82–93. [Google Scholar] [CrossRef]
  8. Zhang, C.; Zhang, S.; Yang, J.; Shi, Y.; Chen, J. Apple leaf disease identification using genetic algorithm and correlation based feature selection method. Int. J. Agric. Biol. Eng. 2017, 10, 74–83. [Google Scholar]
  9. Chakraborty, S.; Paul, S.; Rahat-uz-Zaman, M. Prediction of Apple Leaf Diseases Using Multiclass Support Vector Machine. In Proceedings of the 2021 2nd International Conference on Robotics, Electrical and Signal Processing Techniques (ICREST), Dhaka, Bangladesh, 5–7 January 2021. [Google Scholar]
  10. Zhang, S.; Wu, X.; You, Z.; Zhang, L. Leaf image based cucumber disease recognition using sparse representation classification. Comput. Electron. Agric. 2017, 134, 135–141. [Google Scholar] [CrossRef]
  11. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In Proceedings of the 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587. [Google Scholar]
  12. Girshick, R. Fast R-CNN. In Proceedings of the 2015 IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  13. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 39, 1137–1149. [Google Scholar] [CrossRef]
  14. Zhang, K.; Wu, Q.; Chen, Y. Detecting soybean leaf disease from synthetic image using multi-feature fusion faster R-CNN. Comput. Electron. Agric. 2021, 183, 106064. [Google Scholar] [CrossRef]
  15. Sun, H.; Xu, H.; Liu, B.; He, D.; He, J.; Zhang, H.; Geng, N. MEAN-SSD: A novel real-time detector for apple leaf diseases using improved light-weight convolutional neural networks. Comput. Electron. Agric. 2021, 189, 106379. [Google Scholar] [CrossRef]
  16. Bao, W.; Fan, T.; Hu, G.; Liang, D.; Li, H. Detection and identification of tea leaf diseases based on AX-RetinaNet. Sci. Rep. 2022, 12, 2183. [Google Scholar] [CrossRef] [PubMed]
  17. Liu, W.; Matas, J.; Sebe, N.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot MultiBox Detector. In Computer Vision—ECCV 2016; Springer International Publishing: Cham, Switzerland, 2016; pp. 21–37. [Google Scholar]
  18. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  19. Lippi, M.; Bonucci, N.; Carpio, R.F.; Contarini, M.; Speranza, S.; Gasparri, A. A YOLO-Based Pest Detection System for Precision Agriculture. In Proceedings of the 2021 29th Mediterranean Conference on Control and Automation (MED), Puglia, Italy, 22–25 June 2021. [Google Scholar]
  20. Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 22. [Google Scholar] [CrossRef] [PubMed]
  21. Liu, J.; Wang, X. Tomato Diseases and Pests Detection Based on Improved Yolo V3 Convolutional Neural Network. Front. Plant Sci. 2020, 11, 898. [Google Scholar] [CrossRef]
  22. Wang, X.; Liu, J. Tomato Anomalies Detection in Greenhouse Scenarios Based on YOLO-Dense. Front. Plant Sci. 2021, 12, 634103. [Google Scholar] [CrossRef]
  23. Li, K.S.; Wang, J.C.; Jalil, H.; Wang, H. A fast and lightweight detection algorithm for passion fruit pests based on improved YOLOv5. Comput. Electron. Agric. 2023, 204, 107534. [Google Scholar] [CrossRef]
  24. Alam Soeb, J.; Jubayer, F.; Tarin, T.A.; Al Mamun, M.R.; Ruhad, F.M.; Parven, A.; Mubarak, N.M.; Karri, S.L.; Meftaul, I.M. Tea leaf disease detection and identification based on YOLOv7 (YOLO-T). Sci. Rep. 2023, 13, 6078. [Google Scholar]
  25. Deng, J.; Yang, C.; Huang, K.; Lei, L.; Ye, J.; Zeng, W.; Zhang, J.; Lan, Y.; Zhang, Y. Deep-Learning-Based Rice Disease and Insect Pest Detection on a Mobile Phone. Agronomy 2023, 13, 2139. [Google Scholar] [CrossRef]
  26. Solimani, F.; Cardellicchio, A.; Dimauro, G.; Petrozza, A.; Summerer, S.; Cellini, F.; Renò, V. Optimizing Tomato Plant Phenotyping Detection: Boosting YOLOv8 Architecture to Tackle Data Complexity. Comput. Electron. Agric. 2024, 218, 108728. [Google Scholar] [CrossRef]
  27. Xu, L.; Wang, Y.; Shi, X.; Tang, Z.; Chen, X.; Wang, Y.; Zou, Z.; Huang, P.; Liu, B.; Yang, N.; et al. Real-Time and Accurate Detection of Citrus in Complex Scenes Based on HPL-YOLOv4. Comput. Electron. Agric. 2023, 205, 107590. [Google Scholar] [CrossRef]
  28. Ebrahimi, M.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B.J.C. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58. [Google Scholar] [CrossRef]
  29. Dai, Q.; Guo, Y.; Li, Z.; Song, S.; Lyu, S.; Sun, D.; Wang, Y.; Chen, Z. Citrus Disease Image Generation and Classification Based on Improved FastGAN and EfficientNet-B5. Agronomy 2023, 13, 988. [Google Scholar] [CrossRef]
  30. Studies on Manganese Deficiency in Citrus. III. Available online: https://www.jstage.jst.go.jp/article/jjshs1925/36/1/36_1_55/_article (accessed on 22 October 2024).
  31. Wang, Y.; Long, Q.; Li, Y.; Kang, F.; Fan, Z.; Xiong, H.; Zhao, H.; Luo, Y.; Guo, R.; He, X.; et al. Mitigating Magnesium Deficiency for Sustainable Citrus Production: A Case Study in Southwest China. Sci. Hortic. 2022, 295, 110832. [Google Scholar] [CrossRef]
  32. Ying, Z.; Li, G.; Ren, Y.; Wang, R.; Wang, W. A New Image Contrast Enhancement Algorithm Using Exposure Fusion Framework. In Computer Analysis of Images and Patterns; Lecture Notes in Computer Science; Springer International Publishing: Cham, Switzerland, 2017; pp. 36–46. [Google Scholar]
  33. DeVries, T.; Taylor, G.W. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
  34. Shorten, C.; Khoshgoftaar, T.M. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
  35. Zhong, Z.; Zheng, L.; Kang, G.; Li, S.; Yang, Y. Random Erasing Data Augmentation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 13001–13008. [Google Scholar]
  36. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. arXiv 2015, arXiv:1409.1556. [Google Scholar]
  37. Takahashi, R.; Matsubara, T.; Uehara, K. Data Augmentation Using Random Image Cropping and Patching for Deep CNNs. IEEE Trans. Circuits Syst. Video Technol. 2020, 30, 2917–2931. [Google Scholar] [CrossRef]
  38. Tang, F.; Xu, Z.; Huang, Q.; Wang, J.; Hou, X.; Su, J.; Liu, J. DuAT: Dual-aggregation transformer network for medical image segmentation. In Proceedings of the Chinese Conference on Pattern Recognition and Computer Vision (PRCV), Xiamen, China, 13–15 October 2023; pp. 343–356. [Google Scholar]
  39. Wang, C.-Y.; Yeh, I.-H.; Liao, H.-Y.M. YOLOv9: Learning What You Want to Learn Using Programmable Gradient Information. arXiv 2024, arXiv:2402.13616. [Google Scholar]
  40. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. arXiv 2019, arXiv:1911.09070. [Google Scholar]
  41. Chollet, F. Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
  42. Sifre, L. Rigid-Motion Scattering for Image Classification. Ph.D. Thesis, École Polytechnique, Palaiseau, France, 2014. [Google Scholar]
  43. Zhu, H.; Wang, D.; Wei, Y.; Zhang, X.; Li, L. Combining Transfer Learning and Ensemble Algorithms for Improved Citrus Leaf Disease Classification. Agriculture 2024, 14, 1549. [Google Scholar] [CrossRef]
  44. Dhiman, P.; Kaur, A.; Hamid, Y.; Alabdulkreem, E.; Elmannai, H.; Ababneh, N. Smart Disease Detection System for Citrus Fruits Using Deep Learning with Edge Computing. Sustainability 2023, 15, 4576. [Google Scholar] [CrossRef]
  45. Yang, C.; Teng, Z.; Dong, C.; Lin, Y.; Chen, R.; Wang, J. In-Field Citrus Disease Classification via Convolutional Neural Network from Smartphone Images. Agriculture 2022, 12, 1487. [Google Scholar] [CrossRef]
  46. Apacionado, B.V.; Ahamed, T. Sooty Mold Detection on Citrus Tree Canopy Using Deep Learning Algorithms. Sensors 2023, 23, 8519. [Google Scholar] [CrossRef] [PubMed]
  47. Ferentinos, K.P. Deep Learning Models for Plant Disease Detection and Diagnosis. Comput. Electron. Agric. 2018, 145, 311–318. [Google Scholar] [CrossRef]
  48. Qin, D.; Leichner, C.; Delakis, M.; Fornoni, M.; Luo, S.; Yang, F.; Wang, W.; Banbury, C.; Ye, C.; Akin, B.; et al. MobileNetV4—Universal Models for the Mobile Ecosystem. arXiv 2024, arXiv:2404.10518. [Google Scholar]
Figure 1. Representative samples of citrus disease images. (a) Anthracnose; (b) canker disease; (c) magnesium deficiency; (d) manganese deficiency; (e) yellow vein disease.
Figure 2. Schematic diagram of data enhancement techniques. (a) Original image; (b) random adjustment of contrast; (c) cutout; (d) random rotation; (e) Gaussian blur; (f) salt-and-pepper noise; (g) scaling; (h) random cropping; (i) Mosaic data enhancement.
Figure 3. The network structure of the original YOLOv8.
Figure 4. Network structure of the Global to Local Spatial Aggregation module (GLSA). This module comprises Global Spatial Attention (GSA) and Local Spatial Attention (LSA).
Figure 5. ADown network structure.
Figure 6. BiFPN network structure.
Figure 7. YOLOv8-GABNet network structure.
Figure 8. (a) PR diagram of YOLOv8; (b) PR diagram of YOLOv8-GABNet.
Figure 9. Analysis of the identification results. (a,d,g) show the labeled results; (b,e,h) present the detection results of YOLOv8; and (c,f,i) display the detection results of YOLOv8-GABNet.
Figure 10. (a) The mAP of the ablation experiment; (b) the training loss of the ablation experiment.
Figure 11. (a) YOLOv8 confusion matrix; (b) YOLOv8-GABNet confusion matrix.
Table 1. Partitioning of the dataset.

| Name | Proportion | Number of Pictures | Number of Labels |
| --- | --- | --- | --- |
| Training Set | 80% | 881 | 3016 |
| Validation Set | 10% | 110 | 383 |
| Test Set | 10% | 111 | 385 |
| Total Set | 100% | 1102 | 3784 |
Table 2. Distribution of labels after data augmentation.

| Diseases | Anthracnose | Canker | Yellow Vein Disease | Mg Deficiency | Mn Deficiency |
| --- | --- | --- | --- | --- | --- |
| Original Data | 1140 | 1105 | 326 | 716 | 497 |
| Data Augmentation | 9308 | 9031 | 2693 | 5881 | 4075 |
Table 3. Ablation experiments of modules.

| Model | mAP50 | mAP50:95 | Model Size (MB) | Params (M) | FLOPs (G) | FPS (Frames/s) | P | R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| v8 | 0.824 | 0.629 | 6.2 | 3.15 | 8.7 | 163.5 | 0.829 | 0.750 |
| v8 + G | 0.855 | 0.631 | 8.0 | 4.02 | 9.2 | 110.2 | 0.799 | 0.779 |
| v8 + A | 0.856 | 0.618 | 5.2 | 2.59 | 7.4 | 149.7 | 0.873 | 0.737 |
| v8 + B | 0.845 | 0.634 | 4.1 | 1.99 | 7.1 | 131.2 | 0.850 | 0.750 |
| v8 + G + A | 0.831 | 0.613 | 7.2 | 3.61 | 8.5 | 101.4 | 0.774 | 0.772 |
| v8 + G + B | 0.865 | 0.636 | 4.4 | 2.14 | 7.6 | 97.0 | 0.870 | 0.712 |
| v8 + A + B | 0.841 | 0.622 | 3.4 | 1.65 | 6.3 | 123.1 | 0.821 | 0.751 |
| v8 + G + A + B | 0.867 | 0.638 | 3.8 | 1.79 | 6.8 | 93.3 | 0.897 | 0.770 |

Note: v8 stands for YOLOv8, G stands for GLSA, A stands for ADown, and B stands for BiFPN.
Table 4. Comparison of eight detection models.

| Model | mAP50 | mAP50:95 | Model Size (MB) | Params (M) | FLOPs (G) | FPS (Frames/s) | P | R |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Faster R-CNN | 0.718 | 0.503 | 323.1 | 41.37 | 33.2 | 62.7 | 0.718 | 0.581 |
| SSD | 0.672 | 0.445 | 190.4 | 24.28 | 30.6 | 94.7 | 0.672 | 0.585 |
| YOLOv5s | 0.820 | 0.617 | 18.5 | 9.11 | 23.8 | 136.9 | 0.856 | 0.711 |
| YOLOv8n | 0.824 | 0.629 | 6.2 | 3.15 | 8.7 | 163.5 | 0.829 | 0.750 |
| YOLOv8s | 0.823 | 0.628 | 22.5 | 11.12 | 28.4 | 150.6 | 0.815 | 0.765 |
| YOLOv9t | 0.818 | 0.627 | 6.1 | 2.62 | 10.7 | 141.5 | 0.852 | 0.708 |
| YOLOv10n | 0.817 | 0.614 | 5.8 | 2.27 | 6.5 | 139.9 | 0.795 | 0.719 |
| YOLOv8-GABNet | 0.867 | 0.638 | 3.8 | 1.79 | 6.8 | 93.3 | 0.897 | 0.770 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

