Small Pests Detection in Field Crops Using Deep Learning Object Detection

Khalid, Saim; Oqaibi, Hadi Mohsen; Aqib, Muhammad; Hafeez, Yaser

doi:10.3390/su15086815

Open AccessArticle

Small Pests Detection in Field Crops Using Deep Learning Object Detection

¹

University Institute of Information Technology, PMAS-Arid Agriculture University Rawalpindi, Rawalpindi 46300, Pakistan

²

IS Department, Faculty of Computing and Information Technology, King Abdulaziz University, Jeddah 21589, Saudi Arabia

³

National Center of Industrial Biotechnology, PMAS-Arid Agriculture University Rawalpindi, Rawalpindi 46300, Pakistan

^*

Author to whom correspondence should be addressed.

Sustainability 2023, 15(8), 6815; https://doi.org/10.3390/su15086815

Submission received: 27 February 2023 / Revised: 14 April 2023 / Accepted: 16 April 2023 / Published: 18 April 2023

(This article belongs to the Special Issue Sustainable Technologies for Improving Soil, Crop, and Environment Quality in Changing Climate)

Download

Browse Figures

Versions Notes

Abstract

:

Deep learning algorithms, such as convolutional neural networks (CNNs), have been widely studied and applied in various fields including agriculture. Agriculture is the most important source of food and income in human life. In most countries, the backbone of the economy is based on agriculture. Pests are one of the major challenges in crop production worldwide. To reduce the overall production and economic loss from pests, advancement in computer vision and artificial intelligence may lead to early and small pest detection with greater accuracy and speed. In this paper, an approach for early pest detection using deep learning and convolutional neural networks has been presented. Object detection is applied on a dataset with images of thistle caterpillars, red beetles, and citrus psylla. The input dataset contains 9875 images of all the pests under different illumination conditions. State-of-the-art Yolo v3, Yolov3-Tiny, Yolov4, Yolov4-Tiny, Yolov6, and Yolov8 have been adopted in this study for detection. All of these models were selected based on their performance in object detection. The images were annotated in the Yolo format. Yolov8 achieved the highest mAP of 84.7% with an average loss of 0.7939, which is better than the results reported in other works when compared to small pest detection. The Yolov8 model was further integrated in an Android application for real time pest detection. This paper contributes the implementation of novel deep learning models, analytical methodology, and a workflow to detect pests in crops for effective pest management.

Keywords:

deep learning; convolution neural networks; object detection; image classification; pest detection; pest management

1. Introduction

Artificial intelligence has attracted a lot of attention in recent times. It has great potential and can give promising results. AI’s ability to recognize and learn from data and images in different formats is of great benefit. Various industries are currently trying to incorporate artificial intelligence into their daily activities. Agriculture is one of the industries that started implementing AI to achieve a more efficient and faster way of doing things. The role that artificial intelligence (AI) plays in agriculture is becoming more important as the world population grows and the climate changes.

Crops are affected by an extensive variety of pests. Pests have been one of the biggest challenges in the agricultural sector for years [1]. Pests reduce the yield and quality of agriculture production—about fifty percent of the crop yield is destroyed through pest infestation [2]. Global losses for pests range between 17% and 23% for major crops [3]. Consequently, careful pest control is an important task to reduce crop losses and improve crop yields globally. Pests should be identified after they have infected the field at an early stage, so that the farmers can perform well-timed supervision and avoid the spread of pests. Most farmers use traditional methods to identify and treat pest infestation. However, these traditional methods have many drawbacks. Firstly, the most widely used method for identification is done manually during which farmers themselves, with some expert opinions, inspect the field by hand on a daily, weekly, and monthly basis to identify and detect any signs of pests. Secondly, there are enormous amounts of types and species of pests that belong to the same family [4] and also a wide diversity of cultivated crops, which are, as a result, difficult to identify and detect manually. Thirdly, the regular emergence and re-emergence of pests also plays a vital role. So, many conventional techniques are unable to detect pests at an early stage and the use of sprayers for the removal of pests results in damaging the plant along with the pests and also intoxicates the crops, which creates many health issues later [5]. Traditional pest detection methods are prone to errors, can be time consuming, and can be very tedious to perform. Additionally, they may pose health risks.

Machine learning (ML) algorithms have played an important role in automating pest detection [6,7]. Chodey et al. [8] used k-means and expectation–maximization clustering to detect and identify pests in crops. Detection is done by manually extracting features from the dataset, which is a slow process if the images are of higher resolution and becomes a tedious task on a larger dataset. In another study, a support vector machine was used to detect pests in crops [9]. After acquiring the dataset, the features were extracted manually by removing the background, and applying histogram equalization, contrast stretching, and noise removal to extract features from the images. This method achieved higher accuracy, but feature extraction gets tedious and time consuming if a larger dataset is used with a variety of pest species. Although traditional ML algorithms have been shown to work fine when the quantity of crop pest species is low, they become ineffective when several features are manually extracted.

Object detection with deep learning has various applications in agriculture [10]; it is used to detect weeds [11,12], pests [13,14], yield prediction [15,16], and has many other applications in agriculture and other fields of life [17,18]. It has recently been applied to detect and classify different pests and has achieved great results in many pest detection and classification applications [19]. Objects can be detected from an image or from real-time without manually extracting features from each image [20]. Object detection in deep learning uses different state-of-the-art convolutional neural networks (CNNs) to detect a variety of objects [21]. Pest detection using deep learning also comes with serious complexities, such as working in a wild environment, detecting small-sized pests, detecting pests of the same color as the plant, and detecting different kinds of pests in the same plant species [22].

In this paper, object detection is implemented to detect thistle caterpillars, citrus psylla, and red beetles. Six state-of-the-art object detection convolutional neural networks (CNNs) are used, namely Yolov3, Yolov3-Tiny, Yolov4, Yolov4-Tiny, Yolov6, and Yolov8. The following sections describe the dataset, training processes, and evaluation of pests regarding the accuracy of all models as well as detection and localization. The remainder of the article is structured as follows. Section 2 provides an explanation of the literature where different AI-/deep learning-/machine learning-based approaches have been used for pest management. Section 3 gives details about the deep learning algorithms that are implemented in this work. Section 4 gives an overview of the dataset and training of the models used in the study. Section 5 goes through the outcomes and performance comparisons of all the models used. Section 6 gives the study’s conclusion.

2. Review of the Literature

The literature discusses different aspects of pest detection. A variety of datasets and different detection algorithms are discussed based on small pest detection and different species of pests under different environmental conditions. Li et al. [13] presented a framework for the detection of whitefly and thrips where images were captured by using sticky traps and the data were collected in a greenhouse environment. A two-stage custom detector was created in comparison with Faster R-CNN, which shows weak performance in recognizing small pests, and this model uses images from sticky traps, which in return give less accuracy if tested on images with complex backgrounds. The intent of [14] was to set up models that identify three types of pest moths in pheromone trap pictures utilizing deep learning and object detection strategies. Pheromone traps were assembled and seven distinctive deep learning object detection models were applied to look at the speed and precision. Faster R-CNN gave the highest accuracy in terms of mAP. However, the dataset utilized in this review had a few impediments since they are not real pictures taken from a field. Barbedo and Castro [23] provide a comprehensive overview of the current state of automatic pest detection using proximal images and machine learning techniques. They highlighted the challenges faced in building robust pest detection systems and the need for more comprehensive pest image databases and suggested solutions such as involving farmers and entomologists through citizen science initiatives and data sharing between research groups. They also reviewed the limitations of current monitoring techniques, such as the cost and complexity of high-density imaging sensors, and the need for better edge computing solutions for effective data processing. The study concludes by stating that automating pest monitoring is a challenging task, but has the potential for practical applications with the advancements in machine learning algorithms.

Rustia et al. [24] implemented deep learning models for the automatic object detection of pests from traps. A CNN model was applied for the detection and filtering of irrelevant objects from the traps. The main part of this study was to represent a model for counting the number of pests present in respective images. Li et al. [13] meant to set up a deep learning-based detection model for whitefly and thrips detection from pictures gathered from a controlled environment. For detecting tiny pests from sticky traps, a hybrid model known as the TPest-RCNN model was applied. In [25], they have put forward a pre-prepared profound CNN-based structure for the classification and recognition of tomato pests. The DenseNet169 gave the highest accuracy of 88.83%.

Hong et al. [26] detected a pest named matsucoccus thunbergianae, which is present on black pine on pheromone traps. Distinctively the best deep learning algorithms for object detection were used to count and detect pests from pheromone trap images. However, the trained algorithm may provide less efficient results on images from real-time and complex environments.

De Cesaro Júnior and Rieder [27] provided a survey of techniques used for deep learning and computer vision to perform object detection for pests and diseases in plants. It was observed that CNNs need to be customized when performing feature extraction to obtain better accuracy and classification. It also explained that the resolution of images, use of good hardware, and a larger dataset influence the training of CNNs and also the accuracy and efficiency of the algorithm.

Turkoglu et al. [28] designed a hybrid model named Multi-model LSTM-based Pre-trained Convolutional Neural Networks (MLP-CNNs). It is focused on a hybrid of the LSTM network and pre-trained CNN models. A variety of feature extraction models were used, which include AlexNet, GoogleNet, and DenseNet201, respectively. It focuses on extracting deep features from pest images which are forwarded into the LSTM layer to build a hybrid model to detect pests and diseases in apples.

Jiao et al. [29] presented an anchor-free region convolutional network to recognize and classify agricultural pests into 24 classes. The main contribution of this study was to introduce a feature fusion model to generate fused future maps, which is helpful for multi-scale pest detection and detecting small pests. However, images used in this study are obtained from Google; a better dataset collected in real time can give better results. Their proposed model achieved a mAP of 56.4%, which can be improved because the size of the pest was too small in the images, greatly affecting the overall accuracy.

Patel and Bhatt [30] presented multi-class pest detection utilizing the Faster R-CNN model. A smaller dataset was used and the results were compared by focusing on the accuracy of image augmentation. The limitation of this study was that it used a smaller dataset with images from traps.

Rahman et al. [31] developed a different deep learning-based methodology for detecting and recognizing diseases and pests on rice plant images. VGG16 and InceptionV3 were used and improved to detect and identify rice diseases and pests. The results were later compared with MobileNet, NasNet Mobile, and SqueezeNet.

Legaspi et al. [32] used Yolov3 with Darknet Architecture to detect whiteflies and fruit flies. The model struggles to detect small pests under complex environments resulting in low accuracy. Yolov2 and Yolov3 were applied by [33] to detect pests and diseases from tea plantations. The model achieved an accuracy of 58% and 60.8% in detecting small pests. An improved deep learning algorithm can be utilized for small object detection.

Önler and Eray [34] developed a deep learning-based object detection system to detect thistle caterpillars in real time, which is based on Yolov5 single-stage object detection. They used a public dataset and achieved a maximum mAP of 59% on 65 FPS. The Yolov5 model was trained with transfer learning and their model could detect different caterpillar species under different environmental lightening conditions with maximum accuracy achieved by Yolov5m. The accuracy of the object detection system can be improved by increasing the dataset size by adding new images, using a variety of data augmentation methods, and using a different deep learning model for object detection.

Although the studies discussed above show good performance results on pest detection, they also face some challenges, such as the surrounding of the images, detection under a variation of light conditions, and the detection of small pests, which are still difficult to overcome. Therefore, in this study, state-of-the-art approaches of CNN object detection models are deployed for pest detection in agricultural fields under different light conditions and environments.

3. Background

In this section, an overview of the object detection model Yolo and its six different variants used in this work for pest detection is presented.

3.1. Yolo for Pest Detection

You Only Look Once or Yolo is a model that detects and perceives many objects in an image or video. Additionally, it can also detect objects in real-time with good speed and accuracy. It carries out object detection as a regression issue and conveys the class probabilities of the recognized pictures.

Object detection using Yolo is performed by deploying convolutional neural networks. In order to detect and classify objects, Yolo uses a neural network with single forward propagation. The algorithm has exceptional speed and accuracy for detecting and classifying multiple objects present in an image or a video or from real-time. The Yolo algorithm has many variants. In this study, six variants of Yolo are applied to detect multiple pests.

3.2. Yolov3

Yolov3 is a state-of-the-art object detection model [35]. Yolov3 can maintain high speeds and accurate prediction for object detection. It is a great improvement over its predecessor Yolov2. The most highlighted improvement includes the change in the overall network structure. It utilizes Darknet-53 to extract multi-scale features to detect objects.

Yolov3 is made up of 53 convolutional layers, also known as Darknet-53; however, for detecting tasks, the original design is layered with 53 additional levels, for a total of 106 layers, which is given in Figure 1 [36]. Yolov3 includes some of the most important aspects, such as residual blocks, skip connections, and upsampling, with each convolutional layer followed by a batch normalization layer and a leaky ReLU activation function. There are no pooling layers, but, instead, extra convolutional layers with stride 2 are used to downsample feature maps. This is done to prevent the loss of low-level features that polling layers exclude. As a result, capturing low-level features helps to improve the ability to detect small objects.

In Yolov3, object detections are performed on three different layers, specifically at positions 82, 94, and 106 within the network architecture. As the network processes the input image, it downsamples the resolution by factors of 32, 16, and 8, respectively, at these three different locations. These downsampling factors, also known as strides, indicate that the output at each of these layers is smaller than the input to the network. By performing detections at multiple scales within the network, Yolov3 is able to detect objects of different sizes and aspect ratios with high accuracy.

For prediction, Yolov3 calculates the offset of the anchor, also known as log-space transform, to predict the center coordinates of the bounding boxes. Yolov3 passes the output centers to the sigmoid function. This is done to eliminate gradients during training. The objectness score is usually predicted through predicted probability and IoU between predicted and ground truth.

3.3. Yolov3-Tiny

Yolov3-Tiny is a lightweight version of Yolov3 [35]. It has a reduced depth of convolution layer compared to the Yolov3. Yolov3-Tiny also uses Darknet-53 architecture for extracting features. Figure 2 [37] explains the working of Yolov3-Tiny [38]. It employs the pooling layer while lowering the figure for the convolutional layer. It splits the picture first into S×S grid cells, then predicts a three-dimensional tensor encapsulating bounding box, objectness score, and class predictions at two distinct scales. In the last stage, the bounding boxes without the best objectness score are ignored for the final detection of objects.

Yolov3-Tiny has a complex structure, as shown in Figure 3 [38,39]. It contains nine convolutional and six max-pooling layers to carry out feature extraction. The bounding boxes are predicted at two different scales, which include a feature map of 13 × 13 and a merged feature map acquired by merging a 26 × 26 feature map with an unsampled 13 × 13 feature map. Such a simple structure of Yolov3-Tiny brings a very fast detection speed, but the accuracy of the model is sacrificed as compared to Yolov3.

3.4. Yolov4

Yolov4 is the framework used for object detection in real-time [40]. It can be trained based on position and classification at a fast speed and can achieve higher accuracy for both. Yolov4 achieves superior performance in comparison with the previous versions of Yolo [40]. Its main architecture is mainly composed of three parts, as given in Figure 4 [41]. Firstly, the backbone in which CSP-Darknet-53 is employed for the training of the object detection model. Secondly, the neck in which spatial pyramid pooling (SPP) is merged with a path aggregation network (PANet) to extract different layers of feature maps. Lastly, the main head of Yolov4 utilizes the Yolo head. The CSP-Darknet-53 network uses a cross-stage partial network (CSPNet) and the Darknet-53 model. Most object detection models require higher computation and a relatively large amount of computation time. However, CSPNet handles this issue and improves the computation [40]. It does that by integrating the feature maps at the start and the end to reduce heavy computation. The neck in Yolov4 intertwines various scales of feature maps which utilize SPP to concatenate various scales of feature maps toward the end of the network and uses PANet to embed another layer-based feature pyramid network.

3.5. Yolov4-Tiny

Yolov4-Tiny is a scaled-down type of Yolov4. Yolov4-Tiny has a quicker detection speed and higher accuracy than Yolov3-Tiny [42]. Yolov4-Tiny’s network architecture is depicted in Figure 5 [43]. Yolov4-Tiny’s backbone network is significantly simpler than Yolov4. A feature pyramid network is utilized for 32 times downsampling and 16 times downsampling to provide two distinct sizes of feature maps for detection, which enhances detection accuracy. The Yolov4-Tiny network topology consists of many fundamental components. Conv convolution, BN normalization, and the leaky ReLU activation function are all part of the CBL module. To meet the aim of downsampling, the iteration length of the convolution kernel is adjusted to 2. To further incorporate feature information, the CSP module refers to the CSPNet network topology and is made up of the CBL module and the concat tensor splicing module.

3.6. Yolov6

Yolov6 is a single-stage object detection framework which was released by [44]. It directly predicts object bounding boxes for an image in one stage. Yolov6 architecture components consist of a backbone network based on Yolov5, a region proposal network, the use of interest aligns (RoI), and the construction of a fully connected layer, see Figure 6. For detection, the image is divided into grid cells and bounding boxes are predicted for each object in the image. Object detection capability has been demonstrated to be equivalent to that of previous CNN-based Yolo algorithms [35], with the algorithm’s progressive iterations showing gains in both speed and accuracy [40]. In this study, Yolov6s is applied to detect the pests.

3.7. Yolov8

The Yolov8 is an advanced model that enhances the achievements of the prior Yolo versions by introducing novel functionalities and upgrades to increase its efficiency and adaptability [45]. A comparison can be seen in Figure 7 [45]. The model employs an anchor-free detection methodology and incorporates novel convolutional layers to enhance the precision of its predictions. This enabled us to achieve highly accurate pest detection results.

4. Methodology

This part provides an overview of the methodology, a description of the dataset and its pre-processing information, and the experimental setup.

4.1. Detection Workflow

Data exploration to understand the data and data augmentation is applied. Then, the image data is labeled in Yolo_Mark, accordingly, in text (.txt) format, which is required for the architectures used. Subsequently, the dataset is split into training and testing, and validation, respectively. Following this, the models are trained on the given dataset. After training, validation is performed on the testing dataset to check the overall detection accuracy of the model. Next, the results of each model are discussed and compared. At the end, the Yolov8 model is integrated in an Android application for real-time pest detection. An overview of the methodology adopted in this study for pest detection given in the Figure 8.

4.2. Dataset

The dataset consists of images of three pests, namely thistle caterpillars (Vanessa cardui), red beetles (Aulacophora foveicollis), and citrus psylla (Diaphorina citri). Images of thistle caterpillars were obtained from [46]. However, images of red beetles and psylla were captured manually using iPhone X with a resolution of 3032 × 4032. All of the images were captured at different times of the day, such as dawn, morning, noon, afternoon, and under a variety of lighting conditions, including direct sunlight, indirect sunlight, shade, and mixed lighting conditions. In order to enrich the dataset, the images are further augmented and a combined dataset consisting of three classes with 9875 total images of pests is composed. An overview of the images from the dataset is given in Figure 9.

Data Labeling

Beginning with the pest dataset, Yolo Mark is used as annotation tool to manually mark the location of each pest in every image with a bounding box. The annotations are saved in txt format. The purpose of this annotation method is to label the pest class and position in the image. The outcome of this procedure are the coordinates of bounding boxes of variable size with their associated classes, which is assessed as intersection-over-union (IoU) by predicting the network results throughout testing. To make it clearer, the annotation of the bounding box is given in Figure 10. The red box depicts the caterpillar-infected sections of the plant as well as parts of the background.

4.3. Experimental Setup

Six cutting edge object detection deep learning models are deployed. To conduct the experiment, the dataset was separated into three parts: 80% for training, 10% for validation, and 10% for testing. The model is trained from training images, followed by an evaluation of the validation images, and, when the experiment is ready to accomplish the predicted results, the final evaluation is done on the testing set. Table 1 shows the experimental setting in which our proposed system was trained and tested.

4.4. Model Training

The Yolo object detection models are configured and trained on the images of thistle caterpillars, citrus psylla, and red beetles. Training can be started after the dataset is labeled. The same configuration is used to train Yolov3, Yolov3-Tiny, Yolov4, and Yolov4-Tiny. Before beginning the training, the model is configured concerning the dataset and configuration according to GPU, CUDNN, and OPENCV. In the configuration, the batch is set to 64 and the subdivisions are set to 8. Three classes are used in training so the max_batches are equal to 6000 and the steps are set to 5400 and 1800. The filters for 3 class are 14. The width and height of the images are set to 416 × 416, respectively. As for Yolov6 and Yolov8, the training process and configuration is different as it follows a different backbone architecture. Before training, the batch size is set to 64 and pre-trained weights of Yolov6s and Yolov8n are used. The image size is also set to 416 and training is performed with 100 epochs only.

5. Results and Evaluation

5.1. Performance Metrics

To assess the performance of the detection models, four performance assessment metrics are used: average precision (AP), precision, recall, and F1 score. The equations of these evaluations are given below. Precision is defined as the percentage of true positives to total positive predictions [47], as shown in Equation (1).

P r e c i s i o n = \frac{T r u e P o s i t i v e s}{T r u e P o s i t i v e s + F a l s e P o s i t i v e s}

(1)

The recall is expressed as the number of true positives divided by the total number of true (relevant) objects [48]. Its mathematical form is presented in Equation (2).

R e c a l l = \frac{T r u e P o s i t i v e s}{T r u e P o s i t i v e s + F a l s e N e g a t i v e s}

(2)

The performance of the object detection model can be measured using both average precision and F1 scores [48]. The area under the precision–recall curve is denoted by AP and also represented as P–R curve. The F1 score is obtained by averaging the precision and recall from 0 to 1, with 1 being the most precise. The higher all of these metrics are, the better the results. Equation (3) represents the F1 score.

F 1 s c o r e = 2 * \frac{P r e c i s i o n * R e c a l l}{P r e c i s i o n + R e c a l l}

(3)

Mean average precision (mAP) gives the average of each average precision (AP) of each class during the training. As shown in Equation (4), mAP is mainly calculated by finding the average precision of each class and then average over a number of classes. For the mean average precision (mAP) calculation, precision, recall, and both false negatives and false positives are considered [49]. We compute the mAP across multiple IoU thresholds for each class k, and the final metric mAP across test data is derived by taking the average of all mAP values per class.

m A P = \frac{1}{n} \sum_{k = 1}^{n} A P_{k}

(4)

Precision measures the percentage of predicted bounding boxes that are correct, while recall measures the percentage of ground truth bounding boxes that are correctly detected by the model. F1 score is a harmonic mean of precision and recall and provides a single score that balances both metrics. In this study, average precision is addressed primarily for the performance evaluation of the model.

5.2. Detection Results

The multiple Yolo models achieved different mAPs on the given dataset. The results reveal that the Yolov3 model achieved a mAP of 58.1% with the highest AP of 63%. The Yolov3 model achieved an average loss of 0.6645. The Yolov3-Tiny model returned a mAP of 33.2% with the highest AP of 53%, it also achieved an average loss of 1.4345. Yolov4 accomplished a mAP of 71.5% with the highest AP of 79% and an average loss of 1.8676. Lastly, Yolov4-Tiny achieved a mAP of 45.0% and the highest AP value of the Yolov4-Tiny model is also 45%. It accomplished an average loss of 0.1981, respectively. The prediction results are given in Figure 11. However, Yolov6 achieved a mAP value of 76.4% with the highest AP value of 83%. These results are presented in Table 2. On the other hand, its average loss is 0.4809, as illustrated in Figure 12. The best mAP value of 84.7% was achieved by Yolov8 with the highest AP of 86.4%. The results of Yolov8 can be visualized in Figure 13. The prediction results of all the models are mostly above the overall accuracy of the models and can also be used to detect similar pests in real-time.

Random images are taken from the dataset for the prediction results. Different accuracies were achieved under different lighting conditions with different backgrounds. The results can be seen in Figure 14. It can be clearly seen that Yolov4, Yolov6, and Yolov8 have mostly predicted all the pests with good prediction accuracy.

5.3. Performance Comparison for Multiple Models

Results of the models varied in terms of accuracy and average loss. Yolov8 of all the models achieved the highest mAP of 84.7% reaching up to 86.4%. The mAP comparison of all the models is given in Figure 15. Yolov4-Tiny gave the lowest mAP value of 33.2%, obtaining a highest value up to only 53%.

The average loss for each model is presented in Figure 16. Yolov3 achieved the lowest average loss of 0.6645, followed by Yolov3-Tiny with an average loss of 1.4345. Yolov4 and Yolov4-Tiny obtained an average loss of 1.8676 and 0.1981, respectively. Yolov6 achieved an average loss of 0.4809, while Yolov8, which attained the highest accuracy, had an average loss of 0.7939. Lastly, Yolov3-Tiny had an average loss of 0.5093.

5.4. Pest Detection via Mobile Android Application

Following the training of the YOLOv8 model for pest detection, the model was converted into a format that could be readily integrated into an Android application. This was accomplished by translating the PyTorch model weights into the ONNX format using PyTorch’s ONNX export function. ONNX is an open standard for describing machine learning models that may be utilized on a variety of platforms and frameworks. After being converted to the ONNX format, the model was then transformed to the TensorFlow Lite (TFLite) format, which is especially optimized for mobile and embedded devices, making it faster and more efficient. Pests can be recognized in real time by utilizing the camera on the smartphone or by selecting a picture from the gallery. Some of the examples can be seen in the Figure 17 and Figure 18.

The application is 307 MB in size and uses a frame size of 960 × 720 with a crop size of 416. The inference time for the CPU is around 300–390 ms, while the inference time for the GPU is around 120–180 ms. The number of threads used by the application can be increased or decreased as needed. This information is important for understanding the performance and resource requirements of the application. The faster inference time for the GPU suggests that it is more efficient for running the application. Adjusting the number of threads can also impact performance, allowing the user to balance processing speed with resource usage. Inference time with the number of threads can be visualized in Figure 18.

The developed application has shown promising results in precisely and efficiently recognizing the target pests. The application can identify pests in real time with the help of Yolov8, making it a vital tool for farmers and pest control experts. The app’s user-friendly UI and simple features make it accessible to a larger audience, allowing even non-experts to spot pests.

6. Conclusions

The study proposed a deep learning-based object detection model for pest management in agriculture, which involved comparing the performance of five Yolo-based models in detecting thistle caterpillars, red beetles, and citrus psylla. Real image data of two pests, red beetle and citrus psylla, were collected for pest management in crops. After training the models, the prediction results were compared by providing real pest images. Yolov8 was found to be the best model for pest detection among the six Yolo-based models, with the highest mean average precision value of 84.7%. This model was also converted into tflite, allowing it to detect pests in real-time from images that can be provided from the gallery. It successfully detects pests of small and large sizes, making it a better choice for real-time pest detection applications for variable rate spraying technologies. Moreover, non-experts can also use this model to detect pests. In contrast, other models, such as Yolov3 and Yolov4, while still capable of performing well, may not have the same level of efficiency and accuracy in small object detection due to their different architectures and feature extraction methods.

Overall, the results suggest that Yolov8 is the best suitable model for the detection of even small-sized pests and could be integrated with spot-specific spraying technologies, offering improved accuracy and efficiency compared to other Yolo-based models. Moreover, the dataset collected in this work is an important contribution to pest management practices and could be used as input in other models for improved accuracy and precision. Additionally, the Yolov8 model was converted into an Android app, making it easy to use for non-experts in pest management. Overall, this study provides a valuable contribution to the field of pest management in agriculture, demonstrating the effectiveness of deep learning-based object detection models for detecting pests and their potential for integration with real-time spraying technologies.

Future research could focus on evaluating the performance of the proposed model on larger datasets to assess its generalization and robustness. Additionally, research could be conducted on integrating the proposed pest detection model into precision spraying technologies to further improve the efficiency and accuracy of pest management practices in agriculture.

Author Contributions

Conceptualization, S.K. and M.A.; methodology, S.K. and M.A.; software, S.K. and M.A.; validation, S.K.; formal analysis, S.K. and M.A.; investigation, S.K. and M.A.; resources, M.A., Y.H., and H.M.O.; data curation, S.K.; writing—original draft preparation, S.K.; writing—review and editing, S.K., M.A., Y.H., and H.M.O.; visualization, S.K. and M.A.; supervision, M.A. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors gratefully acknowledge the PSDP-funded project No. 321 “Establishment of National Center of Industrial Biotechnology for Pilot Manufacturing of Bioproducts Using Synthetic Biol-ogy and Metabolic Engineering Technologies at PMAS-Arid Agriculture University Rawalpindi, Pakistan”, executed through the Higher Education Commission Islamabad, Pakistan.

Conflicts of Interest

The authors declare no conflict of interest.

References

Donatelli, M.; Magarey, R.D.; Bregaglio, S.; Willocquet, L.; Whish, J.P.; Savary, S. Modelling the impacts of pests and diseases on agricultural systems. Agric. Syst. 2017, 155, 213–224. [Google Scholar] [CrossRef] [PubMed]
Fried, G.; Chauvel, B.; Reynaud, P.; Sache, I. Decreases in crop production by non-native weeds, pests, and pathogens. In Impact of Biological Invasions on Ecosystem Services; Springer: Berlin/Heidelberg, Germany, 2017; pp. 83–101. [Google Scholar]
Savary, S.; Willocquet, L.; Pethybridge, S.J.; Esker, P.; McRoberts, N.; Nelson, A. The global burden of pathogens and pests on major food crops. Nat. Ecol. Evol. 2019, 3, 430–439. [Google Scholar] [CrossRef] [PubMed]
Xie, C.; Wang, R.; Zhang, J.; Chen, P.; Dong, W.; Li, R.; Chen, T.; Chen, H. Multi-level learning features for automatic classification of field crop pests. Comput. Electron. Agric. 2018, 152, 233–241. [Google Scholar] [CrossRef]
Tudi, M.; Daniel Ruan, H.; Wang, L.; Lyu, J.; Sadler, R.; Connell, D.; Chu, C.; Phung, D.T. Agriculture development, pesticide application and its impact on the environment. Int. J. Environ. Res. Public Health 2021, 18, 1112. [Google Scholar] [CrossRef] [PubMed]
Barbedo, J.G.A. Detecting and classifying pests in crops using proximal images and machine learning: A review. AI 2020, 1, 312–328. [Google Scholar] [CrossRef]
Durgabai, R.; Bhargavi, P. Pest management using machine learning algorithms: A review. Int. J. Comput. Sci. Eng. Inf. Technol. Res. (IJCSEITR) 2018, 8, 13–22. [Google Scholar]
Chodey, M.D.; Tamkeen, H. Crop Pest Detection and Classification by K-Means and EM Clustering. Methodology 2019, 6, 2130–2135. [Google Scholar]
Ebrahimi, M.; Khoshtaghaza, M.H.; Minaei, S.; Jamshidi, B. Vision-based pest detection based on SVM classification method. Comput. Electron. Agric. 2017, 137, 52–58. [Google Scholar] [CrossRef]
Kamilaris, A.; Prenafeta-Boldú, F.X. Deep learning in agriculture: A survey. Comput. Electron. Agric. 2018, 147, 70–90. [Google Scholar] [CrossRef]
Khan, F.; Zafar, N.; Tahir, M.N.; Aqib, M.; Saleem, S.; Haroon, Z. Deep Learning-Based Approach for Weed Detection in Potato Crops. Environ. Sci. Proc. 2022, 23, 6. [Google Scholar]
Liu, S.; Jin, Y.; Ruan, Z.; Ma, Z.; Gao, R.; Su, Z. Real-Time Detection of Seedling Maize Weeds in Sustainable Agriculture. Sustainability 2022, 14, 15088. [Google Scholar] [CrossRef]
Li, W.; Wang, D.; Li, M.; Gao, Y.; Wu, J.; Yang, X. Field detection of tiny pests from sticky trap images using deep learning in agricultural greenhouse. Comput. Electron. Agric. 2021, 183, 106048. [Google Scholar] [CrossRef]
Hong, S.J.; Kim, S.Y.; Kim, E.; Lee, C.H.; Lee, J.S.; Lee, D.S.; Bang, J.; Kim, G. Moth detection from pheromone trap images using deep learning object detectors. Agriculture 2020, 10, 170. [Google Scholar] [CrossRef]
Satpathi, A.; Setiya, P.; Das, B.; Nain, A.S.; Jha, P.K.; Singh, S.; Singh, S. Comparative Analysis of Statistical and Machine Learning Techniques for Rice Yield Forecasting for Chhattisgarh, India. Sustainability 2023, 15, 2786. [Google Scholar] [CrossRef]
Ahmed, S. A Software Framework for Predicting the Maize Yield Using Modified Multi-Layer Perceptron. Sustainability 2023, 15, 3017. [Google Scholar] [CrossRef]
Aqib, M.; Mehmood, R.; Alzahrani, A.; Katib, I.; Albeshri, A.; Altowaijri, S.M. Rapid transit systems: Smarter urban planning using big data, in-memory computing, deep learning, and GPUs. Sustainability 2019, 11, 2736. [Google Scholar] [CrossRef]
Aqib, M.; Mehmood, R.; Alzahrani, A.; Katib, I.; Albeshri, A.; Altowaijri, S.M. Smarter traffic prediction using big data, in-memory computing, deep learning and GPUs. Sensors 2019, 19, 2206. [Google Scholar] [CrossRef]
Wosner, O.; Farjon, G.; Bar-Hillel, A. Object detection in agricultural contexts: A multiple resolution benchmark and comparison to human. Comput. Electron. Agric. 2021, 189, 106404. [Google Scholar] [CrossRef]
Zou, Z.; Shi, Z.; Guo, Y.; Ye, J. Object detection in 20 years: A survey. arXiv 2019, arXiv:1905.05055. [Google Scholar] [CrossRef]
Zhao, Z.Q.; Zheng, P.; Xu, S.t.; Wu, X. Object detection with deep learning: A review. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 3212–3232. [Google Scholar] [CrossRef]
Liu, J.; Wang, X. Plant diseases and pests detection based on deep learning: A review. Plant Methods 2021, 17, 1–18. [Google Scholar] [CrossRef] [PubMed]
Barbedo, J.G.A.; Castro, G.B. A study on CNN-based detection of psyllids in sticky traps using multiple image data sources. AI 2020, 1, 198–208. [Google Scholar] [CrossRef]
Rustia, D.J.A.; Chao, J.J.; Chiu, L.Y.; Wu, Y.F.; Chung, J.Y.; Hsu, J.C.; Lin, T.T. Automatic greenhouse insect pest detection and recognition based on a cascaded deep learning classification method. J. Appl. Entomol. 2021, 145, 206–222. [Google Scholar] [CrossRef]
Pattnaik, G.; Shrivastava, V.K.; Parvathi, K. Transfer learning-based framework for classification of pest in tomato plants. Appl. Artif. Intell. 2020, 34, 981–993. [Google Scholar] [CrossRef]
Hong, S.J.; Nam, I.; Kim, S.Y.; Kim, E.; Lee, C.H.; Ahn, S.; Park, I.K.; Kim, G. Automatic pest counting from pheromone trap images using deep learning object detectors for matsucoccus thunbergianae monitoring. Insects 2021, 12, 342. [Google Scholar] [CrossRef]
Júnior, T.D.C.; Rieder, R. Automatic identification of insects from digital images: A survey. Comput. Electron. Agric. 2020, 178, 105784. [Google Scholar] [CrossRef]
Turkoglu, M.; Hanbay, D.; Sengur, A. Multi-model LSTM-based convolutional neural networks for detection of apple diseases and pests. J. Ambient. Intell. Humaniz. Comput. 2019, 13, 3335–3345. [Google Scholar] [CrossRef]
Jiao, L.; Dong, S.; Zhang, S.; Xie, C.; Wang, H. AF-RCNN: An anchor-free convolutional neural network for multi-categories agricultural pest detection. Comput. Electron. Agric. 2020, 174, 105522. [Google Scholar] [CrossRef]
Patel, D.; Bhatt, N. Improved accuracy of pest detection using augmentation approach with Faster R-CNN. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1042, 012020. [Google Scholar] [CrossRef]
Rahman, C.R.; Arko, P.S.; Ali, M.E.; Khan, M.A.I.; Apon, S.H.; Nowrin, F.; Wasif, A. Identification and recognition of rice diseases and pests using convolutional neural networks. Biosyst. Eng. 2020, 194, 112–120. [Google Scholar] [CrossRef]
Legaspi, K.R.B.; Sison, N.W.S.; Villaverde, J.F. Detection and Classification of Whiteflies and Fruit Flies Using YOLO. In Proceedings of the 2021 IEEE 13th International Conference on Computer and Automation Engineering (ICCAE), Melbourne, Australia, 20–22 March 2021; pp. 1–4. [Google Scholar]
Bhatt, P.V.; Sarangi, S.; Pappula, S. Detection of diseases and pests on images captured in uncontrolled conditions from tea plantations. In Autonomous Air and Ground Sensing Systems for Agricultural Optimization and Phenotyping IV; SPIE: Bellingham, WA, USA, 2019; Volume 11008, pp. 73–82. [Google Scholar]
Önler, E. Real time pest detection using YOLOv5. Int. J. Agric. Nat. Sci. 2021, 14, 232–246. [Google Scholar]
Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
Dai, Y.; Liu, W.; Li, H.; Liu, L. Efficient foreign object detection between PSDs and metro doors via deep neural networks. IEEE Access 2020, 8, 46723–46734. [Google Scholar] [CrossRef]
Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
Adarsh, P.; Rathi, P.; Kumar, M. YOLO v3-Tiny: Object Detection and Recognition using one stage improved model. In Proceedings of the 2020 IEEE 6th International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 March 2020; pp. 687–694. [Google Scholar]
Ding, S.; Long, F.; Fan, H.; Liu, L.; Wang, Y. A novel YOLOv3-tiny network for unmanned airship obstacle detection. In Proceedings of the 2019 IEEE 8th Data Driven Control and Learning Systems Conference (DDCLS), Dali, China, 24–27 May 2019; pp. 277–281. [Google Scholar]
Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
Zha, M.; Qian, W.; Yi, W.; Hua, J. A lightweight YOLOv4-Based forestry pest detection method using coordinate attention and feature fusion. Entropy 2021, 23, 1587. [Google Scholar] [CrossRef]
Jiang, Z.; Zhao, L.; Li, S.; Jia, Y. Real-time object detection method based on improved YOLOv4-tiny. arXiv 2020, arXiv:2011.04244. [Google Scholar]
Wang, L.; Zhou, K.; Chu, A.; Wang, G.; Wang, L. An improved light-weight traffic sign recognition algorithm based on YOLOv4-tiny. IEEE Access 2021, 9, 124963–124971. [Google Scholar] [CrossRef]
Li, C.; Li, L.; Jiang, H.; Weng, K.; Geng, Y.; Li, L.; Ke, Z.; Li, Q.; Cheng, M.; Nie, W.; et al. YOLOv6: A Single-Stage Object Detection Framework for Industrial Applications. arXiv 2022, arXiv:2209.02976. [Google Scholar]
Jocher, G.; Chaurasia, A.; Qiu, J. YOLO by Ultralytics. 2023. Available online: https://www.kaggle.com/search?q=YOLO+by+Ultralytics (accessed on 27 March 2023).
Saxena, D.R.R. PestIdentification. 2022. Available online: https://www.kaggle.com/datasets/drravirsaxena/pest-identification (accessed on 17 June 2022).
Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
Hastie, T.; Tibshirani, R.; Friedman, J.H.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: Berlin/Heidelberg, Germany, 2009; Volume 2. [Google Scholar]
Everingham, M.; Van Gool, L.; Williams, C.K.; Winn, J.; Zisserman, A. The pascal visual object classes (voc) challenge. Int. J. Comput. Vis. 2009, 88, 303–308. [Google Scholar] [CrossRef]

Figure 1. Yolov3 network architecture.

Figure 2. An overview of the main process of Yolov3-Tiny architecture.

Figure 3. Yolov3-Tiny architecture.

Figure 4. Yolov4 network architecture.

Figure 5. Yolov4-Tiny network architecture.

Figure 6. Yolov6 network architecture.

Figure 7. Comparison of Yolov8 with other Yolo models.

Figure 8. Overview of proposed deep learning-based approach for pest detection.

Figure 9. Pest dataset representation. The images are from different variations and light conditions.

Figure 10. Data labeling for Yolo in txt format. A bounding box is drawn around the pest to identify the location of the pest present in the image.

Figure 11. Training accuracy and loss curve of Yolov3, Yolov3-Tiny, Yolov4, and Yolov4-Tiny models.

Figure 12. Training accuracy and loss curves of Yolov6.

Figure 13. Mean average precision values achieved by Yolov8.

Figure 14. Detection results of pests on plants.

Figure 15. mAP comparison of all the Yolo models.

Figure 16. Average loss of each model after six thousand iterations.

Figure 17. An overview of the detection results on the images from gallery using mobile app.

Figure 18. An overview of the real-time pest detection results using the mobile app with processing specifications.

Table 1. Experimental Environment.

Configuration	Parameters
CPU	Intel Xeon (R) ES-2640 @2.60 × 16
GPU	Nvidia RTX Quadro 4000
Accelerated Environment	Nvidia RTX Quadro 4000CUDA 10.2
Operating System	Ubuntu 20.04

Table 2. Models accuracy and total loss for pest dataset.

Model	mAP	Total Loss
Yolov3	58.1%	0.6645
Yolov3-Tiny	33.2%	1.4345
Yolov4	71.5%	0.9951
Yolov4-Tiny	45.0%	0.1981
Yolov6	76.4%	0.4809
Yolov8	84.7%	0.7939

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Khalid, S.; Oqaibi, H.M.; Aqib, M.; Hafeez, Y. Small Pests Detection in Field Crops Using Deep Learning Object Detection. Sustainability 2023, 15, 6815. https://doi.org/10.3390/su15086815

AMA Style

Khalid S, Oqaibi HM, Aqib M, Hafeez Y. Small Pests Detection in Field Crops Using Deep Learning Object Detection. Sustainability. 2023; 15(8):6815. https://doi.org/10.3390/su15086815

Chicago/Turabian Style

Khalid, Saim, Hadi Mohsen Oqaibi, Muhammad Aqib, and Yaser Hafeez. 2023. "Small Pests Detection in Field Crops Using Deep Learning Object Detection" Sustainability 15, no. 8: 6815. https://doi.org/10.3390/su15086815

APA Style

Khalid, S., Oqaibi, H. M., Aqib, M., & Hafeez, Y. (2023). Small Pests Detection in Field Crops Using Deep Learning Object Detection. Sustainability, 15(8), 6815. https://doi.org/10.3390/su15086815

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Small Pests Detection in Field Crops Using Deep Learning Object Detection

Abstract

1. Introduction

2. Review of the Literature

3. Background

3.1. Yolo for Pest Detection

3.2. Yolov3

3.3. Yolov3-Tiny

3.4. Yolov4

3.5. Yolov4-Tiny

3.6. Yolov6

3.7. Yolov8

4. Methodology

4.1. Detection Workflow

4.2. Dataset

Data Labeling

4.3. Experimental Setup

4.4. Model Training

5. Results and Evaluation

5.1. Performance Metrics

5.2. Detection Results

5.3. Performance Comparison for Multiple Models

5.4. Pest Detection via Mobile Android Application

6. Conclusions

Author Contributions

Funding

Acknowledgments

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI